Rebuilding messaging: How we bootstrapped our platform

When member A sends a message to member B, the message travels via a backend service and then is persisted in the database. Member B then reads the message via the read API exposed by the service. In this blog, we will be referring to the old backend as “old-system” and the new backend as “new-system.”

Fundamentally, a messaging platform’s persistence layer comprises shared data and personal metadata associated with each message. For instance, when member A sends “Hi” to member B, the shared data is the actual message content, “Hi.” The time at which member B reads it is member B’s personal data. If member A marks the message as important or starred, that’s considered member A’s personal metadata (not visible to member B) for this message.

In the database, we can either normalize the data or store it denormalized. Old-system chose to store the data denormalized, and the general benefit of this was improved read performance by eliminating the need for joins. It also provides complete isolation between the sender and recipient’s copies of the message. In the above example, both member A and member B have their own records in the database with the text “Hi” duplicated in each of their records. However, as the system grows, and as conversations increase in the number of participants, this approach soon becomes costly with a lot of duplication. Additionally, providing features, like editing and deleting messages, with low latency is almost impossible, since every participant’s copy needs to be modified.

To address these issues in new-system, we chose to normalize the data. This means that shared data and personal metadata are stored in separate tables, and personal metadata has foreign keys back to the shared content. In the above example, there’s a single copy of the message “Hi,” and member A and member B each have their own personal metadata records which have a pointer to the record ID of the “Hi”’s.

A similar pattern of shared and personal data is seen in conversation-level data. Properties such as titles are shared, but the number of messages that a participant hasn’t read in a conversation is personal metadata.

We spent several quarters building out all the new microservices needed for new-system. Next, all existing messages in old-system had to be made available to read and update in new-system. At LinkedIn’s scale, having downtime to bootstrap the new databases was not an option. So we had to come up with a way to migrate billions of messages while millions of new ones were added and updated daily by members on the site.

We chose the classic approach of dual-writing/updating to both the old and new systems while serving read traffic only from old-system. We rolled out this change of dual-writing/updating and gradually started ramping the approach to members. Once 100% of live write traffic was being continuously replicated to new-system, we could take a snapshot of the data in old-system and move the 17 years’ worth of existing messages over to the new-system via offline flows (we chose Hadoop). I’ll talk about this in more detail in the next few sections.

The bootstrap

As briefly touched upon earlier, a prerequisite for bootstrapping is the dual-writing of live traffic. Once that was set up, the bootstrap was performed by running massive Hadoop jobs and uploading the data into online databases. 

The migration was executed in three phases. The goal of the first phase was to replicate every write and update action performed by the user on the site to the new-system in real time. The second phase was focusing on finding and reserving a place for every existing historic message from the old-system into the new-system. In this phase, we generated ID mappings between the two systems. The third phase took every record in the old-system’s databases written since day 1, applied schema transformations, and uploaded them to new-system’s databases at the ID chosen in phase 2. The three-phased approach allowed us to break a large problem into more manageable ones.

Phase 1: The dual-write

We called this phase online replication. As mentioned above, the end-objective for this phase was for every write and update action by the members to be replicated in (almost) real-time to the new-system. If not, new data written during and after the offline bootstrap would not be replicated to new-system. We built robust retry mechanisms that guaranteed eventual replication. The replication writes were completely transparent to the sender and recipient as they were asynchronous, and recipients continued to read from old-system’s databases.

Source link