Manual recovery and reconfiguration
For several years, we relied on machine-specific configuration files to distribute our table’s partitions among Jhubbub hosts. For example, Host A’s configuration file would list partitions 1, 2, 3 as Host A’s responsibility; Host B’s configuration file would list partitions 4, 5, 6; and so on. If we had to turn off a host, a site reliability engineer (SRE) would need to redistribute that host’s share of partitions by manually editing configuration files and redeploying instances.
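To illustrate, a per-host configuration file under this scheme might have looked like the following (the file names, keys, and format here are hypothetical; the actual files were internal):

```
# Hypothetical config for Host A (illustrative only)
jhubbub.partitions=1,2,3

# Hypothetical config for Host B (illustrative only)
jhubbub.partitions=4,5,6
```

Because the assignment lived in static files, moving partition 3 from Host A to Host B meant editing both files and redeploying both instances.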
This process was cumbersome and error-prone. If a host failed unexpectedly, it could take hours or even days before a new host was started and assigned its partitions. In the meantime, thousands of RSS feeds could go unprocessed, causing stale news to show up in the feed. A human error while editing configs could cause even more severe outages. This led us to look for a solution that would help us:
- Automate the failover and recovery process
- Eliminate the need for human intervention, leaving no room for human error
In short, we sought to enhance the system's fault tolerance by adding automatic failover.
Scalability and flexibility with stateless and elastic architecture
For Jhubbub, all partition-to-host mappings were hard-coded with static instance identifiers. This made the application stateful: certain partitions had to be handled by certain designated hosts according to the mapping manually created by SREs. Adding or removing partitions or hosts therefore required manual reconfiguration. Because it was difficult to scale Jhubbub up or down, we maintained a constant capacity regardless of the volume of feeds being ingested.
On the other hand, a stateless application can accommodate the addition and removal of resources and nodes without impacting service availability. The number of hosts allocated to the application can be elastically provisioned up and down at will to meet demand without requiring manual intervention. As such, a stateless and elastic architecture makes it easier to manage a service and reason about its behavior.
As we scaled and grew our footprint as a content platform, we needed to be more agile in keeping our content up to date. One way to accomplish this was to scale Jhubbub with demand, which meant doing away with manually distributing the RSS feed ingestion work.
Migration to Azure
Last year, we announced our multi-year migration to Azure. One of the goals for this migration was to make all of our internal applications stateless, because stateless architectures are better suited to public cloud infrastructure. When the move was announced, most of LinkedIn's applications were already stateless, but Jhubbub was not.
Motivated by the pain points described above and our company-wide commitment to move to the public cloud, we made the Jhubbub application stateless and elastic with the help of Helix, a cluster management framework developed at LinkedIn.
Apache Helix is an open source cluster management framework for distributed systems. Helix automatically assigns and relocates partitions and their replicas in the event of node failure, cluster expansion, or reconfiguration.
Helix separates cluster management from application logic by representing an application’s lifecycle as a finite state machine. The Apache Helix website describes in detail Helix’s architecture and how it can be used to manage distributed systems.
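As a rough conceptual sketch of this idea (plain Java, not the actual Helix API), a replica's lifecycle can be modeled as a finite state machine whose legal transitions are declared up front. The state names below mirror a Leader/Standby-style model; the class and method names are illustrative:

```java
import java.util.Map;
import java.util.Set;

/**
 * Conceptual sketch only: models one replica's lifecycle as a finite
 * state machine with OFFLINE, STANDBY, and LEADER states. In real Helix,
 * state models are defined declaratively and the Helix controller drives
 * transitions via callbacks on participant nodes.
 */
public class ReplicaStateMachine {
    public enum State { OFFLINE, STANDBY, LEADER }

    // Legal transitions, analogous to those a LeaderStandby-style model allows.
    private static final Map<State, Set<State>> TRANSITIONS = Map.of(
        State.OFFLINE, Set.of(State.STANDBY),
        State.STANDBY, Set.of(State.LEADER, State.OFFLINE),
        State.LEADER,  Set.of(State.STANDBY)
    );

    private State state = State.OFFLINE;

    public State getState() { return state; }

    /** Moves to the target state if the transition is legal; throws otherwise. */
    public void transitionTo(State target) {
        if (!TRANSITIONS.get(state).contains(target)) {
            throw new IllegalStateException(state + " -> " + target + " is not allowed");
        }
        state = target;
    }
}
```

The key design point is that the application only implements what each transition means (e.g., start or stop polling feeds), while the framework decides when transitions happen.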
Helix was initially developed at LinkedIn and currently manages data infrastructure services such as Espresso (a NoSQL document store), Brooklin (a near-real-time streaming system), Venice (a derived-data store), and Pinot (an offline OLAP store).
Modeling Jhubbub as a Helix application
In order to distribute Jhubbub and manage its distributed workloads, we needed a coordination solution with a central brain and a metadata store holding metadata about the nodes and workloads. One possible option was to build our own controller/coordinator on top of Apache ZooKeeper, a coordination service that exposes a filesystem-like data model.
The downside of this approach was the cost: building a brand-new coordination service would require a nontrivial amount of engineering effort and would be difficult to get right. Another downside was the difficulty of using ZooKeeper directly. We considered options like Apache Curator, an open source library that makes ZooKeeper easier to use, but Curator was not the standard way applications integrate with ZooKeeper at LinkedIn. Finally, deploying and maintaining a complex infrastructure-level coordination service without prior expertise would have been a major commitment for the Jhubbub team.
However, given that Jhubbub was already manually partitioned, using Helix to make it stateless and scalable seemed like a better alternative.
Picking the right state model
Helix uses a declarative state model to describe each resource of the application. For Jhubbub, we wanted a single resource with multiple partitions, each mapping one-to-one to a partition of the NoSQL table that we sourced our RSS feeds from. The goal was for the resource to have a degree of fault tolerance and failover built in via Helix's state model.
Upon reviewing the predefined state models that Helix offers, we decided to use the LeaderStandby state model, which allowed each of Jhubbub's partitions to have one Leader replica and two Standby replicas, all located on different hosts. In the event of a node failure, Helix would automatically detect any missing Leader replicas and promote one of the Standby replicas to the Leader state. No human intervention would be necessary for reconfiguration and reassignment, and all of this could take place within a matter of seconds without affecting our availability.
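The failover behavior described above can be sketched in a few lines of plain Java. This is a conceptual simulation, not the Helix implementation: class and method names are invented, and real Helix computes a full target assignment rather than promoting standbys one by one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Conceptual sketch: each partition has one Leader host and a list of
 * Standby hosts. When a host fails, a surviving Standby is promoted to
 * Leader for every partition whose Leader lived on that host.
 */
public class LeaderStandbyFailover {
    private final Map<Integer, String> leaders = new HashMap<>();          // partition -> Leader host
    private final Map<Integer, List<String>> standbys = new HashMap<>();   // partition -> Standby hosts

    public void assign(int partition, String leader, List<String> standbyHosts) {
        leaders.put(partition, leader);
        standbys.put(partition, new ArrayList<>(standbyHosts));
    }

    public String leaderOf(int partition) { return leaders.get(partition); }

    /** Simulates what a controller does when a host drops out of the cluster. */
    public void onHostFailure(String failedHost) {
        for (Map.Entry<Integer, String> e : leaders.entrySet()) {
            List<String> candidates = standbys.get(e.getKey());
            candidates.remove(failedHost);                 // failed host can no longer serve
            if (failedHost.equals(e.getValue())) {
                if (candidates.isEmpty()) {
                    throw new IllegalStateException("No standby left for partition " + e.getKey());
                }
                e.setValue(candidates.remove(0));          // promote a Standby to Leader
            }
        }
    }
}
```

For example, if Host A leads partition 1 with Hosts B and C on standby, a failure of Host A leaves partition 1 led by one of the surviving standbys, with no operator involvement.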