There are two major changes in this architecture:
- A new Samza job is created to consume both ProfileViewEvent and NavigationEvent, instead of the old consumer which only consumed the former.
- All existing offline jobs are removed and, in its place, we created a singular job that will be discussed later.
The Samza job
Samza, now an Apache project, was originally developed by LinkedIn and is LinkedIn’s de facto distributed stream processing service. We chose to migrate from the existing nearline processor job to a Samza job for a number of reasons.
First of all, Samza supports a variety of programming models, including the Beam programming model. Samza implements the Beam API: this allows us to easily create a pipeline of data processing units including filtering, transformations, joins, and more. For example, in our case, we can easily join the PageViewEvent and NavigationEvent to compute the source of views in near-real-time—this could not as easily have been done in the old processor. Secondly, deploying and maintaining Samza jobs at LinkedIn is straightforward once it’s set up because they’re run on a YARN cluster maintained by the Samza team. The dev team does still need to manage the scaling, performance, etc., but it does help immensely on the regular maintenance side (e.g., not need to worry about machine failures). Finally, Samza is well-supported and well-integrated with other LinkedIn tooling and environments.
The new offline job
Some may wonder why we still incorporated an offline job in the Lambda-less architecture. The truth is it isn’t necessary from the architecture transition perspective. However, as depicted in the diagram above, the offline job reads from the ETLed data in HDFS indirectly produced by the Samza job via a Kafka topic. The only purpose of the offline job is to copy all the data that was written to the real-time table in Pinot to the offline table. This is done for two reasons: 1) the offline table has much better performance due to how data is organized (in short offline tables with much fewer data segments than the real-time table enabling faster query); and 2) we store the processed view data for up to 90 days with an automatic data purge, whereas the real-time table only retains the data for a few days. A key difference between the new offline job and its previous iterations in the old architecture is that the job has no overlap with the nearline job in processing logic. None of the logic implemented in the Samza job is implemented here. We can remove this job when Pinot is able to automatically support the consolidation of files from the real-time table to the offline table.
There is no bug-free software in its entire lifecycle; we recognize that things can still go wrong in different ways. In the case of WVYP, an event processed using the wrong logic will remain in the database until it’s reprocessed and fixed. Moreover, unexpected issues can happen outside of the system’s control (e.g., in a way that can impact your data sources). A big promise of batch processing is re-processability. If a job fails, it can reliably re-run and produce the same data. If source data is corrupted, it can re-run to reprocess it.
This becomes a lot more challenging in the streaming case, particularly when the processing relies on other stateful online services to provide additional data. Message processing becomes non-idempotent. WVYP not only relies on online services for states, but also sends notifications to members when messages are processed (important in that we do not want to be sending out duplicate notifications). If the chosen datastore does not support random updates by design, such as Pinot, we needed a dedupe mechanism in place.
We recognize that there is no silver bullet to this problem. Instead, we decided to treat each problem differently and use different strategies to mitigate issues:
- If we need to make minor changes to the processed messages, the best approach will be to write a one-off offline job script to read the processed messages in HDFS (just as we do for the offline job in the new architecture), correct whatever is needed, and push to Pinot to override the previous data files.
- If there is a major processing error or if the Samza job failed to process a large number of events, we can rewind the current processing offset to a previous point in Kafka.
- If the job only degrades its behavior for certain periods of time, such as if the computation of view relevance fails, we would skip the relevance of some views. In this scenario, the system would function in reduced capacity for that period of time.
Duplicate processing happens in various scenarios. One, mentioned above, is an instance in which we explicitly want to re-process the data. Another is inherent to Samza, which guarantees at-least-once processing. When Samza containers are restarted, it will likely process some messages again because the checkpoint it reads will likely not reflect the last message it processed. We are able to address this deduplication at two places:
- The serving layer: When the midtier service reads from the Pinot table, it performs deduplication and picks the view with the latest processed time.
- The notification layer: The notification infrastructure ensures that we do not send duplicate notifications to a member in a configurable period of time.
Lambda architecture has been around for many years and has gained its fair share of praise and critiques. In our case of migrating WVYP, we were able to see the following benefits:
- Significant improvements in development velocity by halving most measures of development time. Reduced maintenance overheads by more than half (the nearline flow had fewer maintenance overheads than the batch processing flow).
- Improved member experience. There is now a reduced likelihood of introducing bugs in the development process. We also have better nearline computation (e.g., fast computation of view sources which was not available earlier) allowing us to serve WVYP information to members faster.
By freeing up developer time, we’re now able to iterate much faster on the system and focus efforts elsewhere. By sharing a look into how the WVYP system was initially developed, operated, and rehauled, we hope that some of our takeaways will help individuals facing similar challenges or considerations in using the Lambda architecture make better decisions.
We’d like to thank Carlos Roman, Ke Wu, Josh Abadie, Chris Harris, David Liu, Hai Lu, Subbu Subramaniam, and Zhonggen Tao for their feedback on the design; Hai Lu, Jean-Francois Desjeans Gauthier, and Chris Ng for their review of this blog post; Priyam Awasthi for her contributions to the implementation; Hristo Danchev, Ray Manpreet Singh Matharu, and Manasa Gaduputi from the Samza team for support and helping troubleshooting issues; and Bef Ayenew for his support on this project.