Stream processing with Samza
Now that we’ve chosen our starting and ending events and are emitting them to Kafka, we need to be able to determine the latency between our starting event and our end event. To do that, we’ll use Samza, a stream processing framework built to consume events from Kafka, and will perform some computation over them. In this case, we’ll be using Samza to determine the time difference between our StartTrackingEvent and our EndTrackingEvent, the latency of our flow.
Joining events using Samza
In order to determine the latency for a particular push notification, we need to join a StartTrackingEvent for a single push notification with its corresponding EndTrackingEvent. To do so, we’ve written two jobs: a Partitioner, which partitions the events to ensure that the events we’re tracking as a single push notification end up on the same process, and a Joiner, which calculates the latency of those matched events.
The first thing we want to do is ensure that we can match the StartTrackingEvent and EndTrackingEvent that make up a particular push notification within a single host. We do this with a Partitioner step. All events are initially consumed by Aeon in a random way, meaning there’s no guarantee that a StartTrackingEvent and EndTrackingEvent for one push notification end up being processed by the same machine.
Luckily, both of the events share some data in common, typically a key that a StartTrackingEvent will share with only one other EndTrackingEvent. We can use this key to partition our events so that they ultimately end up being processed by the same Joiner. In this case, each event has a pushId, so we can publish all events with the same pushId to the same message queue. By doing this, we ensure that the StartTrackingEvent with pushId A will be consumed by the same Joiner which consumes the EndTrackingEvent with pushId A.
This looks like the following: