Event Sourcing made Simple – Kickstarter Engineering


tl;dr: Event Sourcing is to data what Git is to code. We’ve implemented a minimal event sourcing framework at Kickstarter to power d.rip. It’s simple and it has made our life so much better! Read on!

Most software developers use a tool to keep track of code history. Git is a fantastic example that’s used widely across the industry. Type git log and you can see all the changes made to a codebase. Who made the change, when it happened, what the change was (that’s the commit title), why the change was made (that’s a good commit description) and how the change was performed (well, that’s the diff). Git is also a time machine, that allows you to go back in time and see what the code looked like back then (git checkout @{12.days.ago}). You can also replay history and play what-if scenarios: go back in time, checkout a new branch, commit a change, and replay all the events commits that happened after that. When something goes wrong, you can find how a bug happened and when it was introduced. And thanks to all that, you can generate useful reports: number of commits per month, hotspots… and superb visualizations:

Rails contributions animated with gource.

Think a second about how life would be (and was) without Source Version Control (git, svn, cvs, mercurial…). We would have to annotate files by hand, copy files to have some sort of backups and share code via… ftp?

# Change:    14532
#
Update At: 2018–04–23
#
Update By: Philippe Creux
#
Reason: Fix a bug that was introduced sometimes between
# change #12320 that’s in users.rb.backup-2018–02–11
# and change #14211 above (around line 2400)
# def destroy
# self.destroy_at = Time.now
#
end
def destroy
self.deleted_at = Time.now
end

Looks painful. Not fun.

Could you work without a tool managing code history? Nope.

Now, look at your database.

Does your database manage data history?

Unless you’re using Datomic or libraries like Papertrail.rb, the answer is very likely to be: Sigh.

The tedious hand crafted comments you see above are very similar to what we do to keep (some) data history. We add attributes like: updated_at, updated_by_user_id, accepted_at, destroyed_by_admin_id. We backup our database hourly. And even then it’s quite hard to know “how we got there”.

“Why is this subscription marked as inactive here but active on the payment platform? The customer is still getting charged for it!”
“Was this post re-published at some point?”
“Which posts had the category we just deleted?”

These questions could be answered in seconds if we had a full history.

So in this post we’d like to talk about Event Sourcing.

  • We’ll go over a high level introduction to Event Sourcing where we will highlight the four components that make a (minimal) Event Sourcing system: Events, Calculators, Aggregates and Reactors.
  • We will then talk about how we implemented a (minimal) Event Sourcing Framework at Kickstarter for d.rip.
  • And finally we’ll reflect a bit on the ah-ha moments and the challenges that we’re going through with this approach — 9 months after having started to work on d.rip and 4 months after launch.

What is Event Sourcing

Martin Fowler defines Event Sourcing as:

“All changes to an application state are stored as a sequence of events.”

Let’s illustrate this with an imaginary e-commerce platform.

Events

User action, API calls, callbacks (webhooks), recurring cron jobs can all generate Events. Events are persisted and immutable.

Here are some events generated as a customer placed an order on the platform:

These events are this order’s history. We know when they happened, who triggered them and what they were. Note that the events above hold various pieces of information: product id, user id, order id, parcel tracking number, truck identifier etc.

By going through these events, we get a sense of what the current state of the world is and how it came to be in that state. It would be nice not to play all events every time we want to build application state. That’s the role of Aggregates and Calculators.

Aggregates and Calculators

Aggregates represent the current state of the application.
Calculators read events and update aggregates accordingly.

In the diagram below, the little blue circle are calculators and the green sticky notes are aggregates.

The calculator reads the sequence of events and updates the order accordingly: it adds and removes items, updates the total and marks the shipping and delivery dates.

You can create as many aggregates (and calculators) as you need. For example, an aggregate could read through the same set of events to generate a Daily Sales report.

Now that we have the current state of the world, we also want to do things when that state changes. Like it would be sweet to send our customer an email confirmation when their order has just been shipped. We need something to “react” to events. Good news, there is such a thing. It’s a Reactor.

Reactors

Reactors “react” to events as they are created. They trigger side-effects and might create other events in turn.

The reactor on the right hand side listens to the “Shipped” event. Whenever a “Shipped” event is created, it sends an email notification to the customer.

The reactor on the left hand side has a local state. Whenever a Cart has two articles, it displays a promotional offer and creates an event to keep track of this. This is actually where the “Promo displayed” event comes from.

So those are the four components of a minimal event sourcing system:

  • Events to provide a history
  • Aggregates to represent the current state of the application
  • Calculator to update the state of the application
  • Reactors to trigger side effects as events happen

Why Event Sourcing?

Having a full history of the events is one of the main benefits. We can know how we got there which helps with a lot of customer support tasks and debugging sessions.

Being able to Replay Events unlocks very neat features.

  • You can go back in time by replaying all events up to a certain point. Replay all events up until Oct 31st… and you get what the application state was on Halloween Day. Spooky!
  • All software has bugs. So when a calculator has a bug, you can fix the calculator, replay the events and get back to a valid state.
  • Finally, adding columns to an aggregate and backfilling the data is quite simple:
    – 1. Add the column.
    – 2. Update the calculator.
    – 3. Replay the events.
    – 4. The data is backfilled!

On a “regular” relational database, the data you store is the data you read. With event sourcing, the data you write (events) is decoupled from data you read (aggregates). So you can design your aggregates for the current needs of the application. Not having to “future-proof” aggregates for eventual future usage and data needs is quite nice — and avoids a lot of “gut feeling based debates”.

Aggregates can also be optimized for various usages which comes in handy in read-intensive applications: orders’ summary (for list views), orders’ details (to display one order), orders’ daily reports (for business), etc. You could basically get to the point where your aggregate fields match one to one your UI or report fields. That. Is. Fast reads!

And finally, Event Sourcing is a great pattern for distributed systems that tend to be asynchronous and have various services or serverless functions. Services can listen to events they are interested in to update their local state, perform actions and publish other events in turn.

Event Sourcing for d.rip

Drip is a platform to support creators’ practice. Creators publish content (comics, podcasts, behind the scene videos, etc…) that supporters get access to by subscribing to the creator’s Drip.

We launched the first version of Drip on November 15th — roughly 6 months after the first line of code was being written. The Back-end is a Ruby on Rails application offering a GraphQL API. The front-end is React based.

We were a couple of engineers to suggest that we experiment with “Event Sourcing” when we started to work on Drip. It was pretty easy to convince the rest of the team to give it a try since it would address a lot of the pain points that most apps (including Kickstarter) run into after a couple of years (or months) of existence.

Event Sourcing Experiment Requirements

The deadline was pretty tight (6 months to launch) so the Event Sourcing experiment had the following requirements:

  1. It should not slow down development (too much)
  2. It should be quick for an engineer to learn the concept and be proficient
  3. If the experiment fails, it should be easy to rip-out and rollback to a regular Rails / MVC pattern.

Based on those requirements, we decided to make the Event Sourcing framework an implementation detail of the back-end. The event sourcing implementation is not surfaced to GraphQL. The client application consuming the GraphQL API is not aware there is some Event Sourcing going on behind the scene.

We wanted the Aggregates to be regular ActiveRecord models that follow patterns that you’d find on a regular Rails application. This way, we could remove the Event Sourcing framework altogether and replace it with in-place data mutation: create!, update! and destroy! calls.

We looked at various Event Sourcing frameworks written in Ruby but most of them were actually too complex for our needs or would store data in a way that was too different from your regular Rails app. So we decided to build our own minimal framework. It’s about 200 lines of code. And it’s been good enough so far.

Homemade minimal Event Sourcing framework

Aggregates and Events are stored in a relational database.

Each Aggregate (ex: subscriptions) has an Event table associated to it (ex: subscription_events).

Events are created and applied to the aggregate synchronously in a SQL transaction. We avoid situations where events only partly applied and we don’t have to deal with the complexity that asynchronicity introduces. Relying on database transactions to keep the data consistent requires almost no effort on our part.

All Reactors respond to the call method and take an event as an argument. The Dispatcher connects Reactors to Events.

Let’s look at some Code

Let’s talk about the Subscription model. When a user subscribes to a Drip, we create a Subscription.

The Subscription aggregate

class Subscription < ApplicationRecord
belongs_to :user
belongs_to :reward
has_many :events
end

Sample content:

Notice that this model and its attributes are very similar to models you would come across in any Rails application. The only difference is that we have access to the history via has_many :events.

Subscription events

All events related to an aggregate are stored in the same table. All events tables have a similar schema:

id, aggregate_id, type, data (json), metadata (json), created_at

We rely on ActiveRecord’s Single Table Inheritance mechanism to store all the events related to the Aggregate in the same table. Active Record stores the event classname in the type column. Being specific to each event, event data and metadata are stored as json.

Below is the “Subscription Activated” event. Like all events related to “Subscriptions” it inherits from the “Subscription Base Event”.

class Events::Subscription::Activated < Events::Subscription::BaseEvent
  data_attributes :stripe_key
  def apply(subscription)
subscription.stripe_key = stripe_key
subscription.status = “active”
subscription.activated_at = self.created_at
    subscription
end
end

data_attributes defines setters and getters for the attributes passed in. They will all get stored in the data column.

The apply method is the actual Calculator for this event. Most calculators are embedded into the event code to simplify things. A couple of events delegate apply to external calculators when the calculation is complex (international taxes, I’m looking at you!).

apply takes an aggregate and applies changes to it. You might notice that activated_at is set to the event creation time — not the current time. That’s because we don’t want that timestamp to change when we replay events. Replaying events should be idempotent. As a rule of thumb, the calculator (apply) should only use constants (here: “active”) or attributes defined on the event (stripe_key and created_at) and events should embed all the information necessary to update aggregates.

Below are the entries for “Subscription Created” and “Subscription Activated” events:

Looking at the metadata, you might guess that the “Created” event is triggered by a user while the “Activated” event comes from a webhook notification.

When an event is created it is automagically applied to the associated aggregate. The following would create the events and update the aggregate:

subscription = Subscription.find(12)
Events::Subscription::Activated.create!(
subscription: subscription,
stripe_key: “sub_66123”,
metadata: { notification_id: 33456 }
)
subscription.activated? # => true

Reactors and dispatcher

Here are two reactors that react to the “Subscription Activated” event. They both queue up an email for delivery. The first one sends a confirmation email to the subscriber, the second one a notification email to the creator.

class Reactors::Notifications::SubscriptionConfirmation
def self.call(event)
SubscriberMailer.confirm_subscription(
subscription_id: event.subscription_id
).deliver
end
end
class Reactors::Notifications::NewSubscriberNotification
def self.call(event)
CreatorMailer.queue_new_subscriber(
subscription_id: event.subscription_id
).deliver
end
end

We subscribe reactors to events in the Dispatcher.

class Dispatcher
# ...
on Events::Subscription::Activated,
async: Reactor::Notifications::SubscriptionConfirmation
  on Events::Subscription::Activated,
async: Reactor::Notifications::NewSubscriberNotification
# ...
end

Most reactors are triggered asynchronously (notice the async keyword above) and a couple of reactors are triggered synchronously using trigger: instead of async:. We tend to run synchronously reactors triggering events that update related records. For example, only one post can be pinned at a time. On “Post Pinned” the dispatcher triggers a reactor that will unpin any other pinned posts by creating a “Post Unpinned” event. We want all those changes to happen atomically to keep things simple and consistent.

Commands

While not part of the mechanics of an Event Sourcing framework, on Drip we use an additional layer called “Commands”. They are responsible for:

  1. Validating attributes
  2. Validating that the action can be performed given the current state of the application
  3. Building and persisting the event

Below is a command that activates a subscription. It includes the “Command” mixin which provides some validation capabilities, syntactic sugars to define attributes and default behavior.

class Commands::Subscription::Activate
include Command
attributes :subscription, :stripe_key, :metadata
validate stripe_key, presence: true

def build_event
Events::Subscription::Activated.new(
subscription: subscription,
stripe_key: stripe_key,
metadata: metadata
)
end

  def noop?
subscription.activated?
end
end

The command above will be a noop (it won’t create an event) if the subscription is already activated. It will raise an exception (ActiveModel::ValidationError) if the stripe_key is missing. Commands are triggered via call:

Commands::Subscription::Activate.call(
subscription: subscription,
stripe_key: “sub_66123”,
metadata: { notification_id: 33456 }
)
# => <#Events::Subscription::Activated ...>

5 month after launch…

Drip is currently in Public Beta. As of April 2018, we’ve invited 85+ creators to the platform that are supported by 7000+ active subscribers.

Code wise, we have:

  • 12 Aggregates
  • 90 Events
  • 35 Reactors
  • 50 Commands

And data wise:

  • 25,000+ aggregates
  • 150,000+ events

Ah ha! Moments

Replaying events is awesome! Whether we replay events to add a column to an aggregate, fix a bug in a calculator or restore deleted aggregates (yes!) it always feels magical and powerful. No need to write a custom script to backfill or fix your data. Just update the calculator, replay events, and you’re done! You’re in a safe place where you cannot lose data. It’s like when you delete code or files and know that you’ll be able to get that content back anytime if needs be.

You get reporting and charting for (almost) free. All the codebases I’ve worked on are cluttered with code that sends hand crafted events to an event bus or a third party service like Google Analytics, Mixpanel, Intercom, etc. It’s tedious to maintain, often inconsistent, not tested and you need to add more and more event tracking as the application gets more mature. Events being a first class citizen in event sourcing, you can create one Reactor to forward them all to your favorite analytic platform(s).

Obviously understanding “how we get here” by looking at the history makes tracking bugs a breeze and helps tremendously the customer success team.

We also thought that versioning events would be hard. So far, we’ve only had to add new attributes to events. When that happens, there are two scenarios:

  1. Either the attribute value was “implicit” before it was added. For example, if the “currency” attribute is not defined on an old record of an event, we can assume it’s “USD”.
  2. If there is no “implicit” value (ex: subscriber country), you can persist “backfilling” events (“CountryGuessedForBackfilling”) that use various data sources to guess the country (e.g. user address, credit card company, etc)

Challenges

Naming is hard. And there are so many immutable events and attributes to name. The names you choose now will be the ones stored forever. So take a good dictionary and make sure that you nail down names that are explicit and future-proof.

Destructuring one action (GraphQL mutation in our case) into multiple commands and events is quite complex. It is actually the most complex part of the system. There are lots of combinations, so we (should) rely on generative testing to ensure that all combinations result in valid states.

Take the mutation to update a post. All the attributes are optional, so you can call

updatePost(title: “My new title”)

or

updatePost(
title: “Current title,
description: “Same description”,
published: true
)

The first call should only update the title.

The second one should only publish the post. Why? Because the title and the description are unchanged. They have the same value as the ones persisted in the database.

Here is (a subset) of the attributes, commands and events that the updatePost mutation is destructured into:

Wrapping up

We put together a simple implementation of the Event Sourcing framework.

  • There are 4 components:
    – Aggregate (regular Active Record models)
    – Events
    – Calculator (built into events)
    – Reactors
  • The data is persisted on a regular SQL database.
  • Updates are Synchronous and Atomic.
  • Home-made “framework” (about 200 line of code).

Yet, it brings a lot of value.

  • Full History, Audit Log
  • Updating aggregates and backfilling data is easy
  • Fixing bugs is easier
  • There is less risk of “losing” data
  • Events can be sent to your favorite analytic platforms with (almost) no additional code

So, is it like git for data? Pretty much yeah! We definitely encourage you to evaluate event sourcing for your next app, next feature or any existing feature that’s mission critical.

Resources

  • Our Event Sourcing implementation is available as a gist for educational purpose.
  • Original presentation “Event Sourcing Made Simple” given at Conf & Coffee, in Vancouver BC on April 15 can be found there. Recording coming soon.
  • Martin Fowler gave an excellent talk where he highlights that Event Sourcing implementations don’t need to be asynchronous. It made us feel good about putting together such a simple implementation.
  • We encourage you to look at the Ruby frameworks that we’ve evaluated. They were a great source of inspiration and they might fit your needs better than that 200 lines long gist. Event Sourced Record, Rails Event Store, Sandthorn and Sequent.

Thanks to Natacha, Amy, JJ, Brian and Mark for their feedback and special thanks to Janel meticulously reviewing this post. 💚



Source link