Airbnb is built on a service-oriented architecture (SOA). In our production infrastructure, we run hundreds of services that do everything from calculating pricing to returning search results to sending messages to users. To unify and scale our infrastructure, we use Kubernetes, an open source container orchestration engine, to define and manage our workloads. We currently run hundreds of workloads in Kubernetes across tens of clusters and tens of thousands of nodes.

In this post, we will talk about how we use Krispr to inject infrastructure components into pods. The name “Krispr” is a play on words with two different references: 1) the CRISPR gene editing technique used to mutate the genomes of living organisms, and 2) the crisper drawer in a refrigerator that is used to keep vegetables fresh. One of the goals of Krispr is to keep our infrastructure components up to date and fresh.
Airbnb and Kubernetes
Airbnb has put significant effort toward simplifying the process of building and running services on Kubernetes. A major contributor to this simplification was the development of kube-gen, an in-house tool built to allow engineers to keep the configuration of various environments — such as production, staging, and canary — in sync. Kube-gen also provides a simplified interface for service owners. Instead of exposing all the bells and whistles of Kubernetes, we provide standard defaults, opinionated configurations, and validation. Kube-gen is effectively a compiler that runs in the pre-build phase of every service. It takes an internally defined format as input and outputs Kubernetes manifests. Like a compiler, the kube-gen binary is explicitly versioned. In order to get new features and settings, services are required to upgrade their version of kube-gen.
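To make the compiler analogy concrete, here is a minimal sketch of what a kube-gen-style tool does: expand a simplified, internally defined spec into a full Kubernetes manifest with opinionated defaults. The input field names and the defaults shown here are hypothetical, not Airbnb's actual internal format.

```python
# Hypothetical sketch of a kube-gen-style "compiler": it takes a minimal,
# internally defined service spec and emits a fuller Kubernetes Deployment
# manifest, applying standard defaults where the spec is silent.
def generate_manifest(service_spec: dict) -> dict:
    """Expand a simplified service spec into a Deployment manifest."""
    name = service_spec["name"]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "replicas": service_spec.get("replicas", 2),  # opinionated default
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {"name": name, "image": service_spec["image"]}
                    ]
                },
            },
        },
    }

manifest = generate_manifest({"name": "pricing", "image": "pricing:1.2.3"})
```

Because the tool, not the service owner, writes the final manifest, defaults and validation can be enforced centrally across environments.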
Shared Infrastructure Components
As kube-gen matured, more and more features were added to it. One feature, called “components”, allows infrastructure engineers to create shared infrastructure components that can be injected into a service’s definition. This is a very powerful concept, as it allows core infrastructure concerns like logging, metrics, and service discovery to run in separate sidecars and evolve independently of one another.
Given that kube-gen binaries are explicitly versioned, the rollout of new shared components was dependent on kube-gen version upgrades. So if the service discovery component was changed, it would not get picked up by a service until the service owner had upgraded their service to the newest version of kube-gen.
At its core, this model put product engineers, rather than infrastructure engineers, in the driver’s seat when it came to rolling out shared infrastructure components. This had detrimental downstream effects. One of the most problematic of these was that it was difficult to know when a shared component would be fully rolled out.
With hundreds of services owned by many different teams, this became a logistical challenge. Each time a shared infrastructure component needed to be updated, we had to corral all service owners to upgrade and deploy their service with the newest kube-gen version. Our infrastructure components ended up with significant version fragmentation, which increased complexity and costs of maintenance.
Among other disadvantages were that infrastructure engineers lacked the ability to target specific services or environments when rolling out changes, and product engineers lacked the necessary context to monitor rollouts adequately when upgrading kube-gen versions. All in all, no one was completely happy with the current state of things.
As we searched for ways to address these issues, we came across Kubernetes’ mutating admission controller webhook. In short, an admission controller webhook is an HTTP callback that intercepts API calls and can modify objects before they are stored in the Kubernetes API. We realized that we could use a mutating admission controller to inject and/or modify pods as they are created in the cluster. We could leverage such a controller to inject components like service discovery. When the service discovery team wants to release a new version of their component, they need only update their webhook, and all new pods will start picking up their changes.
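The mechanics of a mutating webhook can be sketched as follows. The webhook receives an AdmissionReview request and replies with a response that carries a base64-encoded JSON Patch; the `admission.k8s.io/v1` shapes are from the Kubernetes admission API, but the surrounding HTTP server plumbing is omitted here.

```python
import base64
import json

def admission_response(review: dict, patch_ops: list) -> dict:
    """Build the response half of an AdmissionReview exchange.

    The API server applies the returned JSON Patch to the incoming pod
    before persisting it.
    """
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": review["request"]["uid"],  # must echo the request uid
            "allowed": True,
            "patchType": "JSONPatch",
            # the patch is a base64-encoded RFC 6902 JSON Patch document
            "patch": base64.b64encode(json.dumps(patch_ops).encode()).decode(),
        },
    }
```

A webhook that injects a sidecar would compute `patch_ops` adding a container to `/spec/containers` and return it through this envelope.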
We already had experience running a validating admission controller to enforce security policies in our clusters, but we had some reservations with mutating webhooks. Our biggest concern was that every webhook that we added would be part of the critical path to creating a new pod, meaning that we would be introducing new potential points of failure. Though many infrastructure teams wanted a new solution, they were not thrilled at the idea of maintaining and being on-call for these webhooks.
To leverage mutating webhooks without creating maintenance overhead, we decided to separate the concern of “what” was being changed about the pod specification from “how” that change happens. We came up with a new approach that uses what we’ve dubbed a “mutator” to define “what” to change. A mutator is a pure function that accepts a Kubernetes manifest byte stream as input and returns a Kubernetes manifest byte stream as output.
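The mutator contract fits in a few lines. This example mutator, which stamps a label onto whatever manifest it receives, is purely illustrative; any function with the same bytes-in/bytes-out signature qualifies.

```python
import json

def add_label_mutator(manifest: bytes) -> bytes:
    """A mutator as described above: a pure function from a Kubernetes
    manifest byte stream to a mutated manifest byte stream."""
    obj = json.loads(manifest)
    obj.setdefault("metadata", {}).setdefault("labels", {})["mutated"] = "true"
    return json.dumps(obj).encode()
```

Because mutators are pure functions, they are trivial to unit test in isolation, without a cluster or a webhook in the loop.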
As we looked at more and more examples, we realized that nearly all shared components were doing the same thing: injecting either an init container or a sidecar into a pod. To make it easier for other infrastructure developers to build mutators, we built a higher level “container mutator”. The container mutator requires just a single configuration file, which defines the container you want to inject into pods.
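A sketch of such a container mutator follows, with a hypothetical configuration format; the real configuration file's fields are not shown in this post, so the keys below are illustrative only.

```python
import json

# Hypothetical single-file config for a container mutator: it names the
# container to inject and whether it runs as a sidecar or an init container.
SIDECAR_CONFIG = {
    "kind": "sidecar",  # or "init"
    "container": {"name": "service-discovery", "image": "discovery:2.0"},
}

def container_mutator(manifest: bytes, config: dict = SIDECAR_CONFIG) -> bytes:
    """Inject the configured container into a pod manifest.

    This is itself a mutator: bytes in, bytes out.
    """
    pod = json.loads(manifest)
    key = "initContainers" if config["kind"] == "init" else "containers"
    pod.setdefault("spec", {}).setdefault(key, []).append(config["container"])
    return json.dumps(pod).encode()
```

With this shape, an infrastructure team ships only an image and a config file; the shared mutator handles the manifest surgery.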
Instead of writing a function that knows how to manipulate a Kubernetes manifest byte stream, infrastructure engineers now need only provide a Docker image and a configuration file. We’ve named this framework “Krispr”.
At its core, Krispr is a command line tool that is responsible for finding all the mutators that need to be applied and applying them, one at a time, to a Kubernetes manifest byte stream. That also makes Krispr itself a mutator, since the set of mutators it applies fully defines the resulting pod.
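The core loop is simple function composition. A sketch: because the composed pipeline has the same bytes-in/bytes-out signature as each individual mutator, Krispr satisfies the mutator contract itself.

```python
from typing import Callable, Iterable

Mutator = Callable[[bytes], bytes]

def krispr(manifest: bytes, mutators: Iterable[Mutator]) -> bytes:
    """Apply each mutator in order to the manifest byte stream.

    The composition is itself a mutator, so this function could be
    registered anywhere a single mutator is expected.
    """
    for mutate in mutators:
        manifest = mutate(manifest)
    return manifest
```

For example, `krispr(pod_bytes, [logging_mutator, discovery_mutator])` yields a pod with both components injected, in that order.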
We’ve built a mutating admission controller that passes all pods through to Krispr. Krispr knows how to find all the mutator configuration files and applies those changes to the pods. The mutating admission controller then takes the final pod definition, computes a JSON patch between it and the original incoming pod definition, and returns that patch in its AdmissionReview response so the API server can transform the original pod into the final one.
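Computing that patch can be sketched for the common case here, containers appended to a pod's `spec.containers`; a real controller would use a full RFC 6902 JSON Patch diff library rather than this simplified version.

```python
def container_patch(original: dict, final: dict) -> list:
    """Greatly simplified JSON Patch (RFC 6902) diff: it only covers
    containers appended by mutators, the common case described above."""
    ops = []
    orig_len = len(original.get("spec", {}).get("containers", []))
    for i, c in enumerate(final.get("spec", {}).get("containers", [])):
        if i >= orig_len:  # container added by a mutator
            ops.append({"op": "add", "path": f"/spec/containers/{i}", "value": c})
    return ops
```

The resulting operations are exactly what gets base64-encoded into the AdmissionReview response's `patch` field.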
Since Krispr provides an abstraction layer on how mutators are run, we can run these mutators in other contexts besides an admission controller. In fact, we also run Krispr at build time, right after kube-gen generates the initial set of Kubernetes manifests. This provides us with two very useful properties. First, it allows us to relax the runtime requirements of the mutating admission controller. If it times out, or is temporarily down, we can still admit pods into the cluster, knowing that we have run Krispr and all of its mutators at least once at build time. This is huge from a reliability and operational perspective since we can now tolerate temporary downtime in our admission controller. Second, it lets us see errors and problems in Krispr much earlier. If we detect a bug in Krispr that causes build failures, we can roll back those changes before Krispr rolls out onto the admission controllers.
What happens if there is a bad rollout of an infrastructure component now? Previously, a service owner could abort and roll back the deploy in order to undo the infrastructure component change. Now, however, that is not the case: the rollback pods will get the new, bad infrastructure components injected. We’ve addressed this problem in Krispr by implementing a mutation pause period. If we detect that a pod has been mutated within the past two weeks, we will not re-mutate it. This allows service owners to deterministically roll back to a build from within the last two weeks.
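The pause check can be sketched as a timestamp comparison; the annotation name used below is hypothetical, standing in for however the last mutation time is actually recorded on the pod.

```python
from datetime import datetime, timedelta

PAUSE = timedelta(weeks=2)  # the two-week mutation pause period

def should_mutate(pod: dict, now: datetime) -> bool:
    """Skip re-mutation if the pod was mutated within the pause window.

    "krispr/last-mutated" is a hypothetical annotation recording when the
    pod was last mutated, as an ISO 8601 timestamp.
    """
    stamp = pod.get("metadata", {}).get("annotations", {}).get("krispr/last-mutated")
    if stamp is None:
        return True  # never mutated before; mutate now
    return now - datetime.fromisoformat(stamp) > PAUSE
```

Within the window, rollbacks reproduce the pod exactly as it was built; past the window, mutators run again, which also bounds how stale any injected component can get.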
Previously, the rollout of shared infrastructure components was tightly coupled with the rollout of new versions of kube-gen. This gave service owners full control of when to do upgrades, at the cost of slowing down infrastructure changes. Furthermore, the resulting fragmentation and complexity made our systems less stable and reliable. We introduced the concept of mutators to make it easier for other infrastructure developers to build and to roll out new infrastructure components. We built Krispr to aggregate and to run mutators both as a pre-build step and in the mutating admission controller to ensure infrastructure components are always kept up to date, while keeping these mutators out of the critical path of creating new pods. Finally, we added a two-week mutation pause period to allow service owners to deterministically roll back builds up to two weeks old, while giving infrastructure developers an upper bound on how long it will take their components to roll out. We feel that this approach strikes the right balance between infrastructure stability and development velocity.
Krispr is the work of many different collaborators, and it would not have been possible without the contributions and support of Laurent Charignon, Bruce Sherrod, Evan Sheng, Nick Adams, Jian Cheung, Joseph Kim, Rong Hu, Chen Luo, Brian Wolfe, Rushy Panchal, Changgeng Li, Hua Zheng, Juwan Yoo, Daniel Evans, Stephen Chan, Ramya Krishnan, Johannes Ziemeke, Jason Jian, Liuyang Li, and Sunil Shah.