Taming Service-Oriented Architecture Using A Data-Oriented Service Mesh | by Adam Miskiewicz | Airbnb Engineering & Data Science | Nov, 2020


At Hasura’s Enterprise GraphQL Conf on October 22, we presented Viaduct, what we’re calling a data-oriented service mesh that we believe will bring a step function improvement in the modularity of our microservices-based Service-Oriented Architecture (SOA). In this blog post, we describe the philosophy behind Viaduct and provide a rough sketch of how it works. Please watch the presentation for a more detailed look.

For a while, Service-Oriented Architectures have been moving towards ever larger numbers of small microservices. Modern applications can consist of thousands to tens of thousands of microservices connected in unconstrained ways. As a result, it’s not uncommon to see dependency graphs like the following:

This particular dependency graph happens to be from Airbnb, but it is far from unique. Amazon, Netflix, and Uber have all shared similarly tangled dependency graphs.

These dependency graphs are reminiscent of spaghetti code, just at the microservices level. Just as spaghetti code becomes harder and harder to modify over time, so does spaghetti SOA. To help manage the large number of services inherent in a microservices-based architecture, we need organizing principles as well as technical measures to implement those principles. At Airbnb, we undertook an effort to find such principles and measures. Our investigations led us to the concept of a data-oriented service mesh, which we believe brings a new level of modularity to SOA.

Organizing large programs into modular units is not a new problem in software engineering. Up until the 1970s, the main paradigm of software organization focused on grouping code into procedures and procedures into modules. In this approach, modules publish a public API to be used by code outside of the module; behind this public API, modules hide their internal, helper procedures and other implementation details. Languages such as Pascal and C are based on this paradigm.

Starting in the ’80s, the paradigm shifted to organizing software primarily around data, not procedures. In this approach, modules define classes of objects that encapsulate an internal representation of an object accessed via a public API of methods on the object. Languages such as Simula and Clu had pioneered this form of organization in the 1960s and 1970s.

SOA is a step back to more procedure-oriented designs. Today’s microservice is a collection of procedural endpoints — a classic, 1970s-style module. We believe that SOA needs to evolve to support data-oriented design, and that this evolution can be enabled by transitioning our service mesh from a procedural orientation to a data orientation.

Central to modern, scalable SOA applications is a service mesh (e.g., Istio, Linkerd), which routes service invocations to instances of microservices that can handle them. The current industry standard for service meshes is to organize exclusively around remote procedure invocations without knowing anything about the data that makes up the application architecture. Our vision is to replace these procedure-oriented service meshes with service meshes organized around data.

At Airbnb, we are using GraphQL™️ to build a data-oriented service mesh called Viaduct. A Viaduct service mesh is defined in terms of a GraphQL schema consisting of:

  • Types (and interfaces) describing data managed within your service mesh

The types (and interfaces) in the schema define a single graph across all of the data managed within the service mesh. For example, at an e-commerce company, a service mesh’s schema may define a field productById(id: ID) that returns results of type Product. From this starting point, a single query allows a data consumer to navigate to information about the product’s manufacturer, e.g., productById { manufacturer }; reviews of the product, e.g., productById { reviews }; and even the authors of those reviews, e.g., productById { reviews { author } }.

The data elements requested by such a query may come from many different microservices. In a procedure-oriented service mesh, the data consumer would need to take these services as explicit dependencies. In our data-oriented service mesh, it is the service mesh, i.e., Viaduct, not the data consumer, that knows which services provide which data element. Viaduct abstracts away the service dependencies from any single consumer.
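To make the idea concrete, here is a minimal Python sketch of data-oriented resolution. It is not Viaduct's implementation. The in-memory "services", the `FIELD_OWNERS` registry, and the nested-dict stand-in for a GraphQL selection set are all assumptions made for illustration. The point it shows is that the consumer names only the data it wants, while the mesh alone knows which service owns each field.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical in-memory "services"; each one owns a slice of the data.
PRODUCTS = {"p1": {"id": "p1", "name": "Kettle", "manufacturerId": "m1"}}
MANUFACTURERS = {"m1": {"id": "m1", "name": "Acme"}}
REVIEWS = {"p1": [{"id": "r1", "text": "Boils fast", "authorId": "u1"}]}
USERS = {"u1": {"id": "u1", "name": "Dana"}}

Resolver = Callable[[Dict[str, Any]], Any]

# (parent type, field) -> (result type, resolver backed by the owning service).
# Only the mesh holds this table; data consumers never see it.
FIELD_OWNERS: Dict[Tuple[str, str], Tuple[str, Resolver]] = {
    ("Product", "manufacturer"): (
        "Manufacturer", lambda p: MANUFACTURERS[p["manufacturerId"]]),
    ("Product", "reviews"): (
        "Review", lambda p: REVIEWS.get(p["id"], [])),
    ("Review", "author"): (
        "User", lambda r: USERS[r["authorId"]]),
}


def resolve(type_name: str, obj: Dict[str, Any],
            selection: Dict[str, dict]) -> Dict[str, Any]:
    """Walk a selection set (nested dicts; {} marks a scalar leaf)."""
    result: Dict[str, Any] = {}
    for field, sub_selection in selection.items():
        owner = FIELD_OWNERS.get((type_name, field))
        if owner is None:
            # Not a cross-service edge: read the scalar from the object itself.
            result[field] = obj[field]
            continue
        child_type, resolver = owner
        value = resolver(obj)
        if isinstance(value, list):
            result[field] = [resolve(child_type, v, sub_selection) for v in value]
        else:
            result[field] = resolve(child_type, value, sub_selection)
    return result


def product_by_id(product_id: str, selection: Dict[str, dict]) -> Dict[str, Any]:
    """Entry point playing the role of the productById query field."""
    return resolve("Product", PRODUCTS[product_id], selection)


# Roughly: productById(id: "p1") { name manufacturer { name } reviews { text author { name } } }
query = {
    "name": {},
    "manufacturer": {"name": {}},
    "reviews": {"text": {}, "author": {"name": {}}},
}
result = product_by_id("p1", query)
print(result)
```

In this sketch, moving reviews to a different backing service would only change one entry in `FIELD_OWNERS`; the query and its caller would stay the same.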

In our talk we discuss how Viaduct, unlike other distributed GraphQL systems such as GraphQL Modules or Apollo Federation, treats the schema as a single artifact. We have implemented several primitives that keep the schema unified while still allowing many teams to collaborate on it productively. As Viaduct replaces more and more of our underlying procedure-oriented service mesh, its schema captures the data managed by our application more and more completely. We have taken advantage of this “central schema,” as we call it, as a place to define the APIs of some of our microservices: these microservices expose GraphQL APIs whose schemas are defined as subsets of the central schema. In the future, we want to take this idea further, using the central schema to define the schemas of the data stored in our databases.
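One way to picture a service schema that is "a subset of the central schema" is as a containment check. The sketch below is an illustration only: it models a schema as a plain mapping from type names to field names, and the `violations` helper and the sample types are hypothetical, not part of Viaduct or any GraphQL library.

```python
from typing import Dict, List, Set

Schema = Dict[str, Set[str]]  # type name -> names of the fields on that type

# Hypothetical central schema, reduced to type and field names.
CENTRAL_SCHEMA: Schema = {
    "Product": {"id", "name", "manufacturer", "reviews"},
    "Review": {"id", "text", "author"},
    "User": {"id", "name"},
}


def violations(service_schema: Schema, central: Schema) -> List[str]:
    """List everything in service_schema that the central schema does not define."""
    problems: List[str] = []
    for type_name, fields in sorted(service_schema.items()):
        if type_name not in central:
            problems.append(f"unknown type {type_name}")
            continue
        for field in sorted(fields - central[type_name]):
            problems.append(f"unknown field {type_name}.{field}")
    return problems


# A review service that exposes part of Review is a valid subset.
ok_schema: Schema = {"Review": {"id", "text"}}
# A schema that invents a type and a field is not.
bad_schema: Schema = {"Review": {"id", "rating"}, "Order": {"id"}}

print(violations(ok_schema, CENTRAL_SCHEMA))
print(violations(bad_schema, CENTRAL_SCHEMA))
```

A check along these lines could run at build time, so that a service API cannot drift away from the central definition of the data it serves.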

Among other things, using the central schema to define our APIs and database schemas will solve one of the bigger challenges of large-scale SOA applications: data agility. In today’s SOA applications, a change to a database schema often needs to be manually reflected in the APIs of two, three, and sometimes even more layers of microservices before it can be exposed to client code. Such changes can require weeks of coordinating among multiple teams. By deriving service APIs and database schemas from a single, central schema, a database schema change like this can be propagated to client code with a single update.

Often in large SOA applications, there are many stateless “derived-data” services and “backend-for-frontend” services that take raw data from lower-level services and transform it into data that’s more appropriate for presentation in clients. Stateless logic like this is a good fit for the serverless computing model, which eliminates the operational overhead of microservices altogether and instead hosts logic in a “cloud functions” fabric.

Viaduct has a mechanism for computing what we call “derived fields” using serverless cloud functions that operate on top of the graph without knowledge of the underlying services. These functions allow us to move transformational logic out of the service mesh and into stateless containers, keeping our graph clean and reducing the number and complexity of services we need.
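As a rough illustration of derived fields, the Python sketch below registers pure functions that declare which graph fields they need and never touch a backing service directly. The `derived_field` decorator, the `GRAPH` data, and the field names are assumptions for this example; the post does not describe Viaduct's actual registration mechanism.

```python
from typing import Any, Callable, Dict, List, Tuple

# Raw field values as the graph might expose them (hypothetical data).
GRAPH: Dict[str, Dict[str, Any]] = {
    "p1": {"name": "Kettle", "ratings": [5, 4, 3]},
}

# Derived field name -> (fields it depends on, stateless function computing it).
DERIVED: Dict[str, Tuple[List[str], Callable[..., Any]]] = {}


def derived_field(name: str, requires: List[str]):
    """Register a stateless function as the implementation of a derived field."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        DERIVED[name] = (requires, fn)
        return fn
    return register


@derived_field("averageRating", requires=["ratings"])
def average_rating(ratings: List[int]):
    return sum(ratings) / len(ratings) if ratings else None


@derived_field("ratingLabel", requires=["name", "averageRating"])
def rating_label(name: str, average: float) -> str:
    # Depends on another derived field; the function neither knows nor cares.
    return f"{name}: {average:.1f}/5"


def get_field(node_id: str, field: str) -> Any:
    """Resolve a field, computing derived fields from their declared inputs."""
    if field in DERIVED:
        requires, fn = DERIVED[field]
        return fn(*(get_field(node_id, dep) for dep in requires))
    return GRAPH[node_id][field]


print(get_field("p1", "averageRating"))
print(get_field("p1", "ratingLabel"))
```

Because each function sees only field values, it can run in any stateless container, and it keeps working if the service behind `ratings` changes.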

Viaduct is built on graphql-java and supports fine-grained field selection via GraphQL selection sets. It uses modern data-loading techniques, employs reliability techniques such as short-circuiting and soft dependencies, and implements an intra-request cache. Viaduct provides data observability, allowing us to understand, down to the field level, which services consume which data. As a GraphQL interface, Viaduct allows us to take advantage of a large ecosystem of open source tooling, including live IDEs, mock servers, and schema visualizers.
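Two of these techniques can be sketched in a few lines of Python. The post does not define them precisely, so this example rests on two assumptions: an intra-request cache means each key is fetched at most once per request, and a soft dependency means a failing, non-critical fetch yields a default value instead of failing the whole request. `RequestContext`, `soft`, and the fetch functions are hypothetical names.

```python
from typing import Any, Callable, Dict, Hashable

Fetch = Callable[[Hashable], Any]


class RequestContext:
    """State that lives for exactly one incoming request."""

    def __init__(self) -> None:
        self._cache: Dict[Hashable, Any] = {}
        self.backend_calls = 0  # counts cache misses, for illustration

    def load(self, key: Hashable, fetch: Fetch) -> Any:
        """Fetch a key at most once per request; later loads hit the cache."""
        if key not in self._cache:
            self.backend_calls += 1
            self._cache[key] = fetch(key)
        return self._cache[key]


def soft(fetch: Fetch, default: Any = None) -> Fetch:
    """Wrap a fetch so that any failure degrades to a default value."""
    def wrapped(key: Hashable) -> Any:
        try:
            return fetch(key)
        except Exception:
            return default
    return wrapped


def fetch_user(key: Hashable) -> Dict[str, str]:
    # Stand-in for a call to a user service.
    return {"id": key[1], "name": "Dana"}


def fetch_recommendations(key: Hashable) -> Any:
    # Stand-in for a non-critical service that is currently down.
    raise TimeoutError("recommendation service unavailable")


ctx = RequestContext()
first = ctx.load(("User", "u1"), fetch_user)
second = ctx.load(("User", "u1"), fetch_user)  # served from the cache
recs = ctx.load(("Recs", "u1"), soft(fetch_recommendations, default=[]))

print(first is second, recs, ctx.backend_calls)
```

Scoping the cache to a single request avoids serving stale data across requests while still removing duplicate fetches when many fields in one query reference the same entity.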

Viaduct started powering production workflows at Airbnb over a year ago. We started from scratch with a clean schema consisting of a handful of entities and have grown it to include 80 core entities that are able to power 75% of our modern API traffic.

As mentioned in the introduction, more details on the motivation and technology behind Viaduct can be found in our presentation.


