How we improved latency through projection in Espresso


The software architecture of the identity services stack

The services in this diagram can be grouped into four categories. On the left side are 150+ clients requesting data, such as member profiles; one example is the frontend application that handles profile pages. On the bottom is Espresso, providing a scalable document store. In the middle, we have two services: identity and identity-mt. Identity is a service on top of Espresso that exposes raw CRUD operations and defines various Espresso tables for different entities, such as profiles, settings, and privacy settings. Identity-mt is a mid-tier service that implements business logic related to profile queries; it is also where member privacy settings are enforced. To do so, the mid-tier calls a number of downstream services, shown on the right side of the diagram, to fetch information such as invitation status and network distances. The identity services are among the most scalable systems we have built at LinkedIn. At its peak, the identity service serves close to one million QPS, and almost all major features on the LinkedIn platform rely on it to serve member data.
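To make the layering concrete, here is a minimal sketch in Java of how these layers relate. All type and method names (IdentityCrud, IdentityMidTier, getProfileForViewer, and so on) are hypothetical and only mirror the description above; they are not LinkedIn's actual interfaces.

```java
// Hypothetical sketch of the layering described above; all names are
// illustrative assumptions, not LinkedIn's actual interfaces.

class ProfileRecord { /* the full Espresso profile document */ }
class ProfileView   { /* the privacy-filtered view returned to clients */ }

/** Raw CRUD layer on top of Espresso (the "identity" service). */
interface IdentityCrud {
  ProfileRecord getProfile(long memberId);
  void putProfile(long memberId, ProfileRecord record);
}

/** Mid-tier (the "identity-mt" service): business logic and privacy. */
interface IdentityMidTier {
  // Consults downstream services (invitation status, network distance, ...)
  // to enforce the member's privacy settings before returning data.
  ProfileView getProfileForViewer(long memberId, long viewerId);
}
```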

Problems
Because of the massive scale and the critical functionality of the system, we’re constantly looking for ways to improve its quality of service (measured by reliability, latency, etc.) as well as to reduce cost to serve. Before the Espresso projection was available, identity had to fetch full records from the database and filter them through other mechanisms. This has caused a few issues in the past that we knew could be resolved once and for all. 

Let’s start with latency. A profile object has over 90 top-level fields, many of which are complex types that in turn contain many more subfields. For instance, the education field is an array of type Education, which itself contains fields such as school ID, start date, and end date. If a client only asks for a few fields, such as first name and last name, fetching the whole document is wasteful.
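To give a sense of the shape of the data, below is a heavily trimmed, hypothetical Avro schema for a profile, expressed with Avro's Java SchemaBuilder. The field names and types are illustrative only; the real profile schema has 90+ top-level fields.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class ProfileSchemaSketch {
  public static void main(String[] args) {
    // A nested complex type: each education entry has its own subfields.
    Schema education = SchemaBuilder.record("Education")
        .fields()
        .requiredLong("schoolId")
        .optionalString("startDate")
        .optionalString("endDate")
        .endRecord();

    // A heavily trimmed profile record. The real document has 90+ top-level
    // fields, many of them complex types like the array below, which is why
    // fetching the whole document for two simple fields is so wasteful.
    Schema profile = SchemaBuilder.record("Profile")
        .fields()
        .requiredString("firstName")
        .requiredString("lastName")
        .name("educations").type(Schema.createArray(education)).noDefault()
        .endRecord();

    System.out.println(profile.toString(true)); // pretty-print the schema
  }
}
```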

Next, let’s discuss the impact on reliability. All well-designed systems impose some limits to protect themselves from abusive clients. This applies to both Espresso and our service infrastructure, Dynamic Discovery: both impose a 2MB limit on response sizes, and if a response exceeds 2MB, a 500 error is returned instead. This means that if a client asks for more than one profile (perhaps up to 10) in a batch call, even if it only needs first name and last name, we would fetch the whole profile for each of them, potentially exceeding 2MB and resulting in a bad member experience. Raising the limit is an option, but not a desirable one, as profile size can continue to grow.

Finally, there is the cost-to-serve issue. The more data we fetch, the more data we need to transfer over the network, which requires more bandwidth and incurs higher costs. Given the sheer scale of the service, we knew that fixing this issue could dramatically reduce the cost to serve.

Applying Avro projection
While we had implemented some temporary hacks to mitigate the issues described earlier (e.g., falling back to single parallel gets if the combined response is larger than 2MB), we knew utilizing the new Espresso projection feature was the right way to go for a long-term fix.
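For context, the fallback hack looked roughly like the sketch below: attempt the batch get, and if it fails because the combined response exceeds the 2MB limit, retry as individual gets issued in parallel. This is a simplified, hypothetical reconstruction for illustration, not the production code, and the function names are assumptions.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

public class BatchGetFallback {

  // Hypothetical sketch of the pre-projection mitigation: names and error
  // handling are illustrative and do not reflect the production code.
  static List<byte[]> getProfiles(List<Long> memberIds,
                                  Function<List<Long>, List<byte[]>> batchGet,
                                  Function<Long, byte[]> singleGet) {
    try {
      // Single batch call; fails (a 500 from the stack) when the combined
      // response exceeds the 2MB limit.
      return batchGet.apply(memberIds);
    } catch (RuntimeException responseTooLarge) {
      // Fall back to one single get per member, issued in parallel, so that
      // no individual response crosses the limit.
      List<CompletableFuture<byte[]>> futures = memberIds.stream()
          .map(id -> CompletableFuture.supplyAsync(() -> singleGet.apply(id)))
          .collect(Collectors.toList());
      return futures.stream()
          .map(CompletableFuture::join)
          .collect(Collectors.toList());
    }
  }
}
```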

First, we needed all clients to stop sending queries that requested whole profiles. This was not a trivial task for a service with 150+ clients. We achieved it via a horizontal initiative, a process within LinkedIn that runs every quarter for technical initiatives requiring significant effort from multiple teams. After this initiative, we implemented a request blocker to make sure all requests expressed the fields they needed as Rest.li projections.

Next, we translated the Rest.li projections into Avro projections expected by Espresso as described earlier.
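Espresso's projection API itself is internal, so the exact translation isn't reproduced here. The sketch below only illustrates the general idea under a simplifying assumption: the Rest.li projection (e.g., a request carrying roughly fields=firstName,lastName) has already been reduced to a set of requested top-level field names, from which a narrower Avro record schema is derived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import org.apache.avro.Schema;

public class AvroProjectionSketch {

  // Derives a projected (narrower) Avro record schema containing only the
  // requested top-level fields of the full schema. Nested projections and
  // Espresso-specific plumbing are omitted; this is an illustrative sketch.
  static Schema project(Schema fullSchema, Set<String> requestedFields) {
    List<Schema.Field> kept = new ArrayList<>();
    for (Schema.Field field : fullSchema.getFields()) {
      if (requestedFields.contains(field.name())) {
        // Field objects cannot be reused across schemas, so copy each one.
        kept.add(new Schema.Field(field.name(), field.schema(),
            field.doc(), field.defaultVal()));
      }
    }
    return Schema.createRecord(fullSchema.getName(), fullSchema.getDoc(),
        fullSchema.getNamespace(), fullSchema.isError(), kept);
  }
}
```

In principle, reading with a projected schema lets the storage layer return only the requested fields, which is where the response-size and latency wins below come from.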

Results

We saw significant savings in latency and cost to serve alongside the feature ramp.

Mean response sizes
The graph below shows how much the mean response size of calls to profile-get changed as we ramped Avro projections. The mean response size dropped from ~26KB to ~10KB at 50% ramp, and from ~10KB to ~7KB at 100% ramp. This improved latencies both directly and indirectly: the services spent less time transferring bytes over the network, and they also spent less time serializing and deserializing those bytes, both of which consume memory and CPU cycles.


