Clearly, the delta in these two sets of latencies was unaccounted for and we needed to figure out where it was being spent. We made use of distributed tracing to debug where the delta was being spent (to learn more about how Lyft set up distributed tracing, you can check out this talk).
Using our distributed traces, we found that upstream requests from pricingservice were being queued and were spending a long time waiting around. We suspected that this might have something to do with the way that pricingservice’s web server was scheduling requests.
As mentioned earlier, pricingservice is a Python microservice which leverages Flask as its web application framework. Gunicorn is the web server used to front Flask and gunicorn uses gevent to schedule multiple requests at the same time.
Let’s recall that Python’s global interpreter lock effectively makes any CPU bound Python program single threaded. So, how do gunicorn and gevent serve thousands of incoming requests per second with this constraint? gevent spins up a greenlet (also known as green thread) for every request received by the microservice. These greenlets are scheduled to be run using an event loop using the cooperative multitasking framework we talked about in Part 1.
A typical greenlet for a request (let’s call this greenlet A) might do some work on the CPU, make an upstream service call (which involves blocking I/O) and finally perform some more work on the CPU. While performing blocking I/O, i.e the network request in this case, the greenlet will yield control to the event loop.
Gevent can then find another greenlet (let’s call this greenlet B) to schedule on its event loop. Once greenlet B is done, if greenlet A has finished its blocking I/O, the event loop schedules greenlet A to be run again. Greenlet A completes its CPU work and is taken off the event loop. The event loop now finds the next greenlet to run.
The key takeaway is that when a greenlet makes a network call, we wait not only for the call to complete, but also for the event loop to once again run our greenlet. If we run too many concurrent greenlets, CPU work for other requests can “block” our request from running.
Let’s illustrate with an example (imagine that each cell represents a 10ms time slice). Our networking infrastructure would measure the latency of Request 7’s network call as 70ms, however, the application would measure it as 180ms since we waited an additional 110ms to get scheduled on the event loop. CPU work for other requests “block” our request from running.
In the above example, running 3 concurrent requests would be nearly perfect:
Now, how does this relate to pricingservice and the high response times we were seeing earlier?
pricingservice houses hundreds of different endpoints that have varying levels of CPU and I/O workloads. Some endpoints are CPU intensive, whereas some (such as the endpoint that prices bike and scooter offers) are I/O heavy (since they call several upstream services).
The greenlets for CPU intensive endpoints in pricingservice were blocking greenlets for I/O heavy requests from being run. This was leading to the bike and scooter pricing endpoint’s greenlets being queued up and waiting longer to be scheduled by the event loop.
To solve this problem, we decided to run this endpoint on dedicated application service processes so that we’d get dedicated gunicorn processes to field our I/O heavy bike and scooter pricing requests. These requests and their greenlets would no longer need to contend with CPU intensive greenlets from other types of requests.
Using tooling built by our infrastructure teams, we were able to run our bike and scooter pricing endpoints on dedicated processes and we pointed downstream services at these dedicated hosts for all bike and scooter pricing requests. Once we rolled these changes out, we got exactly the results that we were hoping for!
Timeouts for requests to our bike and scooter pricing endpoint stopped immediately, which led to the flickering we described earlier stopping. Reliability of bike and scooter pricing requests leaped from an average of 2.5 to 4.5 nines — a massive improvement!