Marshmallow is a fantastic object serialization library that we leverage extensively at Lyft. It has an extensible, easy-to-use API and is framework agnostic. We use it to validate incoming requests, to deserialize data from Mongo or Dynamo, and to serialize outgoing requests.
Unfortunately, we ran into a sticky problem: benchmarks suggest Marshmallow is one of the slowest serialization frameworks for Python. This put us in a tough spot. We love the powerful, easy-to-use API, but we wanted better performance than Marshmallow could offer. We huddled up and consulted with our trusted confidant, who had an incredible suggestion: why not both?
To understand how we can have both the API and the speed, one must first understand how Marshmallow works. Marshmallow is slow because it’s effectively implementing an interpreter in an interpreter. When you serialize an object through Marshmallow, it iterates through all the fields it knows about, then does a ton of reflection to extract each field from the input object. In practice, this is hugely wasteful: we serialize the same kinds of objects over and over again, and they generally comply with the provided schema.
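To make the "interpreter in an interpreter" point concrete, here is a minimal sketch of reflection-driven serialization. It is not Marshmallow's actual code: Field, get_value, and reflective_dump are hypothetical names we made up for illustration.

```python
class Field:
    """Hypothetical stand-in for a schema field: an attribute name plus a converter."""

    def __init__(self, attribute, convert):
        self.attribute = attribute
        self.convert = convert


def get_value(obj, key):
    """Resolve a value at call time: mapping lookup for dicts, attribute lookup otherwise."""
    if isinstance(obj, dict):
        return obj.get(key)
    return getattr(obj, key, None)


def reflective_dump(fields, obj):
    """Walk every declared field and dig its value out of obj on every single call."""
    result = {}
    for name, field in fields.items():
        value = get_value(obj, field.attribute)
        result[name] = None if value is None else field.convert(value)
    return result
```

Every call repeats the same loop, type checks, and lookups, even though the set of fields never changes between calls.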
Let’s Heat Things Up with Toasted Marshmallow
With Toasted Marshmallow we decided to take an approach where we generate a serialization method at runtime that assumes the incoming object complies with the schema, falling back to the original Marshmallow code if it doesn’t.
As a concrete example, say we had the following Schema:
The generated serialization method would be:
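As a rough sketch, assuming a schema with a string field name and an integer field age (both hypothetical), the generated code could look something like the function below. The real generated output may differ in naming and detail; the point is that every field access is written out ahead of time.

```python
def serialize_user(obj):
    """Sketch of a generated serializer for a hypothetical two-field schema."""
    res = {}
    # Attribute access is hard-coded: no loop over fields and no reflection.
    name = obj.name
    res["name"] = str(name) if name is not None else None
    age = obj.age
    res["age"] = int(age) if age is not None else None
    return res
```

If obj is a dict, or is missing an attribute, this function raises, which is exactly the signal to use the slower general-purpose path instead.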
Toasted Marshmallow tries this method first and, should it throw an exception or the object fail to parse, falls back to the original Marshmallow code. This approach is 15–20x faster than reflection alone.
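The generate-then-fall-back idea can be sketched in plain Python. This is an illustration under simplifying assumptions, not Toasted Marshmallow's implementation: slow_dump, generate_fast_dump, and make_dump are hypothetical names, a "schema" is just a dict mapping field names to converter functions, and field names are assumed to be valid Python identifiers.

```python
def slow_dump(field_types, obj):
    """General path: resolves each field reflectively and accepts dicts or objects."""
    out = {}
    for name, convert in field_types.items():
        value = obj.get(name) if isinstance(obj, dict) else getattr(obj, name, None)
        out[name] = convert(value) if value is not None else None
    return out


def generate_fast_dump(field_types):
    """Write out source with one hard-coded attribute access per field, then compile it."""
    lines = ["def fast_dump(obj):", "    res = {}"]
    for name in field_types:
        lines.append(f"    res[{name!r}] = _convert_{name}(obj.{name})")
    lines.append("    return res")
    # The converters are injected as globals of the generated function.
    namespace = {f"_convert_{name}": conv for name, conv in field_types.items()}
    exec("\n".join(lines), namespace)
    return namespace["fast_dump"]


def make_dump(field_types):
    """Return a dump function that tries the generated code first, then falls back."""
    fast_dump = generate_fast_dump(field_types)

    def dump(obj):
        try:
            return fast_dump(obj)
        except Exception:
            # The object did not match the assumptions baked into the fast path.
            return slow_dump(field_types, obj)

    return dump
```

With dump = make_dump({"name": str, "age": int}), an object exposing name and age attributes goes through the generated function, while a plain dict such as {"name": "Ada", "age": "36"} raises AttributeError on the fast path and is handled by slow_dump.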
Results, Hot Off The Press
In addition to reducing CPU consumption of our hosts using Marshmallow, rolling out Toasted Marshmallow resulted in nearly a 50% drop in our p95 response times!
Toasted Marshmallow is a drop-in replacement for Marshmallow, so your code should not need to change. Simply replace your dependency on Marshmallow with toastedmarshmallow==0.1.0 and set the library's environment variable to toastedmarshmallow.Jit (you can also optimize individual schemas). Toasted Marshmallow includes a slightly modified version of Marshmallow that adds the hooks needed to call into the generated code first and to regenerate that code when needed. We structured the code to minimize the modifications to Marshmallow, making it easier to track upstream in the future.
We treat any difference in behavior as a bug, so if you have code that worked with Marshmallow but fails with Toasted Marshmallow, please file an issue.
Check out Toasted Marshmallow today, and add s’more speed to your code!