At LinkedIn, we ship hundreds of command-line utilities to every machine in our data centers and to all of our employees’ workstations. The vast majority of these utilities are written in Python.
In addition to developing these command-line utilities, we have hundreds of supporting libraries that are constantly being iterated on, with new versions published daily. Because of the inherent problems present when dealing with such a huge and ever-changing dependency graph, we need to package the executables individually to avoid dependency conflicts. Initially, we took advantage of the great open source tool PEX. PEX elegantly solved the isolated packaging requirement we had by including all of a tool’s dependencies inside a single binary file that we could then distribute.
However, as our tools matured over time and picked up additional dependencies, we became acutely aware of the performance issues being imposed by the pkg_resources library, which are well documented here.
To briefly summarize:
- pkg_resources must scan every element of Python’s sys.path in order to build metadata objects that represent each “distribution.”
- This work is done at import time, which can significantly slow down CLI invocation time.
Since PEX leans heavily on pkg_resources to bootstrap its environment, we found ourselves at an impasse: either lose out on the ability to neatly package our tools in favor of invocation speed, or impose a few-second start-up penalty for the benefit of easy packaging.
After spending some time investigating the possibility of extricating pkg_resources from PEX, we decided to start from a clean slate, and thus, shiv was created.
What does shiv do?
Just like PEX, shiv allows us to create a single binary artifact from a Python project that includes all of its dependencies. The only thing required to run a full-fledged Python application is an interpreter.
How does shiv work?
Inspired by PEP 441, shiv creates a “zipapp” containing a Python project, all the project’s dependencies, and a special bootstrap module that extracts and injects all required dependencies at run time with very little latency cost.
shiv completely avoids the use of pkg_resources. If it is included by a transitive dependency, the performance implications are mitigated by limiting the length of sys.path and always including the -s and -E Python interpreter flags.
Instead of shipping our binary with downloaded wheels inside, we package an entire site-packages directory, as installed by pip. We then bootstrap that directory post-extraction via the stdlib’s site.addsitedir function. That way, everything works out of the box: namespace packages, real filesystem access, etc.
Because we optimize for a shorter sys.path and don’t include pkg_resources in the critical path, executables created with shiv are often even faster than running a script from within a virtualenv.
The tool freezes a Python environment, so you can think of shiv as a shorter way of saying “shiver.”