By Wayne Cunningham
Open source software pervades the work we do at Uber. On the infrastructure side, we have contributed projects like Jaeger, which lets engineers trace complex architectures, and M3, a metrics platform that works with Prometheus. For front-end development, we built RIBs, a cross-platform architecture for mobile apps, along with Fusion.js, a plugin-based web framework. In the rapidly advancing area of machine learning, we have open source tools such as Horovod, a distributed training framework, and Pyro, a deep probabilistic programming language written in Python.
Of course, we also use other open source software extensively at Uber. Given that we process the data for 15 million trips per day, among other compute tasks, we use Apache Hadoop for large-scale data analytics. It lets our data scientists figure out the most efficient means of getting people to their destinations. Our engineers have contributed a number of features to Hadoop, making the project even more scalable. Likewise, Apache Spark pipelines run throughout our architecture, supporting many of our products and enabling fast, streamlined data analytics. We apply machine learning both through Uber AI Labs, which contributes advanced research, and in our day-to-day operations, such as our financial planning platform. Our in-house machine learning platform, Michelangelo, powers many of these efforts and is built on open source projects such as TensorFlow and Cassandra.
Many engineers come to Uber with extensive experience developing open source projects, and others learn the value of open source on the job. Building innovative software and releasing it as open source, or improving an existing open source project, is a rewarding experience, amplified by acceptance from the open source community.
At Uber Open Summit 2018, our first annual conference devoted to Uber’s open source ecosystem, our engineers and those in the community will demonstrate and discuss their work. Here are seven of the open source projects Uber Open participants can expect to learn about:
A distributed training framework for TensorFlow, Keras, and PyTorch
Category: Machine Learning
“We developed Horovod, a distributed training framework for TensorFlow, Keras, and PyTorch, to speed training of machine learning models at scale and to make it easier for developers to run new models. Along with Uber, NVIDIA, Amazon, Alibaba, and many other companies use Horovod for distributed deep learning. In the past year, we’ve seen a tremendous uptick in both the adoption of Horovod and in the amount of external contributions. A team of researchers from Oak Ridge National Laboratory, Lawrence Berkeley National Laboratory, and NVIDIA has scaled climate analytics deep learning model training beyond one exaflop on 27,360 V100 GPUs using a modified version of Horovod, which they contributed back to the project. Amazon has contributed support for parallelized hierarchical allreduce, which improved speeds from 35 to 40 percent. IBM contributed support for PowerAI DDL, and included integration with Horovod into IBM Watson Studio and IBM FfDL. Cloud infrastructures AWS, GCP, Azure, and NVIDIA GPU Cloud now include Horovod as part of their standard deep learning images. We’re looking forward to helping more users scale their workloads and a continued stream of impactful contributions!”
– Alex Sergeev, Horovod Project Lead
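For readers unfamiliar with the allreduce operation mentioned above, the following is a minimal sketch in plain Python of what a flat and a two-level (hierarchical) allreduce compute. It is not Horovod's API or implementation: the function names `allreduce_mean` and `hierarchical_allreduce_mean`, and the representation of gradients as lists of floats grouped by node, are assumptions made purely for illustration.

```python
from typing import List

Vector = List[float]


def allreduce_mean(grads: List[Vector]) -> List[Vector]:
    """Flat allreduce: every worker ends up with the element-wise mean."""
    n = len(grads)
    mean = [sum(column) / n for column in zip(*grads)]
    return [list(mean) for _ in grads]


def hierarchical_allreduce_mean(nodes: List[List[Vector]]) -> List[List[Vector]]:
    """Two-level allreduce (illustrative only, not Horovod code).

    `nodes` is a list of machines, each holding one gradient per local worker.
    """
    # Step 1: sum the gradients of the workers inside each node.
    node_sums = [[sum(column) for column in zip(*workers)] for workers in nodes]
    # Step 2: sum the per-node results across nodes.
    global_sum = [sum(column) for column in zip(*node_sums)]
    total_workers = sum(len(workers) for workers in nodes)
    mean = [value / total_workers for value in global_sum]
    # Step 3: broadcast the averaged gradient back to every worker.
    return [[list(mean) for _ in workers] for workers in nodes]


if __name__ == "__main__":
    # Two hypothetical nodes with two workers each.
    nodes = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    print(hierarchical_allreduce_mean(nodes)[0][0])  # [4.0, 5.0]
```

Both functions return the same averaged gradient for every worker. In a real cluster the hierarchical variant would differ in how the data moves, since only one partial result per node needs to cross the slower inter-node links, but this sketch models the arithmetic only and says nothing about actual performance.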
A deep universal probabilistic programming language
Category: Machine Learning
“Pyro is a deep probabilistic programming language built on PyTorch, a modern, GPU-accelerated deep learning framework. Developed at Uber AI Labs by Noah Goodman and team, Pyro is used as a platform for research in modern Bayesian machine learning, where deep neural networks can be used both in models and in inference. To scale to large datasets and high-dimensional models, Pyro uses stochastic variational inference algorithms and probability distributions built on top of PyTorch. To accommodate complex or model-specific algorithmic behavior, Pyro leverages Poutine, a library of composable building blocks for modifying the behavior of probabilistic programs. Applications inside Uber include sensor fusion and time series forecasting. The Pyro team works closely with the PyTorch team and many open source collaborators to create a rich, stable toolset for probabilistic machine learning research.”
– Fritz Obermeyer, Pyro engineer
A plugin-based universal web framework
– Matt Morgan, Fusion.js Project Lead
A hexagonal hierarchical geospatial indexing system
“H3 is Uber’s hexagonal grid system, which we use for indexing geospatial data, creating visualizations, and optimizing the Uber marketplace. We recently completed open sourcing Python and Go bindings for the H3 library, adding to the bindings contributed by the open source community. Kepler.gl recently gained the ability to visualize H3 indexed data. We are excited to be working with the open source community to develop algorithms for H3, and build integrations for open source software applications to use H3.”
– Isaac Brodsky, H3 Project Lead
A distributed TSDB and query engine, Prometheus Sidecar, metrics aggregator, and more
“M3, a metrics platform, and M3DB, a distributed time series database, were developed at Uber by the Observability team in New York City. Now with support for Prometheus, a popular open source monitoring system, M3 is a real-time, turnkey, scalable, and configurable multi-tenant store for application, system, and infrastructure metrics, thereby enabling next generation monitoring and data-driven decision making. M3 stores petabytes of metrics at Uber and is starting to be picked up by a few organizations across the U.S., Europe, and China as a centralized system to store and query their decentralized short-term Prometheus metrics instances. The growing community helps make it easy for anyone to run their own reliable and scalable metrics system. We will be giving a keynote talk at KubeCon 2018 in Seattle about cloud deployments of M3 on Kubernetes, and presentations about M3 as a platform at conferences in Nuremberg, Sofia, and New York.”
– Rob Skillington, M3 Project Lead
A tracing system to monitor and troubleshoot transactions in complex distributed systems
“Jaeger provides distributed tracing for Uber’s thousands of microservices. Leveraging trace data from a number of RPC frameworks, Jaeger consolidates service calls into a unified call graph. Jaeger is a member of the Cloud Native Computing Foundation, and made it onto InfoWorld’s list of best open source software for cloud computing for the last two years. Lately, Jaeger has been developing new methods to harness the mountain of trace data that Uber’s services produce. The team open sourced Kafka-based publishing and data ingestion capabilities that serve as the foundation for the data mining and aggregate analysis of tracing data utilized heavily at Uber, as well as novel visualization tools for comparing traces and trace cohorts.”
– Yuri Shkuro, Jaeger Project Lead
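As a rough illustration of how service calls recorded in a trace can be consolidated into a call graph, here is a minimal sketch in plain Python. It does not use Jaeger's data model or client libraries: the `Span` class, its fields, and the `call_graph` function are hypothetical names chosen for this example, and the service names are made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """A simplified, hypothetical span: one unit of work done by a service."""
    span_id: str
    parent_id: Optional[str]  # None for the root span of a trace
    service: str


def call_graph(spans: List[Span]) -> Counter:
    """Count caller -> callee edges between services across all spans."""
    by_id = {span.span_id: span for span in spans}
    edges: Counter = Counter()
    for span in spans:
        if span.parent_id is None:
            continue
        parent = by_id.get(span.parent_id)
        # Skip spans whose parent is missing, and calls within one service.
        if parent is not None and parent.service != span.service:
            edges[(parent.service, span.service)] += 1
    return edges


if __name__ == "__main__":
    trace = [
        Span("a", None, "frontend"),
        Span("b", "a", "rides"),
        Span("c", "a", "payments"),
        Span("d", "b", "rides"),  # internal work, produces no edge
        Span("e", "b", "maps"),
    ]
    for (caller, callee), count in sorted(call_graph(trace).items()):
        print(f"{caller} -> {callee}: {count}")
```

Aggregating edge counts over many traces in this way gives a service dependency graph. A production system would also need to handle details this sketch ignores, such as spans that arrive out of order or belong to different traces.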
A suite of open-source visualization frameworks
“The Vis.gl Framework Suite, by Uber’s Visualization and Urban Computing teams, currently consists of luma.gl, react-map-gl, deck.gl, and kepler.gl. We continue to see strong support and interesting uses for these open source visualization libraries. Recently, the team at OmniSci (formerly MapD) demonstrated how deck.gl’s Z-axis can be leveraged to visualize multiple layers. BEAM, a joint effort between Lawrence Berkeley National Laboratory and UC Berkeley, also made use of deck.gl to model transportation choices made in San Francisco, giving a visual representation of data to help cities plan transit options. Further on the transportation front, Lime made use of kepler.gl to show its mobility heartbeats in Santa Monica and Paris as part of its 10 million rides announcement. We, and the rest of the open source community, continue to develop and find new uses for Vis.gl, and will be giving presentations at upcoming conferences in Berlin, Munich, and New York.”
– Nicolas Garcia Belmonte, Vis.gl Project Lead
To learn more about Uber Open, visit: https://uberopen2018.splashthat.com/
For more information about Uber Open Source, visit: https://opensource.uber.com