This article is the first in a series covering how Uber’s mobile engineering team developed the newest version of our driver app, codenamed Carbon, a core component of our ridesharing business. Among other features, the app lets our more than three million driver-partners find fares, get directions, and track their earnings. We began designing the new app in 2017, working from our driver-partners’ feedback, and started rolling it out to production in September 2018.
In early 2017, Uber made the decision to rewrite our driver app. This is the sort of decision that Joel Spolsky, the CEO of Stack Overflow, once called “the single worst strategic mistake that any software company can make.”
Rewrites are incredibly risky, resource-intensive, and slow to deliver tangible benefits to users. For this particular rewrite, hundreds of engineers contributed in some capacity, not to mention designers, product managers, data scientists, operations, legal, and marketing. In practice, our rewrite took a year and a half to implement and roll out globally.
Our case is an extreme example of a question that engineers in all organizations face. If you are an engineer working for a start-up and are considering rewriting some code or a feature, you might ask, “How much of our runway are we burning?” If you are working on a small team in a large organization, you might ask, “Are these changes worth the features we are not building?” A good engineer and a good team will look at these broader questions before they take on the challenge of a rewrite.
So, while the rewrite itself involved a number of important technical decisions (to be covered in future articles), the decision to rewrite combined technical considerations with broader business concerns. These questions are hard to answer, but good answers to them will help you justify a rewrite to your organization or team.
Ultimately, these decisions do not get made in a vacuum. We did not make the decision to rewrite the app as a result of theoretical architectural thinking (“our code might be better, if only we…”), but rather as a result of an intensive, three-month research process that involved hundreds of pages of documentation and broad, cross-organizational buy-in. In the following sections, we discuss our decision to rewrite the Uber driver app and what we discovered as a result of this process.
Setting the stage
Recognizing the need for a new architecture does not automatically mean you need a rewrite. Rewrites are expensive, and while engineering organizations often want to rewrite code, engineers’ time faces other demands that do not involve rebuilding the same features over and over again with shinier architectural underpinnings. For the driver app, three trends helped push us toward a rewrite: mounting technical debt, product requirements the app was not built to handle, and a new mobile architecture.
To start, there was real technical debt in the driver app itself. This debt resulted from the rapid pace of Uber’s growth, as well as from changing product requirements (discussed below). Beyond this, tech debt arose from efforts to fix previous tech debt: the application was mired in multiple ongoing migrations that made features increasingly complicated to build.
It’s also worth pointing out that the tech debt in the driver app at this time wasn’t theoretical. We saw real business impact from recurring outages and from maintenance costs that cut into developer productivity. At the end of 2016, we had to pause development on the app to fix multiple feature regressions, and until we addressed them, it was difficult to implement and launch new features.
Any outage in our driver app is a huge problem, since our users rely on this app to make a living. In our world, anything less than 99.99 percent uptime isn’t acceptable, but we were regularly shipping builds with major regressions across the app’s core flows.
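For a sense of how tight that bar is, 99.99 percent uptime leaves a downtime budget of roughly 52.6 minutes per year (assuming a 365-day year), or about 4.3 minutes in a 30-day month:

$$
\begin{aligned}
(1 - 0.9999) \times 365 \times 24 \times 60\ \text{min} &= 0.0001 \times 525{,}600\ \text{min} \approx 52.6\ \text{min per year} \\
(1 - 0.9999) \times 30 \times 24 \times 60\ \text{min} &= 0.0001 \times 43{,}200\ \text{min} \approx 4.3\ \text{min per month}
\end{aligned}
$$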
One of the biggest issues we faced with the previous version of the driver app was that the product was not scaling well to new business use cases. While the earliest iteration of Uber’s driver app was designed for simple UberX trips, our services had since grown to include Uber Pool, Uber Eats, and market-specific experiences such as cash-paid trips, among others.
Beyond taking trips, we found that drivers needed additional features to manage their own finances and business concerns. For example, transparency into earnings and ratings is critical to the driver experience, yet it was underinvested in during the earliest iterations of Uber’s driver app. As we scaled the product experience, we needed to make room for these sorts of features.
We took some initial steps to address these concerns in 2015 and 2016 by shipping a new iteration of the application. Unfortunately, we had boxed off pieces of the UI for different teams to build instead of designing around driver needs and workflows. If you looked at our UI around this time, you would see a tab for earnings, a tab for ratings, a tab for settings, and a home tab for every other feature. The every-other-feature bucket kept growing, and the earnings and ratings tabs were often repurposed for features they had not been designed for.
The lessons we learned from this iteration of the app, along with our long-term product vision, had actually already driven us to completely rethink how the driver application should look for our driver-partners. Even if a rewrite wasn’t inevitable, a redesign was.
Our engineering team had also already invested in a new direction. With the rewrite of the rider app in 2016, we introduced a new mobile architecture called RIBs (short for Router, Interactor, Builder), a variation of VIPER designed to help us handle our ever-growing scale. It offered solutions to most of the problems we had recognized in the driver app: a framework for scalable extension points, a cogent application structure, and an elegant memory management model. We released the RIBs architecture to the open source community in 2017.
While the RIBs architecture certainly improved our rider app, it also represented a new engineering direction for our mobile organization. Future investments by our core platform group would primarily involve improvements to RIBs. Supporting multiple apps with different architectures would be more expensive than standardizing on RIBs.
Making our decision
Given the context of a UI redesign and a new architecture, we had essentially three different options on how to proceed: redesign the driver app without RIBs; migrate the existing driver app to the RIBs architecture; or do a full rewrite of the app based on RIBs.
No RIBs architecture
The first approach we looked at was a redesign without RIBs. We considered it first because we knew migrating to RIBs would be resource-intensive. RIBs came with a number of new libraries, but also a new approach to building apps, with a hierarchical scope structure that decouples business logic from presentation logic. The RIBs architecture offers an elegant, but highly opinionated, memory management system.
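To make “a hierarchical scope structure that decouples business logic from presentation logic” more concrete, here is a minimal Java sketch (Java chosen because it runs on Android). Every class name here, including `Interactor`, `Router`, `EarningsInteractor`, `EarningsPresenter`, and `EarningsBuilder`, is hypothetical and only loosely modeled on the Router/Interactor/Builder split that gives RIBs its name; it is not the API of the open-source RIBs library. The interactor holds a business rule and reaches the UI only through an interface, and a node’s interactor is active only while its router is attached to the tree, a simplified stand-in for scoped lifetimes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified types loosely modeled on the Router/Interactor/Builder
// split behind the RIBs name. This is NOT the open-source RIBs library API.

/** Business logic whose lifecycle is tied to its node being attached. */
abstract class Interactor {
    private boolean active;

    final void activate() { active = true; didBecomeActive(); }
    final void deactivate() { willResignActive(); active = false; }
    final boolean isActive() { return active; }

    // Hooks a subclass may override to start or stop its work.
    protected void didBecomeActive() {}
    protected void willResignActive() {}
}

/** Owns one node's children; detaching a node tears down its whole subtree. */
class Router {
    private final Interactor interactor;
    private final List<Router> children = new ArrayList<>();

    Router(Interactor interactor) { this.interactor = interactor; }

    Interactor interactor() { return interactor; }
    List<Router> children() { return children; }

    void attachChild(Router child) {
        children.add(child);
        child.interactor.activate();
    }

    void detachChild(Router child) {
        // Detach grandchildren first so the subtree shuts down bottom-up.
        for (Router grandchild : new ArrayList<>(child.children)) {
            child.detachChild(grandchild);
        }
        child.interactor.deactivate();
        children.remove(child);
    }
}

/** The only view-facing contract the earnings logic knows about. */
interface EarningsPresenter {
    void showTotal(String formattedTotal);
}

/** Pure business rule (sum fares given in cents); no view classes referenced. */
class EarningsInteractor extends Interactor {
    private final EarningsPresenter presenter;

    EarningsInteractor(EarningsPresenter presenter) { this.presenter = presenter; }

    void onFaresLoaded(List<Long> fareCents) {
        if (!isActive()) {
            return; // a detached node does no further work
        }
        long total = 0;
        for (long cents : fareCents) {
            total += cents;
        }
        presenter.showTotal(String.format(Locale.ROOT, "$%d.%02d", total / 100, total % 100));
    }
}

/** Wires one node's dependencies; swapping the presenter never touches the logic. */
class EarningsBuilder {
    Router build(EarningsPresenter presenter) {
        return new Router(new EarningsInteractor(presenter));
    }
}

/** Root node with no logic of its own. */
class RootInteractor extends Interactor {}
```

In use, a root router attaches the earnings node, the interactor totals the fares and hands a formatted string to whatever presenter was injected, and detaching the node deactivates its interactor so late-arriving data is ignored. Because `EarningsInteractor` never references a view class, a UI redesign in this sketch could replace the presenter without touching the business rule.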
First, we looked at whether the existing application could handle the major product changes we were planning. We discovered that, because our application was built on a slight variation of view controller containment, much of our business logic was tightly coupled to view presentation. This meant that a UI redesign would inevitably involve many changes to business logic anyway.
Second, as previously discussed, the existing driver app architecture had issues that needed to be resolved. Some of these stemmed from the app’s structure itself, which in places (particularly on Android) had devolved into a pattern all too familiar to mobile developers: a different kind of MVC, the Massive View Controller, in which most of our core code lived in a controller file thousands of lines long. Consequently, we weren’t willing to go to bat for the existing mobile architecture, which was becoming increasingly convoluted and difficult to develop on.
Finally, even if the old driver app’s architecture had been perfect, it might still have made strategic sense to adopt RIBs to avoid splitting app architectures at Uber. With a single cogent architecture, our platform-level investments would become twice as valuable, and code written in one part of the organization (e.g., rider) could potentially be reused in another (e.g., driver).
If we were going to adopt RIBs anyway, how were we going to do it?
Many organizations favor carefully paced migrations, which let them continue feature development while the underlying architecture of a system changes. While this approach is perfectly valid in most cases, we had run into issues with it at Uber in the past.
First, we analyzed 10 major mobile migrations we had attempted at Uber over the past several years and discovered that they had a high incompletion rate. That is, we would begin migrating a given underlying library but fail to finish. New features were built with the new library, and some old features were migrated, but legacy code still executed in the codebase.
After further investigation, we found that much of the tech debt in the driver app traced back to such migrations. For example, we had race conditions because our application’s pub/sub model was bifurcated on Android. Our core application structure, which originally relied on fragments on Android, had been only partially migrated to an in-house framework. Incomplete migrations like these led to adapter layers and general developer confusion, and these half-finished architectural efforts ultimately led to outages that directly impacted our users.
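The race conditions mentioned above are easier to picture with a toy example. The Java sketch below is an assumption-laden illustration, not a description of Uber’s actual event buses: a hypothetical `LegacyBus` delivers events synchronously, a hypothetical `QueuedBus` holds them until `drain()` runs (standing in for a later main-loop tick), and a hypothetical `BusAdapter` forwards legacy events onto the newer bus. It runs on a single thread to stay deterministic, but it shows the window in which two features tracking the same state disagree; with real threads and timing, that window becomes a race.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// All types here are hypothetical, illustrating a split ("bifurcated") pub/sub model.

/** Legacy bus: delivers each event to subscribers immediately on publish. */
class LegacyBus {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> subscriber) { subscribers.add(subscriber); }

    void publish(String event) {
        for (Consumer<String> s : subscribers) {
            s.accept(event);
        }
    }
}

/** Newer bus: queues events and delivers them only when drain() runs. */
class QueuedBus {
    private final List<Consumer<String>> subscribers = new ArrayList<>();
    private final Queue<String> pending = new ArrayDeque<>();

    void subscribe(Consumer<String> subscriber) { subscribers.add(subscriber); }

    void publish(String event) { pending.add(event); }

    void drain() {
        while (!pending.isEmpty()) {
            String event = pending.poll();
            for (Consumer<String> s : subscribers) {
                s.accept(event);
            }
        }
    }
}

/** Adapter layer: forwards every legacy event onto the newer bus. */
final class BusAdapter {
    private BusAdapter() {}

    static void bridge(LegacyBus legacy, QueuedBus modern) {
        legacy.subscribe(modern::publish);
    }
}

/** A feature that mirrors the driver's status from whichever bus it listens to. */
class StatusFeature {
    private String status = "ONLINE";

    String status() { return status; }

    void onEvent(String event) { status = event; }
}
```

In this sketch, after `legacy.publish("OFFLINE")` and before `modern.drain()`, a feature subscribed to the legacy bus already reports `OFFLINE` while one subscribed to the newer bus still reports `ONLINE`. Any code that reads both during that gap sees an inconsistent driver status, the kind of bug an adapter layer can introduce and a completed migration would remove.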
Second, we often found that migrations created a large amount of instability while they were in progress. We had numerous outages caused by migrations intended to improve underlying application frameworks, such as our networking protocol. In theory, these changes should have had no immediate, tangible impact on our users, but they ended up breaking core features of the app.
Finally, in our experience, even the promise of continued feature development often went unfulfilled. If a team depended on an ongoing migration, it was often blocked until that migration was complete. Migrations also made rollbacks costly: rolling back a migration often meant rolling back numerous features along with it.
Consequently, as we assessed whether to move forward with a complete product redesign and the adoption of the RIBs architecture, we judged the risk of an incomplete migration, or of endless adapter layers and facades that would greatly increase application instability, to be too high.
To some degree, we made the decision to rewrite by negation (the other options, a redesign without RIBs and a migration, weren’t tenable), but a rewrite also offered real benefits that heightened our confidence in the final decision.
First, a rewrite would let us redesign the app without being limited by assumptions about how it was already built, which opened its design to a wider range of possible flows.
Second, choosing to rewrite the app meant that our architecture would be much cleaner, since it would follow a cogent strategic direction from the get-go. Had we chosen a migration, we would likely have been stuck with legacy code reused out of expediency or convenience.
Third, rewriting the app let us go back to the drawing board and think more fully about the direction we wanted the product to take. As a result, we ended up rewriting several major frameworks in the app as well.
For an engineer, a rewrite is an opportunity to do some amazing work, and we were excited to get started.
It’s worth highlighting that our decision to rewrite our driver app was not predicated on the speculative idea that “it would be better if we could do it all over again.” In fact, some engineers might be surprised to hear that, even after a rewrite, we shipped an app with not only new features and a new architecture, but even a little bit of new technical debt.
In other words, you never get these things perfect. An engineer who reads this article should be wary of concluding that “migrations never work, rewrites lead to perfect code.” Instead, it’s important to recognize that the decision to rewrite comes within the context of very tangible organizational, business, and technical needs.
If we had not created a brand new mobile architecture in the months prior, we might not have rewritten the app. If we had not had a product team willing to research the decision, we might not have rewritten it. If migrations at Uber had been more successful in the past, we might not have rewritten it. Certainly, what drove us to rewrite was not that rewrites are inherently good, or even generally a good idea.
Instead, we rewrote the driver application out of a desire to build a more reliable and stronger product experience for our users while amplifying our organization’s ability to execute on this vision. This decision-making process is perhaps less exciting than the urge to invent the next great abstraction layer, but it is the calculus that led to a successful and thoroughly improved mobile application.