Rebuilding our Support Tech architecture at Lyft
On the support tech team at Lyft, we’ve been rebuilding our support systems to better stand the test of time and enable our growth. We chose to “move thoughtfully” and “not be afraid to change things” instead of the popular hacker mentality of “move fast and break things.” This strategy was fueled by a deep understanding of what challenges existed in the past, what is needed now, and what is required for the future.
The result is a new generation of support infrastructure that is built to last and adapt to our business needs over several years. We focused on two areas:
- Routing, Storage, and Reporting
- Data Model Design
Routing, Storage, and Reporting
Several years ago, Lyft offered support primarily via email through a third party tool/service provider. Gradually we added other channels for support such as phone and chat, along with other 3rd party providers for sending this communication.
The system architecture was simple with one support channel and one 3rd party. However, each new type of support or provider of support drastically increased the complexity of the overall system. This was because each node in the system needed to know about the overall system.
Beyond being overly complex, the implication of this architecture was that we did not maintain an internal source of truth for support interactions. We delegated support requests to 3rd party providers, which fundamentally limited our ability to deliver a customized, high quality support experience. We did not have easy application layer access to a given user's support history, so we could not use that data to understand a user's previous experience or current status.
To address this, we created a new system architecture that enables us to be more interoperable. We created a routing micro-service which acts as the entry point to all of our support infrastructure. In our new design, we maintain an internal database of support interactions which provides a complete picture of a customer’s support history, regardless of the underlying channel or provider.
This architecture makes it easier to add new types of support and new support providers in a way that does not increase the complexity of the overall system. We delegate to the 3rd parties only to send the channel communication.
As an added bonus, we introduced an abstraction layer to hide all provider-specific implementation details which enables other teams at Lyft to integrate into support without writing provider-specific code. Integrating with support can now be distilled into one simple API request: “contact this user via this channel about this issue.”
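To make the "contact this user via this channel about this issue" idea concrete, here is a minimal sketch of what such an abstraction layer could look like. All names (`SupportRouter`, `Channel`, `contact_user`) and the callable-per-channel provider design are illustrative assumptions, not Lyft's actual API.

```python
# Hypothetical sketch of a support abstraction layer: record internally
# first, then delegate only the outbound send to a provider adapter.
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    EMAIL = "email"
    PHONE = "phone"
    CHAT = "chat"


@dataclass
class SupportRequest:
    user_id: str
    channel: Channel
    issue: str


class SupportRouter:
    """Single entry point to support: stores the interaction in an
    internal source of truth, then hands the communication itself off
    to a provider-specific adapter."""

    def __init__(self, providers):
        # providers maps a Channel to a provider adapter (any callable here)
        self.providers = providers
        self.interactions = []  # stand-in for the internal database

    def contact_user(self, user_id, channel, issue):
        request = SupportRequest(user_id, channel, issue)
        self.interactions.append(request)   # internal record first
        self.providers[channel](request)    # delegate only the send
        return request


# Usage: "contact this user via this channel about this issue"
sent = []
router = SupportRouter({Channel.CHAT: sent.append})
router.contact_user("user-42", Channel.CHAT, "driver_pay")
```

Because the internal record is written before delegation, other teams can integrate without any provider-specific code, and support history stays queryable in the application layer regardless of which 3rd party carried the message.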
An added benefit came out of this new core support system: it allows us to report more accurately, in real time, how the support system is being used, rather than waiting to sync data from 3rd parties or trying to merge data from various 3rd party formats.
It sounds so simple and obvious in hindsight, and that's exactly why it took time to get there: one entry point to support tech that handles routing, storage, and reporting, and abstracts away 3rd party logic.
Data Model Design
The Data Model Problem
Most off-the-shelf support platforms are very opinionated about the way they model their data. Whether through their data model or interface, the product defines how one can structure their support system. This is a good thing for most small/medium-sized companies because their requirements are simple enough to fit out of the box solutions. At Lyft’s scale, we need much more flexibility to differentiate our support experiences for a diverse range of situations.
There are several common paradigms that exist in the support world:
Ticket
Most systems use a ticket architecture. A ticket has a set of metadata to define what it is about. Then there is usually a thread of conversation and events that happen regarding that ticket. For Lyft, this would look something like a generic driver payment ticket to cover all driver pay issues.
Pros: Simplicity, which is great when your customers contact you once about a single topic and the topics tend to be broad.
Cons: Can be overly simplistic and unnecessarily rigid for more complex situations that involve multiple issues, multiple mediums, and multiple people.
Case (a.k.a. Issue)
The case system acts as an umbrella that connects multiple ticket-like interactions. You can contact the support team and they can track your progress toward resolving your issue as you move through multiple interactions. You can think about a service center needing to track the work done on your car by different mechanics. For Lyft, this would look something like having multiple support associates help you at different stages of your driver sign up process.
Pros: Able to group tickets by an overall issue, making it easier to track resolution of that issue across multiple tickets.
Cons: Could add unnecessary overhead if your support needs are fairly straightforward.
Conversation
This system does not try to categorize why people are contacting support; instead, it models a single conversation in which you can easily flow through different topics. In practice, this is something like a chat thread you may have with a friend. For Lyft, all support would be one thread of conversation, regardless of why you are contacting support.
Pros: The simplest paradigm. Easy to understand from the customer perspective as it's one long conversation.
Cons: A single thread can be confusing for a more complex business that needs to work with customers in multiple capacities or solve multiple issues simultaneously.
Each one of these paradigms has a set of pros and cons, heavily dependent on use case.
The Data Model Exploration
The Lyft support engineering team regularly visits our customer support locations to shadow associates. We get to experience first-hand how customers use Lyft and how associates use the tools we build. These visits help us build a deeper understanding of support use cases and gather details to inform our data model designs.
At a very high level, there are a few main factors that surface among the diversity of support interactions. The below exist in every support interaction, at varying degrees of intensity:
1. Frequency of response
2. Accuracy of the identified reason for support
3. Simplicity of resolution
4. Information immediately available to provide a resolution
5. Number of topics present in a single support request
6. Number of topic changes that happen during resolution of a support request
7. Necessity of specialty skillsets that require assistance from other internal teams
By combining these factors, we can represent nearly every real-world support case. For example: a user contacts support presenting one issue that is really about another issue (#2), the issue requires more information from the user (#4), and its complexity also requires assistance from other support associates (#7). The list of possible use cases is very long and not necessary to list out exhaustively.
What these seven factors do tell us is that there is huge variability to the ways people may need support, the initial reason and complexity of issues often evolve throughout the support interaction, and imposing an opinionated data model upfront restricts the ability for the system to accurately model a support experience by the time it ends. These factors point out the crux of the problem with the current solutions.
The New Data Model
So let’s throw out everything we know about support systems and get creative. Given how people actually interact with support, how might we better design the engineering architecture for these support systems?
To answer this question, we synthesized months of conversation with support associates, tool admins, and other engineering teams. The common thread was flexibility in the data model. On the most basic level, the data model needs to accurately model any type of support interaction that may happen, and continue to model it accurately throughout the duration of the interaction.
Our solution was a flexible data model.
This flexible data model is designed in such a way that, via code, we can model any of the three paradigms listed above, as well as any new type of paradigm we want to invent. In fact, we can run multiple paradigms at the same time to solve for multiple types of business needs throughout the company. We can even transition between paradigms, for example from a ticket paradigm to a case paradigm. And we can do this all with the same underlying datastore, so we have a single source of truth.
Effectively, we built a system that can model any support software that exists today and any model we may invent down the road. The opinionated ideology of this new system is that it must support multiple support data model paradigms. And it's the ability to easily move between paradigms and invent our own that makes this model so interesting.
For the technical underpinnings of the flexible data model, we chose a Relational Database Management System (RDBMS) for three reasons:
- Consistency of data to be represented.
- Ease of being able to query across individual or multiple data fields.
- Flexibility in defining the relations between database tables.
Our implementation includes both many-to-many relationships (via an id mapping reference table) and one-to-many relationships.
There are three main concepts that define our database design and relations: case, interaction, and step.
- A case represents an overall topic. It has a many-to-many relationship to interactions.
- An interaction represents an instance of two parties being connected to each other. It has a one-to-many relationship to steps.
- A step represents an activity stream within a given interaction.
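The three concepts and their relations can be sketched as a small relational schema. The table and column names below are assumptions for illustration (the post does not publish the actual DDL); the cardinalities match the description above: cases-to-interactions is many-to-many via a mapping table, interactions-to-steps is one-to-many.

```python
# Minimal sketch of the case / interaction / step design in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (
    id INTEGER PRIMARY KEY,
    topic TEXT NOT NULL               -- the overall topic the case covers
);
CREATE TABLE interactions (
    id INTEGER PRIMARY KEY,
    medium TEXT NOT NULL              -- e.g. phone, chat, email
);
-- many-to-many between cases and interactions via an id mapping table
CREATE TABLE case_interactions (
    case_id INTEGER REFERENCES cases(id),
    interaction_id INTEGER REFERENCES interactions(id),
    PRIMARY KEY (case_id, interaction_id)
);
-- one-to-many: each step belongs to exactly one interaction
CREATE TABLE steps (
    id INTEGER PRIMARY KEY,
    interaction_id INTEGER REFERENCES interactions(id),
    body TEXT NOT NULL                -- one entry in the activity stream
);
""")

# One case spanning two interactions (e.g. a chat follow-up to a call)
conn.execute("INSERT INTO cases (id, topic) VALUES (1, 'driver_signup')")
conn.executemany("INSERT INTO interactions (id, medium) VALUES (?, ?)",
                 [(1, "phone"), (2, "chat")])
conn.executemany("INSERT INTO case_interactions VALUES (1, ?)",
                 [(1,), (2,)])
conn.executemany("INSERT INTO steps (interaction_id, body) VALUES (?, ?)",
                 [(1, "call opened"), (1, "docs requested"),
                  (2, "docs received")])

# Full activity history for the case, across all of its interactions
n, = conn.execute("""
    SELECT COUNT(*) FROM steps s
    JOIN case_interactions ci ON ci.interaction_id = s.interaction_id
    WHERE ci.case_id = 1
""").fetchone()
```

The mapping table is what keeps the schema paradigm-agnostic: restricting it to one row per case yields a ticket, allowing many rows per case yields a case/issue umbrella, and one long-lived case per user yields a conversation.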
The below diagram is an illustrative example of how these concepts can relate to each other.
Given these three main concepts and the ways relations can be defined between them, let's apply them to the three main paradigms discussed in The Data Model Problem section.
Ticket
A ticket is a case and interaction combo, with the activity as steps. It is a one-to-one relationship between the case (topic) and the interaction (connecting two parties). The conversation about a ticket happens as step entries.
Case
This is most similar to the implementation we used. It is a one-to-many relationship between the case (topic) and the interaction (connecting two parties). The conversation about a case happens as step entries.
Conversation
This showcases the flexibility of the database design. It would use a one-to-many relationship between the case (topic) and the interaction (connecting two parties). The case would be thought of as the entire lifespan of the user. Each interaction would be thought of as a given medium (e.g. phone, chat, etc.) of communication. The conversation through a given medium happens as step entries. You can retrieve the entire step history for a given user across the interactions and display that as the thread, adaptive to whatever medium you may be using.
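The three mappings above can be expressed with the same three concepts and nothing but different cardinalities. The sketch below uses plain dataclasses with hypothetical names to show this; it is an illustration of the idea, not Lyft's implementation.

```python
# Same three concepts, three paradigms, differing only in cardinality.
from dataclasses import dataclass, field


@dataclass
class Step:
    body: str


@dataclass
class Interaction:
    medium: str
    steps: list = field(default_factory=list)   # one-to-many: steps


@dataclass
class Case:
    topic: str
    interactions: list = field(default_factory=list)


# Ticket paradigm: one case tied to exactly one interaction
ticket = Case("driver_pay",
              [Interaction("email", [Step("missing bonus")])])

# Case paradigm: one case grouping several interactions
case = Case("driver_signup", [
    Interaction("phone", [Step("call opened")]),
    Interaction("chat", [Step("docs received")]),
])

# Conversation paradigm: treat the case as the user's whole lifespan and
# flatten the step history across interactions into one thread
thread = [s.body for i in case.interactions for s in i.steps]
```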
These are just the three main paradigms. As we can see, adjusting how the relations are defined between the different parts of the database model is enough to represent each of them.
At Lyft, we have invented our own modification of the case paradigm that will fully utilize the many-to-many relationship available between cases and interactions. This will enable us to solve for all of the different types of business cases we uncovered during our research. It fits even our most complex situations that involve multiple issues, multiple support associates, multiple mediums, multiple time periods, and multiple types of resolutions.
With this ability to model different paradigms, we will be able to tailor support for the different use cases within our business. We can truly create the paradigms that fit best for the experience we want to provide. This is not something any company that offers support services has historically been able to do.
By gathering as much context as possible about the support infrastructure problem space, taking into account information within Lyft and external to Lyft, we were able to design a solution that solved our current needs while planning for the future. We will continue to learn and adapt this model. By being able to point to the same data source throughout the evolution of our support product offerings, we will be able to more rapidly develop, experiment, and not be tied to any particular paradigm that exists in the industry for modeling support data.
At the end of the day, we want to provide the best support experience possible: one that makes you excited to stick with Lyft. Having technology that's flexible enough to capture multiple support paradigms helps us provide an ideal experience in every situation.