By Marie Douriez, James Murphy, Kerrick Staley
When you request a ride, Lyft tries to match you with the driver most suited for your route. To make a dispatch decision, we first need to ask: where are the drivers? Lyft uses GPS data from the drivers’ phones to answer this question. However, the GPS data that we get is often noisy and does not match the road.
To get a clearer picture from this raw data, we run it through an algorithm that takes raw locations and returns more accurate locations that are on the road network. This process is called map-matching. We recently developed and launched a completely new map-matching algorithm and found that it improved driver location accuracy and made Lyft’s marketplace more efficient. In this post, we’ll discuss the details of this new model.
Map-matching is a process that maps raw GPS locations to road segments on a road network in order to create an estimate of the route taken by a vehicle.
Why do we need map-matching?
At Lyft, we have two main use cases for map-matching:
- At the end of a ride, to compute the distance travelled by a driver to calculate the fare.
- In real-time, to provide accurate locations to the ETA team and make dispatch decisions as well as to display the drivers’ cars on the rider app.
These two use cases differ by their constraints: in real-time, we need to be able to perform map-matching quickly (low latency) and with the locations available up to the current time only. At the end of a ride however, the latency requirements are less stringent and the whole history of the ride is available to us (allowing us to work “backwards” from future locations). As a result, End-Of-Ride Map-Matching (EORMM) and Real-Time Map-Matching (RTMM) are solved using slightly different approaches. In this post, we will focus on algorithms used for Real-Time Map-Matching.
Why is map-matching challenging?
A bad map-matched location leads to inaccurate ETAs, then to bad dispatch decisions and upset drivers and riders. Map-matching thus directly impacts Lyft’s marketplace and has important consequences on our users’ experience. There are several main challenges to map-matching.
First, as Yunjie noted in his blog post, the location data collected from the phones can get very noisy in urban canyons (where streets are surrounded by tall buildings), around stacked roads, or under tunnels. Those areas are particularly challenging for map-matching algorithms and are all the more important to do correctly since many Lyft rides happen there.
Beyond noise and road geometry, another challenge is the lack of ground truth: we don’t actually know the true locations of the drivers when they are driving and we have to find proxies to evaluate the accuracy of our models.
Finally, the performance of the map-matching algorithms relies on the quality of the road network data. This problem is being solved by another team within Mapping at Lyft; see Albert’s blog post to learn how we make our internal map more accurate.
We won’t go into all of the techniques for solving map-matching, but for a review of common approaches, please refer to this study by Quddus et al. .
A nice way to frame the problem is to use state space models. State space models are time series models where the system has “hidden” states that cannot be directly observed but give rise to visible observations. Here, our hidden states are the actual positions of the car on the road network that we are trying to estimate. We only observe a modified version of the hidden states: the observations (raw location data). We assume that the state of the system evolves in a way that only depends on the current state (Markov assumption) and further define a hidden-state-to-hidden-state transition density and a hidden-state-to-observation density.
A commonly used state space model for map-matching is the discrete-state Hidden Markov Model (Newson & Krumm , DiDi’s IJCAI-19 Tutorial , Map Matching @ Uber ). In this system, we generate candidates by looking at the closest points on the road segment and use the Viterbi algorithm to find the most likely sequence of hidden states.
However, the Hidden Markov Model (HMM) has several limitations:
- It is relatively inflexible to different modeling choices and input data
- It scales badly (O(N²), where N is the number of possible candidates at each state)
- It can’t cope well with high(ish) frequency observations (see Newson & Krumm )
For these reasons, we developed a new real-time map-matching algorithm that is more accurate and more flexible to incorporate additional sensor data.
Kalman filter basics
Unlike the discrete-state HMM, the Kalman filter allows for the hidden state to be a continuous distribution. At its core, the Kalman filter is simply a linear Gaussian model and models the system using the following equations:
Using these equations, the Kalman filter iteratively updates its representation of the system’s state using a closed form predict-correct step to go from step t-1 posterior to step t posterior.
One limitation of the Kalman filter, however, is that it can only handle linear problems. To deal with non-linearities, generalizations of the Kalman filter have been developed such as the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF) . As we’ll see in the next section, our new RTMM algorithm uses the UKF technique. For the rest of the post, the technical differences between the Kalman filter and the UKF don’t really matter: we can simply assume that the UKF works like a standard linear Kalman filter.
The Marginalized Particle Filter
Let’s now describe how our new RTMM algorithm works. We will refer to it as a Marginalized Particle Filter (MPF).
At a high level, our MPF algorithm keeps track of multiple “particles”, each representing a position on a trajectory on the road network, and runs an unscented Kalman filter conditioned on each of these trajectories. To be more precise, let us define the following objects:
An MPF state is a list of particles.
A particle represents one possible road position of the car on the map, associated with some probability. Each particle has 4 attributes:
- A probability p ∈ [0,1]
- A trajectory (i.e. a list of intersections from the map)
- A mean vector x = [d v]’ where d is the position of the car on the trajectory (in meters) and v is the car’s velocity (in meters/second)
- A 2×2 covariance matrix P representing the uncertainty around the car’s position and velocity
We update the MPF state each time we receive a new observation from the driver’s phone in the following way:
If the previous MPF state has no particle (for example, if the driver just logged in to the app), we need to initialize a new one. The initialization step simply “snaps” the GPS observation onto the map and returns the closest road positions. Each particle’s probability is computed as a function of its distance to the observation.
At the next update (new observation), we iterate through our state’s (non-empty) list of particles and perform two steps for each particle. First, the trajectory extension step finds all the possible trajectories from the particle’s current position on the road network. Second, we loop through these new trajectories and on each of these, we run our UKF and update the newly created particle’s probability. After these two nested loops, we end up with a new list of particles. To avoid keeping track of too many of them in our MPF state, we simply discard the most unlikely ones (pruning).
The downstream teams can then decide to use the most probable particle from the MPF state as the map-matched location or can directly exploit our probabilistic output (e.g. to create a distribution of possible ETAs).
To recap, the Marginalized Particle Filter maintains a set of particles representing possible positions of the car on trajectories and each particle is updated using the Kalman filter algorithm. The new algorithm provides not only location but also speed estimates and uncertainty. In practice, we have observed that it yields more accurate map-matched locations than the HMM, in particular in downtown areas and around intersections where errors would lead to very inaccurate ETAs.
After experimenting with this new real-time map-matching algorithm, we found positive effects on Lyft’s marketplace. The new model reduced ETA errors, meaning that we could more accurately match passengers to the most suited driver. It also reduced passenger cancels, showing that it increased passengers’ confidence that their drivers would arrive on-time for pickup.
We’re not done yet, though: there are many ways that we plan to improve this model in the coming months. We’re going to incorporate data from other phone sensors, such as the gyroscope, which will allow us to detect when drivers turn. We also plan to take into account the driver’s destination (if they have one, e.g. when en route to a pickup) as a prior. Indeed, another strength of the Marginalized Particle Filter is that it allows us to easily add these new types of information to the model in a principled way, and it is a good foundation on which we can continue to make the Lyft experience a little more seamless for our passengers and drivers.