How Lyft Creates Hyper-Accurate Maps from Open-Source Maps and Real-Time Data

Albert Yuen, James Murphy, Sumanth Ravipathi, Deeksha Goyal, Han Kim, Adithya Hemakumar, Alex Ung, Clare Corthell

At Lyft, our novel driver localization algorithm detects map errors to create a hyper-accurate map from OpenStreetMap (OSM) and real-time data. We have fixed thousands of map errors in OSM in bustling urban areas. Later in the post, we share a sample of the detected map errors in Minneapolis with the OSM Community to improve the quality of the map.

Lyft’s mission to build the world’s best transportation relies on its inherent geospatial capabilities. For example, driver and passenger geolocations must be precisely known in order to efficiently pair drivers and passengers. We also need precise knowledge of the road network to compute efficient routes and accurate estimated time of arrival from current driver position to pick-up point, and from pick-up point to drop-off point. Moreover, meticulous understanding of the road network is crucial to correctly compute the distance travelled by the drivers.

What is the role of the mapping team?

These technical challenges require a team with a strong geospatial expertise. Lyft’s mapping team provides a rich, fresh, and accurate model of the physical world, and how our users move around within it. We enable:

  • Generating optimal and infer probable routes of drivers to passengers
  • Making accurate time and distance prediction
  • Localizing drivers, passengers and vehicles
  • Building a knowledge base of physical places
  • Inferring map features

Why is having an accurate basemap important?

Our internal map of the road network is based on OSM, which has been built and improved over the years by the open source community. More recently, larger organizations (such as Apple, Microsoft, Mapbox, Mapillary, Facebook, Uber, Grab, Lyft, etc.)¹ have also worked to improve the map. Akin to Wikipedia as an open-source encyclopedia, OSM as an open-source map may contain missing or erroneous data for several possible reasons. Old roads may have never been mapped, new roads may not have been mapped yet, previously closed roads may be reopened, roads may be digitally vandalized, buildings may be non-existent, turn restrictions may be erroneous, road directions may be incorrectly labeled, and so forth. As OSM is a source for our basemap, we need to monitor its quality and accuracy. Upon detecting map errors in OSM, we work with our Data Curation Team to fix them in OSM. This can be done using our proprietary data.

Before discussing map error detection, it is necessary to have an understanding of what map-matching is. At Lyft, we geo-localize drivers from the sensors embedded in their smartphones. This includes a GPS receiver that receives sparse (due to battery constraints) and often noisy (due to urban canyons) locations.

If we do not have any understanding of the road network, we can only employ a space-free tracking algorithm such as a Kalman Filter, as shown in Fig. 1. Drivers would therefore not be localized on the road network.

Fig. 1 — The road network is represented as black lines. The blue dots are the sequence of GPS locations emitted by the driver’s smartphone. The path computed by the Kalman ilter, in green, does not leverage our knowledge of the road network.

However, OSM provides us knowledge of the road network. Taking both a sequence of sparse and noisy GPS traces and a map of the road network as input, map matching algorithms can infer the most accurate trajectory on the road network, as shown in Fig. 2. An example of a map-matching algorithm is the one based on Hidden Markov Models (HMM) developed by Newson and Krumm². The quality of the map is essential for accurate map-matching.

Fig. 2 — The road network is represented as black lines. The blue dots are the sequence of GPS locations emitted by the driver’s smartphone. A traditional map matching algorithm [2] in red, leverages our knowledge of the road network and accurately compute the trajectory of the driver.

Not all map features in OSM are useful for Lyft. While hiking trails in OSM delight my outdoorsy self, this is not a crucial map feature for the Lyft app. The most important and fundamental map features for Lyft are those that characterize the building blocks of the road network: the existence of road segments, the directions of road segments, and the existence of turn restrictions.

At Lyft, we distinguish two types of road network errors:

  • The road network errors that trigger map-matching issues and routing issues, denoted Type I map errors
  • The road network errors that mostly trigger routing issues, denoted Type II map errors

Type I Map Error: Map-Matching-based

Because Type I map errors are the only ones that trigger map-matching issues, we can detect them by finding where driver localization is failing on the road network. Figure 3 shows a case where map-matching fails to reconstruct the correct trajectory of the driver.

Fig. 3 — The road network is represented as black lines. A missing road is in the center of schematic. The blue dots are the sequence of GPS locations emitted by the driver’s smartphone. A traditional map matching algorithm, in red, fails to detect the missing road, and inaccurately computes the trajectory of the driver on the imperfect road network.

Type II Map Error: Routing-based

Because Type II map errors are the only ones that triggers routing issues without triggering localization issues, we can detect them by finding where on the road network routing is failing while localization is not failing.

In the following two examples, the dotted line is a road segment that does not exist in the physical world, and yet, this road is mapped in OSM.

Assuming the GPS locations are not too sparse and not too noisy, the extra road segment in OSM does not cause map-matching failures, as displayed in Fig. 4.

Fig. 4 — In this example, the extra road in dotted line does not cause map-matching errors.

However, the extra road segment causes shortest/fastest path calculations to be wrong, resulting to routing issues, as displayed in Fig. 5.

Fig. 5 — In this example, the extra road in dotted line in OSM that does not exist in the physical world causes routing to be wrong as the red trajectory is not possible in the physical world. The green star is the origin. The red star is the destination.

Type I and II map errors for road existence, road direction, turn restrictions

The map errors related to the existence of the road segments, the direction of road segments, and the existence of turn restrictions can be categorized using this framework, as displayed in the following Tables 1, 2 and 3 for road existence, road direction and turn restriction.

Road existence:

Table 1 — Type I and Type II map errors for road existence.

Road directions:

Table 2 — Type I and Type II map errors for road directions.

Turn restrictions:

Table 3 — Type I and Type II map errors for turn restrictions.

When the map is wrong, Kalman Filters perform better than traditional HMM map-matching algorithm. However, the map is often right. We developed an algorithm based on a semi-interacting multiple model (sIMM) in which an off-road Kalman Filter is run in parallel to an on-road HMM map-matching algorithm. When the map is correct, our algorithm uses the output of the HMM, while the Kalman Filter is preferred when the map is wrong, as explained by our paper here³. Note that a particle filter could easily substitute the HMM in this framework.

At Lyft, the output of the Kalman Filter — the off-road locations — are used to detect Type I map errors, which encompasses missing roads, roads in OSM that are set to the wrong one-way direction, and turn restrictions that should not have been mapped in OSM. (In some sense, those are the features that over-constrains our routing graph).

Figure 6 shows an example of how our sIMM filter employing an off-road Kalman Filter and an on-road HMM operates. There is a missing road at the middle, and a driver travels through that missing road, as shown by the GPS traces.

The correct section of the map (represented by the red path on the left) is used by the HMM to compute the initial trajectory of the driver. In the portion of the map where the road is missing (represented by the green path), our system detects that the map is wrong and switches to the off-road Kalman Filter mode. Eventually, as the map is correct again, the algorithm switches back to the on-road HMM mode, as shown by the red line on the right.

Fig. 6 — The road network is represented as black lines. A missing road is in the center of schematic. The blue dots are the sequence of GPS locations emitted by the driver’s smartphone. The sIMM filter captures when the map is correct and when the map is wrong, and uses the on-road (in red) and off-road (in green) modes accordingly.

The output of the sIMM filter when the map is wrong can be used to detect Type I map errors. By leveraging weeks of anonymized Lyft’s driver locations when they are connected to the Lyft app, and making a large-scale plot of off-road trajectories, we can highlight areas where we observe Type I map errors. At Lyft, we found that most of the Type I map errors are due to missing roads (although we have observed a few incorrectly labelled road directions and turn restrictions that trigger Type I map errors). We used the python package datashader and our internal visualization tool to render the off-road trajectories. We made raster tiles of those off-road trajectories in order to speed up the loading of the detected map errors from the off-road trajectories. Here, raster tiles are more appropriate than vector tiles. Loading vector tiles would be similar to recomputing each off-road trajectory poly-line for display as we scroll around the map while loading raster tiles does not require recomputing, at the expense of less interactivity — it is difficult to add metadata on raster tiles.

Figure 7 showcases off-road locations in yellow-red around the MSP airport in Minneapolis. An analysis of an (outdated) satellite imagery shows that roads were probably recently built and that are now navigable by cars. OSM is not up to date yet as those roads are not mapped yet.

Fig. 7 -The off-road locations in yellow-red indicate missing roads at the MSP Airport in Minneapolis. This is probably due to recent construction. The satellite imagery is outdated.

Figure 8 showcases off-road locations in a parking lot in Minneapolis. The parking lot building is mapped in OSM. However, because the aisles of the parking lot are not mapped, routing within the parking lot is not possible, which triggered the off-road locations.

Fig. 8 — The off-road locations in yellow-red indicate missing roads in a parking lot in Minneapolis.

Figure 9 showcases off-road locations near the Northtown Mall in Minneapolis. Accurately mapped roads at venues such as malls are crucial to Lyft as their absence prevents smooth pick-up experiences.

Fig. 9 — The off-road locations in yellow-red indicate missing roads in at the Northtown Mall in Minneapolis.

We have nevertheless observed that our Type I map error detector does not perform well on wide roads, when the GPS accuracy is poor, or when drivers do not observe the road network. This is because this would activate the off-road mode even though the map is correct.

Wide roads are mapped as a single line with no thickness in OSM, even though there is a road width tag, which is often unfilled. This is the main reason why our sIMM algorithm does not perform well in the case of wide roads.

GPS accuracy is particularly bad in urban canyons when high density of tall buildings corrupt GPS readings due to multi-path or occlusion.

Even when the road network is correct, if, for example, a driver decides to ignore a turn restriction, the algorithm will generate off-road locations. Those off-road locations unfortunately falsely indicate that the map is wrong.

Using our map error detector, we have fixed and contributed thousands of critical Type I map errors in OpenStreetMap, hoping that it will be beneficial for the OSM community. Furthermore, in order to get more feedback from the community and see if a larger release would be useful, we are releasing a sample of our detected Type I map errors in Minneapolis in MapRoulette (see Fig. 10). Check them out here!

At Lyft, mapmaking and map quality assessment is central to our business. Our burgeoning map-making team is tacking the routing-based Type II map errors as well as other types of map inferences, such as traffic control elements (check out Deeksha Goyal’s work during her internship at Lyft⁴). Stay tuned for more exciting mapping blog posts!

If you enjoyed this post, follow and recommend! To learn more, check out our other Science posts.

As always, Lyft is hiring! If you’re passionate about developing state-of-the-art quantitative models or building the infrastructure that powers them, read more about our Science and Engineering roles and reach out to us.

We would like to thank the entire mapping organization for their contribution/support to this work, Alex Kazakova and Spencer Jaquish for leading the Data Curation Team, as well as the copy editors Ryan Lane, Julien Silland, Andrew Hao and Mark Grover for their feedback. Many thanks!


[1] Corporate Editors in the Evolving Landscape of OpenStreetMap, Jennings Anderson et al., ISPRS Int. J. Geo.-Inf., 2019. link

[2] Hidden Markov Map Matching Through Noise and Sparseness, Paul Newson and John Krumm, In Proceedings of the 17th ACM SIGSPATIAL, 2009. link

[3] Map matching when the map is wrong: Efficient on/off road vehicle tracking and map learning, James Murphy, Yuanyuan Pao, Albert Yuen, arXiv, 2018. link

[4] Traffic Control Elements Inference using Telemetry Data and Convolutional Neural Networks, Deeksha Goyal, Albert Yuen, Han Suk Kim, James Murphy, KDD UrbComp Workshop 2019, 2019. link

Source link