Location and navigation using global positioning systems (GPS) is deeply embedded in our daily lives, and is particularly crucial to Uber’s services. To orchestrate quick, efficient pickups, our GPS technologies need to know the locations of matched riders and drivers, as well as provide navigation guidance from a driver’s current location to where the rider needs to be picked up, and then, to the rider’s chosen destination. For this process to work seamlessly, the location estimates for riders and drivers need to be as precise as possible.
Since the (literal!) launch of GPS in 1973, we have advanced our understanding of the world, experienced exponential growth in the computational power available to us, and developed powerful algorithms to model uncertainty from fields like robotics. While our lives have become increasingly dependent on GPS, the fundamentals of how GPS works have not changed that much, which leads to significant performance limitations. In our opinion, it is time to rethink some of the starting assumptions that were true in 1973 regarding where and how we use GPS, as well as the computational power and additional information we can bring to bear to improve it.
While GPS works well under clear skies, its location estimates can be wildly inaccurate (with a margin of error of 50 meters or more) when we need it the most: in densely populated and highly built-up urban areas, where many of our users are located. To overcome this challenge, we developed a software upgrade to GPS for Android which substantially improves location accuracy in urban environments via a client-server architecture that utilizes 3D maps and performs sophisticated probabilistic computations on GPS data available through Android’s GNSS APIs.
In this article, we discuss why GPS can perform poorly in urban environments and outline how we fix it using advanced signal processing algorithms deployed at scale on our server infrastructure.
A bit of background on GPS/GNSS
Before discussing our approach in detail, let us do a quick recap of how GPS works in order to understand why it can be inaccurate in high-rise urban environments.
GPS is a network of more than 30 satellites operated by the U.S. government, orbiting the earth at an altitude of about 20,000 kilometers. (Most cell phones these days can pick up similar Russian “GLONASS” satellites too.) These satellites send out radio frequency signals that GPS receivers, such as those found in cell phones, can lock onto. Importantly, these satellites advertise the time at which they launch their signals.
For each satellite whose signal the receiver processes, the difference between reception time and launch time (time-of-flight), multiplied by the speed of light, is called the pseudorange. If the satellite and receiver clocks were synchronized, and the signal traveled along the straight line-of-sight path, then the pseudorange would equal the actual distance to the satellite. However, the clocks are not synchronized, so the receiver needs to solve for four unknowns: its own 3D coordinates on the globe and its clock bias. Thus, we need a minimum of four satellites (four equations) to solve for these four unknowns.
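To make the four unknowns concrete, here is a sketch of the standard pseudorange model. The notation is ours, introduced only for illustration: x is the receiver position, s_i is the position of satellite i, b is the receiver clock bias in seconds, c is the speed of light, and ε_i collects measurement noise and any extra path length from reflections.

```latex
\rho_i \;=\; c\,\bigl(t_i^{\mathrm{rx}} - t_i^{\mathrm{tx}}\bigr)
       \;=\; \lVert \mathbf{x} - \mathbf{s}_i \rVert + c\,b + \varepsilon_i,
\qquad i = 1, \dots, N
```

The unknowns are the three components of x and the bias b, so at least N = 4 pseudoranges are needed. With N > 4 the system is overdetermined and is typically solved in a least-squares sense.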
If we ignore clock bias, we can intuitively interpret the location estimate computed by the GPS receiver as the intersection of spheres centered at the satellites, with the radius of each sphere given by the pseudorange. In practice, a GPS receiver processes signals from a significantly larger number of satellites (up to 20 GPS and GLONASS satellites are visible in an open field), and having more than the minimum number of equations provides extra robustness to noise, blockages, etc. In addition to GPS and GLONASS, newer receivers can process signals from other navigation satellite systems that are coming online: Galileo, operated by the European Union; IRNSS, operated by India; and BeiDou, operated by China. The more general term GNSS (global navigation satellite systems) encompasses all of these systems. (We will use this term in the remainder of the article.)
Why GNSS location is inaccurate in urban environments
A major assumption behind GNSS-based positioning is that the receiver has a direct line-of-sight to each satellite whose pseudorange it is computing. This works seamlessly in open terrain but really breaks down in urban environments, as shown in Figure 3, below:
Buildings often block the lines of sight to satellites, so the receiver frequently processes signals corresponding to strong reflections off other buildings. Because a reflected path is always longer than the direct one, the resulting pseudoranges carry positive offsets, and these can lead to position errors of 50 meters or more in urban canyons. Most of us who have wandered, driven around, or requested an Uber in big cities have experienced these problems firsthand.
Satellite signal strengths to the rescue
Our approach to improving location accuracy makes a feature out of the very blockage of GNSS signals that causes trouble for standard receivers. How? For Android phones, the LocationManager API provides not just the phone’s position estimate, but also the signal-to-noise ratio (SNR) for each GNSS satellite in view. If we put this “signal strength” information together with 3D maps, then we can obtain very valuable location information. Figure 4, below, shows a simplified version of how satellite SNRs and 3D maps can be used to infer which side of the street we are on:
Zooming into the details, our approach relies on putting the following intuition in a mathematical framework: if the SNR for a satellite is low, then the line-of-sight (LOS) path is probably blocked or shadowed; if the SNR is high, then the LOS path is probably clear. The qualifier “probably” is crucial here: even when the receiver is in a shadowed area, strong reflected signals can still reach it, and even when it is in a clear area, the received signal can be weak (because of destructive interference between LOS and reflected paths, a phenomenon referred to as multipath fading). Also, the 3D map is generally not entirely accurate, and it certainly does not capture random blockages by large moving objects that are not in the map, such as trucks. This adds further uncertainty to the process.
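As a minimal sketch of what “probably” can mean numerically, the snippet below applies Bayes' rule to a single SNR reading. The Gaussian shape and all parameter values are assumptions chosen for illustration; they are not the SNR models we actually fit, and the function names are hypothetical.

```python
import math

# Hypothetical SNR models in dB-Hz: (mean, standard deviation).
# Illustrative values only, not fitted to real measurements.
LOS_MODEL = (40.0, 5.0)       # line-of-sight path is clear
SHADOWED_MODEL = (25.0, 8.0)  # line-of-sight path is blocked


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def prob_line_of_sight(snr_db, prior_los=0.5):
    """Posterior probability that the LOS path is clear, given one SNR reading."""
    p_los = gaussian_pdf(snr_db, *LOS_MODEL) * prior_los
    p_shadowed = gaussian_pdf(snr_db, *SHADOWED_MODEL) * (1.0 - prior_los)
    return p_los / (p_los + p_shadowed)
```

Under these assumed models, a strong reading such as 45 dB-Hz makes a clear path very likely and a weak one such as 15 dB-Hz makes it very unlikely, but neither probability is ever exactly 0 or 1, which is the point of the qualifier.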
Probabilistic shadow matching using ray tracing
While the intuition on satellite signal strengths carrying useful location information is sound, it must be fleshed out within a probabilistic framework. For any possible location for the receiver, we can check whether the ray joining the location to the satellite is blocked using our 3D map. Now, using a model for the probability distribution of the SNR under LOS and shadowed conditions, we determine the likelihood of the SNR measured for that satellite. For example, if the location is shadowed, then the likelihood of a high SNR is low. The overall likelihood of a given location, based on the satellite SNRs, is the product of the likelihoods corresponding to the different satellites. By doing this over a grid of possible locations, we obtain a likelihood surface—or heat map—of possible receiver locations, based on satellite signal strengths alone. We call this procedure probabilistic shadow matching.
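The procedure above can be sketched in a few dozen lines. Everything specific here is an assumption made for illustration: the toy map of two box-shaped buildings lining a north-south street, the Gaussian SNR models and their parameters, the fixed-step ray march, and all function names. A production ray tracer and map format would look quite different.

```python
import math

# Hypothetical 3D map: axis-aligned boxes (xmin, xmax, ymin, ymax, height) in metres.
BUILDINGS = [
    (-30.0, -10.0, -100.0, 100.0, 40.0),  # west side of a north-south street
    (10.0, 30.0, -100.0, 100.0, 40.0),    # east side
]


def is_blocked(x, y, azimuth_deg, elevation_deg, step=1.0, max_range=200.0):
    """March along the ray toward the satellite; True if it passes through a building."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    dx, dy = math.sin(az), math.cos(az)  # azimuth is clockwise from north (+y)
    d = step  # horizontal distance travelled along the ray
    while d <= max_range:
        px, py, pz = x + d * dx, y + d * dy, d * math.tan(el)
        for xmin, xmax, ymin, ymax, height in BUILDINGS:
            if xmin <= px <= xmax and ymin <= py <= ymax and pz < height:
                return True
        d += step
    return False


def snr_likelihood(snr_db, blocked):
    """Assumed Gaussian SNR model (dB-Hz); parameters are illustrative only."""
    mean, std = (25.0, 8.0) if blocked else (40.0, 5.0)
    z = (snr_db - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def shadow_matching_heatmap(measurements, xs, ys):
    """Likelihood of each grid point given (azimuth, elevation, snr) per satellite."""
    heatmap = {}
    for x in xs:
        for y in ys:
            log_l = 0.0  # sum of logs equals the log of the product over satellites
            for az, el, snr in measurements:
                log_l += math.log(snr_likelihood(snr, is_blocked(x, y, az, el)))
            heatmap[(x, y)] = math.exp(log_l)
    return heatmap


# A strong satellite to the east and a weak one to the west, both at 70 degrees
# elevation, favour the west side of the street (x = -7) over the east (x = 7).
heatmap = shadow_matching_heatmap(
    [(90.0, 70.0, 42.0), (270.0, 70.0, 22.0)], xs=[-7.0, 7.0], ys=[0.0]
)
```

Summing log-likelihoods rather than multiplying raw likelihoods avoids numerical underflow when many satellites are in view; the result is the same product described above.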
The likelihood surface, or heat map, from probabilistic shadow matching summarizes the information from satellite SNR measurements. However, as we see from Figure 6 below, this heat map can be pretty complicated. It can have many distinct, widely separated hotspots (local maxima), often corresponding to a given side of the street, but sometimes still in the wrong location (i.e., phantoms). In order to narrow down our location estimate and to avoid locking onto the phantoms, we must now fuse this heat map with additional information.
Information fusion via particle filter
For Android phones, the information we use in addition to satellite signal strengths is usually just the standard GNSS position fix, but can also be Android Fused locations, which may include WiFi-based positioning. Since this location can be very inaccurate, single time instant (one-shot) fusion of GNSS fix with shadow matching likelihoods typically leads to poor performance. In order to take advantage of the information from satellite signal strengths, we trust GPS less in built-up areas (the gray GPS uncertainty ellipse in Figure 6 is a typical model that we use, while the black uncertainty ellipse for improved GPS is an output of our algorithm). We therefore use past measurements and constrain the location evolution over time using a motion model adapted to the application (e.g., pedestrian vs. vehicular motion). This is accomplished by using a particle filter, which approximates the probability distribution of the receiver’s location at any given time by a set of weighted particles. In other words, we estimate where the phone is using thousands of hypothesized locations (i.e., particles).
Over time, the probability weights and particle locations evolve based on the measurements and the motion model. Since the heat map from probabilistic shadow matching has so many local maxima and because the GNSS fix can have such large outliers, we cannot use standard techniques such as the Kalman filter or the extended Kalman filter, which rely on the tracked probability distribution being well approximated by a bell-shaped Gaussian distribution. The particle filter allows us to approximate arbitrary distributions, at the expense of higher complexity, and this is where our server infrastructure comes in.
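To illustrate the mechanics, here is a minimal sketch of a bootstrap particle filter in two dimensions. It is a textbook variant, not our production filter: the random-walk motion model, the multinomial resampling, the two-bump likelihood standing in for a shadow matching heat map, and every name and parameter are assumptions made for the example.

```python
import math
import random


def predict(particles, step_std, rng):
    """Motion model: a random walk, standing in for pedestrian or vehicle models."""
    return [(x + rng.gauss(0.0, step_std), y + rng.gauss(0.0, step_std))
            for x, y in particles]


def update(particles, weights, likelihood):
    """Reweight each particle by the measurement likelihood, then normalize."""
    new_weights = [w * likelihood(p) for p, w in zip(particles, weights)]
    total = sum(new_weights)
    if total == 0.0:  # every particle was implausible; fall back to uniform
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in new_weights]


def resample(particles, weights, rng):
    """Multinomial resampling: draw particles in proportion to their weights."""
    n = len(particles)
    return rng.choices(particles, weights=weights, k=n), [1.0 / n] * n


def estimate(particles, weights):
    """Weighted mean position."""
    x = sum(w * p[0] for p, w in zip(particles, weights))
    y = sum(w * p[1] for p, w in zip(particles, weights))
    return x, y


def bimodal_likelihood(p):
    """Toy heat map: a true hotspot at (-7, 0) and a weaker phantom at (7, 0)."""
    def bump(cx, cy, sigma):
        return math.exp(-((p[0] - cx) ** 2 + (p[1] - cy) ** 2) / (2 * sigma ** 2))
    return bump(-7.0, 0.0, 3.0) + 0.3 * bump(7.0, 0.0, 3.0)


def demo(seed=0, n=2000, steps=10):
    """Run the filter for a stationary receiver and return the position estimate."""
    rng = random.Random(seed)
    particles = [(rng.uniform(-20, 20), rng.uniform(-20, 20)) for _ in range(n)]
    weights = [1.0 / n] * n
    for _ in range(steps):
        particles = predict(particles, 1.0, rng)
        weights = update(particles, weights, bimodal_likelihood)
        particles, weights = resample(particles, weights, rng)
    return estimate(particles, weights)
```

Because each particle is just a hypothesized location, the filter never has to summarize the two-hotspot likelihood with a single Gaussian. Repeated measurements gradually starve the phantom of particles and the estimate settles near the stronger hotspot.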
From signal processing to software at scale
The combination of particle filtering and ray tracing introduces complexity to the back-end server ecosystem, making for a very stateful service.
There are two kinds of state at play: per-user particle filter state and per-region 3D maps used for ray tracing. The use of particle filters necessitates a level of server affinity. Each new request to our service must be routed to the same back-end server for processing in order to update the correct particle filter. Additionally, due to the large size of 3D maps, each back-end server can only hold a few small sections of the 3D world in RAM.
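A routing decision that respects both kinds of state might look like the sketch below. This is a hypothetical illustration rather than our routing layer: the tile size, the class and function names, the in-memory session table, and the least-loaded tie-break are all assumptions, and the sketch does not migrate particle filter state when a user has to change servers.

```python
import math

TILE_SIZE_M = 1000.0  # hypothetical edge length of one 3D map tile


def tile_for(x_m, y_m):
    """Map a coarse position in local metres to a map-tile key."""
    return (math.floor(x_m / TILE_SIZE_M), math.floor(y_m / TILE_SIZE_M))


class MapAwareStickyRouter:
    """Keeps each user on one server as long as that server holds the needed tile."""

    def __init__(self, server_tiles):
        self.server_tiles = server_tiles  # {server name: set of tile keys in RAM}
        self.sessions = {}                # {user id: server name}

    def _load(self, server):
        """Number of users currently pinned to a server."""
        return sum(1 for s in self.sessions.values() if s == server)

    def route(self, user_id, x_m, y_m):
        tile = tile_for(x_m, y_m)
        current = self.sessions.get(user_id)
        if current is not None and tile in self.server_tiles[current]:
            return current  # affinity preserved: the particle filter lives here
        candidates = sorted(
            name for name, tiles in self.server_tiles.items() if tile in tiles
        )
        if not candidates:
            raise LookupError(f"no server holds tile {tile}")
        # Pick the least-loaded capable server; ties go to the first name.
        chosen = min(candidates, key=self._load)
        self.sessions[user_id] = chosen
        return chosen


router = MapAwareStickyRouter({"a": {(0, 0)}, "b": {(0, 0), (1, 0)}})
first = router.route("rider-1", 100.0, 100.0)   # either server holds tile (0, 0)
again = router.route("rider-1", 200.0, 200.0)   # same tile, so same server
moved = router.route("rider-1", 1500.0, 100.0)  # tile (1, 0) is only on "b"
```

The key design point is that stickiness is conditional: a session stays put only while its server still holds the map tile the user is in.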
Since each server can only hold a few square kilometers of map data, not all servers are capable of serving all users. Essentially, implementing the back-end systems for our solution necessitated the creation of a sticky session routing layer that takes server 3D map state into account.
In addition to internal tests and performance evaluations, we also run spot checks on our own Android devices using an internal version of the Uber rider app, as illustrated in Figure 9, below:
Accurate estimation of rider and driver location is a crucial requirement for fulfilling Uber’s mission of providing transportation as reliable as running water, everywhere, for everyone. To meet our mission, the Sensing, Inference, and Research team is working on a variety of approaches for improving location with creative use of sensors and computation on mobile devices, coupled with the computational power of our server infrastructure. The combination of advanced signal processing, machine learning algorithms, and software at scale has huge potential, and we are always looking for talented and highly motivated individuals (software and algorithms engineers, data visualization engineers, and machine learning engineers) to join us to help realize this potential.
Danny Iland, Andrew Irish, Upamanyu Madhow, & Brian Sandler are members of Uber’s Sensing, Inference and Research team. Danny, Andrew, and Upamanyu were part of the original group that led this research at the University of California, Santa Barbara. After spinning this work into a startup, they demonstrated server-based particle filtering for location improvement in San Francisco using a 3D map constructed from publicly available aerial LiDAR data. They joined Uber in July 2016.