Freezing Time – Lyft Engineering


– “Time doesn’t exist, clocks exist”

There are many features at Lyft where it’s critical to have an accurate (and trustworthy) time source. This apparently trivial requirement presents a few challenges when the timestamp comes from a device we can’t control:

  • Can we trust the clock? Is the clock synchronized with an accurate time source?
  • Can we ensure that the request to our APIs containing the timestamp was not tampered with?

This post focuses on the first challenge, having an accurate time source, and won’t go into detail about the second. The safest assumption is that any request can be tampered with, so safeguards need to be implemented on the server.

Sane Time

Although most devices are synchronized to some external time source, many still have skewed clocks when time syncing is not enabled (either because it’s not configured correctly or because the user manually set the clock to an arbitrary time). As it turns out, based on more than 5M Lyft sessions on iOS, 1% of our users have skewed clocks.

99% of iOS clocks (sample of 4,568,533 sessions) are synchronized to within a second

But how can two clocks ever be synchronized in the first place?

… or we might even be more inquisitive and ask: how can anyone (or anything) observing a clock communicate the current time with good enough precision?

By taking a peek at the clock you can see the “current” time, but how long does it take you to read it, think about it and put together the phonetic combination of your speech? Is it still the same time by the moment you say it out loud? What about when the recipient finally hears it? Was the wall clock even accurate in the first place?

The keen reader may have noticed at this point that making this (arguably trivial) synchronization both accurate and precise is surprisingly difficult.

Luckily, computers are fast at reading the local clock. This operation is so heavily optimized in the kernel that asking for the time would not even access the actual hardware clock: the kernel synchronizes an independent system clock with the Real Time Clock every once in a while and keeps track of time by counting timer interrupts. But even if reading the local time is not an issue, trusting the clock and sending the information to another machine… well, it’s complicated.
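To get a feel for how cheap a local clock read is, here is a rough micro-benchmark sketch. It assumes Foundation’s `Date()` as the wall-clock read and `ProcessInfo.processInfo.systemUptime` as a monotonic timer for the measurement itself; the function name `averageWallClockReadCost` is hypothetical, and the number it prints will vary with device, OS and build settings.

```swift
import Foundation

/// Hypothetical micro-benchmark: average time, in seconds, of one wall-clock read (`Date()`),
/// timed with the monotonic uptime clock so wall-clock adjustments can't skew the measurement.
func averageWallClockReadCost(iterations: Int = 100_000) -> TimeInterval {
    precondition(iterations > 0, "iterations must be positive")
    var sink: TimeInterval = 0
    let start = ProcessInfo.processInfo.systemUptime
    for _ in 0..<iterations {
        sink += Date().timeIntervalSince1970  // one wall-clock read
    }
    let elapsed = ProcessInfo.processInfo.systemUptime - start
    // Make the accumulated value observable so the optimizer cannot drop the loop.
    if sink == -1 { print(sink) }
    return elapsed / Double(iterations)
}

let cost = averageWallClockReadCost()
print("Average wall-clock read: \(cost * 1_000_000_000) ns")
```

Measuring with the uptime clock rather than `Date()` itself is deliberate: if the user (or a time sync) changed the wall clock mid-loop, a `Date()`-based measurement would be wrong, which is the same problem this post is about.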

Sync Time

As the need for precise time synchronization has increased, several protocols have been developed to control system time. Three of the most common are the Network Time Protocol (NTP), the Inter-Range Instrumentation Group (IRIG) time codes and the Precision Time Protocol (PTP). Given the simplicity of its implementation, its suitability for mobile networks and the fact that it doesn’t need extra hardware, we’ll focus on NTP.

NTP relies on very accurate clocks and a protocol for transmitting the time between devices across shaky networks. It defines a layered hierarchy (strata) of time quality: at the top of the chain (a.k.a. stratum 0) are extremely accurate reference clocks, which can be very expensive atomic (cesium, rubidium) clocks, receivers for time signals like GPS (whose satellites are essentially atomic clocks orbiting the Earth) or other radio clocks. Stratum 1 devices are connected directly to these clocks. Each synchronization between devices increments the stratum number so, for example, devices on stratum 3 are 2 “hops” away from a stratum 1 server attached to a reference clock.
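The stratum travels as a single byte in every NTP packet. RFC 5905 assigns 0 to “unspecified or invalid”, 1 to primary servers, 2 to 15 to secondary servers and 16 to “unsynchronized”. Note that on the wire, 0 does not mean “reference clock”: reference clocks don’t speak NTP themselves. The Swift sketch below decodes that field; the type and function names are hypothetical and not part of Kronos.

```swift
/// Meaning of the 8-bit stratum field carried in NTP packets, following RFC 5905.
enum StratumKind: Equatable {
    case unspecified           // 0: unspecified or invalid
    case primary               // 1: directly attached to a reference clock
    case secondary(hops: Int)  // 2...15: synchronized over NTP, `hops` away from a stratum 1 server
    case unsynchronized        // 16
    case reserved              // 17...255
}

func classify(stratum: UInt8) -> StratumKind {
    switch stratum {
    case 0: return .unspecified
    case 1: return .primary
    case 2...15: return .secondary(hops: Int(stratum) - 1)
    case 16: return .unsynchronized
    default: return .reserved
    }
}

/// Hypothetical helper for the hierarchy described above (0 = reference clock):
/// syncing from an upstream source yields the upstream stratum + 1, capped at 16 (unsynchronized).
func stratum(afterSyncingWith upstream: Int) -> Int {
    return min(upstream + 1, 16)
}
```

With this mapping, a stratum 3 server decodes as `.secondary(hops: 2)`, matching the “2 hops” example above.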

Network Time Protocol Strata

The synchronization between layers of the strata is done over UDP/IP by exchanging several packets. Greatly simplified: when sending out a request, the client records its local time; in the response, the server includes the server time when it received the request and when it sent the response. When the response arrives, the client records a fourth timestamp. With these 4 times the client can calculate:

  • The clock offset (time difference) between the two devices.
  • The round-trip delay between the client and the server: the total time elapsed on the client minus the server’s processing time. The one-way traveling time is estimated as half of it.
  • The maximum offset error (dispersion), which is an estimate of the total amount of error/variance between that server and the correct time.
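The first two calculations are the standard NTP on-wire formulas from RFC 5905: with t1 to t4 as the client-send, server-receive, server-send and client-receive timestamps, delay = (t4 − t1) − (t3 − t2) and offset = ((t2 − t1) + (t3 − t4)) / 2. A minimal Swift sketch, using hypothetical type names and plain seconds instead of NTP’s 64-bit timestamp format:

```swift
import Foundation

/// The four timestamps of one NTP exchange, in seconds.
struct NTPExchange {
    let t1: TimeInterval  // client sends request (client clock)
    let t2: TimeInterval  // server receives request (server clock)
    let t3: TimeInterval  // server sends response (server clock)
    let t4: TimeInterval  // client receives response (client clock)

    /// Round-trip delay: total time elapsed on the client minus the server's processing time.
    var delay: TimeInterval { (t4 - t1) - (t3 - t2) }

    /// Estimated offset of the server clock relative to the client clock,
    /// assuming the request and the response spend equal time on the network.
    var offset: TimeInterval { ((t2 - t1) + (t3 - t4)) / 2 }

    /// Client reading corrected by the estimated offset.
    func correctedTime(clientTime: TimeInterval) -> TimeInterval {
        clientTime + offset
    }
}

// Client clock 5 s behind the server, 0.1 s each way, 20 ms of server processing.
let exchange = NTPExchange(t1: 100.0, t2: 105.1, t3: 105.12, t4: 100.22)
print(exchange.delay, exchange.offset)
```

In this example the exchange yields a delay of 0.2 s and an offset of 5 s (up to floating-point rounding). If the two legs of the trip are not symmetric, the offset estimate is off by half the asymmetry, which is why the delay also bounds the error.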

Great, now we understand how NTP works, but how can you use it from your iOS app?

Introducing Kronos

We’re open sourcing Kronos, a Swift library that calculates a sane time from a pool of NTP servers. Kronos’ interface is extremely simple, with just two public interfaces (Clock.now and Clock.sync), and its design is optimized for:

  • Supporting a monotonic clock.
  • Getting a sane time as fast as possible.
  • Continuously improving the time accuracy by sending more NTP packets to many servers in a pool.
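How Kronos combines the responses isn’t detailed here, but one simple rule illustrates why more packets help: since the offset error of a single exchange is bounded by half its round-trip delay (ignoring clock drift), keep the lowest-delay sample seen so far. This is similar in spirit to the RFC 5905 clock filter, which prefers low-delay samples. The types below are hypothetical, not Kronos’ API.

```swift
import Foundation

/// One offset estimate from a single NTP response (hypothetical type for this sketch).
struct OffsetSample {
    let offset: TimeInterval  // estimated server-minus-client offset, seconds
    let delay: TimeInterval   // round-trip delay of the exchange, seconds
}

/// Keeps the lowest-delay sample seen so far: its offset has the tightest error bound.
struct BestSampleTracker {
    private(set) var best: OffsetSample?

    /// Returns true if the new sample replaced the current best.
    @discardableResult
    mutating func add(_ sample: OffsetSample) -> Bool {
        guard sample.delay >= 0 else { return false }  // discard nonsensical samples
        if let current = best, current.delay <= sample.delay { return false }
        best = sample
        return true
    }

    /// Maximum error of the current estimate (half the round-trip delay), or nil before any response.
    var errorBound: TimeInterval? { best.map { $0.delay / 2 } }
}
```

The first response gives a usable estimate immediately; later responses can only keep or tighten the bound.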

Unlike the system clock or a wall clock, the time reported by this monotonic clock is not based on the device’s clock, and is therefore unaffected when the local time is changed while your app is running. Instead, the first time the clock is synchronized, Kronos stores the accurate time along with a delta since some arbitrary instant (e.g. uptime). After that, accessing:

Clock.now

will return the local time based on the last known accurate time plus the time elapsed since the last sync.
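The anchor-plus-delta idea can be sketched in a few lines. This is a hypothetical illustration of the description above, not Kronos’ actual code: it assumes `ProcessInfo.processInfo.systemUptime` as the “arbitrary instant” and makes the uptime source injectable so it can be tested.

```swift
import Foundation

/// Hypothetical monotonic clock following the description above (not Kronos' actual code).
struct MonotonicClockSketch {
    /// Accurate date from the last sync, paired with the uptime at which it was recorded.
    private var anchor: (date: Date, uptime: TimeInterval)?
    /// Source of "time since some arbitrary instant"; injectable for testing.
    private let uptime: () -> TimeInterval

    init(uptime: @escaping () -> TimeInterval = { ProcessInfo.processInfo.systemUptime }) {
        self.uptime = uptime
    }

    /// Record an accurate (e.g. NTP-derived) date together with the current uptime.
    mutating func sync(accurateDate: Date) {
        anchor = (date: accurateDate, uptime: uptime())
    }

    /// Last known accurate time + uptime elapsed since that sync; nil until the first sync.
    /// Changing the device's wall clock does not affect this value.
    var now: Date? {
        guard let anchor = anchor else { return nil }
        return anchor.date.addingTimeInterval(uptime() - anchor.uptime)
    }
}
```

One caveat of this sketch: on Apple platforms `systemUptime` may not advance while the device is asleep, so a production implementation may need a different uptime source.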

Clock.now will get more and more accurate as the synchronization receives more responses from NTP servers, but you can use it right away after the first response. You can also access the time as soon as it’s available:

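    import Kronos

    Clock.sync(
        first: { date, offset in
            // First (least accurate) time, available after the first response
            print("Least accurate time: \(date)")
        },
        completion: { date, offset in
            // Called when synchronization completes (most accurate time)
            print("Most accurate time: \(date)")
        }
    )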

Kronos supports IPv4, IPv6 and up to NTP v4. If you need an accurate NTP clock, try Kronos out. We believe it will show you a good time!

Get started with Kronos on GitHub

Interested in open source work and having a big impact? Lyft is hiring! Hit me up on Twitter or at martin@lyft.com.
