Deploying TLS 1.3 at scale with Fizz, a performant open source TLS library – Facebook Code


The new generation of Transport Layer Security (TLS 1.3) incorporates several new features that make internet traffic more secure, including encrypting handshake messages to keep certificates private, redesigning the way secret keys are derived, and adding a zero round-trip connection setup, which makes certain requests faster than with TLS 1.2. Every day, more than a billion people use Facebook to connect with their friends and family, and TLS 1.3 secures their data in transit from apps to our servers. To implement TLS 1.3 here at Facebook, we built Fizz, a robust, highly performant TLS library written in C++14. In addition to the protocol enhancements that come with TLS 1.3, Fizz offers a number of implementation features, including support for asynchronous I/O by default and scatter/gather I/O to eliminate the need for extra copies of data.

We have deployed Fizz and TLS 1.3 globally in our mobile apps, Proxygen, our load balancers, our internal services, and even our QUIC library, mvfst. More than 50 percent of our internet traffic is now secured with TLS 1.3. We also deployed zero round-trip resumption (0-RTT) data, the newest feature of TLS 1.3, with our mobile apps. Fizz now handles millions of TLS 1.3 handshakes every second. We believe this makes it the largest deployment of TLS 1.3 — and early (0-RTT) data — on the internet. Fizz has reduced not only the latency but also the CPU utilization of services that perform trillions of requests a day. We are excited to be open-sourcing Fizz to help speed up deployment of TLS 1.3 across the internet and help others make their apps and services faster and more secure.

Performance

Our team worked with the Internet Engineering Task Force (IETF) for several years to standardize TLS 1.3. Previously, in order to improve the security and performance of TLS, we deployed Zero protocol, a custom protocol that allowed us to experiment with establishing 0-RTT secure connections. Using 0-RTT data reduces the latency of requests using TLS and the latency overhead needed to deploy TLS. Fizz delivers the reliability and performance of TLS 1.3 on par with Zero protocol, and therefore we have replaced our deployment of Zero protocol with TLS 1.3.

Efficiency has also been a focus with the development of Fizz. With zero copy encryption and decryption, tight integration with other parts of our infrastructure, and other optimizations, we see reduced usage of memory and CPU with Fizz. Our load balancer synthetic benchmarks show approximately 10 percent higher throughput than our previous stack, and we’re continuing to improve performance.

As with Zero protocol, TLS 1.3 early data significantly reduces latency when establishing secure connections, compared with TLS 1.2. This improves user experience, particularly on app startup when there are no existing connections to reuse.

[Figure: Latency improvement in percentage (TLS 1.3 with early data vs. TLS 1.2)]

Async by default

In modern deployments of TLS, servers are located all over the world. It’s common for a TLS load balancer to be in one location while the TLS certificate signing keys are provided by another secure service located all the way across the world. Techniques like Keyless SSL are used by several deployments to offload cryptographic computations to different hosts.

Since servers usually want to be able to make calls to services in other locations in the middle of a handshake, asynchronous I/O becomes very important. As we worked on Fizz here at Facebook, we wanted to offload several functions to remote services: certificate operations as well as ticket decryption operations. As a result, Fizz servers are async by default. We use futures to provide a simple async API, and any callback from Fizz can return an asynchronous response without blocking the service from processing other handshakes. It is also very easy to add new asynchronous callbacks to Fizz for other use cases.
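
To make the idea of a future-returning callback concrete, here is a minimal sketch. It is not Fizz's API: `remoteSign` and `signAsync` are hypothetical names, the "remote" call is simulated locally, and the standard library's `std::future` stands in for whatever future type a real deployment would use.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-in for a remote signing service. A real deployment
// would make a network call here instead of computing the result locally.
std::string remoteSign(const std::string& transcript) {
    return "signed(" + transcript + ")";
}

// Hypothetical handshake callback. It returns a future immediately, so the
// caller is free to process other handshakes while the signature is produced
// on another thread.
std::future<std::string> signAsync(std::string transcript) {
    return std::async(std::launch::async, remoteSign, std::move(transcript));
}
```

A caller can start several such operations and call `get()` on each future only at the point where the handshake actually needs the result.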

Zero copy writes

TLS APIs in several libraries require a user to supply a contiguous chunk of memory. TLS libraries encrypt this data and write it to a socket. However, applications usually hold data in memory in the form of several chunks of data in different memory locations rather than in one contiguous chunk of memory. In other libraries, applications would need to copy data into a contiguous memory location in order to supply it to the TLS library. This copy operation adds latency overhead. Fizz has first-class support for scatter/gather I/O APIs, as all its APIs accept a scatter/gather abstraction as an input by default. This allows users to pass in chunked data, and Fizz then encrypts data in place into the chunked memory, avoiding the need to copy data. Thus, applications using Fizz perform fewer memory allocations and copies — an important consideration for high-performance apps.
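
The difference between a contiguous-buffer API and a scatter/gather API can be sketched as follows. The `Chunk` type and `transformInPlace` function are hypothetical and are not Fizz's abstraction, and the XOR "cipher" is a toy that only stands in for a real record encryption step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scatter/gather view: a non-owning pointer to one chunk of
// application data that lives wherever the application allocated it.
struct Chunk {
    std::uint8_t* data;
    std::size_t len;
};

// Toy transform: XOR each byte with a running keystream byte. This is NOT
// real encryption. It only shows a transform applied in place across chunk
// boundaries, without first copying the chunks into one contiguous buffer.
void transformInPlace(std::vector<Chunk>& chunks, std::uint8_t key) {
    std::uint8_t stream = key;
    for (Chunk& c : chunks) {
        for (std::size_t i = 0; i < c.len; ++i) {
            c.data[i] ^= stream;
            ++stream;  // keystream state carries over between chunks
        }
    }
}
```

Because the function walks the chunks where they already are, the only memory it touches is the application's own buffers; no intermediate allocation or copy is needed.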

Early data

Fizz supports easy-to-use APIs that enable it to send early data immediately after the TCP connection is established. Early data reduces the latency of requests, especially during mobile app cold start, which, as we have previously shown, is important.

Using early data in TLS 1.3 has several caveats, however. An attacker can easily replay the data, causing it to be processed twice by the server. To mitigate this risk, we send only specific whitelisted requests as early data, and we’ve deployed a replay cache alongside our load balancers to detect and reject replayed data. Fizz provides simple APIs for determining when a transport has become replay safe, so that applications know when they can send data that is not safe to replay.
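
The core of a replay cache can be sketched in a few lines. This is an illustration only, not the cache deployed alongside the load balancers: the `ReplayCache` class and the choice of a string identifier per early-data flight are assumptions, and a production cache would also need entry expiry and state shared across servers.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical replay cache: remembers an identifier for each early-data
// flight it has accepted and rejects any identifier it has seen before.
class ReplayCache {
public:
    // Returns true if the identifier is new (accept the early data),
    // false if it was already recorded (treat it as a replay).
    bool checkAndInsert(const std::string& id) {
        return seen_.insert(id).second;
    }

private:
    std::unordered_set<std::string> seen_;
};
```

The check and the insertion happen in one step so that two copies of the same flight cannot both be accepted.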

Additionally, if the server has forgotten the key for early data, it will reject the early data. This can pose a challenge, because the client then has to retry the request. Fizz provides two kinds of APIs to handle rejection of early data: one resends the data transparently, and the other allows the app to change the data it sends during the retry.
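
The two ways of handling rejection can be illustrated with a small sketch. The function name and signature below are hypothetical and do not mirror Fizz's API; they only show the choice between resending unchanged bytes and letting the application rebuild the request.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical policy for what to send after the server rejects early data.
// If `rebuild` is empty, the original bytes are resent unchanged (the
// transparent case). Otherwise the application decides what to send instead.
std::string dataToSendAfterRejection(
        const std::string& rejectedEarlyData,
        const std::function<std::string(const std::string&)>& rebuild) {
    if (!rebuild) {
        return rejectedEarlyData;
    }
    return rebuild(rejectedEarlyData);
}
```

The second form matters when the request content depends on whether it was sent as early data, for example when the app wants to mark the retry.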

Secure from the ground up

Fizz is built with security in mind from the ground up, with secure abstractions. The TLS state machine is complex and poses challenges for the entire security community. Several past vulnerabilities have been caused by state machine issues in TLS implementations. For example, in the CCS vulnerability, injecting a ChangeCipherSpec message in the wrong state forced OpenSSL to use predictable keys. The issue shows that even when a particular protocol is secure, a bad state machine can introduce a serious vulnerability.

To manage the complexity of the TLS state machine, we made the state machine in Fizz explicit. This means transitions are defined in one place based on the messages that are received. Having all states defined explicitly in a single location makes it easier to address security issues.

In TLS, receiving a message can cause a transition to different states, depending not only on the type of message but also the data in the message. For example, when a server receives a ClientHello, in most cases it will transition to waiting for a Finished message from the client. However, if the ClientHello is missing parameters needed by the server, like key exchange algorithms, the server can send a HelloRetryRequest to the client. The next message the server would expect, then, is another ClientHello. As a result of this complexity, most implementations perform state transitions inside state handlers, as shown in the sample code below.

(ExpectingClientHello, ClientHello) → Handler {
    if (isAcceptable(ClientHello)) {
        transit<ExpectingFinished>();
    } else {
        transit<ExpectingClientHello>();
    }
}

It is easy for this handler code to become inconsistent with the explicit state machine, however. If the handler were changed and not kept consistent with the state machine, the result could be either a security vulnerability or the connection closing due to an error.

To avoid these issues in Fizz, we implemented an abstraction that prevents us from using incorrect state transitions. If a state handler uses an incorrect state transition that is not defined in the explicit state machine, the code will fail to compile. This helps us catch bugs during compile time rather than at runtime, thereby preventing mistakes.

For example, based on the previous state machine, if the following code were accidentally introduced, it would cause a compilation error:

(ExpectingClientHello, ClientHello) → Handler {
    if (isAcceptable(ClientHello)) {
        transit<ExpectingFinished>();
    }
    transit<WaitingForCertificateVerify>(); // <---- compile error
}

We leverage the C++ type system, using variadic templates and variant types. This provides a high level of safety by preventing accidental bugs, which helps us move fast while implementing new features in TLS.
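
One way to get this kind of compile-time check is sketched below. It is a simplified illustration rather than Fizz's actual implementation: it uses a template specialization table and `static_assert` instead of variadic templates and variant types, and the names `IsAllowed`, `Transitions`, `Next`, and `handleClientHello` are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical states, named after the ones used in the text.
struct ExpectingClientHello {};
struct ExpectingFinished {};
struct WaitingForCertificateVerify {};

// The explicit state machine: a transition is allowed only if listed here.
template <typename From, typename To>
struct IsAllowed : std::false_type {};
template <>
struct IsAllowed<ExpectingClientHello, ExpectingFinished> : std::true_type {};
template <>
struct IsAllowed<ExpectingClientHello, ExpectingClientHello> : std::true_type {};

// Handlers go through Transitions<From>. In this sketch transit<To>() only
// performs the compile-time check against the table above.
template <typename From>
struct Transitions {
    template <typename To>
    static void transit() {
        static_assert(IsAllowed<From, To>::value,
                      "transition not defined in the explicit state machine");
    }
};

enum class Next { Finished, ClientHello };

// Handler for (ExpectingClientHello, ClientHello).
Next handleClientHello(bool acceptable) {
    using T = Transitions<ExpectingClientHello>;
    if (acceptable) {
        T::transit<ExpectingFinished>();
        return Next::Finished;
    }
    T::transit<ExpectingClientHello>();  // HelloRetryRequest path
    // T::transit<WaitingForCertificateVerify>();  // would fail to compile
    return Next::ClientHello;
}
```

Uncommenting the last `transit` call triggers the `static_assert`, because no specialization lists that pair. The mistake is reported by the compiler instead of surfacing as a runtime bug.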

Exported Key Material

Fizz supports APIs that provide exported key material tied to the TLS handshake. This has allowed us to support the Token Binding protocol, which in turn allows TLS to be used to bind application tokens to improve the security and trustworthiness of apps. We’ve also used this to integrate Fizz into our QUIC implementation.

Deployment of TLS

Our work on Fizz helped standardize the TLS 1.3 RFC. When we initially attempted deployment of earlier TLS 1.3 drafts, we saw several issues with middleboxes, causing high handshake failure rates in some regions. These failures were caused by middleboxes with intolerance to changes in the TLS protocol, and resulted in many issues, including dropped handshake messages and reset connections. Along with Firefox and Chrome, we ran experiments with different variations of the protocol, including making the first parts of the TLS 1.3 handshake appear similar to a TLS 1.2 resumption handshake. With this approach, TLS 1.3 became reliably deployable with no fallback to TLS 1.2.

Additionally, when we started deploying TLS early data, we saw new interference issues where middleboxes prevented the TLS handshake from completing. We reduced these by adding several workarounds to Fizz, such as ensuring that the ClientHello is sent in its own TCP packet and that Fizz will fall back to a full 1-RTT handshake if a 0-RTT handshake repeatedly fails.
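
The fallback behavior can be sketched as a small client-side policy. The `EarlyDataPolicy` class and its failure threshold are assumptions made for illustration; they do not describe how Fizz decides when to fall back.

```cpp
#include <cassert>

// Hypothetical client-side policy: stop attempting 0-RTT once it has failed
// `maxFailures` times in a row. A successful 0-RTT handshake resets the count.
class EarlyDataPolicy {
public:
    explicit EarlyDataPolicy(int maxFailures) : maxFailures_(maxFailures) {}

    // True while 0-RTT should still be attempted; false means use a full
    // 1-RTT handshake instead.
    bool shouldAttemptEarlyData() const { return failures_ < maxFailures_; }

    // Record the outcome of a handshake. Handshakes that did not use early
    // data say nothing about 0-RTT and are ignored.
    void onHandshakeResult(bool usedEarlyData, bool succeeded) {
        if (!usedEarlyData) {
            return;
        }
        failures_ = succeeded ? 0 : failures_ + 1;
    }

private:
    int maxFailures_;
    int failures_ = 0;
};
```

In this sketch the policy never re-enables 0-RTT once the threshold is reached. A real client would likely retry after some time or after a network change.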

Today, more than 50 percent of our internet traffic is secured with TLS 1.3, and that will continue to grow as browsers and apps add support for TLS 1.3. RFC 8446 will be published very soon, making TLS 1.3 an internet standard. Even before the RFC is published, we’re happy to show that TLS 1.3 has been successfully deployed at scale. With Fizz we’ve built a robust implementation of the next generation of TLS, and we’re excited to share it with the community so that it can be used in mobile apps, services, and load balancers. We hope the community will use Fizz and also contribute to its further evolution.


