Unstructured Data Transfer in Rest.li

A few years ago, we announced Rest.li 2.x and a Protocol Upgrade Story. Today, we are excited to share another major milestone: the release of Unstructured Data Reactive! The source code is available on GitHub and documented in the user guide. Give it a try and let us know what you think.  

Why Rest.li?

Rest.li is an open source REST framework for building robust, scalable RESTful architectures using type-safe bindings and asynchronous, non-blocking I/O. Rest.li fills a niche for applying RESTful principles at scale with an end-to-end developer workflow for building REST APIs, which promotes clean REST practices, uniform interface design, and consistent data modeling. Rest.li was developed at LinkedIn over a period of six years, and it plays a critical role in LinkedIn’s data-model-centric architecture, which ensures a consistent stateless Restful API model.

Why unstructured data?

The Rest.li framework enables applications to model their data and RESTful interfaces as Rest.li Resources. However, an inherent requirement is that the data must be representable in a JSON-like structure that consists of maps/lists with primitive data types. Unstructured data, such as PDFs or images, cannot be Rest.li Resources. This has become increasingly limiting as we are witnessing more and more unstructured data use cases in Rest.li applications. Supporting unstructured data as native Rest.li resources aims to provide a better and unified solution going forward. It also unlocks the possibility of framework-level performance optimization in the near future.  

Reactive streaming: Asynchronous non-blocking back pressure streaming  

Non-blocking vs. blocking
A blocking execution may potentially wait before it can proceed. For example, when reading from a file system, the execution may wait for slower I/O. When getting data from a downstream service, the execution needs to wait for the response before it can return the data. When adding data to a blocking bounded queue, the execution may block if the queue has reached its capacity. Reading data from an InputStream or writing data to an OutputStream is blocking.

A non-blocking execution doesn’t need to wait before it completes. There can be long-running execution that is non-blocking.

Reactive streaming should be non-blocking. That is, the consumer of the data should not block because the producer can’t produce data fast enough, and the producer of the data should not block because the consumer can’t consume the data fast enough. Additionally, the mechanism to provide back pressure (see below) should also be non-blocking.

InputStream and OutputStream are considered blocking I/O.

Future.get() is blocking.

Methods returning CompletableFuture, Future, and Promise are expected to have a non-blocking implementation.

Synchronous vs. asynchronous
Synchronous function invocation provides data or otherwise completes the requested operation on function return. Synchronous function invocation may or may not be blocking.  

An asynchronous function invocation doesn’t return the data or the signal of completion. It typically provides the data or the signal of completion through a callback, but it can also do it with a promise or other mechanism. An asynchronous function invocation is intended to be non-blocking.

Reactive streaming should be asynchronous to avoid being blocked.

Passing in a callback to receive an execution result is asynchronous.

Returning a CompletableFuture or a promise is asynchronous.

Returning a Future is also asynchronous but getting the execution result is synchronous.

Back pressure
When a consumer can’t consume data fast enough, to avoid caching data indefinitely it needs a mechanism to notify the producer to stop producing data until it can consume more data. This mechanism is called back pressure.

Inside server code, the back pressure is provided by EntityStream API (see below) which is pull-based, so that ensures the ability to apply back pressure inside a JVM (on the contrary, without this back pressure, for example, Netty would willingly buffer all data chunks you write to it, until OutOfMemoryError is thrown). Across the network, the back pressure is achieved via TCP flow control.  

The following figure illustrates how back pressure works in a scenario where a client is uploading a file to the server and the server asynchronously dumps the file into the disk. There are two EntityStreams (W, O, R, representing Writer, Observer, Reader, respectively) in the figure; one is on the client side, where the user code is the Writer that is trying to write the data chunks of the request body, and the other is on the server side, where the user code is the Reader that is trying to read data chunks of the request body.  

An example of back pressure

Source link