An important feature of EFS is that an EFS instance never starts cold. We keep a copy of the keys stored in the in-memory cache in an instance of RocksDB which we call “CachedKeysStore.” Each time a record is added, deleted, or evicted from the cache, the addition or removal is reflected to the CachedKeysStore. When an instance of EFS is initialized (for example, after the application is restarted), code in EFS automatically warms up the cache. This is done by iterating over all the keys present in CachedKeysStore and for each key fetching the data for it from the corresponding RocksDB instance and then loading it into the cache. Creating each instance of EFS already warmed up before a FollowFeed searcher node starts serving live traffic ensures that the first few requests don’t pay a latency penalty because of cold caches.
We also use a Caffeine loading cache inside EFS. If the cache receives a request for a key it doesn’t have, it loads it from RocksDB. Thus, EFS is designed as a write-through cache. When a new record is added to the cache, we write the relevant data to the bloom filter, the Caffeine cache, and to the RocksDB instances backing the cache (the data store and the cached keys store).
EFS is typically used to look up sparse data in FollowFeed. So for a large number of records, we won’t have any values. However, Caffeine cache does not cache negative lookups, i.e., if we find no data in RocksDB for a particular key, the empty response is not cached. This means that if we don’t have a feature value for a particular key, the loading cache will cause disk IO every time, which is bad for performance. To avoid this, we store a special sentinel value for negative lookups. The notion of sentinel values is internal to EFS and is never leaked to callers.
Since some feature data is sparse, the vast majority of the calls to EFS end up being negative look-ups (for content quality, over 99.99999% of calls). With the sentinel approach described above, we have an entry for each miss! This is very memory inefficient; we have to do better.
Bloom filters for negative lookup performance
So, to optimize memory utilization while supporting this access pattern, we use a bloom filter in front of the cache. The bloom filter is built during EFS initialization from the data in RocksDB. When a lookup is made to an EFS instance, we first look up the key in the bloom filter. The bloom filter indicates whether the cache may contain the key, with a pre-configured false positive rate, which we typically set to 1%. Only if the bloom filter indicates that they key may be present in the data do we make a lookup into the cache (and thus possibly cause a lookup from disk). What this means is that for sparse data sets, the vast majority of requests to the EFS are satisfied by the bloom filter. The size of a bloom filter instance used to store N keys is in the order of log(N). The size of a cache instance used to store N key-value pairs would be in the order of N. This translates to several gigabytes of memory space saved for large key sets.
Because we would end up making hundreds of thousands of calls to the bloom filter for each request, and we have many requests being processed in parallel, we will potentially be making millions of bloom filter lookups per second with this system. Since our application also has to process writes to EFS coming in from the indexing system, we needed a thread-safe bloom filter.
We started by using the bloom filter implementation from the Guava library. Since this implementation is not thread-safe, we initially guarded access to the bloom filter using a read-write lock. As we expected, this proved to be a bottleneck. Thus, we needed a lock-free bloom filter. We couldn’t find an open-source one for Java that we liked, so we changed the Guava implementation to be lock-free and submitted a pull request to the project (which was later merged) so everyone could benefit from this work.
While we initially built EFS to support normalized storage of content quality labels, we now have about seven different instances of EFS storing various types of feature data, including PathsToRoots for each Activity. We also have several new features being onboarded to FollowFeed, as this infrastructure enables much faster iteration on feature development.
Making decoration faster
Once the storage issues were resolved and the first use-case of content quality was fully powered by the Activity Graph, we started noticing ways in which we could further improve the member experience at LinkedIn by using the graph in new places. One of the first ideas that came to mind was improving decoration speed.
What is decoration?
LinkedIn follows a service-oriented architecture. Each service acts as a source of truth for data in its domain; for example, member data is owned by a service called Identity. Unique URNs are used to address each data record, e.g., member data records have URNs like urn:li:member:<ID>, where ID is a unique identifier for a record representing member data within the member domain.
LinkedIn’s data is also usually normalized. So if record A needs to reference data contained in another record B, the URN of record B is used inside record A rather than the data of record B being copied inside record A. The normalized data representation has a number of advantages, such as eliminating data duplication and maintaining referential integrity. When we are rendering a URN, all these references need to be resolved to their actual data. This process is called decoration.
Let’s take the example of decorating an Activity of a member liking an article. This Activity is represented by a unique URN and the Activity record might look like this: