For the purposes of comment relevance, we needed a serving subsystem that could satisfy the following requirements:
1. An index that can quickly retrieve all comments on a comment thread.
2. Fast access to the list of joined features for each comment on the thread.
3. The ability to service thousands of QPS at scale and produce relevant comments for each request on the feed.
While (1) and (2) are easy tasks, it’s (3) that drives the design. Being able to service thousands of QPS at scale narrowed the scope down to only two subsystems: Galene (our search stack, document sharded) or FollowFeed (our feed stack, term sharded). Both are robust systems with strong monitoring, deployability, and ops characteristics. Both systems do their respective jobs pretty well: Galene powers LinkedIn’s search traffic and several site features (e.g., job recommendations, people search), while FollowFeed powers all the user-generated content in the feed. After deliberation and some benchmarking, we went with FollowFeed because it was already well-integrated with the feed ecosystem. This led to some interesting design choices, however.
For starters, FollowFeed is a term-sharded system (each leaf node stores documents associated with a dominant term). In FollowFeed’s case, terms are built around the concept of an actor (e.g., a member, a company, etc.) and a list of social actions performed by that actor (comments, likes, shares, etc.). To make FollowFeed return relevant comments, we needed to restructure some basics.
1. Retool the system to accept the wide diversity of items that can be commented upon.
Only a handful of actor types can produce comments inside the LinkedIn ecosystem (e.g., members, schools, companies), but there is a much larger variety of things that members can comment upon (e.g., articles, long-form posts, shares, anniversaries, videos, etc.). FollowFeed was built largely around a data structure that associates each MemberID with a list of activities. We turned this data structure on its head, producing one that associates a post id with a list of comment activities. Conceptually, it’s a small change, but accomplishing this in a large production system took us quite a bit longer than expected.
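To make the inversion concrete, here is a minimal Python sketch of the two layouts. It is an illustration only, not FollowFeed's actual code: the `Activity` fields, the id formats, and both function names are assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    # All fields are hypothetical; the real FollowFeed schema is not shown here.
    actor_id: str   # who performed the action
    verb: str       # e.g. "comment", "like", "share"
    post_id: str    # the item the action targets
    timestamp: int  # illustrative logical time


def index_by_actor(activities):
    """Feed-style layout: actor id -> list of that actor's activities."""
    index = defaultdict(list)
    for activity in activities:
        index[activity.actor_id].append(activity)
    return dict(index)


def index_comments_by_post(activities):
    """Comment-style layout: post id -> list of comment activities on it."""
    index = defaultdict(list)
    for activity in activities:
        if activity.verb == "comment":
            index[activity.post_id].append(activity)
    return dict(index)


ACTIVITIES = [
    Activity("member:1", "comment", "article:10", 1),
    Activity("member:2", "comment", "article:10", 2),
    Activity("member:1", "like", "video:20", 3),
    Activity("company:7", "comment", "video:20", 4),
]

by_actor = index_by_actor(ACTIVITIES)
by_post = index_comments_by_post(ACTIVITIES)
```

The same activities feed both indexes; only the key changes, from the actor who acted to the item that was acted upon.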
2. Fix the notion of Top N and Fanout.
In the feed world, the challenge is to determine the Top N posts that we should show to a given user. That is, given a single member and their connection list, produce the Top N posts that are applicable to that member. This boils down to taking the connection set of a member and fanning out a request asking for the top N posts from each of the member’s connections.
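The feed-side fan-out can be sketched roughly as follows. This is a simplified, single-process illustration: the in-memory dictionaries, the `(post_id, score)` tuples, and the function name `top_n_posts` are assumptions for this example, and the final merge into one list is our own simplification. A real system would issue these lookups to remote leaf nodes.

```python
import heapq


def top_n_posts(member_id, connections, posts_by_actor, n):
    """Gather each connection's top n posts, then keep the overall top n."""
    candidates = []
    for actor_id in connections.get(member_id, []):
        # Each connection contributes at most its own top n posts.
        actor_posts = posts_by_actor.get(actor_id, [])
        candidates.extend(heapq.nlargest(n, actor_posts, key=lambda p: p[1]))
    # Merge step: keep the n best candidates across all connections.
    return heapq.nlargest(n, candidates, key=lambda p: p[1])


# Hypothetical data: posts are (post_id, relevance_score) pairs.
CONNECTIONS = {"member:1": ["member:2", "member:3"]}
POSTS_BY_ACTOR = {
    "member:2": [("post:a", 0.9), ("post:b", 0.2)],
    "member:3": [("post:c", 0.5)],
    "member:4": [("post:d", 1.0)],  # not a connection of member:1
}

feed = top_n_posts("member:1", CONNECTIONS, POSTS_BY_ACTOR, 2)
```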
In the comment relevance case, though, the problem expands from the 1:N case to the M:NM case: given M post ids, produce the Top N comments for each of those posts, for up to N×M comments in total.
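The shape of that request can be sketched as a single call that takes all M post ids and returns up to N comments for each. This is only an illustration of the input and output, not FollowFeed's API: the function name, the `(comment_id, score)` tuples, and the in-memory index are assumptions for this example.

```python
import heapq


def top_n_comments(post_ids, comments_by_post, n):
    """Return {post_id: top n (comment_id, score) pairs} for every post id."""
    return {
        post_id: heapq.nlargest(
            n, comments_by_post.get(post_id, []), key=lambda c: c[1]
        )
        for post_id in post_ids
    }


# Hypothetical index: post id -> list of (comment_id, relevance_score).
COMMENTS_BY_POST = {
    "post:a": [("c1", 0.3), ("c2", 0.8), ("c3", 0.5)],
    "post:b": [("c4", 0.1)],
}

result = top_n_comments(["post:a", "post:b", "post:z"], COMMENTS_BY_POST, 2)
total_comments = sum(len(comments) for comments in result.values())
```

With M = 3 posts and N = 2, the response holds at most 6 comments. Here it holds 3, because one post has a single comment and another has none.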
While it would have been tempting to just fan out the requests at the FeedMixer layer and query FollowFeed once for each post, it would not have been the best solution.
The Variance Sum Law indicates that the variance of the sum of N independent latency measurements is the sum of their individual variances. That is,