Understanding dwell time to improve LinkedIn feed ranking

Co-authors: Siddharth Dangi, Johnson Jia, Manas Somaiya, and Ying Xuan

The LinkedIn feed is the cornerstone of the member experience. It’s where our members post ideas, career news, questions, and jobs in an array of formats, including short text, long-form articles, images, and videos. The Feed AI Team’s mission is to help LinkedIn’s members discover the most relevant conversations and content in their feed so that they can be more productive and successful. In this post, we explore how understanding the way our members distribute their time on the feed has helped us improve the algorithms that rank content.

Overview of LinkedIn feed ranking

Let’s dive into an example. When member Alice visits LinkedIn, there are tens of thousands of candidate posts or updates that could potentially show up in her feed. A first-pass, candidate generation layer applies an efficient and lightweight ranking algorithm to identify the top candidate updates to show her. But among these top candidates, how is the ranking fine-tuned to determine the final order? If Alice’s connection Bob recently shared an interesting article, what determines where Bob’s post will appear in Alice’s feed?

We start with the assumption that if Alice were to see Bob’s post and find it to be relevant, she would click on it to engage with the content, the author, or the conversation. Specifically, she may react (“Like”, “Celebrate”, etc.), comment, or re-share. We call these three options “viral actions” because they can have downstream and/or upstream network effects. For example, a re-share will propagate the article downstream, as Alice’s connections will see the article in their feed. On the other hand, a comment from Alice will have an upstream effect, as it provides valuable feedback to the creator (Bob) that may encourage him to post more often. Therefore, for each candidate update, we need to consider both Alice’s likelihood of engagement and the potential downstream and upstream effects on her network as a result.

To accomplish this, we train our machine learning models to predict several quantities for each possible click and viral action (click, react, comment, share):

  • P(action) = Probability of Alice taking this action on the update
  • E[downstream clicks/virals | action] = Expected downstream clicks/virals if Alice takes this action
  • E[upstream value | action] = Expected upstream value to Bob if Alice takes this action

The outputs of these models are then synthesized into a single score using a weighted linear combination, where the weights are tuned to ensure that all three components are appropriately balanced in order to maintain a healthy feed ecosystem. Finally, this score is used to perform a point-wise ranking of all the candidate updates.
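To make the scoring step concrete, here is a minimal Python sketch of a weighted combination followed by point-wise ranking. The exact formula and weights are not given in this post, so everything specific here is an assumption for illustration: the names `ActionPrediction`, `score_update`, and `rank_updates`, the default weights, and the choice to multiply each action's probability by its expected downstream and upstream values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ActionPrediction:
    """Model outputs for one action (click, react, comment, or share)."""
    p_action: float    # P(action)
    downstream: float  # E[downstream clicks/virals | action]
    upstream: float    # E[upstream value | action]


def score_update(
    predictions: Dict[str, ActionPrediction],
    w_engage: float = 1.0,  # hypothetical weight on the member's own engagement
    w_down: float = 0.5,    # hypothetical weight on downstream effects
    w_up: float = 0.5,      # hypothetical weight on upstream effects
) -> float:
    """Combine per-action predictions into one score (assumed form)."""
    total = 0.0
    for pred in predictions.values():
        # Expected value of the action: its probability times the weighted
        # sum of its direct, downstream, and upstream contributions.
        total += pred.p_action * (
            w_engage + w_down * pred.downstream + w_up * pred.upstream
        )
    return total


def rank_updates(candidates: Dict[str, Dict[str, ActionPrediction]]) -> List[str]:
    """Point-wise ranking: score each update independently, sort descending."""
    return sorted(candidates, key=lambda uid: score_update(candidates[uid]), reverse=True)
```

Because each update is scored independently of the others, changing the weights shifts the balance among the three components without requiring the underlying models to be retrained.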

Why dwell time matters

Note that the ML models used above to generate the final score for each update focus primarily on predicting click- and viral-related quantities. This approach has several shortcomings:

  1. Click and viral actions can be rare, especially for passive consumers of the feed. While these members may still visit the feed frequently and find value in the updates they see, they may shy away from taking click and viral actions.
  2. Click and viral actions are primarily binary indicators of engagement—either you carry out the action or you don’t. For actions related to sharing, the text associated with a comment or re-share (if available) can provide a richer signal, although that signal can be more difficult to interpret.
  3. Clicks are noisy indicators of engagement. For example, a member may click on an article, but quickly close out, realizing it’s not relevant, and return to the feed within a few seconds. We call these “click bounces.”

To compensate for some of these shortcomings, we looked at aggregated per-update dwell time to see if it could help us improve feed ranking. At a high level, each update viewed on the feed generates two types of dwell time. First, there is dwell time “on the feed,” which starts measuring when at least half of a feed update is visible as a member scrolls through their feed. Second, there is dwell time “after the click,” which is the time spent on content after clicking on an update in the feed.
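As a rough illustration of the two measurements, the following Python sketch computes both dwell times from simplified tracking data. It is not the production tracking logic. The sampled-visibility input (`ViewportSample`), the function names, the rule that on-feed measurement pauses when visibility drops below one half, and the 5-second click-bounce cutoff (`BOUNCE_SECONDS`) are all assumptions; the text above specifies only the "at least half visible" starting condition and describes bounces as returning "within a few seconds."

```python
from dataclasses import dataclass
from typing import List

VISIBILITY_THRESHOLD = 0.5  # "at least half of a feed update is visible"
BOUNCE_SECONDS = 5.0        # assumed cutoff for "a few seconds"


@dataclass
class ViewportSample:
    """One hypothetical tracking sample for a single feed update."""
    t: float                 # timestamp in seconds
    visible_fraction: float  # share of the update on screen, 0.0 to 1.0


def on_feed_dwell(samples: List[ViewportSample]) -> float:
    """Total seconds the update was at least half visible.

    Each interval between consecutive samples counts if the update met
    the visibility threshold at the start of that interval.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if prev.visible_fraction >= VISIBILITY_THRESHOLD:
            total += cur.t - prev.t
    return total


def after_click_dwell(click_t: float, return_t: float) -> float:
    """Seconds spent on the content between the click and the return to the feed."""
    return return_t - click_t


def is_click_bounce(click_t: float, return_t: float,
                    threshold: float = BOUNCE_SECONDS) -> bool:
    """Flag a click whose after-click dwell is shorter than the threshold."""
    return after_click_dwell(click_t, return_t) < threshold
```

Under these assumptions, a click followed by a return two seconds later would be flagged as a bounce, while one followed by thirty seconds of reading would not.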

Aside from a few notable exceptions, we assume that members value their time, and will spend it appropriately on feed content that they’re interested in. With this assumption in mind, dwell time has the following advantages over solely looking at click and viral actions:
