By Mohit Vora, Lara Deek, Ellen Livengood
There are many reasons why Netflix cares about the popularity of our TV shows and movies. On the Open Connect Content Delivery team, we predict content popularity to maximize our infrastructure efficiency.
Some months ago, we blogged about how we use proactive caching to keep the content on our global network of Open Connect Appliances (OCAs) current. We also recently gave an overview of some of the data science challenges where Open Connect and our Science & Algorithms teams collaborate to optimize our CDN. In this post, we delve deeper into one of these areas — how and why we predict content popularity.
From the Content Delivery perspective, we view popularity as the number of times a piece of content is watched. We compute it by dividing total bytes streamed from this asset by the size in bytes of the asset.
How is content popularity used to optimize our CDN?
Minimizing network distance
As we described in this blog, the Open Connect global CDN consists of servers that are either physically located in ISP data centers (ISP servers) or IXP data centers (IX servers). We aim to serve as much of the content as possible over the shortest networking path. This maximizes the streaming experience for our members by reducing network latencies.
Given the finite amount of disk space available per server and the large size of the entire Netflix catalog, we cannot fit all content in every cluster of co-located servers. Many clusters that are proximally located to end-users (ISP clusters) do not have enough disk capacity to fit the entire Netflix catalog. Therefore, we cache only the most popular content on these clusters.
Organizing content into server tiers
At locations that deliver very large amounts of traffic, we use a tiered infrastructure — high throughput servers (up to 100Gbps) are used to serve very popular content and high capacity storage servers (200TB+) are used to serve the tail of the catalog. We need to rank content based on popularity to properly organize it within these tiers.
Influencing content replication within a cluster
Within a cluster, we replicate titles over N servers, where N is roughly proportional to the popularity of that content.
Why do we store multiple copies of our files?
An extremely popular file, if deployed only on a single server, can overwhelm the resources of that server — while other servers may remain underutilized. This effect is not as pronounced in our deployment environment due to two crucial optimizations:
- Because we route traffic based on network proximity, the regional demand for even the most popular content gets shared and diffused across our network.
- Popular files are locked into memory rather than fetched constantly from disk. This latter memory optimization eliminates the possibility of disk I/O being the cause of a server capacity bottleneck.
However, we still keep multiple copies for the following reasons.
Maximizing traffic by minimizing inter-server traffic variance
Consistent Hashing is used to allocate content to multiple servers within a cluster. While consistent hashing on its own typically results in a reasonably well-balanced cluster, the absolute traffic variance can be high if every file is served from a single server in a given location.
As an example:
If we try to distribute from a pile of very large rocks into multiple buckets, even with a great allocation algorithm, it is more likely that the buckets will not all have the same weight. However, if we had a pile of pebbles, then we can balance the weights with higher probability. Analogously, high popularity content (large rocks) can be broken down into less popular content (pebbles) simply by deploying multiple copies of this content.
It is desirable to keep servers evenly balanced so that as traffic increases, each server reaches peak utilization at the same overall traffic level. This allows us to maximize the amount of traffic served by the entire cluster.
Resilience to server failures and unexpected spikes in popularity
In the event that a server has failed, all of the traffic bound to that server needs to be delivered from other servers in the same cluster — or, from other more distant locations on the network. Staying within the same cluster, therefore minimizing network distance, is much preferable — especially when it comes to very popular content. For this reason, we ensure that we keep multiple replicas of the most popular content in the same cluster.
In addition, we replicate some mid-tier content as an insurance against traffic being amplified unexpectedly — for example, because of sudden social media attention for a celebrity.
How is our content organized?
Every title is encoded in multiple formats, or encoding profiles. For example, some profiles may be used by iOS devices and others for a certain class of Smart TVs. There are video profiles, audio profiles, and profiles that contain subtitles.
Each audio and video profile is encoded into different levels of quality. For a given title, the higher the number of bits used to encode a second of content (bps), the higher the quality. (For a deeper dive on per-title encode optimization, see this past blog.) Which bitrate you stream at depends on the quality of your network connection, the encoding profiles your device supports, the title itself, and the Netflix plan that you are subscribed to.
Finally, we have audio profiles and subtitles available in multiple languages.
So for each quadruple of (title, encoding profile, bitrate, language), we need to cache one or more files. As an example, for streaming one episode of The Crown we store around 1,200 files!
How do we evaluate content ranking effectiveness?
For a cluster that is set up to service a certain segment of our traffic, caching efficiency is the ratio of bytes served by that cluster versus overall bytes served for this segment of traffic.
From the perspective of our ISP partners, we like to measure this on a per-network basis. We also measure this on a per-cluster basis for optimizing our own infrastructure.
Maximizing caching efficiency at the closest possible locations translates to lesser network hops. Lesser network hops directly improves user streaming quality, and also reduces the cost of transporting network content for both ISP networks and Netflix. Furthermore, maximizing caching efficiency makes responsible and efficient use of the internet.
Although our proactive content updates are downloaded to servers during off-peak hours when streaming traffic is at a minimum, we still strive to minimize the amount of content that has to be updated day-over-day — a secondary metric we call content churn. Less content updates can lead to lower costs for both our ISP partners and Netflix.
How do we predict popularity?
As described briefly in our earlier blog, we predict future viewing patterns by looking at historical viewing patterns. A simple way to do this could be to look at the content members watched on a given day and assume that the same content will be watched tomorrow. However, this short-sighted view would not lead to a very good prediction. Content popularity can fluctuate, and responding to these popularity fluctuations haphazardly could lead us to unnecessary content churn. So instead, we smooth data collected over multiple days of history to make the best prediction for the next day.
What granularity do we use to predict popularity?
We have the following models that compute content popularity at different levels of aggregation:
- Title level: Using this ranking for content positioning causes all files associated with a title to be ranked in a single group. This means that all files (multiple bitrates of video and audio) related to a single streaming session are guaranteed to be on a single server. The downside of this method is that we would have to store unpopular bitrates or encoding profiles alongside popular ones, making this method less efficient.
- File level: Every file is ranked on its own popularity. Using this method, files from the same title are in different sections of the rank. However, this mechanism improves caching efficiency significantly.
In 2016, we migrated most of our clusters from title level to file level rankings. With this change, we were able to achieve the same caching efficiency with 50% of storage!
Orthogonally to the above 2 levels of aggregation, we compute content popularity on a regional level. This is with the intuitive presumption that members from the same country have a similar preference in content.
What about new titles?
As mentioned above, historical viewing is used for ranking content that has been on Netflix for at least a day. For content that is launching on Netflix for the first time, we look at various internal and external forecasts to come up with a prediction of how a title will perform. We then normalize this with ‘organic’ predictions.
For some titles, we adjust this day 1 prediction by how heavily the title will be marketed. And finally, we some time use human judgement to pin certain upcoming titles high in the popularity ranking to ensure that we have adequate capacity to serve them.
We are always evaluating and improving our popularity algorithms and storage strategies. If these kinds of large scale challenges sound interesting to you, check out our latest job postings!