Philip Apps | Data Scientist, Ads Quality
Clickthrough rate rewards clickbait, ignores other engagement signals, and suffers from position bias. A better metric benefits both the Pinners seeing an ad and the advertiser who created it.
“Simple tweaks can make CTR far more useful.”
How can we measure user engagement in an online platform? Clickthrough rate (CTR) is the first metric everyone thinks of, but it suffers from some serious shortcomings. Many companies have made tweaks to make this metric more useful — we’ll discuss what we’ve done here at Pinterest.
This post will be the first in a series examining metrics for users, advertisers, and the company — the three stakeholders that online platforms need to satisfy. We’ll focus on Pinterest’s ad marketplace, but the principles can be applied to many other online settings.
“If products don’t work for people, they don’t work for us.”
We can measure user engagement by how people interact with the website, e.g. via clicks. But before diving into that, let’s answer a more basic question — why do we care what users think? Why not just set up our system to make as much money as possible?
In Pinterest’s case, “putting Pinners first” is a core value. If products don’t work for Pinners, they don’t work for us.
Moreover, you can maximize short-term revenue without caring about user engagement (and many less-reputable sites do!) but, as Lincoln (maybe) said, “You can’t fool all the people all the time”. So any serious online platform needs people to come back and use it over the long term, and must commit to making sure users have a good experience. And that means we need a metric to track user engagement.
But what metric? How can we aggregate the preferences of 400 million monthly active Pinterest users and see how they change in thousands of A/B tests per year?
“If people click on something, then they must like it, right?”
Imagine that you had to measure user engagement for an online platform, let’s say an eCommerce store with a search engine and paid ads. Your first attempt would probably be Clickthrough Rate (CTR) — the number of clicks on an ad divided by the number of times it was seen. After all, if people click on something, then they must like it, right?
This has a number of advantages:
- CTR data is easily available, so this will be easy to calculate, track, and analyze.
- CTR is a widely known and used metric in the industry, so it will be easy to get this accepted internally.
- And, as a bonus, lots of advertisers want to have high CTRs too. So you’ve aligned your advertiser and user interests!
So, just throw a lot of advanced machine learning at the problem to predict and maximize CTRs and you’re done, right? Not so fast!
“CTR is not the key to user engagement.”
There’s a joke about a drunkard who drops his keys on a dark sidewalk and then walks away to a streetlight to find them because the light is better over there. Similarly, CTR sheds a lot of light on things, but it’s not the key to user engagement.
Here are the top four problems with CTR — number 2 will shock you!
Problem # 1: Position bias
Ads higher up the page are more likely to be clicked, but this does not mean the user finds them more useful. This is not unique to online interactions; studies (some linked in this JSTOR article) have shown that people are more likely to vote for candidates at the top of a voting ballot, so the order is often randomized. If you don’t account for this, then you may end up thinking that users like ads more at the top of the page, when in reality they are just more prone to pay attention to them.
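To make the effect concrete, here is a minimal sketch (with made-up numbers, not Pinterest data) of two ads with identical true appeal, where one is usually shown in slot 1 and the other in slot 10. Raw CTR makes the top-slot ad look five times better; dividing each slot's CTR by an assumed slot-level baseline (one simple, generic correction, illustrated here for intuition) shows they are equally clickable:

```python
# Hypothetical slot-level baseline CTRs: how clickable an average item
# is in each position. These numbers are assumptions for illustration.
slot_baseline_ctr = {1: 0.10, 10: 0.02}  # slot 1 gets 5x the clicks

# (slot, impressions, clicks) logs for each ad
ad_x = [(1, 1000, 100)]   # always shown in slot 1; raw CTR 0.10
ad_y = [(10, 1000, 20)]   # always shown in slot 10; raw CTR 0.02

def raw_ctr(logs):
    impressions = sum(i for _, i, _ in logs)
    clicks = sum(c for _, _, c in logs)
    return clicks / impressions

def position_normalized_ctr(logs):
    # For each slot, divide the ad's CTR by that slot's baseline, then
    # average weighted by impressions. 1.0 means "exactly as clickable
    # as an average item in the positions where it was shown".
    total = sum(i for _, i, _ in logs)
    return sum(i * ((c / i) / slot_baseline_ctr[s]) for s, i, c in logs) / total

print(raw_ctr(ad_x), raw_ctr(ad_y))                                  # 0.1 0.02
print(position_normalized_ctr(ad_x), position_normalized_ctr(ad_y))  # 1.0 1.0
```

Raw CTR reflects where the ads were shown, not how much users liked them; once position is accounted for, the two ads look identical.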
Problem # 2: Focusing on CTR leads to clickbait
Items with a high CTR may be “clickbaity” — they encourage the user to click but fail to give them anything useful, instead directing them to ad-loaded slideshows that make money for the advertiser but just annoy the user.
Problem # 3: CTR ignores other signals
“Every platform should have a way to show dislike.”
Users have a number of ways they can engage with content on Pinterest besides clicking. They can Save the content to one of their boards for future inspiration or Hide it to indicate they don’t want to see this sort of thing. These are rarer events than clicks, but they give strong signals of user interest in or dislike of an ad.
In fact, every platform should have a way to show dislike, as it’s such a useful signal in distinguishing lack of interest from active annoyance. And if your platform does have a way to show dislike, you have to pay close attention to it so users don’t feel ignored.
Problem # 4: Some things aren’t meant to be clicked
In the early days of online advertising, advertisers were just trying to get users to click on their ads. This trend continued with the rise of search-based pay-per-click ads. But today’s advertisers may use different, less click-based formats, such as video, or be more interested in building brand awareness. Some other signals we could use include watch time or sideswipes for carousel ads.
Looks like relying solely on CTR opens up a can of worms. How did we address these problems?
We incorporated these concerns to create a new metric that combines multiple engagements (with Save having a positive weight and Hide a negative) and accounts for position bias by comparing this to the engagement on nearby organic (non-ads) content.
For example, an ad in a high position may have a high engagement rate but takes space away from organic content with similarly high engagement rates, so we account for this in our metric.
Defining the User Metric
We define our User Metric as the ratio of weighted engagement on ads to the weighted engagement on the organic content in nearby slots:

User Metric = (weighted engagement rate on ads) / (weighted engagement rate on nearby organic content)

Let’s talk about the numerator first.
Weighted Engagement on Ads
Instead of just looking at CTR, we look at a weighted average of different actions — Clicks, Hides, and Saves, among others — on the ads.
As a simple fictional example, suppose we measure user engagement on ads as: engagement = CTR - 20 * Hide Rate.
Here, Hide Rate is multiplied by a large negative number: “negative” because Hides represent user dissatisfaction, and “large” because this is strong dissatisfaction.
Consider two ads: B is clicked more often than A, but it is hidden by a lot more people.
If we just looked at CTR, then we might think that B was better than A. But once we take hides into account, our metric rates A as better for overall user engagement.
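The example above can be sketched in a few lines of Python. The rates for ads A and B are made-up numbers chosen to show the ranking flip, not real data:

```python
# Toy weighted-engagement score from the example: CTR minus 20x Hide Rate.
def engagement(ctr, hide_rate, hide_weight=-20):
    # hide_weight is negative (Hides are dissatisfaction) and large
    # (they signal strong dissatisfaction).
    return ctr + hide_weight * hide_rate

# Hypothetical rates: B is clicked more than A, but hidden far more often.
ad_a = {"ctr": 0.010, "hide_rate": 0.0001}
ad_b = {"ctr": 0.015, "hide_rate": 0.0008}

score_a = engagement(ad_a["ctr"], ad_a["hide_rate"])  # 0.010 - 0.002 = 0.008
score_b = engagement(ad_b["ctr"], ad_b["hide_rate"])  # 0.015 - 0.016 < 0
```

By CTR alone, B beats A; once Hides are weighted in, A scores positive while B scores negative, matching the intuition that B is annoying users.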
If we choose the actions and weights carefully, we can address problems 2, 3, and 4 with CTR. As a first step, we can use business judgement on the importance of the actions combined with their frequencies. With more sophisticated analysis, we can investigate the causal effects of actions on long-term revenue and user retention.
That still leaves us with problem 1: position bias. We’ll illustrate how we address this with a simplified example.
Addressing Position Bias
For simplicity, let’s say we’re just using CTR as a measure of user engagement.
Suppose we run an A/B experiment with a control and treatment group, and we get the CTRs below. Which is better for user engagement (assuming the differences are statistically significant)?
With only this information, it looks like the control group is better.
But now suppose I tell you that in the control group, ads were only shown on the first page, while in treatment, ads were only shown on the tenth page or later.
Let’s assume that CTRs drop on ads and organic content as users scroll through Pinterest. Then the treatment group starts to look better — its CTR is only a bit less than the control when we would expect it to be a lot less.
How do we formalize this?
We compare the average engagement rate on ads to that of the organic content in the spots before and after. This is where the denominator comes in.
If ads in the control group and the treatment group are typically served like this:
Then we could end up with a result like this:
So, adjusting for organic engagement, the treatment group now looks much better than the control group for overall user engagement, as we are not taking away as much organic engagement. This comparison of ads to organic content reduces the position bias of CTR. Note also that we don’t actually need to make any assumption about the shape of the position bias — the approach still works even if there is no effect.
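A minimal sketch of this comparison, again with made-up numbers: control serves ads on page 1, where everything gets clicked a lot; treatment serves them on page 10+, where everything gets clicked less. Taking the ratio of ad engagement to nearby organic engagement flips the conclusion:

```python
# Hypothetical A/B results. In control, ads sit on page 1 next to
# high-CTR organic content; in treatment, they sit on page 10+ next
# to low-CTR organic content. All numbers are assumptions.
control = {"ad_ctr": 0.010, "nearby_organic_ctr": 0.020}
treatment = {"ad_ctr": 0.008, "nearby_organic_ctr": 0.010}

def relative_engagement(group):
    # Ad engagement relative to the organic content in nearby slots.
    # Higher means the ads cost less organic engagement per impression.
    return group["ad_ctr"] / group["nearby_organic_ctr"]

control_score = relative_engagement(control)      # about 0.5
treatment_score = relative_engagement(treatment)  # about 0.8
```

On raw ad CTR the control group wins (0.010 vs. 0.008), but relative to nearby organic content the treatment group's ads perform better, which is the position-debiased conclusion.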
“We kept a launch that was bad for CTR but good for users.”
This is a lot of extra machinery. Is it worth it?
Yes! Especially when our metric gives different results from CTR. Here are some examples where the metrics disagree and our new metric better reflects user engagement. (Blue is statistically significant positive, red statistically significant negative, grey is neutral.)
So this new metric is certainly useful, but it’s worth noting that engagement can’t tell you everything about user experience. Ad relevance, in-platform surveys, and user interviews also have roles to play.
We’ve shown how to develop a better metric than CTR for user engagement, and we are always looking for talented people to go even further. If this article clicked with you, then clickthrough to our careers page 🙂
Thanks to a number of colleagues for help in writing this post and developing the metric, including Joshua Cherry, Pamela Clevenger, Aidan Crook, Ashim Datta, Brian Karfunkel, Kellie Ottoboni, And Ozbay, Alexandrin Popescul, and Roelof van Zwol. Your contributions are greatly appreciated!