💚 → 👏 👏 👏 – Medium Engineering


A product engineering tale

Back in the heady days of early July 2017, I was tasked with adapting the Medium 💚 system into something that allowed variable input. With the binary 💚 button, users could tell us when they liked a story, but we wanted to know how much they liked it, in comparison with other stories.

In theory, this new data would allow us to surface truly good content over shiny clickbait. It was also the first piece of a larger project to open up our partner program and start paying Medium writers!

Context 🔍

The Medium 💚 system worked like this.

On web, iOS, and Android, we had front-end components that rendered the button everywhere: on post listings, user profile pages, responses, topic pages, and in a few places on every post page. When you tapped it, it sent a 💚 request to our backend.

The request handler for 💚 updated a field on the UserPostRelation table to indicate that this user had recommended this post, then emitted an event that fanned out to do many more things, like send push/activity feed/email notifications, update post stats, update author stats, send data to our recommendations pipeline, send data to our social graph database…

We decided to keep most of this infrastructure in place and patch our new multi-recommendation system into it as far upstream as possible, to minimize backend changes as we experimented on the front-end. Conveniently, we already had a field on UserPostRelation that we were using to store how many times a user clapped for a Series.

Explorations 🐣

In my project planning, I allocated two weeks (out of five) to design and prototyping experimentation. So while I built an ugly but functional button and connected it to our backend, Jess, Herzog, Peter, and Wolfe came up with all this cool stuff!

A week in, Tess did some user research. People thought the designs that looked like rating systems required too much cognitive effort, so we moved away from those. Everyone loved Herzog’s cute animations though, so they stayed.

Product questions 🤔

After our two weeks of fun, it came time to make some hard calls.

  • What would the new icon be?
  • How many times could a user recommend a single post?
  • What number would we show next to the button: how many times it was recommended, or how many different people recommended it?
  • Would the new action be only for our members?

Internal opinion was…varied. Some people wanted recommends to be capped as low as 3 per person per post. The prototype I released internally had a cap of 100. Some people wanted no cap at all, like claps in Series. For the number to show next to the button, we got an impressive quantum full house of preference: recommends, people, both, neither.

💚 felt like a great icon when the recommend system was binary, but nobody wanted to give 20 💚s, let alone 50. We considered 💡 (too erudite), 💧 (too divorced from sentiment), and 🎉 (too frivolous). Ev, who has the world’s driest sense of humor, half-jokingly (I think) suggested we should make a tree that grew bigger the more you recommended, and eventually produced knowledge-apples.

I, naturally bad at narrowing scope under the best of circumstances, had a stressful week. I coped by gathering advice from everyone in sight. Kathryn, our head of marketing, told me something really insightful: that internal opinion, though it seemed all over the map, was actually divided into two camps:

Camp 1 imagined that the new action would be lightweight and easy to perform, that each recommend would be a drop in an ocean, and that people would want to keep expressing their appreciation until it felt right.

Camp 2 thought the new action should be more weighty and meaningful, that each individual recommend should carry significant value, and that people would want to think carefully about exactly how many recommends they gave.

People’s preferences tended to align with their camp. I was staunchly Camp 1, so I wanted a high cap, I wanted an icon that felt like something you could give 5 or 15 or 40 of, and I wanted to show the total number of recommends (even if it was giant).

I conveyed Kathryn’s insight to our valiant and wise product manager, katie.

katie understood immediately, and further intuited that both camps were valid but the real danger lay in the lukewarm compromise in the middle. Within 12 hours, she put out an internal Medium post explaining the phenomenon and declaring that we were going with Camp 1: 👏 icon, capped at 50, showing total clap count, going out to everyone.

And we were re-aligned.

The data backfill 🚚

Since claps are stored on a different field in the UserPostRelation table than recommends, we needed to backfill them from recommend data. Otherwise, every post written pre-👏 would suddenly show zero engagement.

I dedicated three cavalier sentences to this integral part of the project in my tech spec and handed it off to my Code2040 intern, Dmitri, in his second week.

I forgot about backfilling claps on author stats (which power our stats pages) and post stats (which are used everywhere). Each of these additional backfills was actually two backfills, one for summed totals and one for time-series data. The original UserPostRelation backfill also turned out to be more complicated than I thought, spanning 30 million objects and requiring sharding.
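To give a flavor of what a sharded backfill of this kind involves, here is a minimal Python sketch. It is an illustration, not Medium's actual job: the field names `recommended` and `clap_count`, the shard count, the CRC-based shard assignment, and the choice to convert each legacy recommend into exactly one clap are all assumptions.

```python
import zlib
from dataclasses import dataclass

NUM_SHARDS = 4  # assumption: the real shard count is not stated in the post


@dataclass
class UserPostRelation:
    user_id: str
    post_id: str
    recommended: bool    # legacy binary recommend flag (hypothetical name)
    clap_count: int = 0  # new clap field (hypothetical name)


def shard_of(row: UserPostRelation) -> int:
    """Stable shard assignment so each worker owns a disjoint slice of rows."""
    key = f"{row.user_id}:{row.post_id}".encode()
    return zlib.crc32(key) % NUM_SHARDS


def backfill_shard(rows: list[UserPostRelation], shard: int) -> int:
    """Give one clap to each legacy recommend in this shard; return rows changed."""
    changed = 0
    for row in rows:
        if shard_of(row) != shard:
            continue
        # Only touch rows that have a recommend but no claps yet, so that
        # re-running the job never overwrites claps given after launch.
        if row.recommended and row.clap_count == 0:
            row.clap_count = 1  # assumption: one recommend becomes one clap
            changed += 1
    return changed
```

Because every row lands in exactly one shard and the update is skipped once a row has claps, the shards can run independently and any of them can be retried safely.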

Within a week, Dmitri had eclipsed me in knowledge of our events and stats pipelines. Within three weeks he’d finished writing code for all five backfills. He spent the remaining time on the project capacity-planning with the dev ops team and shepherding backfills through dev, staging, and production, all the while casually picking off tasks from all over the rest of the stack.

30 million 👏 for Dmitri, the most extraordinary intern in the universe.

Interesting bugs 🐛

This project involved keeping a lot of different kinds of data in sync, so, unsurprisingly, we spent a lot of time fielding reports of data bugs.

On our internal version of Medium, people would clap for a post, come back later, and their claps would be gone. Sometimes they saw errors in their browser. Sometimes their claps would appear in post totals, but their name wouldn’t show up in the list of applauders.

Once the backfills were complete, we started chasing down reports in earnest. We eventually untangled and fixed three separate bugs:

1. Rate limiting

We used to allow people to recommend 100 posts per day. In the vast majority of cases, people were hitting 💚 once per post, so we rate-limited per request. Once someone hit their limit for the day, we started sending back 429s.

When we launched 👏 internally and before we implemented a reasonable batching strategy, people were suddenly hitting their daily limit after interacting with just two posts. Worse, we were updating the number of people who clapped (on post stats) and the number of claps a user gave (on UserPostRelation) in different places in the code. The former was hitting the rate limit; the latter was not.

Kyle and Dmitri fixed this by changing the rate limit to only count a first clap, and reordering code so that even when the rate limit was hit, data wouldn’t get out of sync. I hurriedly implemented better batching.
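The shape of that fix can be sketched in a few lines of Python. This is a simplified in-memory illustration under stated assumptions, not the real handler: `ClapStore`, its dictionaries, and the `RateLimited` exception (standing in for a 429 response) are hypothetical names. Only the 50-clap cap and the 100-posts-per-day limit come from the story above.

```python
MAX_CLAPS_PER_POST = 50  # cap from the product decision
DAILY_POST_LIMIT = 100   # posts a user may start clapping for per day


class RateLimited(Exception):
    """Stands in for an HTTP 429 response."""


class ClapStore:
    def __init__(self):
        self.user_claps = {}         # (user_id, post_id) -> claps by that user
        self.post_clappers = {}      # post_id -> number of distinct clappers
        self.first_claps_today = {}  # user_id -> posts first clapped today

    def clap(self, user_id, post_id, count):
        if count <= 0:
            raise ValueError("count must be positive")
        key = (user_id, post_id)
        previous = self.user_claps.get(key, 0)
        is_first_clap = previous == 0

        # Check the limit before writing anything, so a rejected request
        # leaves every counter untouched. Only a first clap on a post counts.
        used = self.first_claps_today.get(user_id, 0)
        if is_first_clap and used >= DAILY_POST_LIMIT:
            raise RateLimited()

        new_total = min(previous + count, MAX_CLAPS_PER_POST)
        self.user_claps[key] = new_total
        if is_first_clap:
            self.first_claps_today[user_id] = used + 1
            self.post_clappers[post_id] = self.post_clappers.get(post_id, 0) + 1
        return new_total
```

Checking the limit up front is one way to keep the per-user and per-post counters in step: either both are updated or neither is.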

2. Old mobile clients

We knew we were going to have a graduated rollout for 👏, so there would be some time when 💚 and 👏 would have to co-exist. We made sure 👏 was backward-compatible, but we separated 💚 and 👏 flows by user.

People who got 👏 early but still had old mobile clients with 💚 could briefly increment post totals from their apps without affecting their own UserPostRelation clap count.

Dmitri heroically debugged this while the rest of us napped after a team hiking trip. Herzog and I put in a fix the next day to check client versions on incoming requests.

3. Deactivated users

This was maybe the weirdest bug we found. A whole bunch of posts showed up with 1 clap from -1 people. When we dug in, we found that all of their post stats had recent changes at the exact same timestamp.

Our data platforms team found the culprit: we recently deactivated the account of a prolific spammer, and somewhere in our data pipeline a non-idempotent event to decrement post stats got sent twice, leading to the -1 people counts.

We also weren’t decrementing claps on user deactivation, resulting in the single orphaned claps.
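One common way to guard against this class of bug is to make the decrement idempotent by remembering which event IDs have already been applied, and to remove the user's claps in the same step. The Python sketch below illustrates that idea only; the post does not describe the actual fix, and `PostStats`, `apply_user_removed`, and the event ID scheme are hypothetical.

```python
class PostStats:
    def __init__(self, clappers=0, claps=0):
        self.clappers = clappers   # number of distinct people who clapped
        self.claps = claps         # total claps on the post
        self._seen_events = set()  # IDs of events already applied

    def apply_user_removed(self, event_id, user_claps):
        """Remove one clapper and their claps; safe to deliver more than once."""
        if event_id in self._seen_events:
            return False  # duplicate delivery: ignore it
        self._seen_events.add(event_id)
        self.clappers -= 1
        self.claps -= user_claps
        return True
```

With this in place, a post whose only clap came from the deactivated account ends up at 0 claps from 0 people even if the event arrives twice, instead of 1 clap from -1 people.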

Launch 🚀

We rolled 👏 out to our iOS and web beta groups first, then to 10% of our users. Initial reactions were mixed, with the main complaint being that 👏 seemed pointless. But interestingly, quite a lot of people correctly guessed that they would become an important signal when Medium started to pay writers. Good work, product prophets!

We launched 👏 to 100% last week, and the Medium Partner Program is rolling out today to a small initial group of writers and publishers. You can learn more here, or apply here to start writing!

This was an amazing project to be a part of, and I’m super proud of the work we did! ❤️ to Herzog, Dmitri, katie, Jess, Peter, Wolfe, and Dan, the DREAM TEAM.


