Jason Jia | Pinterest engineer, Growth
Think about the last time you visited a website. Was there a word that helped draw you in? Now think about how that might be different if you spoke a different language, were a different age or grew up in a different part of your country. What if there were a tool that automatically optimized those words for all users? That’s what we set out to build with Copytune, our in-house copy testing framework. Finding the right hook might be the difference between a user choosing to engage with your site or moving on in the split second it takes to make that decision.
Copytune was Pinterest’s first iteration of a copy testing framework to allow quick and easy testing of different text variations displayed to Pinners. However, the original version had some pain points, and we saw opportunities for improvement. For example, Copytune experiments required an engineer to set them up and only optimized copy on a per-language basis. This meant we missed out on useful segmentation, and the results weren’t always trustworthy. In this post, we’ll explain how we took these learnings and built out Copytune V2 to further optimize copy and increase engagement.
Copytune V2 goals
We had a few goals in mind while building Copytune V2:
- Optimize copy by locale, gender and age. Looking at the gains we got from optimizing per-language, we realized other segmentation might prove useful, too.
- Copy test with already translated localized variants. Generally, copy is translated and tested for all available locales, but sometimes you might want to test it for a specific locale where you have much more context than translators.
- Enable copy testing on groups of related strings at the same time. Groups of strings that help to convey or elaborate on the same message are inherently related, and sometimes it doesn’t make sense to copy test them separately.
- Make creating a copy experiment seamless for both engineers and non-engineers. Running an experiment shouldn’t require code changes.
- Automatically choose the winning variant for each locale, gender and age segment once an experiment gets enough traffic. If a segment doesn’t show significantly positive results, we won’t ship anything and will continue to serve the control variant.
Building Copytune V2
First, all strings in our code base had to be loaded into Mojito, the platform that manages our translated strings, so Copytune V2 would know which were valid for experimentation. Users can create an experiment from a UI by searching through the valid strings and setting a few configuration options. Every string request now passes through Copytune V2, which checks whether an experiment is running for that string and for the user’s locale, gender and age range (if available). Based on that check, Copytune V2 either serves an experiment variant of the string or the correct translation from Mojito. Translations and experiments update in real time as strings are translated or experiments are created, and a nightly job gathers the relevant data and computes metrics for each running experiment.
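The serving-time lookup described above can be sketched roughly as follows. This is a minimal illustration, not Pinterest’s actual implementation: the class and field names (`CopyService`, `UserContext`, the segment-key tuples) are hypothetical, and it assumes variants are keyed by (locale, gender, age range) with less-specific segments as fallbacks before the plain Mojito translation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class UserContext:
    locale: str
    gender: Optional[str] = None      # may be unknown for some users
    age_range: Optional[str] = None   # e.g. "25-34"; may be unknown

class CopyService:
    """Hypothetical serving-time lookup: running experiments win over plain translations."""

    def __init__(self, experiments, translations):
        # experiments: {string_id: {(locale, gender, age_range): variant_text}}
        self.experiments = experiments
        # translations: {(string_id, locale): translated_text}
        self.translations = translations

    def get_copy(self, string_id: str, user: UserContext) -> str:
        variants = self.experiments.get(string_id, {})
        # Try the most specific segment first, then progressively drop dimensions.
        for key in [(user.locale, user.gender, user.age_range),
                    (user.locale, user.gender, None),
                    (user.locale, None, None)]:
            if key in variants:
                return variants[key]
        # No running experiment covers this user: serve the Mojito translation,
        # falling back to the source-language string if no translation exists.
        return self.translations.get((string_id, user.locale),
                                     self.translations[(string_id, "en-US")])
```

A user in an experimented segment gets the variant; everyone else gets the normal translation for their locale.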
Understanding the results
Since Copytune V2 experiments can be run across so many different segments, having a concise way to understand which variants do well is vital. We wrote nightly data jobs that automatically compute the current winning variant for each combination of locale, gender and age. This data is then surfaced in an easy-to-use UI to enable non-technical users to work independently of engineers.
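The per-segment winner computation might look something like the sketch below. The function names and the choice of a two-proportion z-test at ~95% confidence are assumptions for illustration, not the metric logic Pinterest’s nightly job actually runs; the key property it shows is the one stated in the goals: a variant only ships if it beats control with statistical significance, otherwise control keeps serving.

```python
import math

def significantly_better(succ_c, n_c, succ_v, n_v, z_crit=1.96):
    """Two-proportion z-test (illustrative): is the variant's success rate
    significantly above control's at roughly 95% confidence?"""
    p_c, p_v = succ_c / n_c, succ_v / n_v
    p_pool = (succ_c + succ_v) / (n_c + n_v)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    if se == 0:
        return False
    return (p_v - p_c) / se > z_crit

def pick_winner(segment_stats):
    """segment_stats: {variant_id: (successes, trials)} for ONE
    (locale, gender, age) segment; must include a 'control' entry.
    Returns the best significantly-positive variant, else 'control'."""
    succ_c, n_c = segment_stats["control"]
    best, best_rate = "control", succ_c / n_c
    for variant, (succ, n) in segment_stats.items():
        if variant == "control":
            continue
        if significantly_better(succ_c, n_c, succ, n) and succ / n > best_rate:
            best, best_rate = variant, succ / n
    return best
```

Running this once per segment per night yields the table of current winners that the UI surfaces.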
Sometimes the winning results aren’t necessarily high quality. There were some cases where the best performing variant was off brand, so we implemented a feature to let the experiment creator manually deselect winning results and keep them from getting served to end users.
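That deselection step can be modeled as a simple filter over the computed winners. The names here are hypothetical, but the idea is just this: any (segment, variant) pair the experiment owner rejected reverts to the control copy.

```python
def apply_deselections(winners, deselected):
    """winners: {segment: variant_id} as computed by the nightly job.
    deselected: set of (segment, variant_id) pairs an experiment owner
    rejected (e.g. off-brand copy). Deselected winners revert to control."""
    return {seg: ("control" if (seg, var) in deselected else var)
            for seg, var in winners.items()}
```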
After releasing Copytune V2, we worked with various engineers to run experiments and understand its impact. The first batch of experiments was mostly aimed at optimizing copy in email subject lines and content that had already been copy tested in the past. The first nine experiments we shipped collectively increased our WAU (weekly active users) by more than 0.67 percent. This was a huge win, since that gain is the additional value Copytune V2’s new features provided over the previously optimized copy. The cost of running copy experiments is low enough that we expect to continue seeing significant positive impact.
Over the course of building Copytune V2, we learned some valuable lessons.
- Get buy-in. When trying to build a robust and useful copy testing platform, one of the hardest parts was understanding what would actually make the platform useful and robust. There were countless things we could have done (or not), and getting buy-in from all potential users helped us understand the actual pain points for not only engineers but also non-technical colleagues.
- Ensure data is trustworthy. This was something that came up over and over again. Positive results and increased engagement are only a good thing when you can prove they actually happened, especially since each Copytune V2 experiment has to deal with locale, gender and age segmentation. Even the best copy test can fail if you can’t accurately evaluate it, so trustworthy data and results are essential.
There’s still a lot more work to be done to further streamline the process of copy testing. One idea is to add more customization to how we segment experiments. Currently, Copytune V2 experiments can only be segmented by locale, gender and age, but some users might care more about, say, how active a user is than where they live. Another area of focus is completely automating the experiment process. Currently, a user still has to manually ramp up Copytune V2 experiments and shut them down when there’s enough data to confidently choose a winner. That process could be done automatically to further increase copy testing velocity. We also plan to expand the clients Copytune V2 supports. We currently support web and mobile web and are looking to add iOS, Android and AMP.
Acknowledgements: Thanks to Rajath Prasad, Jean Aurambault and Chidinma Egbukichi for scoping and helping to develop Copytune V2, and to John Egan, Julie Trier, Koichiro Narita, Tingting Zhu, Kate Taylor and Francesca Di Marco for their guidance and invaluable contributions to the development process.