Kunlong Gu | Engineer, Discovery
Every day, people come to Pinterest to discover new ideas, and when they find a product they want to buy, it should be easy to purchase it. In home decor specifically, millions of people come to Pinterest to find inspiration. That’s why in 2017, we launched Shop the Look as a way to help Pinners get to the products they love faster, and for brands to put customers on a path to purchase. Shop the Look identifies specific objects in a larger scene with white dots that connect the Pinner to the checkout page.
We previously experimented with a human-in-the-loop approach to match product links with Pins, but needed a better way to scale across the billions of images we show Pinners. As a solution, we used computer vision to fully automate the process of matching products to scenes. Today, we’re announcing a fully automated experience on home decor scenes on iOS, with Android coming soon. This update increases our Shop the Look Pin coverage by 22.5x across billions of Pins and products, and in early testing has already lifted engagement by 7%.
Here, we’ll share how we leveraged computer vision and the dataset unique to Pinterest (175B+ Pins saved with user-added context) to automate this experience.
From Pin to purchase
On Pinterest, 97% of the 1,000 most popular searches are non-branded (meaning people generally start with basic searches for products like “shoes” or “couch”), so brands of all sizes have an opportunity to reach people who are in a shopping mindset. As we scale, automated Shop the Look will tag organic Pins that have not been linked to a business account. Brands that would rather not have their Pins tagged can claim their domains.
Behind the scenes
We built this technology in three stages: data collection, machine learning (ML) modeling, and serving.
The data collection stage prepares the data that our machine learning models use for training, while the ML modeling stage trains our models to identify and localize home decor objects in a scene. The models classify objects into product categories and represent them with embeddings: numeric representations of images in which visually similar images have a shorter distance between their embeddings than dissimilar images. (You can find more in our recently submitted paper.) Finally, the serving stage uses our trained model to identify the product category from the query Pin (‘the look’), and then uses visual embeddings to find the closest product candidates to ‘the look’.
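To make the embedding idea concrete, here is a minimal sketch in Python. The three-dimensional vectors and the names `lamp_in_scene`, `lamp_product`, and `sofa_product` are made up for illustration; real embeddings come from a trained model and have many more dimensions. Cosine distance is used as one common choice, and this post does not say which metric the production system uses.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more visually similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy embeddings (illustrative values only).
lamp_in_scene = [0.9, 0.1, 0.2]   # object cropped from a scene
lamp_product = [0.8, 0.2, 0.1]    # visually similar product photo
sofa_product = [0.1, 0.9, 0.3]    # visually dissimilar product photo

d_similar = cosine_distance(lamp_in_scene, lamp_product)
d_dissimilar = cosine_distance(lamp_in_scene, sofa_product)
```

With these toy values, `d_similar` comes out smaller than `d_dissimilar`, which is the property the embedding is trained to provide.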
Pinterest’s unique dataset
Every computer vision task starts with image data. One of the special things about Pinterest is that most of the Pins people have saved are larger images (from blogs, retail sites, etc.) with many products (a lamp, couch, rug) in one scene. Through an earlier curation effort, we’ve accumulated a high-quality dataset of 270,000 scene-product (around 1 million object-product) matching pairs. In addition, we annotated 80,000 scene images (250,000+ objects) in-house with bounding boxes and categories based on Google Product Taxonomy (GPT). We went through multiple iterations on GPT before we arrived at a workable annotation guideline. Initially we strictly followed GPT, but the model didn’t perform well in some coarse-grained categories (e.g., bedding, tables). We found that products in these coarse-grained categories have vastly different shapes and functions. For example, the “bedding” category includes bed canopies, bed sheets, and pillows. We then manually cleaned up the existing datasets with fine-grained labels, which improved the model significantly.
In the ML modeling stage, we trained a Feature Pyramid Network Faster R-CNN detection model that parses scenes into objects and annotates them with product categories.
We also trained an Embedding Model to represent images as mentioned above (which we will talk about in an upcoming blog post, so get excited!). The embedding learns from visual similarity across Pin images. In particular, we added the high-quality dataset of 270,000 scene-product matching pairs so the embedding can handle the domain shift from scenes to product images. The domain shift arises because objects in scenes appear under varied lighting conditions and rotations and against noisy backgrounds, while product images usually contain a high-quality front photo with a clean white background. Our Embedding Model captures that information well while providing product similarity.
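One common way to train an embedding on matching pairs is a triplet margin loss, sketched below in plain Python. This is an assumption for illustration only: the post does not say which loss the Embedding Model uses, and the vectors, the `margin` value, and the function names are hypothetical. The anchor is an object cropped from a scene, the positive is its matching product image, and the negative is an unrelated product.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer to the anchor than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings (illustrative values only).
scene_crop = [0.0, 0.0]

# Well-separated case: the matching product is already much closer.
loss_good = triplet_margin_loss(scene_crop, [0.1, 0.0], [1.0, 0.0])

# Violating case: the unrelated product is closer than the matching one,
# so the loss is positive and training would pull the pair together.
loss_bad = triplet_margin_loss(scene_crop, [1.0, 0.0], [0.1, 0.0])
```

Minimizing a loss of this shape over scene-product pairs pushes a scene crop and its clean product photo toward each other in embedding space, which is one way to absorb the domain shift described above.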
The third model is a re-ranking model that trains mainly on user engagement data and re-ranks the visually similar candidates so the results are optimized for engagement. The re-ranking model also takes in semantic and contextual information, such as the category of the scene image, the boards that frequently include this Pin, and other objects in the scene.
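The re-ranking idea can be sketched as a scoring function that blends visual similarity with contextual signals. Everything specific here is hypothetical: the feature names, the weights, and the linear form stand in for a model that would be learned from engagement data, which this post does not describe in detail.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    product_id: str
    visual_similarity: float  # higher is more similar, from the embedding step
    category_match: float     # 1.0 if the product fits the scene category
    board_affinity: float     # hypothetical signal from boards that include the Pin


# Hypothetical weights; a real model would learn these from engagement data.
WEIGHTS = {"visual_similarity": 0.5, "category_match": 0.2, "board_affinity": 0.3}


def engagement_score(c):
    """Weighted sum of the candidate's features (a stand-in for a learned model)."""
    return (WEIGHTS["visual_similarity"] * c.visual_similarity
            + WEIGHTS["category_match"] * c.category_match
            + WEIGHTS["board_affinity"] * c.board_affinity)


def rerank(candidates):
    """Order candidates by predicted engagement, highest first."""
    return sorted(candidates, key=engagement_score, reverse=True)


candidates = [
    Candidate("rug_a", visual_similarity=0.95, category_match=1.0, board_affinity=0.1),
    Candidate("rug_b", visual_similarity=0.90, category_match=1.0, board_affinity=0.8),
]
reranked = rerank(candidates)
```

In this toy example `rug_b` moves ahead of `rug_a` despite a slightly lower visual similarity, because its contextual signal is stronger.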
There are three steps in the serving stage.
When a user engages with a Pin, we first decompose the scene using the detection model. We restrict the search space by matching the category annotations from detection against those in the shopping corpus. Then we use visual similarity scoring (the distance between embeddings) in the shopping index to generate result candidates. Finally, we apply the re-ranking model to those candidates to fine-tune the results.
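The serving steps can be sketched end to end as follows. The detector is replaced by a stub that returns fixed detections, and the product index is a small in-memory list; `detect_objects`, `PRODUCT_INDEX`, and all IDs, categories, and vectors are hypothetical placeholders rather than Pinterest's actual components.

```python
import math


def detect_objects(pin_id):
    """Stub for the detection model: returns (category, embedding) per object."""
    return [("lamp", [0.9, 0.1]), ("couch", [0.1, 0.9])]


# Hypothetical shopping index entries: (product_id, category, embedding).
PRODUCT_INDEX = [
    ("lamp_1", "lamp", [0.8, 0.2]),
    ("lamp_2", "lamp", [0.2, 0.8]),
    ("couch_1", "couch", [0.1, 0.8]),
    ("couch_2", "couch", [0.9, 0.2]),
]


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shop_the_look(pin_id, top_k=1, rerank=lambda candidates: candidates):
    results = {}
    # Step 1: decompose the scene into detected objects.
    for category, query_embedding in detect_objects(pin_id):
        # Step 2: restrict to the detected category, then sort by embedding distance.
        in_category = [p for p in PRODUCT_INDEX if p[1] == category]
        nearest = sorted(in_category, key=lambda p: distance(query_embedding, p[2]))
        # Step 3: re-rank the closest candidates (identity function by default).
        results[category] = [p[0] for p in rerank(nearest[:top_k])]
    return results


matches = shop_the_look("example_pin")
```

In this toy index, `couch_2` is actually closer to the lamp query than `lamp_1` is, so the category restriction in step 2 is what keeps a couch from being returned for a lamp.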
This update brings more computer vision-powered results across Pinterest, showing visually similar ideas to more people. With more Shop the Look Pins in the system, Pinners can expect to see a much more consistent user experience across all home decor scenes.
In terms of internal operation, the automation frees people from doing repetitive work so they can spend more time being creative.
In the long term, scene images are a great resource for learning the relationships between objects, i.e., which objects complement each other or go well together in a certain style. We hope to leverage this rich data on object co-occurrence to build a sophisticated object graph for every object in the world, making Pinterest a personalized stylist for home, fashion and more.
You can expect to see more from us in shopping and visual search in the coming months!
Acknowledgements: Automating Shop the Look is a collaborative effort at Pinterest. Special thanks to Chuck Rosenberg, Andrew Zhai, Dmitry Kislyuk, Raymond Shiau, Eric Kim, Francis Chen, Jeffrey Harris, Angela Guo, Tim Weingarten, Jen Chan, Joyce Zha, and Amanda Strickler for the collaboration on this product.