As mentioned above, our approach was to recast what would typically be a classification problem as a similarity search problem. The basic steps followed were:
1. Train a Word2Vec model using Caviar’s restaurant menus as the corpus.
2. Convert each menu item into a vector using the Word2Vec model.
3. Curate a set of candidate tags and perform the remaining steps for each distinct set.
4. Convert each candidate tag into a vector using the Word2Vec model.
5. For each menu item vector, compare with each candidate tag vector and classify the menu item as the candidate tag that was most similar.
6. Optionally, filter out menu items whose most similar candidate tag was below a minimum threshold.
7. Validate the classification results via cluster visualization.
8. Select menu items for a given tag and display as a recommendation collection in the Caviar app.
Steps 1 & 2: Word2Vec Model + Vector Averaging
Word2Vec is a neural network word embedding technique that learns a vector space model from a corpus of text such that related words are closer in the space than non-related words. This allows for interesting operations like similarity comparisons and vector algebra on concepts. For our purposes, we’d like to see a vector space model similar to the following:
We used the Gensim package to train a Word2Vec model on a corpus of Caviar restaurant menus. We first tried a number of good pre-trained Word2Vec models based on large corpora such as Wikipedia and Google News, but we found they did not perform as well as our custom model trained on Caviar’s restaurant menus alone. It appears that food language in menus is qualitatively different from food language in general sources like encyclopedias and news. This is something we plan to explore more in the future.
One limitation of Word2Vec for our purposes is that it only deals with individual words. For our formulation to work, we need to be able to create fixed-length vectors for the many multi-word tags and menu items that we have (e.g., the tag “Indian Curry” and the menu item “Big Bob’s Chili Sombrero Burger”). There are advanced techniques for this such as Doc2Vec, but we found that simply averaging the vectors for each word in a phrase worked well in practice.
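The averaging step can be sketched in a few lines. This is a minimal illustration, not the code used at Caviar: the `phrase_vector` helper, the lowercase whitespace tokenization, and the three-dimensional toy vectors are all assumptions made for the example. In practice the lookup table would come from the trained Word2Vec model.

```python
from typing import Dict, List, Optional

Vector = List[float]


def phrase_vector(phrase: str, word_vectors: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of the in-vocabulary words in a phrase.

    Returns None when no word in the phrase is in the vocabulary.
    Tokenization (lowercase + whitespace split) is an assumption.
    """
    tokens = phrase.lower().split()
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy 3-dimensional vectors standing in for real Word2Vec embeddings.
toy_vectors = {
    "indian": [1.0, 0.0, 0.0],
    "curry": [0.0, 1.0, 0.0],
}

print(phrase_vector("Indian Curry", toy_vectors))  # [0.5, 0.5, 0.0]
```

Because every phrase maps to a vector of the same length as a single word vector, tags and menu items can be compared directly regardless of how many words each contains.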
Steps 3 & 4: Candidate Tag Selection
A key step in our approach is selecting the candidate tags. Our primary method is to compile sets of related tags such as cuisine types (e.g., “Pizza”, “Burger”, and “Thai Curry”) and dietary restrictions (e.g., “Vegetarian”, “Vegan”, and “Gluten Free”) where we expect most menu items to be best classified by only one of the tags in the set. We used cluster visualization to demonstrate this approach with cuisine types. An additional promising method is crafting individual multi-word tags that capture a broader concept (e.g., “Cake Cookies Pie”) and match menu items beyond just those listed in the tag (e.g., match cupcake and donut items for “Cake Cookies Pie”). Crafting these types of tags is more of an iterative process akin to coming up with a good search engine query. We explored this approach with a couple of in-app collections as shown in a later section.
Looking ahead to Step 7: Validation via Cluster Visualization
As mentioned in the introduction, we wanted to avoid the cost of supervised methods, specifically the creation of a ground truth set for training and validation. However, we still needed a way to validate our tags, so as a compromise, we leveraged interactive cluster visualization to do ad hoc manual validation instead. We adapted the TensorFlow Embedding Projector for this purpose:
Steps 5–7: Cuisine Type Visualization
By following the outlined steps for every menu item with cuisine types as the candidate tag set, we obtained a cosine similarity score between the menu item and each of the tags. We classified each menu item as the cuisine type with the highest similarity score. The following sequence of figures highlights the explorations and validations we performed on the resulting data.
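As a rough sketch of this classification step, the snippet below scores one menu item vector against each candidate tag vector with cosine similarity and keeps the best match. The function names and the two-dimensional toy vectors are hypothetical, chosen only to make the example self-contained.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(item_vector: Vector, tag_vectors: Dict[str, Vector]) -> Tuple[str, float]:
    """Return the most similar candidate tag and its similarity score."""
    scores = {tag: cosine_similarity(item_vector, vec) for tag, vec in tag_vectors.items()}
    best_tag = max(scores, key=scores.get)
    return best_tag, scores[best_tag]


# Toy vectors; real ones would be averaged Word2Vec embeddings.
tag_vectors = {"Pizza": [1.0, 0.0], "Fries": [0.0, 1.0]}
best_tag, best_score = classify([0.9, 0.1], tag_vectors)
print(best_tag)  # Pizza
```

Note that this always returns some tag, even when none of the candidates is a good fit, which is the source of the noise discussed next.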
In the following figure, clusters are colored by most similar tag with no minimum similarity threshold set. This is our “high recall” scenario. It’s quite noisy, and we see a number of misclassified menu items. In some cases, this is because the menu item is just hard to classify. In many cases, though, it is because the menu item’s true cuisine type is not present in our cuisine type candidate tag set (e.g., we didn’t include tags for “Kombucha”, “Cheesecake”, or “Coconut Water”), and with no minimum similarity threshold set, an incorrect naive classification is made.
The following figure demonstrates one of the many correct classifications:
The following figure demonstrates one of our misclassification scenarios. In this case, Tom Yum Noodle Soup is incorrectly classified as “Thai Curry”. This is harder to classify correctly due to Tom Yum being more closely related to Thai cuisine than to typical soups like chicken noodle and minestrone:
The following figure demonstrates our primary misclassification scenario. In this case, Gyros Plate is incorrectly classified as “Fries” due to the menu item’s cuisine type (e.g., “Mediterranean” or “Gyros”) not being present in our cuisine type candidate tag set:
In the following figure, clusters are colored by most similar tag with a minimum similarity threshold set (i.e., only classifications of 0.7 cosine similarity or greater are kept). This is our “high precision” scenario and it is much better! We see good visual separation via inspection, and we find almost no misclassified menu items beyond the ambiguous cases we discuss next:
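The thresholding step itself is simple to express. The sketch below assumes each menu item has already been assigned its best tag and similarity score; the `filter_by_threshold` name, the data layout, and the scores are invented for illustration, while the 0.7 cutoff is the value mentioned above.

```python
from typing import Dict, Tuple

Classification = Tuple[str, float]  # (best tag, cosine similarity)


def filter_by_threshold(
    classifications: Dict[str, Classification],
    min_similarity: float = 0.7,
) -> Dict[str, Classification]:
    """Keep only items whose best tag meets the minimum similarity."""
    return {
        item: (tag, score)
        for item, (tag, score) in classifications.items()
        if score >= min_similarity
    }


# Hypothetical scores, for illustration only.
classifications = {
    "Cheese Pizza": ("Pizza", 0.91),
    "Gyros Plate": ("Fries", 0.42),
}

kept = filter_by_threshold(classifications)
print(sorted(kept))  # ['Cheese Pizza']
```

Raising the threshold trades recall for precision: items whose true cuisine is missing from the candidate set tend to score low against every tag and are dropped rather than mislabeled.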
In the next figure, we wonder, “Is it pizza or is it fries?” The answer is “Both!” The Pizza Fries menu item has high scores for both the “Pizza” and “Fries” tags, with “Pizza” edging out “Fries”. That the menu item is located equidistantly from the two distinct clusters demonstrates one of the intuitive strengths of this method. There is no reason we couldn’t classify items like this with multiple best tags:
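One way such multi-tag classification might look is to keep every tag whose score falls within a small margin of the best one. This is a hypothetical extension rather than something described as implemented here; the `top_tags` helper, the 0.05 margin, and the scores are all assumptions.

```python
from typing import Dict, List


def top_tags(scores: Dict[str, float], margin: float = 0.05) -> List[str]:
    """Return all tags scoring within `margin` of the best tag, best first."""
    if not scores:
        return []
    best = max(scores.values())
    close = [tag for tag, score in scores.items() if score >= best - margin]
    return sorted(close, key=lambda tag: scores[tag], reverse=True)


# Hypothetical similarity scores for the Pizza Fries menu item.
pizza_fries_scores = {"Pizza": 0.82, "Fries": 0.80, "Sushi": 0.21}
print(top_tags(pizza_fries_scores))  # ['Pizza', 'Fries']
```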
In the next two figures, clusters are labeled with the classified tag rather than menu item name. The first figure is colored by the classified tag and the second figure is colored by similarity to the “Pizza” tag. In the first figure, we see “Pasta” and “Pie” items are closer to “Pizza” than other less similar items like “Sushi” or “Dumplings”. In the second figure, thanks to the similarity gradient, we can easily see a range from “Pizza” to “Not Pizza”. This is another demonstration of the intuitive mapping between the spatial arrangements and the Word2Vec-based similarities that allowed us to perform ad hoc validations on our results:
Step 8: Automated Recommendation Collections
Our ultimate goal with this work was to automate menu item recommendation collections from the Word2Vec-based taggings, and we’ve already implemented a few examples. The following figures demonstrate collections for both simpler cuisine type tags and more advanced multi-word concept tags.
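Selecting items for a collection can be sketched as a filter and sort over the classification results. How the app actually ranks and limits items is not stated, so ordering by similarity score and capping with `limit` are assumptions, as are the `build_collection` name and the scores below.

```python
from typing import Dict, List, Tuple

Classification = Tuple[str, float]  # (best tag, cosine similarity)


def build_collection(
    classifications: Dict[str, Classification],
    tag: str,
    limit: int = 10,
) -> List[str]:
    """Return up to `limit` items classified as `tag`, most similar first."""
    matches = [
        (item, score)
        for item, (item_tag, score) in classifications.items()
        if item_tag == tag
    ]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in matches[:limit]]


# Hypothetical scores, for illustration only.
classifications = {
    "Calabria Pizza": ("Pizza", 0.81),
    "Cheese Pizza": ("Pizza", 0.93),
    "Panang Curry": ("Thai Curry", 0.90),
    "Chicken Pumpkin Curry": ("Thai Curry", 0.78),
}

print(build_collection(classifications, "Pizza"))
# ['Cheese Pizza', 'Calabria Pizza']
```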
In the following figure, we show recommendation collections for the “Pizza” and “Thai Curry” tags. These are very promising, showing a range of items from the standard (e.g., Cheese Pizza and Panang Curry) to the exciting (e.g., Calabria Pizza and Chicken Pumpkin Curry):
In the following figure, we show recommendation collections based on the interesting approach of crafting multi-word concept tags. We used “Cake Cookies Pie” as the tag for the “For Your Sweet Tooth” concept collection and “Tikka Tandoori Biryani” as the tag for the “North Indian Fare” concept collection. These too are very promising. In the “For Your Sweet Tooth” collection, we see items beyond the “Cake Cookies Pie” tag such as cupcakes, donuts, ice cream, and even an ice cream scoop. In the “North Indian Fare” concept collection, we see items beyond the “Tikka Tandoori Biryani” tag such as Saag Paneer and Itsy Bitsy Naan Bites: