Figure 1: Examples of personalization tasks that can be solved by GDMix
Introduction to personalization
Personalization in the context of ranking models for search and recommender systems means ranking items according to the interests of an individual or a specific group. This technique is widely used in the social networking and e-commerce industries to improve user satisfaction.
Personalization exists in many LinkedIn products. When a member does a search on linkedin.com, the results are generated by considering the member’s features, such as network connections, past interactions with other members, past or current companies and colleagues, etc. These personal signals help to retrieve relevant documents and rank them in the right order. The value of personalization is more obvious when the member’s intent is implicit, e.g., no query is given. For example, LinkedIn’s People You May Know and Jobs You May Be Interested In are two products where the member’s profile, networks, and activities on LinkedIn are used to generate a list of relevant member profiles or job postings.
One method to create personalized models is to include features that reflect individuality. Let’s consider job recommendations for two of our members, Alice and Bob. They both live in the San Francisco Bay Area and have similar profiles. Both are recent college graduates with a bachelor’s degree in computer science. Alice wants to stay close to home, while Bob isn’t opposed to relocation. If the geo-matching between profile location and job location is a feature in the job ranking model, then this feature should carry different weights for Alice and Bob. This, however, is impossible in a single global model, because the model assigns only one weight to each feature. Personalized features come to the rescue in this instance.
We can achieve personalized features by crossing the existing features with entity IDs, resulting in a set of new features specific to the entity IDs. In the example above, if we cross the geo-matching feature with member ID, we arrive at two features: “Alice_geo-matching” and “Bob_geo-matching.” It is now possible to assign two different weights for this feature.
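As a minimal sketch of the crossing step described above, the snippet below prefixes each feature name with an entity ID so that every member gets a private copy of the feature. The helper names (`cross_with_entity`, `score`) and the weight values are hypothetical and chosen only for illustration; they are not part of GDMix.

```python
from typing import Dict


def cross_with_entity(entity_id: str, features: Dict[str, float]) -> Dict[str, float]:
    """Prefix every feature name with the entity ID, giving the entity its own copy."""
    return {f"{entity_id}_{name}": value for name, value in features.items()}


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear score: sum of weight * value over the features that have a weight."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# The same global feature, observed for a local job shown to each member.
job_features = {"geo-matching": 1.0}

alice_features = cross_with_entity("Alice", job_features)
bob_features = cross_with_entity("Bob", job_features)

# Hypothetical learned weights: one weight per crossed feature,
# so the same signal can matter a lot for Alice and little for Bob.
weights = {"Alice_geo-matching": 2.0, "Bob_geo-matching": 0.1}

alice_score = score(alice_features, weights)
bob_score = score(bob_features, weights)
```

Because the crossed names differ, the two weights are learned independently even though both derive from the same underlying geo-matching signal.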
On the surface, personalization at the finest granularity is simply solved by crossing the entity IDs with existing features, such that each entity gets a copy of all the features. This approach, however, does not scale. For the job recommendation example, with more than 700 million members and 100 features per job, we end up with a model of 70 billion features. A model of this size cannot be trained easily, despite recent advances in computer hardware. GDMix provides a solution to train these models efficiently.
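The feature count quoted above follows from a single multiplication. The short check below reproduces it using the two figures from the text; the variable names are illustrative.

```python
num_members = 700_000_000   # "more than 700 million members"
features_per_job = 100      # features crossed with each member ID

# Every member gets a private copy of every feature.
crossed_features = num_members * features_per_job

print(f"{crossed_features:,}")  # prints 70,000,000,000
```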
Mixed model: Fixed effects and random effects
Before we dive into the details of GDMix, let’s first understand what a mixed model is and how it is related to personalization.
A mixed model is a statistical model containing both fixed effects and random effects. The fixed effects set the global trend and the random effects account for individuality. Let’s go back to the job recommendation example for Alice and Bob. Both of them have “TensorFlow” and “machine learning” listed in their skills. A fixed effect model predicts that “machine learning software engineering” jobs are good matches for them. It prevents us from sending them irrelevant recommendations such as “sales” jobs. The random effect models learn from their past activities that Alice clicked local job postings exclusively, while Bob was not concerned with the job location. These models identify that difference and rank local jobs higher in recommendations to Alice while discounting the importance of job location in recommendations to Bob. It is the combination of fixed effects and random effects that ensures high-quality, personalized results.
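To make the combination concrete, here is a small sketch in which the final score is the sum of a shared fixed-effect term and a per-member random-effect term, passed through a sigmoid. All feature names and weight values are hypothetical, and the logistic form is an assumption made for illustration rather than a description of GDMix internals.

```python
import math

# Hypothetical fixed-effect weights, shared by all members (the global trend).
fixed_weights = {"skill_match": 2.0, "geo_match": 0.5}

# Hypothetical per-member random-effect weights (the individual adjustments).
random_weights = {
    "Alice": {"geo_match": 1.5},   # Alice strongly prefers local jobs
    "Bob": {"geo_match": -0.5},    # cancels the global geo weight for Bob
}


def mixed_score(member: str, features: dict) -> float:
    """Click probability from fixed-effect logit plus the member's random-effect logit."""
    fixed = sum(fixed_weights.get(k, 0.0) * v for k, v in features.items())
    member_weights = random_weights.get(member, {})
    rand = sum(member_weights.get(k, 0.0) * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-(fixed + rand)))


local_ml_job = {"skill_match": 1.0, "geo_match": 1.0}
remote_ml_job = {"skill_match": 1.0, "geo_match": 0.0}

alice_local = mixed_score("Alice", local_ml_job)
alice_remote = mixed_score("Alice", remote_ml_job)
bob_local = mixed_score("Bob", local_ml_job)
bob_remote = mixed_score("Bob", remote_ml_job)
```

With these made-up weights, both members score machine learning jobs highly because of the shared skill weight, Alice scores the local job above the remote one, and Bob scores the two equally.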
In the job recommendation example above, we arrived at a model of 70 billion features. GDMix offers an efficient solution to train this model by taking a parallel blockwise coordinate descent approach (Figure 2). The fixed effects and random effects can be regarded as “coordinates.” During each optimization step, we optimize one coordinate at a time and keep the rest constant. By iterating over all the coordinates a few times, we arrive at a solution that is close to the solution of the original problem. Within a random effect, the per-entity models are independent of each other, so we can train them in parallel. In the end, we break down the 70-billion-feature model into 700 million small models that are much easier to tackle individually.
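The alternating scheme described above can be sketched on a toy problem. The example below is a simplified illustration, not GDMix code: it assumes a squared-error loss with ridge regularization (chosen so each block has a closed-form solution), synthetic data, and a plain sequential loop where a real system would train the per-entity models in parallel. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_per_entity, d_fixed, d_random = 20, 50, 5, 2
n = n_entities * n_per_entity

# Synthetic data: global features X, per-entity features Z, entity index per row.
entity = np.repeat(np.arange(n_entities), n_per_entity)
X = rng.normal(size=(n, d_fixed))
Z = rng.normal(size=(n, d_random))
w_true = rng.normal(size=d_fixed)
b_true = rng.normal(size=(n_entities, d_random))
y = X @ w_true + np.sum(Z * b_true[entity], axis=1) + 0.1 * rng.normal(size=n)


def ridge_fit(A, target, l2=1.0):
    """Closed-form ridge regression: solve (A^T A + l2 * I) coef = A^T target."""
    return np.linalg.solve(A.T @ A + l2 * np.eye(A.shape[1]), A.T @ target)


# Baseline: fixed effect only, no personalization.
fixed_only_mse = float(np.mean((y - X @ ridge_fit(X, y)) ** 2))

w = np.zeros(d_fixed)                  # fixed-effect coordinate
b = np.zeros((n_entities, d_random))   # random-effect coordinate (one row per entity)
mse_history = []

for sweep in range(5):
    # Coordinate 1: fit the fixed effect on what the random effects do not explain.
    random_pred = np.sum(Z * b[entity], axis=1)
    w = ridge_fit(X, y - random_pred)
    fixed_pred = X @ w

    # Coordinate 2: fit each entity's small model on the fixed-effect residual.
    # The per-entity fits are independent, so they could run in parallel.
    for e in range(n_entities):
        rows = entity == e
        b[e] = ridge_fit(Z[rows], y[rows] - fixed_pred[rows])

    pred = fixed_pred + np.sum(Z * b[entity], axis=1)
    mse_history.append(float(np.mean((y - pred) ** 2)))
```

Each sweep touches one block while the other is held constant, and the large joint problem is never solved directly: the only systems solved are one of size `d_fixed` and many of size `d_random`.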
Besides per-entity random effects, GDMix also supports training per-cohort random effects. A cohort is a group of entities that share certain characteristics. For example, all members in a geographical location can be regarded as one cohort. The difficulty with per-cohort random effects is that the number of training examples per cohort is usually much larger than the number per entity. GDMix can combine multiple cohorts and solve for their models by using the fixed effect solver.