where $x_m$ and $x_j$ are the member and job feature vectors, $f_{\text{global}}$ is a global model, $f_m(x_j)$ is a per-member model trained on the jobs that the member applied to, and, similarly, $f_j(x_m)$ is a per-job model trained on the members who interacted with that job. The model belongs to the class of generalized additive mixed models.
We used linear models for $f_m$ and $f_j$, but any model can be used in the above formulation as long as its scores are calibrated to output log-odds (a neural net, for example). Linear models are usually sufficient for the per-member and per-job components, because individual members and individual jobs do not have enough interactions to train more complex nonlinear models.
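To make the additive structure concrete, here is a minimal scoring sketch. It assumes the global model scores the (member, job) pair and that all three components add on the log-odds scale, as the calibration requirement above implies. The names `linear` and `predict_proba`, the dictionary lookup by ID, and the choice to contribute 0 for members or jobs without a personalized model are all hypothetical illustrations, not the production implementation.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
# Any model whose output is a calibrated log-odds score (linear, neural net, ...).
LogOdds = Callable[[Vector], float]


def linear(weights: Vector, bias: float = 0.0) -> LogOdds:
    """Linear scorer b + w.x, whose output is interpreted as log-odds."""
    def score(x: Vector) -> float:
        return bias + sum(w * xi for w, xi in zip(weights, x))
    return score


def predict_proba(
    x_m: Vector, x_j: Vector, member_id: str, job_id: str,
    f_global: Callable[[Vector, Vector], float],
    per_member: Dict[str, LogOdds], per_job: Dict[str, LogOdds],
) -> float:
    """Sum the three log-odds components and map the total through a sigmoid."""
    logit = f_global(x_m, x_j)
    # Hypothetical cold-start rule: a missing per-member/per-job model adds 0.
    if member_id in per_member:
        logit += per_member[member_id](x_j)  # f_m(x_j): per-member model on job features
    if job_id in per_job:
        logit += per_job[job_id](x_m)        # f_j(x_m): per-job model on member features
    return 1.0 / (1.0 + math.exp(-logit))


# Usage with toy weights (illustrative values only).
global_w = linear([0.5, -0.25, 1.0])
f_global = lambda x_m, x_j: global_w(list(x_m) + list(x_j))
per_member = {"m1": linear([2.0], bias=-0.5)}
per_job = {"j1": linear([0.1, 0.2])}
p = predict_proba([1.0, 2.0], [0.5], "m1", "j1", f_global, per_member, per_job)
```

Because each component only needs to emit log-odds, swapping a linear per-member model for a nonlinear one changes nothing else in the scoring path.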
These three components (global, per-member, and per-job) are trained in a loop using Photon ML and the following algorithm. Within a single training iteration, and conditional on the scores produced by the other components, the many per-member and per-job models are independent of one another, so each iteration is embarrassingly parallel and therefore easy to distribute.
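A minimal sketch of such a loop, written as generic coordinate descent (backfitting) in plain NumPy. This is not Photon ML's API: the function names, the Newton solver, the L2 penalty, and the fixed number of passes are all assumptions for illustration. Each component is fit as a logistic regression whose logit is offset by the current scores of the other two components, which is exactly what makes every per-member fit (and every per-job fit) independent within a pass.

```python
import numpy as np


def fit_logreg_with_offset(X, y, offset, l2=1.0, iters=25):
    """L2-regularized logistic regression with logit = offset + X @ w (Newton's method)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(offset + X @ w)))
        grad = X.T @ (p - y) + l2 * w
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + l2 * np.eye(X.shape[1])
        w = w - np.linalg.solve(hess, grad)
    return w


def train_game(X_global, X_member, X_job, member_ids, job_ids, y, n_passes=3):
    """Coordinate descent over the global, per-member, and per-job components.

    X_global: features seen by the global model (e.g. [x_m, x_j]).
    X_member: features seen by the per-member models (job features x_j).
    X_job:    features seen by the per-job models (member features x_m).
    """
    n = len(y)
    s_global, s_member, s_job = np.zeros(n), np.zeros(n), np.zeros(n)
    member_w, job_w = {}, {}
    for _ in range(n_passes):
        # Global model: all rows, with the per-member/per-job scores fixed as offsets.
        w_global = fit_logreg_with_offset(X_global, y, s_member + s_job)
        s_global = X_global @ w_global
        # Per-member models: each sees only its own rows and depends only on the
        # fixed s_global and s_job, so this loop could run fully in parallel.
        for m in np.unique(member_ids):
            rows = member_ids == m
            member_w[m] = fit_logreg_with_offset(
                X_member[rows], y[rows], s_global[rows] + s_job[rows])
            s_member[rows] = X_member[rows] @ member_w[m]
        # Per-job models: same pattern, offset by the global and per-member scores.
        for j in np.unique(job_ids):
            rows = job_ids == j
            job_w[j] = fit_logreg_with_offset(
                X_job[rows], y[rows], s_global[rows] + s_member[rows])
            s_job[rows] = X_job[rows] @ job_w[j]
    return w_global, member_w, job_w, s_global + s_member + s_job
```

Each step reuses the freshest scores from the steps before it, so one pass is a Gauss-Seidel sweep over the three components; the inner loops are where a distributed implementation would fan out.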
While the global model is trained on all data, each per-member model is trained only on that member’s recent job applications, and each per-job model only on that job’s recent applicants. For this approach to work, the data must be sufficiently dense across both members and jobs.
Our analysis showed that the majority of job applicants apply to at least 5 jobs and the majority of job postings receive at least 10 applicants, which proved to be enough data to train the personalization models.
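A density check like the one described can be sketched as follows. The thresholds 5 and 10 come from the text; the `density_report` function and its input format (deduplicated `(member_id, job_id)` pairs, assumed to be already restricted to the recent training window) are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def density_report(
    applications: Iterable[Tuple[Hashable, Hashable]],
    min_jobs_per_member: int = 5,
    min_applicants_per_job: int = 10,
) -> dict:
    """Share of members and of jobs that clear the interaction thresholds.

    `applications` holds (member_id, job_id) pairs from the recent window;
    duplicate pairs are counted once.
    """
    pairs = set(applications)
    if not pairs:
        raise ValueError("no applications in the window")
    jobs_per_member = Counter(m for m, _ in pairs)
    applicants_per_job = Counter(j for _, j in pairs)
    members_ok = sum(c >= min_jobs_per_member for c in jobs_per_member.values())
    jobs_ok = sum(c >= min_applicants_per_job for c in applicants_per_job.values())
    return {
        "member_share": members_ok / len(jobs_per_member),
        "job_share": jobs_ok / len(applicants_per_job),
    }
```

The "majority" condition in the text corresponds to both `member_share` and `job_share` exceeding 0.5.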
The personalized QA model improved the offline evaluation metric, the area under the ROC curve (AUC), by 27%. We observed similar improvements in NDCG (normalized discounted cumulative gain) in learning-to-rank settings. Our baseline was the previously deployed gradient boosted tree model, trained on the same dataset with the same features. We also estimated that the contribution of the per-member and per-job models varies greatly by use case (see the “Applications and Impact” section below): for some datasets, the per-member models drive most of the gain, while for others the per-job models matter more.
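For reference, here is a small sketch of the two offline metrics: AUC computed via the Mann-Whitney rank statistic (tied scores receive average ranks) and NDCG@k with the common `2^rel - 1` gain and a `log2` position discount. The text does not say which NDCG variant or cutoff was used, so those choices are assumptions.

```python
import math
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC from the Mann-Whitney U statistic; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, l in zip(ranks, labels) if l == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """NDCG@k for relevance labels listed in the model's ranked order."""
    def dcg(rels: Sequence[float]) -> float:
        return sum((2 ** r - 1) / math.log2(pos + 2) for pos, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```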
Continuous learning via automated daily updates
The problem of model freshness
Models trained on high-velocity data can go stale quickly and may require frequent retraining. This is the case with the QA model’s personalization components, which are trained on hirer engagement labels. In fact, a per-member personalized model’s advantage over the baseline halves after only three weeks without updates, and for per-job models the decay is even faster. Frequent updates are necessary to maintain the highest possible gain over the baseline.
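The text gives a single data point: the per-member advantage halves after about three weeks. If one additionally assumes the advantage decays exponentially (an assumption, not something the text states), that half-life translates directly into a retraining budget, as in this sketch. The function names and the 21-day constant are illustrative.

```python
import math

# From the text: the per-member gain halves after ~3 weeks without updates.
# Assumption: the decay is exponential, so 21 days is a half-life.
HALF_LIFE_DAYS = 21.0


def remaining_gain(days_since_update: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Fraction of the original gain left after `days_since_update` days."""
    return 0.5 ** (days_since_update / half_life)


def max_retrain_interval(min_fraction: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Longest gap (in days) between updates that keeps at least `min_fraction` of the gain."""
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")
    return half_life * math.log2(1.0 / min_fraction)
```

Under that assumption, daily updates would retain about 97% of the per-member gain between refreshes, while keeping 90% of it would allow roughly three days between updates. Per-job models decay faster, so they would need a shorter half-life, whose value the text does not give.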