Figure 1: A screenshot from the LinkedIn Recruiter product, some details redacted
For representative results, our focus is on the set of potential candidates, and their relative ordering, shown to a recruiter in response to a given search query. Recruiter will still return the same set of qualified candidates in response to any particular search; no one is added or removed. Recruiter ranking already varies depending on a number of factors, including candidates’ interest in a particular employer and their preferred work location, to optimize for our customers’ success. Many of the technical details about the LinkedIn search stack have been described in prior blog posts.
Understanding representativeness of a ranked list
We next explain our attempts to measure the representativeness of the ranked lists generated by our talent search and recommendation systems.
Intuition underlying representativeness measurement
Our measurement and mitigation approach assumes that, in the ideal setting, the set of qualified candidates and the set of top-ranked results for a search request should have the same distribution over the attribute of interest; i.e., the ranked/recommended list should be representative of the qualified list. This assumption can be thought of as based on the definition of equal opportunity from prior research in machine learning. In mathematical terms, a predictor function is said to satisfy equal opportunity with respect to an attribute and a true outcome if the predictor and the attribute are independent conditional on the true outcome being 1 (favorable).
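For a binary predictor, the equal opportunity condition above can be written out as follows. The symbols are introduced here for illustration only and do not appear elsewhere in this post: \hat{Y} denotes the predictor, A the attribute, and Y the true outcome.

```latex
\[
  P(\hat{Y} = 1 \mid A = a,\; Y = 1) \;=\; P(\hat{Y} = 1 \mid A = a',\; Y = 1)
  \quad \text{for all attribute values } a, a'.
\]
```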
In our setting, we assume that the LinkedIn members that match the criteria specified by recruiters in a search request are “qualified” for that search request. We can roughly map our setting to the above definition as follows. The predictor function corresponds to whether a candidate is presented in the top-ranked results for the search request, while the true outcome corresponds to whether a candidate matches the search request criteria (or equivalently, is “qualified” for the search request). Satisfying the above definition means that whether or not a member is included in the top-ranked results does not depend on the attribute of interest, or, equivalently, that the proportion of members having a given value of the attribute does not vary between the set of qualified candidates and the set of top-ranked results. This requirement can also be thought of as seeking statistical parity between the set of qualified candidates and the set of top-ranked results.
In other words, the operating definition of “equal” or “fair” here is that the ranked results are representative of the qualified population in terms of the attribute of interest (in the scope of this work, inferred gender). We define the qualified population to be the set of candidates (LinkedIn members) that match the criteria set forth in the recruiter’s query (e.g., if the query is “Skill:Java,” then the qualified population is the set of members that are familiar with computer programming using the Java programming language).
As mentioned above, our measures for evaluating representativeness are based on the assumption that the distribution of the attribute of interest for the top-ranked candidates in a search query should ideally reflect the corresponding distribution over the set of qualified candidates. Our main measure computes the skew for the attribute in question by comparing the proportion of candidates having an attribute value among the set of highest-ranked candidates to the corresponding proportion among the set of qualified candidates.
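As a minimal sketch of such a skew measure, the snippet below compares the two proportions using a log ratio. The log ratio, the function names, and the input format (plain lists of attribute values) are assumptions made for illustration; the post itself does not specify the exact formula.

```python
import math
from collections import Counter


def proportion(values, target):
    """Fraction of entries in `values` equal to `target`."""
    return Counter(values)[target] / len(values)


def skew_at_k(ranked, qualified, target, k):
    """Log ratio of the share of `target` in the top k to its share among qualified.

    `ranked` and `qualified` are lists of attribute values (hypothetical format).
    0 means the top k mirrors the qualified set for this value; negative values
    mean under-representation, positive values over-representation.
    """
    p_top = proportion(ranked[:k], target)
    p_qualified = proportion(qualified, target)
    if p_qualified == 0:
        raise ValueError("target value does not occur in the qualified set")
    if p_top == 0:
        return float("-inf")  # value is entirely absent from the top k
    return math.log(p_top / p_qualified)
```

A skew of zero for every attribute value at a given k indicates that the top k results match the qualified population on that attribute.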
Next, we present our approach for representative ranking, which is designed to ensure that the top search results reflect the distribution of the attribute of interest (inferred gender, in our context) in the underlying talent pool, while also taking into account the scores assigned by our machine-learned models to the potential candidates.
Representative Talent Search ranking
The main idea behind our approach for generating representative results in Talent Search is to perform a post-processing step, wherein we re-rank the set of candidates retrieved by the machine-learned model. A high-level overview of the representative ranking algorithm is as follows:
1. Partition the set of potential candidates into different gender buckets.
2. Rank the candidates in each gender bucket according to the scores assigned by the machine-learned model.
3. Merge the gender buckets, while obeying representation constraints based on the gender proportions computed from the set of qualified candidates. The merging is designed to keep the gender proportions in the ranked list similar to the corresponding proportions in the set of qualified candidates for every index of recommendation (e.g., for the top result, for the top two results, for the top five results, and so on). Note that the merging preserves the score-based order of candidates within each gender bucket.
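The three steps above can be sketched as a greedy merge. This is an illustrative sketch, not the production algorithm: the function name, the tuple format, and the specific constraint (keeping each bucket's count in the top k between floor(p·k) and ceil(p·k), where p is the bucket's proportion among qualified candidates) are assumptions chosen to make the idea concrete.

```python
import math


def representative_rerank(candidates, target_proportions):
    """Re-rank (candidate_id, bucket, score) tuples toward target proportions.

    `target_proportions` maps each bucket to its share of the qualified set.
    Both the input format and the floor/ceil constraint are assumptions.
    """
    # Steps 1 and 2: partition into buckets, then sort each by score (descending).
    buckets = {}
    for cand in candidates:
        buckets.setdefault(cand[1], []).append(cand)
    for bucket in buckets.values():
        bucket.sort(key=lambda c: c[2], reverse=True)

    # Step 3: merge while keeping proportions close to target at every prefix.
    taken = {b: 0 for b in buckets}  # candidates already placed, per bucket
    ranked = []
    for k in range(1, len(candidates) + 1):
        available = [b for b in buckets if taken[b] < len(buckets[b])]
        # Buckets that would fall below their minimum count at position k.
        below_min = [b for b in available
                     if taken[b] < math.floor(target_proportions.get(b, 0.0) * k)]
        # Buckets that can still accept a candidate without exceeding their maximum.
        below_max = [b for b in available
                     if taken[b] < math.ceil(target_proportions.get(b, 0.0) * k)]
        pool = below_min or below_max or available
        # Among eligible buckets, take the one whose next candidate scores highest.
        best = max(pool, key=lambda b: buckets[b][taken[b]][2])
        ranked.append(buckets[best][taken[best]])
        taken[best] += 1
    return ranked
```

Because each bucket is consumed strictly in score order, the score-based ordering within a bucket is preserved in the merged list, as the third step requires.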
We adopted the post-processing approach due to the following practical considerations. First, this approach is agnostic to the specifics of each model and therefore scalable across different model choices for our application. Second, this approach is easier to incorporate as part of existing systems, since we can build a stand-alone service/component for post-processing without significant modifications to the existing components. Finally, our approach aims to ensure that the search results presented to the users of LinkedIn Recruiter are representative of the underlying talent pool.
We compute the gender proportions in the set of qualified candidates as follows. First, we use LinkedIn’s Galene search engine to obtain the set of qualified candidates that match the criteria specified as part of the search query by the user. We then compute the empirical distribution over the genders to determine the required gender representation constraints. We use this distribution both to compute our representation metrics (after the fact, for evaluation) and to apply re-ranking (during recommendation). Next, we present the architecture for our ranking system.
Figure 2 details our two-tiered ranking architecture for achieving gender-representative ranking for LinkedIn Talent Search systems. While the details of our machine learning models optimized for two-way (Recruiter-Candidate) interest are out of the scope of this post, we direct the interested reader to some of our team’s research papers (BigData’15, WWWCompanion’16, SIGIR’18, CIKM’18–1, CIKM’18–2) describing how we train our candidate ranking models.