Using the LinkedIn Fairness Toolkit in large-scale AI systems

Figure 1: Bias in recommendation systems can come in through different mechanisms and be potentially reinforced over time.

So how can we ensure that PYMK fairly represents members from both groups and avoids reinforcing existing biases in networking behavior?

The first step is choosing a robust definition of fairness. Three of the most widely used fairness definitions are equality of opportunity, equalized odds, and predictive rate parity. Equality of opportunity suggests that randomly chosen “qualified” candidates should be represented equally regardless of which group they belong to; in other words, the exposure of qualified candidates from any group should be equal. Equalized odds takes this definition a step further and requires that both “qualified” and “unqualified” candidates be treated similarly across groups: qualified candidates receive equal exposure regardless of group, and so do unqualified candidates. Predictive rate parity ensures that the score from the algorithm predicts a candidate’s “quality” with equal precision across groups. These definitions can conflict with one another, so the right one to choose is often use-case specific. A more complete discussion of the considerations to make when choosing a fairness metric is given here.
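To make the three definitions concrete, the sketch below computes the per-group quantity that each one compares. It is a minimal illustration and not LiFT code: the function names, the binary "qualified" labels, the binary "shown" decisions, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def safe_div(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def group_rates(labels, preds, groups):
    """Per-group TPR, FPR, and precision from binary labels and decisions.

    labels: 1 if the candidate is "qualified", else 0 (assumed ground truth)
    preds:  1 if the candidate was shown/recommended, else 0
    groups: group name for each candidate
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for y, p, g in zip(labels, preds, groups):
        # "t"/"f": decision matches the label or not; "p"/"n": decision value.
        outcome = ("t" if y == p else "f") + ("p" if p == 1 else "n")
        counts[g][outcome] += 1

    rates = {}
    for g, c in counts.items():
        rates[g] = {
            # Equality of opportunity compares this across groups.
            "tpr": safe_div(c["tp"], c["tp"] + c["fn"]),
            # Equalized odds compares both TPR and FPR across groups.
            "fpr": safe_div(c["fp"], c["fp"] + c["tn"]),
            # Predictive rate parity compares precision across groups.
            "precision": safe_div(c["tp"], c["tp"] + c["fp"]),
        }
    return rates


# Toy data: four infrequent members (IM) followed by four frequent members (FM).
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["IM"] * 4 + ["FM"] * 4

rates = group_rates(labels, preds, groups)
for group, values in rates.items():
    print(group, values)
```

In this toy data the qualified IMs are shown half as often as the qualified FMs (a true positive rate of 0.5 versus 1.0), so equality of opportunity is violated. The false positive rates and precisions also differ between the groups, so equalized odds and predictive rate parity are violated as well.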

We have been working towards mitigation techniques for each of these definitions, hoping to provide practitioners with the tools they need for their applications. Fairness mitigation strategies commonly fall into one of three categories: pre-, in-, or post-processing.

  • Pre-processing techniques massage the training data used to develop models in hopes that reducing bias at this stage will lead to a fair model.  

  • In-processing involves modifying model training algorithms to produce models that yield unbiased results.  

  • Post-processing methods, of which our re-rankers are examples, transform the scores produced by a model so that the results satisfy the chosen fairness definition.

Post-processing mitigation approaches have the advantage of being model agnostic, in the sense that they depend only on the scores a model provides. This flexibility lets engineers adjust the output of virtually any model to be fair, whereas other approaches are more application-specific. In 2018, we used post-processing re-ranking in LinkedIn Recruiter search to ensure that each page of results is gender-representative. Since this initial foray into post-processing re-ranking based on exposure, we have developed and tested methods to re-rank according to equality of opportunity and equalized odds, which we have applied to the PYMK problem of fairly representing infrequent and frequent members.
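As a rough illustration of why post-processing is model agnostic, the sketch below re-ranks a scored candidate list using only each candidate's score and group. It is a simplified greedy re-ranker based on exposure, written for this example; it is not the algorithm used in LinkedIn Recruiter or PYMK, and the function name `rerank`, the tuple layout, and the target proportions are assumptions.

```python
import math


def rerank(candidates, targets):
    """Greedy exposure-based re-ranking (illustrative sketch only).

    candidates: list of (item_id, group, score) tuples from any model
    targets:    dict mapping group -> desired share of every ranking prefix

    At each position k, any group holding fewer than floor(target * k) slots
    so far gets priority; otherwise the best remaining score wins.
    """
    # One queue per group, each sorted by descending score.
    queues = {}
    for cand in sorted(candidates, key=lambda c: c[2], reverse=True):
        queues.setdefault(cand[1], []).append(cand)

    counts = {group: 0 for group in queues}
    ranked = []
    for k in range(1, len(candidates) + 1):
        # Groups that are below their minimum at prefix length k
        # and still have candidates left.
        behind = [
            g for g, q in queues.items()
            if q and counts[g] < math.floor(targets.get(g, 0.0) * k)
        ]
        pool = behind if behind else [g for g, q in queues.items() if q]
        # Among eligible groups, take the one whose top candidate scores highest.
        best = max(pool, key=lambda g: queues[g][0][2])
        ranked.append(queues[best].pop(0))
        counts[best] += 1
    return ranked


candidates = [
    ("a", "FM", 0.9), ("b", "FM", 0.8), ("c", "FM", 0.7),
    ("d", "IM", 0.6), ("e", "IM", 0.5), ("f", "FM", 0.4),
]
ranked = rerank(candidates, {"FM": 0.5, "IM": 0.5})
print([item_id for item_id, _, _ in ranked])
```

Sorting by score alone would place all three top FM candidates ahead of any IM candidate. With equal targets, the re-ranker interleaves the groups in the top positions while keeping the score order within each group. Because it reads only scores and group labels, the same function could sit behind any model.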

In PYMK, we chose to actively prevent the “rich-get-richer” phenomenon by giving qualified IMs and FMs equal representation in recommendations. In doing so, we saw more invites sent to IMs without any adverse impact on FMs: +5.44% in invitations sent to IMs and +4.8% in connections made by IMs, while remaining neutral on the respective metrics for FMs. This is an interesting result because typically, when invites are shifted from the FM group to the IM group, we would expect to see a metric increase for the latter and a decrease for the former. However, we observed neutral metrics for FMs and positive metrics for IMs, which indicates that recommendation quality has improved overall.

Of course, our fairness detection and mitigation efforts have extended beyond this illustrative example of fairness for frequent and infrequent members. A primary function of LiFT as used within LinkedIn will be to measure fairness to groups identified through protected attributes (such as age and gender). These applications come with additional privacy concerns, and we discuss one method for improving the anonymity of our system below.

Keeping demographic data anonymous
When dealing with any aspect of member data, it is of utmost importance to maintain the privacy of our members, especially their protected attributes. A core consideration for using LiFT internally has been developing a system that can provide all of our AI teams with insight into the fairness of their models without allowing each individual team access to protected attribute data.

To solve this problem, we employ a simple client-server architecture, where the fairness evaluation is performed on a server that has access to Personally Identifiable Information (PII) containing protected attribute data. Each AI team (the client side) is provided the tool as a pluggable component, which an AI engineer can configure to submit a model evaluation request. The server processes the request and returns the analysis result to the client without exposing the protected attribute information to the client. The server runs the fair analyzer library that supports LiFT. With this setup, member privacy is respected, in keeping with our ongoing commitment to responsible behavior.
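The separation described above can be sketched in a few lines. This is a hypothetical, in-process illustration of the idea and not LinkedIn's service: the class names `EvaluationRequest` and `FairnessServer`, the request fields, and the choice of per-group mean score as the reported statistic are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationRequest:
    """What a client team submits: member ids and model scores only."""
    member_ids: list
    scores: list


class FairnessServer:
    """Holds protected attributes and returns only group-level aggregates."""

    def __init__(self, protected_attributes):
        # member_id -> group; kept on the server and never returned to clients.
        self._attrs = dict(protected_attributes)

    def evaluate(self, request):
        totals = defaultdict(lambda: [0.0, 0])  # group -> [score sum, count]
        for member_id, score in zip(request.member_ids, request.scores):
            group = self._attrs.get(member_id)
            if group is None:
                continue  # no attribute on record; skip this member
            totals[group][0] += score
            totals[group][1] += 1
        # The response contains aggregates per group, not per-member attributes.
        return {
            group: {"count": n, "mean_score": total / n}
            for group, (total, n) in totals.items()
        }


# Server side: the attribute table is visible only here.
server = FairnessServer({1: "A", 2: "A", 3: "B"})

# Client side: the team knows member ids and scores, not the attributes.
request = EvaluationRequest(member_ids=[1, 2, 3, 4], scores=[0.25, 0.75, 1.0, 0.5])
report = server.evaluate(request)
print(report)
```

The client receives a report keyed by group but never learns which member belongs to which group. A real deployment would put a network boundary and access controls between the two sides; here both run in one process purely to show the data flow.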
