Previously on this blog, we’ve shared information on best practices in data science, particularly in areas such as A/B testing. We’ve also discussed the importance of ethics in fields such as data science, early implementations of “fairness by design” principles in our products, and our commitment to sharing our research in order to further the industry conversation about designing systems that spread economic opportunity. We share these findings to highlight the real-world positive impacts of data science and to encourage further industry discussion around best practices in responsible product design.
In this post, we discuss a novel approach to integrating product A/B testing and inequality measurement concepts from the field of economics. We also discuss the methodology we have adopted for lowering barriers to economic opportunity in how different groups of members use our products. Finally, we provide examples of how this methodology is helping to reshape research and design practices at LinkedIn, through a few case studies drawn from the thousands of network A/B tests that we have already analyzed.
It is worth emphasizing that the term “inequality” is used throughout this blog post in the following ways:
- To establish inequality baselines, we use the Atkinson inequality index, which can be applied to any metric, and captures how unequally it is distributed (if everyone has the same amount of that metric, inequality is 0; if some people have a large amount and others nothing, inequality is high). It is routinely applied to income or wealth by economists. Here, we are applying it to metrics that capture economic opportunity for our members on LinkedIn.
- To measure the impact of our experiments on these baselines, we use inequality impact: the effect an experiment has on baseline inequality in a metric. For example, if job applications are very unequally distributed, and an intervention makes them more equally distributed (e.g., by helping people who normally apply to few jobs apply to more of them), we say that the intervention has an inequality-reducing impact on job applications.
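The two definitions above can be sketched in a few lines of code. This is a minimal illustration, not LinkedIn's production implementation: the function, the choice of the inequality-aversion parameter ε, and the per-member job-application counts are all invented for the example. For ε ≠ 1, the Atkinson index is A = 1 − ((1/N) Σ (yᵢ/μ)^(1−ε))^(1/(1−ε)); at ε = 1 it reduces to one minus the ratio of the geometric mean to the arithmetic mean.

```python
import math

def atkinson(values, epsilon=1.0):
    """Atkinson inequality index of a metric across members.

    Returns 0 when everyone has the same amount, approaching 1 as the
    distribution becomes maximally unequal. epsilon is the inequality-
    aversion parameter; epsilon=1 uses the geometric-mean form and
    requires strictly positive values.
    """
    n = len(values)
    mean = sum(values) / n
    if epsilon == 1.0:
        log_geo_mean = sum(math.log(v) for v in values) / n
        return 1.0 - math.exp(log_geo_mean) / mean
    s = sum((v / mean) ** (1.0 - epsilon) for v in values) / n
    return 1.0 - s ** (1.0 / (1.0 - epsilon))

# Hypothetical job-application counts per member in each experiment arm.
control = [1, 1, 2, 20]    # a few heavy users dominate the metric
treatment = [4, 5, 6, 9]   # same total, spread more evenly

# Inequality impact: change in the Atkinson index from control to treatment.
# A negative value means the treatment reduced inequality in the metric.
impact = atkinson(treatment, epsilon=0.5) - atkinson(control, epsilon=0.5)
```

Note that both arms have the same total (and thus the same average) number of applications, yet the impact is negative: the treatment redistributes the metric more evenly, which is exactly what the inequality-impact measure is designed to detect.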
Product design, fairness in AI, and A/B testing
In recent years, researchers and industry experts have devoted a great deal of time to exploring the unintended consequences of applied technologies. Three primary areas of concern to many of us in the technology industry include:
- The tendency for algorithmic systems to “learn” or otherwise encode real-world biases in their operation (and then further amplify/reinforce those biases);
- The potential for product design to differentially benefit some groups of users more than others;
- Sparse or poor-quality data that causes objective-setting errors and system designs that produce suboptimal outcomes for many groups of end users.
While there are many well-documented examples of these and other types of problems in the technology industry, developing a data-driven solution is not a straightforward task (see recent publications from SafeAI@AAAI, FAccT, and others).
Towards a framework for addressing fairness issues in products
Given the complexity of this topic, there are likely many ways we could go about ensuring that our members benefit as equally as possible from our products. Before showing our solution to the problem, we also want to advance a set of principles that underpin our thinking:
- First, the end result after people have engaged with a product should matter as much as whether an algorithm is “intrinsically” representative or fair. For example, a system that appears to treat men and women similarly, but still results in women becoming disengaged over time, is generally undesirable. These kinds of outcomes may be due to a host of reasons that exhibit patterns of structural inequality in the real world, such as social biases, cultural norms, etc.
- Second, collecting the necessary demographic data may itself be problematic. Tracking all protected categories for discrimination would require collecting sensitive data, potentially at odds with members’ expectations of privacy and with data security best practices (e.g., data minimization), and subject to complex, overlapping privacy laws and regulations. Currently, LinkedIn does not use sensitive demographic data (as defined by GDPR, e.g., race, ethnicity, religion, political preferences) for our Recruiter product or for marketing services.
- Finally, existing demographic categories may not map directly to all kinds of inequality. If we look solely at the categories we explicitly monitor, we may overlook many opportunities to improve our products. We would like a way to identify, during the product testing process, functional inequalities that do not correspond to existing categories of users.
In summary, even if data on members’ demographic categories is available, it is not a panacea for identifying inequality impact. Even if a product may seem to have been designed in a “responsible” or “fair” manner based on assumptions of demographic parity, it can still drive a wedge between different groups of users. For instance, an app update that improves overall engagement but runs slowly on older mobile devices might dramatically affect members across many demographic categories in a manner that does not appear in a typical product A/B test.
Traditional A/B testing looks at averages, focusing on an idealized “average user.” However, people may respond to new products in ways that a designer never intended, and to be inclusive, we need to look beyond the average. The approach that we’ve developed, outlined below, instead empowers leaders to design products that are more inclusive and equitable, regardless of the causes of an underlying disparity, overcoming the “average user” problem of traditional A/B testing. Building more equitable products is also good business: making sure no one is inadvertently left behind is key to long-term growth.
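To make the “average user” problem concrete, here is a toy sketch (all numbers are invented for illustration) of two variants with identical average engagement but very different distributions. An average-only A/B test cannot tell them apart, while an inequality index such as Atkinson's can:

```python
def atkinson(values, epsilon=0.5):
    """Atkinson index for 0 < epsilon < 1; this form tolerates zero values."""
    n = len(values)
    mean = sum(values) / n
    s = sum((v / mean) ** (1.0 - epsilon) for v in values) / n
    return 1.0 - s ** (1.0 / (1.0 - epsilon))

# Two hypothetical variants: sessions per member after a product change.
variant_a = [10, 10, 10, 10]   # everyone engages equally
variant_b = [0, 0, 0, 40]      # one power user; everyone else left behind

mean_a = sum(variant_a) / len(variant_a)   # 10.0
mean_b = sum(variant_b) / len(variant_b)   # 10.0 -- identical averages

# The averages tie, but the Atkinson index cleanly separates the variants:
# variant_a scores 0 (perfect equality), variant_b scores close to 1.
```

A dashboard reporting only the mean would call these two variants equivalent; adding an inequality measurement alongside the mean surfaces the disparity directly in the experiment readout.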