Editor’s note: This blog post is the first in a series providing an overview and history of LinkedIn’s experimentation platform.
At any given time, LinkedIn’s experimentation platform is serving up to 41,000 A/B tests simultaneously on a population of more than 700 million members. Operating at this scale is made possible by the LinkedIn Targeting, Ramping, and Experimentation platform, or T-REX. The platform started small, but growing internal demand and external forces have led us to scale and evolve it over the past decade. Originally conceived as an experiment management and delivery system with a UI application, it gradually evolved into a platform that comprises targeting, dynamic configuration and experiment infrastructure, insight and reporting pipelines, a notification system, and a seamless UI experience.
Overall, three main factors heavily influenced the development of the T-REX infrastructure over the past decade:
Rapid growth of the company,
Exponential growth of the data available for analysis,
An internal cultural shift, during which experimentation became an intrinsic part of the release process.
If you have been closely following the LinkedIn Engineering Blog, you may have seen numerous posts on A/B testing over the years, but this is the first time we are describing the history of the T-REX platform as a whole. We will look at the evolution of the platform’s infrastructure, as well as some of the foundational principles and decisions that shaped it. (Note: Over its long history, the platform has had multiple incarnations and carried different names, so don’t be confused if you see it called LiX (LinkedIn eXperimentation) or XLNT (an eXperimentation framework) in previous posts.)
What is A/B testing?
A/B testing is a scientific method of running studies that relies on randomly splitting a test population into two or more groups and providing them with different variants of some “treatment.” Such a study always includes a control group that does not receive the treatment and serves as a baseline for measuring the treatment’s effectiveness. Given a sufficiently large test population and randomized assignment to the variant groups, the individual characteristics of members balance out across the groups. Any systematic difference in outcomes can therefore be attributed to the treatment, which makes it possible to estimate the average effect of the treatment per member. In this post, we will use the terms “A/B testing” and “experimentation” interchangeably.
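To make these mechanics concrete, here is a minimal Python sketch of the two halves of an A/B test: assigning members to variants and estimating the average treatment effect. It assumes hash-based bucketing (a common industry technique, not necessarily what T-REX does) and a simple difference-in-means estimator. The function names, the experiment key "new-feed-ranking", and the toy metric values are all hypothetical.

```python
import hashlib
import math
from statistics import mean, variance


def assign_variant(member_id: str, experiment_key: str, variants: list[str]) -> str:
    """Map a member to one variant (hypothetical helper).

    Hashing (experiment_key, member_id) behaves like a random draw across
    members, yet always returns the same variant for the same member, so the
    member keeps a consistent experience. This scheme is an assumption for
    illustration only.
    """
    digest = hashlib.sha256(f"{experiment_key}:{member_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


def estimate_effect(control: list[float], treatment: list[float]) -> tuple[float, float]:
    """Difference-in-means estimate of the average treatment effect.

    Returns (effect, standard_error); the control group serves as the baseline.
    """
    effect = mean(treatment) - mean(control)
    std_err = math.sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    return effect, std_err


if __name__ == "__main__":
    # Split 10,000 hypothetical members between control and treatment.
    counts = {"control": 0, "treatment": 0}
    for i in range(10_000):
        counts[assign_variant(f"member-{i}", "new-feed-ranking", ["control", "treatment"])] += 1
    print(counts)  # roughly an even split

    # Toy per-member metric values for each group.
    effect, se = estimate_effect([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
    print(f"effect={effect:.2f} se={se:.3f}")  # effect=1.00 se=0.816
```

Deterministic hashing is chosen here over calling a random number generator on every request because it needs no stored assignment table; a production platform may weigh that trade-off differently.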