Site Speed Monitoring in A/B Testing and Feature Ramp-up


Co-authors: Jiahui Qi and David He

 

Every day, LinkedIn serves hundreds of millions of page views to our members, from job searches to the news feed. Our network has grown to over 500 million members, and throughout our journey, “members-first” has been a fundamental value that we’ve carried. LinkedIn’s site speed infrastructure is a critical part of our mission to keep improving the member experience because it provides site performance metrics to the engineers who develop and roll out features for our members.

Predicting the site speed impact of a feature rollout is a difficult engineering problem, particularly at scale: we ramp hundreds of changes simultaneously through A/B testing. When ramping a feature into production, how can developers gain visibility into its site speed impact? Likewise, when a performance optimization is enabled in production, how can the developer quantify the benefit? And how can a performance degradation be detected at an early stage of a feature ramp-up, before the impact spreads to a larger audience? In this post, we share our experiences and solutions at LinkedIn.

Site speed A/B reporting

At LinkedIn, features are ramped up through an A/B testing platform called XLNT. For a feature to be deployed in the production environment, it typically goes through several ramp-up stages that span days or even weeks before the feature is fully rolled out to every member.
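To make the ramp stages concrete, here is a minimal sketch of how a member might be assigned to the experiment group at a given ramp percentage. It assumes a simple hash-based split; XLNT’s actual assignment logic is not described here, so the function names and scheme are illustrative assumptions only.

```typescript
// Minimal sketch of ramp-stage allocation, assuming a simple hash-based split;
// XLNT's actual assignment logic may differ. A member lands in the experiment
// group only if their bucket falls under the current ramp percentage, so
// raising the percentage adds members without reshuffling existing ones.
import { createHash } from "crypto";

function bucketOf(memberId: string, experimentId: string): number {
  // Hash (experimentId, memberId) so the same member gets different buckets
  // across different experiments.
  const digest = createHash("sha256").update(`${experimentId}:${memberId}`).digest();
  return digest.readUInt32BE(0) % 100; // stable bucket in [0, 100)
}

function isInExperiment(memberId: string, experimentId: string, rampPercent: number): boolean {
  return bucketOf(memberId, experimentId) < rampPercent;
}

// Example: at a 10% ramp, roughly 1 in 10 members see the new feature.
console.log(isInExperiment("member-123", "ads-optimization", 10));
```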

We collect Real User Monitoring (RUM) data from all of our web pages and mobile applications. RUM captures basic metrics, such as Navigation Timing and Resource Timing, along with detailed debugging markers that indicate the performance of individual components. This data is the source of truth for site speed at LinkedIn.
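As an illustration of what browser-side RUM collection can look like, here is a hedged TypeScript sketch built on the standard Navigation Timing and Resource Timing APIs. The `/rum` endpoint, beacon shape, and marker names are hypothetical, not LinkedIn’s actual implementation.

```typescript
// Sketch of browser-side RUM collection using standard web performance APIs.
// The "/rum" endpoint and beacon fields are hypothetical.
interface RumBeacon {
  page: string;
  pageLoadTime: number;             // loadEventEnd - startTime
  resourceCount: number;
  markers: Record<string, number>;  // custom component timing markers
}

function collectRum(markers: Record<string, number>): void {
  // Navigation Timing gives the overall page load breakdown.
  const [nav] = performance.getEntriesByType("navigation") as PerformanceNavigationTiming[];
  if (!nav) return;

  // Resource Timing lists every fetched asset (scripts, images, ads, ...).
  const resources = performance.getEntriesByType("resource");

  const beacon: RumBeacon = {
    page: window.location.pathname,
    pageLoadTime: nav.loadEventEnd - nav.startTime,
    resourceCount: resources.length,
    markers,
  };

  // sendBeacon survives page unload, so the data isn't lost on navigation.
  navigator.sendBeacon("/rum", JSON.stringify(beacon));
}

// Example: report a custom marker for when an ads module finished rendering.
window.addEventListener("load", () => {
  collectRum({ adsRendered: performance.now() });
});
```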

On top of this, we built a system to slice and dice the RUM data to allow developers to visualize and understand site speed changes along the entire A/B ramping cycle.

During A/B testing, we compare site speed metrics provided by RUM, such as traffic and page load time, between two groups of users: an experimental group and a control (or “baseline”) group. By holding other variables, such as country, constant across both groups, we can make a fair comparison between the two sets of results and isolate the performance impact.
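The comparison itself boils down to grouping RUM records by segment and treatment and then computing the same percentile metric on each side. The sketch below is an illustrative TypeScript version of that idea, not our production pipeline; the record fields and the simple p90 calculation are assumptions.

```typescript
// Illustrative fair comparison: group records by country so each experiment
// segment is compared against its own control, then compute p90 per group.
interface RumRecord {
  treatment: "control" | "experiment";
  country: string;
  pageLoadTimeMs: number;
}

function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

function compareP90ByCountry(
  records: RumRecord[]
): Map<string, { control: number; experiment: number }> {
  const byCountry = new Map<string, { control: number[]; experiment: number[] }>();
  for (const r of records) {
    const bucket = byCountry.get(r.country) ?? { control: [], experiment: [] };
    bucket[r.treatment].push(r.pageLoadTimeMs);
    byCountry.set(r.country, bucket);
  }

  const result = new Map<string, { control: number; experiment: number }>();
  for (const [country, bucket] of byCountry) {
    result.set(country, {
      control: percentile(bucket.control, 90),
      experiment: percentile(bucket.experiment, 90),
    });
  }
  return result;
}
```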

Site speed A/B data is available in two flavors today: daily and real-time. Daily data is processed in Hadoop and is more reliable, because day-over-day site speed variance is relatively small. Real-time data, which is processed in Apache Samza, is aggregated into 10-minute windows and is primarily used for anomaly detection and alerting.
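The real-time path runs in Apache Samza, so the following is only a language-neutral TypeScript sketch of the underlying idea: events land in tumbling 10-minute windows keyed by page and treatment, and each window accumulates page load time samples. The field names and in-memory map are illustrative assumptions.

```typescript
// Sketch of tumbling 10-minute window aggregation keyed by (page, treatment).
const WINDOW_MS = 10 * 60 * 1000; // 10-minute windows

interface RumEvent {
  timestampMs: number;
  page: string;
  treatment: "control" | "experiment";
  pageLoadTimeMs: number;
}

// Key: `${windowStart}|${page}|${treatment}` -> page load time samples.
const windows = new Map<string, number[]>();

function aggregate(event: RumEvent): void {
  // Align the event to the start of its 10-minute window.
  const windowStart = Math.floor(event.timestampMs / WINDOW_MS) * WINDOW_MS;
  const key = `${windowStart}|${event.page}|${event.treatment}`;
  const samples = windows.get(key) ?? [];
  samples.push(event.pageLoadTimeMs);
  windows.set(key, samples);
}
```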

The choice between daily summarized data and real-time data is a practical tradeoff. Real-time data is available quickly and is useful for anomaly detection and alerting, but its visualization is often noisy. Daily data arrives much later, but usually gives a clearer picture. In practice, this means the two are applied to different problems. Real-time data is better suited for alerting: when a feature ramp accidentally causes a site speed degradation on a particular page, the real-time results surface the problem within a meaningful timeframe, and an alerting system built on top of them can notify developers and experiment owners early. Daily data, meanwhile, is better suited for quantifying a performance impact that developers are already aware of. For example, when developers finish a feature that improves the speed of a web page, the precise difference can be summarized comprehensively, and based on that result, owners can decide whether to continue ramping the improvement.
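As a rough illustration of how an alert rule on top of the real-time aggregates could look, the sketch below flags a window when the experiment’s p90 page load time regresses past a relative threshold versus control. The threshold, minimum sample size, and summary shape are assumptions, not LinkedIn’s actual alerting configuration.

```typescript
// Hypothetical alert rule over 10-minute window summaries: fire when the
// experiment's p90 page load time regresses beyond a relative threshold.
interface WindowSummary {
  page: string;
  p90ControlMs: number;
  p90ExperimentMs: number;
  sampleCount: number;
}

function shouldAlert(
  w: WindowSummary,
  relativeThreshold = 0.1, // 10% regression, illustrative
  minSamples = 500         // skip noisy, low-traffic windows
): boolean {
  if (w.sampleCount < minSamples) return false;
  const regression = (w.p90ExperimentMs - w.p90ControlMs) / w.p90ControlMs;
  return regression > relativeThreshold;
}
```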

Use case

Let’s use a real example from LinkedIn to show how we use this framework to monitor and improve site speed. We noticed that LinkedIn’s web profile page was slow because ads took time to load. Engineers decided to optimize the ads to solve this problem and rolled out their changes. To see the performance difference and business impact before deciding to fully ramp the feature, engineers used our site speed A/B report to monitor ad click rate and site speed metrics at the 10% ramping stage. From our visualization dashboard, engineers were able to check daily 90th percentile page load times and traffic trends for both the control and experiment groups.

From the first chart on the UI, we can see that the page load time of the experiment group is 20% faster than that of the control group for both countries tested. In the second chart, the traffic ramp is reflected by a traffic decrease in the control group and a traffic increase in the experiment group. On the business side, this site speed optimization resulted in an increase in ads revenue, demonstrating concrete business value. Based on these data, engineers were able to quantify the performance and business benefits of this optimization, even at the 10% ramping stage. After making sure the business and performance metrics looked good, engineers ramped the optimization up to 100%.


