Monitoring business performance data with ThirdEye smart alerts

Co-authors: Madhumita Mantri and Tyler James

At LinkedIn, ThirdEye is used to monitor business and platform health, keeping track of a variety of metrics across production infrastructure, AI model performance, and key business indicators (e.g., page views or click counts). It’s a key quality assurance system for two reasons: its rule- or model-based anomaly detection reduces false alarms, and its multiple interactive root cause analysis tools help metrics owners narrow down the cause of an anomaly.

In this blog post, we will explain how ThirdEye smart alerts and automated dashboards helped the LinkedIn Premium business operations team monitor key metrics, such as new free trial signups, for the timely detection of outliers in business performance data.

Data-driven business decisions through anomaly detection

LinkedIn has an extremely complex data ecosystem, operating with 8K+ services, 2K+ tracking events, 8K+ deployments, and 300+ experiments every day. Uncovering blind spots in this growing data was critical to the success of Premium business health monitoring, but was becoming difficult to do at scale. The Premium business operations team, in collaboration with the ThirdEye (anomaly detection) team, saw a clear opportunity to uncover data blind spots quickly and efficiently by leveraging the existing tools for both business metric and system performance monitoring.

For any subscription business at LinkedIn, it is critical to monitor member signups from various channels (e.g., new free trials) on a regular basis to understand business performance and respond with a clear, actionable plan. The Premium business operations team conducts business performance management on a daily and weekly basis, with a primary focus on weekly changes. There are various ways to pinpoint where members sign up, such as attribution (in-product vs. marketing), country, and device (e.g., desktop vs. mobile). During performance management, it’s imperative that the team can identify the exact driver behind anomalies in signups, so that it can address issues quickly and stay on track with its business health plans.

Despite this rigorous performance management process, the team didn’t have a formalized way to automatically track granular changes in signups. Tracking codes indicate the exact source in consumer products (or marketing campaigns) where members sign up, and they are the most granular way to track changes. However, there are thousands of tracking events to monitor, and it is extremely time-consuming to track the performance of each individual tracking code by hand.
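To make the scale problem concrete, here is a minimal sketch of what checking every tracking code automatically could look like. It is not ThirdEye's detection logic; it is a simple illustrative rule that compares each code's latest weekly signup count against the average of its earlier weeks. The function name, the 30% threshold, the minimum-baseline cutoff, and the sample data are all assumptions made for illustration.

```python
from statistics import mean


def flag_week_over_week_changes(signups_by_code, threshold=0.3, min_baseline=1.0):
    """Flag tracking codes whose latest weekly signups deviate from their history.

    signups_by_code maps a tracking code to a list of weekly signup counts,
    oldest first. A code is flagged when its latest week differs from the mean
    of the earlier weeks by more than `threshold` (as a fraction of that mean).
    Both `threshold` and `min_baseline` are illustrative defaults, not
    values taken from ThirdEye.
    """
    flagged = {}
    for code, weekly_counts in signups_by_code.items():
        if len(weekly_counts) < 2:
            continue  # not enough history to form a baseline
        *history, latest = weekly_counts
        baseline = mean(history)
        if baseline < min_baseline:
            continue  # skip near-empty codes to avoid dividing by ~0
        change = (latest - baseline) / baseline
        if abs(change) > threshold:
            flagged[code] = round(change, 2)
    return flagged


# Hypothetical weekly signup counts for three made-up tracking codes.
weekly_signups = {
    "code_a": [100, 104, 96, 100],  # stable
    "code_b": [50, 50, 50, 20],     # sharp drop in the latest week
    "code_c": [10, 10, 10, 16],     # sharp rise in the latest week
}

flagged = flag_week_over_week_changes(weekly_signups)
for code, change in sorted(flagged.items()):
    print(f"{code}: {change:+.0%} vs. prior-week average")
```

Even a rule this simple shows why automation matters: the loop costs the same effort for three codes as for thousands, whereas a manual review does not. A fixed threshold like this would likely produce many false alarms on noisy or seasonal metrics, which is the gap that rule- or model-based detection is meant to close.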

It was a constant challenge for the team to monitor subscription-based member signups from various channels in a granular way across this complex ecosystem. This led to data blind spots, missed opportunities, and delays in detecting and remediating issues that impacted the business. The best available option was to manually review the aggregated data in dashboards, but this did not yield enough actionable insight to support root cause analysis.

Eventually, as the data grew, the team realized that the time needed to monitor and track changes in metrics would only increase. At one point, the team had to contend with over 1,700 separate tracking codes for online A/B tests while depending on an ad hoc system for identifying anomalies among its key business metrics. There was no centralized way to see changes in several metrics simultaneously at a glance.
