Greykite: A flexible, intuitive, and fast forecasting library


Co-authors: Reza Hosseini, Albert Chen, Kaixu Yang, Sayan Patra, Rachit Arora, and Parvez Ahammad

In this blog post, we introduce the Greykite library, an open source Python library developed to support LinkedIn’s forecasting needs. Its main forecasting algorithm, called Silverkite, is fast, accurate, and intuitive, making it suitable for interactive and automated forecasting at scale. We will start by describing a few applications, and then walk through the algorithm design and user experience. For more technical details, please refer to this paper.

Introduction

Accurate knowledge about the future is helpful to any business. Time series forecasts can provide future expectations for metrics and other quantities that are measurable over time.

While domain knowledge and expert judgment can sometimes produce accurate forecasts, algorithmic automation enables scalability and reproducibility, and may improve accuracy. Algorithmic forecasts can be consumed by additional algorithms downstream to make decisions or derive insights.

To support LinkedIn’s forecasting needs, we developed the Greykite Python library. Greykite contains a simple modeling interface that facilitates data exploration and model tuning. Its flagship algorithm, Silverkite, is highly customizable, with tuning parameters to capture diverse time series characteristics. The output is interpretable, allowing visualizations of the trend, seasonality, and other effects, along with their statistical significance.

The Silverkite algorithm works well on time series with (potentially time-varying) trends and seasonality, repeated events/holidays, and/or short-range effects. At LinkedIn, we’ve successfully applied it to a wide variety of metrics at different frequencies (hourly, daily, weekly, etc.) and over various forecast horizons, e.g., 1 day ahead (short-term) or 1 year ahead (long-term).

Some key benefits:

  • Flexible: provides time series regressors for trend, seasonality, holidays, changepoints, and autoregression; users select the ones they need and fit the machine learning model of their choice.

  • Intuitive: provides exploratory plots, templates for tuning, and explainable forecasts with clear assumptions.

  • Fast: allows for quick prototyping and deployment at scale.
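
To make the "Flexible" point concrete, here is a minimal sketch in plain Python of what regressor-based forecasting of this kind can look like. It is not Greykite's API or Silverkite's implementation. The function names (`build_features`, `fit_least_squares`, `predict`), the weekly period, the Fourier order, the tiny ridge penalty, and the synthetic series are all assumptions chosen for illustration. The idea is only that trend, changepoint, seasonality, and holiday effects become columns of a design matrix, and any regression model can then be fit on top of them.

```python
import math

def build_features(t, period=7, fourier_order=2, changepoint=None, holidays=()):
    """Return one row of regressors for an integer time index t (e.g. a day number)."""
    row = [1.0, float(t)]  # intercept and linear trend
    if changepoint is not None:
        # Extra slope that only switches on after the changepoint
        row.append(float(max(0, t - changepoint)))
    for k in range(1, fourier_order + 1):
        # Fourier terms capture periodic (here: weekly) seasonality
        angle = 2.0 * math.pi * k * t / period
        row.append(math.sin(angle))
        row.append(math.cos(angle))
    # Indicator column for holidays / repeated events
    row.append(1.0 if t in holidays else 0.0)
    return row

def fit_least_squares(X, y, ridge=1e-8):
    """Solve the normal equations (X'X + ridge*I) beta = X'y by Gaussian elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for i in range(col + 1, p):
            factor = A[i][col] / A[col][col]
            for j in range(col, p):
                A[i][j] -= factor * A[col][j]
            b[i] -= factor * b[col]
    beta = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (b[i] - tail) / A[i][i]
    return beta

def predict(beta, row):
    """Forecast is the dot product of fitted coefficients and the regressor row."""
    return sum(c * x for c, x in zip(beta, row))

# Synthetic daily series (illustrative only): trend, slope change at day 28,
# weekly seasonality, and a lift on two hypothetical holidays.
holidays = {10, 40}

def true_signal(t):
    return (10 + 0.5 * t + 0.3 * max(0, t - 28)
            + 3 * math.sin(2 * math.pi * t / 7)
            + (5 if t in holidays else 0))

train_days = range(56)
X = [build_features(t, changepoint=28, holidays=holidays) for t in train_days]
y = [true_signal(t) for t in train_days]
beta = fit_least_squares(X, y)
forecast = predict(beta, build_features(60, changepoint=28, holidays=holidays))
print(round(forecast, 2))
```

Because each effect has its own column, the fitted coefficients can be read off individually (for example, the last entry of `beta` is the estimated holiday lift), which is one simple way a model of this shape stays interpretable.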

The Greykite library is available on GitHub and PyPI.

In the remainder of the blog post, we’ll discuss forecasting applications at LinkedIn, the algorithm’s design, the user experience, case studies, and an evaluation of Greykite’s performance. The “algorithm design” section provides an overview of how Greykite’s flagship algorithm (Silverkite) works.

Forecasting applications

At LinkedIn, we use time series forecasts for resource planning, performance management, optimization, and ecosystem insight generation. For example:

  1. To provision sufficient infrastructure to handle peak traffic.

  2. To set business metric targets and track progress for operational success.

  3. To optimize budget decisions by forecasting growth of various markets.

  4. To understand which countries are recovering faster or slower after a shock like the COVID-19 pandemic.

The Greykite library is designed to solve these types of problems. To develop Greykite, we identified a few champion use cases. These helped us refine our models and prove success. In partnership with the domain experts, we demonstrated that Greykite helps LinkedIn confidently manage the business and make better decisions.

For example, LinkedIn Marketing Solutions is a fast-growing business, with a dynamic ecosystem of advertisers and potential customers. Forecasts are essential to managing this business. Short-term forecasts of budgets, clicks, revenue, and other metrics feed into an ecosystem health dashboard that is regularly refreshed to flag potential issues. The forecasts indicate when a metric deviates from expectations, and provide additional context about which metric dimension or related metric may help explain anomalies. Long-term forecasts help us set metric targets and check whether we are on track to meet them.
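
The post does not describe how the dashboard decides that a metric "deviates from expectations". One simple way to picture it, assuming each forecast comes with a lower and an upper bound, is to flag any observation that falls outside that band. The function name `flag_deviations` and the numbers below are hypothetical.

```python
def flag_deviations(actuals, lower, upper):
    """Return the indices where the observed value falls outside [lower, upper]."""
    return [
        i
        for i, (actual, lo, hi) in enumerate(zip(actuals, lower, upper))
        if actual < lo or actual > hi
    ]

# Hypothetical daily clicks with a forecast band of 90 to 110 for each day
observed = [100, 130, 95]
flagged = flag_deviations(observed, [90, 90, 90], [110, 110, 110])
print(flagged)
```

Only the second day lies outside its band, so it is the one a dashboard built this way would surface for a closer look.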

On the infrastructure side, forecasts help LinkedIn maintain site availability in a cost-effective manner. We forecast peak minute-level site queries per second (QPS) and service QPS over the next year in order to provision sufficient capacity without adding excessive buffers. Better information about future traffic, combined with accurate site capacity measurements, enables confident decision making. Since even a small percentage of savings translates to a large reduction in total cost, accurate forecasts have a big business impact. Forecasts enable continued, sustainable growth through right-sized applications.

Algorithm design


