PPL Bench is an open source benchmark framework for evaluating probabilistic programming languages (PPLs) used for statistical modeling. Researchers can use PPL Bench to build reference implementations of their models in the PPLs they want to evaluate (implementations in a number of PPLs are already included) and to benchmark them all in an apples-to-apples comparison. It’s designed to give researchers a standard for evaluating improvements in PPLs and to help researchers and engineers choose the best PPL for their applications.
What it is:
PPLs allow statisticians to write probability models in a formal language. Over roughly the past two decades, the number of PPLs available to researchers and data scientists has exploded, and each comes with its own pros and cons. Some PPLs restrict the range of models they can handle, whereas others are universal languages, meaning they support any computable probability distribution. Because these trade-offs affect performance, different PPLs suit different use cases, which is why the PPL community needs a standard benchmarking process for measuring inference performance.
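To make the idea concrete, here is a hedged illustration (not PPL Bench code) of the kind of model a PPL expresses in a few lines: a Beta-Bernoulli coin-flip model. A PPL would let you state only the prior and likelihood and then run inference automatically; this plain-Python sketch exploits conjugacy to compute the posterior by hand. All names here (`true_p`, `flips`) are illustrative.

```python
import random

# Hypothetical model: p ~ Beta(1, 1); each flip ~ Bernoulli(p).
# A PPL states those two lines formally; here we simulate data and
# use Beta-Bernoulli conjugacy to get the exact posterior by hand.
random.seed(0)
true_p = 0.7
flips = [1 if random.random() < true_p else 0 for _ in range(100)]

heads = sum(flips)
tails = len(flips) - heads

# Beta(1, 1) prior + Bernoulli likelihood -> Beta(1 + heads, 1 + tails).
alpha, beta = 1 + heads, 1 + tails
posterior_mean = alpha / (alpha + beta)
```

A universal PPL can express far richer models than this conjugate example, which is exactly why its inference performance needs to be benchmarked rather than derived analytically.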
PPL Bench does this by using predictive log likelihood (the log probability of held-out test data under the posterior predictive distribution) as a standard measurement. We believe it is the most uniform way to measure inference accuracy and convergence rates across all types of PPLs, regardless of inference engine or model representation. PPL Bench also reports other common metrics used to evaluate statistical models, including effective sample size, R-hat, and inference time.
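The metric above can be sketched in a few lines. This is a minimal, hedged illustration of computing predictive log likelihood from posterior samples (the function names and the toy Bernoulli model are assumptions for the example, not PPL Bench's actual API): for each held-out point, average the likelihood over posterior samples using a numerically stable log-mean-exp, then sum over the test set.

```python
import math
import random

def predictive_log_likelihood(test_data, posterior_samples, log_lik):
    """Sum over held-out points of log( mean_s p(y | theta_s) ),
    where theta_s are posterior samples. Uses log-mean-exp for stability."""
    n = len(posterior_samples)
    total = 0.0
    for y in test_data:
        logs = [log_lik(y, theta) for theta in posterior_samples]
        m = max(logs)
        total += m + math.log(sum(math.exp(l - m) for l in logs) / n)
    return total

# Toy check: a Bernoulli model where theta is the heads probability.
def bern_log_lik(y, theta):
    return math.log(theta if y == 1 else 1.0 - theta)

random.seed(0)
# Stand-in for samples an inference engine would draw from the posterior.
samples = [random.betavariate(8, 4) for _ in range(1000)]
held_out = [1, 1, 0, 1]
pll = predictive_log_likelihood(held_out, samples, bern_log_lik)
```

Because this quantity depends only on posterior samples and a likelihood function, it can be computed the same way for any PPL, which is what makes it a uniform basis for comparison.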
Why it matters:
As part of the PPL research community, we believe that a standardized mechanism for comparing PPLs will accelerate the development of better and faster programming languages for probabilistic modeling. We hope that community contributions will help grow and diversify PPL Bench and encourage wider industrial deployments of PPLs.