Announcing OpenTSDB 2.4.0: Rollup and…


By Chris Larsen, Architect

OpenTSDB is one of the first dedicated open source time series databases built on top of Apache HBase and the Hadoop Distributed File System. Today, we are proud to share that version 2.4.0 is now available and has many new features developed in-house and with contributions from the open source community. This release would not have been possible without support from our monitoring team, the Hadoop and HBase developers, as well as contributors from other companies like Salesforce, Alibaba, JD.com, Arista and more. Thank you to everyone who contributed to this release!

A few of the exciting new features include:

Rollup and Pre-Aggregation Storage

As time series data grows, storing the original measurements becomes expensive. Particularly in the case of monitoring workflows, users rarely care about last year’s high-fidelity data. It’s more efficient to store lower-resolution “rollups” for longer periods and discard the original high-resolution data. OpenTSDB now supports storing and querying such data so that the raw data can expire from HBase or Bigtable while the rollups stick around longer. Queries over long time ranges read from the lower-resolution data, fetching fewer data points and speeding up queries.
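
To make the idea concrete, here is a minimal Python sketch of a rollup job. It is an illustration only, not the OpenTSDB storage code or our pipeline: the function name `rollup`, the one-hour interval, and the sample points are all assumptions. It keeps sum, count, min, and max per interval so that an average can still be derived at query time.

```python
def rollup(points, interval_s=3600):
    """Aggregate (unix_ts, value) pairs into fixed-width time buckets.

    Hypothetical helper: returns {bucket_start: {"sum", "count", "min", "max"}}.
    """
    buckets = {}
    for ts, value in points:
        start = ts - (ts % interval_s)  # align the timestamp to its bucket
        bucket = buckets.get(start)
        if bucket is None:
            buckets[start] = {"sum": value, "count": 1, "min": value, "max": value}
        else:
            bucket["sum"] += value
            bucket["count"] += 1
            bucket["min"] = min(bucket["min"], value)
            bucket["max"] = max(bucket["max"], value)
    return buckets


# Made-up raw data: five points spread over two hours.
raw = [(0, 1.0), (60, 3.0), (3540, 5.0), (3600, 10.0), (7199, 20.0)]
hourly = rollup(raw)

# The hourly average is recoverable from the stored sum and count.
avg_first_hour = hourly[0]["sum"] / hourly[0]["count"]
```

Five raw points collapse into two hourly records here. Storing sum and count separately, rather than a precomputed average, is what lets coarser intervals be combined later without losing correctness.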

Likewise, when a user wants to query tens of thousands of time series grouped by, for example, data centers, the TSD will have to fetch and process a significant amount of data, making queries painfully slow. To improve query speed, pre-aggregated data can be stored and queried to fetch much less data at query time, while still retaining the raw data. We have an Apache Storm pipeline that computes these rollups and pre-aggregates, and we intend to open source that code in 2019. For more details, please visit http://opentsdb.net/docs/build/html/user_guide/rollups.html.
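
As a rough sketch of what pre-aggregation means, the Python below sums many per-host series into one series per data center ahead of query time. The function name `pre_aggregate`, the tag names, and the data are hypothetical and are not taken from the Storm pipeline mentioned above.

```python
from collections import defaultdict


def pre_aggregate(series, group_tag):
    """Sum values per timestamp across all series sharing a value for group_tag.

    `series` is a list of (tags, points) where points are (unix_ts, value) pairs.
    """
    grouped = defaultdict(lambda: defaultdict(float))
    for tags, points in series:
        key = tags[group_tag]
        for ts, value in points:
            grouped[key][ts] += value
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(points) for key, points in grouped.items()}


# Made-up input: three hosts in two data centers.
series = [
    ({"host": "web01", "dc": "lga"}, [(0, 1.0), (60, 2.0)]),
    ({"host": "web02", "dc": "lga"}, [(0, 3.0), (60, 4.0)]),
    ({"host": "web03", "dc": "sjc"}, [(0, 5.0)]),
]
by_dc = pre_aggregate(series, "dc")
```

A query grouped by `dc` can then read one stored series per data center instead of scanning every host, while the per-host raw data remains available for drill-down.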

Histograms and Sketches

When monitoring or performing data analysis, users often like to explore percentiles of their measurements, such as the 99.9th percentile of website request latency to detect issues and determine what consumers are experiencing. Popular metrics collection libraries will happily report percentiles for the data they collect. Yet while querying for the original percentile data for a single time series is useful, trying to query and combine the data from multiple series is mathematically incorrect, leading to erroneous observations and problems. For example, if you want the 99.9th percentile of latency in a particular region, you can’t just sum or average the 99.9th percentiles reported by each series, or take the 99.9th percentile of those percentiles.
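
A small Python example shows how far off a combined percentile can be. The two hosts and their latencies are invented, the 90th percentile is used instead of the 99.9th to keep the data set tiny, and `percentile` is a simple nearest-rank helper written for this illustration.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


host_a = list(range(1, 101))  # 100 requests taking 1..100 ms
host_b = [1000] * 10          # 10 requests taking 1000 ms each

p90_a = percentile(host_a, 90)               # 90
p90_b = percentile(host_b, 90)               # 1000
averaged = (p90_a + p90_b) / 2               # 545.0
true_p90 = percentile(host_a + host_b, 90)   # 99
```

Averaging the two reported percentiles gives 545 ms, while the 90th percentile of all 110 requests is 99 ms. The per-host percentiles carry no information about how many requests each host served, so no arithmetic on them alone can recover the right answer.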

To solve this issue, we needed a complex data structure that can be combined to calculate an accurate percentile. One such structure that has existed for a long time is the bucketed histogram, where measurements are sliced into value ranges and each range maintains a count of measurements that fall into that bucket. These buckets can be sized based on the required accuracy, and the counts from multiple sources (sharing the same bucket ranges) can be combined to compute an accurate percentile.
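
The following Python sketch shows why bucketed histograms combine cleanly. It is not OpenTSDB's histogram codec: the bucket bounds in `BOUNDS`, the helper names, and the sample latencies are assumptions made for the example. Each source counts its measurements into the same buckets, the counts are added, and the percentile is read from the merged counts.

```python
import bisect
import math

# Hypothetical bucket upper bounds in milliseconds; a final overflow
# bucket catches anything above the last bound.
BOUNDS = [10, 50, 100, 500, 1000]


def to_histogram(values, bounds=BOUNDS):
    """Count values into buckets (lower, upper] plus one overflow bucket."""
    counts = [0] * (len(bounds) + 1)
    for v in values:
        counts[bisect.bisect_left(bounds, v)] += 1
    return counts


def merge(a, b):
    """Add the counts of two histograms that share the same bucket bounds."""
    if len(a) != len(b):
        raise ValueError("histograms must share the same bucket layout")
    return [x + y for x, y in zip(a, b)]


def percentile_upper_bound(counts, p, bounds=BOUNDS):
    """Upper bound of the bucket holding the nearest-rank p-th percentile."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    rank = math.ceil(p * total / 100)
    seen = 0
    for i, count in enumerate(counts):
        seen += count
        if seen >= rank:
            return bounds[i] if i < len(bounds) else float("inf")


hist_a = to_histogram(range(1, 101))   # 100 requests taking 1..100 ms
hist_b = to_histogram([1000] * 10)     # 10 requests taking 1000 ms each
merged = merge(hist_a, hist_b)
p90 = percentile_upper_bound(merged, 90)
```

The merged histogram reports that the 90th percentile lies in the (50, 100] ms bucket, which agrees with the exact value of 99 ms for this data. The answer is only as fine-grained as the buckets, which is the storage versus accuracy trade-off.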

Bucketed histograms can be expensive to store for highly accurate data, as many buckets and counts are required. Additionally, many measurements don’t have to be perfectly accurate but they should be precise. Thus another class of algorithms could be used to approximate the data via sampling and provide highly precise data with a fixed interval. Data scientists at Yahoo (now part of Oath) built a great Java library called Data Sketches that implements stochastic streaming algorithms to reduce the amount of data stored for high-throughput services. Sketches have been a huge help for the OLAP storage system Druid (also sponsored by Oath) and Bullet, Oath’s open source real-time data query engine.

The latest TSDB version supports bucketed histograms, Data Sketches, and T-Digests.

Some additional features include:

  • HBase Date Tiered Compaction support to improve storage efficiency.
  • A new authentication plugin interface to support enterprise use cases.
  • An interface to support fetching data directly from Bigtable or HBase rows using a search index such as Elasticsearch. This improves queries for small subsets of high-cardinality data, and we’re working on open sourcing our code for the ES schema.
  • Greater UID cache controls and an optional LRU implementation to reduce the amount of JVM heap allocated to UID to string mappings.
  • Configurable query size and time limits to avoid OOMing a JVM with large queries.
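
To illustrate the LRU idea behind the UID cache option, here is a minimal Python sketch of a size-bounded cache. It is not the OpenTSDB implementation, which is written in Java. The class name `LruCache`, the capacity, and the UID byte values are made up for the example.

```python
from collections import OrderedDict


class LruCache:
    """Size-bounded mapping that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __len__(self):
        return len(self._data)


# Hypothetical UID-to-name mappings.
cache = LruCache(capacity=2)
cache.put(b"\x00\x00\x01", "sys.cpu.user")
cache.put(b"\x00\x00\x02", "sys.cpu.sys")
cache.get(b"\x00\x00\x01")                  # touch the first entry
cache.put(b"\x00\x00\x03", "sys.cpu.idle")  # evicts the untouched second entry
```

Capping the number of entries bounds the heap used by the mappings, at the cost of an extra lookup in storage whenever an evicted UID is needed again.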

Try the releases on GitHub and let us know of any issues you run into by posting on GitHub issues or the OpenTSDB Forum. Your feedback is appreciated!

OpenTSDB 3.0

Additionally, we’ve started on 3.0, which is a rewrite that will support a slew of new features including:

  • Querying and analyzing data from the plethora of new time series stores.
  • A fully configurable query graph that allows for complex queries OpenTSDB 1.x and 2.x couldn’t support.
  • Streaming results to improve the user experience and avoid overwhelming a single query node.
  • Advanced analytics including support for time series forecasting with Yahoo’s EGADS library.

Please join us in testing out the current 3.0 code, reporting bugs, and adding features.


