Omid’s First Step in the Apache Community


By Francisco Perez-Sorrosal, Ohad Shacham, Kostas Tsioutsiouliklis, and Edward Bortnikov


We are proud to announce that Omid (“Hope” in Persian), Yahoo’s transaction manager for HBase [1][2], has been accepted as an Apache Incubator project. Yahoo has been a long-time contributor to the Apache community in the Hadoop ecosystem, including HBase, YARN, Storm, and Pig. Our acceptance as an Apache Incubator project is another step forward following the success of ZooKeeper [3] and BookKeeper [4], which were born at Yahoo and graduated to top-level Apache projects.

These days, most NoSQL databases, including HBase, do not provide the OLTP support available in traditional relational databases, forcing the applications running on top of them to trade transactional support for greater agility and scalability. However, transactions are essential in many applications using NoSQL datastores as the main source of data, for example, in incremental content processing systems. Omid enables these applications to benefit from the best of both worlds: the scalability provided by NoSQL datastores, such as HBase, and the concurrency and atomicity provided by transaction processing systems.

Omid provides a high-performance ACID transactional framework with Snapshot Isolation guarantees on top of HBase [5], and it can scale to thousands of clients triggering transactions on application data. It’s one of the few open-source transactional frameworks that can scale beyond 100K transactions per second on mid-range hardware while adding minimal latency to datastore access.

At its core, Omid uses a lock-free approach to support multiple concurrent clients. Its design relies on a centralized conflict detection component called the Transaction Status Oracle (TSO), which efficiently resolves write-set collisions among concurrent transactions [6]. Another important benefit is that Omid does not require any modification of the underlying key-value datastore – HBase in this case. Moreover, the recently added high-availability algorithm eliminates the single point of failure represented by the TSO in those deployments that require a higher degree of dependability [7]. Last but not least, the API is very simple – mimicking the transaction manager APIs in the relational world: begin, commit, rollback – and the client and server configuration processes have been simplified to help both application developers and system administrators.
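To make the idea of centralized write-set conflict detection more concrete, here is a minimal, single-process sketch in Python. It is an illustration only, not Omid's code or API: the class name `ToyTimestampOracle` and its `begin`/`commit` methods are hypothetical, and the sketch assumes the usual Snapshot Isolation rule that a transaction must abort if any row in its write set was committed by another transaction after it started. It ignores concurrency, durability, and high availability entirely.

```python
import itertools


class ToyTimestampOracle:
    """Toy, in-memory stand-in for a centralized conflict detector (hypothetical)."""

    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing timestamps
        self._last_commit = {}            # row key -> timestamp of last committed write

    def begin(self):
        """Start a transaction by handing out its start (snapshot) timestamp."""
        return next(self._clock)

    def commit(self, start_ts, write_set):
        """Return a commit timestamp, or None if the transaction must abort."""
        # Conflict: a row we wrote was committed by someone else after we started.
        for row in write_set:
            if self._last_commit.get(row, 0) > start_ts:
                return None
        commit_ts = next(self._clock)
        for row in write_set:
            self._last_commit[row] = commit_ts
        return commit_ts


tso = ToyTimestampOracle()
t1 = tso.begin()
t2 = tso.begin()                          # t1 and t2 run concurrently
r1 = tso.commit(t1, {"row-a", "row-b"})   # first committer wins
r2 = tso.commit(t2, {"row-b"})            # overlaps with t1's write set
r3 = tso.commit(tso.begin(), {"row-b"})   # started after t1 committed
print(r1, r2, r3)
```

In this sketch the second transaction is rejected because it overlaps with a write committed after its snapshot was taken, while the third succeeds because it started afterwards. No row locks are held at any point; the only shared state is the table of last commit timestamps.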

Efforts toward growing the community have been underway for the last few months. Apache Hive [8] contributors from Hortonworks expressed interest in storing Hive metadata in HBase using Omid, and this led to a fruitful collaboration that resulted in Omid now supporting HBase 1.x versions. Omid could also be used as the transaction manager in other SQL abstraction layers on top of HBase such as Apache Phoenix [9], or as the transaction coordinator in distributed systems, such as the Apache DistributedLog project [10] and Pulsar, a distributed pub-sub messaging platform recently open-sourced by Yahoo.

Since its inception in 2011 at Yahoo Research, Omid has matured to operate at Web scale in a production environment. For example, since 2014 Omid has been used at Yahoo – along with other Hadoop technologies – to power our incremental content ingestion platform for search and personalization products. In this role, Omid is serving millions of transactions per day over HBase data.

We have decided to move the Omid project to “the Apache Way” because we think it is the next logical step after having battle-tested the project in production at Yahoo and having open-sourced the code in Yahoo’s public GitHub in 2012. The Omid GitHub repository currently has 269 stars and 101 forks, and our colleagues in the open source community asked us to release it as an Apache Incubator project. As we aim to form a larger Omid community outside Yahoo, we think that the Apache Software Foundation is the perfect umbrella to achieve this. We invite the Apache community to contribute by providing patches, reviewing code, proposing new features or improvements, and giving talks at conferences such as Hadoop Summit, HBaseCon, and ApacheCon under the Apache rules.

We see Omid being recognized as an Apache Incubator Project as the first step in growing a vibrant community around this technology. We are confident that contributors in the Apache community will add more features to Omid and further enhance the current performance and latency. Stay tuned to @ApacheOmid on Twitter!

References

[1] Apache Omid GitHub repo: https://github.com/apache/incubator-omid

[2] Apache Omid documentation: http://omid.incubator.apache.org/

[3] Apache ZooKeeper project: http://zookeeper.apache.org/

[4] Apache BookKeeper project: http://bookkeeper.apache.org/

[5] Blog Entry introducing Omid: http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for

[6] Blog Entry on Omid’s Architecture and Protocol: http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol

[7] Blog Entry on Omid’s High Availability: http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid

[8] Apache Hive project: https://hive.apache.org/

[9] Apache Phoenix project: https://phoenix.apache.org/

[10] Apache DistributedLog project: http://distributedlog.incubator.apache.org/


