The Present and Future of Apache Hadoop: A Community Meetup at LinkedIn

Last but certainly not least, Microsoft’s Virajith Jalaparti (left) and Ashvin Agrawal (right) discussed the evolution of the “provided storage” feature in HDFS, which allows HDFS clients to transparently access external storage systems (such as Azure Data Lake Storage or Amazon S3). They described a mechanism whereby the NameNode would “mount” an external store as part of its own namespace, and clients would be able to access the data as if it resided on HDFS itself. The DataNodes, which normally store the data in HDFS, would transparently fetch the data from the remote store and serve it back to the client. They were even brave enough to give us a live demo! You can view their slides here and a recording of their presentation here.
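
To make the read path described above a bit more concrete, here is a toy model in Python. It is only a sketch of the idea, not HDFS code: the class names, the `mount` and `resolve` methods, and the example paths are all hypothetical and chosen for illustration. It assumes a single DataNode and ignores blocks, replication, and caching.

```python
class RemoteStore:
    """Hypothetical stand-in for an external store (e.g. a cloud object store)."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def get(self, key):
        return self._objects[key]


class NameNode:
    """Toy namespace: maps mount points to external stores."""

    def __init__(self):
        self._mounts = {}  # mount point -> RemoteStore

    def mount(self, mount_point, store):
        self._mounts[mount_point.rstrip("/")] = store

    def resolve(self, path):
        # Return (store, key) for paths under a mount, else (None, path).
        for mount_point, store in self._mounts.items():
            prefix = mount_point + "/"
            if path.startswith(prefix):
                return store, path[len(prefix):]
        return None, path


class DataNode:
    """Serves local data, or fetches from the remote store on the client's behalf."""

    def __init__(self, local_files):
        self._local = dict(local_files)

    def read(self, path, store=None, key=None):
        if store is not None:
            return store.get(key)  # transparent fetch from the external store
        return self._local[path]


def client_read(namenode, datanode, path):
    # The client issues the same call whether the data is local or mounted.
    store, key = namenode.resolve(path)
    return datanode.read(path, store, key)


remote = RemoteStore({"logs/a.txt": b"remote bytes"})
namenode = NameNode()
namenode.mount("/mnt/ext", remote)
datanode = DataNode({"/user/demo/b.txt": b"local bytes"})

print(client_read(namenode, datanode, "/mnt/ext/logs/a.txt"))
print(client_read(namenode, datanode, "/user/demo/b.txt"))
```

The point of the sketch is that the client never talks to the external store directly; only the DataNode does, which is what makes the access transparent.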

Breakout sessions

Following all of our planned presentations, we held informal “birds of a feather” discussions about topics pertinent to the Hadoop community at large.

One session discussed the management of Hadoop releases, in particular the 2.X release series as opposed to the 3.X release series. Major version upgrades in Hadoop can be painful, and many large operators are wary of upgrading from Hadoop 2 to 3. There is some support in the community for a “bridge” release, or a final release on the Hadoop 2 release line, before taking the plunge on a major version upgrade.

Another session discussed Java versioning. Previously, the stance of the Hadoop community was that Java version upgrades would always be accompanied by a Hadoop major version upgrade; for example, Hadoop 2 supports Java 7 and above, while Hadoop 3 only supports Java 8 and above. However, given the changes in Oracle’s release and support roadmap to a much more rapid release cycle, the Hadoop community must adapt its policies. The discussion suggested that we will likely need to drop support for Java versions in minor, rather than major, releases of Hadoop.

Another major topic of discussion was the future of Ozone. There were deep dives into various portions of Ozone’s architecture, and in-depth discussions of how various frameworks such as Apache Spark, Apache Impala, and Presto would work on top of Ozone. Finally, there were discussions of its release timelines, and how erasure coding functionality, a recent addition to HDFS, could be supported in Ozone as well.


All of us here at LinkedIn were thrilled to be a part of the engaged community present at this meetup. Thanks to all of our speakers and participants for making this a fun and fruitful event. We’re greatly looking forward to the next one!

This meetup couldn’t have happened without the support of our amazing events staff here at LinkedIn. I owe great thanks to our media technician, Francisco Zamora, and to the rest of the catering and event services professionals who helped us out!
