Figure 10: Increase of local shuffle read data size with Magnet-enabled jobs
Conclusion and future work
In this blog post, we introduced the Magnet shuffle service, a next-gen shuffle architecture for Apache Spark. Magnet improves the overall efficiency, reliability, and scalability of the shuffle operation in Spark. Other solutions that target the shuffle process have also been proposed in the industry recently, specifically Cosco, Riffle, Zeus, and Sailfish. Our VLDB paper compares Magnet with these solutions, in particular Cosco, Riffle, and Sailfish.
In the future, we are considering making Magnet push-based shuffle available in other deployment environments and compute engines. Our current cluster is deployed on-prem with compute and storage collocated. As LinkedIn migrates to Azure, we are evaluating ways to adapt push-based shuffle for clusters where compute and storage are disaggregated. In addition, our current design for push-based shuffle mostly targets batch engines, and we are considering its applicability to streaming engines.
It takes a dedicated team to bring a project of Magnet’s magnitude to light. In addition to the efforts of Min Shen, Ye Zhou, and Chandni Singh, Venkata Krishnan Sowrirajan and Mridul Muralidharan have contributed significantly to the project. Erik Krogen, Ron Hu, Minchu Yang, and Zoe Lin have contributed to production rollout and observability improvements around Magnet. Special shoutout to Yuval Degani for building GridBench, a tool that has made it very easy to understand the impact of various factors on job runtime. Special thanks to our partner teams, especially Jan Bob and Qun Li’s team, for being early adopters of Magnet.
Large infrastructure efforts like Magnet require significant and sustained commitment from management. Sunitha Beeram, Zhe Zhang, Vasanth Rajamani, Eric Baldeschwieler, Kapil Surlakar, and Igor Perisic: thank you for your unyielding support and guidance. Magnet’s design has also benefited from reviews and deep discussions with Sriram Rao and Shirshanka Das.
Magnet has received tremendous support from the open source Apache Spark community. We are grateful for our partnership with Databricks and for the reviews from numerous community members.