Suja with our Big Data Platform Team
While it’s almost unheard of today to stay at one company for so long, I worked as an engineer at IBM for over a decade, growing from an intern to a senior engineering manager. Over my 12 years with IBM, I transitioned from test to development, then on to management, and also moved from mainframes to big data. I thought I was going to be a “lifer” at IBM given all the new opportunities I had, and I enjoyed managing the big data team. However, things changed unexpectedly when a work contact reached out asking if I was interested in exploring opportunities at LinkedIn.
While I was very comfortable at IBM, I was curious about the possibilities at LinkedIn.
I was petrified about interviewing after so many years, but once I met the team, there was no doubt in my mind that I wanted to join the company.
LinkedIn: The initial year(s)
I started at LinkedIn leading the Hadoop Development team. I was used to an enterprise product life and had to adapt quickly to running a service. With the exception of a handful of engineers, the entire team was new. So, keeping the lights on without incident while building the team was challenging. At the end of my first few weeks, I decided that my first goal should be to identify outages before my manager did.
Carl Steinbach, who is the Hadoop architect, very patiently showed me the ropes, and we fostered a great collaboration that continues to this day. We were in the middle of a major Hadoop version upgrade, which had a lot of changes that were backward incompatible. Basically, our team was slated to change the airplane engine while the airplane was mid-air. This migration gave me an opportunity to learn about the various products that are developed on top of the Hadoop platform. It was during this time that I realized we were growing at an alarming rate, and could not scale without automation and streamlining support/ops.
Learning to automate for scale
To give a flavor of the automation work, I’ll highlight two projects that emerged from teams I oversaw: Byte-Ray and Dr. Elephant. Byte-Ray is our home-brewed byte code analysis tool. It was developed to deal with the backward-incompatible changes in the Hadoop upgrade. We integrated Byte-Ray with Azkaban, so when users upload their workflows to Azkaban, Byte-Ray can immediately warn them of all the incompatibility issues detected within their workflows without having to execute these flows first. In addition, the incompatibilities introduced by the Hadoop upgrade were mostly binary incompatibilities rather than source-code incompatibilities, which meant Byte-Ray could identify and fix most issues seamlessly. This made our migration easier for end users.
Dr. Elephant was built to automate the production review process. Believe it or not, when I joined the team, a project that was ready for production had to go through a manual review process that took a minimum of two weeks. With Dr. Elephant, the entire production review workflow was reduced to a matter of minutes or seconds. Dr. Elephant reviews Hadoop/Spark workloads and provides tuning suggestions. It is currently a successful open source project.
Creating processes to balance growing needs
While it was such a fun experience to create these in-house solutions, they uncovered another issue that we needed to solve: balancing our internal needs while maintaining open source needs. As the usage of our Hadoop platform grew, the operational and support burden also grew significantly, and I had to hire additional engineers to help. Since we were running a multi-tenant platform, I started to consider how I could instill a sense of ownership and responsibility amongst our engineers as maintainers of the platform. That’s how our YARN OrgQueue project was born. This feature provides a logical queue to every organization within LinkedIn. When we delegated the queue ownership to individual organizations, capacity planning became easier, and every organization used their queue much more effectively. Given how useful this feature is for us, we decided to open source it. This project was a testament to how democratizing the cluster worked for us.
Challenges lead to innovation
Keeping up with the scale and an ever-changing big data landscape is a challenge. Every time we’ve upgraded, we’ve had to do extensive scale testing, and it was not cost-effective to set up a large cluster just for testing. Therefore, we created Dynamometer, our recently open-sourced tool, to simulate scale testing at minimal cost. We’ve also had to do many migrations, like Hadoop upgrades and data center migrations, which have made us think about making migrations redundant. Thus, Dali was born. Dali is our attempt to abstract out the cluster and data format details, thus making infrastructure changes transparent to our users. While facing the ever-evolving issue of scale, it has been amazing to have an opportunity to create home-grown solutions to solve our challenges.
From team to organizational focus
Believe it or not, everything discussed in this blog so far was done in my first year at LinkedIn. Fast forward to my second year at the company, and my manager asked me during a career-focused conversation about how I was going to scale. At first, this seemed odd, since I didn’t think of scaling myself as necessary. But the truth was, somewhere along the way, my team had gotten much larger, due to both organic growth and the fact that a few other teams that worked closely with ours were moved under me. I thought I could manage it all. But soon, I started to miss one-on-one meetings, hiring slowed down, and I was becoming a bottleneck for a lot of decisions. So, I started thinking about hiring managers and senior technical leads who could help me scale.
But first, I had to think about the organizational structure of my team in order to provide a sense of mission to the new leaders we were hiring. I decided to organize us into three pillars: data management, workflow management, and tools and core infrastructure. I had absorbed one manager as part of an internal reorganization. I hired an additional manager and helped guide one of my tech leads into management, as this was an interest for his career journey. This meant the engineers who directly reported to me had to be moved under new managers. I had to work closely with everyone on the team to make the transition smooth, and had to reassure some of the team members that I was still available to them even though the reporting structure had changed.
Trusting my teams so I could continue to scale
This was a time when I truly understood the meaning of “growing pains.” I had to let go of things that were dear to me, like the OrgQueue project and Dr. Elephant, and leave them in the capable hands of the newly hired leaders. Very quickly, I realized that they did a great job with those projects, and this gave me time to evangelize these projects both within and outside of LinkedIn. This also gave me time to focus on hiring and on pursuing some of our diversity, inclusion, and belonging initiatives. I spent time organizing the Women in Big Data meetup and participating in various panels and talks at conferences like DataWorks, VerveCon, and Strata, as well as at meetups. It also helped me focus on my passion for mentoring and coaching emerging leaders within and outside of the company.
On to new challenges
In my third year at LinkedIn, I had an organized team that was performing well. I started getting into auto-pilot mode, and realized it was time to get out of my comfort zone and learn something new again. I had increasingly been drawn to user productivity problems, like customer onboarding, and a stint higher up the stack in Applications was beginning to sound appealing. At the same time, I was not ready to move out of Data, since Data continues to intrigue me. Luckily, I found that the Data Applications and Platform team was a perfect fit and a chance for me to influence big data infrastructure through a customer-centric lens. This is the team I’m currently on, and I’m looking forward to new challenges in the months and years to come.
Reflecting on my career transitions
Looking back over the last three-plus years, I’ve experienced multiple transitions: from the enterprise world to running a service, from building a team to learning how to scale the team and myself, and from Infrastructure to Data applications. All these transitions were made possible because LinkedIn encourages career growth and provides the support that is needed. My first manager, Kapil Surlaker, planted the seed of scaling myself and helped with that transition. When I was struggling with building a good relationship between the Development and Operations teams, Greg Arnold armed me with tips and tricks to tackle the situation. Also, when I was dealing with organizational changes and delegating, Kamini Dandapani’s WITInvest gave me the right set of tools, coaches (Santalynda Marrero), and mentors (Ashvin Kannan) to overcome those challenges. Finally, when I got too comfortable with my Big Data Infrastructure role, my manager Shrikanth Shankar made it very easy for me to talk to him about exploring a new opportunity. He gave me the time and opportunity to figure out my next play.
As the saying goes, change is the only constant in life. Odds are that all of us will undergo several transformations throughout our careers. Mine has been particularly easy thanks to the cultures at IBM and LinkedIn that helped foster my growth and encouraged continued learning.
It’s the open and nurturing culture at LinkedIn that makes these career advancement discussions possible. Various leaders like Dinesh Nirmal, Michael Perera, Rick Bowers, Greg Arnold, Erica Lockheimer, Erran Berger, Raghu Hiremagalur, Igor Perisic, Kapil Surlaker, Shrikanth Shankar, Ya Xu, and Vasanth Rajamani were generous with their time and helped me get to where I am today.