The story of how Data Science interns at Airbnb spend their summer
Internships are some of the most powerful experiences a student can have during college. Very few opportunities allow you to learn and grow so much in 12 short weeks. On top of that, taking on an internship helps with making connections, improving teamwork, and learning how to navigate organizations.
This summer, I was a Data Science (DS) intern on the new Airbnb Plus team — I had fantastic role models, did meaningful work, developed valuable skills, and made great friends. In this post, I hope to answer some common questions about Data Scientists at Airbnb and shed some light on what a DS intern really does.
When looking at job descriptions for “Data Scientists” across the industry, it’s easy to get confused. Tasks and requirements range from defining metrics and providing product recommendations, to designing complex experiments that measure the impact of product changes, to building sophisticated models that predict what would happen if those changes were implemented. While all of these are great examples of what a Data Scientist might do at a company, not every Data Scientist does all of them.
At Airbnb, we embrace this diversity of skills under the Data Science umbrella, but we also understand that different flavors of Data Science appeal to different people, depending on their backgrounds. We see data as the voice of our users at scale, and it’s up to our Data Scientists to translate it so that voice is heard by everyone in the company. To make sure people work on problems they feel excited about and that every employee has clear expectations, Airbnb established three tracks for Data Scientists: Analytics, Inference, and Algorithms.
The Analytics track is for people who use data to tell a story. Analytics Data Scientists answer business questions through metrics, dashboards, and creative analyses. The Inference track is better suited for people well versed in Statistics, who help Airbnb measure and interpret the impact of changes, leading to improved decision-making. The Algorithms track appeals to those who are passionate about Machine Learning and want to use Airbnb’s vast data to develop novel approaches to challenging problems.
Although the tracks help match Data Scientists’ skills and interests to the problems they can tackle at Airbnb, they are not limiting: projects across all three areas can be undertaken depending on team needs and growth aspirations. The same applies to interns; we join Airbnb under a given track, but the projects we work on during the summer can span multiple areas. For more details about the Data Science tracks and how they came to be, check out this post by our Head of Data Science, Elena Grewal.
One special aspect of the DS organization at Airbnb is how deeply embedded we are within teams. On one side of the spectrum are teams inside each business unit: for example, teams dedicated to improving a Host’s ability to provide great service, ensuring Business Travelers are offered convenient integrations for expense tracking, or guaranteeing the quality of every Experience available on the platform. On the other side are platform teams, the backbone of Airbnb’s products and services. They might be responsible for providing powerful, safe APIs for payment processing, keeping our community and Airbnb safe, or building the data infrastructure that powers our analytics tools.
Each of these teams, business or platform, houses Data Scientists. This distributed organization gives Data Scientists a stronger connection to the needs and challenges of each group. By working closely with Engineering, Design, Operations, PM, Research, and many other functions, Data Scientists gain a much deeper understanding of the business and the team operations they’re seeking to empower. It also makes it easier for data to be involved in the majority of decisions: not only informing overall strategy, but also shaping everyday product iterations.
As a Computer Science Master’s student, I had the preconceived notion before joining Airbnb that most tech positions were filled by CS majors. Upon arriving, I was pleasantly surprised to find a much broader variety of backgrounds and skill sets. The core competencies of a Data Scientist include familiarity with data and experimentation, the ability to communicate findings to audiences of varying technical levels, an eye for detail and data quality, a solid understanding of statistics, and a desire to build solutions that scale. These qualities are found in a myriad of fields, from Computer Science and Engineering to Statistics, and from Economics and the social sciences to the physical sciences and beyond.
In my opinion, the diversity in backgrounds and expertise found here is one of the biggest strengths of Data Science at Airbnb. We have Computer Science experts building complex learning algorithms, Economics and Policy researchers studying how we can maximize our positive impact in the communities we engage with, savvy Statisticians helping teams measure the business impact of every product change, and many more. At the end of the day, though, we all identify as Data Scientists.
Working with Plus
Airbnb Plus is a new selection of homes of impeccable quality, offered by hosts known for great hospitality and attention to detail. Plus homes are equipped with a standard set of amenities and are thoughtfully designed, each showcasing a lot of personality. As a result, every home is one of a kind. To offer a consistent level of quality, every home in the Plus program is visited in person for a comprehensive inspection.
My choice to work with Plus was very deliberate. Being interested in the business side of tech and having a taste for fast-moving teams, I knew Plus would be a great match. No matter where you land, however, teams and managers are more than willing to tailor your internship experience to your interests and skills, and Plus was no different.
Operational Efficiency Analysis
For my first project, I ventured into product analytics and operational efficiency. On-boarding new hosts into the Plus program and inspecting every home to make sure it meets all of the quality and design criteria is not without its logistical challenges. My work involved partnering with an Engineering Manager to perform an in-depth analysis of people’s behavior during a specific stage of the on-boarding flow. Using different kinds of platform logs, I investigated trends in usage behavior, slicing the data along relevant dimensions to surface underlying issues.
Technology-wise, there’s a lot of flexibility for Data Science work. A foundational knowledge of SQL (and query languages in general) is essential for understanding the data and extracting it from where it’s stored. Distributed SQL query engines, such as Presto and Hive, help with the scale of data we deal with. For the analysis itself, there is even more freedom of choice. While a lot can be done in SQL using tools such as SQL Lab, which Airbnb built and has since open-sourced, some Data Scientists are huge R fans and others are die-hard Python users. Thanks to the internal tools and libraries built by our teams over the years, all of these languages are well supported, connect easily to our data warehouse, and can produce ready-to-share results.
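To illustrate the kind of workflow described above (SQL for aggregation, Python for everything around it), here is a minimal sketch of slicing a stage's completion rate along one dimension. It uses Python's built-in `sqlite3` module purely as a local stand-in for a distributed engine like Presto or Hive. The `stage_logs` table, its columns, the `platform` dimension, and the sample rows are all hypothetical and do not reflect Airbnb's actual schema, data, or internal tooling.

```python
import sqlite3

# Hypothetical log rows: (session_id, platform, completed_stage as 0/1).
SAMPLE_LOGS = [
    ("s1", "web", 1),
    ("s2", "web", 1),
    ("s3", "web", 0),
    ("s4", "ios", 0),
    ("s5", "ios", 0),
    ("s6", "ios", 1),
    ("s7", "android", 1),
]


def completion_by_dimension(rows):
    """Return {platform: (sessions, completion_rate)}, lowest rate first.

    An in-memory SQLite database stands in for the real query engine;
    the aggregation itself is plain SQL.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE stage_logs (session_id TEXT, platform TEXT, completed INTEGER)"
        )
        conn.executemany("INSERT INTO stage_logs VALUES (?, ?, ?)", rows)
        query = """
            SELECT platform,
                   COUNT(*)       AS sessions,
                   AVG(completed) AS completion_rate
            FROM stage_logs
            GROUP BY platform
            ORDER BY completion_rate ASC
        """
        # Dicts preserve insertion order, so the weakest segment comes first.
        return {platform: (sessions, rate) for platform, sessions, rate in conn.execute(query)}
    finally:
        conn.close()


if __name__ == "__main__":
    for platform, (sessions, rate) in completion_by_dimension(SAMPLE_LOGS).items():
        print(f"{platform}: {sessions} sessions, {rate:.0%} completed")
```

Ordering segments by completion rate puts the weakest ones first, which is usually where a closer look at the underlying logs would begin. Real log data would also call for checking segment sizes, since a low rate on very few sessions may just be noise.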
In the end, my analysis and recommendations were shared with the team (primarily Engineering Managers and Product Managers) and are being used to inform product decisions. Most projects Data Scientists do at Airbnb are documented in the Knowledge Repository. Also open-sourced, the Knowledge Repo facilitates the documentation, discovery, and reproduction of analyses. Any employee investigating a specific issue and in need of data can search the Knowledge Repo and find all of the work Data Scientists across the company have done on the topic. To learn more about how the Knowledge Repo was conceived, check out this article.
My second project initially appeared less technical, but turned out to be far more complex. The task was to define a measure of guest satisfaction for the team to track in the years to come. During the second half of my internship, I experimented with several metrics, consulting other Data Scientists and stakeholders along the way. I learned important lessons about effective communication, especially since this project was highly visible and needed leadership approval. Below are a few considerations about defining a metric that I learned this summer.
- A good metric should be stable: If the metric you’re tracking behaves wildly unpredictably, soaring and plummeting for seemingly no reason, its practical applicability will be limited. The team will not know what to make of it or how to use it to guide their work.
- A good metric should be easy to understand and measure: During my project, I experimented with different measures. Some were very complex, involving weighted fractions of different indicators, and with those it was easy to lose sight of what we were actually measuring. I frequently caught myself asking: “What is a good value for this?” Metrics that are easy to understand and measure keep the objective in sight and let us know how well we’re doing at all times.
- A good metric should align with business objectives and inspire the team: Related to the previous point, a metric with a clear meaning has a higher chance of resonating with the team. It also shouldn’t measure just anything; it should focus on the factors that matter most to the team and the business.
- A good metric should be actionable and influenceable by the team: When a metric has multiple components that the team cannot influence, it can easily be dismissed. After all, why would we track something that doesn’t really reflect the work we do? Similarly, changes in the metric should be easy to diagnose and act upon. The way to satisfy both requirements is to compose the metric of factors the team can affect, each of which can be analyzed separately to diagnose changes and inform further action.
- A good metric should be comparable: In our case, it was very important that our metric could also be easily calculated for the rest of the product, outside of Plus. This would serve as a benchmark and show us how we’re doing relative to the environment we’re in.
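Several of these criteria can be checked with a few lines of code. The sketch below assumes a deliberately simple, hypothetical composite: an equal-weighted average of component rates. The component names and values are invented for illustration and are not the guest satisfaction metric we actually defined. Because an average is linear, any change in the overall value splits exactly into per-component contributions, which keeps changes easy to diagnose. The coefficient of variation over weekly values serves as a rough stability check.

```python
from statistics import mean, pstdev

# Hypothetical component rates (each in [0, 1]) for two consecutive periods.
BEFORE = {"listing_accuracy": 0.90, "checkin_ease": 0.80, "host_communication": 0.70}
AFTER = {"listing_accuracy": 0.90, "checkin_ease": 0.86, "host_communication": 0.70}


def composite_metric(components):
    """Equal-weighted average of component rates: simple to read and explain."""
    return mean(components.values())


def attribute_change(before, after):
    """Split the change in the composite into per-component contributions.

    With an equal-weighted average of n components, each component adds
    (after - before) / n, so the contributions sum to the total change.
    """
    n = len(before)
    return {name: (after[name] - before[name]) / n for name in before}


def coefficient_of_variation(series):
    """Relative spread of a metric over time; lower means more stable."""
    return pstdev(series) / mean(series)


if __name__ == "__main__":
    print("before:", round(composite_metric(BEFORE), 3))
    print("after: ", round(composite_metric(AFTER), 3))
    for name, delta in attribute_change(BEFORE, AFTER).items():
        print(f"  {name}: {delta:+.3f}")
    weekly = [0.80, 0.81, 0.79, 0.80]  # hypothetical weekly values
    print("weekly CV:", round(coefficient_of_variation(weekly), 4))
```

Running the same `composite_metric` function on data from outside Plus would provide the benchmark described in the last point. An equal-weighted average is used here because, as noted above, heavily weighted blends quickly become hard to interpret.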
After input, review, and approval from leadership, the metric was finalized and will be tracked as a way to measure progress. This project was much more nuanced and complex than the previous one because of its many moving parts and stakeholders and its higher visibility. It also helped me grow tremendously, both as a data scientist and as a communicator.
Spending the summer as a Data Science intern at Airbnb was a real pleasure. From the get-go, I felt welcomed and respected by everyone around me, in the true spirit of being a host. There was no shortage of interesting projects or stimulating challenges. Perhaps most important of all, interns are paired with great mentors: individual contributors with a consistent track record of strong performance and the commitment to make the mentorship work.