This week in Wayfair Data Science’s Explainer Series, Data Science Tech Lead Peter B. Golbus discusses machine learning from a theoretical computer science perspective. In this video, we describe multiclass classification as an encoding task, i.e., a process for building compression schemes that convert large “files” (feature vectors) into small ones (labels). Framing classification this way lets us use the powerful tools of information theory to produce actionable insight. In particular, we discuss how classification accuracy is bounded from above by a quantity determined by the mutual information between your features and labels, and how information theory explains why ensembling and feature selection are such powerful tools for machine learning.
Peter, a Boston native and devoted father and husband, defended his PhD in computer science on a Thursday, and has been working at Wayfair ever since the following Monday. After studying the evaluation of search engines in school, Peter joined the search technologies team, where he helped restructure our understanding of how we guide customers to the next right product for them. Peter then joined the data science team, where he has remained active in our efforts to create a personally curated, unique shopping experience for each one of our customers, helping them live in a home that they love.