At Wayfair, we do everything we can to help our customers find exactly the products they need to furnish their homes in the style they envision. But creating all of the necessary elements to allow them to do that is not as easy as one might think. Right now, if an artist designs a new stylistic look for a home from scratch, it takes weeks before we can actually introduce products in the market to fit that look. Interestingly, it is not product manufacturing, but the creation of 3D models for these products that is the slowest and most expensive part of this process. There are a few key reasons for this. First, 3D models are required for product manufacturing, but 3D modeling software licenses and experienced modelers are pricey and hard to come by. Second, to create a production-quality 3D model (using software like 3DS Max or Maya), 3D modelers need numerous 2D images of the product from various angles; given this requirement, you can imagine how costly a single rework or a minor change could be.
These days, it is almost impossible for a data scientist to discuss datasets involving images without bringing computer vision and/or deep learning into the conversation. Deep learning models have proven to be phenomenal at classification tasks, such as distinguishing dogs from cats, or sofas from chairs. But generating a production-quality 3D model of a product using just 2D images is closely related to photogrammetry, which means it falls more within traditional computer vision than deep learning. To put things in context, a traditional computer vision engineer worries more about camera intrinsics, SLAM, 3D geometry, stereo geometry, vanishing point analysis, and point cloud matching (ICP) than about loss functions, the number of convolution layers, and backpropagation. But the moment we bring in traditional computer vision techniques, a generalization problem arises: we cannot use them everywhere or on every product. The key, then, is to bring the best of both worlds together, because given the demand for content in fields such as home decor, gaming and entertainment, medicine, fashion, and even advertising, the need for good-quality 3D artwork is only going to increase.
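To give a flavor of the "traditional" side of this toolbox, here is a minimal sketch of the core step inside point cloud matching (ICP): given corresponding 3D points, recover the rigid rotation and translation that best align them via the SVD-based Kabsch method. This is a generic illustration in NumPy, not code from the talk, and the function name `rigid_align` is our own.

```python
import numpy as np

def rigid_align(src, dst):
    """One ICP-style alignment step: find rotation R and translation t
    that best map corresponding points src (N x 3) onto dst (N x 3),
    using the SVD-based Kabsch method."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

# Demo: recover a known rigid transform from exact correspondences.
rng = np.random.default_rng(0)
pts = rng.random((50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, -2.0, 0.5])
moved = pts @ R_true.T + t_true

R, t = rigid_align(pts, moved)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # → True True
```

A full ICP loop would alternate this step with nearest-neighbor correspondence search until convergence; with exact correspondences, as above, a single step recovers the transform.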
In this video, we look at how to train a deep learning model to think like a traditional computer vision engineer. Machine Learning Engineer Anurag Syal describes the challenge and discusses one of the possible approaches to tackling this issue: creating 3D models from 2D images.
Anurag enjoys hacking things and repurposing them to solve business problems. After completing his MS in Electrical and Computer Engineering at the University of Southern California, Anurag joined Wayfair’s Computer Vision team as a Machine Learning Engineer in July 2019. Anurag started his career as a Patents Analyst in 2013 after completing his Bachelor’s in Electrical and Electronics Engineering from the National Institute of Technology (NIT) Surathkal, India. During his 3-year stint in the field of Intellectual Property, Anurag published several white papers and studies focusing on global trends in research, filing, and litigation in the AR/VR and 4G-LTE communication spaces. An entrepreneur at heart, Anurag also ran his own business and designed several experiences in the AR/VR space for about 1.5 years before going on to graduate school. In his free time, he enjoys cooking and making apps with his Kinect, and spends most of his vacations going on road trips, riding motorbikes, and hiking.