Devavrat Shah has long had a keen interest in both academic research and its practical applications. In late 1999, just a few months after arriving at Stanford University from India to pursue a Ph.D. in computer science, Shah worked at a start-up that was building on-chip, high-bandwidth network processing technology. His task was to create a system architecture that could use slower memory to emulate fast memory and thus maintain statistics about network flows with extreme speed and accuracy. That work led to a hybrid memory architecture, an approach that has since become standard in the networking industry. “I was fortunate enough to work with people who were actually building engineering systems, and that helped me bridge the gap really well between the engineering world and applied mathematics,” says Shah, who completed his Ph.D. in 2004.
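The general idea of using slower memory to emulate fast memory for flow statistics can be illustrated with a toy model. This is our own sketch, not the design Shah built. Each counter is split into a narrow part in fast memory and a wide part in slow memory. Because slow memory is assumed to accept only one write per few updates, a flush policy decides which fast counter to empty. The class name `HybridCounters`, the 8-bit width, the flush cadence and the largest-counter-first flush rule are all assumptions chosen for illustration.

```python
class HybridCounters:
    """Toy model of statistics counters split across fast and slow memory.

    Each logical counter is the sum of a narrow "fast" part (standing in for
    on-chip SRAM) and a wide "slow" part (standing in for off-chip DRAM).
    All sizes and policies here are illustrative assumptions.
    """

    def __init__(self, n_counters: int, fast_bits: int = 8, flush_every: int = 4):
        self.fast_max = (1 << fast_bits) - 1  # largest value a fast counter can hold
        self.fast = [0] * n_counters
        self.slow = [0] * n_counters
        self.flush_every = flush_every  # assume slow memory takes one write per this many updates
        self._updates = 0

    def increment(self, i: int, amount: int = 1) -> None:
        """Record `amount` units (e.g. packets) for flow `i` in fast memory."""
        if self.fast[i] + amount > self.fast_max:
            raise OverflowError("fast counter too narrow for this update pattern")
        self.fast[i] += amount
        self._updates += 1
        # Periodically spend one slow-memory write on the fullest fast counter.
        if self._updates % self.flush_every == 0:
            self._flush_largest()

    def _flush_largest(self) -> None:
        """Largest-counter-first policy: move the biggest fast value to slow memory."""
        i = max(range(len(self.fast)), key=self.fast.__getitem__)
        self.slow[i] += self.fast[i]
        self.fast[i] = 0

    def read(self, i: int) -> int:
        """The true count is always the slow part plus the fast part."""
        return self.slow[i] + self.fast[i]


counters = HybridCounters(n_counters=3)
for k in range(150):
    counters.increment(0 if k % 3 else 1)  # flow 0 gets two-thirds of the traffic
print(counters.read(0), counters.read(1), counters.read(2))  # 100 50 0
```

The design point the toy captures is that no update is ever lost. Fast memory absorbs every increment, and slow memory only has to keep pace with the occasional flush, so the fast counters can stay narrow.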
Today Shah is a professor of electrical engineering and computer science at Massachusetts Institute of Technology and director of the MIT Statistics and Data Science Center, part of the university’s Institute for Data, Systems and Society (IDSS). Since coming to Cambridge-based MIT in 2005, Shah has specialized in studying the theory of complex networks. Most recently, he has focused his academic research on social data processing and how to use the data generated by those networks to make better decisions. His goal: to take advantage of the enormous opportunity social data creates, which can lead to what he calls “a remarkable convergence of academia and industry.” To that end, Shah and MIT Sloan School of Management associate professor Vivek Farias in 2013 co-founded Celect, a Boston-based software company that uses predictive analytics and machine learning to help retailers optimize their inventories by “placing the right product in the right place at the right time — across all channels.”
In November 2017, Shah hosted a one-day workshop on prediction analytics on the MIT campus to give IDSS’s industry partners an opportunity to learn about the research being done by some of the institute’s faculty and students. Topics included how to quantify uncertainty and robustness at scale using Bayesian machine learning, as well as approaches for automatically formulating and solving prediction problems from arbitrary data. After the workshop, Shah met with WorldQuant Global Head of Content Michael Peltz to discuss his interest in social networks, the discrete choice model and how, as a statistician, he is using machine learning to try to enhance that model to help people and companies make better decisions.
How did you become interested in social networks?
Devavrat Shah: I think there’s a huge opportunity to use social network data to do statistics and machine learning. The way I like to say it, we have so much social data that is recorded now that if aliens came down and they knew our language, they would know that Michael and Devavrat are speaking right now. That means the most interesting information is recorded in one form or another. A large amount of that information is available — either democratically or for purchase — so there must be lots of opportunities. If you want to realize those opportunities effectively, one way or the other you’re solving some decision-making problem. And if you now put on the hat of a classical, statistical, machine learning person, you can say, “Well, you will do that by first learning the model behind the data and then using that model to make predictions.”
For centuries social scientists have been trying to model human behavior using data generated by people, and the challenge continues. That means this is a hard, if not impossible, task. And once a task seems impossible, it becomes really interesting. That was one of the reasons I got super excited about the question of how to use social data to make decisions: these questions embodied the frontier of statistics and machine learning. And they were so closely tied to reality that if I were to do something meaningful with them, I could actually make an impact in the real world.
What is your goal? How would you define success?
First of all, as a topic, this area is very nascent. Success would mean, first, identifying a canonical class of questions and developing a comprehensive mathematical foundational theory for them; then using that theory to improve the way we do things in practice; and then carrying it further downstream into a system that actually works in reality. You keep doing that until there are pillars of a discipline, so that 20 years from now this isn’t a bunch of topics but rather a unified theory or a unified discipline. I think it’s the combination of those things that would count as success.
What are the biggest challenges to achieving that?
Well, first, as an academic, you want to make sure that you can avail yourself of all the data you want. That has been one of the advantages of being part of IDSS, which has partnered with different organizations [including WorldQuant]. One thing I’m really excited about is that all our partners are keen on interacting with us through the substrate of data. That will make this challenge easier to solve. The second challenge is understanding the canonical questions in this domain. Addressing them will lead to a meaningful impact in practice and also lay the foundations for this field.
How far along are you in the process?
One of the canonical questions I’ve spent time looking at over the past decade is the question of understanding how people choose. Recommendations are driven by your choices, right? Understanding choice, or using choice to make decisions, is a very rich area of study. It touches on frontiers of abstract mathematics where we have not made progress for a century. It also touches on questions of political science and social science, like how we elect our leaders. And it touches on things like e-commerce recommendation systems.
A small part of that big world is taking one of the models of choice that emerged close to 100 years ago and has evolved over the past century: the discrete choice model. I’ve been developing mathematical foundations for learning these choice models. How do you learn them efficiently, both computationally and statistically, and at scale? How do you build practical systems around them? How do you use those practical systems to actually help retailers do something meaningful, and then close the loop and bring new questions back to researchers?
And that goes back to the company you started?
Basically. Four years ago we founded Celect after seeing the initial success of our method for learning a discrete choice model from sparse data. We wanted to create a company that could use the model to capture people’s choices from sparse information, like purchasing and browsing, and use that information to enrich things for retailers across customers, products, locations and channels. How do you stitch everything together? How do you make it a rich model? How do you do that at scale? How do you make it practical? How do you deliver high-speed, real-time answers? Once you’ve done all those things, hopefully the retailer will see the needle move on its top line and bottom line.
In the world of retail, we primarily help companies run their entire decision-making process very efficiently. At the end of the day, retailers are a marketplace. They are sourcing products from different vendors, and they are trying to sell these products to different customers. If they knew exactly how many customers were going to buy a specific type of product at what prices, then they would buy exactly that many units from those vendors at appropriate prices, right? And so if they knew product demand accurately, they wouldn’t lose any money on inventory, they wouldn’t have to mark anything down, they wouldn’t have customers turning away because they couldn’t find a product, and they wouldn’t have too much money tied up in inventory, because whatever is on the shelf would fly off the shelf immediately.
Putting it another way, what we have done is try to understand customer choice really well using different types of data sources. By stitching the data together, we can very accurately predict demand for different products among different shoppers, and our own customers are big retailers.
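To make “predicting demand from choice” concrete, here is a sketch using the multinomial logit (MNL) model. MNL is a textbook discrete choice model in which each shopper either buys one product from the assortment or buys nothing. This is not Celect’s model, which the interview does not describe. The product names, the utility values and the function name `mnl_expected_demand` are hypothetical. The sketch shows how a choice model turns shopper traffic into expected demand per product. It also shows how that demand shifts when an item is missing from the shelf.

```python
import math


def mnl_expected_demand(utilities: dict[str, float], shoppers: int) -> dict[str, float]:
    """Expected units sold per product under a multinomial logit (MNL) model.

    Each shopper buys product i with probability
    exp(u_i) / (1 + sum_j exp(u_j)), where the 1 is a "no purchase"
    option with utility 0; otherwise the shopper walks away.
    """
    weights = {name: math.exp(u) for name, u in utilities.items()}
    denom = 1.0 + sum(weights.values())
    return {name: shoppers * w / denom for name, w in weights.items()}


# Hypothetical utilities for three products stocked at one store.
assortment = {"jacket": 1.0, "sweater": 0.5, "scarf": 0.0}
print(mnl_expected_demand(assortment, shoppers=1000))

# If the sweater is out of stock, part of its demand shifts to the other items.
without_sweater = {k: v for k, v in assortment.items() if k != "sweater"}
print(mnl_expected_demand(without_sweater, shoppers=1000))
```

The second call illustrates the stockout case from the previous answer. Some shoppers who wanted the sweater substitute another item, and the rest leave without buying.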
Can you describe the discrete choice model in lay terms?
Of course. If I want you to describe your preference for something, it’s easier if I ask you to compare two things rather than give a rating. For example, I had shoulder surgery a few years back, and every now and then I would visit my orthopedic doctor. He would ask me questions like, “What’s your pain on a scale of one to ten?” I would reply, “Do you want me to tell you nine because last time I told you ten, or what?” and then he would laugh. Similarly, if you go to an optometrist, he’ll show you an eye chart and say, “Is this looking better or is that looking better?” Everybody’s glasses are perfect because they’re using comparisons and not ratings.
We express our preferences in the form of comparisons. That means the mathematical model we develop to capture how people provide their preferences and choices should be compatible with that unit of choice. The discrete choice model is just the mathematical representation of these comparisons. So, really, the discrete choice model captures the natural way in which humans express their preferences, and that’s why in the world of psychology it’s called the model of judgment: it describes how people judge things. And that’s also why, if we were to do elections right (not that we do), you and I would not simply be voting for Clinton or Trump; we would be ranking the candidates, saying who is our No. 1 choice, who is our No. 2 and who is our No. 3.
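The comparison-based view can be made concrete with the Bradley–Terry model, one classical discrete choice model for pairwise comparisons (it is the two-option case of the multinomial logit). This sketch works under that assumption and is not the specific model or estimator Shah uses. It fits one “strength” per item from counts of which item was preferred, then ranks items by strength. The optometrist-style data and the function name `bradley_terry` are hypothetical.

```python
def bradley_terry(wins: dict[tuple[int, int], int], n_items: int, iters: int = 500) -> list[float]:
    """Fit Bradley-Terry strengths from pairwise comparison counts.

    wins[(i, j)] is the number of times item i was preferred over item j.
    The model says P(i preferred over j) = w_i / (w_i + w_j). Strengths are
    fitted with the classical minorize-maximize (MM) update
        w_i <- (total wins of i) / sum_j n_ij / (w_i + w_j),
    where n_ij counts all comparisons between i and j.
    """
    w = [1.0] * n_items
    for _ in range(iters):
        new_w = []
        for i in range(n_items):
            total_wins = sum(c for (a, _b), c in wins.items() if a == i)
            denom = 0.0
            for (a, b), c in wins.items():
                if i in (a, b):
                    other = b if a == i else a
                    denom += c / (w[i] + w[other])
            new_w.append(total_wins / denom if denom > 0 else w[i])
        scale = n_items / sum(new_w)  # strengths are only defined up to a common scale
        w = [x * scale for x in new_w]
    return w


# Hypothetical "which lens looks better?" answers for three lenses.
answers = {(0, 1): 7, (1, 0): 3, (1, 2): 7, (2, 1): 3, (0, 2): 8, (2, 0): 2}
w = bradley_terry(answers, n_items=3)
ranking = sorted(range(3), key=lambda i: -w[i])
print(ranking)  # best lens first
print(round(w[0] / (w[0] + w[1]), 3))  # estimated P(lens 0 beats lens 1)
```

No ratings appear anywhere in the input. The full ordering, and a probability for every head-to-head choice, are recovered from comparisons alone.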
What have you been able to do to enhance the model from a math perspective?
The discrete choice model is a very high-dimensional model. Let’s suppose I want to understand people’s preferences among Boston restaurants. The number of active restaurants in Boston at any point in time during the past couple of years has been on the order of thousands. Now, if I want to build a full-blown model, the number of parameters I would have to learn would be on the order of two to the power of 10,000 or so.
That’s a lot of zeros.
Exactly. And how many people have provided me information about restaurants in Boston? No more than the population of the U.S., say 325 million. So clearly I can’t learn that. And even if I could learn it, it would be impossible to compute. In a nutshell, what we have done is dig up whatever data is available and employ these kinds of models very, very efficiently at large scale. Putting it another way, if we have data coming from people (in the retail context, from hundreds of millions of people about hundreds of millions of products), we want to do this very quickly and very efficiently.
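The arithmetic behind “two to the power of 10,000” can be checked under one assumption the interview does not spell out. We assume the “full-blown model” is a probability distribution over every possible ranking of the restaurants, which has n! - 1 free parameters. This sketch computes log2(n!) for n = 1,000, a value chosen to stand in for “on the order of thousands.” It then compares the result with the roughly 325 million people mentioned in the answer. The helper name `log2_factorial` is hypothetical.

```python
import math


def log2_factorial(n: int) -> float:
    """log2(n!) computed via the log-gamma function, without forming n! itself."""
    return math.lgamma(n + 1) / math.log(2)


restaurants = 1_000  # assumed stand-in for "on the order of thousands"
people = 325_000_000  # rough U.S. population used in the interview

rankings_bits = log2_factorial(restaurants)
print(f"possible rankings of {restaurants} restaurants: about 2^{rankings_bits:.0f}")
print(f"people who could supply data: about 2^{math.log2(people):.1f}")
```

With n = 1,000 the count of rankings comes to roughly 2^8,530, consistent with the order of magnitude quoted. The data, by contrast, comes from only about 2^28 people.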
So you have to simplify the process?
We have to simplify it. And somehow mathematics has to help here, because without it you are, in some sense, on a wild goose chase and simply lost. So the question is, What are the right ways to think about these models, and how do you reduce the complexity by looking at the right representations, as one would call them, so that things can start working really well? That’s basically what we have done, both mathematically and algorithmically.
Can you provide an example?
Let’s go back to the simplest setting, where I’m a retailer and I’ve got 100 million products to sell, but I want to sell only a few products to you when you come online. From my perspective, what I really care about is not your entire choice model; instead, I want to know with confidence the top ten things, out of these 100 million, that you might like at a given time and in a given context. And if I can do that quickly, then that’s great. Now, in some sense, my effective model complexity for this decision has been reduced drastically, because I’m worrying about only ten things rather than all 100 million.
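The reduction from a full choice model to a top-ten list can be sketched as follows. For illustration only, we assume some fitted model has already given each product a score for this shopper and context, for example a multinomial logit utility. The interview does not say how such scores are produced, and the function name and random scores are hypothetical. Once scores exist, the decision needs only the ten largest, which a bounded heap finds in a single pass.

```python
import heapq
import random


def top_k_for_shopper(scores: list[float], k: int = 10) -> list[int]:
    """Return the indices of the k highest-scoring products, best first.

    Under a multinomial logit model a product's choice probability increases
    with its score, so the k highest scores are also the k products the
    shopper is most likely to choose. heapq.nlargest keeps only k candidates
    while scanning, instead of sorting every product.
    """
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


# Hypothetical scores for 1 million products for one shopper in one context
# (a smaller stand-in for the 100 million in the interview, to keep the demo fast).
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(1_000_000)]
top10 = top_k_for_shopper(scores, k=10)
print(top10)
```

The heap costs O(n log k) time and O(k) extra memory, so the work for the decision grows with the ten items that matter, not with a full ranking of the catalog.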
So when you look at the situation through the lens of decision making and of what problem you want to solve, the model stays in the background. You never actually learn the model. You use it as a principle for going from data to decisions, and how you make that happen for different types of decision problems is where ingenuity comes in.
And this is machine learning?
This is machine learning, this is statistics, depending on what you call it. For me, machine learning and statistics are two sides of the same coin. Machine learning historically started with an interest in algorithms for going from data to decisions. Statistics wanted to do the same thing but always by first learning the model. And this is kind of in the middle: I want to worry about the models, but I don’t really want to learn the models.
When you create an environment where you’re going to teach a machine to learn, does it know where it’s going to end up?
Not always. In some sense, you have data, and you’re trying to learn from it a structure that succinctly explains the observations. You probably don’t know how succinct it will be. You know it will lie in some environment, but you don’t know exactly where it will lie.
So that’s why, going back to the context of a retailer, if you give me data, I don’t know where I’m going to end up. Maybe there’s a retailer whose users are so homogeneous that the description of their choices is really simple. But if I’m a massive retailer serving all sorts of people all the time, then maybe the effective model of choice that I learn will be extremely complex; the data will decide that.
This is all about prediction?
It’s all about prediction and decision making. And again, this is a very fine point: social scientists’ interest in this kind of question would be to learn the model precisely and understand why people make certain choices. My interest here is not why people choose something. My interest is what people choose, and then using that to decide what products to put on the shelf.
So it doesn’t really matter why people choose a product? It just matters that they choose it?
That is exactly right. That’s the perspective from which I’m coming. Because, really, if I want to help you make decisions, I don’t care why. I do care what.
Sometimes if you are doing marketing and you really, really care about what people are buying, you first learn the why and then use the why to decide what you want to do. I believe that perspective is overvalued in the world of retail. You could be buying something for whatever reason. If I can tell you that you’re going to buy this, that’s what matters. Now, of course, there might be a world in which why you’re buying may matter, but not in my world.
As a statistician and somebody who wants to make data-driven decisions, I don’t necessarily want to understand why you’re making a certain choice. What I want to understand is how you’re making that choice. Putting it another way, all I care about is that you end up buying a latte 90 percent of the time and a cappuccino 10 percent of the time. I don’t care in principle why you do that, because all I’m interested in is understanding how you act as a model of your choice and then using that information to do further decision making.
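The latte example amounts to estimating choice probabilities from observed behavior and then planning with them. This minimal sketch assumes that choices are independent across visits and that past frequencies are a fair estimate of future ones. The interview states neither assumption, and the function names and purchase history are hypothetical. Note that nothing in the code asks why the customer chooses.

```python
from collections import Counter


def choice_shares(purchases: list[str]) -> dict[str, float]:
    """Estimate how often each option is chosen: the 'what', not the 'why'."""
    counts = Counter(purchases)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


def expected_orders(shares: dict[str, float], visits: int) -> dict[str, float]:
    """Turn choice shares into expected orders for a given number of visits."""
    return {item: share * visits for item, share in shares.items()}


# Hypothetical purchase history for one customer.
history = ["latte"] * 9 + ["cappuccino"]
shares = choice_shares(history)
print(shares)  # {'latte': 0.9, 'cappuccino': 0.1}
print(expected_orders(shares, 200))  # about 180 lattes and 20 cappuccinos
```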
Having said that, there are many excellent reasons social scientists do care about the why, and its pursuit should not be forgotten or undervalued. Actually, it is really hard and hence deserves all the more appreciation.
Thought Leadership articles are prepared by and are the property of WorldQuant, LLC, and are circulated for informational and educational purposes only. This article is not intended to relate specifically to any investment strategy or product that WorldQuant offers, nor does this article constitute investment advice or convey an offer to sell, or the solicitation of an offer to buy, any securities or other financial products. In addition, the above information is not intended to provide, and should not be relied upon for, investment, accounting, legal or tax advice. Past performance should not be considered indicative of future performance. WorldQuant makes no representations, express or implied, regarding the accuracy or adequacy of this information, and you accept all risks in relying on the above information for any purposes whatsoever. The views and opinions expressed herein are those solely of the author, as of the date of this article, and are subject to change without notice, and do not necessarily reflect the views of WorldQuant, its affiliates or its employees. No assurances can be given that any aims, assumptions, expectations and/or goals described in this article will be realized or that the activities described in the article did or will continue at all or in the same manner as they were conducted during the period covered by this article. Neither WorldQuant nor the author undertakes to advise you of any changes in the views expressed herein. WorldQuant may have a significant financial interest in one or more of any positions and/or securities or derivatives discussed.