The Lucas Critique, named after economist Robert Lucas, is a theoretical result that blew up the discipline of macroeconomics in the 1970s, and its implications are directly relevant to much of the work that data scientists are doing today—including work that I am doing on the Algorithms team at Wayfair! Just like jumpsuits and peasant dresses, what is old is new again! So imagine yourself in some fabulous 70s fashion and come on a nerdy journey with me… First, I will articulate the Lucas Critique and explain what it means; then I will apply it to a simple example; next I will explain how it relates to Data Science at Wayfair and apply it to another example drawn from my own work; lastly, I will conclude by arguing that all of this implies that even for us data scientists, data alone are not enough—we need theory in order to do our jobs correctly.
Okay, let’s go!
The Core of the Lucas Critique
Though a great deal of ink has been spilled since the 1970s penning complicated, mathematical treatments of the Lucas Critique, its core claim is elegant in its simplicity:
Policy rules derived from models that only include people’s responses to changes in policy variables, and not to changes in policy rules, will in general fail to be optimal.
Now let us unpack the five key terms in that core claim: model, policy, policy variable, policy rule, and optimal.
- A model is any mathematical representation of how institutions and people make decisions.
- A policy is any action (like setting the interest rate or the price of a sofa) taken by an institution (like a central bank or Wayfair) that affects the decisions (like investing in government bonds or purchasing sofas) of a large number of people.
- A policy variable is any quantity (like an interest rate or a price) that is relevant to these decisions.
- A policy rule is any procedure that determines what policy to take, given the values of the policy variables.
- A policy rule is optimal if it generates the largest possible value for whatever quantity the institution cares about (like GDP or profits).
Now that we understand the core of the Lucas Critique, let us apply it to a simple (if somewhat fantastical) example.
Applying the Lucas Critique: A Case Study
Suppose that the City of Boston is very concerned about parking violations in the Back Bay neighborhood (where the Wayfair headquarters are located). Because of this, the City of Boston implements a new, draconian parking policy rule: every day, with a chance of one in thirty, it will select one car that is parked illegally in Back Bay and dump it in the Charles River. Under this new policy, an average of one unlucky car per month will end up at the bottom of the river (though because of the probabilistic structure of the policy, some months will see no cars meet this watery fate, and others will see more than one). Suppose that, after a short amount of time, this decidedly extreme policy rule is sufficient to deter anyone from parking illegally anywhere in Back Bay.
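To see why a "one in thirty" daily chance works out to about one car per month, with some months seeing none and others more than one, here is a minimal Python simulation of the rule. It assumes 30-day months and that an illegally parked car is always available whenever the daily draw comes up. The function and constant names are hypothetical. The exact probability that a whole month passes with no car dumped is (29/30)^30, roughly 0.36.

```python
import random
from collections import Counter

DAYS_PER_MONTH = 30         # assumption: stylized 30-day months
DAILY_PROBABILITY = 1 / 30  # the rule's "one in thirty" daily chance


def simulate_monthly_dumps(n_months: int, seed: int = 0) -> list[int]:
    """Return the number of cars dumped in each simulated month."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_months):
        # Each day is an independent draw; assumes a violating car is always available.
        dumps = sum(rng.random() < DAILY_PROBABILITY for _ in range(DAYS_PER_MONTH))
        counts.append(dumps)
    return counts


# Exact chance that a whole month passes with no car dumped: (29/30)^30, about 0.362.
P_ZERO_EXACT = (1 - DAILY_PROBABILITY) ** DAYS_PER_MONTH

if __name__ == "__main__":
    counts = simulate_monthly_dumps(10_000)
    print("mean cars per month:", sum(counts) / len(counts))
    print("share of months with none:", counts.count(0) / len(counts))
    print("exact chance of none:", round(P_ZERO_EXACT, 3))
    print("distribution:", sorted(Counter(counts).items()))
```

Over many simulated months, the mean should sit close to 1 and the share of months with no dumping close to 0.36.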
This change in policy rule and resultant change in behavior are illustrated in the following plots. The first one shows daily expenditures on finding cars to dump in the Charles, both before and after the implementation of the new parking policy. The second one shows how, after this new policy is implemented, people learn about their increased chance of having their car destroyed by observing this happen to their unlucky fellow citizens. The third one shows how people respond to this increased perceived probability of vehicular destruction by engaging in fewer (and eventually zero) parking violations.
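The learning shown in the second plot can be sketched as Bayesian updating. In this minimal, hypothetical Python model, residents share a Beta prior over the daily chance that a car gets dumped and update it on each day someone parks illegally. Each resident parks illegally only while the expected loss (perceived risk times car value) is below a personal benefit from doing so. The car value, the benefit schedule, the prior, and the simplification that every resident treats the daily dumping chance as the risk to their own car are all assumptions made for illustration.

```python
import random

DAILY_PROBABILITY = 1 / 30   # the rule: a 1-in-30 daily chance that a violating car is dumped
CAR_VALUE = 20_000.0         # hypothetical dollar value of a car
# Hypothetical residents: the dollar benefit each gets from one day of illegal parking.
BENEFITS = [5.0 * (i + 1) for i in range(100)]   # $5, $10, ..., $500


def simulate_learning(n_days: int, seed: int = 0,
                      prior_dumps: float = 1.0, prior_safe_days: float = 299.0):
    """Simulate residents learning the daily dumping risk from what they observe.

    Beliefs are a shared Beta(prior_dumps, prior_safe_days) distribution over the
    daily chance that a car is dumped; each resident (simplifying) treats that
    chance as the risk to their own car. Returns (perceived risk, violators) per day.
    """
    rng = random.Random(seed)
    a, b = prior_dumps, prior_safe_days
    history = []
    for _ in range(n_days):
        risk = a / (a + b)                       # posterior mean of the daily risk
        expected_loss = risk * CAR_VALUE
        violators = sum(benefit > expected_loss for benefit in BENEFITS)
        if violators > 0:                        # no violators means nothing to observe
            if rng.random() < DAILY_PROBABILITY:
                a += 1                           # a car was dumped today
            else:
                b += 1                           # a day passed without dumping
        history.append((risk, violators))
    return history


if __name__ == "__main__":
    for day, (risk, violators) in enumerate(simulate_learning(1_000)):
        if day % 100 == 0:
            print(f"day {day:4d}: perceived risk {risk:.4f}, violators {violators}")
```

One side effect of this sketch is worth noticing: once nobody parks illegally, there is nothing left to observe, so beliefs stop updating and deterrence persists for as long as the rule is believed to be in force.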
Now suppose that some time passes and the City of Boston wants to re-evaluate its Back Bay parking policy, so it asks you and me to create a model of the relationship between the amount of money it spends enforcing this policy rule per month and the number of parking violations per month. We train our model using data from the past year on those two variables. Since no one parked illegally during that period, our model tells us that there is no relationship between the money spent on enforcement and the number of parking violations. This is because, under the current policy rule, the latter is zero regardless of the value of the former. In other words, in the months during which one or more cars were to be randomly selected for a watery disposal, no parking violations occurred; and in the months in which no such selection was to take place, still no parking violations occurred. This seeming lack of a relationship is readily evident in the training period indicated in the preceding plots.
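Here is a sketch of that naive model in Python: an ordinary least squares fit of monthly violations on monthly enforcement spending, using a made-up training year in which spending varies with the number of selections but violations are always zero. The monthly selection counts, the $1,500 cost per selection, and the $500 cost assigned to each violation are hypothetical. The fitted slope comes out as exactly zero, and a rule that minimizes spending plus the predicted cost of violations therefore chooses to spend nothing.

```python
# Hypothetical training year: selections per month under the 1-in-30 rule, and
# an assumed $1,500 cost of finding a car for each selection.
SELECTIONS = [1, 0, 2, 1, 1, 0, 1, 3, 0, 1, 2, 0]
COST_PER_SELECTION = 1_500.0
SPENDING = [COST_PER_SELECTION * s for s in SELECTIONS]
VIOLATIONS = [0] * len(SPENDING)   # nobody parked illegally during the training year


def ols_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def naive_optimal_spending(candidates, slope, intercept, cost_per_violation):
    """Pick the spending level minimizing spending plus the predicted cost of violations."""
    def total_cost(spending):
        predicted = max(0.0, intercept + slope * spending)
        return spending + cost_per_violation * predicted
    return min(candidates, key=total_cost)


if __name__ == "__main__":
    slope, intercept = ols_fit(SPENDING, VIOLATIONS)
    print("slope:", slope, "intercept:", intercept)   # both zero
    print("naive optimum:", naive_optimal_spending([0.0, 1_500.0, 4_500.0], slope, intercept, 500.0))
```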
In this simple example, our model implies that the optimal policy rule is to spend zero dollars enforcing this draconian parking policy, and hence discontinue it entirely. Why waste money searching Back Bay for cars eligible to be dumped in the Charles if no such cars exist? Right?
Wrong! Since the city budget is a matter of public record, people would eventually find out that the City of Boston is no longer spending any money on enforcing its draconian Back Bay parking policy. Once this became common knowledge, parking violations would proliferate again. You can see this in the following plots, which show the same results as the preceding plots but also include what would happen (in this hypothetical scenario) after the City of Boston discontinues its new parking policy. Note that parking violations inexorably creep back up after it does so.
So where did we go wrong? We failed to apply the Lucas Critique to our model! We only modeled people’s responses to changes in our policy variable (the amount of money spent per month on enforcement) and did not model their responses to changes in our policy rule (every day, with a chance of one in thirty, select one car that is parked illegally in Back Bay and dump it in the Charles River). Because of this crucial omission, we failed to recognize that people behaved the same regardless of monthly enforcement spending only because they were responding to the current policy rule, and that they would respond very differently to those same spending levels under a different policy rule (like always spending zero dollars per month on enforcement).
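A model that respects the Lucas Critique makes the policy rule an input. In this hypothetical Python sketch, the number of residents who park illegally depends only on the daily dumping probability people believe the rule implies, compared against each resident's made-up benefit from parking illegally. Monthly spending does not appear at all, which is why a regression on spending alone found nothing. The car value, the benefit schedule, and the rule labels are assumptions for illustration.

```python
CAR_VALUE = 20_000.0                             # hypothetical dollar value of a car
BENEFITS = [5.0 * (i + 1) for i in range(100)]   # hypothetical daily benefits: $5 ... $500


def predicted_violators(rule_daily_probability: float) -> int:
    """Residents who park illegally, given the daily dumping chance the rule implies.

    Monthly enforcement spending does not appear: within a rule it only reflects
    which random draws came up, and people respond to the rule itself.
    """
    expected_loss = rule_daily_probability * CAR_VALUE
    return sum(benefit > expected_loss for benefit in BENEFITS)


RULES = {
    "1-in-30 daily chance of dumping": 1 / 30,
    "always spend zero on enforcement": 0.0,
}

if __name__ == "__main__":
    for name, probability in RULES.items():
        print(f"{name}: {predicted_violators(probability)} violators")
```

Under the current rule the sketch predicts no violators; under the zero-spending rule it predicts that all 100 hypothetical residents park illegally.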
This simple example illustrates the motivation behind the Lucas Critique: since people will respond differently to the same policy variable values under different policy rules, we need to include the latter in our models if we intend to use those models to derive optimal policy rules.
The Lucas Critique in Action at Wayfair
So how does this relate to Data Science at Wayfair? A better question might be: how does it not? Nearly everything we do in Data Science at Wayfair involves developing models of people’s behavior and using them to derive optimal policy rules. Deciding what prices to set on Wayfair.com is a policy rule. Deciding what types of sales and promotions to run, on which products and at what times, is a policy rule. Deciding the order in which to display products on a web page is a policy rule. I could go on and on, but I think you get the picture! The Lucas Critique applies to basically everything we do in Data Science at Wayfair. So, if we want to do things right, we need to be mindful of it!
Sale Flags on Wayfair.com
Okay, now that we understand the Lucas Critique and how it relates to Data Science at Wayfair, let us take a look at a more relevant (and realistic) example from the work that I am doing at Wayfair: sale flags. A sale flag is a small red square that says “Sale” and appears in the upper-left corner of a product’s image on Wayfair.com (see Figure 3 for an example).
This deceptively simple signal hides complicated behavioral implications. I spend a great deal of time thinking about and analyzing these implications, because my team develops algorithms that decide which and how many products are assigned sale flags on Wayfair.com. Allow me to elaborate.
Suppose that Wayfair is more likely to assign sale flags to products with larger discounts (which is true, by the way). Moreover, suppose that people do not perfectly recall past prices, and partially infer the magnitude of the discount implied by a sale flag from the number of products that receive one. In such a situation, for a given set of prices, people will correctly infer that a larger number of sale flags indicates a smaller implied discount.
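The inference in the paragraph above can be made concrete with a small Python sketch. It assumes, for simplicity, that flags go strictly to the products with the largest discounts (the paragraph only says larger discounts are more likely to be flagged) and that a shopper who has learned the rule takes the average discount among flagged products as the discount a flag implies. The 20 discounts and the function name are hypothetical.

```python
# Hypothetical discounts on 20 discounted products: 2%, 4%, ..., 40% off.
DISCOUNTS = [0.02 * k for k in range(1, 21)]


def implied_discount(flag_fraction: float) -> float:
    """Average discount among flagged products when the largest discounts are flagged first."""
    n_flags = round(flag_fraction * len(DISCOUNTS))
    if n_flags == 0:
        return 0.0
    flagged = sorted(DISCOUNTS, reverse=True)[:n_flags]
    return sum(flagged) / n_flags


if __name__ == "__main__":
    for fraction in (0.25, 0.5, 0.75, 1.0):
        print(f"flag {fraction:.0%} of discounted products -> implied discount {implied_discount(fraction):.0%}")
```

With these numbers, flagging a quarter of discounted products implies a 36% discount, while flagging all of them implies only 21%.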
When prices are fixed, this kind of inference is relatively simple. When prices fluctuate in a way that is not entirely predictable, the inference becomes more complicated. In the latter case, a person cannot immediately infer the extent to which a larger number of sale flags on a given day reflects a larger number of discounted prices rather than Wayfair simply choosing to place sale flags on more products. Because of this, people have to observe changes in prices and sale flags over multiple days, or even months, in order to infer how large a discount is implied by a sale flag.
Arguably the world we live in is more like this latter, more complicated case, in which people interact with Wayfair.com over time and slowly learn how large a discount is implied by our sale flags. As they learn about this, they will respond to sale flags differently. Presumably, the larger the discount they infer from a sale flag, the more likely they will be to purchase a product with a sale flag on it, and vice versa. (For you nerdy economists out there, we can treat this as a fully rational response to uncertainty about future prices.)
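The slow learning described above can be illustrated with a Python sketch in which the number of discounted products fluctuates from day to day and a shopper sees only the number of sale flags. Assuming the shopper knows the average number of discounted products per day but not each day's number, pooling many days gives a reliable estimate of the flag fraction, while a single day's count can be well off. The catalog size, the daily discount probability, the 75% flag fraction, and the function names are hypothetical.

```python
import random

N_PRODUCTS = 200            # hypothetical number of products a shopper browses
P_DISCOUNTED = 0.3          # hypothetical daily chance that any one product is discounted
TRUE_FLAG_FRACTION = 0.75   # the policy rule the shopper is trying to learn


def simulate_flag_counts(n_days: int, seed: int = 0) -> list[int]:
    """Daily sale-flag counts when the number of discounted products fluctuates."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_days):
        n_discounted = sum(rng.random() < P_DISCOUNTED for _ in range(N_PRODUCTS))
        counts.append(round(TRUE_FLAG_FRACTION * n_discounted))
    return counts


def estimate_flag_fraction(flag_counts: list[int]) -> float:
    """Pooled estimate of the flag fraction from flag counts alone.

    Assumes the shopper knows the average number of discounted products per day,
    but not how many were discounted on any particular day.
    """
    expected_discounted = N_PRODUCTS * P_DISCOUNTED
    return sum(flag_counts) / (len(flag_counts) * expected_discounted)


if __name__ == "__main__":
    counts = simulate_flag_counts(365)
    print("estimate after 1 day:   ", round(estimate_flag_fraction(counts[:1]), 3))
    print("estimate after 30 days: ", round(estimate_flag_fraction(counts[:30]), 3))
    print("estimate after 365 days:", round(estimate_flag_fraction(counts), 3))
```

Here the daily number of discounted products has a standard deviation of about 6.5 around a mean of 60, so a one-day estimate has a standard deviation of roughly 0.08, while a year of observations lands much closer to 0.75.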
If all this is true, then we have an important long-run trade-off to make when choosing how many discounted products should get a sale flag. Each additional sale flag increases the probability that someone purchases the product that receives it, which tends to increase the total number of orders on Wayfair.com. However, each additional sale flag also decreases the magnitude of the discount implied by every sale flag, and hence the extent to which sale flags increase purchase probability, which tends to decrease the total number of orders.
Applying the Lucas Critique to Sale Flag Algorithms
Now let us restate this second example using the language of the Lucas Critique. In this case, our policy rule is the fraction of discounted products to which we assign a sale flag, our policy variable is the number of products with a sale flag on a given day, and the response we are interested in is how likely people are to purchase a product with a sale flag on it. People infer our policy rule by observing prices and the values of our policy variable over time. The optimal policy rule is the one that most increases orders on Wayfair.com. Under this optimal rule, the long-term gain in orders from assigning a sale flag to a slightly larger fraction of discounted products is just offset by the loss from people becoming less likely to purchase each product with a sale flag on it.
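The trade-off can be written down as a toy long-run model. In this hypothetical Python sketch, shoppers have fully learned the rule, so a flag implies the average discount among flagged products, and each flag adds orders in proportion to how far that implied discount exceeds a reference discount shoppers would assume anyway. The discounts, the 17% reference discount, the sensitivity, and the functional form are all assumptions. Under them, long-run orders peak at an interior fraction (60% here), where one more flag adds about as many orders as it removes by diluting every other flag.

```python
# Hypothetical discounts on 20 discounted products: 2%, 4%, ..., 40% off.
DISCOUNTS = [0.02 * k for k in range(1, 21)]
REFERENCE_DISCOUNT = 0.17   # hypothetical: discount shoppers assume without a flag
SENSITIVITY = 100.0         # hypothetical: orders per flag per unit of discount above the reference


def implied_discount(n_flags: int) -> float:
    """Long-run implied discount: the average discount among the n_flags largest discounts."""
    if n_flags == 0:
        return 0.0
    return sum(sorted(DISCOUNTS, reverse=True)[:n_flags]) / n_flags


def long_run_orders(n_flags: int) -> float:
    """Extra orders from flags once shoppers have fully learned the rule."""
    uplift_per_flag = SENSITIVITY * max(0.0, implied_discount(n_flags) - REFERENCE_DISCOUNT)
    return n_flags * uplift_per_flag


def optimal_flag_fraction() -> float:
    """Fraction of discounted products to flag that maximizes long-run orders."""
    best = max(range(len(DISCOUNTS) + 1), key=long_run_orders)
    return best / len(DISCOUNTS)


if __name__ == "__main__":
    for n in (5, 10, 12, 15, 20):
        print(f"{n / len(DISCOUNTS):.0%} flagged -> {long_run_orders(n):.1f} extra orders")
    print("optimal fraction:", optimal_flag_fraction())
```

With these numbers, long-run orders are 144 at the 60% optimum, 135 when flagging 75% of discounted products, and only 80 when flagging all of them.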
Now suppose that we create a model of the relationship between the number of products with a sale flag on a given day and the probability that a person will purchase a product with a sale flag, but we do not include in our model how different policy rules might affect this relationship. If we train this model with data collected under our current policy rule (say, assign a sale flag to 75% of discounted products), then it will tell us that there is no relationship between the number of products with a sale flag on a given day and the probability that a person will purchase a product with a sale flag. This is because, as described above, people infer the discount implied by a sale flag from our policy rule (which has remained unchanged), not from our policy variable (which may change from day to day). So in our data set, people’s behavior does not change in response to changes in our policy variable, and we incorrectly conclude that there is no relationship where there in fact is one.
If we were to derive an optimal policy from this model, it would be to assign a sale flag to every discounted product. This is because the model tells us that doing so will have no negative effect on the extent to which sale flags increase purchase probability. But as I explained above, such a negative effect actually does exist—so the optimal policy is almost certainly one that assigns sale flags to some smaller fraction of discounted products.
Already this is bad. But it gets even worse! Suppose we implement the incorrect “optimal” policy derived from our model. Since people slowly learn how large a discount is implied by our sale flags by interacting with Wayfair.com over time, it will take quite some time before they learn that this implied discount is now smaller than it was before and adjust their behavior accordingly. So, if we retrained our model with data from shortly after we implemented our new policy, people would not yet have had time to adjust, and our model would give us the same incorrect results! In the worst case, our new policy might actually decrease long-term profits, even though our model tells us precisely the opposite!
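The timing problem can be sketched with adaptive expectations, used here as a simple stand-in for slow learning: each day, shoppers move their perceived implied discount a fixed share of the way toward the discount the new rule actually implies. This hypothetical Python simulation assumes 20 made-up discounts, a 17% reference discount, a sensitivity of 100, and a 2% daily learning rate. Switching from flagging 75% of discounted products to flagging all of them first raises orders, because shoppers still read each flag as a large discount, and then drives them well below the old level once beliefs catch up.

```python
# Hypothetical discounts on 20 discounted products: 2%, 4%, ..., 40% off.
DISCOUNTS = [0.02 * k for k in range(1, 21)]
REFERENCE_DISCOUNT = 0.17       # hypothetical: discount shoppers assume without a flag
SENSITIVITY = 100.0             # hypothetical: orders per flag per unit of discount above the reference
OLD_FLAGS, NEW_FLAGS = 15, 20   # flag 75% of discounted products, then all of them


def implied_discount(n_flags: int) -> float:
    """Average discount among the n_flags largest discounts."""
    return sum(sorted(DISCOUNTS, reverse=True)[:n_flags]) / n_flags


def orders(n_flags: int, perceived_discount: float) -> float:
    """Extra orders from flags, given the discount shoppers currently believe a flag implies."""
    return n_flags * SENSITIVITY * max(0.0, perceived_discount - REFERENCE_DISCOUNT)


def simulate_switch(n_days: int, learning_rate: float = 0.02) -> list[float]:
    """Daily orders after switching from OLD_FLAGS to NEW_FLAGS, with slowly updating beliefs."""
    perceived = implied_discount(OLD_FLAGS)    # beliefs formed under the old rule
    target = implied_discount(NEW_FLAGS)       # what a flag now actually implies
    daily_orders = []
    for _ in range(n_days):
        daily_orders.append(orders(NEW_FLAGS, perceived))
        # Adaptive expectations: close a fixed share of the gap each day.
        perceived += learning_rate * (target - perceived)
    return daily_orders


if __name__ == "__main__":
    baseline = orders(OLD_FLAGS, implied_discount(OLD_FLAGS))
    path = simulate_switch(365)
    print("old rule:", round(baseline, 1))
    print("first two weeks:", round(sum(path[:14]) / 14, 1))
    print("after a year:", round(path[-1], 1))
```

With these numbers, orders jump from 135 to 180 on the first day, fall below 135 after about a month, and settle near 80. A model retrained on the first couple of weeks after the switch would see only the jump.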
The Necessity of Theory
All of this brings me to my final and most important claim: we need theory; data alone are not enough! As we saw above, people’s responses to policy variables are mediated by their perception of policy rules. The latter is an abstraction and cannot be directly observed in the data—which means we need to have a theory of human behavior in order to include it in our models. Moreover, we need theory in order to predict how people will react to new policy rules under which we have not yet collected any data—a very common requirement in Data Science. To put it concisely, in order to model people’s behavior correctly, we need to understand why people do what they do, not simply observe what they do. That is the force of the Lucas Critique.
Many thanks to Christina Tajik for the custom header illustration!