Analyzing the Impact of Course Updates with Difference-in-Differences
by Rachel Liao, Coursera Engineering, December 2020


This is Part IV of our Causal Impact @ Coursera series. (Part III is here)

At Coursera we use data to power strategic decision making, leveraging a variety of causal inference techniques to inform our product and business roadmaps. In this causal inference series, we will show how we utilize the following techniques to understand the stories in our data:

(1) controlled regression

(2) instrumental variables

(3) regression discontinuity

(4) difference-in-differences

This fourth and final post in the series covers an application of difference-in-differences to understand the impact of course updates on course completion rates.

Since 2012, Coursera has enabled access to open and online courses that help learners from around the world learn without limits. As we’ve grown, technology has too, and new tools and skills emerge every day. With over 75 million registered learners and over 5,000 pieces of content, it’s important for us to support our instructors and partners in incorporating job-relevant skills and the best pedagogical practices into their courses.

Limitations of A/B Testing for Course Updates

When our instructors want to test small updates to their courses, such as adding a practice quiz or changing a reading, we can support them by running A/B tests. But for larger updates where a course is dramatically changed, we avoid running A/B tests. Changes such as dropping a module or revamping the entire course material can confuse and disrupt learners if we randomly assign them to a course version. For example, a learner could be asked to peer review an assignment from another course version that teaches different material, or they could have difficulty getting their questions answered in the discussion forums. A/B testing substantial course changes can also be unethical, since we expect better learning outcomes from the newer version.

Using the Diff-in-Diff Approach

This is where difference-in-differences (diff-in-diff) comes into play. Using this quasi-experimental design, we can analyze the change in learning outcomes due to a dramatic course update without randomly assigning learners to each course version and causing unnecessary confusion. Instead, we can launch the course update to all learners and track their performance against a control group made up of learners in different but similar courses that did not receive a course update.

At its core, diff-in-diff assumes that there is a natural and consistent difference over time between the two groups (the control and treatment groups) regardless of any intervention, as well as another difference we would expect to see as a result of our treatment (in this case, the course update). These are the two differences referenced in “difference-in-differences.” Using these two differences, we can tease out the incremental effect of a course update without having to run an A/B test.

Below is a table to explain how this works, where C_pre and C_post are the control group’s outcomes before and after the update date, and T_pre and T_post are the treatment group’s:

                      Before update      After update       Difference
Control group (C)     C_pre              C_post             C_post - C_pre
Treatment group (T)   T_pre              T_post             T_post - T_pre
Difference (T - C)    T_pre - C_pre      T_post - C_post    (T_post - T_pre) - (C_post - C_pre)

The bottom right cell, (T_post - T_pre) - (C_post - C_pre), is the estimated difference-in-differences effect that we want.
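To make that concrete with made-up numbers: if the updated course’s monthly completion rate rises from 30% before the update to 42% after, while the control group’s rises from 28% to 30% over the same window, the estimated effect of the update is (42% - 30%) - (30% - 28%) = 10 percentage points.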

Parallel Trends Assumption and Control Groups

Underlying this entire difference-in-differences technique is the parallel trends assumption. This is the assumption that in the absence of a treatment, the difference between your treatment and control groups is the same over time. In other words, we want to verify that our first difference from difference-in-differences is consistent. If that assumption is true, then we can use diff-in-diff to isolate the effect of the treatment.
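One common way to state the assumption more formally (a sketch in potential-outcomes notation, where Y(0) is the completion rate a course would have had without an update):

E[ Y_post(0) - Y_pre(0) | treatment group ] = E[ Y_post(0) - Y_pre(0) | control group ]

That is, in the absence of the course update, both groups’ completion rates would have changed by the same amount from the pre-period to the post-period.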

For our example, we are interested in understanding the boost in learner outcomes after a course update that a Coursera partner made in their Algorithms course (this course will be our treatment group). We will measure learner outcomes using the monthly course completion rate. Since the changes were large (multiple new programming assignments, new supporting readings, and a restructuring of each module), we forgo an A/B test in favor of a pre-post analysis.

In order to test the parallel trends assumption, we first need to construct a control group. We want to find a control group that has a parallel trend in its monthly completion rate compared to the Algorithms course before the update was launched, so that the parallel trends assumption holds true. In this case, we select courses that are similar across a few aspects such as content type (single courses, ignoring other formats like short-form Guided Projects or Specializations), content topic (similar programming and computer science courses), language (English, like the Algorithms course), and quality (similar star ratings).
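As a rough sketch of how such a screen might look in code (the catalog, its columns, and the rating threshold are illustrative, not our actual selection pipeline):

```python
import pandas as pd

# Illustrative course catalog; column names and values are made up.
catalog = pd.DataFrame({
    "course_id":   ["algorithms", "data_structures", "intro_python", "art_history"],
    "format":      ["course", "course", "course", "course"],
    "domain":      ["computer_science", "computer_science", "computer_science", "arts"],
    "language":    ["en", "en", "en", "en"],
    "star_rating": [4.7, 4.6, 4.8, 4.5],
    "updated":     [True, False, False, False],
})

treated_rating = catalog.loc[catalog["course_id"] == "algorithms", "star_rating"].iloc[0]

# Candidate controls: single courses in the same domain and language, with a
# similar star rating, that did not receive a major update.
controls = catalog[
    (~catalog["updated"])
    & (catalog["format"] == "course")
    & (catalog["domain"] == "computer_science")
    & (catalog["language"] == "en")
    & (catalog["star_rating"].between(treated_rating - 0.2, treated_rating + 0.2))
]
print(controls["course_id"].tolist())  # -> ['data_structures', 'intro_python']
```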

After selecting our control group, we can visually check the parallel trends assumption by graphing the monthly completion rates for our two groups and checking whether they move in parallel before the course update. Below we can see that the completion rates for the control group and treatment group (the Algorithms course) are roughly stable before the update (noted by the dashed line), so we can conclude that our parallel trends assumption holds. Alternatively, we can check the parallel trends assumption statistically by interacting the treatment variable with dummy time variables.
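A minimal sketch of that statistical check (the pre-update data, column names, and values below are made up for illustration): restrict to pre-update months, interact the treatment indicator with month dummies, and confirm that the interaction terms are indistinguishable from zero.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Pre-update months only: monthly completion rates for the treated course
# (treated = 1) and two hypothetical control courses (treated = 0).
pre = pd.DataFrame({
    "treated":         [1, 1, 1, 0, 0, 0, 0, 0, 0],
    "month":           [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "completion_rate": [0.30, 0.31, 0.30, 0.27, 0.28, 0.27, 0.29, 0.30, 0.29],
})

# Interact the treatment indicator with month dummies; if the treated:C(month)
# coefficients are jointly indistinguishable from zero, the pre-update trends
# look parallel.
pretrend = smf.ols("completion_rate ~ treated * C(month)", data=pre).fit()
print(pretrend.summary())
```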

Finalizing with a Regression

After visually checking the parallel trends assumption, we run a regression model to estimate the difference-in-differences and quantify the impact of this course update on course completion rate. In this example, our regression looks something like this:
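completion_rate = β₀ + β₁ · post_update + β₂ · treated + β₃ · (post_update × treated) + ε

Here post_update is 1 for months after the update launched (0 before), treated is 1 for the Algorithms course (0 for the control courses), and β₃, the coefficient on the interaction term, is the difference-in-differences estimate. (The variable names are illustrative.)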

In simplified R or Python code, it looks something like this:
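A minimal sketch with pandas and statsmodels (the data and column names are illustrative):

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per course-month: the completion rate, whether the course received
# the update (treated), and whether the month falls after the update launch
# (post_update). Values here are made up for illustration.
df = pd.DataFrame({
    "completion_rate": [0.30, 0.31, 0.42, 0.28, 0.29, 0.30],
    "treated":         [1, 1, 1, 0, 0, 0],
    "post_update":     [0, 0, 1, 0, 0, 1],
})

# OLS with an interaction term: the coefficient on post_update:treated is the
# difference-in-differences estimate.
did = smf.ols("completion_rate ~ post_update + treated + post_update:treated",
              data=df).fit()
print(did.params["post_update:treated"])  # ~0.10 with these made-up numbers
```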

We then use the interaction term from the model (post course update : received course update treatment, or β₃) to find the treatment effect of the course update. From the table below, which fills in each cell of the earlier table with the regression coefficients, we can see that the interaction term from our regression model (β₃) is exactly the difference-in-differences effect (the bottom right cell):

                      Before update      After update            Difference
Control group (C)     β₀                 β₀ + β₁                 β₁
Treatment group (T)   β₀ + β₂            β₀ + β₁ + β₂ + β₃       β₁ + β₃
Difference (T - C)    β₂                 β₂ + β₃                 β₃

Using a regression for difference-in-differences allows us to be more precise in our estimation: it reduces bias from other factors, and it gives us the standard error of the coefficient so we can assess the statistical significance and power of our estimate.

After running our regression, we find that the Algorithms course sees an 11% increase in monthly completion rate! By incorporating specific techniques from the Drivers of Quality in Online Learning report into this course, such as adding more programming assignments and restructuring the course to have more videos in the beginning and more hands-on assignments near the end, the partner has effectively boosted learners’ outcomes.

Conclusion

Since course updates can be effective in driving better completion rates for Coursera learners, we will continue working with partner institutions to incorporate new job-relevant tools and best pedagogical principles into the content offered on Coursera. By using causal inference techniques like difference-in-differences, we are able to estimate the effectiveness of course updates like this Algorithms course example and improve the learner experience without disruption.

Interested in Data Science @ Coursera? Check out available roles here.

Special thanks to Vinod Bakthavachalam, Jaya Chavern, Eric Karsten, and Xinying Yu for collaborating on this post.


