At Coursera, we use data to power our product and better serve our learners. One example is matching learners with the right learning content to reach their goals. We’ll leave it to Amazon and Netflix to speak to general discovery, and focus here instead on a set of algorithms specific to our learning context, our Skills Graph, and on its application to discovery.
Above is a lightweight representation of the graph, which is built on data from across the site. At its essence, the graph maps a robust library of skills to each other, to the content that teaches them, to the careers that require them, and to the learners who have or want them.
Take as one example the is-taught-by edge between the skill node and the content node. It’s generated by a machine learning model with features that include attributes of the material like the frequency with which it references the skill or concepts related to the skill. One of the most powerful features, though, is what learners on the platform have self-reported learning as they moved through their experience. This edge powers a few data products on the site; one is skills-based search.
Imagine a learner who is looking to learn specific tools or technologies, maybe because she needs them for a freelancing job, or because she’s seeing them on requisitions for a job to which she wants to apply. While these tools and technologies are often taught in courses on Coursera, the instructor may not mention them when describing the course. An example is NumPy, a package for scientific computing in Python. With just a standard text similarity-based search, searching the catalog for NumPy would return no results, and in fact it did until we built this edge and deployed it in search. Now the query instead returns 21 courses in which, according to the graph, learners are learning NumPy. This extends across a range of hard skills, from the very broad to the very granular.
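As a rough illustration of the idea, here is a minimal sketch of a search that falls back on is-taught-by edges when text matching finds nothing. It is not our production search: the catalog entries, edge scores, the `min_score` threshold, and all function names are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    course_id: str
    title: str
    description: str


# Hypothetical catalog: note that no title or description mentions NumPy.
CATALOG = [
    Course("c1", "Applied Data Analysis in Python", "Work with arrays, tables, and plots."),
    Course("c2", "Introduction to Public Speaking", "Plan and deliver a clear talk."),
    Course("c3", "Scientific Computing Basics", "Numerical methods for engineers."),
]

# Hypothetical is-taught-by edges: skill -> {course_id: model score in [0, 1]}.
IS_TAUGHT_BY = {
    "numpy": {"c1": 0.92, "c3": 0.71},
    "confidence": {"c2": 0.64},
}


def text_search(query, catalog):
    """Plain substring match on title and description."""
    q = query.lower()
    return [
        c.course_id
        for c in catalog
        if q in c.title.lower() or q in c.description.lower()
    ]


def skill_search(query, edges, min_score=0.5):
    """Courses linked to the queried skill, strongest edge first."""
    scored = edges.get(query.lower(), {})
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return [cid for cid, score in ranked if score >= min_score]


def search(query, catalog=CATALOG, edges=IS_TAUGHT_BY):
    """Text matches first, then graph matches that text search missed."""
    results = text_search(query, catalog)
    for cid in skill_search(query, edges):
        if cid not in results:
            results.append(cid)
    return results


print(text_search("NumPy", CATALOG))  # text search alone finds nothing
print(search("NumPy"))                # the graph edge surfaces c1 and c3
```

In this toy setup the text-only search for NumPy comes back empty, while the combined search returns the two courses connected to the skill by an edge.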
Once we built the graph infrastructure and opened it up to learner-reported tags, its coverage surpassed what we could have come up with on our own. For example, our initial set of skill tags was exclusively in the business, computer science, and data domains. But today, thanks to the graph, learners can easily find content that teaches soft skills, too, even where those skills are taught only indirectly. For example, when a learner searches for confidence, she is shown several courses on public speaking, the famous Learning How to Learn course, and more, all powered by a rich stream of learner-reported data that feeds and updates the graph daily.
While the skills-based search application produced our single biggest algorithmic win yet in search, it assumes the learner knows what she wants to learn. Since many learners are more focused on the outcome they want (for example, a particular job), we extended the graph to include a mapping between careers and the set of skills they require. The mapping is based on two signals: the frequency with which skills appear in postings for that job, and the skills we observe in real learners who hold those jobs, as measured by their in-course performance. Here’s one application: as the learner browses Coursera content, she can filter by career relevance.
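To make the first of those signals concrete, here is a minimal sketch that weights each skill by the share of a career’s job postings that mention it, then filters courses by those weights. It covers posting frequency only, not the learner-performance signal. The posting data, course tags, the 0.5 threshold, and the function names are all made up for illustration.

```python
from collections import Counter


def career_skill_weights(postings):
    """Map each skill to the share of postings that mention it.

    `postings` is a list of skill collections, one per job posting.
    """
    if not postings:
        return {}
    counts = Counter(skill for posting in postings for skill in set(posting))
    total = len(postings)
    return {skill: count / total for skill, count in counts.items()}


def filter_by_career(course_skills, weights, min_weight=0.5):
    """Keep courses that teach at least one sufficiently common skill."""
    relevant = {skill for skill, w in weights.items() if w >= min_weight}
    return sorted(
        cid for cid, skills in course_skills.items() if relevant & set(skills)
    )


# Hypothetical postings for one career, already reduced to skill tags.
analyst_postings = [
    {"sql", "excel"},
    {"sql", "python"},
    {"sql", "tableau", "python"},
    {"excel", "sql"},
]

# Hypothetical course -> skills tags (in practice, from is-taught-by edges).
course_skills = {
    "c1": ["sql"],
    "c2": ["tableau"],
    "c3": ["public speaking"],
    "c4": ["python", "numpy"],
}

weights = career_skill_weights(analyst_postings)
print(weights)
print(filter_by_career(course_skills, weights))
```

Here SQL appears in every posting and Tableau in only one of four, so with a 0.5 threshold the SQL and Python courses pass the career filter and the Tableau course does not.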
We can do better still by incorporating data on each individual, for example by using our platform data to rigorously measure what each learner already knows and then placing her in content at the right level. This starts with item-response theory models trained on the hundreds of millions of questions that have been attempted on the platform. In a nutshell, the models output an estimated difficulty for each question. Marrying these estimates with a given learner’s performance on the assessments she’s attempted, we can infer her level in each of a range of skills. Below is sample output for a single learner. She is relatively stronger in Data Management, but weaker in Machine Learning. Knowing this allows us to, among other things, recommend beginner ML content.
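For readers new to item-response theory, here is a minimal sketch of the second step: inferring a learner’s level from her answers once question difficulties are known. It assumes the simplest IRT variant, the one-parameter (Rasch) model, in which the probability of a correct answer is a logistic function of ability minus difficulty; the models described above may differ. The difficulties, responses, the clamp to [-6, 6], and the function names are illustrative assumptions.

```python
import math


def p_correct(theta, difficulty):
    """Rasch model: probability that ability `theta` answers correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_ability(responses, iterations=25, bound=6.0):
    """Maximum-likelihood ability estimate via Newton's method.

    `responses` is a list of (difficulty, correct) pairs, correct in {0, 1}.
    The estimate is clamped to [-bound, bound], because the likelihood has
    no finite maximum when every answer is right or every answer is wrong.
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b, _ in responses]
        # Gradient of the log-likelihood and the Fisher information.
        grad = sum(y - p for (_, y), p in zip(responses, probs))
        info = sum(p * (1.0 - p) for p in probs)
        if info < 1e-9:
            break
        step = max(-1.0, min(1.0, grad / info))  # damp large jumps
        theta = max(-bound, min(bound, theta + step))
    return theta


# Hypothetical learner: (question difficulty, answered correctly?) per skill.
attempts = {
    "Data Management": [(-0.5, 1), (0.2, 1), (0.8, 1), (1.2, 0)],
    "Machine Learning": [(-0.5, 1), (0.2, 0), (0.8, 0), (1.2, 0)],
}

levels = {skill: estimate_ability(r) for skill, r in attempts.items()}
for skill, theta in levels.items():
    print(f"{skill}: {theta:.2f}")
```

With these made-up responses the learner’s estimated ability comes out higher in Data Management than in Machine Learning, which is the kind of per-skill comparison that can drive a recommendation for beginner ML content.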
Today we’ve shared some examples of how our Skills Graph, built on rich data captured across the platform, allows us to develop a more robust understanding of learners, content, and careers and, when fed back into the product, helps each learner find the right content for them.
In the coming weeks, we’ll share other applications of the graph, including how the graph is unlocking valuable insights for our enterprise customers in an application called Skills Benchmarking.