Building data services to bring education to millions, Part I

Data solutions empowering university and industry partners, and enterprise customers

The mission of the data engineering team at Coursera is to democratize data, in service to Coursera’s mission of transforming lives through learning. As a part of that, we build tools and products that provide internal and external stakeholders with access to a range of actionable data.

In this series, we focus on the tooling we build for key external stakeholders: university and industry partners, and enterprise customers. Our university and industry partners (170+ today) are the universities and companies that design and launch content on the platform for learners. Our enterprise customers (1,000+ today) use the platform to upskill their employees in key areas.

Within our partners and enterprise customers, individual stakeholders fill different roles, each with unique data needs. For example, university partners have instructors and instructional designers producing the content, deans and administrators overseeing the organization’s online education strategy, and researchers and graduate students looking to glean publishable insights into online pedagogy.

This diversity of stakeholder needs demands a range of data solutions, each designed around a stakeholder’s specific use cases and technical expertise, as well as the actions we want that stakeholder to take on the platform.

In this series we will cover five key endpoints:

  1. In-Platform Dashboards serve descriptive and advanced analytics in easy-to-read visualizations directly on Coursera.
  2. Self-Serve Analytics as a Service provides access to the core data model in a self-service tool for lightweight exploration.
  3. Data Exports unlock relational data in CSVs, and are most commonly used by our researcher community.
  4. Data Warehousing as a Service allows stakeholders to query the raw data and run their own advanced analytics, and/or hook up to a third-party tool (e.g., Tableau).
  5. DataHub facilitates FERPA-compliant two-way data exchange, allowing us to collect and store external data.

The good news is that, while the solutions are diverse, each is built atop a common foundation: our core data model. This is a set of standard, curated, and conformed data that spans Coursera’s products and business domains. Since each of the different endpoints, or access solutions, feeds from the shared foundation, we can scale up applications efficiently while ensuring data consistency across endpoints. Most data sharing, except for DataHub, is de-identified to protect our learners’ privacy. More detail on protecting learner identity in DataHub will be shared in an upcoming post.
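
To make the shared-foundation idea concrete, here is a minimal sketch in Python. It is not Coursera’s implementation: the record fields, the `deidentify` helper, the salt, and the two endpoint functions are all hypothetical names chosen for illustration. It assumes a single curated dataset that is de-identified once and then consumed by more than one endpoint, so every endpoint reports from the same rows. A salted hash is used here only as a simple stand-in for whatever de-identification a real system applies.

```python
import csv
import hashlib
import io
from collections import Counter

# Hypothetical rows standing in for a shared core data model
# (illustrative only, not Coursera's actual schema).
CORE_ENROLLMENTS = [
    {"learner_id": "u1", "course": "ml-101", "completed": True},
    {"learner_id": "u2", "course": "ml-101", "completed": False},
    {"learner_id": "u1", "course": "stats-201", "completed": True},
]


def deidentify(rows, salt):
    """Replace learner_id with a truncated salted hash (a simple stand-in)."""
    out = []
    for row in rows:
        digest = hashlib.sha256((salt + row["learner_id"]).encode("utf-8")).hexdigest()
        out.append({**row, "learner_id": digest[:12]})
    return out


def dashboard_counts(rows):
    """Dashboard-style endpoint: number of enrollments per course."""
    return dict(Counter(row["course"] for row in rows))


def csv_export(rows):
    """Export-style endpoint: the same rows serialized as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["learner_id", "course", "completed"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# De-identify once, then feed every endpoint from the same shared rows.
shared = deidentify(CORE_ENROLLMENTS, salt="demo-salt")
counts = dashboard_counts(shared)
export = csv_export(shared)
```

Because both endpoints read from `shared`, the dashboard counts and the exported rows cannot drift apart, which is the consistency benefit described above.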

Interested in learning more? Check out Part II: An Embedded Approach to In-Platform Analytics.

Interested in Data Engineering @ Coursera? We’re hiring!
