Spotify built its business on flawless content delivery. Our streaming platform serves up more than 50 million tracks (plus an array of images and other assets) to more than 230 million monthly active users around the world — making us one of the world’s leading streaming services. With content that feels instant and immersive, we help our customers have the best experience possible with their favorite artists.
Behind the scenes, our technology has evolved over time to achieve our user experience goals. After a decade of growth, we were using a number of disparate CDN solutions, which added complexity to our platform architecture and inefficiency within the R&D organization. Spotify’s multi-CDN strategy for audio streaming was working well. However, serving other types of content, like images or client updates, led us to create a new squad that focused on standardizing our CDNs on Fastly’s edge cloud platform across diverse engineering teams and on providing automated tools, governance, and support.
The challenges of squad autonomy
Our engineering culture champions squad autonomy, and our R&D organization includes over 2,000 employees grouped into teams we call “squads.” In this autonomous model, every squad has the authority to make its own decisions, including which technologies, tools, and processes work best for them. This helps squads move fast, but the trade-off is a risk of technology fragmentation.
Too many CDNs, too little oversight
One of the more painful areas of fragmentation involved content delivery. Spotify’s CDN solution with Akamai and AWS for business-critical content, such as audio streaming, was performing well and had been honed to achieve low latency and high bandwidth. However, CDN operations for everything else had become inefficient. Some content was being streamed directly from buckets like AWS S3 or Google Cloud Storage. Developers simply made the bucket readable, dropped the URI into their code, and moved on. Other squads took services that had been created for a particular use case, such as image resizing or watermarking, and used them for something entirely different. Our new solution also resolved issues for a few squads who inherited CDN endpoints without full knowledge about their configurations.
Building a company-wide CDN solution on Fastly
As a first step toward standardization, the new CDN squad created a simple workflow that allowed squads to get a new CDN service up and running quickly on Fastly. We had successfully been using Fastly’s edge cloud platform for a number of years to deliver audio streaming and wanted to use it to deliver other types of content as well. With this approach, the CDN squad faced the organizational challenge of “autonomous alignment” head-on.
Convincing squads to migrate
The new workflow appealed to squads that were looking to implement a brand-new service. For those who were already running a different CDN domain, it seemed like more work to migrate to a new system. To convince these other squads, the CDN squad communicated the benefits of faster performance, access to metrics, and logging of delivery. Even better, the CDN squad monitored the whole CDN request path all the way to origin, 24/7.
A plan for low-maintenance operations
Apart from troubleshooting, the CDN squad also wanted a system that was easy for them to maintain. They decided to use Fastly APIs to build an automated system in order to focus on projects that added more value to the business.
Bringing together APIs and VCL
Our developers used Fastly’s customizable edge computing language, Varnish Configuration Language (VCL), to perform intelligent caching, push application logic to the network edge, and tailor a user’s experience based on location, language, and device type. In the Spotify spirit, we needed to customize the CDN functionality to handle errors and redirects, as well as token identification and other tasks. We combined Fastly APIs with VCL: we used the APIs for simple setup tasks, like creating a new service, configuring host names, and adding origins or logging endpoints, and handled everything else in VCL.
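To make the split between API calls and VCL more concrete, here is a minimal sketch of the API side: the ordered calls that create a service, attach a domain and an origin, and activate the first version. The endpoint paths follow Fastly's public API; the `PlannedCall` type, the function names, the `{service_id}` placeholder, and the example host names are assumptions made for illustration and are not Spotify's actual tooling. The sketch only plans the requests and sends nothing over the network.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PlannedCall:
    """One HTTP call to make against the Fastly API (hypothetical helper type)."""
    method: str
    path: str
    body: dict = field(default_factory=dict)


def plan_service_setup(name: str, domain: str, origin_host: str) -> List[PlannedCall]:
    """Return the ordered calls needed to bring up a basic service.

    The service ID is only known after the first call succeeds, so later
    paths carry a literal "{service_id}" placeholder to fill in afterwards.
    A newly created service starts at version 1.
    """
    base = "/service/{service_id}/version/1"
    return [
        PlannedCall("POST", "/service", {"name": name}),
        PlannedCall("POST", base + "/domain", {"name": domain}),
        PlannedCall("POST", base + "/backend",
                    {"name": "origin", "address": origin_host}),
        PlannedCall("PUT", base + "/activate"),
    ]


def resolve_path(call: PlannedCall, service_id: str) -> str:
    """Fill the service ID placeholder once the service has been created."""
    return call.path.replace("{service_id}", service_id)


if __name__ == "__main__":
    # Example values are made up.
    for call in plan_service_setup("cover-art", "img.example.com", "bucket.example.com"):
        print(call.method, resolve_path(call, "SERVICE_ID"), call.body)
```

Keeping the plan as plain data makes it easy to review before any request is sent, and logic such as error handling or redirects can then live in VCL uploaded to the same service version.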
A self-service tool to request CDN review
We developed SquadCDN as an internal pre-deployment review service that uses Fastly’s APIs and VCL. Any Spotify squad can access the tool, type in a short YAML string, and add a few details like domain, origin, bucket, and config flags. The tool then submits a pull request for the CDN squad to review and approve. With this in place, the squad achieved their objective of providing a simple workflow with a degree of automation.
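As an illustration of what a pre-deployment check on such a request might look like, the sketch below validates a request that has already been parsed from YAML into a dictionary. The field names (`domain`, `origin`, `bucket`, `flags`) mirror the details listed above, but the exact schema, the flag names, and the validation rules are assumptions; the real SquadCDN format is not described here.

```python
import re
from typing import List

# Hypothetical schema: these names are illustrative, not SquadCDN's real ones.
REQUIRED_FIELDS = ("domain", "origin", "bucket")
KNOWN_FLAGS = {"allow_http", "personal_data"}

# Loose hostname check: dot-separated labels followed by an alphabetic TLD.
HOSTNAME = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def validate_request(request: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not request.get(name):
            problems.append(f"missing field: {name}")
    domain = request.get("domain")
    if domain and not HOSTNAME.match(str(domain).lower()):
        problems.append(f"invalid domain: {domain}")
    for flag in request.get("flags") or []:
        if flag not in KNOWN_FLAGS:
            problems.append(f"unknown flag: {flag}")
    return problems


if __name__ == "__main__":
    request = {"domain": "img.example.com", "origin": "gcs", "bucket": "cover-art"}
    print(validate_request(request))
```

Running a check like this before the pull request is opened would leave reviewers free to focus on the judgment calls, such as caching defaults, rather than on typos.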
Lessons learned along the way
As the CDN squad pioneered new territory for Spotify R&D, they picked up a number of tips and best practices that helped them achieve their goals, such as:
- Dogfooding is key. After moving a few minor services to Fastly, the squad realized they needed to move their critical delivery services over first, including anything that was public-facing, like audio, video, cover art, and artist images. They reviewed and cleaned up any spaghetti VCL and fixed flaws in their automated pipeline.
- Keep your secrets secret. Security is a top concern at Spotify, and protecting user data is a key component of building trust. It was vital that sensitive data, such as passwords, get blanked out in the logs. The CDN squad used Fastly’s Edge Dictionaries to maintain key store values that cannot be read by humans and can only be referenced in VCL. Even when viewing VCL in the admin tool, developers only see variables instead of private data.
- Be mindful of API call limits. Even with high limits, too many API calls at once can force a deployment to fail. Carefully planning API calls will help ensure everything runs smoothly.
- Verify all the things. When a squad submits a new service, the CDN squad verifies a few details that are crucial for the service to work. They give the squad a test file that will help them catch problems with configurations or bucket permissions.
- Don’t allow writing to the bucket. It’s important that squads don’t create a CDN endpoint that lets someone send a PUT request and overwrite a file because the credentials allow writing to that bucket. (Luckily, none of Spotify’s services has needed to write to the origin.)
- Do a smoke test. After deploys, the CDN squad performs end-to-end testing over the internet and curls for a file they know should be there. Custom VCL ensures the file doesn’t get cached so that the smoke test path allows them to reach the origin and read the right bytes back. If issues are identified after deployment, they can automatically apply a rollback to the previous version and start debugging.
- Enforce good practices. If squads want unencrypted delivery via HTTP, they need to request it. This gives the CDN squad the opportunity to discuss the squad’s needs and determine the right use case for it. They also ask squads to tag services that handle personal data, so they can better maintain GDPR compliance, and promote sane defaults with respect to caching and purging.
- Make it easy for others. No one will use a system that requires a lot of effort. We found that it’s better for us to do some of the work so that everyone benefits in the long run. One tactic was to proactively identify outlying CDN endpoints and offer to modify the code, so that the squads who owned them could more easily move them over to Fastly. This enables groups across the org to move with greater speed and agility.
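The smoke-test-and-rollback step from the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CDN squad's pipeline: `activate` stands in for whatever call activates a service version, `fetch` is any function that returns the bytes at a URL, and the function names, version numbers, and URL are made up.

```python
import urllib.request
from typing import Callable


def http_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL over the internet and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def smoke_test(fetch: Callable[[str], bytes], url: str, expected: bytes) -> bool:
    """Return True only if the known test file comes back with the right bytes."""
    try:
        return fetch(url) == expected
    except OSError:
        # Network and HTTP errors raised by urllib are OSError subclasses.
        return False


def deploy_with_rollback(
    activate: Callable[[int], None],
    fetch: Callable[[str], bytes],
    url: str,
    expected: bytes,
    new_version: int,
    previous_version: int,
) -> int:
    """Activate new_version, smoke test it, and roll back on failure.

    Returns the version that is active when the function finishes.
    """
    activate(new_version)
    if smoke_test(fetch, url, expected):
        return new_version
    activate(previous_version)
    return previous_version


if __name__ == "__main__":
    # Dry run with stand-ins; a real run would pass http_fetch and a real URL.
    history = []
    active = deploy_with_rollback(
        history.append, lambda url: b"unexpected",
        "https://cdn.example.com/smoke.txt", b"ok\n", 7, 6,
    )
    print(active, history)
```

Passing `fetch` and `activate` in as functions keeps the rollback decision testable without touching a live service. In practice the smoke-test URL would point at a path that the VCL excludes from caching, so the request reaches the origin.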
Successful CDN alignment
More than 60 squads have begun using the new configuration system, representing more than 20% of the R&D organization. More than 80 services are now pushing content through Fastly more efficiently, thanks to the CDN squad’s templatized tools and simplified workflows. Most importantly, our engineers are happy because they no longer have to worry about CDN details and can focus on their core mission.