Every day, all across the Internet, something bad but entirely normal happens: thousands of origin servers go down, resulting in connection errors and frustrated users. Cloudflare’s users collectively spend over four and a half years each day waiting for unreachable origin servers to respond with error messages. But visitors don’t want to see error pages, they want to see content!
Today is exciting for all those who want the Internet to be stronger, more resilient, and have important redundancies: Cloudflare is pleased to announce a partnership with the Internet Archive to bring new functionality to our Always Online service.
Always Online serves as insurance for our customers’ websites. Should a customer’s origin go offline, timeout, or otherwise break, Always Online is there to step in and serve archived copies of webpages to visitors. The Internet Archive is a nonprofit organization that runs the Wayback Machine, a service which saves snapshots of billions of websites across the Internet. By partnering with the Internet Archive, Cloudflare is able to seamlessly deliver responses for unreachable websites from the Internet Archive, while the Internet Archive can continue their mission of archiving the web to provide access to all knowledge.
Enabling Always Online in the Cloudflare dashboard allows us to share your hostname with the Wayback Machine so that they can archive your website. When a website’s origin is down, Cloudflare will go to the Internet Archive to retrieve the most recently archived version of the site, so that visitors will still be able to view the site’s content.
Trying to reach a busted origin
When a person visits a Cloudflare website, a request is made from their laptop/phone/tablet/smart fridge to Cloudflare’s edge. Our edge first looks to see if we can respond with cached content; if the requested content is not in cache, or is determined to be expired, we then obtain a fresh copy from the origin. As part of fulfilling an uncached/expired origin fetch, we also update our cache to allow subsequent requests to be served to visitors faster and more securely. If we are unable to reach the origin, our edge tries a few more times to connect before marking the origin as being down and serving an error page to the visitor. Receiving an error page is not ideal for anyone, so we try really hard to ensure that visitors to websites using Cloudflare can get some content, even if an origin is struggling.
A brief history of Always Online
When Cloudflare started 10 years ago, most of our customers were small and running on hosts that were subject to frequent downtime. These early customers feared that their host may go down at the same time a search engine was indexing their site. The search engine’s crawler would report the downed site as non-responsive and the site would drop in their search ranking. Always Online was born from that concern.
Through operating Always Online over the past 10 years, we’ve learned that fighting Internet downtime with simple, unobtrusive tools was something that our customers and their users deeply value. Though some features have undergone rewrite upon rewrite, other parts of our code have remained relatively untouched by the sands of time, a testament to their robustness. For example, Always Online clearly shows a banner indicating that it is serving an archived version of the page due to the origin being unreachable, and this transparency is well-received by both website owners and visitors.
We recently set out to make Always Online even better. We wanted to preserve what customers loved — as seamless an experience as possible for their users when their origin servers are down — while increasing the amount of content available through Always Online, ensuring it is as fresh as possible, and performing this archiving in a way that helps make the Internet a better place.
Enter the Internet Archive
Partnering with the Internet Archive’s Wayback Machine to power the next generation of Always Online accomplishes all of these goals. The Internet Archive’s mission is to provide universal access to all knowledge. Since 1996, the Internet Archive’s Wayback Machine has been archiving much of the public Web: preserving and making available millions of websites and pages that would otherwise be lost. In pursuit of that mission, they have archived more than 468 billion web pages, amounting to more than 45 petabytes of information.
Always Online’s integration with the Internet Archive will help the Archive expand their record of the Internet; many of the domains that opt-in to Always Online functionality may not have been otherwise discovered by the Archive’s crawler. And for Cloudflare customers, the Archive will seamlessly provide visitors access to content that would otherwise be errors.
In other words, Cloudflare partnering with the Internet Archive makes the Internet better, stronger, and more available to everyone.
“Through our partnership with Cloudflare, we are learning about, and archiving, webpages we might not have otherwise known about, and by integrating with Cloudflare’s Always Online service, archives of those pages are available to people trying to access them if they become unavailable via the live web”
—Mark Graham, Director of the Internet Archive’s Wayback Machine
“We are excited to work with Cloudflare and expect this partnership to bring important redundancy to the Internet and allow for us to advance our ongoing efforts to make the Internet more useful and reliable.”
—Brewster Kahle, Founder and Digital Librarian of the Internet Archive
How does the new Always Online work behind the scenes?
Upgrading to the new Always Online in the Cloudflare dashboard allows us to share some basic information about your website with the Internet Archive (like hostname and popular URLs), so they can begin to crawl and archive your website at regular intervals. This information sharing and crawling ensures content is available to Always Online and also serves to deepen the library of content available directly through the Archive.
If your origin goes down or is unreachable, Cloudflare’s edge will return a status code in the 520 to 527 range, indicating an issue connecting to the origin. When this happens, Cloudflare will first look to the local edge datacenter to see if there is a stale or expired version of content we can serve to the website visitor. If there isn’t a version in the local cache, Cloudflare will then go to the Internet Archive and fetch the most recently archived version of the site to serve to your visitors. When that happens, Always Online serves the archived content with a banner to let your visitors know that your origin is having problems. The banner allows for your visitors to check and see if your origin is back online with a single click. While dynamic content that requires communication with an origin server will still show an error to visitors (e.g. web applications or shopping carts), basic content will often be available with Always Online.
Enabling the new Always Online
For now, the old Always Online service will still be available, but we plan to fully transition to the Internet Archive-backed version soon.
Cloudflare customers can enable Always Online in the dashboard:
- For more about Always Online, and how it works, please check out our documentation.
- To get started using Always Online, please log into your Cloudflare dashboard and toggle it on.
- Please see the Internet Archive’s announcement of our partnership here.
- To help improve Always Online, or other parts of our slice of the Internet, drop us a line.