Introducing the HTMLRewriter API to Cloudflare Workers


We are excited to announce that the HTMLRewriter API for Cloudflare Workers is now GA! You can get started today by checking out our documentation, or trying out our tutorial for localizing your site with the HTMLRewriter.

Want to know how it works under the hood? We are excited to tell you everything you wanted to know but were afraid to ask, about building a streaming HTML parser on the edge; read about it in part 1 (and stay tuned for part two coming tomorrow!).

Faster, more scalable applications at the edge

The HTMLRewriter can help solve two big problems web developers face today: making changes to the HTML, when they are hard to make at the server level, and making it possible for HTML to live on the edge, closer to the user — without sacrificing dynamic functionality.

Since the introduction of Workers, Workers have helped customers regain control where control either wasn’t provided, or very hard to obtain at the origin level. Just like Workers can help you set CORS headers at the middleware layer, between your users and the origin, the HTMLRewriter can assist with things like URL rewrites (see the example below!).

Back in January, we introduced the Cache API, giving you more control than ever over what you could store in our edge caches, and how. By helping users have closer control of the cache, users could push experiences that were previously only able to live at the origin server to the network edge. A great example of this is the ability to cache POST requests, where previously only GET requests were able to benefit from living on the edge.

Later this year, during Birthday Week, we announced Workers Sites, taking the idea of content on the edge one step further, and allowing you to fully deploy your static sites to the edge.

We were not the first to realize that the web could benefit from more content being accessible from the CDN level. The motivation behind the resurgence of static sites is to have a canonical version of HTML that can easily be cached by a CDN.

This has been great progress for static content on the web, but what about so much of the content that’s almost static, but not quite? Imagine an e-commerce website: the items being offered, and the promotions are all the same — except that pesky shopping cart on the top right. That small difference means the HTML can no longer be cached, and sent to the edge. That tiny little number of items in the cart requires you to travel all the way to your origin, and make a call to the database, for every user that visits your site. This time of year, around Black Friday, and Cyber Monday, this means you have to make sure that your origin, and database are able to scale to the maximum load of all the excited shoppers eager to buy presents for their loved ones.

Edge-side JAMstack with HTMLRewriter

One way to solve this problem of reducing origin load while retaining dynamic functionality is to move more of the logic to the client — this approach of making the HTML content static, leveraging CDN for caching, and relying on client-side API calls for dynamic content is known as the JAMstack (JavaScript, APIs, Markdown). The JAMstack is certainly a move in the right direction, trying to maximize content on the edge. We’re excited to embrace that approach even further, by allowing dynamic logic of the site to live on the edge as well.

In September, we introduced the HTMLRewriter API in beta to the Cloudflare Workers runtime — a streaming parser API to allow developers to develop fully dynamic applications on the edge. HTMLRewriter is a jQuery-like experience inside your Workers application, allowing you to manipulate the DOM using selectors and handlers. The same way jQuery revolutionized front-end development, we’re excited for the HTMLRewriter to change how we think about architecting dynamic applications.

There are a few advantages to rewriting HTML on the edge, rather than on the client. First, updating content client-side introduces a tradeoff between the performance and consistency of the application — that is, if you want to, update a page to a logged in version client side, the user will either be presented with the static version first, and witness the page update (resulting in an inconsistency known as a flicker), or the rendering will be blocked until the customization can be fully rendered. Having the DOM modifications made edge-side provides the benefits of a scalable application, without the degradation of user experience. The second problem is the inevitable client-side bloat. Most of the world today is connected to the internet on a mobile phone with significantly less powerful CPU, and on shaky last-mile connections where each additional API call risks never making it back to the client. With the HTMLRewriter within Cloudflare Workers, the API calls for dynamic content, (or even calls directly to Workers KV for user-specific data) can be made from the Worker instead, on a much more powerful machine, over much more resilient backbone connections.

Here’s what the developers at Happy Cog have to say about HTMLRewriter:

“At Happy Cog, we’re excited about the JAMstack and static site generators for our clients because of their speed and security, but in the past have often relied on additional browser-based JavaScript and internet-exposed backend services to customize user experiences based on a cookied logged-in state. With Cloudflare’s HTMLRewriter, that’s now changed. HTMLRewriter is a fundamentally new powerful capability for static sites. It lets us parse the request and customize each user’s experience without exposing any backend services, while continuing to use a JAMstack architecture. And it’s fast. Because HTMLRewriter streams responses directly to users, our code can make changes on-the-fly without incurring a giant TTFB speed penalty. HTMLRewriter is going to allow us to make architectural decisions that benefit our clients and their users — while avoiding a lot of the tradeoffs we’d usually have to consider.”

Matt Weinberg, – President of Development and Technology, Happy Cog

HTMLRewriter in Action

Since most web developers are familiar with jQuery, we chose to keep the HTMLRewriter very similar, using selectors and handlers to modify content.

Below, we have a simple example of how to use the HTMLRewriter to rewrite the URLs in your application from myolddomain.com to mynewdomain.com:

async function handleRequest(req) {
  const res = await fetch(req)
  return rewriter.transform(res)
}
 
const rewriter = new HTMLRewriter()
  .on('a', new AttributeRewriter('href'))
  .on('img', new AttributeRewriter('src'))
 
class AttributeRewriter {
  constructor(attributeName) {
    this.attributeName = attributeName
  }
 
  element(element) {
    const attribute = element.getAttribute(this.attributeName)
    if (attribute) {
      element.setAttribute(
        this.attributeName,
        attribute.replace('myolddomain.com', 'mynewdomain.com')
      )
    }
  }
}
 
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

The HTMLRewriter, which is instantiated once per Worker, is used here to select the a and img elements.

An element handler responds to any incoming element, when attached using the .on function of the HTMLRewriter instance.

Here, our AttributeRewriter will receive every element parsed by the HTMLRewriter instance. We will then use it to rewrite each attribute, href for the a element for example, to update the URL in the HTML.

Getting started

You can get started with the HTMLRewriter by signing up for Workers today. For a full guide on how to use the HTMLRewriter API, we suggest checking out our documentation.

How does the HTMLRewriter work?

How does the HTMLRewriter work under the hood? We’re excited to tell you about it, and have spared no details. Check out the first part of our two-blog post series to learn all about the not-so-brief history of rewriter HTML at Cloudflare, and stay tuned for part two tomorrow!



Source link