Edge-Side-Includes with Cloudflare Workers


At Cloudflare, we accelerate web assets in a number of different ways. One of them is caching: we store the response given by the origin server directly within our 151+ global data centers. This dramatically improves the delivery of the resources, as visitors get them from the data center closest to them instead of waiting for us to fetch the response from the origin web server.

The issue with (slightly) dynamic pages

The subject we’re going to cover today is the concept of Edge-Side-Includes. And what’s better than a real use case to introduce what it is used for? Let’s take a website where every page includes advertisements at the top and bottom. Could we consider these pages static? We couldn’t, as at least part of each page is dynamic. Could we consider caching them? That’s a no again, as it would mean that the first dynamic part rendered gets cached and served to every other visitor requesting the page. It would be a catastrophe if the advertisements were user-specific.

So the issue here is that we can’t cache the page. That’s quite a shame, as it means we’ll fetch the whole page over and over again, for every new request, just to get this 1% of dynamic content.

Counter-measure with delta-compression?

A while back, we released Railgun, which performs delta compression on the content served by a web server, so that the Railgun listener only has to send the bytes that changed since the last request. We’re also working on including this delta compression in our Argo Tunnel listener, a small agent that opens tunnels to us so that your applications don’t even have to be public on the Internet: simple outbound HTTPS access is enough to publish, secure and accelerate them.

In both cases, we still need to fetch the complete web page in order to compute the difference from the last request, right? This is where Edge-Side-Includes come into play.

The Edge-Side-Includes specification was submitted to the W3C (https://www.w3.org/TR/esi-lang) in August 2001. It defines an XML markup language that can be inserted in HTML or other text-based content, and that tells intermediate proxies/CDNs how to combine the static and dynamic portions and where to get them. The result is that the 99% static part can stay in cache; for the remaining 1%, the caching proxy fetches the content from the destination defined in the Edge-Side-Include block, combines the static and dynamic parts, and sends the final web page to the visitor.

ESI block example:

<esi:include src="http://example.com/1.html" alt="http://bak.example.com/2.html" onerror="continue"/>
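To make the role of the three attributes more concrete, here is a minimal sketch of how an edge could interpret them: try src first, fall back to alt, and drop the block silently if both fail and onerror="continue" is set. This is not the worker presented later in this article; the helper names parseEsiAttributes and resolveInclude, as well as the injected fetchText function, are assumptions made for illustration only.

```javascript
// Extract the attributes of an <esi:include .../> tag into a plain object.
function parseEsiAttributes(tag) {
  const attributes = {};
  const attributeRegex = /([\w:-]+)="([^"]*)"/g;
  let match;
  while ((match = attributeRegex.exec(tag)) !== null) {
    attributes[match[1]] = match[2];
  }
  return attributes;
}

// Resolve one include: try src, then alt, then honor onerror="continue".
// fetchText is a hypothetical async function (url) => string that throws
// when the fragment cannot be retrieved.
async function resolveInclude(tag, fetchText) {
  const { src, alt, onerror } = parseEsiAttributes(tag);
  for (const url of [src, alt]) {
    if (!url) continue;
    try {
      return await fetchText(url);
    } catch (err) {
      // Ignore the failure and try the next candidate URL.
    }
  }
  if (onerror === "continue") return ""; // silently remove the block
  throw new Error(`ESI include failed: ${src}`);
}
```

In a Worker, fetchText could be as simple as a function that calls fetch(url) and returns response.text(), throwing when the status is not successful.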

Implementing it in a Worker

A few months ago, we released Cloudflare Workers, our serverless framework which helps you implement custom logic directly within the Edge. Workers are triggered in the path of the requests and the responses, can manipulate almost everything, and can spin up subrequests on the fly.
This can dramatically shorten the time needed to implement new logic in your applications: you no longer have to modify them, since the change can be made directly in the Edge, even if your applications are hosted in a bunch of different locations (cloud and on-premise).

This scenario sounds quite compatible with what we can achieve with Cloudflare Workers. As a reminder, here are the actions the Edge needs to perform to implement ESI:

  • Fetching the static content
  • Searching in the payload for any <esi:include/> blocks
  • Fetching separately every ESI block found
  • Merging the static and dynamic contents
  • Sending the whole new payload to visitors
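
Before looking at the streaming implementation, the steps above can be sketched in a simplified, fully buffered form. This is only an illustration and not the worker presented in this article: assembleEsi and the injected fetchFragment function are hypothetical names, and the whole page is held in memory, which a real worker would rather avoid.

```javascript
// Replace every <esi:include .../> block in `html` with the text returned by
// fetchFragment(src). fetchFragment is injected so the logic can be exercised
// without a network; in a Worker it could wrap fetch(src) and response.text().
async function assembleEsi(html, fetchFragment) {
  // Step 2: search the payload for ESI blocks and capture their src attribute.
  const includeRegex = /<esi:include\b[^>]*\bsrc="([^"]*)"[^>]*\/>/g;
  const matches = [...html.matchAll(includeRegex)];

  // Step 3: fetch every fragment separately (here, in parallel).
  const fragments = await Promise.all(matches.map((m) => fetchFragment(m[1])));

  // Step 4: merge the static and dynamic contents.
  let output = "";
  let cursor = 0;
  matches.forEach((m, i) => {
    output += html.slice(cursor, m.index) + fragments[i];
    cursor = m.index + m[0].length;
  });
  return output + html.slice(cursor);
}
```

Steps 1 and 5 (fetching the static page and sending the result) would surround this function: the page text goes in, and the returned string becomes the body of the response sent to the visitor.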

For this purpose, I created a small page on my test web server, with the following content:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>ESI Demonstration</title>
<link rel="stylesheet" href="main.css">
</head>
<body>
<img align="absmiddle" src="./images/logo.jpg">
<h1>Cloudflare Edge-Side-Include demonstration</h1>
<h3>How to ESI with Cloudflare</h3>
<p>ESI (or Edge-Side-Include) consists in fetching a globally static page with some dynamic fragments. This can be done directly from our Edge with Workers, where the HTML page would look like this one:</p>
<p>Dynamic portion coming from https://httpbin.org/headers:</p>
<esi:include src="https://httpbin.org/headers" onerror="continue" />
</body>
</html>

In this page, most of the content is static. Just before the end of the <body>, there is an ESI block pointing to https://httpbin.org/headers: the Edge will need to fetch that content and put it in place of the ESI block, combined with the static content.

The worker explained

addEventListener("fetch", event => {
  event.respondWith(fetchAndStream(event.request))
  event.passThroughOnException()
})

async function fetchAndStream(request) {
  let response = await fetch(request)
  let contentType = response.headers.get('content-type')

  if (!contentType || !contentType.startsWith("text/")) {
    return response
  }
  let { readable, writable } = new TransformStream()
  let newResponse = new Response(readable, response)
  newResponse.headers.set('cache-control', 'max-age=0')
  streamTransformBody(response.body, writable)
  return newResponse
}

This is the main portion of the script. It first tests whether the response is text-based and, if not, returns it to the visitor untouched. Otherwise, it instantiates a stream pipeline towards the visitor and hands the body to a dedicated function, streamTransformBody(), in charge of chunking the payload.

You’ll notice that I’m setting a max-age=0 cache-control header in the response as I don’t want browsers to cache this response in case of a bad configuration on the Origin side.

I’m also declaring this script as fail-safe with passThroughOnException(), so that the request goes through normally if the script raises an exception. More debugging information can be found on our developer hub. Feel free to adapt it to your use cases, for example by sending the exception to a specific header or, even fancier, by sending the event to a logging tool.
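
As an illustration of the "specific header" idea, here is one possible sketch. It wraps a handler so that, when the handler throws, a fallback response is served and the error message is attached to it. The function name withErrorHeader and the header name x-worker-error are hypothetical choices for this example, and the sketch only catches errors raised before the response is returned, not errors occurring later while the body is being streamed.

```javascript
// Wrap a handler so that, on failure, the request falls back to `fallback`
// and the error message is exposed in a response header.
// "x-worker-error" is a hypothetical header name chosen for this sketch.
function withErrorHeader(handler, fallback) {
  return async function (request) {
    try {
      return await handler(request);
    } catch (err) {
      const response = await fallback(request);
      // Copy the response so that its headers are mutable.
      const copy = new Response(response.body, response);
      // Header values must stay on one line: replace anything unusual.
      const message = String(err.message).replace(/[^\x20-\x7e]/g, "?").slice(0, 200);
      copy.headers.set("x-worker-error", message);
      return copy;
    }
  };
}
```

In the worker, this could be used as event.respondWith(withErrorHeader(fetchAndStream, fetch)(event.request)), which sends the unmodified origin response along with the reason why the ESI processing was skipped.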

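async function handleTemplate(encoder, templateKey) {
  // templateKey is the content of a tag, without the surrounding "<" and ">".
  // Group 2 captures the value of the src attribute of an esi:include tag.
  const linkRegex = /(esi:include.*src="(.*?)".*\/)/gm
  let result = linkRegex.exec(templateKey);
  let esi = ""
  if (!result) {
    // Not an ESI block: put the tag back untouched.
    return encoder.encode(`<${templateKey}>`);
  }
  if (result[2]) {
    // Replace the ESI block with the content fetched from its src.
    esi = await subRequests(result[2]);
  }
  return encoder.encode(esi);
}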

In this portion, we receive the content of each tag and check whether it is an ESI block. When the regex matches, we call the subRequests() function, which fetches the content of the ESI destination and returns it as text. The result is then encoded and takes the place of the ESI block.
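
To see what such a pattern captures, the matching can be tried in isolation. The sketch below assumes the pattern is meant to capture the value of the src attribute as its second group; extractEsiSource is a hypothetical helper written for this illustration, and the g and m flags are left out so that the shared regex keeps no state between calls.

```javascript
// The tag content arrives without its surrounding "<" and ">".
const linkRegex = /(esi:include.*src="(.*?)".*\/)/;

// Return the src of an ESI include, or null if the tag is not an ESI block.
function extractEsiSource(templateKey) {
  const result = linkRegex.exec(templateKey);
  return result ? result[2] : null;
}

console.log(extractEsiSource('esi:include src="https://httpbin.org/headers" onerror="continue" /'));
// prints https://httpbin.org/headers
console.log(extractEsiSource("h1"));
// prints null
```

Any tag that does not match, such as h1 here, is written back to the page unchanged by the worker.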

async function subRequests(target){
  const init = {
            method: 'GET',
            headers: {
                'user-agent': 'cloudflare'
            }
        }
  let response = await fetch(target, init)
  let text = await response.text()
  
  return text
}

This portion is quite simple, as it defines the function that performs the subrequest to the ESI destination. Pay attention to the cache-control response header the origin server is serving for those dynamic parts, as we want to keep them dynamic and not cached by Cloudflare. It’s also possible to override the TTL via Cloudflare directly if you don’t want to modify your application. The documentation about how to manipulate Cloudflare features is available on the developer hub.
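
One way to override the TTL from the worker itself is the cf property accepted by fetch() in the Workers runtime. The sketch below is an assumption-laden example rather than a recommendation: buildFragmentInit is a hypothetical helper, and it relies on cacheTtl: 0 making the fragment expire from the edge cache immediately, which is worth checking against the current Workers documentation.

```javascript
// Build the fetch() options for an ESI subrequest.
// The `cf` property is specific to the Cloudflare Workers runtime; cacheTtl
// is assumed here to override the edge cache TTL (in seconds) for this fetch.
function buildFragmentInit(ttlSeconds = 0) {
  return {
    method: "GET",
    headers: { "user-agent": "cloudflare" },
    cf: { cacheTtl: ttlSeconds },
  };
}

// Inside the worker, subRequests() could then call:
//   let response = await fetch(target, buildFragmentInit())
```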

async function streamTransformBody(readable, writable) {
  const startTag = "<".charCodeAt(0);
  const endTag = ">".charCodeAt(0);
  let reader = readable.getReader();
  let writer = writable.getWriter();

  let templateChunks = null;
  while (true) {
    let { done, value } = await reader.read();
    if (done) break;
    while (value.byteLength > 0) {
      if (templateChunks) {
        let end = value.indexOf(endTag);
        if (end === -1) {
          templateChunks.push(value);
          break;
        } else {
          templateChunks.push(value.subarray(0, end));
          await writer.write(await translate(templateChunks));
          templateChunks = null;
          value = value.subarray(end + 1);
        }
      }
      let start = value.indexOf(startTag);
      if (start === -1) {
        await writer.write(value);
        break;
      } else {
        await writer.write(value.subarray(0, start));
        value = value.subarray(start + 1);
        templateChunks = [];
      }
    }
  }
  await writer.close();
}

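In the streamTransformBody() function, I’m splitting the payload received from the readable object on tag boundaries ("<" and ">"). Text outside of tags is forwarded as soon as it arrives, while the content of a tag is buffered until its closing ">" shows up. This way, an ESI block is always handled as a whole, even when a network chunk ends in the middle of it.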
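
The buffering behaviour can be observed without any stream by running the same loop over an array of byte chunks. The sketch below is a simplified, synchronous variant written for illustration: splitOnTags is a hypothetical name, and the segments are decoded to strings only to make them easy to inspect (it assumes that text outside of tags is not cut in the middle of a multi-byte character).

```javascript
// Split a sequence of byte chunks into text and tag segments, buffering a tag
// until its closing ">" arrives, even if that happens in a later chunk.
function splitOnTags(chunks) {
  const LT = "<".charCodeAt(0);
  const GT = ">".charCodeAt(0);
  const decoder = new TextDecoder();
  const segments = [];
  let tagBytes = null; // non-null while we are inside a tag

  for (let value of chunks) {
    while (value.byteLength > 0) {
      if (tagBytes) {
        const end = value.indexOf(GT);
        if (end === -1) {
          tagBytes.push(...value); // tag continues in the next chunk
          break;
        }
        tagBytes.push(...value.subarray(0, end));
        segments.push({ type: "tag", value: decoder.decode(new Uint8Array(tagBytes)) });
        tagBytes = null;
        value = value.subarray(end + 1);
        continue;
      }
      const start = value.indexOf(LT);
      if (start === -1) {
        segments.push({ type: "text", value: decoder.decode(value) });
        break;
      }
      if (start > 0) {
        segments.push({ type: "text", value: decoder.decode(value.subarray(0, start)) });
      }
      value = value.subarray(start + 1);
      tagBytes = [];
    }
  }
  return segments;
}

const encoder = new TextEncoder();
const segments = splitOnTags([
  encoder.encode("<p>Hi</p><esi:inc"),
  encoder.encode('lude src="https://x/1" /><b>'),
]);
console.log(segments.map((s) => `${s.type}: ${s.value}`).join("\n"));
```

Even though the ESI block is cut between the two chunks, it comes out as a single tag segment whose value is esi:include src="https://x/1" /.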

async function translate(chunks) {
  const decoder = new TextDecoder();

  let templateKey = chunks.reduce(
    (accumulator, chunk) =>
      accumulator + decoder.decode(chunk, { stream: true }),
    ""
  );
  templateKey += decoder.decode();

  return handleTemplate(new TextEncoder(), templateKey);
}

The translate() function decodes the chunks and sends the resulting string to the handleTemplate() function which, as a reminder, is in charge of replacing the ESI block with the dynamic content.
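
The { stream: true } option matters when a multi-byte character is cut between two chunks. The short example below, which is not part of the worker, decodes the same bytes with and without that option; decodeChunks is a hypothetical helper written for this comparison.

```javascript
// "é" is encoded in UTF-8 as the two bytes 0xC3 0xA9. Here they are split
// across two chunks, as could happen on a network boundary.
const chunks = [new Uint8Array([0x61, 0xc3]), new Uint8Array([0xa9, 0x62])];

function decodeChunks(chunks, streaming) {
  const decoder = new TextDecoder();
  let text = "";
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: streaming });
  }
  return text + decoder.decode(); // flush any bytes still buffered
}

console.log(decodeChunks(chunks, true));  // prints aéb
console.log(decodeChunks(chunks, false)); // prints a, two U+FFFD replacement characters, then b
```

With streaming enabled, the decoder keeps the dangling byte until the next chunk arrives, which is why translate() can safely rebuild a tag from several chunks.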

The complete worker script

addEventListener("fetch", event => {
  event.respondWith(fetchAndStream(event.request))
  event.passThroughOnException()
})

async function fetchAndStream(request) {
  let response = await fetch(request)
  let contentType = response.headers.get('content-type')

  if (!contentType || !contentType.startsWith("text/")) {
    return response
  }
  let { readable, writable } = new TransformStream()
  let newResponse = new Response(readable, response)
  newResponse.headers.set('cache-control', 'max-age=0')
  streamTransformBody(response.body, writable)
  return newResponse
}

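async function handleTemplate(encoder, templateKey) {
  // templateKey is the content of a tag, without the surrounding "<" and ">".
  // Group 2 captures the value of the src attribute of an esi:include tag.
  const linkRegex = /(esi:include.*src="(.*?)".*\/)/gm
  let result = linkRegex.exec(templateKey);
  let esi = ""
  if (!result) {
    // Not an ESI block: put the tag back untouched.
    return encoder.encode(`<${templateKey}>`);
  }
  if (result[2]) {
    // Replace the ESI block with the content fetched from its src.
    esi = await subRequests(result[2]);
  }
  return encoder.encode(esi);
}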

async function subRequests(target){
  const init = {
            method: 'GET',
            headers: {
                'user-agent': 'cloudflare'
            }
        }
  let response = await fetch(target, init)
  let text = await response.text()
  
  return text
}

async function streamTransformBody(readable, writable) {
  const startTag = "<".charCodeAt(0);
  const endTag = ">".charCodeAt(0);
  let reader = readable.getReader();
  let writer = writable.getWriter();

  let templateChunks = null;
  while (true) {
    let { done, value } = await reader.read();
    if (done) break;
    while (value.byteLength > 0) {
      if (templateChunks) {
        let end = value.indexOf(endTag);
        if (end === -1) {
          templateChunks.push(value);
          break;
        } else {
          templateChunks.push(value.subarray(0, end));
          await writer.write(await translate(templateChunks));
          templateChunks = null;
          value = value.subarray(end + 1);
        }
      }
      let start = value.indexOf(startTag);
      if (start === -1) {
        await writer.write(value);
        break;
      } else {
        await writer.write(value.subarray(0, start));
        value = value.subarray(start + 1);
        templateChunks = [];
      }
    }
  }
  await writer.close();
}

async function translate(chunks) {
  const decoder = new TextDecoder();

  let templateKey = chunks.reduce(
    (accumulator, chunk) =>
      accumulator + decoder.decode(chunk, { stream: true }),
    ""
  );
  templateKey += decoder.decode();

  return handleTemplate(new TextEncoder(), templateKey);
}

Testing the script

Testing is easy: we’re going to cURL the URL hosting the small HTML page presented earlier in the article and see what the response looks like.

cURL command:

curl https://www.justalittlebyte.ovh/esi.html -sv 2>&1 | grep -E "Cf-Ray|cf-cache-status|cf-ray"

The grep is trying to catch a few things:

  • The first CF-Ray is the one for the request itself
  • CF-Cache-Status is the header Cloudflare uses to indicate whether the response has been served from the cache
  • The second CF-Ray comes from the call made to https://httpbin.org/headers; it changes every time, as the CF-Ray is unique for each new request made through our Edge

Here is the response:

< cf-cache-status: HIT 
< cf-ray: 447b3a379ce8bc3e-LHR
    "Cf-Ray": "447b3a37d217bc3e-IAD",

Doing it again:

< cf-cache-status: HIT
< cf-ray: 447b3a3c4b5fbc44-LHR
    "Cf-Ray": "447b3a3c61f5bc44-IAD",

And the page looks like this:

[Image: ESI_Demonstration]

The most interesting part is the cf-cache-status header: we can see that the page is actually coming from our cache, yet the worker modified the payload because it detected an ESI block.
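
The same reasoning can be automated when comparing two runs. The helper below is a hypothetical sketch that parses the filtered cURL output shown above; the sample strings are copied from the two runs in this article.

```javascript
// Extract the cache status and every CF-Ray value from the filtered output.
function summarizeCurlOutput(output) {
  const status = /cf-cache-status:\s*(\S+)/i.exec(output);
  const rays = [...output.matchAll(/cf-ray"?:\s*"?([0-9a-f]+-[A-Z]{3})/gi)].map((m) => m[1]);
  return { cacheStatus: status ? status[1] : null, rays };
}

// The page is cached but the fragment is fresh when both runs are cache HITs
// and the CF-Ray embedded in the dynamic fragment differs between them.
function cachedPageWithFreshFragment(firstOutput, secondOutput) {
  const first = summarizeCurlOutput(firstOutput);
  const second = summarizeCurlOutput(secondOutput);
  return (
    first.cacheStatus === "HIT" &&
    second.cacheStatus === "HIT" &&
    first.rays.length === 2 &&
    second.rays.length === 2 &&
    first.rays[1] !== second.rays[1]
  );
}

const firstRun = '< cf-cache-status: HIT\n< cf-ray: 447b3a379ce8bc3e-LHR\n    "Cf-Ray": "447b3a37d217bc3e-IAD",';
const secondRun = '< cf-cache-status: HIT\n< cf-ray: 447b3a3c4b5fbc44-LHR\n    "Cf-Ray": "447b3a3c61f5bc44-IAD",';
console.log(cachedPageWithFreshFragment(firstRun, secondRun)); // prints true
```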

Share your worker recipes

You can find additional worker recipes and examples in our official documentation.

Have you written a worker that you’d like to share? Send it to us and you might get featured on our blog or added to our Cloudflare worker recipe collection with a credit.


