Content Negotiation with Cloudflare Workers
                                Thu Jun 23 2022
A neat power from 1995 that we can use to provide better user experience to web
                               browsers and APIs
--------------------------------------------------------------------------------

Content Negotiation with Cloudflare Workers
===========================================

Published Jun 23, 2022 - 16 min read

/-------------- Table of contents --------------\
| * Content Negotiation with Cloudflare Workers |
| * Brief reflection on its origins             |
| * Negotiating Requests in practice            |
| * Browsers again                              |
| * Cloudflare Workers                          |
| * Conclusion                                  |
\-----------------------------------------------/

A neat feature has existed on the web since the web 1.0 days, in fact! Content
Negotiation [L1] is a way for a user agent (think browser) to say "I can take or
handle these things, please!"

Content negotiation was first codified in RFC2068 [L2] section 14.1 in 1997,
originally drafted in 1995. This header, the Accept header, goes waaaay back.
Along with Accept came a few others, like Accept-Language, Accept-Charset, and
Accept-Encoding. I will be focusing on Accept.

In practice, the browser sends a prioritized list of "MIME Content-Types" (see
RFC1341 [L3] from 1992) and the server must decide which content to send back.
If the Accept header is omitted or no matches are found, the */* option is
used. Generally, the */* option means "I'll take whatever you'll give me".
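
Here is a minimal sketch of that decision, assuming a server with a fixed list
of types it can produce. The function names are made up for illustration and do
not come from any library. It sorts by the q weight only and skips the
specificity rules a complete implementation would follow.

```javascript
// Parse an Accept header into [{ type, q }], highest preference first.
function parseAccept(header) {
  if (!header) {
    // A missing header is treated as "I'll take whatever you'll give me"
    return [{ type: '*/*', q: 1 }];
  }
  return header
    .split(',')
    .map((part, index) => {
      const [type, ...params] = part.trim().split(';');
      let q = 1;
      for (const param of params) {
        const [key, value] = param.trim().split('=');
        if (key === 'q') {
          const parsed = parseFloat(value);
          if (!Number.isNaN(parsed)) q = parsed;
        }
      }
      return { type: type.trim().toLowerCase(), q, index };
    })
    .filter((entry) => entry.type !== '' && entry.q > 0)
    // Sort by weight, keeping the header's own order for ties
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map(({ type, q }) => ({ type, q }));
}

// Does a requested range such as image/* cover a concrete type?
function matches(range, type) {
  if (range === '*/*' || range === type) return true;
  const [rangeMain, rangeSub] = range.split('/');
  return rangeSub === '*' && type.startsWith(rangeMain + '/');
}

// Pick the first supported type the client accepts, or null for no match.
function negotiate(header, supported) {
  for (const { type } of parseAccept(header)) {
    const found = supported.find((candidate) => matches(type, candidate));
    if (found) return found;
  }
  return null;
}
```

With this sketch, negotiate('text/plain', ['application/json']) returns null,
and the caller decides whether to fall back to a default or refuse.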

[I1: Flow chart of content negotiation]

Though, the example above is more relevant to emails than browsers. What
browsers out there ask for text first? None. Emails can receive multiple
content-type bodies at once, and it is up to the client to choose which to give
to the user.

/[cendyne: bullshit2]----------------------------------------------------------\
| It's generally a good idea to provide plain text in addition to HTML         |
| content when sending email.                                                  |
| [I2: USPS does not send plaintext with HTML emails]                          |
\------------------------------------------------------------------------------/

So here we see Content negotiation for humans isn't actually all that useful.

Wait, so why am I writing about something useless?

It is useful! But for programs rather than humans!

Brief reflection on its origins
-------------------------------

Back in 1995 when RFC2068 [L2] was still being drafted, I think they had big
dreams about the web.

We didn't know how powerful browsers would become: that Google Chrome would
become the basis for nearly all browsers [L4] and many applications through
Electron [L5], or even be wrapped in a light operating system [L6].

[I3: Oops, chromium became the base for nearly all browsers]

But we did gaze into a theoretical future with extensible applications,
browsers, and servers. What we lacked was insight into how difficult some of
these features are to use effectively.

Do you really think servers out there bother to convert UTF-8 [L7] to ISO-8859-1
[L8]? No. They don't. Today the client is responsible for interpreting what
charset the server sends back.

Big companies like Google, Microsoft, eBay, PayPal, IBM, DHL, FedEx, Walmart
(the list goes on) do not respect the Accept-Language header [L9].

The Network Working Group expected servers to be more flexible than the industry
would realize. And so we have flexible clients instead.

That said... when the server is flexible, we can do some neat things!

/------------------------------------------------------------------------------\
|                       Just old memories, you can skip                        |
|------------------------------------------------------------------------------|
| /[cendyne: access-granted]-------------------------------------------------\ |
| | In 1995 I was busy learning how computers work. I deleted System32 and   | |
| | my parents had to reinstall Windows 95 every week.                       | |
| \--------------------------------------------------------------------------/ |
|                                                                              |
| /----------------------------------------------------------[cendyne: ssssh]\ |
| | Though, I wasn't programming yet at the time. I did play a few odd games | |
| |     like Return to Zork. This was actually a terrible game, don't bother | |
| |                                                              with it. :) | |
| \--------------------------------------------------------------------------/ |
|                                                                              |
| /--------------------------------------------------------------------------\ |
| |                                 Estryark                                 | |
| |--------------------------------------------------------------------------| |
| | Youtube Video [L10]                                                      | |
| | Return to Zork - Want some rye? [L11] 1/9/2023                           | |
| \--------------------------------------------------------------------------/ |
\------------------------------------------------------------------------------/

Negotiating Requests in practice
--------------------------------

Some platforms such as Ruby on Rails and Spring MVC support content negotiation.
While Rails has an explicit code path depending on the content type, other
frameworks like Spring have the developer populate a model which is then
transformed into the content type's format by the framework. See Spring MVC
Content Negotiation [L12] for more. Alternatively, multiple controller endpoints
can be specified, one for each supported content type!

As a content negotiating server, this provides flexibility for more types of
clients. With Spring, a developer can code it once, adapt it to the client's
desired format generally, and move on to focus on interesting things. But XML?
Really? Thankfully we've moved on from SOAP [L13] and most integrations have
JSON. XML was made a bit too smart as a messaging format and thus comes with a
lot of attack surface [L14]. Still, as a service creator you may have to cater
to the whims of the clients. If you don't, a fragile contracted middleware will.

Content negotiation can also be used to specify which API version, or even
which functionality, is requested and served, but that use is more esoteric.
Large vendors like Oracle even go overboard [L15] with this, using an actual
unique content type for each endpoint.

/[cendyne: teaching]-----------------------------------------------------------\
| Vendor-specific media types look like (where * is whatever):                 |
| */vnd.name-here+*. You can find out more in RFC4288 [L16].                   |
\------------------------------------------------------------------------------/

/--------------------------------------------[cendyne: press-f-to-pay-respects]\
|      You might also see metadata (after a semi-colon) like image/jpeg;q=0.8, |
|    where q=0.8 is the metadata. While you might think that you can use it to |
|          specify the API Version (and technically you can), in practice most |
|                                             frameworks discard all metadata. |
\------------------------------------------------------------------------------/
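
To make the pieces of a media type concrete, here is a small sketch that splits
one into its type, subtype, structured suffix, and parameters. The function name
and the shape of the returned object are assumptions for illustration only, and
parameter values are kept as plain strings.

```javascript
// Split a media type such as application/vnd.cendyne-v2+json;q=0.8
// into its parts.
function parseMediaType(value) {
  const [essence, ...rawParams] = value.split(';').map((s) => s.trim());
  const [type, subtype = ''] = essence.toLowerCase().split('/');
  const plus = subtype.lastIndexOf('+');
  const params = {};
  for (const raw of rawParams) {
    const eq = raw.indexOf('=');
    if (eq > 0) {
      // Everything before the first = is the name, the rest is the value
      params[raw.slice(0, eq).toLowerCase()] = raw.slice(eq + 1);
    }
  }
  return {
    type,
    subtype,
    // Vendor subtypes start with the vnd. prefix
    vendor: subtype.startsWith('vnd.'),
    // The part after the last + names the underlying syntax, if any
    suffix: plus === -1 ? null : subtype.slice(plus + 1),
    params,
  };
}
```

For application/vnd.cendyne-v2+json;q=0.8 this reports a vendor type with the
suffix json and a single parameter, q, with the string value 0.8.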

In Java you can use the @Produces [L17] annotation or, with Spring, the
@RequestMapping [L18] annotation. The framework may select the appropriate
handler from several with hints from the client.


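import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.Response;

// Hypothetical resource class and path, added so the handlers compile.
@Path("/hello")
public class HelloResource {
  // Selected when the client accepts application/vnd.cendyne-v1+json
  @GET
  @Produces("application/vnd.cendyne-v1+json")
  public Response oldApi() {
    return Response.ok(new Version1Response("hello")).build();
  }

  // Selected when the client accepts application/vnd.cendyne-v2+json
  @GET
  @Produces("application/vnd.cendyne-v2+json")
  public Response newApi() {
    return Response.ok(new Version2Response("hola")).build();
  }
}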

/[cendyne: gendo]--------------------------------------------------------------\
| Cloudflare workers will be less structured than the Java example above.      |
| There's a reason most servers do not support content negotiation. The code   |
| to do it right generically is hard and heavy, and when there is a problem    |
| most developers won't have the time to resolve it neatly.                    |
\------------------------------------------------------------------------------/

/-----------------------------------------------------------[cendyne: little-a]\
|    Cloudflare workers have to be small and light! In fact, the limit is 1 MB |
| gzipped! And you think making front end single page web applications under 5 |
|                                                                MB is hard... |
\------------------------------------------------------------------------------/

Lastly, while browsers may receive a fallback response, API clients will likely
have a better time if the server quickly responds that the requested format is
unsupported.

/[cendyne: you-stop-that]------------------------------------------------------\
| Ever have your functional API integration start breaking because the service |
| goes down and the remote load balancer responds with an HTML page? Usually   |
| with an error like 'Could not parse "<html..."'                              |
\------------------------------------------------------------------------------/

[I4: Flow chart of content negotiation with apis]

No supported matches is shown as ____ in the diagram above.

Content negotiation is a reliable mechanism for differing clients to communicate
with one server; the server satisfies each client within the same process for a
predefined set of content types.
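
As a rough sketch of failing fast for API clients, the function below answers
with HTTP 406 Not Acceptable when it cannot produce any format the client asked
for. The renderers and the plain-object response shape are assumptions made for
illustration; in a Worker the result would be handed to new Response(...). It
walks the header in order and ignores q weights to stay short.

```javascript
// Hypothetical renderers, one per content type this endpoint can produce.
const renderers = new Map([
  ['application/json', (data) => JSON.stringify(data)],
  ['text/plain', (data) => `${data.greeting}\n`],
]);

// Describe the response for an API request. Clients that only ask for
// formats we cannot produce get a 406 rather than a surprise format.
function respond(acceptHeader, data) {
  const supported = [...renderers.keys()];
  // No Accept header means the client takes anything
  const wanted = (acceptHeader || '*/*')
    .split(',')
    .map((part) => part.split(';')[0].trim().toLowerCase());
  for (const type of wanted) {
    // */* gets the first format the server supports
    const chosen = type === '*/*' ? supported[0] : type;
    if (renderers.has(chosen)) {
      return {
        status: 200,
        headers: { 'Content-Type': chosen, Vary: 'Accept' },
        body: renderers.get(chosen)(data),
      };
    }
  }
  return {
    status: 406,
    headers: { 'Content-Type': 'application/json', Vary: 'Accept' },
    body: JSON.stringify({ error: 'Not Acceptable', supported }),
  };
}
```

A client that sends Accept: text/html gets status 406 along with the list of
supported types, which is much easier to debug than a parse error.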

Now let me drop a bombshell: browsers are clients too!

Browsers again
--------------

Above, I said that content negotiation doesn't make sense for humans. Yeah. And
I stand by that.

It's what's on the page the browser receives that can be content negotiated!

See, not all browsers (looking at you, (Strike through: Internet Explorer) Apple
Safari) support the same features and content.

It took Safari 6 years to add webp image [L19] support.

[I5: Can I use board of webp support]

Today, Safari does not support avif images [L20], even though avif provides
better results at the same or smaller size than a similar webp.

[I6: Can I use board of avif support]

There are ways around this. The picture and source elements can give the client
hints about which file to use.

But it can get gnarly.


<picture>
  <source type="image/jxl" srcset="/assets/stickers/nervous.jxl">
  <source type="image/avif" srcset="/assets/stickers/nervous.avif">
  <source type="image/webp" srcset="/assets/stickers/nervous.webp">
  <img loading="lazy" alt="nervous" class="sticker"
       src="/assets/stickers/nervous.jpg">
</picture>

/[cendyne: shiver]-------------------------------------------------------------\
| Now imagine emitting twice this much to handle device pixel ratios.. I've    |
| done it.                                                                     |
\------------------------------------------------------------------------------/

So what does it look like if you rely on content negotiation? The client gets
less HTML!


<img loading="lazy" alt="nervous" class="sticker"
     src="/assets/stickers/nervous">

In both the <picture><img /></picture> and <img /> cases, the browser sends
requests that look like the following:


GET /assets/stickers/nervous
Accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows Phone OS 7.0; Trident/3.1; IEMobile/7.0; NOKIA; Lumia 1320)

/[cendyne: thinking-about]-----------------------------------------------------\
| See that Accept header? We can use that to choose which to send! We can      |
| conveniently ignore the User-Agent in most cases! User agents lie anyway.    |
\------------------------------------------------------------------------------/

Now just cutting off the extension won't work outright with most servers.
Content negotiation is a server responsibility and most do not implement it.
Cloudflare will not do it for you, and if you do manage it with nginx,
openresty, or even node then Cloudflare caching will get in the way!

/[cendyne: huff-angry]---------------------------------------------------------\
| Cloudflare's Cache Vary documentation [L21] shows that they won't even let   |
| you vary cache responses for images with the Accept header unless you're a   |
| paying customer for that DNS zone.                                           |
\------------------------------------------------------------------------------/

Headers make things complicated. It is easier to just cache requests by the URL
rather than the URL and select headers.

There is a way around this: implement content negotiation at the edge with
Cloudflare Workers.

Cloudflare Workers
------------------

See, Cloudflare Workers is the first thing the Cloudflare endpoint will use.
Within the worker, one can use the caches API to match against the incoming
request, or more importantly: a custom request that encodes what you care about
in the URL!

Following the example above, that means if the request looks like


GET /assets/stickers/nervous
Accept: image/avif,image/webp,image/jpeg

Then the cache may be asked for a request that looks like


GET /assets/stickers/nervous.avif

The worker can query the origin for /assets/stickers/nervous.avif or
/assets/stickers/avif/nervous.avif. The origin path really does not have to
equal the inbound request at all.
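
The mapping above can be sketched as a small pure function. This is only an
illustration: the variant list, the .jpg fallback, and the function name are
assumptions, and it picks by the server's preferred order rather than the order
the client listed.

```javascript
// Formats the origin is assumed to have, in the server's preferred order.
const VARIANTS = [
  { type: 'image/avif', extension: '.avif' },
  { type: 'image/webp', extension: '.webp' },
];
// Assumed fallback for clients that accept neither of the above.
const FALLBACK = { type: 'image/jpeg', extension: '.jpg' };

// Turn an extensionless request URL plus an Accept header into the URL
// used for the cache lookup and the origin fetch.
function variantUrl(requestUrl, acceptHeader) {
  const accepted = (acceptHeader || '')
    .split(',')
    .map((part) => part.split(';')[0].trim().toLowerCase());
  const variant = VARIANTS.find((v) => accepted.includes(v.type)) || FALLBACK;
  const url = new URL(requestUrl);
  url.pathname += variant.extension;
  return { url: url.toString(), contentType: variant.type };
}
```

Because the result is an ordinary URL, it can be wrapped in a Request and used
as a cache key without relying on Vary support.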

[I7: Diagram of where cloudflare workers compares to without cloudflare workers]

A client does not choose whether they get a worker or not; the Cloudflare
endpoint does. From there the worker may choose to access other things like
origins, cache, and services.

If you decouple the cache and origin requests from the client request, you can
perform some convenient magic.

/[cendyne: ok]-----------------------------------------------------------------\
| Another product, Cloudflare KV [L22], is also a useful tool to               |
| consider. Unlike Cloudflare cache, KV is replicated globally and still       |
| responds with low latency.                                                   |
\------------------------------------------------------------------------------/

/---------------------------------------------------------[cendyne: you-got-it]\
|  In fact, the contents of this site are pulled from Cloudflare KV instead of |
|       an origin at all. Each file and some metadata is uploaded to KV during |
|       deploy time with Github Actions! Finally serverless build, deploy, and |
|                                                                     runtime! |
\------------------------------------------------------------------------------/

Here's a reduced bit of code I use to do some content negotiation for my
stickers.


// hostname, character, name, and size come from the request (not shown).
let varies = ['Accept'];
let acceptHeader = c.req.header('accept');
let desiredContentType;
let urlType = ''; // URL suffix that keys the cache by format
if (acceptHeader) {
  for (let accept of acceptHeader.split(',')) {
    // Drop parameters like ;q=0.8 and any whitespace after the commas
    let type = accept.split(';')[0].trim();
    let foundMatch = false;
    switch (type) {
      case 'image/jpeg':
      case 'image/png': {
        desiredContentType = 'image/*';
        foundMatch = true;
        break;
      }
      case 'image/avif':
      case 'image/webp': {
        desiredContentType = type;
        urlType = type === 'image/avif' ? '/avif' : '/webp';
        foundMatch = true;
        break;
      }
    }
    if (foundMatch) {
      break;
    }
  }
}

// Encode the negotiated format in the URL so the cache can key on it
let cacheRequest = new Request(
  `https://${hostname}/s/${character}/${name}/${size}${urlType}`
);
let cachedResponse = await caches.default.match(cacheRequest);
if (cachedResponse && !c.req.header('via')) {
  let returnResponse = new Response(cachedResponse.body, cachedResponse);
  returnResponse.headers.set('vary', varies.join(', '));
  return returnResponse;
}

The cache will store requests that look like

GET /s/cendyne/wink/256/avif


There are more neat things, like Client Hints [L23] such as DPR (Device Pixel
Ratio), Width (yes, image requests can say how wide they expect to be!), and
even Save-Data [L24] to request lower bandwidth content. Though if you dabble in
that, you'll also want to use the Accept-CH and Permissions-Policy headers.

What's not shown in the above code is the size query parameter from the url,
which may also be modified by the DPR header or the Width header.

/[cendyne: you-dense]----------------------------------------------------------\
| Apple doesn't support the DPR header on Safari... So if the user agent has   |
| AppleWebkit (and not Chrome), I just pretend DPR: 2 since most Apple devices |
| are double pixel ratio.                                                      |
\------------------------------------------------------------------------------/
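
A rough sketch of how the size could be adjusted by those hints follows. The
function name, the maxSize cap, and the rule that Width wins over DPR are
assumptions for illustration, not the code this site runs. The headers argument
is anything with a get(name) method.

```javascript
// Work out the pixel size to render from the requested size and the
// client hint headers.
function effectiveSize(requestedSize, headers, maxSize = 1024) {
  const userAgent = headers.get('user-agent') || '';
  let dpr = parseFloat(headers.get('dpr') || '');
  if (Number.isNaN(dpr) || dpr <= 0) {
    // No usable DPR hint: guess 2 for Apple WebKit browsers, else 1
    const isSafari = /applewebkit/i.test(userAgent) && !/chrome/i.test(userAgent);
    dpr = isSafari ? 2 : 1;
  }
  // The Width hint is in physical pixels, so it is used as-is when present
  const width = parseInt(headers.get('width') || '', 10);
  const size = Number.isNaN(width) || width <= 0 ? requestedSize * dpr : width;
  // Cap the result so a client cannot request arbitrarily large renders
  return Math.min(maxSize, Math.ceil(size));
}
```

For a 256 pixel sticker requested with DPR: 2, this asks for a 512 pixel render.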

After this, some logic happens to lazily produce the response.

  1. Make a KV key which encodes the asset, content type, and desired size

  2. Search KV for this key

  3. If something is found, return it

  4. Resize and convert the original asset to the desired content type and
     desired size

  5. Save the result to KV, so all future requests are instant

  6. Respond with the converted asset
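
The steps above can be sketched roughly like this. A small in-memory class
stands in for the KV namespace so the sketch runs anywhere; a real binding has
the same get and put shape, though binary data would need the arrayBuffer type
option on get. The convert function and the key format are hypothetical.

```javascript
// Minimal in-memory stand-in with the same get/put shape as a KV binding.
class MemoryKV {
  constructor() {
    this.store = new Map();
  }
  async get(key) {
    return this.store.has(key) ? this.store.get(key) : null;
  }
  async put(key, value) {
    this.store.set(key, value);
  }
}

// Return the converted asset, converting and storing it on first request.
// `convert` is a hypothetical function that resizes and re-encodes.
async function lazyVariant(kv, convert, asset, contentType, size) {
  // 1. A key that encodes the asset, content type, and desired size
  const key = `${asset}:${contentType}:${size}`;
  // 2 and 3. Return the stored result when there is one
  const existing = await kv.get(key);
  if (existing !== null) {
    return existing;
  }
  // 4. Otherwise convert the original
  const converted = await convert(asset, contentType, size);
  // 5. Save it so future requests skip the conversion
  await kv.put(key, converted);
  // 6. Respond with the converted asset
  return converted;
}
```

Only the first request for a given asset, type, and size pays for the
conversion. Two simultaneous first requests may both convert, which is wasteful
but harmless since they store the same result.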

While my sticker service stores its images in Cloudflare KV, my media service is
backed by Cloudflare R2 [L25], which is a simple object storage service (like
Amazon S3 [L26]) since I expect far more storage in its lifetime.

Conclusion
----------

/[cendyne: heh-heh]------------------------------------------------------------\
| Invisibly, content negotiation provides a better user experience by offering |
| newer formats that save transfer data, present a better quality image or     |
| video, and can even extend battery life by sending the optimal content for   |
| that device.                                                                 |
\------------------------------------------------------------------------------/

/[cendyne: hmm]----------------------------------------------------------------\
| However, implementing content negotiation requires precise control of your   |
| endpoints. If you plan to do this, I recommend you have your content on      |
| another subdomain or domain entirely.                                        |
\------------------------------------------------------------------------------/

/[cendyne: objection]----------------------------------------------------------\
| But if you rely on Width or DPR headers, you must add Permissions-Policy to  |
| your website and Accept-CH to both your website and the content domain.      |
\------------------------------------------------------------------------------/

/[cendyne: if-i-fits-i-sits]---------------------------------------------------\
| Or you know, just find a CDN with content negotiation [L27] and put your     |
| images there. You don't have to be me and make your own CDN on top of        |
| Cloudflare.                                                                  |
\------------------------------------------------------------------------------/

--------------------------------------------------------------------------------

 [L1]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation
 [L2]: https://tools.ietf.org/html/rfc2068
 [L3]: https://tools.ietf.org/html/rfc1341
 [L4]: https://archive.ph/bMMPf
 [L5]: https://www.electronjs.org/
 [L6]: https://www.google.com/chromebook/chrome-os/
 [L7]: https://en.wikipedia.org/wiki/UTF-8
 [L8]: https://en.wikipedia.org/wiki/ISO/IEC_8859-1
 [L9]: https://archive.ph/zHE8R
[L10]: https://youtu.be/iHKKq7kMF8w
[L11]: https://www.youtube.com/watch?v=iHKKq7kMF8w
[L12]: https://www.baeldung.com/spring-mvc-content-negotiation-json-xml
[L13]: https://en.wikipedia.org/wiki/SOAP
[L14]: https://www.ws-attacks.org/Welcome_to_WS-Attacks
[L15]: https://archive.ph/GKG1c
[L16]: https://tools.ietf.org/html/rfc4288
[L17]: https://jakarta.ee/specifications/platform/9/apidocs/jakarta/ws/rs/produces
[L18]: https://www.baeldung.com/spring-requestmapping
[L19]: https://developers.google.com/speed/webp
[L20]: https://jakearchibald.com/2020/avif-has-landed/
[L21]: https://developers.cloudflare.com/cache/about/vary-for-images/
[L22]: https://developers.cloudflare.com/workers/runtime-apis/kv/
[L23]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Client_hints
[L24]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Save-Data
[L25]: https://developers.cloudflare.com/r2/runtime-apis/
[L26]: https://aws.amazon.com/s3/
[L27]: https://docs.imgix.com/tutorials/improved-compression-auto-content-negotiation
 [I1]: https://c.cdyn.dev/PFh01H
 [I2]: https://c.cdyn.dev/5RBMUuBa
 [I3]: https://c.cdyn.dev/33nvf_xx
 [I4]: https://c.cdyn.dev/ASmXsi
 [I5]: https://c.cdyn.dev/XywRdK7t
 [I6]: https://c.cdyn.dev/rdKizbtM
 [I7]: https://c.cdyn.dev/EQgg0R