A Precious Side Project — This Website
Hooray! This website is now no longer fully bound to an unmaintained static site generator! So, what's new, and what's changed? There's a long story on how it got to where it is. Bear with me for a moment.
Previously
I used and still use a fork of Mendoza, a documentation-oriented static site generator for the Janet programming language. I saw the potential for differentiating my content with its powerful markdown-esque macro markup.
In the Mendoza markdown, it looks like this:
## Previously
I used ...
@sticker-left[cendyne/hold-this]{
For example, adding dialog with stickers on the side.
}
Which calls these Janet functions:
(defn sticker [name]
  (def sticker-parts (string/split "/" name))
  (def character (get sticker-parts 0))
  (def sticker-name (get sticker-parts 1))
  (def sticker-image {:tag "img"
                      "src" (string "https://s.cdyn.dev/S/128/" character "/" sticker-name)
                      "alt" sticker-name
                      "class" "sticker"})
  {:tag "div" "class" "sticker-container" :content sticker-image})

(defn sticker-left [name content]
  {:tag "div"
   "class" "sticker-left"
   :content [
     (sticker name)
     {:tag "div" "class" "im-message" :content {:tag "div" "class" "im-message-left" :content content}}
   ]
  })
The final expression, which is the return value, is a dictionary that represents an HTML element. This dictionary can have mixed types for keys and values. Any key that is strictly a string type will become an HTML attribute, while keywords like :tag instruct the render function which HTML tag to use.
Early Friction
Mendoza enabled my own consistent site-wide style. When I needed to rewrite the HTML for different CSS styling, I did not have to rewrite or edit any of the prior documents.
This level of customization in combination with my writing style — using lots of stickers and other media along the way — came at a significant cost.
Unmaintained since adoption
Since I started writing in 2021, Mendoza has not had much attention beyond keeping it functional with the latest compiler. Sure, there are no regressions and it does what it is intended for: it builds Janet's documentation website. Beyond that, it shows few signs of polish and comes with plenty of rough edges.
I think I'm the only one out there using Mendoza to write content nearly every month.
In my first year of publication, 2021, the first significant friction was how it handled static content. Every time I saved an edit, it would copy a few hundred files from the static folder into the build folder. Some of that was self-inflicted. For every sticker and image on this site, I had at least a jpeg, webm, avif, and jpeg-xl format encoded. At first I used ImageMagick. Later, I used Squoosh instead to reliably convert images.
Back then, I was less familiar with Mendoza's internals. In 2022, I did end up forking it. I'll get to that. See section "A final experiment."
The second point of friction was building on more than one computer. It turns out that keeping different versions of Janet on different architectures, along with system dependencies like ImageMagick, in the right environment all at once is painful to synchronize. I'll also cover how I solved this. See section "Dev Containers."
The third point of friction is that a markdown parsing error stops the entire site build with an error. There is no graceful degradation. Common issues include quotes around curly braces requiring escapes, and parentheses requiring escapes after macros. There is also the fact that Mendoza "markdown" only extends as far as headings. Other styling, such as italics and bold, requires macros. Oh! And if I start a paragraph with a styling macro, like italics, the rest of the paragraph never ends up in a paragraph HTML tag. At times I have to manually say: yes, I intend to put a paragraph here. In short, there are several problems with the format I use to author.
Parsing content content/posts/2023-07-10-a-precious-side-project.mdz as mendoza markup
error: parser has unchecked error, cannot consume
in parser/eof [src/core/parse.c] on line 935
in capture-value [/usr/local/lib/janet/mendoza/markup.janet] on line 26, column 3
in peg/match [src/core/peg.c] on line 1694
in markup [/usr/local/lib/janet/mendoza/markup.janet] on line 150, column 16
in <anonymous> [/usr/local/lib/janet/mendoza/markup.janet] on line 176, column 51
in <anonymous> [/usr/local/lib/janet/mendoza/markup.janet] on line 175, column 41
in require-1 [boot.janet] (tailcall) on line 2963, column 18
in read-pages [/usr/local/lib/janet/mendoza/init.janet] on line 87, column 25
in read-pages [/usr/local/lib/janet/mendoza/init.janet] on line 84, column 20
in read-pages [/usr/local/lib/janet/mendoza/init.janet] on line 84, column 20
in load-pages [/usr/local/lib/janet/mendoza/init.janet] on line 91, column 3
in build [/usr/local/lib/janet/mendoza/init.janet] (tailcall) on line 142, column 14
Mitigations
Along the way, I've gotten enough patches in place to make authoring at this scale tolerable.
Dev Containers
Around the time I needed to switch my authoring computer, I looked for alternatives. At first, I tried to set up a remote VSCode session in a FreeBSD jail on my NAS, since it seemed like a stable place to do it. However, FreeBSD is not really supported.
Later, a friend mentioned that VSCode Dev Containers are a thing. I looked into it, tried it out, and whoa! My Docker-at-work skills came in very handy and I have been a fan of dev containers since.
I still use VSCode Dev Containers to author with Mendoza and likely will until the end of this year.
Extracting stickers
One of my first projects on Cloudflare workers was my stickers service. After all, my sticker collection was growing past one hundred source images and having so many alternative forms slowed the static build process down to half a minute.
I spent a month or two learning Rust and practicing with an image library there to resize things on the fly. It was unfulfilling work and ultimately, in WebAssembly, it did not have the speed I desired.
Then, I found out that Cloudflare has image resizing and image encoding for workers. It is unfortunately a paid feature, only available on the professional plan, which is per domain rather than per user. This functionality also does not work in local development, which makes testing a pain.
My sticker service now derives scaled images in a requested format on the fly and caches them in Cloudflare KV. I upload the stickers with Insomnia (a program much like Postman) and it handles the rest, including serving with content negotiation.
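For the curious, here is a rough sketch of that flow as a Workers-style TypeScript handler. The STICKERS binding name, the URL shape, and the origin constant are made up for illustration; only the cache-then-derive idea reflects what the service actually does.

export interface Env {
  STICKERS: KVNamespace;
}

// Hypothetical origin that holds the original sticker uploads.
const ORIGIN = "https://stickers-origin.example.com";

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Expected path shape: /S/<size>/<character>/<name>
    const [, , size, character, name] = new URL(request.url).pathname.split("/");

    // Content negotiation: prefer AVIF, then WebP, otherwise keep the source format.
    const accept = request.headers.get("Accept") ?? "";
    const format = accept.includes("image/avif") ? "avif"
      : accept.includes("image/webp") ? "webp" : undefined;

    const cacheKey = `${character}/${name}/${size}/${format ?? "original"}`;
    const cached = await env.STICKERS.getWithMetadata<{ contentType: string }>(cacheKey, "arrayBuffer");
    if (cached.value) {
      return new Response(cached.value, {
        headers: { "Content-Type": cached.metadata?.contentType ?? "image/png", "Vary": "Accept" },
      });
    }

    // Derive the scaled image with Cloudflare Image Resizing (a paid feature
    // that also does not run locally), then remember it in KV.
    const image: Record<string, unknown> = { width: Number(size) };
    if (format) image.format = format;
    const derived = await fetch(`${ORIGIN}/${character}/${name}.png`, { cf: { image } } as RequestInit);
    const bytes = await derived.arrayBuffer();
    const contentType = derived.headers.get("Content-Type") ?? "image/png";
    await env.STICKERS.put(cacheKey, bytes, { metadata: { contentType } });

    return new Response(bytes, { headers: { "Content-Type": contentType, "Vary": "Accept" } });
  },
};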
Also, before I had to look in the folder to see which stickers I wanted to use. Now, I have a "sound board" where I can click on any sticker I want and it sets my clipboard to paste the embed code for that sticker.
With this, my static build times were cut to a fifth! It no longer copies hundreds of files every time I update a markdown file!
Extracting media
As of August 2022, my articles include Tweets in a privacy-respecting way. Previously, Tweets were mere screenshots. At no point, past or future, are there cross-origin requests to Twitter, YouTube, or others without your consent as a reader.
A problem. Remember how the build time shrank after extracting all the stickers? In a few months of writing, Tweets and other media would undo that performance gain.
To prevent a repeat of degraded authorship-experience (like user-experience, but writing!), I opted for a similar but separate solution for media content. After all, photos and videos are far larger in size than 512x512 .pngs.
While the edit and refresh delay was reduced by extracting stickers, it was still too long.
Not only did I want to serve more images directly, I wanted to serve more videos too. This compounded with my desire to add socially sourced content. Tweets, Toots, and YouTube videos come with several files: profile pictures, several attached images and even custom emojis, video previews (also called posters), and videos or "gifs" as .mp4 and .webm files.
I would not add social media embeds until I could handle the big file problem: videos.
And I really wanted to have videos and "gifs." Web-scaled images are within KV's limits. Videos go beyond KV's limits (25 megabytes). I could not reuse the KV approach for videos, and I would rather use the same storage backend for arbitrary binary content.
Unlike the Cloudflare workers image API used for the stickers service, the closest video equivalent has a high price tag.
I needed a storage solution that supports more than a few megabytes of binary content with a low cost to store and to serve.
Cloudflare had just released their Amazon Simple Storage Service (S3) competitor: Cloudflare R2 Object Storage.
Now, videos and images comfortably sit in R2 and I do not have to worry about Tweets, Toots, or YouTube content being too large for KV! And, in fact, R2 supports content range requests which is essential to serving videos on the web.
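To sketch why range support matters: a video element will ask for "Range: bytes=..." and the origin has to answer with 206 Partial Content, which R2 makes straightforward. The MEDIA binding name and the path-as-key scheme are assumptions for this example.

export interface Env {
  MEDIA: R2Bucket;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const key = new URL(request.url).pathname.slice(1); // e.g. "KWCA/<hmac tag>"
    const match = request.headers.get("Range")?.match(/bytes=(\d+)-(\d*)/);

    if (!match) {
      // No range requested: stream the whole object.
      const object = await env.MEDIA.get(key);
      if (!object) return new Response("not found", { status: 404 });
      return new Response(object.body, {
        headers: { "Accept-Ranges": "bytes", "Content-Length": `${object.size}` },
      });
    }

    const offset = Number(match[1]);
    const end = match[2] ? Number(match[2]) : undefined;
    const range = end !== undefined ? { offset, length: end - offset + 1 } : { offset };
    const object = await env.MEDIA.get(key, { range });
    if (!object) return new Response("not found", { status: 404 });

    const last = end ?? object.size - 1;
    return new Response(object.body, {
      status: 206,
      headers: {
        "Accept-Ranges": "bytes",
        "Content-Range": `bytes ${offset}-${last}/${object.size}`,
        "Content-Length": `${last - offset + 1}`,
      },
    });
  },
};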
However, R2 is not the only acronym service involved. Before content goes into R2, an immutable key is generated by HMAC-tagging (archived) the content with a namespace key.
Resuming from the tangent above... I mentioned that the arbitrary binary content is stored on R2.
For example, take the soundboard image above. The original source image is stored in R2 at the path KWCA/PYB9wAPGQgmRCY0iCa6lslvnQhZJZavrUxeUliwdw38. Don't worry, you do not have to make much sense of it. I'll simplify it. The path looks like: <site id>/<HMAC(site key, bytes)>. This is referenced by the entity with an id jx3EsTkU, which happens to involve another HMAC. And finally, the URL you called above references the entity identified by sZnpIRps. Guess what? Another HMAC is involved! And then any runtime specific parameters, such as resize to 645 wide, are canonicalized and HMAC'd again to find derived content.
So in short, sZnpIRps, the content URL you hit above, is found and combined with any runtime parameters to find an existing derived image. If one is not found, then it finds the file in the source entity, identified by jx3EsTkU, and uses the image API to transform the original asset stored on R2 at KWCA/PYB9wAPGQgmRCY0iCa6lslvnQhZJZavrUxeUliwdw38. It then saves the result back into R2 at some other path like KWCA/<another base64url HMAC tag here>, and serves that result for all similar requests in the future.
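Here is a simplified sketch of that derivation chain with the Web Crypto API. The real service has more layers (the entity ids above involve their own HMACs), and the helper names and parameter canonicalization below are purely illustrative.

const encoder = new TextEncoder();

function base64url(bytes: ArrayBuffer): string {
  return btoa(String.fromCharCode(...new Uint8Array(bytes)))
    .replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

async function hmacTag(secret: string, data: Uint8Array): Promise<string> {
  const key = await crypto.subtle.importKey(
    "raw", encoder.encode(secret), { name: "HMAC", hash: "SHA-256" }, false, ["sign"]);
  return base64url(await crypto.subtle.sign("HMAC", key, data));
}

// Immutable key for an original upload: <site id>/<HMAC(site key, bytes)>.
async function sourceKey(siteId: string, siteKey: string, bytes: Uint8Array): Promise<string> {
  return `${siteId}/${await hmacTag(siteKey, bytes)}`;
}

// Key for a derived variant, e.g. the same image resized to 645 wide.
async function derivedKey(siteId: string, siteKey: string, sourceTag: string,
                          params: Record<string, string | number>): Promise<string> {
  const canonical = Object.entries(params)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([k, v]) => `${k}=${v}`)
    .join("&");
  const tag = await hmacTag(siteKey, encoder.encode(`${sourceTag}?${canonical}`));
  return `${siteId}/${tag}`;
}

Because the tag is derived from the content itself, the same upload always lands at the same path, which is what makes the keys safely immutable.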
Once my worker was live, I could comfortably offload all my media to it and rely on upload scripts to handle it from there.
And indeed, my build times dropped to under ten seconds! I was only producing 80 or so HTML files.
What took ten seconds? All those Tweet embeds were stored as JSON and referenced during the build process. Each file, read one at a time, blocked the HTML rendering process.
Extracting social content
Remember how I want to preserve and respect the privacy of my readers? Well, it came at a cost to the build process. There were over a hundred JSON files that functioned like a database for all the remote social content I had collected and where their media was stored.
{
  "name": "Jake Williams",
  "username": "MalwareJake",
  "timestamp": 1687005507,
  "text": "Totally normal, definitely not a bubble, investment cycle.\nhttps://fortune.com/2023/06/14/mistral-ai-startup-record-113-million-seed-round-arthur-mensch/",
  "photos": [
    {
      "url": "https://c.cdyn.dev/x6zxHIcY",
      "width": 1080,
      "height": 1016,
      "blurhash": "LEQmCrD%_3xu9Fs:%MRj~qoLM{xu"
    }
  ],
  "videos": [],
  "icon": "https://c.cdyn.dev/vscF0lY1",
  "iso8601": "2023-06-17T07:38:27-0500",
  "banner": "https://c.cdyn.dev/h_3_s8On",
  "banner_blurhash": "EGF~,QIA9ERh-=ofISt7_4WC9Eof",
  "color": "#10A3FF"
}
Which becomes...
I needed something I could use to self-serve and find the Tweets, Toots, YouTube videos, and so on that I gather for later embedding. It turns out, KV is terrible if you want to paginate content! Following the same pattern of using yet another Cloudflare technology, I tried out the D1 open alpha. It is just distributed-ish SQLite, and SQLite is great at paginating results.
There is a database size limit. I am nowhere near that small 100MB ceiling.
Anyway, for local development, Mendoza now creates an iframe to my social archive and some JavaScript will automatically resize the content vertically so it appears seamless while editing. Then, during the build process on GitHub, I find all iframes and inline each iframe's content. By the time it gets to you, dear reader, there are no iframes and it is seamlessly integrated with the article.
This social archive service is just a pleasant frontend for that data and it converts it into formats I desire, from HTML, to simplified HTML for RSS content, and now my document IR. Yes, I'll get to that soon! See section "Today, with Intermediate Representation."
Vendor Lock-in
Mitigation results
Now, finally, finally, my site builds in under a second.
The less time I spend waiting on it, the faster I can judge my writing in its near-published format, fix any site-breaking syntax errors, and get back to writing more. After all, the harder it is to write, the less I will write.
Dynamic content
I want my content to go live at a specific time without having to run a command on my machine beforehand.
I would also like my content to be viewable before its publish time for proofreading. This is impossible to do with a purely static website.
Pivoting the origin
At first I considered trying OpenResty again. However, the last time I wrote something for it, I did not find it fun to maintain. It is reliable, though, if you want to do something for yourself.
Instead, I opted for Workers Sites, which is separate from Workers Pages. Sites generates a manifest file that is statically imported into the build process. The manifest specifies which pathnames go to which immutable files in KV. During deployment, it also compares the new manifest with KV and removes unlinked files. I say immutable in that the filenames are appended with a truncated hash. Instead of changing files in place, files are either created anew or garbage collected.
Here's a sample from a silly site cendyne.gay:
{
  "main.css": "main.6da346e39b.css",
  "pride-background.png": "pride-background.174ca9ae29.png",
  "sign.png": "sign.de9888ffc3.png"
}
The manifest is a dictionary where the keys are the pathnames (without the leading /) and the values are the keys in KV where the content is stored.
wrangler kv:key --namespace-id db7b05b5778b494d9e356aaa21facefd list
[
  {
    "name": "main.6da346e39b.css"
  },
  {
    "name": "pride-background.174ca9ae29.png"
  },
  {
    "name": "sign.de9888ffc3.png"
  }
]
wrangler kv:key --namespace-id db7b05b5778b494d9e356aaa21facefd get "main.6da346e39b.css"
html, body, div {
  margin: 0;
  padding: 0;
  border: 0;
  font-size: 100%;
  font: inherit;
  vertical-align: baseline;
}
...
As you can see, those files exist as values in KV.
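To make that concrete, resolving a request through the manifest in a worker looks roughly like this. In a real Workers Sites project, @cloudflare/kv-asset-handler does this for you; wrangler injects the __STATIC_CONTENT_MANIFEST module and the __STATIC_CONTENT KV binding used below, and details like Content-Type handling are skipped here.

import manifestJSON from "__STATIC_CONTENT_MANIFEST";

const manifest: Record<string, string> = JSON.parse(manifestJSON);

export interface Env {
  __STATIC_CONTENT: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // "/main.css" -> "main.css" -> "main.6da346e39b.css"
    const pathname = new URL(request.url).pathname.replace(/^\//, "") || "index.html";
    const kvKey = manifest[pathname];
    if (!kvKey) return new Response("not found", { status: 404 });

    const body = await env.__STATIC_CONTENT.get(kvKey, "stream");
    if (!body) return new Response("not found", { status: 404 });

    // The hashed key never changes for a given file, so downstream caches
    // can hold onto it aggressively.
    return new Response(body, {
      headers: { "Cache-Control": "public, max-age=31536000, immutable" },
    });
  },
};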
This origin change is not necessary to do what comes next.
Rewriting content
Here's where things get interesting. I noticed that Cloudflare also has a neat streaming HTML transformer library.
By having additional attributes, such as data-publish-date="2022-09-11T16:14:04Z", I could choose to delete the element entirely or alter it in some way. I altered my Posts page index to have the publish date embedded on each list item.
Then, at request time, the worker would transform the HTML with all the posts to show only the published posts, by comparing to the current time and deleting elements that are in the future. This also meant that future publications could be uploaded as is and be otherwise unlinked to the public eye!
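With Cloudflare's streaming HTML transformer, HTMLRewriter, that filter only takes a few lines. A minimal sketch, assuming the Posts index marks each list item with data-publish-date as described above:

function filterUnpublished(response: Response): Response {
  const now = Date.now();
  return new HTMLRewriter()
    .on("li[data-publish-date]", {
      element(element) {
        const publishDate = element.getAttribute("data-publish-date");
        if (publishDate && Date.parse(publishDate) > now) {
          // The post exists in the static build, but it is not public yet.
          element.remove();
        }
      },
    })
    .transform(response);
}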
This change makes the site a hybrid of static and dynamic, where the only database is the origin where the static HTML is stored. No "Hug of Death" will take down this site now! There is no single point of failure to crumple under the weight of 🍊's front page.
Adding Really Simple Syndication
The solution I used requires a bit of adjacent knowledge. Here's how Mendoza allowed me to create an index page.
Accessing the site map
There was a trick to making the Posts page index. The markdown for the post page is empty! Instead, the list of links is created in the template, which is processed after all the pages are parsed and structured in memory.
<div class="content">
  {{ content }}
  <ul>
    {{ (seq [post :in (posts)] (render-toc post)) }}
  </ul>
</div>
There are two functions to make note of above; a simplified version of each is included below.
- The posts function, which searches the sitemap for all posts
- The render-toc function, which generates almost-HTML for every post found
(defn posts []
  (sort
    (((dyn :sitemap) "posts") :pages)
    (fn [a b] (> (a :pub-date) (b :pub-date)))
  ))

(defn render-toc [node]
  {:tag "li"
   "data-publish-date" (iso8601 (node :pub-date))
   :content [{
     :tag "span"
     :content [
       (node :date)
       " "
       {:tag "a" "href" (node :url) :content [(node :title)]}
     ]
   }]
  })
The template processing phase has access to the entire website, where each entry is keyed by its destination file path and the value is a "document" with a title and any other metadata added in the Janet header. Literally a "site map."
In the same dictionary, labeled node above, is the document's content, accessed with the key :content, whose value is a list of almost-HTML dictionaries.
Transforming the site map to an RSS feed
Remember how the sticker HTML is made? The return value was a map or dictionary with a :tag key and several string attributes, as well as a :content key. The same structures are inside the document's :content!
Which means that, like the Posts index page, I can access the site map and process the almost-HTML content of each page.
However, I had to make some tweaks. For example, in RSS you do not want to use relative URLs (archived). To solve this, I wrote a visitor which descends into every node, element, or dictionary it observes, finds matching links, images, and sources for images and videos, and rewrites each URL before it is rendered as HTML in the RSS.
Several of my HTML structures were made for a browser-centric experience which permit styling and scripts. HTML that goes into RSS is limited - more limited than what you can send over email.
And so, I figured out that I could set other keyword keys on my output almost-HTML and my RSS transformer would detect and rewrite the content it was tied to.
(defn sticker-left [name content]
  (def sticker-parts (string/split "/" name))
  (def character (get sticker-parts 0))
  (def sticker-name (get sticker-parts 1))
  {:tag "div"
   "class" "sticker-left"
   :meta-node :sticker-left
   :meta-data {
     :character character
     :name sticker-name
     :content content
   }
   :content [
     (sticker name)
     {:tag "div" "class" "im-message" :content {:tag "div" "class" "im-message-left" :content content}}
   ]
  })
Observe the :meta-node and :meta-data values.
(defn- post-process-sticker-message [node]
  (def content (get-in node [:meta-data :content]))
  (def character (get-in node [:meta-data :character]))
  (def sticker-name (get-in node [:meta-data :name]))
  {:tag "table" "border" "0" :content [
    {:tag "tr" :content [
      {:tag "td" "width" "128" :content {:tag "img"
                                         "src" (string "https://s.cdyn.dev/S/128/" character "/" sticker-name)
                                         "alt" sticker-name
                                         :no-close true}}
      {:tag "td" :content (post-process content)}
    ]}
  ]})
(varfn post-process [node]
  (cond
    (bytes? node) node
    (indexed? node) (map post-process node)
    (dictionary? node) (if (node :tag)
      (do
        (cond
          (= :sticker-left (node :meta-node)) (post-process-sticker-message node)
          (= :sticker-right (node :meta-node)) (post-process-sticker-message node)
          true (post-process-tag node)
        ))
      node)
    true node))
And now observe that this visitor descends and recognizes any almost-HTML dictionaries which look like stickers. Upon matching, it then rewrites the content into a table, rather than relying on external CSS.
In October, I created an issue to track my desire to create a path out of this unmaintained static site generator.
A final experiment
Many months passed and eventually I had the energy and focus to transition my content towards something I could build from.
I was not going to rewrite all of my content by hand in another markup, markdown, or whatever other format. I needed to prove that I could structurally transform and interpret the existing content.
In good engineering practice: I first performed a feasibility test. Rather than convert to another form of HTML, I reduced the scope to emitting a plain-text file. It took a few evenings to make. The results were flimsy, yet functional.
This experiment revealed all the things I needed to annotate with metadata, from quotes and embedded content to a home-grown shared glossary I added a year back.
For example, I referenced PRF above and reference IR below. I can also optionally include a glossary section in the article with a definition I wrote elsewhere.
- Pseudo Random Function PRF
- A Pseudo Random Function or PRF is a keyed function that produces uniformly random data. It takes an input and reliably produces the same output with a fixed length. It sounds a lot like a hash function, and often is made with a hash function with additional mechanics around it.
- Intermediate Representation IR
- A structure or union of structures that represent some source code. It can be examined and manipulated with further processing. Compilers use Intermediate Representations (IRs) to optimize code. Some IR structures are only available to the compiler as processing infers additional contextual information.
I had to extend my fork of Mendoza to add more functionality after it rendered content to an HTML file, such that it would save a .txt version too. After all, it was only designed to generate Janet documentation. There was no concept of multiple artifacts per source file.
A good experiment brings greater knowledge on the way forward. This delivered on that. A great experiment provides something useful that enables future work. This also delivered on that.
Today, with Intermediate Representation
Following the .txt output above, my next goal was to make a JSON output.
Ultimately I need a serializable intermediate representation (IR).
JSON is a fantastic format to serialize. It is easy to define schemas, either through JSON Schema or TypeScript type declarations. It also crosses the language barrier!
In contrast, extensible data notation (EDN) and its Janet equivalent, Janet data notation (JDN), are neat and accessible from Janet. However, unless I want to write a .jdn parser in TypeScript, I'd be locked into processing inside the Janet language. That would not move me forward, when most of my new code is in TypeScript.
Generating an Intermediate Representation
With JSON as the IR, I had a target to export meta-data and content to. In general, any dictionary with a :meta-node key will be directly emitted as an IR structure, with its child content also transformed. That leaves any remaining HTML, like blockquote, hr, h2, and so on. These are also transformed into IR nodes.
A select few HTML tags are supported; the rest emit a warning and are ignored.
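To give a feel for the shape of the IR, here is an abridged set of TypeScript declarations covering only the node types that appear in the JSON below. The real schema has many more node types; treat this as illustrative.

type IrNode =
  | { type: "document"; content: IrNode[] }
  | { type: "array"; content: IrNode[] }
  | { type: "paragraph"; content: IrNode[] }
  | { type: "text"; text: string }
  | { type: "tweet"; id: string }
  | {
      type: "definition-reference";
      content: IrNode[];
      definition: { key: string; abbreviation: IrNode[] };
    };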
Now that a JSON friendly IR is ready, let's write it out to a file!
{
  "type": "document",
  "content": [
    // ...
    {
      "content": [
        {
          "text": "\u000A",
          "type": "text"
        },
        {
          "id": "1670048257873960964",
          "type": "tweet"
        },
        {
          "text": "\u000A",
          "type": "text"
        }
      ],
      "type": "array"
    },
    // ...
    {
      "type": "paragraph",
      "content": [
        {
          "text": "Now that a JSON friendly ",
          "type": "text"
        },
        {
          "content": [
            {
              "text": "IR",
              "type": "text"
            }
          ],
          "definition": {
            "abbreviation": [
              {
                "text": "IR",
                "type": "text"
              }
            ],
            "key": "definition-ir"
          },
          "type": "definition-reference"
        },
        {
          "text": " is ready, let's write it out to a file!\u000A",
          "type": "text"
        }
      ]
    },
    {
      "type": "text",
      "text": "\u000A\u000A"
    },
    // ...
  ]
}
This odd and unnecessary whitespace is an artifact of the authoring document. Every sentence I write is its own line, and so every line ends up with its own "type": "text" node ending in "\u000A".
To solve the whitespace issue, as well as that unnecessary "type": "array" node, I wrote WhitespaceTransformer.ts and ArrayCollapseTransformer.ts. As for the Tweet, that was yet another transformer, which does two passes. The first pass finds all social content to fetch and fetches it in bulk from the social archive; the second pass replaces each matched social node with the corresponding document-ir which the social archive produces.
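Those transformer files are not reproduced here, but a whitespace pass over a tree like this is straightforward. A hedged sketch of the idea, with the node shapes simplified:

interface TextNode { type: "text"; text: string }
interface ParentNode { type: string; content?: IrNode[] }
type IrNode = TextNode | ParentNode;

function isText(node: IrNode): node is TextNode {
  return node.type === "text";
}

// Fold line breaks into spaces, merge adjacent text nodes, and drop
// whitespace-only nodes left over from the line-per-sentence authoring style.
function collapseWhitespace(node: IrNode): IrNode {
  if (isText(node)) {
    return { type: "text", text: node.text.replace(/\s+/g, " ") };
  }
  if (!node.content) return node;
  const content: IrNode[] = [];
  for (const child of node.content.map(collapseWhitespace)) {
    const previous = content[content.length - 1];
    if (isText(child) && previous && isText(previous)) {
      content[content.length - 1] = { type: "text", text: previous.text + child.text };
    } else if (!(isText(child) && child.text.trim() === "")) {
      content.push(child);
    }
  }
  return { ...node, content };
}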
After post processing, it now looks like:
{
  "type": "document",
  "content": [
    // ...
    {
      "type": "card",
      "header": {
        "type": "card-header",
        "title": [
          {
            "type": "text",
            "text": "Gergely Orosz"
          }
        ],
        "username": "GergelyOrosz",
        "usernameDomain": "twitter.com",
        "url": "https://twitter.com/GergelyOrosz",
        "imageUrl": "https://c.cdyn.dev/Wg-ozM6I"
      },
      "attribution": {
        "type": "card-attribution",
        "date": "2023-06-21T15:57:57.000Z",
        "url": "https://twitter.com/GergelyOrosz/status/1671548015948046343"
      },
      "content": {
        "type": "card-content",
        "content": [
          {
            "type": "text",
            "text": "Google Domains had been the world's 3rd most popular registrar, by monthly domain registrations 2021 to now. ("
          },
          {
            "type": "link",
            "content": [
              {
                "type": "text",
                "text": "#1"
              }
            ],
            "url": "https://twitter.com/hashtag/1"
          },
          {
            "type": "text",
            "text": " is, of course, GoDaddy by a huge margin. "
          },
          {
            "type": "link",
            "content": [
              {
                "type": "text",
                "text": "#2"
              }
            ],
            "url": "https://twitter.com/hashtag/2"
          },
          {
            "type": "text",
            "text": " is NameCheap: both companies' bread-and-butter is domains) Congrats to the Google Domains team for pulling this off."
          }
        ]
      },
      "original": {
        "type": "tweet",
        "id": "1671548015948046343"
      }
    },
    // ...
    {
      "type": "paragraph",
      "content": [
        {
          "type": "text",
          "text": "Now that a JSON friendly "
        },
        {
          "type": "definition-reference",
          "definition": {
            "abbreviation": [
              {
                "type": "text",
                "text": "IR"
              }
            ],
            "key": "definition-ir"
          },
          "content": [
            {
              "type": "text",
              "text": "IR"
            }
          ]
        },
        {
          "type": "text",
          "text": " is ready, let's write it out to a file!"
        }
      ]
    },
    // ...
  ]
}
This post processing occurs in GitHub Actions after the site builds with Mendoza. Not only will this document-ir be useful to generate HTML, it will be quite usable to make RSS and plain text content too!
Using the Intermediate Representation
While you can view the JSON representation for any post I have now, for example: A Precious Side Project — This Website (json), this is only an interesting tech demonstration rather than something of value to you as a reader.
What really brings value to you, dear reader, is providing an artifact tailored to your experience. Be it with a modern browser, RSS, or even Internet Explorer 6.
While I can support silly use cases like Internet Explorer 6 in 2023 now, there were several things I needed to deliver before I could swap the HTML re-written content with Document IR generated content. Namely:
- Emit near equivalent HTML for fully featured HTML pages.
- Replace the RSS Feed.
- Replace index pages like Posts and Topics.
- Replace the sitemap.xml.
- Reimplement the extra features like HTTP Referer messaging, and more.
A new site map
Before I can do anything more, I really need my own manifest for which document JSON files map to which paths. There is a manifest for Workers Sites. However, it does not identify which files are specifically document-ir files.
As part of my document-ir post processing action, I now generate an enhanced sitemap JSON file.
It looks something like this:
{
  "version": "b1792837db9611dd5cda46cec1bc858c86204808",
  "branch": "main",
  "collectiveDigest": "5Xb4BXzPzgzwf7Xum2jp2xVpM2ZCFTxYwvUXroDsRKc",
  "date": "2023-07-04T06:22:24.774Z",
  "documents": {
    "/index.html": {
      "source": "/index.json",
      "title": "Hello, I'm Cendyne",
      "url": "/index.html",
      "contentDigest": "suImAKzRXfuFxGNs97ySnMRNiUVXDUbsvhbQJrE-ks0",
      "description": "I write about security, software architecture, management, and applied cryptography."
    },
    "/posts/2021-04-11-website.html": {
      "source": "/posts/2021-04-11-website.json",
      "title": "A New Year, A New Website",
      "url": "/posts/2021-04-11-website.html",
      "pub-date": 1618174005,
      "pubDate": "2021-04-11T20:46:45.000Z",
      "lastMod": "2021-04-11T20:46:45.000Z",
      "contentDigest": "eVQ7_KoeojURovRlhvgo6Luj8WlvGA2LHVZa7-jDSe0",
      "date": "2021-04-11",
      "description": "The new website for the year, a reflection on 2020 and now 2021",
      "guid": "eb24d936-9b85-4f02-baa2-24147e3d242a",
      "image": "https://c.cdyn.dev/pr45JDDH"
    },
    // ...
  }
}
In a similar manner to how wrangler injects a JSON file at deploy time, I generate this file before deployment and import it. Additional information is embedded, such as the Git commit hash, Git branch, and build date. Every page also tracks its own canonicalized JSON digest for tracking over time to see if it has been modified.
The enhanced sitemap is then used to form the response when a request GETs /sitemap.xml. Of course, content that has not yet been published will be omitted from the response.
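A condensed sketch of that handler, using the field names from the JSON above; the import path and the XML layout are simplified.

import sitemap from "./enhanced-sitemap.json"; // hypothetical path

interface SitemapEntry {
  url: string;
  pubDate?: string;
  lastMod?: string;
}

export function renderSitemapXml(origin: string, now = new Date()): Response {
  const entries = Object.values(sitemap.documents as Record<string, SitemapEntry>)
    // Omit anything scheduled for the future.
    .filter((doc) => !doc.pubDate || new Date(doc.pubDate) <= now)
    .map((doc) => [
      "  <url>",
      `    <loc>${origin}${doc.url}</loc>`,
      doc.lastMod ? `    <lastmod>${doc.lastMod}</lastmod>` : "",
      "  </url>",
    ].filter(Boolean).join("\n"))
    .join("\n");

  const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${entries}
</urlset>`;

  return new Response(xml, { headers: { "Content-Type": "application/xml" } });
}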
Near HTML Equivalent pages
When you visit a .html link, my worker now checks the enhanced sitemap for an entry. The source is then used to pull the document-ir from the Workers Sites KV binding.
Some further post processing happens and I sprinkle the HTTP Referer message into the final document, ready to be rendered. Next, it goes into a JSX tag which recursively renders document-ir nodes into HTML.
Any ephemeral global state added by the JSX nodes is then collected and reflected on the response, such as any Content Security Policy headers that need to be accounted for.
Lastly, the response is sent to you to read, learn, and enjoy.
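The real renderer is a JSX component tree with far more node types, but the recursive idea boils down to something like this sketch; the node shapes and the escaping are simplified assumptions.

interface RenderNode {
  type: string;
  text?: string;
  level?: number;
  content?: RenderNode[];
}

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function render(node: RenderNode): string {
  const children = () => (node.content ?? []).map(render).join("");
  switch (node.type) {
    case "text":
      return escapeHtml(node.text ?? "");
    case "paragraph":
      return `<p>${children()}</p>`;
    case "header":
      return `<h${node.level ?? 2}>${children()}</h${node.level ?? 2}>`;
    default:
      // Unknown nodes fall back to rendering their children.
      return children();
  }
}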
Index pages
Posts and Topics are actually a breeze at this point. I can read the enhanced sitemap and emit a document-ir document! Rather than post-processing the HTML like before, the generated document has posts pre-filtered and it goes through the same rendering process as any normal article I have!
RSS Feed
RSS is tricky! Not only do links have to be rewritten to absolute URLs, I also cannot faithfully deliver the styling that makes my online brand possible. Many node types can pass through just fine, like paragraphs, italics, blockquotes, and such. Specialty nodes like stickers, cards, and highly technical sections need an alternate output, as they depend on styling and scripts that feed readers strip away.
While certainly not so clean, the way I got around this was by setting some ephemeral global state with a lite flag. When the <Sticker ...>...</Sticker> JSX node is executed, it checks the lite flag and provides an alternative HTML table based output.
This lite flag also benefits silly use cases like displaying content to Internet Explorer 6.
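The real <Sticker> is a JSX component, but the branch it takes looks roughly like this, reusing the class names and table layout from the Janet versions earlier; the render context shape is an assumption.

interface RenderContext {
  lite: boolean; // set for RSS and other limited outputs
}

function stickerHtml(ctx: RenderContext, character: string, name: string, message: string): string {
  const src = `https://s.cdyn.dev/S/128/${character}/${name}`;
  if (ctx.lite) {
    // Feed readers and old browsers get a table instead of CSS-driven layout.
    return `<table border="0"><tr>` +
      `<td width="128"><img src="${src}" alt="${name}"></td>` +
      `<td>${message}</td>` +
      `</tr></table>`;
  }
  return `<div class="sticker-left">` +
    `<div class="sticker-container"><img class="sticker" src="${src}" alt="${name}"></div>` +
    `<div class="im-message"><div class="im-message-left">${message}</div></div>` +
    `</div>`;
}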
Then, the newest four posts are fully embedded with this rendered HTML, while the next six are stubs that link to this website.
A regression: highlighted code
One concern I had going in was how document-ir did not support highlighted code. In a way, that is by design! The IR should be as simple as possible for processing. Enhancements can always be added later.
And indeed, after I went live, the next task was to create a highlighter.
This opened up more options, actually! See, I only have a few built-in languages in Mendoza to work with. The rest are custom ones that I brought in. And Mendoza's language parsers often crash the build, which is a pain to deal with.
There's a popular JavaScript project highlight.js which I now use to highlight my code at the edge. The good thing is: I can add my own languages to this too! And it really is not too complicated.
However, instead of directly embedding this functionality into my blog source, it is extracted to another service. A Service Binding is used to couple my cendyne.dev worker with a worker dedicated only to syntax highlighting.
Service bindings are also asynchronous, though; I cannot invoke a service binding while rendering JSX. Therefore, I used a two-pass approach, much like the social media embedding described above. The first pass collects all formatted code with a language tag, then asynchronously requests and caches the highlighted results. The second pass renders synchronously and uses the cache to inline HTML when a successful response exists.
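A sketch of that two-pass flow, assuming a service binding named HIGHLIGHTER and a made-up request contract for the highlighting worker; only the Fetcher interface itself comes from Workers.

export interface Env {
  HIGHLIGHTER: Fetcher;
}

interface CodeNode { type: "code"; language?: string; text: string }

// Pass 1: collect every code node that declares a language and highlight it.
export async function buildHighlightCache(env: Env, codeNodes: CodeNode[]): Promise<Map<CodeNode, string>> {
  const cache = new Map<CodeNode, string>();
  await Promise.all(codeNodes
    .filter((node) => node.language)
    .map(async (node) => {
      const response = await env.HIGHLIGHTER.fetch("https://highlighter/", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ language: node.language, code: node.text }),
      });
      if (response.ok) cache.set(node, await response.text());
    }));
  return cache;
}

// Pass 2 is synchronous: the renderer looks the node up in the cache and
// inlines the highlighted HTML, falling back to escaped plain text.
export function renderCode(node: CodeNode, cache: Map<CodeNode, string>): string {
  const highlighted = cache.get(node);
  if (highlighted) return `<pre><code>${highlighted}</code></pre>`;
  const escaped = node.text.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  return `<pre><code>${escaped}</code></pre>`;
}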
A surprise benefit: Search Engine Optimization warnings
Now that document-ir has proved functional in GitHub Actions and at runtime, I went for one more useful thing, which I could not easily do before.
A holistic GitHub Check Run is recorded as part of a build job, "SEOChecks", that reads the same JSON files which go to staging and production.
I reviewed warnings and recommendations across several search engine portals. Among them were the usual: title is too short / too long, meta description is too short / too long, missing image alt attributes, too many h1 elements, and so on.
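The checks themselves are simple. A sketch of the kind of rules applied to each enhanced sitemap entry; the thresholds and the Warning shape are illustrative, and the real job reports results through a GitHub Check Run.

interface DocumentMeta {
  url: string;
  title: string;
  description?: string;
}

interface Warning { url: string; message: string }

export function checkDocument(doc: DocumentMeta): Warning[] {
  const warnings: Warning[] = [];
  if (doc.title.length < 20) {
    warnings.push({ url: doc.url, message: "title is too short" });
  } else if (doc.title.length > 70) {
    warnings.push({ url: doc.url, message: "title is too long" });
  }
  if (!doc.description) {
    warnings.push({ url: doc.url, message: "meta description is missing" });
  } else if (doc.description.length < 50) {
    warnings.push({ url: doc.url, message: "meta description is too short" });
  } else if (doc.description.length > 160) {
    warnings.push({ url: doc.url, message: "meta description is too long" });
  }
  return warnings;
}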
Future
You might be thinking: Wow! That was a lot of effort just to send HTML back to me that looks unchanged. What did that accomplish?
Switching to outputting and rendering document-ir unblocks the next long-term project: migrating to another authoring technology. I have pushed Mendoza as far as it will go with comfort. It is time I write in something else.
If I can transform document-ir into another authoring markup and back to document-ir without any changes, then I will have a way to port all of my existing content without loss. Ultimately, I want to enjoy writing and not need to solve obscure problems to deliver exactly what I want to you, the reader.
Plus, I could now do neat things like dynamically including other content, such as my Toots on the fediverse, on my website, with document-ir in between.
Maybe I'll do that some time.
Thank you for reading. This is the longest side project I have maintained yet. Despite all the grumbling I have for the issues I have overcome, this software does make me happy.
Small shout out
Check out Xe Iaso's My Blog is Hilariously Overengineered to the Point People Think it's a Static Site. Similarly, my site was static, and now it is nearing the same level of complexity as Xe's. Even Xe has a Content Delivery Network: XeDN.
My sticker commentary is in part inspired by Xe Iaso's writing style.
Camping
Also I wrote most of this while camping! One morning I woke up with a frog on my tent.
The camping diet has been mostly sausage and egg for a few days. I think I'm ready for something else.