2026 updates
Created: 1768509728 (2026-01-15T20:42:08Z), Updated: 1769956313 (2026-02-01T14:31:53Z), 5731 words, ~24 minutes
Tags: meta, rant, blog update
This post is part of series blog update: 2023, 2024, 2025, 2026
Yes, I know, it's only January, but I couldn't stay put, and fixed some of the worst problems on my site.
git(ea)#
There has been a gitea instance running at git.
Fast forward to last year, when I ran into stagit, a static site generator for git repositories. You give it a git repo, it generates a bunch of static HTML files, and it's done. No need to run a daemon or dynamically generate pages, just plain HTML files served by nginx (or whatever your HTTP server of choice is). I liked the idea, started thinking about this whole git server business, and frankly came to the conclusion that I don't really need a fancy software forge on my site. No one contributes code anyway. The few bug reports I got on gitea usually disappeared because it always broke mail sending. So maybe the static files generated by stagit are enough.
Well, two problems.
The manual states it's not suitable for large repositories, and it's not an overstatement.
The first time I tried running it, it started processing my QEMU fork, and after a couple (tens) of minutes it crashed because it ran out of disk space...
The blog post says "the cache (-c flag) or (-l maxlimit) is a workaround for this in some cases", but note the readme in the git mirror doesn't mention the -l flag for this problem, only -c.
And the man page states "Write a maximum number of commits to the log file". But all the -l flag does is not linking the commit diffs into the commit list page.
All the commit diffs files are still generated, they just become dangling files without any reference to them, thus pointlessly wasting disk space and generation time.
Seriously, what kind of deranged lunatic idiot do you have to be to think this feature makes any sense?!
What's the point?
It's worse than imperial units!
So in the end I ended up forking stagit to fix this problem, and be able to generate pages for the git repos I have.
Second, figuring out how to clone the repos.
The stagit author's site gives you a git:// URL.
Yes, the old unencrypted protocol which was an insane choice for anything besides local LAN even 10 years ago.
You have the SSH protocol, but it's generally only used for authenticated access.
So what remains is the HTTP protocol, which is actually two different protocols, the dumb and the smart one.
The dumb protocol is pretty simple, there's a script you have to run after each push (just set up a post-update hook, and it will be taken care of), but then you can just serve the git repo through a simple HTTP server.
Simple to set up on the server, but only do this if you hate your users with a passion.
The problem is, to clone a repo like this, the client first has to download the refs (a single file), then for each ref download the corresponding commit.
First it tries to download the commit object; if that's successful, from it it gets the tree and parent ids, which have to be downloaded as well.
The client practically has to make a tree walk, and it's not really parallelizable.
And this goes on until the client hits a 404, because it means the object is in a pack.
So at this point the client has to download the indexes of every pack file (which sometimes end up comparable in size to, if not bigger than, the actual packs) to determine which pack file it has to download.
And here lies another problem, the client has to download the whole packfile, even if it only needs a few objects from it.
Try to download a repo with a lot of loose objects from a server on the other side of the Earth, and you'll be longing for subversion's painfully slow checkout process.
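The walk described above can be sketched in a few lines of Ruby. The fetch callback here is a stand-in for an HTTP GET of an objects/xx/yyyy... URL, reading from a plain hash so the sketch is self-contained, and object "parsing" is simplified to just returning the referenced ids:

```ruby
# Sketch of the dumb-protocol object walk. fetch stands in for an HTTP GET;
# it returns the list of ids an object references, or nil on a 404
# (meaning the object lives in a pack).
def walk_objects(refs, fetch)
  seen = {}
  todo = refs.dup
  until todo.empty?
    id = todo.pop
    next if seen.key?(id)
    refd = fetch.call(id)
    if refd.nil?
      seen[id] = :packed # 404: fall back to downloading pack indexes
    else
      seen[id] = :loose
      todo.concat(refd) # tree/parent ids, each needing its own request
    end
  end
  seen
end

# Fake repo: commit c1 references tree t1 and parent c0; c0 only exists
# in a pack, simulated by its absence from the loose-object store.
store = { "c1" => ["t1", "c0"], "t1" => [] }
result = walk_objects(["c1"], ->(id) { store[id] })
```

Each loop iteration is one HTTP round trip, which is exactly why this gets painful over a high-latency link.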
So the other option is the smart protocol. If you have GitLab, Gitea or something similar, it has its own smart protocol implementation, and everything works fine. But without? The standard answer to this question is git-http-backend. Yes, a CGI script, which means I can't use it directly with nginx. I looked around on the internet for alternatives, but what I found were all either abandoned, still experimental, or written in some hipster language where figuring out how to download and compile them is more trouble than getting nginx to execute CGI. So in the end, I stayed with the CGI script. Fortunately, someone made a CGI module for nginx, so I didn't have to mess with fcgiwrap like I did once in the past (even though it's not packaged in Alpine Linux, so I had to make a custom APKBUILD for it).
But figuring out how to set things up is a different beast.
First, unless you want everything publicly accessible, you need to put a file called git-daemon-export-ok in each git repo you want to export.
I also wanted to use the method where stagit pages URL and the git clone URL are the same, described under the "To serve gitweb at the same url" part in the manual, and the accelerated part below.
Unfortunately, it's for Apache only, so I had to get creative.
In the end I ended up with something like this:
root /path/to/git;

location / {
    root /path/to/html;
    add_header cache-control "no-cache" always;
}

location ~ ^/[^/]+/[^/]+/(?:HEAD|info/refs|objects/info/packs|git-upload-pack|git-upload-archive)$ {
    cgi pass /usr/libexec/git-core/git-http-backend;
}

location ~ ^/[^/]+/[^/]+/objects/ {
}
There are two main changes compared to what's in the man page.
First, I don't forward git-receive-pack to the CGI script.
I never intend to push to an HTTP URL, and nginx's user doesn't have write access to the git repos anyway, but still, minimizing the attack surface seems like a wise decision.
Second, I simplified the regex around the objects dir: only objects/info/packs has to be forwarded to git-http-backend (this is a file which is normally generated by git update-server-info, so it can be missing); everything else under objects/ is just served as static files. Note that serving objects directly bypasses the git-daemon-export-ok check, so if an attacker knows some object's hash, they can download it even from a non-exported repo.
Of course, when I tried it it didn't work.
First, on Alpine Linux, I had to install the git-daemon package, git-http-backend is not in the normal git package.
But then I still got HTTP 500 errors, without anything in the logs.
In the end I ran nginx in strace with follow mode, and I finally found the problem.
Of course, it's CVE-2022-24765 still fucking over everyone's life.
I don't know on which sane system (Windows doesn't qualify) you end up storing git repos in a dir where an untrusted user might have access to one of its parent directories, but whatever.
Plus the error help message (which is only visible inside strace's output for maximum user friendliness) only mentions a git command to execute to fix it.
Yeah, I'm going to figure out how to execute a git command as my web server.
If you don't want to figure out which random directory will be your HOME dir, you can create (or append to) the file /etc/gitconfig:
[safe]
directory = /path/to/repos/*
One small note, here * works like **, so this will match every repo under /path/to/repos.
nanoc#
In last year's update, I already hinted at my discontent with nanoc, which should be called slowc.
During Christmas I had some unintended free time with nothing better to do than think about this whole shit, and after my brain was overflowing with ideas, I knew I had to do something.
My first attempt was to utilize ninja to build the site.
Sure, it doesn't have a web server like nanoc, nor can it watch directories for changes, but those should be relatively easy to add with a Ruby script, right?
But I didn't get so far, problems manifested way earlier.
Getting back to the hashes topic, the first program I made was a little Ruby script to calculate the hashes needed for the filenames.
I generated a basic ninja build file, ran it... and using all 32 threads of my CPU, it took more time than nanoc did on a single thread.
Uhhh, just perfect, turns out the Ruby interpreter's start up time is not exactly negligible.
I had a second job (more on it a bit later) which only called a few shell commands, it was much faster.
Should I rewrite it in C?
But thinking about it, I have a lot of (over 2000) small files, so batching them together could help a lot.
I tried it, and it indeed helped, but then welcome to ninja's limitations.
Unlike make, ninja saves the command line of the executed program, and if it changes it rebuilds the output, even if the inputs are unchanged.
Great, this is what you want 99% of the time.
ninja was originally made to build C/C++ code, but at some point it got a dyndep feature, which allows a target to generate the list of its inputs/outputs during the build. With it, the command of my link rule ended up something like this:
ln -sf ../../../$in "out/c/$$(cat $hash)/"$base && touch $out
The extra touch at the end is needed because tasks in the dyndep files are identified by the first (normal) output of the target, so it needs a dummy output with a fixed name too.
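For reference, a dyndep setup roughly looks like this (file names and the hash are made up for illustration; this is a sketch of the idea, not my actual build file):

```ninja
# build.ninja: the real output name is only known after hashing, so the
# build statement is keyed on a fixed stamp file and gets the rest of its
# outputs from a dyndep file generated during the build.
build out/stamps/foo.stamp: link_hashed src/foo.png || out/foo.dd
  dyndep = out/foo.dd
```

and the generated dyndep file declares the discovered, hash-named path as an implicit output:

```ninja
# out/foo.dd (generated)
ninja_dyndep_version = 1
build out/stamps/foo.stamp | out/c/d41d8cd9/foo.png: dyndep
```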
This worked, but it's disgusting and inefficient.
But this is still not enough, as I can't generate build targets dynamically.
The current site has two "dynamic" file targets, and by dynamic I mean you can't tell the set of output file names just from the names of the input files: the index pages and the tag list.
For the index pages, you need to get the metadata of all posts, sort them by creation date, and make chunks of length 5.
Now I could kind of work around this: because of the tag list, series info and the like, I already needed something which collects the metadata of all posts and computes this info, and the number of index pages only depends on the number of posts, so I could just make a "generate page i of the index" script and have it figure out through implicit dependencies which posts it exactly needs.
But tag list is not so easy.
A single post can have an arbitrary number of tags, they can overlap, and the filenames depend on the tag names, not just an index.
Maybe I could have said fuck it, and made 100 tag targets, and if my site somehow ends up with more than 100 tags, I adjust a constant in my ninja generator script. But at this point I was so fed up with ninja's limitations that I decided to roll my own.
nanoc replacement, take two#
At this point, from my ninja experiments and my previous experience with nanoc, I knew the Ruby interpreter's startup time is not zero, so spawning a new ruby instance for every little task is not the correct approach.
The question is, how to do parallel processing in Ruby?
There's the dreaded GIL (called GVL in MRI Ruby), which means that despite the language having threads, they can't run in parallel.
Yes I know about JRuby and TruffleRuby, but they struggle with new Ruby features like non-blocking fibers, and JRuby is extremely slow to start up (multiple seconds from what I remember).
(TruffleRuby without JVM is supposed to be faster, but honestly I never tried it.
No Gentoo ebuild, and I don't want to go back to having to mess with rvm.)
There are Ractors, but...
Oh well...
I can't write a status update without a healthy rant, right?
Some people say Ractors have a chicken/egg problem: libraries don't support Ractors because nobody uses them, and nobody uses them because libraries don't support them. But the semantics themselves are weird enough. For example, you can write:
X = 1
Ractor.new { puts X }
X = 2
and depending on the scheduling, you might get 1 or 2 printed to the console.
But if you have X = "foo" (without frozen_string_literal), you get an IsolationError.
If you use $x = 1, you also get an IsolationError.
Or you know, you can do def x instead of assigning to a constant to prevent getting warnings about already initialized constants.
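A tiny illustration of the def trick (the name is made up):

```ruby
# Redefining a constant warns ("already initialized constant X"); an
# (endless) method definition can be overwritten without that warning:
def site_version = 1
def site_version = 2  # no "already initialized" complaint here
```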
You cannot access class variables (@@foo) from Ractors at all, but you can read (but not write) instance variables on classes/modules. For comparison, here's what a simple connection pool looks like with Threads:
class Pool
  def initialize
    @queue = Thread::Queue.new
  end

  def checkout
    @queue.pop true
  rescue ThreadError
    Whatever.new
  end

  def checkin(conn) = @queue << conn
end
And with async:
class Pool
  def initialize
    @pool = []
  end

  def checkout = @pool.pop || Whatever.new
  def checkin(conn) = @pool << conn
end
(Well, because of the GVL, probably even the second one works with Threads.) They should have either gone with the ability to fire up a completely separate Ruby interpreter on a new thread (like JavaScript workers), or gotten rid of the GVL (like what Python or the alternate Ruby implementations did). Ractors in their current form are complete bullshit. But anyway, the final nail in the coffin of Ractors was this:
[4] pry(main)> Ractor.new { OpenSSL::Digest::SHA256.digest 'foo' }
(pry):4: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
#<Thread:0x00007f6df708d4c0 run> terminated with exception (report_on_exception is true):
(pry):4:in 'block in __pry__': defined with an un-shareable Proc in a different Ractor (RuntimeError)
=> #<Ractor:#2 (pry):4 terminated>
Even Ruby's built in libraries are not Ractor compatible yet.
(A small note: the info above is correct as of Ruby 3.)
So back to the question of how to do parallel processing in Ruby.
Seems like the answer is still the same as 20 years ago, by forking (or spawning) extra Ruby processes.
How do you communicate with external processes in a scripted language (so no things like boost::interprocess)?
Most likely pipes or sockets.
The problem is I have many workers, and while having a dedicated pipe to each worker and deciding in the master process which worker gets which job would work, a) I didn't want to implement it, and b) it inevitably leads to bottlenecks when many workers finish their jobs at the same time, as the master is single threaded.
What was my solution?
Sockets can be set to datagram mode, and if you do, the kernel guarantees that each datagram is received by exactly one worker, in whole.
Sounds good?
Almost.
First problem is process management.
When having multiple processes talking to each other, I like to design them in a way that if any of the processes crashes, the others shut down too, instead of leaving lingering processes behind like chrome always does.
This means the master kills the child processes before exit, and the child processes need some way to detect the disappearance of the master process too.
With pipes, if you try to read after the write end is closed, you get an EOF, so you can detect a dying parent easily.
The same is true if you create a stream unix domain socket pair, but I need datagrams.
And datagrams are connectionless, so reading on a socket where the other end is already closed results in an infinite wait.
Useful, right?
To solve this problem you need to use SOCK_SEQPACKET, which is a weird hybrid between stream and datagram mode, and I didn't even know about its existence before I ran into this problem.
For unix domain sockets, it's pretty much the same as datagram mode, except it's connection-based, so you can detect when the other peer dies.
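A minimal sketch of the socket semantics (the real code has a pool of forked workers, this just shows the relevant behavior):

```ruby
require 'socket'

# SOCK_SEQPACKET keeps datagram-style message boundaries, but is
# connection-based, so a worker blocked in recv notices a dead master.
master, worker = Socket.pair(:UNIX, :SEQPACKET, 0)

master.send("job-1", 0)
master.send("job-2", 0)

job1 = worker.recv(4096)  # one whole message per recv, never half a job
job2 = worker.recv(4096)

master.close              # the master goes away...
eof = worker.recv(4096)   # ...and recv returns immediately instead of hanging
```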
Second is, by changing to datagrams, I lost the ability to send arbitrarily long messages.
There's a globally settable size for the buffer the Linux kernel uses for unix domain sockets; if your message doesn't fit in it, sending fails. This limit is around 200 kiB, which is not tiny, but not a lot either.
Fortunately, unix domain sockets allow passing file descriptors around, so I can write large data into a file and pass the FD around.
But do I really need to create temp files?
Can't it be some kind of shared memory?
While looking at the io-memory project, I realized the solution: memfd_create!
It's Linux only, but with a single syscall you get something like creating a temporary file on a tmpfs, without needing a specific tmpfs mount.
I didn't end up using io-memory project, because it was made for shared memory, and in my case I need to marshal Ruby objects, which I can do directly into an FD, but not into an mmapped memory region.
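The FD passing part looks roughly like this. memfd_create isn't wrapped by Ruby's stdlib, so in this sketch a Tempfile stands in for it; the FD-passing part is the same either way:

```ruby
require 'socket'
require 'tempfile'

# Marshal a Ruby object into a file, then pass the file descriptor itself
# over a unix domain socket pair (SCM_RIGHTS under the hood).
parent, child = UNIXSocket.pair

payload = { title: "2026 updates", tags: %w[meta rant] }

file = Tempfile.new("job")
file.write(Marshal.dump(payload))
file.rewind
parent.send_io(file.to_io)  # the descriptor travels, not the data
file.close

received = child.recv_io
copy = Marshal.load(received.read)
received.close
```

The kernel duplicates the descriptor at send time, so the sender can close its copy immediately after send_io.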
Porting the shit over#
After I had a working multiprocess work queue, I started writing the basics of a task scheduler and tasks to calculate the hashes of static files.
I'm not going to write a lot about this, you just have to wait until the dependencies are done and start building.
The problematic part is fixing all the weird edge cases, the main reason I wanted to use ninja originally, to save me from this...
But after I had the basic system up and running, it blew away ninja's process forking madness, being way faster than it, and I didn't even have to implement any kind of batching!
So it's time to mull over how to actually build this site.
Static files are the easiest.
Just hash them, and link them into the output dir at c/<hash>/<filename>.
No dependencies, easy peasy.
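Something like this (the truncated hash length and the use of symlinks are my assumptions for the sketch, not necessarily what the real script does):

```ruby
require 'digest'
require 'fileutils'
require 'tmpdir'

# Hash a static file and link it into the output dir at c/<hash>/<filename>.
def publish_static(src, outdir)
  hash = Digest::SHA256.file(src).hexdigest[0, 16]
  dir = File.join(outdir, "c", hash)
  FileUtils.mkdir_p(dir)
  dest = File.join(dir, File.basename(src))
  FileUtils.ln_s(File.expand_path(src), dest, force: true)
  dest
end

tmp = Dir.mktmpdir
asset = File.join(tmp, "logo.png")
File.write(asset, "fake image bytes")
dest = publish_static(asset, File.join(tmp, "out"))
# dest is .../out/c/<hash>/logo.png, pointing back at the source file
```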
Next the CSS files.
I actually run CSS files through erubi to do some substitutions, most importantly resolve links in url()s.
So to compile CSS files, I need hashes.
Which hashes?
Good question, I can only tell it after I executed the Ruby script in the erubi templates.
Nanoc handled these dependencies more or less automatically, but here I'd need to declare them upfront.
To avoid it, I went with ninja's idea on how to handle generated headers, put the task generating them into an order-only dependency, and during the C build, capture the actual list of used headers from the compiler.
Of course, here I don't generate a make compatible file of dependencies just to parse it in the next step, but the theory is the same.
On the first build, all hashes have to be done, on subsequent builds, only changes in the actually used hashes causes a rebuild.
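The idea, stripped to its essence (names made up): hand the template its hashes through a recording proxy, store the set it actually read, and on the next run only rebuild if one of those changed.

```ruby
# Records which hashes a template actually looks up during a build.
class RecordingHashes
  attr_reader :used

  def initialize(all)
    @all = all
    @used = {}
  end

  def [](name)
    @used[name] = @all.fetch(name)
  end
end

# prev_used is nil on the first build, so everything gets built once.
def needs_rebuild?(prev_used, current)
  prev_used.nil? || prev_used.any? { |name, h| current[name] != h }
end

hashes = { "style.css" => "aaa", "logo.png" => "bbb", "unused.js" => "ccc" }
rec = RecordingHashes.new(hashes)
rec["logo.png"] # the template only references this one
used = rec.used

irrelevant = hashes.merge("unused.js" => "zzz")
relevant   = hashes.merge("logo.png" => "xxx")
```

An irrelevant hash change leaves the target alone; touching a hash the template actually read triggers a rebuild.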
And what's the name of the output?
Since I wanted CSS files to be cacheable like static assets, they need a path which change every time the CSS changes.
Actually, with nanoc I had a disgusting hack where I tried to extract the dependencies from the CSS using a regex, since in nanoc you have to declare the destination filenames upfront; but there's no need for that here, I just hash the output CSS file.
This also means if something in the input changes which disappears after minification, it won't create a new file.
I also finally managed to integrate rollup properly.
Instead of invoking the rollup executable directly, I have a small javashit file which uses rollup's API to compile the bundle and output a JSON on stdout.
Doing this has two advantages: first, this way I can get a list of dependencies, which is crucial for correct subsequent rebuilds.
Second, this way I can hash the JavaScript output from Ruby and put it into the correct dir without having to deal with temporary files.
And I also got rid of another disgusting hack here.
Right now, the javashit bundle has two filename references (a CSS and an image file).
With nanoc, I only knew these names after the nanoc rules were processed, but I had to execute rollup in advance.
My solution was to put some ugly placeholders in the JS code, then replace them with an ugly regex in nanoc.
Fortunately this is no longer needed, another win for sanity!
Then onto the posts.
The first problem was the large number of helpers I've written; I had to adjust them to work without nanoc.
Interestingly, even though nanoc comes with a bunch of helpers, I barely used any of them, so most changes were required to figure out how to get all the info my previous helpers got from @items.
And a few random problems popped up.
For every video on the blog, I need to extract the mime types (so the browser can tell which formats are supported), the length of the video, and for every image that appears directly in the page, their size (if not overridden).
Videos are a bit simpler, I need the mime types for every video file, so I can get them in another task, easily parallelizable and cacheable.
However I don't need the length of every video file, since the different formats of the same video have the same length; for now I went with only getting the lengths of the HQ formats.
Every video is available in HQ format, and I don't think this will change.
The images are different though.
I really don't know in advance which images I need the size of, and there's a bunch of images I don't need the size of at all, so with images I went in a similar direction as with nanoc: while compiling the markdown files, calculate the sizes of the needed images on the fly, and cache them.
But I'm not done.
As you can see at the top or bottom of this page, there's a list of other posts in the same series.
To generate it, I need to know the metadata of every other post on the site.
(Actually I figured out this part while thinking about the ninja implementation, but it didn't get anywhere.)
So there's an extra task which gets all posts on the site, parses the JSON metadata at the beginning of the file, merges them together, and stores them for the later markdown tasks.
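A rough sketch of that task; the front-matter format here is an assumption (a JSON object at the top of the file, separated from the body by a blank line), not necessarily the exact one I use:

```ruby
require 'json'

# Parse the leading JSON metadata block of a post, ignoring the body.
def parse_front_matter(content)
  head, _sep, _body = content.partition("\n\n")
  JSON.parse(head)
end

# Merge the metadata of all posts and sort by creation date, newest first.
def collect_metadata(files)
  files.map { |path, content| parse_front_matter(content).merge("path" => path) }
       .sort_by { |m| -m["created"] }
end

posts = {
  "a.md" => %({"title":"A","created":100}\n\nbody of A),
  "b.md" => %({"title":"B","created":200}\n\nbody of B),
}
meta = collect_metadata(posts)
index_pages = meta.each_slice(5).to_a # 5 posts per index page
```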
I have a problem with this approach though, as now any change in any post triggers the rebuilding of this task.
Fortunately if the output is unchanged, the build script prunes this task from the dependency tree (something similar to what ninja does), so it's not too bad, but I'm still thinking about abolishing this nanoc style metadata in the file crap.
Also this is the same task which takes care of the index and tag lists pages.
In ninja this is impossible, but here I can just generate a new task while building.
One last thing I wanted to take care of is compressing text files.
I never figured out how to do it with nanoc efficiently, all I had was a simple bash script which rsynced the output files to a different directory and compressed them as needed.
In my new build script, I have a pretty generic "task with dependencies" implementation, so I just needed a few new tasks and done.
Automatically parallelized and dependency checked.
Sweet.
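The compression task itself is nothing fancy; a gzip one boils down to something like this (the precompressed files can then be served by something like nginx's gzip_static):

```ruby
require 'zlib'
require 'tmpdir'

# Write a .gz next to a text output file. Pinning the mtime in the gzip
# header keeps the .gz bytes reproducible for unchanged inputs.
def gzip_file(path)
  Zlib::GzipWriter.open("#{path}.gz") do |gz|
    gz.mtime = File.mtime(path)
    gz.write(File.read(path))
  end
  "#{path}.gz"
end

dir = Dir.mktmpdir
html = File.join(dir, "index.html")
File.write(html, "<p>hello</p>")
gz = gzip_file(html)
```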
Now the question is, was it worth it?
A full rebuild with nanoc and empty caches took about 22 seconds.
However this metric can be misleading, as nanoc only measures the time it spent actually building the site; the total time required (as measured by time nanoc) is a bit more, since nanoc needs about 600 ms to start up.
With filled page caches, the same operation takes about 19 seconds with nanoc!
And of course, no-op rebuilds finish in a fraction of a second.
However, I'm comparing apples with oranges here.
The build times for my script include rollup and compressing the output files, while the nanoc numbers don't.
So I made a large command line, combining rollup, nanoc, and my compress script.
Build times here are around 33 seconds.
One thing still annoying me though is the startup time of the Ruby interpreter. As you can see, it's around 500–600 ms for both, which makes sense, as both solutions use roughly the same set of underlying libraries. Fortunately, it's not a huge problem if I use watch mode, which I generally do (also something I implemented, but let's not go into it...). I made an attempt at lazy loading the libraries only when needed, but as I expected, it made no-op rebuilds faster and full rebuilds slower. In the end I decided against going this way; it made the code messier with questionable advantages. If I want performance improvements, this is what I should look into:
At first, all the hash jobs start.
Around the 1 s mark, hashes start finishing.
However the build still has to wait for rollup, and rollup is not fast, needing about a second on its own.
Update 2026-01-18: and I couldn't hold back.
I did the per-directory hashes I mentioned above, and also gave some tasks higher priority; this way a full cached build takes around 3 seconds, less than a nanoc no-op build!
I'm not posting another graph, since it looks similar, except the valley is a bit shorter.
Rollup just takes longer than all the hashing.
One thing I could look into here is using swc, which is much faster than typescript, but there's a catch: it doesn't do typechecking.
So in this case I'd still need to run typescript manually, the only thing I'd win is moving typescript checking out of the critical path.
Maybe some day.
End update.
Update 2026-02-01: more news. While I was writing the post about NFS5PS, I implemented a few new features. First, I can have clickable timestamps for videos. To be honest, this was one of the reasons I looked into JS players, but after I finally managed to integrate Plyr, I lost my motivation to poke it even with a ten-foot pole. But now it's here, check it out in the NFS5PS post. I'll go through the old posts and add links as necessary.
Second, about the filestore.
When writing the euphoria post, I needed a way to upload the decensor patch somewhere.
I didn't want to add it to the site, because it was a bit big (~354 MiB), and I already knew nanoc doesn't like big files.
Later, as I started writing the NFS posts, I needed to "attach" files more and more, so I started placing more files there, while also using nginx's fancy index module to make the file listings a bit less ugly.
But now, I feel like there's no need for this anymore, nanoc is no more, and while doing NFS posts, I had to add huge video files to the git repo anyway, so none of my original concerns hold.
So I wrote a simple attachment functionality, I wanted to debut it with the NFS5PS post, but in the end I didn't attach any files.
This means I'll go over the existing posts linking to the filestore later, migrate them to attachments, and fix all the bugs that crop up in the process.
And a small note: I'm likely to remove ROMs from the filestore. You can find them at a zillion places on the internet, and I'd rather avoid dealing with publishers sending their zombie lawyers after me for the crime of distributing something they no longer sell anyway.
Third, I started adding file sizes next to download links, so you know what you're signing up for when you click on them. A bit of retro feeling from the times when people were surfing the net on painfully slow dial-up connections. End update.
Blog is open source#
With the above mentioned changes, I decided to finally make the blog open source; it's available on my git server.
I don't want to write too much about it here, check the git repo if you're interested, but I have to warn you, it can be messy.
Also, it uses git-annex to store large media files; make sure you have git-annex installed and follow the readme to get the assets if you actually want to compile the site.
But if all you're interested in is how the site is being made, you probably don't need it, just don't be surprised if image/