2026 updates
Created: 1768509728 (2026-01-15T20:42:08Z), Updated: 1768778270 (2026-01-18T23:17:50Z), 5382 words, ~23 minutes
Tags: meta, rant, blog update
Yes, I know, it's only January, but I couldn't stay put, and fixed some of the worst problems on my site.
git(ea)#
There has been a gitea instance running at git.neptards.moe for some time, but it hasn't been updated in a long time. Back then I used it because it was like a lightweight GitLab, and I also thought I would use its features. But of course, as time went on, gitea became more and more bloated, and the final nail in the coffin was when the UI Nazis on the project started making javashit mandatory to use the site. Later there was the forgejo fork; I briefly looked at it, but it was obvious the fork had political reasons (the Chinese devs on gitea did some shady shit, but honestly I didn't look too deep into this), and not that anyone had a problem with gitea turning into a javashit-heavy bloated mess. On the alternatives side, there's sourcehut, but anyone who has looked at the sourcehut installation manual can tell it's hyper-duper scalable microservice crap requiring you to set up 14 gazillion services just to run basic git hosting, and I'm not sure you can run it without buying a server park first. So in the end, I was stuck running some old version of gitea because I didn't want to update to the newer abominations.
Fast forward to last year, when I ran into stagit, a static site generator for git repositories. You give it a git repo, it generates a bunch of static HTML files, and that's it. No daemon to run, no dynamically generated pages, just plain HTML files served by nginx (or whatever your HTTP server of choice is). I liked the idea, started thinking about this whole git server business, and frankly came to the conclusion that I don't really need a fancy software forge on my site. No one contributes code anyway. The few bug reports I got on gitea usually disappeared because it always broke mail sending. So maybe the static files generated by stagit are enough.
Well, two problems.
The manual states it's not suitable for large repositories, and it's not an overstatement.
The first time I tried running it, it started processing my QEMU fork, and after a couple (tens) of minutes it crashed because it ran out of disk space...
The blog post says "the cache (-c flag) or (-l maxlimit) is a workaround for this in some cases", but note the readme in the git mirror doesn't mention the -l flag for this problem, only -c.
And the man page states "Write a maximum number of commits to the log.html file only. However the commit files are written as usual."
Yes, the man page has it right.
All the -l flag does is leave the commit diffs unlinked from the commit list page.
All the commit diff files are still generated; they just become dangling files with no references to them, pointlessly wasting disk space and generation time.
Seriously, what kind of deranged lunatic idiot do you have to be to think this feature makes any sense?!
What's the point?
It's worse than imperial units!
So in the end I forked stagit to fix this problem and to be able to generate pages for the git repos I have.
Second, figuring out how to clone the repos.
The stagit author's site gives you a git:// URL.
Yes, the old unencrypted protocol which was an insane choice for anything besides local LAN even 10 years ago.
You have the SSH protocol, but it's generally only used for authenticated access.
So what remains is the HTTP protocol, which is actually two different protocols, the dumb and the smart one.
The dumb protocol is pretty simple, there's a script you have to run after each push (just set up a post-update hook, and it will be taken care of), but then you can just serve the git repo through a simple HTTP server.
Simple to set up on the server, but only do this if you hate your users with a passion.
The problem is, to clone a repo like this, the client first has to download the refs (a single file), then for each ref download the corresponding commit.
First it tries to download the commit object; if that succeeds, it gets the tree and parent ids from it, which have to be downloaded as well.
The client practically has to do a full tree walk, and it's not really parallelizable.
And this goes on until the client hits a 404, which means the object is in a pack.
At that point the client has to download the indexes of every pack file (which sometimes end up comparable in size to, if not bigger than, the actual packs) to determine which pack file it has to download.
And here lies another problem: the client has to download the whole packfile, even if it only needs a few objects from it.
Try to download a repo with a lot of loose objects from a server on the other side of the Earth, and you'll be longing for subversion's painfully slow checkout process.
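To make the pain a bit more concrete, here's a very rough sketch of what a dumb-protocol clone has to do, with a made-up URL, following commits only (trees, blobs and the actual pack downloading are left out):

require 'net/http'
require 'zlib'

BASE = URI('https://git.example.com/some-repo.git/')   # made-up URL

def fetch(path)
  res = Net::HTTP.get_response(BASE + path)
  res.is_a?(Net::HTTPSuccess) ? res.body : nil
end

def fetch_loose_object(sha)
  raw = fetch("objects/#{sha[0, 2]}/#{sha[2..]}")
  raw && Zlib.inflate(raw)              # nil (404) means the object is in a pack
end

# info/refs is a plain text file with one "<sha>\t<refname>" per line
todo = fetch('info/refs').lines.map { |l| l.split("\t").first }
seen = {}

until todo.empty?
  sha = todo.pop
  next if seen[sha]
  seen[sha] = true
  obj = fetch_loose_object(sha)
  unless obj
    # 404: download objects/info/packs, then every pack *index* to figure out
    # which pack contains the object, then the whole pack. Not shown here.
    next
  end
  payload = obj.split("\0", 2).last     # strip the "<type> <size>\0" header
  # follow the parents; a real client also walks the tree and all its blobs
  todo.concat payload.scan(/^parent ([0-9a-f]{40})$/).flatten
end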
So the other option is the smart protocol. If you have GitLab, Gitea or something similar, it has its own smart protocol implementation, and everything works fine. But without one? The standard answer to this question is git-http-backend. Yes, a CGI script, which means I can't use it directly with nginx. I looked around on the internet for alternatives, but what I found was all either abandoned, still experimental, or written in some hipster language where figuring out how to download and compile it is more trouble than getting nginx to execute CGI. So in the end, I stayed with the CGI script. Fortunately, someone made a CGI module for nginx, so I didn't have to mess with fcgiwrap like I once did in the past (even though the module isn't packaged in Alpine Linux, so I had to make a custom APKBUILD for it).
But figuring out how to set things up is a different beast.
First, unless you want everything publicly accessible, you need to put a file called git-daemon-export-ok in each git repo you want to export.
I also wanted to use the setup where the stagit pages URL and the git clone URL are the same, described under the "To serve gitweb at the same url" part of the manual, together with the accelerated variant below it.
Unfortunately, it's for Apache only, so I had to get creative.
In the end I ended up with something like this:
location / {
    root /path/to/html;
    add_header cache-control "no-cache" always;
}

root /path/to/git;

location ~ ^/[^/]+/[^/]+/(?:HEAD|info/refs|objects/info/packs|git-upload-pack|git-upload-archive)$ {
    cgi pass /usr/libexec/git-core/git-http-backend;
}

location ~ ^/[^/]+/[^/]+/objects/ {
}
There are two main changes compared to what's in the man page.
First, I don't forward git-receive-pack to the CGI script.
I never intend to push to a HTTP URL, nginx's user doesn't have write access to the git repos anyway, but still, minimizing the attack surface seems like a wise decision.
Second, I simplified the regex around the objects dir: only objects/info/packs has to be forwarded to git-http-backend (this is a file normally generated by git update-server-info, so it can be missing/outdated if the post-update hook is not set up), everything else is just plain object/pack files.
The caveat is that if you have a repo without git-daemon-export-ok and an attacker knows some object/pack id, they will still be able to download it.
Of course, when I tried it it didn't work.
First, on Alpine Linux, I had to install the git-daemon package, git-http-backend is not in the normal git package.
But then I still got HTTP 500 errors, without anything in the logs.
In the end I ran nginx under strace with follow mode, and I finally found the problem.
Of course, it's CVE-2022-24765 still fucking over everyone's life.
I don't know on what sane system (Windows doesn't qualify) you end up storing git repos in a dir where an untrusted user might have access to one of its parent directories, but whatever.
Plus the error help message (which is only visible inside strace's output, for maximum user friendliness) only mentions a git command to execute to fix it.
Yeah, I'm going to figure out how to execute a git command as my web server.
If you don't want to figure out which random directory will be your HOME dir, you can create (or append to) the file /etc/gitconfig:
[safe]
    directory = /path/to/repos/*
One small note, here * works like **, so this will match every repo under /path/to/repos.
nanoc#
In last year's update, I already hinted at my discontent with nanoc, which should be called slowc.
During Christmas I had some unintended free time with nothing better to do than think about this whole shit, and after my brain was overflowing with ideas I knew I had to do something.
My first attempt was to utilize ninja to build the site.
Sure, it doesn't have a web server like nanoc, and it can't watch directories for changes, but those should be relatively easy to add with a Ruby script, right?
But I didn't get that far; problems manifested way earlier.
Getting back to the hashes topic, the first program I made was a little Ruby script to calculate the hashes needed for the filenames.
I generated a basic ninja build file, ran it... and using all 32 threads of my CPU, it took more time than nanoc did it on a single thread.
Uhhh, just perfect, turns out the Ruby interpreter's startup time is not exactly negligible.
I had a second job (more on it a bit later) which only called a few shell commands; it was much faster.
Should I rewrite it in C?
But thinking about it, I have a lot of (over 2000) small files, so batching them together could help a lot.
I tried it, and it indeed helped, but then welcome to ninja's limitations.
Unlike make, ninja saves the command line of the executed program, and if it changes it rebuilds the output, even if the inputs are unchanged.
Great, this is what you want 99.9% of the time, but here it would mean I need a way to get a "stable" batching, so if I add/remove a single file, only a few batches need to be rebuilt, not half of them.
Oh, and since the generator is re-run every time the list of files changes, this batching algorithm has to be stateless.
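For what it's worth, one stateless way to do it would be bucketing files by a hash of their name instead of their position in the list, so adding or removing a file only dirties its own batch; just a sketch of the idea, not something I ended up needing:

require 'digest'

# A file always lands in the same bucket, determined only by its name, so
# adding/removing one file only touches one batch. The batch sizes end up
# slightly uneven, but that's the trade-off.
def stable_batches(files, bucket_count)
  files.group_by { |f| Digest::SHA256.hexdigest(f).to_i(16) % bucket_count }
       .values
end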
I'm pretty sure some smart boy already came up with a clever algorithm, but I had another problem, and these two problems eventually led to me abandoning ninja.
ninja was originally made to build C/C++ programs.
Generally, your build targets have one or more inputs, one or more outputs, and during execution the compiler will output the extra inputs (headers) it used in some format.
Note that this assumes those extra inputs already exist when the build starts; they're not generated by the build script.
If they are, you usually have to manually add these generated files as (probably order-only) dependencies.
Later ninja got a dyndep feature, which allows you to have a target generate the list of inputs/outputs used by other targets.
Note the word "other".
The dependency information must still, in a sense, be generated upfront.
There was a conversation about dynamic implicit outputs; someone made a PR about it which went nowhere, someone else continued it, but that went nowhere either.
My workaround for this was to have the Ruby script calculate the hash, write it to a file (with a fixed name), and also generate a dyndep file, then have a second target which does the actual linking, something along the lines of
ln -sf ../../../$in "out/c/$$(cat $hash)/"$base && touch $out
The extra touch at the end is needed because tasks in the dyndep files are identified by the first (normal) output of the target, so it needs a dummy output with a fixed name too.
This worked, but it's disgusting and inefficient.
But this is still not enough, as I can't generate build targets dynamically.
The current site has two "dynamic" file targets, and by dynamic I mean you can't tell the set of output file names just from the names of the input files: the index pages and the tag list.
For the index pages, you need to get the metadata of all posts, sort them by creation date, and make chunks of length 5.
Now I could kind of work around this: because of the tag list, series info and the like, I already needed something that gets the metadata of all posts and computes this info, and the number of index pages needed depends only on the number of posts, so I could just make a "generate page i of the index" script and have it figure out through implicit dependencies which posts it exactly needs.
But the tag list is not so easy.
A single post can have an arbitrary number of tags, the tags can overlap between posts, and the output filenames depend on the tag names, not just an index.
Maybe I could have said fuck it, I'll make 100 tag targets, and if somehow my site ends up with more than 100 tags I'll adjust a constant in my ninja generator script, but at this point I was so fed up with ninja's limitations that I decided to roll my own.
nanoc replacement, take two#
At this point, from my ninja experiments and my previous experience with nanoc, I knew the Ruby interpreter's startup time is not zero, so spawning a new Ruby instance for every little task is not the correct approach.
The question is, how to do parallel processing in Ruby?
There's the dreaded GIL, called GVL in MRI Ruby, which means that despite the language having threads, they can't run in parallel.
Yes, I know about JRuby and TruffleRuby, but they struggle with newer Ruby features like non-blocking fibers, and JRuby is extremely slow to start up (multiple seconds, from what I remember).
(TruffleRuby without JVM is supposed to be faster, but honestly I never tried it.
No Gentoo ebuild, and I don't want to go back to having to mess with rvm.)
There are Ractors, but...
Oh well...
I can't write a status update without a healthy rant, right?
Some people say Ractors have a chicken/egg problem, but IMHO they have a being-fundamentally-broken problem. What Ractors try to achieve is an actor-style framework where you have no shared data (so no locking, deadlocks, race conditions and the like), but without forcing you to have completely separate instances (like workers in JavaScript). Sounds nice in theory, if you look at it from 1000 km, but upon closer inspection it's a horrible idea. Since Ruby doesn't really have constants, trying to share code between Ractors means trying to share quasi-constant data between Ractors, while littering Ractors with lots of stupid restrictions. For example, this works:
X = 1
Ractor.new { puts X }
X = 2
and depending on the scheduling, you might get 1 or 2 printed to the console.
But if you have X = "foo" (without frozen_string_literal), you get an IsolationError.
If you use $x = 1, you also get an IsolationError.
Or you know, you can do def x instead of assigning to a constant to prevent getting warnings about already initialized constants.
You cannot access class variables (@@foo) from Ractors at all, but you can read (but not write) instance variables on classes/modules if they're shareable (which in the end is practically the same thing).
So what we have here is something that combines the bad aspects of threads (hard-to-reason-about program state, deadlocks, race conditions) with the bad aspects of actor-like frameworks (lots of copying because of the lack of shared state, problems if you have non-serializable/non-transferable objects), with zero of the advantages of either approach, while as an added bonus it breaks about 99.9% of existing Ruby code.
Also look at this article, where the author tries to implement a very basic Ractor-aware connection pool in about two pages of code.
Pure insanity.
For comparison, this is how it would look like with threads:
class Pool
  def initialize
    @queue = Thread::Queue.new
  end

  def checkout
    @queue.pop true
  rescue ThreadError
    Whatever.new
  end

  def checkin(conn) = @queue << conn
end
And with async:
class Pool
  def initialize
    @pool = []
  end

  def checkout = @pool.pop || Whatever.new
  def checkin(conn) = @pool << conn
end
(Well, because of the GVL probably even this works with Threads.) They should have either gone all the way and added the ability to fire up a completely separate Ruby interpreter on a new thread (like JavaScript workers), or gotten rid of the GVL (like Python or the alternative Ruby implementations did). Ractors in their current form are complete bullshit. But anyway, the final nail in the coffin of Ractors was this:
[4] pry(main)> Ractor.new { OpenSSL::Digest::SHA256.digest 'foo' }
(pry):4: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
#<Thread:0x00007f6df708d4c0 run> terminated with exception (report_on_exception is true):
(pry):4:in 'block in __pry__': defined with an un-shareable Proc in a different Ractor (RuntimeError)
=> #<Ractor:#2 (pry):4 terminated>
Even Ruby's built-in libraries are not Ractor compatible yet. (A small note: the info above is correct as of Ruby 3.4; I saw the freshly released Ruby 4.0 has some Ractor improvements, but most of what I wrote above still stands.)
So, back to the question of how to do parallel processing in Ruby.
Seems like the answer is still the same as 20 years ago, by forking (or spawning) extra Ruby processes.
How do you communicate with external processes in a scripting language (so no things like boost::interprocess)?
Most likely pipes or sockets.
The problem is I have many workers, and while having a dedicated pipe to each worker and deciding in the master process which worker gets which job would work, a) I didn't want to implement it, and b) it inevitably leads to bottlenecks if many workers finish their jobs at the same time, as the master is single threaded.
What was my solution?
Sockets can be set to datagram mode, and if you do that, the kernel guarantees that each datagram will be received by exactly one worker, in whole.
Sounds good?
Almost.
First problem is process management.
When I have multiple processes talking to each other, I like to design them so that if any of the processes crashes, the others shut down too, instead of leaving lingering processes behind like Chrome always does.
This means the master kills the child processes before exit, and the child processes need some way to detect the disappearance of the master process too.
With pipes, if you try to read after the write end is closed, you get an EOF, so you can detect a dying parent easily.
The same is true if you create a stream unix domain socket pair, but I need datagrams.
And datagrams are connectionless, so reading from a socket whose other end is already closed results in an infinite wait.
Useful, right?
To solve this problem you need to use SOCK_SEQPACKET, which is a weird hybrid between stream and datagram mode, and I didn't even know about its existence before I ran into this problem.
For unix domain sockets, it's pretty much the same as datagram mode, except it's connection-based, so you can detect when the other peer dies.
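A minimal sketch of the resulting setup; jobs is just a stand-in here, and the real thing also has to send results back, which is omitted:

require 'socket'

master, workers_end = Socket.pair(:UNIX, :SEQPACKET, 0)
jobs = %w[job1 job2 job3]                 # placeholder work items

pids = 4.times.map do
  fork do
    master.close
    loop do
      msg, = workers_end.recvmsg          # each packet goes to exactly one worker
      break if msg.nil? || msg.empty?     # empty read = master closed its end
      job = Marshal.load(msg)
      # ... process job ...
    end
  end
end

workers_end.close
jobs.each { |job| master.sendmsg Marshal.dump(job) }
master.close                              # workers see EOF and exit
pids.each { |pid| Process.wait pid }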
Second, by switching to datagrams, I lost the ability to send arbitrarily long messages.
There's a globally settable size for the buffer the Linux kernel uses for unix domain sockets; if your message doesn't fit in it, sending fails. This limit is around 200 kiB, which is not tiny, but not a lot either.
Fortunately, unix domain sockets allow passing file descriptors around, so I can write large data into a file and pass the FD around.
But do I really need to create temp files?
Can't it be some kind of shared memory?
While looking at the io-memory project, I realized the solution: memfd_create!
It's Linux only, but with a single syscall you get something like creating a temporary file on a tmpfs, without needing a specific tmpfs mount.
I didn't end up using io-memory project, because it was made for shared memory, and in my case I need to marshal Ruby objects, which I can do directly into an FD, but not into an mmapped memory region.
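The FD passing itself is not a lot of code; a sketch of the idea, where the IO could be a Tempfile or a memfd (Ruby has no built-in memfd_create wrapper, so that syscall is not shown):

require 'socket'

# Sender: marshal a big object into some file-backed IO, then hand over its FD.
def send_big(sock, obj, io)
  io.write Marshal.dump(obj)
  io.rewind
  sock.sendmsg 'big', 0, nil, Socket::AncillaryData.unix_rights(io)
end

# Receiver: the kernel dups the FD into this process, Ruby wraps it in an IO.
def recv_big(sock)
  _msg, _addr, _rflags, ctrl = sock.recvmsg(scm_rights: true)
  Marshal.load ctrl.unix_rights.first
end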
Porting the shit over#
After I had a working multiprocess work queue, I started writing the basics of a task scheduler and tasks to calculate the hashes of static files.
I'm not going to write a lot about this; you just have to wait until the dependencies are done and then start building.
The problematic part is fixing all the weird edge cases, which is the main reason I wanted to use ninja originally, to save me from all this...
But after I had the basic system up and running, it blew away ninja's process forking madness, being way faster than it, and I didn't even have to implement any kind of batching!
So it's time to mull over how to actually build this site.
Static files are the easiest.
Just hash them, and link them into the output dir at c/<hash>/<filename>.
No dependencies, easy peasy.
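Per file it's roughly this (SHA-256 truncated to 10 hex characters is just my stand-in for the real id scheme):

require 'digest'
require 'fileutils'

def link_static(src, out_dir = 'out')
  id = Digest::SHA256.file(src).hexdigest[0, 10]
  dest_dir = File.join(out_dir, 'c', id)
  FileUtils.mkdir_p dest_dir
  FileUtils.ln_s File.expand_path(src), File.join(dest_dir, File.basename(src)), force: true
  id
end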
Next the CSS files.
I actually run CSS files through erubi to do some substitutions, most importantly resolve links in url()s.
So to compile CSS files, I need hashes.
Which hashes?
Good question, I can only tell it after I executed the Ruby script in the erubi templates.
Nanoc handled these dependencies more or less automatically, but here I'd need to declare them upfront.
To avoid it, I went with ninja's idea on how to handle generated headers, put the task generating them into an order-only dependency, and during the C build, capture the actual list of used headers from the compiler.
Of course, here I don't generate a make compatible file of dependencies just to parse it in the next step, but the theory is the same.
On the first build, all hashes have to be computed; on subsequent builds, only changes in the actually used hashes cause a rebuild.
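The dependency capture itself can be pretty small; a hedged sketch of the idea, where asset_url is a made-up helper the CSS templates would call for every url():

require 'erubi'

class CssContext
  attr_reader :used

  def initialize(hashes)
    @hashes = hashes        # filename => content hash, produced by earlier tasks
    @used = []
  end

  # hypothetical helper called from inside the CSS templates
  def asset_url(name)
    @used << name
    "/c/#{@hashes.fetch(name)}/#{File.basename(name)}"
  end
end

def compile_css(template, hashes)
  ctx = CssContext.new(hashes)
  css = ctx.instance_eval(Erubi::Engine.new(template).src)
  [css, ctx.used]           # ctx.used becomes this task's discovered dependencies
end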
And what's the name of the output?
Since I wanted CSS files to be cacheable like static assets, they need a path that changes every time the CSS changes.
Actually, with nanoc I had a disgusting hack where I tried to extract the dependencies from the CSS using a regex, since in nanoc you have to declare the destination filenames upfront; here there's no need for that, I just hash the output CSS file.
This also means that if something changes in the input but disappears after minification, it won't create a new file.
I also finally managed to integrate rollup properly.
Instead of invoking the rollup executable directly, I have a small javashit file which uses rollup's API to compile the bundle and output a JSON on stdout.
Doing this has two advantages: first, this way I can get a list of dependencies, which is crucial for correct subsequent rebuilds.
Second, this way I can hash the JavaScript output from Ruby and put it into the correct dir without having to deal with temporary files.
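On the Ruby side that boils down to something like this (build/rollup.mjs and the JSON keys are made-up names for the wrapper script described above):

require 'json'
require 'open3'

out, status = Open3.capture2('node', 'build/rollup.mjs', 'src/main.ts')
raise 'rollup failed' unless status.success?

bundle = JSON.parse(out)
js     = bundle['code']   # gets hashed and written straight to out/c/<hash>/...
deps   = bundle['deps']   # recorded so later builds know when to re-run rollup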
And I also got rid of another disgusting hack here.
Right now, the javashit bundle has two filename references (a CSS and an image file).
With nanoc, I only knew these names after the nanoc rules were processed, but I had to execute rollup in advance.
My solution was to put some ugly placeholders in the JS code, then replace them with an ugly regex in nanoc.
Fortunately this is no longer needed, another win for sanity!
Then onto the posts.
The first problem was the large number of helpers I've written; I had to adjust them to work without nanoc.
Interestingly, even though nanoc comes with a bunch of helpers, I barely used any of them, so most changes were required to figure out how to get all the info my previous helpers got from @items.
And a few random problems popped up.
For every video on the blog, I need to extract the mime types (so the browser can tell which formats are supported) and the length of the video, and for every image that appears directly in a page, its size (if not overridden).
Videos are a bit simpler: I need the mime types for every video file, so I can get them in another task, easily parallelizable and cacheable.
However, I don't need the length of every video file, since the different formats of the same video have the same length, so for now I went with only getting the lengths of the HQ formats.
Every video is available in HQ format, and I don't think this will change.
The images are different though.
I really don't know in advance which images I need the size of, and there's a bunch of images I don't need sizes for at all, so with images I went in a similar direction as with nanoc: while compiling the markdown files, calculate the sizes of the needed images on the fly, and cache them.
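Something along these lines; FastImage is just one way to read the dimensions, not necessarily what the site actually uses, and the cache path is made up:

require 'fastimage'
require 'json'

# On-the-fly image size cache, keyed by path + mtime so edits invalidate entries.
class ImageSizeCache
  def initialize(path = 'cache/image_sizes.json')
    @path = path
    @sizes = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  def size(image)
    key = "#{image}:#{File.mtime(image).to_i}"
    @sizes[key] ||= FastImage.size(image)   # => [width, height]
  end

  def save = File.write(@path, JSON.generate(@sizes))
end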
But I'm not done.
As you can see at the top or bottom of this page, there's a list of other posts in the same series.
To generate it, I need to know the metadata of every other post on the site.
(Actually I figured out this part while thinking about the ninja implementation, but it didn't get anywhere.)
So there's an extra task which gets all posts on the site, parses the JSON metadata at the beginning of the file, merges them together, and stores them for the later markdown tasks.
I have a problem with this approach though: now any change in any post triggers a rebuild of this task.
Fortunately, if the output is unchanged, the build script prunes this task from the dependency tree (something similar to what ninja does), so it's not too bad, but I'm still thinking about abolishing this nanoc-style metadata-in-the-file crap.
This is also the same task that takes care of the index and tag list pages.
In ninja this is impossible, but here I can just generate a new task while building.
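A stripped-down sketch of that task; I'm assuming the metadata is a single JSON object on the first line of each markdown file, that created is a Unix timestamp, and the paths are made up:

require 'json'

posts = Dir.glob('content/posts/**/*.md').map do |path|
  meta = JSON.parse(File.open(path, &:readline))    # assumed: metadata on line 1
  meta.merge('path' => path)
end

sorted      = posts.sort_by { |p| -p.fetch('created') }   # newest first
index_pages = sorted.each_slice(5).to_a                   # 5 posts per index page
tag_pages   = sorted.flat_map { |p| (p['tags'] || []).map { |tag| [tag, p] } }
                    .group_by(&:first)
                    .transform_values { |pairs| pairs.map(&:last) }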
One last thing I wanted to take care of is compressing text files.
I never figured out how to do it with nanoc efficiently, all I had was a simple bash script which rsynced the output files to a different directory and compressed them as needed.
In my new build script, I have a pretty generic "task with dependencies" implementation, so I just needed a few new tasks and I was done.
Automatically parallelized and dependency checked.
Sweet.
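The compression task itself is tiny; a sketch with plain gzip (the real script might also produce brotli or zstd variants):

require 'zlib'

# Writes foo.html.gz next to foo.html so the HTTP server can serve it pre-compressed.
def gzip_output(path)
  Zlib::GzipWriter.open("#{path}.gz", Zlib::BEST_COMPRESSION) do |gz|
    gz.mtime = File.mtime(path)
    gz.write File.binread(path)
  end
end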
Now the question is, was it worth it?
A full rebuild with nanoc and empty caches took about 22.03 s, as reported by nanoc.
However this metric can be misleading, as nanoc only measures the time it spent actually building the site, the total time required (as measured by time nanoc) is more like 22.64 s.
Yes, nanoc needs about 600 ms to start up.
With filled page caches, the same operation takes 19.76 (20.28) s.
And this is after a lot of optimizations I did last year, before it was over 45 s.
But the most annoying thing was when I edited a single file and it took ages for the result to show up.
So I benchmarked no-op rebuilds, they were 3.77 (4.39) s uncached and 3.62 (4.14) s cached.
4 seconds just to figure out you don't have to do anything!
So, what about my own script, executed on the same computer, but not restricted to a single thread?
A full uncached rebuild is 5.65 (6.33) s and cached is 4.28 (4.84) s.
A full rebuild is almost as fast as a no-op rebuild with nanoc!
And of course, the numbers for no-op rebuilds are: 0.34 (0.90) s uncached, 0.27 (0.80) s cached.
I save the md file in emacs, look at my browser, and the results are there, not wait 4–5 seconds then the results are there.
However, I'm comparing apples with oranges here.
The build times for my script include rollup and compressing the output files, while the nanoc numbers don't.
So I made a large command line, combining rollup, nanoc, and my compress script.
Build times here are 33.92 s uncached, 31.36 s cached, and no-op builds are 6.28 / 6.02 s.
Now, that setup takes longer to figure out that nothing has changed than my script needs to do a full rebuild.
Not bad.
One thing still annoying me, though, is the startup time of the Ruby interpreter. As you can see, it's around 500–600 ms for both, which makes sense as both solutions use roughly the same set of underlying libraries. Fortunately, it's not a huge problem if I use watch mode, which I generally do (also something I implemented, but let's not go into it...). I made an attempt at lazy loading the libraries only when needed, but as I expected, it made no-op rebuilds faster and full rebuilds slower. In the end, I decided against going this way; it made the code messier for questionable advantages. If I want performance improvements, this is what I should look into:
At first, all the hash jobs start.
Around the 1 s mark, hashes start finishing.
However, it still takes about 1.5 s until all the hashes finish, as there's an almost 1 GiB video which by itself takes about 0.5 s to hash, and the ordering of the tasks is less than optimal.
Then the CSS files can start, then rollup, which is not fast either, needing about 1.1 s, and only after that can it start compiling the pages.
I should probably switch over to xxHash, as hashing the same file only needs about 0.1 s using XXH3_64bits while still having enough bits to generate a 10 character id.
But as I mentioned in the 2025 updates, this would change every file's location, so I only did it partially: there's an ugly regex in the code matching the locations of the old articles; files there are hashed using SHA-256, everything else using xxHash.
My plan is to remove directories from this regexp if I change any old post, or alternatively wait until I have a better reason to make such a breaking change.
Furthermore, right now every CSS/JS/markdown file depends on every hash, but in practice this is not needed: currently none of my posts need any file from another post's directory, and I don't see any reason why I would change this.
Well, maybe next time.
Update 2026-01-18: And I couldn't hold back.
I did the per-directory hashes I mentioned above, and also gave some tasks higher priority; this way a full cached build takes 3.27 (3.83) s.
Faster than a nanoc no-op build!
I'm not posting another graph, since it looks similar, except the valley is a bit shorter.
Rollup just takes longer than all the hashing.
One thing I could look into here is using swc, which is much faster than typescript, but there's a catch: it doesn't do typechecking.
So in this case I'd still need to run typescript manually, the only thing I'd win is moving typescript checking out of the critical path.
Maybe some day.
End update.
Blog is open source#
With the above mentioned changes, I decided to finally make the blog open source; it's available on my git server. I don't want to write too much about it here, check the git repo if you're interested, but I have to warn you it can be messy. It also uses git-annex to store large media files, so make sure you have git-annex installed and follow the readme to get the assets if you actually want to compile the site. But if all you're interested in is how the site is being made, you probably don't need that, just don't be surprised if image/video files end up as symlinks to some weirdly named nonexistent files.