Long Form Posts
- A self-hosted SCM (source code management) system that can be bound to my own domain name.
- A place to store Git repositories and LFS objects that can be scaled up as my needs for storage grow.
- A CI/CD runner of some sort that can be used for automated builds, ideally something that supports Linux and MacOS.
- 2x vCPU (CX22) instances, each with 4 GB RAM and 40 GB storage
- A virtual network which houses these two instances
- One public IP address
- One 50 GB volume which can be resized
- I turned off the ability for others to register themselves as users, an important first step.
- I changed the bind address of Forgejo. It listens on 0.0.0.0:3000 by default, but I wanted to put this behind a reverse proxy, so I changed it to 127.0.0.1:3000 (see the config sketch below).
- I also reduced the minimum size of SSH RSA keys. The default was 3,072, but I still have keys of length 2,048 that I wanted to use. There was also an option to turn off this verification entirely.
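For those curious, these changes map to a handful of keys in Forgejo’s app.ini. This is only a sketch based on the standard Gitea-style configuration sections, not a copy of my actual file:

; app.ini (excerpt)
[server]
HTTP_ADDR = 127.0.0.1
HTTP_PORT = 3000

[service]
DISABLE_REGISTRATION = true

[ssh.minimum_key_sizes]
RSA = 2048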
- I was unable to register the launch agent definition within the user domain. I had to use the gui domain instead, which requires the user to be logged in. I’ve got the Mac Mini set up to log in on startup, so this isn’t a huge issue, but it’s not quite what I was hoping for.
- Half the commands in launchctl are deprecated and not fully working. Apple’s documentation on the command is sparse, and many of the Stack Exchange answers are old. So a lot of effort was spent fumbling through unfinished and outdated documentation trying to install and enable the launch service.
- The actual runner is launched using a shell script, but when I tried the backup job, Bash couldn’t access the external drive. I had to explicitly add Bash to Privacy & Security → Full Disk Access within the Settings app.
- Once I finally got the runner up and running as a launch agent, jobs were failing because .bash_profile wasn’t being loaded. I had to adjust the launch script to load it explicitly, so that the PATH to Node and Go was set properly.
- This was further exacerbated by two runners running at the same time. The foreground runner I was using to test with was configured correctly, while the one running as a launch agent wasn’t fully working yet. This manifested as the backup job randomly failing with the same cannot find: node in PATH error half the time.
- One of the most prominent words is “just”, with “it’s” not far behind. I thought it was because I started a lot of sentences with “it’s just”, but it turns out I’ve only used that phrase once, while the individual words show up around 10 times each. I guess I use “just” a lot (apparently, so does Seth). I am surprised to see the word “anyway” only showing up twice.
- Lots of first-person pronouns and articles, like “I’m”, “I’ve”, and “mine”. That’s probably not going to change either. This is just the tonal choice I’ve made. I read many blogs that mainly speak in the second person, and I don’t think it’s a style that works for me. Although I consciously know that they’re not speaking to me directly, or even to the audience as a whole, I don’t want to give that impression myself, unless that’s my intention. So it’ll be first person for the foreseeable future, I’m sure.
- Because it’s only the first page, many of the more prominent words are from recent posts. So lots about testing, OS/2, and Bundanoon. I would like to cut down on how much I write about testing. A lot of it is little more than venting, which I guess is what one does on their blog, but I don’t want to make a habit of it.
- I see the word “good” is prominent. That’s good: not a lot of negative writing (although this is a choice too).
- I see the word “video” is also prominent. That’s probably not as good. Might be a sign I’m talking a little too much about the videos I’ve been watching.
- Should there be a “not” here? “It’s not even attention?” ↩︎
- No migration script — If you can get away with not writing a migration script, then this is the preferred option. Of course, this will depend on how much data you’ll need to migrate, and how complicated it is to keep support for the previous version in your code-base. If the amount of data is massive (we’re talking millions or hundreds of millions of rows), then this is probably your only option. On the other hand, if there’s only a few hundred or a few thousand, then it’s probably just worth migrating the data.
- Always indicate progress — You’re likely going to have way more data in prod than in your dev environments, so consider showing ongoing progress of the running script. If there are multiple stages in the migration process, make sure you log when each stage begins. If you’re running a scan or processing records, then give some indication of progress through the collection of rows. A progress bar is nice, but failing that, include a log message, say, every 1,000 records or so (see the sketch after this list).
- Calculate expected migration size if you can — If it’s relatively cheap to get a count of the number of records that need to be migrated, then it’s helpful to report this to the user. Even an estimate would be good, just to give a sense of magnitude. If it’ll be too expensive to do so, then you can ignore it: better to just get migrating rather than have the user wait for a count.
- Silence is golden — Keep logging to the screen to a minimum: mainly progress indicators, plus any serious warnings or errors. Avoid bombarding the user with spurious log messages. They want to know when things go wrong; otherwise, they just want to know that the script is running properly. That said:
- Log everything to a file — If the script is migrating data but will ignore records that have already been migrated, then log the records that were skipped. What you’re looking for is assurance that all records have been dealt with, meaning that any discrepancy with the summary report (such as max records encountered vs. max records migrated) can be reconciled with the log file.
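To make the progress and size-calculation points a little more concrete, here’s a minimal sketch in Go. The table and column names are made up for the example; a real script would obviously look different:

package migration

import (
	"database/sql"
	"log"
)

// Migrate walks over the un-migrated records, reporting progress as it goes.
func Migrate(db *sql.DB) error {
	// Calculate the expected migration size up front, if it's cheap to do so.
	var total int
	if err := db.QueryRow("SELECT COUNT(*) FROM old_records WHERE migrated = FALSE").Scan(&total); err != nil {
		return err
	}
	log.Printf("starting migration: %d records to migrate", total)

	rows, err := db.Query("SELECT id FROM old_records WHERE migrated = FALSE")
	if err != nil {
		return err
	}
	defer rows.Close()

	migrated := 0
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			return err
		}

		// ... migrate record id here ...

		// Indicate progress with a log message every 1,000 records or so.
		migrated++
		if migrated%1000 == 0 {
			log.Printf("progress: %d/%d records migrated", migrated, total)
		}
	}
	log.Printf("done: %d of %d records migrated", migrated, total)
	return rows.Err()
}

Logging everything to a file as well can be handled by pointing the standard logger at an io.MultiWriter over stderr and the log file.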
- I cringe every time I see society bend to the limitations of the software they use. It shouldn’t be this way; the software should serve the user, not the other way around.
- I appreciate a well designed API. Much of my job is using APIs built by others, and the good ones always feel natural to use, like water flowing through a creek. Conversely, a badly designed API makes me want to throw my laptop to the ground.
- I think a well designed standard is just as important as a well designed API. Thus, if you’re extending the standard in a way that adds a bunch of exceptions to something that’s already there, you may want to reflect on your priorities and try an approach that doesn’t do that.
- I also try to appreciate, to varying levels of success, that there are multiple ways to do something and once all the hard and fast requirements are settled, it usually just comes down to taste. I know what appeals to my taste, but I also (try to) recognise that others have their own taste as well, and what appeals to them may not gel with me. And I just have to deal with it. I may not like it, but sometimes we have to deal with things we don’t like.
- I believe a user’s home directory is their space, not yours. And you better have a bloody good reason for adding stuff there that the user can see and didn’t ask for.
- Although not by much. ↩︎
- Younger me would be shocked to learn that I’d favour a WYSIWYG editor over a text editor with Markdown support. ↩︎
A Follow-Up To Mockless Unit Testing
I’m sure everyone’s dying to hear how the mockless unit tests are going. It’s been almost two months since we started this service, and we’re smack bang in the middle of brownfield iterative development: adding new features to existing ones, fixing bugs, etc. So it seems like now is a good time to reflect on whether this approach is working or not.
And so far, it’s been going quite well. The amount of code we have to modify when refactoring or changing existing behaviour is dramatically smaller than before. Previously, when a service introduced a new method call, every single test for that service needed to be changed to handle the new mock assertions. Now, in most circumstances, it’s only one or maybe two tests that need to change. This has made maintenance so much easier, and although I’m not sure it’s made us any faster, it just feels faster. Probably because there’s less faffing around with unrelated tests that broke due to the updated mocks.
I didn’t think of it at the time, but it’s made code reviews easier too. The old way meant longer, noisier PRs which — and I know this is a quality of mine that I need to work at — I usually ignore (I know, I know, I really shouldn’t). With the reviews being smaller, I’m much more likely to keep atop of them, and I attribute this to the way in which the tests are being written.
Code hygiene plays a role here. I got into the habit of adding test helpers to each package I work on. Much like the package is responsible for fulfilling the contract it has with its dependants, so too is it responsible for providing testing facilities for those dependants. I found this to be a great approach. It simplified the level of setup each dependant package needed to do in their tests, reducing the amount of copy and pasted code and, thus, containing the “blast radius” of logic changes. It’s not perfect — there are a few tests where setup and teardown were simply copied-and-pasted from other tests — but it is better.
We didn’t quite get rid of all the mocking though. Tests that exercise the database and message bus do so by calling out to these servers running in Docker, but this service also had to make calls to another worker. Since we didn’t have a test service available to us, we just implemented this using old-school test mocks. The use of the package test helpers did help here: instead of having each test declare the expected calls on this mock, the helper just made “maybe” calls to each of the service methods, and provided a way to assert what calls were recorded.
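To give a rough idea of what I mean, here’s a sketch of such a helper using testify’s mock package. The worker client and its method are made up for the example; the real service looks nothing like this:

package workertest

import "github.com/stretchr/testify/mock"

// FakeWorker is a stand-in for the external worker service.
type FakeWorker struct {
	mock.Mock
}

func (f *FakeWorker) DoWork(id string) error {
	args := f.Called(id)
	return args.Error(0)
}

// New returns a fake with "maybe" expectations already registered, so
// individual tests don't need to declare every expected call up front.
func New() *FakeWorker {
	f := &FakeWorker{}
	f.On("DoWork", mock.Anything).Maybe().Return(nil)
	return f
}

// CallsTo reports how many times the named method was recorded, giving tests
// a way to assert on what actually happened after the fact.
func (f *FakeWorker) CallsTo(method string) int {
	n := 0
	for _, c := range f.Calls {
		if c.Method == method {
			n++
		}
	}
	return n
}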
Of course, much like everything, there are trade-offs. The tests run much slower now, thanks to the database setup and teardown, and we had to lock them down to run on a single thread. It’s not much of an issue now, but we could probably mitigate this by using random database names, rather than having all the tests run against the same one. Something that would be less easy to solve is the tests around the message bus, which do need to wait for messages to be received and handled. There might be a way to simplify this too, but since the tests are verifying the entire message exchange, it’d probably be a little more involved.
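For what it’s worth, here’s a sketch of what that random database name idea might look like, assuming Postgres and the lib/pq driver (the connection strings and naming scheme are illustrative only):

package testdb

import (
	"database/sql"
	"fmt"
	"testing"
	"time"

	_ "github.com/lib/pq"
)

// New creates a uniquely-named database for the calling test, so tests no
// longer need to be serialised against a single shared database.
func New(t *testing.T) *sql.DB {
	t.Helper()

	admin, err := sql.Open("postgres", "postgres://localhost/postgres?sslmode=disable")
	if err != nil {
		t.Fatal(err)
	}

	name := fmt.Sprintf("test_%d", time.Now().UnixNano())
	if _, err := admin.Exec("CREATE DATABASE " + name); err != nil {
		t.Fatal(err)
	}

	db, err := sql.Open("postgres", "postgres://localhost/"+name+"?sslmode=disable")
	if err != nil {
		t.Fatal(err)
	}

	t.Cleanup(func() {
		db.Close()
		admin.Exec("DROP DATABASE " + name)
		admin.Close()
	})
	return db
}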
Another trade-off is that it does feel like you’re repeating yourself. We have tests that check that items are written to the database correctly for the database provider, the service layer, and the handler layer. Since we’re writing tests that operate over multiple layers at a time, this was somewhat expected, but I didn’t expect it to be as annoying as I found it to be. Might be that a compromise is to write the handler tests to use mocks rather than call down to the service layer. Those tests really only validate whether the handler converts the request and response to the models correctly, so best to isolate that there, and leave the tests asserting whether the business rules are correct to the service layer.
So if I were to do this again, that’d be the only thing I’d change. But, on the whole, it’s been really refreshing writing unit tests like this. And this is not just my own opinion; I asked my colleague, who has told me how difficult it’s been maintaining tests with mocks, and he agrees that this new way has been an improvement. I’d like to see if it’s possible to do this for all other services going forward.
On the Easy Pit To Fall Into
From Matt Bircher’s latest post on Birchtree:
One of the hard parts about sharing one’s opinions online like I do is that it’s very easy to fall into the trap of mostly complaining about things.
This is something I also think about. While I haven’t done anything scientific to know what my ratio of posting about things I like vs. things I don’t, I feel like I’m getting the balance better. It might still be weighted too much on writing about the negatives, but I am trying to write more about things I think are good.
I do wonder, though, why it’s so easy to write about things you hate. Matt has a few theories regarding the dynamics of social media, but I wonder if it’s more about someone’s personal experience with the thing in question. You hear about something that you think would be worth a try. I doubt many people would actually try something they know they’re going to dislike; if that were the case, they wouldn’t try it at all. So I’m guessing that there’s some expectation that you’ll like the thing.
So you start experiencing the thing, and maybe it all goes well at first. Then you encounter something you don’t like about it. You make a note of it and keep going, only to encounter another thing you don’t like, then another. You eventually get to the point where you’ve had enough, and you decide to write about it. And lo, you’ve got this list of paper-cuts that can easily be used as arguments as to why the thing is no good.
Compare this to something that you do end up liking. You can probably come up with a list of things that are good about it, but you’re less likely to take note of them while you’re experiencing the thing. You just experience them, and they flow through you like water. When the time comes to write about it, you can recall liking the plot, or this character, etc., but they’re more nebulous and it takes effort to solidify them into a post. The drive to find the path of least resistance prevails, and you decide that it’s enough to just like it.
Anyway, this is just a hypothesis. I’m not a psychologist and I’ve done zero research to find out if any of this is accurate. In the end, this post might simply describe why my posting seems to be more weighted towards things I find annoying.
A Tour Of My New Self-Hosted Code Setup
While working on the draft for this post, a quote from Seinfeld came to mind that I thought was quite an apt description of this little project:
Breaking up is knocking over a Coke machine. You can’t do it in one push. You gotta rock it back and forth a few times and then it goes over.
I’ve been thinking about “breaking up” with Github on and off for a while now. I know I’m not the only one: I’ve seen a few people online talk about leaving Github too. They have their own reasons for doing so: some because of AI, others are just not fans of Microsoft. For me, it was getting bitten by the indie-web bug and wanting to host my code on my own domain name. I have more than a hundred repositories in Github, and that single github.com/lmika
namespace was getting quite crowded. Being able to organise all these repositories into groups, without fear of collisions or setting up new accounts, was the dream.
But much like the Seinfeld quote, it took a few rocks of that fabled Coke machine to get going. I dipped my toe in the water a few times: launching Gitea instances in PikaPod, and also spinning up a Gitlab instance in Linode during a hackathon just to see how it would feel to manage code that way. I knew it wouldn’t be easy: not only would I be paying more for doing this, it would involve a lot of effort up front (and on an ongoing basis), and I would be taking on the responsibility of backups, keeping CI/CD workers running, and making sure everything is secured and up-to-date. Not difficult work, but still an ongoing commitment.
Well, if I was going to do this at all, it was time to do it for real. I decided to set up my own code hosting properly this time, complete with CI/CD runners, all hosted under my own domain name. And well, that Coke machine is finally on the floor. I’m striking out on my own.
Let me give you a tour of what I have so far.
Infrastructure
My goal was to have a setup with the following properties:
For the SCM, I settled on Forgejo, which is a fork of Gitea, as it seemed like the one that required the least amount of resources to run. When I briefly looked at doing this a while back, Forgejo didn’t have anything resembling GitHub Actions, which was a non-starter for me. But Actions are now in Forgejo at an alpha, preview, don’t-use-it-for-anything-resembling-production level of support, and I was curious to know how well they worked, so it was worth trying it out.
I did briefly look at Gitea’s hosted solution, but it was relatively new and I wasn’t sure how long their operations would last. At least with self-hosting, I can choose to exit on my own terms.
It was difficult thinking about how much I was willing to budget for this, considering that it’ll be more than what I’m currently paying for GitHub, which is about $9 USD /month ($13.34 AUD /month). I settled for a budget of around $20.00 AUD /month, which is a bit much, but which I think would give me something I’d be happy with without breaking the bank.
I first had a go at seeing what Linode had to offer for that kind of money. A single virtual CPU with 2 GB RAM and 50 GB storage costs around $12.00 USD /month ($17.79 AUD /month). This would be fine if it were just the SCM, but I also want something to run CI/CD jobs. So I then took a look at Hetzner. Not only do they charge in Euros, which works in my favour as far as currency conversions go, but their shared-CPU virtual servers were much cheaper. A server with the same specs could be had for only a few Euros.
So after a bit of looking around, I settled for the following bill of materials:
This came to €10.21, which was around $16.38 AUD /month. Better infrastructure for a cheaper price is great in my books. The only downside is that they don’t have a data-centre presence in Australia. I settled for the default placement of Falkenstein, Germany and just hoped that the latency wouldn’t be so bad as to be annoying.
Installing Forgejo
The next step was setting up Forgejo. This can be done using official channels by either downloading a binary, or by installing a Docker image. But there’s also a forgejo-contrib repository that distributes it via common package types, with Systemd configurations that launch Forgejo on startup. Since I was using Ubuntu, I downloaded and installed the Debian package.
Probably the easiest way to get started with Forgejo is to use the version that comes with Sqlite, but since this is something that I’d rather keep for a while, I elected to use Postgres for my database. I installed the latest Ubuntu distribution of Postgres, and set up the database as per the instructions. I also made sure the mount point for the volume was ready, and created a new directory with the necessary owner and permissions so that Forgejo could write to it.
At this point I was able to launch Forgejo and go through the first launch experience. This is where I configured the database connection details, and set the location of the repository and LFS data (I didn’t take a screenshot at the time, sorry). Once that was done, I shut the server down again as I needed to make some changes within the config file itself:
After that, it was a matter of setting up the reverse proxy. I decided to use Caddy for this, as it comes with HTTPS out of the box. This I also installed as a Debian package. Configuring the reverse proxy by changing the Caddyfile deployed in /etc was a breeze, and after making the changes and starting Caddy, I was able to access Forgejo via the domain I set up.
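For the record, the Caddyfile change amounts to little more than this (the domain here is a stand-in, not the one I’m actually using):

code.example.com {
    reverse_proxy 127.0.0.1:3000
}

Caddy takes care of provisioning the TLS certificate for the domain automatically, which is most of the reason I chose it.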
One quick note about performance: although logging in via SSH was a little slow, I had no issues with the speed of accessing Forgejo via the browser.
The Runners
The next job was setting up the runners. I thought this was going to be easier than setting up Forgejo itself, but I did run into a few snags which slowed me down.
The first was finding out that a Hetzner VM running without a public IP address actually doesn’t have any route to the internet, only the local network. The way to fix this is to set up one of the hosts which does have a public IP address to act as a NAT gateway. Hetzner has instructions on how to do this, and after following a hybrid of both the Ubuntu 20.04 and Ubuntu 22.04 instructions, I was able to get the runner host online via the Forgejo host. Kinda wish I knew about this before I started.
For the runners, I elected to go with the Docker-based setup. Forgejo has pretty straightforward instructions for setting them up using Docker Compose, and I changed it a bit so that I could have two runners running on the same host.
Setting up the runners took multiple attempts. The first attempts failed when Forgejo couldn’t locate any runners for an organisation to use. I’m not entirely sure why this was, as the runners were active and were properly registered with the Forgejo instance. It could be magical thinking, but my guess is that it was because I didn’t register the runners with an instance URL that ended with a slash. It seems like it’s possible to register runners that are only available to certain organisations or users. Might be that there’s some bit of code deep within Forgejo that’s expecting a slash to make the runners available to everyone? Not sure. In either case, after registering the runners with the trailing slash, the organisations started to recognise them.
The other error was seeing runs fail with the error message cannot find: node in PATH. This resolved itself after I changed the runs-on label within the action YAML file itself from linux to docker. I wasn’t expecting this to be an issue — I thought the runs-on field was used to select a runner based on their published labels, and that docker was just one such label. The Forgejo documentation was not super clear on this, but I got the sense that the docker label was special in some way. I don’t know. But whatever, I can use docker in my workflows.
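For illustration, here’s roughly what a working workflow looks like with that label. This is a sketch following the usual Actions syntax rather than a copy of my actual workflow:

# .forgejo/workflows/build.yaml (illustrative)
name: Build
on: [push]

jobs:
  build:
    runs-on: docker
    steps:
      - uses: actions/checkout@v4
      - name: Build and test
        run: |
          go build ./...
          go test ./...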
Once these battles were won, the runners were ready, and I was able to build and test a Go package successfully. One last annoying thing is that Forgejo doesn’t enable runners by default for new repositories — I guess because they’re still considered an alpha release. I can live with that in the short term, or maybe there’s some configuration I can enable to always have them turned on. But in either case, I’ve now got two Linux runners working.
MacOS Runner And Repository Backup
The last piece of the puzzle was to set up a MacOS runner. This is for the occasional MacOS application I’d like to build, but it’s also to run the nightly repository backups. For this, I’m using a Mac Mini currently being used as a home server. This has an external hard drive connected, with online backups enabled, which makes it a perfect target for a local backup of Forgejo and all the repository data should the worst come to pass.
Forgejo doesn’t have an official release of a MacOS runner, but Gitea does, and I managed to download a MacOS build of act_runner and deploy it onto the Mac Mini. Registration and a quick test with the runner running in the foreground went smoothly. I then went through the process of setting it up as a MacOS launch agent. This was a pain, and it took me a couple of hours to get working. I won’t go through every issue I encountered, mainly because I couldn’t remember half of them, but here’s a small breakdown of the big ones:
It took me most of Saturday morning, but in the end I managed to get this MacOS runner working properly. I’ve not done anything MacOS-specific yet, so I suspect I may have some Xcode-related stuff to do, but the backup job is running now and I can see it writing stuff to the external hard drive.
The backup routine itself is a simple Go application that’s kicked off daily by a scheduled Forgejo Action (it’s not in the documentation yet, but the version of Forgejo I deployed does support scheduled actions). It makes a backup of the Forgejo instance, the PostgreSQL database, and all the repository data using SSH and Rsync.
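As a rough illustration, the trigger for that scheduled action looks something like the following. The schedule, runner label, and script name are placeholders; the actual workflow contains a bit more than this:

# .forgejo/workflows/backup.yaml (illustrative)
name: Nightly backup
on:
  schedule:
    - cron: '0 16 * * *'   # daily, in UTC

jobs:
  backup:
    runs-on: macos
    steps:
      - name: Run the backup
        run: ./backup.sh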
I won’t share these repositories as they contain references to paths and such that I consider sensitive; but if you’re curious about what I’m using for the launch agent settings, here’s the plist file I’ve made:
<!-- dev.lmika.repo-admin.macos-runner.plist -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>dev.lmika.repo-admin.macos-runner</string>
<key>ProgramArguments</key>
<array>
<string>/Users/lmika/opt/macos-runner/scripts/run-runner.sh</string>
</array>
<key>KeepAlive</key>
<true/>
<key>RunAtLoad</key>
<true/>
<key>StandardErrorPath</key>
<string>/tmp/runner-logs.err</string>
<key>StandardOutPath</key>
<string>/tmp/runner-logs.out</string>
</dict>
</plist>
This is deployed by copying it to $HOME/Library/LaunchAgents/dev.lmika.repo-admin.macos-runner.plist, and then installed and enabled by running these commands:
launchctl bootstrap gui/$UID "Library/LaunchAgents/dev.lmika.repo-admin.macos-runner.plist"
launchctl kickstart gui/$UID/dev.lmika.repo-admin.macos-runner
The Price Of A Name
One might see this endeavour, when viewed from a pure numbers and effort perspective, as a bit of a crazy thing to do. Saying “no” to all this cheap code hosting, complete with the backing of a large corporation, just for the sake of a name? I can’t deny that this may seem a little unusual, even a little crazy. After all, it’s more work and more money. And I’m not going to suggest that others follow me into this realm of a self-hosted SCM.
But I think my code deserves its own name now. After all, my code is my work; and much like we encourage writers to write under their own domain name, or artists and photographers to move away from the likes of Instagram and other such services, so too should my work be under a name I own and control. The code I write may not be much, but it is my own.
Of course, I’m not going to end this without my usual “we’ll see how we go” hedge against myself. I can only hope I’ve put enough safeguards in place to save me from my own decisions, or to easily move back to a hosted service, when things go wrong or when it all becomes just a bit much. More on that in the future, I’m sure.
A Bit of 'Illuminating' Computer Humour
Here’s some more computer-related humour to round out the week:
How many software developers does it take to change a lightbulb? Just one.
How many software developers does it take to change 2 lightbulbs? Just 10.
How many software developers does it take to change 7 lightbulbs? One, but everyone within earshot will know about it.
How many software developers does it take to change 32 lightbulbs? Just one, provided the space is there.
How many software developers does it take to change 35 lightbulbs? Just one. #lightbulbs
How many software developers does it take to change 65 lightbulbs? Just one, if they’re on their A grade.
How many software developers does it take to change 128 lightbulbs? Just one, but they’ll be rather negative about it.
How many software developers does it take to change 256 lightbulbs? What lightbulbs?
Enjoy your Friday.
Asciidoc, Markdown, And Having It All
Took a brief look at Asciidoc this morning.
This is for that Markdown document I’ve been writing in Obsidian. I’ve been sharing it with others using PDF exports, but its importance has grown to a point where I need to start properly maintaining a change log. And also… sharing via PDF exports? What is this? Microsoft Word in the 2000s?
So I’m hoping to move it to a Gitlab repo. Gitlab does support Markdown with integrated Mermaid diagrams, but not Obsidian’s extension for callouts. I’d like to be able to keep these callouts as I used them in quite a few places.
While browsing through Gitlab’s help guide on Markdown extensions, I came across their support for Asciidoc. I haven’t tried Asciidoc before, and after taking a brief look at it, it seemed like a format better suited to the type of document I’m working on. It has things like an auto-generated table of contents, built-in support for callouts, and proper title and heading separation; just features that work better than Markdown for long, technical documents. The language syntax also supports a number of text-based diagram formats, including Mermaid.
However, as soon as I started porting the document over to Asciidoc, I found that it’s no Markdown in terms of mind share. Tool support is quite limited; in fact, it’s pretty bad. There’s nothing like iA Writer for Asciidoc, with the split-screen source text and live preview that updates when you make changes. There are loads of these tools for Markdown, so many that I can’t keep track of them (the name of the iA Writer alternative always eludes me).
Code editors should work, but they’re not perfect either. GoLand supports Asciidoc, but not with embedded Mermaid diagrams. At least not out of the box: I had to get a separate JAR which took around 10 minutes to download. Even now I’m fighting with the IDE, trying to get it to find the Mermaid CLI tool so it can render the diagrams. I encountered none of these headaches when using Markdown: GoLand supports embedded Mermaid diagrams just fine. I guess I could try VS Code, but to download it just for this one document? Hmm.
In theory the de-facto CLI tool should work, but in order to get Mermaid diagrams working there I need to download a Ruby gem and bundle it with the CLI tool (this is in addition to the same Mermaid command-line tool GoLand needs). Why this isn’t bundled by default in the Homebrew distribution is beyond me.
So for now I’m abandoning my wish for callouts and just sticking with Markdown. This is probably the best option, even if you set tooling aside. After all, everyone knows Markdown, a characteristic of the format that I shouldn’t simply ignore. Especially for these technical documents, where others are expected to contribute changes as well.
It’s a bit of a shame though. I still think Asciidoc could be better for this form of writing. If only those that make writing tools would agree.
Addendum: after drafting this post, I found that Gitlab actually supports auto-generated table of contents in Markdown too. So while I may not have it all with Markdown — such as callouts — I can still have a lot.
My Position On Blocking AI Web Crawlers
I’m seeing a lot of posts online about sites and hosting platforms blocking web crawlers used for AI training. I can completely understand their position, and fully support them: it’s their site and they can do what they want.
Allow me to lay my cards on the table. My current position is to allow these crawlers to access my content. I’m choosing to opt in, or rather, not to opt out. I’m probably in the minority here (well, the minority of those I follow), but I do have a few reasons for this, with the principal one being that I use services like ChatGPT and get value from them. So to prevent them from training their models on my posts feels personally hypocritical to me. It’s the same reason why I don’t opt out of Github Copilot crawling my open source projects (although that’s a little more theoretical, as I’m not a huge user of Copilot). To some, this position might sound weird, and when you consider the gulf between what value these AI companies get from scraping the web versus what value I get from them as a user, it may seem downright stupid. And if you approach it from a logical perspective, it probably is. But hey, we’re in the realm of feelings, and right now this is just how I feel. Of course, if I were to make a living out of this site, it would be a different story. But I don’t.
And this leads to the tension I see between site owners making decisions regarding their own content, and services making decisions on behalf of their users. This site lives on Micro.blog, so I’m governed by what Manton chooses to do or not do regarding these crawlers. I’m generally in favour of what Micro.blog has chosen so far: allowing people to block these scrapers via “robots.txt” but not yet blocking requests based on their IP address. I’m aware that others may not agree, and I can’t, in principle, reject the notion of a hosting provider choosing to block these crawlers at the network layer. I am, and will continue to be, a customer of such services.
But I do think some care should be taken, especially when it comes to customers (and non-customers) asking these services to add these network blocks. You may have good reason to demand this, but just remember there are users of these services whose opinions may differ. I personally would prefer a mechanism where you opt into these crawlers, and this would be an option I’d probably take (or probably not; my position is not that strong). I know that’s not possible under all circumstances, so I’m not going to cry too much if this was not offered to me in lieu of a blanket ban.
I will make a point about some comments I’ve seen that, if taken in an uncharitable way, imply that creators who have no problem with these crawlers do not care about their content. I think such opinions should be worded carefully. I know how polarising the use of AI currently is, and making such remarks, particularly within posts that are already heated due to the author’s feelings regarding these crawlers, risks spreading this heat to those that read them. The tone gives the impression that creators okay with these crawlers don’t care about what they push online, or should care more than they do. That might be true for some — it might even be true for me once in a while — but making such blanket assumptions can come off as a little insulting. And look, I know that’s not what they’re saying, but it can come across that way at times.
Anyway, that’s my position as of today. Like most things here, this may change over time, and if I become disenchanted with these companies, I’ll join the blockade. But for the moment, I’m okay with sitting this one out.
Thinking About Plugins In Go
Thought I’d give Go’s plugin package a try for something. Seems to work fine for the absolutely simple things. But start importing any dependencies and it becomes a non-starter. You start seeing these sorts of error messages when you try to load the plugin:
plugin was built with a different version of package golang.org/x/sys/unix
Looks like the host and plugins need to have exactly the same dependencies. To be fair, the package documentation says as much, and also states that the best use of plugins is for dynamically loaded modules built from the same source. But that doesn’t help me with what I’m trying to do, which is encoding a bunch of private struct types as Protobuf messages.
So it might be that I’ll need to find another approach. I wonder how others would do this. An embedded scripting language would probably not be suitable for this, since I’m dealing with Protobuf and byte slices. Maybe building the plugin as a C shared object? That could work, but then I’d lose all the niceties that come from using Go’s type system.
Another option would be something like WASM. It’s interesting seeing WASM modules becoming a bit of a thing for plugin architectures. There’s even a Go runtime to host them. The only question is whether they would have the same facilities as a regular process would have, like network access; or whether they’re completely sandboxed, and you as the plugin host would need to add support for these facilities.
I guess I’d find out if I were to spend any more time looking at this. But this has been a big enough distraction already. Building a process to shell out to would work just fine, so that’s probably what I’ll ultimately do.
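If I do go down that path, the shape of it is simple enough. Here’s a sketch of shelling out to a hypothetical helper binary (the binary name and the stdin/stdout protocol are made up for the example):

package encoder

import (
	"bytes"
	"context"
	"os"
	"os/exec"
)

// encodeViaHelper runs a separate helper process, writing the raw input to
// its stdin and reading the encoded result back from its stdout.
func encodeViaHelper(ctx context.Context, input []byte) ([]byte, error) {
	cmd := exec.CommandContext(ctx, "encode-helper")
	cmd.Stdin = bytes.NewReader(input)

	var out bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}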
Word Cloud
From Seth’s blog:
Consider building a word cloud of your writing.
Seems like a good idea so that’s what I did, taking the contents of the first page of this blog. Here it is:
Some observations:
Anyway, I thought these findings were quite interesting. One day, I’ll have to make another word cloud across all the posts on this blog.
Day Trip to Bundanoon
Decided to go on a day trip to Bundanoon today. It’s been five years since I last visited, and I remember liking the town enough that I thought it’d be worth visiting again. It’s not close, around 1 hour and 40 minutes from Canberra, but it’s not far either, and I thought it would be a nice way to spend the day. Naturally, others agreed, which I guess explains why it was busier than I expected, what with the long weekend and all. Fortunately, it wasn’t too crowded, and I still had a wonderful time.
The goal was to go on a bush-walk first. I chose to do the Erith Coal Mine track, for no particular reason other than it sounded interesting. This circuit track was meant to take you to a waterfall by an old coal mine. However, the track leading to the actual mine was closed, thanks to the recent rain. In fact, if I could describe the bush-walks in one word, it would be “wet”. The ground was soaked, as were the paths, and although conditions were lovely, the paths were still very slippery.
After completing that circuit in probably 45 minutes, my appetite for bush-walking was still unsatisfied, so I tried the Fairy Bower Falls walk next. This was not as steep as the first one, but it turned out to be a much harder track due to how wet and slippery everything was.
I stopped short of the end of this one too, as it seemed the path had been washed away. But I did manage to get a glimpse of the waterfall, so I’m considering that a win.
After that, I returned to the town for lunch and some train spotting. The train line to Goulburn runs through Bundanoon, and the last time I was there, there was probably a freight train every hour or so. So I was hoping to get a good view of a lot of freight traffic, and maybe shoot a video of a train passing through the station that I could share here.
I had lunch outside and walked around the town a little, always within sight of the railway line, hoping for at least one train to pass through. But luck wasn’t on my side, and it wasn’t until I was on my way home that I saw what I think was a grain train passing through Wingello. I pulled over to take a video, and while I missed the locomotive, I got a reasonable enough recording of the wagons.
Being a little more hopeful, I stopped at Tallong, the next town along the road. I bought a coffee and went to the station to drink it and hopefully see a train pass through. Sadly, it was not to be. So I decided to head back home.
So the train spotting was a bust, and the bush-walks were difficult, but all in all it was quite a nice day. I look forward to my next visit to Bundanoon. Let’s hope the trains are running a little more frequently then.
An Unfair Critique Of OS/2 UI Design From 30 Years Ago
A favourite YouTube channel of mine is Michael MJD, who likes to explore retro PC products and software from the 90s and early 2000s. Examples of these include videos on Windows 95, Windows 98, and the various consumer tech products designed to get people online. Can I just say how interesting those times were, where phrases such as “surfing the net” were thrown about, and where shopping centres were always used to explain visiting websites. I guess it was the best analogy one could use at the time.
A staple of Michael MJD’s channel is when he installs an old operating system onto old hardware1. Yesterday, I watched the one where he installed OS/2 Warp 4 onto a 98 PC. We were an OS/2 household back when I was growing up, thanks to my dad using it for work, and I can remember using OS/2 2.1 and thinking it was actually pretty good. Better than Windows 95, in fact. I can’t remember if I ever used Warp 4, though.
Anyway, while watching this video, I was taken aback by how bad the UI design of OS/2 Warp 4 was. And really, I probably shouldn’t be throwing stones here: I’m not a great UI designer myself. But I guess my exposure to later versions of Windows and macOS matured my tastes somewhat, as I got exposed to the ideas of interaction systems and user experience design (and generally just grew up). Obviously, given how new the GUI was back then, many of these concepts were still in their infancy, although if you were to compare these UIs to the classic Mac or even Windows 3.1, I do think there was something missing in IBM’s design acumen. Was it ability? Interest? Care? Not sure. But given that it’s been 30 years, I’m not expecting the OS/2 devs to be able to defend themselves now. That’s what makes this critique wholly unfair.
Anyway, I thought I’d share some of the stills from this video containing the more cringeworthy UI designs2, along with my remarks. Enjoy.
Some More Thoughts On Unit Testing
Kinda want to avoid this blog descending into a series of “this is what’s wrong with unit testing” posts, but something did occur to me this morning. We’ve kicked off a new service at work recently. It’s just me and one other developer working on it at the moment, and it’s given us the opportunity to try out this “mockless” approach to testing, which I ranted about a couple of weeks ago (in fact, the other developer is the person I had that discussion with). And it’s probably no surprise, but I’m finding writing tests this way to be a much nicer experience already.
And I think I’ve come to the realisation that the issue is not so much with mocking itself. Rather, it’s the style of testing that it encourages. When you’re testing against “real” services, you’re left with treating them as a black box. There’s no real way to verify your code is working correctly other than letting it interact with these services as it would, and then “probing” them in some way — running queries, waiting for messages to arrive at topics, etc. — to know whether the interaction worked. You can’t just verify this by intercepting the various calls made by the service (well, you can, but it would be difficult to do).
There’s nothing about mocking that inhibits this style of testing. You can use mocks to simulate a message broker by storing the messages in an in-memory list, for example. What it does do, however, is make it easier to write tests that simply intercept the calls of the service and verify that they were made. It’s less upfront work than setting up a real client, or simulating a message broker, but now you’ve tied your tests to your implementation. You may feel like you’ve saved time and effort now, but really you’ve just deferred it for later, when you’ll need to fix your tests after you’ve changed your implementation.
I know this is stuff I’ve said before, so I’ll just stop here, and end by saying that I’m excited to seriously try out this approach to writing unit tests. Is it a better approach than using mocks? I guess time will tell. It’s been my experience that it’s when you need to refactor things in your service that you find out how good your tests are. So I guess we’ll check back in about six months or so.
Don't Leave User Experience For Later
DHH wrote a post yesterday that resonates with me. This is how he opens:
Programmers are often skeptical of aesthetics because they frequently associate it with veneering
I doubt DHH reads this blog, but he could’ve addressed this post directly to me. I’m skeptical about aesthetics. Well… maybe not skeptical, but if we’re talking about personal projects, I do consider it less important than the functional side of things. Or at least I did.
He continues:
Primary reason I appreciate aesthetics so much is its power to motivate. And motivation is the rare fuel that powers all the big leaps I’ve ever taken in my career and with my projects and products. It’s not time, it’s even attention. [sic1] It’s motivation. And I’ve found that nothing quite motivates me like using and creating beautiful things.
He was mainly talking about the code design, but I think this extends to the project’s UI and UX. Particularly if you’re building it for yourself. Maybe especially if you’re building it for yourself.
And this is where my story begins. From the start, I’ve been putting off any task that would improve the user experience of one of my side projects. I considered such tasks unnecessary, or certainly less important than the “functional” side of things. Whenever faced with a decision on what part to work on next, the user experience work was left undone, usually with a thought along the lines of “eh, it’s UI stuff, I’ll do it later.”
But I think this was a mistake. Since I was actually using this tool, I was exposed to the clunky, unfinished UI whenever I needed to do something with it. And it turns out that no matter how often you tell yourself you’ll fix it later, a bad UI is still a bad UI, and it affects how the tool feels to use. And let me tell you: it didn’t feel good at all. In fact, I detested it so much that I thought about junking it altogether.
It was only when I decided to add a bit of polish that things improved. And it didn’t take much: just fixing the highlights of the nav to reflect the current section, for example. But it was enough, and the improved UI made it feel better to use, which motivated me to get back to working on it again.
So I guess the takeaway is similar to the point DHH made in his post, which is that if something feels good to use, you’re more likely to work on it. And sure, you can’t expect to have a great user experience out of the box: I’ll have to work through some early iterations of it as I build things out. But I shouldn’t ignore user experience completely, or defer it until “later”. If it’s something I’m using myself, and it doesn’t feel good to use, it’s best to spend some time on it now.
Writing Good Data Migration Scripts
I’m waiting for a data migration to finish, so I’ve naturally got migration scripts on my mind.
There’s an art to writing a good migration script. It may seem that simply throwing together a small Python script would be enough; and for the simpler cases, it very well might be. But it’s been my experience that running the script in prod is likely to be very different than doing test runs in dev.
There are a few reasons for this. For one thing, prod is likely to have way more data, so there will always be more to migrate. And dealing with production data is always going to be a little more stressful than in non-prod, especially when you consider things like restricted access and the impact of things going wrong. So a script with a better “user experience” is always going to be better than one slapped together.
So without further ado, here are the attributes that I think make for a good migration script:
May your migrations be quiet and painless.
On Sharing Too Much About Too Little
Manuel Moreale wrote an interesting post today about sharing stuff online:
Life can be joyful and wonderful and marvellous. But it can also be a fucking nightmare. And yes, it’s important to celebrate the victories and to immortalise the glorious moment. But it’s also important to document the failures, the shitty moments, the dark places our minds find themselves stuck in. It’s all part of what makes us unique after all.
I can certainly appreciate those who are willing to share both the ups and downs with others online. But I know that, for myself, I’d rather not, for a couple of reasons. The first being that there’s already so much dark stuff online: why add to that? And the second being that, along with being a public journal of most of my day, this site is a bit of an escape as well: a refuge that I can visit when the world gets a little too much for me.
And it may seem that what is posted here is exclusively what I feel, think, or do during the day, but that couldn’t be further from the truth. Shitty things happen to me; I have dark thoughts, fits of despair, and general periods of malaise. Those get documented too, but in a private journal that’s not for public consumption.
That’s not to say that others should do likewise. I’m not here to tell you what to post, and why: you do you. This is just me explaining how and why I post what I post. Maybe one day this will change, but until then, that’s how I like to do things.
As Someone Who Works In Software
As someone who works in software…
The Perfect Album
The guys on Hemispheric Views have got me blogging once again. The latest episode brought up the topic of the perfect album: an album that you can “just start from beginning, let it run all the way through without skipping songs, without moving around, just front to back, and just sit there and do nothing else and just listen to that whole album”.
Well, having crashed Hemispheric Views once, I thought it was time once again to give my unsolicited opinion on the matter. But first, some comments on some of the suggestions made on the show.
I’ll start with Martin’s suggestion of the Cat Empire. I feel like I should like Cat Empire more than I currently do. I used to know someone who was fanatical about them. He shared a few of their songs when we were jamming out — we were in a band together — and on the whole I thought they were pretty good. They’re certainly a talented bunch of individuals. But it’s not a style of music that gels with me. I’m just not a huge fan of ska, which is funny considering the band we were both in was a ska band.
I feel like I haven’t given Radiohead a fair shake. Many people have approached me and said something along the lines of “you really should try Radiohead; it’s a style of music you may enjoy,” and I never got around to following their advice. I probably should though; I think they may be right. Similarly for Daft Punk, of which I’ve heard a few tracks and thought they were pretty good. I really should give Random Access Memories a listen.
I would certainly agree with Jason’s suggestion of The Dark Side of the Moon. I count myself a Pink Floyd fan, and although I wouldn’t call this my favourite album of theirs, it’s certainly a good album (if you were to ask, my favourite would probably be either The Wall or Wish You Were Here, plus side B of Meddle).
As to what my idea of a perfect album would be, my suggestion is pretty simple: it’s anything by Mike Oldfield.
LOL, just kidding!1 😄
No, I’d say a great example of a perfect album is Jeff Wayne’s musical adaptation of The War Of The Worlds.
I used to listen to this quite often during my commute, before the pandemic arrived and brought that listen count down to zero. But I picked it back up a few weeks ago and it’s been a constant earworm since. I think it ticks most of the boxes for a perfect album. It’s a narrative set to music, which makes it quite coherent and naturally discourages skipping tracks. The theming around the various elements of the story is really well done: hearing a theme introduced near the start of the album come back later is always quite a thrill, and you find yourself picking up more of these as you listen to the album multiple times. It’s very much not a recent album but, much like Pink Floyd, there’s a certain timelessness that makes it still a great piece of music even now.
Just don’t listen to the recent remakes.
Favourite Comp. Sci. Textbooks
John Siracusa talked about his two favourite textbooks on Rec Diffs #233: Modern Operating Systems and Computer Networks, both by Andrew S. Tanenbaum. I had those textbooks at uni as well. I still do, actually. They’re fantastic. If I were to recommend something on either subject, it would be those two.
I will add that my favourite textbook from my degree was Compilers: Principles, Techniques, and Tools by Alfred V. Aho, et al., also known as the “dragon book.” If you’re interested in compiler design in any way, I can definitely recommend this book. It’s a little old, but really, the principles are more or less the same.
Thou Doth Promote Too Much
Manuel Moreale wrote an interesting post about self-promotion, where he reflects on whether closing out all his People and Blogs posts with a line pointing to his Ko-Fi page is too much:
And so I added that single line. But adding that single line was a struggle. Because in my head, it’s obvious that if you do enjoy something and are willing to support it, you’d probably go look for a way to do it. That’s how my brain works. But unfortunately, that’s not how the internet works. Apparently, the correct approach seems to be the opposite one. You have to constantly remind people to like and subscribe, to support, to contribute, and to share.
I completely understand his feelings about this. I’m pretty sure I’d have just as much trouble adding such a promotion at the bottom of my posts. Heck, it’s hard enough to write about what I’m working on here without any expectation from the reader other than to maybe, possibly, read it. Those posts have been relegated to a separate blog, so as to not bother anyone.
But as a reader of P&B, I think the line he added is perfectly fine. I think it’s only fair to ask people to consider supporting something where it’s obvious someone put a lot of effort into it, as he obviously has been doing with P&B.
As for where to draw the line, I think I agree with Moreale:
How much self-promotion is too much? Substack interrupting your reading experience to remind you to subscribe feels too much to me. An overlay interrupting your browsing to ask you to subscribe to a newsletter is also too much. Am I wrong? Am I crazy in thinking it’s too much?
I get the need to “convert readers” but interrupting me to sign up to a newsletter is just annoying. And I’m not sure “annoying” is the feeling you want to imbue in your readers if you want them to do something.
But a single line at the end of a quality blog post? Absolutely, go for it!
Crashing Hemispheric Views #109: HAZCHEM
Okay, maybe not “crashing”, à la Hey Dingus. But some thoughts did come to me while listening to Hemispheric Views #109: HAZCHEM that I thought I’d share with others.
Haircuts
I’m sorry but I cannot disagree more. I don’t really want to talk while I’m getting a haircut. I mean I will if they’re striking up a conversation with me, but I’m generally not there to make new friends; just to get my hair cut quickly and go about my day. I feel this way about taxis too.
I’m Rooted
I haven’t really used “rooted” or “knackered” that much. My go-to phrase is “buggered,” as in “oh man, I’m buggered!” or simply just “tired”. I sometimes use “exhausted” when I’m really tired, but there are just too many syllables in that word for daily use.
Collecting
I’m the same regarding stickers. I’ve received (although not really sought out) stickers from various podcasts and didn’t know what to do with them. I’ve started keeping them in this journal I never used, and apart from my awful handwriting documenting where they’re from and when I added them, so far it’s been great.
I probably do need to get some HV stickers, though.
A Trash Ad for Zachary
Should check to see if Johnny Decimal got any conversions from that ad in #106. 😀
Also, here’s a free tag line for your rubbish bags: we put the trash in the bags, not the pods.
🍅⏲️ 00:39:05
I’m going to make the case for Vivaldi. Did you know there’s actually a Pomodoro timer built into Vivaldi? Click the clock on the bottom-right of the status bar to bring it up.
Never used it myself, since I don’t use a Pomodoro timer, but can Firefox do that?!
Once again, a really great listen, as always.
On Micro.blog, Scribbles, And Multi-homing
I’ve been asked why I’m using Scribbles given that I’m here on Micro.blog. Honestly, I wish I could say I’ve got a great answer. I like both services very much, and I have no plans of abandoning Micro.blog for Scribbles, or vice versa. But I am planning to use both for writing stuff online, at least for now, and I suppose the best answer I can give is a combination of various emotions and hang-ups I have about what I want to write about, and where it should go.
I am planning to continue to use Micro.blog pretty much how others would use Mastodon: short-form posts, with the occasional photo, mainly about what I’m doing or seeing during my day. I’ll continue to write the occasional long-form posts, but it won’t be the majority of what I write here.
What I post on Scribbles is more likely to be long-form, which brings me to my first reason: I think I prefer Scribbles’ editor for long-form posts. Micro.blog works well for micro-blogging, but I find any attempt to write something longer a little difficult. I can’t really explain it. It just feels like I’m spending more effort trying to get the words out on the screen, like they’re resisting in some way.
It’s easier for me to do this using Scribbles’ editor. I don’t know why. Might be a combination of how the compose screen is styled and laid out, plus the use of a WYSIWYG editor1. But whatever it is, it all combines into an experience where the words flow a little easier for me. That’s probably the only way I can describe it. There’s nothing really empirical about it all, but maybe that’s the point. It involves the emotional side of writing: the “look and feel”.
Second, I like that I can keep separate topics separate. I thought I could be someone who can write about any topic in one place, but when I’m browsing this site myself, I get a bit put out by all the technical topics mixed in with my day-to-day entries. They feel like they don’t belong here. Same with project notes, especially given that they tend to be more long-form anyway.
This I just attribute to one of my many hang-ups. I never have this issue with other sites I visit. It may be an emotional response to seeing what I wrote about, where reading about my day-to-day induces a different feeling (casual, reflective) than posts about code (thinking about work) or projects (being a little more critical, maybe even a little bored).
Being able to create multiple blogs in Scribbles, thanks to signing up for the lifetime plan, gives me the opportunity to create separate blogs for separate topics: one for current projects, one for past projects, and one for coding topics. Each of them, along with Micro.blog, can have their own purpose and writing style: more of a public journal for the project sites, more informational or critical for the coding topics, and more day-to-day Mastodon-like posts on Micro.blog (I also have a check-in blog which is purely a this-is-where-I’ve-been record).
Finally, I think it’s a bit of that “ooh, shiny” aspect of trying something new. I definitely got that using Scribbles. I don’t think there’s much I can do about that (nor do I want to 😀).
And that’s probably the best explanation I can give. Arguably it’s easier just writing in one place, and to that I say, “yeah, it absolutely is.” Nothing about any of this is logical at all. I guess I’m trying to optimise for posting something without all the various hang-ups I have about posting it at all, and I think having these separate spaces to do so helps.
Plus, knowing me, it’s all likely to change pretty soon, and I’ll be back to posting everything here again.