If there’s ever an article I should print out and staple to my forehead, it’s this one.
I’ve been really enjoying all the WeblogPoMo posts that the PoMo bot has been relaying. Discovered a bunch of new blogs this way, which I’ve now added to NetNewsWire.
Had to miss the first part of Micro.camp this year, unfortunately. My meeting with the sandman went long. Hope to catch up on the keynote and state of the platform videos later.
🔗 Slack users horrified to discover messages used for AI training
I’d like to avoid jumping on the “I hate everything AI” bandwagon, but I agree that Slack’s use of private message data to train their LLM is a pretty significant breach of trust. A lot of sensitive data runs through their system, and although they may be hosting it, it’s not theirs to do with as they please. Maybe they think it’s within their rights, what with their EULAs and everything, but if I were a paying customer — of enterprise software, if you remember — I’d make bloody sure that data remains the customer’s and the customer’s alone.
It’ll be interesting to see how this will affect me personally. We use Slack at work and I know management is very sensitive about IP (and given the domain, I can understand). Maybe I’ll finally get to try Teams out.
Friday Development Venting Session
Had a great venting session with someone at work about the practices of micro-services, the principles of component-driven development, mocking in unit tests, and interfaces in Go. Maybe one day I’ll write all this up, but it was so cathartic to express how we can do better on all these fronts.
If anyone were to ask what I think, here it is in brief:
- Micro-services might be suitable for what you’re building if you’re Amazon or Google, where you have teams of 20 developers working on a single micro-service. But if you’ve got a team of 20 developers working on the entire thing, you may want to consider a monolith instead. Easier to deploy, easier to operate, and you get to rely on the type system telling you when there’s an integration problem, rather than finding out at runtime.
- The idea of component-driven design — which is modelled on electrical engineering principles, whereby a usable system is composed of a bunch of ICs and other discrete components — is nice in theory, but I think it’s outlived its usefulness for most online services. It probably still makes sense if you’re stuck in the world of Java and J2EE, where your “system” is just a bunch of components operating within a container, or if you actually are vending components to be reused. But most of the time what you’re working on is an integrated system. So you should be able to leverage that fact, rather than think that you’re building and testing ICs that you expect others to use. You’re unlikely to swap out one component wholesale when you need to change databases (which you’re unlikely to do anyway). You’re more likely to modify the database component instead. So don’t make that assumption in your design.
- This also extends to the idea of unit testing, with the assumption that you must test each component in isolation. Again, you’re not building ICs that you’re expecting to transplant into other projects (if you are, then keep testing in isolation). So it makes more sense to build tests that leverage the other components of the system. This means building tests that actually call the components directly: the service layer calling the actual database driver, for example (see the sketch after this list). This produces a suite of tests that looks like a staircase, each one relying on the layers below it: the database driver working with a mock database, the service layer using the actual database driver, and the handlers using the actual services. Your unit test coverage should only be that of the thing you’re testing: don’t write database driver tests in your handler package. But I don’t see a reason why you shouldn’t be able to rely on existing areas of the system in your tests.
- The end result of doing this is that your tests are actually running on a mini version of the application itself. This naturally means that there’s less need to mock things out. I know I’ve said this before, but the idea of mocking out other services in unit tests instead of just using them really defeats the point of writing tests in the first place: gaining confidence in the correct operation of the system. How can you know whether a refactor was successful if you need to change the mocks in the unit tests just to get them green again? Really, you should only need to use mocks to stub out external dependencies that you cannot run in a local Docker container. Otherwise, run your tests against a local database running in Docker, and use the actual services you’ve built as your dependencies. And please: make it easy for devs to run the unit tests in their IDE or on the command line with a single “make” command. If I need to set environment variables to run a test, then I’ll never run them.
- Finally, actually using dependent services directly means there’s less need to define interfaces up front. This is something I’m still trying to unlearn myself, having started my career as a Java dev. The whole idea of Go interfaces is that they should come about organically as the need arises, not be pre-defined from above before the implementation is made. Pre-defining them is the same level of thinking that comes from component design (you’re not building ICs here, remember?). Just call the service directly, and when you need the interface, you can add one. But not before. And definitely not because you find yourself needing to mock something out (because, again, you shouldn’t need to mock other components of the system).
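Here’s a rough sketch of what that staircase looks like in Go. To be clear, this is a hypothetical example (`UserService` and `openTestDB` aren’t from any real project), and it assumes a local Postgres instance running in Docker:

```go
package user_test

import (
	"database/sql"
	"testing"

	_ "github.com/lib/pq" // the real Postgres driver, not a mock
)

// openTestDB connects to a local Postgres instance, assumed to be running
// in Docker (e.g. started with a single `make test-db`). No environment
// variables required: the DSN is fixed for local development.
func openTestDB(t *testing.T) *sql.DB {
	t.Helper()
	db, err := sql.Open("postgres",
		"host=localhost port=5432 user=test password=test dbname=test sslmode=disable")
	if err != nil {
		t.Fatal(err)
	}
	return db
}

// UserService is a hypothetical service-layer type. It depends on *sql.DB
// directly: no interface, and no mock, until one is actually needed.
type UserService struct {
	db *sql.DB
}

func (s *UserService) Rename(id int, name string) error {
	_, err := s.db.Exec(`UPDATE users SET name = $1 WHERE id = $2`, name, id)
	return err
}

// One step up the staircase: the service test exercises the real driver
// against a real (local) database. A refactor that keeps this test green
// actually tells you something.
func TestUserServiceRename(t *testing.T) {
	db := openTestDB(t)
	defer db.Close()

	svc := &UserService{db: db}
	if err := svc.Rename(1, "new name"); err != nil {
		t.Fatalf("rename failed: %v", err)
	}
}
```

The handler tests would sit one step above this again, calling the real UserService rather than a mocked-out one.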
Anyway, that’s my rant.
Flights to Canberra booked. Going to be bird watching again real soon.
If the macOS devs are looking for something to do, here’s a free idea: detect when the user is typing on their keyboard, say by counting keystrokes over the last N seconds, and if that count is greater than some low number, prevent any window from stealing keyboard focus.
I must agree once again with Manuel Moreale on his recent post about search and the future of the web:
> I think curation, actual human curation, is going to play an important role in the future. In a web filled with generated nonsense, content curated by knowledgeable human beings is going to be incredibly valuable.
Ben Thompson has been arguing this point too: in a world of AI generating undifferentiated “content”, that which has the human element, either in its creation or curation, will stand apart. He says he’s betting his career on this belief. I think it’s a bet worth taking.
How is it that it’s become so natural to write about stuff here, yet I’m freezing in my boots drafting up an email to a blogger in response to a call for some feedback?
Love that NetNewsWire has a setting to open links in Safari instead of the built-in WebView. Very useful for articles which require an active login session, which I’m more likely to have in Safari. To enable, go to Settings and turn off “Open Links in NetNewsWire”.
Never thought I’d be desperate enough for food and money that I’d be forced to learn everything there is to know about authentication, OAuth, and SSO, but here we are. 🤓
P.S. I’m trying to be droll here. Please don’t test me on my knowledge of OAuth or SSO. 😅
Writing Good Data Migration Scripts
I’m waiting for a data migration to finish, so I’ve naturally got migration scripts on my mind.
There’s an art to writing a good migration script. It may seem that simply throwing together a small Python script would be enough; and for the simpler cases, it very well might be. But it’s been my experience that running the script in prod is likely to be very different from doing test runs in dev.
There are a few reasons for this. For one thing, prod is likely to have way more data, so there will always be more to migrate. And dealing with production data is always going to be a little more stressful than in non-prod, especially when you consider things like restricted access and the impact of things going wrong. So a script with a better “user experience” is always going to beat one slapped together.
So without further ado, here are the attributes that I think make for a good migration script:
- No migration script — If you can get away with not writing a migration script, then this is the preferred option. Of course, this will depend on how much data you’ll need to migrate, and how complicated it is to keep support for the previous version in your code-base. If the amount of data is massive (we’re talking millions or hundreds of millions of rows), then this is probably your only option. On the other hand, if there are only a few hundred or a few thousand rows, then it’s probably just worth migrating the data.
- Always indicate progress — You’re likely going to have way more data in prod than in your dev environments, so consider showing ongoing progress of the running script. If there are multiple stages in the migration process, make sure you log when each stage begins. If you’re running a scan or processing records, then give some indication of progress through the collection of rows. A progress bar is nice, but failing that, include a log message every 1,000 records or so (see the sketch after this list).
- Calculate the expected migration size if you can — If it’s relatively cheap to get a count of the number of records that need to be migrated, then it’s helpful to report this to the user. Even an estimate would be good, just to give a sense of magnitude. If it’ll be too expensive to do so, then you can skip it: better to just get migrating rather than have the user wait for a count.
- Silence is golden — Keep logging to the screen to a minimum: mainly progress indicators, plus any serious warnings or errors. Avoid bombarding the user with spurious log messages. They want to know when things go wrong; otherwise, they just want to know that the script is running properly. That said:
- Log everything to a file — If the script migrates data but ignores records that have already been migrated, then log each record that’s being skipped. What you’re looking for is assurance that all records have been dealt with, meaning that any discrepancy in the summary report (such as max records encountered vs. max records migrated) can be reconciled with the log file.
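To show a few of these attributes working together, here’s a rough sketch in Go (the language is incidental; a Python script would have the same shape). Everything project-specific here is hypothetical: `fetchUnmigrated`, `alreadyMigrated`, and `migrateRecord` stand in for whatever your migration actually does. The shape to note is the up-front count, a progress message every 1,000 records, a quiet screen, and a log file that records the fate of every record:

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// Record is a hypothetical stand-in for whatever you're migrating.
type Record struct{ ID int }

// Placeholders for the real work.
func fetchUnmigrated() []Record     { return nil }
func alreadyMigrated(r Record) bool { return false }
func migrateRecord(r Record) error  { return nil }

func main() {
	// Log everything to a file; keep the screen for progress and errors.
	logFile, err := os.Create("migration.log")
	if err != nil {
		log.Fatal(err)
	}
	defer logFile.Close()
	fileLog := log.New(logFile, "", log.LstdFlags)

	// Report the expected migration size up front (assuming the count is cheap).
	records := fetchUnmigrated()
	fmt.Printf("migrating %d records\n", len(records))

	migrated, skipped := 0, 0
	for i, rec := range records {
		if alreadyMigrated(rec) {
			fileLog.Printf("skipping record %d: already migrated", rec.ID)
			skipped++
			continue
		}
		if err := migrateRecord(rec); err != nil {
			// Serious problems go to both the screen and the file.
			fmt.Fprintf(os.Stderr, "record %d failed: %v\n", rec.ID, err)
			fileLog.Printf("record %d failed: %v", rec.ID, err)
			continue
		}
		fileLog.Printf("migrated record %d", rec.ID)
		migrated++

		// Progress indicator every 1,000 records.
		if (i+1)%1000 == 0 {
			fmt.Printf("processed %d of %d\n", i+1, len(records))
		}
	}

	// Summary report: any discrepancy here can be reconciled with the log file.
	fmt.Printf("done: %d migrated, %d skipped, %d total\n",
		migrated, skipped, len(records))
}
```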
May your migrations be quiet and painless.
I appreciate that projects like Next.js put a lot of effort into their guides, but they still need to provide a basic API reference. Knowing about request helpers is fine, but do they return strings or arrays? What if a query parameter’s not set? This is stuff I need to know. 🤷
Working on a project that’s using TypeScript for the code, and Go for the deployment configuration. Wish it were the other way around, where Go is used for the code, and TypeScript… isn’t used at all. 😛
👨‍💻 New post on Databases over at the Coding Bits blog: PostgreSQL BIGSERIAL “Type”
I haven’t gone all in with AI co-pilots or anything with my coding setup yet, but the latest version of GoLand comes with what is essentially a line completion feature that I actually find quite useful. I suspect there’s some ML in there as it seems to understand context and produce suggested line completions that are, more often than not, pretty much what I was going to type out by hand anyway. Many times I could implement most of a new function simply by typing Tab several times. Impressive work, JetBrains.
On Sharing Too Much About Too Little
Manuel Moreale wrote an interesting post today about sharing stuff online:
> Life can be joyful and wonderful and marvellous. But it can also be a fucking nightmare. And yes, it’s important to celebrate the victories and to immortalise the glorious moment. But it’s also important to document the failures, the shitty moments, the dark places our minds find themselves stuck in. It’s all part of what makes us unique after all.
I can certainly appreciate those who are willing to share both the ups and downs with others online. But I know that I’d rather not, for a couple of reasons. The first being that there’s already so much dark stuff online: why add to it? And the second being that, along with being a public journal of most of my day, this site is a bit of an escape as well: a refuge that I can visit when the world gets a little too much for me.
And it may seem that what’s posted here is exclusively what I feel, think, or do during the day, but that couldn’t be further from the truth. Shitty things happen to me; I have dark thoughts, fits of despair, and general periods of malaise. Those get documented too, but in a private journal that’s not for public consumption.
That’s not to say that others should do likewise. I’m not here to tell you what to post, and why: you do you. This is just an explanation of how and why I post what I post. Maybe one day this will change, but until then, that’s how I like to do things.
BASIC.HTM
While poking through some old files this morning, I came across probably the first bit of HTML I’ve ever written, way back on the 10th of April 1996.
I think I vaguely remember making these. We were in Castlemaine staying over at my grandparents’ house, and Dad brought along his laptop for us kids to play with (complete with a passive-matrix LCD and 2 hours of battery life). It was the evening and I was mucking around with Netscape Navigator. This was before we got the internet at home, so I’m not quite sure why we even had Navigator. It may have been that Dad was using it for work, or maybe it came preinstalled on the laptop. But whatever the reason, it was there, and I was playing around with it.
I was browsing around a locally stored site showcasing Java in the browser (which actually included a decent game of tic-tac-toe playable as a Java applet) when I found the built-in HTML editor. Not knowing what this editor was for, and thinking that it was some strange, crappier version of Microsoft Word, I started typing random things. I eventually found the button for making anchor tags, and with it the ability to link pages together; realising what was possible, I got a bit excited and set about trying to recreate the on-line help that came with QBasic.
I only made a couple of pages or so before I got bored and a little underwhelmed. After all, the online QBasic help was “cooler” because it used ASCII characters, while this was using graphics and GUI elements. So I stopped and moved on to something else, completely forgetting about these HTML files. I thought they were lost to time until I found them today.
So here they are, posted as screenshots for posterity. Naturally, they still render in modern browsers, with the biggest difference being that the background was grey back then. But the default serif font is authentic.
Indexing In UCL
I’ve been thinking a little about how to support indexing in UCL, as in getting elements from a list or keyed values from a map. There already exists an `index` builtin that does this, but I’m wondering if this can be, or even should be, supported in the language itself.

I’ve reserved `.` for this, and it’ll be relatively easy to make use of it to get map fields. But I do have some concerns with supporting list element dereferencing using square brackets. The big one being that if I were to use square brackets the same way that many other languages do, I suspect (although I haven’t confirmed) that it could lead to the parser treating them as two separate list literals. This is because the scanner ignores whitespace, and there are no other syntactic indicators to separate arguments to proc calls, like commas:

```
echo $x[4]       --> echo $x [4]
echo [1 2 3][2]  --> echo [1 2 3] [2]
```

So I’m not sure what to do here. I’d like to add support for `.` for map fields, but it feels strange doing just that and having nothing for list elements.
I can think of three ways to address this.
Do Nothing — the first option is easy: don’t add any new syntax to the language and just rely on the `index` builtin. TCL does this with `lindex`, as does Lisp with `nth`, so I’ll be in good company here.
Use Only The Dot — the second option is to add support for the dot and not the square brackets. This is what the Go templating language does for map keys or struct fields. It also has an `index` builtin, which works with slice elements.

I’d probably do something similar, but I may extend it to support index elements. Getting the value of a field would be what you’d expect, but to get an element of a list, the construct `.(x)` can be used:

```
echo $x.hello   # returns the "hello" field
echo $x.(4)     # returns the fourth element of a list
```
One benefit of this could be that the `.(x)` construct would itself be a pipeline, meaning that string and calculated values could be used as well:

```
echo $x.("hello")
echo $x.($key)
echo $x.([1 2 3] | len)
echo $x.("hello" | toUpper)
```
I can probably get away with supporting this without changing the scanner or compromising the language design too much. It would be nice to add support for ditching the dot completely when using the parentheses, à la BASIC, but I’d probably run into the same issues as with the square brackets if I did, so I think that’s out.
Use Parentheses To Be Explicit — the last option is to use square brackets, and modify the grammar slightly to only allow the use of suffix expansion within parentheses. That way, if you want to pass a list element as an argument, you’d have to use parentheses:

```
echo ($x[4])   # fourth element of $x
echo $x[4]     # $x, along with a list containing "4"
```
This is what you’d see in more functional languages like Elm and, I think, Haskell. I’ll have to see whether this could work with changes to the scanner and parser if I were to go with this option. I think it may be achievable, although I’m not sure how.
An alternative might be to go the other way, and modify the grammar rules so that the square brackets bind more tightly to the list, which would mean that separate arguments involving square brackets would need to be in parentheses:

```
echo $x[4]     # fourth element of $x
echo $x ([4])  # $x, along with a list containing "4"
```
Or I could modify the scanner to recognise whitespace characters and use that as a guide to determine whether square brackets follow a value: zero spaces would mean the square brackets represent an element suffix, while at least one space would mean two separate values.
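As a rough illustration of that last idea, here’s what the whitespace check might look like in a hand-written scanner. This is a hypothetical sketch in Go, not UCL’s actual scanner; the only point is that the scanner peeks at the character before the `[` to decide which token to emit:

```go
package scanner

// Hypothetical token kinds for the two interpretations of "[".
const (
	tokLBracket   = iota // start of a list literal: "echo $x [4]"
	tokIndexStart        // index suffix on a value: "echo $x[4]"
)

// scanLBracket decides what a "[" at offset pos means by looking at the
// character immediately before it.
func scanLBracket(src []byte, pos int) int {
	if pos == 0 {
		return tokLBracket // start of input: must be a list literal
	}
	switch src[pos-1] {
	case ' ', '\t', '\n':
		return tokLBracket // whitespace before "[": a separate argument
	default:
		return tokIndexStart // glued to the previous value: an element suffix
	}
}
```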
So that’s where I am at the moment. I guess it all comes down to what works best for the language as a whole. I can live with option one, but it would be nice to have the syntax. I’d rather not go with option three, as I’d like to keep the parser simple (I’d rather not add to all the newline complexities I already have).
Option two would probably be the least compromising to the design as a whole, even if the aesthetics are a bit strange. I can probably get used to them though, and I do like the idea of index elements being pipelines themselves. I may give option two a try and see how it goes.
Anyway, more on this later.
🔗 Goodbye to Apple’s Smart Keyboard Folio, the best iPad Pro accessory
I’ve never considered hoarding accessories before, but I might start. The Smart Keyboard Folio is perfect for how I use the iPad: a great stand and decent enough keyboard that doesn’t get in the way when I just want to read.