Writing Good Data Migration Scripts

I’m waiting for a data migration to finish, so I’ve naturally got migration scripts on my mind.

There’s an art to writing a good migration script. It may seem that simply throwing together a small Python script would be enough; and for the simpler cases, it very well might be. But in my experience, running the script in prod is likely to be very different from doing test runs in dev.

There are a few reasons for this. For one thing, prod is likely to have way more data, so there will always be more to migrate. And dealing with production data is always going to be a little more stressful than dealing with non-prod, especially when you consider things like restricted access and the impact of things going wrong. So a script with a better “user experience” is always going to be better than one slapped together.

So without further ado, here are the attributes that I think make for a good migration script:

  1. No migration script — If you can get away with not writing a migration script, then this is the preferred option. Of course, this will depend on how much data you’ll need to migrate, and how complicated it would be to keep supporting the previous version in your code base. If the amount of data is massive (we’re talking millions or hundreds of millions of rows), then this is probably your only option. On the other hand, if there are only a few hundred or a few thousand, then it’s probably worth just migrating the data.
  2. Always indicate progress — You’re likely going to have way more data in prod than in your dev environments, so consider showing the ongoing progress of the running script. If there are multiple stages in the migration process, make sure you log when each stage begins. If you’re running a scan or processing records, then give some indication of progress through the collection of rows. A progress bar is nice, but failing that, include a log message, say, every 1,000 records or so.
  3. Calculate expected migration size if you can — If it’s relatively cheap to get a count of the number of records that need to be migrated, then it’s helpful to report this to the user. Even an estimate would be good, just to give a sense of magnitude. If it’ll be too expensive to do so, then you can ignore it: better to just get migrating rather than have the user wait for a count.
  4. Silence is golden — Keep logging to the screen to a minimum: mainly progress indicators plus any serious warnings or errors. Avoid bombarding the user with spurious log messages. They want to know when things go wrong; otherwise, they just want to know that the script is running properly. That said:
  5. Log everything to a file — If the script is migrating data but will ignore records that have already been migrated, then log that those records are being skipped. What you’re looking for is assurance that all records have been dealt with, meaning that any discrepancy in the summary report (such as total records encountered vs. total records migrated) can be reconciled with the log file.
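To make the list above a bit more concrete, here’s a minimal, hypothetical sketch in Python. It assumes a SQLite table called `users` whose `full_name` column is being split into `first_name` and `last_name`; the table, the columns and the function names are all made up for illustration, not taken from any real migration. It logs when each stage starts, reports a cheap up-front count, prints a progress line every 1,000 records, keeps the console down to progress and warnings, and sends everything (including skipped records) to a log file at DEBUG level.

```python
import logging
import sqlite3

# Emit a progress line roughly every N records (tune to taste).
PROGRESS_EVERY = 1_000


def setup_logging(log_path: str) -> logging.Logger:
    """Console shows progress and problems; the log file records everything."""
    logger = logging.getLogger("migration")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # don't double-print via the root logger

    console = logging.StreamHandler()  # writes to stderr
    console.setLevel(logging.INFO)  # progress, warnings and errors only
    logger.addHandler(console)

    file_handler = logging.FileHandler(log_path)
    file_handler.setLevel(logging.DEBUG)  # every record, including skips
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(file_handler)
    return logger


def migrate(conn: sqlite3.Connection, log: logging.Logger) -> dict:
    """Hypothetical migration: split users.full_name into first/last name."""
    log.info("stage 1: counting records")
    # Cheap here; skip this stage if a count would be expensive in prod.
    (total,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    log.info("%d records to examine", total)

    log.info("stage 2: migrating records")
    stats = {"seen": 0, "migrated": 0, "skipped": 0, "failed": 0}
    # fetchall() keeps the sketch short; a real script over a large table
    # would page through rows in batches instead.
    rows = conn.execute(
        "SELECT id, full_name, first_name FROM users ORDER BY id").fetchall()
    for user_id, full_name, first_name in rows:
        stats["seen"] += 1
        if first_name is not None:
            # Already migrated: DEBUG goes to the file only, so the console stays quiet.
            log.debug("skipping user %d: already migrated", user_id)
            stats["skipped"] += 1
        elif not full_name:
            log.warning("user %d has no full_name; left untouched", user_id)
            stats["failed"] += 1
        else:
            first, _, last = full_name.partition(" ")
            conn.execute(
                "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
                (first, last, user_id))
            log.debug("migrated user %d", user_id)
            stats["migrated"] += 1
        if stats["seen"] % PROGRESS_EVERY == 0:
            log.info("progress: %d / %d", stats["seen"], total)
    conn.commit()

    # The summary should reconcile with the up-front count and the log file.
    if stats["seen"] != total:
        log.warning("saw %d records but counted %d up front", stats["seen"], total)
    log.info("done: %s", stats)
    return stats
```

Running it twice is a decent sanity check: the second run should skip everything the first run migrated, and the log file should say so record by record.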

May your migrations be quiet and painless.

I appreciate that projects like Next.js put a lot of effort into their guides, but they still need to provide a basic API reference. Knowing about request helpers is fine, but do they return strings or arrays? What if a query parameter’s not set? This is stuff I need to know. 🤷

Working on a project that uses TypeScript for the code and Go for the deployment configuration. Wish it was the other way around, where Go is used for the code, and TypeScript… isn’t used at all. 😛

👨‍💻 New post on Databases over at the Coding Bits blog: PostgreSQL BIGSERIAL “Type”

I haven’t gone all in with AI co-pilots or anything with my coding setup yet, but the latest version of GoLand comes with what is essentially a line completion feature that I actually find quite useful. I suspect there’s some ML in there as it seems to understand context and produce suggested line completions that are, more often than not, pretty much what I was going to type out by hand anyway. Many times I could implement most of a new function simply by typing Tab several times. Impressive work, JetBrains.

On Sharing Too Much About Too Little

Manuel Moreale wrote an interesting post today about sharing stuff online:

Life can be joyful and wonderful and marvellous. But it can also be a fucking nightmare. And yes, it’s important to celebrate the victories and to immortalise the glorious moment. But it’s also important to document the failures, the shitty moments, the dark places our minds find themselves stuck in. It’s all part of what makes us unique after all.

I can certainly appreciate those who are willing to share both the ups and downs with others online. But I know that I’d rather not, for a couple of reasons. The first being that there’s already so much dark stuff online: why add to that? And the second being that, along with being a public journal of most of my day, this site is a bit of an escape as well: a refuge that I can visit when the world gets a little too much for me.

And it may seem that what is posted here is exclusively what I feel, think, or do during the day, but that couldn’t be further from the truth. Shitty things happen to me, I have dark thoughts, fits of despair, or just general periods of malaise. Those get documented too, but in a private journal that’s not for public consumption.

That’s not to say that others should do likewise. I’m not here to tell you what to post, and why: you do you. This is just explaining how and why I post what I post. Maybe one day this will change: but until then that’s how I like to do things.

BASIC.HTM

While poking through some old files this morning I came across probably the first bit of HTML I’ve ever written, way back on the 10th April 1996¹.

I think I vaguely remember making these. We were in Castlemaine staying over at my grandparents’ house and Dad brought along his laptop for us kids to play with (complete with a passive-matrix LCD and 2 hours of battery life). It was the evening and I was mucking around with Netscape Navigator. This was before we got the internet at home so I’m not quite sure why we even had Navigator. It may have been that Dad was using it for work, or maybe it came preinstalled on the laptop. But whatever the reason, it was there, and I was playing around with it.

I was browsing around a locally stored site showcasing Java in the browser (which actually included a decent game of tic-tac-toe playable as a Java applet) when I found the built-in HTML editor. Not knowing what this editor was for, and thinking that it was some strange, crappier version of Microsoft Word, I started typing random things. I eventually found the button to make anchor tags, and the ability to link pages together, and realising that this was possible, I got a bit excited and I set about trying to recreate the on-line help that came with QBasic.

I only made a couple of pages or so before I got bored and a little underwhelmed. After all, the online QBasic help was “cooler” because it used ASCII characters, while this was using graphics and GUI elements². So I stopped and moved on to something else, completely forgetting about these HTML files. I thought they were lost to time until I found them today.

So here they are, posted as screenshots for posterity. Naturally, they still render in modern browsers, with the biggest difference being that the background was grey back then. But the default serif font is authentic.


  1. Or at least that’s what the date-stamp says. ↩︎

  2. That QBasic was made up entirely of text characters was an attribute I was really drawn to, even back when Windows 95 was the thing. ↩︎

📝 New post on UCL over at the Workpad blog: Indexing In UCL

🔗 Goodbye to Apple’s Smart Keyboard Folio, the best iPad Pro accessory

I’ve never considered hoarding accessories before, but I might start. The Smart Keyboard Folio is perfect for how I use the iPad: a great stand and decent enough keyboard that doesn’t get in the way when I just want to read.

Free idea for anyone interested in making a mockumentary: a band that specialises in “Muzak,” the type of music you hear in lifts or dental offices. They’re trying to make it to the big leagues (a well-known department store, like a Myer or Macy’s) and they’re up against other bands getting better gigs, the Muzak industry “big-wigs,” and their own shortcomings. Sort of like “Spinal Tap” meets a doctor’s waiting room.

It’s ironic to think that part of my job is to make sure that the nice artwork I see on our 500 and 404 error pages is never seen by anyone else.

Ah, hello, my “is this article helpful?” popup friend, the ugly cousin of all the “please rate this experience” solicitations everyone seems to get. Oh, and I see you’re the super helpful one that covers up the very text I’m trying to read.

An HTML modal over prose with the prompt 'Is this article helpful?' with a 'Yes' and 'No' button

It’s always fascinating browsing the early methods and properties of the DOM. It feels a bit like an archeologist sifting through strata, uncovering facts about some long-lost civilisation. “Oh, they didn’t call them query parameters back then. Instead, they were known as search strings.”

One other skill I wish I had is audio mastering. Been going through some more tapes last night and it would be so sweet to be able to remove the loud hiss some of them have. I know what I need to do in principle, but translating that into an FX chain in Logic Pro is where my gap lies.

Browsing some of the WeblogPoMo posts on Mastodon the past few days. A lot of great posts, plus some really talented web designers out there. Wish I had their artistic or web-design skills.

My second favourite word to write in a Jira ticket, after augment, is “decommission”. I’m basically using it as a euphemism for “rip this unused code out”. To have made a few tickets with this word today feels glorious. 😊

As Someone Who Works In Software

As someone who works in software…

  1. I cringe every time I see society bend to the limitations of the software they use. It shouldn’t be this way; the software should serve the user, not the other way around.
  2. I appreciate a well-designed API. Much of my job is using APIs built by others, and the good ones always feel natural to use, like water flowing through a creek. Conversely, a badly designed API makes me want to throw my laptop to the ground.
  3. I think a well designed standard is just as important as a well designed API. Thus, if you’re extending the standard in a way that adds a bunch of exceptions to something that’s already there, you may want to reflect on your priorities and try an approach that doesn’t do that.
  4. I also try to appreciate, to varying levels of success, that there are multiple ways to do something and once all the hard and fast requirements are settled, it usually just comes down to taste. I know what appeals to my taste, but I also (try to) recognise that others have their own taste as well, and what appeals to them may not gel with me. And I just have to deal with it. I may not like it, but sometimes we have to deal with things we don’t like.
  5. I believe a user’s home directory is their space, not yours. And you better have a bloody good reason for adding stuff there that the user can see and didn’t ask for.

My favourite gym t-shirt. All the Aussies would get this reference.

A black T-shirt is displayed with the text 'WE'RE GOIN TA BONNIE DOON' printed on the front in bold white letters, with the line Bonnie Doon Hotel below it on the right.

This I got from an op-shop but I have been to the Bonnie Doon Hotel a few times. It’s actually pretty nice.

📺 Taitset

Discovered another YouTube channel about Victorian railways this evening. This one’s more about history and operations and less pure cab-rides. A lot of fascinating information about locations that I’m very familiar with.

It’s already May and I’m way behind on my reading goals for the year.

Screenshot of the Micro.blog reading goals for 2024, showing 1 book read with a goal of reading 10. The single book cover is blank.

The trouble is that the book I want to read next is one I’ve read before, which doesn’t really count towards my goal. Well, I guess it could, since I haven’t listed it here. Maybe I’ll let this one pass.