Writing Good Data Migration Scripts
I’m waiting for a data migration to finish, so I’ve naturally got migration scripts on my mind.
There’s an art to writing a good migration script. It may seem that simply throwing together a small Python script would be enough; and for the simpler cases, it very well might be. But it’s been my experience that running the script in prod is likely to be very different than doing test runs in dev.
There are a few reasons for this. For one thing, prod is likely to have way more data so there will always be more to migrate. And dealing with production data is always going to be a little more stressful than in non-prod, especially when you consider things like restricted access and the impact of things going wrong. So a script with a better “user experience” is always going to be better than one slapped togeather.
So without further ado, here’s the attributes for what I think makes for a good migration script:
- No migration script — If you can get away with not writing a migration script, then this is preferred option. Of course, this will depend on how much data you’ll need to migrate, and how complicated keeping support for the previous version in your code-base. If the amount of data is massive (we’re talking millions or hundred of millions of rows), then this is probably your only option. On the other hand, if there’s a few hundred or a few thousands, then it’s probably just worth migrating the data.
- Always indicate progress — You’re likely going to have way more data in prod that your dev environments so consider showing ongoing progress of the running script. If there’s multiple stages in the migration process, make sure you log when each stage begins. If you’re running a scan or processing records, then give some indication of progress through the collection of rows. A progress bar is nice, but failing that, include a log message say every 1,000 records or so.
- Calculate expected migration size if you can — If it’s relatively cheap to get a count of the number of records that need to be migrated, then it’s helpful for the user to report this to the user. Even an estimate would be good just to give a sense of magnitude. If it’ll be too expensive to do so, then you can ignore it: better to just get migrating rather than have the user wait for a count.
- Silence is golden — Keep logging to the screen to a minimum, mainly progress indicators plus and any serious warnings or errors. Avoid bombarding the user with spurious log messages. They want to know when things go wrong, otherwise they just want to know that the script is running properly. That said:
- Log everything to a file — If the script is involved with migrating data, but will ignored records that have already been migrated, then log that records will be skipped. What you’re looking for is assurance that all records have been dealt with, meaning that any discrepancy with the summary report (such as max records encountered vs. max records migrated) can be reconciled with the log file.
May your migrations be quite and painless.