On The Little Things of Data Representation
When it comes to writing code that works with a database, I tend to, or at least try to, obsess over how the data is marshalled. That includes the things you’d expect, like how the data is structured or what information is being stored. But it also includes the little things: which case I’m using for the field names; whether a missing value should be represented as an empty value or an absent field; whether nulls should be a thing and, if so, whether they should differ from empty values; and so on.
I don’t know how prevalent these feelings are across the industry. I was under the impression that it’s pretty fundamental, one could even say self-evident, that it’s much easier to change the code than it is to change the data. But in my experience, these feelings are not as widespread as I imagined. I get the impression that there are some out there who take a very… shall we say, “relaxed” approach to these decisions.
I partially attribute this to the introduction of NoSQL databases, like DynamoDB. When the whole pitch with these data stores is that “you don’t need to know how the data is structured, just cram it all in,” it shouldn’t be a surprise that developers take that advice. So I open up a DynamoDB table and a field might use both null and empty to represent the same thing, or one field is in camel-case and another is in capital-camel-case. And I just sigh. It’s not enough to make me want to run a data fix, but I do wish it was much neater [1].
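For what it’s worth, this kind of drift is easy enough to spot with a small script. Here’s a rough sketch in Python, assuming the items have already been read from the table and converted to plain dictionaries. The function name, the sample items, and the choice of lower camel-case as the house convention are all made up for illustration; they’re not from any real table.

```python
import re
from collections import defaultdict

# Assumed convention for this sketch: field names are lower camel-case.
CAMEL_CASE = re.compile(r"^[a-z][a-zA-Z0-9]*$")


def audit_items(items):
    """Return (badly cased field names, fields with mixed "no value" forms).

    A field counts as mixed when, across all items, "no value" shows up
    in more than one way: an absent field, a null, or an empty string.
    """
    all_fields = set()
    for item in items:
        all_fields.update(item)

    bad_case = {name for name in all_fields if not CAMEL_CASE.match(name)}

    no_value_forms = defaultdict(set)  # field name -> forms of "no value" seen
    for item in items:
        for name in all_fields:
            if name not in item:
                no_value_forms[name].add("absent")
            elif item[name] is None:
                no_value_forms[name].add("null")
            elif item[name] == "":
                no_value_forms[name].add("empty")

    mixed = {
        name: sorted(forms)
        for name, forms in no_value_forms.items()
        if len(forms) > 1
    }
    return sorted(bad_case), mixed


# Hypothetical items showing the problems described above.
SAMPLE_ITEMS = [
    {"userId": "u1", "displayName": "Ann", "nickname": None},
    {"userId": "u2", "DisplayName": "Bob", "nickname": ""},
    {"userId": "u3", "displayName": "Cy"},
]

if __name__ == "__main__":
    bad_case, mixed = audit_items(SAMPLE_ITEMS)
    print("Field names not in camel-case:", bad_case)
    print("Fields with mixed 'no value' forms:", mixed)
```

For the sample items, this flags `DisplayName` for its casing and `nickname` for being absent, null, and empty in different items. It won’t fix anything, but it does turn the sigh into a list.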
But it’s not like projects using SQL databases are immune. I remember the bad old days of working with Java and Hibernate where the database schema, rather than being hand-rolled, would be generated from the type hierarchy. Really? You’re going to let an automated tool scan your classes and modify the database schema without review? You know, that thing with customer data? And you’re going to do that at runtime upon launch?
And it’s not like it was a good schema. In fact, it was rather terrible. There was one table that was a single column of incrementing integers for some reason. And the application absolutely ground to a halt whenever data needed to be queried or modified. Yeah, I’m glad I’m not on that project anymore, and I’m far enough removed from Java that I’m not sure if ORMs are still a thing, but I hope they’re not.
I guess the point I’m trying to make is this: don’t treat your data representation as an afterthought. Take good care of it: the little things like case matter. It’ll be around much longer than you think.
-
[1] I share some of the blame here. After all, it’s my job to review these changes.