On TOON And Human Readable Data Formats

Today I learnt about a format call TOON: Token-Oriented Object Notation, which is a data interchange format like JSON, that is geared towards being more efficient for use with LLMs. How does it do so? By stripping back the syntax of a JSON like structure to one that is almost a CSV file, with a type signature describing how the data is structured. It claims that doing so produces a format that is ingestible by LLMs while still being (somewhat) human readable.

Well, that is all well and good, but some syntax affordance for making it easy to work with would be appreciated. This was the huge issue with JSON: making a format that prioritises the machine over the human, yet claiming that it’s still human readable because it’s all “plain text.” Such designers forget that if the format is human readable, then it’s also going to be human writable. And I really wish the people designing JSON actually considered this when they were deciding to strip out comments or trying to get it working with Microsoft’s crappy JavaScript parser that shipped with IE. Sure, it made it easier to build the parsers, but now you’ve got an annoying data format that freaks out whenever an extra comma is missing.

Did nobody learn anything from the days of WSDL and XSD? Do not assume that your human readable interchange format will always have a frontend that is user friendly. If people can read it, they will modify it. The law of least resistance applies to technologists just as well as it does to travelling electrons. I can sort of admire Protobuf here: it’s makes no apologies for not being human readable. Binary or nada.

Anyway, I’ve not had any experience with TOON so I can’t say how easy it is to work with. But please, no more text-based data structure formats that assume humans will read but won’t write. At the very least, always include comments.