My Position On Blocking AI Web Crawlers

I’m seeing a lot of posts online about sites and hosting platforms blocking web crawlers used for AI training. I can completely understand their position, and fully support them: it’s their site and they can do what they want.

Allow me to lay my cards on the table. My current position is to allow these crawlers to access my content. I’m choosing to opt in, or rather, not to opt out. I’m probably in the minority here (well, the minority of those I follow), but I do have a few reasons for this, with the principal one being that I use services like ChatGPT and get value from them. So to prevent them from training their models on my posts feels personally hypocritical to me. It’s the same reason why I don’t opt out of GitHub Copilot crawling my open-source projects (although that’s a little more theoretical, as I’m not a huge user of Copilot). To some, this position might sound weird, and when you consider the gulf between the value these AI companies get from scraping the web versus the value I get from them as a user, it may seem downright stupid. And if you approach it from a logical perspective, it probably is. But hey, we’re in the realm of feelings, and right now this is just how I feel. Of course, if I were to make a living out of this site, it would be a different story. But I don’t.

And this leads to the tension I see between site owners making decisions regarding their own content, and services making decisions on behalf of their users. This site lives on Micro.blog, so I’m governed by what Manton chooses to do or not do regarding these crawlers. I’m generally in favour of what Micro.blog has chosen so far: allowing people to block these scrapers via “robots.txt” but not yet blocking requests based on their IP address. I’m aware that others may not agree, and I can’t, in principle, reject the notion of a hosting provider choosing to block these crawlers at the network layer. I am, and will continue to be, a customer of such services.

But I do think some care should be taken, especially when it comes to customers (and non-customers) asking these services to add these network blocks. You may have good reason to demand this, but just remember that there are users of these services whose opinions may differ. I personally would prefer a mechanism where you opt into these crawlers, and this would be an option I’d probably take (or probably not; my position is not that strong). I know that’s not possible under all circumstances, so I’m not going to cry too much if a blanket ban were put in place instead.

I will make a point about some comments I’ve seen that, if taken in an uncharitable way, imply that creators who have no problem with these crawlers do not care about their content. I think such opinions should be worded carefully. I know how polarising the use of AI currently is, and making such remarks, particularly within posts that are already heated due to the author’s feelings regarding these crawlers, risks spreading this heat to those who read them. The tone gives the impression that creators who are okay with these crawlers don’t care about what they push online, or should care more than they do. That might be true for some (it might even be true for me once in a while), but making such blanket assumptions can come off as a little insulting. And look, I know that’s not what they’re saying, but it can come across that way at times.

Anyway, that’s my position as of today. Like most things here, this may change over time, and if I become disenchanted with these companies, I’ll join the blockade. But for the moment, I’m okay with sitting this one out.