OpenAI and Anthropic are ignoring an established rule that prevents bots scraping online content

IndustryStandard@lemmy.world · 6 months ago

gaylord_fartmaster@lemmy.world · 6 months ago

Am I missing something in this article? I’m not defending either company, but it doesn’t seem like they actually have any evidence to confirm either is doing this.

“The world’s top two AI startups are ignoring requests by media publishers to stop scraping their web content for free model training data, Business Insider has learned.”

The article claims this, but then says this about the source of its information:

“TollBit, a startup aiming to broker paid licensing deals between publishers and AI companies, found several AI companies are acting in this way and informed certain large publishers in a Friday letter, which was reported earlier by Reuters. The letter did not include the names of any of the AI companies accused of skirting the rule.”

So their source doesn’t actually say which companies are doing this, but then they jump straight into this:

“AI companies, including OpenAI and Anthropic, are simply choosing to ‘bypass’ robots.txt in order to retrieve or scrape all of the content from a given website or page.”

So they’re just concluding that based on nothing and reporting it as fact?

ChickenLadyLovesLife@lemmy.world · 6 months ago

So cynical … what makes you think “a startup aiming to broker paid licensing deals between publishers and AI companies” can’t be trusted implicitly?