AI Chatbots Are Ignoring Robots.txt and Scraping Blocked Content

Another major revelation from the Columbia Journalism Review study is that AI chatbots are completely ignoring the robots.txt protocol.

For context, robots.txt is a universally accepted (though not legally binding) standard that tells search engines which parts of a website they can and cannot crawl. Historically, Google and other search engines have respected it. But chatbots? Not so much.
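To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler checks robots.txt before fetching a page, using Python's standard-library parser. The rules and the "PerplexityBot" example are illustrative assumptions, not an actual publisher's file:

```python
from urllib import robotparser

# A hypothetical robots.txt that blocks one AI crawler but allows everyone else.
rules = """
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A compliant crawler calls can_fetch() and skips disallowed URLs.
print(rp.can_fetch("PerplexityBot", "https://example.com/article"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/article"))      # True
```

The key point: this check is voluntary. Nothing in the protocol technically prevents a crawler from fetching the page anyway, which is exactly what the study found some AI bots doing.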

Take Perplexity, for example. Despite National Geographic disallowing Perplexity's crawler, the chatbot still returned all ten quotes the researchers requested from its paywalled articles. This is a huge problem. Publishers use paywalls to monetize well-researched content. If an AI tool can simply bypass that and surface the information for free, it undermines the entire business model.

Imagine creating a course and charging people for access, only to find that an AI chatbot is giving away all your content for free. That’s what’s happening to publishers right now. If AI search engines want to be taken seriously, they need to respect content ownership.