Cloudflare Blocks Perplexity From Crawling 25% of The Internet's Domains

13 August 2025
A new standard is being set for digital content. Cloudflare has officially de-listed Perplexity AI from its verified bot program and implemented blocking measures against what it characterises as deceptive web scraping practices. This move affects approximately 25% of global web traffic that flows through Cloudflare's infrastructure, fundamentally reshaping how AI companies can access internet content.

The implications extend far beyond a single corporate dispute. This represents the first major infrastructure-level response to AI companies systematically extracting value from content creators without compensation. That is the unsustainable model we explored in our recent analysis of the AI content grab.

What Cloudflare Actually Discovered

When Cloudflare customers complained that Perplexity was still accessing their content despite blocking measures, Cloudflare didn't just take their word for it. They ran a controlled experiment that should concern every business publishing content online.

The setup was elegant. Cloudflare created brand-new domains that had never been indexed by search engines or made publicly accessible. They implemented robots.txt files that explicitly told all bots to stay away. Then they asked Perplexity AI questions about these hidden websites.
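
To make the "stay away" directive concrete, here is a minimal sketch of a block-all robots.txt and of how a well-behaved crawler would interpret it, using Python's standard urllib.robotparser. The file contents, the example.test domain and the is_allowed helper are illustrative assumptions, not Cloudflare's actual test files.

```python
from urllib.robotparser import RobotFileParser

# A block-all robots.txt of the kind described above (illustrative only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.test is a placeholder domain, not one of the test domains.
    for agent in ("PerplexityBot", "GPTBot", "SomeOtherBot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "https://example.test/hidden-page")
        print(f"{agent}: allowed={allowed}")
```

A crawler that honours the file gets "not allowed" for every path and stops. robots.txt is only a request, though: nothing in it technically prevents a client from fetching the page anyway.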

The results were damning. Perplexity provided detailed information about content that could only have been accessed by directly violating the blocking directives.

But here's where it gets interesting. Cloudflare discovered that Perplexity wasn't just ignoring the rules; they were actively trying to hide their violations.

The Sophisticated Deception Behind the Scenes

What Cloudflare uncovered reads like a playbook for systematic deception. When Perplexity's official crawlers were blocked, the company deployed what Cloudflare calls "stealth crawlers" that used sophisticated tactics to bypass website protections.

  • The Disguise Game: Perplexity's bots started impersonating regular users by spoofing user agent strings to look like Google Chrome running on a Mac. Instead of identifying themselves as AI crawlers, they pretended to be human browsers.
  • The Shell Game: The company rotated through different IP addresses and even changed their Autonomous System Numbers (ASNs), essentially using different internet neighbourhoods to avoid detection.
  • The Scale: This wasn't small-scale testing. Cloudflare observed this behaviour across tens of thousands of domains, with Perplexity's stealth crawlers making 3-6 million additional requests per day beyond their 20-25 million declared requests.

When confronted with evidence, Perplexity's response was telling. They dismissed Cloudflare's research as a "sales pitch" and claimed the identified bots "aren't even ours." Later, they escalated to personal attacks, calling Cloudflare's leadership "dangerously misinformed" and claiming they're "more flair than cloud."

Meanwhile, OpenAI Shows How It Should Be Done

The contrast with OpenAI couldn't be starker. When Cloudflare ran identical tests with ChatGPT, they observed completely different behaviour.

ChatGPT fetched the robots.txt files and stopped crawling when blocked. No follow-up attempts. No stealth crawlers. No user agent spoofing. When presented with blocking pages, ChatGPT respected the directive and moved on.

OpenAI has also implemented the proposed Web Bot Auth standard, which provides cryptographic signatures for their requests, enabling transparent identification and verification.

This shows that AI companies can build powerful systems without resorting to deceptive practices. The technology exists to respect website owner preferences while delivering valuable AI capabilities.

Taking Control of Your Content Strategy

If you're publishing valuable content online, you're likely experiencing AI extraction whether you know it or not. The Perplexity situation reveals that traditional blocking methods aren't enough against determined AI companies.

Start with the Basics

Check your robots.txt file and ensure it specifically addresses AI crawlers. Generic blocking statements may not provide adequate protection against sophisticated evasion tactics.
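
As a rough sketch of what "specifically addresses AI crawlers" can look like, the snippet below generates a robots.txt with one explicit group per AI crawler and then checks the result with Python's standard parser. The token list is an illustrative assumption; confirm the current user agent tokens in each vendor's documentation before relying on it. The helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative tokens only: verify against each vendor's published crawler docs.
AI_CRAWLER_TOKENS = ["GPTBot", "PerplexityBot", "ClaudeBot", "CCBot"]


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Build a robots.txt that blocks the named agents and allows everyone else."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    groups.append("User-agent: *\nAllow: /\n")
    # Each group ends with a newline, so joining on "\n" leaves a blank line between groups.
    return "\n".join(groups)


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate robots_txt the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots_txt = build_robots_txt(AI_CRAWLER_TOKENS)
    print(robots_txt)
    for agent in ("GPTBot", "PerplexityBot", "Googlebot"):
        print(agent, crawler_may_fetch(robots_txt, agent, "https://example.test/article"))
```

Named groups like these only affect crawlers that identify themselves honestly and choose to comply, which is why the technical measures further down still matter.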

Review your Cloudflare or hosting provider settings to understand what AI access you're currently allowing by default.

Think Strategically About Your Content

Not all content deserves the same protection. Your proprietary research, unique analysis, and competitive insights are valuable assets that AI companies are using to build billion-dollar businesses.

Consider which AI applications align with your business objectives. Search enhancement that drives traffic differs fundamentally from training data collection that replaces your content.

Implement Technical Protection

Beyond robots.txt files, consider Web Application Firewall rules that provide enforceable blocking against evasion tactics.
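
The difference between a request and an enforced rule can be sketched at the application level. The WSGI middleware below is a simplified stand-in for a firewall rule, not Cloudflare's WAF syntax: it returns 403 to any client whose user agent contains a blocked token. The class name and token list are hypothetical. A rule like this only stops crawlers that declare themselves; spoofed user agents need additional signals such as IP reputation or behavioural analysis.

```python
# Hypothetical token list; matching is a case-insensitive substring test.
BLOCKED_AGENT_TOKENS = ("perplexitybot", "gptbot")


class BlockDeclaredCrawlers:
    """WSGI middleware that rejects requests from self-identified crawlers."""

    def __init__(self, app, blocked_tokens=BLOCKED_AGENT_TOKENS):
        self.app = app
        self.blocked_tokens = tuple(token.lower() for token in blocked_tokens)

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in self.blocked_tokens):
            body = b"Forbidden\n"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("403 Forbidden", headers)
            return [body]
        # Anything else passes through to the wrapped application.
        return self.app(environ, start_response)
```

Wrapping an existing WSGI application is a one-line change, for example app = BlockDeclaredCrawlers(app).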

Monitor your traffic analytics for suspicious patterns. Unusual user agent distributions, high-volume requests from diverse IP ranges, or traffic spikes without corresponding engagement increases may signal unauthorised AI access.
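
One way to turn those signals into a first-pass check is sketched below. It assumes you can export log entries with a client IPv4 address, a user agent, and some engagement proxy; here a hypothetical fetched_assets flag records whether the client also loaded CSS, JavaScript or images. The field names and thresholds are illustrative assumptions and would need tuning against your own traffic.

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    ip: str               # client IPv4 address
    user_agent: str
    fetched_assets: bool  # hypothetical engagement proxy


def summarise(entries):
    """Aggregate request count, distinct /24 networks and asset loads per user agent."""
    stats = defaultdict(lambda: {"requests": 0, "networks": set(), "with_assets": 0})
    for entry in entries:
        row = stats[entry.user_agent]
        row["requests"] += 1
        row["networks"].add(ipaddress.ip_network(f"{entry.ip}/24", strict=False))
        row["with_assets"] += int(entry.fetched_assets)
    return stats


def flag_suspicious(entries, min_requests=500, min_networks=50, max_asset_ratio=0.1):
    """Flag user agents with high volume, many source networks and little engagement."""
    flagged = []
    for user_agent, row in summarise(entries).items():
        asset_ratio = row["with_assets"] / row["requests"]
        if (
            row["requests"] >= min_requests
            and len(row["networks"]) >= min_networks
            and asset_ratio <= max_asset_ratio
        ):
            flagged.append(user_agent)
    return sorted(flagged)
```

A flagged user agent is a prompt for closer inspection, not proof of scraping. Legitimate traffic behind proxies or text-only clients can produce similar patterns.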

Take Control of the Relationship

The era of free AI content extraction is ending. Forward-thinking organisations are establishing licensing frameworks and compensation models for AI access to their content.

Cloudflare's "Pay Per Crawl" program transforms content access into revenue streams, enabling you to monetise AI usage of your valuable content rather than subsidising competitors.

The Bigger Picture: Your AI Strategy Needs to Address Both Sides

The Perplexity controversy illuminates a critical blind spot in most AI strategies. While organisations focus on implementing AI tools internally, they're often ignoring how AI companies are extracting value from their content externally.

Successful AI strategy requires addressing both dimensions: how you use AI and how AI uses you.

The permission-based economy emerging from infrastructure decisions like Cloudflare's creates new opportunities for organisations that proactively establish AI governance frameworks. The key is making intentional decisions about AI engagement rather than accepting whatever terms AI companies dictate.

The Strategic Questions You Need to Answer

How valuable is your content in AI training and inference contexts? Which AI applications align with your business objectives versus compete with them? What protective measures and monetisation opportunities should you implement?

These decisions compound over time. Organisations that establish strong AI governance positions today will have significant advantages as the permission-based economy matures.

Building Your AI-Ready Organisation

As AI reshapes how content is accessed, created, and monetised, your organisation needs a comprehensive strategy that addresses both opportunities and threats.

This isn't just about blocking bad actors like Perplexity. It's about positioning your business advantageously in an AI-driven economy where content value, technical capabilities, and strategic partnerships determine competitive success.

At Adaca, we help organisations navigate exactly these complexities through our AI Transformation service. We don't just implement AI tools; we help you build strategic frameworks for thriving in the permission-based AI economy.

Our AI Readiness assessment evaluates your content assets, competitive positioning, and governance needs to ensure you're prepared for the AI-driven future that's rapidly emerging.

Whether you're protecting high-value content from extraction, implementing responsible AI usage policies, or building AI capabilities that respect intellectual property rights, we provide the strategic guidance and technical expertise you need.

The decisions you make about AI today will determine your competitive position tomorrow. Don't let AI companies write the rules for how they access and use your valuable content.