Cloudflare Blocks Mixed‑Use Crawlers on Sites with Ads

Cloudflare will default‑block AI web crawlers on ad pages starting September 15, 2026, reshaping how search engines and AI trainers access content.

“Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge,” Cloudflare CEO Matthew Prince said in the company’s statement. The quote explains why Cloudflare is moving from optional protection to a default stance against mixed‑use crawlers, meaning bots that both index pages for search and feed AI training pipelines.

Historical Context

Cloudflare’s journey toward the September 2026 rollout didn’t happen overnight. In the early 2020s, the company focused on protecting sites from DDoS attacks and malicious bots. As AI models grew more capable, a new class of crawlers emerged—bots that combined traditional search indexing with data‑harvesting for machine‑learning pipelines. Those mixed‑use crawlers raised concerns among publishers who saw their ad‑supported pages being scraped without compensation.

In 2025 Cloudflare introduced the Pay Per Crawl program. The model let site owners block AI crawlers unless a paying party covered the cost of each request. That was the first time the platform tried to monetize access at the request level rather than through traditional advertising. The rollout was optional; owners had to enable the block manually. Early adopters praised the added control, but the opt‑in nature left many sites vulnerable because the default remained open to all bots.

Industry pressure mounted as the proportion of non‑human traffic climbed. Analysts noted that search engines, especially Google, were already blending search indexing with data collection for their own AI projects. Publishers began demanding a clearer separation. Cloudflare responded by shifting the default configuration for new customers, turning the optional block into a built‑in safeguard for ad‑filled pages. The policy change aligns with a broader movement to treat AI‑derived traffic as a distinct commodity, rather than an invisible side effect of search.

Key Takeaways

  • Starting September 15, 2026, new Cloudflare customers will have search enabled but training and agent use blocked on pages that contain ads.
  • Mixed‑use crawlers that don’t let site owners opt out of AI use will be blocked by default on ad pages.
  • The Pay Per Crawl feature is being rebranded as Pay Per Use, paying sites when their content appears in AI chatbot answers.
  • Cloudflare says Google has access to roughly 2X more information than leading AI firms because Google’s crawler mixes search and training.
  • Cloudflare highlights partnerships with Ceramic.AI and You.com, and hopes other AI players will join the Pay Per Use program.

Cloudflare’s New Policy on AI Web Crawlers

From September 15, 2026, any new customer who signs up for Cloudflare, or any new site added by an existing subscriber, will inherit a default configuration that lets search engines crawl but blocks AI training and agent use on pages that host ads. That’s a shift from the previous opt‑in model, where owners had to manually enable the block. Free‑tier accounts will also be switched unless they opt out before the deadline. The company says the move is meant to give site owners more visibility and commercial opportunities while still granting access to AI companies that clearly state what they intend to do with the content.

What the Default Changes Mean

In practice, the rule means that if your homepage contains ad slots, Cloudflare will automatically send a 403 response to any crawler that identifies itself as both a search bot and an AI trainer. That’s why the statement talks about “mixed‑use crawlers that don’t give site owners the option to choose whether their site is used for AI.” The policy forces those crawlers to separate their functions or be denied access on ad‑rich pages. We’ve seen a few early adopters already tweaking their robots.txt files to align with the new defaults.
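A minimal sketch of the default described above, assuming each crawler declares its purposes up front. The `Purpose` enum, the `Crawler` type, and the `default_status` function are hypothetical names chosen for illustration; they are not Cloudflare's API, and the article does not describe how Cloudflare actually classifies crawlers.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    SEARCH = "search"
    TRAINING = "training"
    AGENT = "agent"


@dataclass(frozen=True)
class Crawler:
    name: str
    purposes: frozenset  # Purpose values the crawler declares (assumed input)


def default_status(crawler: Crawler, page_has_ads: bool) -> int:
    """Return the HTTP status the described default would produce."""
    if not page_has_ads:
        return 200  # the default only targets pages that carry ads
    blocked = {Purpose.TRAINING, Purpose.AGENT}
    if crawler.purposes & blocked:
        # Covers mixed-use crawlers that bundle search with AI use.
        return 403
    return 200


search_only = Crawler("search-only-bot", frozenset({Purpose.SEARCH}))
mixed_use = Crawler("mixed-use-bot", frozenset({Purpose.SEARCH, Purpose.TRAINING}))

print(default_status(search_only, page_has_ads=True))
print(default_status(mixed_use, page_has_ads=True))
print(default_status(mixed_use, page_has_ads=False))
```

Under these assumptions, a search‑only bot is served on an ad page, while a bot that bundles search with training is refused there and served on an ad‑free page.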

Pay Per Use Replaces Pay Per Crawl

Back in 2025 Cloudflare rolled out Pay Per Crawl, letting sites block AI crawlers unless a company paid to scrape the content. The new Pay Per Use model flips the economics: instead of AI companies paying for each crawl, site owners earn money when their content surfaces in AI chatbot answers. That’s a pretty radical pivot, and it hinges on the ability to track when a piece of text is used in a model’s response. Cloudflare says the mechanism will rely on partnerships with AI firms that can report answer‑level usage back to the platform.

How Payments Are Calculated

According to the announcement, payments will be triggered only when the content appears in a chatbot answer, not merely when a page is fetched. That’s why the company highlights its deals with Ceramic.AI and You.com—both are already building answer‑level reporting into their pipelines. If other AI providers adopt the standard, Cloudflare hopes the ecosystem will grow, giving publishers a new revenue stream tied directly to AI‑generated traffic.
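The fetch‑versus‑answer distinction can be sketched as a filter over usage events. The event shape, the `fetch` and `answer_use` type labels, and the flat per‑answer rate are all assumptions made for illustration; the article does not describe Cloudflare's actual reporting format or pricing.

```python
from collections import Counter
from decimal import Decimal


def payable_amounts(events, rate_per_answer):
    """Sum payments per site, counting only answer-level usage events."""
    answer_counts = Counter(
        event["site"] for event in events if event["type"] == "answer_use"
    )
    return {site: rate_per_answer * count for site, count in answer_counts.items()}


# Hypothetical events reported by an AI partner.
events = [
    {"site": "news.example", "type": "fetch"},
    {"site": "news.example", "type": "answer_use"},
    {"site": "news.example", "type": "answer_use"},
    {"site": "shop.example", "type": "fetch"},
]

print(payable_amounts(events, Decimal("0.01")))
```

In this sketch `shop.example` earns nothing because its page was only fetched, while `news.example` is credited for two answer‑level uses.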

Impact on Search Engines and AI Companies

The policy has an obvious side effect: it puts pressure on the biggest search engine, Google, whose crawler, Googlebot, already mixes search indexing with data collection for Gemini and other AI features. Cloudflare notes that the “largest search engine has access to about 2X more information than leading AI companies because they make it difficult for customers to remain discoverable without also being used for AI.” That’s a direct jab at Google’s lack of a separate crawler for pure search results.

Google’s Position

Google does offer Google‑Extended, a robots.txt control token that lets publishers opt out of having their content used to train Gemini models, but the company hasn’t provided a way for publishers to stay out of AI Mode while still appearing in search. That means many sites are forced to choose between visibility and protecting their content from AI use. Cloudflare’s default block forces Google and other mixed‑use services to either split their bots or lose access to ad‑supported pages. If Google decides to respect the new defaults, we might finally see a clean line between search and AI training.
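For context, Google‑Extended is a robots.txt product token that governs use of content for Gemini training; it is not a separate search crawler. The sketch below uses Python's standard `urllib.robotparser` to show what such rules express. The `example.com` URL and the rules themselves are illustrative assumptions, and the sketch models only the robots.txt file, not Cloudflare's enforcement or Google's AI features inside search.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: opt out of Google-Extended, allow everything else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"  # hypothetical page
print("Googlebot allowed:", parser.can_fetch("Googlebot", url))
print("Google-Extended allowed:", parser.can_fetch("Google-Extended", url))
```

The rules keep Googlebot allowed while disallowing the Google‑Extended token, which is why this control alone cannot keep a page out of AI features that are served through regular search crawling.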

Developer and Site Owner Reactions

Early feedback from developers has been mixed. Some appreciate the extra control, saying they’ve been worried about AI models scraping their paywalled articles for free. Others worry that the default block could hurt SEO if a crawler misclassifies itself and gets shut out. We’ve heard from a few site operators who’ve already updated their Cloudflare settings to whitelist specific AI services they trust, using the new Pay Per Use dashboard to monitor earnings.

Practical Steps for Site Operators

If you run a site behind Cloudflare, you’ll need to review the default settings before September 15. That means logging into the Cloudflare dashboard, checking the “Mixed‑Use Crawler” toggle, and deciding whether you want to keep the block on ad pages. You’ll also want to explore the Pay Per Use enrollment page to see if you can start earning from answer‑level usage right away. Remember, you can opt out of the default at any time, and the process takes only a few clicks.

What This Means For You

For developers building AI‑powered products, the change means you’ll have to adjust your crawling strategy. If your bot needs training data, you’ll now have to request explicit permission from site owners or work with Cloudflare‑approved partners like Ceramic.AI or You.com. That could add friction, but it also pushes the industry toward more transparent data collection practices.

For site owners, the policy offers a new revenue opportunity through Pay Per Use, but it also demands vigilance. You’ll need to monitor which AI services are pulling your content into chatbot answers and make sure the payments line up with your expectations. In short, the default block forces you to think about how AI interacts with your ad inventory, and it gives you a lever to monetize that interaction.

Scenario 1: News Publisher with Ad‑Supported Articles

A daily news site that relies on banner ads to fund reporting can now let Google index its headlines while blocking AI training on the article body. The site’s Cloudflare settings will return a 403 to any bot that claims both search and training roles on the article page. When a partnered AI chatbot later references a headline in a user answer, the Pay Per Use system will credit the publisher for that specific usage. The publisher can track earnings per answer and decide whether to extend the block to other sections, such as opinion pieces that lack ads.

Scenario 2: E‑Commerce Store with Product Listings

An online retailer that displays product ads on each listing page can keep its catalog discoverable by search engines. The default block will stop AI trainers from ingesting the full product description unless the retailer whitelists a partner. If a chatbot later pulls the product name and price into a conversational response, the retailer receives a payment tied to that answer. The store can use the dashboard to compare revenue from traditional sales versus AI‑driven referrals, adjusting its strategy accordingly.

Scenario 3: Startup Building a Knowledge Base Bot

A startup that aggregates technical documentation for a chatbot must now negotiate access with each site that hosts ad‑filled pages. The team can either integrate with a Cloudflare‑approved partner that already reports answer‑level usage, or they can request direct permission from site owners. While the extra step adds overhead, it also creates a clear audit trail: every piece of content used in a bot answer is logged, and the originating site earns a share of the revenue.

Key Questions Remaining

  • Will Google develop a truly separate crawler that respects the default block, or will it adapt its existing bot to pass the mixed‑use test?
  • How quickly will other AI firms adopt the answer‑level reporting standard required for Pay Per Use payments?
  • What mechanisms will Cloudflare provide for dispute resolution if a site believes it’s been incorrectly blocked or under‑paid?
  • Can the Pay Per Use model scale to cover the massive volume of AI‑generated queries without causing latency or overhead for publishers?

“Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge,” Matthew Prince said.

Sources: Engadget, TechCrunch

About the Author

Halil Kale, AI & Technology Reporter

Halil Kale is an AI and technology reporter at AI Post Daily, where he covers artificial intelligence, machine learning, cybersecurity, and the business of tech. With a background in computer science and over five years of experience tracking the AI industry, Halil specializes in translating complex technical developments into clear, actionable insights for developers, founders, and technology professionals. He has reported on breakthroughs from Anthropic, OpenAI, Google DeepMind, and NVIDIA, as well as critical cybersecurity incidents and emerging robotics applications. Halil believes that understanding AI is no longer optional — it's essential for anyone working in or around technology. At AI Post Daily, he applies rigorous editorial standards to ensure every story is accurate, sourced, and genuinely useful to readers.
