The robots.txt Mistake That's Blocking AI From Recommending You
Your robots.txt file might be silently blocking AI crawlers from reading your site. Here's how to check, what to allow, and the exact configuration that works.
You Might Be Blocking AI Right Now
So here's something fun. By some estimates, around 30% of websites are accidentally blocking AI crawlers from reading their content.
Not intentionally. Nobody sat down and said "I don't want ChatGPT to recommend my products." It's just that their robots.txt file was set up years ago for Google crawling, and nobody updated it when AI crawlers became a thing.
The result? ChatGPT, Perplexity, and Google AI literally cannot read your website. They can't recommend what they can't see.
What Is robots.txt and Why Should You Care?
robots.txt is a text file that lives at the root of your website (yourdomain.com/robots.txt). It tells crawlers which parts of your site they're allowed to access.
Back in the day, this was mostly about Google. You'd block admin pages, duplicate content, staging environments. Standard stuff.
But now there are AI-specific crawlers. GPTBot (OpenAI), PerplexityBot, ClaudeBot (Anthropic), Google-Extended (for Gemini). Each one reads your robots.txt to decide whether it's allowed to crawl your pages.
If your robots.txt blocks them, or doesn't mention them at all, you might be invisible to AI recommendations.
Want to know right now if AI crawlers can reach your site? Run a free Recomaze audit. It checks your robots.txt, tests actual crawler access, and tells you exactly which AI bots are allowed and which are blocked. Takes about a minute.
The Common robots.txt Mistakes
1. Blocking Everything by Default
Some sites have this:
```
User-agent: *
Disallow: /
```

This blocks ALL crawlers from ALL pages. Including AI crawlers. Your entire site is invisible. Usually it's a leftover from a staging environment that accidentally went to production.
2. Blocking Specific AI Crawlers Without Realizing It
Some CMS platforms or security plugins add blocks for "unknown" crawlers. Since AI bots are relatively new, they often get caught in these blanket blocks.
Check if your robots.txt or server configuration has rules that block user agents containing "bot" or "crawler" generically.
3. Not Mentioning AI Crawlers at All
If your robots.txt only has rules for Googlebot, AI crawlers fall under the User-agent: * rule. If that rule is restrictive, AI bots are restricted too.
4. Blocking Important Content Paths
Some sites block /products/, /blog/, or /api/ paths. If AI crawlers can reach your homepage but not your product pages, they can't recommend your products.
The robots.txt Configuration That Works
Here's what you want. A robots.txt that explicitly allows AI crawlers to access your important content:
```
# AI Crawlers - Allow access
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

User-agent: ChatGPT-User
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

User-agent: PerplexityBot
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

User-agent: Google-Extended
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

# Google
User-agent: Googlebot
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

# Default
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

Sitemap: https://yourdomain.com/sitemap.xml
```

The key principle: allow by default, block only private pages. Admin panels, checkout flows, account pages. Everything else should be open.
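Before deploying rules like these, you can sanity-check a draft with Python's built-in urllib.robotparser. One caveat for this sketch: Python's parser applies rules in file order (first match wins), so the Disallow lines are placed before Allow: / here, whereas Google and most modern crawlers use longest-match precedence, under which either order works:

```python
from urllib import robotparser

# Draft rules for one crawler. Disallow comes before Allow because
# Python's parser is first-match-wins; Google uses longest-match instead.
rules = """\
User-agent: GPTBot
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "/products/widget"))  # True: product pages open
print(rp.can_fetch("GPTBot", "/checkout/cart"))    # False: private path blocked
```

It's a quick way to catch a typo in a Disallow path before the file goes live.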
Which AI Crawlers Matter Most?
Here's your priority list:
GPTBot and ChatGPT-User (OpenAI) - Powers ChatGPT, the most popular AI assistant. If you block these, you're invisible to the largest AI audience.
Google-Extended - Controls whether Google uses your content for Gemini training and grounding. Note that AI Overviews in Search follows Googlebot's rules, not Google-Extended, so blocking this token keeps you out of Gemini's answers without affecting regular Search.
PerplexityBot - Perplexity is the fastest-growing AI search engine. Their bot crawls aggressively and cites sources well.
ClaudeBot (Anthropic) - Powers Claude, another major AI assistant. Growing fast in professional and enterprise use.
Bytespider (ByteDance) - Powers the AI behind TikTok's search features. Smaller impact but growing.
Pro tip: run a Recomaze audit and check the "AI Crawler Access" section of your report. It tests each major AI bot individually and tells you which ones can actually reach your pages. Some sites allow GPTBot but block PerplexityBot without realizing it.
Platform-Specific Instructions
Shopify
Shopify generates your robots.txt automatically. You can customize it through the robots.txt.liquid template. Go to Online Store > Themes > Edit Code > Templates > robots.txt.liquid.
Add the AI crawler rules above. Shopify's default robots.txt is generally AI-friendly, but it's worth checking.
WordPress
Your robots.txt is usually at your domain root. Some security plugins (Wordfence, Sucuri, iThemes) add crawler blocks. Check your plugin settings.
With Rank Math: go to Rank Math > General Settings > Edit robots.txt. Add the AI crawler rules.
With Yoast: go to Yoast > Tools > File Editor > robots.txt.
BigCommerce
BigCommerce lets you edit robots.txt directly: Server Settings > Search engine robots.
Custom Sites
Edit the robots.txt file directly in your web root. If you use a CDN (Cloudflare, Fastly), make sure the CDN isn't adding its own bot-blocking rules on top of your robots.txt.
Beyond robots.txt: Other Things That Block AI
robots.txt isn't the only thing that can block AI crawlers. Watch out for:
Server-side bot detection - Some WAFs (Web Application Firewalls) block AI crawlers based on user-agent strings. Cloudflare's "Bot Fight Mode" can do this if set to aggressive.
Rate limiting - If your server returns 429 (Too Many Requests) to AI crawlers, they'll stop trying. Make sure your rate limits allow reasonable crawling.
JavaScript-rendered content - If your content only appears after JavaScript executes, some AI crawlers won't see it. How AI crawlers read your website covers this in detail.
Login walls - Content behind authentication is invisible to AI. If you have gated content, consider making at least a preview or summary publicly accessible.
Noindex meta tags - A page can be crawlable (robots.txt allows it) but still excluded from indexing if it carries a noindex meta tag or an X-Robots-Tag response header. Check your meta tags on important pages.
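To catch the "crawlable but noindexed" case, you can scan a page's HTML for a robots meta tag. A rough string-based sketch — the helper name `has_noindex` is hypothetical, and a production check should also inspect the X-Robots-Tag response header:

```python
import re

def has_noindex(html: str) -> bool:
    """True if the HTML contains a robots meta tag with a noindex directive."""
    for tag in re.findall(r"<meta[^>]*>", html, flags=re.IGNORECASE):
        if re.search(r'name\s*=\s*["\']?robots', tag, re.IGNORECASE) \
                and "noindex" in tag.lower():
            return True
    return False
```

Run it over the HTML of your homepage and top product pages; a single stray noindex on a template can silently wipe out a whole section of your site.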
How to Test Your Configuration
After updating your robots.txt, test it:
- Google Search Console - The robots.txt report (which replaced the standalone robots.txt Tester) shows how Googlebot fetched and parsed your file
- Manual check - Visit yourdomain.com/robots.txt in your browser and read through the rules
- Recomaze audit - The AI Crawler Setup Checklist covers testing in detail
- Fetch as bot - Use curl with different user-agent strings to test server responses
```
curl -I -A "GPTBot" https://yourdomain.com/
curl -I -A "PerplexityBot" https://yourdomain.com/
```

If you get a 200 response, the crawler can reach your page. If you get 403 or a redirect, something is blocking it. (The -I flag sends a HEAD request and prints the status line; drop it if your server treats HEAD differently from GET.)
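The same check can be scripted so you test every major AI user agent in one pass. A small sketch — `check_bot_access` is a hypothetical helper, and yourdomain.com is a placeholder you'd replace with your real site:

```python
import urllib.request
import urllib.error

AI_BOTS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot"]

def check_bot_access(url: str, bots: list[str] = AI_BOTS) -> dict[str, int]:
    """Fetch url once per user-agent string and record the HTTP status code."""
    results: dict[str, int] = {}
    for bot in bots:
        req = urllib.request.Request(url, headers={"User-Agent": bot})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                results[bot] = resp.status
        except urllib.error.HTTPError as exc:
            results[bot] = exc.code  # e.g. 403 usually means a WAF blocks this bot
    return results

# Usage (replace the placeholder domain with your real site):
# check_bot_access("https://yourdomain.com/")
```

One caveat: this tells you how your server responds to those user-agent strings, not how the real bots behave — some WAFs also verify crawler IP ranges, so a real GPTBot request can be treated differently from your spoofed one.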
The Quick Fix Checklist
- Check your robots.txt at yourdomain.com/robots.txt right now
- Look for any Disallow: / rules that block everything
- Add explicit Allow rules for GPTBot, PerplexityBot, ClaudeBot, and Google-Extended
- Block only private pages (admin, checkout, account)
- Check your CDN and firewall settings for bot-blocking rules
- Add your sitemap URL at the bottom of robots.txt
- Test with curl commands using different bot user-agents
Check if AI crawlers can reach your site - free Recomaze audit tests GPTBot, PerplexityBot, and other AI crawlers against your actual configuration. See exactly what's blocked and what's allowed. No account needed.
