The robots.txt Mistake That's Blocking AI From Recommending You
Your robots.txt file might be silently blocking AI crawlers from reading your site. Here's how to check, what to allow, and the exact configuration that works.
You Might Be Blocking AI Right Now
So here's something fun. By some estimates, around 30% of websites are accidentally blocking AI crawlers from reading their content.
Not intentionally. Nobody sat down and said "I don't want ChatGPT to recommend my products." It's just that their robots.txt file was set up years ago for Google crawling, and nobody updated it when AI crawlers became a thing.
The result? ChatGPT, Perplexity, and Google AI literally cannot read your website. They can't recommend what they can't see.
What Is robots.txt and Why Should You Care?
robots.txt is a text file that lives at the root of your website (yourdomain.com/robots.txt). It tells crawlers which parts of your site they're allowed to access.
Back in the day, this was mostly about Google. You'd block admin pages, duplicate content, staging environments. Standard stuff.
But now there are AI-specific crawlers. GPTBot (OpenAI), PerplexityBot, ClaudeBot (Anthropic), Google-Extended (for Gemini). Each one reads your robots.txt to decide whether it's allowed to crawl your pages.
If your robots.txt blocks them, or doesn't mention them at all, you might be invisible to AI recommendations.
Want to know right now if AI crawlers can reach your site? Run a free Recomaze audit. It checks your robots.txt, tests actual crawler access, and tells you exactly which AI bots are allowed and which are blocked. Takes about a minute.
The Common robots.txt Mistakes
1. Blocking Everything by Default
Some sites have this:
```
User-agent: *
Disallow: /
```

This blocks ALL crawlers from ALL pages. Including AI crawlers. Your entire site is invisible. Usually it's a leftover from a staging environment that accidentally went to production.
2. Blocking Specific AI Crawlers Without Realizing It
Some CMS platforms or security plugins add blocks for "unknown" crawlers. Since AI bots are relatively new, they often get caught in these blanket blocks.
Check if your robots.txt or server configuration has rules that block user agents containing "bot" or "crawler" generically.
3. Not Mentioning AI Crawlers at All
If your robots.txt only has rules for Googlebot, AI crawlers fall under the User-agent: * rule. If that rule is restrictive, AI bots are restricted too.
4. Blocking Important Content Paths
Some sites block /products/, /blog/, or /api/ paths. If AI crawlers can reach your homepage but not your product pages, they can't recommend your products.
The robots.txt Configuration That Works
Here's what you want. A robots.txt that explicitly allows AI crawlers to access your important content:
```
# AI Crawlers - Allow access
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

User-agent: ChatGPT-User
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

User-agent: PerplexityBot
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

User-agent: Google-Extended
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

# Google
User-agent: Googlebot
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

# Default
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/

Sitemap: https://yourdomain.com/sitemap.xml
```

The key principle: allow by default, block only private pages. Admin panels, checkout flows, account pages. Everything else should be open.
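Before deploying rules like these, you can sanity-check a draft with Python's built-in urllib.robotparser. One caveat for this sketch: Python's parser applies rules in file order (first match wins), so the Disallow lines are placed before Allow: / here, whereas Google and most modern crawlers use longest-match precedence, under which either order works:

```python
from urllib import robotparser

# Draft rules for one crawler. Disallow comes before Allow because
# Python's parser is first-match-wins; Google uses longest-match instead.
rules = """\
User-agent: GPTBot
Disallow: /admin/
Disallow: /checkout/
Disallow: /account/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "/products/widget"))  # True: product pages open
print(rp.can_fetch("GPTBot", "/checkout/cart"))    # False: private path blocked
```

It's a quick way to catch a typo in a Disallow path before the file goes live.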
Which AI Crawlers Matter Most?
Here's your priority list:
GPTBot and ChatGPT-User (OpenAI) - Powers ChatGPT, the most popular AI assistant. If you block these, you're invisible to the largest AI audience.
Google-Extended - Controls whether Google uses your content for Gemini training and grounding. Note that AI Overviews in Search follows Googlebot's rules, not Google-Extended, so blocking this token keeps you out of Gemini's answers without affecting regular Search.
PerplexityBot - Perplexity is the fastest-growing AI search engine. Their bot crawls aggressively and cites sources well.
ClaudeBot (Anthropic) - Powers Claude, another major AI assistant. Growing fast in professional and enterprise use.
Bytespider (ByteDance) - Powers the AI behind TikTok's search features. Smaller impact but growing.
Pro tip: run a Recomaze audit and check the "AI Crawler Access" section of your report. It tests each major AI bot individually and tells you which ones can actually reach your pages. Some sites allow GPTBot but block PerplexityBot without realizing it.
Platform-Specific Instructions
Shopify
Shopify generates your robots.txt automatically. You can customize it through the robots.txt.liquid template. Go to Online Store > Themes > Edit Code > Templates > robots.txt.liquid.
Add the AI crawler rules above. Shopify's default robots.txt is generally AI-friendly, but it's worth checking.
WordPress
Your robots.txt is usually at your domain root. Some security plugins (Wordfence, Sucuri, iThemes) add crawler blocks. Check your plugin settings.
With Rank Math: go to Rank Math > General Settings > Edit robots.txt. Add the AI crawler rules.
With Yoast: go to Yoast > Tools > File Editor > robots.txt.
BigCommerce
BigCommerce lets you edit robots.txt directly: Server Settings > Search engine robots.
Custom Sites
Edit the robots.txt file directly in your web root. If you use a CDN (Cloudflare, Fastly), make sure the CDN isn't adding its own bot-blocking rules on top of your robots.txt.
Beyond robots.txt: Other Things That Block AI
robots.txt isn't the only thing that can block AI crawlers. Watch out for:
Server-side bot detection - Some WAFs (Web Application Firewalls) block AI crawlers based on user-agent strings. Cloudflare's "Bot Fight Mode" can do this if set to aggressive.
Rate limiting - If your server returns 429 (Too Many Requests) to AI crawlers, they'll stop trying. Make sure your rate limits allow reasonable crawling.
JavaScript-rendered content - If your content only appears after JavaScript executes, some AI crawlers won't see it. How AI crawlers read your website covers this in detail.
Login walls - Content behind authentication is invisible to AI. If you have gated content, consider making at least a preview or summary publicly accessible.
Noindex meta tags - A page can be crawlable (robots.txt allows it) but still excluded from indexing if it carries a noindex meta tag or an X-Robots-Tag response header. Check your meta tags on important pages.
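To catch the "crawlable but noindexed" case, you can scan a page's HTML for a robots meta tag. A rough string-based sketch — the helper name `has_noindex` is hypothetical, and a production check should also inspect the X-Robots-Tag response header:

```python
import re

def has_noindex(html: str) -> bool:
    """True if the HTML contains a robots meta tag with a noindex directive."""
    for tag in re.findall(r"<meta[^>]*>", html, flags=re.IGNORECASE):
        if re.search(r'name\s*=\s*["\']?robots', tag, re.IGNORECASE) \
                and "noindex" in tag.lower():
            return True
    return False
```

Run it over the HTML of your homepage and top product pages; a single stray noindex on a template can silently wipe out a whole section of your site.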
How to Test Your Configuration
After updating your robots.txt, test it:
- Google Search Console - The robots.txt report (which replaced the standalone robots.txt Tester) shows how Googlebot fetched and parsed your file
- Manual check - Visit yourdomain.com/robots.txt in your browser and read through the rules
- Recomaze audit - The AI Crawler Setup Checklist covers testing in detail
- Fetch as bot - Use curl with different user-agent strings to test server responses
```
curl -I -A "GPTBot" https://yourdomain.com/
curl -I -A "PerplexityBot" https://yourdomain.com/
```

If you get a 200 response, the crawler can reach your page. If you get 403 or a redirect, something is blocking it. (The -I flag sends a HEAD request and prints the status line; drop it if your server treats HEAD differently from GET.)
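The same check can be scripted so you test every major AI user agent in one pass. A small sketch — `check_bot_access` is a hypothetical helper, and yourdomain.com is a placeholder you'd replace with your real site:

```python
import urllib.request
import urllib.error

AI_BOTS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot"]

def check_bot_access(url: str, bots: list[str] = AI_BOTS) -> dict[str, int]:
    """Fetch url once per user-agent string and record the HTTP status code."""
    results: dict[str, int] = {}
    for bot in bots:
        req = urllib.request.Request(url, headers={"User-Agent": bot})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                results[bot] = resp.status
        except urllib.error.HTTPError as exc:
            results[bot] = exc.code  # e.g. 403 usually means a WAF blocks this bot
    return results

# Usage (replace the placeholder domain with your real site):
# check_bot_access("https://yourdomain.com/")
```

One caveat: this tells you how your server responds to those user-agent strings, not how the real bots behave — some WAFs also verify crawler IP ranges, so a real GPTBot request can be treated differently from your spoofed one.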
The Quick Fix Checklist
- Check your robots.txt at yourdomain.com/robots.txt right now
- Look for any Disallow: / rules that block everything
- Add explicit Allow rules for GPTBot, PerplexityBot, ClaudeBot, and Google-Extended
- Block only private pages (admin, checkout, account)
- Check your CDN and firewall settings for bot-blocking rules
- Add your sitemap URL at the bottom of robots.txt
- Test with curl commands using different bot user-agents
Check if AI crawlers can reach your site - free Recomaze audit tests GPTBot, PerplexityBot, and other AI crawlers against your actual configuration. See exactly what's blocked and what's allowed. No account needed.
