Quick Answer
Your robots.txt file may be blocking AI crawlers from accessing your content, making you invisible to ChatGPT, Claude, Perplexity, and Google AI Overviews. To check, visit yourdomain.com/robots.txt and look for rules disallowing GPTBot, ClaudeBot, anthropic-ai, PerplexityBot, or Google-Extended. Any crawler you disallow cannot access your content, and the corresponding AI system cannot cite you.
The Problem: Accidentally Invisible
Many websites have robots.txt files that were created before AI search existed. These files might:
- Block all bots by default
- Include AI crawlers in broad blocking rules
- Use outdated configurations that didn't consider AI
The result: your content is completely invisible to AI systems, regardless of how good it is.
How to Check Your robots.txt
Step 1: View Your File
Go to: yourdomain.com/robots.txt
You'll see a text file with rules like:
```
User-agent: *
Disallow: /admin/
```
Step 2: Look for AI Crawler Blocks
Search for these AI-related user agents:
OpenAI/ChatGPT:
- GPTBot
- ChatGPT-User
Anthropic/Claude:
- anthropic-ai
- Claude-Web
- ClaudeBot
Google AI:
- Google-Extended
Perplexity:
- PerplexityBot
Microsoft/Bing AI:
- Bingbot (also used for Copilot)
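Rather than scanning the file by eye, you can check it programmatically. Here is a minimal sketch using Python's standard-library robots.txt parser; the sample robots.txt text is hypothetical, and in practice you would paste in the contents of yourdomain.com/robots.txt:

```python
from urllib.robotparser import RobotFileParser

# AI user agents listed in this guide
AI_AGENTS = ["GPTBot", "ChatGPT-User", "anthropic-ai", "Claude-Web",
             "ClaudeBot", "Google-Extended", "PerplexityBot"]

# Hypothetical robots.txt contents -- replace with your own file's text
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# Report whether each AI agent may fetch the site root
for agent in AI_AGENTS:
    status = "allowed" if parser.can_fetch(agent, "/") else "BLOCKED"
    print(f"{agent}: {status}")
```

With this sample file, GPTBot is reported as BLOCKED (it has its own `Disallow: /` group) while the other agents fall under `User-agent: *` and remain allowed at the root.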
Step 3: Check for Blocking Rules
Problematic patterns:
```
# These rules block specific AI crawlers entirely
User-agent: GPTBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
```
```
# This blanket block affects AI too
User-agent: *
Disallow: /
```
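You can confirm that a blanket block shuts out AI crawlers too: bots not named in any group fall back to the `User-agent: *` rules. A quick sketch with the standard-library parser:

```python
from urllib.robotparser import RobotFileParser

# A blanket block: no AI crawler is named, but all inherit the * rules
blanket = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(blanket.splitlines())

for agent in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    verdict = "allowed" if rp.can_fetch(agent, "/any/page") else "blocked"
    print(f"{agent} -> {verdict}")
```

All three agents come back blocked, even though none is mentioned by name.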
Common Blocking Scenarios
Scenario 1: Intentional Blocking
Some sites deliberately block AI:
```
User-agent: GPTBot
Disallow: /
```
If this is intentional, understand the trade-off: you're choosing not to appear in AI results.
Scenario 2: Overly Broad Rules
Generic blocking that catches AI:
```
User-agent: *
Disallow: /
Allow: /public/
```
This blocks every bot, including AI crawlers, from everything except /public/.
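How reliably that /public/ exception is honored depends on the parser. Crawlers following RFC 9309 (including Googlebot) apply the longest-matching rule, so the `Allow` wins for /public/; simpler parsers, including Python's `urllib.robotparser`, apply rules in file order. A sketch showing why rule order matters:

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Check one path against a robots.txt string with the stdlib parser."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, path)

# Disallow listed first: a first-match parser never reaches the Allow line
disallow_first = "User-agent: *\nDisallow: /\nAllow: /public/\n"
print(allowed(disallow_first, "GPTBot", "/public/page"))  # False

# Allow listed first: the exception is honored by first-match parsers too
allow_first = "User-agent: *\nAllow: /public/\nDisallow: /\n"
print(allowed(allow_first, "GPTBot", "/public/page"))     # True
```

Listing `Allow` exceptions before the broad `Disallow` keeps the file unambiguous across parsers; longest-match crawlers read both versions the same way.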
Scenario 3: Outdated Files
A robots.txt file written before AI search existed doesn't mention these crawlers at all. Since unnamed crawlers fall under the `User-agent: *` rules, whether AI can access your site depends entirely on how broad those generic rules are.
Scenario 4: Platform Defaults
Some CMS platforms or hosts add default robots.txt rules that may affect AI access.
How to Configure robots.txt for AI Visibility
Recommended Configuration
```
# Allow AI crawlers to access content
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: PerplexityBot
Allow: /
# Default rules for bots not named above
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/
```
Note that crawlers obey only the most specific group that matches them: the AI crawlers named above follow their own `Allow: /` group and ignore the `User-agent: *` rules. To keep AI crawlers out of /admin/ and similar paths, repeat those Disallow lines under each named agent.
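Group precedence is easy to verify with Python's standard-library parser: a crawler named in its own group ignores the `User-agent: *` group entirely. A minimal sketch using a shortened version of the configuration above:

```python
from urllib.robotparser import RobotFileParser

# Shortened config: GPTBot has its own group, everyone else gets *
config = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(config.splitlines())

# GPTBot matches its own group (Allow: /) and never consults the * group,
# so /admin/ is allowed for it unless the Disallow is repeated there
print(rp.can_fetch("GPTBot", "/admin/settings"))    # True
print(rp.can_fetch("RandomBot", "/admin/settings")) # False
print(rp.can_fetch("RandomBot", "/blog/post"))      # True
```

If /admin/ must stay off-limits to AI crawlers as well, add `Disallow: /admin/` under each named agent's group.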
What to Allow
Generally allow access to:
- Blog posts and articles
- Service/product pages
- About and company information
- Resources and guides
What to Block
Consider blocking:
- Admin areas
- User account pages
- Internal tools
- Checkout/cart pages
- Private content
Partial Access
You can grant AI crawlers access to only some areas:
```
User-agent: GPTBot
Allow: /blog/
Allow: /services/
Disallow: /
```
This lets AI crawlers access your blog and services but nothing else.
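This partial-access pattern can also be checked with the standard-library parser (the `Allow` lines come before the broad `Disallow`, so both first-match and longest-match parsers agree):

```python
from urllib.robotparser import RobotFileParser

# Partial access: only /blog/ and /services/ are open to GPTBot
partial = """\
User-agent: GPTBot
Allow: /blog/
Allow: /services/
Disallow: /
"""

rp = RobotFileParser()
rp.parse(partial.splitlines())

print(rp.can_fetch("GPTBot", "/blog/post"))    # True
print(rp.can_fetch("GPTBot", "/services/seo")) # True
print(rp.can_fetch("GPTBot", "/about"))        # False
```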
Testing Your Configuration
Manual Testing
After updating robots.txt:
- Visit yourdomain.com/robots.txt to confirm changes
- Wait a few weeks for AI systems to re-crawl
- Test by asking AI about your content
Google's robots.txt Report
Google Search Console provides a robots.txt report that shows how Google fetches and parses your file and flags syntax errors. (It replaced the older standalone robots.txt Tester.)
Third-Party Tools
Various SEO tools can analyze your robots.txt for issues.
Important Considerations
robots.txt vs Actual Access
robots.txt is a request, not enforcement. AI systems choose whether to honor it. However, major AI companies (OpenAI, Anthropic, Google) do respect robots.txt.
Caching and Delays
Changes to robots.txt don't take effect immediately. AI systems re-crawl periodically, so changes may take weeks to reflect in AI responses.
Training Data vs Real-Time
Some AI content comes from training data (historical), some from real-time crawling. robots.txt primarily affects real-time access.
Partial Blocking Trade-offs
Blocking AI from some content means that content won't be cited. Consider whether the trade-off makes sense for each section.
Beyond robots.txt
robots.txt is one factor in AI visibility: it controls whether AI crawlers can reach your content, not whether they will cite it. Even with a correctly configured file, your content still needs to be crawlable, well structured, and worth citing.
What's Next?
After fixing robots.txt, confirm the updated file is live at yourdomain.com/robots.txt, allow a few weeks for AI systems to re-crawl, and then test your visibility by asking each AI tool about your content.
