Technical SEO

Is Your robots.txt Blocking AI Crawlers? Here's How to Check

February 9, 2026 · By Brenden Parker

Quick Answer

Your robots.txt file may be blocking AI crawlers from accessing your content, making you invisible to ChatGPT, Claude, Perplexity, and Google's AI Overviews. To check, visit yourdomain.com/robots.txt and look for rules that disallow GPTBot, ClaudeBot, anthropic-ai, PerplexityBot, or Google-Extended.

The Problem: Accidentally Invisible

Many websites have robots.txt files that were created before AI search existed. These files might:

  • Block all bots by default
  • Include AI crawlers in broad blocking rules
  • Use outdated configurations that didn't consider AI

The result: your content is completely invisible to AI systems, regardless of how good it is.

How to Check Your robots.txt

Step 1: View Your File

Go to: yourdomain.com/robots.txt

You'll see a text file with rules like:

```
User-agent: *
Disallow: /admin/
```

Step 2: Look for AI Crawler Blocks

Search for these AI-related user agents:

OpenAI/ChatGPT:

  • GPTBot
  • ChatGPT-User

Anthropic/Claude:

  • anthropic-ai
  • Claude-Web
  • ClaudeBot

Google AI:

  • Google-Extended

Perplexity:

  • PerplexityBot

Microsoft/Bing AI:

  • Bingbot (also used for Copilot; note that blocking Bingbot removes you from regular Bing search as well)
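Rather than scanning the file by eye, you can check it programmatically. Python's standard-library `urllib.robotparser` parses a robots.txt body and answers per-agent questions. The rules below are a sample; paste in your own file's contents:

```python
# Check which AI user agents may fetch a page under a given
# robots.txt body (sample rules; substitute your own file).
import urllib.robotparser

robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

for agent in ["GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
              "Google-Extended", "PerplexityBot"]:
    status = "allowed" if rp.can_fetch(agent, "/blog/some-post") else "BLOCKED"
    print(f"{agent}: {status}")
```

With these sample rules, GPTBot is blocked sitewide while the other agents fall back to the `*` record, which only disallows /admin/.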

Step 3: Check for Blocking Rules

Problematic patterns:

```
# These rules block specific AI crawlers entirely
User-agent: GPTBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
```

```
# This blanket block affects AI too
User-agent: *
Disallow: /
```
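The blanket case is easy to verify with the same stdlib parser: AI crawlers have no record of their own here, so they fall back to the `*` record and are blocked along with everything else.

```python
# A blanket "User-agent: *" block stops AI crawlers too, because
# they fall back to the "*" record when no specific record exists.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

for agent in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(agent, rp.can_fetch(agent, "/blog/"))  # all False
```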

Common Blocking Scenarios

Scenario 1: Intentional Blocking

Some sites deliberately block AI:

```
User-agent: GPTBot
Disallow: /
```

If this is intentional, understand the trade-off: you're choosing not to appear in AI results.

Scenario 2: Overly Broad Rules

Generic blocking that catches AI:

```
User-agent: *
Disallow: /
Allow: /public/
```

This blocks every crawler, AI crawlers included, from everything except /public/.

Scenario 3: Outdated Files

robots.txt files written before AI crawlers existed don't account for them. They may not explicitly block AI, but they don't explicitly allow it either, and any broad rules written for other bots still apply.

Scenario 4: Platform Defaults

Some CMS platforms or hosts add default robots.txt rules that may affect AI access.

How to Configure robots.txt for AI Visibility

Recommended Configuration

```
# Allow AI crawlers to access content
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

# Block sensitive areas from all bots
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/
```
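One caveat worth flagging: under the robots exclusion standard (RFC 9309), a crawler that finds a record naming its own user agent follows only that record and ignores the `*` record entirely. With the configuration above, GPTBot matches its own `Allow: /` group, so the `Disallow: /admin/` lines under `User-agent: *` do not apply to it. If sensitive paths must stay off-limits to AI crawlers too, repeat the `Disallow` lines inside each named group. Python's stdlib parser shows the effect:

```python
# A named record overrides the "*" record entirely: GPTBot's own
# "Allow: /" means the "*" Disallow rules never apply to it.
import urllib.robotparser

rules = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "/admin/settings"))        # True
print(rp.can_fetch("SomeOtherBot", "/admin/settings"))  # False
```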

What to Allow

Generally allow access to:

  • Blog posts and articles
  • Service/product pages
  • About and company information
  • Resources and guides

What to Block

Consider blocking:

  • Admin areas
  • User account pages
  • Internal tools
  • Checkout/cart pages
  • Private content

Partial Access

You can also grant AI crawlers access to only part of your site:

```
User-agent: GPTBot
Allow: /blog/
Allow: /services/
Disallow: /
```

This lets GPTBot crawl your blog and services pages but nothing else.
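Partial-access records can be sanity-checked the same way. One note on ordering: Python's stdlib parser applies the first matching rule, while Google applies the most specific (longest) match; listing the `Allow` lines before the final `Disallow: /`, as above, gives the intended result under both interpretations.

```python
# Verify a partial-access record: blog and services allowed,
# everything else blocked for GPTBot.
import urllib.robotparser

rules = """\
User-agent: GPTBot
Allow: /blog/
Allow: /services/
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "/blog/my-post"))   # True
print(rp.can_fetch("GPTBot", "/services/seo"))   # True
print(rp.can_fetch("GPTBot", "/pricing"))        # False
```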

Testing Your Configuration

Manual Testing

After updating robots.txt:

  1. Visit yourdomain.com/robots.txt to confirm changes
  2. Wait a few weeks for AI systems to re-crawl
  3. Test by asking AI about your content

Google's Robots Testing Tool

Google Search Console includes a robots.txt report that shows which version of your file Google last fetched and whether it parsed without errors. (The older standalone robots.txt Tester tool has been retired.)

Third-Party Tools

Various SEO tools can analyze your robots.txt for issues.

Important Considerations

robots.txt vs Actual Access

robots.txt is a request, not enforcement. AI systems choose whether to honor it. However, major AI companies (OpenAI, Anthropic, Google) do respect robots.txt.

Caching and Delays

Changes to robots.txt don't take effect immediately. AI systems re-crawl periodically, so changes may take weeks to reflect in AI responses.

Training Data vs Real-Time

Some AI content comes from training data (historical), some from real-time crawling. robots.txt primarily affects real-time access.

Partial Blocking Trade-offs

Blocking AI from some content means that content won't be cited. Consider whether the trade-off makes sense for each section.

Beyond robots.txt

robots.txt is only one factor in AI visibility. A permissive file opens the door, but AI systems still have to discover your content, understand it, and judge it worth citing.

What's Next?

After fixing robots.txt, give AI systems a few weeks to re-crawl, then test whether your content starts surfacing in AI answers.

Want Results Like These?

Book a free consultation and learn how we can help your business get found by AI.
