Why this matters to you
When ChatGPT performs a live web search for hotel recommendations, it sends a crawler to your website. So does Perplexity. So does Google AI Mode. Whether those crawlers can read your site — or are blocked from it — is controlled by a single file at the root of your domain: robots.txt.
Most hoteliers set up robots.txt years ago and never revisited it. That file was written for Google and Bing. It says nothing about the new generation of AI crawlers. And that silence has real consequences — in both directions.
We parsed robots.txt from 104,214 hotel websites to find out what hotels are actually telling AI crawlers — and whether any of it is intentional.
95.9% of hotels have no AI-specific crawler rules. Most are open to ChatGPT and Perplexity by accident, not by strategy.
Key findings at a glance
01
95.9% have zero AI-specific blocking rules
Only 4.1% of hotel websites have any AI crawler rule at all. The other 95.9% are open to every bot by default — including training crawlers that build LLM knowledge bases.
02
Training bots are blocked 2.5× more than search bots
Among hotels that do block AI crawlers, training bots (GPTBot, Google-Extended, CCBot) are blocked at 2.5× the rate of search bots. Hotels are more concerned about training data than visibility.
03
Only 2.4% have the optimal configuration
Block training bots, allow search bots. That’s the strategic approach — protecting content from model training while staying visible in ChatGPT and Perplexity answers. Only 2.4% of hotels do this.
04
France is an outlier at 8.1%
One hotel chain’s coordinated decision accounts for ~970 properties. Remove that chain and France drops to 2.3% — matching the US rate. Chain-level decisions dominate the data.
What this means for your hotel
There are two types of AI bots crawling hotel websites, and they have completely different implications:
Training crawlers vs. search crawlers — different bots, different consequences
| Bot type | Examples | What it does | Block it? |
|---|---|---|---|
| Training crawlers | GPTBot, Google-Extended, CCBot | Builds the model’s knowledge base | Your choice |
| Search crawlers | OAI-SearchBot, PerplexityBot, Googlebot | Powers real-time AI search answers | Generally no |
Blocking a training crawler means your content won’t be used to train future model versions — a legitimate choice some hotels make for content ownership reasons. Blocking a search crawler means your hotel disappears from ChatGPT and Perplexity answers entirely.
The 0.2% of hotels blocking search bots while allowing training bots have it exactly backwards. They’re donating their content to train AI models while getting zero visibility in return.
The default is not neutral
A robots.txt with no AI rules doesn’t mean “do nothing.” It means “allow everything” — training crawlers included. If you have no AI rules today, you are actively opted in to training data collection. That may be fine. But it should be a decision, not an oversight.
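This default is easy to verify with Python's standard-library robots.txt parser. A minimal sketch, using a typical pre-AI rules file (the domain, paths, and file contents here are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# A typical pre-AI robots.txt: written for Google and Bing, silent on AI bots.
legacy_rules = """
User-agent: *
Disallow: /admin/
""".splitlines()

rp = RobotFileParser()
rp.parse(legacy_rules)

# Every AI crawler falls through to the wildcard rule and is let in:
for bot in ["GPTBot", "CCBot", "OAI-SearchBot", "PerplexityBot"]:
    print(bot, rp.can_fetch(bot, "https://example-hotel.com/rooms"))
```

Every line prints `True`: with no bot-specific rules, training and search crawlers alike inherit the permissive wildcard entry.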
What to do about it
1. Add explicit rules for AI search crawlers.
At minimum, explicitly allow the search crawlers that power AI answers. This isn’t about gaming anything — it’s confirming that these bots are welcome to read your site and include you in answers.
```
# Allow AI search crawlers — these power ChatGPT, Perplexity, Google AI Mode
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Googlebot
Allow: /

# Training crawlers — block if you prefer not to be in future training data
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://yourdomain.com/sitemap.xml
```
2. Make a conscious decision about training bots.
There is no universally right answer. If you want your content to influence how future AI models understand your property, allow training bots. If you prefer to keep control over your content, block them. Both are defensible positions — what isn’t defensible is not having a position.
3. Always include your Sitemap directive.
robots.txt is also where crawlers discover your sitemap. If the Sitemap directive is missing, some crawlers won’t find all your pages. This affects both traditional search engines and AI crawlers.
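Python's standard-library parser also exposes any Sitemap directives it finds, which is a quick way to confirm yours is being picked up (the file contents below are a hypothetical example):

```python
from urllib.robotparser import RobotFileParser

sm_rp = RobotFileParser()
sm_rp.parse("""
User-agent: *
Allow: /

Sitemap: https://example-hotel.com/sitemap.xml
""".splitlines())

# site_maps() returns the listed sitemap URLs, or None if the directive is absent.
print(sm_rp.site_maps())  # ['https://example-hotel.com/sitemap.xml']
```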
The evidence
Finding 1 — Overall blocking rates
Share of hotels with AI crawler rules
104,214 hotel websites parsed · April 2026
Bar widths are relative, not absolute: the largest bar (95.9%) is drawn at full width, and every other bar is scaled in proportion to it.
Finding 2 — Which bots are blocked most
Among hotels with any AI crawler rules, training bots dominate. The gap between training bot blocking rates and search bot rates is consistent: roughly 2.5× across all three training bot categories.
Bot-specific blocking rates — share of all 104,214 hotels
Training bots vs. search bots
OAI-SearchBot and PerplexityBot are grouped as their blocking rates were nearly identical (~1.3% each).
2.5×
How much more often training bots are blocked vs. search bots. Average training bot blocking rate: 3.33%. Search bot blocking rate: ~1.3%. Hotels are more protective of training data than they are concerned about AI search visibility.
Finding 3 — Country breakdown
Country-level variation is mostly explained by chain-level decisions, not individual hotel choices. France’s outlier status collapses entirely once a single chain is excluded.
Share of hotels with any AI crawler rule, by country
104,214 hotels across 7 countries
France drops to 2.3% when a single chain (~970 properties) with a coordinated block policy is excluded — matching the US rate exactly.
Frequently asked questions
Are AI crawlers allowed on my site if my robots.txt doesn't mention them?
Yes, by default. A robots.txt with no rules — or no robots.txt at all — is an open invitation to all crawlers. If you want to prevent AI training bots from crawling your site, you need to explicitly add Disallow rules for GPTBot, Google-Extended, and CCBot.
Does blocking GPTBot remove my hotel from ChatGPT?
It depends on which bot ChatGPT uses for a given query. GPTBot is the training crawler — blocking it affects what future model versions learn about you. OAI-SearchBot is the live search crawler — blocking it removes you from real-time ChatGPT answers. They are different bots with different functions.
Does allowing Googlebot also cover Google AI Mode?
Yes. Google AI Mode uses the same Googlebot infrastructure as traditional search. Allowing Googlebot covers both traditional Google Search results and Google AI Mode recommendations.
Should I block AI training crawlers?
There is no universal answer. Blocking training crawlers means future AI models won’t learn from your content directly — which could reduce how naturally AI engines describe your property over time. Allowing them means your content contributes to model training, which may improve how AI understands and recommends you. Both are legitimate positions.
How do I check what my robots.txt currently says?
Visit yourdomain.com/robots.txt in any browser. If you get a 404, you have no robots.txt and all crawlers have full access by default. If you see content, check whether any of the AI crawler user-agents listed in this article appear.
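That manual check can also be scripted. A sketch using Python's standard library; `mentioned_bots` and `fetch_robots` are helper names of our own, not part of any library:

```python
import urllib.request

AI_BOTS = ["GPTBot", "Google-Extended", "CCBot", "OAI-SearchBot", "PerplexityBot"]

def mentioned_bots(robots_txt: str) -> dict:
    """Report which AI user-agents a robots.txt file mentions at all."""
    lower = robots_txt.lower()
    return {bot: bot.lower() in lower for bot in AI_BOTS}

def fetch_robots(domain: str, timeout: int = 10) -> str:
    """Fetch https://<domain>/robots.txt; an empty string means open by default."""
    try:
        with urllib.request.urlopen(f"https://{domain}/robots.txt",
                                    timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        return ""
```

For example, `mentioned_bots(fetch_robots("yourdomain.com"))` returns a dict of bot names to True/False; any bot marked False is crawling you under the default allow-everything policy.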
How we ran the study
104,214
Hotels parsed
7
Countries
14
AI crawlers checked
10s
Request timeout
We fetched and parsed robots.txt from 104,214 reachable hotel websites with a 10-second timeout per request. Hotel URLs were sourced across seven countries: United States, United Kingdom, France, Germany, Spain, Italy, and the Netherlands.
For each file, we checked for 14 known AI crawler user-agents (training and search) plus traditional search engine bots. We then classified each hotel into one of five buckets: no AI rules, blocks training only, blocks search only, blocks all AI bots, or blocks all crawlers outright.
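The classification step can be sketched with Python's standard-library robots.txt parser. This is our simplified illustration, not the study's actual pipeline; the probe URL is hypothetical, and the comprehensive-block bucket is omitted for brevity:

```python
from urllib.robotparser import RobotFileParser

TRAINING_BOTS = ["GPTBot", "Google-Extended", "CCBot"]
SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot"]

def classify(robots_txt: str, probe: str = "https://example.com/") -> str:
    """Bucket one robots.txt file by which class of AI bot it blocks."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    blocks_training = any(not rp.can_fetch(b, probe) for b in TRAINING_BOTS)
    blocks_search = any(not rp.can_fetch(b, probe) for b in SEARCH_BOTS)
    if blocks_training and blocks_search:
        return "blocks all"
    if blocks_training:
        return "blocks training only"
    if blocks_search:
        return "blocks search only"
    return "no AI rules"
```

Note that an empty or missing file lands in "no AI rules", mirroring how the study counts hotels with no robots.txt at all.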
Limitations. robots.txt compliance is voluntary — well-behaved bots respect it, but not all crawlers do. Our analysis covers stated policy, not actual crawler behavior. Hotels with no robots.txt file were counted as “no AI-specific rules.”
Want to know if AI engines can actually find your hotel?
Huxo’s AI Visibility Report audits your hotel across ChatGPT, Perplexity, and Google AI Mode — including a full crawlability check and robots.txt analysis.