AI bots ignore llms.txt but scan the internet at scale – 2 studies, 1 conclusion
For some time now, there has been discussion in the marketing industry about the need to structure content for language models. The proposed solution was the llms.txt file, which is intended to act as a guide for AI bots, providing them with clean, easy-to-process context about a given page in Markdown format.

Google Trends – llms.txt popularity in Google
The following set of data comes from Google Trends 1
The Google Trends data shows spikes in interest, with a global peak in March 2026 and a sharp (though short-lived) surge in Poland at the end of 2025. However, these impressive charts may be nothing more than a fleeting curiosity. In this case, the popularity chart is not a measure of the technology’s success, but merely a record of collective hope and hype that has absolutely nothing to do with the interest of AI giants.
This is my second study, and this time I approached it more comprehensively in order to dispel any remaining doubts: the llms.txt file makes absolutely no sense.
First study
I conducted the first study in the period 13.05.2025 – 01.09.2025, and the statistics look as follows:
- Dataprovider – 1582x
- Some custom ones – 1332x
- Regular user – 11x
- python-requests – 10x
- Screaming Frog – 8x
- Fake Googlebot – 2x
- Semrush – 2x
I wrote about it on my LinkedIn [PL].
For the second study, I analyzed server logs from 191 days, covering ~900 domains. The data comes from the period 04.09.2025 – 13.04.2026, that is, from the beginning of September 2025 to mid-April 2026.
llms.txt is a proposed standard that no one uses
I started by checking how often files related to the new standard are requested, namely:
- /llms.txt
- /llms-full.txt
- /llms-ctx.txt
Over more than half a year – let me remind you, across ~900 domains – I recorded only 1227 requests for these files (on average about 6 requests per day). This traffic concerned 107 domains. The most frequent path was the standard /llms.txt, which had as many as 1215 requests.
| File / path | Number of requests |
|---|---|
| /llms.txt | 1215 |
| /llms-full.txt | 9 |
| /docs/llms.txt | 1 |
| /api/llms.txt | 1 |
| /.well-known/llms.txt | 1 |
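A count like the one in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the study: it assumes access logs in the common "combined" format, and the names `REQUEST_RE`, `LLMS_FILES`, and `count_llms_requests`, as well as the sample log lines, are hypothetical.

```python
import re
from collections import Counter

# Matches the request part of a combined-log-format line,
# e.g. "GET /llms.txt HTTP/1.1" (assumed log format).
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')

# File names from the proposed standard that the study looked for.
LLMS_FILES = ("llms.txt", "llms-full.txt", "llms-ctx.txt")


def count_llms_requests(lines):
    """Count requests per path for paths whose last segment is an llms file."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        path = match.group(1).split("?", 1)[0]  # drop any query string
        if path.rsplit("/", 1)[-1] in LLMS_FILES:
            counts[path] += 1
    return counts


# Made-up sample lines using documentation IP ranges.
sample = [
    '203.0.113.7 - - [04/Sep/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 404 153 "-" "ExampleBot/1.0"',
    '203.0.113.7 - - [04/Sep/2025:10:00:05 +0000] "GET /docs/llms.txt?x=1 HTTP/1.1" 404 153 "-" "ExampleBot/1.0"',
    '198.51.100.2 - - [04/Sep/2025:10:01:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(count_llms_requests(sample))
```

Matching on the last path segment is what lets variants such as /docs/llms.txt or /.well-known/llms.txt show up alongside the standard location.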
Who is requesting these files?
Among the requesters there was not a single real AI bot. Instead of giants training their models (such as OpenAI, Anthropic, or Google), llms.txt is mainly of interest to:
- Data aggregators and scanners – Dataprovider.com was responsible for the lion’s share of traffic (794 requests). There is also activity from tools such as AI-Security-Scanner, ReconTool, and SiteAuditBot.
- People – Chrome (392 requests) and Firefox indicate that it was most likely administrators, researchers, or SEO auditors manually checking for the presence of this file on servers.
- Simple scripts – llmstxtcrawler or robots-ai-permissions, which based on the User-Agent turned out to be a Python script
Requester details
| Client / Bot | Number of requests | Type / Purpose |
|---|---|---|
| Dataprovider | 794 | Data aggregator / Analytical crawler |
| Chrome | 392 | Web browser (human/script) |
| llmstxtcrawler | 12 | Script dedicated to scanning llms.txt |
| AI-Security-Scanner | 8 | Security scanner |
| ReconTool | 5 | Audit tool |
| SiteAuditBot | 5 | Semrush bot |
| Googlebot (fake) | 4 | Impersonating Googlebot |
| Firefox | 3 | Web browser (human) |
| robots-ai-permissions | 2 | Script (Python) |
| DomainShield | 1 | Protection tool |
| Bingbot | 1 | Search engine crawler (Microsoft) |
| TOTAL | 1227 | |
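Grouping requests by client, as in the table above, comes down to matching substrings in the User-Agent header. The sketch below shows one possible way to do it. The rule list and the `classify` function are hypothetical, and the substrings are assumptions for illustration rather than verified signatures of these tools.

```python
from collections import Counter

# Ordered (substring, label) rules. Bot tokens come before browser tokens
# because many bots also carry "Chrome" in their User-Agent.
# These substrings are illustrative assumptions, not verified signatures.
RULES = [
    ("dataprovider", "Dataprovider"),
    ("llmstxtcrawler", "llmstxtcrawler"),
    ("siteauditbot", "SiteAuditBot"),
    ("googlebot", "Googlebot (unverified)"),
    ("bingbot", "Bingbot"),
    ("firefox/", "Firefox"),
    ("chrome/", "Chrome"),
]


def classify(user_agent):
    """Return the label of the first rule whose substring is in the User-Agent."""
    ua = user_agent.lower()
    for needle, label in RULES:
        if needle in ua:
            return label
    return "Other"


# Made-up sample User-Agent strings.
user_agents = [
    "Mozilla/5.0 (compatible; Dataprovider.com)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/124.0.0.0 Safari/537.36",
    "python-requests/2.31.0",
]
print(Counter(classify(ua) for ua in user_agents))
```

A User-Agent string alone cannot tell a real Googlebot from an impostor, which is why the label says "unverified"; separating the two requires an additional check on the requesting IP address.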
Daily trend and hourly distribution
The request trend charts confirm that we are mainly dealing with mechanical, automated scans here. The traffic is small (peaks reach only 20-25 requests per day), and the hourly distribution is fairly flat and even throughout the day. There is no trace here of organic, massive interest from LLM crawlers.
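The daily and hourly distributions behind such charts can be derived directly from log timestamps. Here is a minimal sketch, again assuming the combined log format; `TIMESTAMP_RE`, `distributions`, and the sample lines are hypothetical names and data.

```python
import re
from collections import Counter
from datetime import datetime

# First bracketed field of a combined-log-format line (assumed format),
# e.g. [04/Sep/2025:10:00:00 +0000].
TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")


def distributions(lines):
    """Return (requests per day, requests per hour of day) as two Counters."""
    per_day, per_hour = Counter(), Counter()
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        per_day[ts.date().isoformat()] += 1
        per_hour[ts.hour] += 1  # hour in the log's own UTC offset
    return per_day, per_hour


# Made-up sample lines using documentation IP ranges.
sample = [
    '203.0.113.7 - - [04/Sep/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 404 153',
    '203.0.113.7 - - [04/Sep/2025:10:30:00 +0000] "GET /llms.txt HTTP/1.1" 404 153',
    '198.51.100.2 - - [05/Sep/2025:23:15:00 +0000] "GET /llms.txt HTTP/1.1" 404 153',
]
per_day, per_hour = distributions(sample)
print(per_day)
print(per_hour)
```

A flat `per_hour` profile points to scheduled, automated scans, while human traffic usually clusters around working hours.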


Real AI traffic, or 45 million requests in the background
Someone might argue that AI bots do not visit the sites on which I conducted the study at all. Well, while llms.txt collected just over a thousand requests, the overall traffic from bots associated with AI amounted to nearly 45 million requests during the same time! Yes, to be precise, 44,996,657 – that is exactly how many times AI of various kinds scanned the sites during the analyzed period. I identified a total of 88 unique bots, which gives an astronomical average of over half a million requests per bot.

So who consumes the most resources?
1. OpenAI
Looking at the breakdown by company, OpenAI is the absolute leader. It generates over 25% of all AI traffic in my study (more than 11.5 million requests). This is driven by bots such as GPTBot (almost 8.8 million requests – number 1 in the overall ranking), OAI-SearchBot, and ChatGPT-User.
2. Anthropic
In second place is Anthropic (the creators of Claude) with just under 6 million requests, mainly due to the aggressive ClaudeBot.
3. PetalBot
In the Top 15 bots ranking, a massive bar in second place stands out: PetalBot, with a result of nearly 8.3 million requests. PetalBot is a crawler belonging to Huawei (linked to its Petal Search engine and AI development). It is worth keeping this in mind, because it is often accused by administrators of very aggressive behavior and overloading servers.
4. Big tech is not far behind
Meta is responsible for nearly 3 million requests (mainly meta-externalagent), and the top group also includes Amazon’s bot (Amazonbot with nearly 4.4 million) and Apple’s (Applebot with nearly 2.6 million).
5. Google scans too!
Google also has its share, although it is low, at just under 170 thousand requests (e.g. GoogleOther, Google-NotebookLM, Gemini-Deep-Research). This is probably because Google may largely use data gathered earlier by the main Googlebot to train its models (which in fact is not a pure AI crawler).
A collective look at big tech
A collective look at the tech giants leaves no illusions about who is downloading the most data from our sites:
| LLM creator / Organization | Total number of requests | Share of total traffic |
|---|---|---|
| Other (remaining bots) | 24,444,255 | ~54.3% |
| OpenAI (ChatGPT) | 11,521,228 | ~25.6% |
| Anthropic (Claude) | 5,923,626 | ~13.2% |
| Meta (Llama) | 2,939,423 | ~6.5% |
| Google (Gemini) | 168,125 | ~0.4% |
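The shares in the table follow directly from the request counts. The short sketch below recomputes them, together with the average per bot, using only the figures quoted in this article; the variable names are hypothetical.

```python
# Request counts per organization, copied from the table above.
requests_by_org = {
    "Other (remaining bots)": 24_444_255,
    "OpenAI (ChatGPT)": 11_521_228,
    "Anthropic (Claude)": 5_923_626,
    "Meta (Llama)": 2_939_423,
    "Google (Gemini)": 168_125,
}

total = sum(requests_by_org.values())

# Percentage share of each organization, rounded to one decimal place.
shares = {org: round(100 * n / total, 1) for org, n in requests_by_org.items()}

# Average number of requests per bot, given the 88 unique bots identified.
average_per_bot = total // 88

print(total)
for org, share in shares.items():
    print(f"{org}: {share}%")
print(average_per_bot)
```

For comparison, the 1227 requests for llms.txt files amount to roughly 0.003% of that total.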
TOP15 AI crawlers
Here is the ranking of the 15 greediest AI crawlers I identified in the logs (based on an analysis of nearly 45 million requests):

| Place | Bot name | Total number of requests |
|---|---|---|
| 1 | GPTBot (OpenAI) | 8,798,505 |
| 2 | PetalBot (Huawei) | 8,291,994 |
| 3 | ClaudeBot (Anthropic) | 5,921,228 |
| 4 | Amazonbot (Amazon) | 4,361,437 |
| 5 | Applebot (Apple) | 2,597,117 |
| 6 | LinkupBot | 2,462,636 |
| 7 | meta-externalagent (Meta) | 2,331,582 |
| 8 | IbouBot | 1,719,613 |
| 9 | OAI-SearchBot (OpenAI) | 1,457,764 |
| 10 | LCC | 1,403,196 |
| 11 | ChatGPT-User (OpenAI) | 1,264,907 |
| 12 | Bytespider (ByteDance/TikTok) | 1,129,001 |
| 13 | TerraCotta | 550,077 |
| 14 | Awario | 510,164 |
| 15 | spider | 354,905 |
Summary and conclusions
My study based on server data debunks (at least as of the publication date) the myth of the usefulness of llms.txt. Despite the huge and constantly growing traffic from AI bots, the technology giants have not widely implemented reading of this standard. They prefer to render and analyze the full HTML code "the old way".
What does this mean in practice?
- Do not waste your time – creating and maintaining llms.txt files is currently art for art’s sake. Check your site technically and make sure the most important content is not presented with JavaScript. AI crawlers generally do not render JavaScript, so such content may be invisible to them.
- Monitor logs – your servers are probably constantly being bombarded by GPTBot, PetalBot, and ClaudeBot. Server logs are a huge source of knowledge about who visits your sites, including Googlebot.
- Manage access – if you notice performance drops on your server, instead of creating useless but structured guides for AI, consider managing their traffic in a traditional robots.txt file or completely blocking the most resource-hungry crawlers, of course if you do not see any benefit from being in their training datasets 😉
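As an illustration of the last point, the sketch below builds a small robots.txt rule set and checks it with `urllib.robotparser` from the Python standard library. The rule set is only an example: which bots to block, and the URL `https://example.com/article`, are assumptions, not a recommendation for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Example rule set (an assumption for illustration): block two of the
# heaviest crawlers from the ranking and allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check what each crawler would be allowed to fetch under these rules.
for agent in ("GPTBot", "PetalBot", "ClaudeBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://example.com/article")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Keep in mind that robots.txt is advisory. A crawler that ignores it has to be stopped at the server or firewall level instead.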
The llms.txt file is nothing more than a curiosity which, I think, everyone scans except the real AI bots and Googlebot. (And if Googlebot did scan it, then it must have come across it somewhere: text files are on the list of file types that Googlebot indexes 2, but it does not get there on its own.)
1. Google Trends is a free tool by Google that shows how often specific queries are entered into the search engine, presenting the relative popularity of topics on a chart on a scale from 0 to 100. Link: https://trends.google.com/
2. File types indexed by Google, https://developers.google.com/search/docs/crawling-indexing/indexable-file-types
