How to tell if AI agents are reading your B2B website.

Omer Gotlieb
6 min read
May 16, 2026

Someone on your team noticed traffic doing something strange. A spike GA4 can't explain. User agents in the server log nobody recognizes. Claude and GPTBot showing up where they didn't a year ago. Before anyone decides whether that's a problem, you need to actually classify it. This is the tactical version: how to find AI agents in the logs you already have, in an afternoon, without buying anything.

Why your normal analytics won't tell you

GA4 and most marketing analytics run on JavaScript. They count sessions where a browser executed a tracking script. Almost no AI agent executes that script. So the agent traffic that matters most to a B2B company is the traffic your marketing dashboard is structurally blind to.

That gap is the whole problem. Your CEO asks what AI is doing to the site, and the dashboard you'd normally open was built for a web where every visitor was a human with a browser. It quietly drops the fastest-growing audience you have. To see agents, you have to look at the layer underneath: raw server logs, CDN logs, or a log-based analytics view.

The four signals, in order of reliability

No single signal is conclusive. Stack them and the picture gets clear fast.

1. User agent strings

This is the first thing to grep for, and the easiest to fake, so treat it as a starting point and not a verdict. The named agents worth filtering for:

  • GPTBot, OAI-SearchBot, ChatGPT-User (OpenAI)
  • ClaudeBot, Claude-User, anthropic-ai (Anthropic)
  • PerplexityBot, Perplexity-User (Perplexity)
  • GoogleOther (Google's AI and Gemini fetches, distinct from Googlebot; Google-Extended, by contrast, is a robots.txt control token, not a user agent that shows up in request logs)
  • Bytespider, Amazonbot, Applebot-Extended, meta-externalagent (the long tail)

One distinction matters more than the rest. Names ending in -Bot are usually crawlers building a training corpus or a search index. Names containing -User are agents fetching a page right now because a human asked a question this minute. The second group is buyer traffic in real time. If you track only one number from these logs, make it the ratio between those two.
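
If you want a starting script rather than a grep one-liner, here's a minimal sketch in Python, assuming your logs sit in a local access.log file; the filename and the crude substring matching are illustrative, not a standard:

```python
from collections import Counter

# The named agents from the list above; "access.log" is a placeholder.
AGENT_NAMES = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "anthropic-ai",
    "PerplexityBot", "Perplexity-User",
    "GoogleOther",
    "Bytespider", "Amazonbot", "Applebot-Extended", "meta-externalagent",
]

counts = Counter()
with open("access.log") as f:
    for line in f:
        for name in AGENT_NAMES:
            if name in line:
                # -User names are live, human-triggered fetches;
                # everything else here is crawl traffic.
                counts["user" if "-User" in name else "bot"] += 1
                break

print(f"live buyer fetches: {counts['user']}")
print(f"crawler fetches:    {counts['bot']}")
if counts["bot"]:
    print(f"user-to-bot ratio:  {counts['user'] / counts['bot']:.2f}")
```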

2. Published IP ranges

OpenAI, Anthropic, Perplexity, and Google all publish the IP ranges their crawlers operate from. Cross-check the source IP of anything claiming to be GPTBot against OpenAI's published list. A request with the right user agent from the wrong IP is a scraper wearing a costume. A request with a matching IP is the real thing. This single check separates legitimate buyer agents from impostors better than anything else you can do for free.
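
A minimal sketch of that cross-check, assuming you've saved a provider's published ranges to a local JSON file first; the filename and the exact JSON shape here are assumptions, so check each provider's docs for the real endpoint and format:

```python
import ipaddress
import json

# Assumed shape: {"prefixes": [{"ipv4Prefix": "..."} or {"ipv6Prefix": "..."}]}
with open("gptbot-ranges.json") as f:
    prefixes = json.load(f)["prefixes"]

networks = [
    ipaddress.ip_network(p.get("ipv4Prefix") or p.get("ipv6Prefix"))
    for p in prefixes
]

def is_genuine(source_ip: str) -> bool:
    """True only if the source IP sits inside a published range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in networks)

# Right user agent + wrong IP = scraper in a costume.
print(is_genuine("203.0.113.7"))  # documentation address; expect False
```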

3. Request patterns

Agents and humans browse differently, and different kinds of agents browse differently from each other.

  • Buyer agents pull a small number of pages per session, often deep ones: pricing, security, integrations, comparison pages, docs. They arrive, extract what they need, and leave. No mouse movement, no scroll events, sub-second dwell.
  • Training crawlers move methodically across your whole sitemap over hours or days.
  • Threat bots either crawl exhaustively or hammer one endpoint, and they probe auth-gated and admin paths that a buyer agent skips.

Across Salespeak's customer base, 94% of AI agent visits target deep content pages, not the homepage. If your spike is concentrated on pricing and product pages, that's a buyer-research signature, not an attack.
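
To put a number on that signature, a small sketch like this works, assuming you've already dumped the request paths of verified agent hits into a text file, one per line:

```python
from collections import Counter

paths = Counter()
with open("agent_paths.txt") as f:  # one verified agent request path per line
    for line in f:
        paths[line.strip()] += 1

# Deep-page share: everything that isn't the homepage.
total = sum(paths.values())
deep = sum(n for p, n in paths.items() if p not in ("/", "/index.html"))
print(f"deep-page share: {deep / total:.0%}")
print("top targets:", paths.most_common(10))
```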

4. JavaScript and robots behavior

Most agents don't render JavaScript and do respect robots.txt. If the traffic is requesting raw HTML, skipping your JS bundles, and honoring your crawl directives, it's behaving like a legitimate agent. If it's ignoring robots.txt and poking at login pages, it isn't.
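
The robots side is checkable with Python's standard library alone; in this sketch the domain and the observed fetches are placeholders for what your own logs show:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")  # your domain here
rp.read()

# (agent, path) pairs you actually observed in the logs.
observed = [("GPTBot", "/pricing"), ("GPTBot", "/admin/login")]
for agent, path in observed:
    ok = rp.can_fetch(agent, f"https://example.com{path}")
    print(f"{agent} {path}: {'allowed' if ok else 'VIOLATION'}")
```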

A 30-minute version you can run today

  1. Pull one week of raw access logs from your web server or CDN (Cloudflare, Fastly, and CloudFront all expose these); a parsing sketch for these raw lines follows this list.
  2. Filter rows where the user agent matches the names in signal 1. That's your candidate set.
  3. For the top sources, spot-check source IPs against the providers' published ranges. Drop the mismatches.
  4. Group what's left by URL. Look at how concentrated it is on deep pages.
  5. Split the -User agents from the -Bot agents. The first number is your live buyer-agent traffic. That's the one to report upward.
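
For step 1's raw lines, here's a minimal parsing sketch that turns each request into the (ip, path, user agent) fields the checks above can run over; it assumes the common combined log format, so adjust the regex if your CDN writes a different layout:

```python
import re

# Matches the common "combined" access-log format:
# IP ident user [time] "METHOD path proto" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def parse(line: str):
    m = LOG_RE.match(line)
    return (m.group("ip"), m.group("path"), m.group("ua")) if m else None

with open("access.log") as f:
    rows = [r for r in (parse(line) for line in f) if r]
print(f"parsed {len(rows)} requests")
```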

If you'd rather not do the log work by hand, our free tool at isyourwebsiteready.ai runs the classification for you. Point it at your domain and it reports which AI agents are reaching your site and what they can actually read, no log access required. Running the manual pass once is still worth it, because the number usually surprises people, but the free tool is the faster way to a baseline.

What the numbers should look like

Over the past 30 days, Salespeak tracked 640,000 AI agent visits across our customer base. A typical B2B site sees somewhere between 750 and 4,000 AI page fetches a day, scaling with how much depth and category-relevant content it has published. 91% of those visits trace back to ChatGPT's infrastructure.

So a rough sanity check: if your site publishes real product, pricing, and technical content and you're seeing fewer than a few hundred agent fetches a day, the likely story isn't that agents aren't interested. It's that something is blocking them, or your content is thin enough that they have little to read. Either way, that's the finding to chase next.

What to do once you've found them

Finding the traffic is step one. The mistake to avoid is treating the answer as a binary block-or-allow call. Three categories need three different reactions, sketched in code after this list:

  • Threat bots: block, with normal WAF rules. Nothing new here.
  • Search and training crawlers: allow. They're how you stay findable and how you get represented in AI answers at all.
  • Buyer agents: allow, and go further. These are humans researching your company through an assistant. Make sure they can read your deep pages without a JavaScript wall, and start capturing what they ask. The questions an agent brings to your site are buyer-intent data most CRMs never see.
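
As a toy illustration, those three reactions compress into a few lines; the rules and names here are heuristics from this article, not a product or a standard:

```python
def classify(ua: str, ip_verified: bool, probes_admin: bool) -> str:
    """Map one observed agent to a reaction. Rules are illustrative."""
    if probes_admin or not ip_verified:
        return "block"            # threat bot or impostor: normal WAF work
    if "-User" in ua:
        return "allow + capture"  # live buyer agent: log what it asks for
    return "allow"                # search/training crawler: stay findable

print(classify("ChatGPT-User/1.0", ip_verified=True, probes_admin=False))
```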

One worry comes up a lot, especially from security-minded teams: is serving agents well a cloaking risk? It isn't, as long as the facts are the same. Cloaking is showing search engines different claims than humans see. Giving an agent a clean, structured version of the same true information is just good infrastructure, the same way you already serve a mobile layout and a desktop layout of one truth.

The detection pass is the cheap part. What it buys you is an honest answer to the question your leadership is already asking, and a baseline you can measure against once you start doing something about it. If you do one thing this week, run your domain through isyourwebsiteready.ai and read what comes back.
