The 12 signals AI search engines look for (and most sites miss)

ChatGPT, Claude, Perplexity, and Google’s AI Overviews don’t read your site the way Google’s classic crawler does. They look for different signals — and most SEO tools never check for them. If you’ve been told your site is “SEO-perfect” but you’re still invisible in AI answers, this is why.

Here are the 12 signals that actually move the needle for AI search visibility, ordered roughly from biggest impact to smallest. AEO Radar checks all of them on every scan.

The four AEO-specific signals

These are the ones most traditional SEO tools ignore.

1. llms.txt at the root

A markdown file at https://yoursite.com/llms.txt that tells AI crawlers — in their native format — what your site is, what it does, and what’s worth quoting. Think of it as a press kit for language models.

It doesn’t replace robots.txt; it supplements it. The major AI crawlers don’t strictly require it yet, but they’re starting to read it, and Anthropic, OpenAI, and the llmstxt.org proposal all reference it.

A minimal llms.txt:

# Acme Inc.

> A concise summary of what Acme does, in plain English.

URL: https://acme.com
Contact: [email protected]

## Products
- Widget — what it does, who it's for
- Gadget — what it does, who it's for

## Documentation
- [API reference](https://acme.com/docs/api)
- [Quickstart](https://acme.com/docs/start)

Ship one. It’s 50 lines of markdown.
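Before shipping, a draft can be sanity-checked with a few lines of Python. The sketch below tests only the loose shape shown above (an H1 title, some summary text, at least one H2 section). The function name `check_llms_txt` and the rules it applies are assumptions made for illustration, not an official validator.

```python
def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems found in a draft llms.txt (empty list = looks fine)."""
    problems = []
    non_empty = [line.rstrip() for line in text.splitlines() if line.strip()]

    # The file should open with an H1 naming the site.
    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first non-empty line should be an H1 like '# Acme Inc.'")

    # Expect some summary text between the title and the first H2 section.
    summary_lines = []
    for line in non_empty[1:]:
        if line.startswith("## "):
            break
        summary_lines.append(line)
    if not summary_lines:
        problems.append("add a short plain-English summary under the title")

    # Expect at least one H2 section (Products, Documentation, ...).
    if not any(line.startswith("## ") for line in non_empty):
        problems.append("add at least one '## Section' listing pages worth reading")

    return problems


draft = """# Acme Inc.

A concise summary of what Acme does, in plain English.

## Documentation
- [Quickstart](https://acme.com/docs/start)
"""
print(check_llms_txt(draft))        # an empty list means no problems were found
print(check_llms_txt("Acme Inc."))  # a bare line of text fails all three checks
```

A check like this fits in a pre-deploy script, so a broken or empty file never reaches production.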

2. ai.txt

ai.txt is Spawning’s proposed standard for opting in or out of AI training data collection. It lives at https://yoursite.com/ai.txt. Even if you don’t have strong feelings about training data, having one is a clear signal to AI crawlers that you’ve thought about this, which may make them more likely to respect your other rules.

3. Explicit AI crawler rules in robots.txt

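The default User-agent: * group is usually written with Googlebot in mind. It also acts as the fallback for GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot, Applebot-Extended, and CCBot whenever they have no group of their own, so rules you never intended for AI crawlers get applied to them anyway, often not the way operators expect.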

Be explicit. A modern robots.txt should have a section per major AI bot, even if you’re allowing all of them:

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Sites that don’t address these crawlers are operating on defaults — and the defaults differ between bots. Some assume permissive, some assume restrictive. Don’t leave it to chance.
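To see how a given robots.txt actually resolves for each AI bot, Python’s standard-library `urllib.robotparser` can evaluate it offline. The sketch below is a minimal check. The helper name `ai_bot_access`, the bot list, and the sample rules are assumptions for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_bot_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Map each AI user agent to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


# Hypothetical file: a wildcard group plus one explicit AI bot group.
robots = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Allow: /
"""

# GPTBot matches its own group; the other bots fall back to the wildcard group.
print(ai_bot_access(robots, "https://acme.com/private/report"))
```

In this parser, a bot with its own group ignores the wildcard group entirely, which is one reason an explicit section per bot makes the outcome easier to reason about.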

4. FAQPage JSON-LD schema

This one is a citation magnet for AI Overviews and Perplexity. When AI engines need to answer a question, they look for content that’s already structured as Question + Answer pairs. A blog post with three FAQ entries at the bottom can outperform a 2,000-word essay on the same topic, because the schema makes the quotable bits explicit.

Aim for 40–60 words per answer. Long enough to be substantive, short enough to fit in an AI summary card.
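One way to produce the markup is to generate it from the question and answer pairs you already have. The following Python sketch builds a schema.org FAQPage block and flags answers outside the 40–60 word target. The function names and the sample pair are hypothetical.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a schema.org FAQPage JSON-LD string from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def answers_outside_range(pairs, low=40, high=60):
    """Return the questions whose answers fall outside the low-high word range."""
    return [q for q, a in pairs if not low <= len(a.split()) <= high]


pairs = [
    ("Will adding llms.txt slow down my site?", "No. It is a small static text file."),
]
print(faq_jsonld(pairs))
print(answers_outside_range(pairs))  # questions whose answers need lengthening or trimming
```

The JSON string goes inside a `<script type="application/ld+json">` tag on the page that visibly shows the same questions and answers.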

The four “JSON-LD foundations” AI engines need

If you only do four schema blocks, do these.

5. Organization schema

Tells AI who you are. Without it, AI engines have to guess from your <title> tag and footer copy. Include name, url, logo, sameAs (your social profiles), and contactPoint.
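As a rough illustration, the fields listed above can be rendered into a ready-to-paste tag with a short Python helper. The function name, the `contactType` value, and every Acme value below are placeholders, not recommendations for your own data.

```python
import json


def organization_script_tag(name, url, logo, same_as, support_email):
    """Render an Organization JSON-LD block as a <script> tag for the page head."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "sameAs": same_as,
        "contactPoint": {
            "@type": "ContactPoint",
            "contactType": "customer support",
            "email": support_email,
        },
    }
    # Escape "</" so a stray "</script>" inside a value cannot end the tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(organization_script_tag(
    name="Acme Inc.",
    url="https://acme.com",
    logo="https://acme.com/logo.png",
    same_as=["https://www.linkedin.com/company/acme"],
    support_email="support@acme.com",
))
```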

6. WebSite or WebApplication schema

Tells AI what your site does. WebSite for content sites; WebApplication for tools and SaaS. Include name, url, description, and (for tools) applicationCategory.

7. Author / Person schema on content pages

E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) is no longer just a Google ranking factor — it’s an AI citation factor. AI engines are far more likely to quote a page that has a named author with verifiable credentials than an anonymous one. Include name, url (link to a real bio page), and sameAs (LinkedIn, Twitter, etc.).
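For pages that already carry JSON-LD, a small script can report which of these author fields are still missing. This Python sketch assumes the author sits under an `author` key on the top-level object, as in a typical Article block. The function name and sample data are hypothetical.

```python
import json

REQUIRED_AUTHOR_FIELDS = ("name", "url", "sameAs")


def missing_author_fields(jsonld_text: str) -> list[str]:
    """List the author fields (name, url, sameAs) that a JSON-LD block lacks."""
    data = json.loads(jsonld_text)
    author = data.get("author")
    if isinstance(author, list):  # several authors are allowed; check the first
        author = author[0] if author else None
    if not isinstance(author, dict) or author.get("@type") != "Person":
        return list(REQUIRED_AUTHOR_FIELDS)
    return [field for field in REQUIRED_AUTHOR_FIELDS if not author.get(field)]


article = json.dumps({
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example post",
    "author": {"@type": "Person", "name": "Jane Doe"},
})
print(missing_author_fields(article))  # the fields still to add for this author
```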

8. BreadcrumbList schema on deep pages

Helps AI understand site structure and which pages are top-level vs. deep. Easy to ship, often skipped.
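Because breadcrumbs usually mirror the URL path, the schema can often be derived from the page URL. The Python sketch below assumes every path segment corresponds to a real page and that you supply one display name per segment. Both are simplifying assumptions, and `breadcrumb_jsonld` is a hypothetical helper.

```python
import json
from urllib.parse import urlsplit


def breadcrumb_jsonld(url: str, names: list[str]) -> str:
    """Build a BreadcrumbList for url; names[i] labels the i-th path segment."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != len(names):
        raise ValueError("need exactly one name per path segment")
    base = f"{parts.scheme}://{parts.netloc}"
    items = []
    for position, name in enumerate(names, start=1):
        # Each crumb links to the path built from the first `position` segments.
        path = "/".join(segments[:position])
        items.append({
            "@type": "ListItem",
            "position": position,
            "name": name,
            "item": f"{base}/{path}",
        })
    data = {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }
    return json.dumps(data, indent=2)


print(breadcrumb_jsonld("https://acme.com/docs/api", ["Docs", "API reference"]))
```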

The four classic-SEO basics that AI still cares about

These are the boring ones, but they still account for a lot. Skip them and the schema work above doesn’t help.

9. A real <title> tag

50–60 characters. Includes the primary keyword. Unique per page. Yes, AI engines still parse the title first — it’s the strongest single signal of what the page is about.

10. A meta description that summarizes

140–160 characters. Written for humans, not stuffed with keywords. AI engines extract this as the canonical summary when they cite your page.

11. Single <h1> per page

One H1, matching the title’s intent. Multiple H1s confuse AI parsers — they don’t know which is the page’s main topic.

12. A canonical URL

Self-referencing, absolute, on every page. Without it, AI engines may treat parameterized URLs (?utm_source=…) as separate pages and split their authority.
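The four basics above are easy to check mechanically. Here is a minimal Python sketch using the standard-library `html.parser`. The thresholds are the ones given in this post, the class and function names are made up for illustration, and it does not verify that the canonical URL is self-referencing, only that it is present and absolute.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Collect the title, meta description, canonical link, and <h1> count."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.canonical = None
        self.h1_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html: str) -> list[str]:
    """Return human-readable problems for signals 9 to 12."""
    page = HeadAudit()
    page.feed(html)
    problems = []
    title_len = len(page.title.strip())
    if not 50 <= title_len <= 60:
        problems.append(f"title is {title_len} chars, aim for 50-60")
    if page.description is None or not 140 <= len(page.description) <= 160:
        problems.append("meta description missing or outside 140-160 chars")
    if page.h1_count != 1:
        problems.append(f"found {page.h1_count} <h1> tags, expected exactly 1")
    if not (page.canonical or "").startswith(("http://", "https://")):
        problems.append("canonical link missing or not an absolute URL")
    return problems


sample = "<html><head><title>Home</title></head><body><h1>A</h1><h1>B</h1></body></html>"
for problem in audit(sample):
    print(problem)
```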

What about page speed and security headers?

They matter for ranking and user trust, but they don’t directly drive AI citations. AEO Radar still checks them — a fast, secure site is a precondition for AI engines bothering to crawl deeply — but the four AEO-specific signals above are where most sites are losing the game.

FAQ

Do I need all 12 to rank in AI search?

No. The four AEO-specific signals (llms.txt, ai.txt, AI crawler rules, FAQPage schema) and the JSON-LD foundations (Organization, WebSite) cover most of the gap. The classic-SEO basics are table stakes — fix them, but they alone won’t differentiate you in AI answers.

Will adding llms.txt slow down my site?

No. It’s a small static text file, typically a few dozen lines of markdown. It’s served in milliseconds, and AI crawlers fetch it at most once per crawl session.

Is FAQPage schema spam?

Only if you fake it. Real questions with real answers get cited. Fake FAQs designed only to game schema get downranked, both by Google and by AI engines that have learned to detect the pattern.

How do I check what my site is missing?

Run a free scan at aeoradar.io. It tests all 12 signals above, plus 17 more, and gives you a per-check breakdown of what’s working and what’s not.


If you got value from this, the AEO Radar scanner checks every signal in this post — free, no signup, 30 seconds.

By Clinton Patrick