How to audit your GEO performance in five steps (a working playbook)

A practical, repeatable GEO (generative engine optimization) audit that takes one focused day and produces a defensible picture of where your brand stands in generative engines.

Figure: a five-step audit flow from query selection through source mapping to remediation list.

A working GEO audit takes one day. Not one week, not one quarter. One day with two people, a shared doc, and access to four AI engines. The output is a defensible picture of where you stand and a prioritized list of fixes.

Most teams skip this because the alternatives sound more important. Buying a tool. Hiring an agency. Writing more content. None of those make sense without the audit first. You cannot improve what you have not measured.

This is the five-step process we run with customers. Steal it. Run it without us. Run it before you buy any tool.

Step 1: define the query set (one hour)

The audit lives or dies on the queries. Get this wrong and the rest is noise.

Pick fifteen to twenty queries that match how a real buyer in your category actually asks questions. Not how an SEO would phrase them. How a person in a hurry types into ChatGPT. Mix across four categories:

Category-definition queries (3-5 of the 15). Things like "what is [category]," "how does [category] work," "best practices for [category]." These tell you whether the model has heard of your brand at all in the broad context.

Evaluation queries (5-8 of the 15). Things like "best [category] tool for [buyer profile]," "what to look for when choosing a [category] vendor," "comparison of leading [category] options." These are the queries where buyers are deciding what to buy.

Brand-specific queries (3-5 of the 15). Things like "what is [your brand]," "[your brand] vs [main competitor]," "is [your brand] good for [use case]." These reveal how the model describes you when it has already been pointed at you.

Adjacent queries (1-2 of the 15). Things in nearby categories where your brand could plausibly surface but might not. These tell you about reach beyond your home territory.

Write the list down. Show it to a salesperson. If they cannot recognize buyer language in the list, the queries are too SEO-flavored. Rewrite.
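
If the query list lives in a script or notebook rather than a doc, the category mix can be checked mechanically. The following is a minimal sketch, not part of the process itself: the category keys and the `check_mix` helper are made-up names, and the ranges are the per-15 counts given above.

```python
from collections import Counter

# Target counts per category for a 15-query set, copied from the ranges above.
# The key names are illustrative labels, not a standard taxonomy.
TARGET_RANGES = {
    "category_definition": (3, 5),
    "evaluation": (5, 8),
    "brand_specific": (3, 5),
    "adjacent": (1, 2),
}


def check_mix(queries):
    """Return warnings for categories outside their target range.

    `queries` is a list of (category, query_text) tuples.
    """
    counts = Counter(category for category, _ in queries)
    warnings = []
    for category, (low, high) in TARGET_RANGES.items():
        n = counts.get(category, 0)
        if not low <= n <= high:
            warnings.append(f"{category}: {n} queries, expected {low}-{high}")
    # Flag labels that are not one of the four categories (likely a typo).
    for category in sorted(set(counts) - set(TARGET_RANGES)):
        warnings.append(f"{category}: unknown category")
    return warnings
```

An empty list means the mix is within range. The check says nothing about whether the wording sounds like a buyer, so the salesperson review still applies.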

Heads up

The single biggest reason GEO audits produce useless results is that the query set was written by the SEO team using SEO conventions. AI engines respond to natural conversational queries, not optimized keyword phrases.

Step 2: run the queries across engines (two hours)

For each query, run it once in each of: ChatGPT, Gemini, Perplexity, and Claude. Use clean sessions, no system prompts, no prior context. If your buyer uses a vertical AI engine specific to your industry, add it.

For each response, record:

  • The full response text (or a clean summary if it is long)
  • Whether your brand is mentioned (yes/no/partial)
  • The position your brand is mentioned in (first, middle, last, or context)
  • The framing your brand receives (one-sentence description from the response)
  • The cited sources, if shown
  • The competitors mentioned and their framing
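
One way to keep the recording consistent between two people is to fix the columns before anyone runs a query. The sketch below assumes one row per query and engine; the `Observation` class, its field names, and the `to_csv` helper are illustrative choices, not a required schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

MENTION_VALUES = ("yes", "no", "partial")


@dataclass
class Observation:
    query: str
    category: str
    engine: str
    response: str     # full text, or a clean summary if it is long
    mentioned: str    # "yes", "no", or "partial"
    position: str     # "first", "middle", "last", "context", or "" if absent
    framing: str      # one-sentence description taken from the response
    sources: str      # cited sources joined with "; ", empty if none shown
    competitors: str  # competitors mentioned, with their framing

    def __post_init__(self):
        # Reject free-form values so the column can be grouped later.
        if self.mentioned not in MENTION_VALUES:
            raise ValueError(f"mentioned must be one of {MENTION_VALUES}")


def to_csv(observations):
    """Serialize observations to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Observation)])
    writer.writeheader()
    for obs in observations:
        writer.writerow(asdict(obs))
    return buf.getvalue()
```

The resulting text can be pasted or imported into whatever spreadsheet the team already uses.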

This is tedious. Two people can do it in two hours if they share the load. One person tracking, one person running queries.

You now have 60-80 data points (15-20 queries × 4 engines). That is enough to draw conclusions.

Step 3: spot the patterns (one hour)

Pull the data into a spreadsheet. Group it.

By engine: how does your brand presence compare across ChatGPT, Gemini, Perplexity, and Claude? Variance here is the rule, not the exception. A 30-percentage-point gap in mention rate between your best and worst engine is normal and informative.

By query category: are you stronger in category-definition queries or evaluation queries? Strong in definition but weak in evaluation means the model knows you but does not consider you a buyer-worthy option. Strong in evaluation but weak in definition is the inverse, and is unusual.

By source: which third-party sites appear most often in citations? Across queries, a small set of dominant sources usually emerges. Write down the top five. These are the sources shaping the answers your buyers read.

By framing: what description does your brand consistently get? If the framing is consistent across engines, it reflects actual external coverage. If it varies wildly, the model is grasping. If it is consistently negative or off-positioning, you have a source-remediation problem.
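
The first three groupings are simple counts, so they can be scripted if the spreadsheet gets unwieldy. This is a rough sketch under stated assumptions: each row is a dict with `engine`, `category`, `mentioned`, and `sources` keys (hypothetical names), and a "partial" mention is counted as not mentioned, which is a judgment call you may want to reverse. Framing still needs a human read.

```python
from collections import Counter, defaultdict


def mention_rate(rows, key):
    """Share of rows with a full brand mention, grouped by rows[key]."""
    hits, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row["mentioned"] == "yes":  # assumption: "partial" does not count
            hits[row[key]] += 1
    return {group: hits[group] / totals[group] for group in totals}


def spread_in_points(rates):
    """Gap between the best and worst group, in percentage points."""
    return round((max(rates.values()) - min(rates.values())) * 100)


def top_sources(rows, n=5):
    """Most frequently cited sources, counting each source once per response."""
    counts = Counter()
    for row in rows:
        counts.update(set(row["sources"]))
    return counts.most_common(n)
```

Calling `mention_rate(rows, "engine")` gives the by-engine view and `mention_rate(rows, "category")` the by-category view, while `top_sources(rows)` produces the top-five list.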

Step 4: build the source map (two hours)

This is the step most teams skip. It is also the step that turns audit results into actions.

Take the top five sources you identified in step 3. For each, answer:

  • What is the site? Is it editorial (a publication), structural (Wikipedia, knowledge graphs), social/forum (Reddit, communities), review-based (G2, Capterra), or analyst-driven (Forrester, Gartner)?
  • Does your brand currently appear on this site? If yes, how prominently? If no, why not?
  • Is there an editorial path to coverage? PR contact, vendor profile claim, review collection, etc.
  • What is the typical lead time to influence this source? Days for a profile update, months for editorial coverage, years for analyst reports.

This gives you a source-by-source action plan. Each source has a different lever and a different timeline. Not all sources are worth pursuing. Some are too expensive relative to their citation share. Most have at least one realistic move.

The output of this step is a list of five sources with one realistic action per source. That is your remediation backlog.
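
The four questions map naturally onto a small record per source. The sketch below is one possible shape, assuming the five source types and three lead-time buckets described above; `SourceEntry`, `backlog`, and the field names are invented for illustration.

```python
from dataclasses import dataclass

SOURCE_TYPES = ("editorial", "structural", "social", "review", "analyst")
LEAD_TIMES = ("days", "months", "years")  # ordered fastest to slowest


@dataclass
class SourceEntry:
    site: str
    source_type: str       # one of SOURCE_TYPES
    brand_present: bool    # does the brand currently appear on the site?
    path_to_coverage: str  # e.g. "vendor profile claim", "PR contact"
    lead_time: str         # one of LEAD_TIMES
    action: str            # the one realistic move, or "" if none is worth it

    def __post_init__(self):
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source type: {self.source_type}")
        if self.lead_time not in LEAD_TIMES:
            raise ValueError(f"unknown lead time: {self.lead_time}")


def backlog(entries):
    """Return (site, action) pairs for sources with an action, fastest first."""
    actionable = [e for e in entries if e.action]
    actionable.sort(key=lambda e: LEAD_TIMES.index(e.lead_time))
    return [(e.site, e.action) for e in actionable]
```

Leaving `action` empty is how a source that is too expensive relative to its citation share drops out of the backlog.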

Figure: mapping dominant citation sources to actionable remediation steps, with cost and timeline. Each source has a different lever and a different timeline. The audit's value is in the action map.

Step 5: prioritize and decide what gets done (one hour)

You now have a list of issues and a list of source-level actions. The temptation is to do everything. Resist.

Rank the actions on three dimensions:

Leverage: how much would moving this source actually improve our visibility? A source cited in 60% of queries is worth more than one cited in 10%, all else equal.

Reachability: can our team or vendor actually move this source in the next quarter? Some sources are inaccessible (closed editorial properties, analyst reports with locked publication cycles). Some are very reachable (review sites where you control the vendor profile).

Cost: what does this realistically cost? Some sources require a one-time fix (claiming a vendor profile, updating outdated content). Some require sustained investment (PR programs, original research).

Pick two actions. Three at most. The first one should be highest-leverage and highest-reachability, even if it costs more. The second should be quickest, even if lower leverage. This combination gets you a meaningful win and a fast win in the same quarter.
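
The selection rule can be written down as a sketch, mostly to make the tradeoff explicit. Everything numeric here is an assumption: leverage is taken as the share of audited queries citing the source, reachability and cost use a made-up 1-3 scale, and "highest-leverage and highest-reachability" is approximated by multiplying the two.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    leverage: float    # share of audited queries citing the source, 0.0-1.0
    reachability: int  # 1 (hard to move this quarter) to 3 (easy); made-up scale
    cost: int          # 1 (one-time fix) to 3 (sustained investment); made-up scale
    weeks: int         # rough estimate of time until the change lands


def pick_two(actions):
    """Pick the meaningful win first, then the fast win from what is left."""
    if len(actions) < 2:
        raise ValueError("need at least two candidate actions")
    # First pick ignores cost on purpose: leverage and reachability decide.
    first = max(actions, key=lambda a: a.leverage * a.reachability)
    rest = [a for a in actions if a is not first]
    # Second pick is simply the quickest remaining action.
    second = min(rest, key=lambda a: a.weeks)
    return first, second
```

Cost is recorded but does not drive either pick, which mirrors the rule above. It is still useful when deciding whether a third action fits in the quarter.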

The remaining actions go on the backlog. Re-audit in a quarter. Re-prioritize then.

What the output looks like

By end of day you should have:

  • A documented query set (you will reuse this every quarter)
  • A spreadsheet with 60-80 data points (presence, framing, citations across queries and engines)
  • A pattern analysis identifying your dominant failure mode
  • A source map for the top five sources shaping responses in your category
  • A prioritized list of two or three actions for the next quarter

Plus a clear answer to questions executives ask. "How are we doing in AI search?" is now a data-backed conversation, not a guess.

Common mistakes in the audit itself

A few patterns to avoid.

Too few queries. Five queries is anecdote. Twenty is measurement. The marginal value of going from 5 to 20 is huge; the marginal value of going from 20 to 100 is much smaller. If you have to economize, economize on the high end, not the low end.

Only one engine. Single-engine audits are misleading because cross-engine variance is large. Even if your buyer mainly uses ChatGPT, run the others. You will see things in Gemini or Perplexity that change your interpretation of the ChatGPT data.

One run per query. AI responses vary across runs of the same query. For high-stakes audits, run each query three times in each engine and look at the consistency. Single-run audits over-index on lucky or unlucky responses.
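
For the three-run version, consistency can be summarized per query and engine. A minimal sketch, assuming one row per run with `query`, `engine`, and `mentioned` keys (hypothetical names) and counting only a full "yes" as a mention:

```python
from collections import defaultdict


def run_consistency(rows):
    """Map (query, engine) to the share of runs that mentioned the brand."""
    by_pair = defaultdict(list)
    for row in rows:
        by_pair[(row["query"], row["engine"])].append(row["mentioned"] == "yes")
    return {pair: sum(flags) / len(flags) for pair, flags in by_pair.items()}


def unstable_pairs(rows):
    """Pairs where the brand appeared in some runs but not all of them."""
    rates = run_consistency(rows)
    return sorted(pair for pair, rate in rates.items() if 0 < rate < 1)
```

Pairs returned by `unstable_pairs` are the ones where a single-run audit would have been decided by luck.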

Skipping the source map. The pattern analysis tells you what is happening. The source map tells you why. Without the source map, the audit produces awareness without direction. Action requires both.

Treating the audit as one-time. The first audit is the baseline. The value compounds when you re-audit quarterly with the same query set and see what moved.
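
Because the query set stays fixed, comparing two quarters reduces to subtracting mention rates. A small sketch, assuming each audit has been reduced to a dict of mention rates per engine (the `rate_changes` name and the input shape are illustrative):

```python
def rate_changes(baseline, current):
    """Change in mention rate per engine, in whole percentage points.

    Engines present in only one of the two audits are skipped, since
    there is nothing to compare them against.
    """
    return {
        engine: round((current[engine] - baseline[engine]) * 100)
        for engine in baseline
        if engine in current
    }


# Example with invented numbers: ChatGPT moved, Gemini did not.
changes = rate_changes(
    {"ChatGPT": 0.40, "Gemini": 0.10},
    {"ChatGPT": 0.55, "Gemini": 0.10, "Claude": 0.20},
)
```

A movement of a few points on 15-20 queries is within the noise of run-to-run variation, so treat small changes with caution.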

How Whaily fits in

The audit above is doable manually. We do not pretend you need a tool. What a tool gives you is consistency at scale and time savings.

If you are running an audit for a single brand in one category, the manual process works. Run it. See if the discipline is something your team can maintain.

If you are running audits across multiple brands or categories, or doing weekly tracking rather than quarterly snapshots, the volume becomes the bottleneck. That is when tooling pays for itself.

FAQ

Can a single person run this audit? Yes, but expect it to take longer than the one day the two-person version needs. The two-person version is faster mostly because of parallel query running and pattern discussion.

How often should I re-audit? Quarterly is the cadence for most teams. Monthly is overkill unless you are actively investing in changes and want to track them tightly. Yearly is too infrequent given how fast AI engine behavior moves.

What tools do I need for the manual version? A spreadsheet, four browser tabs, and a shared doc. No special tooling required.

What if my industry has very different AI engines (legal, medical, etc.)? Add the vertical engines to your list. The same five-step process applies. The query patterns will be different, but the methodology is the same.
