How to audit your brand's AI visibility in under an hour

A step-by-step process for checking where your brand stands across ChatGPT, Gemini, Perplexity, and Claude without needing specialized tools.

Most brand and marketing teams have no idea how their brand performs in AI search. They might have run a few casual ChatGPT queries at some point. They have a rough impression. But they don't have data.

This guide changes that. In about 60 minutes, working entirely with free tools and a spreadsheet, you can produce a structured baseline picture of your brand's AI visibility across the four models that matter most. It won't replace systematic tracking over time, but it gives you something concrete to work with immediately.

Before you start: what you're measuring

An AI visibility audit answers four questions about your brand.

First: does your brand appear at all when users ask about your product category? Second: when it does appear, where does it sit in the response and how is it framed? Third: which models mention you and which don't? Fourth: for which types of queries are you present and for which are you absent?

Those four questions map to the four columns you'll track in your scoring sheet. Keep them in mind as you design your query set, because every choice you make in the next step shapes how well your answers hold up.

Step 1: Build your query set (10 minutes)

Pick 8 to 10 queries. Not 3. Not 20. Eight to ten is enough to produce patterns without taking the rest of your day.

The queries should represent how a real buyer would ask about your category, not how your marketing team would describe it. Buyers ask conversationally. They ask about their situation, not your product features. A useful heuristic: would a friend asking for a recommendation actually phrase it this way?

Good query types to include:

Category-level queries target your space generically. "What's the best project management tool for a remote engineering team?" "Which CRM should a 50-person sales team use?" These are the highest-volume, most competitive queries. Your presence here signals category authority.

Use-case queries target specific scenarios within your category. "What's the best tool for tracking AI search visibility?" "Which platform should I use to monitor brand mentions in AI models?" These tend to be longer-tail and less contested, but they may be closer to the actual questions your buyers are asking.

Comparison queries name competitors. "Whaily vs [competitor]" or "what's better for AI visibility tracking, [brand A] or [brand B]?" These surface how AI models understand your competitive position.

Intent queries surface what a buyer asks just before a purchase decision. "How do I start tracking my brand in AI search?" "What tools do teams use to measure AI visibility?" Winning a recommendation in this type of query is often where deals start.

Write your 8 to 10 queries in a column in your spreadsheet before moving to the next step. Do not run any queries yet.

Spreadsheet template with columns for query, ChatGPT result, Gemini result, Perplexity result, Claude result, and scoring notes
A simple spreadsheet is enough for a first-pass audit. Each row is one query; each column is one model.
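If you'd rather set the sheet up programmatically than by hand, here's a minimal sketch in Python. The example queries and file name are placeholders; swap in your own query set from above.

```python
import csv

# Placeholder query set -- replace with the 8 to 10 queries you wrote in Step 1.
queries = [
    ("category", "What's the best project management tool for a remote engineering team?"),
    ("use-case", "Which platform should I use to monitor brand mentions in AI models?"),
    ("comparison", "Whaily vs [competitor] for AI visibility tracking"),
    ("intent", "How do I start tracking my brand in AI search?"),
]

# One row per query; one result column per model, plus scoring notes.
with open("ai_visibility_audit.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["query_type", "query", "ChatGPT", "Gemini", "Perplexity", "Claude", "notes"])
    for query_type, query in queries:
        writer.writerow([query_type, query, "", "", "", "", ""])
```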

Step 2: Run the queries (20 minutes)

Open four browser tabs, one for each model: ChatGPT (chat.openai.com), Google Gemini (gemini.google.com), Perplexity (perplexity.ai), and Claude (claude.ai).

For each of your 8 to 10 queries, paste it into all four models and record what you get. A few things to keep consistent:

Start each session fresh. Don't run all 10 queries in one ChatGPT session and then start Gemini. The models build conversational context, which can bleed between queries. Either use separate browser windows or clear the conversation between queries.

Use the same query text in every model. Copy and paste, don't retype. Even small variations can produce different results, and you want the model to be the variable, not the query wording.

Don't ask follow-up questions. Record the response to the initial query only. Follow-up questions introduce a different variable.

For each response, record in your spreadsheet:

  • Whether your brand was mentioned (yes/no)
  • Position in the response (first named, second, third, not mentioned)
  • How the brand is framed (recommended, mentioned with caveats, mentioned as a comparison point, mentioned negatively)
  • Which competitors appeared

This takes roughly 2 to 3 minutes per query when you're moving efficiently.
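If you're logging results in code rather than a spreadsheet, a small record type keeps the four fields consistent across reviewers. A sketch; the field names and framing labels here are illustrative choices, not a standard:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QueryResult:
    model: str                 # "ChatGPT", "Gemini", "Perplexity", or "Claude"
    query: str                 # exact query text, pasted identically into each model
    mentioned: bool            # was your brand named at all?
    position: Optional[int]    # 1 = first named, 2 = second, ...; None if absent
    framing: Optional[str]     # "recommended", "caveats", "comparison", or "negative"
    competitors: list[str]     # competitor brands named in the response
```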

Note

Run each query in a private/incognito window if your accounts have prior conversation history with these models. Previous conversations about your brand can influence responses in the same session, which would skew your audit results.

Step 3: Score your results (10 minutes)

With your raw data collected, apply a simple scoring framework to each cell in the matrix.

  • 3 points: Brand mentioned in the top two positions, accurately described, recommended for the right use case.
  • 2 points: Brand mentioned, but not in the top two or framed with minor inaccuracies.
  • 1 point: Brand mentioned but framed negatively, inaccurately, or as an explicit second choice.
  • 0 points: Brand not mentioned.

Sum your scores by model (column totals) and by query type (row totals). Both dimensions matter.

Column totals tell you which models are your strongest and weakest channels. A high score on Perplexity and a low score on ChatGPT is different from the reverse. The population of users on each platform, and the query types each platform attracts, affect how much each gap costs you.

Row totals tell you which query types you win and which you lose. A brand that appears in category-level queries but disappears from use-case queries has a different problem than a brand that scores well on comparison queries but not on intent queries.

With 10 queries, the maximum possible score per model is 30 (3 points each); with 8, it's 24. A useful rough benchmark for a 10-query first audit: below 12 is poor, 12 to 20 is average, above 20 is strong. Scale those bands to your query count, and expect them to shift as you collect more data and refine your query set.
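If you transcribed your scores into code, the totals and benchmark bands are a few lines of Python. A sketch with illustrative scores rather than real data:

```python
models = ["ChatGPT", "Gemini", "Perplexity", "Claude"]

# Scores keyed by (query, model), copied from your sheet.
# A few illustrative cells; fill in all 8-10 queries x 4 models.
scores = {
    ("What's the best [category] tool?", "ChatGPT"): 2,
    ("What's the best [category] tool?", "Gemini"): 0,
    ("What's the best [category] tool?", "Perplexity"): 3,
    ("What's the best [category] tool?", "Claude"): 1,
}

# Column totals: visibility by model.
by_model = {m: sum(s for (q, mm), s in scores.items() if mm == m) for m in models}

# Row totals: visibility by query.
by_query = {}
for (q, m), s in scores.items():
    by_query[q] = by_query.get(q, 0) + s

def benchmark(total, n_queries=10):
    """Rough first-audit band, scaled to the query count (max = 3 * n_queries)."""
    poor, strong = 12 * n_queries / 10, 20 * n_queries / 10
    return "poor" if total < poor else "strong" if total > strong else "average"

for m in models:
    print(m, by_model[m], benchmark(by_model[m]))
```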

Step 4: Identify the gaps (10 minutes)

Gaps come in three forms.

Model gaps show up when a specific model scores you 0 across most or all of your queries. This usually signals a training data or third-party coverage problem specific to that model's sources.

Query type gaps are query types where you consistently miss despite being present elsewhere. A brand that shows up for category queries but disappears on use-case queries is often struggling with content specificity. The model has learned who you are but not what specific problems you solve.

Framing gaps are cases where you appear but are described in ways that don't match how you want to be positioned. Your brand might be cited accurately as a tool but described as expensive when that's not your positioning, or positioned for small businesses when you primarily serve enterprise.

Matrix showing brand scores across four AI models and four query type categories, with color coding to highlight gap areas
Color-coding your score matrix quickly reveals where your gaps cluster. Red zones are where to focus improvement efforts.
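Continuing from the scores dict in the Step 3 sketch, the first two gap types can be pulled out mechanically; framing gaps live in your notes column and are easier to scan by eye:

```python
from collections import defaultdict

# Map each query back to its type from your Step 1 sheet.
query_types = {
    "What's the best [category] tool?": "category",
    # ... one entry per query
}

# Cells where the brand didn't appear at all (score 0).
zeros = [(q, m) for (q, m), s in scores.items() if s == 0]

# Model gaps: how many queries each model misses entirely.
model_gaps = defaultdict(int)
for q, m in zeros:
    model_gaps[m] += 1

# Query type gaps: which query types you miss across models.
type_gaps = defaultdict(int)
for q, m in zeros:
    type_gaps[query_types.get(q, "unknown")] += 1

print("Misses per model:", dict(model_gaps))
print("Misses per query type:", dict(type_gaps))
```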

Step 5: Prioritize what to fix (10 minutes)

Not all gaps deserve equal attention. Prioritize based on two factors: query volume and fix difficulty.

High-volume queries in your primary product category matter most. If you're invisible on "best [category] tool" queries, that's your first priority, because those queries represent users who are actively deciding what to buy.

Fix difficulty is a function of gap type. Model gaps caused by thin review site coverage are fixable with a focused effort over two to three months. Framing gaps caused by outdated G2 reviews are fixable by soliciting fresh reviews. Gaps caused by competitors having substantially richer editorial coverage are a longer-term content and PR problem.

Write down three actions you will take based on the gaps you found. Being specific helps. "Improve G2 review volume with a customer request campaign in Q2" is actionable. "Improve AI visibility" is not.

Turning a one-time audit into ongoing measurement

A single audit gives you a baseline. It doesn't tell you whether you're improving or declining, whether model updates have changed your position, or whether a competitor just gained ground on queries you were winning.

That's where the manual process hits its limit. Running 8 to 10 queries across four models every week is time-consuming and inconsistent. Human reviewers introduce variability in scoring. And the models themselves change without notice, making it hard to know whether a shift in your scores reflects something you did or a model update.
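If you do want to script the query-running leg yourself, the model APIs make that part straightforward, with one caveat: API responses don't always match what the consumer apps return, since the chat products layer search and memory features on top of the raw model. A minimal sketch for one model using OpenAI's Python SDK; the model name is an assumption, and each provider has its own equivalent:

```python
from openai import OpenAI  # pip install openai; expects OPENAI_API_KEY in the environment

client = OpenAI()

def run_audit_query(query: str) -> str:
    """Send one audit query with no prior conversation context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: substitute the model you audited
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

print(run_audit_query("What tools do teams use to measure AI visibility?"))
```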

Whaily automates the query-running, scoring, and trend tracking that this manual process only approximates. It's worth using if you're going to take AI visibility seriously as an ongoing measurement discipline rather than a one-time check.

AI Visibility Tracking

See where your brand stands in AI search

Track how ChatGPT, Gemini, Perplexity, and Claude recommend your brand vs competitors.

Start tracking free

FAQ

How often should I repeat this audit?

Monthly is a reasonable cadence for manual audits. The AI model landscape shifts often enough that quarterly is too infrequent to catch meaningful changes. If you're running a major content or PR campaign, run an audit before and after to measure its effect.

Should I test with a logged-in account or logged out?

Logged-in. Most of your users are logged in, and some models adjust responses based on account preferences. The goal is to simulate the experience of a real user, not a sanitized baseline.

What if my brand doesn't appear in any of the results?

Start with third-party presence. The most common cause of complete absence in AI recommendations is a lack of coverage on sources the models weight heavily: G2, Capterra, Trustpilot, industry publications, and active forum discussions. A brand with a good website but thin external coverage often scores near zero.

Do I need to test every model version, like GPT-4o vs GPT-4o mini?

For a first audit, no. Use the default model version each platform serves. Once you have a baseline, you can add model version as a variable if you want more granular data.
