METHODOLOGY · §M

How AI Picks works.

§01

What you're looking at

Every AI Picks page is generated from real AI responses. We put the same buyer-style questions to each major AI model, per industry, and we do it on a schedule so the picture stays fresh. What you see is the consensus, the dissent, and the sources each model leaned on.

§02

The weekly council

Every week we run a fixed set of reference prompts in each industry across Claude, GPT, Gemini, Perplexity, DeepSeek and Grok. Each prompt is phrased the way a real buyer would phrase it ("What are the best CRMs for a 5-person sales team?"), and each model answers it independently, in a fresh context. No leading, no priming.
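As a rough illustration of the council run described above, the sketch below sends every reference prompt to every model as a standalone request. The names `run_council` and `ask` are hypothetical, and reading "fresh context" as "nothing but the bare prompt is sent" is an assumption, not a description of the production pipeline.

```python
from typing import Callable, Dict, List, Tuple


def run_council(
    prompts: List[str],
    models: List[str],
    ask: Callable[[str, str], str],
) -> Dict[Tuple[str, str], str]:
    """Collect one independent answer per (model, prompt) pair.

    `ask(model, prompt)` is a hypothetical stand-in for whatever client
    actually calls each model.
    """
    answers: Dict[Tuple[str, str], str] = {}
    for prompt in prompts:
        for model in models:
            # Assumption: only the bare prompt is passed along. No earlier
            # turns and no other model's answer are shared between calls.
            answers[(model, prompt)] = ask(model, prompt)
    return answers
```

Because each call receives only the model name and the prompt, no answer can influence another.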

§03

The daily watch (for Whaily customers)

Customers on Whaily run their own prompts every day to track how their brand shows up. Those responses feed a per-industry signal that moves as the models change their minds. Individual prompt text and per-brand breakdowns are never published. Only the aggregate is.

§04

The Recommendation Score

For each tool in a category:

recommendation_score = mentions_of_tool / total_responses_in_industry

The score is computed over a rolling 90-day window across all responses tagged to the industry. The per-model breakdown shown below the score uses the same calculation, scoped to a single model's responses.
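To make the formula concrete, here is a minimal sketch of the calculation. The `Response` record, its field names, and the window boundaries are assumptions made for illustration. It also assumes a tool counts at most once per response, so the score stays between 0 and 1.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Response:
    # Hypothetical record shape: one stored AI response.
    industry: str
    model: str
    day: date
    tools_mentioned: frozenset


def recommendation_score(
    responses: Iterable[Response],
    tool: str,
    industry: str,
    today: date,
    model: Optional[str] = None,
    window_days: int = 90,
) -> float:
    """mentions_of_tool / total_responses_in_industry over a rolling window."""
    cutoff = today - timedelta(days=window_days)
    in_scope = [
        r
        for r in responses
        if r.industry == industry
        and cutoff < r.day <= today  # assumed window: the last 90 days, today included
        and (model is None or r.model == model)  # optional per-model scope
    ]
    if not in_scope:
        return 0.0  # assumption: no responses in the window means a score of 0
    mentions = sum(1 for r in in_scope if tool in r.tools_mentioned)
    return mentions / len(in_scope)
```

Passing a `model` argument gives the per-model breakdown. Leaving it as `None` gives the industry-wide score.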

§05

The buyer view

On AI Picks, the same signal is consolidated into a side-by-side view: which tools each model recommends, where they agree, where they don't, and what they cite to back it up. It's designed to be useful to an actual buyer making an actual purchase decision, not just interesting to read.
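One way to picture the "agree / don't agree" split is the sketch below. It assumes each model's recommendations are available as a set of tool names. The function name `split_consensus` and the rule that consensus means "recommended by every model" are illustrative assumptions, not the published definition.

```python
from typing import Dict, List, Set, Tuple


def split_consensus(
    picks: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, List[str]]]:
    """Split tools into those every model recommends and those only some do.

    `picks` maps a model name to the set of tools it recommended.
    Returns (consensus, dissent), where dissent maps each contested tool
    to the sorted list of models that did recommend it.
    """
    if not picks:
        return set(), {}
    all_tools = set().union(*picks.values())
    consensus = set.intersection(*picks.values())
    dissent = {
        tool: sorted(m for m, tools in picks.items() if tool in tools)
        for tool in sorted(all_tools - consensus)
    }
    return consensus, dissent
```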

§06

Refresh cadence

Daily. Customer-prompt aggregation snapshots refresh.
Weekly (Sundays). The reference council re-runs across all models.

§07

Model coverage

At publication: Anthropic Claude Sonnet 4.6, OpenAI GPT-4o, Google Gemini 2.5 Pro, Perplexity Sonar Large, DeepSeek R1, xAI Grok. We add new models as they're released and retire ones that are no longer available.

§08

Cold-start

When an industry doesn't yet have enough customer signal (fewer than 10 responses, or fewer than 2 brands), we publish only the reference-council baseline and clearly label it as such.
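The cold-start rule can be sketched as a simple threshold check. The function and constant names are hypothetical, and the sketch reads the rule as "fewer than 10 responses, or fewer than 2 brands".

```python
MIN_CUSTOMER_RESPONSES = 10  # threshold from the cold-start rule above
MIN_DISTINCT_BRANDS = 2      # threshold from the cold-start rule above


def is_cold_start(customer_responses: int, distinct_brands: int) -> bool:
    """Return True when only the reference-council baseline should be shown."""
    return (
        customer_responses < MIN_CUSTOMER_RESPONSES
        or distinct_brands < MIN_DISTINCT_BRANDS
    )
```

Under this reading, an industry with 40 responses from a single brand still counts as cold-start, because either shortfall is enough to trigger the baseline-only view.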

§09

What we never publish

Individual customer prompt text (it can leak brand context).
Per-org breakdowns or any way to tell which brand runs which prompts.
Internal notes, comments, or billing data.