When marketing directors and founders ask ChatGPT that question, 43 of the 100 UK recruitment firms we tested never appear. See where you rank - or whether you rank at all.
| # | Firm | Score ↕ | Visibility | Mentions | Best speciality |
|---|
A low score doesn't mean your business is bad. It means ChatGPT can't find you. We help businesses fix that - structuring your content so AI can read it, building integrations so AI can transact through you, and automating the distribution so it compounds.
Email greg@rafiki.works →We ask ChatGPT 140 buyer-intent questions a marketing director, CEO or founder might genuinely ask when looking for a UK recruitment firm - across 14 specialities from tech to fractional, finance to creative. 10 questions per speciality, each run 3 times to capture variance. That's 420 responses analysed. If your firm gets mentioned, you score. If you never come up, you score zero. We don't take money to rank firms higher. We don't hide the queries - examples below. Run them yourself.
Each question is run 3 times because ChatGPT does not return the same answer every time. The full set of 140 prompts is available on request.
Every ChatGPT response is parsed for every firm named. The score is weighted heavily on appearance, then on prominence, then on sector relevance. No manual ranking, no editorial weighting, no paid promotion.
Firms that never appear in any response score zero and are flagged Invisible.
The "Mentions" column counts unique prompts a firm appeared in across all 14 specialities, not just one. Each speciality has 10 prompts, so the cap per speciality is 10. The cap across the whole benchmark is 140. Two real firms make the trade-off visible.
Neither pattern is better. A deep vertical signal tells buyers in that niche you're the answer. A broad horizontal signal tells buyers across many sectors you're credible. The score reflects both - appearance frequency (heaviest weight) plus cross-category strength.
ChatGPT does not return the same answer every time you ask the same question. We measured the variance across all 140 prompts. The numbers below are actual averages from our benchmark.
In plain English: about 5 of every 8 firms named are rock solid, and the rest shift between runs. That's why a single run isn't enough. Three runs gives us a reliable "always named" signal for the stable core, partial credit for firms named in 1 or 2 runs of 3, and a strong negative result for firms that never surface across 420 attempts. Five runs would tighten it further but you hit diminishing returns. Three is the sweet spot for cost versus signal.
AI visibility changes constantly - the model itself updates, web search results shift daily, and ChatGPT's answers vary by configuration. This benchmark used gpt-4o-search-preview via the OpenAI API with web search enabled - the closest API equivalent to ChatGPT.com's browsing experience. The underlying search backend the API uses may rank or cite sources slightly differently than what you'd see typing the same question into ChatGPT.com directly, so expect broadly similar but not byte-identical results. We refresh the full benchmark quarterly.
Talk to Rafiki Works about GEO content strategy, custom GPT actions, and AI distribution automation.