Engine deep-dive · June 14, 2026 · 9 min read

How ChatGPT decides which law firm to recommend

Inside the citation graph: how ChatGPT picks the firms it names when in-house counsel asks for a shortlist, and how to enter it without violating bar rules.

Pri Vora

Co-founder · Head of engineering

When a general counsel types "who are the best IP litigation boutiques for software patents?" into ChatGPT and gets back a shortlist of five firms, that list is not random. It is also not the same answer Google would give for a similar search. The model is doing something different: in effect, it paraphrases a latent representation of the authoritative documents it has seen about IP litigation, weighted by how often firms were named together with the qualifiers in the prompt.

Understanding that mechanic is the difference between trying to rank in AI search and actually getting cited. This is the model we use inside Cognoverge to decide what a firm should ship first.

The two-pass shortlisting model

ChatGPT (and, with slight variation, Claude and Gemini) appears to make recommendations in two passes:

Candidate retrieval. The model gathers a plausibility-weighted set of named entities from its training data that co-occur with the query's qualifiers — "IP litigation," "boutique," "software patents." This set is larger than the final answer.
Authority filtering. The candidates are then re-ranked by a coarse authority signal — usually some combination of frequency of mention in high-weight sources (Chambers, Law360, American Lawyer), recency of those mentions, and the diversity of contexts in which the firm appears.

The output you see is the top-N from pass two. If your firm never enters pass one, no amount of website optimization rescues you.
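The two passes can be sketched in a few lines of Python. This is a toy illustration of the mental model, not a description of ChatGPT's internals: the `Mention` record, the source weights, the recency half-life, and the way the three authority signals are combined are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    firm: str
    source: str            # e.g. "Chambers", "Law360", "firm-site"
    year: int
    qualifiers: frozenset  # qualifiers that appear near the firm name


# Hypothetical source weights, for illustration only.
SOURCE_WEIGHT = {
    "Chambers": 1.0,
    "Law360": 0.9,
    "American Lawyer": 0.9,
    "firm-site": 0.2,
}


def retrieve_candidates(mentions, query_qualifiers, min_overlap=1):
    """Pass one: keep firms that co-occur with at least `min_overlap` query qualifiers."""
    by_firm = defaultdict(list)
    for m in mentions:
        if len(m.qualifiers & query_qualifiers) >= min_overlap:
            by_firm[m.firm].append(m)
    return dict(by_firm)


def authority_score(firm_mentions, current_year, half_life=3.0):
    """Pass two: source-weighted frequency, decayed by age, scaled by source diversity."""
    weighted = sum(
        SOURCE_WEIGHT.get(m.source, 0.1) * 0.5 ** ((current_year - m.year) / half_life)
        for m in firm_mentions
    )
    diversity = len({m.source for m in firm_mentions})
    return weighted * diversity


def shortlist(mentions, query_qualifiers, current_year, top_n=5):
    """Run both passes and return the top-N firm names."""
    candidates = retrieve_candidates(mentions, frozenset(query_qualifiers))
    ranked = sorted(
        candidates,
        key=lambda firm: authority_score(candidates[firm], current_year),
        reverse=True,
    )
    return ranked[:top_n]
```

The property worth noticing is structural: a firm with no qualifying mention never reaches `authority_score`, so nothing done in pass two can bring it back.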

What "enters pass one" actually means

The model has to have seen your firm name paired with the right qualifier in a context that survived training. That is roughly:

A directory entry that explicitly tags you with the practice area in a structured way (Chambers' practice tags, JD Supra's topic taxonomy, Vault's specialization labels).
Long-form articles where you are named alongside the qualifier: case writeups, journalistic profiles, panelist bios for industry events.
Your own published thought leadership where you put the qualifier in close proximity to the firm name in the first 200 words.

Notably absent from this list: your firm's home page hero copy. ChatGPT sees it, but co-occurrence with practice-area qualifiers in marketing copy on your own domain appears to be heavily discounted, as if the model treats it as low-credibility self-description. This resembles the trust heuristic that Google's E-E-A-T guidelines spell out for human quality raters, and modern LLMs seem to learn a version of it implicitly from their training data.

What you can control on a 30-day horizon

Three lever points, in priority order:

1. Practice-area schema on every attorney bio page

Most firm sites have attorney bios with practice areas in a sidebar list, not in structured data. Add schema.org Person markup to each bio, using the knowsAbout property for the practice areas and the worksFor property for the firm. This makes every bio a parseable signal that the model can extract during its next training corpus refresh.
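As a minimal sketch, the markup can be generated from whatever data already drives the bio pages. The function names, the sample attorney, and the choice of `LegalService` as the type for the firm are assumptions for illustration; `Person`, `knowsAbout`, and `worksFor` are standard schema.org vocabulary.

```python
import json


def attorney_jsonld(name, firm_name, practice_areas, bio_url=None):
    """Build a schema.org Person record for one attorney bio, as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        # One entry per practice area, spelled the way clients search for it.
        "knowsAbout": list(practice_areas),
        # LegalService is one reasonable type for a law firm; Organization also works.
        "worksFor": {"@type": "LegalService", "name": firm_name},
    }
    if bio_url:
        data["url"] = bio_url
    return json.dumps(data, indent=2)


def as_script_tag(jsonld):
    """Wrap the JSON-LD so it can be dropped into the bio page's HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"
```

Calling `as_script_tag(attorney_jsonld("Jane Doe", "Example IP Law LLP", ["IP litigation", "software patents"]))` yields a block you can place in the bio page's head. The practice-area strings should match the wording used in the visible bio, so the structured data and the page text agree.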

2. Get into the topic taxonomies that already feed LLMs

Vault, Chambers, JD Supra, Lawyerist, and Above the Law all have submission flows for new attorney biographies and case coverage. The data flowing into the model from these sources is weighted higher than your own publications — by a factor we measure at roughly 4–6× in our citation graph analysis.

3. Pair your firm name with the qualifier in published content

Every article you publish should put the firm name and the practice-area qualifier in the same sentence in the lede. Not "Our team..." — name the firm. The model's co-occurrence window is narrower than you think (roughly the first 500 tokens of an article carry most of the signal).
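A rough pre-publication check for this rule might look like the sketch below. It assumes whitespace-separated words are a good enough stand-in for model tokens and uses a naive sentence splitter; the function name and the 500-token default are illustrative, taken from the rule of thumb above rather than from any documented model behavior.

```python
import re


def lede_cooccurrence(text, firm_name, qualifier, window_tokens=500):
    """Return True if one sentence in the opening window names both firm and qualifier.

    Tokens are approximated by whitespace-separated words, which undercounts
    real model tokens, so treat the window as a loose bound.
    """
    lede = " ".join(text.split()[:window_tokens])
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", lede)
    firm, qual = firm_name.lower(), qualifier.lower()
    return any(firm in s.lower() and qual in s.lower() for s in sentences)
```

The splitter will break on firm names that contain a period followed by a space (for example "Smith & Co. LLP"), so a draft that fails the check is worth a manual look before rewriting.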

What you can't control (and shouldn't try to)

Pre-training corpus composition is fixed for any given generation of the model. GPT-4-class models have a training cutoff that typically lags the present by roughly 12–18 months by the time you are querying them. You cannot fast-path into that. What you can do is make sure that, when the next training refresh happens, your firm is densely represented in the corpus.

For nearer-term wins, the retrieval layer (ChatGPT's web browsing tool, Bing-backed search, the "search" mode in newer builds) is much more recency-sensitive — closer to the Perplexity model. Different strategy, different cadence — we covered that in a separate post on the divergence.

The unfair-advantage move

Most firms competing on AI search are still treating it like SEO — optimizing keywords on landing pages and writing more content. That is missing the unit of work. The unit of work for LLM citation is co-occurrence-in-corpus, not page rank. Get named, in context, in sources the model trusts. Everything else is noise.
