Inside the citation graph: how ChatGPT picks the firms it names when in-house counsel asks for a shortlist, and how to enter it without violating bar rules.
When a general counsel types "who are the best IP litigation boutiques for software patents?" into ChatGPT and gets back a shortlist of five firms, that list is not random. It is also not the same answer Google would give for a similar search. The model is doing something different: it is, in effect, paraphrasing a latent representation of every authoritative document it has seen about IP litigation, weighted by how often each firm was named together with the qualifiers in the prompt.
Understanding that mechanic is the difference between trying to rank in AI search and actually getting cited. This is the model we use inside Cognoverge to decide what a firm should ship first.
ChatGPT (and, with slight variation, Claude and Gemini) appears to make recommendations in two passes:
The output you see is the top-N from pass two. If your firm never enters pass one, no amount of website optimization rescues you.
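Here is a deliberately crude toy version of that two-pass picture. The assumptions are ours, not a description of ChatGPT's internals. Co-occurrence is counted at the sentence level. Pass one admits any firm named at least once in a sentence that contains every qualifier. Pass two ranks those candidates by count and keeps the top N. The function names and firm names are hypothetical. The only point is to show why a firm with zero co-occurrences can never reach the output, however the ranking is done.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def shortlist(docs: list[str], firms: list[str], qualifiers: list[str], top_n: int = 5) -> list[str]:
    """Toy two-pass model (illustrative only): candidate set, then ranking."""
    quals = [q.lower() for q in qualifiers]
    counts: Counter = Counter()
    for doc in docs:
        for sentence in split_sentences(doc):
            s = sentence.lower()
            # Only sentences that mention every qualifier count as evidence.
            if all(q in s for q in quals):
                for firm in firms:
                    if firm.lower() in s:
                        counts[firm] += 1
    # Pass one: a firm with no co-occurrence never becomes a candidate.
    candidates = [f for f in firms if counts[f] > 0]
    # Pass two: rank by co-occurrence count (ties broken alphabetically), keep top N.
    ranked = sorted(candidates, key=lambda f: (-counts[f], f))
    return ranked[:top_n]
```

If a firm is absent from every qualifying sentence, it never appears in `candidates`. Tuning the ranking step does not change that, which is the sketch's version of "no amount of website optimization rescues you."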
The model has to have seen your firm name paired with the right qualifier in a context that survived training. That is roughly:
Notably absent from this list: your firm's home page hero copy. The model has very likely seen it, but co-occurrence with practice-area qualifiers in marketing copy on your own domain appears to be heavily discounted; the model treats it as low-credibility self-description. It is the same intuition Google formalized in its E-E-A-T guidelines, and modern LLMs seem to have learned it implicitly.
Three lever points, in priority order:
Most firm sites list attorney practice areas in a sidebar rather than in structured data. Add schema.org Person markup with the knowsAbout and worksFor properties, tagging each practice area explicitly in knowsAbout. That turns every bio into a machine-parseable signal that crawlers, and potentially the next training corpus refresh, can extract.
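The following minimal sketch shows what that markup can look like, generated from bio data. Person, knowsAbout, worksFor and LegalService are real schema.org vocabulary. The helper names, the sample attorney and the choice of LegalService (an Organization subtype) for the firm are our assumptions, so adapt them to your own CMS.

```python
import json


def attorney_jsonld(name, job_title, firm_name, firm_url, practice_areas, profile_url=None):
    """Build a schema.org Person object for one attorney bio page (hypothetical helper)."""
    person = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        # worksFor expects an Organization; LegalService is an Organization subtype.
        "worksFor": {"@type": "LegalService", "name": firm_name, "url": firm_url},
        # knowsAbout accepts Text, Thing or URL; plain practice-area strings are used here.
        "knowsAbout": list(practice_areas),
    }
    if profile_url:
        person["url"] = profile_url
    return person


def jsonld_script_tag(obj):
    """Serialize as a JSON-LD <script> block for the bio page."""
    # Escaping "</" as "<\/" stays valid JSON and stops text like "</script>" closing the tag early.
    body = json.dumps(obj, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


bio = attorney_jsonld(
    "Jane Doe", "Partner", "Halden & Roe LLP", "https://haldenroe.example",
    ["Patent litigation", "Software patents", "PTAB proceedings"],
)
print(jsonld_script_tag(bio))
```

Keeping the practice areas in a list, instead of one comma-joined string, keeps each qualifier a distinct value that a parser can pair with the attorney and the firm.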
Vault, Chambers, JD Supra, Lawyerist, and Above the Law all have submission flows for new attorney biographies and case coverage. The data flowing into the model from these sources is weighted more heavily than your own publications, by a factor we measure at roughly 4–6× in our citation graph analysis.
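This small weighted tally makes the 4–6× figure concrete. It assumes a flat 5× multiplier (the midpoint of that range) for any mention hosted outside the firm's own domain and 1× for first-party pages. The function, the weights and the `haldenroe.example` domain are all hypothetical. Under that assumption, one third-party write-up is worth five posts on your own blog.

```python
from urllib.parse import urlparse

# Assumed weights, illustrative only: third-party mentions count 5x a
# first-party mention (midpoint of the 4-6x range measured above).
THIRD_PARTY_WEIGHT = 5.0
FIRST_PARTY_WEIGHT = 1.0


def weighted_mentions(mention_urls: list[str], firm_domain: str) -> float:
    """Sum co-occurrence mentions, discounting pages on the firm's own domain."""
    firm_domain = firm_domain.lower()
    total = 0.0
    for url in mention_urls:
        host = (urlparse(url).hostname or "").lower()
        # Treat the bare domain and any subdomain (www., blog., ...) as first-party.
        own = host == firm_domain or host.endswith("." + firm_domain)
        total += FIRST_PARTY_WEIGHT if own else THIRD_PARTY_WEIGHT
    return total
```

For example, two blog posts on `haldenroe.example` plus one JD Supra article tally to 1 + 1 + 5 = 7. That is why the submission flows above sit ahead of publishing more on your own site.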
Every article you publish should put the firm name and the practice-area qualifier in the same sentence in the lede. Not "Our team..." — name the firm. The model's co-occurrence window is narrower than you think (roughly the first 500 tokens of an article carry most of the signal).
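A rough pre-publish lint for this rule could look like the sketch below. It checks whether the firm name and the qualifier share a sentence in the lede, and whether they share one anywhere in the opening window. Whitespace-separated words stand in for model tokens, which is only an approximation, and `lede_check` is a hypothetical helper, not a tool we ship.

```python
import re


def lede_check(article: str, firm: str, qualifier: str, window_words: int = 500) -> dict:
    """Does the firm name share a sentence with the qualifier, in the lede and early on?"""
    # Approximation: whitespace words stand in for tokens; real tokenizers differ.
    head = " ".join(article.split()[:window_words])
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", head) if s.strip()]
    f, q = firm.lower(), qualifier.lower()
    hits = [i for i, s in enumerate(sentences) if f in s.lower() and q in s.lower()]
    return {
        "in_lede": bool(hits) and hits[0] == 0,  # first sentence names both
        "in_window": bool(hits),                 # some early sentence names both
    }


print(lede_check(
    "Our team won a software patent litigation verdict. Halden & Roe represented the defendant.",
    "Halden & Roe", "software patent litigation",
))
```

The sample article fails both checks. It says "Our team" where the qualifier appears, and it names the firm only in a sentence without the qualifier, which is exactly the pattern the rule above warns against.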
Pre-training corpus composition is fixed for any given model version. GPT-4-class models have a training cutoff that trails the present by 12–18 months. You cannot fast-path into that. What you can do is make sure that when the next training refresh happens, your firm is densely represented in the corpus.
For nearer-term wins, the retrieval layer (ChatGPT's web browsing tool, Bing-backed search, the "search" mode in newer builds) is much more recency-sensitive, closer to the Perplexity model. Different strategy, different cadence; we covered that in a separate post on how the two diverge.
Most firms competing on AI search are still treating it like SEO — optimizing keywords on landing pages and writing more content. That is missing the unit of work. The unit of work for LLM citation is co-occurrence-in-corpus, not page rank. Get named, in context, in sources the model trusts. Everything else is noise.
The free 24-hour audit shows you specifically how the eight engines describe your firm against 200 high-intent legal and compliance queries.