Episode 4: GenAI and RAG Fail on Messy Language

 Season 5 Episode 4

Welcome to Season 5 of the Law Firm Data Governance Podcast. I’m CJ Anderson from Iron Carrot, helping law firms do more with their data by improving their data governance.   

This season, we’re levelling up law firm data from intake to insight, with clarity, confidence, and practical steps to move your firm’s data forward.

In this episode, I’m talking about why clear, shared taxonomies are the backbone of reliable Generative AI and retrieval‑augmented generation (R.A.G.) in law firms. You’ll learn how to design a minimal viable taxonomy, embed it at intake, and measure AI‑readiness so pilots move from novelty to dependable business value.  

If you’ve been experimenting with generative AI or retrieval‑augmented generation in your firm, you’ve probably discovered a frustrating truth very quickly: the technology is impressive, but the outputs are inconsistent. You ask the same question twice and get two different answers. You search for “similar matters” and get a mixed bag that feels only vaguely relevant. And despite good intentions, lawyers start to lose confidence. 

What’s going on isn’t a model problem. It’s a language problem. 

In law firms, shared language lives in taxonomies — those controlled lists and definitions behind practice areas, matter types, sectors, phases, jurisdictions, and document classes. When those aren’t aligned, AI doesn’t fail loudly. It fails subtly. And subtle failure is the most dangerous kind, because it looks plausible while quietly eroding trust. 

Today, we’re going to talk about how to make your firm’s language machine‑ready, without boiling the ocean. We’ll focus on minimal viable taxonomy, how to embed it at intake, and how to measure whether it’s actually making your AI more reliable. 

Why common language is non-negotiable for AI

Let’s start with why taxonomy matters so much more in a GenAI and R.A.G. world than it ever did for simple reporting. 

Retrieval‑augmented generation depends on finding “like-for-like” content. That means embeddings need a reasonably consistent semantic signal. If one matter is called “Commercial Litigation,” another is “Disputes – General,” and a third is tagged free‑text as “High Court fight,” the system has to guess whether those things belong together. Sometimes it guesses well. Often, it doesn’t. 

The result is what many firms see in pilots: 
– Retrieval pulls documents that feel adjacent but not comparable 
– Summaries blend incompatible work types 
– Answers are confident but contextually wrong 

None of that is because GenAI is broken. It’s because the firm’s language is ambiguous. 
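To make the ambiguity concrete, here is a minimal Python sketch of mapping the three example labels mentioned earlier onto one controlled value before anything is embedded or indexed. The mapping table, the function name normalise_matter_type, and the decision that all three labels mean the same thing are assumptions for illustration; a real firm would settle that with its lawyers.

```python
from typing import Optional

# Hypothetical synonym table: legacy and free-text labels mapped to one
# controlled matter type. Treating these three as equivalent is an assumption.
CANONICAL_MATTER_TYPE = {
    "commercial litigation": "Commercial Litigation",
    "disputes - general": "Commercial Litigation",
    "high court fight": "Commercial Litigation",
}


def normalise_matter_type(raw_label: str) -> Optional[str]:
    """Return the controlled value for a raw label, or None if it is unmapped."""
    # Fold case, trim whitespace, and treat an en dash like a plain hyphen.
    key = raw_label.strip().lower().replace("\u2013", "-")
    return CANONICAL_MATTER_TYPE.get(key)


for label in ["Commercial Litigation", "Disputes \u2013 General", "High Court fight", "Tax query"]:
    print(label, "->", normalise_matter_type(label))
```

Unmapped labels come back as None, so they can be routed to a person for a decision rather than left for the system to guess.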

This is also where many firms get stuck. They assume fixing taxonomy means launching a massive data standards programme, with years of debate and a 200‑value drop‑down that no one wants to use. The irony is that AI needs less taxonomy than BI ever did — but it needs that small set to be consistent, defined, and enforced at the right moment. 

That’s where minimal viable taxonomy comes in. 

What “minimal viable taxonomy” actually means 

A minimal viable taxonomy is not a weak compromise. It’s a deliberate design choice. 

The goal is not to reflect every nuance of legal work. The goal is to differentiate work just enough to make retrieval and reasoning reliable. In practice, that often means starting with 15 to 20 controlled values per core dimension, not 150. 

For most firms, the core dimensions that matter for GenAI are:  

  • Matter type 
  • Practice or service line 
  • Sector or industry 
  • Jurisdiction 
  • Matter phase or status 
  • Document class 

Each value needs three things: 

  1. A short, unambiguous label 
  2. A plain‑English definition that lawyers agree with 
  3. A clear purpose (why this value exists and when it should be used) 

If you can’t explain why a particular value improves retrieval or reduces ambiguity, it probably doesn’t belong in version one. You can always add later, but you can’t govern what you never stabilised. 
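One way to hold a firm to those three requirements is to make them mandatory in whatever structure stores the taxonomy. The sketch below is one possible shape, not a standard: the class name TaxonomyValue, its field names, and the example definition are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonomyValue:
    """One controlled value; the field names are illustrative assumptions."""

    dimension: str   # for example "matter_type"
    label: str       # short, unambiguous label
    definition: str  # plain-English definition that lawyers agree with
    purpose: str     # why the value exists and when it should be used

    def __post_init__(self) -> None:
        # Reject a value that is missing any of the three required parts.
        for name in ("label", "definition", "purpose"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty for a taxonomy value")


# Hypothetical example value, not a recommended definition.
example = TaxonomyValue(
    dimension="matter_type",
    label="Commercial Litigation",
    definition="Court proceedings between businesses over a commercial dispute.",
    purpose="Groups comparable court disputes so retrieval returns like-for-like matters.",
)
print(example.label)
```

A value with a blank purpose fails at creation time, which is the code equivalent of "it probably doesn't belong in version one".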

A useful test is this: if two different lawyers tag the same matter, would they choose the same value most of the time? If the answer is no, the taxonomy is too complex or too vague. 
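That test can be run on a small sample. A rough sketch, assuming two lawyers have independently tagged the same list of matters (the function name and the sample tags are made up for illustration):

```python
from typing import Sequence


def agreement_rate(tags_a: Sequence[str], tags_b: Sequence[str]) -> float:
    """Share of matters where both taggers chose the same value."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both taggers must tag the same matters")
    if not tags_a:
        raise ValueError("need at least one tagged matter")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# Hypothetical tags from two lawyers for the same four matters.
lawyer_1 = ["Commercial Litigation", "Arbitration", "Advisory", "Arbitration"]
lawyer_2 = ["Commercial Litigation", "Arbitration", "Advisory", "Commercial Litigation"]
print(agreement_rate(lawyer_1, lawyer_2))  # 0.75
```

Raw agreement does not correct for agreement by chance. Cohen's kappa is a stricter alternative if the sample is large enough to justify it.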

Designing taxonomy for how lawyers actually work 

One of the biggest mistakes firms make is designing taxonomy in isolation from workflow. 

Lawyers don’t wake up wanting to tag things. They want to open a matter, get on with the work, and serve the client. If taxonomy feels like extra admin, it will be bypassed, rushed, or ignored — and your AI will inherit that mess. 

Instead, taxonomy should feel like decision support. 

For example:
– Choosing a practice can automatically suggest a small, relevant list of matter types 
– Selecting a jurisdiction can default to firm‑approved abbreviations and codes 
– Choosing a template can pre‑classify document type 

This is where metadata shifts from “data governance” to “helpful structure.” You’re not asking people to think harder; you’re helping them make fewer choices. 
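The "fewer choices" idea can be sketched as simple lookups. Everything named here is hypothetical: the practice names, matter types, template file names, and function names are placeholders, and a real intake form would read them from the firm's own systems.

```python
from typing import Dict, List, Optional

# Hypothetical lookup: each practice offers only its own short list of matter types.
MATTER_TYPES_BY_PRACTICE: Dict[str, List[str]] = {
    "Disputes": ["Commercial Litigation", "Arbitration"],
    "Corporate": ["Acquisition", "Joint Venture"],
}

# Hypothetical lookup: choosing a template pre-classifies the document.
DOCUMENT_CLASS_BY_TEMPLATE: Dict[str, str] = {
    "engagement_letter.dotx": "Correspondence",
    "share_purchase_agreement.dotx": "Precedent",
}


def suggest_matter_types(practice: str) -> List[str]:
    """Return the short list to show once a practice has been chosen."""
    return list(MATTER_TYPES_BY_PRACTICE.get(practice, []))


def default_document_class(template_name: str) -> Optional[str]:
    """Return the document class implied by a template, or None if unknown."""
    return DOCUMENT_CLASS_BY_TEMPLATE.get(template_name)


print(suggest_matter_types("Disputes"))
print(default_document_class("engagement_letter.dotx"))
```

An unknown practice returns an empty list and an unknown template returns None, so the form can fall back to asking the user instead of applying a wrong default.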

And critically, this needs to happen at intake and creation, not retroactively. Retagging later is expensive, error‑prone, and politically fraught. Getting it mostly right at the start quietly compounds value over time. 

Governing taxonomy without slowing the firm down 

Governance is where many well‑intentioned taxonomy efforts go to die. Either nothing is governed, and chaos reigns, or everything is governed centrally, and progress grinds to a halt. 

The best balance is federated governance with clear ownership. 

Practices should be able to propose changes, refinements, or new values — because language evolves. But approvals should happen on a predictable cadence, typically monthly, through a small cross‑functional group that understands both legal work and downstream systems. 

Two practical tips here: 

  1. Keep a visible change log with go‑live dates 
  2. Treat taxonomy changes as versioned releases, not silent edits 

This matters enormously for AI. Prompts, embeddings, and evaluation datasets all rely on stability. Silent taxonomy drift is one of the fastest ways to degrade model performance without anyone noticing why. 
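A change log with go-live dates can be as simple as a list of dated entries. The sketch below assumes a hypothetical TaxonomyChange record and invented version numbers and dates; its purpose is to show how an embedding run or evaluation set could record which taxonomy version was in force.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TaxonomyChange:
    """One released change; the fields are illustrative assumptions."""

    version: str
    go_live: date
    dimension: str
    description: str


# Hypothetical change log entries.
CHANGE_LOG: List[TaxonomyChange] = [
    TaxonomyChange("1.0", date(2025, 1, 6), "matter_type", "Initial release"),
    TaxonomyChange("1.1", date(2025, 2, 3), "matter_type", "Added 'Arbitration'"),
]


def version_in_force(change_log: List[TaxonomyChange], on_date: date) -> Optional[str]:
    """Return the latest version that had gone live on or before on_date."""
    live = [change for change in change_log if change.go_live <= on_date]
    if not live:
        return None
    return max(live, key=lambda change: change.go_live).version


print(version_in_force(CHANGE_LOG, date(2025, 1, 20)))  # 1.0
```

Storing that version alongside each index build makes it possible to trace a drop in retrieval quality back to a specific release.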

Where taxonomy directly improves GenAI and RAG outcomes 

Let’s make this concrete. 

When matter types are consistent:  

  • Retrieval finds genuinely comparable matters, not just keyword overlaps 
  • Summaries are more coherent because inputs are similar in scope 
  • Answers can be scoped more precisely in prompts 

When document classes are reliable:  

  • Retrieval can prioritise precedents over correspondence 
  • Drafting tools can distinguish templates from executed documents 
  • Risk of hallucination is reduced because sources are more predictable 

When jurisdictions are standardised:  

  • Cross‑border work stops collapsing into generic advice 
  • AI can be constrained to the right legal context 
  • Confidence increases because boundaries are clearer 

None of this requires sophisticated modelling. It requires shared language. 
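As an illustration of constraining retrieval with shared language, here is a small sketch that filters candidates on exact metadata values before ranking them. It assumes each candidate already carries a similarity score from some upstream search step; the document ids, scores, and metadata keys are invented.

```python
from typing import Any, Dict, List


def retrieve(candidates: List[Dict[str, Any]], filters: Dict[str, str], top_k: int = 3) -> List[Dict[str, Any]]:
    """Keep candidates whose metadata matches every filter, then rank by score."""
    matching = [
        doc for doc in candidates
        if all(doc["metadata"].get(key) == value for key, value in filters.items())
    ]
    return sorted(matching, key=lambda doc: doc["score"], reverse=True)[:top_k]


# Hypothetical candidates with precomputed similarity scores.
candidates = [
    {"id": "d1", "score": 0.91, "metadata": {"document_class": "Correspondence", "jurisdiction": "GB"}},
    {"id": "d2", "score": 0.88, "metadata": {"document_class": "Precedent", "jurisdiction": "GB"}},
    {"id": "d3", "score": 0.85, "metadata": {"document_class": "Precedent", "jurisdiction": "US"}},
    {"id": "d4", "score": 0.80, "metadata": {"document_class": "Precedent", "jurisdiction": "GB"}},
]

results = retrieve(candidates, {"document_class": "Precedent", "jurisdiction": "GB"})
print([doc["id"] for doc in results])  # ['d2', 'd4']
```

The highest-scoring candidate is excluded because it is correspondence, which is exactly the kind of plausible but wrong source the filter is meant to keep out. The filter only works if the metadata values are consistent in the first place.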

Measuring whether your taxonomy is AI-ready

One of the most useful shifts firms can make is moving from “we think this is better” to “we can show this is working.” 

There are three lightweight measures that link taxonomy directly to GenAI performance. 

First, tagging coverage. What percentage of new matters and documents actually include the required ‘minimal viable taxonomy’ fields? If coverage is low, fix the workflow before blaming the taxonomy.
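Tagging coverage is straightforward to compute. A minimal sketch, assuming records are exported as dictionaries and that the required field names below (which are placeholders) match the firm's own:

```python
from typing import Any, Dict, List, Sequence

# Hypothetical required fields; substitute the firm's own core dimensions.
REQUIRED_FIELDS = ("matter_type", "practice", "sector", "jurisdiction", "phase")


def tagging_coverage(records: List[Dict[str, Any]], required_fields: Sequence[str] = REQUIRED_FIELDS) -> float:
    """Fraction of records where every required field has a non-blank value."""
    if not records:
        return 0.0
    complete = sum(
        all(str(record.get(field) or "").strip() for field in required_fields)
        for record in records
    )
    return complete / len(records)


# Hypothetical new matters: the second one has a blank sector.
new_matters = [
    {"matter_type": "Arbitration", "practice": "Disputes", "sector": "Energy", "jurisdiction": "GB", "phase": "Open"},
    {"matter_type": "Acquisition", "practice": "Corporate", "sector": "", "jurisdiction": "GB", "phase": "Open"},
]
print(tagging_coverage(new_matters))  # 0.5
```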

Second, cross-system match rates. Do matter types and sectors align between PMS, DMS, CRM, and Deal Rooms? Mismatches are a silent killer of retrieval quality.
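A match rate between two systems can be sketched as a comparison keyed on matter number. The matter numbers and values below are invented, and the sketch assumes both systems can export a simple mapping of matter number to matter type.

```python
from typing import Dict


def match_rate(system_a: Dict[str, str], system_b: Dict[str, str]) -> float:
    """Share of matters present in both systems that carry the same value."""
    shared = system_a.keys() & system_b.keys()
    if not shared:
        return 0.0
    matches = sum(system_a[matter_id] == system_b[matter_id] for matter_id in shared)
    return matches / len(shared)


# Hypothetical exports of matter type by matter number.
pms = {"M-001": "Commercial Litigation", "M-002": "Arbitration", "M-003": "Advisory"}
dms = {"M-001": "Commercial Litigation", "M-002": "Disputes - General", "M-004": "Advisory"}
print(match_rate(pms, dms))  # 0.5
```

Only matters found in both systems are compared. Matters that exist in one system but not the other are a separate gap worth counting on its own.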

Third, R.A.G. precision and recall on a defined use case. Take a small test set and ask: are we retrieving the right things, and are we missing anything critical? 
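Precision answers "are we retrieving the right things" and recall answers "are we missing anything critical". A minimal sketch for one test question, assuming someone has already listed the documents that should have been found (the ids are invented):

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query from retrieved and relevant ids."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical test case: four documents retrieved, three judged relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision)  # 0.5
print(round(recall, 2))  # 0.67
```

Averaging these over a small, fixed test set before and after a taxonomy change gives a like-for-like comparison.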

Some firms also add a simple confidence indicator to AI outputs, based on metadata completeness. That transparency helps lawyers understand when the system is operating inside its comfort zone, and when it isn’t. 
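One possible form for such an indicator is sketched below. The thresholds of 0.9 and 0.6, the three labels, and the field names are assumptions chosen for illustration, not figures from any firm.

```python
from typing import Any, Dict, List, Sequence


def metadata_confidence(
    sources: List[Dict[str, Any]],
    required_fields: Sequence[str],
    high: float = 0.9,
    low: float = 0.6,
) -> str:
    """Label an answer by how complete the metadata on its sources is."""
    total = len(sources) * len(required_fields)
    if total == 0:
        return "low"
    filled = sum(
        1
        for source in sources
        for field in required_fields
        if str(source.get(field) or "").strip()
    )
    completeness = filled / total
    if completeness >= high:
        return "high"
    if completeness >= low:
        return "medium"
    return "low"


# Hypothetical sources behind one answer: five of six fields are filled.
fields = ("matter_type", "jurisdiction", "document_class")
sources = [
    {"matter_type": "Arbitration", "jurisdiction": "GB", "document_class": "Precedent"},
    {"matter_type": "Arbitration", "jurisdiction": "", "document_class": "Advice"},
]
print(metadata_confidence(sources, fields))  # medium
```

This measures how well described the sources are, not whether the answer is correct, so it is best shown as a completeness signal and not as an accuracy score.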

Example minimal viable taxonomy starting points 

To make this practical, here’s what “small but useful” often looks like in real firms. 

Matter type: 
10 to 12 values that reflect meaningful differences in work, not internal accounting quirks. 

Matter phase or status: 
5 to 7 values that help distinguish advisory, transactional, and disputes work, or open versus completed work.

Sector or industry: 
10 to 12 values aligned to how clients buy, not how marketing organises PDFs. 

Jurisdiction: 
ISO country codes (or noslegal jurisdiction codes) as a base, with firm‑specific additions only where they add real value.

Document class: 
5 to 7 values that separate precedents, advice, filings, correspondence, and final executed documents. 

The discipline is not in deciding what to include. It’s in deciding what to leave out until you can prove it matters. 

Common failure modes to avoid 

Before I finish up, it’s worth calling out a few patterns I see repeatedly. 

First, over‑engineering too early. A perfect taxonomy that no one uses is strictly worse than a simple one that’s adopted. 

Second, retrofitting everything to legacy data. AI works best on what you do next, not what you did ten years ago. Prioritise forward motion. 

Third, treating taxonomy as an IT artefact. This is shared language. If lawyers don’t recognise it, AI won’t fix that disconnect. 

And finally, forgetting to link taxonomy decisions back to value. Every field and value should be defensible in terms of retrieval quality, risk reduction, or time saved. 

Bringing it all together 

If there’s one takeaway from this episode, it’s this: GenAI doesn’t need your firm to be perfectly structured. It needs your firm to be consistently understandable.

Start with one AI pilot. Work backwards and ask: what language must be consistent for this to work reliably? That becomes your minimal viable taxonomy. Embed it at intake, govern it lightly but visibly, and measure its impact on real outputs. 

When the language is right, the technology stops feeling magical — and starts feeling dependable.  

Thank you for joining me for this Law Firm Data Governance Podcast episode.  

Outro CTA 
Want to see where your firm stands today and what to prioritise next? Download the Law Firm Data Governance Maturity Benchmark 2025 at IronCarrot.com — or drop me a note and I’ll send you the report and a one‑page action checklist. 

If we haven’t connected yet, follow me on LinkedIn for weekly law firm data governance tips, benchmark insights, and episode updates — you’ll find the link in the show notes (search “CJ Anderson Iron Carrot”). 

Don’t forget to subscribe so you don’t miss any of this season’s insights. Or head over to IronCarrot.com to get in touch with your questions and ideas for future episodes.

Links to articles on IronCarrot.com 

• Iron Carrot – Building a Data River 

• Iron Carrot – 5 Principles of Law Firm Data Governance 

• Iron Carrot – Reasons Law Firms Need to Adopt Data Governance 
