Season 1 Episode 4
In this episode of the Law Firm Data Governance Podcast, CJ Anderson begins at the beginning by talking about why you need both a data glossary and a data dictionary.
She’ll explain the differences between a glossary and a dictionary, what they are used for, and who is responsible for creating and maintaining them.
One of the biggest causes of data quality issues is a lack of understanding about what the data means. The data dictionary and data glossary try to solve that problem by clarifying what data is being entered in a particular field and what that data represents on a report or in use in another system.
While you may have multiple data dictionaries, you should have only one data glossary for your Firm.
Law Firm Data Governance Podcast
- Do you want to learn more about the podcast?
- Are you curious about what’s coming up in future seasons?
- Do you want to listen to the latest episode?
Answers to these questions and more can be found on the podcast page.
Episode Transcript
Welcome to the Law Firm Data Governance Podcast, the data governance companion for law firm leaders who want to know more about implementing and improving data governance. Each week, I’ll help you with your law firm’s data governance initiative by sharing something I’ve learned in my past 20 years of working with information and data in law firms.
In this episode, I’ll begin at the beginning by talking about why you need both a data glossary and a data dictionary.
I’ll talk about the differences between glossaries and dictionaries, what they’re used for, who’s responsible for creating and maintaining them, and why they’re helpful.
One of the biggest causes of data quality issues is a lack of understanding about what the data really means.
The data dictionary and data glossary are trying to solve that problem for you by making it clear what data is being entered in a particular field and what that data represents on a report or in use in another system. While you may have multiple data dictionaries, you should only have one data glossary for your firm.
So let’s start by defining what these documents, artefacts, concepts, whatever you want to call them really are.
The naming of these documents causes the most confusion when you talk to both IT and business stakeholders.
A data glossary is often called a business glossary or a business data glossary, but for clarity I will use the term data glossary from this point on.
A data glossary then is the place to record business terms alongside their definitions.
A business term is something like headcount, utilisation, or practice name: concepts that everybody thinks they understand, but that would benefit from having their definitions agreed on and written down.
The main focus of the content in the data glossary is the information designed to improve the business’s understanding and use of that data, and again you should have only one data glossary for your firm.
A data dictionary then is where business and/or technical terms and their definitions are stored. It usually uses a limited set of metadata, concentrating on the names and definitions of the physical data and related objects.
Unlike a data glossary, you can have multiple data dictionaries.
A data dictionary could be created as one document for each database, or a single document for a group of databases.
Common practice is to create a data dictionary for every system built or implemented in your firm.
It’s often included as a project outcome, but sadly in my experience these documents usually get forgotten and aren’t maintained.
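As a rough illustration only, a data dictionary kept per system might be sketched like this in Python. The system, table and column names, and the `undefined_columns` helper, are all hypothetical and are not taken from any particular product. The check at the end shows one simple way to spot entries that have been left without a definition.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """One physical data element in one system (all names are illustrative)."""
    system: str
    table: str
    column: str
    data_type: str
    definition: str
    allowable_values: list[str] = field(default_factory=list)


# One data dictionary per system, keyed by a hypothetical system name.
data_dictionaries: dict[str, list[DictionaryEntry]] = {
    "practice_management": [
        DictionaryEntry(
            system="practice_management",
            table="matter",
            column="status",
            data_type="varchar(10)",
            definition="Current lifecycle state of the matter.",
            allowable_values=["OPEN", "CLOSED", "ON_HOLD"],
        ),
        DictionaryEntry(
            system="practice_management",
            table="matter",
            column="opened_on",
            data_type="date",
            definition="",  # left blank: the kind of gap that appears when nobody maintains the document
        ),
    ],
}


def undefined_columns(entries: list[DictionaryEntry]) -> list[str]:
    """Return 'table.column' for every entry whose definition is blank."""
    return [f"{e.table}.{e.column}" for e in entries if not e.definition.strip()]


print(undefined_columns(data_dictionaries["practice_management"]))
# prints ['matter.opened_on']
```

Running a check like this as part of routine maintenance is one possible way to keep a dictionary from quietly going stale.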
How a data dictionary and a data glossary are created is one of the key differences between them.
The data dictionary is seen very much as an IT-owned document.
The business should create and maintain the data glossary.
Because data dictionaries should include a business definition of all terms, this is where the confusion starts.
It should mean that whoever is creating the dictionary looks at the glossary and then works with business stakeholders to agree on missing definitions.
However, because the most likely people to refer to a data dictionary are the IT and Reporting teams, these documents are often created without input from the rest of the business service functions or the firm at large.
This is what leads to misunderstandings at best and resentment and disengagement at worst.
This also makes the data glossary a crucial deliverable for your data governance initiative.
Because of that, alongside the terms and definitions, you should also be capturing the data owner and data steward for each term.
As data governance in the firm matures, you might also start to add things like data quality rules, the systems, tables and fields where the data lives, and flags for known issues, so that potential users of that data have a better understanding of how mature the data itself is.
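To make that concrete, here is a minimal sketch of what a single glossary entry could hold, assuming the glossary is kept as structured records rather than a free-text document. The field names, the role titles and the placeholder definitions are assumptions for illustration; your firm would agree its own.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    """One business term in the firm's single data glossary (illustrative fields)."""
    term: str
    definition: str
    data_owner: str
    data_steward: str
    # Optional detail that might be added as governance matures.
    quality_rules: list[str] = field(default_factory=list)
    locations: list[str] = field(default_factory=list)  # e.g. "system.table.field"
    known_issues: list[str] = field(default_factory=list)


def missing_accountability(entries: list[GlossaryEntry]) -> list[str]:
    """Return the terms that have no data owner or no data steward recorded."""
    return [e.term for e in entries if not e.data_owner or not e.data_steward]


# Placeholder content only: the definitions and roles are made up.
glossary = [
    GlossaryEntry(
        term="headcount",
        definition="<agreed business definition goes here>",
        data_owner="Head of HR",
        data_steward="HR Data Analyst",
    ),
    GlossaryEntry(
        term="utilisation",
        definition="<agreed business definition goes here>",
        data_owner="Finance Director",
        data_steward="",  # steward not yet assigned
    ),
]

print(missing_accountability(glossary))
# prints ['utilisation']
```

Starting with just the term, definition, owner and steward, and leaving the other lists empty until they are needed, fits the idea of a glossary that grows as the firm's data governance matures.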
When you’re creating a glossary, you do need to be pragmatic and recognise that it’s going to be an iterative and ever-evolving document.
In some cases you are just not going to be able to jump straight to a firm-wide standard definition.
What you can do is identify terms with multiple different definitions, and multiple different terms with the same definition.
Be aware that there may be sensible and valid business requirements for the various definitions: headcount from a people perspective and headcount from a finance perspective spring to mind.
Over time, in conversations involving data owners, data stewards, users of those terms, reporting teams and so on, you’ll be able to start with simple disambiguations, making it clear that it is headcount-finance and headcount-people.
Eventually you will reach as many standard terms as your firm will wear and that’s OK.
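If the candidate terms have been collected into a simple list, the two checks described above (one term with several definitions, and several terms sharing one definition) could be sketched as follows. This is only one possible approach: it assumes definitions are compared as normalised text, and the rows, function names and placeholder definitions are all invented for the example.

```python
from collections import defaultdict


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def conflicting_terms(rows):
    """Map each term that has more than one distinct definition to those definitions."""
    by_term = defaultdict(set)
    for term, _perspective, definition in rows:
        by_term[normalise(term)].add(normalise(definition))
    return {t: sorted(d) for t, d in by_term.items() if len(d) > 1}


def duplicate_definitions(rows):
    """Map each definition shared by more than one term to those terms."""
    by_definition = defaultdict(set)
    for term, _perspective, definition in rows:
        by_definition[normalise(definition)].add(normalise(term))
    return {d: sorted(t) for d, t in by_definition.items() if len(t) > 1}


def disambiguate(term: str, perspective: str) -> str:
    """Build a simple disambiguated label such as 'headcount-finance'."""
    return f"{normalise(term)}-{normalise(perspective)}"


# (term, perspective, definition) rows; the definitions are placeholders.
rows = [
    ("Headcount", "finance", "placeholder definition A"),
    ("Headcount", "people", "placeholder definition B"),
    ("Practice name", "finance", "placeholder definition C"),
    ("Practice group", "people", "placeholder definition C"),
]

print(sorted(conflicting_terms(rows)))
# prints ['headcount']
print(list(duplicate_definitions(rows).values()))
# prints [['practice group', 'practice name']]
print([disambiguate(t, p) for t, p, _ in rows[:2]])
# prints ['headcount-finance', 'headcount-people']
```

Exact text matching will only catch definitions that are worded identically, so the output is best treated as a starting list for conversations with data owners and data stewards rather than a final answer.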
To sum up then, data dictionaries are more technical and tend to be system-specific.
They are managed by the IT department, and they define data elements, their meanings and their allowable values.
A data dictionary should be a project deliverable for all system-related projects.
Data glossaries exist to improve the business’s understanding of the data it produces and uses.
These should be created firmwide.
You should have only one, and a data glossary is a vital part of a successful data governance framework.
If you need a refresher on what data owners and data stewards are, listen to Episode 3 of this season, Season 1. In the next episode, we’ll begin at the beginning by finding out what a critical data asset is.