What is data lineage?

Season 1 Episode 7

In this episode, CJ Anderson begins at the beginning by explaining data lineage.

This concept means different things to different teams – and all of them are important. You’ll want to understand why you need various data lineages to help your data governance initiative. 

Data lineage is an essential part of data governance because it identifies (and documents) where data starts, how it gets produced, how it transforms and moves through the firm’s systems, and how it gets to end users.

Law Firm Data Governance Podcast

  • Do you want to learn more about the podcast?
  • Are you curious about what’s coming up in future seasons?
  • Do you want to listen to the latest episode?

Answers to these questions and more can be found on the podcast page.

Episode Transcript

This is Season 1: “Begin at the beginning”, episode 7.

In this episode I’ll explain data lineages.

This concept means different things to different teams and all of them are important.

You’ll want to understand why you need various data lineages to help your data governance initiative, and why having access to these lineage documents or systems matters to the firm.

Data governance includes the rules and procedures that firms use to manage and control data.

Data lineage is an essential part of data governance because it identifies and documents where data starts, how it gets produced, how it transforms and moves through the firm’s systems, and how it gets to end users.

There are different levels of detail within data lineages.

Which level you use will depend on who uses the lineages and what they use them for.

A business user will have a data lineage picture that focuses more on the who, when, and how of the processes for producing, moving and using data.

This is also sometimes called data provenance.

A technical user, as in an IT user, will have a data lineage picture with granular details of the specific data tables and fields, and how they move, transform and get used across the firm. You need both kinds of data lineage for a complete picture: the data map of what’s going on with the firm’s data.
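As a rough illustration of how one lineage record can serve both audiences, here is a minimal Python sketch. Every system, table, field and team name in it is hypothetical, and the structure is just one possible way to hold the information, not a standard format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageStep:
    # Technical detail: the fields on each side and what happens between them
    source: str
    target: str
    transformation: str
    # Business detail (provenance): who runs the process and when
    owner: str
    schedule: str


# Hypothetical example data; none of these names come from a real firm
steps = [
    LineageStep(
        source="hr_system.employees.start_date",
        target="warehouse.staff.start_date",
        transformation="copied unchanged",
        owner="HR Operations",
        schedule="nightly",
    ),
    LineageStep(
        source="warehouse.staff.start_date",
        target="headcount_report.tenure_years",
        transformation="years between start_date and today",
        owner="BI team",
        schedule="monthly",
    ),
]


def business_view(steps):
    """The who, when, and how, without table or field names."""
    return [f"{s.owner} ({s.schedule}): {s.transformation}" for s in steps]


def technical_view(steps):
    """The field-to-field movement an IT user would look for."""
    return [f"{s.source} -> {s.target}" for s in steps]
```

Calling `business_view(steps)` and `technical_view(steps)` gives two readings of the same records, which is the point: the two pictures differ in detail, not in the underlying facts.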

Data lineage helps your firm ensure that reliable data is being used to drive business decisions.

Without data lineage, it’s nearly impossible for you to understand whether or not the correct data is being used, what it means, where it comes from, and whether or not it’s complete.

Data lineage can also help you fix issues or perform system migrations.

It helps you ensure the confidentiality and the integrity of data by tracking changes, how they were performed and who made them.

Some firms create these lineage documents manually by interviewing stakeholders and interrogating code by hand.

Some firms use data visualisation or lineage software that examines the code for them.

However your firm creates them, these lineage documents are the magic ingredient that helps you achieve trusted data and support for data governance.
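To give a feel for what "examining the code" can mean, here is a deliberately simplified Python sketch that pulls a target table and its source tables out of a single SQL statement. It is a toy under stated assumptions: the SQL and table names are made up, and the regular expressions only cope with a plain INSERT INTO ... SELECT statement. Real lineage software uses a proper SQL parser and handles far more cases.

```python
import re


def extract_lineage(sql):
    """Toy parser: return (target, sorted sources) for a simple INSERT ... SELECT."""
    target = re.search(r"insert\s+into\s+([\w.]+)", sql, re.IGNORECASE)
    # Assumes every table name directly follows FROM or JOIN
    sources = re.findall(r"(?:from|join)\s+([\w.]+)", sql, re.IGNORECASE)
    return (target.group(1) if target else None, sorted(set(sources)))


# Hypothetical statement; table names are illustrative only
sql = """
INSERT INTO warehouse.staff
SELECT e.id, e.name, o.city
FROM hr_system.employees e
JOIN hr_system.offices o ON o.id = e.office_id
"""

target, sources = extract_lineage(sql)
```

Here `target` is the table being written and `sources` lists the tables being read, which is one link in a lineage chain.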

Data lineage has an impact on several areas of the firm.

It enables business users to better understand processed data by viewing how it got transformed as it moved through the firm’s systems.

This helps improve business operations and client services.

Data lineage also helps firms keep track of the different datasets that result from evolving collection techniques and technologies.

It helps the firm to make optimal use of these old and new datasets.

IT teams are helped too. Data lineage helps them upgrade systems, migrate data, or fix system issues because it shows them the location and lifecycle of data and data sources.

Another impact is that data lineage helps data governance.

That’s because lineage provides detailed visibility of the data lifecycle.

It helps the firm to manage risks, to comply with regulations, to perform audits of its data and to identify stakeholders.

Data lineage also helps business intelligence teams identify the root cause of data errors.

For example, why do this system and that system have different headcount numbers? Data lineage can help explain the difference and show whether modifications made in the processing are to blame.
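The headcount question above can be sketched in Python as walking each report's lineage back to its sources and comparing the steps. The lineage entries, report names and transformations below are invented for illustration, and this is only one simple way to compare two traces.

```python
# Hypothetical lineage: each dataset maps to the (source, transformation)
# pairs that feed it
UPSTREAM = {
    "finance_report.headcount": [
        ("warehouse.staff", "count rows where status = 'active'"),
    ],
    "hr_dashboard.headcount": [
        ("warehouse.staff", "count all rows, including leavers"),
    ],
    "warehouse.staff": [("hr_system.employees", "nightly copy")],
}


def trace_upstream(dataset, upstream=UPSTREAM):
    """Return every (target, source, transformation) step feeding `dataset`."""
    steps, seen, stack = [], set(), [dataset]
    while stack:
        target = stack.pop()
        if target in seen:
            continue
        seen.add(target)
        for source, transformation in upstream.get(target, []):
            steps.append((target, source, transformation))
            stack.append(source)
    return steps


def differing_steps(report_a, report_b):
    """(source, transformation) pairs found in one trace but not the other."""
    a = {(s, t) for _, s, t in trace_upstream(report_a)}
    b = {(s, t) for _, s, t in trace_upstream(report_b)}
    return sorted(a ^ b)


diff = differing_steps("finance_report.headcount", "hr_dashboard.headcount")
```

In this made-up case both reports read the same table, so the shared nightly copy drops out and `diff` is left with the two different counting rules, which is where the numbers diverge.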

The final impact of data lineage on the firm is that it can help with change impact analysis.

A detailed lineage lets you identify the affected data elements, downstream systems, and reports.

It helps you pinpoint the key stakeholders and affected end users before you do anything.

Using the lineage to assess the impact of a change will help you decide what steps to take to make that change effective, or even whether you should make that change at all.
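Change impact analysis as described above amounts to following the lineage downstream from the thing you plan to change. Here is a minimal Python sketch of that idea. The datasets, reports and owners are hypothetical, and a real lineage tool would hold far more detail.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets or reports
# that consume it
DOWNSTREAM = {
    "hr_system.employees": ["warehouse.staff"],
    "warehouse.staff": ["finance_report.headcount", "hr_dashboard.headcount"],
    "finance_report.headcount": [],
    "hr_dashboard.headcount": [],
}

# Hypothetical stakeholders for each dataset or report
OWNERS = {
    "warehouse.staff": "IT data team",
    "finance_report.headcount": "Finance",
    "hr_dashboard.headcount": "HR",
}


def impacted(changed, downstream=DOWNSTREAM):
    """Breadth-first walk: everything downstream of `changed`, in discovery order."""
    found = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in downstream.get(node, []):
            if consumer not in found:
                found.append(consumer)
                queue.append(consumer)
    return found


def stakeholders(changed):
    """Owners of everything affected by a change to `changed`."""
    return sorted({OWNERS[d] for d in impacted(changed) if d in OWNERS})
```

Asking `impacted("hr_system.employees")` lists the warehouse table and both reports, and `stakeholders("hr_system.employees")` names the teams to talk to before the change goes ahead.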

Data lineage has many benefits, both for data governance and for other areas of the firm.

Data lineage draws a picture of where your data starts, where it ends up, and how it got there.

This picture could contain different levels of detail depending on the audience that needs to use it.

You can do data lineage manually or with software. Either way, a key objective for your firm, and for your data governance initiative, is to get it done.

If you have questions, head over to IronCarrot.com and use the contact form to get in touch.

I’d love to include the answers to your questions in future episodes.

In the next episode, we’ll continue beginning at the beginning by answering the question: “What is a data strategy?”