Data lineage in a big data environment

March 1, 2018

Data lineage describes the life cycle of data: a detailed record of where a piece of data originated, which processes touched it, and how it was transformed over time. The concept is not new, but a paradigm shift is taking place…

Obtaining data lineage from a data warehouse, for example, used to be a fairly simple task. Because this centralized storage system kept all the data in one place, lineage was available “by design.”

Since the emergence of Big Data, the data ecosystem has evolved very rapidly: a variety of new technologies and storage systems have appeared, making enterprise information systems more complex.

It has become impossible to maintain, let alone impose, a single centralized tool across an organization. The software and methods used by the enterprise and IS architects of the “old world” have become harder and harder to maintain, leaving the work they produce outdated and hard to read.

So, how can you visualize an efficient data lineage in a Big Data environment?

New tools are emerging to give a global view of the data in an enterprise’s information system: data catalogs. A data catalog collects as much metadata as possible from all data stores and makes it available through a user-friendly interface. With this information centralized, data lineage can be built in a Big Data environment at different levels:

At the dataset level. A dataset can be a table in Oracle, a topic in Kafka, or even a directory in the data lake. A data catalog highlights the processes and upstream datasets that were used to create the final dataset.

However, dataset-level lineage on its own does not let data users answer all of their questions. Among others, these remain open: What about sensitive data? Which columns were created, and by which processes?
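One way to picture lineage at the dataset level is as a directed graph in which processes connect input datasets to an output dataset. The sketch below is only an illustration of that idea, not how any particular data catalog implements it; the `DatasetLineage` class and all dataset and process names are hypothetical.

```python
from collections import defaultdict


class DatasetLineage:
    """Toy dataset-level lineage: processes link input datasets to an output."""

    def __init__(self):
        # output dataset -> list of (process name, input datasets)
        self._producers = defaultdict(list)

    def record(self, process, inputs, output):
        """Declare that `process` read `inputs` and wrote `output`."""
        self._producers[output].append((process, tuple(inputs)))

    def upstream(self, dataset):
        """Return every dataset and process that contributed to `dataset`."""
        datasets, processes = set(), set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for process, inputs in self._producers.get(current, []):
                processes.add(process)
                for source in inputs:
                    if source not in datasets:  # visit each source only once
                        datasets.add(source)
                        stack.append(source)
        return datasets, processes


# Hypothetical pipeline: an Oracle table, a Kafka topic, data lake directories.
lineage = DatasetLineage()
lineage.record("ingest_orders", ["oracle.sales.orders"], "kafka.orders_raw")
lineage.record("land_orders", ["kafka.orders_raw"], "datalake/orders/")
lineage.record(
    "build_report",
    ["datalake/orders/", "oracle.crm.customers"],
    "datalake/reports/revenue/",
)

datasets, processes = lineage.upstream("datalake/reports/revenue/")
print(sorted(datasets))
print(sorted(processes))
```

Asking for the upstream of the final report returns the four datasets and three processes behind it, but nothing about individual columns, which is the limitation this level of lineage runs into.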

At the column level. A more granular approach is to represent the successive transformation stages of a dataset as a timeline of actions and events. By selecting a specific field, users can see which columns and actions created it.
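As a rough illustration of column-level lineage, the timeline can be modeled as an ordered list of events, each recording an action, the columns it read, and the column it created. The sketch below walks that timeline backwards from a selected field. The `ColumnEvent` structure, the `trace_column` helper, and every column and action name are assumptions made for the example, and it presumes that a column is always created before it is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnEvent:
    step: int      # position of the event in the timeline
    action: str    # hypothetical label for the transformation
    inputs: tuple  # columns the action read
    output: str    # column the action created


def trace_column(events, column):
    """Return the events that contributed to `column`, oldest first."""
    wanted = {column}
    contributing = []
    # Walk the timeline from newest to oldest, following inputs backwards.
    for event in sorted(events, key=lambda e: e.step, reverse=True):
        if event.output in wanted:
            contributing.append(event)
            wanted.update(event.inputs)
    return list(reversed(contributing))


timeline = [
    ColumnEvent(1, "cast_to_decimal", ("orders.amount_raw",), "orders.amount"),
    ColumnEvent(2, "hash_sha256", ("customers.email",), "customers.email_hash"),
    ColumnEvent(
        3, "multiply", ("orders.amount", "orders.vat_rate"), "report.amount_with_vat"
    ),
]

history = trace_column(timeline, "report.amount_with_vat")
for event in history:
    print(event.step, event.action, event.inputs, "->", event.output)
```

Selecting `report.amount_with_vat` returns only the cast and multiply steps; the unrelated hashing of `customers.email` is left out, which is the kind of per-field answer that dataset-level lineage cannot give.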

At Zeenea, we work hard to create a data fluent world by providing our customers with the tools and services that allow enterprises to be data driven.
