What makes a data catalog “smart”? #3 – Metadata Management

February 16, 2022

A data catalog harnesses enormous amounts of very diverse information, and its volume will grow exponentially. This raises two major challenges:

How can this volume of information be fed and maintained without tripling (or more) the cost of metadata management?
How can the most relevant datasets be found for any specific use case?

At Zeenea, we think that a data catalog must be Smart in order to answer these two questions, with smart technological and conceptual features that go beyond the mere integration of AI algorithms.

In this respect, we have identified five areas in which a data catalog can be “Smart”, most of which do not involve machine learning. This article covers the third: metadata management.

—

It is in the field of metadata management that the notion of the Smart Data Catalog is most commonly associated with algorithms, machine learning, and AI.

How is metadata management automated?

Metadata management is the discipline of assigning values to the metamodel attributes of the inventoried assets. The workload required is usually proportional to both the number of attributes in the metamodel and the number of assets in the catalog.

The role of the Smart Data Catalog is to automate this activity as much as possible or, at the very least, to assist the human operators (Data Stewards) so that they work with greater productivity and reliability.

As seen in our last article, a smart connectivity layer automates part of the metadata collection, but this automation is restricted to a limited subset of the metamodel, mostly technical metadata. A complete metamodel, even a modest one, also has dozens of attributes that cannot be extracted from the source systems' registries (because they are not there to begin with).

To solve this equation, several approaches are possible:

Pattern recognition

The most direct approach consists in identifying patterns in the catalog in order to suggest metadata values for new assets.

Put simply, a pattern will include all the metadata of an asset and the metadata of its relations with other assets or other catalog entities. Pattern recognition is typically done with the help of machine learning algorithms.
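To make the idea concrete, here is a minimal sketch of suggesting a metadata value from already-documented assets. It is an illustration under stated assumptions, not Zeenea's implementation: the numeric vectors, the `suggest_value` function, the cosine measure, and the nearest-neighbor majority vote are all hypothetical choices standing in for whatever machine learning algorithm is actually used.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_value(new_vector, catalog, k=3):
    """Suggest a metadata value by majority vote among the k most similar assets.

    `catalog` is a list of (vector, value) pairs for documented assets
    (a hypothetical representation). Returns None when the catalog is
    empty, since there is no pattern to recognize yet.
    """
    if not catalog:
        return None
    ranked = sorted(catalog, key=lambda item: cosine(new_vector, item[0]), reverse=True)
    votes = Counter(value for _, value in ranked[:k])
    return votes.most_common(1)[0][0]


# Illustrative data: each documented asset has a vector and a "domain" value.
catalog = [
    ([1.0, 0.1, 0.0], "Finance"),
    ([0.9, 0.2, 0.1], "Finance"),
    ([0.0, 1.0, 0.8], "Marketing"),
    ([0.1, 0.9, 1.0], "Marketing"),
]
suggestion = suggest_value([0.95, 0.15, 0.05], catalog)
```

In this sketch the suggestion is only as good as the vectors describing each asset, which is exactly where the difficulty lies.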

The difficulty in implementing this approach lies in describing the information assets in numerical form so that they can feed the algorithms and reveal the relevant patterns. A simple structural analysis is not enough: two datasets can contain identical data in different structures. Relying on the identity of the data isn’t effective either: two datasets can contain the same kind of information with different values, for example 2020 client invoicing in one dataset and 2021 client invoicing in the other.

To solve this problem, Zeenea relies on a technology called fingerprinting. To build the fingerprint, we extract two types of features from our clients’ data:

A group of features adapted to numerical data (mostly statistical indicators)
Features derived from word embedding models (word vectorization) for textual data

Fingerprinting is at the heart of our intelligent algorithms.
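As a rough illustration of the numerical half of a fingerprint, the sketch below summarizes a column with a few scale-free statistical indicators and compares fingerprints by Euclidean distance. The article does not say which indicators Zeenea uses, so the four chosen here, the `numeric_fingerprint` function, and the distance measure are assumptions made for the example. The textual half (word embeddings) is left out because it would require a trained embedding model.

```python
from math import dist
from statistics import mean, pstdev


def numeric_fingerprint(values):
    """Summarize a numeric column with four illustrative indicators.

    The choice of indicators is an assumption; None marks a missing value.
    """
    present = [v for v in values if v is not None]
    if not present:
        return [1.0, 0.0, 0.0, 0.0]
    mu = mean(present)
    return [
        1 - len(present) / len(values),                    # share of missing values
        pstdev(present) / abs(mu) if mu else 0.0,          # coefficient of variation
        len(set(present)) / len(present),                  # share of distinct values
        sum(1 for v in present if v < 0) / len(present),   # share of negative values
    ]


# Same kind of information, different values (invoice amounts for two years).
invoices_2020 = [120.0, 80.5, 310.0, 45.9, 150.0, None]
invoices_2021 = [130.0, 75.0, 290.0, 50.0, 160.0, None]
# A different kind of information (temperatures).
temperatures = [-5.0, 3.0, 12.0, -1.0, 7.0, 20.0]

fp_2020 = numeric_fingerprint(invoices_2020)
fp_2021 = numeric_fingerprint(invoices_2021)
fp_temp = numeric_fingerprint(temperatures)

# Smaller distance means more similar fingerprints.
same_kind = dist(fp_2020, fp_2021)
other_kind = dist(fp_2020, fp_temp)
```

Here the two invoice columns share no values, yet their fingerprints sit much closer to each other than to the temperature column, which is the property a value-by-value comparison cannot provide.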

The other embedded approaches in a suggestion engine

While pattern recognition is indeed an efficient approach for suggesting the metadata of a new asset in a catalog, it rests on an important prerequisite: in order to recognize a pattern, there has to be one to recognize. In other words, it only works if the catalog already contains a sufficient number of assets (which is obviously not the case at the start of a project).

Yet it is precisely in these initial phases of a catalog project that the metadata management load is the highest. It is therefore crucial to include other approaches that can help the Data Stewards in these initial phases, when the catalog is still more or less empty.

The Zeenea suggestion engine, which provides intelligent algorithms to assist with metadata management, also offers other approaches (which we enrich regularly).

Here are some of these approaches:

Structural similarity detection
Fingerprint similarity detection
Name approximation
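Two of the approaches listed above can be sketched with the Python standard library alone. This is a hedged illustration, not the engine's actual logic: `name_similarity` uses `difflib.SequenceMatcher` as one possible form of name approximation, and `structural_similarity` uses a Jaccard index over (column name, type) pairs as one possible structural measure. Both function names and both techniques are assumptions.

```python
from difflib import SequenceMatcher


def _normalize(name):
    """Lowercase a name and drop common separators before comparison."""
    return name.lower().replace("_", "").replace("-", "").replace(" ", "")


def name_similarity(a, b):
    """Approximate name match: a ratio between 0 and 1."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def structural_similarity(schema_a, schema_b):
    """Jaccard index over the (column name, type) pairs of two datasets."""
    a, b = set(schema_a.items()), set(schema_b.items())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Names that differ only in casing and separators are treated as identical.
close_names = name_similarity("customer_id", "CustomerID")
distant_names = name_similarity("customer_id", "order_date")

# Two schemas sharing two of four distinct (name, type) pairs.
schema_a = {"id": "int", "amount": "float", "date": "date"}
schema_b = {"id": "int", "amount": "float", "currency": "str"}
structure_score = structural_similarity(schema_a, schema_b)
```

Neither measure needs a populated catalog to learn from: each compares a new asset with a single existing one, which is why approaches of this kind remain usable when the catalog is nearly empty.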

This suggestion engine, which analyzes the catalog content in order to determine the probable metadata values for the assets that have been integrated, is a permanent subject of experimentation. We regularly add new approaches, sometimes very simple and sometimes much more sophisticated. In our architecture, it is a dedicated service whose performance improves as the catalog grows and as we enrich our algorithms.

Zeenea has chosen lead time as its main metric for the productivity of the Data Stewards (which is the ultimate objective of smart metadata management). Lead time is a notion from lean management that measures, in a data catalog context, the time elapsed between the moment an asset is inventoried and the moment all of its metadata has been filled in.
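The definition above translates directly into a small calculation. The sketch below assumes that two timestamps are recorded per asset, one when it is inventoried and one when its last metadata attribute is filled in. The function name, the sample timestamps, and the use of the median as a summary are illustrative assumptions.

```python
from datetime import datetime
from statistics import median


def lead_time_hours(inventoried_at, completed_at):
    """Hours elapsed between inventory and completion of all metadata."""
    return (completed_at - inventoried_at).total_seconds() / 3600


# Hypothetical (inventoried_at, completed_at) pairs for three assets.
assets = [
    (datetime(2022, 2, 1, 9, 0), datetime(2022, 2, 1, 17, 0)),
    (datetime(2022, 2, 2, 9, 0), datetime(2022, 2, 3, 9, 0)),
    (datetime(2022, 2, 3, 9, 0), datetime(2022, 2, 3, 11, 0)),
]

lead_times = [lead_time_hours(start, end) for start, end in assets]
median_lead_time = median(lead_times)
```

A median is used here rather than a mean so that one asset left undocumented for weeks does not dominate the figure, but either summary fits the definition.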

For more information on how Smart metadata management enhances a Data Catalog, download our eBook “What is a Smart Data Catalog?”.


At Zeenea, we work hard to create a data fluent world by providing our customers with the tools and services that allow enterprises to be data driven.
