What makes a data catalog “smart”? #2 – The Data Inventory

February 16, 2022


A data catalog harnesses enormous amounts of very diverse information, and that volume will grow exponentially. This raises two major challenges:

How to feed and maintain the volume of information without tripling (or more) the cost of metadata management?
How to find the most relevant datasets for any specific use case?

At Zeenea, we think a data catalog must be Smart to answer these two questions, with technological and conceptual features that go beyond the mere integration of AI algorithms.

In this respect, we have identified five areas in which a data catalog can be “Smart”, most of which do not involve machine learning.

—

The second way to make a data catalog “smart” is through its inventory. A data catalog is essentially a thorough inventory of information assets, each described by a set of metadata that helps users harness the information as efficiently as possible. Setting up a data catalog therefore depends, first of all, on an inventory of the assets held in the different systems.

Automating the inventory: the challenges

A declarative approach to building the inventory, in which assets are declared by hand, doesn’t strike us as particularly smart, however well thought out it may be. It involves a lot of work both to launch the catalog and to keep it up to date, and in a fast-changing digital landscape the initial effort quickly becomes obsolete.

The first step in creating a smart inventory is of course to automate it. With a few exceptions, enterprise datasets are managed by specialized systems (distributed file systems, ERPs, relational databases, software packages, data warehouses, etc.). These systems already maintain all the metadata they need to work properly. There is no need to recreate this information manually: you just need to connect to the different registries and synchronize the catalog content with the source systems.

In theory, this should be straightforward, but putting it into practice is rather difficult: the different technologies do not conform to any universal standard for accessing their metadata.
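To make the idea of reading a source registry concrete, here is a minimal sketch in Python. It is not Zeenea's connector technology: it assumes a single SQLite source, and the function names `harvest_sqlite_metadata` and `synchronize`, as well as the dictionary used as a catalog, are hypothetical. It reads table and column metadata from SQLite's own registry and then aligns a catalog with it.

```python
import sqlite3


def harvest_sqlite_metadata(conn):
    """Read table and column metadata from SQLite's own registry."""
    inventory = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info returns one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        inventory[table_name] = [
            {"name": col[1], "type": col[2], "nullable": not col[3]}
            for col in columns
        ]
    return inventory


def synchronize(catalog, inventory):
    """Make the (hypothetical) catalog mirror the source; report changes."""
    added = sorted(set(inventory) - set(catalog))
    removed = sorted(set(catalog) - set(inventory))
    updated = sorted(
        name for name in set(inventory) & set(catalog)
        if catalog[name] != inventory[name]
    )
    for name in removed:
        del catalog[name]
    for name in added + updated:
        catalog[name] = inventory[name]
    return {"added": added, "removed": removed, "updated": updated}


# A throwaway in-memory source system, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

catalog = {}
first_sync = synchronize(catalog, harvest_sqlite_metadata(conn))

# The source evolves; the next synchronization picks up the changes.
conn.execute("DROP TABLE orders")
conn.execute("ALTER TABLE customers ADD COLUMN country TEXT")
second_sync = synchronize(catalog, harvest_sqlite_metadata(conn))
```

The registry query is specific to SQLite. Another technology would expose the same information through a different interface, which is why each type of source tends to need its own connector.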

The essential role of connectivity to the system sources

A smart connectivity layer is a key part of the Smart Data Catalog. For a more detailed description of Zeenea’s connectivity technology, I recommend reading our previous ebook, The 5 technological breakthroughs of a next-generation catalog. The main characteristics of this connectivity are:

Proprietary – we do not rely on third parties so as to maintain a highly specialized extraction of the metadata.
Distributed – in order to maximize the reach of the catalog.
Open – anyone looking to enrich the catalog can develop their own connectors with ease.
Universal – it can synchronize any source of metadata.

This connectivity layer not only reads and synchronizes the metadata held in the source registries; it can also produce metadata of its own.

This production of metadata requires more than simple access to the source system registries. It also requires access to the data itself, which will be analyzed by our scanners in order to enrich the catalog automatically.

To date, we produce two types of metadata:

Statistical analysis: to build a profile of the data, such as value distribution, rate of null values, top values, etc. (the nature of the metadata obviously depends on the native type of the data being analyzed);

Structural analysis: to determine the operational type of specific textual data (email, postal address, social security number, client code, etc.; the system is extensible and customizable).
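The two kinds of analysis above can be sketched as follows. This is only an illustration under stated assumptions, not a description of Zeenea's scanners: the function names, the 90% matching threshold and the regular expressions (including the made-up client code format) are all hypothetical and deliberately simplistic.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration; a real scanner would need far more
# robust, locale-aware rules.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "client_code": re.compile(r"^C\d{6}$"),  # made-up format
}


def profile_column(values, top_n=3):
    """Statistical analysis: null rate, distinct count and top values."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "count": len(values),
        "null_rate": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct": len(counts),
        "top_values": counts.most_common(top_n),
    }


def infer_semantic_type(values, threshold=0.9):
    """Structural analysis: return the first pattern label matched by at
    least `threshold` of the text values, or None if nothing qualifies."""
    texts = [v for v in values if isinstance(v, str)]
    if not texts:
        return None
    for label, pattern in PATTERNS.items():
        matches = sum(1 for text in texts if pattern.match(text))
        if matches / len(texts) >= threshold:
            return label
    return None


emails = ["ann@example.com", "bob@example.com", None, "ann@example.com"]
email_profile = profile_column(emails)
email_type = infer_semantic_type(emails)
```

Both functions need the column values themselves, not just the schema, which is why this kind of metadata production requires access to the data.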

The inventory mechanism must also be smart

Our inventory mechanism is also smart in several ways:

Dataset detection relies on extensive knowledge of the storage structures, particularly in a Big Data context. For example, an IoT dataset made up of thousands of files of time-series measurements can be identified as a single dataset (the number of files and their location being only metadata).

The inventory is not integrated into the catalog by default, which prevents the import of technical or temporary datasets that would be of little use (either because the data is unusable or because it duplicates other data).

The selection of assets to import into the catalog is also assisted: we strive to identify the objects most suitable for integration into the catalog, using a variety of complementary approaches.
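As a rough illustration of the first two points, the sketch below groups file paths into logical datasets and then suggests which ones look worth importing. The heuristics are assumptions made for this example only (digits in a path are treated as the variable part, and names starting with `tmp`, `_` or `.` are treated as temporary); they are not Zeenea's actual detection rules.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed conventions for this example: runs of digits vary between files of
# the same dataset, and these name prefixes mark technical or temporary data.
VARIABLE_PART = re.compile(r"\d+")
TEMP_PREFIXES = ("tmp", "_", ".")


def dataset_key(path):
    """Collapse every run of digits so sibling files share one key."""
    return VARIABLE_PART.sub("*", path)


def detect_datasets(paths):
    """Group file paths into logical datasets keyed by their path pattern."""
    groups = defaultdict(list)
    for path in paths:
        groups[dataset_key(path)].append(path)
    # The file count and locations become metadata of the dataset.
    return {
        key: {"file_count": len(files), "files": sorted(files)}
        for key, files in groups.items()
    }


def is_candidate(key):
    """Suggest a dataset for import unless a path segment looks temporary."""
    return not any(
        part.startswith(TEMP_PREFIXES) for part in PurePosixPath(key).parts
    )


paths = [
    "/iot/sensors/2022-02-15/part-0001.csv",
    "/iot/sensors/2022-02-15/part-0002.csv",
    "/iot/sensors/2022-02-16/part-0001.csv",
    "/staging/_tmp/export-17.csv",
    "/crm/customers.parquet",
]
datasets = detect_datasets(paths)
candidates = sorted(key for key in datasets if is_candidate(key))
```

Here the three IoT files collapse into one dataset, and the staging export is detected but left out of the suggested imports, so the final decision stays with whoever curates the catalog.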

For more information on how Smart Data Inventorying enhances a Data Catalog, download our eBook “What is a Smart Data Catalog?”.


At Zeenea, we work hard to create a data-fluent world by providing our customers with the tools and services that allow enterprises to be data-driven.
