Metadata management is an important component of a data management project, and it requires more than a data catalog solution, however well connected that solution may be.
A data catalog tool will of course reduce the workload but won’t in and of itself guarantee the success of the project.
In this series of articles, discover the pitfalls and preconceived ideas to avoid when rolling out an enterprise-wide data catalog project. The traps described in this series are organized around 4 central themes that are crucial to the success of the initiative:
- Data culture within the organization
- Internal project sponsorship
- Project leadership
- Technical integration of the Data Catalog
—
Integrating the data catalog into the enterprise ecosystem will provide opportunities to create value. It is essential to consider these aspects and understand the potential rewards.
Not all metadata has to be entered manually
More and more systems produce, aggregate, and capture metadata for their own local use. This information has to be retrieved and consolidated in the catalog without being entered twice, for obvious reasons (saving money, data reliability, and availability).
The data catalog, therefore, presents an opportunity to consolidate this information with the knowledge of the contributors in their respective fields. However, this consolidation has to be designed as a technical integration rather than a manual effort. Entering the same information twice is obviously inefficient, and moving it between systems through manual imports and exports is no better.
The strength of a data catalog remains its capacity to ingest metadata via technical integration chains and thus ensure a robust synchronization between systems.
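As a rough illustration of what such an integration chain does, the sketch below refreshes the technical metadata harvested from a source system while leaving the knowledge entered by contributors untouched. The `CatalogEntry` structure, the `sync` function, and the sample asset names are all hypothetical; a real catalog would expose its own ingestion API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One asset in the catalog (hypothetical structure, for illustration only)."""
    asset_id: str
    technical: dict = field(default_factory=dict)  # harvested from source systems
    business: dict = field(default_factory=dict)   # entered by contributors


def sync(catalog: dict, harvested: dict) -> dict:
    """Upsert harvested technical metadata without touching contributor knowledge."""
    for asset_id, technical in harvested.items():
        # Create the entry if the asset was just discovered, otherwise reuse it.
        entry = catalog.setdefault(asset_id, CatalogEntry(asset_id))
        # The source system remains the master for technical fields,
        # so the harvested values replace the previous ones.
        entry.technical = dict(technical)
    return catalog


# Existing catalog content: one asset already documented by a contributor.
catalog = {
    "sales.orders": CatalogEntry(
        "sales.orders",
        technical={"columns": ["id", "amount"]},
        business={"owner": "Sales Ops"},
    )
}

# Metadata harvested from a source system (assumed sample data).
harvested = {
    "sales.orders": {"columns": ["id", "amount", "currency"]},
    "sales.customers": {"columns": ["id", "name"]},
}

sync(catalog, harvested)
```

After the run, `sales.orders` carries the new `currency` column and still has its owner, and `sales.customers` has been discovered without anyone typing it in. Nothing is entered twice, and nothing entered by hand is lost.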
The data catalog isn’t an “automagical” tool
On the flip side, it would be misleading to think that a data catalog can extract every type of metadata, regardless of its source or format.
The catalog should of course facilitate metadata retrieval, but some metadata won’t be retrievable automatically. There will therefore always be a cost linked to the intervention of the contributors.
The first reason for this resides in the origin of some metadata: some information may simply not be present in the systems because it originates solely from the knowledge of experts. The data catalog is therefore, in this case, a potential candidate for becoming the master system and eligible to receive this information.
Conversely, some information can be present in a system and yet be impossible to retrieve in an automated manner, for many reasons. For example, the system may lack an interface that gives stable access to the information. Forcing an extraction in such conditions carries a high risk of producing noise around the information, which can degrade the quality of the catalog content and ultimately turn users away from it.
The data catalog must not be connected to a single metadata source
Metadata stems from many varied layers. As a result, there are multiple and complementary sources involved for a global understanding. It is precisely the reconciliation of this information in a central solution, a data catalog, that will provide the necessary elements to the users.
Opting for a connected data catalog is a real asset, because asset discovery and the associated metadata retrieval are made considerably easier as a result of automation.
This connectivity can also extend to other complementary systems. These systems may sit upstream or downstream of the first one, which makes it possible, if needed, to materialize the lineage and thus document the flows and transformations between systems.
The systems can also be independent of one another and, simply by being added to the catalog, contribute to an exhaustive map of the company’s data assets.
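To make the idea of lineage concrete, here is a minimal sketch that assumes each connected system reports its flows as simple (source, target) pairs. The asset names and helper functions are hypothetical, and real lineage would usually also describe the transformations applied along each flow.

```python
from collections import defaultdict

# Hypothetical flows harvested from connected systems: (source asset, target asset).
flows = [
    ("crm.customers", "dwh.dim_customer"),
    ("erp.orders", "dwh.fact_orders"),
    ("dwh.dim_customer", "bi.sales_dashboard"),
    ("dwh.fact_orders", "bi.sales_dashboard"),
]


def build_upstream_index(flows):
    """Map each asset to the set of assets that feed it directly."""
    index = defaultdict(set)
    for source, target in flows:
        index[target].add(source)
    return index


def upstream(asset, index):
    """Return every asset that feeds `asset`, directly or indirectly."""
    seen = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for parent in index.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


index = build_upstream_index(flows)
lineage = upstream("bi.sales_dashboard", index)
```

Here `lineage` contains the two warehouse tables and the two operational sources behind the dashboard. None of the three systems could provide that view alone; it only appears once all of them are connected to the catalog.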
Lastly, given the variety of the types of assets that can be documented in the catalog, the different connected sources can also contribute to the enrichment of a specific universe in the data catalog: semantic layers for some, physical layers for others, etc.
Always with an iterative approach in mind, the multiple sources that will feed the data catalog will be integrated progressively, in accordance with a strategy that seeks the production of value, under the global supervision of the Data Office.
The 10 Traps to Avoid for a Successful Data Catalog Project
To learn more about the traps to avoid when starting a data cataloging initiative, download our free eBook!