The Data Catalog market has developed rapidly, and it is now deemed essential when deploying a data-driven strategy. Victim of its own success, this market has attracted a number of players from adjacent markets.
These players have rejigged their marketing positioning in order to present themselves as Data Catalog solutions.
The reality is that these companies, while relatively weak on core data catalog functionality, try to convince the market, with success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams but an integrated solution that addresses a host of other topics.
The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.
Here are, in our opinion, the 7 lies of the Data Catalog vendors:
- A Data Catalog is a Data Governance platform,
- A Data Catalog can measure and manage data quality,
- A Data Catalog can manage regulatory compliance,
- A Data Catalog can query data directly,
- A Data Catalog can model logical architecture and business processes around data,
- A Data Catalog is a collaborative data mapping and metadata management tool that cannot be automated,
- A Data Catalog is a long, complex, and expensive project.
A Data Catalog is NOT a Business Modeling Solution
Some organizations, usually large ones, have invested for years in the modeling of their business processes and information architecture.
They have developed several layers of models (conceptual, logical, physical) and have put in place an organization that supports the maintenance of these models and their sharing with specific audiences (mostly business experts and IT people).
We do not question the value of these models. They play a key role in IS urbanization (enterprise architecture), schema blueprints, IS management, and regulatory compliance. But we seriously doubt that these modeling tools can provide a decent Data Catalog.
There is also a market phenomenon at play here: certain historical business modeling players are looking to widen the scope of their offer by positioning themselves on the Data Catalog market. After all, they do already manage a great deal of information on physical architecture, business classifications, glossaries, ontologies, information lineage, processes and roles, etc. But we can identify two major flaws in their approach.
The first is structural. By their very nature, modeling tools produce top-down models to outline the information in an IS. However accurate it may be, a model remains a model: a simplified representation of reality.
Models are very useful communication tools in a variety of domains, but they are not an exact reflection of day-to-day operational reality, which, in our view, is crucial to keeping the promises of a Data Catalog (enabling teams to find, understand, and know how to use datasets).
The second flaw? It is not user-friendly.
A modeling tool is complex and handles a large number of abstract concepts, which makes for a steep learning curve. It is a tool for experts.
We could of course improve user-friendliness to open these tools up to a wider audience, but the built-in complexity of the information won't go away.
Understanding the information these tools provide requires a solid grasp of modeling principles (object classes, logical levels, nomenclatures, etc.). That is quite a challenge for data teams, and one that seems difficult to justify from an operational perspective.
The truth is, modeling tools that have been turned into Data Catalogs face serious adoption issues: teams have to make huge efforts to learn the tool, only to not find what they are looking for.
A prospective client recently presented us with a metamodel they had built and asked whether it could be implemented in Zeenea. Derived from their business models, the metamodel had several dozen object classes and thousands of attributes. The official answer was yes (the Zeenea metamodel is very flexible), but we tried to dissuade them from taking that path: a metamodel that sophisticated risked, in our opinion, losing the end users and turning the Data Catalog project into a failure.
Should we therefore abandon business models when putting a Data Catalog in place? Absolutely not.
It must, however, be remembered that business models exist to address certain issues, and the Data Catalog others. Some of the information contained within the models helps structure the catalog and enrich its content in a very useful way (for instance responsibilities, classifications, and of course business glossaries).
The best approach, in our view, is therefore to design the catalog metamodel by focusing exclusively on the value added for data teams (always asking the same underlying question: does this information help find, locate, understand, and correctly use the data?), and then to integrate the modeling tool with the Data Catalog so as to automatically supply the elements of the metamodel already present in the business model.
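To make this concrete, here is a minimal sketch of what such an integration could look like. All names here are hypothetical illustrations, not Zeenea's actual API: a deliberately lean catalog entry keeps only the attributes that help data teams find, understand, and use a dataset, and a small import function feeds those attributes from a richer business-model export while ignoring pure modeling details.

```python
from dataclasses import dataclass, field

# Hypothetical, deliberately lean catalog metamodel: every attribute must
# help end users find, understand, or correctly use the data.
@dataclass
class CatalogEntry:
    name: str
    description: str
    owner: str                                 # responsibility, from the business model
    business_terms: list[str] = field(default_factory=list)  # glossary links
    classification: str = "internal"           # e.g. confidentiality level

def from_business_model(model_object: dict) -> CatalogEntry:
    """Feed the catalog from a business-model export, keeping only the
    fields that serve data teams; everything else is simply dropped."""
    return CatalogEntry(
        name=model_object["name"],
        description=model_object.get("definition", ""),
        owner=model_object.get("data_owner", "unassigned"),
        business_terms=model_object.get("glossary_terms", []),
        classification=model_object.get("confidentiality", "internal"),
    )

# A business-model export typically carries many more attributes;
# the integration ignores the purely modeling-oriented ones.
exported = {
    "name": "customer_orders",
    "definition": "One row per order placed on the e-commerce site.",
    "data_owner": "sales-ops",
    "glossary_terms": ["Order", "Customer"],
    "confidentiality": "restricted",
    "logical_level": "L2",           # modeling detail: not imported
    "uml_stereotype": "entity",      # modeling detail: not imported
}
entry = from_business_model(exported)
print(entry.owner)           # sales-ops
print(entry.business_terms)  # ['Order', 'Customer']
```

The point of the sketch is the filter: the business model remains the source of truth for responsibilities, glossaries, and classifications, while the dozens of modeling-specific attributes never reach the catalog's end users.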
As useful and complete as they may be, business models are still just models: they are an imperfect reflection of the operational reality of the systems and therefore they struggle to provide a useful Data Catalog.
Modeling tools, like the business models they produce, are too complex and too abstract to be adopted by data teams. Our recommendation: define your catalog's metamodel to answer the questions of the data teams, and feed parts of that metamodel automatically from the business model.