The Data Catalog market has developed rapidly, and a Data Catalog is now deemed essential when deploying a data-driven strategy. A victim of its own success, this market has attracted a number of players from adjacent markets.
These players have rejigged their marketing positioning in order to present themselves as Data Catalog solutions.
The reality is that these companies are relatively weak on data catalog functionality itself. They nevertheless attempt to convince buyers, with degrees of success proportional to their marketing budgets, that a Data Catalog is not merely a high-performance search tool for data teams but an integrated solution likely to address a host of other topics.
The purpose of this blog series is to deconstruct the pitch of these eleventh-hour Data Catalog vendors.
Here are, in our opinion, the 7 lies of the Data Catalog vendors:
- A Data Catalog is a Data Governance platform,
- A Data Catalog can measure and manage data quality,
- A Data Catalog can manage regulatory compliance,
- A Data Catalog can query data directly,
- A Data Catalog can model logical architecture and business processes around data,
- A Data Catalog is a collaborative cartography and metadata management tool that cannot be automated,
- A Data Catalog is a long, complex, and expensive project.
A Data Catalog is NOT a Compliance Solution
As with governance, regulatory compliance is a crucial issue for any data-centric organization.
There is a plethora of data handling regulations spanning all sectors of activity and countries. On the subject of personal data alone, GDPR is mandatory across all EU countries, but each State has a lot of wiggle room on how it is implemented, and most States have a large arsenal of legislation to complement, reinforce and adapt it (Germany alone, for instance, has several dozen regulations across different sectors of activity related to personal data).
In the US, there are hundreds of laws and regulations across States and sectors of activity (with varying degrees of adherence). And here we are only referring to personal data… Rules and regulations also exist for financial data, medical data, biometric data, banking data, risk data, insurance data, etc. Put simply, every organization has some regulation it has to comply with.
So what does compliance mean in this case?
The vast majority of regulatory audits center on the following:
- The ability to provide complete and up-to-date documentation on the procedures and controls put in place in order to meet the norms,
- The ability to prove that the procedures described in the documentation are rolled out in the field,
- The ability to supervise all the measures deployed with a view towards continuous improvement.
A Data Catalog is neither a procedures library nor an evidence consolidation system, and even less a process supervision solution.
It strikes us as obvious that assigning those responsibilities to a Data Catalog will make it considerably less simple to use (norms are too obscure for most people) and will jeopardize adoption for those most likely to benefit from it (data teams).
Should we therefore forget about Data Catalogs in our quest for compliance?
No, of course not. Again, in terms of compliance, it would be much wiser to use the Data Catalog to build the data teams' literacy, and to tag the data appropriately, thus enabling the teams to quickly identify any norm or procedure they need to adhere to before using the data. The Catalog can even help place the tags using a variety of approaches. It can, for example, automatically detect sensitive or personal data.
That said, even with the help of ML, detection will never work perfectly (the notion of “personal data” defined by GDPR, for instance, is much broader and harder to detect than North American PII). The Catalog’s ability to manage these tags is therefore critical.
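To make the point concrete, here is a minimal sketch of tag suggestion combined with human tag management. It is an illustration under stated assumptions, not a description of how any particular catalog works: the two regular expressions, the tag names, the 0.8 threshold, and the `suggest_tags` and `ColumnTags` names are all hypothetical, and a real detector would be far richer.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns, for illustration only; real detectors are far richer.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,}$")

PATTERNS = (
    ("personal_data:email", EMAIL_RE),
    ("personal_data:phone", PHONE_RE),
)


def suggest_tags(sample_values, threshold=0.8):
    """Suggest a tag when at least `threshold` of the non-empty samples match its pattern."""
    values = [v.strip() for v in sample_values if v and v.strip()]
    if not values:
        return set()
    tags = set()
    for tag, pattern in PATTERNS:
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            tags.add(tag)
    return tags


@dataclass
class ColumnTags:
    """Tags attached to one column of a dataset (hypothetical structure)."""
    suggested: set = field(default_factory=set)  # produced by automatic detection
    confirmed: set = field(default_factory=set)  # added or approved by a data steward
    rejected: set = field(default_factory=set)   # false positives dismissed by a steward

    def effective(self):
        """Tags shown to data teams: human decisions override detection."""
        return (self.suggested | self.confirmed) - self.rejected


# A column of order numbers looks like phone numbers to the naive pattern.
orders = ColumnTags(suggested=suggest_tags(["12345678", "87654321"]))
orders.rejected.add("personal_data:phone")  # a steward dismisses the false positive

# Free text about a customer's health is personal data under GDPR,
# but no pattern here can see it, so a steward has to add the tag by hand.
notes = ColumnTags(suggested=suggest_tags(["called the customer about her illness"]))
notes.confirmed.add("personal_data:health")
```

Both failure modes show up in this small example: the order numbers are wrongly flagged and have to be rejected, while the free-text column is missed entirely and has to be tagged manually. This is why the suggested, confirmed, and rejected sets are kept separate rather than letting detection write the final tags directly.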
Regulatory compliance is above all a matter of documentation and proof and has no place in a Data Catalog.
However, the Data Catalog can help identify (more or less automatically) data that is subject to regulations. The Data Catalog plays a key role in the acculturation of the data teams with respect to the importance of regulations.