A Data Catalog is NOT a Compliance Solution
As with governance, regulatory compliance is a crucial issue for any data-centric organization.
There is a plethora of data handling regulations spanning all sectors of activity and countries. On the subject of personal data alone, GDPR is mandatory across all EU countries, but each State has a lot of wiggle room in how it is implemented, and most States have a large arsenal of legislation to complete, reinforce and adapt it (Germany alone, for instance, has several dozen regulations across different sectors of activity related to personal data).
In the US, there are hundreds of laws and regulations across States and sectors of activity (with varying degrees of adherence). And here we are only referring to personal data… Rules and regulations also exist for financial data, medical data, biometric data, banking data, risk data, insurance data, etc. Put simply, every organization has some regulation it must comply with.
So what does compliance mean in this case?
The vast majority of regulatory audits center on the following:
- The ability to provide complete and up-to-date documentation on the procedures and controls put in place in order to meet the norms,
- The ability to prove that the procedures described in the documentation are rolled out in the field,
- The ability to supervise all the measures deployed with a view towards continuous improvement.
A Data Catalog is neither a procedures library nor an evidence consolidation system, and even less a process supervision solution.
It strikes us as obvious that assigning those responsibilities to a Data Catalog would make it considerably harder to use (norms are too obscure for most people) and would jeopardize adoption among those most likely to benefit from it (data teams).
Should we therefore forget about Data Catalogs in our quest for compliance?
No, of course not. Again, in terms of compliance, it would be much wiser to use the Data Catalog to build the data literacy of the data teams, and to tag the data appropriately, thus enabling the teams to quickly identify any norm or procedure they need to adhere to before using the data. The Catalog can even help place the tags using a variety of approaches. For example, it can automatically detect sensitive or personal data.
That said, even with the help of ML, detection will never work perfectly (the notion of "personal data" defined by GDPR, for instance, is much broader and harder to detect than North American PII). The Catalog's ability to manage these tags is therefore critical.
Take Away
Regulatory compliance is above all a matter of documentation and proof, and has no place in a Data Catalog.
However, the Data Catalog can help identify (more or less automatically) data that is subject to regulations. The Data Catalog plays a key role in the acculturation of the data teams with respect to the importance of regulations.