Data Quality refers to an organization’s ability to maintain the quality of its data over time. If we were to take some data professionals at their word, improving Data Quality would be the panacea for all our business woes and should therefore be the top priority.
At Zeenea, we believe this view should be nuanced: Data Quality is one means among others to limit the uncertainty of meeting corporate objectives.
In this series of articles, we will go over everything data professionals need to know about Data Quality Management (DQM):
- The nine dimensions of Data Quality
- The challenges and risks associated with Data Quality
- The main features of Data Quality Management tools
- The contribution of the Data Catalog to DQM
A data catalog is not a DQM tool
An essential point: a data catalog should not be considered a Data Quality Management tool per se.
First, one of the core principles of Data Quality is that controls should ideally take place in the source system. Running these controls solely in the data catalog, rather than at the source and within the data transformation flow, increases the overall cost of the undertaking.
Furthermore, a data catalog must be both comprehensive and minimally intrusive so that it can be deployed rapidly across the company. Turning it into a quality control engine would be incompatible with the complex nature of data transformation and the multitude of tools used to carry out these transformations.
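To make the "control at the source" principle concrete, here is a minimal Python sketch, assuming a hypothetical `CustomerStore` that stands in for a source system such as a CRM. The record fields, the e-mail pattern and the `ALLOWED_COUNTRIES` reference list are illustrative assumptions, not part of any real product. The idea it shows is simply that invalid or duplicate records are rejected at write time, before they can travel downstream.

```python
import re
from dataclasses import dataclass


# Hypothetical record type for a source system; fields are illustrative only.
@dataclass
class Customer:
    customer_id: str
    email: str
    country_code: str


EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ALLOWED_COUNTRIES = {"FR", "DE", "US"}  # assumed reference list


def validate(customer: Customer) -> list[str]:
    """Return the rule violations for one record; an empty list means it is valid."""
    errors = []
    if not customer.customer_id:
        errors.append("customer_id is required")  # completeness
    if not EMAIL_PATTERN.match(customer.email):
        errors.append("email is malformed")  # validity (format)
    if customer.country_code not in ALLOWED_COUNTRIES:
        errors.append("unknown country_code")  # validity (reference data)
    return errors


class CustomerStore:
    """Toy stand-in for a source system that refuses bad records at write time."""

    def __init__(self) -> None:
        self._rows: dict[str, Customer] = {}

    def insert(self, customer: Customer) -> None:
        errors = validate(customer)
        if customer.customer_id in self._rows:
            errors.append("duplicate customer_id")  # uniqueness
        if errors:
            raise ValueError("; ".join(errors))
        self._rows[customer.customer_id] = customer

    def __len__(self) -> int:
        return len(self._rows)
```

Because the rules run where the data is created, every downstream consumer (including a data catalog) inherits records that already passed them, which is why this is the cheapest place to enforce quality.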
Lastly, a data catalog must remain a simple tool to understand and use, as described in article 3 of our Data Democracy.
How does a data catalog contribute to DQM?
While the data catalog isn’t a Data Quality tool, its contribution to maintaining Data Quality is nonetheless substantial. Here is how:
- A data catalog enables data consumers to easily understand metadata and avoid hazardous interpretations of the data. It echoes the clarity dimension of quality;
- A data catalog gives a centralized view of all the available enterprise data. Data Quality information is metadata like any other: it carries value and should be made available to all, in a form that is easy to interpret and extract. This echoes the dimensions of accuracy, validity, consistency, uniqueness, completeness, and timeliness;
- A data catalog has data traceability capacities (Data Lineage), echoing the traceability dimension of quality;
- A data catalog usually allows direct access to the data sources, echoing the availability dimension of quality.
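The list above treats quality results and lineage as ordinary catalog metadata. The Python sketch below illustrates that idea under stated assumptions: `CatalogEntry`, `profile` and `trace_upstream` are hypothetical names, not Zeenea's data model or API, and the sample datasets are invented. It computes two simple ratios (completeness of required fields, uniqueness of a key), stores them on an entry, and walks lineage links upstream.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry; fields are illustrative, not a vendor data model."""
    name: str
    description: str  # business definition (clarity)
    upstream: list[str] = field(default_factory=list)  # lineage links (traceability)
    quality: dict[str, float] = field(default_factory=dict)  # DQ metrics stored as metadata


def profile(rows: list[dict], key: str, required: list[str]) -> dict[str, float]:
    """Share of filled required cells (completeness) and of distinct key values (uniqueness)."""
    if not rows or not required:
        return {}  # nothing to measure
    filled = sum(1 for r in rows for col in required if r.get(col) not in (None, ""))
    completeness = filled / (len(rows) * len(required))
    uniqueness = len({r.get(key) for r in rows}) / len(rows)
    return {"completeness": round(completeness, 3), "uniqueness": round(uniqueness, 3)}


def trace_upstream(catalog: dict[str, CatalogEntry], name: str) -> list[str]:
    """Depth-first walk over lineage links; each upstream dataset is listed once."""
    seen: list[str] = []
    stack = list(catalog[name].upstream)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Datasets unknown to the catalog are reported but not expanded further.
        if current in catalog:
            stack.extend(catalog[current].upstream)
    return seen


# Invented sample data for illustration.
catalog = {
    "sales_report": CatalogEntry("sales_report", "Monthly revenue by region", ["orders_clean"]),
    "orders_clean": CatalogEntry("orders_clean", "Deduplicated orders", ["crm.orders", "erp.invoices"]),
    "crm.orders": CatalogEntry("crm.orders", "Raw CRM orders"),
}
rows = [{"id": 1, "email": "a@x.io"}, {"id": 2, "email": ""}, {"id": 2, "email": "c@x.io"}]
catalog["orders_clean"].quality = profile(rows, key="id", required=["id", "email"])
print(catalog["orders_clean"].quality)  # {'completeness': 0.833, 'uniqueness': 0.667}
print(trace_upstream(catalog, "sales_report"))
```

In this sketch the catalog only displays and links the metrics; computing them at scale remains the job of the source systems or a dedicated Data Quality tool.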
DQM implementation strategy
The following table details how Data Quality is taken into account depending on the different solutions on the market:
As stated above, quality testing should by default take place directly in the source system. Integrating quality tests into a data catalog can improve the user experience, but it is not essential given its limitations: in the catalog, Data Quality is not integrated into the transformation flow.
That said, when system stacks become too complex, for example when we need to consolidate data from different systems with different functional rules, a Data Quality tool becomes unavoidable.
The implementation strategy will depend on use cases and company objectives. It is nonetheless advisable to put Data Quality in place incrementally:
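A small Python sketch can show why consolidation calls for dedicated tooling. It assumes two hypothetical extracts keyed by customer id, where a CRM stores country names and an ERP stores ISO codes; the data and the `COUNTRY_TO_ISO` mapping are invented for illustration. Reconciling them requires a harmonization rule plus a cross-system comparison that neither source system performs on its own.

```python
# Hypothetical extracts keyed by customer id: the CRM stores country names,
# while the ERP stores ISO 3166-1 alpha-2 codes.
CRM = {"c1": {"country": "France"}, "c2": {"country": "Germany"}, "c3": {"country": "Spain"}}
ERP = {"c1": {"country": "FR"}, "c2": {"country": "AT"}, "c4": {"country": "US"}}

# Assumed harmonization rule that maps the CRM convention onto the ERP one.
COUNTRY_TO_ISO = {"France": "FR", "Germany": "DE", "Spain": "ES"}


def reconcile(crm: dict[str, dict], erp: dict[str, dict]) -> dict[str, list[str]]:
    """Compare both systems after harmonization and group the discrepancies by kind."""
    report: dict[str, list[str]] = {"mismatch": [], "missing_in_erp": [], "missing_in_crm": []}
    for cid in sorted(crm.keys() | erp.keys()):
        if cid not in erp:
            report["missing_in_erp"].append(cid)  # completeness across systems
        elif cid not in crm:
            report["missing_in_crm"].append(cid)
        elif COUNTRY_TO_ISO.get(crm[cid]["country"]) != erp[cid]["country"]:
            # An unmapped CRM value yields None and is therefore also a mismatch.
            report["mismatch"].append(cid)  # consistency across systems
    return report


print(reconcile(CRM, ERP))
# {'mismatch': ['c2'], 'missing_in_erp': ['c3'], 'missing_in_crm': ['c4']}
```

With two systems and one attribute this stays manageable; multiplied across many sources, attributes and functional rules, it is the kind of logic a specialized Data Quality tool is meant to manage.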
- Ensure the source systems have put in place the relevant quality rules;
- Implement a data catalog to improve quality on the dimensions of clarity, traceability, and/or availability;
- Integrate Data Quality into the transformation flows with a specialized tool, while importing its results automatically into the data catalog via APIs.
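The last step can be sketched in Python as follows, under loud assumptions: checks run inside the transformation flow, and their results are sent as JSON to a catalog endpoint. `CATALOG_URL`, the payload shape and the bearer-token header are hypothetical placeholders, not the API of Zeenea or any other vendor; a real integration should follow the catalog's own API documentation.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional

# Hypothetical endpoint; real catalogs expose their own, documented APIs.
CATALOG_URL = "https://catalog.example.com/api/datasets/{dataset}/quality"


def run_checks(rows: list[dict]) -> list[dict]:
    """Quality rules executed in the transformation flow, right after the dataset is produced."""
    total = len(rows)
    null_amounts = sum(1 for r in rows if r.get("amount") is None)
    negative = sum(1 for r in rows if (r.get("amount") or 0) < 0)
    return [
        {"rule": "amount_not_null", "dimension": "completeness",
         "passed": total - null_amounts, "total": total},
        {"rule": "amount_non_negative", "dimension": "validity",
         "passed": total - negative, "total": total},
    ]


def build_payload(dataset: str, results: list[dict],
                  run_at: Optional[datetime] = None) -> dict:
    """Package the results as metadata for the catalog, stamped with the run time in UTC."""
    run_at = run_at or datetime.now(timezone.utc)
    return {"dataset": dataset, "checked_at": run_at.isoformat(), "results": results}


def push_to_catalog(payload: dict, token: str) -> int:
    """POST the payload to the (hypothetical) catalog endpoint and return the HTTP status."""
    request = urllib.request.Request(
        CATALOG_URL.format(dataset=payload["dataset"]),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

`push_to_catalog` is intentionally not called here. Keeping the checks in the pipeline and only publishing their results to the catalog matches the division of labor described above: the specialized tool computes quality, and the catalog makes it visible.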
Conclusion
Data Quality refers to a company’s ability to maintain the quality of its data over time. At Zeenea, we define it through the prism of nine of the sixty dimensions described by DAMA International: completeness, accuracy, validity, uniqueness, consistency, timeliness, traceability, clarity, and availability.
As a data catalog provider, we reject the idea that a data catalog is a full-fledged quality management tool. In fact, it is only one of several ways to contribute to the improvement of Data Quality, notably through the dimensions of clarity, availability, and traceability.
Get our Data Quality Management guide for data-driven organizations
For more information on Data Quality and DQM, download our free guide: “A guide to Data Quality Management” now!
