Data Quality refers to an organization’s ability to maintain the quality of its data over time. If we were to take some data professionals at their word, improving Data Quality is the panacea for all our business woes and should therefore be the top priority.
At Zeenea, we believe this should be nuanced: Data Quality is one means among others of limiting the uncertainty around meeting corporate objectives.
In this series of articles, we will go over everything data professionals need to know about Data Quality Management (DQM):
- The nine dimensions of Data Quality
- The challenges and risks associated with Data Quality
- The main features of Data Quality Management tools
- The Data Catalog contribution to DQM
The challenges of Data Quality for organizations
Organizations usually launch initiatives to improve the quality of their data in order to meet compliance requirements and reduce risk. These initiatives are indispensable for reliable decision-making. Unfortunately, many stumbling blocks can hinder them. Below are some examples:
- The exponential growth in the volume, velocity, and variety of data makes the environment more complex and uncertain;
- Pressure from compliance regulations such as GDPR, BCBS 239, or HIPAA is increasing;
- Teams are increasingly decentralized, and each has its own domain of expertise;
- IT and data teams are snowed under and don’t have time to solve Data Quality issues;
- The data aggregation processes are complex and long;
- It can be difficult to standardize data between different sources;
- Auditing changes across systems is complex;
- Governance policies are difficult to implement.
Having said that, there are also numerous opportunities to seize. High-quality data enables organizations to innovate with artificial intelligence and to deliver a more personalized customer experience, provided there is enough quality data.
Gartner has in fact forecast that, through 2022, 85% of AI projects will deliver erroneous outcomes as a result of bias in the data, the algorithms, or the teams in charge of managing them.
Reducing the level of risk by improving the quality of the data
Poor Data Quality should be seen as a risk, and quality improvement software as one possible way to reduce that risk.
Processing a quality issue:
If we accept the notion above, any quality issue should be addressed in several phases:
1. Risk identification: this phase consists of seeking out, recognizing, and describing the risks that can help the organization reach its objectives or prevent it from doing so, in part because of a lack of Data Quality.
2. Risk analysis: the aim of this phase is to understand the nature of the risk and its characteristics. It covers factors such as the likelihood of events and their consequences, the nature and magnitude of those consequences, etc. Taking marketing data as an example, we should seek to identify what has caused its poor quality. Possible causes include:
- A poor user experience of the source system leading to typing errors;
- A lack of verification of the completeness, accuracy, validity, uniqueness, consistency, or timeliness of the data;
- A lack of simple means to ensure the traceability, clarity, and availability of the data;
- The absence of a governance process and of involvement from business teams.
3. Risk evaluation: the purpose of this phase is to compare the results of the risk analysis with the established risk criteria. It helps decide whether further action is needed, for instance keeping the current measures in place or undertaking further analysis.
Let’s focus on the nine dimensions of Data Quality and evaluate the impact of poor quality on each of them:
The values for the levels of probability and severity should be defined by the main stakeholders, who know the data in question best.
4. Risk treatment: this phase aims to set out the available options for reducing risk and to roll them out. It also involves assessing the usefulness of the actions taken and determining whether the residual risk is acceptable and, if it is not, considering further treatment.
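To make the evaluation step concrete, here is a minimal Python sketch of one common way to score risk per quality dimension. It assumes a score computed as probability times severity on 1-to-5 scales, compared against a single threshold standing in for the risk criteria. The scales, the threshold, the names `DimensionRisk` and `needs_treatment`, and the sample values are all illustrative assumptions, not figures from this article; the real values should come from the stakeholders who know the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionRisk:
    """Risk attached to one Data Quality dimension (hypothetical 1-5 scales)."""
    dimension: str
    probability: int  # 1 (rare) to 5 (almost certain)
    severity: int     # 1 (negligible) to 5 (critical)

    @property
    def score(self) -> int:
        # Assumed convention: risk score = probability x severity.
        return self.probability * self.severity


def needs_treatment(risks, threshold):
    """Return the risks whose score exceeds the threshold, highest first."""
    flagged = [r for r in risks if r.score > threshold]
    return sorted(flagged, key=lambda r: r.score, reverse=True)


# Made-up values for illustration only.
risks = [
    DimensionRisk("completeness", probability=4, severity=3),
    DimensionRisk("accuracy", probability=2, severity=5),
    DimensionRisk("timeliness", probability=1, severity=2),
]

for risk in needs_treatment(risks, threshold=9):
    print(f"{risk.dimension}: {risk.score}")
```

With these sample values, completeness and accuracy exceed the threshold and would move on to treatment, while the residual risk on timeliness would be accepted as is.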
Therefore, improving the quality of the data is clearly not a goal in itself:
- Its cost must be evaluated based on company objectives;
- The treatments to be implemented must be evaluated through each dimension of quality.
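Evaluating a treatment along each dimension presupposes that the dimensions can be measured before and after it is applied. The sketch below, in plain Python, shows one possible way to measure three of them (completeness, uniqueness, and validity) as simple ratios. The sample records, the deliberately simple email pattern, and the function names are hypothetical; real checks would be defined with the business teams.

```python
import re

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "bob@example"},
    {"id": 4, "email": "eve@example.com"},
]

# Deliberately simple pattern, for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows where the field is present (rows must be non-empty)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, field):
    """Share of distinct values among the rows (rows must be non-empty)."""
    return len({r[field] for r in rows}) / len(rows)


def validity(rows, field, pattern):
    """Share of non-missing values that match the pattern."""
    values = [r[field] for r in rows if r[field] is not None]
    if not values:
        return 1.0  # nothing to invalidate
    return sum(bool(pattern.match(v)) for v in values) / len(values)


print("completeness(email):", completeness(records, "email"))
print("uniqueness(id):", uniqueness(records, "id"))
print("validity(email):", validity(records, "email", EMAIL_PATTERN))
```

Running the same measurements before and after a treatment gives a rough view of what it actually improved, which can then be weighed against its cost.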