Everything you need to know about Data Contracts

In today’s data-driven world, enterprises exchange vast volumes of data between departments, services, and partner ecosystems, drawing on various applications, technologies, and sources. Ensuring that the data being exchanged is reliable, of high quality, and trustworthy is vital for generating tangible business value. This is where data contracts come in – similar to traditional contracts that define expectations and responsibilities, data contracts serve as the framework for reliable data exchange.

In this article, learn everything you need to know about data contracts!

What is a data contract?

A data contract is essentially an agreement between two or more parties regarding the structure, format, and semantics of the data being exchanged. It serves as a blueprint that defines how information should be organized, encoded, and validated during the communication process. Moreover, a crucial aspect of a data contract involves specifying how and when the data should be delivered to ensure data freshness. Ideally, contracts should be established at the start of any data-sharing agreement, setting clear guidelines from the outset while ensuring alignment with the evolving regulatory landscape and technological advancements.

Data contracts typically serve as the bridge between data producers, such as software engineers, and data consumers, such as data engineers or scientists. These contracts meticulously outline how data should be structured and organized to facilitate its utilization by downstream processes, such as data pipelines. Accuracy in data becomes essential to prevent downstream quality issues and ensure the precision of data analyses.

Yet, data producers may lack insight into the specific requirements and essential information each data team needs for effective data analysis. In response to this gap, data contracts have emerged as indispensable. They provide a shared understanding and agreement regarding data ownership, organization, and characteristics, facilitating smoother collaboration and more effective data utilization across diverse teams and processes.

It’s important to emphasize that data contracts are sometimes distinguished from data sharing agreements. While data contracts outline the technical specifics and legal obligations inherent in data exchange, data sharing agreements provide a simplified version, often in formats like Word documents, specifically tailored for non-technical stakeholders such as Data Protection Officers (DPOs) and legal counsels.

What is in a data contract?

A data contract typically includes agreements on:

Semantics

Semantics in a data contract clarify the meaning and intended usage of data elements and fields, ensuring mutual understanding among all parties. Clear documentation provides guidance on format, constraints, and requirements, promoting consistency and reliability across systems.

The Data Model (Schema)

The schema in a data contract defines the structure of datasets, including data types and relationships. It guides users in handling and processing data, ensuring consistency across systems for seamless integration and effective decision-making.
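
To make this concrete, here is a minimal Python sketch (illustrative only, with hypothetical field names) of how a schema agreement might be expressed and checked; real contracts typically use a dedicated format such as JSON Schema, Avro, or Protobuf:

```python
from datetime import date

# Hypothetical schema agreed between producer and consumer:
# field name -> expected Python type.
ORDER_SCHEMA = {
    "order_id": str,
    "customer_id": str,
    "amount_eur": float,
    "order_date": date,
}

def validate_record(record: dict, schema: dict) -> list[str]:
    """Return the list of schema violations for one record."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return errors

# A record that violates the contract: a type mismatch and missing fields.
print(validate_record({"order_id": "A-1", "amount_eur": "12.5"}, ORDER_SCHEMA))
```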

Service level agreements (SLA)

The SLA component of a data contract sets out agreed standards for data-related services to ensure the freshness and availability of the data. It defines metrics like response times, uptime, and issue-resolution procedures. SLAs assign accountability and responsibilities to both parties, ensuring service levels are met. Examples of delivery modes include batch (e.g., once a week), on-demand via an API, or real-time as a stream.

Data Governance

In the data contract, data governance establishes guidelines for managing data responsibly. It clarifies roles, responsibilities, and accountability, ensuring compliance with regulations and fostering trust among stakeholders. This framework helps maintain data integrity and reliability, aligning with legal requirements and organizational objectives.

Data Quality

The data quality section of a data contract ensures that exchanged data meets predefined standards, including criteria such as accuracy, completeness, consistency, and timeliness. By specifying data validation rules and error-handling protocols, the contract aims to maintain the integrity and reliability of the data throughout its lifecycle.
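
As an illustration, such quality rules are often enforced as automated checks on each data delivery. The sketch below assumes pandas and invented column names:

```python
import pandas as pd

def run_quality_checks(df: pd.DataFrame) -> dict:
    """Illustrative checks: completeness, uniqueness, timeliness."""
    return {
        "completeness_ok": bool(df["order_id"].notna().all()),
        "uniqueness_ok": bool(df["order_id"].is_unique),
        "timeliness_ok": bool(
            pd.Timestamp.now() - df["ingested_at"].max()
            < pd.Timedelta(hours=24)
        ),
    }

df = pd.DataFrame({
    "order_id": ["A-1", "A-2"],
    "ingested_at": [pd.Timestamp.now()] * 2,
})
print(run_quality_checks(df))  # all three checks pass on this delivery
```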

Data security and privacy

The data security and privacy part of a data contract outlines measures to protect sensitive information and ensure compliance with privacy regulations. It includes policies for encryption, access controls, and regular audits to safeguard data integrity and confidentiality. The contract emphasizes compliance with laws like GDPR, HIPAA, or CCPA to protect individuals’ privacy rights and build trust among stakeholders.

Here is an example of a data contract from PayPal’s open-sourced Data Contract Template:

[Image: PayPal open-source data contract example]
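
The template itself is written in YAML; the fragment below is only a loose, hypothetical Python rendering of the kinds of sections such a contract brings together:

```python
# A heavily abridged, hypothetical data contract, loosely inspired by
# open-source contract templates. All names are illustrative.
data_contract = {
    "dataset": "customer_orders",
    "owner": "orders-team@example.com",              # accountability
    "semantics": {
        "amount_eur": "Total order value in euros, VAT included",
    },
    "schema": {
        "order_id": "string",
        "amount_eur": "decimal",
    },
    "sla": {
        "delivery": "batch",                         # batch | api | stream
        "frequency": "daily",
        "max_latency_hours": 24,
    },
    "quality": {
        "order_id": {"unique": True, "null_allowed": False},
    },
    "security": {
        "pii_fields": [],
        "encryption_at_rest": True,
    },
}
```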

Who is responsible for data contracts?

Creating data contracts typically involves collaboration between all stakeholders within an organization, including data architects, data engineers, compliance experts, and business analysts.

Data Architects

Data architects play a key role in defining the technical aspects of the data contract, such as data structures, formats, and validation rules. They ensure that the data contract aligns with the organization’s data architecture principles and standards, facilitating interoperability and integration across different systems and applications.

Data Engineers

Data engineers are responsible for implementing the technical specifications outlined in the data contract. They develop data pipelines, integration processes, and data transformation routines to ensure that data is exchanged, processed, and stored according to the contract requirements. Their expertise in data modeling, database management, and data integration is essential for translating the data contract into actionable solutions.

Compliance Experts

Compliance experts also play a crucial role in creating data contracts by ensuring that the agreements comply with relevant laws, regulations, and contractual obligations. They review and draft contractual clauses related to data ownership, privacy, security, intellectual property rights, and liability, mitigating legal risks and ensuring that the interests of all parties involved are protected.

Business Analysts

Business analysts contribute by providing insights into the business requirements, use cases, and data dependencies that inform the design and implementation of the data contract. They help identify data sources, define data attributes, and articulate business rules and validation criteria that drive the development of the contract.

The importance of data contracts

At the core of data contracts lies the establishment of clear guidelines, terms, and expectations governing data sharing activities. By outlining the rights, responsibilities, and usage parameters associated with shared data, data contracts help foster transparency and mitigate potential conflicts or misunderstandings among parties involved in data exchanges.

Data Quality

One of the primary benefits of data contracts is their role in ensuring data quality and integrity throughout the data lifecycle. By defining standards, formats, and validation protocols for data exchange, contracts promote adherence to consistent data structures and quality benchmarks. This, in turn, helps minimize data discrepancies, errors, and inconsistencies, thereby enhancing the reliability and trustworthiness of shared data assets for downstream analysis and decision-making processes.

Data Governance and Regulatory Compliance

Data contracts serve as indispensable tools for promoting data governance and regulatory compliance within organizations. In an increasingly regulated environment, where data privacy laws and industry standards govern the handling and protection of sensitive information, contracts provide a framework for implementing robust data protection measures and ensuring adherence to legal requirements. By incorporating provisions for data security, privacy, and compliance with relevant regulations, contracts help mitigate legal risks, protect sensitive data, and uphold the trust and confidence of data subjects and stakeholders.

Data Collaboration

Data contracts facilitate effective collaboration and partnership among diverse stakeholders involved in data sharing initiatives. By articulating the roles, responsibilities, and expectations of each party, contracts create a shared understanding and alignment of objectives, fostering a collaborative environment conducive to innovation and knowledge exchange.

In conclusion, data contracts extend beyond mere legal instruments; they serve as foundational pillars for promoting data-driven decision-making, fostering trust and accountability, and enabling efficient data exchange ecosystems.

What is sensitive data discovery?

Protecting sensitive data stands as a paramount concern for data-centric enterprises. To navigate this landscape effectively, one must first embark on the meticulous task of accurately cataloging sensitive data – this is the essence of sensitive data discovery.

Data confidentiality is a core tenet, yet not all data is created equal. It is imperative to distinguish ordinary data from sensitive data, which requires heightened security and care. Sensitive data encompasses a broad spectrum of personal and confidential details whose exposure could lead to significant harm to individuals or organizations. It includes various forms of information, such as medical records, social security numbers, financial data, biometric data, and details about personal attributes like sexual orientation, religious beliefs, and political opinions, among others.

The handling of sensitive data necessitates relentless adherence to rigorous security and privacy standards. As part of your organizational responsibilities, you are required to implement robust security measures to thwart data leaks, prevent unauthorized access, and shield against data breaches. This entails employing techniques such as encryption, two-factor authentication, access management, and other advanced cybersecurity practices.

Once this foundational principle is acknowledged, a pivotal question remains: Does your business engage in the collection and management of sensitive data? To ascertain this, you must undertake the identification and protection of sensitive data within your organization.

How do you define and distinguish between data discovery and sensitive data discovery?

Data discovery is the overarching process of identifying, collecting, and analyzing data to extract valuable insights and information. It involves exploring and comprehending data in its entirety, recognizing patterns, generating reports, and making informed decisions based on the findings. Data discovery is fundamental for enhancing business operations, improving efficiency, and facilitating data-driven decision-making. Its primary objective is to maximize the utility of available data for various organizational purposes.

On the other hand, sensitive data discovery is a more specialized subset of data discovery. It specifically centers on the identification, protection, and management of highly confidential or sensitive data. Sensitive data discovery involves pinpointing this specific type of data within an organization, categorizing it, establishing appropriate security protocols and policies, and safeguarding it against potential threats, such as data breaches and unauthorized access.

What is considered sensitive data?

Since the enforcement of the GDPR in 2018, even seemingly harmless data can be deemed sensitive. However, it’s important to understand that sensitive data has a specific definition. Here are some concrete examples.

Sensitive data, to begin with, includes Personally Identifiable Information, often referred to as PII. This category covers crucial data like names, social security numbers, addresses, and telephone numbers, which are essential for the identification of individuals, whether they are your customers or employees.

Moreover, banking data, such as credit card numbers and security codes, holds a high degree of sensitivity, given its attractiveness to cybercriminals. Customer data, encompassing purchase histories, preferences, and contact details, is invaluable to businesses but must be diligently safeguarded to protect the privacy of your customers.

Likewise, health data, consisting of medical records, diagnoses, and medical histories, stands as particularly sensitive due to its deeply personal nature and its vital role in the realm of healthcare.

However, the realm of sensitive data extends far beyond these examples. Legal documents, such as contracts, non-disclosure agreements, and legal correspondence, house critical legal information and thus must remain confidential to preserve the interests of the parties involved. Depending on the nature of your business, sensitive data can encompass a variety of critical information types, all necessitating robust security measures to ward off unauthorized access or potential breaches.

What are the different methodologies associated with the discovery of sensitive data?

The discovery of sensitive data entails several essential methodologies aimed at its accurate identification, protection, management, and adherence to regulatory requirements. These methodologies play a crucial role in securing sensitive information:

Identification and Classification

This methodology involves pinpointing sensitive data within the organization and categorizing it based on its level of confidentiality. It enables the organization to focus its efforts on data that requires heightened protection.
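
A very rough sketch of rule-based identification and classification, assuming simple regex patterns; production scanners combine patterns, dictionaries, and machine learning:

```python
import re

# Illustrative patterns only; real scanners use far more robust rules.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify(text: str) -> set[str]:
    """Return the sensitive-data categories detected in a text field."""
    return {name for name, rx in PATTERNS.items() if rx.search(text)}

print(classify("Contact: jane.doe@example.com, SSN 123-45-6789"))
# detects the 'email' and 'us_ssn' categories
```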

Data Profiling

Data profiling entails a detailed analysis of the characteristics and attributes of sensitive data. This process enhances understanding, helping to identify inconsistencies, potential errors, and risks associated with the data’s use.
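
A minimal profiling sketch using pandas, with an invented column; dedicated profiling tools go much further:

```python
import pandas as pd

df = pd.DataFrame({"ssn": ["123-45-6789", None, "123-45-6789"]})

# Basic profile of a sensitive column: volume, completeness, duplicates.
profile = {
    "rows": len(df),
    "null_ratio": float(df["ssn"].isna().mean()),
    "distinct_values": int(df["ssn"].nunique()),
    "duplicates": int(df["ssn"].duplicated().sum()),
}
print(profile)  # flags the missing value and the duplicated identifier
```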

Data Masking

Data masking, also known as data anonymization, is pivotal for safeguarding sensitive data. This technique involves substituting or masking data in a way that maintains its usability for legitimate purposes while preserving its confidentiality.

Regulatory Compliance

Complying with laws and regulations pertaining to the protection of sensitive data is a strategic imperative. Regulatory frameworks like the GDPR in Europe or HIPAA in the United States establish stringent standards that must be followed. Non-compliance can result in significant financial penalties and reputational damage.

Data Retention and Deletion

Effective management of data retention and deletion is essential to prevent excessive data storage. Obsolete information should be securely and legally disposed of in accordance with regulations to avoid data hoarding.

Specific Use Cases

Depending on the specific needs of particular activities or industries, additional approaches can be implemented. These may include data encryption, auditing of access and activities, security monitoring, and employee awareness programs focused on data protection.

Managing sensitive data is a substantial responsibility, demanding both rigor and an ongoing commitment to data governance. It necessitates a proactive approach to ensure data security and compliance with ever-evolving data protection standards and regulations.

How AI Strengthens Data Governance

According to a report published by McKinsey at the end of 2022, 50% of organizations had already integrated artificial intelligence to optimize service operations and create new products. The development of AI and machine learning in everyday business reflects the prominent role of data in management strategies. To function effectively, AI depends on vast sets of data, which must be the subject of methodical and rigorous governance.

Behind the concept of data governance lies the set of processes, policies, and standards that govern the collection, storage, management, quality, and access to data within an organization. The role of data governance? To ensure that data is accurate, secure, accessible, and compliant with current regulations. The relationship between AI and data governance is a close one. AI models learn from data, and poor quality or biased data can lead to erroneous or discriminatory decisions.

Do you want to ensure that the data used by AI systems and their algorithms is reliable, ethical, and privacy-compliant? Then data governance is an essential prerequisite. By moving forward on a dual project of AI and data governance, you create a virtuous loop. Indeed, AI can also be used to improve data governance by automating tasks such as anomaly detection or data classification.

Let’s take a look at the (many!) benefits of AI-enhanced data governance!

What are the benefits of AI-powered data governance?

Improve the quality of your data

Data quality must be a fundamental pillar of any data strategy. The more reliable the data, the more relevant the lessons, choices, and orientations that emerge from it, and AI contributes to improving data quality through a number of mechanisms. In fact, AI algorithms can automate the detection and correction of errors in datasets, thereby reducing inconsistencies and inaccuracies.

Moreover, AI can help standardize data by structuring it in a coherent way, making it easier and more reliable to use, compare, and put into perspective. With machine learning, it is also possible to identify trends and patterns hidden in the data, enabling the discovery of errors or missing data.

Automate data compliance

At a time when cyber threats are literally exploding, data compliance must be a priority in your organization. But guaranteeing compliance requires constant vigilance, which can’t depend exclusively on human intelligence. AI can proactively monitor potential violations of data regulations by performing real-time analysis of all data flows – detecting anomalies or unauthorized access, triggering automatic alerts, and even making recommendations to correct problems. In addition, AI facilitates the classification and labeling of sensitive data, ensuring that it is handled appropriately. Finally, AI systems can also generate automatic compliance reports, reducing the administrative workload.
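
As a toy illustration of the anomaly-detection idea (the features and thresholds are invented, and this is not a compliance tool), an algorithm such as scikit-learn's IsolationForest can flag unusual access patterns:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical access-log features: [requests_per_hour, megabytes_downloaded]
rng = np.random.RandomState(0)
normal_activity = rng.normal([20, 5], [5, 2], size=(200, 2))
suspect_activity = np.array([[400, 900]])  # a burst of large downloads

model = IsolationForest(contamination=0.01, random_state=0)
model.fit(normal_activity)
print(model.predict(suspect_activity))  # [-1] -> flagged as anomalous
```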

Strengthen data security

Through its ability to proactively detect threats by analyzing data access patterns in real time, AI can alert teams to suspicious behavior, such as attempted intrusions or unauthorized access. To take data governance even further, AI leverages machine-learning-based malware detection systems. These systems can identify known malware signatures and detect unknown variants by analyzing behavior. Finally, AI contributes to security by automating the management of security patches and monitoring compliance with security policies.

Democratize data

At the heart of your data strategy lies one objective: to encourage your employees to use data whenever possible. In this way, you will foster the development of a data culture within your organization. The key to achieving this is to facilitate access to data by simplifying the search and analysis of complex data. AI search engines can quickly extract relevant information from large datasets, enabling employees to quickly find what they need. In addition, AI can automate the aggregation and presentation of data in the form of interactive dashboards, making information ever more accessible and easy to share!

What does the future hold for data governance?

More data, deeper analysis, greater predictability: this is where history is heading. Along the way, companies will adopt more holistic approaches to their challenges, gaining both perspective on and proximity to their markets. To meet this challenge, it is vital to integrate data governance into the overall business strategy. In this regard, automation will be essential, relying heavily on artificial intelligence and machine learning tools to proactively detect, classify, and secure data.

The future will be shaped by greater collaboration between the IT, legal, and business teams, which will be key to ensuring the success of data governance and maintaining the trust of all stakeholders.

Data Masking, the shield to protect your business

The chameleon changes its color to defend itself. Similarly, walking sticks mimic the appearance of twigs to deceive predators. Data masking follows the same principle! Let’s explore a methodical approach that ensures the security and usability of your data.

According to IBM’s 2022 report on the cost of data breaches, the average expense incurred by a data breach amounts to $4.35 million. The report further highlights that 83% of surveyed companies experienced multiple data breaches, with only 17% stating it was their initial incident. As sensitive data holds immense value, it becomes a desirable target and requires effective protection. Among all compromised data types, personally identifiable information (PII) is the most expensive. To safeguard this information and maintain its confidentiality, data masking has emerged as an indispensable technique.

What is data masking?

The purpose of data masking is to ensure the confidentiality of sensitive information. In practice, data masking entails substituting genuine data with fictional or modified data, while retaining its visual representation and structure. This approach finds extensive application in test and development settings, as well as in situations where data is shared with external entities, in order to avert unauthorized exposure. By employing data masking, data security is assured while preserving its usefulness and integrity, thereby mitigating the likelihood of breaches compromising confidentiality.

What are the different types of data masking?

To guarantee the effective masking of your data, data masking can employ various techniques, each with its unique advantages, allowing you to select the most suitable approach for maximizing data protection.

Static Data Masking

Static Data Masking is a data masking technique that involves modifying sensitive data within a static version of a database. The process begins with an analysis phase, where data is extracted from the production environment to create the static copy. During the masking phase, real values are substituted with fictitious ones, information is partially deleted, or data is anonymized. These modifications are permanent, and the data cannot be restored to its original state.

Format Preserving Masking

Format Preserving Masking (FPM) differs from traditional masking methods as it preserves the length, character types, and structure of the original data. By utilizing cryptographic algorithms, sensitive data is transformed into an irreversible and unidentifiable form. The masked data retains its original characteristics, allowing it to be used in systems and processes that require a specific format.

Dynamic Data Masking

Dynamic Data Masking (DDM) applies varying masking techniques each time a new user attempts to access the data. When a collaborator accesses a database, DDM enforces defined masking rules to limit the visibility of sensitive data, ensuring that only authorized users can view the actual data. Masking can be implemented by dynamically modifying query results, substituting sensitive data with fictional values, or restricting access to specific columns.
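
A minimal Python sketch of the idea, with hypothetical roles and rules; databases such as SQL Server implement dynamic data masking natively at the query layer:

```python
def mask_email(value: str) -> str:
    user, _, domain = value.partition("@")
    return user[0] + "***@" + domain

# Hypothetical masking rules applied at read time, per column.
MASKING_RULES = {
    "email": mask_email,
    "ssn": lambda v: "***-**-" + v[-4:],
}

def read_row(row: dict, role: str) -> dict:
    """Authorized roles see real values; every other role gets masked ones."""
    if role == "data_steward":  # illustrative authorized role
        return row
    return {k: MASKING_RULES.get(k, lambda v: v)(v) for k, v in row.items()}

row = {"email": "jane.doe@example.com", "ssn": "123-45-6789"}
print(read_row(row, role="analyst"))
# {'email': 'j***@example.com', 'ssn': '***-**-6789'}
```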

On-the-fly Data Masking

On-the-Fly data masking, also known as real-time masking, differs from static masking by applying the masking process at the time of data access. This approach ensures enhanced confidentiality without the need to create additional data copies. However, real-time masking may result in processing overload, especially when dealing with large data volumes or complex operations, potentially causing delays or slowdowns in data access.

What are the different data masking techniques?

Random substitution

Random substitution involves replacing sensitive data, such as names, addresses, or social security numbers, with randomly generated data. Real names can be replaced with fictitious names, addresses can be replaced with generic addresses, and telephone numbers can be substituted with random numbers.
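
A sketch using the third-party Faker library (assuming it is installed); the seed only makes the example reproducible:

```python
from faker import Faker  # third-party: pip install Faker

Faker.seed(42)  # reproducible fake values for this example
fake = Faker()

record = {"name": "Jane Doe", "address": "1 Real St", "phone": "555-0100"}
masked = {
    "name": fake.name(),            # fictitious but realistic name
    "address": fake.address(),      # fictitious address
    "phone": fake.phone_number(),   # random phone number
}
print(masked)  # entirely fictitious, structure-compatible values
```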

Shuffling

Shuffling is a technique where the order of sensitive data is randomly rearranged without modifying the values themselves. This means that sensitive values within a column or set of columns are shuffled randomly. Shuffling preserves the overall statistical distribution of the original data while making it virtually impossible to associate specific values with a particular entity.
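
A minimal pandas sketch with invented figures: the sensitive column is permuted, so the values remain real but lose their link to any individual row:

```python
import pandas as pd

df = pd.DataFrame({
    "employee": ["A", "B", "C"],
    "salary": [40000, 55000, 90000],
})

# Shuffle the sensitive column independently of the rest of the table.
df["salary"] = (
    df["salary"].sample(frac=1, random_state=0).reset_index(drop=True)
)
print(df)  # same salary distribution, broken row-level association
```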

Encryption

Encryption involves making sensitive data unreadable using an encryption algorithm. The data is encrypted using a specific key, rendering it unintelligible without the corresponding decryption key.
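
A sketch using the third-party cryptography library (assuming it is installed); Fernet is a symmetric scheme, so the same key encrypts and decrypts:

```python
from cryptography.fernet import Fernet  # third-party: pip install cryptography

key = Fernet.generate_key()  # in practice, store this key in a secure vault
f = Fernet(key)

token = f.encrypt(b"123-45-6789")  # unintelligible without the key
print(token)
print(f.decrypt(token).decode())   # '123-45-6789'
```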

Anonymization

Anonymization is the process of removing or modifying information that could lead to the direct or indirect identification of individuals. This may involve removing names, first names, addresses, or any other identifying information.

Averaging

The averaging technique replaces a sensitive value with an aggregated average value or an approximation thereof. For example, instead of masking an individual’s salary, averaging can use the average salary of all employees in the same job category. This provides an approximation of the true value without revealing specific information about an individual.
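
A pandas sketch of the salary example from the paragraph above, with invented figures:

```python
import pandas as pd

df = pd.DataFrame({
    "job_category": ["engineer", "engineer", "analyst"],
    "salary": [48000, 52000, 41000],
})

# Replace each individual salary with its job-category average.
df["salary"] = df.groupby("job_category")["salary"].transform("mean")
print(df)  # both engineers now show 50000.0
```

Note that a single-member category still reveals the individual value, which is why real deployments usually enforce a minimum group size.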

Date Switching

Date switching involves modifying date values by retaining the year, month, and day but mixing them up or replacing them with unrelated dates. This ensures that time-sensitive information cannot be used to identify or trace specific events or individuals while maintaining a consistent date structure.
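
A sketch of one common variant, shifting each date by a random offset; the 90-day window is an arbitrary choice:

```python
import random
from datetime import date, timedelta

random.seed(0)  # reproducible for the example

def switch_date(d: date, max_shift_days: int = 90) -> date:
    """Shift a date by a random offset while keeping a valid date structure."""
    return d + timedelta(days=random.randint(-max_shift_days, max_shift_days))

print(switch_date(date(2023, 3, 15)))  # some date within +/- 90 days
```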

Conclusion

The significant benefit of data masking for businesses is its ability to preserve the informational richness, integrity, and representativeness of data while minimizing the risk of compromising sensitive information. With data masking, companies can successfully address compliance challenges without sacrificing their data strategy.

Data masking empowers organizations to establish secure development and testing environments without compromising the confidentiality of sensitive data. By implementing data masking, developers and testers can work with realistic datasets while avoiding the exposure of confidential information. This enhances the efficiency of development and testing processes while mitigating the risks associated with the utilization of actual sensitive data.

The top 5 benefits of data lineage

Do you have the ambition to turn your organization into a data-driven enterprise? You cannot escape the need to accurately map all your data assets, monitor their quality, and guarantee their reliability. Data lineage can help you accomplish this mission. Here are some explanations.

To know what data you use, what it means, where it comes from, and how reliable it is throughout its life cycle, you need a holistic view of everything that is likely to transform, modify, or alter it. This is exactly the mission of data lineage, a data analysis technique that allows you to follow the path of data from its source to its final use. A technique that has many benefits!

Benefit #1: Improved data governance

Data governance is a key issue for your business and for ensuring that your data strategy can deliver its full potential. By following the path of data – from its collection to its exploitation – data lineage allows you to understand where it comes from and the transformations it has undergone over time to create a rich and contextualized data ecosystem. This 360° view of your data assets guarantees reliable and quality data governance.

Benefit #2: More reliable, accurate, and quality data

As mentioned above, one of the key strengths of data lineage is its ability to trace the origin of data. However, another great benefit is its ability to identify the errors that occur during its transformation and manipulation. Hence, you are able to take measures to not only correct these errors but also ensure that they do not reoccur, ultimately improving the quality of your data assets. A logic of continuous improvement that is particularly effective for the success of your data strategy.

Benefit #3: Quick impact analysis

Data lineage accurately identifies data flows, making sure problems never go unnoticed for long. The first phase is based on detailed knowledge of your business processes and your available data sources. Once critical data flows are identified and mapped, it is possible to quickly analyze the potential impacts of a given transformation on data or a business process. With the impacts of each data transformation assessed in real time, you have all the information you need to identify the ways and means to mitigate the consequences. Visibility, traceability, reactivity – data lineage saves you precious time!
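
As a toy sketch of the underlying idea, lineage can be modeled as a directed graph and impacts read off as downstream nodes, here with the third-party networkx library and invented dataset names:

```python
import networkx as nx  # third-party: pip install networkx

# Hypothetical lineage: source -> staging -> mart -> dashboard.
lineage = nx.DiGraph([
    ("crm.customers", "staging.customers"),
    ("staging.customers", "mart.churn_scores"),
    ("mart.churn_scores", "dashboard.retention"),
])

# Impact analysis: everything downstream of a changed source.
print(nx.descendants(lineage, "crm.customers"))
# {'staging.customers', 'mart.churn_scores', 'dashboard.retention'}
```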

Benefit #4: More context to the data

As you have probably understood by now, data lineage continuously monitors the course of your data assets. Therefore, beyond the original source of the data, you have full visibility of the transformations that have been applied to the data throughout its journey. This visibility also extends to the use that is made of the data within your various processes or through the applications deployed in your organization. This ultra-precise tracking of the history of interactions with data allows you to give more context to data in order to improve data quality, facilitate analysis and audits, and make more informed decisions based on accurate and complete information.

Benefit #5: Build (even more!) reliable compliance reports

The main expectations of successful regulatory compliance are transparency and traceability. This is the core value promise of data lineage. By using data lineage, you have all the cards in your hand to reduce compliance risks, improve data quality, facilitate audits and verifications, and reinforce stakeholders’ confidence in the compliance reports produced.

4 best practices for your ESG data strategy

Environmental, Social, and Governance (ESG) is a central topic for CDOs, CFOs, and data managers. In this article, discover the best practices to deploy in your company to deliver effective ESG data reporting.

With massive fires, floods, and major heat waves, 2022 marked a turning point in the global climate crisis. It raised awareness and led companies (and society as a whole) to act in a more responsible and sustainable way. More than a simple trend, the deployment of a relevant ESG strategy is a major challenge for companies.

ESG criteria are used to analyze and evaluate the consideration of sustainable development and long-term issues in corporate strategy. This directly affects the way you manage, administer and operate your data assets. For a long time, ESG governance was based on simple communication, but it is now based on evidence. Evidence that is fed by ESG data.

Investors, partners, customers, and the public at large are now demanding real transparency not only on how organizations are protecting and effectively using data to create value but also on how they are achieving long-term sustainability by focusing on corporate social responsibility and environmental impact as applied to data management.

What is the role of ESG data in companies?

Because companies must demonstrate their commitments to sustainability through facts, ESG data plays a key role. This data is analyzed by independent financial rating agencies that ensure the veracity of companies’ claims. The information declared is cross-checked with other sources from non-governmental organizations, associations, or institutions. ESG data then results in an accurate assessment of a company’s ESG practices within a given industry.

What are the best practices for successful ESG data reporting?

The preparation of efficient and relevant ESG data reporting relies on a precise and demanding methodology. The challenge is to quickly collect the information required for ESG reporting and ensure optimal traceability and rigorous security. To meet the challenge, we must be able to apply a number of good practices.

Centralize data in one place

The foundation for transparent ESG data reporting is the ability to centralize all data in a single location for collection and processing. This centralization is an essential prerequisite for data governance that reflects the spirit driving your company.

Guarantee data traceability (data lineage)

Because the heart of sincere ESG data reporting is data traceability, you must implement a data lineage tool. The latter ensures real-time tracking of data and acts as an aid within your company to ensure that your data emanates from a reliable and controlled source; that the transformations it may have undergone are known, tracked, and legitimate; and that it is available in the right place, at the right time and for the right user.

Implement a data governance policy

Quality, reliability, traceability. These are the three pillars that guarantee the veracity of your ESG data and demonstrate your commitment to sustainable development. These three pillars are united around a key issue: a true data governance policy. Data governance is the overall management of the availability, usability, integrity, and security of data used in your business.

Democratize data access for all (data literacy)

One of the major challenges in guaranteeing the reliability, security, and transparency of ESG data is to ensure that all stakeholders within the company rely on a strong data culture. This data culture allows each employee to position themselves as an essential link in the data quality chain by developing the ability to identify, process, analyze, and interpret data. Also known as Data Literacy, this data culture allows the development of a critical mindset that gives the company’s data its full value.

What is the BCBS 239?

In order for banks to have complete visibility on their risk exposure, the Basel Committee defined 14 key principles that were made into a standard called BCBS 239.

Its objective? Give banks access to reliable and consolidated data. Let’s get into it.

In 2007, the world economy was teetering on the brink of collapse, and a number of supposedly stable banking institutions were pushed to the edge of bankruptcy, a crisis that culminated in the 2008 failure of the American bank Lehman Brothers. In response to a crisis of unprecedented violence, a wind of regulation blew over the world, giving birth to BCBS 239, also known as the Basel Committee on Banking Supervision’s standard number 239.

Published in 2013, the Basel Committee’s standard number 239 was intended to create the conditions for transparency in banking institutions by defining a clear framework for the aggregation of financial risk data. In practice, its objective is to enable financial and banking institutions to produce precise reports on the risks to which they are exposed. BCBS 239 is a binding framework, but one that contributes to the stability of the global financial system, which was severely tested during the 2007 financial crisis.

BCBS 239: a little history

The Basel Committee was created in 1974 at the instigation of the G10 central bank governors. As of 2009, the organization has 27 member countries and is dedicated to strengthening the safety and soundness of the financial system and establishing standards for prudential supervision. 

BCBS 239 is one of the Basel Committee’s most emblematic standards because it is a barrier to the abuses that led to the 2007 crisis. 

Indeed, the growth and diversification of the activities of banking institutions, as well as the multiplication of subsidiaries within the same group, created a certain opacity that generated inaccuracies in the banks’ reporting. 

Inaccuracies that could, once accumulated, represent billions of dollars of vagueness, hindering quick and reliable decision-making by managers. The critical size reached by financial institutions required to guarantee reliable decision making based on consolidated and quality data. This is the very purpose of BCBS 239.

The 14 founding principles of BCBS 239

Although BCBS 239 was published in 2013, the thirty or so G-SIBs (global systemically important banks) that had to comply had until January 1, 2016 to do so. The domestic systemically important banking institutions (also called D-SIBs) had three more years to comply.

Since January 1, 2019, G-SIBs and D-SIBs must therefore comply with the 14 principles set out in BCBS 239. 

Eleven of them are addressed primarily to banking institutions. The other three are addressed to supervisory authorities. The 14 principles of BCBS 239 can be classified into four categories: governance and infrastructure, risk data aggregation capabilities, reporting capabilities, and prudential supervision.

Governance and infrastructure

In the area of governance and infrastructure, there are two principles. The first is the deployment of a data quality governance system to improve financial communication and produce more accurate and relevant reports, in order to speed up decision-making processes and make them more reliable.

The second principle affects the IT infrastructure and requires banks to put in place a data architecture that enables the automation and reliability of the data aggregation chain.

Risk data aggregation capabilities

The section on risk data aggregation capabilities brings together four key principles: data accuracy and integrity, completeness, timeliness, and adaptability.

Four pillars that enable decisions to be based on tangible, reliable and up-to-date information. 

Reporting capabilities

The third component of BCBS 239 concerns the improvement of risk reporting practices.

This is an important part of the standard, which brings together five principles: the accuracy and precision of information; the completeness of information relating to the risks incurred, in order to guarantee a true and fair view of the institution’s risk exposure; the clarity and usefulness of reporting; the frequency of updates; and proper distribution.

These reports must be transmitted to the persons concerned. 

Supervision

The last three principles apply to the control and supervisory authorities. They set out the conditions for monitoring banks’ compliance with the first 11 principles. They also provide for the implementation of corrective actions and prudential measures, and set the framework for cooperation among supervisory authorities.

Thanks to BCBS 239, data becomes one of the levers of stability in a globalized economy!

Data mapping, the key to regulatory compliance

Regardless of the business sector, data management is a key strategic asset for companies. This information is key to innovating on tomorrow’s products and services. In addition, with the rise of new technologies such as Big Data, IoT, and artificial intelligence, organizations are collecting exponential volumes of data, from different sources and in a variety of formats.

Moreover, with increasingly strict data regulations such as the GDPR, data processing now requires the implementation of appropriate security measures to protect against information leaks and abusive processing.

The challenge lies in companies re-appropriating their data assets. In other words, companies are looking for solutions to maintain data mapping that reflects their operational reality.

What is data mapping?

Let’s go back to the basics: data mapping allows users to evaluate and graphically visualize data entry points as well as their processes. There are several types of information to be mapped, such as:

  • The information on data
  • The data processes themselves

Information on data

The idea of data mapping is to work on data semantics (the analysis of word meanings and relations between them). 

This work is not done on the data itself, but rather on metadata. Metadata gives meaning and context to data, which in turn enables a better understanding of it. It can represent the data’s “business” name, its technical name, its location, when it was stored, by whom, etc.

By setting up semantic rules and a common data language through a business glossary, companies can identify and locate their data, and thus facilitate access to data for all employees.

On data processes

Concerning data processing, it is important to identify:

  • data flows: with their sources and destinations,
  • data transformations: all the transformations applied to the data during its processing.

A powerful tool: Data Lineage

Data lineage is defined as the data’s life cycle and shows all of the transformations that took place between its initial state and its final state. 

Data lineage is strongly linked to data mapping and processing; it is essential to see which data are concerned by these processes and be able to analyze the impacts very quickly. For example, if a process anomaly has caused a corruption, it is possible to know which data is potentially affected.

In another case, the mapping from a data point of view must be able to tell which data sets a given piece of data comes from. Thus, one can analyze the impacts of a change in the source data set by quickly finding the related data.

The benefits of implementing data mapping

With a mapping solution, companies can therefore respond to data regulations, in particular the GDPR, by answering these questions:

  • Who? Who is responsible for the data or a processing operation? Who handles data protection? Who are the possible subcontractors?
  • What? What is the nature of the data collected? Is it sensitive data?
  • Why? Can we justify the purpose of collecting and processing the information?
  • Where? Where is the data stored? In what database?
  • Until when? What is the retention period for each category of data?
  • How? How is it stored? What is the framework and what security measures are in place for the secure collection and storage of personal data?

By answering these questions, IT Managers, Data Lab Managers, Business Analysts and Data Scientists are able to make their work on data relevant and efficient.

These highlighted questions allow companies to comply with regulations but also to:

  • Improve data quality: Providing as much information as possible to allow users to know if the data is suitable for use.
  • Make employees more efficient and autonomous in understanding data through graphical and ergonomic data mapping. 
  • Analyze data in depth, so that better decisions can be made based on the data and ultimately become a data-driven organization.

Conclusion

It is by having properly mapped information that an enterprise will be able to leverage its data. Quality data analysis is only possible with data that is properly documented, tracked, and accessible to all. 

Are you looking for a data mapping tool?

You can learn more about our data catalog solution by visiting the links below:

Zeenea Data Catalog

Zeenea Studio – the solution for data managers

Zeenea Explorer – making your data teams’ daily life easier

Or directly schedule an appointment for a demo of our solution.

The DPO in 2019: the results are in!

Since May 2018, the General Data Protection Regulation (GDPR) has required companies to appoint a “DPO”, or Data Protection Officer, within their organization. This new job consists of managing personal data and informing employees of the obligations to be respected with regard to the European regulation.

More than a year after the implementation of these regulations, we at Zeenea organized a workshop with DPOs from different business sectors with one idea in mind: how to help them with their GDPR implementation? We would like to share their feedback with you today.

Current Assessment

To better understand Data Protection Officers and their function, let’s assess their current situation.

The tools

Our audience affirms that the applications used are only a means for implementing governance on data.

Enterprises have nevertheless adopted new tools to help DPOs put GDPR in place. This software is often considered unintuitive and complicated to use. However, some tools manage to stand out:

Among the DPO’s tools, one of the most appreciated ones is the catalog application, mainly for its macro vision of the exchanges between different apps, and the easy and rapid detection of personal information.

At the same time, data catalogs, one of the most recent tools in the market, are starting to reach the DPO community. Investing in these tools is a strategic choice that some participants have already made. The possibility of documenting and tracking the history of information on data by cataloguing company data has indeed convinced them!

Governance

DPOs are well aware that the efforts must be placed on acculturation and raising employee awareness in order to hope for better results.

The search for governance only aims to help the business side understand and assess the risks on the data they handle. Their energy is thus placed on the implementation of effective management and communication of shared rules so that the company acquires the right reflexes. Because yes, data remains a subject that few employees master in business.

Information systems

The heterogeneity of information systems is a “normal” environment with which DPOs are confronted.

They are thus faced with trying by all means to bring information systems into conformity, which very often proves complex and costly from a technical standpoint.

Internationally

We associate the GDPR with DPOs, often forgetting “the rest of the world”.

Many countries, such as Switzerland and the United States, also have their own regulations. DPOs are no exception and neither are their companies!

One thing is certain: the scope of the work is gigantic and requires strong prioritization of subjects. But beyond the priorities linked to urgency, this requires finding the right balance between meeting compliance standards and meeting business requirements!

The challenges of DPOs for 2020

In light of this previous observation, the workshop concluded with 2020 and its new challenges.

Together with them, we drew up a list of “resolutions” for the new year:

    • Invest more in improving the qualification and requirements for data documentation,
    • Integrate more examples on good practices in the employee awareness phase,
    • Provide precise indicators on the use and purpose of the data in order to predict the risks and impacts as soon as possible,
    • Become a stakeholder in the implementation of data governance to guarantee effective data acculturation in the enterprise.

GDPR: What is trending in 2019?

The Big Data market has greatly evolved since the General Data Protection Regulation (GDPR) came into effect on May 25, 2018: new partnerships were formed, new technologies were developed, and start-ups took off.

Nevertheless, this is just the beginning! In 2019 so far, enterprises continue to adapt their data management. This article will delve into the GDPR trends and predictions of 2019.

The 2018 GDPR report

The regulation was certainly the subject of the year! GDPR profoundly changed the way enterprises treated data, including enterprises of the CAC 40 (the French stock market index) as well as SMEs. Overall, we can see that enterprises began making several changes to adjust to GDPR:

Data breach violations

It is now required to declare to the CNIL, within 72 hours, any personal data violation that may cause risk to individuals. Upon receipt, the CNIL will investigate the alert and may close your file or require you to inform the individuals concerned in accordance with certain criteria.

Visit https://www.cnil.fr/en/rights-and-obligations for more information.

The implementation of Data Protection Governance

Few enterprises had true governance around data protection before GDPR; it was entrusted to the legal department or company data protection agents. But ever since the regulation, as per the latest IAPP-EY annual report, over 50% of enterprises have set up an organization dedicated to data protection. According to the CNIL, there are more than 15,000 Data Protection Officers (DPOs), compared to 5,000 company data protection agents before GDPR.

Updating privacy policies

The majority of enterprises also had to proceed with revising their privacy policies and legal notices. They likewise had to update their supplier and partner contracts with new data protection clauses. The tidal wave of mail in our mailboxes around May 25th was certainly proof of GDPR’s importance!

Raising awareness of the regulations within the enterprise

Finally, maybe you’ve noticed that within your enterprise, raising awareness of data protection among collaborators has become important, whether through e-learning modules, training courses, or various internal communications.

2019 GDPR predictions and trends

To protect themselves from huge fines (up to 4% of revenue or €20 million), enterprises are going to have to continue adapting to GDPR. The data authorities, like the CNIL, were very lenient in 2018 but are stricter in 2019. It is also imperative that enterprises acclimatize to regulations, both in Europe and in the United States.

GDPR itself is a 2019 trend; it will soon be considered a global standard. For example, U.S. Senator Ron Wyden of Oregon recently introduced the Consumer Data Privacy Act. Countries like Japan, South Korea, and Tunisia have also adopted regulations similar to GDPR. Mick Levy, Business Innovation Director at Business & Decision said, “Data is an enterprise’s asset, like its human capital or its means of production. We must give ourselves the means to exploit and protect it.” (orange-business.com)

How to implement data governance while adapting to GDPR?

As mentioned above, good data governance is nowadays obligatory to properly organize, search, retrieve, and protect data. Zeenea offers you a data catalog that is capable of centralizing your enterprise’s data knowledge in one intuitive platform to help you become a data-driven enterprise and to construct data governance in an agile and lean start-up mode. 

For more information or to request a demo: https://zeenea.com/fr/contact/

    GDPR: An additional burden for the data industry?

    The concern of companies about the challenges of implementing the GDPR is very real. Will we still be able to do business as usual starting in May 2018? What will be the technical and, above all, the financial impacts of this compliance?

    The GDPR, a gray area for enterprises

    Let’s face it, there is still a “Y2K-like bug” effect with the arrival of the GDPR… Many enterprises perceive GDPR as an additional burden in the data industry, which is already far from easy. They find themselves in a gray area, trying to implement this regulation while avoiding the heavy penalties for non-compliant companies.

    Yes, but…

    The GDPR must be seen as an opportunity to reach a certain maturity in terms of governance and data control. Above all, it means establishing a contract of trust between data subjects and data controllers. Without a doubt, this contract of trust will benefit everyone!

    For instance, individuals are rather reluctant to give their personal information to companies. However, numerous studies show that in the context of a new deal where personal data are delivered for a specific purpose and can be restored or deleted at any time, users are willing to share their personal data. It is, therefore, an opportunity to offer value-added services to customers – a give and take.

    For organizations, the GDPR will bring greater confidence as well as an excellent reputation for processing data, which will result in greater engagement.

    Rethink your data management

    The GDPR is also an opportunity to check up on data in enterprises:

    • Clean up erroneous data.
    • Avoid (costly) over-acquisition of data.
    • Establish or improve data governance.
    • Implement best practices around Big Data and Data Science initiatives.

    Thus, this new control and governance of data will help you extract the best insights from your data so that you can create the highest value from them.

    Holding companies accountable

    What does the legal jargon of the GDPR, which we have detailed in our series of articles “GDPR – the legal bases”, mean in the end? It is ultimately a question of making companies responsible for the data they use. Hence, this regulation requires you to ask the right questions:

    • What personal data do I have? Where are they?

    • What are the possible uses for my data?

    • What can I do with my data?

    • Why do I collect them?

    • What result am I trying to achieve?

    Implement technological initiatives

    Thus, the arrival of the GDPR mainly impacts the legal and organizational areas of our business. This regulation will also be the time to implement technological initiatives in our Big Data ecosystems, which will not only help enterprises comply with the regulation but will also have intrinsic value. In our opinion, the first thing that needs to be done is to map the personal data used and stored within your enterprise. Data Catalog tools can be the beginning of such a response.

    GDPR: Main Content of the European Regulation

    This article is an introduction to the General Data Protection Regulation (GDPR) in the framework of your Big Data projects.

    Be careful though! This isn’t going to be about giving legal advice, but rather a refresher course on the changes that GDPR will make.

    The terms of the GDPR to define

    Personal data

    All information relating to a human being (or a data subject) that can be used to identify that person directly or indirectly. With the arrival of the GDPR, this definition was broadened to include online data: names, photos, email addresses, bank details, social networking publications, websites, medical information, IP addresses, location data, etc.

    Sensitive data

    Sensitive data is personal data that directly or indirectly reveals political opinions, philosophical or religious beliefs, or trade union membership, or that relates to a person’s health or sexual orientation. It may only be processed with the explicit consent of individuals.

    Data processing

    This broad term refers to any operation carried out on personal data, via automated or non-automated means. Some examples of processing include collection, recording, organization, storage, use, and destruction of personal data.

    Data controller

    A data controller is a person who determines – alone or jointly with others – the purposes and the means of data processing (the collection and processing methods).

    The principles emerging from the GDPR

    Whom does it concern?

    • All companies located in the European Union that process personal data, regardless of their size.
    • All companies not located in the E.U. that process personal data relating to persons located in the European Union.

    The obligation to appoint a DPO

    The GDPR created a position of Data Protection Officer (DPO). Their responsibilities include:

    • Monitor the company’s compliance with regulations
    • Be the point of contact with the Supervisory Authorities as well as those who have questions on personal data processing
    • Advise and inform the company, its employees, and any possible processors.

    The responsibility

    Companies must ensure that they comply with GDPR’s obligations and be able to demonstrate compliance with its principles.

    Valid consent

    The controller must be able to demonstrate that the data subject has given his or her consent.

    Notification of violations

In the event of a data breach, the company is obligated to inform its Supervisory Authority within 72 hours of its discovery.

    Privacy protection from the design stage

The controller must implement data protection measures (pseudonymization, minimization, etc.) from the design stage, i.e., from the moment the means of processing are defined.
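
To make this concrete, here is a minimal sketch of pseudonymization applied at the design stage, in Python: a direct identifier is replaced with a keyed hash before the record flows downstream. The secret key and record layout are invented for the example; real deployments would manage the key in a dedicated secrets store.

import hashlib
import hmac

# Assumption for the example: in practice, keep this key in a vault,
# never alongside the data it protects.
SECRET_KEY = b"example-key-stored-elsewhere"

def pseudonymize(identifier):
    """Derive a stable pseudonym; without the key, the original cannot be recovered."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"email": "jane.doe@example.com", "purchase_total": 42.50}
record["subject_pseudonym"] = pseudonymize(record.pop("email"))
print(record)  # joins on the pseudonym still work, but the email itself is gone

Note that pseudonymized data generally remains personal data under the GDPR, since re-identification is possible for whoever holds the key; pseudonymization reduces risk but does not take the data out of the regulation’s scope.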

The right to object to profiling

Any person may object to the automated processing of their personal data for the purpose of evaluating certain personal aspects relating to a natural person (analysis, prediction, etc.).

    Data portability

Any person concerned by the processing of their data can obtain from the controller a copy of their processed personal data and, where applicable, have this data transferred to a third party.
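
The GDPR expects such copies to be provided in a structured, commonly used, machine-readable format. As a rough illustration, here is a minimal Python sketch that gathers one subject’s records from several stores and emits them as JSON; the store names and record layouts are invented for the example.

import json

def export_subject_data(subject_id, stores):
    """Collect the subject's records from each store into one JSON document."""
    export = {"subject_id": subject_id, "data": {}}
    for store_name, records in stores.items():
        export["data"][store_name] = [
            r for r in records if r.get("subject_id") == subject_id
        ]
    return json.dumps(export, indent=2)

# Hypothetical in-memory stores standing in for real databases:
stores = {
    "orders": [{"subject_id": "u42", "order": "A-1001", "total": 19.99}],
    "newsletter": [{"subject_id": "u42", "opted_in": True}],
}
print(export_subject_data("u42", stores))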

    Sanctions

Violation of basic principles, including the conditions of consent or the rights of the persons concerned, will be subject to a fine of up to €20 million or 4% of annual worldwide turnover, whichever is higher. For example, for a group with an annual worldwide turnover of €2 billion, the applicable ceiling would be 4%, i.e. €80 million.

GDPR: 7 principles to follow when processing personal data

In December 2018, Matthieu Blanc, former VP Product at Zeenea, asked himself: “How will the GDPR change the Big Data world?” In this series of articles, we focus on the legal aspects explained during his talk at XebiCon’17.

Personal data processing must obey these 7 principles:

1) The principle of lawfulness, fairness, and transparency

The law requires that data be collected and processed in a fair and lawful manner, which implicitly requires the data controller to be transparent with the persons concerned.

Let’s go a bit deeper:

• The law guarantees that people who submit their data receive the necessary information about the processing that concerns them.
• It ensures the possibility of personal control over that data.
• The data controller is obligated to inform the people concerned as soon as the data is collected and whenever it is transmitted to third parties.

2) The principle of purpose

All personal data must be collected and processed for legitimate purposes that correspond to the controller’s or the enterprise’s missions. The misuse of this data is punishable under criminal law.

3) The principle of proportionality

The regulation requires that data be collected only for a specific, clearly defined processing purpose.

For example: in a marketing operation where the last name, first name, and email address are sufficient for the intended processing, collecting the street address, family situation, financial situation, etc. will be judged disproportionate and thus punishable by law.

    4) The principle of relevant data

Enterprises must ensure that the data they hold is accurate and, where necessary, kept up to date.

5) The principle of limited access and retention of data

Personal data cannot be kept for an unlimited period of time in the enterprise’s information systems. A retention period must be established for each file. Once that period has passed, the data must be deleted or anonymized.
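
As an illustration, here is a minimal Python sketch of enforcing such a limit: records older than the retention period lose their identifying fields, while recent records are kept unchanged. The three-year period, field names, and record layout are assumptions made for the example.

from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=3 * 365)  # assumption: a three-year limit for this file

def apply_retention(records, now):
    """Anonymize records older than the retention period; keep the rest as-is."""
    result = []
    for record in records:
        if now - record["collected_at"] > RETENTION:
            # Drop the directly identifying fields and mark the record.
            record = {k: v for k, v in record.items() if k not in ("name", "email")}
            record["anonymized"] = True
        result.append(record)
    return result

now = datetime.now(timezone.utc)
records = [
    {"name": "Jane Doe", "email": "jane@example.com", "collected_at": now - timedelta(days=1500)},
    {"name": "John Roe", "email": "john@example.com", "collected_at": now - timedelta(days=30)},
]
print(apply_retention(records, now))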

6) The principle of security and confidentiality

The regulation reinforces security measures. Enterprises are responsible for the security of the data they process and must implement adequate measures to guarantee it (pseudonymization of data, impact analyses, intrusion tests, etc.).

This means that the person responsible for the processing is bound by these security measures. They must implement them to:

• Guarantee the data’s confidentiality and prevent its disclosure. In other words, the person in charge must ensure that third parties without authorization cannot access the data.
    • Prevent data from being distorted or damaged.
    • Etc.

This responsibility is reinforced by a new principle, “Privacy by Design”. It refers to taking all the necessary steps to protect people’s rights from the design of a product or service onward, and throughout the data’s lifecycle (from collection to deletion).

    Security measures, both physical and logical, must be taken.

For example: fire protection, backup copies, the installation of anti-virus software, frequent password changes, etc. Security measures should be appropriate to the nature of the data and the risks presented by the processing.

7) The principle of responsibility

One of the biggest changes is the principle of responsibility. It obligates enterprises to document all measures and procedures relating to personal data security.

This documentation serves as proof of compliance with the new rules in the event of an administrative check. This measure results in the obligation to maintain a register of processing activities. This register constitutes a database of the processing operations carried out, but it can also serve to centralize and track all the compliance steps implemented by the company.
It also comes with the abolition of the obligation to declare processing to the CNIL beforehand, a measure that reflects the principle governing the GDPR: empowering businesses by developing self-regulation.
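
As an illustration of such a register, here is a minimal Python sketch that keeps each processing activity as a structured record so it can be queried during an audit. The fields loosely follow the items the regulation asks for (purpose, data categories, recipients, retention, security measures), but the values and layout are invented for the example.

from dataclasses import dataclass, field

@dataclass
class ProcessingActivity:
    name: str
    purpose: str
    legal_basis: str
    data_categories: list
    recipients: list
    retention: str
    security_measures: list = field(default_factory=list)

register = [
    ProcessingActivity(
        name="newsletter",
        purpose="Send product news to subscribers",
        legal_basis="consent",
        data_categories=["email address"],
        recipients=["internal marketing team"],
        retention="until consent is withdrawn",
        security_measures=["encryption at rest", "access control"],
    ),
]

for activity in register:
    print(f"{activity.name}: {activity.purpose} (legal basis: {activity.legal_basis})")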

    It is no longer up to regulators to prove that you are in the wrong, but it is up to you to prove that you are in the right!

GDPR: What are enterprises’ main concerns?

In December 2018, we posed the question: Will the GDPR change the Big Data world? In this article series, we return to the legal aspects explained during the XebiCon’17 conference.

    The application scope of the GDPR

While the GDPR applies to “Data Controllers”, i.e. the bodies that determine the purposes and methods of processing personal data, it also extends to “Data Processors”.

The rules and obligations of the GDPR apply to the processing of personal data, whether automated or not.

    Definitions of GDPR terms

First of all, let us agree on the definition of these terms:

    Personal data

    The GDPR provides a precise definition of personal data. It is:

    “Any information relating to an identified or identifiable natural person.”

    An identifiable natural person is understood to be “a natural person who can be identified, directly or indirectly, in particular by reference to an identifier, such as a name, a postal address, an e-mail, or several elements specific to his physical, physiological, genetic, psychological, economic, cultural or social identity”.

This definition has therefore been broadened to include certain online data, such as location data, online identifiers, and identification numbers (device identifiers, cookies, IP addresses, etc.).

    Data processing

This broad term refers to any operation carried out on personal data, whether or not by automated means. Examples of processing include the collection, recording, organization, storage, use, and destruction of personal data.

    Ultimately, the vast majority of European companies are affected by the GDPR’s measures.