
Everything you need to know about Data Products

January 18, 2024

In recent years, the data management and analytics landscape has witnessed a paradigm shift with the emergence of the Data Mesh framework. Coined by Zhamak Dehghani in 2019, Data Mesh is a framework that emphasizes a decentralized and domain-oriented approach to managing data. One notable discipline in the Data Mesh architecture is to treat data as a product, introducing the concept of “data products”. However, the term “data product” is often tossed around without a clear understanding of its essence. In this article, we will shed light on everything you need to know about data products and data product thinking.

Shifting to Product Thinking

For organizations to treat data as a product and turn their datasets into data products, teams must first shift to a product-thinking mindset. According to J. Majchrzak et al. in Data Mesh in Action:

Product thinking serves as a problem-solving methodology, prioritizing the comprehensive understanding of user needs and the core problem at hand before delving into the product creation process. The primary objective is to narrow the gap between user requirements and the proposed solution.

In their book, they highlight two main principles:

Love the problem, not the solution: Before embarking on the design phase of a product, it is imperative to gain an understanding of the users and the specific problem being addressed.
Think in products, not features: While there is a natural inclination to concentrate on adding new features and customizing assets, it is crucial to view data as a product that directly satisfies user needs.

Therefore, before unveiling a dataset, adhering to product thinking involves posing essential questions:

What is the problem that you want to solve?
Who will use your data product?
Why are you doing this? What is the vision behind it?
What is your strategy? How will you do it?

Here are some examples of answers to these questions from an excerpt of Data Mesh in Action:

What is the problem that you want to solve? Currently, the production cost statement data is used for direct billing between the production team and finance team. The data file also has costs assigned to categories. This information could be used for more complex analysis and cost comparisons across categories of different productions. Therefore, making this data more widely available for complex analysis makes sense.

Who will use your product? The data analyst will use it to manually analyze and compile production costs and forecast budgets for new productions. The data engineer will use it to import data into the analytical solution.

Why are you doing this? What is the vision behind it? We will create a dedicated and customized solution to analyze the data for production costs and planning activities. Data engineers can use the original files to import historical data.

Read the full excerpt here: https://livebook.manning.com/book/data-mesh-in-action/chapter-5/37

Data Product Definition

The philosophy of product thinking, therefore, urges us to view a data product through a long-term lens, one that entails ongoing development, adaptation based on user feedback, and a commitment to continuous improvement and quality. A product, in turn, can be defined as an object, system, or service made available for consumer use in response to consumer demand. So what makes a data product a data product?

At Zeenea, we define a Data Product as a set of value-driven data assets specifically designed and managed to be consumed quickly and securely while ensuring the highest level of quality, availability, and compliance with regulations and internal policies.

According to Data Mesh in Action, the term “product” is used deliberately in the context of a data mesh and stands in contrast to the term “project” commonly used in organizational initiatives. Creating a data product is not the same as running a project. As Sriram Narayan notes in Products Over Projects, projects are temporary endeavors aimed at achieving specific goals, with a defined endpoint that may not lead to continuity.

Fundamental Characteristics of a Data Product

In How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, Zhamak Dehghani says a data product must exhibit the following essential characteristics:

Discoverable:

Ensuring the easy discoverability of a data product is imperative. A widely adopted approach involves implementing a registry or data catalog containing comprehensive meta-information such as owners, source of origin, lineage, and sample datasets for all available data products.
This centralized discoverability enables data consumers, engineers, and scientists within an organization to locate datasets of interest effortlessly.
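
To make the idea of a registry concrete, here is a minimal sketch of an in-memory catalog in Python. It is an illustration only: the class names, the metadata fields, and the sample entry are assumptions made for this example, and a real data catalog would be a shared, persistent service rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one data product (field names are illustrative)."""
    name: str
    owner: str
    source: str
    lineage: list[str] = field(default_factory=list)
    sample_rows: list[dict] = field(default_factory=list)


class DataCatalog:
    """A toy in-memory registry of data products."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Refuse duplicates so that each product name stays unambiguous.
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already registered")
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[CatalogEntry]:
        """Return entries whose name, owner, or source mentions the keyword."""
        needle = keyword.lower()
        return [
            entry for entry in self._entries.values()
            if needle in entry.name.lower()
            or needle in entry.owner.lower()
            or needle in entry.source.lower()
        ]


catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="production-cost-statements",   # hypothetical product name
    owner="production-team",
    source="billing-files",
    lineage=["raw cost statement files"],
    sample_rows=[{"category": "catering", "cost": 1200.0}],
))
matches = catalog.search("cost")
```

A consumer who searches for "cost" gets back the entry together with its owner, origin, lineage, and a sample row, which is the information needed to decide whether the dataset is worth requesting.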

Addressable:

Once discovered, a data product should possess a unique address following a global convention for programmable access. Organizations, influenced by the storage and format of their data, may adopt diverse naming conventions. In pursuit of user-friendly accessibility, common conventions become imperative in a decentralized architecture.
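
As a rough illustration of a global addressing convention, the sketch below parses and formats addresses of the form dp://<domain>/<product>/v<version>. This scheme is invented for the example and is not a standard; each organization would define its own convention.

```python
import re
from typing import NamedTuple

# Hypothetical convention: dp://<domain>/<product>/v<version>
ADDRESS_PATTERN = re.compile(
    r"dp://(?P<domain>[a-z0-9-]+)/(?P<product>[a-z0-9-]+)/v(?P<version>\d+)"
)


class ProductAddress(NamedTuple):
    domain: str
    product: str
    version: int


def parse_address(address: str) -> ProductAddress:
    """Split an address into its parts, rejecting anything off-convention."""
    match = ADDRESS_PATTERN.fullmatch(address)
    if match is None:
        raise ValueError(f"not a valid data product address: {address!r}")
    return ProductAddress(
        match["domain"], match["product"], int(match["version"])
    )


def format_address(address: ProductAddress) -> str:
    """Render the parts back into the canonical address string."""
    return f"dp://{address.domain}/{address.product}/v{address.version}"


addr = parse_address("dp://finance/production-costs/v2")
```

Because every address follows the same shape, tools can resolve any data product programmatically without knowing how the owning domain stores it.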

Trustworthy and Truthful:

Data product owners must commit to Service Level Objectives regarding the truthfulness of data, requiring a shift from traditional error-prone extractions. Employing techniques such as data cleansing and automated integrity testing during the data product’s creation is crucial to ensure an acceptable level of quality.
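
One way to picture automated integrity testing against a Service Level Objective is sketched below. The check names, the sample rows, and the 99% target are assumptions chosen for illustration; real checks would depend on the dataset and on what the owner has committed to.

```python
from typing import Callable

Row = dict[str, object]
Check = Callable[[Row], bool]


def category_present(row: Row) -> bool:
    """The row names a non-empty cost category."""
    return bool(row.get("category"))


def cost_is_non_negative(row: Row) -> bool:
    """The row carries a numeric cost that is zero or greater."""
    cost = row.get("cost")
    return isinstance(cost, (int, float)) and cost >= 0


CHECKS: list[Check] = [category_present, cost_is_non_negative]


def pass_rate(rows: list[Row], checks: list[Check]) -> float:
    """Fraction of rows that satisfy every check (1.0 for an empty batch)."""
    if not rows:
        return 1.0
    passed = sum(all(check(row) for check in checks) for row in rows)
    return passed / len(rows)


def meets_slo(rows: list[Row], checks: list[Check], target: float = 0.99) -> bool:
    """Compare the measured pass rate with the promised target."""
    return pass_rate(rows, checks) >= target


rows = [
    {"category": "catering", "cost": 1200.0},
    {"category": "", "cost": 300.0},          # fails: empty category
    {"category": "travel", "cost": -50.0},    # fails: negative cost
    {"category": "equipment", "cost": 800.0},
]
rate = pass_rate(rows, CHECKS)
```

Running such checks whenever the data product is built lets the owner block or flag a release that falls below the promised level instead of leaving consumers to discover the errors.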

Self-Describing Semantics and Syntax:

High-quality data products demand a user experience without the need for handholding—they should be independently discoverable, understandable, and consumable. To construct datasets as products with minimal friction for data engineers and scientists, it is essential to articulate the semantics and syntax of the data thoroughly.
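
A simple way to articulate syntax and semantics is to publish a schema alongside the data. The sketch below assumes a hypothetical three-field schema (the field names and the EUR unit are invented for the example) and shows how the same definition can both document the data and check a record against it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Syntax (name, type) and semantics (description, unit) of one field."""
    name: str
    dtype: type
    description: str
    unit: str = ""


# Hypothetical schema for a production cost data product.
SCHEMA = (
    FieldSpec("production_id", str, "Identifier of the production"),
    FieldSpec("category", str, "Cost category"),
    FieldSpec("cost", float, "Cost billed for the category", unit="EUR"),
)


def describe(schema: tuple[FieldSpec, ...]) -> str:
    """Render the schema as human-readable documentation."""
    lines = []
    for spec in schema:
        unit = f" [{spec.unit}]" if spec.unit else ""
        lines.append(f"{spec.name} ({spec.dtype.__name__}){unit}: {spec.description}")
    return "\n".join(lines)


def conforms(record: dict, schema: tuple[FieldSpec, ...]) -> bool:
    """Check that a record has exactly the declared fields and types."""
    expected = {spec.name: spec.dtype for spec in schema}
    return record.keys() == expected.keys() and all(
        isinstance(record[name], dtype) for name, dtype in expected.items()
    )


documentation = describe(SCHEMA)
```

Keeping the description and the validation in one definition means the documentation cannot silently drift away from what the data actually contains.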

Inter-Operable and Governed by Global Standards:

Correlating data across domains in a distributed architecture relies on adherence to global standards and harmonization rules. Governance on standardizations, including field formatting, polysemes identification, address conventions, metadata fields, and event formats, ensures interoperability and meaningful correlation.
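
To illustrate what harmonization rules can look like in practice, the sketch below maps domain-specific field names onto a shared vocabulary and rewrites dates in ISO 8601. The alias table, the accepted date formats, and the sample records are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical global standard: snake_case field names and ISO 8601 dates.
FIELD_ALIASES = {
    "CustomerID": "customer_id",
    "cust_id": "customer_id",
    "OrderDate": "order_date",
    "order_dt": "order_date",
}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def to_iso_date(value: str) -> str:
    """Parse a date in any accepted format and return it as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def harmonize(record: dict) -> dict:
    """Rename fields to the shared vocabulary and normalize date values."""
    result = {}
    for key, value in record.items():
        name = FIELD_ALIASES.get(key, key)
        result[name] = to_iso_date(value) if name.endswith("_date") else value
    return result


# Two domains describing the same order in their own local conventions.
sales = harmonize({"CustomerID": "c-42", "OrderDate": "18/01/2024"})
support = harmonize({"cust_id": "c-42", "order_dt": "2024-01-18"})
```

After harmonization both records use the same field names and date format, so they can be joined on customer_id and order_date without domain-specific glue code.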

Secure and Governed by Global Access Control:

Securing access to product datasets is imperative, whether the architecture is centralized or decentralized. In the realm of decentralized, domain-oriented data products, access control operates at a finer granularity: it is tailored to each domain data product. As in operational domains, access control policies can be defined centrally but are applied dynamically at the time each individual dataset product is accessed. An Enterprise Identity Management system, often facilitated through Single Sign-On (SSO), combined with Role-Based Access Control (RBAC) policies, is a convenient and effective way to implement access control for product datasets.
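
The following sketch shows the general shape of role-based access control applied per data product: policies are declared in one place and evaluated at the moment of access. The users, roles, product name, and actions are hypothetical, and the user-to-role mapping stands in for what an identity provider would supply after SSO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Grants a role a set of actions on one data product."""
    product: str
    role: str
    actions: frozenset[str]


# Centrally defined policies (illustrative values).
POLICIES = (
    Policy("production-costs", "data-analyst", frozenset({"read"})),
    Policy("production-costs", "data-engineer", frozenset({"read", "write"})),
)

# Stand-in for role assignments returned by an identity provider.
USER_ROLES = {
    "alice": {"data-analyst"},
    "bob": {"data-engineer"},
}


def is_allowed(user: str, product: str, action: str) -> bool:
    """Evaluate the central policies at access time for one request."""
    roles = USER_ROLES.get(user, set())
    return any(
        policy.product == product
        and policy.role in roles
        and action in policy.actions
        for policy in POLICIES
    )


alice_can_read = is_allowed("alice", "production-costs", "read")
alice_can_write = is_allowed("alice", "production-costs", "write")
```

Unknown users have no roles and are therefore denied by default, which keeps the sketch on the safe side when a policy is missing.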

Examples of Data Products

A potential data product can take various forms, with different data representations that offer value to users. Here are several examples of technologies containing data products:

Recommendation Engines: Platforms like Netflix, Amazon, and Spotify use recommendation engines as data products to suggest content or products based on user behavior and preferences.

Predictive Analytics Models: Models predicting customer churn, sales forecasts, or equipment failures are examples of data products that provide valuable insights for decision-making.

Fraud Detection Systems: Financial institutions deploy data products to detect and prevent fraudulent activities by analyzing transaction patterns and identifying anomalies.

Personalized Marketing Campaigns: Targeted advertising and personalized marketing campaigns utilize data products to tailor content based on user demographics, behavior, and historical interactions.

Healthcare Diagnostics Tools: Diagnostic tools analyze medical data, such as patient records and test results, to assist healthcare professionals in making accurate diagnoses.
