GEMA
  • Music Industry
Germany

About our customer

 
GEMA is a German association that represents the interests of over 95,000 members, including composers, lyricists, and publishers. With a mission to ensure that music creators earn what they deserve, GEMA distributes license revenues from public music usage—concerts as well as any form of online usage, including streams, downloads, radio broadcasts, or the use of music in audiovisual production—fairly among its members. As the music business increasingly becomes a data business, GEMA faces the challenge of managing and processing vast amounts of data to ensure fair distribution of revenues.
 

Key facts & figures:

 

Established: 1947

Members: 95,000+ (2024)

Payments to rights holders worldwide: 1.082 bn euros (2024)

 

Challenge

 
A key challenge for GEMA is matching reported public music usage to its musical works database, a process that involves large volumes of data. Every reported use of music must be processed and prepared so that revenues can be distributed fairly.

In 2021, GEMA embarked on a data journey to address significant challenges such as data silos, growing data volumes, and increasing complexity. Martin Zürn, Head of Data Engineering at GEMA, recalls, “We had data in silos, it was hard to combine them, and we couldn’t scale a central data team to handle all of it.” GEMA needed a solution that could decentralize data management and make data accessible across the organization.
 

 

A Meshy Journey Towards Data Decentralization

 
To tackle these challenges, GEMA embarked on a journey towards data decentralization built on three pillars:
 

1 – Building the Data Lake

GEMA’s first step was to build a data platform based on a data lake. Within the first year, the platform integrated the twenty most relevant systems and served over a hundred users. This initiative empowered business units to work independently with data, marking a significant shift away from dependency on a central data team.
 

2 – Implementing a Lean Governance Model

To scale the platform further, so that the data could feed reporting and be consumed by other IT systems, GEMA introduced a lean governance model. This model incorporates the best ideas from data products, data mesh, data fabric, lakehouse architecture, and data marts. Markus Zachai, Head of Data Governance at GEMA, emphasizes, “We needed a governance model to ensure data validity and correctness. A central team would never scale, so we adopted a decentralized approach.” GEMA now has decentralized roles across the organization, in which stakeholders manage data pipelines, data products and their contents, or specific data domains.
 

3 – Introducing Zeenea Data Discovery Platform’s Metadata Catalog

A crucial component of GEMA’s data strategy was the integration of Zeenea’s metadata catalog. The catalog facilitated the independent creation of data contracts and enabled efficient discovery of all data within the organization’s data platform. Martin highlights, “Zeenea stood out for us with its clean, easy-to-use interface. It made it simple for users to understand what data products we have, their sources, and who to contact for more information.”

As GEMA aimed to decentralize data responsibility, Zeenea was not only a tool that helped but also one of the foundations that made it possible.

GEMA’s Data Productization

 

Data Product Definition

GEMA’s data ecosystem revolves around data products, which are typically developed around business objects and consist of one or more tables. Data products are categorized into layers representing different stages of data processing:

Bronze: A raw material, such as a raw copy of a data source.

Silver: An intermediate good, such as a denormalized and cleansed dataset.

Gold: A consumer good, such as a business-level aggregate.
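The three layers can be illustrated with a minimal sketch. It is not GEMA’s actual pipeline: the record fields (work_id, title, plays), the function names, and the sample data are all hypothetical, and the silver step only cleanses (it does not denormalize) to keep the example short.

```python
from collections import defaultdict

# Bronze: raw copy of a (hypothetical) usage-report source, kept as delivered.
bronze_usage_reports = [
    {"work_id": "W1", "title": " Song A ", "plays": "3"},
    {"work_id": "W1", "title": "Song A", "plays": "2"},
    {"work_id": "W2", "title": "song b", "plays": "5"},
    {"work_id": "", "title": "Unknown", "plays": "1"},  # unusable: no work id
]


def to_silver(rows):
    """Cleanse raw rows: drop records without a work id, trim text, cast counts."""
    return [
        {
            "work_id": r["work_id"],
            "title": r["title"].strip().title(),
            "plays": int(r["plays"]),
        }
        for r in rows
        if r["work_id"]
    ]


def to_gold(rows):
    """Aggregate cleansed rows into a business-level total of plays per work."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["work_id"]] += r["plays"]
    return dict(totals)


silver_usage = to_silver(bronze_usage_reports)
gold_plays_per_work = to_gold(silver_usage)
print(gold_plays_per_work)  # {'W1': 5, 'W2': 5}
```

Because each layer is a separate product, the silver dataset in this sketch could be reused by other gold products without touching the raw source again.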

 

Data Product Roles

Each data product is managed by:

A Data Owner: Responsible for the data.

A Data Steward: Possesses domain knowledge of the specific data product.

A Data Custodian: An engineer who implements the actual data pipeline.
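One way to picture how a data product, its layer, and its three roles fit together is a small descriptor record, sketched below. This is an illustrative assumption, not Zeenea’s data model or API: the DataProduct class, the contact_for method, its topic names, and the example addresses are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    BRONZE = "bronze"
    SILVER = "silver"
    GOLD = "gold"


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for one data product and its role holders."""

    name: str
    layer: Layer
    tables: tuple[str, ...]  # a data product consists of one or more tables
    owner: str               # Data Owner: responsible for the data
    steward: str             # Data Steward: domain knowledge of the product
    custodian: str           # Data Custodian: implements the data pipeline

    def contact_for(self, topic: str) -> str:
        """Return the role holder to ask about a topic (hypothetical routing rule)."""
        contacts = {
            "accountability": self.owner,
            "content": self.steward,
            "pipeline": self.custodian,
        }
        return contacts[topic]


product = DataProduct(
    name="member_account_statement",
    layer=Layer.GOLD,
    tables=("statement_lines",),
    owner="owner@example.org",
    steward="steward@example.org",
    custodian="custodian@example.org",
)
print(product.contact_for("pipeline"))  # custodian@example.org
```

Keeping the roles on the product itself, rather than in a central team roster, mirrors the decentralized responsibility described above.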

 

Data Product Milestones

Within a year of launching the platform, GEMA had over 35 data sources and more than 100 data products in production—averaging one new data product every other working day. “Some of them are of low complexity but we also see a lot of data products with high complexity, with more than a thousand lines of code,” explains Markus.

What matters most to GEMA is that every data product is reusable, so business units can leverage complex data products for various use cases. Zeenea’s data catalog serves as the user-friendly interface for GEMA’s data product producers, significantly improving data management and utilization across GEMA.

Real-World Use Cases

 
One notable success is the creation of account statements for GEMA’s members on their website, a task that had been challenging for years before the new data platform. The platform’s efficiency enabled GEMA to develop this use case within six months. Many intermediate products developed along the way have since been reused for other use cases.

By spring 2024, over ten different business services were consuming data products from the platform, enhancing reporting services and enabling advanced machine learning use cases. “This was made possible thanks to the transparency brought by our data catalog,” concludes Markus.
 

GEMA’s Recommendations for Data Mesh Implementation

 
For organizations considering a data mesh approach, GEMA offers the following recommendations:

Mindset Change: Treat data as a vital business asset.

Seamless Integration: Ensure all components of the data platform integrate smoothly.

Understandable Data Products: Make data products discoverable and well-marketed for your end users.

Single Source of Truth: Implement a metadata catalog (Zeenea) for a comprehensive overview of data assets.

“As we wanted to decentralize the responsibility for data in our organization, Zeenea was not only a tool that helped us, it was one of the foundations that made it possible in the first place. The platform has a great user interface; it’s very clean, slick, and easy to use. Our users liked it from day one. It makes it easy for them to figure out what data products we have, where the data is coming from, and in what visualizations the data gets used. Also, they always know who to contact if they have questions about a specific data product.”


Martin ZÜRN
Head of Data Engineering
GEMA

Key people involved

Head of Data Engineering

Head of Data Governance

Data Analysts

Data Scientists

Data Product Owners, Engineers & Custodians

Main Data Sources

Google BigQuery
Google Cloud Storage
Tableau
MongoDB
Oracle
PostgreSQL
SAP
Databricks

35
Number of data sources connected

100+
Number of data products created

10+
Number of business services consuming the data products

SOC 2 Type 2
ISO 27001
© 2024 Zeenea - All Rights Reserved