Metadata through the eyes of Web Giants

March 17, 2020

Data life cycle analysis is an element in data management that enterprises are still struggling to implement.

Organizations at the forefront of data innovation, such as Uber, LinkedIn, Netflix, Airbnb and Lyft, have faced this challenge at scale and recognized the value of metadata in meeting it.

They therefore built their metadata management strategies on dedicated platforms. Often custom-built, these platforms facilitate data ingestion, indexing, search, annotation and discovery in order to maintain high-quality datasets.

The following examples highlight a shared constant: the difficulty, increased by volume and variety, of transforming business data into exploitable knowledge.

Let’s take a look at the analysis and context of these Web giants:

Uber

Every interaction on Uber’s platform, from their ride sharing services to their food deliveries, is data-driven. Through analysis, their data enables more reliable and relevant user experiences.

Uber’s key stats

trillions of Kafka messages a day,
hundreds of petabytes of data in HDFS in data centers,
millions of analytical queries weekly.

However, volume alone is not enough to leverage the information the data represents: to support optimal business decisions, data needs additional context.

To provide additional information, Uber therefore developed “Databook”, the company’s internal platform that collects and manages metadata on internal datasets in order to transform data into knowledge.

Databook is designed to enable Uber employees to effectively explore, discover and use Uber’s data. Databook gives context to their data (its meaning, quality, etc.) and ensures that this context is maintained in the platform for the thousands of employees who want to analyze the data. In short, Databook’s metadata enables data leaders to move from viewing raw data to actionable knowledge.

The article “Databook: Turning Big Data into Knowledge with Metadata at Uber” concludes that one of Databook’s biggest challenges was moving from manual metadata repository updates to automation.
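To make the move from manual updates to automation concrete, here is a minimal sketch of a metadata crawler. It is not Databook's implementation: it assumes a SQLite database as a stand-in for a real data source, and the names `TableMetadata`, `crawl_metadata` and the `trips` table are hypothetical, chosen for illustration only.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class TableMetadata:
    name: str
    columns: dict  # column name -> declared type
    row_count: int
    description: str = ""  # business context, still written by people


def crawl_metadata(conn: sqlite3.Connection) -> dict:
    """Read technical metadata straight from the database instead of typing it in."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        columns = {row[1]: row[2] for row in info}
        (row_count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        catalog[table] = TableMetadata(table, columns, row_count)
    return catalog


# Hypothetical example data: an in-memory database with one table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [(1, "Paris", 12.5), (2, "Lyon", 8.0)],
)
catalog = crawl_metadata(conn)
print(catalog["trips"])
```

Running such a crawler on a schedule keeps column names, types and row counts current without manual edits, while the `description` field leaves room for the business context that people still have to supply.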

Airbnb

At a conference in May 2017, John Bodley, Data Engineer at Airbnb, outlined new issues arising from the company’s growth: a confusing, fragmented data landscape that made it hard to access increasingly important information.
What can we do with all this data collected on a daily basis? How do we turn it into an asset for all Airbnb employees?

A dedicated team set out to develop a tool that would democratize access to data within the company. Their work drew both on the knowledge of the analysts, who understood the critical points, and on that of the engineers, who brought a more technical perspective. At the heart of the project were interviews with employees about the problems they faced.

Two findings emerged from these interviews: employees had difficulty finding the information they needed to work, and information was still shared and held in too tribal a way.

To meet these challenges, Airbnb created Data Portal, a self-service metadata management platform that centralizes and shares this information.

Lyft

Lyft is a ride-sharing service and is Uber’s main competitor in the North American market.

The company found that it was providing data access to its analysts inefficiently. Its thinking focused on making knowledge about data available in order to optimize its processes. Within just a few months, its goal of building a data search interface brought out two major challenges:

Productivity – Whether it’s to create a new model, instrument a new metric, or perform an ad hoc analysis, how can Lyft use this data in the most productive and efficient way possible?
Compliance – When collecting data about an organization’s users, how can Lyft comply with increasing regulatory requirements and maintain the trust of its users?

In their article Amundsen – Lyft’s data discovery & metadata engine, Lyft states that the key does not lie in the data, but in the metadata!

Netflix

As the world leader in video streaming, Netflix naturally treats data exploitation as a major strategic focus.

Given the diversity of their data sources, the video platform wanted to offer a way to federate and interact with these assets from a single tool. This search for a solution led to Metacat.

This tool acts as a layer of access to data and metadata from Netflix data sources. It allows its users to access data from any storage system through three different features:

Adding business metadata: Business metadata can be added via Metacat, either by hand or as user-defined metadata.
Data discovery: The tool publishes schema and business metadata defined by its users in Elasticsearch, facilitating full-text search of information in data sources.
Data Change Notification and Auditing: Metacat records and notifies all changes to metadata from storage systems.
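The three features above can be illustrated with a toy in-memory catalog. This is only a sketch under stated assumptions, not Metacat's API: the class `MiniCatalog`, its methods and the table names are hypothetical, and a naive substring scan stands in for the Elasticsearch full-text index.

```python
from typing import Callable


class MiniCatalog:
    def __init__(self) -> None:
        self._metadata: dict = {}  # table name -> {key: value}
        self._listeners: list = []  # callbacks invoked on every change
        self.audit_log: list = []  # (table, key, value) tuples, oldest first

    def subscribe(self, listener: Callable[[str, str, str], None]) -> None:
        """Register a callback that is notified of every metadata change."""
        self._listeners.append(listener)

    def set_business_metadata(self, table: str, key: str, value: str) -> None:
        """Attach business metadata, record the change, and notify subscribers."""
        self._metadata.setdefault(table, {})[key] = value
        event = (table, key, value)
        self.audit_log.append(event)
        for listener in self._listeners:
            listener(*event)

    def search(self, text: str) -> list:
        """Return tables whose name or metadata values contain the text."""
        needle = text.lower()
        return sorted(
            table
            for table, metadata in self._metadata.items()
            if needle in table.lower()
            or any(needle in value.lower() for value in metadata.values())
        )


catalog = MiniCatalog()
notifications = []
catalog.subscribe(lambda table, key, value: notifications.append(f"{table}.{key} changed"))

catalog.set_business_metadata(
    "prod.viewing_history", "description", "Titles watched per member profile"
)
catalog.set_business_metadata("prod.billing", "owner", "payments team")

print(catalog.search("watched"))
print(notifications)
```

Routing every write through a single method is what makes auditing and notification cheap in this sketch: the change log and the subscriber callbacks see exactly the same events.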

In their blog article “Metacat: Making Big Data Discoverable and Meaningful at Netflix”, the firm confirms that they are far from finished working on their solution!

There are a few more features they have yet to work on to improve the data warehousing experience:

Schema and metadata versioning to provide table history.
Provide contextual information on tables for better data lineage.
Add support for datastores like Elasticsearch and Kafka.
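As an illustration of what schema versioning for table history could look like, here is a minimal sketch. It does not reflect how Netflix implements or plans this feature: the class `SchemaHistory`, its methods and the column names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaHistory:
    versions: list = field(default_factory=list)  # list of {column: type} dicts

    def record(self, schema: dict) -> int:
        """Store the schema if it changed; return the current version (1-based)."""
        if not self.versions or self.versions[-1] != schema:
            self.versions.append(dict(schema))
        return len(self.versions)

    def diff(self, old: int, new: int) -> dict:
        """Compare two recorded versions column by column."""
        before, after = self.versions[old - 1], self.versions[new - 1]
        return {
            "added": sorted(after.keys() - before.keys()),
            "removed": sorted(before.keys() - after.keys()),
            "retyped": sorted(
                col for col in before.keys() & after.keys() if before[col] != after[col]
            ),
        }


history = SchemaHistory()
v1 = history.record({"id": "bigint", "title": "string"})
unchanged = history.record({"id": "bigint", "title": "string"})  # no new version
v2 = history.record({"id": "bigint", "title": "string", "runtime": "int"})
changes = history.diff(v1, v2)
print(changes)
```

Recording a new version only when the schema actually differs keeps the history short and makes each entry a meaningful event in the life of the table.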

Learn more about data discovery solutions in our white paper: “Data Discovery through the eyes of Tech Giants”

Discover the various data discovery solutions developed by large Tech companies, some belonging to the famous “Big Five” or “GAFAM”, and how they helped them become data-driven.


At Zeenea, we work hard to create a data fluent world by providing our customers with the tools and services that allow enterprises to be data driven.
