The arrival of Big Data did not simplify how enterprises work with data. The volume and variety of data, along with the number of data storage systems, keep growing rapidly.

To illustrate this, Matt Turck publishes the Big Data Landscape. Updated every year, this infographic shows the key players in the various sub-domains of the Big Data ecosystem.

[Figure: Matt Turck (FirstMark), Big Data Landscape 2018]

As a result of the Big Data revolution, it has become even more difficult to answer basic questions related to data mapping:

  • What are the most relevant datasets and tables for my use cases and my organization?

  • Do I have sensitive data? How is it used?

  • Where does my data come from? How has it been transformed?

  • What will be the impact on my datasets if they are transformed?

These are the questions that information systems managers, Data Lab managers, Data Analysts, and even Data Scientists ask themselves in order to deliver efficient and relevant data analysis.

Among other things, answering these questions allows enterprises to:

  • Improve data quality: Providing as much information as possible allows users to know if the data is suitable for use.

  • Comply with European regulations (GDPR): identify personal data and the processing carried out on it.

  • Make employees more efficient and autonomous in understanding data through graphical and ergonomic data mapping.

To put these into action, companies must build what is called data lineage.
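To make the idea concrete, here is a minimal sketch of how data lineage could be represented as a directed graph of datasets and transformations. It is an illustration only, not a description of any particular product: the `LineageGraph` class, its method names, and the dataset names are all hypothetical. The graph can then answer three of the questions above: where a dataset comes from, what a change would impact, and where sensitive data ends up.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory lineage graph: nodes are datasets,
    edges are transformations from a source to a target."""

    def __init__(self):
        self._downstream = defaultdict(set)  # source -> direct targets
        self._upstream = defaultdict(set)    # target -> direct sources
        self._sensitive = set()              # datasets flagged as sensitive

    def add_transformation(self, source, target):
        """Record that `target` is produced (in part) from `source`."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    def mark_sensitive(self, dataset):
        """Flag a dataset as containing sensitive (e.g. personal) data."""
        self._sensitive.add(dataset)

    @staticmethod
    def _walk(start, edges):
        """Breadth-first traversal returning every node reachable from `start`."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def origins(self, dataset):
        """Where does this dataset come from? (all upstream datasets)"""
        return self._walk(dataset, self._upstream)

    def impacted(self, dataset):
        """What is affected if this dataset changes? (all downstream datasets)"""
        return self._walk(dataset, self._downstream)

    def sensitive_usage(self):
        """For each sensitive dataset, the datasets derived from it."""
        return {d: self.impacted(d) for d in self._sensitive}


# Hypothetical example: two sources feeding one analytics table.
graph = LineageGraph()
graph.add_transformation("crm.customers", "staging.customers_clean")
graph.add_transformation("staging.customers_clean", "analytics.churn_features")
graph.add_transformation("web.events", "analytics.churn_features")
graph.mark_sensitive("crm.customers")

print(sorted(graph.origins("analytics.churn_features")))
print(sorted(graph.impacted("crm.customers")))
print(graph.sensitive_usage())
```

A real lineage tool would also need to capture the transformation logic itself, column-level dependencies, and metadata harvested automatically from the storage systems, none of which this sketch attempts.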