A data catalog harnesses enormous amounts of very diverse information – and its volume will grow exponentially. This will raise 2 major challenges:
- How to feed and maintain the volume of information without tripling (or more) the cost of metadata management?
- How to find the most relevant datasets for any specific use case?
At Zeenea, we think that a data catalog should be Smart in order to answer these 2 questions, with smart technological and conceptual features that go beyond the mere integration of AI algorithms.
In this respect, we have identified 5 areas in which a data catalog can be “Smart” – most of which do not involve machine learning.
A data catalog should also be smart in the experience it offers to its different pools of users. Indeed, one of the main challenges in deploying a data catalog is its level of adoption by those it is meant for: data consumers. User experience plays a major role in this adoption.
User experience within the data catalog
User experience design usually starts with identifying personas, whose behavior and objectives we try to model in order to provide them with a slick and efficient graphical interface. Pinning down personas for a data catalog is challenging: it is a universal tool that provides added value for any company, regardless of its size, sector of activity, or location.
Rather than attempting to model personas that are hard to define, it’s possible to handle the situation by focusing on the issue of data catalog adoption. Here, there are two user populations that stand out:
- Metadata producers who feed the catalog and monitor the quality of its content – this population is generally referred to as Data Stewards;
- Metadata consumers who use the catalog to meet their business needs – we will call them Users.
These two groups are not totally unrelated to each other of course: some Data Stewards will also be Users.
The challenges of enterprise-wide catalog adoption
The real value of a data catalog resides in large-scale adoption by a substantial pool of (meta)data consumers, not just the data management specialists.
The pool of data consumers is very diverse. It includes data experts (engineers, architects, data analysts, data scientists, etc.), business people (project managers, business unit managers, product managers, etc.), and compliance and risk managers. More generally, all operational managers are likely to leverage data to improve their performance.
Data Catalog adoption by Users is often slowed down for the following reasons:
- Data catalog usage is sporadic: Users will log on from time to time to obtain very specific answers to specific queries. They rarely have the time or patience to go through a learning curve on a tool they will only use periodically – weeks can go by between sessions.
- Not everyone has the same stance on metadata. Some will focus more on technical metadata, others will focus heavily on the semantic challenges, and others might be more interested in the organizational and governance aspects.
- Not everybody will understand the metamodel or the internal organization of the information within the catalog. They can quickly feel put off by an avalanche of concepts that feel irrelevant to their day-to-day needs.
The Smart Data Catalog attempts to jump these hurdles in order to accelerate catalog adoption. Here is how Zeenea meets these challenges.
How Zeenea facilitates catalog adoption
The first solution is the graphical interface. The Users’ learning curve needs to be as short as possible. Indeed, the User should be up and running without the need for any training. To make this possible, we made a number of choices.
The first choice was to provide two different interfaces, one for the Data Stewards and one for the Users:
- Zeenea Studio: the management and monitoring tool for the catalog content – an expert tool solely for the Data Stewards.
- Zeenea Explorer: for the Users – it provides them with the simplest search and exploration experience possible.
Our approach is aligned with the user-friendly principles of marketplace solutions – the recognized specialists in catalog management (in the general sense). These solutions usually offer two applications. The first, a “back office” solution, enables the staff of the marketplace (or its partners) to feed the catalog in the most automated manner possible and control its content to ensure its quality. The second application, for the consumers, usually takes the form of an e-commerce website and enables end-users to find articles or explore the catalog. Zeenea Studio and Zeenea Explorer reflect these two roles.
The information is ranked in accordance with the role of the user within the organization
Our second choice is still at the experimental stage and consists in dynamically adapting the information hierarchy in the catalog according to User profiles.
This information hierarchy challenge is what differentiates a data catalog from a marketplace type catalog. Indeed, a data catalog’s information hierarchy depends on the operational role of the user. For some, the most relevant information in a dataset will be technical: location, security, formats, types, etc. Others will need to know the data semantics and their business lineage. Others still will want to know the processes and controls that drive data production – for compliance or operational considerations.
The Smart Data Catalog should be able to dynamically adjust the structure of the information to match these different perspectives.
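To make the idea concrete, here is a minimal sketch of role-dependent ordering. It is an illustration only, not a description of how Zeenea implements the feature: the section names, the role names, the `ROLE_PRIORITIES` mapping and the `render_order` function are all assumptions made up for this example.

```python
from dataclasses import dataclass, field

# Hypothetical metadata sections, loosely named after the three kinds of
# information described above (technical, semantic, process/control).
DEFAULT_ORDER = ("technical", "semantic", "governance")

# Hypothetical role-to-priority mapping. In practice this could be
# configured by Data Stewards or inferred from usage.
ROLE_PRIORITIES = {
    "data_engineer": ("technical", "semantic", "governance"),
    "business_analyst": ("semantic", "governance", "technical"),
    "compliance_officer": ("governance", "semantic", "technical"),
}


@dataclass
class DatasetEntry:
    name: str
    # Maps a section name to the fields displayed in that section.
    sections: dict = field(default_factory=dict)


def render_order(entry: DatasetEntry, role: str) -> list:
    """Return the entry's section names, most relevant first for this role."""
    # Unknown roles fall back to the default order.
    priorities = ROLE_PRIORITIES.get(role, DEFAULT_ORDER)
    # Sections the entry does not have are simply skipped.
    return [name for name in priorities if name in entry.sections]


entry = DatasetEntry(
    name="customer_orders",
    sections={
        "technical": {"format": "parquet"},
        "semantic": {"definition": "One row per confirmed order"},
        "governance": {"owner": "Sales operations"},
    },
)

print(render_order(entry, "compliance_officer"))
print(render_order(entry, "data_engineer"))
```

The same entry is stored once; only the order in which its sections are presented changes with the role of the person looking at it.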
The last remaining challenge is the manner in which the information is organized in the catalog in the form of exploration paths by theme (something similar to shelving in a marketplace). It is difficult to find a structure that suits everybody. Some will explore the catalog along technical lines (systems, applications, technologies, etc.). Others will explore the catalog from a more functional perspective (business domains), and others still from a semantic angle (through business glossaries, etc.).
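One way to picture several exploration paths over the same content is to tag each dataset along every axis and build the "shelves" on demand. The sketch below is an assumption-laden illustration, not Zeenea's data model: the dataset records, the axis names (`system`, `domain`, `glossary_term`) and the `build_path` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical datasets, each tagged along the three axes mentioned above:
# technical (system), functional (domain) and semantic (glossary term).
DATASETS = [
    {"name": "crm_contacts", "system": "crm", "domain": "Sales", "glossary_term": "Customer"},
    {"name": "billing_invoices", "system": "erp", "domain": "Finance", "glossary_term": "Invoice"},
    {"name": "web_leads", "system": "crm", "domain": "Marketing", "glossary_term": "Customer"},
]


def build_path(datasets, axis):
    """Group dataset names by one axis, giving one 'shelving' of the catalog."""
    shelves = defaultdict(list)
    for dataset in datasets:
        shelves[dataset[axis]].append(dataset["name"])
    return dict(shelves)


# The same three datasets, shelved in three different ways.
print(build_path(DATASETS, "system"))
print(build_path(DATASETS, "domain"))
print(build_path(DATASETS, "glossary_term"))
```

No single grouping is privileged here: each User can pick the axis that matches the way they think about the data.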
The challenge of having everyone agree on a sole universal classification seems (to us) insurmountable. The Smart Data Catalog should be adaptable and should not ask Users to understand a classification that makes no sense to them. Ultimately, user experience is one of the most important success factors for a data catalog.