What makes a data catalog “smart”? #4 – The search engine

February 16, 2022

A data catalog harnesses enormous amounts of very diverse information – and its volume will grow exponentially. This will raise 2 major challenges:

How to feed and maintain the volume of information without tripling (or more) the cost of metadata management?
How to find the most relevant datasets for any specific use case?

At Zeenea, we think that a data catalog should be Smart in order to answer these 2 questions, with smart technological and conceptual features that go beyond the mere integration of AI algorithms.

In this respect we have identified 5 areas in which a data catalog can be “Smart” – most of which do not involve machine learning:

—

A powerful search engine for an efficient exploration

Given the enormous volumes of data involved in an enterprise catalog, we consider the search engine the principal mechanism through which users can explore the catalog. The search engine needs to be easy to use, powerful, and, most importantly, efficient – the results must meet user expectations. Google and Amazon have raised the bar very high in this respect and the search experience they offer has become a reference in the field.

This second-to-none search experience can be summed up as follows:

I write a few words in the search bar, often with the help of a suggestion system that offers frequent associations of terms to help me narrow down my search.

The near-instantaneous response provides results in a specific order and I fully expect to find the most relevant one on page one.

Should this not be the case, I can simply add terms to narrow the search down even further or use the available filters to cancel out the non-relevant results.

Alas, the best currently on offer in the data cataloging market in terms of search capabilities seems to be limited to capable indexing, scoring, and filtering systems. This approach is satisfactory when the user has a specific idea of what they are looking for (high-intent search) but can prove disappointing when the search is more exploratory (low-intent search) or when the idea is simply to spontaneously suggest relevant results to a user (no intent).

In short, simple indexing is great for finding information whose characteristics are well known but falls short when the search is more exploratory. The results often include false positives, and exact matches are over-represented at the top of the ranking.

A multidimensional search approach

We decided from the get-go that a simple indexing system would prove limited and would fall short of providing the most relevant results for users. We therefore chose to isolate the search engine in a dedicated module on the platform and to make it a major area of innovation (and investment).

We naturally took an interest in the work of the founders of Google on PageRank, their algorithm. PageRank itself scores pages by the density of the relations between graph objects (hypertext links in the case of internet pages); Google's overall ranking combines it with several dozen other aspects (called features), amongst which are the linguistic treatment of search terms and the semantic analysis of the knowledge graph.
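To make the link-based part of this idea concrete, here is a minimal sketch of the classic PageRank power iteration, applied to a small graph of catalog objects. The object names, the damping factor of 0.85, and the iteration count are assumptions chosen for illustration; this is not a description of how the Zeenea search engine is implemented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score nodes by link structure.

    `links` maps each node to the list of nodes it points to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the same small base score.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node shares its current rank equally among its targets.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical catalog graph: reports and dashboards reference tables.
links = {
    "sales_report": ["customer_table", "orders_table"],
    "churn_dashboard": ["customer_table"],
    "orders_table": ["customer_table"],
    "customer_table": [],
}
ranks = pagerank(links)
for name, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

In this toy graph the most referenced object, `customer_table`, ends up with the highest score, which is the intuition behind using relation density as one ranking feature among others.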

Of course, we do not have the means Google has, nor its expertise in terms of search result optimization. But we have integrated into our search engine several features that provide a high level of relevant results, and those features are constantly evolving.

We have integrated the following core features:

Standard, flat indexing of all the attributes of an object (name, description, and properties), weighted according to the type of property.
An NLP (Natural Language Processing) layer that accounts for near misses (typing or spelling errors).
A semantic analysis layer that relies on the processing of the knowledge graph.
A personalization layer that currently relies on a simple user classification according to their uses, and will in the future be enriched by individual profiling.
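As an illustration of the first two features in the list above, the following sketch combines per-field weights with typo-tolerant matching. It is a simplified assumption, not the actual engine: the weights, the 0.8 similarity threshold, the sample catalog, and the use of Python's standard `difflib` for near-miss detection are all chosen for the example. The semantic and personalization layers are not covered.

```python
from difflib import SequenceMatcher

# Assumed weights: a match in the name counts more than one in the description.
FIELD_WEIGHTS = {"name": 3.0, "description": 1.0, "properties": 0.5}


def term_similarity(query_term, token):
    """Return a similarity ratio between 0.0 and 1.0 (1.0 means identical)."""
    return SequenceMatcher(None, query_term, token).ratio()


def score(query, item, fuzzy_threshold=0.8):
    """Sum weighted matches of each query term against each indexed field."""
    total = 0.0
    for term in query.lower().split():
        for field, weight in FIELD_WEIGHTS.items():
            tokens = item.get(field, "").lower().split()
            best = max((term_similarity(term, tok) for tok in tokens), default=0.0)
            # Near misses (typos) still count, but slightly less than exact hits.
            if best >= fuzzy_threshold:
                total += weight * best
    return total


def search(query, items):
    """Return item names ordered by decreasing score, dropping non-matches."""
    scored = [(score(query, item), item["name"]) for item in items]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [name for value, name in ranked if value > 0]


# Hypothetical catalog entries.
catalog = [
    {"name": "customer orders",
     "description": "daily order lines per customer",
     "properties": "sales crm"},
    {"name": "web sessions",
     "description": "clickstream sessions joined to customer ids",
     "properties": "marketing"},
    {"name": "supplier invoices",
     "description": "accounts payable documents",
     "properties": "finance"},
]

# The query contains a typo ("custmer") and still finds the relevant entries.
results = search("custmer", catalog)
print(results)
```

The entry whose name matches the misspelled term ranks first, while the entry that only mentions it in its description ranks second, which reflects the weighting by type of property.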

Smart filtering to contextualize and limit search results

To complete the search engine, we also provide what we call a smart filtering system. Smart filtering is something we often find on e-commerce websites (such as Amazon, booking.com, etc.) and it consists of providing contextual filters to narrow down the search results. These filters work in the following way:

Only those properties that help reduce the list of results are offered in the list of filters – non-discriminating properties do not show up.
Each filter shows its impact – meaning the number of residual results once the filter has been applied.
Applying a filter refreshes the list of results instantaneously.
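The three rules above can be sketched as follows. This is a minimal illustration under assumed names (`smart_filters`, `apply_filter`, and the sample result set are hypothetical), not the product's implementation: it counts values per property, hides properties that would not narrow the list, and recomputes the filters after one is applied.

```python
from collections import Counter


def smart_filters(results, properties):
    """Return {property: {value: count}} for discriminating properties only."""
    filters = {}
    for prop in properties:
        counts = Counter(r[prop] for r in results if prop in r)
        # A value shared by every result would not reduce the list: hide it.
        narrowing = {v: c for v, c in counts.items() if c < len(results)}
        if narrowing:
            filters[prop] = narrowing
    return filters


def apply_filter(results, prop, value):
    """Keep only the results whose property equals the selected value."""
    return [r for r in results if r.get(prop) == value]


# Hypothetical search results.
results = [
    {"name": "customer orders", "type": "dataset", "domain": "sales"},
    {"name": "customer churn", "type": "dataset", "domain": "marketing"},
    {"name": "customer returns", "type": "dataset", "domain": "sales"},
]

# "type" is identical everywhere, so only "domain" is offered, with counts.
filters = smart_filters(results, ["type", "domain"])
print(filters)

# Applying a filter refreshes both the result list and the remaining filters.
narrowed = apply_filter(results, "domain", "sales")
print([r["name"] for r in narrowed])
print(smart_filters(narrowed, ["type", "domain"]))
```

The count attached to each value is the number of results that would remain once that filter is applied, which is the "impact" shown next to each filter.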

With this combination of multidimensional search and smart filtering, we feel that we offer a search experience superior to that of any of our competitors. And our decoupled architecture enables us to explore new approaches continuously and to rapidly integrate those that prove effective.

For more information on how a Smart search engine enhances a Data Catalog, download our eBook, “What is a Smart Data Catalog?”
