
What makes a data catalog “smart”? #4 – The search engine

February 16, 2022

A data catalog harnesses enormous amounts of very diverse information, and its volume will grow exponentially. This raises two major challenges:

  • How to feed and maintain the volume of information without tripling (or more) the cost of metadata management?
  • How to find the most relevant datasets for any specific use case?

At Zeenea, we think that a data catalog should be Smart in order to answer these two questions, with smart technological and conceptual features that go beyond the mere integration of AI algorithms.

In this respect, we have identified five areas in which a data catalog can be “Smart”, most of which do not involve machine learning:

  1. Metamodeling
  2. The data inventory
  3. Metadata management
  4. The search engine
  5. User experience

A powerful search engine for an efficient exploration

Given the enormous volumes of data involved in an enterprise catalog, we consider the search engine the principal mechanism through which users can explore the catalog. The search engine needs to be easy to use, powerful, and, most importantly, efficient – the results must meet user expectations. Google and Amazon have raised the bar very high in this respect and the search experience they offer has become a reference in the field. 

This second-to-none search experience can be summed up as follows:

  • I write a few words in the search bar, often with the help of a suggestion system that offers frequent associations of terms to help me narrow down my search.

  • The near-instantaneous response provides results in a specific order and I fully expect to find the most relevant one on page one.

  • Should this not be the case, I can simply add terms to narrow the search down even further or use the available filters to exclude irrelevant results.

Alas, the best currently on offer in the data cataloging market in terms of search capabilities seems to be limited to capable systems for indexing, scoring, and filtering. This approach is satisfactory when the user has a specific idea of what they are looking for (high intent search) but can prove disappointing when the search is more exploratory (low intent search) or when the idea is simply to spontaneously suggest relevant results to a user (no intent).

In short, simple indexing is great for finding information whose characteristics are well known but falls short when the search is more exploratory. The results often include false positives, and the ranking gives too much weight to exact matches.

A multidimensional search approach

We decided from the get-go that a simple indexing system would prove limited and would fall short of providing the most relevant results for users. We therefore chose to isolate the search engine in a dedicated module on the platform and to turn it into a powerful innovation (and investment) zone.

We naturally took an interest in the work of Google's founders on PageRank, their ranking algorithm. Google's search ranking takes into account several dozen aspects (called features), among which are the density of the relations between graph objects (hypertext links in the case of web pages, which is what PageRank itself measures), the linguistic treatment of search terms, and the semantic analysis of the knowledge graph.
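To make the idea of relation density between graph objects more concrete, here is a minimal sketch of a link-based ranking in the spirit of PageRank, computed by power iteration. The graph of catalog objects, the damping factor, and all names are illustrative assumptions. This is not a description of how Zeenea's engine or Google's ranking is actually implemented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank nodes of a directed graph by how densely they are linked to.

    `links` maps each node to the list of nodes it points to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives a small baseline share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical catalog graph: an edge means "uses" or "references".
links = {
    "sales_dashboard": ["orders", "customers"],
    "churn_model": ["customers"],
    "orders": ["customers"],
    "customers": [],
}
ranks = pagerank(links)
most_referenced = max(ranks, key=ranks.get)
```

In this toy graph, the `customers` object is referenced by every other object, so it ends up with the highest rank.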

Of course, we do not have the means Google has, nor its expertise in terms of search result optimization. But we have integrated into our search engine several features that provide a high level of relevant results, and those features are constantly evolving.

We have integrated the following core features:

  • Standard, flat indexing of all the attributes of an object (name, description, and properties), weighted according to the type of property.
  • An NLP (Natural Language Processing) layer that takes near misses into account (typing or spelling errors).
  • A semantic analysis layer that relies on the processing of the knowledge graph.
  • A personalization layer that currently relies on a simple user classification according to their uses, and will in the future be enriched by individual profiling.
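The first two features in this list can be sketched in a few lines. The example below is a minimal illustration only: the field weights, the similarity threshold, and the sample catalog are hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for a real NLP layer. The semantic and personalization layers are not covered.

```python
from difflib import SequenceMatcher

# Hypothetical weights: a match in the name counts more than one in the description.
FIELD_WEIGHTS = {"name": 3.0, "properties": 2.0, "description": 1.0}


def term_similarity(query_term, token, threshold=0.8):
    """Return 1.0 for an exact match, the similarity ratio for a near miss, else 0.0."""
    if query_term == token:
        return 1.0
    ratio = SequenceMatcher(None, query_term, token).ratio()
    return ratio if ratio >= threshold else 0.0


def score(query, item):
    """Sum, per query term and per field, the best token match times the field weight."""
    total = 0.0
    for term in query.lower().split():
        for field, weight in FIELD_WEIGHTS.items():
            tokens = item.get(field, "").lower().split()
            best = max((term_similarity(term, tok) for tok in tokens), default=0.0)
            total += weight * best
    return total


def search(query, catalog):
    """Return the names of matching items, best score first."""
    scored = [(score(query, item), item["name"]) for item in catalog]
    scored.sort(key=lambda pair: -pair[0])
    return [name for value, name in scored if value > 0]


# Hypothetical catalog entries.
catalog = [
    {"name": "customer orders", "description": "daily orders per customer",
     "properties": "sales finance"},
    {"name": "web logs", "description": "raw clickstream with customer identifiers",
     "properties": "marketing"},
    {"name": "hr payroll", "description": "monthly salaries", "properties": "hr"},
]

# The query contains a typo ("custmer" instead of "customer").
results = search("custmer", catalog)
```

Despite the typo, the dataset whose name matches is ranked above the one that only mentions the term in its description, and the unrelated dataset is left out.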

 

Smart filtering to contextualize and limit search results

To complete the search engine, we also provide what we call a smart filtering system. Smart filtering is something we often find on e-commerce websites (such as Amazon, Booking.com, etc.), and it consists of providing contextual filters to narrow down the search results. These filters work in the following way:

  • Only those properties that help reduce the list of results are offered in the list of filters – non-discriminating properties do not show up.
  • Each filter shows its impact – meaning the number of residual results once the filter has been applied.
  • Applying a filter refreshes the list of results instantaneously.
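The three rules above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the property names, and the sample results are hypothetical, and a production system would typically compute these counts inside the search index rather than in application code.

```python
from collections import Counter


def smart_filters(results, properties):
    """Return {property: {value: count}}, keeping only discriminating values.

    The count is the number of results that would remain if the filter were applied.
    """
    total = len(results)
    filters = {}
    for prop in properties:
        counts = Counter(item[prop] for item in results if prop in item)
        # A value shared by every result would not reduce the list, so it is hidden.
        useful = {value: n for value, n in counts.items() if n < total}
        if useful:
            filters[prop] = useful
    return filters


def apply_filter(results, prop, value):
    """Keep only the results whose property has the selected value."""
    return [item for item in results if item.get(prop) == value]


# Hypothetical search results.
results = [
    {"name": "orders", "type": "dataset", "domain": "sales", "format": "table"},
    {"name": "customers", "type": "dataset", "domain": "sales", "format": "table"},
    {"name": "invoices", "type": "dataset", "domain": "finance", "format": "table"},
]
properties = ["type", "domain", "format"]

filters = smart_filters(results, properties)
narrowed = apply_filter(results, "domain", "finance")
remaining_filters = smart_filters(narrowed, properties)
```

Here only `domain` is offered as a filter, because `type` and `format` have the same value on every result. Once a single result remains, no filter is offered at all.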

With this combination of multidimensional search and smart filtering, we feel that we offer a search experience superior to that of any of our competitors. And our decoupled architecture enables us to explore new approaches continuously and rapidly integrate those that prove effective.


For more information on how a Smart search engine enhances a Data Catalog, download our eBook “What is a Smart Data Catalog?”.

At Zeenea, we work hard to create a data fluent world by providing our customers with the tools and services that allow enterprises to be data driven.

© 2024 Zeenea - All Rights Reserved