We have been interacting with knowledge graphs for quite some time, whether through personalized shopping recommendations on websites such as Amazon and Zalando, or through our favorite search engine, Google.
However, the concept remains a challenge for many data and analytics managers, who struggle to aggregate and link their business assets and to take advantage of them the way these web giants do.
Gartner makes the same point in its article “How to Build Knowledge Graphs That Enable AI-Driven Enterprise Applications”:
“Data and analytics leaders are encountering increased hype around knowledge graphs, but struggle to find meaningful use cases that can secure business buy-in.”
In this article, we will define the concept of a knowledge graph by illustrating it with the example of Google and highlight how it can empower a data catalog.
What is a knowledge graph exactly?
According to GitHub, a knowledge graph is a type of ontology that depicts knowledge in terms of entities and their relationships in a dynamic and data-driven way, in contrast to static ontologies, which are very hard to maintain.
Here are other definitions of a knowledge graph by various experts:
- A “means of storing and using data, which allows people and machines to better tap into the connections in their datasets.” (Datanami)
- A “database which stores information in a graphical format – and, importantly, can be used to generate a graphical representation of the relationships between any of its data points.” (Forbes)
- “Encyclopedias of the Semantic World.” (Forbes)
Through machine learning algorithms, a knowledge graph provides structure for all your data and enables the creation of multilateral relations across your data sources. This structure becomes richer as new data is introduced: more relations are created and more context is added, which helps your data teams make informed decisions using connections they might never have found otherwise.
The idea of a knowledge graph is to build a network of objects, and more importantly, create semantic or functional relationships between the different assets.
Within a data catalog, a knowledge graph is therefore what represents different concepts and what links objects together through semantic or static links.
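To make the idea of a network of objects and links concrete, here is a minimal sketch in Python that stores relationships as (subject, predicate, object) triples. The class name, asset names, and link types (`describes`, `derived_from`, `stored_in`) are hypothetical and chosen only for illustration; they are not taken from any particular data catalog product.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny in-memory graph of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()
        self._outgoing = defaultdict(list)  # subject -> [(predicate, object)]

    def add(self, subject, predicate, obj):
        """Record one relationship; adding the same triple twice is a no-op."""
        triple = (subject, predicate, obj)
        if triple not in self.triples:
            self.triples.add(triple)
            self._outgoing[subject].append((predicate, obj))

    def related(self, subject, predicate=None):
        """Objects linked from `subject`, optionally filtered by link type."""
        return [
            obj
            for pred, obj in self._outgoing.get(subject, [])
            if predicate is None or pred == predicate
        ]

    def subjects_of(self, predicate, obj):
        """All subjects that point at `obj` through `predicate`, sorted."""
        return sorted(s for s, p, o in self.triples if p == predicate and o == obj)


# Hypothetical catalog assets and links, for illustration only.
kg = KnowledgeGraph()
kg.add("customer_orders", "describes", "Customer")       # semantic link
kg.add("crm_contacts", "describes", "Customer")          # semantic link
kg.add("customer_orders", "derived_from", "raw_orders")  # lineage link
kg.add("customer_orders", "stored_in", "sales_db")

print(kg.related("customer_orders", "describes"))
print(kg.subjects_of("describes", "Customer"))
```

In this sketch, asking which assets describe the concept "Customer" surfaces two datasets that have no direct link to each other, which is the kind of connection a flat list of tables would not reveal.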
Google’s search algorithm relies on a knowledge graph of this kind to gather information relevant to end users’ queries and present it to them.
Google’s knowledge graph contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects.
Their knowledge graph enhances Google Search in three main ways:
- Find the right thing: Search not only based on keywords but on their meanings.
- Get the best summary: Collect the most relevant information from various sources based on the intent.
- Go deeper and broader: Discover more than you expected thanks to relevant suggestions.
How do knowledge graphs empower data catalog usage?
Combined with a data catalog, knowledge graphs can benefit your enterprise and its data strategy through:
Rich and in-depth search results
Today, many search engines use multiple knowledge graphs in order to go beyond basic keyword-based searching. Knowledge graphs allow these search engines to understand concepts, entities and the relationships between them. Benefits include:
- The ability to provide deeper and more relevant results, including facts and relationships, rather than just documents,
- The ability to form searches as questions or sentences — rather than a list of words,
- The ability to understand complex searches that refer to knowledge found in multiple items using the relationships defined in the graph.
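As a rough illustration of the last point, the sketch below answers a question that no single item can answer alone ("how is this table connected to that one?") by walking the relationships in a small graph with a breadth-first search. The edges, names, and the `max_hops` parameter are assumptions made up for this example.

```python
from collections import deque

# Hypothetical (subject, predicate, object) relationships.
EDGES = [
    ("invoice_table", "describes", "Invoice"),
    ("Invoice", "issued_to", "Customer"),
    ("crm_contacts", "describes", "Customer"),
    ("Customer", "located_in", "Country"),
]


def neighbors(node):
    """Yield (predicate, other_node) pairs, following edges in both directions."""
    for subject, predicate, obj in EDGES:
        if subject == node:
            yield predicate, obj
        elif obj == node:
            yield predicate, subject


def find_path(start, goal, max_hops=3):
    """Breadth-first search: shortest chain of nodes from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # do not expand paths that already used all allowed hops
        for _, nxt in neighbors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_path("invoice_table", "crm_contacts"))
```

Limiting the number of hops is a design choice in this sketch: it keeps results close to the user's starting point instead of returning distant, loosely related items.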
Optimized data discovery
Enterprise data moves from one location to another at the speed of light and is stored in a variety of data sources and storage applications. Employees and partners access this data from anywhere, at any time, so identifying, locating, and classifying your data in order to protect it and gain insights from it should be a priority.
The benefits of knowledge graphs for data discovery include:
- A better understanding of enterprise data, where it is, who can access it and where, and how it will be transmitted,
- Automatic data classification based on context,
- Risk management and regulatory compliance,
- Complete data visibility,
- Identification, classification, and tracking of sensitive data,
- The ability to apply protective controls to data in real time based on predefined policies and contextual factors,
- An adequate assessment of the full data picture.
On the one hand, data discovery helps implement appropriate security measures to prevent the loss of sensitive data and avoid devastating financial and reputational consequences for the enterprise. On the other hand, it enables teams to dig deeper into the data context and identify the specific items that answer their questions.
As mentioned in the introduction, recommendation services are now a familiar component of many online stores, personal assistants and digital platforms.
In a data catalog, these recommendations need to take a content-based approach. Machine learning capabilities combined with a knowledge graph can detect certain types of data and apply tags or statistical rules to that data in order to produce effective, smart asset suggestions.
This capacity is also known as data pattern recognition. It refers to the ability to identify similar assets, and it relies on statistical algorithms and ML capabilities derived from other pattern recognition systems.
This data pattern recognition system helps data stewards with their metadata management:
- Identify duplicates and copy metadata
- Detect logical data types (emails, cities, addresses, and so on)
- Suggest attribute values (recognize documentation patterns to apply to a similar object or a new one)
- Suggest links – semantic or lineage links
- Detect potential errors to help improve the catalog’s quality and relevance
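One simple way to picture logical data type detection is to test sample values from a column against known patterns. The sketch below is only an illustration under stated assumptions: the two regular expressions are deliberately simplified, and the `threshold` parameter and function name are hypothetical. A real catalog would likely combine such rules with statistical or ML-based signals.

```python
import re

# Simplified, illustrative patterns; real-world validation is stricter.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def detect_logical_type(values, threshold=0.8):
    """Return the first pattern name matched by at least `threshold` of the
    non-empty sample values, or None when no pattern is frequent enough."""
    samples = [v.strip() for v in values if v and v.strip()]
    if not samples:
        return None
    for name, pattern in PATTERNS.items():
        matches = sum(1 for v in samples if pattern.fullmatch(v))
        if matches / len(samples) >= threshold:
            return name
    return None


# Hypothetical column samples.
print(detect_logical_type(["ana@example.com", "bob@example.org", "", "eve@example.net"]))
print(detect_logical_type(["2021-01-05", "2020-12-31"]))
print(detect_logical_type(["hello", "world"]))
```

Using a threshold rather than requiring every value to match tolerates a few dirty values in a column, and the values that fail to match are natural candidates to flag as potential errors.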
The idea is to use some techniques that are derived from content-based recommendations found in general-purpose catalogs. When the user has found something, the catalog will suggest alternatives based both on their profile and pattern recognition.
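A content-based suggestion of this kind can be sketched as a similarity score between the asset the user just found, the other assets in the catalog, and the user's profile. The example below uses Jaccard similarity over tags; the catalog contents, the tag sets, and the equal 50/50 weighting between asset similarity and profile similarity are all assumptions made for illustration, not a description of how any specific product ranks suggestions.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: shared items / all distinct items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical catalog: asset name -> descriptive tags.
CATALOG = {
    "customer_orders": {"sales", "customer", "orders"},
    "crm_contacts": {"customer", "contact", "sales"},
    "web_logs": {"traffic", "web"},
    "order_returns": {"orders", "returns", "customer"},
}


def suggest(viewed, user_interests, top_n=2):
    """Rank other assets by similarity to the viewed asset and to the user profile."""
    viewed_tags = CATALOG[viewed]
    scored = []
    for name, tags in CATALOG.items():
        if name == viewed:
            continue
        # Equal weighting is an arbitrary choice for this sketch.
        score = 0.5 * jaccard(viewed_tags, tags) + 0.5 * jaccard(user_interests, tags)
        scored.append((score, name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


print(suggest("customer_orders", {"returns"}))
```

Here a user interested in returns who has just opened `customer_orders` is shown `order_returns` ahead of `crm_contacts`, because the first matches both the viewed asset and the profile while the second matches only the viewed asset.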
Some data catalog use cases empowered by knowledge graphs
- Gathering assets that have been used in, or are related to, causes of failure in digital projects.
- Finding assets with particular interests aligned with new products for the marketing department.
- Generating complete 360° views of people and companies in the sales department.
- Matching enterprise needs to people and projects for HR.
- Finding regulations relating to specific contracts and investment assets in the finance department.
With the never-ending increase of data in enterprises, organizing your information without a strategy means being unable to stay competitive and relevant in the digital age. Ensuring that your data catalog has an enterprise knowledge graph is essential for avoiding the dreaded ‘black box’ effect.
Through a knowledge graph in combination with AI and machine learning algorithms, your data will have more context and will enable you to not only discover deeper and more subtle patterns but also to make smarter decisions.
For more insights on what a knowledge graph is, here is a great article by BARC analyst Timm Grosser: “Linked Data for Analytics?”
Start your data catalog journey with Zeenea
Zeenea is a 100% cloud-based solution, available anywhere in the world with just a few clicks. By choosing Zeenea Data Catalog, you control the costs associated with implementing and maintaining a data catalog while simplifying access for your teams.
The automatic feeding mechanisms, as well as the suggestion and correction algorithms, reduce the overall costs of a catalog and provide your data teams with quality information in record time.