KGC 2023 Masterclass: Build a Semantic Layer in a Knowledge Graph — Stardog

6 min read

Based on The Knowledge Graph Conference’s video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.

TL;DR

Semantic knowledge graphs add a semantic layer that connects entities and relationships across disconnected enterprise data sources without requiring full data consolidation.

Briefing

Stardog’s masterclass frames semantic knowledge graphs as the practical way to answer richer business questions across disconnected enterprise data—without forcing every system into a single database. The core idea is that a knowledge graph sits on top of existing data sources and builds a semantic layer: a machine-understandable model of entities, relationships, and meaning that enables downstream analytics, AI, and machine learning to work from consistent definitions.

The session starts with a simple limitation: with only one dataset, organizations can answer narrow questions (like contact details for a single customer). Knowledge graphs matter because they connect that data to other sources—purchases, product categories, locations, and even behavioral signals from social channels—so questions become multi-faceted. The talk connects this to three pillars of modern data management: storage, governance, and analytics. In real enterprises, data is spread across many catalogs and systems, and governance functions are fragmented. The semantic layer is positioned as the unifying “single language” that lets teams work across that sprawl while keeping data physically where it already lives.

Stardog’s approach is built around three foundational components. First is virtualization: data exists everywhere at different speeds and in different technologies, so the solution can’t rely on consolidating everything into one place. Second is the semantic graph: it creates a connected view of concepts across sources (for example, how a Postgres customer table relates to a Databricks lakehouse purchase history) without moving all the underlying data. Third is an inference engine: it encodes meaning and patterns directly in the graph so new relationships can be derived at query time. The emphasis is on making implicit knowledge explicit—so machines can interpret not just data values but what those values mean.

The masterclass then narrows to semantic graph fundamentals. Semantic graphs use nodes and edges where relationships are treated like sentence-like triples (subject–predicate–object), and relationship meaning can include time and qualifiers. The instructor contrasts this with labeled property graphs, noting that the session focuses on semantic graphs.
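
As a concrete illustration of the triple idea, here is a minimal sketch in Python with the rdflib library (the session itself uses Stardog’s tools, and every name in the `ex:` namespace is a hypothetical stand-in):

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.com/")  # hypothetical namespace for illustration
g = Graph()
g.bind("ex", EX)

# Each fact is a subject-predicate-object triple, readable like a sentence:
# "alice purchased order42", "order42 includes the tennis racket product".
g.add((EX.alice, RDF.type, EX.Customer))
g.add((EX.alice, EX.purchased, EX.order42))
g.add((EX.order42, EX.includesProduct, EX.tennisRacket))

print(g.serialize(format="turtle"))
```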

Hands-on work in Stardog Explorer and Designer shows how a “Customer 360” knowledge graph is built iteratively. Participants start by defining concepts (Customer, Product Category, Product, Purchase) and connecting them with relationships (e.g., purchases link customers to products, products belong to categories). Designer’s Query Builder helps craft queries using the ontology/metadata so users can ask questions without learning query syntax. The workflow then moves to publishing the model to a Stardog endpoint and mapping CSV data sources into the graph.
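
To give a rough sense of what that concept model could look like, here is a sketch of the same classes and relationships as RDF/OWL statements via rdflib rather than in Designer’s UI; the names mirror the session’s concepts, but the exact property URIs are assumptions:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, OWL

EX = Namespace("http://example.com/")
model = Graph()
model.bind("ex", EX)

# Concepts from the session: Customer, Product Category, Product, Purchase
for cls in (EX.Customer, EX.ProductCategory, EX.Product, EX.Purchase):
    model.add((cls, RDF.type, OWL.Class))

# Relationships: purchases link customers to products;
# products belong to categories.
model.add((EX.purchased, RDF.type, OWL.ObjectProperty))
model.add((EX.purchased, RDFS.domain, EX.Customer))
model.add((EX.purchased, RDFS.range, EX.Purchase))

model.add((EX.inCategory, RDF.type, OWL.ObjectProperty))
model.add((EX.inCategory, RDFS.domain, EX.Product))
model.add((EX.inCategory, RDFS.range, EX.ProductCategory))
```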

Mapping is presented as a lightweight alternative to classic ETL: instead of moving data, users define correspondences between relational columns and graph concepts, then publish those mappings so Stardog can generate the necessary queries across systems. Auto-mapping accelerates this by suggesting identifiers and attributes based on the data model, while users can add missing properties inline.
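
The correspondence idea can be sketched in a few lines: each CSV column maps to a concept or attribute in the graph. Note that this toy version materializes triples, whereas Stardog’s published mappings keep data in place and generate queries against the source; the column names here are invented:

```python
import csv
import io
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF

EX = Namespace("http://example.com/")
g = Graph()
g.bind("ex", EX)

# Inline stand-in for the workshop CSV; column names are hypothetical.
customers_csv = io.StringIO(
    "customer_id,name,email\n"
    "1,Alice,alice@example.com\n"
)

# Correspondences: each column maps to a graph concept or attribute.
for row in csv.DictReader(customers_csv):
    customer = EX[f"customer-{row['customer_id']}"]
    g.add((customer, RDF.type, EX.Customer))
    g.add((customer, EX.name, Literal(row["name"])))
    g.add((customer, EX.email, Literal(row["email"])))
```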

Finally, the inference engine turns the graph from a static representation into a reasoning system. Rules classify patterns (like “2022 orders” or “large orders”) and infer higher-level concepts (like “sports category shopper”). Inferences are computed on the fly so results update immediately as underlying data changes. The session also highlights explainability via proof trees—answering not only “what is connected” but “why,” including which asserted facts and rules led to the inference.
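
The “computed on the fly” behavior can be approximated with naive forward chaining over SPARQL CONSTRUCT rules, re-derived per query so nothing stale is persisted. This is only a conceptual sketch; Stardog’s engine reasons at query time rather than materializing a working copy like this:

```python
from rdflib import Graph

def query_with_inference(base: Graph, rules: list[str], sparql: str):
    """Evaluate SPARQL against base facts plus rule conclusions,
    recomputed from scratch so results track the current data."""
    working = Graph()
    working += base  # start from the asserted facts only
    # Naive forward chaining to a fixpoint: fire every CONSTRUCT rule
    # until no rule produces a triple the working graph lacks.
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for triple in list(working.query(rule)):
                if triple not in working:
                    working.add(triple)
                    changed = True
    return working.query(sparql)
```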

By the end, the masterclass positions building semantic knowledge graphs as an iterative, software-like process: start small, model the essentials, map data sources, add rules, and expand connectivity. The payoff is a connected view of the enterprise that hides the underlying messiness while delivering consistent, machine-readable meaning for analytics and AI use cases.

Cornell Notes

Semantic knowledge graphs in Stardog are presented as a way to answer richer enterprise questions across disconnected systems by adding a semantic layer on top of existing data. The workflow builds a data model (concepts and relationships), maps enterprise sources into that model without moving data, and then uses an inference engine to derive new relationships from patterns encoded as rules. This reasoning happens at query time, so inferred facts behave like asserted graph data and update immediately when inputs change. Explainability is emphasized through proof trees that show which facts and rules produced an inference. The practical goal is a “Customer 360” style graph that enables downstream analytics and AI/ML to work from consistent meaning rather than isolated tables.

Why does a knowledge graph enable questions that a single dataset can’t?

A single source limits answers to what that dataset contains (e.g., customer contact fields). A knowledge graph connects that customer to other entities—purchases, products, categories, and other behavioral signals—so queries can combine facets. In the masterclass’s framing, this “network effect” of connectivity turns narrow lookups into multi-step questions, such as “which customers bought products in the sports department,” which can then drill down into individual orders and products.
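
That multi-step question could be phrased roughly as the following SPARQL pattern, run here over a tiny in-memory graph (property names such as `ex:purchased` and `ex:inCategory` are hypothetical):

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.com/")
g = Graph()
# A few asserted facts (hypothetical) so the query has something to match.
g.add((EX.alice, RDF.type, EX.Customer))
g.add((EX.alice, EX.purchased, EX.order42))
g.add((EX.order42, EX.includesProduct, EX.tennisRacket))
g.add((EX.tennisRacket, EX.inCategory, EX.Sports))

sports_shoppers = """
PREFIX ex: <http://example.com/>
SELECT DISTINCT ?customer ?product
WHERE {
  ?customer a ex:Customer ;
            ex:purchased ?purchase .
  ?purchase ex:includesProduct ?product .
  ?product  ex:inCategory ex:Sports .
}
"""
for row in g.query(sports_shoppers):
    print(row.customer, row.product)
```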

What are the three pillars Stardog uses to make semantic graphs work in an enterprise?

The talk organizes the approach into (1) virtualization, where data stays where it is across many technologies and speeds; (2) the semantic graph, which creates a unified semantic layer by relating concepts across sources without consolidating all the data; and (3) an inference engine, which derives additional meaning by encoding patterns and rules so new relationships can be inferred at query time.

How does Stardog’s semantic graph differ from labeled property graphs in the session’s framing?

The session focuses on semantic graphs where relationship meaning is part of the graph structure, described using sentence-like triples (subject–predicate–object). It also notes that relationship qualifiers can capture details like time ranges for events (e.g., attended college with start/end dates). Labeled property graphs are mentioned as a common alternative, but the masterclass emphasizes semantic graphs for meaning-driven connectivity.
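
One common RDF pattern for such qualifiers—not necessarily the session’s exact modeling—is an intermediate event node that carries the time range:

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.com/")
g = Graph()

# Instead of a bare "alice attended exampleCollege" edge, an event node
# carries qualifiers such as the start and end dates of the attendance.
attendance = EX["attendance-1"]
g.add((EX.alice, EX.attended, attendance))
g.add((attendance, RDF.type, EX.AttendanceEvent))
g.add((attendance, EX.institution, EX.exampleCollege))
g.add((attendance, EX.startDate, Literal("2015-09-01", datatype=XSD.date)))
g.add((attendance, EX.endDate, Literal("2019-05-30", datatype=XSD.date)))
```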

What does the hands-on workflow look like in practice (model → publish → map → query)?

Participants first define concepts and relationships in Designer (e.g., Customer, Product Category, Product, Purchase; plus relationships like products-in-categories and purchases linking customers to products). They publish the model to a Stardog endpoint, then use Explorer to inspect the model. Next, they map CSV data sources into the graph in Designer by creating correspondences between columns and graph concepts/attributes (with auto-mapping suggestions). Finally, they query using Explorer and Query Builder, which uses the ontology/metadata to generate queries without requiring query-language fluency.

How do inference rules change what the graph can answer?

Rules classify patterns and create higher-level concepts. In the masterclass example, rules define “2022 orders” based on purchase dates, “large orders” based on price thresholds, and then infer “sports category shopper” by linking customers to purchases that involve products in the sports category. These inferred relationships appear in query results as if they were asserted, and Stardog computes them on the fly so changes propagate immediately.
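
The three rules could be sketched as SPARQL CONSTRUCT queries suitable for a query-time mechanism like the one outlined in the Briefing; Stardog expresses rules in its own syntax, and the date bounds, price threshold, and all `ex:` names here are assumptions:

```python
# Hypothetical thresholds and names; the logic of the session's rules,
# sketched as SPARQL CONSTRUCT queries.
RULES = [
    # "2022 orders": purchases dated within 2022
    """
    PREFIX ex:  <http://example.com/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    CONSTRUCT { ?p a ex:Order2022 . }
    WHERE {
      ?p a ex:Purchase ; ex:purchaseDate ?d .
      FILTER (?d >= "2022-01-01"^^xsd:date && ?d <= "2022-12-31"^^xsd:date)
    }
    """,
    # "large orders": purchases above a price threshold (500 is made up)
    """
    PREFIX ex: <http://example.com/>
    CONSTRUCT { ?p a ex:LargeOrder . }
    WHERE { ?p a ex:Purchase ; ex:totalPrice ?price . FILTER (?price > 500) }
    """,
    # "sports category shopper": customers whose purchases include
    # products in the sports category
    """
    PREFIX ex: <http://example.com/>
    CONSTRUCT { ?c a ex:SportsCategoryShopper . }
    WHERE {
      ?c a ex:Customer ; ex:purchased ?p .
      ?p ex:includesProduct ?prod .
      ?prod ex:inCategory ex:Sports .
    }
    """,
]
```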

What does explainability mean in this inference setup?

Explainability is presented as proof trees: when a connection is inferred (for example, why one entity is connected to another), the system can show the chain of asserted facts and the rules used to reach that conclusion. This supports the “why” question, not just the “what,” and helps users trust and audit reasoning outcomes.
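
A flat approximation of a proof tree can be hand-rolled by re-running a rule’s body as a SELECT and printing the asserted facts each match rests on (same hypothetical `ex:` vocabulary as before; Stardog’s actual proof trees are richer and nested):

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.com/")
g = Graph()
g.add((EX.alice, RDF.type, EX.Customer))
g.add((EX.alice, EX.purchased, EX.order42))
g.add((EX.order42, EX.includesProduct, EX.tennisRacket))
g.add((EX.tennisRacket, EX.inCategory, EX.Sports))

# Re-run the "sports category shopper" rule body as a SELECT to surface
# the asserted facts supporting each inference.
explanation = """
PREFIX ex: <http://example.com/>
SELECT ?c ?p ?prod
WHERE {
  ?c a ex:Customer ; ex:purchased ?p .
  ?p ex:includesProduct ?prod .
  ?prod ex:inCategory ex:Sports .
}
"""
for row in g.query(explanation):
    print(f"{row.c} inferred as ex:SportsCategoryShopper because:")
    print(f"  asserted: {row.c} ex:purchased {row.p}")
    print(f"  asserted: {row.p} ex:includesProduct {row.prod}")
    print(f"  asserted: {row.prod} ex:inCategory ex:Sports")
```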

Review Questions

  1. What limitations of isolated datasets does a semantic knowledge graph address, and how does connectivity change the types of questions that can be answered?
  2. Describe the end-to-end workflow for building a Customer 360 knowledge graph in Stardog, including what happens during mapping and publishing.
  3. How does query-time inference differ from precomputing inferred relationships, and why does that matter for data freshness and explainability?

Key Points

  1. Semantic knowledge graphs add a semantic layer that connects entities and relationships across disconnected enterprise data sources without requiring full data consolidation.
  2. Stardog’s approach is built on virtualization, a semantic graph, and an inference engine to handle enterprise sprawl and enable meaning-driven reasoning.
  3. Semantic graphs treat relationships as first-class meaning units (triple-style subject–predicate–object), supporting richer event and relationship context.
  4. Designer supports an iterative modeling workflow: define a minimal set of concepts and relationships, publish, then map additional data sources as requirements expand.
  5. Mapping in Stardog focuses on correspondences between source columns and graph concepts/attributes, so data can remain in place while the semantic layer unifies access.
  6. Inference rules classify patterns and infer new relationships at query time, making implicit knowledge behave like asserted graph data.
  7. Explainability is delivered through proof trees that show which facts and rules produced an inference, supporting auditability and trust.

Highlights

  • The semantic layer is positioned as a “single language” for enterprise data—unifying meaning across many catalogs and systems while keeping data physically distributed.
  • Auto-mapping and inline attribute creation let users evolve the data model during mapping, reducing the need for perfect upfront schema design.
  • Inference rules can turn patterns into new concepts (e.g., “sports category shopper”) and return them as if they were stored facts.
  • Proof trees provide “why” explanations for inferred connections, not just “what” results.
  • The masterclass emphasizes iterative, software-like development: start small, map quickly, then expand connectivity and rules over time.

Topics

  • Semantic Layer
  • Knowledge Graph Modeling
  • Designer Mapping
  • Inference Engine
  • Customer 360

Mentioned

  • Stardog
  • Stardog Explorer
  • Stardog Designer
  • Stardog Studio
  • Mike
  • Ingrid
  • Laura