KGC 2023 Masterclass: Taxonomy-Driven Ontology Design

TL;DR

Taxonomy-driven ontology design starts with controlled vocabularies for consistent tagging and retrieval, then adds ontology semantics for richer querying.

Briefing

Taxonomy-driven ontology design hinges on a practical idea: start with controlled, hierarchical (or faceted) taxonomies for consistent tagging and search, then add an ontology as a semantic “layer” that models relationships, attributes, and constraints so data and content can be queried in richer, multi-step ways. The payoff is fewer missed items, better normalization across synonyms and languages, and the ability to move beyond keyword search into structured discovery—especially when organizations face messy, siloed data with inconsistent naming.

The session frames both taxonomies and ontologies as knowledge organization systems, but draws a sharp line between their roles. A taxonomy is defined as a collection of controlled vocabulary terms organized into a structure—typically hierarchical, sometimes faceted—where each concept has an unambiguous, non-redundant meaning. Control means concepts, not raw strings: terms are governed so that different names (including synonyms, alternative labels, and hidden labels) map to the same concept. That concept-centric approach is what enables precision and recall improvements over keyword search, supports browsing and faceted filtering, and provides consistent metadata for tagging and indexing.

Ontologies, by contrast, are treated as knowledge representation: they model a domain using classes, properties, and semantic relations expressed as subject–predicate–object triples (a theme tied to RDF and semantic web standards). Ontologies add meaning beyond “broader/narrower/related” by specifying semantic relationships between classes and by attaching attributes (data type properties) with constraints that can support reasoning and inference. The result is a model that can connect multiple vocabularies and enable complex queries—like finding contacts based on chains of relationships (e.g., employment, industry membership, and location), not just “aboutness” tags.

A key architectural claim is that an ontology can sit above existing taxonomies and other controlled vocabularies as a semantic layer. In the PoolParty context, the taxonomy supplies the controlled concepts used for tagging and user-facing navigation, while the ontology supplies additional classes and relations that connect those concepts across dimensions. The talk illustrates this with examples such as recipe taxonomies (concepts like “appetizer,” with multilingual labels and scope notes) and an ontology-driven “cocktail” model where relationships like “consists of,” “part of,” and “uses garnish” enable different navigation paths than faceted browsing alone.

The discussion also clarifies why taxonomies and ontologies are often extended together rather than replaced. Taxonomies excel at consistent naming, multilingual synonym management, and retrieval workflows; ontologies add explicit relationships, multi-part search, data-centric modeling, and reasoning. Knowledge graphs then enter as the broader system: instance data extracted from spreadsheets, databases, and other repositories is stored in graph databases (with RDF triple stores emphasized for semantic web alignment), linked to ontology and taxonomy metadata, and used by applications for search, discovery, personalization, and AI-driven recommendations.

Finally, the session offers guidance on ontology building approaches: top-down (starting from a foundation or upper ontology and extending) versus bottom-up (starting from existing taxonomies and controlled vocabularies, importing them as concept schemes, and modeling only what’s needed in the ontology). The practical recommendation is to begin with taxonomies already grounded in real tagging and retrieval use cases, then extend into ontologies where business needs demand richer relationships, attributes, and cross-domain querying.

Cornell Notes

The session distinguishes taxonomies from ontologies by function. Taxonomies organize controlled vocabulary concepts into structured hierarchies or facets to standardize tagging and improve search and browsing through synonym normalization and consistent metadata. Ontologies add a semantic modeling layer—classes, properties, and explicit relations—so systems can answer multi-step questions and support reasoning beyond “aboutness.” A taxonomy-driven approach starts with existing taxonomies (often already used for tagging) and extends them with an ontology that connects those concepts via semantic relations and attributes. This combination feeds enterprise knowledge graphs, where instance data from multiple repositories is stored in graph databases and linked to ontology/taxonomy metadata for richer discovery and applications.

What makes a taxonomy “controlled,” and why does it matter for search?

A taxonomy is controlled because terms are governed by an authority so each concept has an unambiguous, non-redundant definition. The control targets concepts rather than raw strings, which lets multiple names (synonyms, alternative labels, hidden labels, and multilingual labels) map to the same concept ID/URI. That normalization improves retrieval by increasing recall (whichever name a user types, the search still finds the same tagged content) while maintaining precision (tagging and matching occur against the concept, not arbitrary keywords).
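
The label-to-concept mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not PoolParty's implementation: the `example.org` URIs, the label values, the document IDs, and the function names are all hypothetical.

```python
# Hypothetical concept records: one URI per concept, many labels per URI.
CONCEPTS = {
    "http://example.org/concept/appetizer": {
        "pref": {"en": "appetizer", "de": "Vorspeise"},  # preferred label per language
        "alt": ["starter", "hors d'oeuvre"],             # synonyms shown to users
        "hidden": ["appetiser", "apetizer"],             # variants matched but never displayed
    },
    "http://example.org/concept/dessert": {
        "pref": {"en": "dessert", "de": "Nachspeise"},
        "alt": ["sweet course"],
        "hidden": ["desserts"],
    },
}


def build_label_index(concepts):
    """Map every normalized label (any type, any language) to its concept URI."""
    index = {}
    for uri, labels in concepts.items():
        all_labels = list(labels["pref"].values()) + labels["alt"] + labels["hidden"]
        for label in all_labels:
            index[label.casefold()] = uri
    return index


LABEL_INDEX = build_label_index(CONCEPTS)

# Content is tagged with concept URIs, not with free-text keywords.
TAGGED_DOCS = {
    "doc-1": {"http://example.org/concept/appetizer"},
    "doc-2": {"http://example.org/concept/dessert"},
}


def search(term):
    """Resolve a user-entered term to a concept, then return documents tagged with it."""
    uri = LABEL_INDEX.get(term.strip().casefold())
    if uri is None:
        return []  # term is not in the controlled vocabulary
    return sorted(doc for doc, tags in TAGGED_DOCS.items() if uri in tags)
```

Calling `search("Starter")`, `search("Vorspeise")`, or `search("apetizer")` returns the same document because all three strings resolve to one concept URI before any matching happens.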

How do taxonomies and ontologies differ in what they model?

Taxonomies primarily model classification for tagging and navigation—typically broader/narrower structure and faceted grouping. Ontologies model a domain using classes, properties, and semantic relations expressed in triples (subject–predicate–object). Ontologies also support constraints and reasoning, enabling queries that follow explicit relationships (e.g., “contacts located in Austin” plus chains through employment and industry membership), not just filtering by a single tag.
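
A multi-step question of this kind can be sketched with plain subject-predicate-object tuples. The entities and predicate names (`employedBy`, `memberOfIndustry`, `locatedIn`) are invented for illustration, and a Python set stands in for what would normally be an RDF store queried with SPARQL.

```python
# Hypothetical triples: (subject, predicate, object).
TRIPLES = {
    ("alice", "employedBy", "acme"),
    ("bob", "employedBy", "globex"),
    ("carol", "employedBy", "initech"),
    ("acme", "memberOfIndustry", "software"),
    ("globex", "memberOfIndustry", "energy"),
    ("initech", "memberOfIndustry", "software"),
    ("acme", "locatedIn", "austin"),
    ("globex", "locatedIn", "austin"),
    ("initech", "locatedIn", "boston"),
}


def subjects(predicate, obj):
    """Return every subject that has the given predicate pointing at obj."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def contacts_in(industry, city):
    """Follow the chain contact -> employer -> (industry, location)."""
    companies = subjects("memberOfIndustry", industry) & subjects("locatedIn", city)
    return sorted(s for s, p, o in TRIPLES if p == "employedBy" and o in companies)
```

Here `contacts_in("software", "austin")` follows two hops: it first finds companies that satisfy both conditions, then the people employed by them. A single "aboutness" tag on a contact record could not express that chain.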

Why extend a taxonomy into an ontology instead of building an ontology from scratch?

Organizations often already have taxonomies grounded in real tagging and content retrieval needs. Starting from those taxonomies provides a stable, concept-centric foundation for multilingual labels and consistent metadata. The ontology then adds only the additional semantic relationships and attributes required for richer use cases (multi-part search, explicit relationship exploration, and inference). This also reduces the risk of guessing ontology structure that doesn’t match actual business workflows.

What does “ontology as a semantic layer over taxonomies” mean in practice?

The ontology connects and enriches existing concept schemes rather than replacing them. In the PoolParty examples, taxonomy concepts drive tagging and user navigation, while ontology classes and relations add cross-dimension links (e.g., relationships like “consists of” or “goes with” between ingredients, occasions, and dishes). The ontology tends to be smaller than the taxonomy because it models the semantic structure needed for relationships and attributes, while the taxonomy holds the bulk of hierarchical concepts and instances/named entities.
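
One way to picture the layering is a small ontology whose relations are declared once, over top-level concept schemes, and then applied to individual taxonomy concepts. The sketch below assumes that top concepts double as ontology classes, which is a simplification made here rather than something the session prescribes; the cocktail data and the names `consistsOf` and `usesGarnish` are illustrative.

```python
# Taxonomy layer: concept -> broader concept (None marks a top concept).
BROADER = {
    "cocktails": None, "margarita": "cocktails", "mojito": "cocktails",
    "ingredients": None, "spirits": "ingredients",
    "tequila": "spirits", "rum": "spirits", "lime juice": "ingredients",
    "garnishes": None, "salt rim": "garnishes", "mint sprig": "garnishes",
}

# Ontology layer: relation -> (domain, range), expressed over top concepts.
RELATIONS = {
    "consistsOf": ("cocktails", "ingredients"),
    "usesGarnish": ("cocktails", "garnishes"),
}


def top_concept(concept):
    """Walk the broader chain up to the top concept."""
    while BROADER[concept] is not None:
        concept = BROADER[concept]
    return concept


def assert_relation(statements, subject, relation, obj):
    """Add a statement only if it respects the relation's domain and range."""
    domain, range_ = RELATIONS[relation]
    if top_concept(subject) != domain or top_concept(obj) != range_:
        raise ValueError(f"{subject} {relation} {obj} violates {domain} -> {range_}")
    statements.add((subject, relation, obj))


STATEMENTS = set()
for s, r, o in [
    ("margarita", "consistsOf", "tequila"),
    ("margarita", "consistsOf", "lime juice"),
    ("margarita", "usesGarnish", "salt rim"),
    ("mojito", "consistsOf", "rum"),
    ("mojito", "consistsOf", "lime juice"),
    ("mojito", "usesGarnish", "mint sprig"),
]:
    assert_relation(STATEMENTS, s, r, o)


def cocktails_with(relation, obj):
    """Navigate from an ingredient or garnish back to the cocktails that use it."""
    return sorted(s for s, r, o in STATEMENTS if r == relation and o == obj)
```

The ontology part here is two relation declarations, while the taxonomy holds eleven concepts, which mirrors the size difference noted above. `cocktails_with("consistsOf", "lime juice")` gives a navigation path that the broader/narrower hierarchy alone does not provide.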

How do knowledge graphs fit into the taxonomy–ontology–data picture?

A knowledge graph is presented as taxonomy/ontology plus instance data stored in a graph database. Instance data (values and records from spreadsheets, databases, and repositories) is extracted and stored as graph data (with RDF triple stores emphasized for semantic web alignment). The ontology and taxonomies provide the schema/terminology metadata that links those instances to controlled concepts, enabling search and discovery across heterogeneous structured and unstructured sources.
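
The extraction step can be sketched as follows, assuming a small spreadsheet export with inconsistent naming. The column names, the `ex:`, `concept:`, and `company:` prefixes, and the label map are hypothetical, and an in-memory set of tuples stands in for a graph database or RDF triple store.

```python
import csv
import io

# Hypothetical spreadsheet export with inconsistent values.
SPREADSHEET = """id,name,industry,city
c1,Acme Corp,Software,Austin
c2,Globex,IT software,austin
c3,Initech,Energy,Boston
"""

# Taxonomy metadata: normalized label -> controlled concept identifier.
LABEL_TO_CONCEPT = {
    "software": "concept:software",
    "it software": "concept:software",
    "energy": "concept:energy",
    "austin": "concept:austin",
    "boston": "concept:boston",
}

# Ontology metadata: which property each source column corresponds to.
COLUMN_TO_PROPERTY = {"industry": "ex:memberOfIndustry", "city": "ex:locatedIn"}


def rows_to_triples(text):
    """Turn each row into triples, linking free-text values to controlled concepts."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"company:{row['id']}"
        triples.add((subject, "rdf:type", "ex:Company"))
        triples.add((subject, "rdfs:label", row["name"]))
        for column, prop in COLUMN_TO_PROPERTY.items():
            concept = LABEL_TO_CONCEPT.get(row[column].strip().casefold())
            if concept is not None:  # skip values with no controlled concept
                triples.add((subject, prop, concept))
    return triples


GRAPH = rows_to_triples(SPREADSHEET)


def companies(prop, concept):
    """Find instances linked to a concept through a given property."""
    return sorted(s for s, p, o in GRAPH if p == prop and o == concept)
```

Because "Software" and "IT software" both resolve to `concept:software`, `companies("ex:memberOfIndustry", "concept:software")` returns both `company:c1` and `company:c2` despite the inconsistent source values.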

What are the two main approaches to building ontologies mentioned, and when would each help?

Top-down starts from an upper/foundation ontology and extends with domain-specific subclasses, relations, and attributes. Bottom-up starts from existing taxonomies/controlled vocabularies, imports them as concept schemes, and models only what’s needed in the ontology (often leaving the full hierarchy and most named entities in the taxonomy layer). The talk favors bottom-up when organizations already have taxonomies tied to real tagging and retrieval use cases.

Review Questions

Explain how synonym management in a taxonomy improves precision and recall compared with keyword search.
Give an example of a question that requires ontology-style semantic relations rather than taxonomy-only browsing.
Describe how instance data, ontologies, and taxonomies combine inside an enterprise knowledge graph.

Key Points

1. Taxonomy-driven ontology design starts with controlled vocabularies for consistent tagging and retrieval, then adds ontology semantics for richer querying.
2. A taxonomy’s control targets concepts (not just strings), enabling multilingual and synonym normalization through preferred/alternative/hidden labels.
3. Ontologies model classes, properties, and semantic relations using triple-based statements, supporting constraints and reasoning beyond broader/narrower navigation.
4. An ontology can act as a semantic layer that connects and enriches existing taxonomies and other controlled vocabularies rather than replacing them.
5. Knowledge graphs combine instance data stored in graph databases with ontology/taxonomy metadata to enable cross-repository discovery and applications.
6. Bottom-up ontology building is often practical when organizations already have taxonomies grounded in real business tagging and search needs.

Highlights

Taxonomies standardize “aboutness” for tagging and search by mapping the many names users employ, in any language, to the same controlled concept.

Ontologies add explicit semantic relations and attributes so systems can answer multi-step questions that faceted browsing can’t handle alone.

Enterprise knowledge graphs are framed as instance data + ontology/taxonomy metadata stored in graph databases, enabling discovery across heterogeneous repositories.

Topics

Taxonomy vs Ontology
Controlled Vocabularies
Semantic Relations
Knowledge Graphs
RDF Standards

Mentioned

PoolParty
Semantic Web Company
SharePoint
Adobe Experience Manager
Heather Hedden
ANSI
NISO
SKOS
RDF
RDFS
OWL
URI

KGC 2023 Masterclass: Taxonomy-Driven Ontology Design — Heather Hedden, PoolParty