Chapter 1 // Ontology

What Exists, and How Do We Talk About It?

Relational databases through the lens of ontology

~20 min read 5 sections 1 exercise

You Already Know This (In the Wrong Language)

Mapping Aristotle's Categories to Data Vault — Hubs as substance, Satellites as accidents, Links as relation — isn't just a metaphor. It reveals a structural isomorphism: two systems, 2,400 years apart, converging on the same architecture because the problem is the same.

The entire field of relational databases rests on that same ontological foundation. This chapter translates what you already understand into the vocabulary the Data Vault community uses.

Three new concepts per section. No more.

✓ Checkpoint: Can you state, in one sentence, why Hubs and Aristotelian substance are structurally equivalent? If yes, you already have the conceptual foundation for everything that follows.

Tables, Rows, Columns: The Ontological Furniture

In 1970, a mathematician named Edgar Codd proposed a way to organize data using relations — a term from set theory meaning "a set of tuples that all share the same structure." In plain language: a table where every row has the same columns. That's why it's called a relational database — not because things are "related to each other" (though they are), but because each table is a mathematical relation.

Table (also: Relation)
Latin tabula — a flat board for writing

A structured collection of data about one type of thing. One table = one category of entity. The Customer table is the category "customer." Think of it as an Aristotelian universal — a class under which particulars fall.

Row (also: Record, Tuple)
From set theory: a tuple is an ordered sequence of values

One specific instance of whatever the table represents. One customer. One order. One flight. If the table is the category (universal), the row is the particular — the individual substance that falls under it.

Column (also: Field, Attribute)
A property shared by every row in the table

Name, City, Email — each is a column. Columns are the accidents in the Aristotelian sense: properties that describe a substance but aren't essential to its identity. Marie's city can change from Paris to Lyon; she's still Marie.

Data Vault Connection

Hubs, Links, and Satellites are all tables. A Hub_Customer is a table. A Sat_Customer_Details is a table. Data Vault tells you which kinds of tables to create and why. The relational database is the medium they live in.

✓ Checkpoint: If someone says "a relation with 50 tuples and 4 attributes" — can you translate that to plain language? (A table with 50 rows and 4 columns.)

Keys: What Makes This Thing This Thing?

In an RPG, every item in your inventory has a unique ID under the hood. You might have two "Iron Swords" — they look identical — but the game engine tracks them as Item #4021 and Item #4022. Without that ID, the engine can't tell them apart. Drop one, and it might delete both.

Databases have the same problem, and the same solution.

Primary Key (PK)
A key unlocks one specific door and no other

A column that uniquely identifies each row. No duplicates allowed. No blanks allowed. Customer_ID "001" points to exactly one customer, forever. This is Aristotelian substance made technically precise: what makes this customer this customer is its key.

Foreign Key (FK)
"Foreign" = belongs to another table, like a foreign passport — a visitor carrying a reference back home

A column in one table that points to the primary key of another table. The Order table has a column called Customer_ID — that's a foreign key pointing to the Customer table. It says: "this order belongs to that customer."

Without keys, a database is just a collection of spreadsheets. Keys create the web of references that lets you ask cross-cutting questions like "show me all orders from customers in Paris." That following-of-references is called a JOIN — which you'll write in Chapter 4.

Data Vault Connection

Every Hub has a primary key — a hash key (generated by running the business key through a hashing algorithm like SHA-256). Links are fundamentally tables of foreign keys: a Link_Customer_Order contains the hash keys of Hub_Customer and Hub_Order, sitting side by side. A Link is pure Aristotelian relation — a record that says "these two things were connected at this point in time."

✓ Checkpoint: What's the difference between a primary key and a foreign key? (A PK identifies a row in its own table; a FK points to a row in another table.)

Normalization: Everything in Its Right Place

"Normalize" comes from "make normal" — bring something into conformity with a standard. In databases, it means: each fact is stored in exactly one place. No redundancy. No contradiction.

Why does this matter? Look at this badly-designed table:

⚠ Redundant Data
Order 1001 | Marie | Paris  | Widget
Order 1002 | Marie | Paris  | Gadget
Order 1003 | Marie | Lyon   | Widget

Marie appears three times. In rows 1–2 she's in Paris, in row 3 she's in Lyon. Did she move? Typo? The database can't tell you. The same fact (Marie's city) is stored in multiple places, and the copies disagree. This is like a D&D character sheet where your HP is written in three different places with three different numbers.

Normalization fixes this by putting each fact in one table where it belongs. Marie's city goes in the Customer table. Orders reference Marie by her key. Change her city once, it's correct everywhere.

Data Vault = Normalization Perfected

Business keys → Hubs (and nowhere else). Relationships → Links (and nowhere else). Descriptive attributes → Satellites (and nowhere else, split by source/rate-of-change). Every fact, exactly once, in exactly the right table. If a customer moves, you add a new row to Sat_Customer_Details. The Hub doesn't change. The Links don't change.

Key Concept

"Data Vault takes normalization to its logical conclusion — it separates identity from relationships from descriptive context, so changes in one area never ripple into the others."

✓ Checkpoint: Why is redundancy dangerous in a database? (Same fact stored in multiple places can lead to contradictions — you lose trust in the data.)

OLTP vs OLAP: Two Different Games

Think of it as the difference between playing the game and analyzing your replays.

OLTP (Online Transaction Processing) is the live game. Your bank processes a transfer. Amazon records your order. A hospital registers a patient. Each operation is small, fast, and specific. Optimized for speed on individual transactions. The "T" stands for Transaction — from Latin transactio, "a driving through." One thing, done, next.

OLAP (Online Analytical Processing) is the replay analysis. "What were total sales by region last quarter?" "Which products are trending down with 25–34 year olds?" These are big, complex questions across millions of records. The "A" stands for Analytical — from Greek analusis, "a breaking apart." Taking a huge dataset apart to find patterns.

Where Data Vault Lives

Source systems (OLTP) → Staging (temporary landing zone) → Raw Data Vault (Hubs, Links, Satellites — the warehouse) → Business Vault (business rules applied) → Information Marts (star schemas for dashboards). ScaleFree's entire business is building that middle layer — the vault between operational systems and business intelligence.

Key Concept

"OLTP systems handle individual transactions fast. OLAP systems answer analytical questions across large datasets. Data Vault is an OLAP architecture — it sits between the source systems and the business intelligence layer."

✓ Checkpoint: Data Vault is an OLAP architecture — designed for analytical workloads, not transactional ones. This distinction is fundamental.

Your Vocabulary in Three Columns

What You Know (Philosophy) Database Term Data Vault
Category (universal)TableHub, Link, or Satellite
Particular (substance)RowOne business key / relationship / attribute snapshot
Accident (property)ColumnAn attribute in a Satellite
Essential identityPrimary KeyHash Key in a Hub
RelationForeign KeyThe contents of a Link
System of categoriesSchemaData Vault Model
CategorizationNormalizationHub / Link / Satellite separation
Day-to-day actsOLTPSource systems
Analysis & reflectionOLAPThe Data Vault warehouse

Exercise: Build a Data Vault in Your Head

Pick a domain — D&D, a video game you know well, a TV show — and identify:

  1. 3 Hubs — core entities with unique identifiers (business keys)
  2. 2 Links — relationships between those entities
  3. 3 Satellites — descriptive attributes that change over time

Then ask yourself: which are the primary keys? Where are the foreign keys? Which facts change and which persist?

If you can walk through this exercise verbally, you're demonstrating that you think in Data Vault — not that you memorized a definition.