What Exists, and How Do We Talk About It?
Relational databases through the lens of ontology
You Already Know This (In the Wrong Language)
Mapping Aristotle's Categories to Data Vault — Hubs as substance, Satellites as accidents, Links as relation — isn't just a metaphor. It reveals a structural isomorphism: two systems, 2,400 years apart, converging on the same architecture because the problem is the same.
The entire field of relational databases rests on that same ontological foundation. This chapter translates what you already understand into the vocabulary the Data Vault community uses.
Three new concepts per section. No more.
Tables, Rows, Columns: The Ontological Furniture
In 1970, a mathematician named Edgar Codd proposed a way to organize data using relations — a term from set theory meaning "a set of tuples that all share the same structure." In plain language: a table where every row has the same columns. That's why it's called a relational database — not because things are "related to each other" (though they are), but because each table is a mathematical relation.
A structured collection of data about one type of thing. One table = one category of entity. The Customer table is the category "customer." Think of it as an Aristotelian universal — a class under which particulars fall.
One specific instance of whatever the table represents. One customer. One order. One flight. If the table is the category (universal), the row is the particular — the individual substance that falls under it.
Name, City, Email — each is a column. Columns are the accidents in the Aristotelian sense: properties that describe a substance but aren't essential to its identity. Marie's city can change from Paris to Lyon; she's still Marie.
Hubs, Links, and Satellites are all tables. A Hub_Customer is a table. A Sat_Customer_Details is a table. Data Vault tells you which kinds of tables to create and why. The relational database is the medium they live in.
Keys: What Makes This Thing This Thing?
In an RPG, every item in your inventory has a unique ID under the hood. You might have two "Iron Swords" — they look identical — but the game engine tracks them as Item #4021 and Item #4022. Without that ID, the engine can't tell them apart. Drop one, and it might delete both.
Databases have the same problem, and the same solution.
A column that uniquely identifies each row. No duplicates allowed. No blanks allowed. Customer_ID "001" points to exactly one customer, forever. This is Aristotelian substance made technically precise: what makes this customer this customer is its key.
A column in one table that points to the primary key of another table. The Order table has a column called Customer_ID — that's a foreign key pointing to the Customer table. It says: "this order belongs to that customer."
Without keys, a database is just a collection of spreadsheets. Keys create the web of references that lets you ask cross-cutting questions like "show me all orders from customers in Paris." That following-of-references is called a JOIN — which you'll write in Chapter 4.
Every Hub has a primary key — a hash key (generated by running the business key through a hashing algorithm like SHA-256). Links are fundamentally tables of foreign keys: a Link_Customer_Order contains the hash keys of Hub_Customer and Hub_Order, sitting side by side. A Link is pure Aristotelian relation — a record that says "these two things were connected at this point in time."
Normalization: Everything in Its Right Place
"Normalize" comes from "make normal" — bring something into conformity with a standard. In databases, it means: each fact is stored in exactly one place. No redundancy. No contradiction.
Why does this matter? Look at this badly-designed table:
Order 1002 | Marie | Paris | Gadget
Order 1003 | Marie | Lyon | Widget
Marie appears three times. In rows 1–2 she's in Paris, in row 3 she's in Lyon. Did she move? Typo? The database can't tell you. The same fact (Marie's city) is stored in multiple places, and the copies disagree. This is like a D&D character sheet where your HP is written in three different places with three different numbers.
Normalization fixes this by putting each fact in one table where it belongs. Marie's city goes in the Customer table. Orders reference Marie by her key. Change her city once, it's correct everywhere.
Business keys → Hubs (and nowhere else). Relationships → Links (and nowhere else). Descriptive attributes → Satellites (and nowhere else, split by source/rate-of-change). Every fact, exactly once, in exactly the right table. If a customer moves, you add a new row to Sat_Customer_Details. The Hub doesn't change. The Links don't change.
"Data Vault takes normalization to its logical conclusion — it separates identity from relationships from descriptive context, so changes in one area never ripple into the others."
OLTP vs OLAP: Two Different Games
Think of it as the difference between playing the game and analyzing your replays.
OLTP (Online Transaction Processing) is the live game. Your bank processes a transfer. Amazon records your order. A hospital registers a patient. Each operation is small, fast, and specific. Optimized for speed on individual transactions. The "T" stands for Transaction — from Latin transactio, "a driving through." One thing, done, next.
OLAP (Online Analytical Processing) is the replay analysis. "What were total sales by region last quarter?" "Which products are trending down with 25–34 year olds?" These are big, complex questions across millions of records. The "A" stands for Analytical — from Greek analusis, "a breaking apart." Taking a huge dataset apart to find patterns.
Source systems (OLTP) → Staging (temporary landing zone) → Raw Data Vault (Hubs, Links, Satellites — the warehouse) → Business Vault (business rules applied) → Information Marts (star schemas for dashboards). ScaleFree's entire business is building that middle layer — the vault between operational systems and business intelligence.
"OLTP systems handle individual transactions fast. OLAP systems answer analytical questions across large datasets. Data Vault is an OLAP architecture — it sits between the source systems and the business intelligence layer."
Your Vocabulary in Three Columns
| What You Know (Philosophy) | Database Term | Data Vault |
|---|---|---|
| Category (universal) | Table | Hub, Link, or Satellite |
| Particular (substance) | Row | One business key / relationship / attribute snapshot |
| Accident (property) | Column | An attribute in a Satellite |
| Essential identity | Primary Key | Hash Key in a Hub |
| Relation | Foreign Key | The contents of a Link |
| System of categories | Schema | Data Vault Model |
| Categorization | Normalization | Hub / Link / Satellite separation |
| Day-to-day acts | OLTP | Source systems |
| Analysis & reflection | OLAP | The Data Vault warehouse |
Exercise: Build a Data Vault in Your Head
Pick a domain — D&D, a video game you know well, a TV show — and identify:
- 3 Hubs — core entities with unique identifiers (business keys)
- 2 Links — relationships between those entities
- 3 Satellites — descriptive attributes that change over time
Then ask yourself: which are the primary keys? Where are the foreign keys? Which facts change and which persist?
If you can walk through this exercise verbally, you're demonstrating that you think in Data Vault — not that you memorized a definition.