Data Vault Foundations · Chapter 3

SQL: The Question Language

You built the Skyrim Data Vault. Now learn the grammar for asking it questions. SQL isn't code — it's a way of describing what you want to know.

10 sections 5 key concepts 3 exercises Builds on Tutorials 1 & 2

What SQL Actually Is

SQL isn't code. It's a question language. You describe what you want to know, and the database figures out how to find it. You already know the tables — Hub_Character, Sat_Character_Details. SQL is just the grammar for asking them questions.

Language Bridge

SQL stands for Structured Query Language. Three words, each one important:

Structured — has a fixed grammatical form. Every question follows the same sentence pattern. Not freeform — more like a legal filing format than a conversation.

Query — from Latin quaerere, to ask, to seek. Same root as "question." A query IS a question asked to the database.

Language — but not a programming language in the usual sense. Programming languages tell the computer how to do something step by step. SQL tells the database what you want and lets the database figure out the how. The distinction: a recipe (procedural) vs. an order at a restaurant (declarative). You say "give me all Stormcloak characters" — you don't say "open the table, go to row 1, check if faction = Stormcloak..."

4-Layer Explanation
Simple For anyone, no background needed

SQL is how you ask a database to show you specific information. You describe what you want, the database finds it.

Mechanism How it works under the hood

SQL has a fixed sentence structure — you say which columns you want (SELECT), which table to look in (FROM), and optionally a filter condition (WHERE). The database engine reads your query, figures out the fastest way to find matching rows, and returns them as a result set.

Technical Correct terminology, full detail

SQL is a declarative language for relational database management. It operates on tables (relations) through set-based operations — a query defines a result set, not a procedural sequence. The query optimizer translates the declarative statement into an execution plan. The standard is defined by ISO/IEC 9075; dialects exist (PostgreSQL, Snowflake, T-SQL) but core SELECT/FROM/WHERE syntax is universal.

Expert Why this matters for Data Vault

SQL's declarative nature is why it pairs so well with Data Vault's automation philosophy. datavault4dbt generates SQL — it writes the SELECT, JOIN, and INSERT statements that load Hubs, Links, and Satellites. When ScaleFree builds a DV implementation, the SQL is largely generated from metadata, not hand-written. Understanding SQL means you can read and debug what the automation produces — not that you hand-write every query.

Key Concept
"SQL is the language you use to ask a database questions. It stands for Structured Query Language — structured because every query follows a fixed grammatical pattern, and query because it's fundamentally about asking for data back. You describe what you want, and the database figures out how to get it."

SELECT — What Do You Want to See?

In a video game: "Select your character." In SQL: "Select which columns you want the database to show you." Same word. Same action. Choose from the available options.

Language Bridge

Select — everyday meaning: to choose from available options. "Select all that apply" on a form. "Select your starting class" in a character creator.

Why this word for this concept: Because that's literally what it does — you choose which columns (which pieces of information) you want the database to show you. Out of all the columns in a table, you select the ones relevant to your question.

Here's the Hub_Character table you built in Chapter 1. All four columns, five characters:

Hub_Character — Full Table
hub_character_hkbusiness_keyload_daterecord_source
a3f7...Ulfric2026-01-01skyrim_crm
b82c...Lydia2026-01-01skyrim_crm
d91e...Nazeem2026-01-01skyrim_crm
f4a1...Aela2026-01-01skyrim_crm
c7b3...Balgruuf2026-01-01skyrim_crm

What if you only want the character names — not the hash keys, not the dates? You select just that one column:

The Question

"Give me just the character names."

In SQL
SELECT business_key
FROM Hub_Character;
Execute Query
Result — 1 column, 5 rows
business_key
Ulfric
Lydia
Nazeem
Aela
Balgruuf

The other three columns exist in the table but aren't shown — because they weren't selected.

Want two columns? List them with a comma:

SQL
SELECT business_key, load_date
FROM Hub_Character;

SELECT doesn't change the table. It doesn't delete columns or filter rows. It defines what the result looks like. The table underneath stays exactly the same. SELECT is a camera angle, not an edit.

FROM — Where to Look

"Where are you from?" "From the filing cabinet." FROM is the origin. In a database with dozens of tables — Hub_Character, Hub_Weapon, Sat_Character_Details, Link_Character_Weapon — FROM points at the one you're asking about.

Language Bridge

From — everyday meaning: the origin or source. "Where is this from?" "Which table does the data come from?"

Why this word: Because it specifies which table the data comes from. Every Hub, every Satellite, every Link is its own table. FROM Hub_Character means "look in the Hub for characters." FROM Sat_Character_Details means "look in the Satellite that stores character attributes."

In your Skyrim Data Vault, you have multiple tables. FROM is how you point at one. Without FROM, SQL doesn't know where to look — it's like asking "give me the race" without saying which character sheet to check.

Your Skyrim Tables — Each One Is a FROM Target

Hub_Character — who exists (identity)

Sat_Character_Details — what they're like (attributes)

Hub_Weapon — which weapons exist

Link_Character_Weapon — who carries what

Every one of these is a table. Every one can go after FROM.

WHERE — The Filter

WHERE is propositional logic in disguise — same truth tables, same AND/OR/NOT from symbolic logic. The syntax is new. The machinery is identical.

Language Bridge

Where — not spatial "where." Conditional "where." Think of it as: "under what circumstances should a row show up?" WHERE sets the condition that rows must satisfy to be included.

Technically: WHERE is a boolean filter. Every row in the table is tested against the condition. True = included. False = excluded. This is exactly propositional logic — the condition evaluates to true or false for each row, same as truth tables.

Here's Sat_Character_Details — the Satellite with all the character attributes:

Sat_Character_Details — Full Table
hub_character_hkload_daterecord_sourceracelevelfactionhold
a3f7...2026-01-01skyrim_crmNord30StormcloakEastmarch
b82c...2026-01-01skyrim_crmNord25CompanionsWhiterun
d91e...2026-01-01skyrim_crmRedguard10NoneWhiterun
f4a1...2026-01-01skyrim_crmNord28CompanionsWhiterun
c7b3...2026-01-01skyrim_crmNord35JarlWhiterun
The Question

"Show me all the Nords."

In SQL
SELECT race, level, faction
FROM Sat_Character_Details
WHERE race = 'Nord';
Execute Query
Result — 4 rows pass the filter
racelevelfaction
Nord30Stormcloak
Nord25Companions
Nord28Companions
Nord35Jarl

Nazeem (Redguard) excluded — the WHERE condition evaluated to false for that row.

A Harder Question

"Show me Nords above level 25."

In SQL — two conditions joined with AND
SELECT race, level, faction
FROM Sat_Character_Details
WHERE race = 'Nord' AND level > 25;
Execute Query
Result — 3 rows
racelevelfaction
Nord30Stormcloak
Nord28Companions
Nord35Jarl

Lydia excluded — she's level 25, not above 25. Nazeem excluded — not Nord.

The Three-Part Pattern

Every SQL query follows the same sentence:

SELECT
What to show
race, level, faction
FROM
Where to look
Sat_Character_Details
WHERE
Which rows to include
race = 'Nord'

"Give me [SELECT] from [FROM] only if [WHERE]." This is the single most important pattern in SQL Part 1. Every query is a variation on this sentence. WHERE is optional — without it, you get all rows.

Key Concept
"SELECT defines which columns you want to see — like choosing which fields on a character sheet to look at. FROM tells the database which table to look in. WHERE is the filter — it's a boolean condition that each row is tested against. Only rows where the condition is true show up in the result."

Operators in WHERE

Comparison Operators
=
equals
WHERE race = 'Nord'
!= or <>
not equal to
WHERE race != 'Redguard'
<
less than
WHERE level < 20
>
greater than
WHERE level > 25
<=
less than or equal to
WHERE level <= 30
>=
greater than or equal to
WHERE level >= 10
Logical Operators — Same as Propositional Logic
AND
both conditions must be true — conjunction
WHERE race = 'Nord' AND level > 25
OR
at least one condition must be true — disjunction
WHERE faction = 'Stormcloak' OR faction = 'Companions'
NOT
negation
WHERE NOT race = 'Nord'

These are the same connectives from propositional logic. AND = conjunction (both true). OR = disjunction (at least one true). NOT = negation. Same truth table behavior as symbolic logic — the syntax is new, the machinery is identical.

Special Operators
LIKE
pattern matching with wildcards — % = any characters, _ = one character
WHERE business_key LIKE 'U%' — starts with U (matches Ulfric)
IN
matches any value in a list — shorthand for multiple ORs
WHERE faction IN ('Stormcloak', 'Companions')
BETWEEN
within a range (inclusive on both ends)
WHERE level BETWEEN 20 AND 30 — levels 20 through 30
IS NULL
checks for missing/unknown values (more on this in the NULL section)
WHERE faction IS NULL

String Literals and SELECT *

The Quoting Rule

Values (the actual data you're comparing against) go in single quotes. Column names and table names do not get quotes:

'Nord'
Value — single quotes
race
Column name — no quotes
Why This Matters

Without quotes, the database thinks you're referring to a column name. WHERE race = Nord would make the database look for a column called Nord — which doesn't exist. WHERE race = 'Nord' tells it you're looking for the literal text value "Nord."

Single quotes = "I mean this actual value." No quotes = "I mean this column or table." That's the whole rule for now.

SELECT * — "Give Me Everything"

The asterisk * is a wildcard — it means "all columns."

SQL
SELECT *
FROM Hub_Character;

Returns all four columns, all five rows. Useful for exploring — "what does this table even look like?" But in production, you name the columns you need. It's more readable and performs better.

Key Concept
"SELECT * is fine for exploration, but in production you name the columns you need — it's more readable and performs better."

AS, ORDER BY, LIMIT

Column Aliases (AS)

Language Bridge

Alias — from Latin alius, meaning "other." An alternate name for the same thing. A character alias in a spy movie. An alias in a witness protection program.

In SQL: AS gives a column a different name in the result, without changing the actual column name in the table. The alias exists only in this particular query's output.

The Question

"Show me character names and when they first appeared, but use human-readable column headers."

In SQL
SELECT business_key AS character_name,
       load_date AS first_seen
FROM Hub_Character;
Execute Query
Result
character_namefirst_seen
Ulfric2026-01-01
Lydia2026-01-01
Nazeem2026-01-01
Aela2026-01-01
Balgruuf2026-01-01

The column headers say character_name and first_seen instead of business_key and load_date. Same data, better labels. The table itself is unchanged.

ORDER BY — Sorting the Result

Language Bridge

Order by — "arrange according to some criterion." "Order by date" on a filing system. "Sort by relevance" on a search engine. It sorts the result rows according to a column's values.

Ascending (default) — lowest first
SELECT race, level, faction
FROM Sat_Character_Details
ORDER BY level;
Descending — highest first
SELECT race, level, faction
FROM Sat_Character_Details
ORDER BY level DESC;

You can sort by multiple columns: ORDER BY faction, level DESC sorts by faction first, then by level (highest first) within each faction.

ORDER BY changes the order of rows in the result. It doesn't filter (that's WHERE) and it doesn't change which columns appear (that's SELECT). Each clause has one job.

LIMIT — "Just the First N"

Language Bridge

Limit — a restriction on quantity. "Speed limit." "Limit 2 per customer." It restricts how many rows the result returns.

The Question

"Who are the top 3 highest-level characters?"

In SQL
SELECT race, level, faction
FROM Sat_Character_Details
ORDER BY level DESC
LIMIT 3;
Execute Query
Result — top 3
racelevelfaction
Nord35Jarl
Nord30Stormcloak
Nord28Companions

Dialect note: LIMIT is PostgreSQL/MySQL/Snowflake syntax. SQL Server uses TOP. The concept is identical. For practical purposes, LIMIT is fine — ScaleFree and most modern data platforms work primarily with Snowflake, which supports LIMIT.

The Full Clause Order

SELECT what columns to show Required
FROM which table to look in Required
WHERE which rows to include Optional
ORDER BY how to sort the result Optional
LIMIT how many rows to return Optional

This order is fixed — SQL requires these clauses in this sequence. The minimum query is SELECT ... FROM ...;

NULL — The Unknown

NULL doesn't mean zero. It doesn't mean blank. NULL means "we don't know." The database is being honest — this value is missing or unknown, and pretending otherwise would be a lie.

Language Bridge

Null — from Latin nullus, meaning "none." But in SQL, the everyday meaning is misleading. NULL does NOT mean zero or empty string. It means unknown or missing. The database doesn't know what the value is.

Consider Sat_Character_Details. If a character's faction is NULL, it doesn't mean they have no faction. It means the source system didn't provide that information. Big difference — especially for auditability, which is the whole point of Data Vault.

The IS NULL Rule

You CANNOT use = NULL. It doesn't work the way you'd expect. Because NULL means "unknown," the question "does unknown equal unknown?" can't be answered — it's not true OR false, it's unknown. So = NULL never evaluates to true.

WRONG vs CORRECT
-- WRONG (always returns no results):
SELECT * FROM Sat_Character_Details WHERE faction = NULL;

-- CORRECT:
SELECT * FROM Sat_Character_Details WHERE faction IS NULL;
4-Layer Explanation — NULL
Simple For anyone

NULL means "we don't know this value." Not zero, not blank — genuinely unknown.

Mechanism Three-valued logic

SQL uses three-valued logic, not two-valued. Every comparison can be TRUE, FALSE, or UNKNOWN. When NULL is involved, the result is UNKNOWN — and UNKNOWN rows don't appear in results. That's why WHERE faction = NULL returns nothing: the comparison produces UNKNOWN, not TRUE.

Technical Codd's relational model

NULL in SQL follows Codd's relational model (E.F. Codd, who invented the relational database). NULL represents the absence of a value in a column. Any arithmetic or comparison operation involving NULL propagates NULL (NULL + 5 = NULL, NULL = NULL yields UNKNOWN). The only way to test for NULL is the IS operator (IS NULL, IS NOT NULL), which is purpose-built for this check.

Expert Edge cases and DV implications

NULL semantics are one of SQL's most debated design decisions. The three-valued logic it introduces causes subtle bugs — NULLs in NOT IN clauses, NULLs in aggregates (COUNT(*) counts NULLs, COUNT(column) doesn't), NULLs in UNIQUE constraints (most databases allow multiple NULLs in a UNIQUE column because NULL != NULL). In Data Vault, Hub columns are NOT NULL by convention — a Hub row with a missing business key is meaningless. Satellites allow NULLs in attribute columns because the source system might not provide every field.

Data Vault connection: Hub columns are always NOT NULL — every Hub row must have a hash key, business key, load date, record source. Satellite attribute columns can be NULL — the source system might not provide race, level, or faction for every character. NULL preserves honesty — better to say "we don't know" than to fill in a guess.

How SQL Connects to Data Vault

SQL is not a separate topic from Data Vault. SQL is how you read the Data Vault model you already built. Hub_Character is a table. SELECT * FROM Hub_Character; reads it.

SQL Concept → Data Vault Connection
SQL ConceptWhat It Touches in DV
SELECT columnsChoosing which Hub/Sat columns to see (hash_key, business_key, race, level...)
FROM tablePointing at a specific Hub, Satellite, or Link
WHERE conditionFiltering rows — "only Nords," "only records from skyrim_crm," "only level > 25"
AS (aliases)Renaming DV-technical column names to human-readable ones
NULLWhat it means when a Satellite attribute has no value (unknown, not zero)
Key Concept
Every Hub, Link, and Satellite is a table in the database. SQL is what reads them. SELECT pulls columns from a Hub or Satellite, WHERE filters to specific records, and JOINs follow the foreign keys that connect Hubs to their Satellites and Links.
After This Chapter
This chapter covers SQL through Data Vault tables — Hubs, Links, and Satellites as the practice domain. By the end, reading a query and explaining what it does should feel natural, and writing basic SELECT/WHERE statements should be comfortable. JOINs and more complex queries come next.
Final Exercise — You Know Enough For This

Read, Write, Debug

Three tasks. Each one uses what you just learned. All completable right here.

Task 1 — Read This Query

What does this query return? Describe it in plain English.

Read this
SELECT race, faction, hold
FROM Sat_Character_Details
WHERE hold = 'Whiterun';

Answer: This query selects the race, faction, and hold columns from the Sat_Character_Details table, filtering to only show rows where the hold is Whiterun. The result is 4 rows: Lydia (Nord, Companions), Nazeem (Redguard, None), Aela (Nord, Companions), and Balgruuf (Nord, Jarl). Ulfric is excluded because his hold is Eastmarch, not Whiterun.

Task 2 — Write a Query

Write the SQL to answer this question: "Give me all characters from the Companions faction."

(Hint: work with Sat_Character_Details only — we'll learn how to connect to Hub_Character names in Day 4.)

Hint: SELECT the columns you'd want to see, FROM the right table, WHERE the faction matches.

Answer
SELECT race, level, faction
FROM Sat_Character_Details
WHERE faction = 'Companions';

Any valid SELECT columns work — the key parts are FROM Sat_Character_Details and WHERE faction = 'Companions' (with single quotes).

Task 3 — Spot the Bug

This query has an error. What's wrong with it?

Broken query
SELECT race, level
FROM Sat_Character_Details
WHERE faction = NULL;

The bug: = NULL should be IS NULL. You can't use = to compare against NULL because NULL means "unknown" — the question "does unknown equal unknown?" produces UNKNOWN, not TRUE. So WHERE faction = NULL always returns zero rows. The correct syntax is WHERE faction IS NULL.

All Three Complete

You can read a query, write one, and debug one. That's SQL Part 1. Next up: JOINs — following foreign keys across Hubs, Links, and Satellites.