SQL: The Question Language
You built the Skyrim Data Vault. Now learn the grammar for asking it questions. SQL isn't code — it's a way of describing what you want to know.
What SQL Actually Is
SQL isn't code. It's a question language. You describe what you want to know, and the database figures out how to find it. You already know the tables — Hub_Character, Sat_Character_Details. SQL is just the grammar for asking them questions.
SQL stands for Structured Query Language. Three words, each one important:
Structured — has a fixed grammatical form. Every question follows the same sentence pattern. Not freeform — more like a legal filing format than a conversation.
Query — from Latin quaerere, to ask, to seek. Same root as "question." A query IS a question asked to the database.
Language — but not a programming language in the usual sense. Programming languages tell the computer how to do something step by step. SQL tells the database what you want and lets the database figure out the how. The distinction: a recipe (procedural) vs. an order at a restaurant (declarative). You say "give me all Stormcloak characters" — you don't say "open the table, go to row 1, check if faction = Stormcloak..."
SQL is how you ask a database to show you specific information. You describe what you want, the database finds it.
SQL has a fixed sentence structure — you say which columns you want (SELECT), which table to look in (FROM), and optionally a filter condition (WHERE). The database engine reads your query, figures out the fastest way to find matching rows, and returns them as a result set.
SQL is a declarative language for relational database management. It operates on tables (relations) through set-based operations — a query defines a result set, not a procedural sequence. The query optimizer translates the declarative statement into an execution plan. The standard is defined by ISO/IEC 9075; dialects exist (PostgreSQL, Snowflake, T-SQL) but core SELECT/FROM/WHERE syntax is universal.
SQL's declarative nature is why it pairs so well with Data Vault's automation philosophy. datavault4dbt generates SQL — it writes the SELECT, JOIN, and INSERT statements that load Hubs, Links, and Satellites. When ScaleFree builds a DV implementation, the SQL is largely generated from metadata, not hand-written. Understanding SQL means you can read and debug what the automation produces — not that you hand-write every query.
"SQL is the language you use to ask a database questions. It stands for Structured Query Language — structured because every query follows a fixed grammatical pattern, and query because it's fundamentally about asking for data back. You describe what you want, and the database figures out how to get it."
SELECT — What Do You Want to See?
In a video game: "Select your character." In SQL: "Select which columns you want the database to show you." Same word. Same action. Choose from the available options.
Select — everyday meaning: to choose from available options. "Select all that apply" on a form. "Select your starting class" in a character creator.
Why this word for this concept: Because that's literally what it does — you choose which columns (which pieces of information) you want the database to show you. Out of all the columns in a table, you select the ones relevant to your question.
Here's the Hub_Character table you built in Chapter 1. All four columns, five characters:
| hub_character_hk | business_key | load_date | record_source |
|---|---|---|---|
| a3f7... | Ulfric | 2026-01-01 | skyrim_crm |
| b82c... | Lydia | 2026-01-01 | skyrim_crm |
| d91e... | Nazeem | 2026-01-01 | skyrim_crm |
| f4a1... | Aela | 2026-01-01 | skyrim_crm |
| c7b3... | Balgruuf | 2026-01-01 | skyrim_crm |
What if you only want the character names — not the hash keys, not the dates? You select just that one column:
"Give me just the character names."
SELECT business_key FROM Hub_Character;
| business_key |
|---|
| Ulfric |
| Lydia |
| Nazeem |
| Aela |
| Balgruuf |
The other three columns exist in the table but aren't shown — because they weren't selected.
Want two columns? List them with a comma:
SELECT business_key, load_date FROM Hub_Character;
SELECT doesn't change the table. It doesn't delete columns or filter rows. It defines what the result looks like. The table underneath stays exactly the same. SELECT is a camera angle, not an edit.
FROM — Where to Look
"Where are you from?" "From the filing cabinet." FROM is the origin. In a database with dozens of tables — Hub_Character, Hub_Weapon, Sat_Character_Details, Link_Character_Weapon — FROM points at the one you're asking about.
From — everyday meaning: the origin or source. "Where is this from?" "Which table does the data come from?"
Why this word: Because it specifies which table the data comes from. Every Hub, every Satellite, every Link is its own table. FROM Hub_Character means "look in the Hub for characters." FROM Sat_Character_Details means "look in the Satellite that stores character attributes."
In your Skyrim Data Vault, you have multiple tables. FROM is how you point at one. Without FROM, SQL doesn't know where to look — it's like asking "give me the race" without saying which character sheet to check.
Hub_Character — who exists (identity)
Sat_Character_Details — what they're like (attributes)
Hub_Weapon — which weapons exist
Link_Character_Weapon — who carries what
Every one of these is a table. Every one can go after FROM.
WHERE — The Filter
WHERE is propositional logic in disguise — same truth tables, same AND/OR/NOT from symbolic logic. The syntax is new. The machinery is identical.
Where — not spatial "where." Conditional "where." Think of it as: "under what circumstances should a row show up?" WHERE sets the condition that rows must satisfy to be included.
Technically: WHERE is a boolean filter. Every row in the table is tested against the condition. True = included. False = excluded. This is exactly propositional logic — the condition evaluates to true or false for each row, same as truth tables.
Here's Sat_Character_Details — the Satellite with all the character attributes:
| hub_character_hk | load_date | record_source | race | level | faction | hold |
|---|---|---|---|---|---|---|
| a3f7... | 2026-01-01 | skyrim_crm | Nord | 30 | Stormcloak | Eastmarch |
| b82c... | 2026-01-01 | skyrim_crm | Nord | 25 | Companions | Whiterun |
| d91e... | 2026-01-01 | skyrim_crm | Redguard | 10 | None | Whiterun |
| f4a1... | 2026-01-01 | skyrim_crm | Nord | 28 | Companions | Whiterun |
| c7b3... | 2026-01-01 | skyrim_crm | Nord | 35 | Jarl | Whiterun |
"Show me all the Nords."
SELECT race, level, faction FROM Sat_Character_Details WHERE race = 'Nord';
| race | level | faction |
|---|---|---|
| Nord | 30 | Stormcloak |
| Nord | 25 | Companions |
| Nord | 28 | Companions |
| Nord | 35 | Jarl |
Nazeem (Redguard) excluded — the WHERE condition evaluated to false for that row.
"Show me Nords above level 25."
SELECT race, level, faction FROM Sat_Character_Details WHERE race = 'Nord' AND level > 25;
| race | level | faction |
|---|---|---|
| Nord | 30 | Stormcloak |
| Nord | 28 | Companions |
| Nord | 35 | Jarl |
Lydia excluded — she's level 25, not above 25. Nazeem excluded — not Nord.
The Three-Part Pattern
Every SQL query follows the same sentence:
"Give me [SELECT] from [FROM] only if [WHERE]." This is the single most important pattern in SQL Part 1. Every query is a variation on this sentence. WHERE is optional — without it, you get all rows.
"SELECT defines which columns you want to see — like choosing which fields on a character sheet to look at. FROM tells the database which table to look in. WHERE is the filter — it's a boolean condition that each row is tested against. Only rows where the condition is true show up in the result."
Operators in WHERE
These are the same connectives from propositional logic. AND = conjunction (both true). OR = disjunction (at least one true). NOT = negation. Same truth table behavior as symbolic logic — the syntax is new, the machinery is identical.
% = any characters, _ = one characterString Literals and SELECT *
The Quoting Rule
Values (the actual data you're comparing against) go in single quotes. Column names and table names do not get quotes:
Without quotes, the database thinks you're referring to a column name. WHERE race = Nord would make the database look for a column called Nord — which doesn't exist. WHERE race = 'Nord' tells it you're looking for the literal text value "Nord."
Single quotes = "I mean this actual value." No quotes = "I mean this column or table." That's the whole rule for now.
SELECT * — "Give Me Everything"
The asterisk * is a wildcard — it means "all columns."
SELECT * FROM Hub_Character;
Returns all four columns, all five rows. Useful for exploring — "what does this table even look like?" But in production, you name the columns you need. It's more readable and performs better.
"SELECT * is fine for exploration, but in production you name the columns you need — it's more readable and performs better."
AS, ORDER BY, LIMIT
Column Aliases (AS)
Alias — from Latin alius, meaning "other." An alternate name for the same thing. A character alias in a spy movie. An alias in a witness protection program.
In SQL: AS gives a column a different name in the result, without changing the actual column name in the table. The alias exists only in this particular query's output.
"Show me character names and when they first appeared, but use human-readable column headers."
SELECT business_key AS character_name, load_date AS first_seen FROM Hub_Character;
| character_name | first_seen |
|---|---|
| Ulfric | 2026-01-01 |
| Lydia | 2026-01-01 |
| Nazeem | 2026-01-01 |
| Aela | 2026-01-01 |
| Balgruuf | 2026-01-01 |
The column headers say character_name and first_seen instead of business_key and load_date. Same data, better labels. The table itself is unchanged.
ORDER BY — Sorting the Result
Order by — "arrange according to some criterion." "Order by date" on a filing system. "Sort by relevance" on a search engine. It sorts the result rows according to a column's values.
SELECT race, level, faction FROM Sat_Character_Details ORDER BY level;
SELECT race, level, faction FROM Sat_Character_Details ORDER BY level DESC;
You can sort by multiple columns: ORDER BY faction, level DESC sorts by faction first, then by level (highest first) within each faction.
ORDER BY changes the order of rows in the result. It doesn't filter (that's WHERE) and it doesn't change which columns appear (that's SELECT). Each clause has one job.
LIMIT — "Just the First N"
Limit — a restriction on quantity. "Speed limit." "Limit 2 per customer." It restricts how many rows the result returns.
"Who are the top 3 highest-level characters?"
SELECT race, level, faction FROM Sat_Character_Details ORDER BY level DESC LIMIT 3;
| race | level | faction |
|---|---|---|
| Nord | 35 | Jarl |
| Nord | 30 | Stormcloak |
| Nord | 28 | Companions |
Dialect note: LIMIT is PostgreSQL/MySQL/Snowflake syntax. SQL Server uses TOP. The concept is identical. For practical purposes, LIMIT is fine — ScaleFree and most modern data platforms work primarily with Snowflake, which supports LIMIT.
The Full Clause Order
This order is fixed — SQL requires these clauses in this sequence. The minimum query is SELECT ... FROM ...;
NULL — The Unknown
NULL doesn't mean zero. It doesn't mean blank. NULL means "we don't know." The database is being honest — this value is missing or unknown, and pretending otherwise would be a lie.
Null — from Latin nullus, meaning "none." But in SQL, the everyday meaning is misleading. NULL does NOT mean zero or empty string. It means unknown or missing. The database doesn't know what the value is.
Consider Sat_Character_Details. If a character's faction is NULL, it doesn't mean they have no faction. It means the source system didn't provide that information. Big difference — especially for auditability, which is the whole point of Data Vault.
You CANNOT use = NULL. It doesn't work the way you'd expect. Because NULL means "unknown," the question "does unknown equal unknown?" can't be answered — it's not true OR false, it's unknown. So = NULL never evaluates to true.
-- WRONG (always returns no results): SELECT * FROM Sat_Character_Details WHERE faction = NULL; -- CORRECT: SELECT * FROM Sat_Character_Details WHERE faction IS NULL;
NULL means "we don't know this value." Not zero, not blank — genuinely unknown.
SQL uses three-valued logic, not two-valued. Every comparison can be TRUE, FALSE, or UNKNOWN. When NULL is involved, the result is UNKNOWN — and UNKNOWN rows don't appear in results. That's why WHERE faction = NULL returns nothing: the comparison produces UNKNOWN, not TRUE.
NULL in SQL follows Codd's relational model (E.F. Codd, who invented the relational database). NULL represents the absence of a value in a column. Any arithmetic or comparison operation involving NULL propagates NULL (NULL + 5 = NULL, NULL = NULL yields UNKNOWN). The only way to test for NULL is the IS operator (IS NULL, IS NOT NULL), which is purpose-built for this check.
NULL semantics are one of SQL's most debated design decisions. The three-valued logic it introduces causes subtle bugs — NULLs in NOT IN clauses, NULLs in aggregates (COUNT(*) counts NULLs, COUNT(column) doesn't), NULLs in UNIQUE constraints (most databases allow multiple NULLs in a UNIQUE column because NULL != NULL). In Data Vault, Hub columns are NOT NULL by convention — a Hub row with a missing business key is meaningless. Satellites allow NULLs in attribute columns because the source system might not provide every field.
Data Vault connection: Hub columns are always NOT NULL — every Hub row must have a hash key, business key, load date, record source. Satellite attribute columns can be NULL — the source system might not provide race, level, or faction for every character. NULL preserves honesty — better to say "we don't know" than to fill in a guess.
How SQL Connects to Data Vault
SQL is not a separate topic from Data Vault. SQL is how you read the Data Vault model you already built. Hub_Character is a table. SELECT * FROM Hub_Character; reads it.
| SQL Concept | What It Touches in DV |
|---|---|
| SELECT columns | Choosing which Hub/Sat columns to see (hash_key, business_key, race, level...) |
| FROM table | Pointing at a specific Hub, Satellite, or Link |
| WHERE condition | Filtering rows — "only Nords," "only records from skyrim_crm," "only level > 25" |
| AS (aliases) | Renaming DV-technical column names to human-readable ones |
| NULL | What it means when a Satellite attribute has no value (unknown, not zero) |
Every Hub, Link, and Satellite is a table in the database. SQL is what reads them. SELECT pulls columns from a Hub or Satellite, WHERE filters to specific records, and JOINs follow the foreign keys that connect Hubs to their Satellites and Links.
This chapter covers SQL through Data Vault tables — Hubs, Links, and Satellites as the practice domain. By the end, reading a query and explaining what it does should feel natural, and writing basic SELECT/WHERE statements should be comfortable. JOINs and more complex queries come next.
Read, Write, Debug
Three tasks. Each one uses what you just learned. All completable right here.
What does this query return? Describe it in plain English.
SELECT race, faction, hold FROM Sat_Character_Details WHERE hold = 'Whiterun';
Answer: This query selects the race, faction, and hold columns from the Sat_Character_Details table, filtering to only show rows where the hold is Whiterun. The result is 4 rows: Lydia (Nord, Companions), Nazeem (Redguard, None), Aela (Nord, Companions), and Balgruuf (Nord, Jarl). Ulfric is excluded because his hold is Eastmarch, not Whiterun.
Write the SQL to answer this question: "Give me all characters from the Companions faction."
(Hint: work with Sat_Character_Details only — we'll learn how to connect to Hub_Character names in Day 4.)
Hint: SELECT the columns you'd want to see, FROM the right table, WHERE the faction matches.
SELECT race, level, faction FROM Sat_Character_Details WHERE faction = 'Companions';
Any valid SELECT columns work — the key parts are FROM Sat_Character_Details and WHERE faction = 'Companions' (with single quotes).
This query has an error. What's wrong with it?
SELECT race, level FROM Sat_Character_Details WHERE faction = NULL;
The bug: = NULL should be IS NULL. You can't use = to compare against NULL because NULL means "unknown" — the question "does unknown equal unknown?" produces UNKNOWN, not TRUE. So WHERE faction = NULL always returns zero rows. The correct syntax is WHERE faction IS NULL.