From Pieces to a Working Vault
Build it. Load it. Query it. Then see how the automation does the same thing.
You know what Hubs are. You know what Satellites track. You know how Links connect them. Today you build all five and wire them together.
“Integration” — from Latin integrare, to make whole. The parts exist. Today they become one working system.
INSERT INTO = put this row into the table.
Parentheses after table name = which columns you’re filling.
VALUES = the actual data you’re putting in.
Positional mapping = first column gets first value, second gets second, and so on.
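In miniature, using a hypothetical holds table (table and column names invented for the example):

```sql
-- Positional mapping in action:
-- 'Whiterun' fills hold_name (1st listed column), 'Balgruuf' fills jarl (2nd).
INSERT INTO holds (hold_name, jarl)
VALUES ('Whiterun', 'Balgruuf');
```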
“Write the CREATE TABLE statement for Hub_Character — a Hub tracking character business keys.”
hub_character_hk — the hash key. CHAR(64) because SHA2 produces a 64-character hex string. NOT NULL because every row must have an identity.
load_date — when this row arrived. TIMESTAMP type. NOT NULL because every row must be time-stamped.
record_source — where this row came from. VARCHAR(100) for flexibility. NOT NULL because provenance is mandatory.
business_key — the real-world identifier (e.g., a character name). VARCHAR(255). NOT NULL because the business key IS the reason the Hub exists.
PRIMARY KEY — the hash key uniquely identifies each Hub row. One character, one row, forever.
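Assembling the four columns above into one statement gives a shape like this. It is a sketch to check your own work against, not the only valid layout:

```sql
CREATE TABLE Hub_Character (
    hub_character_hk CHAR(64)     NOT NULL,  -- SHA2 hash: 64-char hex identity
    load_date        TIMESTAMP    NOT NULL,  -- when the row arrived
    record_source    VARCHAR(100) NOT NULL,  -- provenance is mandatory
    business_key     VARCHAR(255) NOT NULL,  -- the character name itself
    PRIMARY KEY (hub_character_hk)           -- one character, one row, forever
);
```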
Five tables. One dependency chain. You know the pattern — write them from memory.
Write Exercise 1 — Hub_Character
Write CREATE TABLE for Hub_Character. 4 columns, correct types, correct constraints, correct PK.
Write Exercise 2 — Hub_Faction
Same pattern, new entity. Write Hub_Faction.
Write Exercise 3 — Link_Character_Faction
The Link connects both Hubs. 5 columns: its own hash key, load_date, record_source, and two foreign hash keys.
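One plausible shape for those 5 columns (the Link's own hash-key column name is an assumption; adapt it to your naming convention):

```sql
CREATE TABLE Link_Character_Faction (
    link_character_faction_hk CHAR(64)     NOT NULL,  -- hashed from both business keys
    load_date                 TIMESTAMP    NOT NULL,
    record_source             VARCHAR(100) NOT NULL,
    hub_character_hk          CHAR(64)     NOT NULL,  -- foreign hash key to Hub_Character
    hub_faction_hk            CHAR(64)     NOT NULL,  -- foreign hash key to Hub_Faction
    PRIMARY KEY (link_character_faction_hk)
);
```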
Write Exercise 4 — Sat_Character_Details
Satellite for character_name, race, and level. Remember: composite PK, hash_diff, and nullable descriptive columns. No load_end_date.
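A sketch of the result (the types chosen for race and level are assumptions; the structural points are the composite PK, the hash_diff, and the nullable payload):

```sql
CREATE TABLE Sat_Character_Details (
    hub_character_hk CHAR(64)     NOT NULL,  -- borrowed from the parent Hub
    load_date        TIMESTAMP    NOT NULL,
    record_source    VARCHAR(100) NOT NULL,
    hash_diff        CHAR(64)     NOT NULL,  -- hash of the descriptive columns
    character_name   VARCHAR(255),           -- descriptive columns stay nullable
    race             VARCHAR(100),
    level            INTEGER,
    PRIMARY KEY (hub_character_hk, load_date) -- composite PK: one row per load
);
```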
Write Exercise 5 — Sat_Faction_Details
Last one. Satellite for faction_name, hold, and alignment.
Two Hubs — same structure. Two Satellites — same structure. One Link — same structure. The patterns repeat. This is why datavault4dbt works.
Five tables built. The empty vault is ready.
Chunk 1 of 3 complete
The vault exists but it is empty. Time to put some Nords in it.
“Insert Ulfric Stormcloak into Hub_Character.”
INSERT INTO — the destination table. Where this row will live.
Column list in parentheses — which blanks you’re filling. The same columns your CREATE TABLE defined.
VALUES — what goes in each blank, same order as the column list. First column gets first value, second gets second.
Write Exercise 6 — Hub INSERT
Insert Ulfric Stormcloak into Hub_Character. Use ‘a1b2c3d4...’ as a placeholder hash key. Load date: ‘2026-03-01 10:00:00’. Source: ‘Skyrim_NPC_DB’.
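One possible answer, assuming the column order from the Hub walkthrough earlier:

```sql
INSERT INTO Hub_Character (hub_character_hk, load_date, record_source, business_key)
VALUES ('a1b2c3d4...', '2026-03-01 10:00:00', 'Skyrim_NPC_DB', 'Ulfric Stormcloak');
```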
Write Exercise 7 — Link INSERT
Ulfric belongs to the Stormcloaks. Insert this relationship into Link_Character_Faction. Use ‘lnk001...’ as the link hash key. Use ‘a1b2c3d4...’ for Ulfric’s hub hash key and ‘fac001...’ for the Stormcloaks hub hash key.
The Link’s foreign hash keys (hub_character_hk, hub_faction_hk) must match values that already exist in the respective Hub tables. The INSERT order mirrors the CREATE TABLE order — you cannot record a relationship between entities that have not been registered.
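A sketch of that INSERT (the Link's own hash-key column name is illustrative):

```sql
INSERT INTO Link_Character_Faction
    (link_character_faction_hk, load_date, record_source, hub_character_hk, hub_faction_hk)
VALUES ('lnk001...', '2026-03-01 10:00:00', 'Skyrim_NPC_DB', 'a1b2c3d4...', 'fac001...');
```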
Write Exercise 8 — Satellite INSERT
Give Ulfric attributes: character_name ‘Ulfric Stormcloak’, race ‘Nord’, level 50. Use ‘hdiff001...’ as the hash_diff.
| hub_character_hk | load_date | character_name | race | level |
|---|---|---|---|---|
| a1b2c3d4... | 2026-03-01 10:00:00 | Ulfric Stormcloak | Nord | 50 |
| a1b2c3d4... | 2026-03-02 10:00:00 | Ulfric Stormcloak | Nord | 52 |
Same hub_character_hk. Different load_dates. Both rows coexist — that is the history. Insert-only. The old row is never updated or deleted.
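In SQL, the two history rows come from two plain INSERTs. (The day-2 value 'hdiff002...' is an invented placeholder; the level change would produce a new hash_diff.)

```sql
-- Day 1: Ulfric arrives at level 50.
INSERT INTO Sat_Character_Details
    (hub_character_hk, load_date, record_source, hash_diff, character_name, race, level)
VALUES ('a1b2c3d4...', '2026-03-01 10:00:00', 'Skyrim_NPC_DB', 'hdiff001...',
        'Ulfric Stormcloak', 'Nord', 50);

-- Day 2: he leveled up. A NEW row is inserted; the old row is never touched.
INSERT INTO Sat_Character_Details
    (hub_character_hk, load_date, record_source, hash_diff, character_name, race, level)
VALUES ('a1b2c3d4...', '2026-03-02 10:00:00', 'Skyrim_NPC_DB', 'hdiff002...',
        'Ulfric Stormcloak', 'Nord', 52);
```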
Four tables. Three JOINs. One answer. The hash keys are the rails.
What faction does Ulfric belong to, and what hold is that faction based in?
Start with the table that has Ulfric’s name…
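One query of that shape, as a sketch. It assumes Sat_Faction_Details carries hub_faction_hk as its parent key, and it ignores Satellite history; a production version would also pick the latest load_date per hash key:

```sql
SELECT hc.business_key AS character_name,  -- the Hub's business key IS the name
       sf.faction_name,
       sf.hold
FROM Hub_Character hc
JOIN Link_Character_Faction l ON l.hub_character_hk = hc.hub_character_hk
JOIN Hub_Faction hf           ON hf.hub_faction_hk  = l.hub_faction_hk
JOIN Sat_Faction_Details sf   ON sf.hub_faction_hk  = hf.hub_faction_hk
WHERE hc.business_key = 'Ulfric Stormcloak';
```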
| character_name | faction_name | hold |
|---|---|---|
| Ulfric Stormcloak | Stormcloaks | Eastmarch |
Four tables. Three JOINs. One answer. Every JOIN followed a hash key — from Hub through Link to another Hub to its Satellite. The hash keys are the rails the JOINs ride on.
Write Exercise 9 — GROUP BY Query Bonus
How many characters are in each faction? Write a query using GROUP BY.
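One way to sketch it: count Link rows per faction (assumes one Link row per character-faction pair, with the faction name taken from the Hub's business key):

```sql
SELECT hf.business_key AS faction_name,
       COUNT(*)        AS character_count
FROM Hub_Faction hf
JOIN Link_Character_Faction l ON l.hub_faction_hk = hf.hub_faction_hk
GROUP BY hf.business_key;
```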
Data loaded, queried across 4 tables. The vault is alive.
Chunk 2 of 3 complete
You just built this by hand. datavault4dbt means you almost never have to.
A configuration file that generates SQL. You write 7 lines. The macro generates 50+.
You tell datavault4dbt three things: what to name the hash key, which column to hash, and where to find the data. It generates the full INSERT SQL with deduplication, hashing, and insert-only guards.
| Line | What it says | What it means |
|---|---|---|
| config(materialized='incremental') | This table is incremental | Only process new records each run. Matches DV insert-only. |
| set yaml_metadata | Here is the configuration | Defines a variable with the model's settings. |
| hashkey: 'hub_character_hk' | Hash key column name | Same column you used in CREATE TABLE. |
| business_keys: - character_name | Hash this column | The macro runs SHA2(UPPER(TRIM(...)), 256) on it. |
| source_models: stage_character | Read from this staging model | Upstream dbt model with cleaned source data. |
| datavault4dbt.hub(...) | Generate Hub loading SQL | One macro call expands to ~50 lines of INSERT SQL. |
The Satellite model uses parent_hashkey instead of hashkey, and the name change carries the whole idea: the Hub OWNS its hash key (its primary key, generated from the business key), while the Satellite BORROWS its parent's hash key (a foreign key reference). In short, hashkey means generated by this entity; parent_hashkey means generated by the parent entity and only referenced here.
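For contrast, a Satellite model sketch. It assumes the sat_v0 macro for an insert-only Satellite; macro and parameter names may differ across datavault4dbt versions, so treat this as illustrative:

```sql
{{ config(materialized='incremental') }}

{%- set yaml_metadata -%}
source_model: stage_character
parent_hashkey: 'hub_character_hk'   # borrowed from the Hub, not generated here
src_hashdiff: 'hash_diff'
src_payload:
    - character_name
    - race
    - level
{%- endset -%}

{% set metadata_dict = fromyaml(yaml_metadata) %}

{{ datavault4dbt.sat_v0(source_model=metadata_dict.get('source_model'),
                        parent_hashkey=metadata_dict.get('parent_hashkey'),
                        src_hashdiff=metadata_dict.get('src_hashdiff'),
                        src_payload=metadata_dict.get('src_payload')) }}
```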
Exercise 1 — dbt Model Components
Match each dbt model component to what it does.
- config(materialized='incremental')
- hashkey: 'hub_character_hk'
- business_keys: - character_name
- source_models: stage_character
- datavault4dbt.hub(...)

dbt run is like compile and execute. The model file is the source code. The compiled SQL is the machine code. Snowflake is the processor.
7 lines of configuration. The macro generates everything you wrote by hand. Understanding the SQL underneath is why you can debug when the automation breaks, or explain it to a client.
This chapter covers both sides of the same coin: the hand-written Data Vault SQL and the datavault4dbt automation that generates it. Understanding the SQL is what makes the automation debuggable.
Chunk 3 of 3 complete