CHAPTER 12 // INTEGRATION DAY

From Pieces
to a Working Vault

Build it. Load it. Query it. Then see how the automation does the same thing.

Core

You know what Hubs are. You know what Satellites track. You know how Links connect them. Today you build all five and wire them together.

LANGUAGE BRIDGE

“Integration” — from Latin integrare, to make whole. The parts exist. Today they become one working system.

IS / IS NOT — Integration Day
IS: Integration Day IS building a complete working Data Vault from scratch.
IS NOT: It IS NOT learning new SQL — every concept was covered in Sessions 3, 4, and 9.
THE FIVE TABLES
Hub_Character (4 columns)
Sat_Character_Details (7 columns)
Hub_Faction (4 columns)
Sat_Faction_Details (7 columns)
Link_Character_Faction (5 columns, connecting Hub_Character and Hub_Faction)
NEW SYNTAX — INSERT INTO
INSERT INTO table_name (column1, column2) VALUES ('value1', 'value2');

INSERT INTO = put this row into the table.
Parentheses after table name = which columns you’re filling.
VALUES = the actual data you’re putting in.
Positional mapping = first column gets first value, second gets second, and so on.

WORKED EXAMPLE

“Write the CREATE TABLE statement for Hub_Character — a Hub tracking character business keys.”

CREATE TABLE Hub_Character (
    hub_character_hk CHAR(64)     NOT NULL,
    load_date        TIMESTAMP    NOT NULL,
    record_source    VARCHAR(100) NOT NULL,
    business_key     VARCHAR(255) NOT NULL,
    PRIMARY KEY (hub_character_hk)
);

hub_character_hk — the hash key. CHAR(64) because SHA2 produces a 64-character hex string. NOT NULL because every row must have an identity.
load_date — when this row arrived. TIMESTAMP type. NOT NULL because every row must be time-stamped.
record_source — where this row came from. VARCHAR(100) for flexibility. NOT NULL because provenance is mandatory.
business_key — the real-world identifier (e.g., a character name). VARCHAR(255). NOT NULL because the business key IS the reason the Hub exists.
PRIMARY KEY — the hash key uniquely identifies each Hub row. One character, one row, forever.

Core

Five tables. One dependency chain. You know the pattern — write them from memory.

BUILD ORDER: HUBS → LINK → SATELLITES

Hubs first (Hub_Character, Hub_Faction), then the Link that references them (Link_Character_Faction), then the Satellites that hang off each Hub (Sat_Character_Details, Sat_Faction_Details).

Write Exercise 1 — Hub_Character

Write CREATE TABLE for Hub_Character. 4 columns, correct types, correct constraints, correct PK.

Write Exercise 2 — Hub_Faction

Same pattern, new entity. Write Hub_Faction.

Write Exercise 3 — Link_Character_Faction

The Link connects both Hubs. 5 columns: its own hash key, load_date, record_source, and two foreign hash keys.

Write Exercise 4 — Sat_Character_Details

Satellite for character_name, race, and level. Remember: composite PK, hash_diff, and nullable descriptive columns. No load_end_date.

Write Exercise 5 — Sat_Faction_Details

Last one. Satellite for faction_name, hold, and alignment.

THE PATTERN

Two Hubs — same structure. Two Satellites — same structure. One Link — same structure. The patterns repeat. This is why datavault4dbt works.
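As a hedged reference for the exercises above, here is one possible shape for the Link and for one Satellite. The column names follow the exercise descriptions; the exact types are an assumption that mirrors the Hub worked example (CHAR(64) hash keys, TIMESTAMP load dates), and `level INTEGER` is an assumption.

```sql
-- One possible answer shape for Exercise 3: the Link has its own hash key
-- plus one foreign hash key for each Hub it connects.
CREATE TABLE Link_Character_Faction (
    link_character_faction_hk CHAR(64)     NOT NULL,
    load_date                 TIMESTAMP    NOT NULL,
    record_source             VARCHAR(100) NOT NULL,
    hub_character_hk          CHAR(64)     NOT NULL,
    hub_faction_hk            CHAR(64)     NOT NULL,
    PRIMARY KEY (link_character_faction_hk)
);

-- One possible answer shape for Exercise 4: composite PK (parent hash key +
-- load_date), a hash_diff for change detection, nullable descriptive
-- columns, and no load_end_date (insert-only).
CREATE TABLE Sat_Character_Details (
    hub_character_hk CHAR(64)     NOT NULL,
    load_date        TIMESTAMP    NOT NULL,
    record_source    VARCHAR(100) NOT NULL,
    hash_diff        CHAR(64)     NOT NULL,
    character_name   VARCHAR(255),
    race             VARCHAR(100),
    level            INTEGER,
    PRIMARY KEY (hub_character_hk, load_date)
);
```

Exercise 5 follows the same Satellite shape with the faction payload columns.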

Five tables built. The empty vault is ready.

You can stop here and it is a win.

Chunk 1 of 3 complete

Core

The vault exists but it is empty. Time to put some Nords in it.

WORKED EXAMPLE

“Insert Ulfric Stormcloak into Hub_Character.”

INSERT INTO Hub_Character (hub_character_hk, load_date, record_source, business_key)
VALUES (
    'a1b2c3d4e5f6...64chars...',
    '2026-03-01 10:00:00',
    'Skyrim_NPC_DB',
    'Ulfric Stormcloak'
);

INSERT INTO — the destination table. Where this row will live.
Column list in parentheses — which blanks you’re filling. Same columns as CREATE TABLE defined.
VALUES — what goes in each blank, same order as the column list. First column gets first value, second gets second.

Write Exercise 6 — Hub INSERT

Insert Ulfric Stormcloak into Hub_Character. Use 'a1b2c3d4...' as a placeholder hash key. Load date: '2026-03-01 10:00:00'. Source: 'Skyrim_NPC_DB'.

Write Exercise 7 — Link INSERT

Ulfric belongs to the Stormcloaks. Insert this relationship into Link_Character_Faction. Use 'lnk001...' as the link hash key. Use 'a1b2c3d4...' for Ulfric's hub hash key and 'fac001...' for the Stormcloaks hub hash key.

INSERT ORDER MATTERS

The Link’s foreign hash keys (hub_character_hk, hub_faction_hk) must match values that already exist in the respective Hub tables. Load Hubs first, then the Link — you cannot record a relationship between entities that have not been registered.
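Using the placeholder keys from Exercises 6 and 7, the Hub-then-Link sequence can be sketched as follows (the values are placeholders, not real hashes):

```sql
-- The Hub row must exist first (Exercise 6).
INSERT INTO Hub_Character (hub_character_hk, load_date, record_source, business_key)
VALUES ('a1b2c3d4...', '2026-03-01 10:00:00', 'Skyrim_NPC_DB', 'Ulfric Stormcloak');

-- Only then can the Link reference it. 'fac001...' assumes the Stormcloaks
-- Hub row was inserted the same way.
INSERT INTO Link_Character_Faction
    (link_character_faction_hk, load_date, record_source, hub_character_hk, hub_faction_hk)
VALUES ('lnk001...', '2026-03-01 10:00:00', 'Skyrim_NPC_DB', 'a1b2c3d4...', 'fac001...');
```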

Write Exercise 8 — Satellite INSERT

Give Ulfric attributes: character_name 'Ulfric Stormcloak', race 'Nord', level 50. Use 'hdiff001...' as the hash_diff.

ULFRIC LEVELS UP — History in Action
hub_character_hk | load_date           | character_name    | race | level
a1b2c3d4...      | 2026-03-01 10:00:00 | Ulfric Stormcloak | Nord | 50
a1b2c3d4...      | 2026-03-02 10:00:00 | Ulfric Stormcloak | Nord | 52

Same hub_character_hk. Different load_dates. Both rows coexist — that is the history. Insert-only. The old row is never updated or deleted.
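The level-up is just a second INSERT. A sketch of both rows, assuming the placeholder convention from Exercise 8 ('hdiff002...' is an invented placeholder for the changed payload's hash):

```sql
-- Day 1: initial attributes (Exercise 8).
INSERT INTO Sat_Character_Details
    (hub_character_hk, load_date, record_source, hash_diff, character_name, race, level)
VALUES ('a1b2c3d4...', '2026-03-01 10:00:00', 'Skyrim_NPC_DB', 'hdiff001...',
        'Ulfric Stormcloak', 'Nord', 50);

-- Day 2: the level changed, so the hash_diff differs and a new row
-- is inserted. The day-1 row is never updated or deleted.
INSERT INTO Sat_Character_Details
    (hub_character_hk, load_date, record_source, hash_diff, character_name, race, level)
VALUES ('a1b2c3d4...', '2026-03-02 10:00:00', 'Skyrim_NPC_DB', 'hdiff002...',
        'Ulfric Stormcloak', 'Nord', 52);
```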

Core

Four tables. Three JOINs. One answer. The hash keys are the rails.

THE QUESTION

What faction does Ulfric belong to, and what hold is that faction based in?

QUERY SO FAR: built up one JOIN at a time

1) Start with the table that has Ulfric's name:

FROM Hub_Character h
WHERE h.business_key = 'Ulfric Stormcloak'

2) Join the Link on the shared hash key:

FROM Hub_Character h
JOIN Link_Character_Faction l ON h.hub_character_hk = l.hub_character_hk
WHERE h.business_key = 'Ulfric Stormcloak'

3) Join Hub_Faction:

FROM Hub_Character h
JOIN Link_Character_Faction l ON h.hub_character_hk = l.hub_character_hk
JOIN Hub_Faction hf ON l.hub_faction_hk = hf.hub_faction_hk
WHERE h.business_key = 'Ulfric Stormcloak'

4) Join the Satellite that carries the hold:

FROM Hub_Character h
JOIN Link_Character_Faction l ON h.hub_character_hk = l.hub_character_hk
JOIN Hub_Faction hf ON l.hub_faction_hk = hf.hub_faction_hk
JOIN Sat_Faction_Details sf ON hf.hub_faction_hk = sf.hub_faction_hk
WHERE h.business_key = 'Ulfric Stormcloak'
STEP 1 OF 5
Which table has Ulfric’s name? Write the FROM and WHERE clauses.
STEP 2 OF 5
Connect to the Link. What column do Hub_Character and Link_Character_Faction share? Write the JOIN + ON.
STEP 3 OF 5
Connect to Hub_Faction. What column do the Link and Hub_Faction share? Write the JOIN + ON.
STEP 4 OF 5
Connect to Sat_Faction_Details for the hold. Write the JOIN + ON.
STEP 5 OF 5
What columns answer the question? Write the SELECT.
THE COMPLETE QUERY
SELECT
    h.business_key  AS character_name,
    hf.business_key AS faction_name,
    sf.hold
FROM Hub_Character h
JOIN Link_Character_Faction l ON h.hub_character_hk = l.hub_character_hk
JOIN Hub_Faction hf           ON l.hub_faction_hk = hf.hub_faction_hk
JOIN Sat_Faction_Details sf   ON hf.hub_faction_hk = sf.hub_faction_hk
WHERE h.business_key = 'Ulfric Stormcloak';
EXPECTED RESULT
character_name    | faction_name | hold
Ulfric Stormcloak | Stormcloaks  | Eastmarch

Four tables. Three JOINs. One answer. Every JOIN followed a hash key — from Hub through Link to another Hub to its Satellite. The hash keys are the rails the JOINs ride on.

Write Exercise 9 — GROUP BY Query Bonus

How many characters are in each faction? Write a query using GROUP BY.
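A hedged sketch of one possible answer, counting Link rows per faction (it assumes each character-faction membership is exactly one Link row):

```sql
SELECT
    hf.business_key AS faction_name,
    COUNT(*)        AS character_count
FROM Hub_Faction hf
JOIN Link_Character_Faction l
    ON hf.hub_faction_hk = l.hub_faction_hk
GROUP BY hf.business_key;
```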

Data loaded, queried across 4 tables. The vault is alive.

Everything below is a different kind of work. You can stop here and it is a win.

Chunk 2 of 3 complete

Important

You just built this by hand. datavault4dbt means you almost never have to.

4-Layer Explanation — dbt Hub Model File
Layer 1: Plain language — for anyone

A configuration file that generates SQL. You write 7 lines. The macro generates 50+.

THE ENTIRE HUB MODEL FILE
{{ config(materialized='incremental') }}

{%- set yaml_metadata -%}
hashkey: 'hub_character_hk'
business_keys:
    - character_name
source_models: stage_character
{%- endset -%}

{{ datavault4dbt.hub(yaml_metadata=yaml_metadata) }}
Layer 2: With mechanism — how it works

You tell datavault4dbt three things: what to name the hash key, which column to hash, and where to find the data. It generates the full INSERT SQL with deduplication, hashing, and insert-only guards.

Layer 3: Technical — full detail
LINE-BY-LINE BREAKDOWN
Line | What it says | What it means
config(materialized='incremental') | This table is incremental | Only process new records each run. Matches DV insert-only.
set yaml_metadata | Here is the configuration | Defines a variable with the model's settings.
hashkey: 'hub_character_hk' | Hash key column name | Same column you used in CREATE TABLE.
business_keys: - character_name | Hash this column | The macro runs SHA2(UPPER(TRIM(...)), 256) on it.
source_models: stage_character | Read from this staging model | Upstream dbt model with cleaned source data.
datavault4dbt.hub(...) | Generate Hub loading SQL | One macro call expands to ~50 lines of INSERT SQL.
Layer 4: Expert — nuance & edge cases

The Satellite model uses parent_hashkey instead of hashkey. The distinction: the Hub OWNS its hash key — it is the primary key, generated from the business key. The Satellite BORROWS its parent's hash key as a foreign key reference. In the metadata, hashkey means "generated by this entity"; parent_hashkey means "generated by the parent entity, referenced here".

SATELLITE MODEL FILE — SPOT THE DIFFERENCE
{{ config(materialized='incremental') }}

{%- set yaml_metadata -%}
parent_hashkey: 'hub_character_hk'
src_hashdiff: 'hash_diff'
src_payload:
    - character_name
    - race
    - level
source_models: stage_character
{%- endset -%}

{{ datavault4dbt.sat(yaml_metadata=yaml_metadata) }}

Exercise 1 — dbt Model Components

Match each dbt model component to what it does.

config(materialized='incremental')
hashkey: 'hub_character_hk'
business_keys: - character_name
source_models: stage_character
datavault4dbt.hub(...)
dbt run — FROM MODEL TO TABLE
Model file (.sql) → dbt run → Dependency resolution → Macro expansion → Compiled SQL → Snowflake execution → Tables populated

dbt run is like compile and execute. The model file is the source code. The compiled SQL is the machine code. Snowflake is the processor.
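The two halves of that analogy map onto two real dbt commands. A sketch, assuming a configured dbt project containing a model named hub_character:

```shell
# Compile only: expand the macro and write the generated SQL to
# target/compiled/ without touching the warehouse.
dbt compile --select hub_character

# Compile and execute: run the generated SQL against the warehouse.
dbt run --select hub_character
```

Inspecting the compiled SQL in target/compiled/ is the fastest way to see the ~50 lines the macro generated from your 7 lines of configuration.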

Important

Understanding the SQL is why you can debug when the automation breaks.

WHAT YOU WROTE TODAY
CREATE TABLE Hub_Character (
    hub_character_hk CHAR(64) NOT NULL,
    ...
    PRIMARY KEY (hub_character_hk)
);

INSERT INTO Hub_Character (...) VALUES (...);

SELECT h.business_key, hf.business_key, sf.hold
FROM Hub_Character h
JOIN Link_Character_Faction l ON ...
JOIN Hub_Faction hf ON ...
JOIN Sat_Faction_Details sf ON ...
WHERE h.business_key = 'Ulfric Stormcloak';
WHAT A CONSULTANT WRITES
{{ config(materialized='incremental') }}

{%- set yaml_metadata -%}
hashkey: 'hub_character_hk'
business_keys:
    - character_name
source_models: stage_character
{%- endset -%}

{{ datavault4dbt.hub(yaml_metadata=yaml_metadata) }}

7 lines of configuration. The macro generates everything you wrote by hand. Understanding the SQL is why you can debug when the automation breaks — or explain it to a client.

KEY CONCEPT

This chapter covers both sides of the same coin: hand-written Data Vault SQL and the datavault4dbt automation that generates it. Understanding the SQL is what makes the automation debuggable.

Integration Complete

You built a Data Vault from scratch, loaded it, queried across it, and read the automation that does it at scale. You now see both sides: the hand-written SQL and the configuration that generates it. That is the whole chain.

Chunk 3 of 3 complete