Chapter 14: Python Vault Runner — Data Vault Foundations

Mission Briefing

Stardate 865.7. USS Discovery, Data Operations Division.
Ensign, you have been assigned to rebuild the ship's crew registry system. The previous system was lost during the Burn. You are starting from scratch.

You are building one thing across this entire tutorial: a Starfleet crew registry system — a Python script that tracks crew members, their ranks, their assignments, and their alert status. By the final mission, you will have a working script you can run. Not exercises. Not practice problems. One real thing, built piece by piece.


This registry is structurally a Data Vault — the same architecture ScaleFree builds for enterprise clients.
Hub — unique crew members. Satellite — rank and assignment over time. Link — crew-to-ship assignments.
You won't call it that until Operation 5. For now, you're just building.

Replit Setup

Before Mission 1.1, open Replit:

Go to replit.com — create a free account if needed
Click + Create Repl
Select Python
Name it: crew-registry
You will write every mission's code in this same file

You will keep adding to this one file. By Mission 5.3, it's your full crew registry.

Operation 1 — First Contact

Core

Operation 1: First Contact — Variables and Data Types

Your first task: name the crew members aboard Discovery. Before Python can track them, it needs to know their names exist.

Mission 1.1 — Name Your First Crew Member CORE

Python doesn't know Michael Burnham exists until you tell it.

LANGUAGE BRIDGE

"Variable" — in everyday English means something that can change — variable temperature, variable cost, variable mood. In Python, a variable is a named container. You give it a name; it holds a value. Why "variable"? Because the value isn't fixed — you can reassign it any time. The name is yours to choose.

The Concept

You create a variable by writing a name, then =, then a value. The = here does not mean "equals" the way it does in math. It means "henceforth called" or "assign this value to this name." Read captain = "Michael Burnham" as "henceforth, captain refers to the text Michael Burnham." From this point forward, whenever Python sees captain, it looks up the value that name refers to and uses it.

WORKED EXAMPLE

# Assign the captain's name to a variable called captain
captain = "Michael Burnham"

# Print it to confirm Python knows it
print(captain)

Output: Michael Burnham
The variable captain now holds the string "Michael Burnham". When you type print(captain), Python looks up what captain refers to and prints it.
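Because = means "assign" and not "equals", a line like crew_count = crew_count + 1 is perfectly valid, even though it would be nonsense in math. Reassigning one name also leaves every other name alone. Here is a short sketch. The names crew_count and second_in_command are only for illustration and are not part of the registry:

```python
# = assigns: Python evaluates the right-hand side first, then stores the result under the name
crew_count = 204
crew_count = crew_count + 1   # reads the old value (204), adds 1, stores 205
print(crew_count)             # 205

captain = "Michael Burnham"
second_in_command = captain   # both names now refer to "Michael Burnham"
captain = "Saru"              # only captain is reassigned
print(second_in_command)      # Michael Burnham
```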

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

captain = "Michael Burnham"
first_officer = "Saru"

print(captain)
print(first_officer)

# Now reassign first_officer and run again
first_officer = "Tilly"
print(first_officer)

Run the script. Watch first_officer change from "Saru" to "Tilly" — same name, new value.

Hint

The last print(first_officer) runs after the reassignment, so it prints the new value, not the old one. Order matters: Python reads top to bottom.

Answer

Michael Burnham
Saru
Tilly

The first two prints show the original values. After first_officer = "Tilly" reassigns the variable, the third print shows Tilly. The variable name stayed the same; only its contents changed.

You can now create a named container in Python and retrieve its value.

Safe stop point — your code is saved in Replit.

Mission 1.2 — What Kind of Thing Is This? CORE

Python doesn't just store values — it remembers what kind of thing each value is, and that changes what you're allowed to do with it.

LANGUAGE BRIDGE

"Type": in everyday English, a category, kind, or sort (blood type, personality type, type of document). In Python, type is the category of a value, which determines what operations are valid. You can add two integers. You can concatenate two strings. You cannot add an integer to a string: 204 + " crew" raises a TypeError. Python needs to know the type to know the rules.

The Concept

Python has four basic types you will use constantly: str (text, always in quotes), int (whole number, no quotes, no decimal), float (decimal number, no quotes), and bool (True or False — capital T and F, no quotes). Every value has a type whether you declare it or not — Python infers it from how you write the value. You can check any variable's type with type(x).

WORKED EXAMPLE

crew_count = 204              # int: whole number, no quotes
warp_factor = 9.5             # float: decimal number
shields_up = True             # bool: True or False, capital T
captain = "Michael Burnham"   # str: text, in quotes

print(type(crew_count))       # <class 'int'>
print(type(warp_factor))      # <class 'float'>
print(type(shields_up))       # <class 'bool'>
print(type(captain))          # <class 'str'>

Python reports the type of each value. The word class here just means "category" — ignore it for now.
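Types decide which operations are allowed. Joining an int to a str is one operation Python refuses. If you convert first with str() or int(), Python can follow the rules. Here is a short sketch. The values and the label text are only for illustration:

```python
crew_count = 204            # int
label = "Crew aboard: "     # str

try:
    message = label + crew_count        # str + int is not allowed
except TypeError as err:
    print("TypeError:", err)

message = label + str(crew_count)       # convert the int to text first
print(message)                          # Crew aboard: 204

total = int("204") + 1                  # convert the text to a number first
print(total)                            # 205
```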

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

crew_count = 204
warp_factor = 9.5
shields_up = True

print(type(crew_count))
print(type(shields_up))

Confirm the output matches <class 'int'> and <class 'bool'>.

Now add one more variable yourself: home_planet = "Vulcan" — then run print(type(home_planet)). What type is it?

Hint

True must start with a capital T. true (lowercase) will cause an error — Python treats it as an undefined variable name, not a boolean.

Answer

<class 'int'>
<class 'bool'>

crew_count = 204 has no quotes and no decimal, so Python reads it as an int. shields_up = True has a capital T and no quotes, so Python reads it as a bool. home_planet = "Vulcan" is in quotes, so print(type(home_planet)) prints <class 'str'>. The type() function looks at the value and reports its category.

You can now identify the four basic Python types and check the type of any variable.

Safe stop point — your code is saved in Replit.

Mission 1.3 — The Crew Manifest Row CORE

What if you need to store not just one fact about Burnham, but several facts together, and look them up by name, not by number?

LANGUAGE BRIDGE

"Dictionary" — a book where you look up a word (the key) and find its definition (the value). In Python, a dict does exactly this: it maps names to values. You give it a key, it hands you the value. One dict equals one record equals one row in a database equals one crew member's file. The word was chosen because the lookup works the same way: word in, meaning out; key in, value out.

The Concept

A dict is created with curly braces {}. Each entry is a key-value pair: the key and value are separated by :, and pairs are separated by commas. Keys are almost always strings (in quotes). Values can be any type — strings, ints, floats, bools, or even other dicts. You access a value by writing the dict name, then the key in square brackets: burnham["rank"].

WORKED EXAMPLE

burnham = {
    "crew_id": "BURNHAM-001",
    "name": "Michael Burnham",
    "rank": "Captain",
    "station": "Command"
}

print(burnham["name"])    # Michael Burnham
print(burnham["rank"])    # Captain

The dict burnham holds four facts about one crew member. You retrieve any fact by passing its key in square brackets — like looking up a word in a dictionary.
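A dict holds exactly one value per key. Assigning to an existing key replaces its value, and assigning to a new key adds a field. In the sketch below, the starting rank and the "ship" field are only for illustration:

```python
burnham = {
    "crew_id": "BURNHAM-001",
    "name": "Michael Burnham",
    "rank": "Commander",
    "station": "Command",
}

burnham["rank"] = "Captain"        # existing key: the old value is replaced
burnham["ship"] = "USS Discovery"  # new key: a fifth field is added

print(burnham["rank"])  # Captain
print(len(burnham))     # 5 (len counts key-value pairs)
```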

IS / IS NOT — Python Dict

IS: a single record — one crew member's data, all fields together

IS NOT: a table — a dict is one row, not many rows

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

burnham = {
    "crew_id": "BURNHAM-001",
    "name": "Michael Burnham",
    "rank": "Captain",
    "station": "Command"
}

print(burnham["name"])
print(burnham["rank"])

Confirm that each print returns the value for that key, not the key itself.

Hint

The key goes inside square brackets and quotation marks: burnham["rank"] — not burnham.rank or burnham[rank]. The quotes are required because the key is a string.

Answer

Michael Burnham
Captain

burnham["name"] looks up the key "name" in the dict and returns its value: "Michael Burnham". burnham["rank"] does the same for "rank". The dict maps each key to exactly one value — like a word mapped to its definition.

You can now store multiple related facts in one Python dict and access any value by key.

Safe stop point — your code is saved in Replit.


Replit Workspace

Open your crew-registry project in Replit and code along as you read each mission.


Operation 2 — The Manifest

Core

Operation 2: The Manifest — Lists and Dictionaries

The crew names exist. Now build the manifest — the full list of everyone aboard.

Mission 2.1 — One Crew Member to Many CORE

One dict holds one crew member, but nothing holds the whole ship yet.

LANGUAGE BRIDGE

"List" — in everyday English means a sequence of items in order — shopping list, guest list, manifest. In Python, a list is exactly that: a sequence of values that preserves order. It can hold anything — including dicts. A list of crew member dicts IS the ship's manifest. The word was chosen because it does what a list does: keeps things in order so you can count them, scan them, and retrieve them by position.

The Concept

Create a list with square brackets []. Items are separated by commas. A list can hold variables, dicts, numbers, strings — anything. The items stay in the order you put them in. Use len(list) to count how many items are in the list.

WORKED EXAMPLE

saru = {
    "crew_id": "SARU-002",
    "name": "Saru",
    "rank": "Commander",
    "station": "Science"
}

crew = [burnham, saru]  # a list containing two crew member dicts

print(len(crew))  # 2

burnham was defined in Operation 1 — it's already in the script. crew now holds two dicts in order. len(crew) counts the items and returns 2.
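A list can also grow in place. The .append() method adds one item to the end, and len() reflects the new count. Here is a sketch. The Detmer record is illustrative data, not part of the tutorial's registry:

```python
burnham = {"crew_id": "BURNHAM-001", "name": "Michael Burnham", "rank": "Captain", "station": "Command"}
saru = {"crew_id": "SARU-002", "name": "Saru", "rank": "Commander", "station": "Science"}

crew = [burnham, saru]
print(len(crew))         # 2

# Illustrative extra record
detmer = {"crew_id": "DETMER-004", "name": "Keyla Detmer", "rank": "Lieutenant", "station": "Helm"}
crew.append(detmer)      # adds one item to the end of the list
print(len(crew))         # 3
print(crew[-1]["name"])  # Keyla Detmer (index -1 means the last item)
```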

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

saru = {
    "crew_id": "SARU-002",
    "name": "Saru",
    "rank": "Commander",
    "station": "Science"
}

crew = [burnham, saru]

print(len(crew))
print(crew[0])

Before you run: what do you expect crew[0] to print? Write your prediction, then run and check.

Then add a third crew member dict with any crew_id, name, rank, and station — then add them to crew and print len(crew) again.

Hint

crew[0] means "the item at position 0." Python starts counting from 0, not 1 — so position 0 is the first item.

Answer

2
{'crew_id': 'BURNHAM-001', 'name': 'Michael Burnham', 'rank': 'Captain', 'station': 'Command'}

len(crew) counts two items and returns 2. crew[0] retrieves the item at position 0 — the first item in the list, which is the burnham dict. Python prints the dict's full contents.

You can now hold multiple records together in a Python list and count them.

Safe stop point — your code is saved in Replit.

Mission 2.2 — Access a Value by Name CORE

Python doesn't scan Burnham's record field by field to find her rank. It goes directly to it.

LANGUAGE BRIDGE

The ["key"] notation is possessive — like English's apostrophe-s. burnham["rank"] reads as "burnham's rank." The dict is the possessor; the key in brackets is the thing possessed. Compare to SQL dot notation: burnham.rank. Same possessive logic, different punctuation. In both cases, you name the owner first, then the thing you want — no scanning, no searching, straight to the value.

The Concept

To get a value from a dict, write the dict name, then the key in square brackets and quotes: dict_name["key"]. Python goes directly to that key — it does not scan the dict from top to bottom. If the key doesn't exist, Python raises a KeyError — which just means "you asked for a key that isn't there."

WORKED EXAMPLE

print(burnham["rank"])    # Captain
print(burnham["station"]) # Command
print(saru["rank"])       # Commander

Each lookup goes directly to the named key and returns its value. saru["rank"] is not affected by burnham["rank"] — each dict is its own namespace.
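A key might be missing or spelled differently. You can test for it with in before the lookup, or you can catch the KeyError. Here is a minimal sketch that uses a copy of the saru record:

```python
saru = {"crew_id": "SARU-002", "name": "Saru", "rank": "Commander", "station": "Science"}

print("rank" in saru)   # True:  the key exists
print("Rank" in saru)   # False: keys are case-sensitive

try:
    print(saru["Rank"])
except KeyError as missing:
    print("No such key:", missing)   # No such key: 'Rank'
```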

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

print(burnham["crew_id"])
print(burnham["name"])
print(burnham["rank"])
print(burnham["station"])

print(saru["name"])
print(saru["station"])

Write out what you expect each line to print before you run it — then check.

Now write one new print statement yourself — access any field of saru that you haven't printed yet.

Hint

The key must match exactly what's in the dict, including spelling and case. burnham["Rank"] would raise a KeyError because the key in the dict is "rank" (lowercase).

Answer

BURNHAM-001
Michael Burnham
Captain
Command
Saru
Science

Each ["key"] lookup retrieves the value stored at that key in the dict. The key must match exactly. burnham["station"] returns "Command" and saru["station"] returns "Science" — same key, different dicts, different values.

You can now retrieve any value from a dict by name using square bracket notation.

Safe stop point — your code is saved in Replit.

Mission 2.3 — The Full Result Set CORE

When a SQL query returns rows to a Python program, this is one of the most common shapes they arrive in.

LANGUAGE BRIDGE

"Result set": in SQL, a query returns rows, and each row has column values. In Python, a result set is commonly handled as a list of dicts. Each dict is one row, the keys are column names, and the values are cell values. This is not a metaphor. The major Python database libraries can all hand you rows in this dict shape: psycopg2 through RealDictCursor, snowflake-connector-python through DictCursor, and SQLAlchemy through result.mappings(), which gives dict-like rows. By default, several of them return plain tuples instead. The shape you've been building is the shape data travels in.

The Concept

A list of dicts is a standard Python shape for tabular data between pipeline steps. Query results often take this form, and so do the inputs to transformation functions and the rows you load into Snowflake. It has rows (the list items) and columns (the dict keys). You navigate it with two coordinates: a position index for the row, and a key name for the column.
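When a driver returns each row as a tuple, the column names usually come separately. You can rebuild the list-of-dicts shape with one dict(zip(...)) per row. Here is a sketch in which hard-coded rows stand in for a query result, so no database connection is involved. It uses a for loop, which Operation 4 covers:

```python
# Stand-in for a query result: column names plus rows as tuples
columns = ["crew_id", "name", "rank"]
rows = [
    ("BURNHAM-001", "Michael Burnham", "Captain"),
    ("SARU-002", "Saru", "Commander"),
]

crew = []
for row in rows:
    # zip pairs each column name with the value in the same position
    crew.append(dict(zip(columns, row)))

print(crew[1]["rank"])   # Commander
```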

WORKED EXAMPLE

SQL result set versus Python equivalent:

SQL result set:
crew_id      | name    | rank
BURNHAM-001  | Burnham | Captain
SARU-002     | Saru    | Commander

Python equivalent:
[
    {"crew_id": "BURNHAM-001", "name": "Burnham", "rank": "Captain"},
    {"crew_id": "SARU-002", "name": "Saru", "rank": "Commander"},
]

Then in code:

tilly = {
    "crew_id": "TILLY-003",
    "name": "Sylvia Tilly",
    "rank": "Ensign",
    "station": "Engineering"
}

crew = [burnham, saru, tilly]  # three crew members, three rows

print(crew[0]["name"])   # Michael Burnham (first row, name column)
print(crew[2]["rank"])   # Ensign (third row, rank column)

Two coordinates: crew[0] selects the first row (the burnham dict), then ["name"] selects the name column from that row. crew[2] selects the third row (Tilly), then ["rank"] selects her rank.

IS / IS NOT — List of Dicts

IS: the shape of data between pipeline steps — rows and columns, held in memory

IS NOT: a database table — it lives in memory, not on disk; no SQL needed to query it

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

tilly = {
    "crew_id": "TILLY-003",
    "name": "Sylvia Tilly",
    "rank": "Ensign",
    "station": "Engineering"
}

crew = [burnham, saru, tilly]

print(crew[0]["name"])
print(crew[1]["station"])
print(crew[2]["rank"])

Before you run: write down what you expect each line to print. Then run and check.

Now write one new print statement yourself — access crew[2]["crew_id"] (the third crew member's ID). This is a cell you haven't accessed yet.

Hint

Two coordinates, in order: first the row (position index in the list), then the column (key name in the dict). crew[1] is the second item — remember Python starts at 0.

Answer

Michael Burnham
Science
Ensign

crew[0]["name"] — row 0 is burnham, key "name" is "Michael Burnham". crew[1]["station"] — row 1 is saru, key "station" is "Science". crew[2]["rank"] — row 2 is tilly, key "rank" is "Ensign". Row index first, column key second — same two-coordinate logic as a SQL result set.

You can now represent a SQL result set in Python as a list of dicts, and access any cell by row index and column name.

Safe stop point — your code is saved in Replit.

DATA VAULT CONNECTION

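This list of dicts IS a result set, in the shape Python code commonly receives query rows. Each dict is a row. Each key is a column name. crew[0]["rank"] is like SELECT rank FROM crew ORDER BY crew_id LIMIT 1. A SQL table has no built-in row order, so a query needs ORDER BY to decide which row comes "first". Here, crew_id happens to match the list order.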
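You can watch this shape come out of a real query with Python's built-in sqlite3 module. The sketch uses a throwaway in-memory SQLite database, not Snowflake, because SQLite ships with Python. Setting row_factory to sqlite3.Row makes each row readable by column name, and dict() turns each row into a plain dict:

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.row_factory = sqlite3.Row       # rows can be read by column name
conn.execute("CREATE TABLE crew (crew_id TEXT, name TEXT, rank TEXT)")
conn.executemany(
    "INSERT INTO crew VALUES (?, ?, ?)",
    [("BURNHAM-001", "Michael Burnham", "Captain"),
     ("SARU-002", "Saru", "Commander"),
     ("TILLY-003", "Sylvia Tilly", "Ensign")],
)

cursor = conn.execute("SELECT crew_id, name, rank FROM crew ORDER BY crew_id")
crew = [dict(row) for row in cursor]   # one plain dict per row
print(crew[0]["rank"])                 # Captain

# The SQL version of crew[0]["rank"]: ORDER BY fixes which row is "first"
first = conn.execute("SELECT rank FROM crew ORDER BY crew_id LIMIT 1").fetchone()
print(first["rank"])                   # Captain
conn.close()
```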


Operation 3 — Standing Orders

Core

Operation 3: Standing Orders — Functions

You've been writing code that runs once. Functions let you name a task and run it on command — on any crew member, any time.

Mission 3.1 — Package Reusable Logic CORE

Python will gladly repeat the same logic 200 times — but only if you write it 200 times.

LANGUAGE BRIDGE

"Define" — to state the meaning of something formally. def is short for "define." You are defining what a function means — giving it a name and a recipe. A function is like a named recipe: you write it once and call it whenever you need it. SQL analogy: def is like creating a stored procedure or a dbt macro — write the transformation once, apply it anywhere.

The Concept

Define a function with def function_name(parameter):, then indent the body. The return keyword sends a value back to whoever called the function. After you define it, you call it by name with an argument: function_name(value). Python runs the body with value substituted for parameter.

WORKED EXAMPLE

def greet(crew_member):
    return f"Welcome aboard, {crew_member['rank']} {crew_member['name']}."

print(greet(burnham))   # Welcome aboard, Captain Michael Burnham.
print(greet(saru))      # Welcome aboard, Commander Saru.

crew_member is the parameter — a placeholder. When you call greet(burnham), Python substitutes burnham for crew_member in the body. Define once, call with any crew member.

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

def greet(crew_member):
    return f"Welcome aboard, {crew_member['rank']} {crew_member['name']}."

print(greet(burnham))
print(greet(saru))
print(greet(tilly))

Confirm the three greetings print correctly. Then write a new function yourself:

def station_report(crew_member):
    # write the return statement here
    # output should be: "[NAME] is stationed at [STATION]."

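Call it on one crew member and print the result. Until you replace the two comments with a return statement, Python will refuse to run the script, because a function body cannot contain only comments.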

Hint

The return value is a string. Use an f-string and pull two keys from the dict: "name" and "station".

Answer

def station_report(crew_member):
    return f"{crew_member['name']} is stationed at {crew_member['station']}."

print(station_report(burnham))   # Michael Burnham is stationed at Command.

The function accesses two keys from the dict and assembles them into a formatted string. Call it with any crew member dict and it works — that's reusability.

You can now define a named reusable function in Python and call it with any argument.

Safe stop point — your code is saved in Replit.

Mission 3.2 — Format a Display Name CORE

Change the format once, in one place, and every crew member on the viewscreen updates — that's what a function buys you.

LANGUAGE BRIDGE

"Format" — in everyday English, "format" means the shape or arrangement of something (document format, date format, file format). Here, we're choosing the display format for a crew member's identity line — the specific arrangement of rank, name, and station. There is no new Python keyword; the bridge is the concept of a function as a single source of truth for a format: one definition, used everywhere.

The Concept

Building on def and return: a function that formats a crew member's data for display. It takes one argument (a crew member dict) and returns one formatted string. The challenge is assembling the string correctly from three dict keys — and the payoff is that one change in the function updates every output line automatically.

WORKED EXAMPLE

# Target output: "Captain Michael Burnham — Command"
def format_display(crew_member):
    rank    = crew_member["rank"]
    name    = crew_member["name"]
    station = crew_member["station"]
    return f"{rank} {name} — {station}"

print(format_display(burnham))   # Captain Michael Burnham — Command
print(format_display(saru))      # Commander Saru — Science

Three keys, one f-string, one function. Change the separator in the return line and both outputs update, without touching either print call.

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

Then modify the function: change the — separator to | instead. Make the change in exactly one place — the return line inside the function definition. Do not touch any print calls.

def format_display(crew_member):
    rank    = crew_member["rank"]
    name    = crew_member["name"]
    station = crew_member["station"]
    return f"{rank} {name} — {station}"   # change — to | here

print(format_display(burnham))
print(format_display(saru))
print(format_display(tilly))

Confirm all three lines update with one edit.

Hint

The only line you need to change is the return statement — replace — {station} with | {station}. If you find yourself editing a print call, you've changed the wrong line.

Answer

def format_display(crew_member):
    rank    = crew_member["rank"]
    name    = crew_member["name"]
    station = crew_member["station"]
    return f"{rank} {name} | {station}"

print(format_display(burnham))   # Captain Michael Burnham | Command
print(format_display(saru))      # Commander Saru | Science
print(format_display(tilly))     # Ensign Sylvia Tilly | Engineering

One change in the function body updated three outputs. That's the point: when display format changes — and it always does — you fix it in one place, not everywhere it's used.

You can now write a function that extracts and formats multiple fields from a dict and apply it consistently across all records.

Safe stop point — your code is saved in Replit.

Mission 3.3 — Hash a Business Key IMPORTANT

datavault4dbt doesn't key a Hub on the business key you give it. It keys the Hub on an unrecognisable fingerprint of that key.

LANGUAGE BRIDGE

"Hash": to chop finely, to mix into something uniform. In cryptography, a hash function turns any input into a fixed-length output. The same input always produces the same output, and you can't reverse the output to recover the input. In Data Vault, business keys are hashed for two reasons. Every load can compute the same key independently and in parallel, without looking up a central sequence number. PII keys can also be pseudonymized.

The Concept

hashlib.sha256() computes SHA-256. That is the same hash Snowflake computes with SHA2(UPPER(TRIM(column)), 256), and datavault4dbt builds a similar expression when a project configures it to hash with SHA2. To use it: import hashlib, then call hashlib.sha256(value.encode()).hexdigest(). Calling .upper().strip() first normalizes the input, mirroring what UPPER(TRIM(...)) does in SQL. The output is a 64-character hex string, always the same for the same input.

WORKED EXAMPLE

import hashlib

def hash_key(business_key):
    return hashlib.sha256(
        business_key.upper().strip().encode()
    ).hexdigest()

print(hash_key("BURNHAM-001"))
# 64-character hex string — always the same for the same input
print(hash_key("burnham-001"))   # same output — because .upper()
print(hash_key("BURNHAM-001 ")) # same output — because .strip()

This is what SHA2(UPPER(TRIM(column)), 256) looks like in Python: same operation, different language. datavault4dbt, when configured for SHA2, runs the equivalent in Snowflake. You're running it in Python.
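There is one difference between the two sides. With no arguments, Python's str.strip() removes all leading and trailing whitespace, including spaces, tabs, and newlines. Snowflake's TRIM(column) removes only blank spaces unless you pass other characters. If a key carries a stray tab or newline, the two sides can therefore disagree. The sketch below adds hash_key_spaces_only, a hypothetical helper that is not part of datavault4dbt or Snowflake, to show a space-only variant:

```python
import hashlib

def hash_key(business_key):
    # Mission 3.3 version: strip() removes spaces, tabs, and newlines
    return hashlib.sha256(business_key.upper().strip().encode()).hexdigest()

def hash_key_spaces_only(business_key):
    # Hypothetical variant closer to TRIM's default: strip(" ") removes only spaces
    return hashlib.sha256(business_key.upper().strip(" ").encode()).hexdigest()

key_with_tab = "BURNHAM-001\t"
print(hash_key(key_with_tab) == hash_key("BURNHAM-001"))              # True
print(hash_key_spaces_only(key_with_tab) == hash_key("BURNHAM-001"))  # False
```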

IS / IS NOT — Hash Function

IS: a deterministic SHA-256 hash, the same computation as Snowflake's SHA2(UPPER(TRIM(column)), 256), which datavault4dbt uses when configured for SHA2

IS NOT: encryption — you can't reverse it, but it's not secret. Anyone with the input and the algorithm gets the same output.
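"Not secret" has a practical consequence. If business keys follow a guessable pattern, anyone holding a hash can hash candidate keys until one matches. This is one reason a plain hash alone is weak pseudonymization for guessable keys. Here is a sketch. The candidate list is illustrative:

```python
import hashlib

def hash_key(business_key):
    return hashlib.sha256(business_key.upper().strip().encode()).hexdigest()

leaked_hash = hash_key("SARU-002")   # pretend only this hash is known

candidates = ["BURNHAM-001", "SARU-002", "TILLY-003"]   # guessed key pattern
recovered = None
for guess in candidates:
    if hash_key(guess) == leaked_hash:   # same input, same output
        recovered = guess

print(recovered)   # SARU-002
```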

Python

hashlib.sha256(
  bk.upper().strip().encode()
).hexdigest()

Snowflake SQL (datavault4dbt)

SHA2(
  UPPER(TRIM(column)),
  256
)

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

import hashlib

def hash_key(business_key):
    return hashlib.sha256(
        business_key.upper().strip().encode()
    ).hexdigest()

print(hash_key(burnham["crew_id"]))
print(hash_key(saru["crew_id"]))
print(hash_key(tilly["crew_id"]))

Confirm three 64-character hex strings print. Then write these two lines yourself:

# Are these the same?
print(hash_key("BURNHAM-001") == hash_key("burnham-001"))

# How long is the output?
print(len(hash_key("BURNHAM-001")))

What do you expect before you run? Then run and confirm.

Hint

The .upper() inside hash_key() normalizes case before hashing — so "BURNHAM-001" and "burnham-001" produce identical inputs to sha256(). SHA2-256 always outputs 256 bits — how many hex characters is that?

Answer

True
64

True, because .upper() normalizes both inputs to "BURNHAM-001" before hashing. 64, because SHA-256 produces 256 bits and each hex character encodes 4 bits, so 256 / 4 = 64 characters. That is why a hex-encoded SHA-256 hash key column is 64 characters wide.
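You can check the 256 / 4 = 64 arithmetic directly. .digest() returns the raw 32 bytes (256 bits / 8), and .hexdigest() spells each byte as two hex characters. A one-character change in the input still gives 64 characters, but the output is completely different:

```python
import hashlib

raw = hashlib.sha256(b"BURNHAM-001").digest()          # bytes
hex_text = hashlib.sha256(b"BURNHAM-001").hexdigest()  # str

print(len(raw))        # 32 (256 bits / 8 bits per byte)
print(len(hex_text))   # 64 (32 bytes * 2 hex characters each)

# Change one character: same length, different output
other = hashlib.sha256(b"BURNHAM-002").hexdigest()
print(len(other), other == hex_text)   # 64 False
```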

You can now generate a SHA-256 hash key in Python, the same kind of hash key datavault4dbt produces in Snowflake when configured for SHA2.

Safe stop point — your code is saved in Replit.


Operation 4 — The Patrol

Important

Operation 4: The Patrol — Loops

The manifest is built. Now walk it. Apply logic to every crew member in sequence.

Mission 4.1 — Walk the Manifest CORE

Two lines of Python can do what would take 200 separate print statements.

LANGUAGE BRIDGE

"for" — in everyday English, "for" introduces a scope: for every crew member aboard, check their badge. You mean: take each one in turn, do the same thing. Python's for loop does exactly that. The word was chosen because it maps directly to the English "for each [item] in [collection], do [action]" sentence structure. Reading for member in crew: out loud sounds like plain English because it is.

The Concept

A for loop says: for each item in this collection, run this code. The indented block beneath the for line runs once per item, with member taking the value of each element in turn. You name the variable yourself — member is just a readable choice. SQL analogy: a for loop is what happens inside a database cursor — row by row processing. In SQL the database engine does it invisibly; in Python you write it explicitly.

WORKED EXAMPLE

for member in crew:                    # visit each dict in the list in turn
    print(format_display(member))      # call format_display on each one

Output — one formatted line per crew member, in list order. member is just a name: it takes the value of each dict in crew in turn. After the loop finishes, member holds the last item. The indentation is what tells Python which lines are inside the loop.
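You can check two of these claims with a trimmed-down crew list that holds names only, for brevity. The first is that member still holds the last item after the loop. The second is that indentation decides what runs inside the loop:

```python
crew = [{"name": "Michael Burnham"}, {"name": "Saru"}, {"name": "Sylvia Tilly"}]

for member in crew:
    print("Inside the loop:", member["name"])   # indented: runs 3 times
print("After the loop:", member["name"])        # not indented: runs once

# member was never reset, so it still refers to the last dict
print(member is crew[-1])   # True
```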

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

for member in crew:
    print(format_display(member))

Confirm all three crew members print. Then add a second line inside the loop body yourself — indented the same as the first print:

# Write this line yourself, indented to match:
print(member["crew_id"])

Run again. Each crew member should now print twice — once formatted, once just their crew ID.

Hint

Indentation is how Python knows a line is inside the loop. The second print must be indented the same number of spaces as the first — four spaces is standard. If it's at the left margin, Python runs it only once, after the loop ends.

Answer

for member in crew:
    print(format_display(member))
    print(member["crew_id"])

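Output (one block per crew member, using the | separator from your Mission 3.2 edit; if you skipped that edit, you'll see the original separator instead):

Captain Michael Burnham | Command
BURNHAM-001
Commander Saru | Science
SARU-002
Ensign Sylvia Tilly | Engineering
TILLY-003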

Both lines are indented equally — both run on every iteration.

You can now iterate over a list and apply logic to every item with a for loop.

Safe stop point — your code is saved in Replit.

Mission 4.2 — Filter the Manifest CORE

SQL's WHERE clause is just a Python if statement that someone hid inside the database engine.

LANGUAGE BRIDGE

"if" — the most direct word available. "If" in English states a condition: if it rains, bring an umbrella. In Python, if is identical: if this condition is true, run this code. The indented block only executes when the condition holds. The word was chosen because there is no better word — it is exactly what the English "if" means, nothing more.

The Concept

Inside a for loop, an if statement makes the indented block conditional — it only runs for items that match. == is comparison (is this equal to that?). = is assignment (make this equal to that). They look nearly identical but do completely different things — confusing them is one of the most common Python bugs. This combination of for + if is the Python equivalent of a SQL WHERE clause.
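Here is a quick way to see the difference. A comparison with == produces a bool, which you can print or store. A plain = stores a value under a name. The values are only for illustration:

```python
rank = "Captain"                 # = assigns: rank now refers to "Captain"

print(rank == "Captain")         # True  (== compares and returns a bool)
print(rank == "Commander")       # False

is_captain = rank == "Captain"   # the comparison runs first, then = stores the bool
print(type(is_captain))          # <class 'bool'>
```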

WORKED EXAMPLE

# SQL: SELECT * FROM crew WHERE rank = 'Captain'
# Python equivalent:
for member in crew:
    if member["rank"] == "Captain":
        print(format_display(member))

The inverse — filtering out a value — uses !=:

# SQL: SELECT * FROM crew WHERE rank != 'Captain'
for member in crew:
    if member["rank"] != "Captain":
        print(format_display(member))

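In both cases, the loop visits every member, and the if decides whether to act on each one. A WHERE clause has the same logical effect: conceptually, it evaluates the condition for each row and discards the rows that don't match. A real database engine may use indexes or partition pruning to avoid reading rows it knows can't match, but the result is the same.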
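Printing is only one way to use a filter. In pipeline code, the matching rows are more often collected into a new list, which is itself a result set. Here is a sketch on a small stand-in crew list:

```python
crew = [
    {"name": "Michael Burnham", "rank": "Captain",   "station": "Command"},
    {"name": "Saru",            "rank": "Commander", "station": "Science"},
    {"name": "Sylvia Tilly",    "rank": "Ensign",    "station": "Engineering"},
]

# SQL: SELECT * FROM crew WHERE rank != 'Captain'
not_captains = []
for member in crew:
    if member["rank"] != "Captain":
        not_captains.append(member)   # keep the matching row

print(len(not_captains))         # 2
print(not_captains[0]["name"])   # Saru
```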

IS / IS NOT — For + If

IS: a row-by-row condition check — Python evaluates every item and runs the indented block only when the condition is true

IS NOT: skipping items before the loop — the loop still visits every member; if decides what to do with each one

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

# SQL: SELECT * FROM crew WHERE station = 'Engineering'
for member in crew:
    if member["station"] == "Engineering":
        print(format_display(member))

Then write your own filter yourself — no starter code:

# Write a loop that prints only crew members whose rank is "Commander"

Hint

Follow the same pattern: for member in crew: on the first line, then if member["rank"] == "Commander": indented beneath it, then the print indented one more level.

Answer

for member in crew:
    if member["rank"] == "Commander":
        print(format_display(member))

This is structurally identical to SELECT * FROM crew WHERE rank = 'Commander'. The for loop is the full-table scan; the if is the WHERE predicate; the print is the SELECT output. Same operation — different language.

You can now filter a list using if inside a for loop — the Python equivalent of a SQL WHERE clause.

Safe stop point — your code is saved in Replit.

Mission 4.3 — The Alert Mission IMPORTANT

Python can make a different decision for every crew member, every single run.

LANGUAGE BRIDGE

"random" — in everyday English, random means without pattern, unpredictable, not deliberate. A random event has no determining cause you can trace. random.choice() picks one item from a list without pattern — each call is independent of the last. This matters in data engineering: random sampling (pick 10% of rows for testing), A/B test splitting (randomly assign users to variants), and simulations (model uncertain outcomes) all depend on this property. Same word, same meaning — just applied to code.

The Concept

import random at the top of the script makes the random module available. random.choice(list) picks one item from a list at random — the same item can be picked on consecutive calls. Each call is independent. Run the script multiple times and the output changes. That unpredictability is the feature, not a bug.
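To check the two claims above (each call is independent, and the same item can come up twice in a row), you can make many picks and count them. A sketch with 3,000 picks; the pick count and the seed are arbitrary, and the seed only makes the demonstration repeatable.

```python
import random

alert_levels = ["GREEN", "YELLOW", "RED"]
rng = random.Random(2024)    # seeded so this demonstration gives the same counts each run

counts = {"GREEN": 0, "YELLOW": 0, "RED": 0}
repeats = 0                  # times a pick matched the pick just before it
previous = None

for _ in range(3000):        # _ means "we don't need the loop counter"
    alert = rng.choice(alert_levels)    # one independent pick
    counts[alert] = counts[alert] + 1
    if alert == previous:
        repeats = repeats + 1
    previous = alert

print(counts)    # picks per level; each should be roughly a third of 3000
print(repeats)   # consecutive repeats; roughly a third of the 2999 pairs
```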

WORKED EXAMPLE

import random

alert_levels = ["GREEN", "YELLOW", "RED"]

for member in crew:
    alert = random.choice(alert_levels)           # one independent random pick per member
    print(f"{member['name']}: {alert} alert")     # output: name + their alert level

Run this three times. The output usually changes from run to run, because every crew member gets a fresh pick each time: random.choice() makes one independent decision per loop iteration.
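A changing output is what you want for alerts, but not for every job. For a test sample like the one mentioned in the Language Bridge, you usually want the same "random" rows on every run. Python's random module is pseudo-random, so seeding a random.Random instance makes its picks repeatable. A minimal sketch; the 20 made-up rows, the 10% rate, and the seed value 7 are arbitrary choices.

```python
import random

# 20 made-up rows, each identified by a row number
rows = []
for n in range(1, 21):
    rows.append({"row_id": n})

rng = random.Random(7)                          # fixed seed: the same picks on every run
sample = rng.sample(rows, k=len(rows) // 10)    # 10% of the rows, no row picked twice
print(sample)

# Re-creating the generator with the same seed repeats the exact same sample
again = random.Random(7).sample(rows, k=len(rows) // 10)
print(sample == again)                          # True
```

Leave the seed out (plain random.choice(), as in the alert mission) when a different result on every run is the point.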

Your Turn — Replit Exercise

Add this to your crew-registry Replit script. Run it.

import random

alert_levels = ["GREEN", "YELLOW", "RED"]

for member in crew:
    alert = random.choice(alert_levels)
    print(f"{member['name']}: {alert} alert")

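Run it three times and confirm the output usually changes. (With three crew members and three levels, two runs match exactly by chance about 1 time in 27.)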

Then write this yourself — no starter code:

Modify the loop so that crew members whose rank is "Captain" can receive any of the three alert levels, but all other crew members can only receive "GREEN" or "YELLOW". A Captain leads into combat; the rest of the crew does not.

This combines everything from Operation 4: the for loop (4.1), the if filter (4.2), and random.choice() (4.3). Take it one layer at a time.

Hint

Use two different lists — one for Captains, one for everyone else. Inside the for loop, use an if/else to pick which list to pass to random.choice(). The structure is: for → if rank == "Captain" → choice from full list; else → choice from restricted list.

Answer

import random

captain_alerts = ["GREEN", "YELLOW", "RED"]
crew_alerts = ["GREEN", "YELLOW"]

for member in crew:
    if member["rank"] == "Captain":
        alert = random.choice(captain_alerts)
    else:
        alert = random.choice(crew_alerts)
    print(f"{member['name']}: {alert} alert")

The for loop visits every crew member. The if checks rank. random.choice() picks from the appropriate list. Three concepts, one block. This is the basic pattern of rule-based data transformations in Python: loop over rows, apply conditional logic, produce output.
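In a real pipeline the result of a rule like this usually becomes new rows rather than printed lines. Here is a sketch of the same Captain rule written as a function that returns a new list of dicts, each with an added "alert" key. The function name assign_alerts and its rng parameter are hypothetical; passing in a seeded random.Random only makes the output repeatable for checking.

```python
import random

captain_alerts = ["GREEN", "YELLOW", "RED"]
crew_alerts = ["GREEN", "YELLOW"]

def assign_alerts(crew, rng):
    """Return new rows (name, rank, alert); the input crew dicts are not changed."""
    output = []
    for member in crew:                          # loop over rows
        if member["rank"] == "Captain":          # apply the conditional rule
            alert = rng.choice(captain_alerts)
        else:
            alert = rng.choice(crew_alerts)
        output.append({"name": member["name"], "rank": member["rank"], "alert": alert})
    return output                                # produce output rows

crew = [
    {"crew_id": "BURNHAM-001", "name": "Michael Burnham", "rank": "Captain", "station": "Command"},
    {"crew_id": "SARU-002", "name": "Saru", "rank": "Commander", "station": "Science"},
    {"crew_id": "TILLY-003", "name": "Sylvia Tilly", "rank": "Ensign", "station": "Engineering"},
]

for row in assign_alerts(crew, random.Random(1)):
    print(row)
```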

You can now use random.choice() inside a loop to make independent decisions per item — and combine loops, filters, and randomness in one block.

Safe stop point — your code is saved in Replit.


Replit Workspace

Open your crew-registry project in Replit and code along as you read each mission.


Operation 5 — The Bridge

Bonus

Operation 5: The Bridge — Reading Python in the Wild

You've built a Python script from scratch. Now read Python that someone else wrote. This is what the job actually looks like.

Mission 5.1 — Read a dbt Python Model CORE

dbt models are SQL — except when they're Python.

LANGUAGE BRIDGE

"Model" — in everyday English means a simplified representation of something real — a model aircraft, a conceptual model of a system. In dbt, a model is a file that produces a table or view. A dbt Python model is a .py file whose function returns a DataFrame, where a SQL model would contain a SELECT. Same concept (a representation that produces a table), different language. The word fits either way: whether the file is .sql or .py, what it models is a transformation that ends as a table.

The Concept

When your crew registry needs to do something SQL cannot handle well, such as calling an API, applying a Python library, or running complex Python logic, dbt lets you write a .py model file in place of a .sql one. The .py version does the same job: it produces a table. But instead of writing a SELECT, you write a function that returns a DataFrame. A dbt Python model for the crew registry might read from stg_crew, apply hash_key() and format_display() using the same Python you just wrote (on Snowflake, usually after converting the Snowpark DataFrame to pandas so plain Python functions can run row by row), and return the result for dbt to load into Snowflake. The engineer writes Python; dbt handles everything that happens after the return.

WORKED EXAMPLE

def model(dbt, session):
    # dbt.ref() is the Python counterpart of {{ ref() }} in SQL: it looks up another model by name
    df = dbt.ref("stg_crew")
    # return a DataFrame; dbt handles loading it to the warehouse
    return df

Walk through each line: def model(dbt, session): declares a function named model that takes two arguments: dbt, which connects this file to the dbt project and exposes ref(), and session, the live warehouse session (for example Snowflake or Databricks) for running queries. dbt.ref("stg_crew") looks up the staging model called stg_crew, the Python equivalent of {{ ref('stg_crew') }} in a SQL model. return df hands a DataFrame back to dbt, which loads it to the warehouse as a table. (dbt Python models can be materialized as tables or incremental models, but not as views.)
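The worked example returns stg_crew unchanged. A sketch of the fuller version described in The Concept, one that adds a hash key column using the hash_key() function from your script, might look like this. It assumes Snowflake, where dbt.ref() returns a Snowpark DataFrame that .to_pandas() converts to pandas, and that stg_crew has a crew_id column (Snowflake reports unquoted column names in upper case, hence CREW_ID). The output column name HK_CREW_H is hypothetical.

```python
import hashlib

def hash_key(business_key):
    """Same normalise-then-SHA2-256 logic as the crew registry's hash_key()."""
    return hashlib.sha256(business_key.upper().strip().encode()).hexdigest()

def model(dbt, session):
    # Python models can materialize as a table (or incremental), never as a view
    dbt.config(materialized="table")

    # Assumption: Snowflake. dbt.ref() returns a Snowpark DataFrame;
    # to_pandas() pulls it into pandas so a plain Python function can run per row
    df = dbt.ref("stg_crew").to_pandas()

    # Hypothetical hash key column: one hash per crew_id
    df["HK_CREW_H"] = df["CREW_ID"].apply(hash_key)

    # dbt writes the returned DataFrame to the warehouse as a table
    return df
```

Converting to pandas pulls every row into the Python process. That is fine for a crew roster; for large tables, Snowpark's built-in column functions keep the work inside the warehouse.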

Your Turn

Read this function out loud. Then write one sentence explaining what it does, as if a senior consultant asked you.

Hint

Start with: "This is a dbt Python model that..."

Answer

"This is a dbt Python model that reads the stg_crew staging model — the same way {{ ref('stg_crew') }} works in a SQL model — and returns it as a DataFrame for dbt to load to the warehouse."

You can now read a dbt Python model and explain what each line does in plain language.

Safe stop point — your code is saved in Replit.

Mission 5.2 — The TurboVault4dbt Connection BONUS

The most Python-adjacent thing in a typical Data Vault workflow is running a script you never have to read.

LANGUAGE BRIDGE

"Metadata" — comes from the Greek meta- (here in its modern sense, "about") and the Latin data (things given). Metadata is data about data. In TurboVault4dbt, the description of a Hub (its name, its business key, its source table) is the metadata that TurboVault reads to generate the Hub model file. Written as a Python dict, that description doesn't store crew members; it stores information about what the Hub should look like. That's metadata: data that describes the structure, not the content.

The Concept

TurboVault4dbt is a ScaleFree-built Python tool. Imagine describing your crew registry in writing: "I need a Hub called hub_crew, with a business key called crew_id, sourced from stg_crew." You fill that description into a spreadsheet, one row per object; TurboVault reads those rows and generates the datavault4dbt model files automatically. Each row carries the same information as a small Python dict, which is the easiest way to picture it here. The engineer describes the vault structure; TurboVault writes the code. The only Python you need to recognise is the dict shape in the worked example, which you already know from Operation 2.

WORKED EXAMPLE

# A simplified sketch of the metadata TurboVault reads for one Hub
hub_metadata = {
    "hub_name": "hub_crew",          # what the Hub will be called
    "business_key": "crew_id",       # the column that identifies each unique crew member
    "source_table": "stg_crew",      # where the data comes from
}

# TurboVault reads metadata like this and generates the dbt model file automatically
# You recognise this: it's a Python dict. Same structure as your crew member dicts.

A dict of strings that describes a Hub. Same Python type you built in Operation 2 — here it carries metadata instead of crew data.
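One way to feel the difference between data and metadata: the metadata dict holds no crew member, but a program can use it to find the business key inside a crew dict. A minimal sketch reusing hub_metadata and the burnham dict from Operation 2; bk_column and bk_value are illustrative names.

```python
hub_metadata = {
    "hub_name": "hub_crew",
    "business_key": "crew_id",
    "source_table": "stg_crew",
}

burnham = {"crew_id": "BURNHAM-001", "name": "Michael Burnham", "rank": "Captain", "station": "Command"}

# Metadata: WHICH key holds the business key (a column name, not a crew member)
bk_column = hub_metadata["business_key"]    # "crew_id"

# Data: the actual business key value for one crew member
bk_value = burnham[bk_column]               # "BURNHAM-001"

print(f"{hub_metadata['hub_name']} uses {bk_column}; Burnham's key is {bk_value}")
# prints: hub_crew uses crew_id; Burnham's key is BURNHAM-001
```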

KEY CONCEPT

"The Python script behind the TurboVault GUI reads metadata dicts that describe each Hub, Link, and Satellite, and generates the datavault4dbt model files automatically."

Your Turn

Answer these three questions in writing. No Replit needed.

1. What Python type is hub_metadata?
2. What does hub_metadata['business_key'] return?
3. If a senior consultant asked what TurboVault does, what would you say in one sentence?

Hint

For Q3: think about what problem TurboVault solves, not what code it runs.

Answer

1. dict
2. "crew_id"
3. Something like: "TurboVault4dbt reads metadata from a spreadsheet and automatically generates the dbt model files for each Hub, Link, and Satellite, so you don't write them by hand."
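The idea in that answer can be sketched in a few lines: loop over metadata dicts and produce the text of one model file per Hub. This is a hypothetical, heavily simplified generator, not TurboVault4dbt's actual code or output. The datavault4dbt.hub() call and its parameter names (hashkey, business_keys, source_models) are assumptions modelled on datavault4dbt's Hub macro, so check them against its documentation; the hk_..._h naming and the second Hub, hub_ship, are made up.

```python
hubs_metadata = [
    {"hub_name": "hub_crew", "business_key": "crew_id", "source_table": "stg_crew"},
    {"hub_name": "hub_ship", "business_key": "ship_id", "source_table": "stg_ship"},  # hypothetical
]

def generate_hub_model(meta):
    """Return (file_name, file_text) for one Hub, built only from its metadata dict."""
    # Assumed naming convention: hub_crew -> hk_crew_h
    hashkey = "hk_" + meta["hub_name"].replace("hub_", "", 1) + "_h"
    file_name = meta["hub_name"] + ".sql"
    # Inside an f-string, {{ and }} produce literal braces, so {{{{ writes {{
    file_text = (
        f"{{{{ datavault4dbt.hub(hashkey='{hashkey}', "
        f"business_keys='{meta['business_key']}', "
        f"source_models='{meta['source_table']}') }}}}\n"
    )
    return file_name, file_text

for meta in hubs_metadata:
    name, text = generate_hub_model(meta)
    print(f"--- {name} ---")
    print(text)
```

The point is the direction of flow: the dicts describe the structure, and the loop turns each description into code.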

You can now explain what TurboVault4dbt does in Python terms and in plain language.

Safe stop point — your code is saved in Replit.

Mission 5.3 — Final Mission: Read Your Own Script CORE

The most impressive thing you can do as a consultant is read your own code as if you didn't write it.

LANGUAGE BRIDGE

"Narrate" — in everyday English means to give a running account of events as they unfold — a narrator in a film, a play-by-play commentary. In a technical context, narrating your code means walking someone through what each block does, why it's there, and what it produces. It is one of the highest-value skills in consulting — more valuable than being able to write code from scratch, because it demonstrates both understanding and communication.

The Concept

This is the Feynman technique applied to code: if you can explain each line to someone who didn't write it, you understand it. This mission contains no new Python. The challenge is the narration — naming what each block does, connecting it to the Data Vault concepts underneath. The complete crew registry script is below. You wrote every piece of it across Operations 1 through 4.

WORKED EXAMPLE — The Complete Crew Registry

import hashlib   # provides hashlib.sha256(), the SHA2-256 algorithm datavault4dbt can be configured to use
import random    # provides random.choice() for alert assignment

# Operation 1-2: variables, data types, dicts, and the crew list
captain    = "Michael Burnham"    # a standalone string variable (str)
crew_count = 204                  # a standalone int variable
shields_up = True                 # a standalone bool variable

burnham = {"crew_id": "BURNHAM-001", "name": "Michael Burnham", "rank": "Captain",   "station": "Command"}
saru    = {"crew_id": "SARU-002",    "name": "Saru",            "rank": "Commander", "station": "Science"}
tilly   = {"crew_id": "TILLY-003",   "name": "Sylvia Tilly",    "rank": "Ensign",    "station": "Engineering"}
crew    = [burnham, saru, tilly]  # the manifest — a list of dicts (one row per crew member)

# Operation 3: functions — named reusable transformations
def format_display(member):
    """Returns a formatted display name: RANK NAME — STATION"""
    return f"{member['rank']} {member['name']} — {member['station']}"

def hash_key(business_key):
    """Mirrors SHA2(UPPER(TRIM(bk)), 256), datavault4dbt's hash-key pattern when set to SHA2"""
    return hashlib.sha256(
        business_key.upper().strip().encode()   # normalise then encode to bytes
    ).hexdigest()                               # return 64-character hex string

# Operation 4: loops — apply logic to every crew member
for member in crew:
    print(format_display(member))               # display line per member
    print(f"  Hash key: {hash_key(member['crew_id'])}")  # their vault hash key

# Operation 4.3: the alert mission — random decision per member
alert_levels = ["GREEN", "YELLOW", "RED"]
for member in crew:
    alert = random.choice(alert_levels)         # one independent random pick
    print(f"{member['name']}: {alert} alert")
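When you narrate the hash_key block, it helps to show why it normalises before hashing. This quick check uses hash_key exactly as written in the script: differently typed versions of the same business key produce one hash key, while a different key produces a different one.

```python
import hashlib

def hash_key(business_key):
    """SHA2(UPPER(TRIM(bk)), 256) pattern, as in the crew registry script"""
    return hashlib.sha256(business_key.upper().strip().encode()).hexdigest()

a = hash_key("BURNHAM-001")
b = hash_key("  burnham-001 ")   # extra spaces and lower case
c = hash_key("SARU-002")

print(a == b)   # True: the same business key once normalised
print(a == c)   # False: a different crew member
print(len(a))   # 64 hex characters
```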

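The Data Vault structure hidden inside this script: crew plays the role of a Hub. It holds unique crew members, each identified by its business key, crew_id, and burnham, saru, and tilly are the records that feed it, one per crew member. Strictly, a Hub row keeps only the business key (plus its hash key and load metadata); descriptive attributes such as rank and station belong in a Satellite. format_display is a transformation, the kind of logic that lives in a Business Vault view. hash_key reproduces in Python the SHA2(UPPER(TRIM(bk)), 256) hash-key calculation that datavault4dbt performs in SQL when configured for SHA2-256.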

Your Crew Registry = A Data Vault

crew = [burnham, saru, tilly] → Hub (unique crew members)

{"crew_id": "BURNHAM-001"} → Business Key

hash_key("BURNHAM-001") → SHA2(UPPER(TRIM(...)), 256)

format_display(burnham) → Business Vault transformation

random alert status per run → Satellite attribute (a status that changes from load to load)
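The mapping above treats the whole crew list as the Hub. In the stricter reading from the Mission Briefing (Hub for unique crew members, Satellite for rank and assignment), each crew dict splits into a Hub row and a Satellite row that share the same hash key. A minimal sketch; the column names hk_crew_h, load_ts and record_source and the fixed load timestamp are hypothetical, and the deduplication a real load would perform is left out.

```python
import hashlib

def hash_key(business_key):
    return hashlib.sha256(business_key.upper().strip().encode()).hexdigest()

crew = [
    {"crew_id": "BURNHAM-001", "name": "Michael Burnham", "rank": "Captain", "station": "Command"},
    {"crew_id": "SARU-002", "name": "Saru", "rank": "Commander", "station": "Science"},
    {"crew_id": "TILLY-003", "name": "Sylvia Tilly", "rank": "Ensign", "station": "Engineering"},
]

LOAD_TS = "2024-01-01 00:00:00"   # hypothetical fixed load timestamp
RECORD_SOURCE = "stg_crew"

hub = []        # one row per crew member: business key + hash key + load metadata only
satellite = []  # descriptive attributes, linked back to the Hub by the same hash key

for member in crew:
    hk = hash_key(member["crew_id"])
    hub.append({"hk_crew_h": hk, "crew_id": member["crew_id"],
                "load_ts": LOAD_TS, "record_source": RECORD_SOURCE})
    satellite.append({"hk_crew_h": hk, "load_ts": LOAD_TS,
                      "name": member["name"], "rank": member["rank"], "station": member["station"]})

for row in hub:
    print(row)
for row in satellite:
    print(row)
```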

Your Turn — Replit Exercise

By now the full script should already be assembled in Replit from all previous missions. Confirm it runs and produces output for all three crew members. Then narrate: walk through the script block by block, out loud or in writing. For each block, name: (1) what Python concept it uses, (2) what it does, (3) what Data Vault concept it maps to.

Hint

Work block by block in the order they appear. For each one: Python concept first, then what it does, then the DV mapping. If you get stuck on the DV mapping, skip it and come back — the Python narration is the priority.

Answer

Block 1 — imports: Python concept: import statement. What it does: loads the hashlib and random standard-library modules so their functions are available. DV mapping: no direct mapping — setup only.

Block 2 — standalone variables: Python concept: variable assignment, data types (str, int, bool). What it does: stores three named values. DV mapping: these are attribute-level data — the kind of values that live in a Satellite column.

Block 3 — crew dicts and list: Python concept: dict literals, list literal. What it does: defines one dict per crew member (each dict is a row), then collects them into a list (the full manifest). DV mapping: crew plays the role of the Hub, each dict supplies one Hub record, and crew_id is the business key; rank and station are descriptive attributes that a Satellite would hold.

Block 4 — format_display function: Python concept: function definition, f-string, key access. What it does: takes a crew dict and returns a formatted string. DV mapping: a Business Vault transformation, a computed view over Raw Vault data.

Block 5 — hash_key function: Python concept: function definition, method chaining, hashlib.sha256. What it does: normalises the business key (upper, strip) and returns its SHA2-256 hex digest. DV mapping: the datavault4dbt hash-key pattern, SHA2(UPPER(TRIM(bk)), 256), reproduced in Python.

Block 6 — display loop: Python concept: for loop, function calls inside loop. What it does: iterates over every crew member and prints their display name and hash key. DV mapping: reads from the Hub and applies two transformations per row.

Block 7 — alert loop: Python concept: for loop, random.choice() (plus if/else if your Replit script uses the Captain version from Mission 4.3). What it does: assigns an alert level to each crew member independently. DV mapping: a status that changes over time, the kind of value a Satellite stores with a load timestamp.

Commander ◊◊◊◊◊

You can read and write Python. You can read dbt models that generate Data Vault structures. You built a crew registry that IS a Data Vault — and you can explain every line.

Tutorial complete.

Reference

Ship's Codex

Your accumulated reference — every Python concept from all 15 missions.

1. Variables

name = "value"       # assign a value
name = 42            # reassign anytime
print(name)          # retrieve and display

2. Data Types

x = "text"           # str — text in quotes
x = 42               # int — whole number
x = 3.14             # float — decimal number
x = True             # bool — True or False
type(x)              # check what type x is

3. Dictionaries

record = {"key": "value", "other_key": 42}
record["key"]        # access a value by key

4. Lists

items = [item1, item2, item3]
items[0]             # first item (0-indexed)
len(items)           # count of items

5. List of Dicts (result set shape)

rows = [
    {"col1": "val1", "col2": "val2"},
    {"col1": "val3", "col2": "val4"},
]
rows[0]["col1"]      # first row, value in the "col1" column

6. Functions

def function_name(argument):
    return result

output = function_name(input)

7. Loops

for item in collection:
    print(item)                  # do something with item

for item in collection:
    if item["key"] == "value":
        print(item)              # runs only for items that match

8. Hash Key (datavault4dbt style)

import hashlib
def hash_key(bk):
    return hashlib.sha256(bk.upper().strip().encode()).hexdigest()

9. Random Choice

import random
random.choice(["A", "B", "C"])    # picks one at random

10. f-strings (string formatting)

f"{variable} and literal text"
f"{dict['key']} more text"
