Python Vault Runner
Build a Data Vault crew registry from scratch — 15 missions, 5 operations
Mission Briefing
Stardate 865.7. USS Discovery, Data Operations Division.
Ensign _________ — you have been assigned to rebuild the ship's crew registry system.
The previous system was lost during the Burn. You are starting from scratch.
You are building one thing across this entire tutorial: a Starfleet crew registry system — a Python script that tracks crew members, their ranks, their assignments, and their alert status. By the final mission, you will have a working script you can run. Not exercises. Not practice problems. One real thing, built piece by piece.
This registry is structurally a Data Vault — the same architecture ScaleFree builds for enterprise clients.
Hub — unique crew members. Satellite — rank and assignment over time. Link — crew-to-ship assignments.
You won't call it that until Operation 5. For now, you're just building.
Before Mission 1.1, open Replit:
- Go to replit.com — create a free account if needed
- Click + Create Repl
- Select Python
- Name it: crew-registry
- You will write every mission's code in this same file
You will keep adding to this one file. By Mission 5.3, it's your full crew registry.
Operation 1: First Contact — Variables and Data Types
Your first task: name the crew members aboard Discovery. Before Python can track them, it needs to know their names exist.
Mission 1.1 — Name Your First Crew Member CORE
Python doesn't know Michael Burnham exists until you tell it.
"Variable" — in everyday English means something that can change — variable temperature, variable cost, variable mood. In Python, a variable is a named container. You give it a name; it holds a value. Why "variable"? Because the value isn't fixed — you can reassign it any time. The name is yours to choose.
You create a variable by writing a name, then =, then a value. The = here does not mean "equals" the way it does in math — it means "henceforth called" or "assign this value to this name." Read captain = "Michael Burnham" as: "henceforth, captain refers to "Michael Burnham"." From this point forward, whenever Python sees captain, it looks up what that name refers to and uses that value.
# Assign the captain's name to a variable called captain
captain = "Michael Burnham"
# Print it to confirm Python knows it
print(captain)
Output: Michael Burnham
The variable captain now holds the string "Michael Burnham". When you type print(captain), Python looks up what captain refers to and prints it.
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
captain = "Michael Burnham"
first_officer = "Saru"
print(captain)
print(first_officer)
# Now reassign first_officer and run again
first_officer = "Tilly"
print(first_officer)
Run the script. Watch first_officer change from "Saru" to "Tilly" — same name, new value.
The last print(first_officer) runs after the reassignment, so it prints the new value, not the old one. Order matters: Python runs the script top to bottom.
Michael Burnham
Saru
Tilly
The first two prints show the original values. After first_officer = "Tilly" reassigns the variable, the third print shows Tilly. The variable name stayed the same; only its contents changed.
You can now create a named container in Python and retrieve its value.
Safe stop point — your code is saved in Replit.
Mission 1.2 — What Kind of Thing Is This? CORE
Python doesn't just store values — it remembers what kind of thing each value is, and that changes what you're allowed to do with it.
"Type" — in everyday English means category, kind, sort — blood type, personality type, type of document. In Python, type is the category of a value, which determines what operations are valid. You can add two integers. You can concatenate two strings. You cannot add an integer to a string: 204 + "Discovery" raises a TypeError. Python needs to know the type to know the rules.
Python has four basic types you will use constantly: str (text, always in quotes), int (whole number, no quotes, no decimal), float (decimal number, no quotes), and bool (True or False — capital T and F, no quotes). Every value has a type whether you declare it or not — Python infers it from how you write the value. You can check any variable's type with type(x).
crew_count = 204 # int — whole number, no quotes
warp_factor = 9.5 # float — decimal number
shields_up = True # bool — True or False, capital T
captain = "Michael Burnham" # str — text, in quotes
print(type(crew_count)) # <class 'int'>
print(type(warp_factor)) # <class 'float'>
print(type(shields_up)) # <class 'bool'>
print(type(captain)) # <class 'str'>
Python reports the type of each value. The word class here just means "category" — ignore it for now.
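A short sketch of why the type matters, using the same values as above: the + operator means addition for numbers and joining for strings, and Python refuses to mix the two. The names total, label, and error_seen are illustrative only, and try/except appears here just to catch the error so the script keeps running; it is not something this tutorial has introduced.

```python
crew_count = 204                 # int
warp_factor = 9.5                # float
captain = "Michael Burnham"      # str

# int + float: numeric addition, and the result is a float
total = crew_count + warp_factor

# str + str: concatenation (joining text end to end)
label = "Captain " + captain

# int + str: Python raises a TypeError rather than guessing what you meant
error_seen = False
try:
    crew_count + captain
except TypeError:
    error_seen = True

print(total)       # 213.5
print(label)       # Captain Michael Burnham
print(error_seen)  # True
```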
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
crew_count = 204
warp_factor = 9.5
shields_up = True
print(type(crew_count))
print(type(shields_up))
Confirm the output matches <class 'int'> and <class 'bool'>.
Now add one more variable yourself: home_planet = "Vulcan" — then run print(type(home_planet)). What type is it?
True must start with a capital T. true (lowercase) will cause an error — Python treats it as an undefined variable name, not a boolean.
<class 'int'>
<class 'bool'>
crew_count = 204 — no quotes, no decimal — so Python reads it as an int. shields_up = True — capital T, no quotes — so Python reads it as a bool. The type() function looks at the value and reports its category.
You can now identify the four basic Python types and check the type of any variable.
Safe stop point — your code is saved in Replit.
Mission 1.3 — The Crew Manifest Row CORE
What if you need to store not just one fact about Burnham, but five facts together — and look them up by name, not by number?
"Dictionary" — a book where you look up a word (the key) and find its definition (the value). In Python, a dict does exactly this: it maps names to values. You give it a key, it hands you the value. One dict equals one record equals one row in a database equals one crew member's file. The word was chosen because the lookup works the same way: word in, meaning out; key in, value out.
A dict is created with curly braces {}. Each entry is a key-value pair: the key and value are separated by :, and pairs are separated by commas. Keys are almost always strings (in quotes). Values can be any type — strings, ints, floats, bools, or even other dicts. You access a value by writing the dict name, then the key in square brackets: burnham["rank"].
burnham = {
    "crew_id": "BURNHAM-001",
    "name": "Michael Burnham",
    "rank": "Captain",
    "station": "Command"
}
print(burnham["name"]) # Michael Burnham
print(burnham["rank"]) # Captain
The dict burnham holds four facts about one crew member. You retrieve any fact by passing its key in square brackets — like looking up a word in a dictionary.
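Since values can be any type, including other dicts, one record can nest a dict inside another. The sketch below is illustrative only: the assignment key and its contents are assumptions made up for this example, not fields the registry script uses.

```python
burnham = {
    "crew_id": "BURNHAM-001",
    "name": "Michael Burnham",
    "rank": "Captain",
    # Hypothetical nested dict: a value that is itself a dict
    "assignment": {"ship": "USS Discovery", "station": "Command"},
}

# One pair of brackets per level: first the outer key, then the inner key
print(burnham["assignment"]["ship"])     # USS Discovery
print(burnham["assignment"]["station"])  # Command
print(len(burnham))                      # 4 (top-level keys only)
```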
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
burnham = {
    "crew_id": "BURNHAM-001",
    "name": "Michael Burnham",
    "rank": "Captain",
    "station": "Command"
}
print(burnham["name"])
print(burnham["rank"])
Confirm that each print returns the value for that key, not the key itself.
The key goes inside square brackets and quotation marks: burnham["rank"] — not burnham.rank or burnham[rank]. The quotes are required because the key is a string.
Michael Burnham
Captain
burnham["name"] looks up the key "name" in the dict and returns its value: "Michael Burnham". burnham["rank"] does the same for "rank". The dict maps each key to exactly one value — like a word mapped to its definition.
You can now store multiple related facts in one Python dict and access any value by key.
Safe stop point — your code is saved in Replit.
Replit Workspace
Open your crew-registry project in Replit and code along as you read each mission.
Operation 1 Complete — Lieutenant JG
You've named crew members, identified their types, and stored them as structured records. The crew registry has its first entries.
Operation 2: The Manifest — Lists and Dictionaries
The crew names exist. Now build the manifest — the full list of everyone aboard.
Mission 2.1 — One Crew Member to Many CORE
One dict holds one crew member, but nothing holds the whole ship yet.
"List" — in everyday English means a sequence of items in order — shopping list, guest list, manifest. In Python, a list is exactly that: a sequence of values that preserves order. It can hold anything — including dicts. A list of crew member dicts IS the ship's manifest. The word was chosen because it does what a list does: keeps things in order so you can count them, scan them, and retrieve them by position.
Create a list with square brackets []. Items are separated by commas. A list can hold variables, dicts, numbers, strings — anything. The items stay in the order you put them in. Use len(list) to count how many items are in the list.
saru = {
    "crew_id": "SARU-002",
    "name": "Saru",
    "rank": "Commander",
    "station": "Science"
}
crew = [burnham, saru] # a list containing two crew member dicts
print(len(crew)) # 2
burnham was defined in Operation 1 — it's already in the script. crew now holds two dicts in order. len(crew) counts the items and returns 2.
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
saru = {
    "crew_id": "SARU-002",
    "name": "Saru",
    "rank": "Commander",
    "station": "Science"
}
crew = [burnham, saru]
print(len(crew))
print(crew[0])
Before you run: what do you expect crew[0] to print? Write your prediction, then run and check.
Then add a third crew member dict with any crew_id, name, rank, and station — then add them to crew and print len(crew) again.
crew[0] means "the item at position 0." Python starts counting from 0, not 1 — so position 0 is the first item.
2
{'crew_id': 'BURNHAM-001', 'name': 'Michael Burnham', 'rank': 'Captain', 'station': 'Command'}
len(crew) counts two items and returns 2. crew[0] retrieves the item at position 0 — the first item in the list, which is the burnham dict. Python prints the dict's full contents.
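One way to grow the manifest after it exists is the list's append method, sketched below. The stamets record is a hypothetical entry invented for this example, and the dicts are shortened to two keys to keep the sketch small.

```python
burnham = {"crew_id": "BURNHAM-001", "name": "Michael Burnham"}
saru = {"crew_id": "SARU-002", "name": "Saru"}
crew = [burnham, saru]

# Hypothetical third record, invented for this example
stamets = {"crew_id": "STAMETS-004", "name": "Paul Stamets"}

crew.append(stamets)   # adds the new dict to the end of the list

print(len(crew))       # 3
print(crew[-1])        # {'crew_id': 'STAMETS-004', 'name': 'Paul Stamets'}
```

Position -1 counts from the end, so crew[-1] is always the last item, however long the list becomes.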
You can now hold multiple records together in a Python list and count them.
Safe stop point — your code is saved in Replit.
Mission 2.2 — Access a Value by Name CORE
Python doesn't search your 200-entry manifest for Burnham's rank — it goes directly to it.
The ["key"] notation is possessive — like English's apostrophe-s. burnham["rank"] reads as "burnham's rank." The dict is the possessor; the key in brackets is the thing possessed. Compare to SQL dot notation: burnham.rank. Same possessive logic, different punctuation. In both cases, you name the owner first, then the thing you want — no scanning, no searching, straight to the value.
To get a value from a dict, write the dict name, then the key in square brackets and quotes: dict_name["key"]. Python goes directly to that key — it does not scan the dict from top to bottom. If the key doesn't exist, Python raises a KeyError — which just means "you asked for a key that isn't there."
print(burnham["rank"]) # Captain
print(burnham["station"]) # Command
print(saru["rank"]) # Commander
Each lookup goes directly to the named key and returns its value. saru["rank"] is not affected by burnham["rank"] — each dict is its own namespace.
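The KeyError mentioned above can be avoided or handled in a few standard ways. This is a minimal sketch: the "species" key is a deliberately missing, made-up key, and try/except is used only to show the error being caught.

```python
burnham = {"crew_id": "BURNHAM-001", "name": "Michael Burnham", "rank": "Captain"}

# "in" checks whether a key exists before you ask for it
has_rank = "rank" in burnham
has_species = "species" in burnham
print(has_rank)      # True
print(has_species)   # False

# .get() returns a fallback value instead of raising KeyError
species = burnham.get("species", "unknown")
print(species)       # unknown

# Square brackets on a missing key raise KeyError
missing_key_raised = False
try:
    burnham["species"]
except KeyError:
    missing_key_raised = True
print(missing_key_raised)  # True
```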
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
print(burnham["crew_id"])
print(burnham["name"])
print(burnham["rank"])
print(burnham["station"])
print(saru["name"])
print(saru["station"])
Write out what you expect each line to print before you run it — then check.
Now write one new print statement yourself — access any field of saru that you haven't printed yet.
The key must match exactly what's in the dict, including spelling and case. burnham["Rank"] would raise a KeyError because the key in the dict is "rank" (lowercase).
BURNHAM-001
Michael Burnham
Captain
Command
Saru
Science
Each ["key"] lookup retrieves the value stored at that key in the dict. The key must match exactly. burnham["station"] returns "Command" and saru["station"] returns "Science" — same key, different dicts, different values.
You can now retrieve any value from a dict by name using square bracket notation.
Safe stop point — your code is saved in Replit.
Mission 2.3 — The Full Result Set CORE
When SQL runs a query and returns rows, Python receives them in exactly this shape.
"Result set" — in SQL, a query returns rows. Each row has column values. In Python, this commonly arrives as a list of dicts: each dict is one row, the keys are column names, the values are cell values. This is not a metaphor. It is a shape the major Python database libraries can hand you directly: psycopg2 with RealDictCursor, the Snowflake connector with DictCursor, SQLAlchemy through row mappings. Their defaults are usually tuples or row objects, so you ask for dicts explicitly. When you query Snowflake from Python with a DictCursor, you get a list of dicts back. The shape you've been building is the shape data travels in.
A list of dicts is the standard Python shape for tabular data between pipeline steps. It is what you get when you query a database, what you pass to a transformation function, what you load into Snowflake. It has rows (the list items) and columns (the dict keys). You navigate it with two coordinates: a position index for the row, and a key name for the column.
SQL result set versus Python equivalent:
SQL result set:
crew_id  | name    | rank
BURN-001 | Burnham | Captain
SARU-002 | Saru    | Commander
Python equivalent:
[
    {"crew_id": "BURN-001", "name": "Burnham", "rank": "Captain"},
    {"crew_id": "SARU-002", "name": "Saru", "rank": "Commander"},
]
Then in code:
tilly = {
    "crew_id": "TILLY-003",
    "name": "Sylvia Tilly",
    "rank": "Ensign",
    "station": "Engineering"
}
crew = [burnham, saru, tilly] # three crew members — three rows
print(crew[0]["name"]) # Michael Burnham — first row, name column
print(crew[2]["rank"]) # Ensign — third row, rank column
Two coordinates: crew[0] selects the first row (the burnham dict), then ["name"] selects the name column from that row. crew[2] selects the third row (Tilly), then ["rank"] selects her rank.
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
tilly = {
    "crew_id": "TILLY-003",
    "name": "Sylvia Tilly",
    "rank": "Ensign",
    "station": "Engineering"
}
crew = [burnham, saru, tilly]
print(crew[0]["name"])
print(crew[1]["station"])
print(crew[2]["rank"])
Before you run: write down what you expect each line to print. Then run and check.
Now write one new print statement yourself — access crew[2]["crew_id"] (the third crew member's ID). This is a cell you haven't accessed yet.
Two coordinates, in order: first the row (position index in the list), then the column (key name in the dict). crew[1] is the second item — remember Python starts at 0.
Michael Burnham
Science
Ensign
crew[0]["name"] — row 0 is burnham, key "name" is "Michael Burnham". crew[1]["station"] — row 1 is saru, key "station" is "Science". crew[2]["rank"] — row 2 is tilly, key "rank" is "Ensign". Row index first, column key second — same two-coordinate logic as a SQL result set.
You can now represent a SQL result set in Python as a list of dicts, and access any cell by row index and column name.
Safe stop point — your code is saved in Replit.
This list of dicts IS a result set — the shape SQL returns in Python. Each dict is a row. Each key is a column name. crew[0]["rank"] is like SELECT rank FROM crew LIMIT 1.
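To see the same shape come out of a real query, here is a sketch using sqlite3 from Python's standard library as a stand-in for Snowflake. The in-memory database and the crew table are assumptions created only for this demonstration; the list comprehension on the rows line is a compact loop that builds one dict per row.

```python
import sqlite3

# In-memory database, created only for this demonstration
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row   # rows become addressable by column name

conn.execute("CREATE TABLE crew (crew_id TEXT, name TEXT, rank TEXT)")
conn.executemany(
    "INSERT INTO crew VALUES (?, ?, ?)",
    [
        ("BURNHAM-001", "Michael Burnham", "Captain"),
        ("SARU-002", "Saru", "Commander"),
    ],
)

cursor = conn.execute("SELECT crew_id, name, rank FROM crew ORDER BY crew_id")

# Convert each Row into a plain dict: one dict per row
rows = [dict(row) for row in cursor.fetchall()]
conn.close()

print(len(rows))         # 2
print(rows[0]["name"])   # Michael Burnham
print(rows[1]["rank"])   # Commander
```

The two-coordinate access is unchanged: row position first, column name second.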
Operation 2 Complete — Lieutenant
The manifest is built. Three crew members, stored as dicts inside a list — rows and columns, held in memory. The shape data travels in.
Operation 3: Standing Orders — Functions
You've been writing code that runs once. Functions let you name a task and run it on command — on any crew member, any time.
Mission 3.1 — Package Reusable Logic CORE
Python will gladly repeat the same logic 200 times — but only if you write it 200 times.
"Define" — to state the meaning of something formally. def is short for "define." You are defining what a function means — giving it a name and a recipe. A function is like a named recipe: you write it once and call it whenever you need it. SQL analogy: def is like creating a stored procedure or a dbt macro — write the transformation once, apply it anywhere.
Define a function with def function_name(argument):, then indent the body. The return keyword sends a value back to whoever called the function. After you define it, you call it by name: function_name(value). Python runs the body with value substituted for argument.
def greet(crew_member):
    return f"Welcome aboard, {crew_member['rank']} {crew_member['name']}."
print(greet(burnham)) # Welcome aboard, Captain Michael Burnham.
print(greet(saru)) # Welcome aboard, Commander Saru.
crew_member is the parameter — a placeholder. When you call greet(burnham), Python substitutes burnham for crew_member in the body. Define once, call with any crew member.
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
def greet(crew_member):
    return f"Welcome aboard, {crew_member['rank']} {crew_member['name']}."
print(greet(burnham))
print(greet(saru))
print(greet(tilly))
Confirm the three greetings print correctly. Then write a new function yourself:
def station_report(crew_member):
    # write the return statement here
    # output should be: "[NAME] is stationed at [STATION]."
Call it on one crew member and print the result.
The return value is a string. Use an f-string and pull two keys from the dict: "name" and "station".
def station_report(crew_member):
    return f"{crew_member['name']} is stationed at {crew_member['station']}."
print(station_report(burnham)) # Michael Burnham is stationed at Command.
The function accesses two keys from the dict and assembles them into a formatted string. Call it with any crew member dict and it works — that's reusability.
You can now define a named reusable function in Python and call it with any argument.
Safe stop point — your code is saved in Replit.
Mission 3.2 — Format a Display Name CORE
Change the format once, in one place, and every crew member on the viewscreen updates — that's what a function buys you.
"Format" — in everyday English, "format" means the shape or arrangement of something (document format, date format, file format). Here, we're choosing the display format for a crew member's identity line — the specific arrangement of rank, name, and station. There is no new Python keyword; the bridge is the concept of a function as a single source of truth for a format: one definition, used everywhere.
Building on def and return: a function that formats a crew member's data for display. It takes one argument (a crew member dict) and returns one formatted string. The challenge is assembling the string correctly from three dict keys — and the payoff is that one change in the function updates every output line automatically.
# Target output: "Captain Michael Burnham — Command"
def format_display(crew_member):
    rank = crew_member["rank"]
    name = crew_member["name"]
    station = crew_member["station"]
    return f"{rank} {name} — {station}"
print(format_display(burnham)) # Captain Michael Burnham — Command
print(format_display(saru)) # Commander Saru — Science
Three keys, one f-string, one function. Change the separator in the return line and every output line updates, without touching any print call.
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
Then modify the function: change the — separator to | instead. Make the change in exactly one place — the return line inside the function definition. Do not touch any print calls.
def format_display(crew_member):
    rank = crew_member["rank"]
    name = crew_member["name"]
    station = crew_member["station"]
    return f"{rank} {name} — {station}" # change — to | here
print(format_display(burnham))
print(format_display(saru))
print(format_display(tilly))
Confirm all three lines update with one edit.
The only line you need to change is the return statement — replace — {station} with | {station}. If you find yourself editing a print call, you've changed the wrong line.
def format_display(crew_member):
    rank = crew_member["rank"]
    name = crew_member["name"]
    station = crew_member["station"]
    return f"{rank} {name} | {station}"
print(format_display(burnham)) # Captain Michael Burnham | Command
print(format_display(saru)) # Commander Saru | Science
print(format_display(tilly)) # Ensign Sylvia Tilly | Engineering
One change in the function body updated three outputs. That's the point: when display format changes — and it always does — you fix it in one place, not everywhere it's used.
You can now write a function that extracts and formats multiple fields from a dict and apply it consistently across all records.
Safe stop point — your code is saved in Replit.
Mission 3.3 — Hash a Business Key IMPORTANT
datavault4dbt does not key its tables on the business key you give it; it keys them on an unrecognisable fingerprint of it instead.
"Hash" — to chop finely, to mix into something uniform. In cryptography, a hash function turns any input into a fixed-length output. The same input always produces the same output. You can't reverse it. In Data Vault: business keys are hashed before storage so that parallel loading works without collision, and PII keys can be pseudonymized.
hashlib.sha256() is the hash function datavault4dbt uses — the same operation as SHA2(UPPER(TRIM(column)), 256) in Snowflake SQL. To use it: import hashlib, call hashlib.sha256(value.encode()).hexdigest(). The .upper().strip() normalizes the input first — identical to what UPPER(TRIM(...)) does in SQL. The output is a 64-character hex string, always the same for the same input.
import hashlib
def hash_key(business_key):
    return hashlib.sha256(
        business_key.upper().strip().encode()
    ).hexdigest()
print(hash_key("BURNHAM-001"))
# 64-character hex string — always the same for the same input
print(hash_key("burnham-001")) # same output — because .upper()
print(hash_key("BURNHAM-001 ")) # same output — because .strip()
This is what SHA2(UPPER(TRIM(column)), 256) looks like in Python. Same operation, different language. datavault4dbt runs this in Snowflake; you're running the equivalent in Python.
Snowflake SQL: SHA2(UPPER(TRIM(column)), 256)
Python: hashlib.sha256(bk.upper().strip().encode()).hexdigest()
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
import hashlib
def hash_key(business_key):
    return hashlib.sha256(
        business_key.upper().strip().encode()
    ).hexdigest()
print(hash_key(burnham["crew_id"]))
print(hash_key(saru["crew_id"]))
print(hash_key(tilly["crew_id"]))
Confirm three 64-character hex strings print. Then write these two lines yourself:
# Are these the same?
print(hash_key("BURNHAM-001") == hash_key("burnham-001"))
# How long is the output?
print(len(hash_key("BURNHAM-001")))
What do you expect before you run? Then run and confirm.
The .upper() inside hash_key() normalizes case before hashing, so "BURNHAM-001" and "burnham-001" produce identical inputs to sha256(). SHA-256 always outputs 256 bits. How many hex characters is that?
True
64
True: .upper() normalizes both inputs to "BURNHAM-001" before hashing. 64: SHA-256 produces 256 bits, and each hex character encodes 4 bits, so 256 / 4 = 64 characters. This matches the CHAR(64) column type you've seen in datavault4dbt Hub schemas.
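The properties described in this mission (same input gives same output, different inputs give different outputs, fixed length) can be checked directly. The sketch below repeats hash_key so it runs on its own; burnham_hash, saru_hash, and digest_bytes are names chosen for illustration.

```python
import hashlib

def hash_key(business_key):
    # Same normalization as the mission: uppercase, trim, then hash
    return hashlib.sha256(
        business_key.upper().strip().encode()
    ).hexdigest()

burnham_hash = hash_key("BURNHAM-001")
saru_hash = hash_key("SARU-002")

# Deterministic: the same input gives the same output every time
print(burnham_hash == hash_key("BURNHAM-001"))   # True

# Different business keys give different fingerprints
print(burnham_hash == saru_hash)                 # False

# 32 bytes * 8 bits = 256 bits; 256 bits / 4 bits per hex character = 64
digest_bytes = hashlib.sha256().digest_size
print(digest_bytes * 8)                          # 256
print(len(burnham_hash))                         # 64
```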
You can now generate a SHA2 hash key in Python — the same hash that datavault4dbt uses in Snowflake.
Safe stop point — your code is saved in Replit.
Operation 3 Complete — Lieutenant Commander
You've packaged logic into named functions, formatted crew data with a single source of truth, and generated datavault4dbt-identical hash keys. The crew registry now has reusable operations.
Operation 4: The Patrol — Loops
The manifest is built. Now walk it. Apply logic to every crew member in sequence.
Mission 4.1 — Walk the Manifest CORE
Two lines of Python can do what would take 200 separate print statements.
"for" — in everyday English, "for" introduces a scope: for every crew member aboard, check their badge. You mean: take each one in turn, do the same thing. Python's for loop does exactly that. The word was chosen because it maps directly to the English "for each [item] in [collection], do [action]" sentence structure. Reading for member in crew: out loud sounds like plain English because it is.
A for loop says: for each item in this collection, run this code. The indented block beneath the for line runs once per item, with member taking the value of each element in turn. You name the variable yourself — member is just a readable choice. SQL analogy: a for loop is what happens inside a database cursor — row by row processing. In SQL the database engine does it invisibly; in Python you write it explicitly.
for member in crew: # visit each dict in the list in turn
    print(format_display(member)) # call format_display on each one
Output — one formatted line per crew member, in list order. member is just a name: it takes the value of each dict in crew in turn. After the loop finishes, member holds the last item. The indentation is what tells Python which lines are inside the loop.
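A small sketch of the two points above: the indented line runs once per item, and the loop variable keeps the last item after the loop ends. The names list is an illustrative addition, and the crew dicts are shortened to two keys.

```python
crew = [
    {"name": "Michael Burnham", "rank": "Captain"},
    {"name": "Saru", "rank": "Commander"},
    {"name": "Sylvia Tilly", "rank": "Ensign"},
]

names = []                         # start with an empty list
for member in crew:
    names.append(member["name"])   # indented: runs once per crew member

# Not indented, so these lines run once, after the loop ends
print(names)            # ['Michael Burnham', 'Saru', 'Sylvia Tilly']
print(member["name"])   # Sylvia Tilly (member still holds the last item)
```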
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
for member in crew:
    print(format_display(member))
Confirm all three crew members print. Then add a second line inside the loop body yourself — indented the same as the first print:
    # Write this line yourself, indented to match:
    print(member["crew_id"])
Run again. Each crew member should now print twice — once formatted, once just their crew ID.
Indentation is how Python knows a line is inside the loop. The second print must be indented the same number of spaces as the first — four spaces is standard. If it's at the left margin, Python runs it only once, after the loop ends.
for member in crew:
    print(format_display(member))
    print(member["crew_id"])
Output (one block per crew member; if you kept the | separator from Mission 3.2, it appears here in place of —):
Captain Michael Burnham — Command
BURNHAM-001
Commander Saru — Science
SARU-002
Ensign Sylvia Tilly — Engineering
TILLY-003
Both lines are indented equally — both run on every iteration.
You can now iterate over a list and apply logic to every item with a for loop.
Safe stop point — your code is saved in Replit.
Mission 4.2 — Filter the Manifest CORE
SQL's WHERE clause is just a Python if statement that someone hid inside the database engine.
"if" — the most direct word available. "If" in English states a condition: if it rains, bring an umbrella. In Python, if is identical: if this condition is true, run this code. The indented block only executes when the condition holds. The word was chosen because there is no better word — it is exactly what the English "if" means, nothing more.
Inside a for loop, an if statement makes the indented block conditional — it only runs for items that match. == is comparison (is this equal to that?). = is assignment (make this equal to that). They look nearly identical but do completely different things — confusing them is one of the most common Python bugs. This combination of for + if is the Python equivalent of a SQL WHERE clause.
# SQL: SELECT * FROM crew WHERE rank = 'Captain'
# Python equivalent:
for member in crew:
    if member["rank"] == "Captain":
        print(format_display(member))
The inverse — filtering out a value — uses !=:
# SQL: SELECT * FROM crew WHERE rank != 'Captain'
for member in crew:
    if member["rank"] != "Captain":
        print(format_display(member))
In both cases, the loop visits every member. The if decides whether to act on each one. Conceptually, the database does the same thing: absent an index or other optimization, the WHERE clause doesn't skip rows upfront, it evaluates each row and discards non-matching ones.
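A WHERE clause returns rows rather than printing them. A rough Python counterpart is to collect the matches into a new list, as sketched here; captains and non_captains are illustrative names, and the crew dicts are shortened to two keys.

```python
crew = [
    {"name": "Michael Burnham", "rank": "Captain"},
    {"name": "Saru", "rank": "Commander"},
    {"name": "Sylvia Tilly", "rank": "Ensign"},
]

captains = []        # rows WHERE rank = 'Captain'
non_captains = []    # rows WHERE rank != 'Captain'

for member in crew:
    if member["rank"] == "Captain":
        captains.append(member)
    else:
        non_captains.append(member)

print(len(captains))             # 1
print(captains[0]["name"])       # Michael Burnham
print(len(non_captains))         # 2
```

Each filtered list is itself a list of dicts, so it can be looped over or filtered again.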
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
# SQL: SELECT * FROM crew WHERE station = 'Engineering'
for member in crew:
    if member["station"] == "Engineering":
        print(format_display(member))
Then write your own filter yourself — no starter code:
# Write a loop that prints only crew members whose rank is "Commander"
Follow the same pattern: for member in crew: on the first line, then if member["rank"] == "Commander": indented beneath it, then the print indented one more level.
for member in crew:
    if member["rank"] == "Commander":
        print(format_display(member))
This is structurally identical to SELECT * FROM crew WHERE rank = 'Commander'. The for loop is the full-table scan; the if is the WHERE predicate; the print is the SELECT output. Same operation — different language.
You can now filter a list using if inside a for loop — the Python equivalent of a SQL WHERE clause.
Safe stop point — your code is saved in Replit.
Mission 4.3 — The Alert Mission IMPORTANT
Python can make a different decision for every crew member, every single run.
"random" — in everyday English, random means without pattern, unpredictable, not deliberate. A random event has no determining cause you can trace. random.choice() picks one item from a list without pattern — each call is independent of the last. This matters in data engineering: random sampling (pick 10% of rows for testing), A/B test splitting (randomly assign users to variants), and simulations (model uncertain outcomes) all depend on this property. Same word, same meaning — just applied to code.
import random at the top of the script makes the random module available. random.choice(list) picks one item from a list at random — the same item can be picked on consecutive calls. Each call is independent. Run the script multiple times and the output changes. That unpredictability is the feature, not a bug.
import random
alert_levels = ["GREEN", "YELLOW", "RED"]
for member in crew:
    alert = random.choice(alert_levels) # one independent random pick per member
    print(f"{member['name']}: {alert} alert") # output: name + their alert level
Run this three times. The output usually changes from run to run: different crew members get different alert levels. Because each pick is random, two runs can occasionally match. random.choice() makes one independent decision per loop iteration.
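Two related tools from the same random module are worth a quick sketch: random.seed(), which makes a run repeatable for testing, and random.sample(), which picks several distinct items, as in the 10% sampling use case mentioned above. The seed value 865 and the row_ids list are arbitrary choices for this example.

```python
import random

alert_levels = ["GREEN", "YELLOW", "RED"]

# Seeding makes the random sequence repeatable, which helps when testing
random.seed(865)
first_run = [random.choice(alert_levels) for _ in range(5)]

random.seed(865)
second_run = [random.choice(alert_levels) for _ in range(5)]

print(first_run == second_run)   # True (same seed, same picks)

# random.sample picks distinct items: here 10% of 200 hypothetical row IDs
row_ids = list(range(200))
test_rows = random.sample(row_ids, 20)
print(len(test_rows))            # 20
print(len(set(test_rows)))       # 20 (no repeats)
```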
Your Turn — Replit Exercise
Add this to your crew-registry Replit script. Run it.
import random
alert_levels = ["GREEN", "YELLOW", "RED"]
for member in crew:
    alert = random.choice(alert_levels)
    print(f"{member['name']}: {alert} alert")
Run it several times and confirm the output varies between runs.
Then write this yourself — no starter code:
Modify the loop so that crew members whose rank is "Captain" can receive any of the three alert levels, but all other crew members can only receive "GREEN" or "YELLOW". A Captain leads into combat; the rest of the crew does not.
This combines everything from Operation 4: the for loop (4.1), the if filter (4.2), and random.choice() (4.3). Take it one layer at a time.
Use two different lists — one for Captains, one for everyone else. Inside the for loop, use an if/else to pick which list to pass to random.choice(). The structure is: for → if rank == "Captain" → choice from full list; else → choice from restricted list.
import random
captain_alerts = ["GREEN", "YELLOW", "RED"]
crew_alerts = ["GREEN", "YELLOW"]
for member in crew:
    if member["rank"] == "Captain":
        alert = random.choice(captain_alerts)
    else:
        alert = random.choice(crew_alerts)
    print(f"{member['name']}: {alert} alert")
The for loop visits every crew member. The if checks rank. random.choice() picks from the appropriate list. Three concepts — one block. This is exactly how rule-based data transformations work in Python: loop over rows, apply conditional logic, produce output.
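The same loop-plus-condition pattern can produce new rows instead of printed lines. The sketch below is a guess at what such a transformation might look like, not the tutorial's own code: the clearance rule and the crew_hash and clearance column names are invented for illustration, and hash_key is repeated from Mission 3.3 so the block runs on its own.

```python
import hashlib

def hash_key(business_key):
    # Repeated from Mission 3.3: normalize, then SHA-256
    return hashlib.sha256(business_key.upper().strip().encode()).hexdigest()

crew = [
    {"crew_id": "BURNHAM-001", "name": "Michael Burnham", "rank": "Captain"},
    {"crew_id": "SARU-002", "name": "Saru", "rank": "Commander"},
    {"crew_id": "TILLY-003", "name": "Sylvia Tilly", "rank": "Ensign"},
]

transformed = []                       # the output rows
for member in crew:                    # loop over rows
    if member["rank"] == "Captain":    # conditional logic (hypothetical rule)
        clearance = "BRIDGE"
    else:
        clearance = "STANDARD"
    transformed.append({               # produce output
        "crew_hash": hash_key(member["crew_id"]),
        "name": member["name"],
        "clearance": clearance,
    })

print(len(transformed))                    # 3
print(transformed[0]["clearance"])         # BRIDGE
print(transformed[2]["clearance"])         # STANDARD
print(len(transformed[0]["crew_hash"]))    # 64
```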
You can now use random.choice() inside a loop to make independent decisions per item — and combine loops, filters, and randomness in one block.
Safe stop point — your code is saved in Replit.
Operation 4 Complete — Commander
You've walked the manifest, filtered by condition, and made independent random decisions per crew member. Every concept from Operations 1 through 3 is now executable in a loop. One last operation remains.
Operation 5: The Bridge — Reading Python in the Wild
You've built a Python script from scratch. Now read Python that someone else wrote. This is what the job actually looks like.
Mission 5.1 — Read a dbt Python Model CORE
dbt models are SQL — except when they're Python.
"Model" — in everyday English means a simplified representation of something real — a model aircraft, a conceptual model of a system. In dbt, a model is a file that produces a table or view. A dbt Python model is a .py file that returns a DataFrame instead of a SQL query. Same concept — a representation that produces a table — different language. The word was chosen deliberately: whether the file is .sql or .py, what it models is always the same thing, a transformation that ends as a table.
When your crew registry needs to do something SQL cannot handle — calling an API, applying a Python library, or running complex Python logic — dbt lets you replace a .sql model file with a .py file. The .py version does the same job: it produces a table. But instead of writing a SELECT, you write a function that returns a DataFrame. A dbt Python model for the crew registry might read from stg_crew, apply format_display() and hash_key() using the same Python you just wrote, and return the result for dbt to load into Snowflake. The engineer writes Python; dbt handles everything that happens after the return.
def model(dbt, session):
    # dbt.ref() is the same as {{ ref() }} in SQL: it looks up another model by name
    df = dbt.ref("stg_crew")
    # return a DataFrame; dbt handles loading it to the warehouse
    return df
Walk through each line: def model(dbt, session): declares a function named model that takes two arguments: dbt (connects this file to the dbt project and exposes ref()) and session (the live Snowflake or Databricks session for running queries). dbt.ref("stg_crew") looks up the staging model called stg_crew, the Python equivalent of {{ ref('stg_crew') }} in a SQL model. return df hands a DataFrame back to dbt, which loads it to the warehouse as a table. SQL models can also build views; Python models are materialized as tables (plain or incremental).
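Since model is an ordinary Python function, you can call it outside dbt to watch the data flow through it. The sketch below does that with a stand-in object. FakeDbt is a hypothetical class written for this illustration and is not part of dbt, and a plain list stands in for the DataFrame a real warehouse session would return. The dbt.config(materialized="table") line is an addition to the mission's model: it is how a Python model sets its options from inside the file.

```python
def model(dbt, session):
    # dbt.config() sets model options from inside the Python file
    dbt.config(materialized="table")
    # dbt.ref() looks up the staging model by name
    df = dbt.ref("stg_crew")
    return df


class FakeDbt:
    """Hypothetical stand-in for the object dbt passes in. Not part of dbt."""

    def __init__(self, tables):
        self.tables = tables    # model name -> pretend table contents
        self.settings = {}      # whatever the model passes to config()

    def config(self, **options):
        self.settings.update(options)

    def ref(self, name):
        return self.tables[name]


# A plain list stands in for the DataFrame (assumption for this sketch)
fake = FakeDbt({"stg_crew": [{"crew_id": "BURNHAM-001"}]})
result = model(fake, session=None)

print(result)         # [{'crew_id': 'BURNHAM-001'}]
print(fake.settings)  # {'materialized': 'table'}
```

In a real project you never call model yourself; dbt calls it and supplies both arguments.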
Your Turn
Read this function out loud. Then write one sentence explaining what it does, as if a senior consultant asked you.
Start with: "This is a dbt Python model that..."
"This is a dbt Python model that reads the stg_crew staging model — the same way {{ ref('stg_crew') }} works in a SQL model — and returns it as a DataFrame for dbt to load to the warehouse."
You can now read a dbt Python model and explain what each line does in plain language.
Safe stop point — your code is saved in Replit.
Mission 5.2 — The TurboVault4dbt Connection BONUS
The most Python-adjacent thing in a typical Data Vault workflow is running a script you never have to read.
"Metadata" — comes from the Greek meta- (about) and the Latin data (things given). Metadata is data about data. In TurboVault4dbt, a Python dict that describes a Hub — its name, its business key, its source table — IS the metadata that TurboVault reads to generate the Hub model file. The dict doesn't store crew members; it stores information about what the Hub should look like. That's metadata: data that describes the structure, not the content.
TurboVault4dbt is a ScaleFree-built Python tool. Imagine describing your crew registry in writing: "I need a Hub called hub_crew, with a business key called crew_id, sourced from stg_crew." You fill that description into a spreadsheet; TurboVault reads it, builds a Python dict from each row, and generates the datavault4dbt model files automatically. The engineer describes the vault structure; TurboVault writes the code. The only Python you need to recognise is the dict shape in the worked example — you already know it from Operation 2.
# This is the shape of what TurboVault reads for one Hub
hub_metadata = {
    "hub_name": "hub_crew",  # what the Hub will be called
    "business_key": "crew_id",  # the column that identifies each unique crew member
    "source_table": "stg_crew",  # where the data comes from
}
# TurboVault reads this and generates the dbt model file automatically
# You recognise this — it's a Python dict. Same structure as your crew member dicts.
A dict of strings that describes a Hub. Same Python type you built in Operation 2 — here it carries metadata instead of crew data.
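To make "reads a dict and generates text" concrete, here is a toy sketch. The helper describe_hub is hypothetical and does not reproduce TurboVault4dbt's code or output; it only shows that a metadata dict holds enough information to produce text about a Hub.

```python
hub_metadata = {
    "hub_name": "hub_crew",
    "business_key": "crew_id",
    "source_table": "stg_crew",
}

def describe_hub(metadata):
    """Hypothetical helper: turn one metadata dict into a one-line summary."""
    return (
        f"Hub {metadata['hub_name']}: "
        f"business key {metadata['business_key']}, "
        f"sourced from {metadata['source_table']}"
    )

summary = describe_hub(hub_metadata)
print(summary)  # Hub hub_crew: business key crew_id, sourced from stg_crew
```

A real generator would write a model file instead of a sentence, but the pattern is the same: look up values by key and place them in a template.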
"The Python script behind the TurboVault GUI reads metadata dicts that describe each Hub, Link, and Satellite, and generates the datavault4dbt model files automatically."
Your Turn
Answer these three questions in writing. No Replit needed.
- What Python type is hub_metadata?
- What does hub_metadata['business_key'] return?
- If a senior consultant asked what TurboVault does, what would you say in one sentence?
For Q3: think about what problem TurboVault solves, not what code it runs.
- dict
- "crew_id"
- Something like: "TurboVault4dbt reads metadata from a spreadsheet and automatically generates the dbt model files for each Hub, Link, and Satellite, so you don't write them by hand."
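The first two answers can be checked directly in Python if you want to confirm them. A short sketch that reuses the hub_metadata dict from the worked example; the names answer_1 and answer_2 are illustrative:

```python
hub_metadata = {
    "hub_name": "hub_crew",
    "business_key": "crew_id",
    "source_table": "stg_crew",
}

answer_1 = type(hub_metadata)            # the type of the whole container
answer_2 = hub_metadata["business_key"]  # the value stored under one key

print(answer_1)  # <class 'dict'>
print(answer_2)  # crew_id
```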
You can now explain what TurboVault4dbt does in Python terms and in plain language.
Safe stop point — your code is saved in Replit.
Mission 5.3 — Final Mission: Read Your Own Script CORE
The most impressive thing you can do as a consultant is read your own code as if you didn't write it.
"Narrate" — in everyday English means to give a running account of events as they unfold — a narrator in a film, a play-by-play commentary. In a technical context, narrating your code means walking someone through what each block does, why it's there, and what it produces. It is one of the highest-value skills in consulting — more valuable than being able to write code from scratch, because it demonstrates both understanding and communication.
This is the Feynman technique applied to code: if you can explain each line to someone who didn't write it, you understand it. This mission contains no new Python. The challenge is the narration — naming what each block does, connecting it to the Data Vault concepts underneath. The complete crew registry script is below. You wrote every piece of it across Operations 1 through 4.
import hashlib  # provides SHA2 hash function, same as datavault4dbt uses
import random  # provides random.choice() for alert assignment
# Operation 1-2: variables, data types, dicts, and the crew list
captain = "Michael Burnham"  # a standalone string variable (str)
crew_count = 204  # a standalone int variable
shields_up = True  # a standalone bool variable
burnham = {"crew_id": "BURNHAM-001", "name": "Michael Burnham", "rank": "Captain", "station": "Command"}
saru = {"crew_id": "SARU-002", "name": "Saru", "rank": "Commander", "station": "Science"}
tilly = {"crew_id": "TILLY-003", "name": "Sylvia Tilly", "rank": "Ensign", "station": "Engineering"}
crew = [burnham, saru, tilly]  # the manifest: a list of dicts (one row per crew member)
# Operation 3: functions, named reusable transformations
def format_display(member):
    """Returns a formatted display name: RANK NAME — STATION"""
    return f"{member['rank']} {member['name']} — {member['station']}"
def hash_key(business_key):
    """SHA2(UPPER(TRIM(bk)), 256), same hash as datavault4dbt uses in Snowflake"""
    return hashlib.sha256(
        business_key.upper().strip().encode()  # normalise then encode to bytes
    ).hexdigest()  # return 64-character hex string
# Operation 4: loops, apply logic to every crew member
for member in crew:
    print(format_display(member))  # display line per member
    print(f" Hash key: {hash_key(member['crew_id'])}")  # their vault hash key
# Operation 4.3: the alert mission, random decision per member
alert_levels = ["GREEN", "YELLOW", "RED"]
for member in crew:
    alert = random.choice(alert_levels)  # one independent random pick
    print(f"{member['name']}: {alert} alert")
The Data Vault structure hidden inside this script: crew is a Hub — it stores unique crew members identified by their business key. burnham, saru, and tilly are Hub records — each one is a row in the Hub. format_display is a transformation — the kind of logic that lives in a Business Vault view. hash_key IS the datavault4dbt hash function, implemented in Python rather than SQL.
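In a full Data Vault, the identity of a crew member and their descriptive attributes are stored separately: the Hub keeps the business key, and a Satellite keeps rank and station over time. The sketch below splits one crew dict along that line. It is a simplified illustration: the column names hk_crew and load_ts are assumptions, and real Hubs and Satellites carry more columns than shown here.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(business_key):
    return hashlib.sha256(business_key.upper().strip().encode()).hexdigest()

tilly = {"crew_id": "TILLY-003", "name": "Sylvia Tilly", "rank": "Ensign", "station": "Engineering"}

# Hub row: only the identity of the crew member
hub_row = {
    "hk_crew": hash_key(tilly["crew_id"]),  # hypothetical column name
    "crew_id": tilly["crew_id"],
}

# Satellite row: descriptive attributes, stamped with when they were loaded
sat_row = {
    "hk_crew": hub_row["hk_crew"],  # points back to the Hub row
    "load_ts": datetime.now(timezone.utc).isoformat(),  # hypothetical column name
    "name": tilly["name"],
    "rank": tilly["rank"],
    "station": tilly["station"],
}

print(hub_row)
print(sat_row)
```

If Tilly is promoted, the Hub row stays the same and a new Satellite row is added with the new rank and a later load_ts.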
Your Turn — Replit Exercise
By now the full script should already be assembled in Replit from all previous missions. Confirm it runs and produces output for all three crew members. Then narrate: walk through the script block by block, out loud or in writing. For each block, name: (1) what Python concept it uses, (2) what it does, (3) what Data Vault concept it maps to.
Work block by block in the order they appear. For each one: Python concept first, then what it does, then the DV mapping. If you get stuck on the DV mapping, skip it and come back — the Python narration is the priority.
Block 1 — imports: Python concept: import statement. What it does: loads the hashlib and random standard-library modules so their functions are available. DV mapping: no direct mapping — setup only.
Block 2 — standalone variables: Python concept: variable assignment, data types (str, int, bool). What it does: stores three named values. DV mapping: these are attribute-level data — the kind of values that live in a Satellite column.
Block 3 — crew dicts and list: Python concept: dict literals, list literal. What it does: defines one dict per crew member (each dict is a row), then collects them into a list (the full manifest). DV mapping: crew is the Hub. Each dict is a Hub record. crew_id is the business key.
Block 4 — format_display function: Python concept: function definition, f-string, key access. What it does: takes a crew dict and returns a formatted string. DV mapping: a Business Vault transformation — a computed view over raw Hub data.
Block 5 — hash_key function: Python concept: function definition, method chaining, hashlib.sha256. What it does: normalises the business key (upper, strip) and returns its SHA2-256 hex digest. DV mapping: this IS the datavault4dbt hash key generation — SHA2(UPPER(TRIM(bk)), 256) — in Python.
Block 6 — display loop: Python concept: for loop, function calls inside loop. What it does: iterates over every crew member and prints their display name and hash key. DV mapping: reads from the Hub and applies two transformations per row.
Block 7 — alert loop: Python concept: for loop, random.choice(), plus if/else if you kept the Captain rule from the Mission 4.3 exercise. What it does: assigns an alert level to each crew member independently. DV mapping: a transient status, the kind of value that would be stored in a Satellite with a load timestamp.
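When narrating Block 5, it helps to show why the normalisation matters. The sketch below feeds hash_key a clean business key and a messy variant of the same key; the variable names clean and messy are illustrative.

```python
import hashlib

def hash_key(business_key):
    return hashlib.sha256(business_key.upper().strip().encode()).hexdigest()

clean = hash_key("BURNHAM-001")
messy = hash_key("  burnham-001 ")  # lowercase, with stray spaces

print(len(clean))      # 64
print(clean == messy)  # True
```

Both inputs normalise to "BURNHAM-001" before hashing, so they produce the same key. Without upper() and strip(), the same crew member could end up with two different hash keys.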
Commander ◊◊◊◊◊
You can read and write Python. You can read dbt models that generate Data Vault structures. You built a crew registry that IS a Data Vault — and you can explain every line.
Tutorial complete.
Ship's Codex
Your accumulated reference — every Python concept from all 15 missions.
name = "value" # assign a value
name = 42 # reassign anytime
print(name) # retrieve and display
x = "text" # str — text in quotes
x = 42 # int — whole number
x = 3.14 # float — decimal number
x = True # bool — True or False
type(x) # check what type x is
record = {"key": "value", "other_key": 42}
record["key"] # access a value by key
items = [item1, item2, item3]
items[0] # first item (0-indexed)
len(items) # count of items
rows = [
{"col1": "val1", "col2": "val2"},
{"col1": "val3", "col2": "val4"},
]
rows[0]["col1"] # first row, first column
def function_name(argument):
    return result
output = function_name(input)
for item in collection:
    print(item)  # do something with item
for item in collection:
    if item["key"] == "value":
        print(item)  # only items that match
import hashlib
def hash_key(bk):
return hashlib.sha256(bk.upper().strip().encode()).hexdigest()
import random
random.choice(["A", "B", "C"]) # picks one at random
f"{variable} and literal text"
f"{record['key']} more text"