The EU AI Act & Data Vault
The regulation that turns your data warehouse into compliance infrastructure.
A bank has 3 teams pulling the same customer data from 3 different sources, training 3 different AI models — and nobody documented which data was used, which version, or whether anyone checked for bias. ScaleFree calls this AI Spaghetti. The EU AI Act makes it a compliance violation.
Team A pulls customer transaction data from the core banking system, trains a fraud detection model. Team B pulls the same customer data from a different extract, trains a credit scoring model. Team C pulls customer data from a CRM export, trains a churn predictor.
Team A’s fraud model uses a customer table from January. Team B’s credit model uses the same table from March. The January version had 50,000 rows with a labeling error in the “high risk” column. Nobody caught it because nobody tracks which version went where.
An auditor asks: “What data trained this fraud model?” The answer is a shrug and a Slack thread from 6 months ago.
A data scientist pulls data directly from Salesforce, trains a model on their laptop, deploys it to production. Bypasses the entire data warehouse. Now you have GDPR exposure (was there a legal basis for this processing?) AND AI Act exposure (where’s the documentation for Article 10?). Nobody in the data team even knows this model exists.
Route all data — including AI training data — through the Data Vault layers. Source → Staging → Raw Vault → Business Vault → AI-Mart. Every row has record_source (where it came from) and load_date (when it arrived). Every transformation is documented by the layer it passes through.
The spaghetti becomes a pipeline with a paper trail.
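To make “every row has record_source and load_date” concrete, here is a minimal sketch (Python/SQLite; the table and values are illustrative assumptions, not ScaleFree’s actual DDL). The point is that origin and arrival time are stamped in the staging layer, before any transformation touches the data.

```python
import sqlite3
from datetime import datetime, timezone

# Illustrative staging table: the audit columns are mandatory, not optional.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stg_customer (
        customer_id   TEXT NOT NULL,
        risk_label    TEXT,
        record_source TEXT NOT NULL,  -- where the row came from
        load_date     TEXT NOT NULL   -- when it arrived in the warehouse
    )
""")

def load_to_staging(rows, record_source):
    """Stamp every incoming row with its origin and arrival time."""
    load_date = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO stg_customer VALUES (?, ?, ?, ?)",
        [(r["customer_id"], r["risk_label"], record_source, load_date) for r in rows],
    )

# Team A's extract and Team C's extract stay distinguishable forever.
load_to_staging([{"customer_id": "C1", "risk_label": "high"}], "core_banking.customers")
load_to_staging([{"customer_id": "C1", "risk_label": "low"}], "crm_export.customers")

for row in conn.execute("SELECT * FROM stg_customer"):
    print(row)
```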
Regulation (EU) 2024/1689 — entered into force 1 August 2024. The world’s first comprehensive legal framework for artificial intelligence.
Despite being called an “Act,” it’s technically a Regulation — directly binding in all 27 member states, no national transposition needed. Same legal form as GDPR. Extraterritorial scope — applies to any organization whose AI system’s output is used within the EU, regardless of where that organization is based.
The AI Act doesn’t regulate AI itself. It regulates the people who build and use AI systems — proportional to the risk those systems pose.
“Provider” — the entity that develops the AI system or places it on the market. The one who designed the thing. Think: the factory.
“Deployer” — the entity that uses the AI system professionally in their business context. Think: the driver.
Example: HEC Paris buys an AI admissions tool (deployer). The company that built and sold the tool is the provider. If HEC then fine-tunes the model significantly, they might become a provider too. A company can be both simultaneously.
Providers carry the heavier compliance burden — conformity assessments, documentation, post-market monitoring. Deployers must implement human oversight, inform users, and keep logs.
“Regulation” vs “Directive” — a Directive (like the old Data Protection Directive 95/46/EC that GDPR replaced) requires each member state to write its own implementing law; a Regulation applies as written, with no transposition.
Because the AI Act, like GDPR, is a Regulation, the same text applies in France, Germany, Ireland, everywhere. No room for 27 different interpretations.
“The AI Act doesn’t regulate AI itself — it regulates the people who build and use AI systems, proportional to the risk those systems pose.”
Think of it like fire safety codes. A candle on your desk — no rules. A restaurant kitchen — must label the fire exit. A hospital — fire alarms, sprinkler systems, evacuation plans, annual inspections. A building full of explosives — you can’t build that at all.
The AI Act works the same way: four levels based on how much damage the AI can do to people’s lives. The higher the risk, the more rules you follow.
Unacceptable Risk — Social scoring by a government (rating citizens’ behavior to restrict services) — banned outright, Article 5. Subliminal manipulation also banned. No compliance path — these are prohibited.
High Risk — A bank’s AI deciding who gets a loan, or an HR tool screening CVs. These touch people’s livelihoods: conformity assessments, documented data governance (Article 10), human oversight, and a human who can override the AI.
Limited Risk — A chatbot on a retail website. Just has to tell the user “you’re talking to an AI.” Deepfakes and AI-generated content also fall here — transparency labels required.
Minimal Risk — A spam filter, a video game NPC. No obligations at all.
Article 5 prohibitions have been in force since Feb 2025 — social scoring and subliminal manipulation are banned now.
Two routes to high-risk classification:
Annex I (product safety) — AI embedded in already-regulated products: medical devices, vehicles, machinery, toys. Enforcement: August 2027.
Annex III (standalone) — AI systems in 8 specific sensitive domains (see table below). These trigger Article 10 data governance. Enforcement: August 2026.
The staggered timeline matters: financial services clients (Annex III Domain 5) must comply a full year before manufacturing clients with product-embedded AI (Annex I).
Annex III Domain 5 (credit scoring, insurance risk) is where most ScaleFree financial services clients land — the domain triggering Article 10 obligations. Manufacturing clients may hit Domain 2 (critical infrastructure).
Digital Omnibus proposal (Nov 2025): The Commission proposed delaying Annex III enforcement by up to 16 months because harmonised standards aren’t ready. Not yet adopted. Germany’s Federal Cabinet approved the KI-MIG in February 2026, designating the Bundesnetzagentur as primary AI market surveillance authority — though the law is still completing its parliamentary process.
ScaleFree’s clients are in limbo: the law exists, the enforcement timeline is shifting, and the companies that built compliant infrastructure early get competitive advantage regardless of when the deadline lands.
| # | Domain | Examples |
|---|---|---|
| 1 | Biometrics | Remote biometric identification, emotion recognition |
| 2 | Critical infrastructure | Safety components in digital infrastructure, road traffic, utilities |
| 3 | Education | School admissions, learning evaluation, test proctoring |
| 4 | Employment | Recruitment, CV screening, promotion/termination, performance monitoring |
| 5 | Essential services | Credit scoring, insurance risk, public benefit eligibility |
| 6 | Law enforcement | Risk of victimization, evidence reliability, reoffending risk |
| 7 | Migration & border | Visa/asylum examination, security/health risk assessment |
| 8 | Justice & democracy | Legal research AI, election influence |
| Date | What Happens | Status |
|---|---|---|
| 1 Aug 2024 | AI Act enters into force | Done |
| 2 Feb 2025 | Prohibited practices banned (Art. 5) + AI literacy (Art. 4) | In effect |
| 2 Aug 2025 | GPAI obligations + national authorities operational | In effect |
| 2 Aug 2026 | High-risk system obligations enforceable (Annex III) | Upcoming |
| 2 Aug 2027 | Product-embedded AI (Annex I) + legacy GPAI compliance | Future |
| Violation | Max Fine | Turnover % |
|---|---|---|
| Prohibited practices (Art. 5) | EUR 35,000,000 | 7% |
| High-risk non-compliance (Arts. 9–15) | EUR 15,000,000 | 3% |
| Incorrect information | EUR 7,500,000 | 1% |
In each tier, the fine is the higher of the fixed amount and the turnover percentage. Compare: GDPR’s maximum is EUR 20M / 4%. The AI Act’s top tier is nearly double. The EU is signaling that AI non-compliance is treated more seriously than data protection non-compliance.
Article 10 is the article that turns a data warehouse into compliance infrastructure. Every requirement it lists maps to something Data Vault already does.
Say a bank trains an AI to score loan applications. Article 10 says: you must know where that training data came from (which source system, when it was extracted). You must document every transformation (how raw data became the features the model consumed). You must check whether the data is biased. And you must keep immutable records of all of this for the auditor.
Data Vault does every one of these things as part of its base design — not as an add-on.
Origin tracking — DV puts record_source and load_date on every single row. Article 10 requires knowing where data came from — DV provides it by default.
Transformation documentation — Data passes through staging, raw vault, business vault, and mart — each layer is a documented step.
Bias examination — Profiling queries on the Business Vault (e.g., “what percentage of this training set is female vs male?”); see the sketch after this list.
Immutable records — Satellites are append-only. Yesterday’s data is still there next to today’s data. You never overwrite history.
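A hedged sketch of such a profiling query, run here against a hypothetical demographics Satellite in SQLite (table and column names are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Business Vault satellite holding customer demographics.
conn.execute(
    "CREATE TABLE sat_customer_demographics (customer_hk TEXT, gender TEXT, load_date TEXT)"
)
conn.executemany(
    "INSERT INTO sat_customer_demographics VALUES (?, ?, ?)",
    [("h1", "female", "2026-01-01"), ("h2", "male", "2026-01-01"),
     ("h3", "male", "2026-01-01"), ("h4", "female", "2026-01-01")],
)

# Representativeness check: share of each gender in the candidate training set.
query = """
    SELECT gender,
           COUNT(*) AS n,
           ROUND(100.0 * COUNT(*) /
                 (SELECT COUNT(*) FROM sat_customer_demographics), 1) AS pct
    FROM sat_customer_demographics
    GROUP BY gender
"""
for gender, n, pct in conn.execute(query):
    print(f"{gender}: {n} rows ({pct}%)")
```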
The full Article 10 → DV mapping:

| Article 10 requirement | Data Vault feature |
|---|---|
| Data provenance | record_source + load_date on every row |
| Transformation documentation | Layered flow: Staging → Raw Vault → Business Vault → AI-Mart |
| Bias examination | Profiling queries on the Business Vault |
| Quality & representativeness | Validation gates at the AI-Mart |
| Immutable audit records | Append-only Satellites; history is never overwritten |
Article 10(5) opens a narrow door: you can process special category data — race, health, religion — for bias detection purposes, even though GDPR Article 9 normally prohibits it. But it’s a dependent derogation — it activates GDPR Art. 9(2)(g), not a standalone override. Only when anonymized or synthetic data won’t do the job, and strict safeguards apply.
This matters for ScaleFree clients: checking whether a credit scoring model discriminates by ethnicity requires looking at ethnicity data — Article 10(5) is the legal basis.
Article 10(6): Even non-training systems (rule-based expert systems that don’t learn from data) must still govern their testing datasets. The compliance chain is universal.
The AI-Mart: a specialized Information Mart at the very top of the DV stack, right before data reaches the AI model. This is where quality checks happen, bias audits run, and representativeness is validated. Data gets “cleaned, integrated, and approved by data experts” before the model ever sees it.
The AI-Mart is the compliance enforcement point — not the Raw Vault (which stores everything as-is), not the Business Vault (which integrates but doesn’t gatekeep for AI purposes).
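A minimal sketch of what such a gate might look like; the checks and thresholds are illustrative assumptions, not a prescribed standard. The shape is the point: data is released to the model only if every documented check passes.

```python
# Sketch of an AI-Mart validation gate. Thresholds are illustrative
# assumptions; a real gate would encode the client's documented criteria.
def validate_training_set(rows):
    errors = []

    # Quality: "to the best extent possible, free of errors" (Art. 10(3)).
    null_rate = sum(1 for r in rows if r["income"] is None) / len(rows)
    if null_rate > 0.01:
        errors.append(f"null rate {null_rate:.1%} exceeds 1% threshold")

    # Representativeness: "sufficiently representative" (Art. 10(3)).
    share_female = sum(1 for r in rows if r["gender"] == "female") / len(rows)
    if not 0.4 <= share_female <= 0.6:
        errors.append(f"female share {share_female:.1%} outside 40-60% band")

    return errors

rows = [
    {"income": 42_000, "gender": "female"},
    {"income": 55_000, "gender": "male"},
    {"income": None,   "gender": "male"},
]
problems = validate_training_set(rows)
print("RELEASE TO MODEL" if not problems else f"BLOCKED: {problems}")
```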
After the AI model runs, ScaleFree recommends loading its logs back into the warehouse: what data went in, what features were derived, what the model parameters were, what decisions it made, with what confidence scores. This creates the Article 12 audit trail.
The warehouse becomes a closed loop — data flows out to the AI, and the AI’s behavior flows back in.
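A sketch of what one record in that loop might capture; the field names are assumptions for illustration, but the shape follows the text: inputs, decision, confidence, and model version, timestamped and append-only.

```python
import json
from datetime import datetime, timezone

# Illustrative append-only decision log, destined for a warehouse satellite.
decision_log = []

def log_model_decision(model_version, features, decision, confidence):
    """Record one AI decision so the warehouse holds the Article 12 trail."""
    decision_log.append({
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "features": features,      # what data went in
        "decision": decision,      # what the model decided
        "confidence": confidence,  # with what confidence
    })

log_model_decision(
    "fraud-model-2026.02",
    features={"amount": 950.0, "country": "DE"},
    decision="flag",
    confidence=0.87,
)
print(json.dumps(decision_log, indent=2))
```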
“Article 10 requires data provenance, transformation documentation, bias examination, and immutable audit trails. Data Vault delivers all of these by design — record_source, load_date, layered architecture, append-only Satellites. The compliance capability is the base architecture, not a bolt-on.”
GDPR says delete the data. The AI Act says keep the records. Both are law. Both apply to the same system. Here’s what that actually looks like.
1. Bias detection needs sensitive data
A German insurance company’s credit scoring AI rejects 40% more applicants from certain postal codes. To check whether the model discriminates by ethnicity, someone needs to look at ethnicity data. But GDPR Article 9 says: no processing special category data without explicit consent or another legal basis.
Article 10(5) of the AI Act creates a narrow exception: process sensitive data for bias detection — but only when anonymized or synthetic data won’t work, and only with strict safeguards (access controls, time limits, deletion after use).
In practice: the bias audit team gets temporary, logged access to ethnicity data in the Business Vault, runs profiling queries, documents the results, and access is revoked. The PII Satellite holds the sensitive data; access is controlled at the mart level.
2. Automated decisions need human oversight
A bank’s AI auto-rejects a loan application. Under GDPR Article 22: the applicant has the right not to be subject to a solely automated decision with legal effects. Under AI Act Article 14: the deployer must implement human oversight — a person who can understand, override, and stop the AI.
Both laws push the same direction. ScaleFree’s architecture supports this: the AI-Mart logs every decision with its inputs and confidence score, so the human reviewer has the data to actually override meaningfully — not just rubber-stamp.
3. Transparency has two different audiences
A company deploys a chatbot that recommends financial products. GDPR says: tell the individual that their data is being processed, by whom, for what purpose, and their rights. AI Act says: tell the deployer how the system works, what its limitations are, and what data it was trained on.
Both transparency obligations must be satisfied, but they point in different directions — one toward the end user, one toward the organization. A company that only does GDPR transparency (privacy notice) hasn’t touched AI Act transparency (technical documentation).
4. Documentation builds on GDPR foundations
A company already maintains GDPR Article 30 records of processing activities. AI Act Articles 11-12 require technical documentation and logging for high-risk AI. The underlying data documentation overlaps significantly — both need to know what data exists, where it comes from, who accesses it.
A company with solid GDPR Article 30 records has maybe 60% of the AI Act documentation foundation already built. The DV architecture helps here: record_source, load_date, and the layered transformation trail serve both.
5. Delete vs. keep — the hard one
A customer whose data was used to train the insurance company’s credit scoring model submits a GDPR Article 17 right-to-erasure request. GDPR says: delete their personal data. The AI Act says: maintain audit trails of your training data for regulatory review.
You must prove what data trained the model AND delete this person’s data. Nobody has fully resolved this in court yet.
The emerging approach: delete from the PII Satellite (satisfies GDPR), keep pseudonymized metadata in non-PII Satellites — “a 34-year-old male from postal code 10115 was in the training set” without the name or identifiers (satisfies AI Act audit trail). If technically feasible, retrain the model without that individual’s data.
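A sketch of that split, with hypothetical table names: identifying attributes live in a PII Satellite that can be physically deleted, while a separate Satellite keeps the pseudonymized training-set membership for the auditor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PII satellite: deletable. Non-PII satellite: the pseudonymized audit trail.
conn.execute("CREATE TABLE sat_customer_pii (customer_hk TEXT, name TEXT, email TEXT)")
conn.execute(
    "CREATE TABLE sat_training_membership "
    "(customer_hk TEXT, age INTEGER, postal_code TEXT, training_run TEXT)"
)
conn.execute("INSERT INTO sat_customer_pii VALUES ('h1', 'Max Mustermann', 'max@example.com')")
conn.execute("INSERT INTO sat_training_membership VALUES ('h1', 34, '10115', 'credit-model-2026-01')")

def handle_erasure_request(customer_hk):
    """GDPR Art. 17: delete identifying data; the AI Act trail survives."""
    conn.execute("DELETE FROM sat_customer_pii WHERE customer_hk = ?", (customer_hk,))

handle_erasure_request("h1")
print(conn.execute("SELECT COUNT(*) FROM sat_customer_pii").fetchone())   # (0,)
print(conn.execute("SELECT * FROM sat_training_membership").fetchall())   # trail kept
```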
Two clocks: GDPR data breach notification = 72 hours to the supervisory authority. AI Act serious incident = 15 days to the market surveillance authority (2 days if critical infrastructure).
A single event — say, a data breach that exposes AI training data — can trigger both clocks simultaneously. ~90% of Annex III high-risk AI involves personal data. A single system might need both a DPIA (GDPR) and a conformity assessment (AI Act).
“GDPR and the AI Act compound each other — most high-risk AI processes personal data, so you need both. Data Vault handles this through PII Satellite isolation — delete for GDPR, retain pseudonymized audit trail for AI Act.”
ScaleFree’s position is simple: if your company uses AI, all the data feeding that AI must flow through the governed data warehouse. No data scientist pulls data from Salesforce on their laptop. No team builds a side pipeline from a CSV export. Everything goes through the same layers, with the same documentation, the same audit trail.
1. AI-Mart as the last governed layer before data reaches any AI model — where quality, bias, and representativeness get checked.
2. AI Log Loading — after the model runs, feed its logs (inputs, outputs, confidence scores, parameters) back into the warehouse.
3. Data lineage as compliance infrastructure — not a nice-to-have. Article 10 makes it a legal requirement.
4. Access controls at the mart level — not everyone gets to pull data for AI training; access is logged and governed.
ScaleFree’s framing to clients: “If you cannot explain why your AI gave a specific answer or which data it used, you could face fines up to EUR 15 million or 3% of global turnover.” That line converts architectural decisions into budget approvals.
Christof Wenzeritt (co-CEO) ran a Feb 2026 webinar on the “AI-Enabling Data Platform” — DV 2.0 as the foundation for trustworthy AI. A senior consultant also speaks at conferences about trustworthy AI.
The argument isn’t that DV was designed for AI — it wasn’t. The argument is that DV’s existing properties (lineage via record_source, historization via append-only Satellites, separation of concerns via Hub/Link/Sat) happen to be exactly what the AI Act requires. Compliance as a side effect of good architecture.
Lina Sibbel (ScaleFree team) presented in Oct 2025 on agentic AI: treating AI agents as identity-bearing entities with role-based access control — the agent gets a record in the Hub just like any other business entity.
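A toy sketch of that idea, with illustrative names only: the agent is registered as an entity, its reads are checked against a role grant, and every attempt is logged.

```python
# Sketch: an AI agent as an identity-bearing entity with role-based access.
# All names are illustrative assumptions, not a ScaleFree implementation.
AGENT_HUB = {"agent-churn-01": {"role": "ai_training_reader"}}
ROLE_GRANTS = {"ai_training_reader": {"ai_mart.churn_features"}}
ACCESS_LOG = []

def agent_can_read(agent_id, table):
    """Check the agent's role grant and log the attempt either way."""
    role = AGENT_HUB.get(agent_id, {}).get("role")
    allowed = table in ROLE_GRANTS.get(role, set())
    ACCESS_LOG.append((agent_id, table, "granted" if allowed else "denied"))
    return allowed

print(agent_can_read("agent-churn-01", "ai_mart.churn_features"))   # True
print(agent_can_read("agent-churn-01", "business_vault.sat_pii"))   # False
```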
Article 4 (AI literacy) applies to ALL AI systems, not just high-risk, and has been in effect since Feb 2025. No standalone fine — but non-compliance is an aggravating factor if you violate other provisions.
For BI consultants working with European clients, understanding the AI Act isn’t optional professional development — it’s a legal obligation. When advising clients on AI-enabling data platforms, consultants are expected to know the regulatory requirements.
“ScaleFree positions Data Vault 2.0 as the foundation for AI Act compliance. The AI-Mart is the compliance enforcement point — the last governed layer before data reaches the AI model. Quality checks, bias auditing, and representativeness validation all happen there. And AI logs get loaded back into the warehouse for Article 12 audit trails.”
The ability to translate between legal obligation and architectural implementation is rare. When Article 10 requires bias examination, that means profiling queries on the Business Vault and validation gates on the AI-Mart. That translation layer — from regulation to architecture — is what European clients increasingly need from their data consultants.
“A hospital’s AI system analyses radiology scans and flags potential cancer for a radiologist to review.”
High Risk (Annex I product-safety route: AI as a safety component of a regulated medical device; not Annex III Domain 1, since analysing radiology scans for disease is not biometric identification)
The AI output directly influences a clinical decision that can affect a patient’s life. Even though a human reviews it, the AI’s flag shapes what the radiologist looks for. The stakes (life/health) place it firmly in high-risk. It is not Unacceptable (no social scoring or manipulation), not Limited (not merely informational), not Minimal (consequences are significant).
Exercise 1 — Risk Tier Classification
Match each scenario to the correct risk tier: Unacceptable / High / Limited / Minimal.
Article 10(3) requirement: “The high-risk AI system shall use data that is relevant, sufficiently representative, and to the best extent possible, free of errors.”
Validation gates at AI-Mart / Feature Mart
“Relevant, representative, free of errors” means you CHECK before the AI sees the data. That check cannot happen in the Raw Vault (stores everything as-is) or the Business Vault (integrates but does not gatekeep for AI). The AI-Mart is the last governed layer — where quality, representativeness, and bias audits run.
Exercise 2 — Article 10 → DV Feature Matching
Match each Article 10 requirement to the Data Vault feature that addresses it.
A bank’s AI approves or rejects mortgage applications. A customer complains their application was rejected without explanation.
GDPR Art. 22 + AI Act Art. 14 — Alignment
GDPR: Article 22 — the customer has the right not to be subject to solely automated decisions with legal effect. They can request human review.
AI Act: Article 14 — the deployer must implement human oversight. A person must be able to understand and override the AI.
Tension? None — both push the same direction. The bank needs a human reviewer who can actually understand and override the AI, AND the customer has a right to request that review.
Architecture: AI-Mart logs each decision with inputs and confidence score — the human reviewer has the data to override meaningfully.
Exercise 3 — Dual Compliance Scenario
A German insurance company uses an AI to assess health insurance risk (Annex III high-risk). The AI is trained on data from their Data Vault. A customer submits a GDPR Article 17 right-to-erasure request.
Walk through: (a) which AI Act obligations apply, (b) which GDPR obligations apply, (c) what the conflict is, and (d) how you’d advise the client.