GDPR Art 25 · GDPR Art 30 · OpenLineage · Privacy by Design · Deep Prototype

DataLineage — Column-Level Data Lineage

28 columns mapped from source → transformations → destinations. Each column carries its data classification (direct PII / sensitive / cardholder / aggregate / indirect), every transformation that touches it (PII-scrub, hash, anonymize, redact, generalize), and every downstream consumer (warehouse, ML training, partner shares). The prototype surfaces 4 columns flowing raw to ML training without redaction and 2 lineage gaps where the column's path was lost during a platform migration.
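A minimal sketch of the per-column record described above, assuming a plain-object shape. The field names (`source`, `classification`, `transformations`, `destinations`, `ok`, `redacted`) and the `renderFlow` helper are hypothetical and are not taken from the DataLineage source. The phone record mirrors finding DL-002; the email record is purely illustrative.

```javascript
// Hypothetical shape for one column's lineage record (illustrative names).
const phoneRecord = {
  id: "DL-002",
  source: { table: "customers", column: "phone" },
  classification: "direct_pii",
  transformations: [], // nothing hashes or redacts the value on this path
  destinations: [
    { name: "events_raw", ok: false },
    { name: "ml_training", ok: false },
  ],
};

// Illustrative only: an email column hashed before it reaches the warehouse.
const emailRecord = {
  id: "example",
  source: { table: "customers", column: "email" },
  classification: "direct_pii",
  transformations: [{ op: "sha256", redacted: true }],
  destinations: [{ name: "analytics_warehouse", ok: true }],
};

// Render a record as "source → [transformations] → destinations".
function renderFlow(record) {
  const src = record.source
    ? `${record.source.table}.${record.source.column}`
    : "(source unknown)";
  const ops = record.transformations.length > 0
    ? record.transformations.map((t) => t.op).join(", ")
    : "none";
  const dests = record.destinations
    .map((d) => `${d.name} (${d.ok ? "ok" : "NOT OK"})`)
    .join(", ");
  return `${src} → [${ops}] → ${dests}`;
}

console.log(renderFlow(phoneRecord));
// customers.phone → [none] → events_raw (NOT OK), ml_training (NOT OK)
console.log(renderFlow(emailRecord));
// customers.email → [sha256] → analytics_warehouse (ok)
```

The record stores the ok/not-ok verdict per destination rather than per column. This lets one column be fine in the audit log and still be flagged in ML training.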


What it is

The data shape behind modern data-lineage tools such as Atlan, Collibra, OpenLineage, and dbt-style lineage graphs. PIIScout (batch 9) maps each column at the schema level; DataLineage maps where each column FLOWS, from its source through every transformation to every destination.
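Mapping where a column flows is a reachability question over hop-level edges. Below is a minimal breadth-first sketch. It assumes a hypothetical edge list of `{ from, to, op }` objects; the node names and edges are illustrative, not DataLineage's data. It collects every downstream node, plus the transformations applied along the first (shortest) path that reaches each one.

```javascript
// Hypothetical hop-level edges: data moves from `from` to `to`,
// optionally through a named transformation `op`.
const edges = [
  { from: "customers.phone", to: "events_raw.phone", op: null },
  { from: "events_raw.phone", to: "ml_training.features", op: null },
  { from: "customers.id", to: "analytics_events.user_id_hash", op: "sha256" },
  { from: "analytics_events.user_id_hash", to: "warehouse.user_id_hash", op: null },
];

// Breadth-first walk from `start`. Each reached node is reported once,
// with the ops applied along the first (shortest) path that reached it.
function downstream(start, edges) {
  const adjacency = new Map();
  for (const e of edges) {
    if (!adjacency.has(e.from)) adjacency.set(e.from, []);
    adjacency.get(e.from).push(e);
  }
  const seen = new Set([start]);
  const queue = [{ node: start, ops: [] }];
  const reached = [];
  while (queue.length > 0) {
    const { node, ops } = queue.shift();
    for (const e of adjacency.get(node) ?? []) {
      if (seen.has(e.to)) continue; // guards against cycles and re-visits
      seen.add(e.to);
      const nextOps = e.op ? [...ops, e.op] : ops;
      reached.push({ node: e.to, ops: nextOps });
      queue.push({ node: e.to, ops: nextOps });
    }
  }
  return reached;
}

for (const { node, ops } of downstream("customers.phone", edges)) {
  console.log(`${node} via [${ops.join(", ")}]`);
}
// events_raw.phone via []
// ml_training.features via []
```

Every hop on the phone path has an empty ops list. That is what makes the phone column's path into ML training a raw flow.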

What’s in it

  • 28 columns across a realistic SaaS data surface: customers (email, phone, DOB, SSN), payments (card_number dropped, last4 retained), addresses (street, geo_lat), orders (IP), support_tickets, employees (payroll, performance), analytics_events (user_id_hash, session_id, fingerprint), search_history, and partner data shares.
  • Per-column lineage flow rendered as source → [transformations] → destinations:
    • Source identifies the origin table and column
    • Transformations record the operation (sha256, generalize, truncate, tokenize, drop) and whether it produced a redacted output
    • Destinations are downstream services (analytics warehouse, ML training, partner share, audit log), each with an ok/not-ok flag
  • Worst-offender findings:
    • DL-002 customers.phone: flows raw to events_raw and ML training without redaction (PIIScout C002)
    • DL-008 addresses.geo_lat: flows raw at 7-decimal precision (about 1 cm), enough to identify a single building (per Sweeney 2000)
    • DL-024 search_history.query: raw queries flow to ML training, and users type email addresses into search, so the queries carry PII
    • DL-026 customers.notes: LINEAGE GAP, a pre-2022 column whose downstream paths are unknown after the platform migration
  • Cross-tool links: every column references its PIIScout entry, RtbfFlow path, and IncidentLog incidents.
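Findings like these could come from a simple status classifier. Below is a minimal sketch of one. It assumes a hypothetical record shape `{ source, classification, transformations, destinations }`, where `destinations: null` marks unknown downstream paths. The rule set, the `SENSITIVE` class list, the `ml_training` destination name, and the classification given to customers.notes are all illustrative assumptions, not the prototype's actual logic.

```javascript
// Classifications treated as sensitive (assumed: the categories above
// minus aggregate and indirect).
const SENSITIVE = new Set(["direct_pii", "sensitive", "cardholder"]);

// Return one status per column record. Illustrative rules, in order:
// 1. no known source or unknown downstream paths -> LINEAGE_GAP
// 2. sensitive column reaching ML training with no redacting step -> RAW_TO_ML
// 3. any destination flagged not-ok -> REVIEW
// 4. otherwise -> OK
function classify(record) {
  if (record.source === null || record.destinations === null) {
    return "LINEAGE_GAP";
  }
  const redacted = record.transformations.some((t) => t.redacted);
  const reachesMl = record.destinations.some((d) => d.name === "ml_training");
  if (SENSITIVE.has(record.classification) && reachesMl && !redacted) {
    return "RAW_TO_ML";
  }
  if (record.destinations.some((d) => !d.ok)) {
    return "REVIEW";
  }
  return "OK";
}

// Records modeled loosely on the findings above (shapes are hypothetical).
const records = [
  {
    id: "DL-002",
    source: { table: "customers", column: "phone" },
    classification: "direct_pii",
    transformations: [],
    destinations: [{ name: "events_raw", ok: false }, { name: "ml_training", ok: false }],
  },
  {
    id: "DL-026",
    source: { table: "customers", column: "notes" },
    classification: "sensitive",
    transformations: [],
    destinations: null, // downstream paths lost in the migration
  },
  {
    id: "example",
    source: { table: "customers", column: "email" },
    classification: "direct_pii",
    transformations: [{ op: "sha256", redacted: true }],
    destinations: [{ name: "ml_training", ok: true }],
  },
];

for (const r of records) {
  console.log(`${r.id}: ${classify(r)}`);
}
// DL-002: RAW_TO_ML
// DL-026: LINEAGE_GAP
// example: OK
```

This sketch accepts a redacting step anywhere on the column's path. A stricter check would verify that the redaction happens before the ML-training hop, which requires hop-level edges rather than a flat transformation list.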

Why this shape

GDPR Art 25 (data protection by design and by default) and Art 30 (records of processing activities, RoPA), together with the CCPA (whose §1798.140 defines the personal information in scope), call for the visibility DataLineage prototypes: not just “does this column exist” but “where does this column travel, and is it transformed appropriately at each hop?” The killer audit finding is a sensitive column flowing untransformed into ML training data.

How it ships

Single HTML file, ~22 KB, zero dependencies. The 28-column dataset, the per-column flow renderer, and the status classifier fit in 240 lines of vanilla JavaScript.
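In a zero-dependency single file, a flow renderer can be plain string templating. Below is a minimal sketch. It assumes a hypothetical record shape `{ id, source, transformations, destinations }` and a precomputed status string. The `escapeHtml` and `renderRow` helpers and the `status-…` CSS class names are illustrative, not the prototype's code. Escaping matters because column notes and search queries are data, not markup.

```javascript
// Escape text before interpolating it into HTML; lineage data
// (column names, notes, query samples) must never be parsed as markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities survive
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render one column as a table row: id | source | transformations | destinations.
function renderRow(record, status) {
  const src = record.source
    ? `${record.source.table}.${record.source.column}`
    : "(source unknown)";
  const ops = record.transformations.map((t) => t.op).join(" → ") || "none";
  const dests = record.destinations === null
    ? "(unknown)"
    : record.destinations.map((d) => `${d.name} ${d.ok ? "✓" : "✗"}`).join(", ");
  const cells = [record.id, src, ops, dests]
    .map((text) => `<td>${escapeHtml(text)}</td>`)
    .join("");
  return `<tr class="status-${escapeHtml(status.toLowerCase())}">${cells}</tr>`;
}

const row = renderRow(
  {
    id: "DL-002",
    source: { table: "customers", column: "phone" },
    transformations: [],
    destinations: [{ name: "events_raw", ok: false }, { name: "ml_training", ok: false }],
  },
  "RAW_TO_ML"
);
console.log(row);
// <tr class="status-raw_to_ml"><td>DL-002</td><td>customers.phone</td><td>none</td><td>events_raw ✗, ml_training ✗</td></tr>
```

In the browser, the rows would be joined and assigned to a table body's `innerHTML`. Escaping every interpolated value is what keeps that assignment safe.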
