Skip to content

Data Model

Everything lives in one SQLite file in WAL mode, with a busy_timeout set so readers never error out underneath the writer. All the SQL is tucked away in app/db.py, which is what makes a future Postgres swap a contained job: you reimplement db.py and keep the same function signatures.

Tables

flowchart LR
    subgraph truth["Event log · source of truth · append-only"]
        msg[("message")]
        vex[("vex_log")]
    end
    subgraph cache["Read model · rebuildable cache"]
        ss[("student_state")]
        te[("trigger_event")]
    end
    cur[("ingest_cursor")]
    vex -->|"project / replay"| ss
    vex -->|"project / replay"| te
    cur -. "tracks consumed position" .-> vex
Group Tables Role
Event log (truth) message, vex_log append-only raw events, unique source_event_id
Cursor ingest_cursor how far we've consumed
Read model (cache) student_state, trigger_event the materialized projection, rebuildable
Roster tracked_student the allowlist, plus the presence/picked toggles
Researcher input note, pick_event observations and the pick/unpick history
Control meta cross-process signals (reset, polling, disabled triggers)

Tip

The read-model tables are just a cache of the event log. Delete them, or hit Reset, and they rebuild from vex_log to exactly the same state.

Two Contracts That Have To Hold

db.py enforces two contracts that the rest of the system leans on:

Datetime Contract

Timestamps are stored UTC-naive in fixed-width %Y-%m-%d %H:%M:%S.%f format. Because the width is fixed, comparing the strings is the same as comparing the times, so the cursor and cutoff SQL (ORDER BY started_at, resolved_at >= cutoff) work directly on the stored strings. Two helper functions are the only place this conversion happens.

JSON Contract

The runs, episodes, and detail columns are stored as JSON text and go through json.loads / json.dumps helpers. Where the daemon needs to query inside a blob, it uses SQLite's json_extract, for example the big-rewrite per-run dedupe on $.run_index.

Event Log As Truth

vex_log is append-only, and each row carries a unique source_event_id that's what makes ingestion idempotent (see Write path). Everything else, student_state and trigger_event, is just a projection built from that log. That's the property that lets you treat the derived tables as a disposable cache, and it's why reset and recovery are so simple.

Why SQLite

Reason Detail
Single host, single writer The daemon is the only writer, which is exactly what SQLite WAL is good at.
Tiny data A classroom's worth of events is megabytes, not gigabytes.
Zero setup No server to install, no connection string to manage. A laptop runs it as-is.

The catch is a ceiling on write concurrency, but the single-writer design never gets near it. And the door to Postgres stays open because all the SQL is behind db.py.