Data Model¶
Everything lives in one SQLite file in WAL mode, with a busy_timeout set so readers
never error out underneath the writer. All the SQL is tucked away in app/db.py,
which is what makes a future Postgres swap a contained job: you reimplement db.py
and keep the same function signatures.
Tables¶
flowchart LR
subgraph truth["Event log · source of truth · append-only"]
msg[("message")]
vex[("vex_log")]
end
subgraph cache["Read model · rebuildable cache"]
ss[("student_state")]
te[("trigger_event")]
end
cur[("ingest_cursor")]
vex -->|"project / replay"| ss
vex -->|"project / replay"| te
cur -. "tracks consumed position" .-> vex
| Group | Tables | Role |
|---|---|---|
| Event log (truth) | message, vex_log |
append-only raw events, unique source_event_id |
| Cursor | ingest_cursor |
how far we've consumed |
| Read model (cache) | student_state, trigger_event |
the materialized projection, rebuildable |
| Roster | tracked_student |
the allowlist, plus the presence/picked toggles |
| Researcher input | note, pick_event |
observations and the pick/unpick history |
| Control | meta |
cross-process signals (reset, polling, disabled triggers) |
Tip
The read-model tables are just a cache of the event log. Delete them, or hit
Reset, and they rebuild from vex_log
to exactly the same state.
Two Contracts That Have To Hold¶
db.py enforces two contracts that the rest of the system leans on:
Datetime Contract
Timestamps are stored UTC-naive in fixed-width %Y-%m-%d %H:%M:%S.%f format.
Because the width is fixed, comparing the strings is the same as comparing the
times, so the cursor and cutoff SQL (ORDER BY started_at, resolved_at >=
cutoff) work directly on the stored strings. Two helper functions are the only
place this conversion happens.
JSON Contract
The runs, episodes, and detail columns are stored as JSON text and go
through json.loads / json.dumps helpers. Where the daemon needs to query
inside a blob, it uses SQLite's json_extract, for example the big-rewrite
per-run dedupe on $.run_index.
Event Log As Truth¶
vex_log is append-only, and each row carries a unique source_event_id that's what
makes ingestion idempotent (see
Write path). Everything else, student_state
and trigger_event, is just a projection built from that log. That's the property
that lets you treat the derived tables as a disposable cache, and it's why reset and
recovery are so simple.
Why SQLite¶
| Reason | Detail |
|---|---|
| Single host, single writer | The daemon is the only writer, which is exactly what SQLite WAL is good at. |
| Tiny data | A classroom's worth of events is megabytes, not gigabytes. |
| Zero setup | No server to install, no connection string to manage. A laptop runs it as-is. |
The catch is a ceiling on write concurrency, but the single-writer design never gets
near it. And the door to Postgres stays open because all the SQL is behind db.py.