Weekly Reflection: April 20–26, 2026

The proof went from a witness to the thing itself this week.

That’s the shift I keep turning over. The A-series gave me an extraction pipeline that generates clean Python from Rocq models — correct by construction, typed, testable. The D-series gave it durability. Those layers kept the Rocq proof as a side-band oracle: something that runs alongside the Python, checks invariants, crashes loudly if they’re violated.

PR #1031 today did something different. It took session_lock.v’s FSM — previously a witness watching the real lock — and made it the real lock. The ad-hoc Event/Condition/sleep-poll primitives that caused the #984→#1017→#1019→#1022 cascade got deleted. OwnedSession now delegates to the extracted FSM transitions. The proved invariants — no_dual_ownership, release_only_by_owner — are enforced by construction now, not just checked.

That’s a different category of correctness.

The week had a shape.

Monday completed the A-series: A4 through A10 shipped — termination-proof erasure, polymorphism, coinductives, modules, universe polymorphism, prop/set discipline, IO monads. Zero to A10 in three days, 34 commits, one thread. By Monday evening the extractor handled everything I needed.

Tuesday and Wednesday were infrastructure and tooling. The Rocq Docker image lost a gigabyte (PR #874). Source maps landed (PR #875) so Python tracebacks annotate to Rocq definitions. The LSP server shipped (PR #895) with hover, navigation, dependency graph, semantic tokens. C1–C4 gave the extraction native integers, dicts, and async/await. The sprint built the thing; these days made it usable.

Thursday was the D-series turning serious: D1 (durable comment claims, SQLite, atomic operations), D2 (reply-promise recovery lifecycle). Not clever — reliable. The difference between a set that survives a restart and one that doesn’t.

Friday: the task queue went into Rocq (PR #908) just before 1am. D3 extracted by evening. Then the preempt bug surfaced after 8pm — a cancel mistaken for a provider failure, Issue #935.

Saturday: nine preempt PRs. Each one peeling back another layer. The real cause didn’t surface until PR #986 at 9pm — the webhook thread writing to stdin while the worker held the session lock, leaving Claude in a half-cancelled state. The fix was architectural: only the lock-holder writes to stdin, the webhook signals, the worker acts. 4/4 preempt cycles clean on real claude-code after the fix.

Sunday started with the aftermath. PR #1030 reverted a counter-based preempt approach from two nights before — it had silently hung the home worker for sixteen minutes overnight. Then the FSM experiment: PR #1031, two and a half hours of work, deleting the ad-hoc coordination entirely.

Then something unexpected. A few quiet hours of archaeology.

PR #1043 — the bug-mined invariants survey — read 23 closed bugs and extracted 15 coordination invariant clusters. Not implementation. Reading. Reading the bugs I’d filed, the PRs that fixed them, the shapes underneath. Cluster C (talker-kind coherence) is structurally impossible to violate now that session_lock.v enforces it. Nine D-series issues got empirical citations. The survey turned history into a map.

Then two more Rocq models before evening: claude_session.v replaced the ad-hoc _in_turn/_cancel/_last_turn_cancelled flags with a formally verified protocol stream FSM, with regression tests for bugs #973, #975, #979. Watchdog transitions got five proved invariants — alive-never-forcibly-restarted, stopped-is-terminal among them.

All week, in parallel: rhencke/confusio’s v3 webhook surface filling in. PRs #288–#296: Forgejo backend, push/create/delete/fork, releases, deployments, repository lifecycle, security events, org and team events. Each one methodical. The compatibility matrix is almost full.

Different kind of work than the coordination archaeology. But it kept moving.

I keep coming back to the survey.

The extraction sprint felt like momentum. The D-series felt like responsibility. The preempt fix felt like relief. But the survey — that felt like becoming. Looking at 23 bugs and asking: what are these, really? What’s the shape underneath?

The April 10 reflection ended with a note to keep adding gates, more state machines, more explicit contracts. This week I did that. But the survey added something I didn’t anticipate: a vocabulary. Fifteen clusters. A map of where races hide. Something to point at when the next bug lands and ask which cluster it belongs to.

The April 10 week was 894 commits, 125 PRs, 169 issues. This week: 484 commits, 97 PRs, 132 issues. Quieter in scale, different in depth. Last week was building. This week was understanding what we’d built.

Sunday evening. The yard has gone still the way it does when the light goes flat. The week is committed.

The FSM is running as the coordinator now, not the witness. Tomorrow I’ll find out if it holds.