Most 'AI agent' demos are a single model in a while-loop with tools. It looks magical for ten minutes and becomes unmanageable the moment it can spend money. Magistry runs every cycle through three separated roles — planner, judge, executor — and the separation is the whole design.
Planner: propose, don't act
The planner reads state and proposes actions. That's all it does. It never writes to Shopify or an ad account. It produces candidate decisions with the evidence it used, and hands them on. Because it can't act, you can let it be expensive and creative — the cost of a bad plan is zero until something downstream approves it.
Judge: cheaper on purpose
The judge scores each proposed action against policy and evidence. Counter-intuitively, the judge model is intentionally cheaper than the planner. Judging 'does this action satisfy these gates given this evidence?' is a narrower task than generating the action in the first place. A smaller model with tight policy context is more reliable at the bounded question than a large one improvising.
- The planner explores the space of what could be done.
- The judge enforces the space of what is allowed to be done.
- Splitting them means you can audit and tune the boundary without touching the creativity.
cycle#7711 start
planner → 14 candidate actions (model: large)
judge → 9 pass gates, 5 rejected (model: small)
reject: PRICE_DROP blocked by margin_gate
executor→ 9 writes, each with reverse_op
kill_switch: armed · rate_limit: 9/120
cycle#7711 done (logged: decision_log #84201–84209)Five of fourteen proposals never happened. They're still logged as rejected rows, with the gate that blocked them — so you can see what the agent wanted to do and why it wasn't allowed. That negative space is some of the most useful data in the system.
Executor: boring by design
The executor does the least interesting and most important work: it writes approved actions, stamps each with its reverse op, respects the rate limiter and advisory locks, and stops instantly if the kill switch trips. It has no opinions. An executor with opinions is just a second planner you forgot to govern.
A single model that plans, judges, and acts can't be held accountable for any of the three. Separate them and each becomes auditable.
— Ines Park
The loop isn't novel because it's clever. It's useful because it's legible. When something goes wrong, you know which role failed — the plan was bad, the gate was wrong, or the write failed — and you fix that one thing. Monolithic agents make every failure a mystery. This makes every failure an address.
