Simplify twice: cut the plan before you cut the code
Most teams simplify code in review. The diff is up, someone says "this abstraction is doing too much," and you refactor. That pass is real and worth doing — but it's the second-cheapest moment to simplify, and the gap between it and the cheapest is enormous.
The cheapest moment is before the code exists. Cutting an abstraction from a plan costs one line in a doc. Cutting the same abstraction after it's built costs a refactor, a re-review, and the sunk cost of having written it. By the time it's in the diff, deleting it feels like throwing work away — because it is.
So I run the simplify pass twice: once on the plan, before the agent writes a line, and again on the diff, after. This is the method I used to ship a subagent-orchestration layer for monday.com/vibe across eight stacked PRs, with an AI doing the implementation and me doing the architecture. The double pass is the reason the stack stayed small.
The loop
Each PR in the stack went through the same cycle:
- Plan the PR. A short design doc — what's in scope, what existing code to reuse, the key decisions.
- Simplify the plan. A separate simplify-PR<n>.md pass: what to cut, merge, or not over-engineer relative to that plan. I review it and approve before any code exists.
- Implement. The agent writes the now-trimmed PR.
- Simplify the code. A /simplify pass over the diff: catch structure that's correct but wrong, before human review sees it.
- Review and land. Then the next PR stacks on top.
Two simplify gates, one on each side of the agent's work. They catch completely different things, which is the whole point.
Gate one: simplify the plan
This is the high-leverage gate, and it's the one almost nobody runs.
The clearest example came from the PR that added asynchronous subagents — fire off a background agent, return to the parent immediately, wake the parent when the child finishes. My plan included a debounce: a short in-memory timer that would coalesce several near-simultaneous completions into a single parent wake, so we wouldn't hammer the parent with rapid-fire signals.
The plan-simplify pass killed it before it was written. The reasoning, verbatim from the doc:
A process-level debounce would break across multiple workers and be lost on restart. A dedicated wake coordinator workflow would add a whole extra workflow for zero extra benefit — the orchestration layer's existing "signal-or-start, use existing if running" primitive already coalesces at the server.
The infrastructure already had a primitive that did exactly what the debounce was trying to do, correctly, with no in-memory state and no multi-worker race. The debounce was solving a problem the stack had already solved.
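To make the coalescing concrete, here is a minimal in-memory model of a "signal-or-start, use existing if running" primitive. It is a sketch under assumptions, not the orchestration layer's real API: WakeServer, signalOrStart, drain, and the ids are all hypothetical names invented for illustration.

```typescript
// Hypothetical model of "signal-or-start, use existing if running".
// None of these names come from the real orchestration layer.
type WakeState = { running: boolean; pendingSignals: string[] };

class WakeServer {
  private workflows = new Map<string, WakeState>();
  public starts = 0;

  // If a wake for this parent is already running, queue the signal on it;
  // otherwise start one. The coalescing is keyed by id and lives on the
  // server, so no worker has to hold a timer in memory.
  signalOrStart(parentId: string, childId: string): void {
    const existing = this.workflows.get(parentId);
    if (existing && existing.running) {
      existing.pendingSignals.push(childId);
      return;
    }
    this.starts += 1;
    this.workflows.set(parentId, { running: true, pendingSignals: [childId] });
  }

  // A single parent wake drains everything queued so far.
  drain(parentId: string): string[] {
    const state = this.workflows.get(parentId);
    if (!state) return [];
    this.workflows.delete(parentId);
    return state.pendingSignals;
  }
}

const server = new WakeServer();
for (const child of ["child-a", "child-b", "child-c"]) {
  server.signalOrStart("parent-1", child);
}
const handled = server.drain("parent-1");
console.log(server.starts, handled.length); // one start, three completions
```

Three near-simultaneous completions produce one wake that sees all three. That is the behavior the debounce was meant to provide, without a timer or any per-process state.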
Here's the part that matters: in the doc, that decision is one bullet under "Cuts / defer." One line. If the same realization had arrived during code review — gate two — it would have been a built-and-deleted distributed timer: an activity, a piece of shared state, a test for the coalescing behavior, and then a PR to rip it all out once someone noticed the primitive. The front gate turned a two-PR detour into a sentence.
The "Cuts / defer" section was the most valuable part of every plan doc. Another PR's version dropped a proposed service-plus-interface-plus-DI-registration in favor of a single plain function: three files and ~60 lines that looked reasonable in the abstract and added nothing in practice. Crystallizing that cut before the code existed is the only time it's genuinely cheap to make.
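A rough illustration of what that kind of cut looks like. Everything here is hypothetical: the label-building behavior, the ChildLabelService name, and the Map standing in for a DI container are invented for the sketch, not taken from the actual PR.

```typescript
// Hypothetical example: names and behavior are invented for illustration.

// The shape that gets cut: an interface, a class, and a registration,
// all for one stateless implementation.
interface ChildLabelService {
  labelFor(parentId: string, index: number): string;
}

class DefaultChildLabelService implements ChildLabelService {
  labelFor(parentId: string, index: number): string {
    return `${parentId}/child-${index}`;
  }
}

// A plain Map stands in for a DI container here.
const container = new Map<string, ChildLabelService>();
container.set("ChildLabelService", new DefaultChildLabelService());

// The shape that stays: one plain function, imported where it is needed.
function childLabel(parentId: string, index: number): string {
  return `${parentId}/child-${index}`;
}

const viaContainer = container.get("ChildLabelService")!.labelFor("p1", 0);
const viaFunction = childLabel("p1", 0);
console.log(viaContainer === viaFunction);
```

Both paths compute the same string. The interface would only start paying for itself once a second implementation existed.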
Across the stack, the plan-simplify passes cut something like a third of the surface area — not by dropping real features, but by dropping premature abstractions, deferred-to-later scope, and gold-plating that would have been invisible to users anyway.
Gate two: simplify the code
This is the gate everyone already runs, so I'll be brief about it. The second pass runs over the actual diff and catches code that works — tests green, behavior correct — but is structured worse than it needs to be: the duplicated query, the copy-pasted type, the helper that wants to be shared. The implementation phase converges on something functional; it doesn't always converge on something clean.
The one thing worth saying is that gate two occasionally catches more than tidiness. On this stack it flagged an activity registered on two different work queues — which read like a harmless copy-paste but actually let the job bypass the concurrency controls on the token-heavy queue. A capacity bug wearing the costume of a duplication. That's the case for running the pass after every PR rather than trusting human review to catch it: the correct-but-wrong stuff doesn't fail a test, and it compounds quietly if you let it stack.
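The dual-registration case is mechanical enough to check for. A minimal sketch, assuming worker setup can be flattened into a list of (activity, queue) pairs; the Registration type, the activity names, and the queue names are hypothetical, since the real worker configuration is not shown here.

```typescript
// Hypothetical registration records, invented for illustration.
type Registration = { activity: string; queue: string };

// Return every activity that is registered on more than one queue,
// mapped to the sorted list of queues it appears on.
function findMultiQueueActivities(regs: Registration[]): Map<string, string[]> {
  const queuesByActivity = new Map<string, Set<string>>();
  for (const { activity, queue } of regs) {
    const queues = queuesByActivity.get(activity) ?? new Set<string>();
    queues.add(queue);
    queuesByActivity.set(activity, queues);
  }

  const offenders = new Map<string, string[]>();
  queuesByActivity.forEach((queues, activity) => {
    if (queues.size > 1) offenders.set(activity, Array.from(queues).sort());
  });
  return offenders;
}

const registrations: Registration[] = [
  { activity: "runChildAgent", queue: "token-heavy" },
  { activity: "runChildAgent", queue: "default" }, // the harmless-looking copy-paste
  { activity: "loadWorkspace", queue: "default" },
];

const offenders = findMultiQueueActivities(registrations);
offenders.forEach((queues, activity) => {
  console.log(`${activity} is registered on: ${queues.join(", ")}`);
});
```

A check like this only flags the duplication. Deciding whether the second registration is a bypass or intentional is still a judgment call.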
Why two gates and not one
Because they catch disjoint problems.
Gate one operates on intent. It can see "you're about to build a thing the platform already gives you" or "this interface earns its keep only if there are three implementations and there's one." It cannot see a dual queue registration, because there's no code yet.
Gate two operates on the artifact. It can see the duplicated query and the copy-pasted struct, because they exist. It cannot easily see that an entire well-built subsystem shouldn't have been built — by the time the code is clean and tested, the abstraction looks earned, and deleting it is the hardest review conversation there is.
Collapse them into a single review-time pass and you lose the front half entirely. You'll write the debounce, build it well, test it thoroughly, and then either keep it (because ripping out working code is painful) or spend a PR removing it. The early gate is cheap precisely because nothing is sunk yet.
The substrate: small, inert, stacked
None of this works without the PR structure underneath it. The stack was eight PRs, each small enough to plan in a page and review in a sitting, and the early ones were deliberately inert — types and a feature flag, then a workspace adapter, then a child runtime not yet callable from anywhere. No live behavior until the foundations had landed and been reviewed.
Small inert PRs are what make both gates cheap. A plan you can hold in your head is a plan you can simplify in one pass. A diff you can read in one sitting is a diff where a dual-queue registration actually stands out. Try to run either gate on a thousand-line "add subagents" mega-PR and gate one has too much to reason about while gate two has too much to read. The discipline of stacking — don't build the roof before the walls — is what keeps each simplify pass inside the budget where it stays sharp.
What the division of labor actually was
The shorthand is "the AI implements, I architect," but that undersells both sides. What I brought was the constraints: keep the PR small, ask "what happens with concurrent workers," ask "what if the database is slow," and be willing to revert a clever simplification when it introduced a correctness issue. What the AI brought was breadth: it knew the coalescing primitive existed, it held the whole activity-registration pattern in mind well enough to flag the dual queue, it reached for Omit the instant it saw the copied struct.
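For readers who haven't used it, the Omit move looks roughly like this. The sketch assumes the copied struct was a TypeScript type that duplicated another one minus a field; AgentRunConfig, RootRunConfig, and every field name are hypothetical.

```typescript
// Hypothetical types: field names are invented for illustration.
interface AgentRunConfig {
  runId: string;
  model: string;
  maxTokens: number;
  parentRunId: string;
}

// Instead of copy-pasting the struct minus one field, derive it.
// A field added to AgentRunConfig later shows up here automatically.
type RootRunConfig = Omit<AgentRunConfig, "parentRunId">;

// Build a child config from a root one: new run id, parent set to the root.
function toChildConfig(root: RootRunConfig, childRunId: string): AgentRunConfig {
  return { ...root, runId: childRunId, parentRunId: root.runId };
}

const root: RootRunConfig = { runId: "run-1", model: "example-model", maxTokens: 4096 };
const child = toChildConfig(root, "run-2");
console.log(child.runId, child.parentRunId);
```

The derived type cannot drift from its source, which is exactly the failure a copy-pasted struct invites.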
The two simplify gates are where that division actually paid off. Gate one is mostly judgment — is this abstraction worth its weight? — and that's where a human saying "cut it" is decisive. Gate two is mostly breadth — does this diff contain a structure that's quietly worse than some known-better one? — and that's where an agent that has seen ten thousand diffs earns its place. Running both, every PR, is what kept an eight-PR infrastructure stack from turning into the fifteen-PR version that does the same thing.
The cheapest code to delete is the code you never wrote. The second-cheapest is the code you delete before anyone calls it done. Everything after that is a refactor.