Engineering Practice · Jun 2026 · 14 min read

Loop Engineering: You Design the System, Not the Prompt

The creator of Claude Code doesn't prompt his own tool anymore. He writes loops that prompt it for him. Loop engineering is the next abstraction layer above prompt engineering, and both major agent platforms now ship the primitives for it.

Boris Cherny, head of Claude Code at Anthropic, said it publicly in June 2026: “I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.”

The same week, Peter Steinberger posted: “You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents.”

Loop engineering means replacing yourself as the person who prompts the agent. You design the system that does it instead. A loop is a recursive goal: you define a purpose and the AI iterates until the goal is met. It is the shift from holding the tool to building the system that holds the tool.

The progression that got us here

A year ago, the way you got something out of a coding agent was: write a good prompt, share enough context, read what came back, write the next prompt. The agent was a tool and you held it the entire time, one turn after another.

The progression looks like this:

2023

Writing code with autocomplete. Copilot suggests the next line.

2024

Prompting AI to write whole functions, files, features. You describe, it generates.

2025

Running multiple agents in parallel. You dispatch work, agents execute across tracks.

2026

Writing loops that do the prompting themselves. You design the system. The system prompts the agents.

Each layer compressed the timeline to the next. Cherny himself went from writing code with autocomplete, to prompting Claude, to running 5-10 agents in parallel, to uninstalling his IDE entirely because he hadn't opened it in a month. His coding now consists of designing loops.

The five building blocks

A loop needs five primitives and one place to remember state. What surprised Addy Osmani when he mapped this out is that both Claude Code and the Codex app now ship all five. The shape is identical across products. You stop arguing about which tool and start designing loops that work regardless of which agent you happen to be sitting in.

1. Automations: the heartbeat

Automations make a loop an actual loop, not just one run you did once. You define a task, give it a cadence, and the findings come to you instead of you going around checking.

Claude Code gets here through /loop (re-runs on a cadence), /goal (keeps going until a verifiable condition holds, with a separate small model checking whether you are done), hooks (shell commands at agent lifecycle points), and GitHub Actions for loops that keep running after you close the laptop.

The Codex app has its Automations tab: pick a project, a prompt, a cadence, and whether it runs on your local checkout or a background worktree. Runs that find something go to a Triage inbox. Runs that find nothing archive themselves.

/goal deserves special attention. You give it a stopping condition like “all tests in test/auth pass and lint is clean” and walk away. After every turn, a different model (not the one writing code) checks whether the condition holds. The maker doesn't grade its own homework. This is the primitive that makes unattended loops trustworthy.
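The control flow behind a stopping condition can be sketched in a few lines. This is not how /goal is implemented; it is a minimal illustration in which `run_until_goal`, `make_turn`, `goal_holds`, and `max_turns` are all hypothetical names. The point it shows is structural: the function that does the work never decides when to stop, and a turn budget keeps a goal that never holds from looping forever.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoalResult:
    done: bool   # True if the checker confirmed the goal
    turns: int   # number of maker turns that were run


def run_until_goal(
    make_turn: Callable[[int], None],
    goal_holds: Callable[[], bool],
    max_turns: int = 20,
) -> GoalResult:
    """Alternate maker turns with an independent check until the goal holds."""
    for turn in range(1, max_turns + 1):
        make_turn(turn)       # the maker does one unit of work
        if goal_holds():      # a separate checker decides whether to stop
            return GoalResult(done=True, turns=turn)
    # Budget exhausted: report failure instead of looping forever.
    return GoalResult(done=False, turns=max_turns)


if __name__ == "__main__":
    # Toy stand-ins: the "maker" fixes one failing test per turn.
    failing = ["test_login", "test_logout", "test_refresh"]

    def fix_one(turn: int) -> None:
        failing.pop()

    print(run_until_goal(fix_one, lambda: not failing))
    # prints GoalResult(done=True, turns=3)
```

In a real loop, `make_turn` would hand a prompt to the coding agent and `goal_holds` would ask a different model, or simply run the test suite and linter.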

2. Worktrees: isolation so parallel doesn't mean chaos

The second you run more than one agent, files collide. Two agents writing the same file is the same problem as two engineers committing to the same lines without talking to each other. A git worktree fixes this: a separate working directory on its own branch, sharing repo history, so one agent's edits cannot touch another's checkout.

Claude Code gives you --worktree to open a session in its own checkout, plus isolation: worktree on subagents so each helper gets a fresh directory that cleans itself up after. Codex builds worktree support in so threads hit the same repo without collisions.
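Underneath both tools sits plain git. A rough sketch of giving each agent task its own checkout, using the standard `git worktree add` and `git worktree remove` commands, might look like this. The helper names, the sibling-directory layout, and the `agent/<task>` branch naming are assumptions made for illustration, not what either product does internally.

```python
import subprocess
from pathlib import Path


def worktree_add_command(worktree_dir: Path, branch: str) -> list[str]:
    """Build `git worktree add -b <branch> <path>`: new directory on a new branch."""
    return ["git", "worktree", "add", "-b", branch, str(worktree_dir)]


def worktree_remove_command(worktree_dir: Path) -> list[str]:
    """Build `git worktree remove <path>` to clean up after the agent finishes."""
    return ["git", "worktree", "remove", str(worktree_dir)]


def create_worktree(repo: Path, task: str) -> Path:
    """Create a sibling checkout for one agent task (naming scheme is hypothetical)."""
    worktree_dir = repo.parent / f"{repo.name}-{task}"
    subprocess.run(
        worktree_add_command(worktree_dir, f"agent/{task}"),
        cwd=repo,
        check=True,  # raise if git refuses, e.g. the branch already exists
    )
    return worktree_dir


def remove_worktree(repo: Path, worktree_dir: Path) -> None:
    """Delete the checkout; the branch and its commits stay in the shared repo."""
    subprocess.run(worktree_remove_command(worktree_dir), cwd=repo, check=True)
```

Each agent then runs with its working directory set to the path returned by `create_worktree`, so edits in one checkout never appear in another until branches are merged.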

3. Skills: stop re-explaining your project every session

A skill is how you stop explaining the same project context every session. Both tools use a folder with a SKILL.md inside: instructions, metadata, optional scripts and references. The agent reads it every run. Without skills, the loop re-derives your whole project from zero every cycle. With skills, it compounds.

Skills solve what Osmani calls intent debt: the agent starts cold and fills any hole in your intent with a confident guess. A skill is that intent written down where the agent reads it before guessing. Conventions, build steps, “we don't do it like this because of that one incident” written once, applied every run.
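To make the folder shape concrete, here is a small sketch that writes a skill to disk. The layout (a folder containing SKILL.md) comes from the description above; the frontmatter fields `name` and `description`, the heading, and the function names are assumptions for illustration, so check each tool's documentation for the metadata it actually expects.

```python
from pathlib import Path


def render_skill(name: str, description: str, instructions: list[str]) -> str:
    """Render SKILL.md text: a metadata header followed by bullet instructions."""
    lines = [
        "---",
        f"name: {name}",                # assumed metadata field
        f"description: {description}",  # assumed metadata field
        "---",
        "",
        "# Instructions",
        "",
    ]
    lines += [f"- {item}" for item in instructions]
    return "\n".join(lines) + "\n"


def write_skill(skills_root: Path, name: str, description: str,
                instructions: list[str]) -> Path:
    """Create <skills_root>/<name>/SKILL.md and return its path."""
    skill_dir = skills_root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(render_skill(name, description, instructions), encoding="utf-8")
    return path
```

A call such as `write_skill(Path("skills"), "triage", "Summarize CI failures", ["Run the test suite before proposing a fix"])` captures one convention once, so the loop does not have to guess it on every run.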

4. Plugins and connectors: the loop touches your real tools

A loop that can only see the filesystem is a tiny loop. Connectors (built on MCP) let the agent read your issue tracker, query a database, hit a staging API, drop a message in Slack. Both Claude Code and Codex speak MCP, so a connector written for one usually works in the other.

This is the difference between an agent that says “here is the fix” and a loop that opens the PR, links the Linear ticket, and pings the channel once CI is green. The connectors are the reason the loop can act inside your actual environment instead of telling you what it would do if it could.

5. Sub-agents: keep the maker away from the checker

The most useful structural primitive in a loop is splitting the one who writes from the one who checks. The model that wrote the code is too generous grading its own homework. A second agent with different instructions (and sometimes a different model) catches what the first one talked itself into.

Codex defines agents as TOML files in .codex/agents/, each with a name, instructions, and optional model settings. Claude Code does the same with .claude/agents/ and agent teams that pass work between them. The usual split: one explores, one implements, one verifies against the spec.

Inside a loop, this matters more than anywhere else. The loop runs while you are not watching. A verifier you trust is the only reason you can walk away.

Plus: state that outlives the conversation

A markdown file, a Linear board, anything that lives outside the single conversation and holds what's done and what's next. The model forgets everything between runs. The memory has to be on disk, not in the context. The agent forgets. The repo doesn't.
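One simple way to keep that memory on disk is a markdown checklist that every run reads first and rewrites last. The sketch below assumes a hypothetical format of `- [x] task` and `- [ ] task` lines; the function names are likewise made up for illustration.

```python
from pathlib import Path
from typing import Optional


def load_state(path: Path) -> dict[str, bool]:
    """Parse `- [x] task` / `- [ ] task` lines into {task: done}."""
    state: dict[str, bool] = {}
    if not path.exists():
        return state  # first run: nothing remembered yet
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("- [x] "):
            state[line[6:]] = True
        elif line.startswith("- [ ] "):
            state[line[6:]] = False
    return state


def save_state(path: Path, state: dict[str, bool]) -> None:
    """Write the checklist back so the next run can pick up from here."""
    lines = [f"- [{'x' if done else ' '}] {task}" for task, done in state.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def next_open_task(state: dict[str, bool]) -> Optional[str]:
    """Return the first task not yet done, or None when everything is finished."""
    return next((task for task, done in state.items() if not done), None)
```

Because the file is plain markdown, a human can read or edit it between runs, and it can be committed alongside the code it describes.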

What one loop looks like

An automation runs every morning on the repo. Its prompt calls a triage skill that reads yesterday's CI failures, open issues, and recent commits, then writes findings into a markdown state file. For each finding worth acting on, the loop opens an isolated worktree and sends a sub-agent to draft the fix. A second sub-agent reviews that draft against the project skills and existing tests.

Connectors let the loop open the PR and update the ticket. Anything it cannot handle lands in the triage inbox for a human. The state file remembers what got tried, what passed, what is still open. Tomorrow morning, the run picks up where today stopped.
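The morning run described above can be sketched as a single function. Every callable here (`triage`, `draft_fix`, `review`, `open_pr`) is a hypothetical stand-in for a skill, sub-agent, or connector, and `already_done` stands in for the state file; the sketch only shows how the pieces hand work to each other.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunReport:
    pr_opened: list[str] = field(default_factory=list)  # findings that became PRs
    inbox: list[str] = field(default_factory=list)      # findings left for a human


def morning_run(
    triage: Callable[[], list[str]],
    draft_fix: Callable[[str], str],
    review: Callable[[str, str], bool],
    open_pr: Callable[[str, str], None],
    already_done: set[str],
) -> RunReport:
    """One pass of the loop: triage, draft, review, then PR or human inbox."""
    report = RunReport()
    for finding in triage():
        if finding in already_done:
            continue                   # state says a previous run handled this
        patch = draft_fix(finding)     # maker sub-agent, in its own worktree
        if review(finding, patch):     # checker sub-agent, separate instructions
            open_pr(finding, patch)    # connector acts in the real environment
            report.pr_opened.append(finding)
            already_done.add(finding)
        else:
            report.inbox.append(finding)  # rejected drafts go to a human
    return report
```

Note that the reviewer sits between the draft and the PR, so nothing the maker writes reaches the real environment without passing a second set of instructions.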

You designed it once. You did not prompt any of those steps. That is Steinberger's point made real.

The community site loops.elorm.xyz already has a library of copy-paste loop recipes: “Ship PR Until Green” (implement, test, push, open PR, wait for CI, loop until checks pass), “De-Sloppify Pass” (cleanup after implementation), “PR Babysitter” (every 15 minutes, inspect watched PRs), “Coverage Until Threshold.” Each includes triggers, feedback gates, and exit conditions so agents self-pace until the job is done.

What loop engineering does not solve

The loop changes the work. It does not delete you from it. Four problems get harder, not easier, as the loop improves.

Comprehension debt. The faster the loop ships code you did not write, the bigger the gap between what exists and what you understand. Osmani calls this comprehension debt. A smooth loop makes it grow faster unless you read what the loop made.

Cognitive surrender. When the loop runs itself, the temptation is to stop having an opinion and just take whatever it gives back. Two people can build the exact same loop and get opposite results. One uses it to move faster on work they understand deeply. The other uses it to avoid understanding the work at all. The loop does not know the difference.

Verification is still yours. A loop running unattended is also a loop making mistakes unattended. The sub-agent verifier helps. It does not replace your judgment on whether the whole direction is right.

Token economics. Loops burn tokens continuously. A loop that re-runs every 15 minutes against a large codebase can cost more per day than your CI system. Usage patterns vary wildly depending on whether you are token-rich or running on a budget. The cost model differs from interactive prompting, where you pay only for the turns you choose to take.
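A back-of-the-envelope estimate makes the cost visible before the loop is switched on. The arithmetic below is generic; the token count per run and the price per million tokens are made-up placeholder numbers, not figures for any real model or plan.

```python
def runs_per_day(interval_minutes: int) -> int:
    """How many times a loop fires in 24 hours at a fixed cadence."""
    return (24 * 60) // interval_minutes


def daily_cost_usd(interval_minutes: int, tokens_per_run: int,
                   usd_per_million_tokens: float) -> float:
    """Daily spend = runs per day x tokens per run x price per token."""
    total_tokens = runs_per_day(interval_minutes) * tokens_per_run
    return total_tokens * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    # Placeholder inputs: 200k tokens per run at $5 per million tokens.
    print(runs_per_day(15))                   # prints 96
    print(daily_cost_usd(15, 200_000, 5.0))   # prints 96.0
```

With these placeholder inputs, a 15-minute cadence means 96 runs and 19.2 million tokens a day. Stretching the cadence to hourly cuts the same loop to 24 runs, which is why cadence is usually the first knob to turn.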

Where individual loops break down

Everything above works for one developer running loops on their own repo. The problem surfaces the moment you have a team.

Developer A has a loop that triages and fixes test failures every morning. Developer B has a loop that refactors modules for performance every night. Neither loop knows what the other is doing. Monday morning, A's loop “fixes” a test that B's loop intentionally changed. B's loop reverts A's fix Tuesday night. Neither developer is awake for either event.

This is the coordination bottleneck applied to loops. Individual loops are tractable. Organizational loops are where things break. The specific failure modes:

No shared context

Each loop operates with its own view of the codebase. No loop knows what other loops are currently changing, have recently changed, or are about to change.

No governance

A loop can violate architectural boundaries, touch services it shouldn't own, or deploy to production at 3am without any human knowing. Individual /goal conditions check correctness, not policy.

No observability

When ten loops run across a team, nobody has a unified view of what they are collectively doing to the codebase. An incident caused by a loop looks identical to an incident caused by a human until you trace it.

No coordination

Worktrees keep one developer's loops from colliding with each other. They do nothing to keep those loops from colliding with another developer's loops.
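The missing shared context can be illustrated with a toy registry in which each loop declares the paths it intends to touch before it starts. This is a sketch of the idea only: the `claims` structure and `find_conflicts` are hypothetical, and a real system would also need to handle timing, stale claims, and changes that conflict without sharing a file.

```python
from itertools import combinations


def find_conflicts(claims: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return (loop_a, loop_b, shared_paths) for every pair of overlapping claims."""
    conflicts = []
    for a, b in combinations(sorted(claims), 2):  # each unordered pair once
        shared = claims[a] & claims[b]            # paths both loops intend to edit
        if shared:
            conflicts.append((a, b, shared))
    return conflicts


if __name__ == "__main__":
    claims = {
        "a-triage": {"tests/test_auth.py", "src/auth.py"},
        "b-refactor": {"src/auth.py", "src/cache.py"},
        "c-docs": {"README.md"},
    }
    print(find_conflicts(claims))
    # prints [('a-triage', 'b-refactor', {'src/auth.py'})]
```

Run before either loop starts, a check like this would have flagged the Developer A and Developer B scenario above while both humans were still awake.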

The gap between “I have a loop that works” and “my team has loops that work together” is the same gap between individual agentic engineering and organizational agentic engineering. The primitives exist for one. The infrastructure for many is what's missing.

The orchestration layer for loops

Loop engineering as Cherny and Steinberger describe it is the individual developer's workflow. The org-level question is: who coordinates the loops?

Somebody needs to know that Developer A's triage loop and Developer B's refactoring loop are about to conflict. Somebody needs to enforce that no loop touches the payments service without a human checkpoint. Somebody needs to trace an incident back through three loops that each made a change in the hour before production broke.

That “somebody” is infrastructure, not a person. It is the orchestration layer that sits above individual loops the same way individual loops sit above individual prompts. It provides shared context (so loops know what other loops are doing), governance (so loops respect organizational policy), observability (so the team knows what loops are collectively doing), and coordination (so loops don't work against each other).
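The governance piece reduces to a check that runs before a loop's change is allowed through. The sketch below assumes a hypothetical policy expressed as glob patterns for protected paths; `check_policy`, `Decision`, and the example paths are illustrative names, not part of any product.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Decision:
    allowed: bool             # True if the loop may proceed unattended
    blocked_paths: list[str]  # paths that require a human checkpoint


def check_policy(changed_paths: list[str], protected_globs: list[str]) -> Decision:
    """Hold any change that touches a path matching a protected pattern."""
    blocked = [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in protected_globs)
    ]
    return Decision(allowed=not blocked, blocked_paths=blocked)


if __name__ == "__main__":
    decision = check_policy(
        ["services/payments/api/charge.py", "docs/README.md"],
        ["services/payments/*"],  # hypothetical rule: payments needs a human
    )
    print(decision.allowed)        # prints False
    print(decision.blocked_paths)  # prints ['services/payments/api/charge.py']
```

The important property is where the check lives: outside the loop, so a loop cannot talk itself past a rule the way it might talk itself past its own verifier.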

The same progression applies. First we prompted agents. Then we designed loops for agents. Next: we build the infrastructure that coordinates the loops. Each layer is an abstraction above the last. Each layer is what makes the layer below it safe to run at scale.

Build the loop, stay the engineer

Osmani closes his analysis with a line worth repeating: “Build the loop. But build it like someone who intends to stay the engineer, not just the person who presses go.”

Loop engineering is harder than prompt engineering, not easier. The leverage point moved. The skill is no longer talking to AI. It is designing systems where AI talks to itself, with enough structure that the output is trustworthy and enough observability that you know when it is not.

The teams that get this right in the next six months will have a structural advantage that compounds every week. Their loops learn from their skills. Their skills encode what the loops discovered. The system gets better whether the developer is at the keyboard or not.

The teams that get the individual loop right but skip the coordination layer will hit the same wall the industry hit with individual agents: it works for one, it breaks at organizational scale. That wall is where the harness and the orchestration layer become necessary.

Coordinate your loops at organizational scale

LoomStack is the orchestration layer above individual loops. Shared context, governance enforcement, observability across all agent activity, and coordination so your team's loops don't work against each other.
