MODULE 03 · 14 min read · June 2026

How do you coordinate multiple agents and teams without the overhead killing the gains?

Agentic engineering is not vibe coding at scale. It's a coordination problem requiring deliberate operating model design: team topology, decision authority, and agent fleet structure.

Andrej Karpathy's "agentic engineering" framing from Sequoia's AI Ascent captured something real: software engineering is shifting from engineers writing code to engineers orchestrating agents that write code. The implication most organizations have drawn is "deploy more agents."

That's the wrong implication. Deploying more agents without redesigning the operating model around them produces exactly what Module 00 describes: individual velocity up, organizational delivery flat. The problem isn't the agents. It's the absence of a coordination model designed for agent-speed execution.

Agentic engineering is the discipline of designing that operating model. It covers how you structure work so agents can execute it, how your team roles shift, what decisions stay human-owned, and how you prevent the quadratic coordination overhead from consuming the gains.
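One way to see why the overhead is called quadratic: if every participant (agent or team) may need to coordinate with every other, the number of pairwise channels grows as n(n-1)/2. The sketch below assumes that all-to-all model, which this module does not state explicitly; the function name is illustrative.

```python
def pairwise_channels(n: int) -> int:
    """Number of distinct pairs among n participants: n * (n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Doubling the number of participants roughly quadruples the channels.
for n in (5, 10, 20, 40):
    print(n, pairwise_channels(n))
# prints:
# 5 10
# 10 45
# 20 190
# 40 780
```

Under this model, an operating model reduces overhead by cutting the number of channels that need to exist at all, for example by routing coordination through shared context and explicit decision tiers instead of ad hoc pairwise syncs.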

~80% of software engineering tasks are soon within reach of AI agents (Anthropic, 2025)

25% multi-agent collaborative task success rate, vs 50% solo (CooperBench, Stanford/SAP 2026)

1,300+ AI-written PRs merged per week at Stripe (Stripe Engineering Blog)

3–5× review load increase on senior engineers in AI-native orgs (LinearB 2026)

Definition

What agentic engineering actually means

The term has been colonized by product marketing. For this playbook: agentic engineering is the practice of structuring engineering work so that AI agents can handle implementation, verification, and iteration with minimal human coordination overhead for each individual change, while maintaining human authority over architectural decisions, risk classification, and organizational direction.

The critical distinction: this is not about maximizing agent autonomy. It's about right-sizing human involvement. Humans remain in the loop on the things that require human judgment. The operating model shifts which things those are.

Karpathy's framing: "The mental model shifts from 'coder' to 'orchestrator,' defining tasks, reviewing outputs, managing context, and maintaining quality." That's the leader-level summary. The implementation involves specific choices about team structure, decision tiers, and agent fleet design.

“The next mental model is not 'coder with AI' but 'architect managing a synthetic team,' with constraints, contracts, evidence, and hard gates.”
Hacker News, discussion on 'Code Is Cheap. Coherence Is the New Bottleneck'

Team topology

How engineer roles shift

The roles don't disappear. They evolve. Understanding this shift prevents two failure modes: engineers resisting agent work, and engineers delegating too much.

Senior engineers

Traditional role: Senior engineers write the most complex code. Their time is the bottleneck for hard problems.

Agentic role: Senior engineers define the architectural boundaries and decision criteria that agents operate within. Their time is the bottleneck for context and governance quality.

Mid-level engineers

Traditional role: Mid-level engineers implement features from tickets. Code review is their main quality gate.

Agentic role: Mid-level engineers orchestrate agent work: decompose tasks, manage context, verify outputs against specs, escalate when agents deviate.

Junior engineers

Traditional role: Junior engineers work on well-scoped bugs and small features while building context.

Agentic role: Junior engineers still work on well-scoped work, but they learn by reviewing agent output, modifying specs, and understanding why agents made the choices they made.

Platform teams

Traditional role: Platform teams build developer tools and CI/CD pipelines.

Agentic role: Platform teams build the agent coordination layer: shared context infrastructure, eval harnesses, governance policy, orchestration workflows. This is now core infrastructure, not a side project.

WATCH OUT

The most common people failure: senior engineers feel deskilled. They used to write the complex code; now they're reviewing agent output and writing specs. If you don't reframe this explicitly, your best engineers will resent the shift. The reframe: defining the architectural boundaries that constrain agent work IS the complex engineering work. The agents aren't replacing judgment. They're handling execution so judgment can scale.

Decision authority

The decision tier framework

This is the most important artifact in agentic engineering. Without explicit decision tiers, every change requires human review, which is exactly the review debt problem from Module 00.

Decision tier spectrum (from more autonomous to more human):

T1 Autonomous: test additions, docs updates, dep bumps, lint fixes

T2 Supervised: new features, bug fixes, non-breaking APIs

T3 Escalated: arch changes, security code, breaking APIs

T4 Human only: prod access, incident resp., org decisions

Tier 1

Autonomous

Agent acts without human review. Verified by eval harness only.

Examples

Test coverage improvements
Documentation updates
Dependency version bumps (passing all tests)
Linting and formatting fixes
Refactors with no behavior change (verified by test suite)

Required gates

All tests pass
No new dependencies
No changes to public interfaces
Diff size below threshold
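The four Tier 1 gates above can be expressed as a single check that returns every reason a change fails. This is a minimal sketch: the `Change` fields and the 200-line threshold are hypothetical, since the playbook does not specify how these signals are collected or what the threshold should be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    tests_pass: bool
    new_dependencies: int           # dependencies added by the diff
    touches_public_interface: bool
    diff_lines: int                 # added + removed lines


# Hypothetical threshold; tune it against your own merged-PR history.
MAX_TIER1_DIFF_LINES = 200


def tier1_gate_failures(
    change: Change, max_diff_lines: int = MAX_TIER1_DIFF_LINES
) -> list[str]:
    """Return the failed Tier 1 gates; an empty list means all gates pass."""
    failures = []
    if not change.tests_pass:
        failures.append("tests failing")
    if change.new_dependencies > 0:
        failures.append("adds new dependencies")
    if change.touches_public_interface:
        failures.append("changes a public interface")
    if change.diff_lines >= max_diff_lines:
        failures.append("diff size at or above threshold")
    return failures
```

Returning the list of failures, not just a boolean, gives the agent or the person it escalates to a concrete reason why the change dropped out of the autonomous tier.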

Tier 2

Supervised

Agent acts; human reviews before merge. Risk-routed to appropriate reviewer.

Examples

New features within existing service boundaries
Bug fixes with behavior change
API additions (non-breaking)
Database migrations (non-destructive)

Required gates

All tests pass
Risk score below threshold
Assigned to appropriate reviewer based on domain
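Risk routing for Tier 2 might look like the sketch below. The domain-to-reviewer mapping, the 0.0 to 1.0 risk scale, and the 0.6 threshold are all assumptions made for illustration; the playbook does not define how the risk score is computed.

```python
# Hypothetical mapping from code domain to reviewer group.
REVIEWERS_BY_DOMAIN = {
    "billing": "payments-reviewers",
    "search": "search-reviewers",
}

# Hypothetical threshold on a 0.0-1.0 risk score.
RISK_THRESHOLD = 0.6


def route_tier2(
    tests_pass: bool,
    risk_score: float,
    domain: str,
    reviewers: dict[str, str] = REVIEWERS_BY_DOMAIN,
    risk_threshold: float = RISK_THRESHOLD,
) -> str:
    """Apply the Tier 2 gates in order and return a routing decision."""
    if not tests_pass:
        return "blocked: tests failing"
    if risk_score >= risk_threshold:
        return "escalate: risk score at or above threshold"
    reviewer = reviewers.get(domain)
    if reviewer is None:
        # No owner on record: escalate rather than guess a reviewer.
        return "escalate: no reviewer mapped for domain"
    return f"review: {reviewer}"
```

For example, `route_tier2(True, 0.2, "billing")` returns `"review: payments-reviewers"`, while an unmapped domain escalates instead of falling through to a default reviewer.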

Tier 3

Escalated

Human decision required before agent starts. Architectural sign-off or security review needed.

Examples

New services or major architectural changes
Breaking API changes
Security-sensitive code (auth, payments, crypto)
Schema changes affecting multiple services

Required gates

Architecture review completed
Security sign-off if applicable
Rollback plan defined

Tier 4

Human-owned

Humans only. No agent involvement in decision or implementation.

Examples

Production access and deployment controls
Org-wide architectural decisions
Incident response and production debugging
Hiring and team structure decisions

Required gates

N/A, agent involvement is explicitly blocked
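Taken together, the four tiers amount to a classification rule plus a permission check. The sketch below assumes changes carry labels and that the most restrictive label wins; the label names are hypothetical, and defaulting unknown work to Tier 3 is a conservative assumption, not something the framework prescribes.

```python
from enum import IntEnum
from typing import Iterable


class Tier(IntEnum):
    AUTONOMOUS = 1
    SUPERVISED = 2
    ESCALATED = 3
    HUMAN_ONLY = 4


# Hypothetical labels, loosely following the examples listed for each tier.
TIER_BY_LABEL = {
    "tests": Tier.AUTONOMOUS,
    "docs": Tier.AUTONOMOUS,
    "dependency-bump": Tier.AUTONOMOUS,
    "lint": Tier.AUTONOMOUS,
    "feature": Tier.SUPERVISED,
    "bugfix": Tier.SUPERVISED,
    "api-addition": Tier.SUPERVISED,
    "breaking-api": Tier.ESCALATED,
    "security": Tier.ESCALATED,
    "new-service": Tier.ESCALATED,
    "prod-access": Tier.HUMAN_ONLY,
    "incident-response": Tier.HUMAN_ONLY,
}


def classify(labels: Iterable[str]) -> Tier:
    """Most restrictive tier wins; unknown or missing labels escalate."""
    tiers = [TIER_BY_LABEL.get(label, Tier.ESCALATED) for label in labels]
    return max(tiers, default=Tier.ESCALATED)


def agent_may_start(tier: Tier, signed_off: bool = False) -> bool:
    """Tier 4 is always blocked; Tier 3 needs a human decision first."""
    if tier is Tier.HUMAN_ONLY:
        return False
    if tier is Tier.ESCALATED:
        return signed_off
    return True
```

A change labeled both `feature` and `security` classifies as Tier 3, so the agent cannot start until a human has signed off.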

LEADER TAKEAWAY

Start by categorizing your last 50 merged PRs into these tiers. That gives you a real distribution for your codebase, not a theoretical one. Most orgs find that 40–60% of work is genuinely Tier 1 or 2, and they've been manually reviewing all of it.
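Once the 50 PRs have been tiered by hand, the distribution is a short tally. The sample below is made-up illustrative data, not a measurement from any real codebase.

```python
from collections import Counter


def tier_distribution(pr_tiers: list[int]) -> dict[int, float]:
    """Share of PRs in each tier, keyed by tier number."""
    total = len(pr_tiers)
    if total == 0:
        return {}
    counts = Counter(pr_tiers)
    return {tier: counts[tier] / total for tier in sorted(counts)}


# Made-up example: tiers assigned by hand to 50 merged PRs.
sample = [1] * 12 + [2] * 14 + [3] * 20 + [4] * 4

dist = tier_distribution(sample)
low_tier_share = dist.get(1, 0.0) + dist.get(2, 0.0)
print(dist)
print(f"Tier 1 + Tier 2 share: {low_tier_share:.0%}")
```

Rerunning the tally after each change to the tier definitions shows whether a redefinition actually moves work out of the human review queue.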

Case study

What Stripe figured out

Stripe ships over 1,300 AI-written PRs per week. That number is cited constantly. Less discussed is how: it required years of platform investment to make the agent fleet functional at that scale.

The core insight from Stripe's engineering blog: "Whether it's documentation, developer environments, or CI, we've found time and time again that our investments in human developer productivity pay dividends in the world of agents." The platform they built for human engineers (fast devboxes, 3 million automated tests, 400+ internal tools, autofixing CI) became the coordination infrastructure for their agent fleet.

The organizational structure behind this: a dedicated developer productivity team with the mandate to build infrastructure that makes both human and agent engineering faster. Not an afterthought, not a side project. Core infrastructure investment, sustained over years.

Most organizations can't replicate Stripe's 10-year investment in 90 days. But the architecture is clear: fast feedback loops, comprehensive automated verification, rich organizational context, and governance enforced in infrastructure. That's the direction. The question is how fast you move toward it.

Leader artifact

Operating model design questions

These are the questions your operating model needs to answer before you scale agent deployment. Skipping them produces coordination failures.

Structure

Who owns agent coordination?

Is there a named team or person responsible for the coordination layer (shared context, eval harnesses, governance policy)? Or is each team building independently? Decentralized agent adoption without a platform function produces fragmentation.

Authority

What stays human-owned?

Have you written down your Tier 4 list: the decisions that stay human-owned regardless of agent capability? If this isn't explicit, individual engineers and agents will fill in the gap inconsistently.

Verification

What are your eval gates?

Before any agent work is considered complete, what must pass automatically? If the answer is just 'existing test suite,' your Tier 1 autonomy is limited by test coverage quality. What verification do you need to expand autonomous execution?
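One way to make eval gates explicit is a named registry of checks that must all pass before agent work counts as complete. This is a sketch under assumptions: the gate names and the fields on the `change` dictionary are hypothetical and stand in for whatever signals your CI actually produces.

```python
from typing import Callable

Gate = Callable[[dict], bool]

# Hypothetical gates; the first is the "existing test suite" baseline,
# the others illustrate verification added to widen autonomous execution.
GATES: dict[str, Gate] = {
    "existing test suite": lambda c: c["tests_pass"],
    "coverage did not drop": lambda c: c["coverage_after"] >= c["coverage_before"],
    "no public interface change": lambda c: not c["touches_public_interface"],
}


def run_gates(change: dict, gates: dict[str, Gate] = GATES) -> list[str]:
    """Return the names of failed gates; an empty list means complete."""
    return [name for name, check in gates.items() if not check(change)]
```

Keeping the gates in one named registry makes the verification question answerable by inspection: the list of keys is the current answer to "what must pass automatically?".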

Context

What do agents know about your org?

If you started a new agent session right now with a task from your backlog, what organizational context would it have? Is it the same across all teams? Is it current? This is your context architecture, and it's probably not what you think it is.

Go deeper

From playbook to production

We work directly with engineering leaders who are making this transition now. You bring the real constraints; we help you build the coordination layer around them.
