MODULE 07 · 13 min read · June 2026

Build, buy, or hybrid?

The control plane is the governance layer above your agent runtimes. It's not a feature. It's infrastructure. Here's what it needs, what it costs to build, and how to make the decision.

Every module in this playbook describes a piece of infrastructure: shared context, eval harnesses, governance policy, loop orchestration, organizational memory. At some point you need to ask: where does this live? Who builds it? How does it connect?

The control plane is the answer. It's the coordination layer that sits above your agent runtimes (Cursor, Claude Code, Copilot, whatever your engineers use) and manages context distribution, policy enforcement, workflow orchestration, and observability across all agent activity in your org.

Stripe built this over years. Most orgs are at an earlier stage. The question is not whether you need it (you do, at any meaningful scale) but whether to build it internally, buy it from a vendor, or assemble it from open-source components.

Definition

What a control plane actually is

The control plane: 5 layers

Layer 5, Coordination: workflow orchestration, fleet management (depends on all layers below)

Layer 4, Governance: policy enforcement, escalation, audit (depends on Layers 1–3)

Layer 3, Guardrails: eval harness, deterministic + LLM scoring (depends on Layers 1–2)

Layer 2, Knowledge: org context, architecture, conventions (depends on infrastructure)

Layer 1, Execution env: fast, reproducible, isolated agent runtime (foundation)

Layer 1

Execution environment

Where agents run. Isolated, reproducible environments. Fast feedback loops. This is Stripe's 10-second devboxes: infrastructure investment that pays dividends in agent iteration speed.

Layer 2

Knowledge authority

The organizational context layer. Architecture, conventions, service contracts, decision history. All agent sessions get this as ground truth. This is what prevents context fragmentation.

Layer 3

Behavior guardrails

The eval harness. Deterministic checks, behavioral scoring, regression baselines. Answers 'did the agent do what we asked?' before routing to human review.
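A minimal sketch of how such a two-gate harness might be wired, assuming the deterministic checks are plain callables and the behavioral score is a number between 0 and 1 produced elsewhere. The names `GateResult`, `evaluate_change`, and the 0.8 threshold are illustrative assumptions, not part of any specific tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    passed: bool
    reason: str


def evaluate_change(
    deterministic_checks: dict[str, Callable[[], bool]],
    behavioral_score: float,
    min_score: float = 0.8,  # hypothetical threshold; tune against your own baselines
) -> GateResult:
    """Run cheap deterministic checks first, then the behavioral score gate."""
    # Gate 1: deterministic checks (lint, types, tests). Stop at the first failure.
    for name, check in deterministic_checks.items():
        if not check():
            return GateResult(False, f"deterministic check failed: {name}")

    # Gate 2: behavioral score compared against the minimum acceptable score.
    if behavioral_score < min_score:
        return GateResult(
            False, f"behavioral score {behavioral_score:.2f} below {min_score:.2f}"
        )

    # Only changes that clear both gates are routed to human review.
    return GateResult(True, "ready for human review")


if __name__ == "__main__":
    checks = {"lint": lambda: True, "tests": lambda: True}
    print(evaluate_change(checks, behavioral_score=0.91))
```

Running the deterministic gate first keeps the more expensive behavioral scoring off changes that would fail anyway.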

Layer 4

Governance

Policy enforcement, escalation routing, audit logging, override tracking. Answers 'is this change allowed to proceed?' at the infrastructure level, not the individual-judgment level.
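One way to picture policy enforced at the infrastructure level is a small function that maps changed paths to a decision and records every decision in an audit log. The sketch below assumes path-prefix rules; the prefixes, `Policy`, `AuditLog`, and `decide` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"
    BLOCK = "block"


@dataclass
class Policy:
    # Hypothetical example prefixes; a real policy would come from versioned config.
    blocked_prefixes: tuple[str, ...] = ("secrets/",)
    escalate_prefixes: tuple[str, ...] = ("services/payments/",)


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)


def decide(changed_paths: list[str], policy: Policy, log: AuditLog) -> Decision:
    """Return the strictest decision triggered by any changed path, and log it."""
    decision = Decision.ALLOW
    for path in changed_paths:
        if path.startswith(policy.blocked_prefixes):
            decision = Decision.BLOCK
            break  # nothing is stricter than BLOCK
        if path.startswith(policy.escalate_prefixes):
            decision = Decision.ESCALATE  # keep scanning in case a later path blocks
    # Every decision is recorded, including ALLOW, so the trail is complete.
    log.entries.append({"paths": list(changed_paths), "decision": decision.value})
    return decision


if __name__ == "__main__":
    log = AuditLog()
    print(decide(["services/payments/ledger.py"], Policy(), log))
    print(log.entries)
```

The point of the sketch is that the same input always yields the same decision, and the decision is logged whether or not a human ever looks at the change.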

Layer 5

Coordination

Workflow orchestration: who does what, in what order, with what dependencies. Conflict detection before generation. Multi-agent coordination with shared state.
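Conflict detection before generation can be approximated with a shared registry where each agent claims the paths it intends to touch before it starts work. This is a simplified in-memory sketch; `ClaimRegistry`, `try_claim`, and the path-prefix overlap rule are assumptions made for illustration, and a real system would need persistent shared state and claim release.

```python
def overlaps(a: str, b: str) -> bool:
    """Two paths overlap if one is the other or contains it as a directory."""
    a_parts = a.strip("/").split("/")
    b_parts = b.strip("/").split("/")
    shorter = min(len(a_parts), len(b_parts))
    return a_parts[:shorter] == b_parts[:shorter]


class ClaimRegistry:
    """Shared state recording which agent has claimed which paths."""

    def __init__(self) -> None:
        self._claims: dict[str, list[str]] = {}

    def try_claim(self, agent_id: str, paths: list[str]) -> list[tuple[str, str]]:
        """Claim paths for an agent; return conflicting (agent, path) pairs if any."""
        conflicts = [
            (other, held)
            for other, held_paths in self._claims.items()
            if other != agent_id
            for held in held_paths
            for path in paths
            if overlaps(path, held)
        ]
        # The claim is recorded only when it is conflict-free.
        if not conflicts:
            self._claims.setdefault(agent_id, []).extend(paths)
        return conflicts


if __name__ == "__main__":
    registry = ClaimRegistry()
    print(registry.try_claim("agent-a", ["services/billing"]))
    print(registry.try_claim("agent-b", ["services/billing/api.py"]))
```

An empty return value means the agent can proceed; a non-empty one surfaces the conflict before any code is generated rather than at merge time.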

KEY INSIGHT

You already have parts of this. Your CI system is Layer 3. Your PR review process is a manual version of Layer 4. Your existing documentation is a static version of Layer 2. The question is not whether to build all five layers from scratch. It's which layers to systematize and which to buy.

Build-vs-buy calculus

What internal build actually costs

Stripe is the most-cited example of a fully internal control plane. It works. It also took years and a dedicated developer productivity team. That's not a reason not to build. It's a data point for the cost estimate.

The honest estimate for building a minimum viable control plane internally: 3–6 months for a small dedicated team (3–5 engineers) to get Layers 2–4 working reliably. Layer 5 (workflow orchestration) adds another 3–6 months. Layer 1 (execution environment) is either your existing CI infrastructure (fine for most orgs) or a significant investment if you need isolated devboxes at Stripe's speed.
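The ranges above can be turned into engineer-months with simple arithmetic. The sketch below assumes the same 3–5 person team stays on for the Layer 5 work, which the estimate does not state explicitly; `engineer_months` is an illustrative helper.

```python
def engineer_months(team_size: tuple[int, int], months: tuple[int, int]) -> tuple[int, int]:
    """Multiply the low ends and the high ends of two ranges."""
    return team_size[0] * months[0], team_size[1] * months[1]


TEAM_SIZE = (3, 5)             # engineers, from the estimate above
LAYERS_2_TO_4_MONTHS = (3, 6)  # months to get Layers 2-4 working reliably
LAYER_5_EXTRA_MONTHS = (3, 6)  # additional months for Layer 5

layers_2_to_4 = engineer_months(TEAM_SIZE, LAYERS_2_TO_4_MONTHS)
layer_5 = engineer_months(TEAM_SIZE, LAYER_5_EXTRA_MONTHS)
# Assumption: the same team continues, so the two phases simply add up.
total = (layers_2_to_4[0] + layer_5[0], layers_2_to_4[1] + layer_5[1])

print(f"Layers 2-4: {layers_2_to_4[0]}-{layers_2_to_4[1]} engineer-months")
print(f"Layers 2-5: {total[0]}-{total[1]} engineer-months")
```

Under those assumptions, Layers 2–4 come to 9–30 engineer-months and Layers 2–5 to 18–60, before any Layer 1 investment or ongoing maintenance.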

That estimate assumes you're building for your specific org, which is the main advantage of internal build. You know your toolchain, your codebase, your governance requirements, and your team's working style, so a custom build can integrate more tightly than a general-purpose product. The disadvantage: you're building and maintaining infrastructure that is not your core product, and you're doing it while the underlying agent ecosystem is evolving rapidly.

3–6 mo: to build Layers 2–4 with a dedicated team (internal build estimate)

3–5 eng: dedicated platform engineers required (conservative estimate)

6–12 mo: to reach full Layer 5 orchestration (including workflow engine)

Ongoing: maintenance as the agent ecosystem evolves (structural cost of internal build)

Decision framework

Build, buy, or hybrid: how to decide

The right answer depends on your org size, your platform team capacity, your differentiation strategy, and your timeline.

Build

When it makes sense

Your org has >500 engineers and the control plane is genuinely a competitive advantage
Your AI governance requirements are unusual (heavily regulated industry, government contractor, etc.)
You have a dedicated platform team with 5+ engineers and the mandate to build this
You have 12+ months and the patience to maintain the infrastructure as the ecosystem evolves

Not when

Don't build if you're under 200 engineers or don't have a dedicated platform team. The opportunity cost is too high.

Buy

When it makes sense

You want to capture gains in the next 90 days, not the next 12 months
Your platform team is already stretched on product infrastructure
Your governance requirements are standard (most orgs' requirements are standard)
You want to iterate on operating model, not platform engineering

Not when

Don't buy if your requirements are genuinely unusual or if the vendor's architecture doesn't support your integration constraints.

Hybrid

When it makes sense

You have unusual requirements in one layer but standard requirements in others
You want to buy the coordination layer but build your organizational context layer (which knows your codebase)
You're a mid-size org (100–500 engineers) with a small platform team
You want to prove the model with a vendor before committing to an internal build

Not when

Don't go hybrid if you're not prepared to manage integration complexity. Keeping two systems in sync is harder than running one, even if the one system is slightly worse at each thing.

By org size

Minimum viable control plane by org size

5–30 engineers: Shared context file + CI gates

One shared CLAUDE.md covering org conventions, one CI workflow that runs lint/type/test on AI-tagged PRs, and a simple risk-routing rule (changes to core services require senior review). This takes 2–3 weeks to set up. It's not a control plane. It's a foundation. It's enough for this org size.

30–200 engineers: Context layer + governance policy + eval harness

Shared context infrastructure (not just a file, but a maintained repository with versioning), a governance policy codified in CI config (not a doc), and a two-gate eval harness (deterministic + behavioral). At this size, you need at least one engineer who owns the AI coordination layer as their primary responsibility.

200–1000 engineers: Full five-layer architecture

Layers 1–5 all need to be operational. At this size, you probably need a dedicated AI platform team (not just one person). The question of build vs. buy becomes significant because the investment in either direction is substantial. The hybrid model often makes sense here: buy coordination and governance, and build the context layer, because it has to encode knowledge of your own codebase.

1000+ engineers: Organizational-scale architecture

At this scale, the control plane is core infrastructure, not an experiment. You likely need multiple context layers (org-wide, business-unit-wide, team-level). Governance requirements are probably more complex (multiple regulatory regimes, global teams). This is where the Stripe-scale investment becomes economically justified, and where the hybrid or vendor approach requires careful evaluation of vendor scale constraints.
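Layered context can be sketched as a lookup that checks the most specific layer first. The example below uses Python's standard `collections.ChainMap` and assumes team-level settings override business-unit settings, which override org-wide ones. That precedence order is an assumption, as are the keys and `resolve_context`; some orgs will want certain org-wide rules to be non-overridable.

```python
from collections import ChainMap


def resolve_context(org: dict, business_unit: dict, team: dict) -> dict:
    """Merge context layers; earlier mappings in the ChainMap win on conflicts."""
    # Assumed precedence: team > business unit > org.
    return dict(ChainMap(team, business_unit, org))


if __name__ == "__main__":
    # Hypothetical example values for illustration only.
    org = {"language": "python", "test_command": "pytest", "max_pr_lines": 400}
    business_unit = {"max_pr_lines": 250}
    team = {"test_command": "pytest -m fast"}

    effective = resolve_context(org, business_unit, team)
    for key in sorted(effective):
        print(f"{key}: {effective[key]}")
```

Each agent session would receive the resolved result, so a team can tighten a convention without forking the org-wide source.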

Leader artifact

Infrastructure audit checklist

Before making the build-vs-buy decision, audit what you already have. Most orgs have more control plane infrastructure than they realize; the pieces just aren't connected yet.

[Execution] Do agents have isolated environments that don't interfere with each other or production?

[Context] Is there one canonical source of org-wide coding conventions that all agent sessions can access?

[Context] Are your architectural decision records up-to-date and in a format agents can consume?

[Eval] Do you have a test suite that runs in under 5 minutes and covers your critical paths?

[Eval] Is there any automated behavioral verification beyond unit tests (integration tests, contract tests)?

[Governance] Is your AI governance policy written down? Is it enforced programmatically or by convention?

[Governance] Do you have risk-based PR routing? Do high-risk AI changes go to appropriate reviewers automatically?

[Governance] Do you have an audit trail for AI-generated changes: what the agent did, what gates it passed, who reviewed it?

[Coordination] Can two agents work on adjacent parts of your codebase simultaneously without discovering conflicts at merge time?

[Observability] Can you tell, right now, which PRs this week were AI-generated and what their acceptance rate was?

If you can check 7+ of these: you're in better shape than most orgs. Focus on the gaps. If you can check fewer than 4: the priority is the unchecked items, not the build-vs-buy question.
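The audit-trail and observability questions in the checklist come down to keeping one structured record per change. A minimal sketch, assuming "acceptance rate" means the share of AI-generated PRs that were merged; that definition, `ChangeRecord`, and `ai_acceptance_rate` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    pr_id: str
    ai_generated: bool
    gates_passed: tuple[str, ...]  # which automated gates the change cleared
    reviewer: Optional[str]        # who reviewed it, if anyone
    merged: bool


def ai_acceptance_rate(records: list[ChangeRecord]) -> Optional[float]:
    """Share of AI-generated PRs that were merged; None if there were none."""
    ai_records = [r for r in records if r.ai_generated]
    if not ai_records:
        return None
    return sum(r.merged for r in ai_records) / len(ai_records)


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    week = [
        ChangeRecord("pr-1", True, ("lint", "tests"), "alice", True),
        ChangeRecord("pr-2", True, ("lint",), None, False),
        ChangeRecord("pr-3", False, ("lint", "tests"), "bob", True),
    ]
    print(ai_acceptance_rate(week))
```

If you cannot populate a record like this for last week's PRs, the observability item on the checklist is unchecked.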

Where to go from here

One option

LoomStack is building the infrastructure described in this module: the coordination layer that gives every agent session organizational context, enforces policy at the infrastructure level, orchestrates multi-agent workflows, and provides end-to-end observability from signal to production.

If you're evaluating whether to build, buy, or assemble your own, we're happy to talk through the architecture and what makes sense for your specific context.

Blog article

The Agents Aren't the Problem, The Coordination Layer Is

Harness Engineering

The eval and governance layers in detail.

Metrics That Don't Lie

Measuring whether the control plane is working.

CO-BUILD PROGRAM

From playbook to production

We work directly with engineering leaders who are making this transition now. You bring the real constraints; we help you build the coordination layer around them.
