Build, buy, or hybrid?
The control plane is the governance layer above your agent runtimes. It's not a feature. It's infrastructure. Here's what it needs, what it costs to build, and how to make the decision.
Every module in this playbook describes a piece of infrastructure: shared context, eval harnesses, governance policy, loop orchestration, organizational memory. At some point you need to ask: where does this live? Who builds it? How does it connect?
The control plane is the answer. It's the coordination layer that sits above your agent runtimes (Cursor, Claude Code, Copilot, whatever your engineers use) and manages context distribution, policy enforcement, workflow orchestration, and observability across all agent activity in your org.
Stripe built this over years. Most orgs are at an earlier stage. The question is not whether you need it (you do, at any meaningful scale) but whether to build it internally, buy it from a vendor, or assemble it from open-source components.
Definition
What a control plane actually is
Layer 1
Execution environment
Where agents run. Isolated, reproducible environments. Fast feedback loops. This is Stripe's 10-second devboxes: infrastructure investment that pays dividends in agent iteration speed.
Layer 2
Knowledge authority
The organizational context layer. Architecture, conventions, service contracts, decision history. All agent sessions get this as ground truth. This is what prevents context fragmentation.
Layer 3
Behavior guardrails
The eval harness. Deterministic checks, behavioral scoring, regression baselines. Answers 'did the agent do what we asked?' before routing to human review.
Layer 4
Governance
Policy enforcement, escalation routing, audit logging, override tracking. Answers 'is this change allowed to proceed?' at the infrastructure level, not the individual-judgment level.
Layer 5
Coordination
Workflow orchestration: who does what, in what order, with what dependencies. Conflict detection before generation. Multi-agent coordination with shared state.
Build-vs-buy calculus
What internal build actually costs
Stripe is the most-cited example of a fully internal control plane. It works. It also took years and a dedicated developer productivity team. That's not a reason not to build. It's a data point for the cost estimate.
The honest estimate for building a minimum viable control plane internally: 3–6 months for a small dedicated team (3–5 engineers) to get Layers 2–4 working reliably. Layer 5 (workflow orchestration) adds another 3–6 months. Layer 1 (execution environment) is either your existing CI infrastructure (fine for most orgs) or a significant investment if you need isolated devboxes at Stripe's speed.
That estimate assumes you're building for your specific org, which is the main advantage of internal build. You know your toolchain, your codebase, your governance requirements, and your team's working style. A custom build can integrate perfectly. The disadvantage: you're building and maintaining infrastructure that is not your core product, and you're doing it while the underlying agent ecosystem is evolving rapidly.
To build Layers 2–4 with a dedicated team
Internal build estimate
Dedicated platform engineers required
Conservative estimate
To reach full Layer 5 orchestration
Including workflow engine
Maintenance as agent ecosystem evolves
Structural cost of internal build
Decision framework
Build, buy, or hybrid: how to decide
The right answer depends on your org size, your platform team capacity, your differentiation strategy, and your timeline.
When it makes sense
- Your org has >500 engineers and the control plane is genuinely a competitive advantage
- Your AI governance requirements are unusual (heavily regulated industry, government contractor, etc.)
- You have a dedicated platform team with 5+ engineers and the mandate to build this
- You have 12+ months and the patience to maintain the infrastructure as the ecosystem evolves
Not when
Don't build if you're under 200 engineers or don't have a dedicated platform team. The opportunity cost is too high.
When it makes sense
- You want to capture gains in the next 90 days, not the next 12 months
- Your platform team is already stretched on product infrastructure
- Your governance requirements are standard (most orgs' requirements are standard)
- You want to iterate on operating model, not platform engineering
Not when
Don't buy if your requirements are genuinely unusual or if the vendor's architecture doesn't support your integration constraints.
When it makes sense
- You have unusual requirements in one layer but standard requirements in others
- You want to buy the coordination layer but build your organizational context layer (which knows your codebase)
- You're a mid-size org (100–500 engineers) with a small platform team
- You want to prove the model with a vendor before committing to an internal build
Not when
Don't go hybrid if you're not prepared to manage integration complexity. Two systems that need to stay in sync is harder than one system, even if the one system is slightly worse at each thing.
By org size
Minimum viable control plane by org size
5–30 engineers: Shared context file + CI gates
One shared CLAUDE.md covering org conventions, one CI workflow that runs lint/type/test on AI-tagged PRs, and a simple risk-routing rule (changes to core services require senior review). This takes 2–3 weeks to set up. It's not a control plane. It's a foundation. It's enough for this org size.
30–200 engineers: Context layer + governance policy + eval harness
Shared context infrastructure (not just a file, but a maintained repository with versioning), a governance policy codified in CI config (not a doc), and a two-gate eval harness (deterministic + behavioral). At this size, you need at least one engineer who owns the AI coordination layer as their primary responsibility.
200–1000 engineers: Full five-layer architecture
Layers 1–5 all need to be operational. At this size, you probably need a dedicated AI platform team (not just one person). The question of build vs. buy becomes significant because the investment in either direction is substantial. The hybrid model often makes sense here: buy coordination and governance, build context layer (it knows your codebase).
1000+ engineers: Organizational-scale architecture
At this scale, the control plane is core infrastructure, not an experiment. You likely need multiple context layers (org-wide, business-unit-wide, team-level). Governance requirements are probably more complex (multiple regulatory regimes, global teams). This is where the Stripe-scale investment becomes economically justified, and where the hybrid or vendor approach requires careful evaluation of vendor scale constraints.
Leader artifact
Infrastructure audit checklist
Before making the build-vs-buy decision, audit what you already have. Most orgs have more control plane infrastructure than they realize, just uncombined.
[Execution]Do agents have isolated environments that don't interfere with each other or production?
[Context]Is there one canonical source of org-wide coding conventions that all agent sessions can access?
[Context]Are your architectural decision records up-to-date and in a format agents can consume?
[Eval]Do you have a test suite that runs in under 5 minutes and covers your critical paths?
[Eval]Is there any automated behavioral verification beyond unit tests (integration tests, contract tests)?
[Governance]Is your AI governance policy written down? Is it enforced programmatically or by convention?
[Governance]Do you have risk-based PR routing? Do high-risk AI changes go to appropriate reviewers automatically?
[Governance]Do you have an audit trail for AI-generated changes: what the agent did, what gates it passed, who reviewed it?
[Coordination]Can two agents work on adjacent parts of your codebase simultaneously without discovering conflicts at merge time?
[Observability]Can you tell, right now, which PRs this week were AI-generated and what their acceptance rate was?
If you can check 7+ of these: you're in better shape than most orgs. Focus on the gaps. If you can check fewer than 4: the priority is the unchecked items, not the build-vs-buy question.
Where to go from here
One option
LoomStack is building the infrastructure described in this module: the coordination layer that gives every agent session organizational context, enforces policy at the infrastructure level, orchestrates multi-agent workflows, and provides end-to-end observability from signal to production.
If you're evaluating whether to build, buy, or assemble your own, we're happy to talk through the architecture and what makes sense for your specific context.
Blog article
The Agents Aren't the Problem, The Coordination Layer Is
CO-BUILD PROGRAM
From playbook to production
We work directly with engineering leaders who are making this transition now. You bring the real constraints; we help you build the coordination layer around them.