Stripe published the second installment of its Minions engineering deep-dive last week, and one number stood out: over 1,300 pull requests merged every week, with zero human-written code.
Engineers do review everything before it ships, but from the moment a task is triggered to the moment a PR is merged, no human touches a keyboard.
For a company processing over $1 trillion in annual payment volume across hundreds of millions of lines of Ruby code, this is not a pilot.
This edition breaks down what Stripe actually built, what the pattern looks like in practice, and what it means for any organization thinking seriously about agent deployment right now.

Minions: Stripe’s Internal Coding Agent
Minions are Stripe's internal coding agents, built on a heavily modified fork of Block's open-source Goose framework. Engineers trigger them through Slack — the most common path — or through a CLI, a web interface, or automated systems like flaky-test detectors.
From there, each Minion spins up on an isolated AWS EC2 devbox that boots in about 10 seconds from a warm pool, pre-loaded with Stripe's source code and services. The agent runs, writes the code, passes local linting, gets one CI feedback cycle with autofixes, and produces a pull request ready for human review.
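Stripe hasn't published its orchestration code, but the lifecycle described above (provision a devbox, let the agent write a patch, lint, allow exactly one CI feedback cycle, then open a PR for human review) can be sketched in a few lines. Everything here, including the `run_minion` name and the event strings, is hypothetical illustration, not Stripe's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class TaskResult:
    """Outcome of one hypothetical agent run."""
    lint_passed: bool = False
    ci_passed: bool = False
    pr_opened: bool = False
    events: list = field(default_factory=list)

def run_minion(task: str, max_ci_cycles: int = 1) -> TaskResult:
    """One task, one isolated devbox, a hard cap of one CI feedback cycle."""
    result = TaskResult()
    result.events.append(f"devbox: provisioned from warm pool for {task!r}")
    result.events.append("agent: wrote patch")

    result.lint_passed = True  # stand-in for a real local lint run
    result.events.append("lint: passed")

    # The cap is structural: the loop cannot run more than max_ci_cycles times.
    for cycle in range(max_ci_cycles):
        result.events.append(f"ci: cycle {cycle + 1} ran, autofixes applied")
        result.ci_passed = True

    if result.lint_passed and result.ci_passed:
        result.pr_opened = True
        result.events.append("pr: opened for human review")
    return result

outcome = run_minion("fix flaky test in payments suite")
print(outcome.pr_opened)
```

The point of the sketch is that the constraints live in the control flow itself: the agent cannot loop on CI indefinitely, and nothing merges without the final human-review step.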
The architecture here matters as much as the number. Stripe didn't get to 1,300 PRs a week by giving engineers Claude and telling them to move faster. They built deployment infrastructure: they defined what the agents could touch and created review pipelines, scoping rules, and feedback loops. They treated AI agents like any other production system, with the same rigor, the same access controls, and, more importantly, the same accountability.
The gap today is in deployment. Every major model can do impressive things in a controlled environment. What Stripe proved is that closing the distance between "impressive in a demo" and "running in production at scale" requires infrastructure, not just better prompting.
The architecture is deliberately constrained. Each Minion runs in isolation. One agent, one task, one devbox. It can't access production systems directly. It has one CI feedback cycle. Tight scoping is what makes the system trustworthy enough to run unattended at this volume. The 1,300-PR number is a product of the constraints, not despite them.
The trigger mechanism is worth noting. Engineers activate Minions through Slack reactions and messages — interfaces they were already using. The agent fits into the existing workflow rather than requiring a new one. That's a meaningful design choice. Adoption shouldn’t require behavioral change from the engineering team.
What This Means For You:
Stripe's engineering team had resources that most organizations don't. They built custom infrastructure over months before hitting 1,300 PRs a week.
But the underlying pattern is replicable. Pick one workflow. Define what the agent can touch. Build the review step. Run it for 30 days and measure the output.
The companies I've seen move fastest didn't start with a company-wide AI transformation. They started with one painful, repetitive workflow (inbound triage, deal research, weekly reporting, test generation) and deployed an agent to own it end-to-end. Then they expanded from there.
The models are ready. The question is whether your organization has defined the rails clearly enough for them to run on safely. If the answer is no, that's the work, not finding a better model.
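"Define what the agent can touch" sounds abstract, but it can start as a small, explicit policy check. This is a minimal sketch under assumed conventions; the `POLICY` dict, its keys, and the path allowlist are invented for illustration, not any real product's configuration:

```python
# Hypothetical scoping policy: the only paths an agent may modify,
# plus a flag that every change still passes human review.
POLICY = {
    "allowed_paths": ("tests/", "docs/"),
    "requires_human_review": True,
}

def within_scope(changed_file: str, policy: dict = POLICY) -> bool:
    """Reject any proposed change outside the allowlisted paths."""
    return changed_file.startswith(tuple(policy["allowed_paths"]))

print(within_scope("tests/test_billing.py"))  # True
print(within_scope("src/payments/core.rb"))   # False
```

A check this simple, enforced before a PR is opened rather than in a style guide nobody reads, is usually enough to get a first agent workflow past review.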
Clutch. Just launched.
OpenClaw made it easy to get an agent running. Clutch makes it safe to run that agent at work.
Secure multi-agent deployment, built for teams that need more than a single-machine setup. We just launched.

Microsoft is testing Copilot Cowork: their answer to Anthropic's multi-agent workspace
The agent platform competition is now between Anthropic, Microsoft, OpenAI, and Google, with each building a version of the same thing. The differentiation is about to be in enterprise integration depth.

Anthropic is hitting server capacity limits
Starting March 26, users on Free, Pro, and Max plans will burn through their five-hour session limits faster during peak hours. This is a direct response to a user surge triggered by OpenAI's Pentagon deal and the QuitGPT movement.

Oracle laid off up to 30,000 employees today via a 6 am termination email with no warning
Employees across the US, India, Canada, and Mexico lost access to systems the moment the email arrived. TD Cowen estimates the cuts affect 18% of Oracle's 162,000-person workforce and will free up $8–10 billion to fund a $156 billion AI infrastructure buildout.

Stripe processed over $1 trillion in payment volume last year. Their codebase is hundreds of millions of lines of Ruby. Their compliance bar is about as high as it gets in fintech.
And they figured out how to run 1,300 AI-generated PRs a week.
The organizations telling me they can't get AI agents past their security team are usually smaller than Stripe, with simpler infrastructure and lower compliance requirements.
The blocker isn't the security team. It's that nobody has done the work of defining what the agent can touch, what it can't, and what a human reviews before anything goes live. Stripe did that work. Most companies have yet to start.
Haroon
P.S. If your team is stuck between "this is cool" and "this is running," that's the specific problem Clutch was built for.



