
Scaling
The core problem of scaling a team is that not everyone can know everything. Scaling is the work of systematizing that – so that things are understandable, knowable. It’s turning “tribal knowledge” into documentation, automation, and process. It used to be that only the automation was for the machine, but now documentation is machine critical and process is machine readable.
With humans, things bubble up and down. You don’t – in general – make systematic fixes for the first complaint. First you validate, check the scale of the issue. You try the softer interventions – trying to get people to actually talk to their teammates – before you start building process. Maybe you make a change a month to the way a team works. Carefully, thoughtfully. Because no effective person wants to be on a team where the job seems more process than impact.
Automation was always the most effective, but also usually the most expensive. The thing that gets put in when “please follow this checklist” fails frequently and severely enough to be worthwhile. An incident can be the gift that finally delivers agreement on a priority that always seemed necessary to those close enough to it, by making the cost of not doing it clear.
We are deep in replatforming Twill, so I’m back to being a developer. As I operated my orchestrator and dispatched agents on the backlog, everything was in some ways wildly different, yet the parts I reached for out of habit felt very familiar: scaling.
+AI
AI feels like speed-running some of the team scaling stuff that previously I orchestrated over months – years even. The economics have shifted to make automation much cheaper, so it’s no longer deferred, but top of the list.
With people, knowledge lives in two places – in the people and in the systems. With AI, only the systems part persists. Every session starts contextless. This is the problem that surfaces all the documentation that wasn’t written, and all the automation that was never built – a core part of why building with AI from scratch is so much easier than retrofitting onto an existing setup.
On DRI, when Jean started building the platform, we talked affectionately about “intern Claude”. Over time, we matured away from that framing. We put the guardrails in. We stopped treating it as a novelty to be supervised and started treating it as a system to be designed. With Twill, I started from a higher level of understanding, and the baseline was higher.
The snark, which I’ve also expressed: people will do for a machine what they were never willing to do for their team. Write the doc. Define the work clearly. Build the guardrail. Forgive the mistake.
I stand by it. But that doesn’t mean those of us who were always trying to do this for humans gain some moral high ground by not doing this for machines.
The rigour was always good practice. The humans operating the machine will have a much better time if those machines are set up to succeed. I think the call for EMs to “get closer to the work” has been poorly expressed and executed, but at the core of it – having got much closer to the work myself – I have come to see the validity of it. The scaling problem that managers used to deal with is now happening at higher speed, and the only way to have fidelity on it is to understand the work being done. Because so far, there is no playbook – we all have to write our own.
The best example of this is CLAUDE.md. For both DRI and Twill, it’s a living team artifact, and the churn on it is a good thing – the way of working should keep evolving, as we evolve our understanding, our work, our effectiveness.
Three terminals
“Dispatch a review agent” was encoded in our process for months. It’s very helpful. But I would still find the odd issue or untracked follow-up. Then, tired of Claudes colliding on CI and wasting build minutes, I made a merge queue. It serialises merges across concurrent sessions, files the follow-ups, and sends PRs back when they’re not ready.
The next round of reviews I dispatched came back clean. And the next. And the next.
Turns out merge queue Claude, reviewing work it had no part in, was stricter than the Claudes that had done the work.
The reason is the contextlessness. It gets talked about as a strength or a weakness as we look for metaphors about working with this thing that is new and that we don’t yet fully understand. I’ve given up on metaphors, and think this aspect, like this shift in general, is best treated as neutral.
The reviewer had no investment in the work. Nothing to rationalise. No “well, I did it that way because.” It knew only what the standard said and whether the code met it.
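The merge queue's three jobs (merge one at a time, file the follow-ups, send back what isn't ready) can be sketched in a few lines. This is a minimal illustration, not the actual Twill setup: `PullRequest`, `MergeQueue`, and the `ready` flag are hypothetical names, and real concurrency across sessions is not modelled here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    # Hypothetical shape; a real queue would read this from the code host.
    number: int
    ready: bool
    follow_ups: list = field(default_factory=list)


class MergeQueue:
    """Handles pull requests one at a time, in arrival order."""

    def __init__(self):
        self._pending = deque()
        self.merged = []      # PR numbers merged, in order
        self.sent_back = []   # PR numbers returned because they were not ready
        self.backlog = []     # follow-up items filed for merged PRs

    def submit(self, pr):
        self._pending.append(pr)

    def drain(self):
        # Strictly sequential: each PR is fully handled before the next starts.
        while self._pending:
            pr = self._pending.popleft()
            if not pr.ready:
                self.sent_back.append(pr.number)
                continue
            self.merged.append(pr.number)
            self.backlog.extend(pr.follow_ups)


queue = MergeQueue()
queue.submit(PullRequest(1, ready=True, follow_ups=["add regression test"]))
queue.submit(PullRequest(2, ready=False))
queue.submit(PullRequest(3, ready=True))
queue.drain()
```

The point of the single `drain` loop is that the readiness check and the merge happen in one place, by something that did not write the change.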
So many times, after reading a review or finding something that didn’t work as I expected it to, I’ve thought that something should already have been prevented by CLAUDE.md and asked why it wasn’t. The answer was always clarity – fair. But turns out, as with humans, the answer could also be process.
The common manager refrain on AI is that it separates those who think from those who don’t. Fair. But one of the core skills you have to develop in leadership is knowing what to pay attention to and what can be ignored. The broader your scope, the more you need to be brutal.
How many times have I told someone “that’s not making it onto my list”? Too many to count. Enough that people have been known to text me when they find themselves saying it. Now, having so many things in flight at once, I’m always trying to figure out what to pay attention to and what not.
The mechanical guardrail is encoded in CLAUDE.md: hit the same issue twice, stop. A lesson learned the hard way after burning a bunch of build minutes on something stupid, because it only got intermittent attention and seemed like it should be fine. Claude is inclined to keep pushing through, but telling it to stop also tells me to stop, to give that problem some attention, to think.
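In the post this rule lives as an instruction in CLAUDE.md, not as code. As a rough sketch of the same idea in mechanical form, assuming a hypothetical `RepeatedIssueGuard` class and a deliberately crude way of deciding that two issues are "the same":

```python
from collections import Counter


class RepeatedIssueGuard:
    """Signals a stop once the same issue has been hit `limit` times."""

    def __init__(self, limit=2):
        self.limit = limit
        self._seen = Counter()

    def record(self, issue):
        """Record one occurrence; return True when work should stop."""
        # Assumption: issues match if they are equal after normalising
        # case and whitespace. A real setup would need something smarter.
        key = " ".join(issue.lower().split())
        self._seen[key] += 1
        return self._seen[key] >= self.limit


guard = RepeatedIssueGuard()
first = guard.record("CI: integration tests timed out")
second = guard.record("ci:  integration tests timed out")
other = guard.record("lint: unused import")
```

Here `first` is `False`, `second` is `True` (the same issue, seen twice), and `other` is `False` because it is a different issue seen once.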
Before, the discipline was: you found a bug, you wrote a test so it wouldn’t come back. Good practice. Now, I do that, and then I ask: how do we guard against this class of problem?
Competent humans rarely make the same mistake twice. AI will make the same mistake endlessly.
Making a class of issue impossible is not cheap. This is why we didn’t use to do it. A minor bug spawns five-plus follow-ups – graduated solutions of increasing complexity. Take a null-handling issue that threw 500s: it was fixed in one PR, but the follow-up issues, to eliminate the pattern and add a tripwire in CI so it can’t come back, got added to the backlog, ready to be picked up and implemented by the orchestrator as a slot became free. Done within the week. Because now it’s possible. It’s tractable: it’s bounded, well-defined work, which makes it exactly the kind of thing you can hand to an orchestrator and let run in parallel.
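A CI tripwire of this kind can be as small as a script that scans the source for the banned pattern and fails the build when it finds one. The sketch below is an assumption about shape only: the regex, `find_violations`, and the file name are hypothetical, and the pattern is crude (it would also flag a `.get()` call that supplies a default).

```python
import re

# Hypothetical banned pattern: attribute access chained directly onto
# .get(...), which fails when the key is missing and .get() returns None.
BANNED = re.compile(r"\.get\([^)]*\)\.")


def find_violations(source, filename="<memory>"):
    """Return one 'file:line: text' entry per line matching the banned pattern."""
    return [
        f"{filename}:{number}: {line.strip()}"
        for number, line in enumerate(source.splitlines(), start=1)
        if BANNED.search(line)
    ]


sample = 'name = payload.get("user").name\nsafe = payload.get("user")\n'
violations = find_violations(sample, "handlers.py")
```

In CI, a wrapper would run this over the changed files and exit non-zero when `violations` is non-empty, so the pattern cannot quietly return.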
I keep reading all this stuff about “working with AI” and it’s so depressing that most of it seems to be about building things that run AI agents. What’s the point of it? What are you actually shipping? Is this just some game, like The Sims: But Make It Programming (and also really expensive)?
Cranking through a bunch of stuff to launch, I’m running Claude in three tabs. The first is an orchestrator. The second is the merge queue.
The third is the terminal where I actually define, decide, and unblock things. The most important one. In this one sometimes I feel like a product manager, sometimes an engineering manager, sometimes an architect. Sometimes I’m just wiring up secrets and things, feeling like some modern-day version of the ENIAC women who used cables and switches. At least I get a manual.
Scaling a team, you keep coming back to the same need: be clearer. Clearer about what’s expected, who owns what, what good looks like. Clarity. Clarity. Clarity.
The work of the third terminal. Clarity.
The tools have changed. The work is the same.
