The first conversation I had with Helmsly was at 11pm on a Tuesday. The CTO, Sarah, had been emailing me on and off for two weeks, but our intro call kept getting rescheduled. When it finally happened, she was at home, on her laptop, twenty minutes after deploying a hotfix that should have shipped two days earlier.
"Sorry I'm late," she said. "We had a thing."
The thing turned out to be the third hotfix that week.
The company
Helmsly is a B2B SaaS startup, about eighteen months into product-market fit, with twelve engineers across two teams and one monolith service. Series A in March. Revenue growing, customer base doubling every quarter. By every external measure, they were doing well.
Internally, they were drowning.
What I found
The assessment took half a day. Twelve engineers is small enough that there is no organizational diffusion of responsibility to hide behind. Everyone knew exactly what was broken.
The deployment contract did not exist. Or rather, it existed in Sarah's head. She was the one who knew which tests had to pass. She was the one who knew when the security scan was a real signal versus a false positive. She was the one who pressed the deploy button, every time, for every change.
The pipeline was a single GitHub Actions workflow that built the monolith, ran the tests, and waited for Sarah's approval before deploying to production. It was idempotent, version-controlled, fine in shape. It just required her presence.
Observability consisted of Sentry for errors and a single Datadog dashboard that Sarah checked twice a day. No SLOs. No runbooks. Two of the team's senior engineers had configured their own alerts to their phones because the official alerts went to Sarah's email and were sometimes missed.
Rollback was a fiction. Helmsly had never rolled back a deploy. They had only rolled forward, with hotfixes. The rollback script existed in the repository, dated from the initial setup eighteen months earlier, and had never been tested.
Sarah was the bottleneck and the safety net. Helmsly worked because she did. The day she took a real vacation would be the day Helmsly stopped shipping.
The ninety days
For a startup at this scale, the full multi-quarter Bastion engagement is overkill. We did a focused ninety-day sprint. The goal was specific: distribute Sarah's procedural knowledge across the team, then prove the team could ship without her by sending her on a one-week vacation at the end.
Month one was the contract. Sarah and I sat with three of the senior engineers for two days and wrote down everything she checked, mentally, before pressing deploy. The list was thirty-seven items long. We made every one of them a blocking CI check or a code review rule. We deleted six items that turned out to be obsolete superstitions from her early days. We added two she had been doing inconsistently. By the end of month one, the deployment contract was a real document in the repository, reviewed like code, with thirty-three blocking checks.
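As a rough illustration of what "every item becomes a blocking check" can look like, here is a minimal Python sketch. The check names and the `evaluate_contract` helper are hypothetical, not Helmsly's actual contract; the only idea taken from the engagement is that the deploy is allowed when every check passes and blocked otherwise.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    name: str
    run: Callable[[], bool]  # returns True when the check passes


def evaluate_contract(checks: list[Check]) -> tuple[bool, list[str]]:
    """Run every check; the deploy is allowed only if none of them fail."""
    failures = [check.name for check in checks if not check.run()]
    return (not failures, failures)


# Hypothetical entries standing in for items from the written contract.
# Real checks would call the test runner, the scanner, and so on.
contract = [
    Check("unit tests pass", lambda: True),
    Check("security scan has no unreviewed findings", lambda: True),
    Check("migration is backward compatible", lambda: False),
]

allowed, failed = evaluate_contract(contract)
print("deploy allowed" if allowed else f"deploy blocked by: {failed}")
```

The point of the shape is that nothing in it depends on who is running it. Once a check is in the list, it blocks for everyone, including the person who used to carry it in her head.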
The contract also formalized code review for the first time. The default rule was one engineering reviewer; changes to the auth module required a senior engineer; changes to migrations/ required someone from the data team. Sarah had been doing all of this in her head. Writing it down meant the team could distribute the load. We also added two rules that close the most common bypasses: self-approval is never permitted (even for Sarah), and any substantive change pushed after an approval requires a fresh review (so the pattern of "approve, then push a small fix" no longer works).
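The review rules are simple enough to express as a function. The sketch below is an assumption-laden illustration, not the tooling Helmsly used: the `auth/` path prefix, the people, and the data model are hypothetical, and it treats any new commit as invalidating earlier approvals, which is stricter than "substantive change" but needs no human judgment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reviewer:
    name: str
    senior: bool = False
    team: str = "engineering"


@dataclass
class PullRequest:
    author: str
    changed_paths: list[str]
    head_commit: str
    # reviewer name -> commit that reviewer approved
    approvals: dict[str, str] = field(default_factory=dict)


def touches(pr: PullRequest, prefix: str) -> bool:
    return any(path.startswith(prefix) for path in pr.changed_paths)


def review_problems(pr: PullRequest, people: dict[str, Reviewer]) -> list[str]:
    """Return the unmet review rules; an empty list means the PR may merge."""
    # Only approvals on the latest commit count, and never the author's own.
    valid = [
        people[name]
        for name, commit in pr.approvals.items()
        if name in people and name != pr.author and commit == pr.head_commit
    ]
    problems = []
    if not valid:
        problems.append("needs a non-author approval on the latest commit")
    if touches(pr, "auth/") and not any(r.senior for r in valid):
        problems.append("auth change needs a senior engineer")
    if touches(pr, "migrations/") and not any(r.team == "data" for r in valid):
        problems.append("migration change needs a data-team reviewer")
    return problems


# Hypothetical people, for illustration only.
people = {
    "sarah": Reviewer("sarah", senior=True),
    "ana": Reviewer("ana"),
    "raj": Reviewer("raj", team="data"),
}
```

With these rules, a pull request that Sarah both wrote and approved is rejected, and an approval given on an earlier commit stops counting as soon as a new commit lands.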
Helmsly's branching strategy was already trunk-based by accident (twelve engineers and one service is too small to support GitFlow), but we made it explicit: one mainline, feature branches living one to three days, branch-name convention enforced by a CI check. Helmsly only had three environments to begin with (local, staging, production), so we did not have to collapse any. The artifact-promotion principle slotted in cleanly: the Docker image that ran in CI was the same image that ran in staging and then in production. Configuration differed; code did not.
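Artifact promotion is easier to see in code than in prose. The sketch below assumes a release is identified by an image digest plus a per-environment configuration; the `Release` type, the digest value, and the config keys are hypothetical. Promotion swaps the configuration and never touches the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    image_digest: str       # content hash of the built image
    config: dict[str, str]  # environment-specific settings


def promote(release: Release, target_config: dict[str, str]) -> Release:
    """Move a release to the next environment: new config, identical image."""
    return Release(image_digest=release.image_digest, config=dict(target_config))


def same_artifact(*releases: Release) -> bool:
    """True when every environment is running the exact same image."""
    return len({r.image_digest for r in releases}) == 1


# Hypothetical digest and settings, for illustration only.
ci = Release("sha256:abc123", {"DATABASE_URL": "ci-db"})
staging = promote(ci, {"DATABASE_URL": "staging-db"})
production = promote(staging, {"DATABASE_URL": "prod-db"})

print(same_artifact(ci, staging, production))
```

If anything is rebuilt between environments, the digests diverge and `same_artifact` reports it, which is the property the principle exists to protect.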
Month two was the human button. The manual approval step came out. The deploy went automatic when the contract passed. The first week of this was tense. Three engineers shipped changes within hours of each other, and Sarah watched dashboards obsessively for the first afternoon, then less the next day, then she went to lunch on Thursday without telling anyone. Nothing broke.
We also added basic observability gates: SLO definitions for the four most important user-facing flows, alerts on user impact instead of CPU, and a fifteen-line runbook for each of the team's eight known incident patterns. The whole team wrote them, not just Sarah.
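Alerting on user impact rather than CPU usually comes down to comparing the observed failure rate of a flow against the error budget its SLO allows. The following is a minimal sketch of that arithmetic; the flow name, the target, and the paging threshold are assumptions for illustration, not Helmsly's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    flow: str
    target: float  # fraction of requests that must succeed, e.g. 0.99


def burn_rate(slo: Slo, good: int, total: int) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    error_rate = (total - good) / total
    budget = 1.0 - slo.target
    return error_rate / budget


def should_page(slo: Slo, good: int, total: int, threshold: float = 2.0) -> bool:
    """Page a human only when users are failing faster than the SLO tolerates."""
    return burn_rate(slo, good, total) >= threshold


# Hypothetical flow and numbers.
login = Slo(flow="login", target=0.99)
print(should_page(login, good=970, total=1000))  # 3% failures against a 1% budget
print(should_page(login, good=999, total=1000))  # 0.1% failures, well inside budget
```

A CPU spike that no user notices never trips this check, and a quiet failure in a user-facing flow does.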
Month three was the safety net. We ran a rollback drill in a clone of production. It failed in two surprising ways, which we fixed. We ran it again. It worked. We made the rollback procedure executable as a script with a single argument. We set up a feature flag system for risky changes. We documented what "risky" meant in five lines.
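A rollback script with a single argument can be very small. This sketch shows only the shape: one positional argument naming the release to restore, and an ordered plan. The step descriptions and the release identifier are hypothetical, and the script prints its plan rather than executing it, since the real commands depend on the deployment platform.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Roll production back to a previous release."
    )
    # The single argument: which release to restore.
    parser.add_argument("release", help="tag or digest of the release to restore")
    return parser


def rollback_plan(release: str) -> list[str]:
    """Ordered steps for the rollback; a real script would execute each one."""
    return [
        f"verify that release {release} exists in the registry",
        f"deploy release {release} to production",
        "run smoke checks against production",
    ]


def main(argv: list[str]) -> int:
    args = build_parser().parse_args(argv)
    for number, step in enumerate(rollback_plan(args.release), start=1):
        print(f"{number}. {step}")
    return 0


# Hypothetical release identifier, passed the way a shell would pass it.
main(["2024-05-01.3"])
```

Keeping the interface to one argument matters more than the implementation: the engineer on call at 2am should not have to remember flags.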
And on day eighty-three of the ninety, Sarah went on vacation for a week.
The team shipped fourteen times that week. One canary deploy triggered an automatic rollback on day three, the procedure worked, the engineer on call understood why, and the team moved on. Sarah saw the rollback notification on her phone in Costa Rica, decided not to call, and went back to the beach.
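For readers who have not seen an automatic canary rollback, the decision behind it can be sketched as a comparison between the canary's error rate and the baseline's. The thresholds, the minimum sample size, and the function name below are all assumptions; this is not a description of how Helmsly's canary was configured.

```python
def canary_verdict(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 2.0,
    min_requests: int = 100,
    floor: float = 0.001,
) -> str:
    """Return "wait", "promote", or "rollback" for a canary deploy."""
    # Too little traffic to judge: keep the canary running.
    if canary_requests < min_requests:
        return "wait"
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    # The floor keeps a near-zero baseline from making any single error fatal.
    allowed = max(baseline_rate * max_ratio, floor)
    return "rollback" if canary_rate > allowed else "promote"


# Hypothetical traffic numbers.
print(canary_verdict(10, 10_000, 30, 500))
print(canary_verdict(10, 10_000, 0, 500))
```

Because the rule is explicit, the engineer on call can read the numbers after the fact and see exactly why the rollback fired.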
What it looks like now
I checked in with Helmsly six months later. Sarah had taken three more vacations. The team had grown from twelve to seventeen engineers. The new hires onboarded on the deployment procedure in their first week instead of their first quarter, because the procedure was written down.
Helmsly was not the elite-tier shipping organization in DORA's research. They were not deploying fifty times a day. But they were deploying three to five times a day, every day, without Sarah in the room. The change failure rate was holding under ten percent. The repeat incident rate, which had been their worst metric at the start, was below fifteen percent.
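Both rates are simple ratios, and it helps to pin down the definitions before tracking them. The sketch below assumes change failure rate is failed deploys over total deploys, and that a repeat incident is one whose root cause has been recorded before; the second definition and the cause labels are assumptions, since teams define repeats differently.

```python
def change_failure_rate(deploys: int, failed_deploys: int) -> float:
    """Fraction of deploys that caused a failure needing remediation."""
    return failed_deploys / deploys if deploys else 0.0


def repeat_incident_rate(incident_causes: list[str]) -> float:
    """Fraction of incidents whose root cause had already been seen."""
    seen: set[str] = set()
    repeats = 0
    for cause in incident_causes:
        if cause in seen:
            repeats += 1
        seen.add(cause)
    return repeats / len(incident_causes) if incident_causes else 0.0


# Hypothetical numbers, for illustration only.
print(change_failure_rate(deploys=100, failed_deploys=8))
print(repeat_incident_rate(
    ["queue backlog", "bad migration", "queue backlog", "cert expiry",
     "cache stampede", "dns", "disk full", "oom"]
))
```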
More importantly, Sarah was not the procedure anymore. The procedure was the procedure. She could focus on the product strategy work she had been hired to do.
The numbers
| Metric | Before | After 90 days |
|---|---|---|
| Deploys per week | ~8 (when Sarah was available) | ~25 |
| Hotfixes per week | 3 to 4 | < 1 |
| Bus factor on deploy procedure | 1 (Sarah) | 4 senior engineers |
| Rollback success | Untested | 100% in drills, 100% in real use |
| Days Sarah took off in the prior 6 months | 2 | 21 |
The last row is the one Sarah told me about. The Bastion's success at Helmsly was not measured in deploy frequency or change failure rate. It was measured in vacation days the CTO actually got to take.
What this means for smaller teams
Helmsly is the case study for the smaller end of the procedure's audience. Most CI/CD content is written for the hundred-engineer organization. Helmsly was twelve people. The same five pillars, the same thirteen principles, scaled to their size, took ninety days instead of twelve months.
If your startup is in the Helmsly position right now, with one person as the entire procedure, you do not need to wait for Series C to install the Bastion. You need three months of focused work and a willingness to write down what is already in your head.
Is your startup in this position?
If one person on your team is the entire deployment procedure, that is the work to address before the next round of hiring multiplies the risk.
KyrApps