
Where AI improves delivery — and where it doesn't.

Six places we measure AI lift in our own workflow, and two where we've pulled it back.

Craft & Logic · Delivery practice · April 2026 · 6 min read

We use AI across our own delivery workflow. Not because it's a talking point — because on specific, measurable tasks it produces a clear lift. On other tasks it produces the illusion of lift while quietly costing us time. This is our honest map of both.

Where we measure real lift

Six categories where we can point to reduced cycle time, fewer defects, or better estimates, repeatedly and across engagements. In each case the baseline is the same team doing the same work without the tool.

  1. Discovery synthesis — clustering interview notes and mapping them against the intake brief. Hours saved per sprint, no quality regression.

  2. Scaffolding known architectures — first pass of a well-understood service skeleton. The senior engineer still reads every line; the starting point is materially better.

  3. Test generation for deterministic functions — coverage jumps without a commensurate jump in maintenance cost.

  4. Migration drudgery — schema rewrites, config translation, boilerplate-heavy refactors. Reliable, checkable lift.

  5. Documentation first drafts — we edit heavily, but we no longer start from a blank page.

  6. Support triage — classification and routing of inbound issues on client ops tools we run.

Where we've pulled it back

Two categories where AI assistance looked like a win at first, then quietly slipped on the metrics we actually care about. We've either removed the tool or tightened its scope sharply.

  1. Novel architectural decisions — the tool is confidently wrong often enough to contaminate the design conversation. The cost of unwinding a wrong assumption dwarfs any time saved.

  2. Customer-facing copy on first-party engagements — voice drifts, claims creep, and review cost exceeds the draft savings.

What this means for clients

If an agency is selling you AI as a line item, ask where they measure it and against what baseline. Without those, the claim is marketing. Our rule: AI shows up in our delivery where the metric moves and the rework stays flat. Nowhere else.