Every growth-stage product eventually hits the same question: do we rebuild, or shore up what we have? The wrong answer costs months of runway in either direction. Here is the framework we use to make the call, separate from the emotional pull of either path.
Start with the load line
Draw the curve of expected load over the next 18 months — concurrent users, write volume, integration count, whichever dominates your cost. Plot your current system's capacity against it. Three outcomes:
Current system clears the curve with room — shore up.
Current system clears the curve by less than 30% — shore up, but budget the rebuild for the quarter after you clear it.
Current system does not clear the curve — rebuild, but incrementally.
Then check the delivery cost curve
Capacity isn't the only axis. If every new feature takes 2× the time it did a year ago for reasons that compound (coupling, missing tests, forgotten decisions), the system is a tax on the team regardless of raw capacity. That tax also belongs in the decision.
Incremental, never big-bang
The strongest predictor of a failed rebuild is a year-plus parallel project with no shipping milestone. We don't take those engagements. A rebuild worth doing ships value every quarter.