2022-03-14

Tech Debt and Taxes

In Reframing Tech Debt, Leemay Nassery suggests framing technical housekeeping positively — as building “tech wealth” — to convince business and product stakeholders it’s worthwhile.

If I were one of those stakeholders, a self-professed customer-obsessive, I’d balk (rightfully!) at the idea of a secondary system of value independent from the user experience. “What does ‘tech wealth’ mean for a customer?” Fair question!

I think a negative framing is more useful here, but Nassery’s suggestion that, say, refactoring a program yields a resource to reinvest gets at the issue with calling it “debt.” Technical “debt” sounds like a fixed cost that can be deferred; more accurately, we should say debt exacts an ongoing tax from go-to-market teams. The costs of quick development may be hidden, but they aren’t externalized.

If you take implementation shortcuts in the name of product velocity, but you don’t have the discipline to clean them up, those shortcuts will eventually undermine the speed of delivery you hoped to prioritize. Changes to a codebase are more often integrative than strictly additive: they have to contend with the code that’s already there. Writing sloppy code today means writing code slowly tomorrow.

Imagine a minor update to a webhook implementation. If the webhook’s baseline behavior is undocumented (tech debt!), or its input validation is rickety (tech debt!), verifying the change is a herculean task of either retroactively establishing that baseline with unit tests or, worse, manually testing sample inputs. Even though the update’s minor and implemented well, the QA process spends time on the multitude of poorly-implemented cases just because it modifies debt-addled code.
Imagine you want to release a new billing page to admin users. If the permissions module (which canonically establishes whether a given user is an admin) has an unclear interface, your engineer’s more likely to initially misuse it. Either they’ll catch the mistake, then burn time debugging it, or you’ll have a product permissions bug in production. Either way, a dark cloud over what should be a joyous feature release, all because it uses debt-addled code.
Imagine you’re testing a new feature, and you want to release it to a beta testing group behind a feature flag. Your frontend engineer has added n feature flags already; why’s this one taking them so long? Maybe they’re stuck testing 2^(n+1) possible configurations because your flagged features are mutually dependent.¹ Yikes!
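
To make the feature-flag arithmetic concrete, here is a minimal Python sketch that enumerates every on/off combination for a set of boolean flags. The flag names are hypothetical, and the sketch assumes the worst case, where any flag may interact with any other, so every combination needs checking.

```python
from itertools import product


def flag_configurations(flags):
    """Return every on/off assignment for the given boolean flags."""
    return [
        dict(zip(flags, values))
        for values in product([False, True], repeat=len(flags))
    ]


# Hypothetical flag names, for illustration only.
existing = ["new_nav", "dark_mode", "bulk_export"]
with_beta = existing + ["beta_billing"]

print(len(flag_configurations(existing)))   # 8 configurations for 3 flags
print(len(flag_configurations(with_beta)))  # 16 configurations for 4 flags
```

Each added flag doubles the count. If the flags were independent, each could be verified on its own, and the testing burden would grow by one case per flag instead.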

Even unused and unmodified code taxes productivity. If you have a deprecated service, changes to its dependencies still require changes in the disused service! At its most insidious, this dynamic prevents you from improving joint dependencies with the highest-value parts of your product, where inefficiencies are really costly.

In each of these cases, tech debt’s the common enemy of the product stakeholder and even the most navel-gazing, customer-indifferent engineer. While the underlying debt isn’t addressed, the stakeholder pays a tax — as a delay, or as systematic under-verification that’ll eventually yield bugs.

This dynamic is particular to API and product debt — unnecessary complexity in how parts of the system interact with each other (interface and data model design, documentation, and testing practices) or the space of customer states (feature flags and billing). These forms of debt tax new projects in proportion to those projects’ complexity; big projects and radical changes in product behavior are proportionally punished. Of course, not all “tech debt” should be prioritized this way; poor database performance might usually be considered debt, but it’s less likely than an unexpressive API to derail your next feature.

The usual refrain — including in Nassery’s article — is to preallocate blocks of time for building tech wealth. The “tax” translation from tech into customer impact should let a team prioritize specific tech debt initiatives, according to the same criteria as features.²

Unsurprisingly, it falls to engineers and their managers to use this framing device effectively.

If an element of tech debt really can’t be translated to customer impact, even when taking tech tax into consideration, deprioritize it.
Keep a shared list of known issues. This can be simple: a short name, a quick description, and a way to voice agreement.
Measure tech tax while you work. How much time did you burn grappling with tech debt? Dealing with unexpected integration or testing issues? Associate the tax with an underlying issue (“known-bad permission API: one day”) and aggregate costs across projects.
Keep engineering leaders abreast of implementation details and the stumbling blocks therein. If individually-contributing engineers aren’t involved in early planning, it’s a manager’s or tech lead’s responsibility to explain the relationship between tech debt and their team’s productivity.
If the best you can do is preallocate blocks, try to allocate time early in the planning period rather than at the very end! An allocation is easier to justify if it yields improvements in the same iteration. Leave enough time before the quarterly retrospective that you have something to show for your efforts.
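
Measuring tech tax doesn’t need special tooling. As a rough sketch, assuming each engineer logs the project, the known issue, and the days lost (the `TaxEntry` fields and the sample entries below are hypothetical), aggregating the cost per issue could look like this:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaxEntry:
    project: str  # where the time was lost
    issue: str    # short name from the shared known-issues list
    days: float   # time burned on the issue


def total_tax_by_issue(entries):
    """Sum the days lost per issue across all projects, costliest first."""
    totals = defaultdict(float)
    for entry in entries:
        totals[entry.issue] += entry.days
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical log entries, for illustration only.
log = [
    TaxEntry("billing page", "known-bad permission API", 1.0),
    TaxEntry("webhook update", "undocumented webhook baseline", 2.5),
    TaxEntry("beta feature", "known-bad permission API", 0.5),
]

for issue, days in total_tax_by_issue(log):
    print(f"{issue}: {days} days")
```

The sorted totals give you a ranked list to bring to planning: the issues at the top are the ones whose tax is easiest to justify paying down.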

Business and product stakeholders can take some proactive action.

Clean up after experiments. Suppose you test a new billing model with a handful of customers, but ultimately decide against it. If you roll those customers into one of the pre-existing tiers, engineers can remove the experimental code to leave billing as simple and inspectable as when the experiment started, and customers love free upgrades!
Negotiate firm sunset timelines for deprecated features. Ideally, keep them short. Ensure users who postdate a feature’s deprecation won’t start to depend on it; usage should dwindle before feature support ends.
Practice empathy: engage your engineers over why — not just whether — a project runs late or turns out buggy. If you spot common issues, speak up! Bell Labs researcher Richard Hamming credits scientists in other departments with solving his resourcing struggles.

Every time I had to tell some scientist in some other area, “No I can’t; I haven’t the machine capacity,” he complained. I said “Go tell your Vice President that Hamming needs more computing capacity.” After a while I could see what was happening up there at the top; many people said to my Vice President, “Your man needs more computing capacity.” I got it!

This should help you hit your targets, and it’ll definitely make you friends.

The issue with “tech debt” and “tech wealth” is that they underemphasize software’s impact on a company’s agility, its ongoing ability to adapt to serve its users’ needs. Without recourse to customer value, advocating against technical debt means advocating for a separate and competing system of value, something business and product stakeholders are right to treat with skepticism.

Emphasis on the productivity impact makes technical debt expressible in the primary system of value, where it’s comparable against — and can be prioritized over — any other company initiative.

Process Plants: A Handbook for Inherently Safer Design (Kletz 1998) reminded me of this essay. A process plant designer can intervene to prevent hazards (e.g. a “snakepit” plant layout), but investors don’t value hazard mitigation for its own sake. Quoting Kirkland,

This cannot all be the fault of the “money men” and their failure to understand us. It is often much more our inability to express ourselves in a language they can understand.³

I’m more optimistic than Kirkland. I think the “money” stakeholders do understand hazards, but that they’re better-equipped to predict and prioritize money. To that end, Kletz offers napkin math:

Remember that if it costs $1 to fix a problem at the conceptual stage, it will cost

about $10 at the flowsheet stage,

about $100 at the line diagram stage,

about $1000 after the plant is built,

about $10,000 to clean up after an accident.⁴

Software could use some equivalent with costs in various stages of planning, development, and deployment. “Hazard” may be a better analogy than “tax” for describing the running cost of technical debt.

Jamie Brandon discusses technical debt in terms of a team’s “complexity budget.” That post argues complexity compounds rather than exacting a tax:

Complexity limits how much of the system can fit into the heads of the developers, and in doing so breeds more complexity. Every time you are forced to do something ugly in one place because of existing ugliness in another place you are feeling this cost.

Moreover, complexity isn’t linearly costly: at certain thresholds, a small complication may dramatically impact your effectiveness. Conversely, when the team is small — maybe it’s just you! — and the program is simple, complexity seems free. That’s what makes technical debt so pernicious!

Worse, there are cliffs in the cost. As soon as a particular subsystem cannot fit into the head of a single developer there are huge additional overheads for communication. Opportunities for improvements or simplification are missed because no one person can see all the parts of the problem.

In hindsight, “interest” is a better extension of the debt analogy than “tax.” Avery Pennarun follows that terminology to extend the metaphor even further: differentiating between high- and low-interest technical debt, describing debt-income ratios (a better formulation than my sketchy footnote calculus), and so on.

¹ Take the time to keep your feature flags mutually independent, even if that means running migrations to split or merge flags; make invalid states unrepresentable. If you don’t, each new flag effectively doubles the user interface surface you design and test.
² Development acceleration should have a double-integral relationship to the value delivered by new development, but expressing this accurately would require some complex discounting.

Imagine you can divide your fixed engineering time in a quarter between t_debt and t_feat. Customer value delivered looks something like (v + t_debt) ⋅ t_feat, where v is your initial product velocity. As t_feat approaches zero, the team spends all of its time preparing to deliver value… but none of its time actually doing so.

This model is short several coefficients, but hopefully you get the idea.
³ Kirkland, C. J. The Channel Tunnel: Engineering under Private Finance — Innovation or Frustration (London: Fellowship of Engineering, 1989). Quoted in Kletz, page 157.
⁴ Kletz, Trevor. Process plants: A handbook for inherently safer design. CRC Press, 2010. Page 165. For specific examples of how “chang[ing] early” saves money, see page 202.
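
The toy model in the second footnote is easy to play with. The sketch below assumes a fixed total time T = t_debt + t_feat and treats v and the time values as sharing the same arbitrary units, since the footnote leaves out the coefficients. Under those assumptions, setting the derivative of (v + t_debt) ⋅ (T − t_debt) to zero gives t_debt = (T − v) / 2. The function names are my own.

```python
def value_delivered(t_debt, total_time, velocity):
    """Toy model: (v + t_debt) * t_feat, where t_feat = T - t_debt."""
    t_feat = total_time - t_debt
    return (velocity + t_debt) * t_feat


def best_debt_allocation(total_time, velocity):
    """Maximizer of the toy model, t_debt = (T - v) / 2, clamped to [0, T]."""
    return min(max((total_time - velocity) / 2, 0.0), total_time)


# Illustrative numbers only: a 12-unit quarter and an initial velocity of 4.
T, v = 12, 4
t_debt = best_debt_allocation(T, v)

print(t_debt)                         # 4.0
print(value_delivered(t_debt, T, v))  # 64.0
print(value_delivered(0, T, v))       # 48, all time spent on features
print(value_delivered(T, T, v))       # 0, all time spent on debt
```

Even this crude version shows both failure modes: ignoring debt entirely leaves value on the table, and spending the whole quarter on cleanup delivers nothing.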