# Effort as a function of code-base size

In The Mythical Man-Month¹, Brooks relates a study by Nanus and Farr², validated by others including Brooks himself, that observed an empirical relation between effort and code-base size. The formula goes:

$$E = const \times n_{lines}^{1.5}$$

with $E$ the effort to add new work, $n_{lines}$ the number of lines of code, and $const$ a “productivity constant” of the team.
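
Because the exponent in $E = const \times n_{lines}^{1.5}$ is greater than 1, effort grows faster than the code base itself. For instance, doubling the number of lines multiplies the effort by about 2.8, whatever the value of $const$:

$$\frac{E(2\,n_{lines})}{E(n_{lines})} = \frac{const \times (2\,n_{lines})^{1.5}}{const \times n_{lines}^{1.5}} = 2^{1.5} \approx 2.83$$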

Recently, while trying to work out the economic equation that would justify (or not) breaking down a monolith into decoupled services, I realized how directly this formula relates to the efficiency of decoupling, microservices³ and small code-bases.

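Consider a monolith with a code base of $n_{loc}$ lines of code, split into $s$ services, and assume each resulting service has a code base of $\frac{n_{loc}}{s}$ lines. The effort to add new work to the services, as a ratio of the effort for the monolith, is:

$$\delta_E = \frac{s \times const \times \left(\frac{n_{loc}}{s}\right)^{1.5}}{const \times n_{loc}^{1.5}} = \frac{1}{\sqrt{s}}$$
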
For example, a 200k-line monolith split into 20 services of 10k lines each has an effort ratio of $\frac{1}{\sqrt{20}} \approx 0.22$, which means adding code to a decoupled service base is roughly 4.5 times faster.
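
The ratio can also be checked numerically. The following is a minimal Python sketch, assuming the 1.5 exponent and equally sized services described above; the function names `effort` and `effort_ratio` are illustrative only, not taken from Brooks or from Nanus and Farr. The team constant cancels out of the ratio, so it simply defaults to 1.

```python
import math

# Exponent of the empirical effort relation quoted by Brooks: E = const * n^1.5
EXPONENT = 1.5


def effort(n_lines: float, const: float = 1.0) -> float:
    """Empirical effort for a code base of n_lines lines (const: team productivity constant)."""
    return const * n_lines ** EXPONENT


def effort_ratio(n_loc: float, s: int, const: float = 1.0) -> float:
    """Combined effort of s equal services of n_loc / s lines, relative to one n_loc-line monolith."""
    services = s * effort(n_loc / s, const)
    monolith = effort(n_loc, const)
    return services / monolith


if __name__ == "__main__":
    # Compare the computed ratio with the closed form 1 / sqrt(s) for a 200k-line monolith.
    for s in (1, 2, 5, 10, 20, 50):
        ratio = effort_ratio(200_000, s)
        print(f"s={s:>3}  ratio={ratio:.3f}  closed form={1 / math.sqrt(s):.3f}  speed-up={1 / ratio:.2f}x")
```

For $s = 20$ the ratio comes out at about 0.224, i.e. a speed-up of roughly 4.5, matching the closed form.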


# Empirical significance

This calculation rests on many assumptions and grossly oversimplifies reality; the numbers should not be taken at face value. What, then, is the significance of this result?

On one hand, the calculation may overestimate the effort ratio: a legacy monolith is likely to contain deprecated and useless code, so the resulting code base might even be smaller, lowering the ratio further. On the other hand, it underestimates the cost of the split: microservice architectures are complex³ to build, maintain and debug. Setting them up, training the team to work with the resulting infrastructure, and migrating all carry an overhead that this simple calculation does not account for.
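
To put a number on the dead-code effect, suppose, purely as a hypothetical assumption, that a fraction $d$ of the monolith's $n_{loc}$ lines is dead and dropped during the split, so that the $s$ services together hold $(1-d)\,n_{loc}$ lines. Under the same formula, the effort ratio becomes:

$$\delta_E = \frac{s \times const \times \left(\frac{(1-d)\,n_{loc}}{s}\right)^{1.5}}{const \times n_{loc}^{1.5}} = \frac{(1-d)^{1.5}}{\sqrt{s}}$$

With an illustrative $d = 0.2$ and $s = 20$, this gives $\sqrt{0.8^3 / 20} = \sqrt{0.0256} = 0.16$, down from about 0.22 without any trimming.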

Not all monoliths are created equal: one with proper separation of concerns and layers might fare better than a ball of mud. But these are unicorn-rare occurrences, which Brooks himself would argue are bound to decay over the course of their lives. And arguably, large monoliths that have existed for years (or decades) might not be much simpler to operate anyway.

The numbers do reflect a reality that most people confronted with balls of mud have witnessed. They illustrate the complexity and confusion that most large ball-of-mud code-bases generate: leaky abstractions over cross-contaminated layers, unpredictable side effects, the impossibility of debugging, long build times, lengthy and risky deployments, and poor code coverage over an unmanageable combinatorial explosion of cases. The larger the code base, the larger the team needed to maintain it, the more complex the organization gets, the more conflicts emerge (at one step of the process or another), the longer the cycle time, the riskier the maintenance, the more manual checks and balances are required, and the more friction is created. It is a vicious circle in which bailing out becomes the only viable option.

There is no judgement to be made here: one should respect what came before, and acknowledge that despite its limits, the legacy software supports the business and brought us where we are. Things became this way because of the very nature of maintenance. As Brooks puts it:

> Systems program building is an entropy-decreasing process, hence inherently metastable. Program maintenance is an entropy-increasing process, and even its most skillful execution only delays the subsidence of the system into unfixable obsolescence.
>
> ~ The Mythical Man-Month: Essays on Software Engineering, Frederick P. Brooks Jr.

Experience, as much as empirical studies, shows that larger code-bases do get more complex. Reducing them, whether by trimming or by distribution, does make changes faster and more fluid. It is interesting that empirical evidence dating from 45 years ago already pointed to this.

# Notes

1. The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition, Frederick P. Brooks Jr.

2. Some cost contributors to large-scale programs, B. Nanus and L. Farr, 1964

3. See also: when to use microservices and when not to