Design and the Mythical Man Month
09 February 2014
I have written about the Mythical Man Month in the past, and recently decided to have another look at it. The book covers a lot of ground, so my interest here is how it treats design & specification. In particular I am interested in how it contrasts with my more recent experience, such as what I have come across regarding Agile. In short, many of my views that are most in conflict with modern practice are backed up by what Fred Brooks wrote, once allowances are made for its age. I feel that Mythical Man Month is best read as a set of cautionary tales rather than a cook-book.
The core theme of Mythical Man Month is the need for conceptual integrity, which is all about following a consistent core of philosophies, with ideas that do not fit in well being discarded. If important things do not fit in, then the whole design needs to be reworked, because hammering them into place will sooner or later cause a whole host of problems. This conceptual integrity has to be adhered to top-to-bottom. Conceptual disunity occurs when design is split into tasks done by many people, and I have too often seen problems caused by design-consequential decisions being made on the fly by whatever developer happened to be implementing a programming task. I am somewhat sympathetic to Brooks' view that the design needs to flow from at most a very restricted number of minds. In reality conceptual integrity is often compromised, but every time this is done the door is opened up to a whole set of avoidable (or at least anticipatable) problems.
Specification is everything
In reading Mythical Man Month, one thing stands out: documentation is king. This is because specification & design are how conceptual integrity is maintained in practice, and everything else flows from that. There is clear influence of the waterfall model, although there is also realisation (discussed below) that software development does not fit the model very well. Nevertheless the processes that are aimed at preserving the conceptual integrity are also applicable to other development models:
I am in full agreement with the two reasons given in Mythical Man Month as to why formal specifications are essential: flushing out issues, and communication. The latter is obvious because all the implementers on a team need to at least move in the same general direction, but the former needs much more discussion. While I don't exactly subscribe to Joel's Big Design viewpoint, I do see the merit behind how it quickly nails down issues. Mythical Man Month takes the same tack: it is only when things are written down that the gaps and inconsistencies reveal themselves, and the process of rectifying these requires thinking that leads to a tight rather than fuzzy design. It is planning, rather than the plan, that matters here.
System architecture is defined as the complete and detailed specification of the user interface, which is presented in terms of user manuals. This can be a compiler manual, a computer manual, or even a set of system operating instructions. These manuals are the external specification, and although Mythical Man Month frames it in terms of what the user sees and doesn't see, it is fundamentally what these days would be called use-case requirements analysis. Use-case analysis is a theme I have written about repeatedly, because without it you have no foundations. It must have details nailed down tightly, but must also not prescribe how things are to be implemented. Of course in practice there will be issues flagged up by both users and implementers that lead to changes, so the design needs to be able to adapt.
The issue here is goal-posts, because while it is possible for developers to get started on only vague assumptions, there is only so far this can be pushed. Poor requirement handling is the one thing that separates living dangerously from outright gambling, and all too often I have found that anything other than up-front requirements is rarely specified in a timely manner. A personal instance of mine was when I had to write the code that handled the removable hard drive bays on a video recording server, which were controlled by an LCD panel mounted on the front. All I had to work with was “write files to the inserted disk”, which brought immediate questions as to which drive(s), as the rack-mount was able to take four drives.
Mythical Man Month introduces the idea of mini-decisions, which are defined as issues that are not of full-debate importance, the example being the setting of operation condition codes, but are nevertheless decisions that need to be made. These need to be made consistently throughout the project, which I fully agree with. Numerous times while writing software I have come to points where design-consequential decisions have to be made, and I have seen the consequences of both the can being kicked down the road, and an uninformed programmer taking the easy option.
Although these days they are more likely to go into a wiki page, Mythical Man Month introduces the idea of a telephone log where every technical question and answer is recorded. This is crucial because these correspond to cracks in the implementation if not the design itself, and if how they are filled does not become part of the specification, they will likely cause more problems down the road. It also forces consistency, because too many times I have seen the situation where a technical question is answered by whatever happens to be on top of the boss's mind, and have subsequently been stung by this changing.
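To illustrate what making a mini-decision once and applying it consistently might look like in code, here is a minimal Python sketch. The names `ConditionCode`, `condition_code_for`, `add` and `subtract` are hypothetical and the code values are made up for illustration; none of this is taken from the book. The only point is that the rule for setting a condition code lives in one shared place rather than being re-decided by each implementer.

```python
from enum import Enum
from typing import Tuple


class ConditionCode(Enum):
    """Hypothetical condition codes; names and values are illustrative only."""
    ZERO = 0
    NEGATIVE = 1
    POSITIVE = 2


def condition_code_for(result: int) -> ConditionCode:
    """The single place where the mini-decision is recorded."""
    if result == 0:
        return ConditionCode.ZERO
    if result < 0:
        return ConditionCode.NEGATIVE
    return ConditionCode.POSITIVE


def add(a: int, b: int) -> Tuple[int, ConditionCode]:
    """Every operation defers to the shared rule instead of inventing its own."""
    result = a + b
    return result, condition_code_for(result)


def subtract(a: int, b: int) -> Tuple[int, ConditionCode]:
    """Same shared rule, so add and subtract cannot drift apart."""
    result = a - b
    return result, condition_code_for(result)
```

If the decision later changes, only `condition_code_for` needs to change, and every operation picks up the new behaviour together.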
Design bugs out
Brooks asserts that most bugs should be faults in implementation rather than design, which is idealistic, but fundamentally right. An important point is that having a design that at least illuminates bugs will result in big maintenance pay-offs, and unless the design is pure and the documentation fine, far-reaching system-wide consequences of even supposedly simple bug fixes will not be considered. It is these far-reaching effects that explain why any bug-fix has a 20-50% chance of introducing another problem. The most subtle bugs are caused by mismatched assumptions made by different module authors, which comes back to getting the specification right so that bugs are designed out.
Prototype as specification
An interesting diversion is the idea of having the system act as the specification, which, while it has its merits, also brings some potentially disastrous problems. In short, side-effects of invalid operations get baked into the specification, and unexpected answers to sharp questions become de-facto standards, which are often suboptimal and cause more trouble down the line. While I can see some merit in using the system as a starting point for a specification, this only really works if this specification is refined and used as a blue-print for a re-implementation (ideally in a different programming language), as otherwise the specification is a waste of time and likely a work of fiction. Eric Raymond's article on ground-truth documents covers the hazards, which are common as the prototype usually ends up being the product.
Change has to be designed for, because the only constant is that there will be change, and if this is not allowed for it will result in pain down the road. In practice this means that some level of generality has to be built into the design of a program, as otherwise implementing extra functionality requires the extra effort of breaking apart the existing program, and hoping that these pieces will still fit together afterwards. If this is not allowed for, the result is the mess of hammering a square peg into a round hole. Nevertheless change still has to be managed, and in some cases curtailed.
How Mythical Man Month treats change is the one area where the timeless answer is the right one, in contrast to the utter cop-out that appears in the Agile Manifesto. Agile welcomes changing requirements, even late in development, which is a marketing-led disaster in the making. Late changes to requirements are the single-biggest cause of projects going tits-up, and no experienced developer would ever advocate them as good practice. Late changes are the surest way of blowing conceptual integrity out of the water.
Mythical Man Month correctly states that for the benefit of the implementers, any changes (and updates) have to be quantised, because at the end of the day they need goals that are not constantly moving. Yes, goals will need to shift occasionally, but any such shift will typically trash developers' implementation planning, and if this becomes persistent then developers will end up throwing foresight out the window. The first software house I worked in was one where I effectively gave up planning more than half a day in advance, because in short I couldn't.
More important is the willingness to say “No” to a customer request, and as the project progresses the threshold a feature has to pass before being accepted has to increase. In modern software development this means deferring the feature for the next version rather than cutting it completely, but the underlying concepts are still much the same. Sooner or later you have to have a feature freeze in order to consolidate what there is, or the project will either never be finished or end up being a complete crock.
The pilot system
A well-known core part of Mythical Man Month is that a pilot system will end up being thrown away, regardless of whether this is intended from the start. Therefore it is a choice between planning in advance to have a throwaway prototype, or delivering the prototype to a customer. The latter is correctly stated as something that merely buys time for a redesign, and comes with a risk to reputation. This problem is something my previous company landed right on top of, but with the added problem of not realising they were on borrowed time.
More interestingly, Brooks states that the actual disposal of the pilot system may be either a clean sweep, or a bit-by-bit process, but does not make much distinction as to how it happens. These days the bit-by-bit approach is known as a refactor, which is substantially more common than a ground-up rewrite. The distinction I think is due to how software projects these days are usually ongoing efforts with multiple deliveries, together with a much stronger culture of code reuse. These days doing a true clean-sweep is pretty rare, and although I don't quite agree with Joel's assertion that starting from scratch is the single biggest mistake a project can make, I agree with the reasons why it should at least be a last resort. It is surprising how often supposed “rewrites” are really heavy-weight refactors, as there is always some code that gets cut'n'pasted from the old project.
Handling knock-on changes
The idea of self-documenting programs is simple, and that is to avoid problems associated with keeping two separate sets of files in synchronisation with each other. As requirements change, so does the implementation, and this means changes to documentation also need to be handled. More importantly, the burden of handling this change must be minimised, and one way of doing this is having self-documenting programs. An interesting aside is that programs that are not self-documenting, in particular through the non-use of descriptive names for variables and other identifiers, are quite likely going to be unmaintainable crocks. The methods given in Mythical Man Month are somewhat quaint but the underlying concepts are all there. These days much of it is handled via automation, using tools such as Doxygen that handle the presentational side of documentation.
In Mythical Man Month the baseball concept of hustle is discussed, which is people doing that bit more than necessary in order to build up a cushion against mishaps. The problem is that measuring effort is the fastest way of dampening hustle, because the assumption builds that this measured overextension is actually normal. Even worse is the assumption that it is on-tap, even if it is recognised as abnormal. The problem is that too often this assumption is made, and the way that the taps are “turned on” leads to a combination of alienation and burn-out. The more general problem is that people do not work at a uniform pace, and the point Brooks misses is that a performance spike (or even a plateau) might actually be due to the cross-product of prior experience and luck rather than raw ability.
While Agile methods such as Scrum include a lot of good practice and take account of modern technologies & trends, I am now convinced that Agile is intellectually flawed, because the ways I see Agile fail are the failure modes described in Mythical Man Month. Welcoming late requirement changes in itself is pretty bad, but Agile's interpretation of simplicity as maximizing work not done basically means kicking the can down the road, which I have seen happen far too often. When I hear about Agile projects failing because Agile was not done right, I can't help but think about how often the same thing is said about communism. Mythical Man Month has its faults and contradictions, but when read as a set of cautionary tales rather than a guide, its value really shows through.
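As a rough modern equivalent, the sketch below shows a small Python function documented in place, assuming Doxygen's convention that Python comment blocks starting with `##` are picked up as documentation. The function `select_target_bays`, its parameters and the lowest-bay-first policy are all hypothetical, loosely modelled on the drive-bay example earlier; they are not a description of any real system.

```python
from typing import Iterable, List


## @brief Choose which drive bays a recording should be written to.
#
#  Hypothetical policy, made explicit here so that the documentation
#  lives next to the code it describes: prefer the lowest-numbered bays.
#
#  @param inserted_bays  Bay numbers that currently hold a disk.
#  @param max_targets    Maximum number of bays to write to.
#  @return The lowest-numbered inserted bays, at most max_targets of them.
def select_target_bays(inserted_bays: Iterable[int],
                       max_targets: int = 1) -> List[int]:
    bays_in_ascending_order = sorted(inserted_bays)
    return bays_in_ascending_order[:max_targets]
```

Because the description and the descriptive names sit beside the logic, a change in policy means editing one file rather than remembering to update a separate document.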