So Pete took me to lunch today with some fellow software engineering buddies. While munching our rather excellent Indian buffet food, one of the engineers related an incident that happened to him on vacation. He was walking back to his hotel room when he overheard one side of a phone conversation that went something like this:
We can either fix more bugs to make the system more stable, or we can develop new features, but not both.
Anyone who has done any level of commercial software development is familiar with this conversation. What I do is very different from what Pete’s buddies do, but the conversation was completely familiar to me. It doesn’t matter if the project is the next PS2 game or the control system for the next Airbus airliner. If you sit in a room with the developers and the managers, they are probably having this conversation. My theory about the origin of this conversation is overly simplistic and not filled with deep insight. But here it is anyway.
Like most engineering problems, software projects can be thought of as being managed across three different constraint axes: new features, general quality, and time to market. The goal is always to ship out as much as you can as fast as you can and have it work as well as you can. I think where people get into trouble with software is when they begin to believe that the three constraints above are more flexible than they really are. In particular, it’s easy to convince yourself that change is cheap. I think even experienced engineers who really understand software development can fall into this trap.
Consider the following thoughts that all of us have had:
I can’t believe they didn’t fix this simple bug.
and
Just add this simple new feature. It will take you a day.
I’ve listed these two statements separately, but they are really asking for the same thing: a small, seemingly localized change that would make the product better. The problem is that in software, small localized changes almost always have non-local effects that you didn’t expect. To use a clichéd analogy, it’s like tossing a rock in a pond. The initial splash is small, but the ripples go on for a long time. While this seems obvious, software is still perceived as being flexible and easily changed. The reality is that every single change you make to a piece of code has a high probability of breaking something that someone doesn’t want broken. Therefore, no matter how trivial the change seems, it is likely to be expensive to qualify. In other words, all change is hard. There are no shallow bugs.
I think these two relatively innocent thoughts are the core cause of the phone conversation that Pete’s friend overheard. When a team is put into the position of needing to deal with two streams of changes, life can quickly become intractable. Consider that on the one hand, the team has to deal with all the non-local effects of the bug fixes that they must make to keep the system running. On the other hand, they must also deal with the effects of the new feature development, and then, on top of that, the effects of the bug fixes on the new features. So instead of just trying to handle the first-order effects of fixing bugs, you have to deal with second- and third-order effects as well. Eventually, every new request becomes a fountain of pain and torture, until finally the engineering team threatens to storm out of the project in protest. This is when the conversation happens.
Being an engineer, I feel that at some level this conversation is inevitable, because in software we just don’t know how to specify what we want early enough to avoid the thrash later. Often the only way to know whether you’ve built the right thing is to build it and see. User interfaces (and computer games) fall into this category of project. As long as this is true, we’ll have late changes and late-cycle thrash, and people will be on their cell phones in Florida pleading with their managers to start cutting down the scope.
I think that the shrink wrap world is in slightly better shape in this regard than other areas of software development. This is because in shrink wrap, time to market rules. You must turn the product around every year, on the year, to keep the revenue stream coming in and the product alive. In this environment, it can be easier to explain why features need to be cut or compromised to make the ship date. Of course, the flip side is that shrink wrap software also tends to compromise on overall quality to make the ship date. But for now, users seem willing to accept that tradeoff.
I think the software services industry has it harder, because there the time constraint is slightly looser, but the pressure to implement everything the customer wants is much stronger. Therefore, you end up with long death marches, trying to patch together huge custom systems that jump through all the right hoops and still aren’t delivered too late. That’s a tough world to be in.
Finally, I think that the general principle here, that changing software is expensive, cannot be overstated. People seem reluctant to accept this fact about software even though they are happy to accept it in other aspects of life. If you are having work done on your house, you don’t expect to be able to change the requirements on the contractor without paying extra money. And yet most people, even experienced software engineers, have a hard time not thinking that just one more tweak to the code will be easy and cheap. Learning to estimate and accept these costs will go a long way toward improving software and the development process, and thus reducing the number of times you hear the conversation.
There are ways to manage this, though. In my experience, the better my automated unit test suite is, and the more diligent I am about writing unit tests in advance of the code they test, the less fragile the codebase is and the more amenable it is to quick changes.
(And a unit test suite doesn’t count as one to me unless it runs every time I type command-B in Xcode, and it’s trivial to add new tests to…)
Test-driven development isn’t a panacea. But I’ve found that it works far better than any other technique I’ve tried.
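To make that concrete, here’s a minimal sketch of the kind of test I mean, written against XCTest; the PriceCalculator type and its behavior are invented purely for illustration. The point is that once checks like these are attached to the build target, a “simple” change that quietly alters the math gets flagged the moment you hit command-B:

```swift
import XCTest

// Invented example type, standing in for whatever "simple" logic
// a real product would have.
struct PriceCalculator {
    func total(for quantities: [Int], unitPrice: Int) -> Int {
        quantities.reduce(0, +) * unitPrice
    }
}

final class PriceCalculatorTests: XCTestCase {
    // Each test pins down one piece of behavior that nobody wants broken.
    func testTotalMultipliesSummedQuantityByUnitPrice() {
        XCTAssertEqual(PriceCalculator().total(for: [1, 2, 3], unitPrice: 10), 60)
    }

    func testEmptyOrderCostsNothing() {
        XCTAssertEqual(PriceCalculator().total(for: [], unitPrice: 10), 0)
    }
}
```

The individual tests are trivial; the value is that the whole suite re-verifies the old behavior for free every time the code changes.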
What Chris said
If you’re worried about non-localized impact, you can choose to handle it manually – an exponential cost driven by the number of paths your software can take. (With the implication that, sooner or later, modifications to the software become commercially unfeasible!)
Or, you can do what every other industry did when a process became too costly to do manually: automate – i.e., a test suite.
It still doesn’t solve the problem completely – test suites still incur exponential cost to some extent – but you’ve flattened the curve enough that it stays nearly flat even for large projects.
(There are other things that impact the curve, choice of language for example, but it still remains flattened.)
And that means you will be alerted to any non-local impacts your change has the moment you make it. Changing software is *not* expensive unless we make it so.
Refactoring plays another significant part here. If you let your code grow into a tangled mess, more and more of the impact is non-local. So you need to clean it up regularly.
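As a toy illustration of the kind of cleanup I mean (all of the names below are made up), the refactoring is often nothing more than pulling two responsibilities apart so that a change to one of them stays local:

```swift
import Foundation

// Before: formatting and file I/O are tangled together, so a change
// to the report format also risks breaking the file handling.
struct TangledReportWriter {
    func writeReport(values: [Double], to url: URL) throws {
        let body = values.map { String(format: "%.2f", $0) }.joined(separator: "\n")
        try body.write(to: url, atomically: true, encoding: .utf8)
    }
}

// After: the formatting lives on its own, where it can change (and be
// unit tested) without touching the code that writes files.
struct ReportFormatter {
    func format(_ values: [Double]) -> String {
        values.map { String(format: "%.2f", $0) }.joined(separator: "\n")
    }
}

struct ReportWriter {
    let formatter = ReportFormatter()

    func writeReport(values: [Double], to url: URL) throws {
        try formatter.format(values).write(to: url, atomically: true, encoding: .utf8)
    }
}
```

The behavior is identical before and after; the difference is how far the ripples travel the next time someone asks for just one more tweak to the format.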
> And that means you will be alerted to any
> non-local impacts your change has the moment you
> make it. Changing software is *not* expensive
> unless we make it so.
That assumes (1) that if you’re alerted, the non-local impact is no longer expensive and (2) that the test suite really will alert you to all the important non-local impacts.
I think both of these are simply false in practice, most of the time.
Knowing that a non-local effect exists isn’t the battle; dealing with it is. You’re just pushing the assumption that changes are cheap and easy back a step.
And sometimes even a “good” test suite simply won’t expose all the important non-local impacts.