The local utility decided to change the pole-mounted transformer that feeds my house and two neighbors to the east. They didn’t give any notice; they just cut the power. The new edition of IEEE Control Systems Magazine came in the mail today, so I spent some time looking through it while I was waiting. A column called "25 Years Ago" discusses the difficulty of testing systems for all contingencies and highlights some of the issues that were encountered on the Voyager One spacecraft.
Voyager One encountered problems immediately after entering orbit. The main issue had to do with navigation, but a secondary issue arose when the software tried to correct for the navigation problem, which prevented a fix from being uploaded immediately. They were eventually able to upload a software patch and correct the issue, but it sounds like the integrity of the spacecraft and the success of the mission were in question for a while. From the article:
This was a program that had been very carefully done by probably some of the very best people in the world. They had exercised it and worked on it, but that particular real circumstance was not quite right in the model, and the software had a loop in it that no one had ever thought of.
This immediately brought to mind Air France 447. We may never know exactly what happened, but I think it is safe to say that some unforeseeable, untestable sequence of events came together at the wrong time.
Software is great. It is so flexible that you can make it do anything you need it to. The drawback to that flexibility is a complexity that is hard to comprehend. For example, a small piece of software may be around 100,000 lines of code. The number of potential execution paths through those 100,000 lines grows exponentially with the number of decision points, since each independent branch can double the number of paths. This is the reason that, in practice, you can only test that an application is correct according to its requirements. It is just not possible to exhaustively test every state and execution path of an application of any appreciable size.
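To put a rough number on that growth, here is a minimal Python sketch. It assumes the simplest possible case: a program made of nothing but independent two-way if/else decisions, run one after another. Real code has loops and dependent branches, so this is only an illustration. The function names and the rate of a million tests per second are made up for the example.

```python
from itertools import product

SECONDS_PER_YEAR = 365 * 24 * 3600


def count_paths(num_branches: int) -> int:
    """Count paths by brute force: enumerate every True/False
    combination of outcomes for the independent branches."""
    return sum(1 for _ in product((False, True), repeat=num_branches))


def paths_closed_form(num_branches: int) -> int:
    """Same count without enumerating: each branch doubles the total."""
    return 2 ** num_branches


def years_to_test(num_branches: int, tests_per_second: int = 1_000_000) -> float:
    """Years needed to run every path once at an assumed
    (hypothetical) rate of tests_per_second."""
    seconds = paths_closed_form(num_branches) / tests_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Brute force is only practical for small branch counts.
    print(count_paths(10), paths_closed_form(10))
    # With 64 independent branches, exhaustive testing is out of reach.
    print(f"{years_to_test(64):,.0f} years")
```

Under these assumptions, 64 branches would already take hundreds of thousands of years to cover exhaustively, and a 100,000 line program will typically have far more decision points than that.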