What Price Safety? The 737 MAX 8 Saga Continues

In March and April, I
blogged on the tragic and costly software problems plaguing Boeing’s 737 MAX 8
jetliner.  Briefly, after two crashes in
Indonesia and Ethiopia in which a total of 346 people died, evidence pointed to
a software problem in the plane’s flight-control system, and the U.S. Federal Aviation
Administration (FAA) grounded the plane after numerous other nations did the
same in March.  In May, Boeing claimed
that they had fixed the software problem, and since then Boeing and the FAA
have been running extensive tests to verify that the problem has in fact been
solved.  On June 3, Boeing CEO Dennis
Muilenburg said that he expected the FAA to declare the plane flightworthy by
the end of the year, but declined to give a specific timeline. 

In the meantime, all 387
existing MAX 8s are sitting on the ground instead of flying and generating revenue
for the airlines that own them.  This has
caused big headaches for both American Airlines and Southwest.  Southwest recently
announced that it is terminating service to New Jersey’s Newark Airport simply
because it doesn’t have enough planes owing to the MAX 8 groundings.  And American’s losses are running in the
range of $400 million, largely due to the groundings.

Most of the time, when
software fails to do what it should, the consequences are fairly minor.  If it’s one feature on some software on your
laptop that acts up, maybe you lose some work, or even get so turned off by the
problem that you swear never to buy that software again. But you remain healthy and nobody dies.

Then there’s the whole issue
of software security, and making sure malevolent attacks don’t disable or
otherwise inconvenience users.  Software
companies are used to dealing with such things by now, and generally stay up to
date with patches that prevent hackers from doing major damage, as long as the
users install the patches.

These kinds of environments
are what most software developers are used to working in.  The bigger the organization and the more
critical the software, the more bureaucracy is involved, but that’s not
necessarily a bad thing.  I spoke with a
software engineer many years ago who worked for a regional telecommunications
company.  She told me that she’d been spending
most of the previous year on changing exactly one line of code.  The reason it took so long was that a bunch
of other engineers had to take that change and try it out in all sorts of other
situations and find out what its ramifications were, and whether it would cause
problems down the road. 

Telecom companies are
rather shielded from competition, and so taking a year to change one line of
code may be fairly routine, for all I know. 
So maybe we shouldn’t be that surprised if it now takes six more months
for the FAA to make sure that the changes Boeing has made in their 737 MAX 8s are
really going to make things better and not otherwise. 

Thing is, the phone company
didn’t have to shut down and wait for my software engineer friend to finish her
job.  But when software is intimately
tied in with a multimillion-dollar piece of hardware that you can’t use just a
little of, and the software makes the whole thing unusable, it creates a
spectacle that we haven’t seen since the week or so after 9/11/2001, when all
domestic U.S. flights were grounded. 
And that period, plus the general fear of flying it engendered, hit the
airlines with an economic punch that took them years to recover from.

Fortunately, the MAX 8
problem doesn’t appear to have frightened people away from flying in
general.  Because of the scarcity of seats,
the airlines have been able to charge more, and so revenues at American and
Southwest are actually up, despite the shortage of planes.  Nevertheless, Boeing has set aside nearly $5
billion in case it ends up having to pay its customers for loss of revenue, and
lots of airlines around the world are going to think very hard before they
place any more orders with Boeing.

Unlike mechanical failures,
software failures are not simply a function of physics.  Software is so dynamic and dependent on the
exact conditions and history of its environment that it is virtually impossible
to “prove” it won’t fail under any circumstances, except in rare and
rather academic cases.  Some day, I hope
the whole history of this fiasco will come out, as it will be a fascinating
study in how software engineering ethics failed in this instance, and it will
hold lessons for how safety-critical software should not be written. 

The problem with such a
story may be that it could be too hard for anybody except specialized software
engineers to understand.  But then again,
it may boil down to management problems, as so many ethical issues do.  Already there has been speculation that the
FAA was allowing Boeing to conduct too many of its own safety tests, and
basically just taking Boeing’s word for it that everything was okay.  Only when we have enough details about how
the problems happened and how they were fixed can we judge whether the FAA has
been lax or negligent in this area.

In the meantime, software
engineers everywhere except at Boeing can be glad that their work is not going
under the microscope of the FAA’s inspection. 
But there are plenty of other types of software that are
life-critical:  for example, software for
medical devices, automotive software, even the software that lets first
responders communicate with each other. 
A failure with any of these products can have life-threatening consequences.

So maybe the lesson here for
software engineers is:  program as though
your life depended on it.  If more
programmers had that attitude, we’d all have much better software.  Maybe not so much of it, but that might not
be a bad thing either.
