Software Design (CSC-223 97F)
[News]
[Basics]
[Syllabus]
[Outlines]
[Assignments]
[Studies]
[Examples]
[Readings]
[Projects]
[API]
The Failed Phobos Missions to Mars
Summary:
Two Soviet missions to Mars failed, and both craft were lost. The first
failure was blamed on human error and the second on computer failure.
Both were likely caused by design flaws.
Sources:
Scenario
Phobos I was launched on July 7, 1988, and Phobos II was launched five
days later, on July 12, 1988. Phobos I operated normally until routine
attempts at communication failed on September 2, 1988. It appeared as though
the craft had lost power. Phobos II continued to operate normally and entered
orbit around Mars. However, while it was preparing to launch mobile landers
to the surface of the moon Phobos, contact was lost on March 27, 1989.
What went wrong?
- For Phobos I, the error-checking computer was offline.
- Only one programmer, rather than the required two, was present.
- On the night of August 29/30, this programmer got one bit wrong in a
large command sequence sent to the craft. As a result, the craft lost its
lock on the sun. Its solar panels then pointed away from the sun, and its
batteries went dead.
- The Phobos I failure was blamed on human error, and the programmer was
removed from the project.
- Phobos II was able to reach Mars orbit.
- There its onboard computer failed and the probe lost its orientation to
the sun. Its batteries went dead soon after.
- This time the problem lay in the redundancy and error-checking system
the probe used. It had three processors. Each operated independently, and
the three then voted on what the probe should actually do. One of the three
failed early on, but the craft continued to function close to normally.
As the mission progressed, a second processor began to fail, and most
suspect that it eventually failed completely. As a result, the third, which
was still functioning normally, could not get any of its instructions
carried out because it no longer had a quorum.
- Some blame the failure of Phobos II on a UFO that was supposedly sighted
shortly before its failure.
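The voting scheme described for Phobos II can be illustrated with a small sketch. This is not the probe's actual software. The function name `majority_vote`, the command string, and the use of `None` to stand for a failed processor are all assumptions made for illustration. The sketch shows why one healthy processor out of three cannot act when the quorum is two.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(outputs: Sequence[Optional[str]], quorum: int = 2) -> Optional[str]:
    """Return the command that at least `quorum` processors agree on, else None.

    A failed processor is modeled as None: it casts no vote.
    """
    votes = Counter(output for output in outputs if output is not None)
    if not votes:
        return None
    command, count = votes.most_common(1)[0]
    return command if count >= quorum else None


# "POINT_AT_SUN" is a hypothetical command name.
print(majority_vote(["POINT_AT_SUN", "POINT_AT_SUN", "POINT_AT_SUN"]))  # POINT_AT_SUN
print(majority_vote([None, "POINT_AT_SUN", "POINT_AT_SUN"]))            # POINT_AT_SUN
print(majority_vote([None, None, "POINT_AT_SUN"]))                      # None
```

In the last call the surviving processor is correct, yet nothing is carried out. A design that anticipated this case might fall back to a safe mode, such as holding the solar panels on the sun, when no quorum can be reached.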
Conclusions
- Human error is rarely a legitimate reason for the failure of a package
or project.
- Instruction sets must be designed to minimize accidental catastrophe.
- Designers must assume humans will make mistakes and design their code
to accommodate these limitations.
The behavior we call human error is just as predictable as system noise,
perhaps more so: therefore, instead of blaming the human who happens to
be involved, it would be better to try to identify the system characteristics
that led to the incident and then to modify the design, either to eliminate
the situation or at least to minimize the impact for future events. One
major step would be to remove the term "human error" from our
vocabulary and to re-evaluate the need to blame individuals. A second major
step would be to develop design specifications that consider the functionality
of the human with the same degree of care that has been given to the rest
of the system (Norman).
- Using independently written code to create redundancies in crucial areas
does not solve problems because programs typically have flaws in the same
places.
- All persons involved in a project must have the freedom to anonymously
report bugs in the code. This ensures that various managerial, financial,
and political issues do not perpetuate errors in coding.
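One way to read the conclusion that instruction sets must minimize accidental catastrophe is that a dangerous command should never be reachable through a single mistake. The sketch below assumes a two-step "arm, then execute" protocol. The class `CommandInterpreter` and every command name in it are hypothetical, and nothing here is drawn from the real Phobos command set.

```python
# Hypothetical command names, chosen only for illustration.
ROUTINE = {"REPORT_STATUS", "POINT_AT_SUN"}
HAZARDOUS = {"DISABLE_ATTITUDE_CONTROL"}


class CommandInterpreter:
    """Runs routine commands directly; hazardous ones need a prior ARM."""

    def __init__(self) -> None:
        self.armed = None    # hazardous command currently armed, if any
        self.executed = []   # log of commands actually carried out

    def receive(self, command: str) -> str:
        if command.startswith("ARM "):
            target = command[len("ARM "):]
            self.armed = target if target in HAZARDOUS else None
            return "armed" if self.armed else "rejected"

        # Arming covers only the very next command, then expires.
        armed, self.armed = self.armed, None

        if command in ROUTINE or (command in HAZARDOUS and command == armed):
            self.executed.append(command)
            return "executed"
        return "rejected"


craft = CommandInterpreter()
print(craft.receive("DISABLE_ATTITUDE_CONTROL"))      # rejected
print(craft.receive("ARM DISABLE_ATTITUDE_CONTROL"))  # armed
print(craft.receive("DISABLE_ATTITUDE_CONTROL"))      # executed
```

With this design a single mistyped command cannot disable attitude control by itself. The operator would have to make two consistent mistakes in a row, which is the kind of property the conclusions above ask designers to build in.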
Disclaimer: Often, these pages were created "on
the fly" with little, if any, proofreading. Any or all of the information
on the pages may be incorrect. Please contact me if you notice errors.
Source text written by Evan Schnell and Robert Reasoner.
Source text last modified Fri Oct. 27. 12:31:49 1997.
This page generated on Fri Oct 17 09:04:48 1997 by SamR's
Site Suite.
Contact our webmaster at rebelsky@math.grin.edu