Traditional approaches to dealing with software malfunctions
have included such mechanisms as:
- Hardware/software watchdog
- This is a piece of hardware that's known to be
fault-free. It triggers code to check the sanity of the
system. This sanity check usually involves examining a set
of registers that are continuously updated by properly
functioning software components. If one of the
components stops updating its registers, the check fails and
the system is reset.
- Manual operator intervention
- Many systems aren't designed to include automatic
fault detection, but rely instead on a manual approach:
an operator who monitors the health of the system.
If the system state is deemed invalid, then the operator
takes the appropriate action, which usually includes a
system reset.
- Memory constraint faulting
- Several operating systems (and hardware platforms)
include features that let you generate a fault when a
program accesses memory that doesn't belong to it. Once this
occurs, the program can no longer be considered reliable. With most realtime
executives, the result is that the system must be reset in
order to return to a sane operating state.
All of these approaches are relatively successful at
detecting a software fault. But the net result of this
detection, especially when faced with a multitude of faults
in several potentially separate software components, is the
rather drastic action of a system reset.
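
To make the watchdog mechanism concrete, here is a minimal sketch in C of the kind of sanity check described in the list above. It is an illustration only, not code from any particular system: the names `component_kick()`, `sanity_check()`, and `watchdog_requires_reset()`, the `NUM_COMPONENTS` constant, and the use of simple in-memory counters in place of real registers are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of monitored software components. */
#define NUM_COMPONENTS 3

/* Stand-ins for the "registers": each healthy component keeps
 * incrementing its own counter. */
static volatile uint32_t heartbeat[NUM_COMPONENTS];

/* Counter values observed at the previous sanity check. */
static uint32_t last_seen[NUM_COMPONENTS];

/* Called by component `id` from its main loop to show it is alive. */
void component_kick(size_t id)
{
    if (id < NUM_COMPONENTS) {
        heartbeat[id]++;
    }
}

/* The sanity check that the watchdog triggers periodically.
 * Returns true only if every counter advanced since the last check. */
bool sanity_check(void)
{
    bool healthy = true;

    for (size_t i = 0; i < NUM_COMPONENTS; i++) {
        uint32_t now = heartbeat[i];

        if (now == last_seen[i]) {
            healthy = false;   /* component i made no progress */
        }
        last_seen[i] = now;
    }
    return healthy;
}

/* One watchdog period. Returns true if the only available recovery,
 * a full system reset, is required. */
bool watchdog_requires_reset(void)
{
    return !sanity_check();
}
```

In this sketch a single stalled component is enough to make `watchdog_requires_reset()` return true, even if every other component is healthy. The check can tell that something failed, but its only response is to reset everything.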