What is really needed here is a more modular approach. System architects often decouple and modularize their systems from a design and implementation point of view. Ideally, these modules would be the focus not only of the design, but also of the fault-recovery process: if one module malfunctions, only that module would require a reset, and the integrity of the rest of the system would remain intact. In other words, that particular module wouldn't be a SPOF.
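As a rough illustration of module-level recovery, the sketch below runs a module in its own process and restarts only that process when it fails. It uses plain POSIX calls (`fork()`, `waitpid()`, `_exit()`) and is not the QNX Neutrino HA tooling; the names `run_supervised`, `module_fn`, and `flaky_module` are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical module entry point: receives the attempt number and
 * returns the exit status of its process (0 means a clean shutdown). */
typedef int (*module_fn)(int attempt);

/* Run `module` in its own process and restart it whenever it exits
 * abnormally. Only the failed process is replaced; the supervisor
 * keeps running throughout.
 * Returns the number of restarts that were needed, or -1 if fork() or
 * waitpid() fails or the restart budget is exhausted. */
int run_supervised(module_fn module, int max_restarts)
{
    assert(module != NULL);

    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(module(attempt));  /* child: run the module, never return */

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return attempt;          /* clean exit after `attempt` restarts */
        /* Module crashed or reported failure: loop around and restart it. */
    }
    return -1;                       /* restart budget exhausted */
}

/* Illustrative module that fails on its first two runs, then succeeds. */
int flaky_module(int attempt)
{
    return attempt < 2 ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

With these definitions, `run_supervised(flaky_module, 5)` forks three processes in total and returns 2. The supervisor itself is never reset, which is the point of confining recovery to the failed module.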
This modular approach would also help us address the fact that the mean time to repair (MTTR) for a system reboot is an order of magnitude larger than the MTTR for replacing a single running task.
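To see why the MTTR difference matters, it can help to plug numbers into the standard steady-state availability formula. The figures below are assumptions made up for illustration (a mean time to failure of 1000 hours, a 60-second reboot, and a 6-second task restart); they are not measurements of any real system.

```latex
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}},
\qquad
1 - A = \frac{\mathrm{MTTR}}{\mathrm{MTTF} + \mathrm{MTTR}}
\approx \frac{\mathrm{MTTR}}{\mathrm{MTTF}}
\quad (\mathrm{MTTR} \ll \mathrm{MTTF})

% Assumed: MTTF = 1000 h = 3.6 \times 10^{6} s
1 - A_{\text{reboot}} \approx \frac{60}{3.6 \times 10^{6}} \approx 1.7 \times 10^{-5}
\qquad
1 - A_{\text{task}} \approx \frac{6}{3.6 \times 10^{6}} \approx 1.7 \times 10^{-6}
```

Under these assumptions, a tenfold reduction in MTTR yields roughly a tenfold reduction in expected downtime: about 526 seconds per year for full reboots versus about 53 seconds per year for task-level restarts.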
This finer granularity in the recovery of individual tasks is precisely what the QNX Neutrino microkernel offers. The architecture of the QNX Neutrino RTOS itself provides so many intrinsic HA features that many QNX Neutrino users take them for granted and often design recoverability into their systems without giving it a second thought.
Let's look briefly at the key features of QNX Neutrino and see how system designers can easily make use of these built-in HA-ready features to build effective HA systems.