The QNX Neutrino High Availability Framework consists of the
following main components:
- QNX Neutrino RTOS
- We're not just trying to be thorough by listing the OS
itself here! And it's first in the list for good reason
— the QNX Neutrino microkernel architecture
inherently provides a
robust environment for building highly reliable
applications. Many of the particular features required in an
HA application — system stability, isolation of
software modules, dynamic upgrading of software components,
etc. — are already included in the OS.
The microkernel provides system-wide stability by
offering full memory protection to all processes. And
there's very little code running in kernel mode that could
cause the microkernel itself to fail. All individual processes,
whether applications or OS services — including device
drivers — can be started and stopped dynamically,
without jeopardizing system uptime.
For more on the suitability of the QNX Neutrino RTOS for HA, see the
next chapter in this guide.
- High Availability Manager (HAM)
- A HAM is a smart watchdog — a
highly resilient manager process that can monitor your
system and perform multistage recovery whenever system
services or processes fail or no longer respond.
As a self-monitoring manager, a HAM is resilient to
internal failures. If, for whatever reason, the HAM itself
is stopped abnormally, a mirror process called the Guardian
takes over and immediately and completely reconstructs the
HAM's state.
For details on the HAM, see the chapter Using the High Availability
Manager in this guide.
- HAM API
- The HAM API library of more than 35 ham_*()
functions gives you a simple mechanism to talk to a HAM.
This API is implemented as a thread-safe library you can
link against.
You use the API to interact with a HAM in order to begin
monitoring processes and to set up the various conditions
(e.g., the death of a server) that will trigger certain
recovery actions.
For descriptions of the functions in the HAM API, see the
HAM API Reference chapter in this guide.
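The condition-and-action idea described above (a monitored process dies, so a recovery action runs) can be sketched in portable POSIX C. This is only an illustration of the concept and does not use the ham_*() functions. The names `supervise()`, `start_fn`, `wait_for_exit()`, and the two `start_*` helpers are hypothetical, invented for this example, and `fork()`/`waitpid()` stand in for the monitoring that a HAM would perform on your behalf.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical type: starts the monitored process, returns its pid or -1. */
typedef pid_t (*start_fn)(void);

/* Demo helper: a "server" that crashes immediately (nonzero exit). */
pid_t start_crashing_server(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(1);               /* child: simulate an abnormal exit */
    return pid;                 /* parent: child's pid, or -1 if fork failed */
}

/* Demo helper: a "server" that finishes cleanly (exit status 0). */
pid_t start_healthy_server(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);               /* child: simulate a clean exit */
    return pid;
}

/* Wait for pid to terminate, retrying if a signal interrupts the wait. */
int wait_for_exit(pid_t pid, int *status)
{
    pid_t r;
    do {
        r = waitpid(pid, status, 0);
    } while (r < 0 && errno == EINTR);
    return r < 0 ? -1 : 0;
}

/*
 * Monitor one process. Condition: the process ends abnormally.
 * Action: restart it, at most max_restarts times.
 * Returns the number of restarts performed, or -1 on error.
 */
int supervise(start_fn start, int max_restarts)
{
    assert(start != NULL && max_restarts >= 0);

    int restarts = 0;
    pid_t pid = start();
    if (pid < 0)
        return -1;

    for (;;) {
        int status;
        if (wait_for_exit(pid, &status) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;    /* clean exit: nothing to recover */
        if (restarts == max_restarts)
            return restarts;    /* restart budget used up: give up */
        pid = start();          /* recovery action: restart the process */
        if (pid < 0)
            return -1;
        restarts++;
    }
}
```

A call such as `supervise(start_crashing_server, 3)` restarts the failing process three times and then gives up. In this sketch the supervisor is the parent of the process it watches, which keeps the example short but is a limitation of the example only.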
- Client Recovery Library
- The client recovery library provides drop-in
enhancements for many standard libc
I/O operations. The library's cover functions
automatically recover failed connections whenever
recovery is possible in an HA scenario.
For descriptions of the client library functions, see the
Client Recovery Library Reference chapter in this guide.
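To make the notion of a cover function concrete, here is a minimal sketch in portable POSIX C of a wrapper around `write()` that tries to re-establish a failed connection and then retries the operation. It is not the client recovery library's API: `rconn_t`, `reconnect_fn`, `is_recoverable()`, `recovering_write()`, and `reopen_devnull()` are hypothetical names, and the set of errors treated as recoverable is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback: re-establish the connection, return a new fd or -1. */
typedef int (*reconnect_fn)(void *ctx);

/* Hypothetical handle pairing a descriptor with its recovery policy. */
typedef struct {
    int fd;                 /* current descriptor, or -1 if not connected */
    reconnect_fn reconnect; /* how to get a fresh descriptor */
    void *ctx;              /* passed through to reconnect() */
    int max_retries;        /* recovery attempts allowed per call */
} rconn_t;

/* Assumption for this sketch: these errors mean "the connection is gone".
 * Note that EPIPE is only seen if SIGPIPE is ignored or blocked. */
int is_recoverable(int err)
{
    return err == EBADF || err == EPIPE || err == ECONNRESET;
}

/* Cover function for write(): on a recoverable failure, reconnect and retry. */
ssize_t recovering_write(rconn_t *c, const void *buf, size_t len)
{
    assert(c != NULL && c->reconnect != NULL);

    for (int attempt = 0; ; attempt++) {
        ssize_t n = write(c->fd, buf, len);
        if (n >= 0)
            return n;
        if (!is_recoverable(errno) || attempt >= c->max_retries)
            return -1;      /* errno still describes the write() failure */

        if (c->fd >= 0 && errno != EBADF)
            close(c->fd);   /* drop the broken descriptor */
        c->fd = -1;

        int fd = c->reconnect(c->ctx);
        if (fd < 0)
            return -1;      /* recovery itself failed */
        c->fd = fd;
    }
}

/* Demo recovery callback: "reconnect" by opening /dev/null and count calls. */
int reopen_devnull(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return open("/dev/null", O_WRONLY);
}
```

The caller keeps using `recovering_write()` where it would have called `write()`, and the recovery policy travels with the handle instead of being repeated at every call site.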
- Examples
- You'll find several sample code listings (with source)
that illustrate tasks such as restarting and heartbeating.
Since the examples deal with some typical
fault-recovery scenarios, you may be able to tailor
this source for your own HA applications.
For details, see the Examples appendix in this guide.