Glossary
- action
- A specific task the HAM will perform under certain
associated conditions. Examples of actions
include executing an external process, restarting a process
that has died, sending a signal or pulse notification, etc.
- availability
- The ability of a system to provide its intended service
without interruption for extended periods of time.
- clustering
- A method of distributing processing among several
computers in order to reduce the number of SPOFs.
QNX Neutrino native networking offers transparent network-wide
processing, which facilitates building clustered HA
applications.
- condition
- An event that will trigger certain actions
for the HAM to perform. Examples of conditions include
the death of an entity, a missed heartbeat, etc.
- entity
- A process that the HAM will monitor. Entities can
explicitly ask to be monitored (i.e., as
self-attached entities), or they may be monitored
without ever realizing it.
- five nines
- The celebrated availability metric that
refers to a system's ability to remain up and running
99.999% of the time, which allows only about five minutes
of downtime per year.
- Guardian
- The HAM's clone, a stand-in process that
the HAM creates to ensure uninterrupted HA management within
the QNX Neutrino environment.
- HAM
- High Availability Manager.
- heartbeat
- A wellness or liveness
notification sent at specific intervals by a client to the
HAM.
- hot swap
- The ability to remove or insert a component in a live system.
- MMU
- Memory Management Unit. A device on many CPUs that
translates virtual addresses to physical addresses and
alerts the OS if a process tries to access memory that it
hasn't been granted, such as memory that's been allocated
to another process.
- MTTF
- Mean Time To Failure. This is the average length of time
that the system will remain in service before failing. You
want this to be as long as possible.
- MTTR
- Mean Time To Repair. This is the average amount of time it
takes for the system to resume operation after any component
fails or is upgraded. You want this to be as short as possible.
- SPOF
- Single Point Of Failure. Any weak link in a system
is considered a SPOF, because its failure would put
the entire system at risk.
- watchdog
- A trusted piece of hardware whose main purpose is to
trigger code that will check the sanity of the system. There
are software watchdogs as well; the HAM may be considered a
smart watchdog.
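
The availability, five nines, MTTF, and MTTR entries above are
related by simple arithmetic, which the following minimal sketch
illustrates. It assumes the commonly used steady-state formula
availability = MTTF / (MTTF + MTTR), which this glossary doesn't
state itself. The function names and the sample figures are
hypothetical and chosen only for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time in service, assuming MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected minutes of downtime per year at the given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Downtime budget implied by five nines (99.999%).
    print(f"{downtime_minutes_per_year(0.99999):.2f} min/year")

    # Hypothetical system: fails on average every 10,000 hours and
    # takes 0.1 hours (6 minutes) to resume operation.
    a = availability(mttf_hours=10_000, mttr_hours=0.1)
    print(f"availability = {a:.6f}")
```

Under this formula, shortening the MTTR raises availability just
as lengthening the MTTF does, which is why both figures matter
when designing an HA system.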