Quality of Service (QoS) is an issue that often arises in high-availability networks as well as in realtime control systems. In the Qnet context, QoS really boils down to transmission media selection: in a system with two or more network interfaces, Qnet chooses which one to use, according to the policy you specify.
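Qnet provides the following policies that let you specify how it should select a network interface for transmission:
loadbalance (the default): Qnet is free to use all available links and shares transmissions among them.
preferred: Qnet uses one specified link, ignoring all others (unless the preferred link fails).
exclusive: Qnet uses one link only, ignoring all others, even if that link fails.
Let's look at them in more detail: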
Under the loadbalance policy, if a link does fail, Qnet switches to the next available link. By default, this switch takes a few seconds the first time, because the network driver on the bad link will have timed out, retried, and finally died. But once Qnet knows that a link is down, it doesn't send user data over that link.
The time required to switch to another link can be set to whatever is appropriate for your application using Qnet's command-line options; see the entry for lsm-qnet.so in the Utilities Reference.
Using these options, you can achieve redundant behavior by minimizing the latency of switching to another interface when one of the interfaces fails.
While load-balancing among the live links, Qnet sends periodic maintenance packets on the failed link in order to detect recovery. When the link recovers, Qnet places it back into the pool of available links.
Under the preferred policy, once your preferred link is available again, Qnet goes back to using only that link, ignoring all others (unless the preferred link fails).
Why would you want to use the exclusive policy? Suppose you have two networks, one much faster than the other, and you have an application that moves large amounts of data. You might want to restrict transmissions to only the fast network, in order to avoid swamping the slow network if the fast one fails.
You specify the QoS policy as part of the pathname. For example, to access /net/node1/dev/ser1 with a QoS of exclusive, you could use the following pathname:
/net/node1~exclusive:en0/dev/ser1
The QoS parameter always begins with a tilde (~) character. Here we're telling Qnet to lock onto the en0 interface exclusively, even if it fails.
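As a small illustration of the pathname syntax, the following shell sketch assembles a QoS-qualified pathname from its parts. The helper name qos_path is hypothetical (it isn't part of Qnet), and it assumes only the ~policy:interface form shown above.

```shell
# Hypothetical helper (not part of Qnet): build a QoS-qualified pathname
# of the form /net/<node>~<policy>:<interface><path>.
qos_path() {
    node=$1      # remote node name, e.g. node1
    policy=$2    # QoS policy, e.g. exclusive or preferred
    iface=$3     # network interface, e.g. en0
    path=$4      # path on the remote node, e.g. /dev/ser1
    printf '/net/%s~%s:%s%s\n' "$node" "$policy" "$iface" "$path"
}

# Prints the pathname used in the example above.
qos_path node1 exclusive en0 /dev/ser1
```

Keeping the policy and interface as separate arguments makes it easy to change the QoS for a resource without editing pathnames by hand.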
You can set up symbolic links to the various QoS-qualified pathnames:
ln -sP /net/node1~preferred:en1 /remote/sql_server
This assigns the abstract name /remote/sql_server to the node node1 with a QoS of preferred (i.e., traffic goes over the en1 link whenever it's available).
Abstracting the pathnames by one level of indirection lets you have multiple servers in a network, all providing the same service. When one server fails, the abstract pathname can be remapped to point to the pathname of a different server. For example, if node1 fails, a monitoring program could detect this and effectively issue:
rm /remote/sql_server
ln -sP /net/node2 /remote/sql_server
This removes node1 and reassigns the service to node2. The real advantage here is that applications can be coded based on the abstract service name rather than be bound to a specific node name.
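The remapping step could be wrapped in a small shell function, as in the sketch below. The name remap_service and the LN_FLAGS variable are assumptions made for illustration; the function simply repeats the rm and ln -sP commands shown above, and how the monitoring program detects the failure is left out.

```shell
# Sketch only: repoint an abstract service name at a different node.
# remap_service and LN_FLAGS are hypothetical names; LN_FLAGS defaults
# to the -sP flags used in the commands above.
remap_service() {
    link=$1      # abstract name, e.g. /remote/sql_server
    target=$2    # new target, e.g. /net/node2
    # Remove the old mapping, then create the new one.
    rm -f "$link" || return 1
    ln ${LN_FLAGS:--sP} "$target" "$link"
}
```

A monitoring program that decides node1 has failed could then run remap_service /remote/sql_server /net/node2. Note that the abstract name briefly doesn't exist between the two steps, so clients that open it at that moment may need to retry.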
For a real-world example of choosing an appropriate QoS policy in an application, see Designing a system using Qnet.