Reducing the frequency of guest-issued IPIs can improve the performance of the guest and of the overall system.
An inter-processor interrupt (IPI) between physical CPUs typically takes less than a microsecond when it is initiated by an operating system running directly on the hardware. While this overhead isn't extravagant, excessive use of IPIs can still affect system performance.
Just like any OS running directly on hardware with multiple physical CPUs, a guest OS running in a VM with multiple vCPUs may need to issue IPIs. From the perspective of the OS issuing the IPI, the behavior is exactly the same whether the OS is running directly on hardware or as a guest in a VM.
However, the overhead of an IPI issued by a guest OS in a VM is an order of magnitude greater than that of an IPI issued by an OS running directly on hardware. This cost is similar to the cost of a guest exit–entrance cycle, which typically takes 10 microseconds and sometimes longer.
Because a guest OS runs in a VM, its CPUs are in fact vCPUs (threads in the hosting qvm process instance). Thus, when a guest issues an IPI, the IPI source is a vCPU thread, and each IPI target is another vCPU thread.
The relatively high cost of guest-issued IPIs is due to the work required by the hypervisor to prepare and deliver these IPIs; that is, to the work that must be done by software rather than by hardware to deliver the IPI from its source vCPU thread to its target vCPU thread(s).
The hypervisor tasks described below prepare a guest-issued IPI for delivery. They are the same regardless of the board architecture or the state of the target vCPU thread.
When the guest OS issues an IPI, the hypervisor must:
From this point forward, the work required to deliver a guest-issued IPI to a vCPU is the same as for delivering any interrupt to a vCPU, regardless of the interrupt's source. This work differs according to the state of the target vCPU thread, and some boards support posted interrupts, which can reduce overhead.
If the target vCPU isn't executing guest code (the guest is stopped on an HLT or WFI instruction), then the host will see the target vCPU thread as stopped on a semaphore. In this case, assuming that the source vCPU thread has prepared the IPI for delivery:
If the target vCPU thread is executing:
However, note the following:
The costliest tasks in the preparation and delivery of a guest-issued IPI to its target vCPU(s) are:
Given the high cost of guest-issued IPIs, even on boards with posted interrupt support, reducing their frequency can improve both guest and overall system performance. This reduction can often be achieved by managing which CPUs (in fact, vCPUs) a guest application runs on; specifically, by binding the relevant processes in the guest to a single CPU:
With these configurations, the applications will behave as though they are running on a single-CPU system and won't issue IPIs that require hypervisor intervention to deliver.
You can bind guest processes to a vCPU just as you would bind a process to a physical CPU in a non-virtualized system; from the guest's perspective, the binding applies to a physical CPU. For a Linux guest, use the taskset command (see your Linux documentation). For a QNX Neutrino OS guest, use the on command with the -C option; for example:
on -C 1 foo
binds the program foo to CPU 1 (from the guest's perspective), which is in fact a qvm process vCPU thread (see the on utility in the QNX SDP Utilities Reference).
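If it is more convenient to set the binding from within the application than at launch time, both guest OSs also provide programmatic interfaces for CPU affinity. The following is a minimal sketch rather than a definitive implementation; the choice of CPU 1 and the bind_to_cpu1() helper are illustrative only, and the CPU number is again the guest's view of a CPU (that is, a vCPU):

/*
 * Minimal sketch (illustrative only): bind the calling thread to CPU 1
 * from inside the guest, instead of launching the program with "on -C 1"
 * (QNX Neutrino guest) or "taskset" (Linux guest). The CPU number is the
 * guest's view of a CPU, i.e., a vCPU thread of the hosting qvm process.
 */
#define _GNU_SOURCE               /* for sched_setaffinity() on Linux */
#include <stdio.h>

#if defined(__QNXNTO__)           /* QNX Neutrino guest */
#include <sys/neutrino.h>

static int bind_to_cpu1(void)
{
    /* Runmask bit N corresponds to CPU N; 0x2 selects CPU 1 only. */
    if (ThreadCtl(_NTO_TCTL_RUNMASK, (void *)0x2) == -1) {
        perror("ThreadCtl(_NTO_TCTL_RUNMASK)");
        return -1;
    }
    return 0;
}

#elif defined(__linux__)          /* Linux guest */
#include <sched.h>

static int bind_to_cpu1(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(1, &set);             /* allow CPU 1 only */
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
#endif

int main(void)
{
    if (bind_to_cpu1() != 0) {
        return 1;
    }
    /* ... application work now runs on CPU 1 (a single vCPU) only ... */
    return 0;
}

Either approach (launching the program with on -C or taskset, or setting the affinity programmatically) confines the application to a single vCPU, so it won't issue IPIs that require hypervisor intervention to deliver.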
In some cases, if you are running a single-threaded application, it may even prove advantageous to run that application in its own guest OS running in a VM on its own dedicated physical CPU.