One of the driving changes behind the embeddability differences between QNX 4 and QNX Neutrino is that QNX Neutrino supports ARM processors. Whereas QNX 4 was initially at home on an IBM PC with a BIOS and very standard hardware, QNX Neutrino is equally at home on multiple processor platforms, with or without a BIOS (or ROM monitor), and with customized hardware chosen by the manufacturer (often, it would appear, without regard for the requirements of the OS). This means that the microkernel had to provide for callouts, so that you could, for example, decide what kind of interrupt controller hardware you had and run on that hardware without having to buy a source license for the operating system.
Another change you'll notice when you port QNX 4 applications to QNX Neutrino, especially on these different processor platforms, is that the processors are fussy about alignment. You can't access an N-byte object at an address that isn't a multiple of N bytes. Under the x86 (with the alignment flag turned off), you could access memory willy-nilly. By modifying your code to have properly aligned structures (for non-x86 processors), you'll also find that your code runs faster on x86, because the x86 processor can access aligned data faster.
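To make the alignment rule concrete, here is a minimal sketch in portable C. The helper names load_u32_unaligned() and is_aligned() are hypothetical and are not part of any QNX library; the sketch only assumes a standard C compiler. It shows one common way to read a 32-bit value from a buffer whose address may not be a multiple of 4, instead of casting the pointer and dereferencing it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit value from any address, aligned or not.
 * Casting p to (uint32_t *) and dereferencing it can fault on processors
 * that enforce alignment; memcpy() makes no alignment assumption about p.
 */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);    /* copies sizeof v bytes into an aligned local */
    return v;
}

/*
 * Hypothetical helper: returns nonzero if p lies on an n-byte boundary.
 * n must be a power of two (1, 2, 4, 8, ...).
 */
static int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}
```

A typical use is pulling a field out of a received message at an odd offset, for example load_u32_unaligned(buf + 1). For structures you define yourself, ordering the members from largest to smallest is usually enough to keep every member naturally aligned without extra padding.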
Another thing that often comes to haunt people is the issue of big-endian versus little-endian. The x86 processor is a mono-endian processor (meaning it has only one endian-ness), and that's little-endian. ARM processors, for example, are bi-endian (meaning that the processor can operate in either big-endian or little-endian mode). Furthermore, these non-x86 processors are RISC (Reduced Instruction Set Computer) machines, meaning that certain operations, such as a simple C language |= (bitwise set operation), may or may not be performed in an atomic manner. This can have startling consequences! Look at the file <atomic.h> for a list of helper functions that ensure atomic operation.
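Both points can be illustrated with a short sketch. It is written against standard C11 rather than the QNX headers so that it compiles on any C11 toolchain: the atomic part uses atomic_fetch_or() from <stdatomic.h> as a stand-in for the helpers in <atomic.h>, which serve the same purpose on QNX Neutrino. The names get_be32(), put_be32(), status_flags, FLAG_READY, and mark_ready() are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode a 32-bit big-endian value from a byte buffer.
 * The result depends only on the byte order in the buffer, not on the
 * endian-ness of the processor running the code.
 */
static uint32_t get_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: encode v into the buffer in big-endian order. */
static void put_be32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Hypothetical flag bit and flags word shared between threads. */
#define FLAG_READY 0x01u
static atomic_uint status_flags;

/*
 * On a plain (non-atomic) unsigned, "flags |= FLAG_READY" may compile to a
 * separate load, OR, and store, so another thread can slip in between them.
 * atomic_fetch_or() performs the read-modify-write as one indivisible step.
 */
static void mark_ready(void)
{
    atomic_fetch_or(&status_flags, FLAG_READY);
}
```

Reading and writing multi-byte fields through helpers like these, rather than by overlaying a structure on the buffer, keeps file formats and wire protocols identical whichever endian mode the processor is running in.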