So who decides which thread is going to run at any given instant in time? That's the kernel's job.
The kernel determines which thread should be using the CPU at a particular moment, and switches context to that thread. Let's examine what the kernel does with the CPU.
The CPU has a number of registers (the exact number depends on the processor family, e.g., x86 versus ARM, and on the specific family member, e.g., Pentium). When a thread is running, its state is held in those registers (e.g., the current program location).
When the kernel decides that another thread should run, it needs to:
1. save the currently running thread's registers and other context information, and then
2. load the new thread's registers and context information into the CPU, so that the new thread resumes where it left off.
But how does the kernel decide that another thread should run? It looks at whether a particular thread is capable of using the CPU at this point. When we talked about mutexes, for example, we introduced a blocked state (this occurred when one thread owned the mutex and another thread wanted to acquire it; the second thread would be blocked).
From the kernel's perspective, therefore, we have one thread that can consume CPU, and one that can't, because it's blocked, waiting for a mutex. In this case, the kernel lets the thread that can run consume CPU, and puts the other thread into an internal list (so that the kernel can track its request for the mutex).
Obviously, that's not a very interesting situation. Suppose that a number of threads can use the CPU. Remember that we delegated access to the mutex based on priority and length of wait? The kernel uses a similar scheme to determine which thread is going to run next. There are two factors: priority and scheduling policy, evaluated in that order.