Now consider the case where the client can be made to heartbeat, so that a HAM automatically detects when the client becomes unresponsive and terminates it.
    Thread 1                    Thread 2
    ...                         ...
    while true                  while true
    do                          do
        obtain lock a               obtain lock b
        (compute section1)          (compute section1)
        obtain lock b               obtain lock a
        send heartbeat              send heartbeat
        (compute section2)          (compute section2)
        release lock b              release lock a
        release lock a              release lock b
    done                        done
    ...                         ...
Here the process is expected to send heartbeats to a HAM. Because the heartbeat call is placed inside the loop, after both locks have been obtained, a deadlocked thread never reaches it, and so the deadlock condition is trapped: the HAM notices that the heartbeats have stopped and can then perform recovery.
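The idea can be tried out away from the target with a minimal Python sketch. It is not the HAM API: the function `run_demo`, its parameters, and the in-process monitor loop are hypothetical stand-ins for the client and the HAM, and a barrier is used only to force the unlucky interleaving on the first pass so that the run is repeatable. Unlike a real HAM, the monitor here lives in the same process and merely reports the silence; it cannot terminate or restart anything.

```python
import threading
import time


def run_demo(period=0.05, high_mark=5, give_up_after=5.0):
    """Return True once `high_mark` heartbeat periods pass without a heartbeat.

    All names and values are illustrative; the run shown in the text used a
    1-second period with a high mark of 5.
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)   # forces the deadlock deterministically
    state_guard = threading.Lock()
    state = {"last_beat": time.monotonic()}  # the "attach" time counts as a beat

    def send_heartbeat():
        with state_guard:
            state["last_beat"] = time.monotonic()

    def worker(first, second):
        while True:
            first.acquire()
            both_hold_first.wait()   # stand-in for (compute section1)
            second.acquire()         # blocks forever: the other thread holds it
            send_heartbeat()         # never reached once the threads deadlock
            second.release()         # (compute section2) would go before this
            first.release()

    # Thread 1 takes a then b; thread 2 takes b then a. Daemon threads are used
    # so that the stuck workers do not keep the interpreter alive.
    for first, second in ((lock_a, lock_b), (lock_b, lock_a)):
        threading.Thread(target=worker, args=(first, second), daemon=True).start()

    # Stand-in for the HAM: poll once per period and measure the silence.
    deadline = time.monotonic() + give_up_after
    while time.monotonic() < deadline:
        time.sleep(period)
        with state_guard:
            silent_for = time.monotonic() - state["last_beat"]
        if silent_for >= high_mark * period:
            return True
    return False
```

Calling `run_demo()` returns `True` after roughly `high_mark * period` seconds, because neither worker ever gets as far as `send_heartbeat()`. Moving the heartbeat call outside the locked region (for example, into a separate timer thread) would hide the deadlock from the monitor, which is why its position inside the loop matters.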
Let's look at what happens now:
The threads will execute as described earlier, but will eventually deadlock. We'll wait a few seconds until they do. The threads write a simple execution log to /dev/shmem/mutex-deadlock-heartbeat.log. The HAM detects that the threads have stopped heartbeating and terminates the process, after saving its state for postmortem analysis.
Here's the current state of the threads in process 462866 and the state of mutex-deadlock when it missed heartbeats:
    pid    tid name               prio STATE    Blocked
    462866   1 oot/mutex-deadlock  10r MUTEX    462866-03 #-2147
    462866   2 oot/mutex-deadlock  63r RECEIVE  1
    462866   3 oot/mutex-deadlock  10r MUTEX    462866-01 #-2147

    Entity state from HAM

    Path            : mutex-deadlock
    Entity Pid      : 462866
    Num conditions  : 1
    Condition type  : ATTACHEDSELF
    Stats:
    HeartBeat Period: 1000000000
    HB Low Mark     : 5
    HB High Mark    : 5
    Last Heartbeat  : 2001/09/03 14:40:41:406575120
    HeartBeat State : MISSEDHIGH
    Created         : 2001/09/03 14:40:40:391615720
    Num Restarts    : 0
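To relate the numbers in the entity state to one another, here is a small arithmetic sketch in Python. It rests on an assumption that the output itself does not state: that the heartbeat period is given in nanoseconds and that the high mark is the number of heartbeat periods that may pass in silence before the state becomes MISSEDHIGH. The function name `missed_high_deadline` is hypothetical, and the result is only an estimate of when the HAM would have flagged the entity.

```python
from datetime import datetime, timedelta

# Values copied from the entity state above. Python datetimes carry
# microseconds, so the last three digits of the timestamp are dropped.
PERIOD_NS = 1_000_000_000
HIGH_MARK = 5
LAST_HEARTBEAT = datetime(2001, 9, 3, 14, 40, 41, 406575)


def missed_high_deadline(last_beat, period_ns, high_mark):
    """Estimate when `high_mark` whole periods have elapsed since `last_beat`.

    Assumes the mark counts missed periods; this is an illustration, not a
    statement of how the HAM schedules its checks.
    """
    return last_beat + timedelta(microseconds=period_ns * high_mark // 1000)


estimate = missed_high_deadline(LAST_HEARTBEAT, PERIOD_NS, HIGH_MARK)
print(estimate.isoformat())  # 2001-09-03T14:40:46.406575
```

Under that assumption the process, created at 14:40:40, sent its last heartbeat about a second later and would have been declared MISSEDHIGH about five seconds after that, which is consistent with the "few seconds" of waiting described earlier.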
And here's the tail from the threads' log file:
    Thread 2: Obtained lock b
    Thread 2: Waiting for lock a
    Thread 2: Obtained lock a
    Thread 2: Performing computation
    Thread 2: Unlocking lock a
    Thread 2: Unlocking lock b
    Thread 2: Obtained lock b
    Thread 2: Waiting for lock a
    Thread 1: Obtained lock a
    Thread 1: Waiting for lock b
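The deadlock can be read off the log tail by replaying it and tracking which thread holds each lock and which lock each thread is waiting for. The Python sketch below does that and then looks for a cycle in the resulting wait-for relation. The helper names `replay` and `deadlock_cycle` are hypothetical, and the parser assumes only the three message forms that appear in the tail above.

```python
import re

LOG_TAIL = """\
Thread 2: Obtained lock b
Thread 2: Waiting for lock a
Thread 2: Obtained lock a
Thread 2: Performing computation
Thread 2: Unlocking lock a
Thread 2: Unlocking lock b
Thread 2: Obtained lock b
Thread 2: Waiting for lock a
Thread 1: Obtained lock a
Thread 1: Waiting for lock b
"""

EVENT = re.compile(r"Thread (\d+): (Obtained|Waiting for|Unlocking) lock (\w+)")


def replay(log_text):
    """Return (owner, waiting): lock -> holding thread, thread -> wanted lock."""
    owner, waiting = {}, {}
    for line in log_text.splitlines():
        match = EVENT.match(line)
        if not match:
            continue  # e.g. "Performing computation"
        thread, action, lock = int(match.group(1)), match.group(2), match.group(3)
        if action == "Obtained":
            owner[lock] = thread
            waiting.pop(thread, None)   # the thread is no longer blocked
        elif action == "Waiting for":
            waiting[thread] = lock
        elif owner.get(lock) == thread:  # "Unlocking" by the current holder
            del owner[lock]
    return owner, waiting


def deadlock_cycle(owner, waiting):
    """Follow thread -> wanted lock -> holder; return the threads in a cycle."""
    for start in waiting:
        chain, thread = [], start
        while thread in waiting and thread not in chain:
            chain.append(thread)
            thread = owner.get(waiting[thread])
        if thread == start:
            return chain
    return None


owner, waiting = replay(LOG_TAIL)
print(deadlock_cycle(owner, waiting))  # [2, 1]
```

At the end of the replay, thread 1 holds lock a and wants lock b, while thread 2 holds lock b and wants lock a, so each is waiting on the other.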
    /tmp/mutex-deadlock.core:
     processor=PPC num_cpus=2
      cpu 1 cpu=602370 name=604e speed=299
       flags=0xc0000001 FPU MMU EAR
      cpu 2 cpu=602370 name=604e speed=299
       flags=0xc0000001 FPU MMU EAR
      cyc/sec=16666666 tod_adj=999522656000000000 nsec=5390696363520 inc=999960
      boot=999522656 epoch=1970 intr=-2147483648
      rate=600000024 scale=-16 load=16666
       MACHINE="mtx604-smp" HOSTNAME="localhost"
      hwflags=0x000004
      pretend_cpu=0 init_msr=36866
      pid=462866 parent=434193 child=0 pgrp=462866 sid=1
      flags=0x000300 umask=0 base_addr=0x48040000 init_stack=0x4803f9f0
      ruid=0 euid=0 suid=0  rgid=0 egid=0 sgid=0
      ign=0000000006801000 queue=ff00000000000000 pending=0000000000000000
      fds=5 threads=3 timers=1 chans=4
      thread 1 REQUESTED
       ip=0xfe32f838 sp=0x4803f8f0 stkbase=0x47fbf000 stksize=528384
       state=MUTEX flags=0 last_cpu=2 timeout=00000000
       pri=10 realpri=10 policy=RR
      thread 2
       ip=0xfe32f1a8 sp=0x47fbef50 stkbase=0x47f9e000 stksize=135168
       state=RECEIVE flags=4000000 last_cpu=2 timeout=00000000
       pri=63 realpri=63 policy=RR
       blocked_chid=1
      thread 3
       ip=0xfe32f838 sp=0x47f9df80 stkbase=0x47f7d000 stksize=135168
       state=MUTEX flags=4020000 last_cpu=1 timeout=00000000
       pri=10 realpri=10 policy=RR