| xj | b04a402 | 2021-11-25 15:01:52 +0800 | [diff] [blame] | 1 | Generic Mutex Subsystem | 
|  | 2 |  | 
|  | 3 | started by Ingo Molnar <mingo@redhat.com> | 
|  | 4 | updated by Davidlohr Bueso <davidlohr@hp.com> | 
|  | 5 |  | 
|  | 6 | What are mutexes? | 
|  | 7 | ----------------- | 
|  | 8 |  | 
|  | 9 | In the Linux kernel, mutexes refer to a particular locking primitive | 
|  | 10 | that enforces serialization on shared memory systems, and not only to | 
|  | 11 | the generic term referring to 'mutual exclusion' found in academia | 
|  | 12 | or similar theoretical text books. Mutexes are sleeping locks which | 
|  | 13 | behave similarly to binary semaphores, and were introduced in 2006[1] | 
|  | 14 | as an alternative to these. This new data structure provided a number | 
|  | 15 | of advantages, including simpler interfaces, and at that time smaller | 
|  | 16 | code (see Disadvantages). | 
|  | 17 |  | 
|  | 18 | [1] http://lwn.net/Articles/164802/ | 
|  | 19 |  | 
|  | 20 | Implementation | 
|  | 21 | -------------- | 
|  | 22 |  | 
|  | 23 | Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h | 
|  | 24 | and implemented in kernel/locking/mutex.c. These locks use an atomic variable | 
|  | 25 | (->owner) to keep track of the lock state during its lifetime.  Field owner | 
|  | 26 | actually contains 'struct task_struct *' to the current lock owner and it is | 
|  | 27 | therefore NULL if not currently owned. Since task_struct pointers are aligned | 
|  | 28 | at at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g., | 
|  | 29 | if waiter list is non-empty).  In its most basic form it also includes a | 
|  | 30 | wait-queue and a spinlock that serializes access to it. Furthermore, | 
|  | 31 | CONFIG_MUTEX_SPIN_ON_OWNER=y systems use a spinner MCS lock (->osq), described | 
|  | 32 | below in (ii). | 
|  | 33 |  | 
|  | 34 | When acquiring a mutex, there are three possible paths that can be | 
|  | 35 | taken, depending on the state of the lock: | 
|  | 36 |  | 
|  | 37 | (i) fastpath: tries to atomically acquire the lock by cmpxchg()ing the owner with | 
|  | 38 | the current task. This only works in the uncontended case (cmpxchg() checks | 
|  | 39 | against 0UL, so all 3 state bits above have to be 0). If the lock is | 
|  | 40 | contended it goes to the next possible path. | 
|  | 41 |  | 
|  | 42 | (ii) midpath: aka optimistic spinning, tries to spin for acquisition | 
|  | 43 | while the lock owner is running and there are no other tasks ready | 
|  | 44 | to run that have higher priority (need_resched). The rationale is | 
|  | 45 | that if the lock owner is running, it is likely to release the lock | 
|  | 46 | soon. The mutex spinners are queued up using MCS lock so that only | 
|  | 47 | one spinner can compete for the mutex. | 
|  | 48 |  | 
|  | 49 | The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock | 
|  | 50 | with the desirable properties of being fair and with each cpu trying | 
|  | 51 | to acquire the lock spinning on a local variable. It avoids expensive | 
|  | 52 | cacheline bouncing that common test-and-set spinlock implementations | 
|  | 53 | incur. An MCS-like lock is specially tailored for optimistic spinning | 
|  | 54 | for sleeping lock implementation. An important feature of the customized | 
|  | 55 | MCS lock is that it has the extra property that spinners are able to exit | 
|  | 56 | the MCS spinlock queue when they need to reschedule. This further helps | 
|  | 57 | avoid situations where MCS spinners that need to reschedule would continue | 
|  | 58 | waiting to spin on mutex owner, only to go directly to slowpath upon | 
|  | 59 | obtaining the MCS lock. | 
|  | 60 |  | 
|  | 61 |  | 
|  | 62 | (iii) slowpath: last resort, if the lock is still unable to be acquired, | 
|  | 63 | the task is added to the wait-queue and sleeps until woken up by the | 
|  | 64 | unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE. | 
|  | 65 |  | 
|  | 66 | While formally kernel mutexes are sleepable locks, it is path (ii) that | 
|  | 67 | makes them more practically a hybrid type. By simply not interrupting a | 
|  | 68 | task and busy-waiting for a few cycles instead of immediately sleeping, | 
|  | 69 | the performance of this lock has been seen to significantly improve a | 
|  | 70 | number of workloads. Note that this technique is also used for rw-semaphores. | 
|  | 71 |  | 
|  | 72 | Semantics | 
|  | 73 | --------- | 
|  | 74 |  | 
|  | 75 | The mutex subsystem checks and enforces the following rules: | 
|  | 76 |  | 
|  | 77 | - Only one task can hold the mutex at a time. | 
|  | 78 | - Only the owner can unlock the mutex. | 
|  | 79 | - Multiple unlocks are not permitted. | 
|  | 80 | - Recursive locking/unlocking is not permitted. | 
|  | 81 | - A mutex must only be initialized via the API (see below). | 
|  | 82 | - A task may not exit with a mutex held. | 
|  | 83 | - Memory areas where held locks reside must not be freed. | 
|  | 84 | - Held mutexes must not be reinitialized. | 
|  | 85 | - Mutexes may not be used in hardware or software interrupt | 
|  | 86 | contexts such as tasklets and timers. | 
|  | 87 |  | 
|  | 88 | These semantics are fully enforced when CONFIG DEBUG_MUTEXES is enabled. | 
|  | 89 | In addition, the mutex debugging code also implements a number of other | 
|  | 90 | features that make lock debugging easier and faster: | 
|  | 91 |  | 
|  | 92 | - Uses symbolic names of mutexes, whenever they are printed | 
|  | 93 | in debug output. | 
|  | 94 | - Point-of-acquire tracking, symbolic lookup of function names, | 
|  | 95 | list of all locks held in the system, printout of them. | 
|  | 96 | - Owner tracking. | 
|  | 97 | - Detects self-recursing locks and prints out all relevant info. | 
|  | 98 | - Detects multi-task circular deadlocks and prints out all affected | 
|  | 99 | locks and tasks (and only those tasks). | 
|  | 100 |  | 
|  | 101 |  | 
|  | 102 | Interfaces | 
|  | 103 | ---------- | 
|  | 104 | Statically define the mutex: | 
|  | 105 | DEFINE_MUTEX(name); | 
|  | 106 |  | 
|  | 107 | Dynamically initialize the mutex: | 
|  | 108 | mutex_init(mutex); | 
|  | 109 |  | 
|  | 110 | Acquire the mutex, uninterruptible: | 
|  | 111 | void mutex_lock(struct mutex *lock); | 
|  | 112 | void mutex_lock_nested(struct mutex *lock, unsigned int subclass); | 
|  | 113 | int  mutex_trylock(struct mutex *lock); | 
|  | 114 |  | 
|  | 115 | Acquire the mutex, interruptible: | 
|  | 116 | int mutex_lock_interruptible_nested(struct mutex *lock, | 
|  | 117 | unsigned int subclass); | 
|  | 118 | int mutex_lock_interruptible(struct mutex *lock); | 
|  | 119 |  | 
|  | 120 | Acquire the mutex, interruptible, if dec to 0: | 
|  | 121 | int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); | 
|  | 122 |  | 
|  | 123 | Unlock the mutex: | 
|  | 124 | void mutex_unlock(struct mutex *lock); | 
|  | 125 |  | 
|  | 126 | Test if the mutex is taken: | 
|  | 127 | int mutex_is_locked(struct mutex *lock); | 
|  | 128 |  | 
|  | 129 | Disadvantages | 
|  | 130 | ------------- | 
|  | 131 |  | 
|  | 132 | Unlike its original design and purpose, 'struct mutex' is among the largest | 
|  | 133 | locks in the kernel. E.g: on x86-64 it is 32 bytes, where 'struct semaphore' | 
|  | 134 | is 24 bytes and rw_semaphore is 40 bytes. Larger structure sizes mean more CPU | 
|  | 135 | cache and memory footprint. | 
|  | 136 |  | 
|  | 137 | When to use mutexes | 
|  | 138 | ------------------- | 
|  | 139 |  | 
|  | 140 | Unless the strict semantics of mutexes are unsuitable and/or the critical | 
|  | 141 | region prevents the lock from being shared, always prefer them to any other | 
|  | 142 | locking primitive. |