lh | 9ed821d | 2023-04-07 01:36:19 -0700 | [diff] [blame^] | 1 | Using the Linux Kernel Latency Histograms |
| 2 | |
| 3 | |
| 4 | This document gives a short explanation how to enable, configure and use |
| 5 | latency histograms. Latency histograms are primarily relevant in the |
| 6 | context of real-time enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT) |
| 7 | and are used in the quality management of the Linux real-time |
| 8 | capabilities. |
| 9 | |
| 10 | |
| 11 | * Purpose of latency histograms |
| 12 | |
| 13 | A latency histogram continuously accumulates the frequencies of latency |
| 14 | data. There are two types of histograms |
| 15 | - potential sources of latencies |
| 16 | - effective latencies |
| 17 | |
| 18 | |
| 19 | * Potential sources of latencies |
| 20 | |
| 21 | Potential sources of latencies are code segments where interrupts, |
| 22 | preemption or both are disabled (aka critical sections). To create |
| 23 | histograms of potential sources of latency, the kernel stores the time |
| 24 | stamp at the start of a critical section, determines the time elapsed |
| 25 | when the end of the section is reached, and increments the frequency |
| 26 | counter of that latency value - irrespective of whether any concurrently |
| 27 | running process is affected by latency or not. |
| 28 | - Configuration items (in the Kernel hacking/Tracers submenu) |
| 29 | CONFIG_INTERRUPT_OFF_LATENCY |
| 30 | CONFIG_PREEMPT_OFF_LATENCY |
| 31 | |
| 32 | |
| 33 | * Effective latencies |
| 34 | |
| 35 | Effective latencies are actually occuring during wakeup of a process. To |
| 36 | determine effective latencies, the kernel stores the time stamp when a |
| 37 | process is scheduled to be woken up, and determines the duration of the |
| 38 | wakeup time shortly before control is passed over to this process. Note |
| 39 | that the apparent latency in user space may be somewhat longer, since the |
| 40 | process may be interrupted after control is passed over to it but before |
| 41 | the execution in user space takes place. Simply measuring the interval |
| 42 | between enqueuing and wakeup may also not appropriate in cases when a |
| 43 | process is scheduled as a result of a timer expiration. The timer may have |
| 44 | missed its deadline, e.g. due to disabled interrupts, but this latency |
| 45 | would not be registered. Therefore, the offsets of missed timers are |
| 46 | recorded in a separate histogram. If both wakeup latency and missed timer |
| 47 | offsets are configured and enabled, a third histogram may be enabled that |
| 48 | records the overall latency as a sum of the timer latency, if any, and the |
| 49 | wakeup latency. This histogram is called "timerandwakeup". |
| 50 | - Configuration items (in the Kernel hacking/Tracers submenu) |
| 51 | CONFIG_WAKEUP_LATENCY |
| 52 | CONFIG_MISSED_TIMER_OFSETS |
| 53 | |
| 54 | |
| 55 | * Usage |
| 56 | |
| 57 | The interface to the administration of the latency histograms is located |
| 58 | in the debugfs file system. To mount it, either enter |
| 59 | |
| 60 | mount -t sysfs nodev /sys |
| 61 | mount -t debugfs nodev /sys/kernel/debug |
| 62 | |
| 63 | from shell command line level, or add |
| 64 | |
| 65 | nodev /sys sysfs defaults 0 0 |
| 66 | nodev /sys/kernel/debug debugfs defaults 0 0 |
| 67 | |
| 68 | to the file /etc/fstab. All latency histogram related files are then |
| 69 | available in the directory /sys/kernel/debug/tracing/latency_hist. A |
| 70 | particular histogram type is enabled by writing non-zero to the related |
| 71 | variable in the /sys/kernel/debug/tracing/latency_hist/enable directory. |
| 72 | Select "preemptirqsoff" for the histograms of potential sources of |
| 73 | latencies and "wakeup" for histograms of effective latencies etc. The |
| 74 | histogram data - one per CPU - are available in the files |
| 75 | |
| 76 | /sys/kernel/debug/tracing/latency_hist/preemptoff/CPUx |
| 77 | /sys/kernel/debug/tracing/latency_hist/irqsoff/CPUx |
| 78 | /sys/kernel/debug/tracing/latency_hist/preemptirqsoff/CPUx |
| 79 | /sys/kernel/debug/tracing/latency_hist/wakeup/CPUx |
| 80 | /sys/kernel/debug/tracing/latency_hist/wakeup/sharedprio/CPUx |
| 81 | /sys/kernel/debug/tracing/latency_hist/missed_timer_offsets/CPUx |
| 82 | /sys/kernel/debug/tracing/latency_hist/timerandwakeup/CPUx |
| 83 | |
| 84 | The histograms are reset by writing non-zero to the file "reset" in a |
| 85 | particular latency directory. To reset all latency data, use |
| 86 | |
| 87 | #!/bin/sh |
| 88 | |
| 89 | TRACINGDIR=/sys/kernel/debug/tracing |
| 90 | HISTDIR=$TRACINGDIR/latency_hist |
| 91 | |
| 92 | if test -d $HISTDIR |
| 93 | then |
| 94 | cd $HISTDIR |
| 95 | for i in `find . | grep /reset$` |
| 96 | do |
| 97 | echo 1 >$i |
| 98 | done |
| 99 | fi |
| 100 | |
| 101 | |
| 102 | * Data format |
| 103 | |
| 104 | Latency data are stored with a resolution of one microsecond. The |
| 105 | maximum latency is 10,240 microseconds. The data are only valid, if the |
| 106 | overflow register is empty. Every output line contains the latency in |
| 107 | microseconds in the first row and the number of samples in the second |
| 108 | row. To display only lines with a positive latency count, use, for |
| 109 | example, |
| 110 | |
| 111 | grep -v " 0$" /sys/kernel/debug/tracing/latency_hist/preemptoff/CPU0 |
| 112 | |
| 113 | #Minimum latency: 0 microseconds. |
| 114 | #Average latency: 0 microseconds. |
| 115 | #Maximum latency: 25 microseconds. |
| 116 | #Total samples: 3104770694 |
| 117 | #There are 0 samples greater or equal than 10240 microseconds |
| 118 | #usecs samples |
| 119 | 0 2984486876 |
| 120 | 1 49843506 |
| 121 | 2 58219047 |
| 122 | 3 5348126 |
| 123 | 4 2187960 |
| 124 | 5 3388262 |
| 125 | 6 959289 |
| 126 | 7 208294 |
| 127 | 8 40420 |
| 128 | 9 4485 |
| 129 | 10 14918 |
| 130 | 11 18340 |
| 131 | 12 25052 |
| 132 | 13 19455 |
| 133 | 14 5602 |
| 134 | 15 969 |
| 135 | 16 47 |
| 136 | 17 18 |
| 137 | 18 14 |
| 138 | 19 1 |
| 139 | 20 3 |
| 140 | 21 2 |
| 141 | 22 5 |
| 142 | 23 2 |
| 143 | 25 1 |
| 144 | |
| 145 | |
| 146 | * Wakeup latency of a selected process |
| 147 | |
| 148 | To only collect wakeup latency data of a particular process, write the |
| 149 | PID of the requested process to |
| 150 | |
| 151 | /sys/kernel/debug/tracing/latency_hist/wakeup/pid |
| 152 | |
| 153 | PIDs are not considered, if this variable is set to 0. |
| 154 | |
| 155 | |
| 156 | * Details of the process with the highest wakeup latency so far |
| 157 | |
| 158 | Selected data of the process that suffered from the highest wakeup |
| 159 | latency that occurred in a particular CPU are available in the file |
| 160 | |
| 161 | /sys/kernel/debug/tracing/latency_hist/wakeup/max_latency-CPUx. |
| 162 | |
| 163 | In addition, other relevant system data at the time when the |
| 164 | latency occurred are given. |
| 165 | |
| 166 | The format of the data is (all in one line): |
| 167 | <PID> <Priority> <Latency> (<Timeroffset>) <Command> \ |
| 168 | <- <PID> <Priority> <Command> <Timestamp> |
| 169 | |
| 170 | The value of <Timeroffset> is only relevant in the combined timer |
| 171 | and wakeup latency recording. In the wakeup recording, it is |
| 172 | always 0, in the missed_timer_offsets recording, it is the same |
| 173 | as <Latency>. |
| 174 | |
| 175 | When retrospectively searching for the origin of a latency and |
| 176 | tracing was not enabled, it may be helpful to know the name and |
| 177 | some basic data of the task that (finally) was switching to the |
| 178 | late real-tlme task. In addition to the victim's data, also the |
| 179 | data of the possible culprit are therefore displayed after the |
| 180 | "<-" symbol. |
| 181 | |
| 182 | Finally, the timestamp of the time when the latency occurred |
| 183 | in <seconds>.<microseconds> after the most recent system boot |
| 184 | is provided. |
| 185 | |
| 186 | These data are also reset when the wakeup histogram is reset. |