Introduction:
-------------

The hwlat_detector module is a special-purpose kernel module used to detect
large system latencies induced by the behavior of certain underlying hardware
or firmware, independent of Linux itself. The code was originally developed to
detect SMIs (System Management Interrupts) on x86 systems; however, nothing in
this patchset is x86-specific. It was first written for use with the "RT"
(real-time) patch, because the real-time kernel is highly latency sensitive.

SMIs are usually not serviced by the Linux kernel, which typically does not
even know that they are occurring. Instead, SMIs are set up and serviced by
BIOS code, usually for "critical" events such as management of thermal sensors
and fans. Sometimes, though, SMIs are used for other tasks, and those tasks can
spend an inordinate amount of time in the handler (sometimes measured in
milliseconds). This is a problem if you are trying to keep event service
latencies down in the microsecond range.

The hardware latency detector works by hogging all of the CPUs for a
configurable amount of time (by calling stop_machine()), polling the CPU Time
Stamp Counter (TSC) for some period, and then looking for gaps in the TSC data.
Any gap indicates a time when the polling was interrupted. Since the machine is
stopped and interrupts are turned off, the only thing that could cause such a
gap would be an SMI.

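The gap-detection idea can be illustrated in user space with the Python sketch
below. It is only an analogue of what the module does: user space cannot call
stop_machine() or turn off interrupts, so gaps seen here also include
preemption, ordinary interrupts and interpreter overhead, and
time.monotonic_ns() stands in for reading the TSC. The names max_gap_ns and
sample_max_gap_ns are hypothetical, not part of hwlat_detector.

```python
import time


def max_gap_ns(timestamps):
    """Return the largest difference between consecutive timestamps (0 if fewer than two)."""
    return max((b - a for a, b in zip(timestamps, timestamps[1:])), default=0)


def sample_max_gap_ns(width_ns, clock=time.monotonic_ns):
    """Poll `clock` back to back for at least width_ns nanoseconds and return the
    largest gap between consecutive reads.

    Unlike the kernel module, this runs with interrupts enabled and without
    stop_machine(), so a gap may come from preemption or interrupts, not only SMIs.
    """
    start = prev = clock()
    largest = 0
    while True:
        now = clock()
        largest = max(largest, now - prev)  # gap since the previous read
        prev = now
        if now - start >= width_ns:         # sampling width has elapsed
            return largest
```

For example, sample_max_gap_ns(500_000 * 1000) polls for 0.5 s and returns the
largest gap in nanoseconds. The optional clock argument lets the loop be
exercised with a fake clock.
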
Note that the SMI detector should *NEVER* be used in a production environment.
It is intended to be run manually to determine if the hardware platform has a
problem with long system firmware service routines.

Usage:
------

The only step required to start the detector is to load the hwlat_detector
module with the parameter "enabled=1" (or, once it is loaded, to write 1 to
the "enable" file in its "hwlat_detector" debugfs directory). The threshold,
in microseconds (us), above which latency spikes are recorded can be changed
with the "threshold=" parameter.

Example:

    # modprobe hwlat_detector enabled=1 threshold=100

Once loaded, the module creates a directory named "hwlat_detector" under the
debugfs mount point; this text refers to it as "/debug/hwlat_detector". debugfs
must be mounted for this to work; on many systems it is mounted at
/sys/kernel/debug.

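The debugfs files can also be driven from a script. The sketch below assumes
debugfs is mounted at /sys/kernel/debug, that the files accept and return plain
decimal integers, and that it runs as root with the module loaded. write_knob
and read_knob are hypothetical helper names.

```python
from pathlib import Path

# Assumption: debugfs is mounted at /sys/kernel/debug; adjust for your system.
HWLAT_DIR = Path("/sys/kernel/debug/hwlat_detector")


def write_knob(name, value, base=HWLAT_DIR):
    """Write an integer to one of the detector's debugfs files (needs root).

    Assumes the file accepts a decimal integer followed by a newline, as `echo` would write.
    """
    (base / name).write_text(f"{value}\n")


def read_knob(name, base=HWLAT_DIR):
    """Read an integer back from one of the detector's debugfs files (assumed format)."""
    return int((base / name).read_text().split()[0])
```

For example, write_knob("threshold", 100) followed by write_knob("enable", 1)
would mirror the modprobe example, assuming both files are writable, and
read_knob("count") reads back the number of spikes seen so far.
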
The /debug/hwlat_detector interface contains the following files:

    count     - number of latency spikes observed since the last reset
    enable    - global enable/disable toggle (0/1); resets count
    max       - maximum hardware latency actually observed (usecs)
    sample    - a pipe from which to read current raw sample data,
                in the format <timestamp> <latency observed usecs>
                (can be opened O_NONBLOCK to read a single sample)
    threshold - minimum latency value to be considered (usecs)
    width     - time period to sample with CPUs held (usecs);
                must be less than the total window size (enforced)
    window    - total sampling period, within which width falls (usecs)

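Reading a single sample through the O_NONBLOCK open mentioned for the "sample"
file might look like the sketch below. It assumes the /sys/kernel/debug mount
point and that each line holds a timestamp and a latency separated by
whitespace; because the exact timestamp format is not described here, it is
returned as a string. It also assumes that an empty pipe opened with O_NONBLOCK
either returns no data or fails with EAGAIN. parse_sample and read_one_sample
are hypothetical names.

```python
import os
from pathlib import Path

# Assumption: debugfs is mounted at /sys/kernel/debug; adjust for your system.
SAMPLE_PATH = Path("/sys/kernel/debug/hwlat_detector/sample")


def parse_sample(line):
    """Split a '<timestamp> <latency observed usecs>' line into (str, int).

    The timestamp is kept as a string because its exact format is not specified.
    """
    timestamp, latency = line.split()
    return timestamp, int(latency)


def read_one_sample(path=SAMPLE_PATH):
    """Open the sample pipe with O_NONBLOCK and return one parsed sample,
    or None if nothing is available without blocking."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        data = os.read(fd, 4096)
    except BlockingIOError:
        return None  # assumed: an empty pipe reports EAGAIN instead of blocking
    finally:
        os.close(fd)
    lines = data.decode().splitlines()
    return parse_sample(lines[0]) if lines else None
```

Only the first line of whatever the read returns is parsed, which matches the
"single sample" use described in the list.
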
By default, width is 500,000 and window is 1,000,000, meaning that we sample
for 500,000 usecs (0.5 s) out of every 1,000,000 usecs (1 s). Any latency that
exceeds the threshold (initially 100 usecs) is written to a global sample ring
buffer of 8K samples, which is consumed by reading from the "sample" (pipe)
debugfs file.
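
The defaults and the ring buffer described above can be modeled in a few lines.
This is a hypothetical model, not the module's code: it assumes "8K" means 8192
entries, that the oldest sample is dropped when the buffer is full, and that
count and max are updated only for samples above the threshold.

```python
from collections import deque

RING_SIZE = 8 * 1024  # assumption: "8K samples" taken to mean 8192 entries


def duty_cycle(width_us, window_us):
    """Enforce width < window, as the "width" file does, and return the
    fraction of each window spent sampling."""
    if width_us >= window_us:
        raise ValueError("width must be less than window")
    return width_us / window_us


class SampleRing:
    """Hypothetical model of the global sample ring buffer.

    Assumptions: only samples above the threshold are stored, count and max
    track only those samples, and the oldest entry is dropped when full.
    """

    def __init__(self, threshold_us=100, size=RING_SIZE):
        self.threshold_us = threshold_us
        self.samples = deque(maxlen=size)  # deque drops the oldest item when full
        self.count = 0  # like the "count" file
        self.max = 0    # like the "max" file

    def record(self, timestamp, latency_us):
        """Store a sample if it exceeds the threshold; return whether it was stored."""
        if latency_us <= self.threshold_us:
            return False
        self.samples.append((timestamp, latency_us))
        self.count += 1
        self.max = max(self.max, latency_us)
        return True

    def read(self):
        """Consume the oldest stored sample, like a read from the "sample" pipe."""
        return self.samples.popleft() if self.samples else None
```

With the defaults, duty_cycle(500_000, 1_000_000) returns 0.5. A deque with
maxlen gives the drop-oldest behavior for free; the real module's overflow
policy may differ.
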