perf-trace(1)
=============

NAME
----
perf-trace - strace inspired tool

SYNOPSIS
--------
[verse]
'perf trace'
'perf trace record'

DESCRIPTION
-----------
This command shows the events associated with the target: initially
syscalls, but also other system events such as pagefaults, task lifetime
events, scheduling events, etc.

This is a live mode tool that, like the other perf tools, also works with
perf.data files. Such files can be generated using the 'perf record' command,
but the session needs to include the raw_syscalls events (-e 'raw_syscalls:*').
Alternatively, 'perf trace record' can be used as a shortcut to
automatically include the raw_syscalls events when writing events to a file.
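
The two ways of producing a file described above could look like the
following sketch. The function names, the file name argument and the
'sleep 1' workload are placeholders chosen for illustration. Passing -o to
'perf trace record' assumes the option is forwarded to 'perf record'.

```shell
# Variant 1: record explicitly with 'perf record', naming the raw_syscalls events.
record_explicit() {
    perf record -e 'raw_syscalls:*' -o "$1" -- sleep 1
}

# Variant 2: let 'perf trace record' add the raw_syscalls events itself.
record_shortcut() {
    perf trace record -o "$1" -- sleep 1
}

# Replay a previously written file instead of tracing live.
replay() {
    perf trace -i "$1"
}
```

For example, `record_shortcut trace.data && replay trace.data`. Recording
usually requires root or a permissive kernel.perf_event_paranoid setting.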

The following options apply to perf trace; options to perf trace record are
found in the perf record man page.

OPTIONS
-------

-a::
--all-cpus::
System-wide collection from all CPUs.

-e::
--expr::
--event::
List of syscalls and other perf events (tracepoints, HW cache events,
etc.) to show. Globbing is supported, e.g.: "epoll_*", "*msg*", etc.
See 'perf list' for a complete list of events.
Prefixing the list with ! shows all syscalls except the ones specified.
You may need to escape the ! so that your shell does not interpret it.
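
As a sketch of the escaping mentioned above: single quotes keep the shell
from expanding both the ! and the glob characters, so perf trace receives
the pattern unchanged. The syscall names are only examples, and the perf
invocations are left as comments because they need perf and suitable
privileges.

```shell
# Single quotes protect '!' (history expansion in interactive shells)
# and '*' (pathname expansion) from the shell.
negated='!close'      # all syscalls except close
globbed='epoll_*'     # every syscall whose name starts with epoll_

# Show exactly what perf trace would receive as the -e argument.
printf '%s\n' "$negated" "$globbed"

# Typical use (not run here):
#   perf trace -e '!close' -- ls
#   perf trace -e 'epoll_*' -a
```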

-D msecs::
--delay msecs::
After starting the program, wait msecs before measuring. This is useful to
filter out the startup phase of the program, which is often very different.

-o::
--output=::
Output file name.

-p::
--pid=::
Record events on existing process ID (comma separated list).

-t::
--tid=::
Record events on existing thread ID (comma separated list).

-u::
--uid=::
Record events in threads owned by uid. Name or number.

--filter-pids=::
Filter out events for these pids and for 'trace' itself (comma separated list).

-v::
--verbose=::
Verbosity level.

--no-inherit::
Child tasks do not inherit counters.

-m::
--mmap-pages=::
Number of mmap data pages (must be a power of two) or size
specification with appended unit character - B/K/M/G. The
size is rounded up to the nearest power-of-two number of pages.
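
The rounding can be illustrated with a little shell arithmetic. This is a
sketch of the rule as stated above, not perf's actual code; the function
name pages_for and the 4096-byte default page size are assumptions (check
`getconf PAGESIZE` on the target system).

```shell
# pages_for BYTES [PAGE_SIZE]: number of pages after rounding the
# requested size up to a power-of-two count of whole pages.
pages_for() {
    bytes=$1
    page=${2:-4096}
    # Whole pages needed to hold the request (ceiling division).
    need=$(( (bytes + page - 1) / page ))
    pages=1
    while [ "$pages" -lt "$need" ]; do
        pages=$(( pages * 2 ))
    done
    echo "$pages"
}

pages_for $(( 10 * 1024 ))     # 10K needs 3 pages, prints 4
pages_for $(( 1024 * 1024 ))   # 1M is already a power of two, prints 256
```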

-C::
--cpu::
Collect samples only on the list of CPUs provided. Multiple CPUs can be provided as a
comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
In per-thread mode with inheritance mode on (default), events are captured only when
the thread executes on the designated CPUs. Default is to monitor all CPUs.
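
To make the list syntax concrete, the sketch below expands a CPU list of
the form described above into individual CPU numbers. expand_cpus is a
hypothetical helper for illustration only and does no validation; perf
parses the list itself.

```shell
# expand_cpus LIST: print the CPUs named by a list such as "0,2-4".
expand_cpus() {
    out=
    old_ifs=$IFS
    IFS=,
    for item in $1; do
        case $item in
            *-*) first=${item%-*}; last=${item#*-} ;;   # range: first-last
            *)   first=$item; last=$item ;;             # single CPU
        esac
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            out="$out${out:+ }$cpu"
            cpu=$(( cpu + 1 ))
        done
    done
    IFS=$old_ifs
    echo "$out"
}

expand_cpus 0,1      # prints: 0 1
expand_cpus 0-2      # prints: 0 1 2
expand_cpus 0,4-6    # prints: 0 4 5 6
```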

--duration::
Show only events that had a duration greater than N.M ms.

--sched::
Accrue thread runtime and provide a summary at the end of the session.

-i::
--input::
Process events from a given perf data file.

-T::
--time::
Print the full timestamp rather than the time relative to the first sample.

--comm::
Show process COMM right beside its ID, on by default, disable with --no-comm.

-s::
--summary::
Show only a summary of syscalls by thread with min, max, and average times
(in msec) and relative stddev.

-S::
--with-summary::
Show all syscalls followed by a summary by thread with min, max, and
average times (in msec) and relative stddev.

--tool_stats::
Show tool stats such as the number of times fd->pathname was discovered through
hooking the open syscall return + vfs_getname or via reading /proc/pid/fd, etc.

-F=[all|min|maj]::
--pf=[all|min|maj]::
Trace pagefaults. Optionally, you can specify whether you want minor,
major or all pagefaults. Default value is maj.

--syscalls::
Trace system calls. This option is enabled by default, disable with
--no-syscalls.

--call-graph [mode,type,min[,limit],order[,key][,branch]]::
Set up and enable call-graph (stack chain/backtrace) recording.
See the `--call-graph` section in the perf-record and perf-report
man pages for details. The modes most useful in 'perf trace'
are 'dwarf' and 'lbr', where available. Try: 'perf trace --call-graph dwarf'.

Using this will, for the root user, bump the value of --mmap-pages to 4
times the maximum for non-root users, based on the kernel.perf_event_mlock_kb
sysctl. This is done only if the user doesn't specify a --mmap-pages value.

--kernel-syscall-graph::
Show the kernel callchains on the syscall exit path.

--max-stack::
Set the stack depth limit when parsing the callchain; anything
beyond the specified depth will be ignored. Note that at this point
this only affects presentation, i.e. the kernel is still not limiting
the depth; the overhead of callchains needs to be controlled via the
knobs in --call-graph dwarf.

Implies '--call-graph dwarf' when --call-graph is not present on the
command line, on systems where DWARF unwinding was built in.

Default: /proc/sys/kernel/perf_event_max_stack when present for
live sessions (without --input/-i), 127 otherwise.

--min-stack::
Set the stack depth limit when parsing the callchain; anything
below the specified depth will be ignored. Disabled by default.

Implies '--call-graph dwarf' when --call-graph is not present on the
command line, on systems where DWARF unwinding was built in.

--proc-map-timeout::
When processing pre-existing threads, reading /proc/XXX/maps may take a long
time because the file may be huge. A timeout is needed in such cases.
This option sets the timeout limit. The default value is 500 ms.

PAGEFAULTS
----------

When tracing pagefaults, the format of the trace is as follows:

  <min|maj>fault [<ip.symbol>+<ip.offset>] => <addr.dso@addr.offset> (<map type><addr level>).

- min/maj indicates whether the fault event is minor or major;
- ip.symbol shows the symbol for the instruction pointer (the code that generated the
  fault); if no debug symbols are available, perf trace will print the raw IP;
- addr.dso shows the DSO for the faulted address;
- map type is either 'd' for non-executable maps or 'x' for executable maps;
- addr level is either 'k' for kernel dso or '.' for user dso.

For symbol resolution you may need to install debugging symbols.

Please be aware that duration is currently always 0 and doesn't reflect the actual
time it took for the fault to be handled!

When --verbose is specified, perf trace tries to print all available information
for both the IP and the fault address in the form of dso@symbol+offset.

EXAMPLES
--------

Trace only major pagefaults:

  $ perf trace --no-syscalls -F

Trace syscalls, major and minor pagefaults:

  $ perf trace -F all

  1416.547 ( 0.000 ms): python/20235 majfault [CRYPTO_push_info_+0x0] => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0@0x61be0 (x.)

As you can see, there was a major pagefault in the python process, from the
CRYPTO_push_info_ routine, which faulted somewhere in libcrypto.so.
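
To connect the sample line with the format described in PAGEFAULTS, the
following sketch splits it into its fields with sed. The helper name
parse_fault is hypothetical, and the pattern only handles lines where the
IP was resolved to symbol+offset, as in the line above.

```shell
line='1416.547 ( 0.000 ms): python/20235 majfault [CRYPTO_push_info_+0x0] => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0@0x61be0 (x.)'

# parse_fault LINE: print "type ip.symbol ip.offset addr.dso addr.offset
# map-type addr-level", or nothing if the line is not a pagefault event.
parse_fault() {
    printf '%s\n' "$1" | sed -n \
        's/^.* \(m[ia][nj]\)fault \[\(.*\)+\(0x[0-9a-f]*\)] => \(.*\)@\(0x[0-9a-f]*\) (\(.\)\(.\))$/\1 \2 \3 \4 \5 \6 \7/p'
}

parse_fault "$line"
# prints: maj CRYPTO_push_info_ 0x0 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 0x61be0 x .
```

Here 'x' marks an executable map and '.' a user-level DSO, matching the
"(x.)" suffix of the sample line.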

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-script[1]