perf-c2c(1)
===========

NAME
----
perf-c2c - Shared Data C2C/HITM Analyzer.

SYNOPSIS
--------
[verse]
'perf c2c record' [<options>] <command>
'perf c2c record' [<options>] -- [<record command options>] <command>
'perf c2c report' [<options>]

DESCRIPTION
-----------
C2C stands for Cache To Cache.

The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
you to track down cacheline contention.

The tool is based on x86's load latency and precise store facility events
provided by Intel CPUs. These events provide:
  - memory address of the access
  - type of the access (load and store details)
  - latency (in cycles) of the load access

The c2c tool provides means to record this data and report back access details
for the cachelines with the highest contention, that is, the highest number of
HITM accesses (loads that hit a modified copy of the line in another cache).

The basic workflow with this tool follows the standard record/report phases.
Use the record command to record event data and the report command to
display it.


RECORD OPTIONS
--------------
-e::
--event=::
	Select the PMU event. Use 'perf mem record -e list'
	to list available events.

-v::
--verbose::
	Be more verbose (show counter open errors, etc).

-l::
--ldlat::
	Configure mem-loads latency.

-k::
--all-kernel::
	Configure all used events to run in kernel space.

-u::
--all-user::
	Configure all used events to run in user space.

REPORT OPTIONS
--------------
-k::
--vmlinux=<file>::
	vmlinux pathname

-v::
--verbose::
	Be more verbose (show counter open errors, etc).

-i::
--input::
	Specify the input file to process.

-N::
--node-info::
	Show extra node info in report (see NODE INFO section).

-c::
--coalesce::
	Specify sorting fields for single cacheline display.
	The following fields are available: tid,pid,iaddr,dso
	(see COALESCE).

-g::
--call-graph::
	Set up callchain parameters.
	Please refer to the perf-report man page for details.

--stdio::
	Force the stdio output (see STDIO OUTPUT).

--stats::
	Display only statistic tables and force stdio mode.

--full-symbols::
	Display full length of symbols.

--no-source::
	Do not display Source:Line column.

--show-all::
	Show all captured HITM lines, regardless of the 0.0005 HITM % limit.

-f::
--force::
	Don't do ownership validation.

-d::
--display::
	Select the HITM type (rmt, lcl) to display and sort on. Total HITMs
	is the default.

C2C RECORD
----------
The perf c2c record command sets up options related to HITM cacheline analysis
and calls the standard perf record command.

The following perf record options are configured by default
(check the perf record man page for details):

  -W,-d,--sample-cpu

Unless specified otherwise with the '-e' option, the following events are
monitored by default:

  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P

Any 'perf record' option can be passed after the '--' mark, for example (to
enable callchains and system wide monitoring):

  $ perf c2c record -- -g -a

Please check the RECORD OPTIONS section for specific c2c record options.

C2C REPORT
----------
The perf c2c report command displays shared data analysis. It comes in two
display modes: stdio and tui (default).

The report command workflow is as follows:
  - sort all the data based on the cacheline address
  - store access details for each cacheline
  - sort all cachelines based on user settings
  - display data
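
The workflow above can be sketched in a few lines of Python. This is only an
illustration, not the tool's implementation: the sample dictionaries, the
`build_report` name and the 64-byte cacheline size are assumptions made for
the example, and the sort key (HITM count) stands in for the user settings.

```python
from collections import defaultdict

CACHELINE_SIZE = 64  # assumption: 64-byte cachelines


def build_report(samples):
    """Group samples by cacheline and rank cachelines by HITM count."""
    lines = defaultdict(
        lambda: {"records": 0, "hitm": 0, "offsets": defaultdict(int)}
    )
    for s in samples:
        base = s["addr"] & ~(CACHELINE_SIZE - 1)   # cacheline address
        offset = s["addr"] & (CACHELINE_SIZE - 1)  # offset within the cacheline
        line = lines[base]
        line["records"] += 1
        line["hitm"] += 1 if s["hitm"] else 0
        line["offsets"][offset] += 1               # per-offset access details
    # Most HITM accesses first; ties are broken by cacheline address.
    return sorted(lines.items(), key=lambda kv: (-kv[1]["hitm"], kv[0]))


# Hypothetical samples: a data address plus a flag marking HITM loads.
samples = [
    {"addr": 0x1008, "hitm": True},
    {"addr": 0x1030, "hitm": True},
    {"addr": 0x2000, "hitm": False},
    {"addr": 0x2004, "hitm": True},
]

for base, line in build_report(samples):
    print(hex(base), line["records"], line["hitm"], dict(line["offsets"]))
```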

In general, the perf c2c report output consists of 2 basic views:
  1) most expensive cachelines list
  2) offsets details for each cacheline

For each cacheline in the 1) list we display the following data
(both stdio and TUI modes output the same fields):

  Index
  - zero based index to identify the cacheline

  Cacheline
  - cacheline address (hex number)

  Total records
  - sum of all cacheline accesses

  Rmt/Lcl Hitm
  - cacheline percentage of all Remote/Local HITM accesses

  LLC Load Hitm - Total, Lcl, Rmt
  - count of Total/Local/Remote load HITMs

  Store Reference - Total, L1Hit, L1Miss
    Total - all store accesses
    L1Hit - store accesses that hit L1
    L1Miss - store accesses that missed L1

  Load Dram
  - count of local and remote DRAM accesses

  LLC Ld Miss
  - count of all accesses that missed LLC

  Total Loads
  - sum of all load accesses

  Core Load Hit - FB, L1, L2
  - count of load hits in FB (Fill Buffer), L1 and L2 cache

  LLC Load Hit - Llc, Rmt
  - count of LLC and Remote load hits
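
As a rough illustration of the 'Rmt/Lcl Hitm' column, the following Python
sketch computes each cacheline's share of all HITM accesses of one type. The
`hitm_share` helper and the counts are hypothetical and only show the
arithmetic; they are not taken from perf itself.

```python
def hitm_share(per_line_hitm):
    """Return each cacheline's percentage of all HITM accesses."""
    total = sum(per_line_hitm.values())
    if total == 0:
        # No HITM accesses recorded at all: every share is zero.
        return {line: 0.0 for line in per_line_hitm}
    return {line: 100.0 * count / total for line, count in per_line_hitm.items()}


# Hypothetical remote HITM counts, keyed by cacheline address.
remote_hitm = {0x1000: 30, 0x2000: 10, 0x3000: 0}
shares = hitm_share(remote_hitm)

for line, pct in shares.items():
    print(f"{line:#x} {pct:.2f}%")
```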

For each offset in the 2) list we display the following data:

  HITM - Rmt, Lcl
  - % of Remote/Local HITM accesses for given offset within cacheline

  Store Refs - L1 Hit, L1 Miss
  - % of store accesses that hit/missed L1 for given offset within cacheline

  Data address - Offset
  - offset address

  Pid
  - pid of the process responsible for the accesses

  Tid
  - tid of the process responsible for the accesses

  Code address
  - code address responsible for the accesses

  cycles - rmt hitm, lcl hitm, load
  - sum of cycles for given accesses - Remote/Local HITM and generic load

  cpu cnt
  - number of cpus that participated in the access

  Symbol
  - code symbol related to the 'Code address' value

  Shared Object
  - shared object name related to the 'Code address' value

  Source:Line
  - source information related to the 'Code address' value

  Node
  - nodes participating in the access (see NODE INFO section)

NODE INFO
---------
The 'Node' field displays the nodes that access the given cacheline
offset. Its output comes in 3 flavors:
  - node IDs separated by ','
  - node IDs with stats for each ID, in the following format:
      Node{cpus %hitms %stores}
  - node IDs with a list of affected CPUs, in the following format:
      Node{cpu list}

You can switch between the above flavors with the -N option, or
use the 'n' key to switch interactively in TUI mode.

COALESCE
--------
You can specify how to sort offsets for a cacheline.

The following fields are available and govern the final
set of output fields for cacheline offsets:

  tid   - coalesced by process TIDs
  pid   - coalesced by process PIDs
  iaddr - coalesced by code address, following fields are displayed:
             Code address, Code symbol, Shared Object, Source line
  dso   - coalesced by shared object

By default the coalescing is set up with 'pid,iaddr'.
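
To make the effect of the coalescing fields concrete, here is a minimal Python
sketch that merges accesses sharing the same values for the chosen fields. The
`coalesce` function and the access records are assumptions for illustration;
only the field names (tid, pid, iaddr, dso) and the 'pid,iaddr' default come
from the text above.

```python
from collections import defaultdict


def coalesce(accesses, fields=("pid", "iaddr")):
    """Count accesses per unique combination of the coalescing fields."""
    groups = defaultdict(int)
    for access in accesses:
        key = tuple(access[f] for f in fields)  # one output row per distinct key
        groups[key] += 1
    return dict(groups)


# Hypothetical accesses to a single cacheline offset.
accesses = [
    {"pid": 100, "tid": 101, "iaddr": 0x400A10, "dso": "app"},
    {"pid": 100, "tid": 102, "iaddr": 0x400A10, "dso": "app"},
    {"pid": 100, "tid": 102, "iaddr": 0x400B20, "dso": "app"},
]

by_default = coalesce(accesses)               # coalesced by 'pid,iaddr'
by_tid = coalesce(accesses, fields=("tid",))  # coalesced by 'tid'
print(by_default)
print(by_tid)
```

With the default fields the two threads executing the same code address end up
in one row, while coalescing by tid keeps them apart.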

STDIO OUTPUT
------------
The stdio output displays data on standard output.

The following tables are displayed:
  Trace Event Information
  - overall statistics of memory accesses

  Global Shared Cache Line Event Information
  - overall statistics on shared cachelines

  Shared Data Cache Line Table
  - list of most expensive cachelines

  Shared Cache Line Distribution Pareto
  - list of all accessed offsets for each cacheline

TUI OUTPUT
----------
The TUI output provides an interactive interface to navigate
through the cachelines list and to display offset details.

For details please refer to the help window by pressing the '?' key.

CREDITS
-------
Although Don Zickus, Dick Fowles and Joe Mario worked together
to get this implemented, we got lots of early help from Arnaldo
Carvalho de Melo, Stephane Eranian, Jiri Olsa and Andi Kleen.

C2C BLOG
--------
Check Joe's blog on c2c tool for detailed use case explanation:
  https://joemario.github.io/blog/2016/09/01/c2c-blog/

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-mem[1]