===================
this_cpu operations
===================

:Author: Christoph Lameter, August 4th, 2014
:Author: Pranith Kumar, Aug 2nd, 2014

this_cpu operations are a way of optimizing access to per cpu
variables associated with the currently executing processor. This is
done through the use of segment registers (or a dedicated register where
the cpu permanently stores the beginning of the per cpu area for a
specific processor).

this_cpu operations add a per cpu variable offset to the processor
specific per cpu base and encode that operation in the instruction
operating on the per cpu variable.

This means that there are no atomicity issues between the calculation of
the offset and the operation on the data. Therefore it is not
necessary to disable preemption or interrupts to ensure that the
processor is not changed between the calculation of the address and
the operation on the data.

Read-modify-write operations are of particular interest. Frequently
processors have special lower latency instructions that can operate
without the typical synchronization overhead, but still provide some
sort of relaxed atomicity guarantees. The x86, for example, can execute
RMW (Read Modify Write) instructions like inc/dec/cmpxchg without the
lock prefix and the associated latency penalty.

Access to the variable without the lock prefix is not synchronized but
synchronization is not necessary since we are dealing with per cpu
data specific to the currently executing processor. Only the current
processor should be accessing that variable and therefore there are no
concurrency issues with other processors in the system.

Please note that accesses by remote processors to a per cpu area are
exceptional situations and may impact performance and/or correctness
(remote write operations) of local RMW operations via this_cpu_*.

The main use of the this_cpu operations has been to optimize counter
operations.

The following this_cpu() operations with implied preemption protection
are defined. These operations can be used without worrying about
preemption and interrupts::

	this_cpu_read(pcp)
	this_cpu_write(pcp, val)
	this_cpu_add(pcp, val)
	this_cpu_and(pcp, val)
	this_cpu_or(pcp, val)
	this_cpu_add_return(pcp, val)
	this_cpu_xchg(pcp, nval)
	this_cpu_cmpxchg(pcp, oval, nval)
	this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
	this_cpu_sub(pcp, val)
	this_cpu_inc(pcp)
	this_cpu_dec(pcp)
	this_cpu_sub_return(pcp, val)
	this_cpu_inc_return(pcp)
	this_cpu_dec_return(pcp)


Inner working of this_cpu operations
------------------------------------

On x86 the fs: or the gs: segment registers contain the base of the
per cpu area. It is then possible to simply use the segment override
to relocate a per cpu relative address to the proper per cpu area for
the processor. So the relocation to the per cpu base is encoded in the
instruction via a segment register prefix.

For example::

	DEFINE_PER_CPU(int, x);
	int z;

	z = this_cpu_read(x);

results in a single instruction::

	mov ax, gs:[x]

instead of the sequence that occurs with the per cpu operations: a
calculation of the address followed by a fetch from that address.
Before this_cpu_ops such a sequence also required preempt
disable/enable to prevent the kernel from moving the thread to a
different processor while the calculation is performed.

Consider the following this_cpu operation::

	this_cpu_inc(x)

The above results in the following single instruction (no lock prefix!)::

	inc gs:[x]

instead of the following operations required if there is no segment
register::

	int *y;
	int cpu;

	cpu = get_cpu();
	y = per_cpu_ptr(&x, cpu);
	(*y)++;
	put_cpu();

Note that these operations can only be used on per cpu data that is
reserved for a specific processor. Without disabling preemption in the
surrounding code this_cpu_inc() will only guarantee that one of the
per cpu counters is correctly incremented. However, there is no
guarantee that the OS will not move the process directly before or
after the this_cpu instruction is executed. In general this means that
the values of the individual counters for each processor are
meaningless. The sum of all the per cpu counters is the only value
that is of interest.

Per cpu variables are used for performance reasons. Bouncing cache
lines can be avoided if multiple processors concurrently go through
the same code paths. Since each processor has its own per cpu
variables no concurrent cache line updates take place. The price that
has to be paid for this optimization is the need to add up the per cpu
counters when the value of a counter is needed.
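
The trade-off described above can be sketched in plain userspace C.
This is only a model and not kernel code: the per cpu counters are
represented by a hypothetical array ``counter[]``, and the made-up
helper ``model_inc()`` stands in for this_cpu_inc() on whichever
processor happens to be running. In the kernel the summation would
typically walk the processors with for_each_possible_cpu() and read
each instance with per_cpu().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: four processors, one counter instance each. */
#define NR_CPUS_MODEL 4

static long counter[NR_CPUS_MODEL];

/* Stands in for this_cpu_inc(x) executed on processor 'cpu'. */
static void model_inc(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	counter[cpu]++;
}

/* The only meaningful value: the sum over all per cpu instances. */
static long model_sum(void)
{
	long sum = 0;

	for (size_t cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += counter[cpu];
	return sum;
}
```

Each increment touches only the entry of the processor it runs on, so
no cache line is shared on the update path. The cost is paid by the
reader, who has to visit every entry to obtain the total.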


Special operations
------------------

::

	y = this_cpu_ptr(&x)

Takes the offset of a per cpu variable (&x !) and returns the address
of the per cpu variable that belongs to the currently executing
processor. this_cpu_ptr avoids multiple steps that the common
get_cpu/put_cpu sequence requires. No processor number is
available. Instead, the offset of the local per cpu area is simply
added to the per cpu offset.

Note that this operation is usually used in a code segment when
preemption has been disabled. The pointer is then used to
access local per cpu data in a critical section. When preemption
is re-enabled this pointer is usually no longer useful since it may
no longer point to per cpu data of the current processor.


Per cpu variables and offsets
-----------------------------

Per cpu variables have offsets to the beginning of the per cpu
area. They do not have addresses although they look like that in the
code. Offsets cannot be directly dereferenced. The offset must be
added to a base pointer of a per cpu area of a processor in order to
form a valid address.

Therefore the use of x or &x outside of the context of per cpu
operations is invalid and will generally be treated like a NULL
pointer dereference.
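
The relationship between an offset and an address can be modelled in
userspace C. The sketch below is an illustration under stated
assumptions and does not reflect how the kernel actually lays out
memory: ``area[]`` is a hypothetical stand-in for the per cpu areas,
``X_OFFSET`` for the offset that &x denotes, and
``model_per_cpu_ptr()`` for the relocation that per_cpu_ptr() performs.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 2

/* Hypothetical layout of one per cpu area. */
struct model_area {
	long other;	/* some unrelated per cpu variable */
	int x;		/* the per cpu variable "x" */
};

/* One area per processor; the array element is the per cpu base. */
static struct model_area area[NR_CPUS_MODEL];

/* Stand-in for "&x": an offset into an area, not a usable address. */
#define X_OFFSET offsetof(struct model_area, x)

/* Form a valid address by adding the offset to the base of 'cpu'. */
static int *model_per_cpu_ptr(size_t offset, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return (int *)((char *)&area[cpu] + offset);
}
```

The same ``X_OFFSET`` yields a different address for every processor,
which is why the offset on its own cannot be dereferenced.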

::

	DEFINE_PER_CPU(int, x);

In the context of per cpu operations the above implies that x is a per
cpu variable. Most this_cpu operations take a per cpu variable.

::

	int __percpu *p = &x;

&x and hence p is the offset of a per cpu variable. this_cpu_ptr()
takes the offset of a per cpu variable which makes this look a bit
strange.


Operations on a field of a per cpu structure
--------------------------------------------

Let's say we have a percpu structure::

	struct s {
		int n,m;
	};

	DEFINE_PER_CPU(struct s, p);


Operations on these fields are straightforward::

	this_cpu_inc(p.m)

	z = this_cpu_cmpxchg(p.m, 0, 1);


If we have an offset to struct s::

	struct s __percpu *ps = &p;

	this_cpu_dec(ps->m);

	z = this_cpu_inc_return(ps->n);


The calculation of the pointer may require the use of this_cpu_ptr()
if we do not make use of this_cpu ops later to manipulate fields::

	struct s *pp;

	pp = this_cpu_ptr(&p);

	pp->m--;

	z = pp->n++;


Variants of this_cpu ops
------------------------

this_cpu ops are interrupt safe. Some architectures do not support
these per cpu local operations. In that case the operation must be
replaced by code that disables interrupts, then does the operations
that are guaranteed to be atomic and then re-enables interrupts. Doing
so is expensive. If there are other reasons why the scheduler cannot
change the processor we are executing on then there is no reason to
disable interrupts. For that purpose the following __this_cpu operations
are provided.

These operations have no guarantee against concurrent interrupts or
preemption. If a per cpu variable is not used in an interrupt context
and the scheduler cannot preempt, then they are safe. If any interrupts
still occur while an operation is in progress and if the interrupt too
modifies the variable, then RMW actions cannot be guaranteed to be
safe::

	__this_cpu_read(pcp)
	__this_cpu_write(pcp, val)
	__this_cpu_add(pcp, val)
	__this_cpu_and(pcp, val)
	__this_cpu_or(pcp, val)
	__this_cpu_add_return(pcp, val)
	__this_cpu_xchg(pcp, nval)
	__this_cpu_cmpxchg(pcp, oval, nval)
	__this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
	__this_cpu_sub(pcp, val)
	__this_cpu_inc(pcp)
	__this_cpu_dec(pcp)
	__this_cpu_sub_return(pcp, val)
	__this_cpu_inc_return(pcp)
	__this_cpu_dec_return(pcp)


For example, __this_cpu_inc(x) will increment x and will not fall back
to code that disables interrupts on platforms that cannot accomplish
atomicity through address relocation and a Read-Modify-Write operation
in the same instruction.
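
Why an interrupt can break an unprotected RMW sequence is easiest to
see in a userspace model. The sketch below is purely illustrative:
``counter``, ``model_irq()`` and ``model_split_inc()`` are hypothetical
names, and the "interrupt" is simply a function call placed between the
read and the write of an increment that is split into separate steps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per cpu counter of the current processor. */
static int counter;

/* Models an interrupt handler that also increments the counter. */
static void model_irq(void)
{
	counter++;
}

/*
 * Models an increment that is split into read, modify and write steps
 * without interrupt protection. When 'irq' is not NULL it is invoked
 * between the read and the write.
 */
static void model_split_inc(void (*irq)(void))
{
	int tmp = counter;	/* read */

	if (irq != NULL)
		irq();		/* the interrupt arrives here */
	counter = tmp + 1;	/* write: overwrites the handler's update */
}
```

With no interrupt the counter goes from 0 to 1 as expected. With the
modelled interrupt in the middle the counter also ends at 1, although
two increments were requested, so one update is lost.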


&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
--------------------------------------------

The first operation takes the offset and forms an address and then
adds the offset of the n field. This may result in two add
instructions emitted by the compiler.

The second one first adds the two offsets and then does the
relocation. IMHO the second form looks cleaner and has an easier time
with (). The second form also is consistent with the way
this_cpu_read() and friends are used.
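
That both forms arrive at the same address can be checked with a small
userspace model. Everything in it is an assumption made for
illustration: ``struct s_model`` is a hypothetical structure in which n
is deliberately not the first field, ``area[]`` stands in for the per
cpu areas, and ``model_base()`` for the per cpu base of a processor.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 2

/* Hypothetical structure; n is placed second so its offset is non-zero. */
struct s_model {
	int m;
	int n;
};

/* Hypothetical per cpu area holding the structure at a non-zero offset. */
struct model_area {
	long other;
	struct s_model p;
};

static struct model_area area[NR_CPUS_MODEL];

#define P_OFFSET offsetof(struct model_area, p)

/* Models the per cpu base of processor 'cpu'. */
static char *model_base(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return (char *)&area[cpu];
}

/* &this_cpu_ptr(pp)->n: relocate the structure, then select the field. */
static int *field_after_relocation(size_t pp_offset, int cpu)
{
	struct s_model *p = (struct s_model *)(model_base(cpu) + pp_offset);

	return &p->n;
}

/* this_cpu_ptr(&pp->n): add both offsets first, then relocate once. */
static int *relocation_after_offsets(size_t pp_offset, int cpu)
{
	size_t off = pp_offset + offsetof(struct s_model, n);

	return (int *)(model_base(cpu) + off);
}
```

The two helpers differ only in the order of the additions, which
matches the description above: the result is identical, and the choice
between the forms is about readability and the instructions the
compiler may emit.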


Remote access to per cpu data
-----------------------------

Per cpu data structures are designed to be used by one cpu exclusively.
If you use the variables as intended, this_cpu_ops() are guaranteed to
be "atomic" as no other CPU has access to these data structures.

There are special cases where you might need to access per cpu data
structures remotely. It is usually safe to do a remote read access
and that is frequently done to summarize counters. Remote write access
is something which could be problematic because this_cpu ops do not
have lock semantics. A remote write may interfere with a this_cpu
RMW operation.

Remote write accesses to percpu data structures are highly discouraged
unless absolutely necessary. Please consider using an IPI to wake up
the remote CPU and perform the update to its per cpu area.

To access a per cpu data structure remotely, the per_cpu_ptr()
function is typically used::


	DEFINE_PER_CPU(struct data, datap);

	struct data *p = per_cpu_ptr(&datap, cpu);

This makes it explicit that we are getting ready to access a percpu
area remotely.

You can also do the following to convert the datap offset to an address::

	struct data *p = this_cpu_ptr(&datap);

but passing pointers calculated via this_cpu_ptr to other cpus is
unusual and should be avoided.

Remote accesses are typically only for reading the status of another
cpu's per cpu data. Write accesses can cause unique problems due to the
relaxed synchronization requirements for this_cpu operations.

One example that illustrates some concerns with write operations is
the following scenario that occurs because two per cpu variables
share a cache-line but the relaxed synchronization is applied to
only one process updating the cache-line.

Consider the following example::


	struct test {
		atomic_t a;
		int b;
	};

	DEFINE_PER_CPU(struct test, onecacheline);

There is some concern about what would happen if the field 'a' is updated
remotely from one processor and the local processor would use this_cpu ops
to update field b. Care should be taken that such simultaneous accesses to
data within the same cache line are avoided. Also costly synchronization
may be necessary. IPIs are generally recommended in such scenarios instead
of a remote write to the per cpu area of another processor.
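
Whether two fields end up in the same cache line can be reasoned about
from their offsets. The userspace sketch below makes several
assumptions that are not part of the text above: a 64 byte cache line,
an object that starts on a cache line boundary, atomic_t modelled as a
plain int, and a hypothetical second layout in which b is pushed onto
its own line with C11 ``_Alignas``. It only illustrates the layout
question and does not replace the recommendation to use IPIs.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size; the real value is architecture specific. */
#define MODEL_CACHE_LINE 64

/* Userspace stand-in for struct test (atomic_t modelled as int). */
struct test_model {
	int a;
	int b;
};

/* Hypothetical alternative layout: b starts on its own cache line. */
struct test_split_model {
	int a;
	_Alignas(MODEL_CACHE_LINE) int b;
};

/* True if two offsets inside a line-aligned object share a cache line. */
static int same_cache_line(size_t off1, size_t off2)
{
	return off1 / MODEL_CACHE_LINE == off2 / MODEL_CACHE_LINE;
}
```

In the first layout both fields fall into the same modelled line, which
is the situation the example warns about. In the second layout they do
not, at the price of a larger structure.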

Even in cases where the remote writes are rare, please bear in
mind that a remote write will evict the cache line from the processor
that most likely will access it. If the processor wakes up and finds a
missing local cache line of a per cpu area, its performance and hence
the wake up times will be affected.