| PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference() | 
 |  | 
 | Most of the time, you can use values from rcu_dereference() or one of | 
 | the similar primitives without worries.  Dereferencing (prefix "*"), | 
 | field selection ("->"), assignment ("="), address-of ("&"), addition and | 
 | subtraction of constants, and casts all work quite naturally and safely. | 
 |  | 
 | It is nevertheless possible to get into trouble with other operations. | 
 | Follow these rules to keep your RCU code working properly: | 
 |  | 
 | o	You must use one of the rcu_dereference() family of primitives | 
 | 	to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU | 
 | 	will complain.  Worse yet, your code can see random memory-corruption | 
 | 	bugs due to games that compilers and DEC Alpha can play. | 
 | 	Without one of the rcu_dereference() primitives, compilers | 
 | 	can reload the value, and won't your code have fun with two | 
 | 	different values for a single pointer!  Without rcu_dereference(), | 
 | 	DEC Alpha can load a pointer, dereference that pointer, and | 
 | 	return data preceding initialization that preceded the store of | 
 | 	the pointer. | 
 |  | 
 | 	In addition, the volatile cast in rcu_dereference() prevents the | 
 | 	compiler from deducing the resulting pointer value.  Please see | 
 | 	the section entitled "EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH" | 
 | 	for an example where the compiler can in fact deduce the exact | 
 | 	value of the pointer, and thus cause misordering. | 
 |  | 
 | o	You are only permitted to use rcu_dereference on pointer values. | 
 | 	The compiler simply knows too much about integral values to | 
 | 	trust it to carry dependencies through integer operations. | 
 | 	There are a very few exceptions, namely that you can temporarily | 
 | 	cast the pointer to uintptr_t in order to: | 
 |  | 
 | 	o	Set bits and clear bits down in the must-be-zero low-order | 
 | 		bits of that pointer.  This clearly means that the pointer | 
 | 		must have alignment constraints, for example, this does | 
 | 		-not- work in general for char* pointers. | 
 |  | 
 | 	o	XOR bits to translate pointers, as is done in some | 
 | 		classic buddy-allocator algorithms. | 
 |  | 
 | 	It is important to cast the value back to pointer before | 
 | 	doing much of anything else with it. | 
 |  | 
 | o	Avoid cancellation when using the "+" and "-" infix arithmetic | 
 | 	operators.  For example, for a given variable "x", avoid | 
 | 	"(x-(uintptr_t)x)" for char* pointers.	The compiler is within its | 
 | 	rights to substitute zero for this sort of expression, so that | 
 | 	subsequent accesses no longer depend on the rcu_dereference(), | 
 | 	again possibly resulting in bugs due to misordering. | 
 |  | 
 | 	Of course, if "p" is a pointer from rcu_dereference(), and "a" | 
 | 	and "b" are integers that happen to be equal, the expression | 
 | 	"p+a-b" is safe because its value still necessarily depends on | 
 | 	the rcu_dereference(), thus maintaining proper ordering. | 
 |  | 
 | o	If you are using RCU to protect JITed functions, so that the | 
 | 	"()" function-invocation operator is applied to a value obtained | 
 | 	(directly or indirectly) from rcu_dereference(), you may need to | 
 | 	interact directly with the hardware to flush instruction caches. | 
 | 	This issue arises on some systems when a newly JITed function is | 
 | 	using the same memory that was used by an earlier JITed function. | 
 |  | 
 | o	Do not use the results from relational operators ("==", "!=", | 
 | 	">", ">=", "<", or "<=") when dereferencing.  For example, | 
 | 	the following (quite strange) code is buggy: | 
 |  | 
 | 		int *p; | 
 | 		int *q; | 
 |  | 
 | 		... | 
 |  | 
 | 		p = rcu_dereference(gp) | 
 | 		q = &global_q; | 
 | 		q += p > &oom_p; | 
 | 		r1 = *q;  /* BUGGY!!! */ | 
 |  | 
 | 	As before, the reason this is buggy is that relational operators | 
 | 	are often compiled using branches.  And as before, although | 
 | 	weak-memory machines such as ARM or PowerPC do order stores | 
 | 	after such branches, but can speculate loads, which can again | 
 | 	result in misordering bugs. | 
 |  | 
 | o	Be very careful about comparing pointers obtained from | 
 | 	rcu_dereference() against non-NULL values.  As Linus Torvalds | 
 | 	explained, if the two pointers are equal, the compiler could | 
 | 	substitute the pointer you are comparing against for the pointer | 
 | 	obtained from rcu_dereference().  For example: | 
 |  | 
 | 		p = rcu_dereference(gp); | 
 | 		if (p == &default_struct) | 
 | 			do_default(p->a); | 
 |  | 
 | 	Because the compiler now knows that the value of "p" is exactly | 
 | 	the address of the variable "default_struct", it is free to | 
 | 	transform this code into the following: | 
 |  | 
 | 		p = rcu_dereference(gp); | 
 | 		if (p == &default_struct) | 
 | 			do_default(default_struct.a); | 
 |  | 
 | 	On ARM and Power hardware, the load from "default_struct.a" | 
 | 	can now be speculated, such that it might happen before the | 
 | 	rcu_dereference().  This could result in bugs due to misordering. | 
 |  | 
 | 	However, comparisons are OK in the following cases: | 
 |  | 
 | 	o	The comparison was against the NULL pointer.  If the | 
 | 		compiler knows that the pointer is NULL, you had better | 
 | 		not be dereferencing it anyway.  If the comparison is | 
 | 		non-equal, the compiler is none the wiser.  Therefore, | 
 | 		it is safe to compare pointers from rcu_dereference() | 
 | 		against NULL pointers. | 
 |  | 
 | 	o	The pointer is never dereferenced after being compared. | 
 | 		Since there are no subsequent dereferences, the compiler | 
 | 		cannot use anything it learned from the comparison | 
 | 		to reorder the non-existent subsequent dereferences. | 
 | 		This sort of comparison occurs frequently when scanning | 
 | 		RCU-protected circular linked lists. | 
 |  | 
 | 		Note that if checks for being within an RCU read-side | 
 | 		critical section are not required and the pointer is never | 
 | 		dereferenced, rcu_access_pointer() should be used in place | 
 | 		of rcu_dereference(). | 
 |  | 
 | 	o	The comparison is against a pointer that references memory | 
 | 		that was initialized "a long time ago."  The reason | 
 | 		this is safe is that even if misordering occurs, the | 
 | 		misordering will not affect the accesses that follow | 
 | 		the comparison.  So exactly how long ago is "a long | 
 | 		time ago"?  Here are some possibilities: | 
 |  | 
 | 		o	Compile time. | 
 |  | 
 | 		o	Boot time. | 
 |  | 
 | 		o	Module-init time for module code. | 
 |  | 
 | 		o	Prior to kthread creation for kthread code. | 
 |  | 
 | 		o	During some prior acquisition of the lock that | 
 | 			we now hold. | 
 |  | 
 | 		o	Before mod_timer() time for a timer handler. | 
 |  | 
 | 		There are many other possibilities involving the Linux | 
 | 		kernel's wide array of primitives that cause code to | 
 | 		be invoked at a later time. | 
 |  | 
 | 	o	The pointer being compared against also came from | 
 | 		rcu_dereference().  In this case, both pointers depend | 
 | 		on one rcu_dereference() or another, so you get proper | 
 | 		ordering either way. | 
 |  | 
 | 		That said, this situation can make certain RCU usage | 
 | 		bugs more likely to happen.  Which can be a good thing, | 
 | 		at least if they happen during testing.  An example | 
 | 		of such an RCU usage bug is shown in the section titled | 
 | 		"EXAMPLE OF AMPLIFIED RCU-USAGE BUG". | 
 |  | 
 | 	o	All of the accesses following the comparison are stores, | 
 | 		so that a control dependency preserves the needed ordering. | 
 | 		That said, it is easy to get control dependencies wrong. | 
 | 		Please see the "CONTROL DEPENDENCIES" section of | 
 | 		Documentation/memory-barriers.txt for more details. | 
 |  | 
 | 	o	The pointers are not equal -and- the compiler does | 
 | 		not have enough information to deduce the value of the | 
 | 		pointer.  Note that the volatile cast in rcu_dereference() | 
 | 		will normally prevent the compiler from knowing too much. | 
 |  | 
 | 		However, please note that if the compiler knows that the | 
 | 		pointer takes on only one of two values, a not-equal | 
 | 		comparison will provide exactly the information that the | 
 | 		compiler needs to deduce the value of the pointer. | 
 |  | 
 | o	Disable any value-speculation optimizations that your compiler | 
 | 	might provide, especially if you are making use of feedback-based | 
 | 	optimizations that take data collected from prior runs.  Such | 
 | 	value-speculation optimizations reorder operations by design. | 
 |  | 
 | 	There is one exception to this rule:  Value-speculation | 
 | 	optimizations that leverage the branch-prediction hardware are | 
 | 	safe on strongly ordered systems (such as x86), but not on weakly | 
 | 	ordered systems (such as ARM or Power).  Choose your compiler | 
 | 	command-line options wisely! | 
 |  | 
 |  | 
 | EXAMPLE OF AMPLIFIED RCU-USAGE BUG | 
 |  | 
 | Because updaters can run concurrently with RCU readers, RCU readers can | 
 | see stale and/or inconsistent values.  If RCU readers need fresh or | 
 | consistent values, which they sometimes do, they need to take proper | 
 | precautions.  To see this, consider the following code fragment: | 
 |  | 
 | 	struct foo { | 
 | 		int a; | 
 | 		int b; | 
 | 		int c; | 
 | 	}; | 
 | 	struct foo *gp1; | 
 | 	struct foo *gp2; | 
 |  | 
 | 	void updater(void) | 
 | 	{ | 
 | 		struct foo *p; | 
 |  | 
 | 		p = kmalloc(...); | 
 | 		if (p == NULL) | 
 | 			deal_with_it(); | 
 | 		p->a = 42;  /* Each field in its own cache line. */ | 
 | 		p->b = 43; | 
 | 		p->c = 44; | 
 | 		rcu_assign_pointer(gp1, p); | 
 | 		p->b = 143; | 
 | 		p->c = 144; | 
 | 		rcu_assign_pointer(gp2, p); | 
 | 	} | 
 |  | 
 | 	void reader(void) | 
 | 	{ | 
 | 		struct foo *p; | 
 | 		struct foo *q; | 
 | 		int r1, r2; | 
 |  | 
 | 		p = rcu_dereference(gp2); | 
 | 		if (p == NULL) | 
 | 			return; | 
 | 		r1 = p->b;  /* Guaranteed to get 143. */ | 
 | 		q = rcu_dereference(gp1);  /* Guaranteed non-NULL. */ | 
 | 		if (p == q) { | 
 | 			/* The compiler decides that q->c is same as p->c. */ | 
 | 			r2 = p->c; /* Could get 44 on weakly order system. */ | 
 | 		} | 
 | 		do_something_with(r1, r2); | 
 | 	} | 
 |  | 
 | You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible, | 
 | but you should not be.  After all, the updater might have been invoked | 
 | a second time between the time reader() loaded into "r1" and the time | 
 | that it loaded into "r2".  The fact that this same result can occur due | 
 | to some reordering from the compiler and CPUs is beside the point. | 
 |  | 
 | But suppose that the reader needs a consistent view? | 
 |  | 
 | Then one approach is to use locking, for example, as follows: | 
 |  | 
 | 	struct foo { | 
 | 		int a; | 
 | 		int b; | 
 | 		int c; | 
 | 		spinlock_t lock; | 
 | 	}; | 
 | 	struct foo *gp1; | 
 | 	struct foo *gp2; | 
 |  | 
 | 	void updater(void) | 
 | 	{ | 
 | 		struct foo *p; | 
 |  | 
 | 		p = kmalloc(...); | 
 | 		if (p == NULL) | 
 | 			deal_with_it(); | 
 | 		spin_lock(&p->lock); | 
 | 		p->a = 42;  /* Each field in its own cache line. */ | 
 | 		p->b = 43; | 
 | 		p->c = 44; | 
 | 		spin_unlock(&p->lock); | 
 | 		rcu_assign_pointer(gp1, p); | 
 | 		spin_lock(&p->lock); | 
 | 		p->b = 143; | 
 | 		p->c = 144; | 
 | 		spin_unlock(&p->lock); | 
 | 		rcu_assign_pointer(gp2, p); | 
 | 	} | 
 |  | 
 | 	void reader(void) | 
 | 	{ | 
 | 		struct foo *p; | 
 | 		struct foo *q; | 
 | 		int r1, r2; | 
 |  | 
 | 		p = rcu_dereference(gp2); | 
 | 		if (p == NULL) | 
 | 			return; | 
 | 		spin_lock(&p->lock); | 
 | 		r1 = p->b;  /* Guaranteed to get 143. */ | 
 | 		q = rcu_dereference(gp1);  /* Guaranteed non-NULL. */ | 
 | 		if (p == q) { | 
 | 			/* The compiler decides that q->c is same as p->c. */ | 
 | 			r2 = p->c; /* Locking guarantees r2 == 144. */ | 
 | 		} | 
 | 		spin_unlock(&p->lock); | 
 | 		do_something_with(r1, r2); | 
 | 	} | 
 |  | 
 | As always, use the right tool for the job! | 
 |  | 
 |  | 
 | EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH | 
 |  | 
 | If a pointer obtained from rcu_dereference() compares not-equal to some | 
 | other pointer, the compiler normally has no clue what the value of the | 
 | first pointer might be.  This lack of knowledge prevents the compiler | 
 | from carrying out optimizations that otherwise might destroy the ordering | 
 | guarantees that RCU depends on.  And the volatile cast in rcu_dereference() | 
 | should prevent the compiler from guessing the value. | 
 |  | 
 | But without rcu_dereference(), the compiler knows more than you might | 
 | expect.  Consider the following code fragment: | 
 |  | 
 | 	struct foo { | 
 | 		int a; | 
 | 		int b; | 
 | 	}; | 
 | 	static struct foo variable1; | 
 | 	static struct foo variable2; | 
 | 	static struct foo *gp = &variable1; | 
 |  | 
 | 	void updater(void) | 
 | 	{ | 
 | 		initialize_foo(&variable2); | 
 | 		rcu_assign_pointer(gp, &variable2); | 
 | 		/* | 
 | 		 * The above is the only store to gp in this translation unit, | 
 | 		 * and the address of gp is not exported in any way. | 
 | 		 */ | 
 | 	} | 
 |  | 
 | 	int reader(void) | 
 | 	{ | 
 | 		struct foo *p; | 
 |  | 
 | 		p = gp; | 
 | 		barrier(); | 
 | 		if (p == &variable1) | 
 | 			return p->a; /* Must be variable1.a. */ | 
 | 		else | 
 | 			return p->b; /* Must be variable2.b. */ | 
 | 	} | 
 |  | 
 | Because the compiler can see all stores to "gp", it knows that the only | 
 | possible values of "gp" are "variable1" on the one hand and "variable2" | 
 | on the other.  The comparison in reader() therefore tells the compiler | 
 | the exact value of "p" even in the not-equals case.  This allows the | 
 | compiler to make the return values independent of the load from "gp", | 
 | in turn destroying the ordering between this load and the loads of the | 
 | return values.  This can result in "p->b" returning pre-initialization | 
 | garbage values. | 
 |  | 
 | In short, rcu_dereference() is -not- optional when you are going to | 
 | dereference the resulting pointer. |