======================
Function Tracer Design
======================

:Author: Mike Frysinger

.. caution::
	This document is out of date.  Some of the descriptions below no
	longer match the current implementation.

Introduction
------------

Here we will cover the architecture pieces that the common function tracing
code relies on for proper functioning.  Things are broken down into increasing
complexity so that you can start simple and at least get basic functionality.

Note that this focuses on architecture implementation details only.  If you
want more explanation of a feature in terms of common code, review the common
ftrace.rst file.

Ideally, everyone who wishes to retain performance while supporting tracing in
their kernel should make it all the way to dynamic ftrace support.


Prerequisites
-------------

Ftrace relies on these features being implemented:
  - STACKTRACE_SUPPORT - implement save_stack_trace()
  - TRACE_IRQFLAGS_SUPPORT - implement include/asm/irqflags.h


HAVE_FUNCTION_TRACER
--------------------

You will need to implement the mcount and the ftrace_stub functions.

The exact mcount symbol name will depend on your toolchain.  Some call it
"mcount", "_mcount", or even "__mcount".  You can probably figure it out by
running something like::

	$ echo 'main(){}' | gcc -x c -S -o - - -pg | grep mcount
	        call    mcount

We'll make the assumption below that the symbol is "mcount" just to keep things
nice and simple in the examples.

Keep in mind that the ABI that is in effect inside of the mcount function is
*highly* architecture/toolchain specific.  We cannot help you in this regard,
sorry.  Dig up some old documentation and/or find someone more familiar than
you to bang ideas off of.  Typically, register usage (argument/scratch/etc...)
is a major issue at this point, especially in relation to the location of the
mcount call (before/after function prologue).  You might also want to look at
how glibc has implemented the mcount function for your architecture.  It might
be (semi-)relevant.

The mcount function should check the function pointer ftrace_trace_function
to see if it is set to ftrace_stub.  If it is, there is nothing for you to do,
so return immediately.  If it isn't, then call that function in the same way
the mcount function normally calls __mcount_internal -- the first argument is
the "frompc" while the second argument is the "selfpc" (adjusted to remove the
size of the mcount call that is embedded in the function).

For example, if the function foo() calls bar(), when the bar() function calls
mcount(), the arguments mcount() will pass to the tracer are:

  - "frompc" - the address bar() will use to return to foo()
  - "selfpc" - the address of bar() (with the mcount size adjustment)

Also keep in mind that this mcount function will be called *a lot*, so
optimizing for the default case of no tracer will help the smooth running of
your system when tracing is disabled.  So the start of the mcount function
typically does the bare minimum of checking before returning.  That also
means the code flow should usually be kept linear (i.e. no branching in the
nop case).  This is of course an optimization and not a hard requirement.

Here is some pseudo code that should help (these functions should actually be
implemented in assembly)::

	void ftrace_stub(void)
	{
		return;
	}

	void mcount(void)
	{
		/* save any bare state needed in order to do initial checking */

		extern void (*ftrace_trace_function)(unsigned long, unsigned long);
		if (ftrace_trace_function != ftrace_stub)
			goto do_trace;

		/* restore any bare state */

		return;

	do_trace:

		/* save all state needed by the ABI (see paragraph above) */

		unsigned long frompc = ...;
		unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;
		ftrace_trace_function(frompc, selfpc);

		/* restore all state needed by the ABI */
	}

Don't forget to export mcount for modules!
::

	extern void mcount(void);
	EXPORT_SYMBOL(mcount);


HAVE_FUNCTION_GRAPH_TRACER
--------------------------

Deep breath ... time to do some real work.  Here you will need to update the
mcount function to check the ftrace graph function pointers, as well as
implement some functions to save (hijack) and restore the return address.

The mcount function should check the function pointers ftrace_graph_return
(compare to ftrace_stub) and ftrace_graph_entry (compare to
ftrace_graph_entry_stub).  If either of those is not set to the relevant stub
function, call the arch-specific function ftrace_graph_caller which in turn
calls the arch-specific function prepare_ftrace_return.  Neither of these
function names is strictly required, but you should use them anyway to stay
consistent across the architecture ports -- easier to compare & contrast
things.

The arguments to prepare_ftrace_return are slightly different from those
passed to ftrace_trace_function.  The second argument "selfpc" is the same,
but the first argument should be a pointer to the "frompc".  Typically this is
located on the stack.  This allows the function to hijack the return address
temporarily, to have it point to the arch-specific function return_to_handler.
That function will simply call the common ftrace_return_to_handler function,
and that will return the original return address with which you can return to
the original call site.

Here is the updated mcount pseudo code::

	void mcount(void)
	{
	...
		if (ftrace_trace_function != ftrace_stub)
			goto do_trace;

	+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
	+	extern void (*ftrace_graph_return)(...);
	+	extern void (*ftrace_graph_entry)(...);
	+	if (ftrace_graph_return != ftrace_stub ||
	+	    ftrace_graph_entry != ftrace_graph_entry_stub)
	+		ftrace_graph_caller();
	+#endif

		/* restore any bare state */
	...

Here is the pseudo code for the new ftrace_graph_caller assembly function::

	#ifdef CONFIG_FUNCTION_GRAPH_TRACER
	void ftrace_graph_caller(void)
	{
		/* save all state needed by the ABI */

		unsigned long *frompc = &...;
		unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;
		/* passing frame pointer up is optional -- see below */
		prepare_ftrace_return(frompc, selfpc, frame_pointer);

		/* restore all state needed by the ABI */
	}
	#endif

For information on how to implement prepare_ftrace_return(), simply look at the
x86 version (the frame pointer passing is optional; see the next section for
more information).  The only architecture-specific piece in it is the setup of
the fault recovery table (the asm(...) code).  The rest should be the same
across architectures.
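
If you'd rather not dig up the x86 sources first, here is a minimal sketch of
the general shape, modeled on older ports.  The exact signature of
ftrace_push_return_trace() has changed over kernel versions, and the fault
recovery asm mentioned above is omitted, so treat the details below as
assumptions to check against your tree::

	void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
				   unsigned long frame_pointer)
	{
		unsigned long return_hooker = (unsigned long)&return_to_handler;
		struct ftrace_graph_ent trace;
		unsigned long old;

		if (unlikely(atomic_read(&current->tracing_graph_pause)))
			return;

		/* hijack the return address so the traced function "returns"
		 * to return_to_handler instead of its real caller */
		old = *parent;
		*parent = return_hooker;

		if (ftrace_push_return_trace(old, self_addr, &trace.depth,
					     frame_pointer) == -EBUSY) {
			/* the ret_stack is full -- undo the hijack and bail */
			*parent = old;
			return;
		}

		trace.func = self_addr;
		if (!ftrace_graph_entry(&trace)) {
			/* the entry handler declined this function */
			current->curr_ret_stack--;
			*parent = old;
		}
	}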

Here is the pseudo code for the new return_to_handler assembly function.  Note
that the ABI that applies here is different from what applies to the mcount
code.  Since you are returning from a function (after the epilogue), you might
be able to skimp on things saved/restored (usually just registers used to pass
return values).
::

	#ifdef CONFIG_FUNCTION_GRAPH_TRACER
	void return_to_handler(void)
	{
		/* save all state needed by the ABI (see paragraph above) */

		void (*original_return_point)(void) = ftrace_return_to_handler();

		/* restore all state needed by the ABI */

		/* this is usually either a return or a jump */
		original_return_point();
	}
	#endif


HAVE_FUNCTION_GRAPH_FP_TEST
---------------------------

An arch may pass in a unique value (frame pointer) to both the entering and
exiting of a function.  On exit, the value is compared and if it does not
match, then it will panic the kernel.  This is largely a sanity check for bad
code generation with gcc.  If gcc for your port sanely updates the frame
pointer under different optimization levels, then ignore this option.

However, adding support for it isn't terribly difficult.  In your assembly code
that calls prepare_ftrace_return(), pass the frame pointer as the 3rd argument.
Then in the C version of that function, do what the x86 port does and pass it
along to ftrace_push_return_trace() instead of a stub value of 0.

Similarly, when you call ftrace_return_to_handler(), pass it the frame pointer.
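
To see what the test buys you, this is roughly what the common code does on
function exit once the option is enabled (paraphrased -- names and details
vary across kernel versions)::

	/* in the common return-trace pop path, on function exit */
	if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
		/* mcount and the epilogue disagree about the frame, so
		 * assume bad code generation and stop the graph tracer */
		ftrace_graph_stop();
		WARN(1, "Bad frame pointer: expected %lx, received %lx\n",
		     current->ret_stack[index].fp, frame_pointer);
	}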

HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
--------------------------------

An arch may pass in a pointer to the return address on the stack.  This
prevents potential stack unwinding issues where the unwinder gets out of
sync with ret_stack and the wrong addresses are reported by
ftrace_graph_ret_addr().

Adding support for it is easy: just define the macro in asm/ftrace.h and
pass the return address pointer as the 'retp' argument to
ftrace_push_return_trace().
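
For example, assuming the ftrace_push_return_trace() variant that takes a
'retp' argument, the changes look something like this (a sketch only)::

	/* asm/ftrace.h */
	#define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR

	/* in prepare_ftrace_return(): 'parent' already points at the
	 * return address slot on the stack, so hand it over as 'retp' */
	ftrace_push_return_trace(old, self_addr, &trace.depth,
				 frame_pointer, parent);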

HAVE_FTRACE_NMI_ENTER
---------------------

If you can't trace NMI functions, then skip this option.

<details to be filled>


HAVE_SYSCALL_TRACEPOINTS
------------------------

You need very few things to get syscall tracing working in an arch.

  - Support HAVE_ARCH_TRACEHOOK (see arch/Kconfig).
  - Have a NR_syscalls variable in <asm/unistd.h> that provides the number
    of syscalls supported by the arch.
  - Support the TIF_SYSCALL_TRACEPOINT thread flag.
  - Call the trace_sys_enter() and trace_sys_exit() tracepoints in the
    ptrace syscall tracing path.
  - If the system call table on this arch is more complicated than a simple
    array of addresses of the system calls, implement an arch_syscall_addr()
    to return the address of a given system call.
  - If the symbol names of the system calls do not match the function names on
    this arch, define ARCH_HAS_SYSCALL_MATCH_SYM_NAME in asm/ftrace.h and
    implement arch_syscall_match_sym_name() with the appropriate logic to
    return true if the function name corresponds with the symbol name (both
    helpers are sketched after this list).
  - Tag this arch as HAVE_SYSCALL_TRACEPOINTS.
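
Here is a sketch of what the last two helpers can look like.  The
prefix-stripping logic is purely hypothetical (it assumes an arch that
prepends an extra "_" to every syscall symbol); use whatever matches your
arch's naming::

	/* only needed if the syscall table is not a simple array */
	unsigned long __init arch_syscall_addr(int nr)
	{
		return (unsigned long)sys_call_table[nr];
	}

	/* asm/ftrace.h -- only needed if symbol names differ from
	 * function names */
	#define ARCH_HAS_SYSCALL_MATCH_SYM_NAME
	static inline bool
	arch_syscall_match_sym_name(const char *sym, const char *name)
	{
		/* skip the hypothetical extra leading "_" */
		return !strcmp(sym + 1, name);
	}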


HAVE_FTRACE_MCOUNT_RECORD
-------------------------

See scripts/recordmcount.pl for more info.  Just fill in the arch-specific
details for how to locate the addresses of mcount call sites via objdump.
This option doesn't make much sense without also implementing dynamic ftrace.


HAVE_DYNAMIC_FTRACE
-------------------

You will first need HAVE_FTRACE_MCOUNT_RECORD and HAVE_FUNCTION_TRACER, so
scroll your reader back up if you got overeager.

Once those are out of the way, you will need to implement:
	- asm/ftrace.h:
		- MCOUNT_ADDR
		- ftrace_call_adjust()
		- struct dyn_arch_ftrace{}
	- asm code:
		- mcount() (new stub)
		- ftrace_caller()
		- ftrace_call()
		- ftrace_stub()
	- C code:
		- ftrace_dyn_arch_init()
		- ftrace_make_nop()
		- ftrace_make_call()
		- ftrace_update_ftrace_func()

First you will need to fill out some arch details in your asm/ftrace.h.

Define MCOUNT_ADDR as the address of your mcount symbol, similar to::

	#define MCOUNT_ADDR ((unsigned long)mcount)

Since no one else will have a decl for that function, you will need to::

	extern void mcount(void);

You will also need the helper function ftrace_call_adjust().  Most people
will be able to stub it out like so::

	static inline unsigned long ftrace_call_adjust(unsigned long addr)
	{
		return addr;
	}

<details to be filled>

Lastly you will need the custom dyn_arch_ftrace structure.  If you need
some extra state when runtime patching arbitrary call sites, this is the
place.  For now though, create an empty struct::

	struct dyn_arch_ftrace {
		/* No extra data needed */
	};

With the header out of the way, we can fill out the assembly code.  While we
already created an mcount() function earlier, dynamic ftrace only wants a
stub function.  This is because mcount() will only be used during boot,
and then all references to it will be patched out, never to return.  Instead,
the guts of the old mcount() will be used to create a new ftrace_caller()
function.  Because the two are hard to merge, it will most likely be a lot
easier to have two separate definitions split up by #ifdefs.  Same goes for
the ftrace_stub(), as that will now be inlined in ftrace_caller().

Before we get any more confused, let's check out some pseudo code so you can
implement your own stuff in assembly::

	void mcount(void)
	{
		return;
	}

	void ftrace_caller(void)
	{
		/* save all state needed by the ABI (see paragraph above) */

		unsigned long frompc = ...;
		unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;

	ftrace_call:
		ftrace_stub(frompc, selfpc);

		/* restore all state needed by the ABI */

	ftrace_stub:
		return;
	}

This might look a little odd at first, but keep in mind that we will be runtime
patching multiple things.  First, only functions that we actually want to trace
will be patched to call ftrace_caller().  Second, since we only have one tracer
active at a time, we will patch the ftrace_caller() function itself to call the
specific tracer in question.  That is the point of the ftrace_call label.

With that in mind, let's move on to the C code that will actually be doing the
runtime patching.  You'll need a little knowledge of your arch's opcodes in
order to make it through the next section.

Every arch has an init callback function.  If you need to do something early on
to initialize some state, this is the time to do that.  Otherwise, this simple
function below should be sufficient for most people::

	int __init ftrace_dyn_arch_init(void)
	{
		return 0;
	}

There are two functions that are used to do runtime patching of arbitrary
functions.  The first is used to turn the mcount call site into a nop (which
is what helps us retain runtime performance when not tracing).  The second is
used to turn the mcount call site into a call to an arbitrary location (but
typically that is ftrace_caller()).  See the general function definitions in
linux/ftrace.h for the functions::

	ftrace_make_nop()
	ftrace_make_call()

The rec->ip value is the address of the mcount call site that was collected
by scripts/recordmcount.pl at build time.
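
As a sketch, here is what the pair can look like on a hypothetical arch with
fixed-width instructions.  ftrace_modify_code(), arch_nop_insn(), and
arch_call_insn() are made-up helpers -- every real port needs its own safe
text-patching primitives::

	int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
			    unsigned long addr)
	{
		/* replace the call at the mcount call site with a
		 * same-size nop */
		unsigned int old = arch_call_insn(rec->ip, addr);
		unsigned int new = arch_nop_insn();

		return ftrace_modify_code(rec->ip, old, new);
	}

	int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
	{
		/* replace the nop with a call to addr (typically
		 * ftrace_caller) */
		unsigned int old = arch_nop_insn();
		unsigned int new = arch_call_insn(rec->ip, addr);

		return ftrace_modify_code(rec->ip, old, new);
	}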

The last function is used to do runtime patching of the active tracer.  This
will be modifying the assembly code at the location of the ftrace_call symbol
inside of the ftrace_caller() function.  So you should have sufficient padding
at that location to support the new function calls you'll be inserting.  Some
people will be using a "call" type instruction while others will be using a
"branch" type instruction.  Specifically, the function is::

	ftrace_update_ftrace_func()
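
A sketch, reusing the hypothetical helpers from above (arch_read_insn() is
likewise made up -- it stands in for fetching the instruction currently at
the call site)::

	extern void ftrace_call(void);

	int ftrace_update_ftrace_func(ftrace_func_t func)
	{
		/* redirect the call instruction at the ftrace_call label
		 * inside ftrace_caller() to the new tracer */
		unsigned long ip = (unsigned long)&ftrace_call;
		unsigned int old = arch_read_insn(ip);
		unsigned int new = arch_call_insn(ip, (unsigned long)func);

		return ftrace_modify_code(ip, old, new);
	}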


HAVE_DYNAMIC_FTRACE + HAVE_FUNCTION_GRAPH_TRACER
------------------------------------------------

The function grapher needs a few tweaks in order to work with dynamic ftrace.
Basically, you will need to:

	- update:
		- ftrace_caller()
		- ftrace_graph_call()
		- ftrace_graph_caller()
	- implement:
		- ftrace_enable_ftrace_graph_caller()
		- ftrace_disable_ftrace_graph_caller()

<details to be filled>

Quick notes:

	- add a nop stub after the ftrace_call location named ftrace_graph_call;
	  the stub needs to be large enough to support a call to
	  ftrace_graph_caller()
	- update ftrace_graph_caller() to work with being called by the new
	  ftrace_caller() since some semantics may have changed
	- ftrace_enable_ftrace_graph_caller() will runtime patch the
	  ftrace_graph_call location with a call to ftrace_graph_caller()
	- ftrace_disable_ftrace_graph_caller() will runtime patch the
	  ftrace_graph_call location with nops (a sketch of both follows
	  this list)
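
Here is a sketch of the two new functions, again using the hypothetical
patching helpers from the dynamic ftrace section above::

	extern void ftrace_graph_call(void);

	int ftrace_enable_ftrace_graph_caller(void)
	{
		unsigned long ip = (unsigned long)&ftrace_graph_call;

		/* turn the nop stub into a call to ftrace_graph_caller() */
		return ftrace_modify_code(ip, arch_nop_insn(),
				arch_call_insn(ip, (unsigned long)ftrace_graph_caller));
	}

	int ftrace_disable_ftrace_graph_caller(void)
	{
		unsigned long ip = (unsigned long)&ftrace_graph_call;

		/* put the nop back */
		return ftrace_modify_code(ip,
				arch_call_insn(ip, (unsigned long)ftrace_graph_caller),
				arch_nop_insn());
	}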