Using the glibc microbenchmark suite
====================================

The glibc microbenchmark suite automatically generates code for specified
functions, then builds that code and calls it repeatedly with the given
inputs to report basic performance properties of each function.

Running the benchmark:
=====================

The benchmark needs Python 2.7 or later in addition to the
dependencies required to build the GNU C Library.  One may run the
benchmark by invoking make as follows:

  $ make bench

This runs each function for 10 seconds and appends its output to
benchtests/bench.out.  To ensure that the tests are rebuilt, one could run:

  $ make bench-clean

The duration of each test can be configured by setting the BENCH_DURATION
variable in the call to make.  One should run `make bench-clean' before
changing BENCH_DURATION:

  $ make BENCH_DURATION=1 bench

The benchmark suite measures function calls using architecture-specific
high-precision timing instructions whenever they are available.  When such
support is not available, it uses clock_gettime (CLOCK_PROCESS_CPUTIME_ID).
One can force the benchmark to use clock_gettime by invoking make as follows:

  $ make USE_CLOCK_GETTIME=1 bench

Again, one must run `make bench-clean' before changing the measurement method.

Adding a function to benchtests:
===============================

If the name of the function is `foo', then the following procedure should
allow one to add `foo' to the bench tests:

- Append the function name to the bench variable in the Makefile.

- Make a file called `foo-inputs' to provide the definition and inputs for
  the function.  The file should have some directives telling the parser
  script about the function, followed by one input per line.  Directives
  are lines that have a special meaning for the parser; they begin with
  two hashes '##'.  The following directives are recognized:

  - args: This should be assigned a colon-separated list of the types of
    the input arguments.  This directive may be skipped if the function
    does not take any inputs.  One may identify output arguments by nesting
    them in <>; the generator will create variables to receive outputs from
    the called function.
  - ret: This should be assigned the type that the function returns.  This
    directive may be skipped if the function does not return a value.
  - includes: This should be assigned a comma-separated list of headers
    that need to be included to provide declarations for the function and
    any types it may need (specifically, the generator emits
    "#include <header>" for each listed header).
  - include-sources: This should be assigned a comma-separated list of
    source files that need to be included to provide definitions of global
    variables and functions (specifically, the generator emits
    '#include "source"' for each listed file).  See pthread_once-inputs and
    pthread_once-source.c for an example of how to use this to benchmark a
    function that needs state across several calls.
  - init: Name of an initializer function to call to initialize the
    benchtest.
  - name: See the following section for instructions on how to use this
    directive.

  Lines beginning with a single hash '#' are treated as comments.  See
  pow-inputs for an example of an input file.
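
Putting the directives together, an input file for a hypothetical function
`foo' that takes one double and returns a double (declared in an equally
hypothetical foo.h) might look like this:

```
##args: double
##ret: double
##includes: foo.h
# Inputs near 1.
0.9375
1.0625
# A large input.
1e6
```

Each line that is neither a directive nor a comment is a single input to
the function.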

Multiple execution units per function:
=====================================

Some functions have distinct performance characteristics in different input
domains, and it may be necessary to measure those domains separately.  For
example, some math functions perform computations at different levels of
precision (64-bit vs 240-bit vs 768-bit), and mixing the levels does not
give a very useful picture of the performance of these functions.  One can
separate the inputs for these domains in the same file by using the `name'
directive, which looks something like this:

  ##name: 240bit

See the pow-inputs file for an example of what such a partitioned input file
would look like.
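
For instance, an input file split into two measurement units, again for a
hypothetical one-argument function, could look like this (the unit names
and domain boundaries are made up for illustration):

```
##args: double
##ret: double
# Inputs that stay on the fast path.
##name: 64bit
1.0
2.5
# Inputs that fall back to the slow path.
##name: 240bit
0.9999999999999999
```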

Benchmark Sets:
==============

In addition to standard benchmarking of functions, one may also generate
custom output for a set of functions.  This is currently used by the string
function benchmarks, where the aim is to compare the performance of
different implementations at various alignments and for various sizes.

To add a benchset for `foo':

- Add `foo' to the benchset variable in the Makefile.
- Write a bench-foo.c that prints the measurements to stdout.
- On execution, a bench-foo.out is created in $(objpfx) with the contents
  of stdout.