Using the glibc microbenchmark suite
====================================

The glibc microbenchmark suite automatically generates code for specified
functions, then builds and calls them repeatedly with the given inputs to
measure some basic performance properties of each function.

Running the benchmark:
=====================

The benchmark needs Python 2.7 or later in addition to the dependencies
required to build the GNU C Library. One may run the benchmark by invoking
make as follows:

  $ make bench

This runs each function for 10 seconds and appends its output to
benchtests/bench.out. To ensure that the tests are rebuilt, one could run:

  $ make bench-clean

The duration of each test can be configured by setting the BENCH_DURATION
variable in the call to make. One should run `make bench-clean' before
changing BENCH_DURATION.

  $ make BENCH_DURATION=1 bench

The benchmark suite measures function calls using architecture-specific
high-precision timing instructions whenever they are available. When such
support is not available, it uses clock_gettime (CLOCK_PROCESS_CPUTIME_ID).
One can force the benchmark to use clock_gettime by invoking make as
follows:

  $ make USE_CLOCK_GETTIME=1 bench

Again, one must run `make bench-clean' before changing the measurement
method.
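
For example, a typical sequence for switching an existing build over to
clock_gettime-based measurement is:

  $ make bench-clean
  $ make USE_CLOCK_GETTIME=1 bench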

Adding a function to benchtests:
===============================

If the name of the function is `foo', then the following procedure should
allow one to add `foo' to the bench tests:

- Append the function name to the bench variable in the Makefile.

- Make a file called `foo-inputs' to provide the definition and inputs for
  the function. The file should have some directives telling the parser
  script about the function, followed by one input per line. Directives are
  lines that have special meaning to the parser; they begin with two hashes
  '##'. The following directives are recognized (an example input file is
  sketched after this list):

  - args: This should be assigned a colon-separated list of the types of the
    input arguments. This directive may be skipped if the function does not
    take any inputs. One may mark output arguments by nesting them in <>;
    the generator will create variables to receive the outputs from the
    function.
  - ret: This should be assigned the type that the function returns. This
    directive may be skipped if the function does not return a value.
  - includes: This should be assigned a comma-separated list of headers that
    need to be included to provide declarations for the function and any
    types it may need (each header is included as "#include <header>").
  - include-sources: This should be assigned a comma-separated list of source
    files that need to be included to provide definitions of global variables
    and functions (each file is included as "#include "source""). See
    pthread_once-inputs and pthread_once-source.c for an example of how to
    use this to benchmark a function that needs state across several calls.
  - init: Name of an initializer function to call to initialize the
    benchtest.
  - name: See the following section for instructions on how to use this
    directive.

  Lines beginning with a single hash '#' are treated as comments. See
  pow-inputs for an example of an input file.
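
For illustration, a hypothetical `foo-inputs' file for a function with the
prototype `double foo (double, double)' might look like the sketch below.
The function name, types and input values here are made up; pow-inputs is
the authoritative example in the source tree.

  ## args: double:double
  ## ret: double
  ## includes: math.h
  # Each remaining line is one input; argument values are separated by
  # commas.
  0.75, 1.25
  42.0, 0.5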

Multiple execution units per function:
=====================================

Some functions have distinct performance characteristics for different input
domains and it may be necessary to measure those domains separately. For
example, some math functions perform computations at different levels of
precision (64-bit vs 240-bit vs 768-bit) and mixing them does not give a
very useful picture of the performance of these functions. One can separate
the inputs for these domains in the same file by using the `name' directive,
which looks something like this:

  ##name: 240bit

See the pow-inputs file for an example of what such a partitioned input file
would look like.
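
Concretely, a partitioned input file interleaves `name' directives with the
inputs for each domain. A hypothetical fragment (the domain names and input
values are made up, and the usual args/ret/includes directives would precede
it) could look like:

  ##name: 64bit
  1.5, 2.0
  ##name: 240bit
  1.0000000001, 1.5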

Benchmark Sets:
==============

In addition to standard benchmarking of functions, one may also generate
custom outputs for a set of functions. This is currently used by the string
function benchmarks, where the aim is to compare the performance of
different implementations at various alignments and for various sizes.

To add a benchset for `foo':

- Add `foo' to the benchset variable in the Makefile.
- Write a bench-foo.c that prints the measurements to stdout (a sketch
  follows this list).
- On execution, a bench-foo.out is created in $(objpfx) with the contents of
  stdout.
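
A minimal, hypothetical bench-foo.c is sketched below. It only demonstrates
the contract described above (print measurements to stdout, which the suite
captures into bench-foo.out); real benchsets in the tree generally build on
the suite's helpers such as bench-timing.h and json-lib.h rather than calling
clock_gettime directly, and the function, buffer size and iteration count
here are made up.

  /* Hypothetical bench-foo.c sketch: time a function under test and
     write the measurements to stdout.  */
  #include <stdio.h>
  #include <string.h>
  #include <time.h>

  #define ITERS 100000

  int
  main (void)
  {
    char buf[4096];
    struct timespec start, end;

    clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &start);
    for (int i = 0; i < ITERS; i++)
      {
        /* Function under test; the barrier keeps the compiler from
           optimizing the calls away.  */
        memset (buf, i & 0xff, sizeof buf);
        __asm__ volatile ("" : : "r" (buf) : "memory");
      }
    clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
                + (end.tv_nsec - start.tv_nsec);
    printf ("memset-4096: %g ns/call\n", ns / ITERS);
    return 0;
  }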