lh | 9ed821d | 2023-04-07 01:36:19 -0700 | [diff] [blame] | 1 | @node Searching and Sorting, Pattern Matching, Message Translation, Top |
| 2 | @c %MENU% General searching and sorting functions |
| 3 | @chapter Searching and Sorting |
| 4 | |
| 5 | This chapter describes functions for searching and sorting arrays of |
| 6 | arbitrary objects. You pass the appropriate comparison function to be |
| 7 | applied as an argument, along with the size of the objects in the array |
| 8 | and the total number of elements. |
| 9 | |
| 10 | @menu |
| 11 | * Comparison Functions:: Defining how to compare two objects. |
| 12 | Since the sort and search facilities |
| 13 | are general, you have to specify the |
| 14 | ordering. |
| 15 | * Array Search Function:: The @code{bsearch} function. |
| 16 | * Array Sort Function:: The @code{qsort} function. |
| 17 | * Search/Sort Example:: An example program. |
| 18 | * Hash Search Function:: The @code{hsearch} function. |
| 19 | * Tree Search Function:: The @code{tsearch} function. |
| 20 | @end menu |
| 21 | |
| 22 | @node Comparison Functions |
| 23 | @section Defining the Comparison Function |
| 24 | @cindex Comparison Function |
| 25 | |
| 26 | In order to use the sorted array library functions, you have to describe |
| 27 | how to compare the elements of the array. |
| 28 | |
| 29 | To do this, you supply a comparison function to compare two elements of |
| 30 | the array. The library will call this function, passing as arguments |
| 31 | pointers to two array elements to be compared. Your comparison function |
| 32 | should return a value the way @code{strcmp} (@pxref{String/Array |
| 33 | Comparison}) does: negative if the first argument is ``less'' than the |
| 34 | second, zero if they are ``equal'', and positive if the first argument |
| 35 | is ``greater''. |
| 36 | |
| 37 | Here is an example of a comparison function which works with an array of |
| 38 | numbers of type @code{double}: |
| 39 | |
| 40 | @smallexample |
| 41 | int |
| 42 | compare_doubles (const void *a, const void *b) |
| 43 | @{ |
| 44 | const double *da = (const double *) a; |
| 45 | const double *db = (const double *) b; |
| 46 | |
| 47 | return (*da > *db) - (*da < *db); |
| 48 | @} |
| 49 | @end smallexample |
| 50 | |
| 51 | The header file @file{stdlib.h} defines a name for the data type of |
| 52 | comparison functions. This type is a GNU extension. |
| 53 | |
| 54 | @comment stdlib.h |
| 55 | @comment GNU |
| 56 | @tindex comparison_fn_t |
| 57 | @smallexample |
| 58 | int comparison_fn_t (const void *, const void *); |
| 59 | @end smallexample |
| 60 | |
| 61 | @node Array Search Function |
| 62 | @section Array Search Function |
| 63 | @cindex search function (for arrays) |
| 64 | @cindex binary search function (for arrays) |
| 65 | @cindex array search function |
| 66 | |
| 67 | Generally searching for a specific element in an array means that |
| 68 | potentially all elements must be checked. @Theglibc{} contains |
| 69 | functions to perform linear search. The prototypes for the following |
| 70 | two functions can be found in @file{search.h}. |
| 71 | |
| 72 | @comment search.h |
| 73 | @comment SVID |
| 74 | @deftypefun {void *} lfind (const void *@var{key}, const void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar}) |
| 75 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 76 | The @code{lfind} function searches in the array with @code{*@var{nmemb}} |
| 77 | elements of @var{size} bytes pointed to by @var{base} for an element |
| 78 | which matches the one pointed to by @var{key}. The function pointed to |
| 79 | by @var{compar} is used decide whether two elements match. |
| 80 | |
| 81 | The return value is a pointer to the matching element in the array |
| 82 | starting at @var{base} if it is found. If no matching element is |
| 83 | available @code{NULL} is returned. |
| 84 | |
| 85 | The mean runtime of this function is @code{*@var{nmemb}}/2. This |
| 86 | function should only be used if elements often get added to or deleted from |
| 87 | the array in which case it might not be useful to sort the array before |
| 88 | searching. |
| 89 | @end deftypefun |
| 90 | |
| 91 | @comment search.h |
| 92 | @comment SVID |
| 93 | @deftypefun {void *} lsearch (const void *@var{key}, void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar}) |
| 94 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 95 | @c A signal handler that interrupted an insertion and performed an |
| 96 | @c insertion itself would leave the array in a corrupt state (e.g. one |
| 97 | @c new element initialized twice, with parts of both initializations |
| 98 | @c prevailing, and another uninitialized element), but this is just a |
| 99 | @c special case of races on user-controlled objects, that have to be |
| 100 | @c avoided by users. |
| 101 | |
| 102 | @c In case of cancellation, we know the array won't be left in a corrupt |
| 103 | @c state; the new element is initialized before the element count is |
| 104 | @c incremented, and the compiler can't reorder these operations because |
| 105 | @c it can't know that they don't alias. So, we'll either cancel after |
| 106 | @c the increment and the initialization are both complete, or the |
| 107 | @c increment won't have taken place, and so how far the initialization |
| 108 | @c got doesn't matter. |
| 109 | The @code{lsearch} function is similar to the @code{lfind} function. It |
| 110 | searches the given array for an element and returns it if found. The |
| 111 | difference is that if no matching element is found the @code{lsearch} |
| 112 | function adds the object pointed to by @var{key} (with a size of |
| 113 | @var{size} bytes) at the end of the array and it increments the value of |
| 114 | @code{*@var{nmemb}} to reflect this addition. |
| 115 | |
| 116 | This means for the caller that if it is not sure that the array contains |
| 117 | the element one is searching for the memory allocated for the array |
| 118 | starting at @var{base} must have room for at least @var{size} more |
| 119 | bytes. If one is sure the element is in the array it is better to use |
| 120 | @code{lfind} so having more room in the array is always necessary when |
| 121 | calling @code{lsearch}. |
| 122 | @end deftypefun |
| 123 | |
| 124 | To search a sorted array for an element matching the key, use the |
| 125 | @code{bsearch} function. The prototype for this function is in |
| 126 | the header file @file{stdlib.h}. |
| 127 | @pindex stdlib.h |
| 128 | |
| 129 | @comment stdlib.h |
| 130 | @comment ISO |
| 131 | @deftypefun {void *} bsearch (const void *@var{key}, const void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare}) |
| 132 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 133 | The @code{bsearch} function searches the sorted array @var{array} for an object |
| 134 | that is equivalent to @var{key}. The array contains @var{count} elements, |
| 135 | each of which is of size @var{size} bytes. |
| 136 | |
| 137 | The @var{compare} function is used to perform the comparison. This |
| 138 | function is called with two pointer arguments and should return an |
| 139 | integer less than, equal to, or greater than zero corresponding to |
| 140 | whether its first argument is considered less than, equal to, or greater |
| 141 | than its second argument. The elements of the @var{array} must already |
| 142 | be sorted in ascending order according to this comparison function. |
| 143 | |
| 144 | The return value is a pointer to the matching array element, or a null |
| 145 | pointer if no match is found. If the array contains more than one element |
| 146 | that matches, the one that is returned is unspecified. |
| 147 | |
| 148 | This function derives its name from the fact that it is implemented |
| 149 | using the binary search algorithm. |
| 150 | @end deftypefun |
| 151 | |
| 152 | @node Array Sort Function |
| 153 | @section Array Sort Function |
| 154 | @cindex sort function (for arrays) |
| 155 | @cindex quick sort function (for arrays) |
| 156 | @cindex array sort function |
| 157 | |
| 158 | To sort an array using an arbitrary comparison function, use the |
| 159 | @code{qsort} function. The prototype for this function is in |
| 160 | @file{stdlib.h}. |
| 161 | @pindex stdlib.h |
| 162 | |
| 163 | @comment stdlib.h |
| 164 | @comment ISO |
| 165 | @deftypefun void qsort (void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare}) |
| 166 | @safety{@prelim{}@mtsafe{}@assafe{}@acunsafe{@acucorrupt{}}} |
| 167 | The @code{qsort} function sorts the array @var{array}. The array |
| 168 | contains @var{count} elements, each of which is of size @var{size}. |
| 169 | |
| 170 | The @var{compare} function is used to perform the comparison on the |
| 171 | array elements. This function is called with two pointer arguments and |
| 172 | should return an integer less than, equal to, or greater than zero |
| 173 | corresponding to whether its first argument is considered less than, |
| 174 | equal to, or greater than its second argument. |
| 175 | |
| 176 | @cindex stable sorting |
| 177 | @strong{Warning:} If two objects compare as equal, their order after |
| 178 | sorting is unpredictable. That is to say, the sorting is not stable. |
| 179 | This can make a difference when the comparison considers only part of |
| 180 | the elements. Two elements with the same sort key may differ in other |
| 181 | respects. |
| 182 | |
| 183 | Although the object addresses passed to the comparison function lie |
| 184 | within the array, they need not correspond with the original locations |
| 185 | of those objects because the sorting algorithm may swap around objects |
| 186 | in the array before making some comparisons. The only way to perform |
| 187 | a stable sort with @code{qsort} is to first augment the objects with a |
| 188 | monotonic counter of some kind. |
| 189 | |
| 190 | Here is a simple example of sorting an array of doubles in numerical |
| 191 | order, using the comparison function defined above (@pxref{Comparison |
| 192 | Functions}): |
| 193 | |
| 194 | @smallexample |
| 195 | @{ |
| 196 | double *array; |
| 197 | int size; |
| 198 | @dots{} |
| 199 | qsort (array, size, sizeof (double), compare_doubles); |
| 200 | @} |
| 201 | @end smallexample |
| 202 | |
| 203 | The @code{qsort} function derives its name from the fact that it was |
| 204 | originally implemented using the ``quick sort'' algorithm. |
| 205 | |
| 206 | The implementation of @code{qsort} in this library might not be an |
| 207 | in-place sort and might thereby use an extra amount of memory to store |
| 208 | the array. |
| 209 | @end deftypefun |
| 210 | |
| 211 | @node Search/Sort Example |
| 212 | @section Searching and Sorting Example |
| 213 | |
| 214 | Here is an example showing the use of @code{qsort} and @code{bsearch} |
| 215 | with an array of structures. The objects in the array are sorted |
| 216 | by comparing their @code{name} fields with the @code{strcmp} function. |
| 217 | Then, we can look up individual objects based on their names. |
| 218 | |
| 219 | @comment This example is dedicated to the memory of Jim Henson. RIP. |
| 220 | @smallexample |
| 221 | @include search.c.texi |
| 222 | @end smallexample |
| 223 | |
| 224 | @cindex Kermit the frog |
| 225 | The output from this program looks like: |
| 226 | |
| 227 | @smallexample |
| 228 | Kermit, the frog |
| 229 | Piggy, the pig |
| 230 | Gonzo, the whatever |
| 231 | Fozzie, the bear |
| 232 | Sam, the eagle |
| 233 | Robin, the frog |
| 234 | Animal, the animal |
| 235 | Camilla, the chicken |
| 236 | Sweetums, the monster |
| 237 | Dr. Strangepork, the pig |
| 238 | Link Hogthrob, the pig |
| 239 | Zoot, the human |
| 240 | Dr. Bunsen Honeydew, the human |
| 241 | Beaker, the human |
| 242 | Swedish Chef, the human |
| 243 | |
| 244 | Animal, the animal |
| 245 | Beaker, the human |
| 246 | Camilla, the chicken |
| 247 | Dr. Bunsen Honeydew, the human |
| 248 | Dr. Strangepork, the pig |
| 249 | Fozzie, the bear |
| 250 | Gonzo, the whatever |
| 251 | Kermit, the frog |
| 252 | Link Hogthrob, the pig |
| 253 | Piggy, the pig |
| 254 | Robin, the frog |
| 255 | Sam, the eagle |
| 256 | Swedish Chef, the human |
| 257 | Sweetums, the monster |
| 258 | Zoot, the human |
| 259 | |
| 260 | Kermit, the frog |
| 261 | Gonzo, the whatever |
| 262 | Couldn't find Janice. |
| 263 | @end smallexample |
| 264 | |
| 265 | |
| 266 | @node Hash Search Function |
| 267 | @section The @code{hsearch} function. |
| 268 | |
| 269 | The functions mentioned so far in this chapter are for searching in a sorted |
| 270 | or unsorted array. There are other methods to organize information |
| 271 | which later should be searched. The costs of insert, delete and search |
| 272 | differ. One possible implementation is using hashing tables. |
| 273 | The following functions are declared in the header file @file{search.h}. |
| 274 | |
| 275 | @comment search.h |
| 276 | @comment SVID |
| 277 | @deftypefun int hcreate (size_t @var{nel}) |
| 278 | @safety{@prelim{}@mtunsafe{@mtasurace{:hsearch}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} |
| 279 | @c hcreate @mtasurace:hsearch @ascuheap @acucorrupt @acsmem |
| 280 | @c hcreate_r dup @mtsrace:htab @ascuheap @acucorrupt @acsmem |
| 281 | The @code{hcreate} function creates a hashing table which can contain at |
| 282 | least @var{nel} elements. There is no possibility to grow this table so |
| 283 | it is necessary to choose the value for @var{nel} wisely. The method |
| 284 | used to implement this function might make it necessary to make the |
| 285 | number of elements in the hashing table larger than the expected maximal |
| 286 | number of elements. Hashing tables usually work inefficiently if they are |
| 287 | filled 80% or more. The constant access time guaranteed by hashing can |
| 288 | only be achieved if few collisions exist. See Knuth's ``The Art of |
| 289 | Computer Programming, Part 3: Searching and Sorting'' for more |
| 290 | information. |
| 291 | |
| 292 | The weakest aspect of this function is that there can be at most one |
| 293 | hashing table used through the whole program. The table is allocated |
| 294 | in local memory out of control of the programmer. As an extension @theglibc{} |
| 295 | provides an additional set of functions with a reentrant |
| 296 | interface which provide a similar interface but which allow to keep |
| 297 | arbitrarily many hashing tables. |
| 298 | |
| 299 | It is possible to use more than one hashing table in the program run if |
| 300 | the former table is first destroyed by a call to @code{hdestroy}. |
| 301 | |
| 302 | The function returns a non-zero value if successful. If it return zero |
| 303 | something went wrong. This could either mean there is already a hashing |
| 304 | table in use or the program runs out of memory. |
| 305 | @end deftypefun |
| 306 | |
| 307 | @comment search.h |
| 308 | @comment SVID |
| 309 | @deftypefun void hdestroy (void) |
| 310 | @safety{@prelim{}@mtunsafe{@mtasurace{:hsearch}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} |
| 311 | @c hdestroy @mtasurace:hsearch @ascuheap @acucorrupt @acsmem |
| 312 | @c hdestroy_r dup @mtsrace:htab @ascuheap @acucorrupt @acsmem |
| 313 | The @code{hdestroy} function can be used to free all the resources |
| 314 | allocated in a previous call of @code{hcreate}. After a call to this |
| 315 | function it is again possible to call @code{hcreate} and allocate a new |
| 316 | table with possibly different size. |
| 317 | |
| 318 | It is important to remember that the elements contained in the hashing |
| 319 | table at the time @code{hdestroy} is called are @emph{not} freed by this |
| 320 | function. It is the responsibility of the program code to free those |
| 321 | strings (if necessary at all). Freeing all the element memory is not |
| 322 | possible without extra, separately kept information since there is no |
| 323 | function to iterate through all available elements in the hashing table. |
| 324 | If it is really necessary to free a table and all elements the |
| 325 | programmer has to keep a list of all table elements and before calling |
| 326 | @code{hdestroy} s/he has to free all element's data using this list. |
| 327 | This is a very unpleasant mechanism and it also shows that this kind of |
| 328 | hashing tables is mainly meant for tables which are created once and |
| 329 | used until the end of the program run. |
| 330 | @end deftypefun |
| 331 | |
| 332 | Entries of the hashing table and keys for the search are defined using |
| 333 | this type: |
| 334 | |
| 335 | @deftp {Data type} {struct ENTRY} |
| 336 | Both elements of this structure are pointers to zero-terminated strings. |
| 337 | This is a limiting restriction of the functionality of the |
| 338 | @code{hsearch} functions. They can only be used for data sets which use |
| 339 | the NUL character always and solely to terminate the records. It is not |
| 340 | possible to handle general binary data. |
| 341 | |
| 342 | @table @code |
| 343 | @item char *key |
| 344 | Pointer to a zero-terminated string of characters describing the key for |
| 345 | the search or the element in the hashing table. |
| 346 | @item char *data |
| 347 | Pointer to a zero-terminated string of characters describing the data. |
| 348 | If the functions will be called only for searching an existing entry |
| 349 | this element might stay undefined since it is not used. |
| 350 | @end table |
| 351 | @end deftp |
| 352 | |
| 353 | @comment search.h |
| 354 | @comment SVID |
| 355 | @deftypefun {ENTRY *} hsearch (ENTRY @var{item}, ACTION @var{action}) |
| 356 | @safety{@prelim{}@mtunsafe{@mtasurace{:hsearch}}@asunsafe{}@acunsafe{@acucorrupt{/action==ENTER}}} |
| 357 | @c hsearch @mtasurace:hsearch @acucorrupt/action==ENTER |
| 358 | @c hsearch_r dup @mtsrace:htab @acucorrupt/action==ENTER |
| 359 | To search in a hashing table created using @code{hcreate} the |
| 360 | @code{hsearch} function must be used. This function can perform simple |
| 361 | search for an element (if @var{action} has the @code{FIND}) or it can |
| 362 | alternatively insert the key element into the hashing table. Entries |
| 363 | are never replaced. |
| 364 | |
| 365 | The key is denoted by a pointer to an object of type @code{ENTRY}. For |
| 366 | locating the corresponding position in the hashing table only the |
| 367 | @code{key} element of the structure is used. |
| 368 | |
| 369 | If an entry with matching key is found the @var{action} parameter is |
| 370 | irrelevant. The found entry is returned. If no matching entry is found |
| 371 | and the @var{action} parameter has the value @code{FIND} the function |
| 372 | returns a @code{NULL} pointer. If no entry is found and the |
| 373 | @var{action} parameter has the value @code{ENTER} a new entry is added |
| 374 | to the hashing table which is initialized with the parameter @var{item}. |
| 375 | A pointer to the newly added entry is returned. |
| 376 | @end deftypefun |
| 377 | |
| 378 | As mentioned before the hashing table used by the functions described so |
| 379 | far is global and there can be at any time at most one hashing table in |
| 380 | the program. A solution is to use the following functions which are a |
| 381 | GNU extension. All have in common that they operate on a hashing table |
| 382 | which is described by the content of an object of the type @code{struct |
| 383 | hsearch_data}. This type should be treated as opaque, none of its |
| 384 | members should be changed directly. |
| 385 | |
| 386 | @comment search.h |
| 387 | @comment GNU |
| 388 | @deftypefun int hcreate_r (size_t @var{nel}, struct hsearch_data *@var{htab}) |
| 389 | @safety{@prelim{}@mtsafe{@mtsrace{:htab}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} |
| 390 | @c Unlike the lsearch array, the htab is (at least in part) opaque, so |
| 391 | @c let's make it absolutely clear that ensuring exclusive access is a |
| 392 | @c caller responsibility. |
| 393 | |
| 394 | @c Cancellation is unlikely to leave the htab in a corrupt state: the |
| 395 | @c last field to be initialized is the one that tells whether the entire |
| 396 | @c data structure was initialized, and there's a function call (calloc) |
| 397 | @c in between that will often ensure all other fields are written before |
| 398 | @c the table. However, should this call be inlined (say with LTO), this |
| 399 | @c assumption may not hold. The calloc call doesn't cross our library |
| 400 | @c interface barrier, so let's consider this could happen and mark this |
| 401 | @c with @acucorrupt. It's no safety loss, since we already have |
| 402 | @c @ascuheap anyway... |
| 403 | |
| 404 | @c hcreate_r @mtsrace:htab @ascuheap @acucorrupt @acsmem |
| 405 | @c isprime ok |
| 406 | @c calloc dup @ascuheap @acsmem |
| 407 | The @code{hcreate_r} function initializes the object pointed to by |
| 408 | @var{htab} to contain a hashing table with at least @var{nel} elements. |
| 409 | So this function is equivalent to the @code{hcreate} function except |
| 410 | that the initialized data structure is controlled by the user. |
| 411 | |
| 412 | This allows having more than one hashing table at one time. The memory |
| 413 | necessary for the @code{struct hsearch_data} object can be allocated |
| 414 | dynamically. It must be initialized with zero before calling this |
| 415 | function. |
| 416 | |
| 417 | The return value is non-zero if the operation was successful. If the |
| 418 | return value is zero, something went wrong, which probably means the |
| 419 | programs ran out of memory. |
| 420 | @end deftypefun |
| 421 | |
| 422 | @comment search.h |
| 423 | @comment GNU |
| 424 | @deftypefun void hdestroy_r (struct hsearch_data *@var{htab}) |
| 425 | @safety{@prelim{}@mtsafe{@mtsrace{:htab}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} |
| 426 | @c The table is released while the table pointer still points to it. |
| 427 | @c Async cancellation is thus unsafe, but it already was because we call |
| 428 | @c free(). Using the table in a handler while it's being released would |
| 429 | @c also be dangerous, but calling free() already makes it unsafe, and |
| 430 | @c the requirement on the caller to ensure exclusive access already |
| 431 | @c guarantees this doesn't happen, so we don't get @asucorrupt. |
| 432 | |
| 433 | @c hdestroy_r @mtsrace:htab @ascuheap @acucorrupt @acsmem |
| 434 | @c free dup @ascuheap @acsmem |
| 435 | The @code{hdestroy_r} function frees all resources allocated by the |
| 436 | @code{hcreate_r} function for this very same object @var{htab}. As for |
| 437 | @code{hdestroy} it is the programs responsibility to free the strings |
| 438 | for the elements of the table. |
| 439 | @end deftypefun |
| 440 | |
| 441 | @comment search.h |
| 442 | @comment GNU |
| 443 | @deftypefun int hsearch_r (ENTRY @var{item}, ACTION @var{action}, ENTRY **@var{retval}, struct hsearch_data *@var{htab}) |
| 444 | @safety{@prelim{}@mtsafe{@mtsrace{:htab}}@assafe{}@acunsafe{@acucorrupt{/action==ENTER}}} |
| 445 | @c Callers have to ensure mutual exclusion; insertion, if cancelled, |
| 446 | @c leaves the table in a corrupt state. |
| 447 | |
| 448 | @c hsearch_r @mtsrace:htab @acucorrupt/action==ENTER |
| 449 | @c strlen dup ok |
| 450 | @c strcmp dup ok |
| 451 | The @code{hsearch_r} function is equivalent to @code{hsearch}. The |
| 452 | meaning of the first two arguments is identical. But instead of |
| 453 | operating on a single global hashing table the function works on the |
| 454 | table described by the object pointed to by @var{htab} (which is |
| 455 | initialized by a call to @code{hcreate_r}). |
| 456 | |
| 457 | Another difference to @code{hcreate} is that the pointer to the found |
| 458 | entry in the table is not the return value of the functions. It is |
| 459 | returned by storing it in a pointer variables pointed to by the |
| 460 | @var{retval} parameter. The return value of the function is an integer |
| 461 | value indicating success if it is non-zero and failure if it is zero. |
| 462 | In the latter case the global variable @var{errno} signals the reason for |
| 463 | the failure. |
| 464 | |
| 465 | @table @code |
| 466 | @item ENOMEM |
| 467 | The table is filled and @code{hsearch_r} was called with a so far |
| 468 | unknown key and @var{action} set to @code{ENTER}. |
| 469 | @item ESRCH |
| 470 | The @var{action} parameter is @code{FIND} and no corresponding element |
| 471 | is found in the table. |
| 472 | @end table |
| 473 | @end deftypefun |
| 474 | |
| 475 | |
| 476 | @node Tree Search Function |
| 477 | @section The @code{tsearch} function. |
| 478 | |
| 479 | Another common form to organize data for efficient search is to use |
| 480 | trees. The @code{tsearch} function family provides a nice interface to |
| 481 | functions to organize possibly large amounts of data by providing a mean |
| 482 | access time proportional to the logarithm of the number of elements. |
| 483 | @Theglibc{} implementation even guarantees that this bound is |
| 484 | never exceeded even for input data which cause problems for simple |
| 485 | binary tree implementations. |
| 486 | |
| 487 | The functions described in the chapter are all described in the @w{System |
| 488 | V} and X/Open specifications and are therefore quite portable. |
| 489 | |
| 490 | In contrast to the @code{hsearch} functions the @code{tsearch} functions |
| 491 | can be used with arbitrary data and not only zero-terminated strings. |
| 492 | |
| 493 | The @code{tsearch} functions have the advantage that no function to |
| 494 | initialize data structures is necessary. A simple pointer of type |
| 495 | @code{void *} initialized to @code{NULL} is a valid tree and can be |
| 496 | extended or searched. The prototypes for these functions can be found |
| 497 | in the header file @file{search.h}. |
| 498 | |
| 499 | @comment search.h |
| 500 | @comment SVID |
| 501 | @deftypefun {void *} tsearch (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar}) |
| 502 | @safety{@prelim{}@mtsafe{@mtsrace{:rootp}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} |
| 503 | @c The tree is not modified in a thread-safe manner, and rotations may |
| 504 | @c leave the tree in an inconsistent state that could be observed in an |
| 505 | @c asynchronous signal handler (except for the caller-synchronization |
| 506 | @c requirement) or after asynchronous cancellation of the thread |
| 507 | @c performing the rotation or the insertion. |
| 508 | The @code{tsearch} function searches in the tree pointed to by |
| 509 | @code{*@var{rootp}} for an element matching @var{key}. The function |
| 510 | pointed to by @var{compar} is used to determine whether two elements |
| 511 | match. @xref{Comparison Functions}, for a specification of the functions |
| 512 | which can be used for the @var{compar} parameter. |
| 513 | |
| 514 | If the tree does not contain a matching entry the @var{key} value will |
| 515 | be added to the tree. @code{tsearch} does not make a copy of the object |
| 516 | pointed to by @var{key} (how could it since the size is unknown). |
| 517 | Instead it adds a reference to this object which means the object must |
| 518 | be available as long as the tree data structure is used. |
| 519 | |
| 520 | The tree is represented by a pointer to a pointer since it is sometimes |
| 521 | necessary to change the root node of the tree. So it must not be |
| 522 | assumed that the variable pointed to by @var{rootp} has the same value |
| 523 | after the call. This also shows that it is not safe to call the |
| 524 | @code{tsearch} function more than once at the same time using the same |
| 525 | tree. It is no problem to run it more than once at a time on different |
| 526 | trees. |
| 527 | |
| 528 | The return value is a pointer to the matching element in the tree. If a |
| 529 | new element was created the pointer points to the new data (which is in |
| 530 | fact @var{key}). If an entry had to be created and the program ran out |
| 531 | of space @code{NULL} is returned. |
| 532 | @end deftypefun |
| 533 | |
| 534 | @comment search.h |
| 535 | @comment SVID |
| 536 | @deftypefun {void *} tfind (const void *@var{key}, void *const *@var{rootp}, comparison_fn_t @var{compar}) |
| 537 | @safety{@prelim{}@mtsafe{@mtsrace{:rootp}}@assafe{}@acsafe{}} |
| 538 | The @code{tfind} function is similar to the @code{tsearch} function. It |
| 539 | locates an element matching the one pointed to by @var{key} and returns |
| 540 | a pointer to this element. But if no matching element is available no |
| 541 | new element is entered (note that the @var{rootp} parameter points to a |
| 542 | constant pointer). Instead the function returns @code{NULL}. |
| 543 | @end deftypefun |
| 544 | |
| 545 | Another advantage of the @code{tsearch} function in contrast to the |
| 546 | @code{hsearch} functions is that there is an easy way to remove |
| 547 | elements. |
| 548 | |
| 549 | @comment search.h |
| 550 | @comment SVID |
| 551 | @deftypefun {void *} tdelete (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar}) |
| 552 | @safety{@prelim{}@mtsafe{@mtsrace{:rootp}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} |
| 553 | To remove a specific element matching @var{key} from the tree |
| 554 | @code{tdelete} can be used. It locates the matching element using the |
| 555 | same method as @code{tfind}. The corresponding element is then removed |
| 556 | and a pointer to the parent of the deleted node is returned by the |
| 557 | function. If there is no matching entry in the tree nothing can be |
| 558 | deleted and the function returns @code{NULL}. If the root of the tree |
| 559 | is deleted @code{tdelete} returns some unspecified value not equal to |
| 560 | @code{NULL}. |
| 561 | @end deftypefun |
| 562 | |
| 563 | @comment search.h |
| 564 | @comment GNU |
| 565 | @deftypefun void tdestroy (void *@var{vroot}, __free_fn_t @var{freefct}) |
| 566 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
| 567 | If the complete search tree has to be removed one can use |
| 568 | @code{tdestroy}. It frees all resources allocated by the @code{tsearch} |
| 569 | function to generate the tree pointed to by @var{vroot}. |
| 570 | |
| 571 | For the data in each tree node the function @var{freefct} is called. |
| 572 | The pointer to the data is passed as the argument to the function. If |
| 573 | no such work is necessary @var{freefct} must point to a function doing |
| 574 | nothing. It is called in any case. |
| 575 | |
| 576 | This function is a GNU extension and not covered by the @w{System V} or |
| 577 | X/Open specifications. |
| 578 | @end deftypefun |
| 579 | |
| 580 | In addition to the function to create and destroy the tree data |
| 581 | structure, there is another function which allows you to apply a |
| 582 | function to all elements of the tree. The function must have this type: |
| 583 | |
| 584 | @smallexample |
| 585 | void __action_fn_t (const void *nodep, VISIT value, int level); |
| 586 | @end smallexample |
| 587 | |
| 588 | The @var{nodep} is the data value of the current node (once given as the |
| 589 | @var{key} argument to @code{tsearch}). @var{level} is a numeric value |
| 590 | which corresponds to the depth of the current node in the tree. The |
| 591 | root node has the depth @math{0} and its children have a depth of |
| 592 | @math{1} and so on. The @code{VISIT} type is an enumeration type. |
| 593 | |
| 594 | @deftp {Data Type} VISIT |
| 595 | The @code{VISIT} value indicates the status of the current node in the |
| 596 | tree and how the function is called. The status of a node is either |
| 597 | `leaf' or `internal node'. For each leaf node the function is called |
| 598 | exactly once, for each internal node it is called three times: before |
| 599 | the first child is processed, after the first child is processed and |
| 600 | after both children are processed. This makes it possible to handle all |
| 601 | three methods of tree traversal (or even a combination of them). |
| 602 | |
| 603 | @table @code |
| 604 | @item preorder |
| 605 | The current node is an internal node and the function is called before |
| 606 | the first child was processed. |
| 607 | @item postorder |
| 608 | The current node is an internal node and the function is called after |
| 609 | the first child was processed. |
| 610 | @item endorder |
| 611 | The current node is an internal node and the function is called after |
| 612 | the second child was processed. |
| 613 | @item leaf |
| 614 | The current node is a leaf. |
| 615 | @end table |
| 616 | @end deftp |
| 617 | |
| 618 | @comment search.h |
| 619 | @comment SVID |
| 620 | @deftypefun void twalk (const void *@var{root}, __action_fn_t @var{action}) |
| 621 | @safety{@prelim{}@mtsafe{@mtsrace{:root}}@assafe{}@acsafe{}} |
| 622 | For each node in the tree with a node pointed to by @var{root}, the |
| 623 | @code{twalk} function calls the function provided by the parameter |
| 624 | @var{action}. For leaf nodes the function is called exactly once with |
| 625 | @var{value} set to @code{leaf}. For internal nodes the function is |
| 626 | called three times, setting the @var{value} parameter or @var{action} to |
| 627 | the appropriate value. The @var{level} argument for the @var{action} |
| 628 | function is computed while descending the tree with increasing the value |
| 629 | by one for the descend to a child, starting with the value @math{0} for |
| 630 | the root node. |
| 631 | |
| 632 | Since the functions used for the @var{action} parameter to @code{twalk} |
| 633 | must not modify the tree data, it is safe to run @code{twalk} in more |
| 634 | than one thread at the same time, working on the same tree. It is also |
| 635 | safe to call @code{tfind} in parallel. Functions which modify the tree |
| 636 | must not be used, otherwise the behavior is undefined. |
| 637 | @end deftypefun |