@node String and Array Utilities, Character Set Handling, Character Handling, Top
@c %MENU% Utilities for copying and comparing strings and arrays
@chapter String and Array Utilities

Operations on strings (null-terminated byte sequences) are an important part of
many programs. @Theglibc{} provides an extensive set of string
utility functions, including functions for copying, concatenating,
comparing, and searching strings. Many of these functions can also
operate on arbitrary regions of storage; for example, the @code{memcpy}
function can be used to copy the contents of any kind of array.

It's fairly common for beginning C programmers to ``reinvent the wheel''
by duplicating this functionality in their own code, but it pays to
become familiar with the library functions and to make use of them,
since this offers benefits in maintenance, efficiency, and portability.

For instance, you could easily compare one string to another in two
lines of C code, but if you use the built-in @code{strcmp} function,
you're less likely to make a mistake. And, since these library
functions are typically highly optimized, your program may run faster
too.

@menu
* Representation of Strings::   Introduction to basic concepts.
* String/Array Conventions::    Whether to use a string function or an
                                 arbitrary array function.
* String Length::               Determining the length of a string.
* Copying Strings and Arrays::  Functions to copy strings and arrays.
* Concatenating Strings::       Functions to concatenate strings while copying.
* Truncating Strings::          Functions to truncate strings while copying.
* String/Array Comparison::     Functions for byte-wise and character-wise
                                 comparison.
* Collation Functions::         Functions for collating strings.
* Search Functions::            Searching for a specific element or substring.
* Finding Tokens in a String::  Splitting a string into tokens by looking
                                 for delimiters.
* strfry::                      Function for flash-cooking a string.
* Trivial Encryption::          Obscuring data.
* Encode Binary Data::          Encoding and Decoding of Binary Data.
* Argz and Envz Vectors::       Null-separated string vectors.
@end menu

@node Representation of Strings
@section Representation of Strings
@cindex string, representation of

This section is a quick summary of string concepts for beginning C
programmers. It describes how strings are represented in C
and some common pitfalls. If you are already familiar with this
material, you can skip this section.

@cindex string
A @dfn{string} is a null-terminated array of bytes of type @code{char},
including the terminating null byte. String-valued
variables are usually declared to be pointers of type @code{char *}.
Such variables do not include space for the text of a string; that has
to be stored somewhere else---in an array variable, a string constant,
or dynamically allocated memory (@pxref{Memory Allocation}). It's up to
you to store the address of the chosen memory space into the pointer
variable. Alternatively you can store a @dfn{null pointer} in the
pointer variable. The null pointer does not point anywhere, so
attempting to reference the string it points to gets an error.
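
As a rough illustration, the C sketch below keeps text in each of the three kinds of memory just mentioned and points a @code{char *} or @code{const char *} variable at it. The function name @code{string_storage_demo} is hypothetical and exists only for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example function: returns the combined length of three
   strings kept in the three kinds of storage described above, or 0 if
   the dynamic allocation fails.  */
size_t
string_storage_demo (void)
{
  char array[16] = "in an array";           /* Text in an array variable.  */
  char *p1 = array;                         /* Pointer into that array.  */
  const char *p2 = "a literal";             /* Text in a string constant.  */
  char *p3 = malloc (sizeof "on the heap"); /* Dynamically allocated.  */
  size_t total;

  if (p3 == NULL)
    return 0;
  strcpy (p3, "on the heap");               /* 11 bytes plus the null byte.  */

  total = strlen (p1) + strlen (p2) + strlen (p3);
  free (p3);                                /* Release the heap copy.  */
  return total;
}
```

Calling it returns 31 (11 + 9 + 11 bytes); the array variable has an allocated size of 16 but holds a string of length 11.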

@cindex multibyte character
@cindex multibyte string
@cindex wide string
A @dfn{multibyte character} is a sequence of one or more bytes that
represents a single character using the locale's encoding scheme; a
null byte always represents the null character. A @dfn{multibyte
string} is a string that consists entirely of multibyte
characters. In contrast, a @dfn{wide string} is a null-terminated
sequence of @code{wchar_t} objects. A wide-string variable is usually
declared to be a pointer of type @code{wchar_t *}, by analogy with
string variables and @code{char *}. @xref{Extended Char Intro}.

@cindex null byte
@cindex null wide character
By convention, the @dfn{null byte}, @code{'\0'},
marks the end of a string and the @dfn{null wide character},
@code{L'\0'}, marks the end of a wide string. For example, in
testing to see whether the @code{char *} variable @var{p} points to a
null byte marking the end of a string, you can write
@code{!*@var{p}} or @code{*@var{p} == '\0'}.

A null byte is quite different conceptually from a null pointer,
although both are represented by the integer constant @code{0}.
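
The distinction matters in practice. In the sketch below (the function name @code{is_missing_or_empty} is hypothetical), a null pointer means there is no string at all, while a pointer to a null byte means an empty string; the pointer must be tested before it is dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if P is a null pointer (there is no string)
   or points to a null byte (the string is empty).  */
bool
is_missing_or_empty (const char *p)
{
  if (p == NULL)        /* Null pointer: must not be dereferenced.  */
    return true;
  return *p == '\0';    /* Null byte: a string of length zero.  */
}
```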

@cindex string literal
A @dfn{string literal} appears in C program source as a multibyte
string between double-quote characters (@samp{"}). If the
initial double-quote character is immediately preceded by a capital
@samp{L} (ell) character (as in @code{L"foo"}), it is a wide string
literal. String literals can also contribute to @dfn{string
concatenation}: @code{"a" "b"} is the same as @code{"ab"}.
For wide strings one can use either
@code{L"a" L"b"} or @code{L"a" "b"}. Modification of string literals is
not allowed by the GNU C compiler, because literals are placed in
read-only storage.
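
A small sketch of literal concatenation and its effect on array sizes follows; the identifiers @code{narrow}, @code{wide}, and @code{concatenated_lengths} are hypothetical. Mixing a wide and a narrow literal as in @code{L"a" "b"} assumes a C99 or later compiler.

```c
#include <string.h>
#include <wchar.h>

/* Adjacent literals are joined at compile time, so each array below
   holds exactly the text a single literal would.  */
static const char narrow[] = "a" "b";    /* Same as "ab".  */
static const wchar_t wide[] = L"a" "b";  /* Same as L"ab" (C99 and later).  */

/* Hypothetical helper: returns strlen (narrow) + wcslen (wide).  */
size_t
concatenated_lengths (void)
{
  return strlen (narrow) + wcslen (wide);
}
```

Each array has room for two characters plus the terminator. Because both are declared @code{const}, an assignment such as @code{narrow[0] = 'x'} is rejected at compile time.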

Arrays that are declared @code{const} cannot be modified
either. It's generally good style to declare non-modifiable string
pointers to be of type @code{const char *}, since this often allows the
C compiler to detect accidental modifications as well as providing some
amount of documentation about what your program intends to do with the
string.

The amount of memory allocated for a byte array may extend past the null byte
that marks the end of the string that the array contains. In this
document, the term @dfn{allocated size} is always used to refer to the
total amount of memory allocated for an array, while the term
@dfn{length} refers to the number of bytes up to (but not including)
the terminating null byte. Wide strings are similar, except their
sizes and lengths count wide characters, not bytes.
@cindex length of string
@cindex allocation size of string
@cindex size of string
@cindex string length
@cindex string allocation

A notorious source of program bugs is trying to put more bytes into a
string than fit in its allocated size. When writing code that extends
strings or moves bytes into a pre-allocated array, you should be
very careful to keep track of the length of the text and make explicit
checks for overflowing the array. Many of the library functions
@emph{do not} do this for you! Remember also that you need to allocate
an extra byte to hold the null byte that marks the end of the
string.
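
One way to make the explicit check described above is sketched below; @code{copy_if_fits} is a hypothetical helper, not a library function. It measures the source first and refuses to copy unless the text and its terminating null byte both fit.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: copy SRC into DST, an array of DSTSIZE bytes,
   only if the whole string plus its terminating null byte fits.
   Returns false and leaves DST unchanged otherwise.  */
bool
copy_if_fits (char *dst, size_t dstsize, const char *src)
{
  size_t len = strlen (src);   /* Length, not counting the null byte.  */

  if (len >= dstsize)          /* LEN + 1 bytes are needed.  */
    return false;
  memcpy (dst, src, len + 1);  /* Copy the text and the null byte.  */
  return true;
}
```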

@cindex single-byte string
@cindex multibyte string
Originally strings were sequences of bytes where each byte represented a
single character. This is still true today if the strings are encoded
using a single-byte character encoding. Things are different if the
strings are encoded using a multibyte encoding (for more information on
encodings see @ref{Extended Char Intro}). There is no difference in
the programming interface for these two kinds of strings; the programmer
has to be aware of this and interpret the byte sequences accordingly.

But since there is no separate interface taking care of these
differences, the byte-based string functions are sometimes hard to use.
Since the count parameters of these functions specify bytes, a call to
@code{memcpy} could cut a multibyte character in the middle and put an
incomplete (and therefore unusable) byte sequence in the target buffer.
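
When a byte-oriented copy must be shortened, one common remedy is to move the cut back to a character boundary. The sketch below assumes the text is encoded in UTF-8, where every continuation byte has the bit pattern @code{10xxxxxx}; other multibyte encodings need a different test (for example, one based on @code{mbrlen}). The function @code{utf8_truncate_copy} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, assuming UTF-8 text: copy as much of SRC as fits
   into DST (DSTSIZE bytes, DSTSIZE > 0) without splitting a multibyte
   character, and always null-terminate DST.  Returns the number of
   bytes copied, not counting the null byte.  */
size_t
utf8_truncate_copy (char *dst, size_t dstsize, const char *src)
{
  size_t len = strlen (src);
  size_t n = len < dstsize - 1 ? len : dstsize - 1;

  /* If the cut falls inside SRC, back up while the first byte that
     would be dropped is a UTF-8 continuation byte (10xxxxxx).  */
  if (n < len)
    while (n > 0 && ((unsigned char) src[n] & 0xC0) == 0x80)
      n--;

  memcpy (dst, src, n);
  dst[n] = '\0';
  return n;
}
```

With a 3-byte destination and the UTF-8 input @code{"h\xc3\xa9llo"}, a plain byte copy would keep @code{h} and only the first byte of the two-byte character; this sketch keeps just @code{h}.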

@cindex wide string
To avoid these problems, later versions of the @w{ISO C} standard
introduced a second set of functions that operate on @dfn{wide
characters} (@pxref{Extended Char Intro}). These functions don't have
the problems the single-byte versions have, since every wide character is
a legal, interpretable value. This does not mean that cutting wide
strings at arbitrary points is without problems. For alphabet-based
languages it is normally safe (except for non-normalized text), but
languages based on syllables still have the problem that more than one
wide character is necessary to complete a logical unit. This is a
higher-level problem which the @w{C library} functions are not designed
to solve. But it is at least good that no invalid byte sequences can be
created. Also, higher-level functions can operate much more easily
on wide characters than on multibyte characters, so a common strategy
is to use wide characters internally whenever text is more than simply
copied.

The remainder of this chapter discusses the functions for handling
wide strings in parallel with the discussion of
strings, since there is almost always an exact equivalent
available.

@node String/Array Conventions
@section String and Array Conventions

This chapter describes both functions that work on arbitrary arrays or
blocks of memory, and functions that are specific to strings and wide
strings.

Functions that operate on arbitrary blocks of memory have names
beginning with @samp{mem} and @samp{wmem} (such as @code{memcpy} and
@code{wmemcpy}) and invariably take an argument which specifies the size
(in bytes and wide characters respectively) of the block of memory to
operate on. The array arguments and return values for these functions
have type @code{void *} or @code{wchar_t *}. As a matter of style, the
elements of the arrays used with the @samp{mem} functions are referred
to as ``bytes''. You can pass any kind of pointer to these functions,
and the @code{sizeof} operator is useful in computing the value for the
size argument. Parameters to the @samp{wmem} functions must be of type
@code{wchar_t *}. These functions are not really usable with anything
but arrays of this type.

In contrast, functions that operate specifically on strings and wide
strings have names beginning with @samp{str} and @samp{wcs}
respectively (such as @code{strcpy} and @code{wcscpy}) and look for a
terminating null byte or null wide character instead of requiring an explicit
size argument to be passed. (Some of these functions accept a specified
maximum length, but they also check for premature termination.)
The array arguments and return values for these
functions have type @code{char *} and @code{wchar_t *} respectively, and
the array elements are referred to as ``bytes'' and ``wide
characters''.

In many cases, there are both @samp{mem} and @samp{str}/@samp{wcs}
versions of a function. The one that is more appropriate to use depends
on the exact situation. When your program is manipulating arbitrary
arrays or blocks of storage, then you should always use the @samp{mem}
functions. On the other hand, when you are manipulating
strings it is usually more convenient to use the @samp{str}/@samp{wcs}
functions, unless you already know the length of the string in advance.
The @samp{wmem} functions should be used for wide character arrays with
known size.

@cindex wint_t
@cindex parameter promotion
Some of the memory and string functions take single characters as
arguments. Since a value of type @code{char} is automatically promoted
into a value of type @code{int} when used as a parameter, the functions
are declared with @code{int} as the type of the parameter in question.
For the wide character functions the situation is similar: the
parameter type for a single wide character is @code{wint_t} and not
@code{wchar_t}. For many implementations this would not be necessary,
since @code{wchar_t} is already large enough not to be promoted, but
since the @w{ISO C} standard does not require such a choice of types,
the @code{wint_t} type is used.

@node String Length
@section String Length

You can get the length of a string using the @code{strlen} function.
This function is declared in the header file @file{string.h}.
@pindex string.h

@comment string.h
@comment ISO
@deftypefun size_t strlen (const char *@var{s})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strlen} function returns the length of the
string @var{s} in bytes. (In other words, it returns the offset of the
terminating null byte within the array.)

For example,
@smallexample
strlen ("hello, world")
    @result{} 12
@end smallexample

When applied to an array, the @code{strlen} function returns
the length of the string stored there, not its allocated size. You can
get the allocated size of the array that holds a string using
the @code{sizeof} operator:

@smallexample
char string[32] = "hello, world";
sizeof (string)
    @result{} 32
strlen (string)
    @result{} 12
@end smallexample

But beware, this will not work unless @var{string} is the
array itself, not a pointer to it. For example:

@smallexample
char string[32] = "hello, world";
char *ptr = string;
sizeof (string)
    @result{} 32
sizeof (ptr)
    @result{} 4  /* @r{(on a machine with 4-byte pointers)} */
@end smallexample

This is an easy mistake to make when you are working with functions that
take string arguments; those arguments are always pointers, not arrays.
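
The sketch below (the function name @code{size_seen_by_callee} is hypothetical) shows the pitfall: a parameter declared with array syntax is adjusted to a pointer, so @code{sizeof} inside the function yields the size of a @code{char *}. GCC typically warns about this with @option{-Wsizeof-array-argument}.

```c
#include <stddef.h>

/* Hypothetical example: although STRING is written with array syntax,
   the parameter is adjusted to type char *, so sizeof yields the size
   of a pointer rather than 32.  */
size_t
size_seen_by_callee (char string[32])
{
  return sizeof (string);   /* Equivalent to sizeof (char *).  */
}
```

Callers that need the allocated size inside such a function should pass @code{sizeof (string)} as a separate argument.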

It must also be noted that for multibyte encoded strings the return
value does not have to correspond to the number of characters in the
string. To get this value, the string can be converted to wide
characters and @code{wcslen} can be used, or something like the following
code can be used:

@smallexample
/* @r{The input is in @code{string}.}
   @r{The length is expected in @code{n}.}  */
@{
  mbstate_t t;
  const char *scopy = string;
  /* In initial state.  */
  memset (&t, '\0', sizeof (t));
  /* Determine number of characters; (size_t) -1 means an invalid sequence.  */
  n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t);
@}
@end smallexample

This is cumbersome, so if the number of characters (as opposed to
bytes) is needed often, it is better to work with wide characters.
@end deftypefun
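
The character-counting code shown for @code{strlen} can be wrapped in a small function, sketched below under the hypothetical name @code{mb_char_count}. The result depends on the current locale: unless the program calls @code{setlocale} (for example @code{setlocale (LC_ALL, "")}), the @code{"C"} locale is in effect.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of characters (not bytes) in the
   multibyte string S, decoded with the current locale's encoding.
   Returns (size_t) -1 if S contains an invalid multibyte sequence.  */
size_t
mb_char_count (const char *s)
{
  mbstate_t state;
  const char *p = s;   /* mbsrtowcs takes a const char ** argument.  */

  memset (&state, '\0', sizeof (state));   /* Initial conversion state.  */
  /* With a null destination nothing is stored and the length
     argument is not used; the return value is the character count.  */
  return mbsrtowcs (NULL, &p, 0, &state);
}
```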

The wide character equivalent is declared in @file{wchar.h}.

@comment wchar.h
@comment ISO
@deftypefun size_t wcslen (const wchar_t *@var{ws})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcslen} function is the wide character equivalent to
@code{strlen}. The return value is the number of wide characters in the
wide string pointed to by @var{ws} (this is also the offset of
the terminating null wide character of @var{ws}).

Since there are no multi-element sequences making up one wide
character, the return value is not only the offset in the array but
also the number of wide characters.

This function was introduced in @w{Amendment 1} to @w{ISO C90}.
@end deftypefun

@comment string.h
@comment GNU
@deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
If the array @var{s} of size @var{maxlen} contains a null byte,
the @code{strnlen} function returns the length of the string @var{s} in
bytes. Otherwise it returns @var{maxlen}. Therefore this function is
equivalent to
@code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})}
but it is more efficient and works even if @var{s} is not null-terminated,
so long as @var{maxlen} does not exceed the size of @var{s}'s array.

@smallexample
char string[32] = "hello, world";
strnlen (string, 32)
    @result{} 12
strnlen (string, 5)
    @result{} 5
@end smallexample

This function is part of POSIX.1-2008 and later editions, but was
available in @theglibc{} as a GNU extension before it was
standardized. It is declared in @file{string.h}.
@end deftypefun
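
A typical use of @code{strnlen} is reading fixed-width fields that are not necessarily null-terminated. In the sketch below, @code{struct record} and @code{record_name} are hypothetical; the field is null-terminated only when the name is shorter than eight bytes.

```c
#include <string.h>

/* Hypothetical fixed-width field: a name that fills all 8 bytes has no
   terminating null byte.  */
struct record
{
  char name[8];
};

/* Hypothetical helper: copy the name into BUF, which must have room
   for at least 9 bytes, as a null-terminated string.  */
char *
record_name (const struct record *r, char *buf)
{
  /* strnlen never looks past the end of the field.  */
  size_t len = strnlen (r->name, sizeof r->name);

  memcpy (buf, r->name, len);
  buf[len] = '\0';
  return buf;
}
```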

@comment wchar.h
@comment GNU
@deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{wcsnlen} is the wide character equivalent to @code{strnlen}. The
@var{maxlen} parameter specifies the maximum number of wide characters.

This function is part of POSIX.1-2008 and later editions, but was
available in @theglibc{} as a GNU extension before it was
standardized. It is declared in @file{wchar.h}.
@end deftypefun

@node Copying Strings and Arrays
@section Copying Strings and Arrays

You can use the functions described in this section to copy the contents
of strings, wide strings, and arrays. The @samp{str} and @samp{mem}
functions are declared in @file{string.h} while the @samp{w} functions
are declared in @file{wchar.h}.
@pindex string.h
@pindex wchar.h
@cindex copying strings and arrays
@cindex string copy functions
@cindex array copy functions
@cindex concatenating strings
@cindex string concatenation functions

A helpful way to remember the ordering of the arguments to the functions
in this section is that it corresponds to an assignment expression, with
the destination array specified to the left of the source array. Most
of these functions return the address of the destination array; a few
return the address of the destination's terminating null, or of just
past the destination.

Most of these functions do not work properly if the source and
destination arrays overlap. For example, if the beginning of the
destination array overlaps the end of the source array, the original
contents of that part of the source array may get overwritten before it
is copied. Even worse, in the case of the string functions, the null
byte marking the end of the string may be lost, and the copy
function might get stuck in a loop trashing all the memory allocated to
your program.

All functions that have problems copying between overlapping arrays are
explicitly identified in this manual. In addition to functions in this
section, there are a few others like @code{sprintf} (@pxref{Formatted
Output Functions}) and @code{scanf} (@pxref{Formatted Input
Functions}).

@comment string.h
@comment ISO
@deftypefun {void *} memcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{memcpy} function copies @var{size} bytes from the object
beginning at @var{from} into the object beginning at @var{to}. The
behavior of this function is undefined if the two arrays @var{to} and
@var{from} overlap; use @code{memmove} instead if overlapping is possible.

The value returned by @code{memcpy} is the value of @var{to}.

Here is an example of how you might use @code{memcpy} to copy the
contents of an array:

@smallexample
struct foo *oldarray, *newarray;
size_t arraysize;
@dots{}
memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
@end smallexample
@end deftypefun

@comment wchar.h
@comment ISO
@deftypefun {wchar_t *} wmemcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wmemcpy} function copies @var{size} wide characters from the object
beginning at @var{wfrom} into the object beginning at @var{wto}. The
behavior of this function is undefined if the two arrays @var{wto} and
@var{wfrom} overlap; use @code{wmemmove} instead if overlapping is possible.

The following is a possible implementation of @code{wmemcpy}, but there
are more optimizations possible.

@smallexample
wchar_t *
wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
         size_t size)
@{
  return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample

The value returned by @code{wmemcpy} is the value of @var{wto}.

This function was introduced in @w{Amendment 1} to @w{ISO C90}.
@end deftypefun

@comment string.h
@comment GNU
@deftypefun {void *} mempcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{mempcpy} function is nearly identical to the @code{memcpy}
function. It copies @var{size} bytes from the object beginning at
@var{from} into the object pointed to by @var{to}. But instead of
returning the value of @var{to} it returns a pointer to the byte
following the last written byte in the object beginning at @var{to}.
I.e., the value is @code{((void *) ((char *) @var{to} + @var{size}))}.

This function is useful in situations where a number of objects are to be
copied to consecutive memory positions.

@smallexample
void *
combine (void *o1, size_t s1, void *o2, size_t s2)
@{
  void *result = malloc (s1 + s2);
  if (result != NULL)
    mempcpy (mempcpy (result, o1, s1), o2, s2);
  return result;
@}
@end smallexample

This function is a GNU extension.
@end deftypefun

@comment wchar.h
@comment GNU
@deftypefun {wchar_t *} wmempcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wmempcpy} function is nearly identical to the @code{wmemcpy}
function. It copies @var{size} wide characters from the object
beginning at @var{wfrom} into the object pointed to by @var{wto}. But
instead of returning the value of @var{wto} it returns a pointer to the
wide character following the last written wide character in the object
beginning at @var{wto}. I.e., the value is @code{@var{wto} + @var{size}}.

This function is useful in situations where a number of objects are to be
copied to consecutive memory positions.

The following is a possible implementation of @code{wmempcpy}, but there
are more optimizations possible.

@smallexample
wchar_t *
wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
          size_t size)
@{
  return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample

This function is a GNU extension.
@end deftypefun

@comment string.h
@comment ISO
@deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{memmove} copies the @var{size} bytes at @var{from} into the
@var{size} bytes at @var{to}, even if those two blocks of space
overlap. In the case of overlap, @code{memmove} is careful to copy the
original values of the bytes in the block at @var{from}, including those
bytes which also belong to the block at @var{to}.

The value returned by @code{memmove} is the value of @var{to}.
@end deftypefun
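
Deleting bytes from the middle of a string is a typical case of overlapping source and destination. The hypothetical helper @code{delete_bytes} below shifts the tail of the string left with @code{memmove}; using @code{memcpy} here would be undefined behavior.

```c
#include <string.h>

/* Hypothetical helper: remove N bytes starting at offset POS from the
   string S, in place.  The source and destination ranges overlap, so
   memmove is required.  */
void
delete_bytes (char *s, size_t pos, size_t n)
{
  size_t len = strlen (s);

  if (pos >= len)
    return;
  if (n > len - pos)       /* Clamp to the end of the string.  */
    n = len - pos;
  /* Shift the tail, including the terminating null byte, left by N.  */
  memmove (s + pos, s + pos + n, len - pos - n + 1);
}
```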

@comment wchar.h
@comment ISO
@deftypefun {wchar_t *} wmemmove (wchar_t *@var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{wmemmove} copies the @var{size} wide characters at @var{wfrom}
into the @var{size} wide characters at @var{wto}, even if those two
blocks of space overlap. In the case of overlap, @code{wmemmove} is
careful to copy the original values of the wide characters in the block
at @var{wfrom}, including those wide characters which also belong to the
block at @var{wto}.

The following is a possible implementation of @code{wmemmove}, but there
are more optimizations possible.

@smallexample
wchar_t *
wmemmove (wchar_t *wto, const wchar_t *wfrom, size_t size)
@{
  return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample

The value returned by @code{wmemmove} is the value of @var{wto}.

This function was introduced in @w{Amendment 1} to @w{ISO C90}.
@end deftypefun

@comment string.h
@comment SVID
@deftypefun {void *} memccpy (void *restrict @var{to}, const void *restrict @var{from}, int @var{c}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function copies no more than @var{size} bytes from @var{from} to
@var{to}, stopping if a byte matching @var{c} is found. The return
value is a pointer into @var{to} one byte past where @var{c} was copied,
or a null pointer if no byte matching @var{c} appeared in the first
@var{size} bytes of @var{from}.
@end deftypefun
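
Because it can stop at the null byte, @code{memccpy} lends itself to bounded string concatenation. The sketch below is hypothetical (the name @code{append_truncating} and its return convention are assumptions) and treats a null return value as a sign that the source did not fit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append SRC to the string already held in DST,
   an array of DSTSIZE bytes.  Returns 0 if all of SRC fit and -1 if it
   did not; when SRC is truncated, DST is still null-terminated.  */
int
append_truncating (char *dst, size_t dstsize, const char *src)
{
  size_t used = strnlen (dst, dstsize);
  char *end;

  if (used == dstsize)   /* No null byte in DST: no string to append to.  */
    return -1;

  /* Copy up to and including SRC's null byte, but never more than the
     space left; memccpy returns NULL if the null byte did not fit.  */
  end = memccpy (dst + used, src, '\0', dstsize - used);
  if (end == NULL)
    {
      dst[dstsize - 1] = '\0';   /* Terminate the truncated text.  */
      return -1;
    }
  return 0;
}
```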

@comment string.h
@comment ISO
@deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function copies the value of @var{c} (converted to an
@code{unsigned char}) into each of the first @var{size} bytes of the
object beginning at @var{block}. It returns the value of @var{block}.
@end deftypefun

@comment wchar.h
@comment ISO
@deftypefun {wchar_t *} wmemset (wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function copies the value of @var{wc} into each of the first
@var{size} wide characters of the object beginning at @var{block}. It
returns the value of @var{block}.
@end deftypefun

@comment string.h
@comment ISO
@deftypefun {char *} strcpy (char *restrict @var{to}, const char *restrict @var{from})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This copies bytes from the string @var{from} (up to and including
the terminating null byte) into the string @var{to}. Like
@code{memcpy}, this function has undefined results if the strings
overlap. The return value is the value of @var{to}.
@end deftypefun

@comment wchar.h
@comment ISO
@deftypefun {wchar_t *} wcscpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This copies wide characters from the wide string @var{wfrom} (up to and
including the terminating null wide character) into the wide string
@var{wto}. Like @code{wmemcpy}, this function has undefined results if
the strings overlap. The return value is the value of @var{wto}.
@end deftypefun

@comment string.h
@comment SVID
@deftypefun {char *} strdup (const char *@var{s})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
This function copies the string @var{s} into a newly
allocated string. The string is allocated using @code{malloc}; see
@ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space
for the new string, @code{strdup} returns a null pointer. Otherwise it
returns a pointer to the new string.
@end deftypefun

@comment wchar.h
@comment GNU
@deftypefun {wchar_t *} wcsdup (const wchar_t *@var{ws})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
This function copies the wide string @var{ws}
into a newly allocated wide string. The string is allocated using
@code{malloc}; see @ref{Unconstrained Allocation}. If @code{malloc}
cannot allocate space for the new string, @code{wcsdup} returns a null
pointer. Otherwise it returns a pointer to the new wide string.

This function is part of POSIX.1-2008 and later editions, but was
available in @theglibc{} as a GNU extension before it was
standardized.
@end deftypefun

@comment string.h
@comment Unknown origin
@deftypefun {char *} stpcpy (char *restrict @var{to}, const char *restrict @var{from})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{strcpy}, except that it returns a pointer to
the end of the string @var{to} (that is, the address of the terminating
null byte @code{@var{to} + strlen (@var{from})}) rather than the beginning.

For example, this program uses @code{stpcpy} to concatenate @samp{foo}
and @samp{bar} to produce @samp{foobar}, which it then prints.

@smallexample
@include stpcpy.c.texi
@end smallexample

This function is part of POSIX.1-2008 and later editions, but was
available in @theglibc{} and other systems as an extension before it
was standardized.

Its behavior is undefined if the strings overlap. The function is
declared in @file{string.h}.
@end deftypefun

@comment wchar.h
@comment GNU
@deftypefun {wchar_t *} wcpcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{wcscpy}, except that it returns a pointer to
the end of the string @var{wto} (that is, the address of the terminating
null wide character @code{@var{wto} + wcslen (@var{wfrom})}) rather than
the beginning.

This function is part of POSIX.1-2008 and later editions, but was
available in @theglibc{} as a GNU extension before it was standardized;
it was found useful while developing @theglibc{} itself.

The behavior of @code{wcpcpy} is undefined if the strings overlap.

@code{wcpcpy} is declared in @file{wchar.h}.
@end deftypefun

@comment string.h
@comment GNU
@deftypefn {Macro} {char *} strdupa (const char *@var{s})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This macro is similar to @code{strdup} but allocates the new string
using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
Automatic}). This means that the returned string has the same
limitations as any block of memory allocated using @code{alloca}; in
particular, it is released automatically when the function that
invoked @code{strdupa} returns, and it must not be passed to
@code{free}.

Because the @code{alloca} call must happen in the caller's stack frame,
@code{strdupa} is implemented only as a macro; you cannot take its
address. Despite this limitation it is a useful macro. The following
code shows a situation where using @code{malloc} would be a lot more
expensive.

@smallexample
@include strdupa.c.texi
@end smallexample

Please note that calling @code{strtok} using @var{path} directly is
invalid, because @code{strtok} modifies the string it scans. It is
also not allowed to call @code{strdupa} in the argument list of
@code{strtok}, since @code{strdupa} uses @code{alloca}
(@pxref{Variable Size Automatic}), which can interfere with the
parameter passing.

This macro is only available if GNU CC is used.
@end deftypefn

@comment string.h
@comment BSD
@deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is a partially obsolete alternative for @code{memmove}, derived from
BSD. Note that it is not quite equivalent to @code{memmove}, because the
arguments are not in the same order and there is no return value.
@end deftypefun

@comment string.h
@comment BSD
@deftypefun void bzero (void *@var{block}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is a partially obsolete alternative for @code{memset}, derived from
BSD. Note that it is not as general as @code{memset}, because the only
value it can store is zero.
@end deftypefun
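When porting code that still uses these BSD functions, each call can be
rewritten in terms of its ISO C counterpart. A minimal sketch, using
hypothetical wrapper names, which makes the swapped argument order of
@code{bcopy} explicit:

@smallexample
#include <string.h>

/* Hypothetical replacement for bcopy: the source comes first here
   but second in memmove.  Like bcopy, memmove handles overlapping
   regions correctly.  */
void
bcopy_via_memmove (const void *from, void *to, size_t size)
@{
  memmove (to, from, size);
@}

/* Hypothetical replacement for bzero.  */
void
bzero_via_memset (void *block, size_t size)
@{
  memset (block, 0, size);
@}
@end smallexample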

@node Concatenating Strings
@section Concatenating Strings
@pindex string.h
@pindex wchar.h
@cindex concatenating strings
@cindex string concatenation functions

The functions described in this section concatenate the contents of a
string or wide string to another. They follow the string-copying
functions in their conventions. @xref{Copying Strings and Arrays}.
@samp{strcat} is declared in the header file @file{string.h} while
@samp{wcscat} is declared in @file{wchar.h}.

@comment string.h
@comment ISO
@deftypefun {char *} strcat (char *restrict @var{to}, const char *restrict @var{from})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strcat} function is similar to @code{strcpy}, except that the
bytes from @var{from} are concatenated or appended to the end of
@var{to}, instead of overwriting it. That is, the first byte from
@var{from} overwrites the null byte marking the end of @var{to}.

An equivalent definition for @code{strcat} would be:

@smallexample
char *
strcat (char *restrict to, const char *restrict from)
@{
  strcpy (to + strlen (to), from);
  return to;
@}
@end smallexample

This function has undefined results if the strings overlap.

As noted below, this function has significant performance issues.
@end deftypefun

@comment wchar.h
@comment ISO
@deftypefun {wchar_t *} wcscat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcscat} function is similar to @code{wcscpy}, except that the
wide characters from @var{wfrom} are concatenated or appended to the end of
@var{wto}, instead of overwriting it. That is, the first wide character from
@var{wfrom} overwrites the null wide character marking the end of @var{wto}.

An equivalent definition for @code{wcscat} would be:

@smallexample
wchar_t *
wcscat (wchar_t *restrict wto, const wchar_t *restrict wfrom)
@{
  wcscpy (wto + wcslen (wto), wfrom);
  return wto;
@}
@end smallexample

This function has undefined results if the strings overlap.

As noted below, this function has significant performance issues.
@end deftypefun

Code that uses the @code{strcat} or @code{wcscat} function (or the
@code{strncat} or @code{wcsncat} functions defined in
a later section, for that matter)
is usually both careless and inefficient. In almost all situations
the lengths of the participating strings are known, and they had better
be: otherwise there is no way to ensure that the allocated size of the
buffer is sufficient. Or at least, one could know them by keeping track
of the results of the various function calls. But then it is very
inefficient to use @code{strcat}/@code{wcscat}: a lot of time is wasted
finding the end of the destination string before the actual copying
can start. This is a common example:

@cindex va_copy
@smallexample
/* @r{This function concatenates arbitrarily many strings. The last}
   @r{parameter must be @code{NULL}.} */
char *
concat (const char *str, @dots{})
@{
  va_list ap, ap2;
  size_t total = 1;
  const char *s;
  char *result;

  va_start (ap, str);
  va_copy (ap2, ap);

  /* @r{Determine how much space we need.} */
  for (s = str; s != NULL; s = va_arg (ap, const char *))
    total += strlen (s);

  va_end (ap);

  result = (char *) malloc (total);
  if (result != NULL)
    @{
      result[0] = '\0';

      /* @r{Copy the strings.} */
      for (s = str; s != NULL; s = va_arg (ap2, const char *))
        strcat (result, s);
    @}

  va_end (ap2);

  return result;
@}
@end smallexample

This looks quite simple, especially the second loop where the strings
are actually copied. But these innocent lines hide a major performance
penalty. Just imagine that ten strings of 100 bytes each have to be
concatenated. For the second string we search the already stored 100
bytes for the end of the string so that we can append the next string;
for the third string we search 200 bytes, and so on. For all strings
in total, the comparisons necessary to find the end of the intermediate
results add up to 0 + 100 + 200 + @dots{} + 900 = 4500, four and a half
times the 1000 bytes actually copied, and this overhead grows
quadratically with the number of strings. If we combine the copying
with the length computation needed for the allocation, we can write
this function more efficiently:

@smallexample
char *
concat (const char *str, @dots{})
@{
  va_list ap;
  size_t allocated = 100;
  char *result = (char *) malloc (allocated);

  if (result != NULL)
    @{
      char *newp;
      char *wp;
      const char *s;

      va_start (ap, str);

      wp = result;
      for (s = str; s != NULL; s = va_arg (ap, const char *))
        @{
          size_t len = strlen (s);
          size_t used = wp - result;

          /* @r{Resize the allocated memory if necessary.} */
          if (used + len + 1 > allocated)
            @{
              allocated = (allocated + len) * 2;
              newp = (char *) realloc (result, allocated);
              if (newp == NULL)
                @{
                  va_end (ap);
                  free (result);
                  return NULL;
                @}
              result = newp;
              wp = result + used;
            @}

          wp = mempcpy (wp, s, len);
        @}

      /* @r{Terminate the result string.} */
      *wp++ = '\0';

      /* @r{Resize memory to the optimal size.} */
      newp = realloc (result, wp - result);
      if (newp != NULL)
        result = newp;

      va_end (ap);
    @}

  return result;
@}
@end smallexample

With a bit more knowledge about the input strings one could fine-tune
the memory allocation. The difference we are pointing to here is that
we don't use @code{strcat} anymore. We always keep track of the length
of the current intermediate result, so we can save ourselves the search
for the end of the string and use @code{mempcpy}. Please note that we
also don't use @code{stpcpy}, which might seem more natural since we
are handling strings. But this is not necessary since we already know
the length of the string and therefore can use the faster memory
copying function. The example would work the same way for wide
characters, using @code{wmempcpy}.

Whenever a programmer feels the need to use @code{strcat}, she or he
should think twice and check whether the code can be rewritten to take
advantage of already calculated results. Again: it is almost always
unnecessary to use @code{strcat}.
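As a smaller illustration of the same advice, the following sketch
joins a directory name and a file name with @samp{/}. The name
@code{join_path} is hypothetical. Each length is computed exactly once
and reused both for the allocation and for the copies, so no call has
to rescan the partially built result the way a sequence of
@code{strcat} calls would.

@smallexample
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return DIR "/" FILE in a newly allocated
   string, or a null pointer if memory could not be allocated.  */
char *
join_path (const char *dir, const char *file)
@{
  size_t dirlen = strlen (dir);
  size_t filelen = strlen (file);
  char *result = malloc (dirlen + 1 + filelen + 1);

  if (result != NULL)
    @{
      memcpy (result, dir, dirlen);
      result[dirlen] = '/';
      /* Copy FILE together with its terminating null byte.  */
      memcpy (result + dirlen + 1, file, filelen + 1);
    @}
  return result;
@}
@end smallexample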

@node Truncating Strings
@section Truncating Strings while Copying
@cindex truncating strings
@cindex string truncation

The functions described in this section copy or concatenate the
possibly-truncated contents of a string or array to another, and
similarly for wide strings. They follow the string-copying functions
in their header conventions. @xref{Copying Strings and Arrays}. The
functions whose names begin with @samp{str} or @samp{stp} are declared
in the header file @file{string.h} and those whose names begin with
@samp{wc} are declared in the file @file{wchar.h}.

@comment string.h
@comment ISO
@deftypefun {char *} strncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{strcpy} but always copies exactly
@var{size} bytes into @var{to}.

If @var{from} does not contain a null byte in its first @var{size}
bytes, @code{strncpy} copies just the first @var{size} bytes. In this
case no null terminator is written into @var{to}.

Otherwise @var{from} must be a string with length less than
@var{size}. In this case @code{strncpy} copies all of @var{from},
followed by enough null bytes to add up to @var{size} bytes in all.

The behavior of @code{strncpy} is undefined if the strings overlap.

This function was designed for now-rarely-used arrays consisting of
non-null bytes followed by zero or more null bytes. It needs to set
all @var{size} bytes of the destination, even when @var{size} is much
greater than the length of @var{from}. As noted below, this function
is generally a poor choice for processing text.
@end deftypefun
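The following sketch shows @code{strncpy} used for the kind of
fixed-size, null-padded field it was designed for. The structure
@code{struct record} and the function @code{set_name} are hypothetical.
Because the field need not be null-terminated, code that reads it must
never treat it as an ordinary string.

@smallexample
#include <string.h>

/* Hypothetical fixed-size record field: up to 8 non-null bytes,
   padded with null bytes, not necessarily null-terminated.  */
struct record
@{
  char name[8];
@};

/* Store NAME in R->name.  If NAME is shorter than the field, the
   remaining bytes are set to zero; if it has 8 or more bytes, only
   the first 8 are stored and no null terminator is written.  */
void
set_name (struct record *r, const char *name)
@{
  strncpy (r->name, name, sizeof r->name);
@}
@end smallexample

@noindent
To print such a field, bound the output by the field size, for
example with @code{printf ("%.*s", (int) sizeof r->name, r->name)},
rather than passing the field to a function that expects a string.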

@comment wchar.h
@comment ISO
@deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{wcscpy} but always copies exactly
@var{size} wide characters into @var{wto}.

If @var{wfrom} does not contain a null wide character in its first
@var{size} wide characters, then @code{wcsncpy} copies just the first
@var{size} wide characters. In this case no null terminator is
written into @var{wto}.

Otherwise @var{wfrom} must be a wide string with length less than
@var{size}. In this case @code{wcsncpy} copies all of @var{wfrom},
followed by enough null wide characters to add up to @var{size} wide
characters in all.

The behavior of @code{wcsncpy} is undefined if the strings overlap.

This function is the wide-character counterpart of @code{strncpy} and
suffers from most of the problems that @code{strncpy} does. For
example, as noted below, this function is generally a poor choice for
processing text.
@end deftypefun

@comment string.h
@comment GNU
@deftypefun {char *} strndup (const char *@var{s}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
This function is similar to @code{strdup} but always copies at most
@var{size} bytes into the newly allocated string.

If the length of @var{s} is more than @var{size}, then @code{strndup}
copies just the first @var{size} bytes and adds a closing null byte.
Otherwise all bytes are copied and the string is terminated.

This function differs from @code{strncpy} in that it always terminates
the destination string.

As noted below, this function is generally a poor choice for
processing text.

@code{strndup} is a GNU extension.
@end deftypefun
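One case where truncation is part of the application's logic is
extracting a field that ends at a delimiter. In this sketch, the
hypothetical helper @code{first_field} combines @code{strcspn} with
@code{strndup} to copy the part of a line before the first colon:

@smallexample
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the part of
   LINE before its first ':' (all of LINE if there is no colon), or
   a null pointer if memory could not be allocated.  */
char *
first_field (const char *line)
@{
  return strndup (line, strcspn (line, ":"));
@}
@end smallexample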

@comment string.h
@comment GNU
@deftypefn {Macro} {char *} strndupa (const char *@var{s}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This macro is similar to @code{strndup} but like @code{strdupa} it
allocates the new string using @code{alloca} (@pxref{Variable Size
Automatic}). The same advantages and limitations of @code{strdupa} are
valid for @code{strndupa}, too.

This macro is implemented only as a macro, just like @code{strdupa}.
Like @code{strdupa}, it must not be used inside the argument list of
a function call.

As noted below, this macro is generally a poor choice for
processing text.

@code{strndupa} is only available if GNU CC is used.
@end deftypefn

@comment string.h
@comment GNU
@deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{stpcpy} but always copies exactly
@var{size} bytes into @var{to}.

If the length of @var{from} is @var{size} or more, then @code{stpncpy}
copies just the first @var{size} bytes and returns a pointer to the
byte directly following the one which was copied last. Note that in
this case there is no null terminator written into @var{to}.

If the length of @var{from} is less than @var{size}, then @code{stpncpy}
copies all of @var{from}, followed by enough null bytes to add up
to @var{size} bytes in all. This behavior is rarely useful, but it
matches the behavior of @code{strncpy}, so @code{stpncpy} can be used
in contexts that rely on it. In this case @code{stpncpy} returns a
pointer to the @emph{first} written null byte.

This function was not originally part of ISO or POSIX but was found
useful while developing @theglibc{} itself. It has since been
standardized in POSIX.1-2008.

Its behavior is undefined if the strings overlap. The function is
declared in @file{string.h}.

As noted below, this function is generally a poor choice for
processing text.
@end deftypefun
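The sketch below illustrates the two return-value cases described
above by reporting how many non-null bytes @code{stpncpy} stored in a
six-byte buffer. The function name @code{stored_length} is
hypothetical.

@smallexample
#include <string.h>

/* Hypothetical helper: copy FROM into the 6-byte buffer BUF with
   stpncpy and return the number of non-null bytes stored.  */
size_t
stored_length (char buf[6], const char *from)
@{
  char *end = stpncpy (buf, from, 6);

  /* If FROM is shorter than 6 bytes, END points to the first null
     byte written and the rest of BUF is null-padded.  Otherwise END
     is BUF + 6 and BUF is not null-terminated.  */
  return (size_t) (end - buf);
@}
@end smallexample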

@comment wchar.h
@comment GNU
@deftypefun {wchar_t *} wcpncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{wcpcpy} but always copies exactly
@var{size} wide characters into @var{wto}.

If the length of @var{wfrom} is @var{size} or more, then
@code{wcpncpy} copies just the first @var{size} wide characters and
returns a pointer to the wide character directly following the last
one copied. Note that in this case there is no null terminator
written into @var{wto}.

If the length of @var{wfrom} is less than @var{size}, then @code{wcpncpy}
copies all of @var{wfrom}, followed by enough null wide characters to add up
to @var{size} wide characters in all. This behavior is rarely useful, but it
matches the behavior of @code{wcsncpy}, so @code{wcpncpy} can be used in
contexts that rely on it. In this case @code{wcpncpy} returns a pointer
to the @emph{first} written null wide character.

This function was not originally part of ISO or POSIX but was found
useful while developing @theglibc{} itself. It has since been
standardized in POSIX.1-2008.

Its behavior is undefined if the strings overlap.

As noted below, this function is generally a poor choice for
processing text.

@code{wcpncpy} originated as a GNU extension.
@end deftypefun

@comment string.h
@comment ISO
@deftypefun {char *} strncat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{strcat} except that not more than @var{size}
bytes from @var{from} are appended to the end of @var{to}, and
@var{from} need not be null-terminated. A single null byte is also
always appended to @var{to}, so the total
allocated size of @var{to} must be at least @code{@var{size} + 1} bytes
longer than its initial length.

The @code{strncat} function could be implemented like this:

@smallexample
@group
char *
strncat (char *to, const char *from, size_t size)
@{
  size_t len = strlen (to);
  memcpy (to + len, from, strnlen (from, size));
  to[len + strnlen (from, size)] = '\0';
  return to;
@}
@end group
@end smallexample

The behavior of @code{strncat} is undefined if the strings overlap.

As a companion to @code{strncpy}, @code{strncat} was designed for
now-rarely-used arrays consisting of non-null bytes followed by zero
or more null bytes. As noted below, this function is generally a poor
choice for processing text. Also, this function has significant
performance issues. @xref{Concatenating Strings}.
@end deftypefun
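If truncation really is what the application requires, the @var{size}
argument must account for what is already in the destination and for
the null byte that @code{strncat} always appends. A minimal sketch,
assuming a hypothetical helper @code{append_truncated} and a buffer
that already holds a string shorter than its total size:

@smallexample
#include <string.h>

/* Hypothetical helper: append FROM to the string in BUF, whose total
   size is BUFSIZE bytes, truncating if necessary so that BUF stays
   null-terminated.  Assumes BUF already holds a string of fewer than
   BUFSIZE bytes.  Returns nonzero if FROM had to be truncated.  */
int
append_truncated (char *buf, size_t bufsize, const char *from)
@{
  size_t len = strlen (buf);
  size_t room = bufsize - len - 1;  /* Bytes left, excluding the null.  */

  strncat (buf, from, room);
  return strlen (from) > room;
@}
@end smallexample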

@comment wchar.h
@comment ISO
@deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{wcscat} except that not more than @var{size}
wide characters from @var{wfrom} are appended to the end of @var{wto},
and @var{wfrom} need not be null-terminated. A single null wide
character is also always appended to @var{wto}, so the total allocated
size of @var{wto} must be at least @code{wcsnlen (@var{wfrom},
@var{size}) + 1} wide characters longer than its initial length.

The @code{wcsncat} function could be implemented like this:

@smallexample
@group
wchar_t *
wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom,
         size_t size)
@{
  size_t len = wcslen (wto);
  memcpy (wto + len, wfrom, wcsnlen (wfrom, size) * sizeof (wchar_t));
  wto[len + wcsnlen (wfrom, size)] = L'\0';
  return wto;
@}
@end group
@end smallexample

The behavior of @code{wcsncat} is undefined if the strings overlap.

As noted below, this function is generally a poor choice for
processing text. Also, this function has significant performance
issues. @xref{Concatenating Strings}.
@end deftypefun

Because these functions can abruptly truncate strings or wide strings,
they are generally poor choices for processing text. When copying or
concatenating multibyte strings, they can truncate within a multibyte
character so that the result is not a valid multibyte string. When
combining or concatenating multibyte or wide strings, they may
truncate the output after a combining character, resulting in a
corrupted grapheme. They can cause bugs even when processing
single-byte strings: for example, when calculating an ASCII-only user
name, a truncated name can identify the wrong user.

Although some buffer overruns can be prevented by manually replacing
calls to copying functions with calls to truncation functions, there
are often easier and safer automatic techniques that cause buffer
overruns to reliably terminate a program, such as GCC's
@option{-fsanitize=address} option (older GCC releases also offered
@option{-fcheck-pointer-bounds}). @xref{Debugging Options,, Options
for Debugging Your Program or GCC, gcc.info, Using GCC}. Because
truncation functions can mask application bugs that would otherwise
be caught by the automatic techniques, these functions should be used
only when the application's underlying logic requires truncation.

@strong{Note:} GNU programs should not truncate strings or wide
strings to fit arbitrary size limits. @xref{Semantics, , Writing
Robust Programs, standards, The GNU Coding Standards}. Instead of
string-truncation functions, it is usually better to use dynamic
memory allocation (@pxref{Unconstrained Allocation}) and functions
such as @code{strdup} or @code{asprintf} to construct strings.

@node String/Array Comparison
@section String/Array Comparison
@cindex comparing strings and arrays
@cindex string comparison functions
@cindex array comparison functions
@cindex predicates on strings
@cindex predicates on arrays

You can use the functions in this section to perform comparisons on the
contents of strings and arrays. As well as checking for equality, these
functions can also be used as the ordering functions for sorting
operations. @xref{Searching and Sorting}, for an example of this.

Unlike most comparison operations in C, the string comparison functions
return a nonzero value if the strings are @emph{not} equivalent rather
than if they are. The sign of the value indicates the relative ordering
of the first part of the strings that are not equivalent: a
negative value indicates that the first string is ``less'' than the
second, while a positive value indicates that the first string is
``greater''.

The most common use of these functions is to check only for equality.
This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}.

The functions whose names begin with @samp{str} or @samp{mem} are
declared in the header file @file{string.h}, and those whose names
begin with @samp{wcs} or @samp{wmem} are declared in @file{wchar.h}.
@pindex string.h
@pindex wchar.h

@comment string.h
@comment ISO
@deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{memcmp} compares the @var{size} bytes of memory
beginning at @var{a1} against the @var{size} bytes of memory beginning
at @var{a2}. The value returned has the same sign as the difference
between the first differing pair of bytes (interpreted as @code{unsigned
char} objects, then promoted to @code{int}).

If the contents of the two blocks are equal, @code{memcmp} returns
@code{0}.
@end deftypefun

@comment wchar.h
@comment ISO
@deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{wmemcmp} compares the @var{size} wide characters
beginning at @var{a1} against the @var{size} wide characters beginning
at @var{a2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{a1} is
smaller or larger than the corresponding wide character in @var{a2}.

If the contents of the two blocks are equal, @code{wmemcmp} returns
@code{0}.
@end deftypefun

On arbitrary arrays, the @code{memcmp} function is mostly useful for
testing equality. It usually isn't meaningful to do byte-wise ordering
comparisons on arrays of things other than bytes. For example, a
byte-wise comparison on the bytes that make up floating-point numbers
isn't likely to tell you anything about the relationship between the
values of the floating-point numbers.

@code{wmemcmp} is really only useful to compare arrays of type
@code{wchar_t} since the function looks at @code{sizeof (wchar_t)} bytes
at a time and this number of bytes is system dependent.

You should also be careful about using @code{memcmp} to compare objects
that can contain ``holes'', such as the padding inserted into structure
objects to enforce alignment requirements, extra space at the end of
unions, and extra bytes at the ends of strings whose length is less
than their allocated size. The contents of these ``holes'' are
indeterminate and may cause strange behavior when performing byte-wise
comparisons. For more predictable results, perform an explicit
component-wise comparison.

For example, given a structure type definition like:

@smallexample
struct foo
@{
  unsigned char tag;
  union
    @{
      double f;
      long i;
      char *p;
    @} value;
@};
@end smallexample

@noindent
you are better off writing a specialized comparison function to compare
@code{struct foo} objects instead of comparing them with @code{memcmp}.
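For instance, a specialized comparison might look like the following
sketch. The tag values @code{FOO_DOUBLE}, @code{FOO_LONG} and
@code{FOO_POINTER} and the function @code{foo_equal} are hypothetical;
the definition above does not say how @code{tag} selects the active
union member. Only the member selected by the tag is compared, so
padding bytes and unused union bytes cannot affect the result.

@smallexample
#include <stdbool.h>

struct foo
@{
  unsigned char tag;
  union
    @{
      double f;
      long i;
      char *p;
    @} value;
@};

/* Hypothetical tag values selecting the active union member.  */
enum @{ FOO_DOUBLE, FOO_LONG, FOO_POINTER @};

/* Return true if A and B are equal, comparing only the tag and the
   union member that the tag selects.  */
bool
foo_equal (const struct foo *a, const struct foo *b)
@{
  if (a->tag != b->tag)
    return false;

  switch (a->tag)
    @{
    case FOO_DOUBLE:
      return a->value.f == b->value.f;
    case FOO_LONG:
      return a->value.i == b->value.i;
    case FOO_POINTER:
      return a->value.p == b->value.p;
    default:
      return false;   /* Unknown tag: treat as unequal.  */
    @}
@}
@end smallexample

Note that comparing the @code{double} members with @code{==} treats
@code{0.0} and @code{-0.0} as equal and a NaN as unequal to itself,
which a byte-wise comparison would not do; whether that is the right
notion of equality depends on the application.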

@comment string.h
@comment ISO
@deftypefun int strcmp (const char *@var{s1}, const char *@var{s2})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strcmp} function compares the string @var{s1} against
@var{s2}, returning a value that has the same sign as the difference
between the first differing pair of bytes (interpreted as
@code{unsigned char} objects, then promoted to @code{int}).

If the two strings are equal, @code{strcmp} returns @code{0}.

A consequence of the ordering used by @code{strcmp} is that if @var{s1}
is an initial substring of @var{s2}, then @var{s1} is considered to be
``less than'' @var{s2}.

@code{strcmp} does not take into account the sorting conventions of the
language the strings are written in. To get that, use @code{strcoll}.
@end deftypefun
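As mentioned at the start of this section, these functions can serve as
ordering functions for sorting. A common pitfall is that @code{qsort}
passes pointers to the array elements, so when sorting an array of
@code{char *} the comparison function receives pointers to pointers. A
minimal sketch, with the hypothetical names @code{compare_strings} and
@code{sort_strings}:

@smallexample
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort on an array of char * elements.
   Each argument points to an element, that is, to a char *, so it
   must be dereferenced once before calling strcmp.  */
int
compare_strings (const void *a, const void *b)
@{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcmp (*sa, *sb);
@}

/* Sort the N strings in ARRAY into strcmp order.  */
void
sort_strings (const char **array, size_t n)
@{
  qsort (array, n, sizeof array[0], compare_strings);
@}
@end smallexample

Because @code{strcmp} compares byte values, upper-case ASCII letters
sort before all lower-case ones, so @samp{Banana} precedes
@samp{apple}; use @code{strcoll} for a locale-aware order.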

@comment wchar.h
@comment ISO
@deftypefun int wcscmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}

The @code{wcscmp} function compares the wide string @var{ws1}
against @var{ws2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{ws1} is
smaller or larger than the corresponding wide character in @var{ws2}.

If the two strings are equal, @code{wcscmp} returns @code{0}.

A consequence of the ordering used by @code{wcscmp} is that if @var{ws1}
is an initial substring of @var{ws2}, then @var{ws1} is considered to be
``less than'' @var{ws2}.

@code{wcscmp} does not take into account the sorting conventions of the
language the strings are written in. To get that, use @code{wcscoll}.
@end deftypefun

@comment string.h
@comment BSD
@deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Although this calls tolower multiple times, it's a macro, and
@c strcasecmp is optimized so that the locale pointer is read only once.
@c There are some asm implementations too, for which the single-read
@c from locale TLS pointers also applies.
This function is like @code{strcmp}, except that differences in case are
ignored, and its arguments must be multibyte strings.
How uppercase and lowercase characters are related is
determined by the currently selected locale. In the standard @code{"C"}
locale the characters @"A and @"a do not match but in a locale which
regards these characters as parts of the alphabet they do match.

@noindent
@code{strcasecmp} is derived from BSD.
@end deftypefun
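A typical use is accepting user input regardless of case. In this
sketch, the helper @code{is_yes} is hypothetical. POSIX declares
@code{strcasecmp} in @file{strings.h}, which is what the sketch
includes; in the default @code{"C"} locale only the ASCII letters are
folded.

@smallexample
#include <strings.h>

/* Hypothetical helper: return nonzero if ANSWER is "yes" in any mix
   of upper and lower case, and zero otherwise.  */
int
is_yes (const char *answer)
@{
  return strcasecmp (answer, "yes") == 0;
@}
@end smallexample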
| 1283 | |
| 1284 | @comment wchar.h |
| 1285 | @comment GNU |
| 1286 | @deftypefun int wcscasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}) |
| 1287 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
| 1288 | @c Since towlower is not a macro, the locale object may be read multiple |
| 1289 | @c times. |
| 1290 | This function is like @code{wcscmp}, except that differences in case are |
| 1291 | ignored. How uppercase and lowercase characters are related is |
| 1292 | determined by the currently selected locale. In the standard @code{"C"} |
locale the characters @"A and @"a do not match, but in a locale which
| 1294 | regards these characters as parts of the alphabet they do match. |
| 1295 | |
| 1296 | @noindent |
| 1297 | @code{wcscasecmp} is a GNU extension. |
| 1298 | @end deftypefun |
| 1299 | |
| 1300 | @comment string.h |
| 1301 | @comment ISO |
| 1302 | @deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size}) |
| 1303 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
This function is similar to @code{strcmp}, except that no more than
| 1305 | @var{size} bytes are compared. In other words, if the two |
| 1306 | strings are the same in their first @var{size} bytes, the |
| 1307 | return value is zero. |
| 1308 | @end deftypefun |
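A common use of @code{strncmp} is a prefix test, passing the length of
the prefix as @var{size}. A minimal sketch; the helper
@code{has_prefix} is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if S begins with PREFIX.  strncmp
   compares at most strlen (PREFIX) bytes, so bytes of S beyond
   that point are never examined.  */
bool
has_prefix (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}
```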
| 1309 | |
| 1310 | @comment wchar.h |
| 1311 | @comment ISO |
| 1312 | @deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size}) |
| 1313 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
This function is similar to @code{wcscmp}, except that no more than
| 1315 | @var{size} wide characters are compared. In other words, if the two |
| 1316 | strings are the same in their first @var{size} wide characters, the |
| 1317 | return value is zero. |
| 1318 | @end deftypefun |
| 1319 | |
| 1320 | @comment string.h |
| 1321 | @comment BSD |
| 1322 | @deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n}) |
| 1323 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
| 1324 | This function is like @code{strncmp}, except that differences in case |
| 1325 | are ignored, and the compared parts of the arguments should consist of |
| 1326 | valid multibyte characters. |
| 1327 | Like @code{strcasecmp}, it is locale dependent how |
| 1328 | uppercase and lowercase characters are related. |
| 1329 | |
| 1330 | @noindent |
@code{strncasecmp} is derived from BSD.
| 1332 | @end deftypefun |
| 1333 | |
| 1334 | @comment wchar.h |
| 1335 | @comment GNU |
@deftypefun int wcsncasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{n})
| 1337 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
| 1338 | This function is like @code{wcsncmp}, except that differences in case |
| 1339 | are ignored. Like @code{wcscasecmp}, it is locale dependent how |
| 1340 | uppercase and lowercase characters are related. |
| 1341 | |
| 1342 | @noindent |
| 1343 | @code{wcsncasecmp} is a GNU extension. |
| 1344 | @end deftypefun |
| 1345 | |
| 1346 | Here are some examples showing the use of @code{strcmp} and |
| 1347 | @code{strncmp} (equivalent examples can be constructed for the wide |
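character functions). These examples assume the use of the ASCII
character set. (If some other character set---say, EBCDIC---is used
instead, then the glyphs are associated with different numeric codes,
and the return values and ordering may differ.) Only the sign of the
return value is specified; the magnitudes shown are the differences
between the first differing bytes, which some implementations return,
but others may return any value of the same sign.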
| 1352 | |
| 1353 | @smallexample |
| 1354 | strcmp ("hello", "hello") |
| 1355 | @result{} 0 /* @r{These two strings are the same.} */ |
| 1356 | strcmp ("hello", "Hello") |
| 1357 | @result{} 32 /* @r{Comparisons are case-sensitive.} */ |
| 1358 | strcmp ("hello", "world") |
| 1359 | @result{} -15 /* @r{The byte @code{'h'} comes before @code{'w'}.} */ |
| 1360 | strcmp ("hello", "hello, world") |
| 1361 | @result{} -44 /* @r{Comparing a null byte against a comma.} */ |
| 1362 | strncmp ("hello", "hello, world", 5) |
| 1363 | @result{} 0 /* @r{The initial 5 bytes are the same.} */ |
| 1364 | strncmp ("hello, world", "hello, stupid world!!!", 5) |
| 1365 | @result{} 0 /* @r{The initial 5 bytes are the same.} */ |
| 1366 | @end smallexample |
| 1367 | |
| 1368 | @comment string.h |
| 1369 | @comment GNU |
| 1370 | @deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2}) |
| 1371 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
| 1372 | @c Calls isdigit multiple times, locale may change in between. |
| 1373 | The @code{strverscmp} function compares the string @var{s1} against |
| 1374 | @var{s2}, considering them as holding indices/version numbers. The |
| 1375 | return value follows the same conventions as found in the |
| 1376 | @code{strcmp} function. In fact, if @var{s1} and @var{s2} contain no |
| 1377 | digits, @code{strverscmp} behaves like @code{strcmp}. |
| 1378 | |
Basically, we compare strings normally (byte by byte) until
we find a digit in each string; then we enter a special comparison
mode, where each sequence of digits is taken as a whole. If we reach the
end of these two numeric parts without noticing a difference, we return
to the standard comparison mode. There are two types of numeric parts:
``integral'' and ``fractional'' (those that begin with a @samp{0}). The
types of the numeric parts affect the way we sort them:
| 1386 | |
| 1387 | @itemize @bullet |
| 1388 | @item |
| 1389 | integral/integral: we compare values as you would expect. |
| 1390 | |
| 1391 | @item |
| 1392 | fractional/integral: the fractional part is less than the integral one. |
| 1393 | Again, no surprise. |
| 1394 | |
| 1395 | @item |
fractional/fractional: things become a bit more complex.
If the common prefix contains only leading zeroes, the longer part is less
than the other one; otherwise the comparison behaves normally.
| 1399 | @end itemize |
| 1400 | |
| 1401 | @smallexample |
| 1402 | strverscmp ("no digit", "no digit") |
| 1403 | @result{} 0 /* @r{same behavior as strcmp.} */ |
| 1404 | strverscmp ("item#99", "item#100") |
| 1405 | @result{} <0 /* @r{same prefix, but 99 < 100.} */ |
| 1406 | strverscmp ("alpha1", "alpha001") |
| 1407 | @result{} >0 /* @r{fractional part inferior to integral one.} */ |
| 1408 | strverscmp ("part1_f012", "part1_f01") |
| 1409 | @result{} >0 /* @r{two fractional parts.} */ |
| 1410 | strverscmp ("foo.009", "foo.0") |
| 1411 | @result{} <0 /* @r{idem, but with leading zeroes only.} */ |
| 1412 | @end smallexample |
| 1413 | |
| 1414 | This function is especially useful when dealing with filename sorting, |
| 1415 | because filenames frequently hold indices/version numbers. |
| 1416 | |
| 1417 | @code{strverscmp} is a GNU extension. |
| 1418 | @end deftypefun |
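A minimal sketch of sorting file names with @code{strverscmp} as the
@code{qsort} comparison (@pxref{Array Sort Function}). The names
@code{cmp_version} and @code{sort_versions} are hypothetical;
@code{_GNU_SOURCE} is defined because @code{strverscmp} is a GNU
extension.

```c
#define _GNU_SOURCE   /* strverscmp is a GNU extension */
#include <stdlib.h>
#include <string.h>

/* qsort comparison function: the elements are char pointers,
   compared as strings holding version numbers.  */
static int
cmp_version (const void *a, const void *b)
{
  const char *const *pa = a;
  const char *const *pb = b;
  return strverscmp (*pa, *pb);
}

/* Sort the N strings in NAMES in place, so that, for example,
   "file9" sorts before "file10".  */
void
sort_versions (const char **names, size_t n)
{
  qsort (names, n, sizeof *names, cmp_version);
}
```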
| 1419 | |
| 1420 | @comment string.h |
| 1421 | @comment BSD |
| 1422 | @deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size}) |
| 1423 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1424 | This is an obsolete alias for @code{memcmp}, derived from BSD. |
| 1425 | @end deftypefun |
| 1426 | |
| 1427 | @node Collation Functions |
| 1428 | @section Collation Functions |
| 1429 | |
| 1430 | @cindex collating strings |
| 1431 | @cindex string collation functions |
| 1432 | |
| 1433 | In some locales, the conventions for lexicographic ordering differ from |
| 1434 | the strict numeric ordering of character codes. For example, in Spanish |
| 1435 | most glyphs with diacritical marks such as accents are not considered |
| 1436 | distinct letters for the purposes of collation. On the other hand, the |
| 1437 | two-character sequence @samp{ll} is treated as a single letter that is |
| 1438 | collated immediately after @samp{l}. |
| 1439 | |
| 1440 | You can use the functions @code{strcoll} and @code{strxfrm} (declared in |
the header file @file{string.h}) and @code{wcscoll} and @code{wcsxfrm}
(declared in the header file @file{wchar.h}) to compare strings using a
| 1443 | collation ordering appropriate for the current locale. The locale used |
| 1444 | by these functions in particular can be specified by setting the locale |
| 1445 | for the @code{LC_COLLATE} category; see @ref{Locales}. |
| 1446 | @pindex string.h |
| 1447 | @pindex wchar.h |
| 1448 | |
| 1449 | In the standard C locale, the collation sequence for @code{strcoll} is |
| 1450 | the same as that for @code{strcmp}. Similarly, @code{wcscoll} and |
| 1451 | @code{wcscmp} are the same in this situation. |
| 1452 | |
| 1453 | Effectively, the way these functions work is by applying a mapping to |
| 1454 | transform the characters in a multibyte string to a byte |
| 1455 | sequence that represents |
| 1456 | the string's position in the collating sequence of the current locale. |
| 1457 | Comparing two such byte sequences in a simple fashion is equivalent to |
| 1458 | comparing the strings with the locale's collating sequence. |
| 1459 | |
| 1460 | The functions @code{strcoll} and @code{wcscoll} perform this translation |
| 1461 | implicitly, in order to do one comparison. By contrast, @code{strxfrm} |
| 1462 | and @code{wcsxfrm} perform the mapping explicitly. If you are making |
| 1463 | multiple comparisons using the same string or set of strings, it is |
| 1464 | likely to be more efficient to use @code{strxfrm} or @code{wcsxfrm} to |
| 1465 | transform all the strings just once, and subsequently compare the |
| 1466 | transformed strings with @code{strcmp} or @code{wcscmp}. |
| 1467 | |
| 1468 | @comment string.h |
| 1469 | @comment ISO |
| 1470 | @deftypefun int strcoll (const char *@var{s1}, const char *@var{s2}) |
| 1471 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
| 1472 | @c Calls strcoll_l with the current locale, which dereferences only the |
| 1473 | @c LC_COLLATE data pointer. |
| 1474 | The @code{strcoll} function is similar to @code{strcmp} but uses the |
| 1475 | collating sequence of the current locale for collation (the |
| 1476 | @code{LC_COLLATE} locale). The arguments are multibyte strings. |
| 1477 | @end deftypefun |
| 1478 | |
| 1479 | @comment wchar.h |
| 1480 | @comment ISO |
| 1481 | @deftypefun int wcscoll (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}) |
| 1482 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
| 1483 | @c Same as strcoll, but calling wcscoll_l. |
| 1484 | The @code{wcscoll} function is similar to @code{wcscmp} but uses the |
| 1485 | collating sequence of the current locale for collation (the |
| 1486 | @code{LC_COLLATE} locale). |
| 1487 | @end deftypefun |
| 1488 | |
| 1489 | Here is an example of sorting an array of strings, using @code{strcoll} |
| 1490 | to compare them. The actual sort algorithm is not written here; it |
| 1491 | comes from @code{qsort} (@pxref{Array Sort Function}). The job of the |
| 1492 | code shown here is to say how to compare the strings while sorting them. |
| 1493 | (Later on in this section, we will show a way to do this more |
| 1494 | efficiently using @code{strxfrm}.) |
| 1495 | |
| 1496 | @smallexample |
| 1497 | /* @r{This is the comparison function used with @code{qsort}.} */ |
| 1498 | |
int
compare_elements (const void *v1, const void *v2)
@{
  char * const *p1 = v1;
  char * const *p2 = v2;

  return strcoll (*p1, *p2);
@}

/* @r{This is the entry point---the function to sort}
   @r{strings using the locale's collating sequence.} */

void
sort_strings (char **array, int nstrings)
@{
  /* @r{Sort @code{array} by comparing the strings.} */
  qsort (array, nstrings,
         sizeof (char *), compare_elements);
@}
| 1518 | @end smallexample |
| 1519 | |
| 1520 | @cindex converting string to collation order |
| 1521 | @comment string.h |
| 1522 | @comment ISO |
| 1523 | @deftypefun size_t strxfrm (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) |
| 1524 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
| 1525 | The function @code{strxfrm} transforms the multibyte string |
| 1526 | @var{from} using the |
| 1527 | collation transformation determined by the locale currently selected for |
| 1528 | collation, and stores the transformed string in the array @var{to}. Up |
| 1529 | to @var{size} bytes (including a terminating null byte) are |
| 1530 | stored. |
| 1531 | |
| 1532 | The behavior is undefined if the strings @var{to} and @var{from} |
| 1533 | overlap; see @ref{Copying Strings and Arrays}. |
| 1534 | |
| 1535 | The return value is the length of the entire transformed string. This |
| 1536 | value is not affected by the value of @var{size}, but if it is greater |
than or equal to @var{size}, it means that the transformed string did not
| 1538 | entirely fit in the array @var{to}. In this case, only as much of the |
| 1539 | string as actually fits was stored. To get the whole transformed |
| 1540 | string, call @code{strxfrm} again with a bigger output array. |
| 1541 | |
| 1542 | The transformed string may be longer than the original string, and it |
| 1543 | may also be shorter. |
| 1544 | |
If @var{size} is zero, no bytes are stored in @var{to}. In this
case, @code{strxfrm} simply returns the length of the transformed
string in bytes, not counting the terminating null byte. This is
useful for determining what size the allocated array should be (one
more than the returned value). It does not matter what
@var{to} is if @var{size} is zero; @var{to} may even be a null pointer.
| 1550 | @end deftypefun |
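A minimal sketch of the size-query idiom described above: call
@code{strxfrm} once with @var{size} zero to learn the length, then
allocate one extra byte for the terminating null byte. The helper
@code{xfrm_dup} is hypothetical, and the sketch assumes the
@code{LC_COLLATE} locale does not change between the two calls.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated collation key for S
   in the current LC_COLLATE locale, or NULL if memory is exhausted.
   The caller frees the result.  */
char *
xfrm_dup (const char *s)
{
  /* With SIZE == 0 nothing is stored, so TO may be a null pointer.  */
  size_t len = strxfrm (NULL, s, 0);
  char *key = malloc (len + 1);   /* +1 for the terminating null byte */
  if (key != NULL)
    strxfrm (key, s, len + 1);
  return key;
}
```

Keys produced this way can then be compared with @code{strcmp}.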
| 1551 | |
| 1552 | @comment wchar.h |
| 1553 | @comment ISO |
@deftypefun size_t wcsxfrm (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
| 1555 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
| 1556 | The function @code{wcsxfrm} transforms wide string @var{wfrom} |
| 1557 | using the collation transformation determined by the locale currently |
| 1558 | selected for collation, and stores the transformed string in the array |
| 1559 | @var{wto}. Up to @var{size} wide characters (including a terminating null |
| 1560 | wide character) are stored. |
| 1561 | |
| 1562 | The behavior is undefined if the strings @var{wto} and @var{wfrom} |
| 1563 | overlap; see @ref{Copying Strings and Arrays}. |
| 1564 | |
| 1565 | The return value is the length of the entire transformed wide |
| 1566 | string. This value is not affected by the value of @var{size}, but if |
it is greater than or equal to @var{size}, it means that the transformed
| 1568 | wide string did not entirely fit in the array @var{wto}. In |
| 1569 | this case, only as much of the wide string as actually fits |
| 1570 | was stored. To get the whole transformed wide string, call |
| 1571 | @code{wcsxfrm} again with a bigger output array. |
| 1572 | |
| 1573 | The transformed wide string may be longer than the original |
| 1574 | wide string, and it may also be shorter. |
| 1575 | |
If @var{size} is zero, no wide characters are stored in @var{wto}. In this
case, @code{wcsxfrm} simply returns the length of the transformed wide
string in wide characters, not counting the terminating null wide
character. This is useful for determining what size the allocated array
should be (add one for the terminating null wide character, and remember
to multiply by @code{sizeof (wchar_t)}). It does not matter what
@var{wto} is if @var{size} is zero; @var{wto} may even be a null pointer.
| 1582 | @end deftypefun |
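A minimal sketch of the size-query idiom for @code{wcsxfrm}: the
returned length counts wide characters, so the allocation is scaled by
@code{sizeof (wchar_t)}. The helper @code{wxfrm_dup} is hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: return a newly allocated wide collation key
   for WS, or NULL if memory is exhausted.  wcsxfrm counts wide
   characters, so the byte count for malloc is scaled.  */
wchar_t *
wxfrm_dup (const wchar_t *ws)
{
  size_t len = wcsxfrm (NULL, ws, 0);
  wchar_t *key = malloc ((len + 1) * sizeof (wchar_t));
  if (key != NULL)
    wcsxfrm (key, ws, len + 1);
  return key;
}
```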
| 1583 | |
| 1584 | Here is an example of how you can use @code{strxfrm} when |
| 1585 | you plan to do many comparisons. It does the same thing as the previous |
| 1586 | example, but much faster, because it has to transform each string only |
| 1587 | once, no matter how many times it is compared with other strings. Even |
| 1588 | the time needed to allocate and free storage is much less than the time |
| 1589 | we save, when there are many strings. |
| 1590 | |
| 1591 | @smallexample |
| 1592 | struct sorter @{ char *input; char *transformed; @}; |
| 1593 | |
/* @r{This is the comparison function used with @code{qsort}}
   @r{to sort an array of @code{struct sorter}.} */

int
compare_elements (const void *v1, const void *v2)
@{
  const struct sorter *p1 = v1;
  const struct sorter *p2 = v2;

  return strcmp (p1->transformed, p2->transformed);
@}

/* @r{This is the entry point---the function to sort}
   @r{strings using the locale's collating sequence.} */

void
sort_strings_fast (char **array, int nstrings)
@{
  struct sorter temp_array[nstrings];
  int i;

  /* @r{Set up @code{temp_array}.  Each element contains}
     @r{one input string and its transformed string.} */
  for (i = 0; i < nstrings; i++)
    @{
      size_t length = strlen (array[i]) * 2;
      char *transformed;
      size_t transformed_length;

      temp_array[i].input = array[i];

      /* @r{First try a buffer perhaps big enough.} */
      transformed = (char *) xmalloc (length);

      /* @r{Transform @code{array[i]}.} */
      transformed_length = strxfrm (transformed, array[i], length);

      /* @r{If the buffer was not large enough, resize it}
         @r{and try again.} */
      if (transformed_length >= length)
        @{
          /* @r{Allocate the needed space. +1 for terminating}
             @r{@code{'\0'} byte.} */
          transformed = (char *) xrealloc (transformed,
                                           transformed_length + 1);

          /* @r{The return value is not interesting because we know}
             @r{how long the transformed string is.} */
          (void) strxfrm (transformed, array[i],
                          transformed_length + 1);
        @}

      temp_array[i].transformed = transformed;
    @}

  /* @r{Sort @code{temp_array} by comparing transformed strings.} */
  qsort (temp_array, nstrings,
         sizeof (struct sorter), compare_elements);

  /* @r{Put the elements back in the permanent array}
     @r{in their sorted order.} */
  for (i = 0; i < nstrings; i++)
    array[i] = temp_array[i].input;

  /* @r{Free the strings we allocated.} */
  for (i = 0; i < nstrings; i++)
    free (temp_array[i].transformed);
@}
| 1662 | @end smallexample |
| 1663 | |
| 1664 | The interesting part of this code for the wide character version would |
| 1665 | look like this: |
| 1666 | |
| 1667 | @smallexample |
void
sort_strings_fast (wchar_t **array, int nstrings)
@{
  @dots{}
      /* @r{Transform @code{array[i]}.} */
      transformed_length = wcsxfrm (transformed, array[i], length);

      /* @r{If the buffer was not large enough, resize it}
         @r{and try again.} */
      if (transformed_length >= length)
        @{
          /* @r{Allocate the needed space. +1 for terminating}
             @r{@code{L'\0'} wide character.} */
          transformed = (wchar_t *) xrealloc (transformed,
                                              (transformed_length + 1)
                                              * sizeof (wchar_t));

          /* @r{The return value is not interesting because we know}
             @r{how long the transformed string is.} */
          (void) wcsxfrm (transformed, array[i],
                          transformed_length + 1);
        @}
  @dots{}
| 1691 | @end smallexample |
| 1692 | |
| 1693 | @noindent |
Note the additional multiplication by @code{sizeof (wchar_t)} in the
@code{xrealloc} call.
| 1696 | |
| 1697 | @strong{Compatibility Note:} The string collation functions are a new |
| 1698 | feature of @w{ISO C90}. Older C dialects have no equivalent feature. |
| 1699 | The wide character versions were introduced in @w{Amendment 1} to @w{ISO |
| 1700 | C90}. |
| 1701 | |
| 1702 | @node Search Functions |
| 1703 | @section Search Functions |
| 1704 | |
| 1705 | This section describes library functions which perform various kinds |
| 1706 | of searching operations on strings and arrays. These functions are |
| 1707 | declared in the header file @file{string.h}. |
| 1708 | @pindex string.h |
| 1709 | @cindex search functions (for strings) |
| 1710 | @cindex string search functions |
| 1711 | |
| 1712 | @comment string.h |
| 1713 | @comment ISO |
| 1714 | @deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size}) |
| 1715 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1716 | This function finds the first occurrence of the byte @var{c} (converted |
| 1717 | to an @code{unsigned char}) in the initial @var{size} bytes of the |
| 1718 | object beginning at @var{block}. The return value is a pointer to the |
| 1719 | located byte, or a null pointer if no match was found. |
| 1720 | @end deftypefun |
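Because @code{memchr} is bounded by @var{size} rather than by a null
byte, it works on buffers that are not null-terminated or that contain
null bytes. A minimal sketch; the helper @code{first_line_length} is
hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the first line in the SIZE bytes
   at BUF, not counting the newline.  BUF need not be null-terminated
   and may contain null bytes, because memchr stops after SIZE.  */
size_t
first_line_length (const char *buf, size_t size)
{
  const char *nl = memchr (buf, '\n', size);
  return nl != NULL ? (size_t) (nl - buf) : size;
}
```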
| 1721 | |
| 1722 | @comment wchar.h |
| 1723 | @comment ISO |
| 1724 | @deftypefun {wchar_t *} wmemchr (const wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size}) |
| 1725 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1726 | This function finds the first occurrence of the wide character @var{wc} |
| 1727 | in the initial @var{size} wide characters of the object beginning at |
| 1728 | @var{block}. The return value is a pointer to the located wide |
| 1729 | character, or a null pointer if no match was found. |
| 1730 | @end deftypefun |
| 1731 | |
| 1732 | @comment string.h |
| 1733 | @comment GNU |
| 1734 | @deftypefun {void *} rawmemchr (const void *@var{block}, int @var{c}) |
| 1735 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1736 | Often the @code{memchr} function is used with the knowledge that the |
| 1737 | byte @var{c} is available in the memory block specified by the |
| 1738 | parameters. But this means that the @var{size} parameter is not really |
| 1739 | needed and that the tests performed with it at runtime (to check whether |
| 1740 | the end of the block is reached) are not needed. |
| 1741 | |
The @code{rawmemchr} function exists for just this situation, which is
surprisingly frequent. The interface is similar to @code{memchr} except
that the @var{size} parameter is missing. If the programmer wrongly
assumes that the byte @var{c} is present in the block, the function
looks beyond the end of the block pointed to by @var{block}, and the
result is unspecified. Otherwise the return value is a pointer to the
located byte.
| 1749 | |
| 1750 | This function is of special interest when looking for the end of a |
string. Since all strings are terminated by a null byte, a call like
| 1752 | |
| 1753 | @smallexample |
| 1754 | rawmemchr (str, '\0') |
| 1755 | @end smallexample |
| 1756 | |
| 1757 | @noindent |
| 1758 | will never go beyond the end of the string. |
| 1759 | |
| 1760 | This function is a GNU extension. |
| 1761 | @end deftypefun |
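A minimal sketch of the sentinel technique that makes @code{rawmemchr}
safe: the caller stores the byte being searched for just past the data,
so a match is always found. The helper @code{count_lines} is
hypothetical and assumes the buffer has room for one extra byte.

```c
#define _GNU_SOURCE   /* rawmemchr is a GNU extension */
#include <string.h>

/* Hypothetical helper: count the newline-terminated lines in the LEN
   bytes at BUF.  BUF must have room for one more byte: a '\n'
   sentinel stored at BUF[LEN] guarantees rawmemchr finds a match,
   so the loop needs no separate end-of-buffer test.  */
size_t
count_lines (char *buf, size_t len)
{
  size_t n = 0;
  char *end = buf + len;
  *end = '\n';                       /* sentinel */
  for (char *p = buf; (p = rawmemchr (p, '\n')) != end; p++)
    n++;
  return n;
}
```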
| 1762 | |
| 1763 | @comment string.h |
| 1764 | @comment GNU |
| 1765 | @deftypefun {void *} memrchr (const void *@var{block}, int @var{c}, size_t @var{size}) |
| 1766 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1767 | The function @code{memrchr} is like @code{memchr}, except that it searches |
| 1768 | backwards from the end of the block defined by @var{block} and @var{size} |
| 1769 | (instead of forwards from the front). |
| 1770 | |
| 1771 | This function is a GNU extension. |
| 1772 | @end deftypefun |
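A minimal sketch of @code{memrchr} on a path held in a buffer of known
length, which need not be null-terminated. The helper
@code{last_component_offset} is hypothetical.

```c
#define _GNU_SOURCE   /* memrchr is a GNU extension */
#include <string.h>

/* Hypothetical helper: offset just past the last '/' in the first
   LEN bytes of PATH, or 0 if there is none; the bytes from that
   offset on form the final path component.  */
size_t
last_component_offset (const char *path, size_t len)
{
  const char *slash = memrchr (path, '/', len);
  return slash != NULL ? (size_t) (slash - path) + 1 : 0;
}
```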
| 1773 | |
| 1774 | @comment string.h |
| 1775 | @comment ISO |
| 1776 | @deftypefun {char *} strchr (const char *@var{string}, int @var{c}) |
| 1777 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1778 | The @code{strchr} function finds the first occurrence of the byte |
| 1779 | @var{c} (converted to a @code{char}) in the string |
| 1780 | beginning at @var{string}. The return value is a pointer to the located |
| 1781 | byte, or a null pointer if no match was found. |
| 1782 | |
| 1783 | For example, |
| 1784 | @smallexample |
| 1785 | strchr ("hello, world", 'l') |
| 1786 | @result{} "llo, world" |
| 1787 | strchr ("hello, world", '?') |
| 1788 | @result{} NULL |
| 1789 | @end smallexample |
| 1790 | |
| 1791 | The terminating null byte is considered to be part of the string, |
so you can use this function to get a pointer to the end of a string by
| 1793 | specifying zero as the value of the @var{c} argument. |
| 1794 | |
| 1795 | When @code{strchr} returns a null pointer, it does not let you know |
| 1796 | the position of the terminating null byte it has found. If you |
| 1797 | need that information, it is better (but less portable) to use |
| 1798 | @code{strchrnul} than to search for it a second time. |
| 1799 | @end deftypefun |
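A minimal sketch of calling @code{strchr} repeatedly to count the
occurrences of a byte, restarting one byte past each match. The helper
@code{count_char} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many times byte C occurs in S.
   C must not be '\0': strchr treats the terminating null byte as
   part of the string, so the loop would step past the end.  */
size_t
count_char (const char *s, char c)
{
  size_t n = 0;
  for (const char *p = s; (p = strchr (p, c)) != NULL; p++)
    n++;
  return n;
}
```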
| 1800 | |
| 1801 | @comment wchar.h |
| 1802 | @comment ISO |
@deftypefun {wchar_t *} wcschr (const wchar_t *@var{wstring}, wchar_t @var{wc})
| 1804 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1805 | The @code{wcschr} function finds the first occurrence of the wide |
| 1806 | character @var{wc} in the wide string |
| 1807 | beginning at @var{wstring}. The return value is a pointer to the |
| 1808 | located wide character, or a null pointer if no match was found. |
| 1809 | |
| 1810 | The terminating null wide character is considered to be part of the wide |
string, so you can use this function to get a pointer to the end
| 1812 | of a wide string by specifying a null wide character as the |
| 1813 | value of the @var{wc} argument. It would be better (but less portable) |
| 1814 | to use @code{wcschrnul} in this case, though. |
| 1815 | @end deftypefun |
| 1816 | |
| 1817 | @comment string.h |
| 1818 | @comment GNU |
| 1819 | @deftypefun {char *} strchrnul (const char *@var{string}, int @var{c}) |
| 1820 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1821 | @code{strchrnul} is the same as @code{strchr} except that if it does |
| 1822 | not find the byte, it returns a pointer to string's terminating |
| 1823 | null byte rather than a null pointer. |
| 1824 | |
| 1825 | This function is a GNU extension. |
| 1826 | @end deftypefun |
| 1827 | |
| 1828 | @comment wchar.h |
| 1829 | @comment GNU |
| 1830 | @deftypefun {wchar_t *} wcschrnul (const wchar_t *@var{wstring}, wchar_t @var{wc}) |
| 1831 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1832 | @code{wcschrnul} is the same as @code{wcschr} except that if it does not |
| 1833 | find the wide character, it returns a pointer to the wide string's |
| 1834 | terminating null wide character rather than a null pointer. |
| 1835 | |
| 1836 | This function is a GNU extension. |
| 1837 | @end deftypefun |
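A minimal sketch showing why @code{strchrnul} is convenient when a
separator may be absent: its result can be used directly in pointer
arithmetic, without a null-pointer check. The helper @code{key_length}
is hypothetical.

```c
#define _GNU_SOURCE   /* strchrnul is a GNU extension */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the key in a "key=value" string.
   If there is no '=', strchrnul returns a pointer to the terminating
   null byte, so the whole string is taken as the key.  */
size_t
key_length (const char *s)
{
  return (size_t) (strchrnul (s, '=') - s);
}
```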
| 1838 | |
| 1839 | One useful, but unusual, use of the @code{strchr} |
| 1840 | function is when one wants to have a pointer pointing to the null byte |
| 1841 | terminating a string. This is often written in this way: |
| 1842 | |
| 1843 | @smallexample |
| 1844 | s += strlen (s); |
| 1845 | @end smallexample |
| 1846 | |
| 1847 | @noindent |
This is almost optimal, but the addition operation duplicates a bit of
the work already done in the @code{strlen} function. A better solution
is this:
| 1851 | |
| 1852 | @smallexample |
| 1853 | s = strchr (s, '\0'); |
| 1854 | @end smallexample |
| 1855 | |
There is no restriction on the second parameter of @code{strchr}, so it
can just as well be zero. Readers thinking very hard about this might
now point out that the @code{strchr} function is more expensive than the
@code{strlen} function, since it has two termination criteria. This is
true. However, in @theglibc{} the implementation of @code{strchr} is
specially optimized, so that @code{strchr} can actually be faster.
| 1863 | |
| 1864 | @comment string.h |
| 1865 | @comment ISO |
| 1866 | @deftypefun {char *} strrchr (const char *@var{string}, int @var{c}) |
| 1867 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1868 | The function @code{strrchr} is like @code{strchr}, except that it searches |
| 1869 | backwards from the end of the string @var{string} (instead of forwards |
| 1870 | from the front). |
| 1871 | |
| 1872 | For example, |
| 1873 | @smallexample |
| 1874 | strrchr ("hello, world", 'l') |
| 1875 | @result{} "ld" |
| 1876 | @end smallexample |
| 1877 | @end deftypefun |
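A minimal sketch combining two @code{strrchr} calls to find a file name
extension. The helper @code{file_extension} is hypothetical, and the
sketch assumes @samp{/} is the only directory separator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the extension of file name NAME (the
   text after its last '.'), or NULL if it has none.  Only the final
   path component is examined, so "dir.d/file" has no extension.  */
const char *
file_extension (const char *name)
{
  const char *base = strrchr (name, '/');
  base = base != NULL ? base + 1 : name;
  const char *dot = strrchr (base, '.');
  return dot != NULL ? dot + 1 : NULL;
}
```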
| 1878 | |
| 1879 | @comment wchar.h |
| 1880 | @comment ISO |
@deftypefun {wchar_t *} wcsrchr (const wchar_t *@var{wstring}, wchar_t @var{wc})
| 1882 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1883 | The function @code{wcsrchr} is like @code{wcschr}, except that it searches |
| 1884 | backwards from the end of the string @var{wstring} (instead of forwards |
| 1885 | from the front). |
| 1886 | @end deftypefun |
| 1887 | |
| 1888 | @comment string.h |
| 1889 | @comment ISO |
| 1890 | @deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle}) |
| 1891 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1892 | This is like @code{strchr}, except that it searches @var{haystack} for a |
| 1893 | substring @var{needle} rather than just a single byte. It |
| 1894 | returns a pointer into the string @var{haystack} that is the first |
| 1895 | byte of the substring, or a null pointer if no match was found. If |
| 1896 | @var{needle} is an empty string, the function returns @var{haystack}. |
| 1897 | |
| 1898 | For example, |
| 1899 | @smallexample |
| 1900 | strstr ("hello, world", "l") |
| 1901 | @result{} "llo, world" |
| 1902 | strstr ("hello, world", "wo") |
| 1903 | @result{} "world" |
| 1904 | @end smallexample |
| 1905 | @end deftypefun |
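A minimal sketch that counts non-overlapping occurrences of a substring
by restarting @code{strstr} just past each match. The helper
@code{count_substrings} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of NEEDLE
   in HAYSTACK.  NEEDLE must be non-empty: strstr returns HAYSTACK
   itself for an empty needle, so the loop would never advance.  */
size_t
count_substrings (const char *haystack, const char *needle)
{
  size_t n = 0, len = strlen (needle);
  for (const char *p = haystack; (p = strstr (p, needle)) != NULL; p += len)
    n++;
  return n;
}
```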
| 1906 | |
| 1907 | @comment wchar.h |
| 1908 | @comment ISO |
| 1909 | @deftypefun {wchar_t *} wcsstr (const wchar_t *@var{haystack}, const wchar_t *@var{needle}) |
| 1910 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1911 | This is like @code{wcschr}, except that it searches @var{haystack} for a |
| 1912 | substring @var{needle} rather than just a single wide character. It |
| 1913 | returns a pointer into the string @var{haystack} that is the first wide |
| 1914 | character of the substring, or a null pointer if no match was found. If |
| 1915 | @var{needle} is an empty string, the function returns @var{haystack}. |
| 1916 | @end deftypefun |
| 1917 | |
| 1918 | @comment wchar.h |
| 1919 | @comment XPG |
| 1920 | @deftypefun {wchar_t *} wcswcs (const wchar_t *@var{haystack}, const wchar_t *@var{needle}) |
| 1921 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| 1922 | @code{wcswcs} is a deprecated alias for @code{wcsstr}. This is the |
| 1923 | name originally used in the X/Open Portability Guide before the |
| 1924 | @w{Amendment 1} to @w{ISO C90} was published. |
| 1925 | @end deftypefun |

@comment string.h
@comment GNU
@deftypefun {char *} strcasestr (const char *@var{haystack}, const char *@var{needle})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c There may be multiple calls of strncasecmp, each accessing the locale
@c object independently.
This is like @code{strstr}, except that it ignores case in searching for
the substring. As with @code{strcasecmp}, the way uppercase and lowercase
characters correspond depends on the current locale, and the arguments
are multibyte strings.

For example,
@smallexample
strcasestr ("hello, world", "L")
@result{} "llo, world"
strcasestr ("hello, World", "wo")
@result{} "World"
@end smallexample
@end deftypefun

@comment string.h
@comment GNU
@deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is like @code{strstr}, but @var{needle} and @var{haystack} are byte
arrays rather than strings. @var{needle-len} is the
length of @var{needle} and @var{haystack-len} is the length of
@var{haystack}.@refill

This function is a GNU extension.
@end deftypefun
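Because the lengths are passed explicitly, @code{memmem} can search data that contains null bytes, where @code{strstr} would stop at the first null byte. A minimal sketch; the helper @code{find_bytes} is hypothetical, and @code{_GNU_SOURCE} is defined before any header because @code{memmem} is a GNU extension.

```c
#define _GNU_SOURCE   /* memmem is a GNU extension.  */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the byte sequence NEEDLE
   (NEEDLE_LEN bytes) inside the buffer BUF (LEN bytes), or -1 if it
   does not occur.  Embedded null bytes are compared like any other.  */
long
find_bytes (const unsigned char *buf, size_t len,
            const unsigned char *needle, size_t needle_len)
{
  const unsigned char *hit = memmem (buf, len, needle, needle_len);
  return hit != NULL ? (long) (hit - buf) : -1;
}
```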

@comment string.h
@comment ISO
@deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strspn} (``string span'') function returns the length of the
initial substring of @var{string} that consists entirely of bytes that
are members of the set specified by the string @var{skipset}. The order
of the bytes in @var{skipset} is not important.

For example,
@smallexample
strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
@result{} 5
@end smallexample

In a multibyte string, characters consisting of
more than one byte are not treated as single entities. Each byte is treated
separately. The function is not locale-dependent.
@end deftypefun
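A common use of @code{strspn} is validation: if the initial span of permitted bytes reaches the terminating null byte, every byte of the string is permitted. A minimal sketch; the helper @code{all_digits} is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if S is non-empty and consists only of
   ASCII decimal digits.  strspn returns the length of the leading run
   of digits; if the byte just after that run is the terminating null
   byte, every byte was a digit.  */
bool
all_digits (const char *s)
{
  size_t n = strspn (s, "0123456789");
  return n > 0 && s[n] == '\0';
}
```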

@comment wchar.h
@comment ISO
@deftypefun size_t wcsspn (const wchar_t *@var{wstring}, const wchar_t *@var{skipset})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcsspn} (``wide character string span'') function returns the
length of the initial substring of @var{wstring} that consists entirely
of wide characters that are members of the set specified by the string
@var{skipset}. The order of the wide characters in @var{skipset} is not
important.
@end deftypefun

@comment string.h
@comment ISO
@deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strcspn} (``string complement span'') function returns the length
of the initial substring of @var{string} that consists entirely of bytes
that are @emph{not} members of the set specified by the string @var{stopset}.
(In other words, it returns the offset of the first byte in @var{string}
that is a member of the set @var{stopset}, or the length of @var{string}
if there is no such byte.)

For example,
@smallexample
strcspn ("hello, world", " \t\n,.;!?")
@result{} 5
@end smallexample

In a multibyte string, characters consisting of
more than one byte are not treated as single entities. Each byte is treated
separately. The function is not locale-dependent.
@end deftypefun
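Because @code{strcspn} returns the offset of the terminating null byte when no byte of @var{stopset} occurs, its result is always a valid index into the string. That makes it a compact way to drop the newline that @code{fgets} keeps. A minimal sketch; the helper name @code{chomp} is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: truncate LINE at its first newline, if any.
   strcspn returns the offset of the first '\n', or of the terminating
   null byte when there is none, so the store below is always in
   bounds and leaves LINE unchanged if it has no newline.  */
void
chomp (char *line)
{
  line[strcspn (line, "\n")] = '\0';
}
```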

@comment wchar.h
@comment ISO
@deftypefun size_t wcscspn (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcscspn} (``wide character string complement span'') function
returns the length of the initial substring of @var{wstring} that
consists entirely of wide characters that are @emph{not} members of the
set specified by the string @var{stopset}. (In other words, it returns
the offset of the first wide character in @var{wstring} that is a member of
the set @var{stopset}.)
@end deftypefun
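@code{wcsspn} and @code{wcscspn} complement each other: the first skips a run of separators, and the second measures the run of non-separators that follows. A minimal sketch that locates the first word of a wide string; the helper @code{first_word} and its out-parameter convention are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: find the first word in WS, where words are
   separated by wide characters from SEPS.  wcsspn skips the leading
   separators; wcscspn then measures the word that follows.  Stores
   the word's offset in *START and returns its length (0 if WS holds
   no word).  */
size_t
first_word (const wchar_t *ws, const wchar_t *seps, size_t *start)
{
  size_t skip = wcsspn (ws, seps);
  *start = skip;
  return wcscspn (ws + skip, seps);
}
```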

@comment string.h
@comment ISO
@deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strpbrk} (``string pointer break'') function is related to
@code{strcspn}, except that it returns a pointer to the first byte
in @var{string} that is a member of the set @var{stopset} instead of the
length of the initial substring. It returns a null pointer if no such
byte from @var{stopset} is found.

@c @group Invalid outside the example.
For example,

@smallexample
strpbrk ("hello, world", " \t\n,.;!?")
@result{} ", world"
@end smallexample
@c @end group

In a multibyte string, characters consisting of
more than one byte are not treated as single entities. Each byte is treated
separately. The function is not locale-dependent.
@end deftypefun
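Since @code{strpbrk} returns a pointer rather than a length, a loop can restart the search one byte past each hit to visit every byte of @var{string} that belongs to @var{stopset}. A minimal sketch; the helper @code{count_in_set} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the bytes of S that are members of SET.
   Each strpbrk call finds the next such byte; the search then resumes
   one byte past it.  */
size_t
count_in_set (const char *s, const char *set)
{
  size_t count = 0;

  while ((s = strpbrk (s, set)) != NULL)
    {
      count++;
      s++;   /* Step past the byte just found.  */
    }
  return count;
}
```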

@comment wchar.h
@comment ISO
@deftypefun {wchar_t *} wcspbrk (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcspbrk} (``wide character string pointer break'') function is
related to @code{wcscspn}, except that it returns a pointer to the first
wide character in @var{wstring} that is a member of the set
@var{stopset} instead of the length of the initial substring. It
returns a null pointer if no such wide character from @var{stopset} is found.
@end deftypefun

@subsection Compatibility String Search Functions

@comment string.h
@comment BSD
@deftypefun {char *} index (const char *@var{string}, int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{index} is another name for @code{strchr}; they are exactly the same.
New code should always use @code{strchr}, since that name is defined in
@w{ISO C}, while @code{index} is a BSD invention that was never available
on @w{System V} derived systems.
@end deftypefun

@comment string.h
@comment BSD
@deftypefun {char *} rindex (const char *@var{string}, int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{rindex} is another name for @code{strrchr}; they are exactly the same.
New code should always use @code{strrchr}, since that name is defined in
@w{ISO C}, while @code{rindex} is a BSD invention that was never available
on @w{System V} derived systems.
@end deftypefun

@node Finding Tokens in a String
@section Finding Tokens in a String

@cindex tokenizing strings
@cindex breaking a string into tokens
@cindex parsing tokens from a string
Programs often need to do simple lexical analysis and parsing, such as
splitting a command string into tokens. You can do this with the
@code{strtok} function, declared in the header file @file{string.h}.
@pindex string.h

@comment string.h
@comment ISO
@deftypefun {char *} strtok (char *restrict @var{newstring}, const char *restrict @var{delimiters})
@safety{@prelim{}@mtunsafe{@mtasurace{:strtok}}@asunsafe{}@acsafe{}}
A string can be split into tokens by making a series of calls to the
function @code{strtok}.

The string to be split up is passed as the @var{newstring} argument on
the first call only. The @code{strtok} function uses this to set up
some internal state information. Subsequent calls to get additional
tokens from the same string are indicated by passing a null pointer as
the @var{newstring} argument. Calling @code{strtok} with another
non-null @var{newstring} argument reinitializes the state information.
It is guaranteed that no other library function ever calls @code{strtok}
behind your back (which would mess up this internal state information).

The @var{delimiters} argument is a string that specifies a set of delimiters
that may surround the token being extracted. All the initial bytes
that are members of this set are discarded. The first byte that is
@emph{not} a member of this set of delimiters marks the beginning of the
next token. The end of the token is found by looking for the next
byte that is a member of the delimiter set. This byte in the
original string @var{newstring} is overwritten by a null byte, and the
pointer to the beginning of the token in @var{newstring} is returned.

On the next call to @code{strtok}, the searching begins at the next
byte beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{strtok}.

If the end of the string @var{newstring} is reached, or if the remainder of
the string consists only of delimiter bytes, @code{strtok} returns
a null pointer.

In a multibyte string, characters consisting of
more than one byte are not treated as single entities. Each byte is treated
separately. The function is not locale-dependent.
@end deftypefun

@comment wchar.h
@comment ISO
@deftypefun {wchar_t *} wcstok (wchar_t *@var{newstring}, const wchar_t *@var{delimiters}, wchar_t **@var{save_ptr})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
A string can be split into tokens by making a series of calls to the
function @code{wcstok}.

The string to be split up is passed as the @var{newstring} argument on
the first call only. Unlike @code{strtok}, @code{wcstok} keeps no hidden
internal state: its position is stored in the @code{wchar_t *} object
that @var{save_ptr} points to. Subsequent calls to get additional
tokens from the same wide string are indicated by passing a
null pointer as the @var{newstring} argument, which causes the pointer
previously stored in @code{*@var{save_ptr}} to be used instead.

The @var{delimiters} argument is a wide string that specifies
a set of delimiters that may surround the token being extracted. All
the initial wide characters that are members of this set are discarded.
The first wide character that is @emph{not} a member of this set of
delimiters marks the beginning of the next token. The end of the token
is found by looking for the next wide character that is a member of the
delimiter set. This wide character in the original wide
string @var{newstring} is overwritten by a null wide character, the
pointer past the overwritten wide character is saved in
@code{*@var{save_ptr}}, and the pointer to the beginning of the token in
@var{newstring} is returned.

On the next call to @code{wcstok}, the searching begins at the next
wide character beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{wcstok}.

If the end of the wide string @var{newstring} is reached, or
if the remainder of the string consists only of delimiter wide characters,
@code{wcstok} returns a null pointer.
@end deftypefun
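The usual way to drive @code{wcstok} is a loop that passes the string on the first call and a null pointer afterwards, with the same @var{save_ptr} each time. A minimal sketch; the helper @code{count_wide_tokens} is hypothetical, and it modifies its argument, so callers must pass a writable array.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: count the tokens in the writable wide string
   WS.  SAVE carries wcstok's position from one call to the next, so
   no hidden static state is involved.  WS is modified in place: each
   delimiter that ends a token is overwritten with L'\0'.  */
size_t
count_wide_tokens (wchar_t *ws, const wchar_t *delims)
{
  wchar_t *save;
  size_t count = 0;

  for (wchar_t *tok = wcstok (ws, delims, &save);
       tok != NULL;
       tok = wcstok (NULL, delims, &save))
    count++;
  return count;
}
```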

@strong{Warning:} Since @code{strtok} and @code{wcstok} alter the string
they are parsing, you should always copy the string to a temporary buffer
before parsing it with @code{strtok}/@code{wcstok} (@pxref{Copying Strings
and Arrays}). If you allow @code{strtok} or @code{wcstok} to modify
a string that came from another part of your program, you are asking for
trouble; that string might be used for other purposes after
@code{strtok} or @code{wcstok} has modified it, and it would not have
the expected value.

The string that you are operating on might even be a constant. Then
when @code{strtok} or @code{wcstok} tries to modify it, your program
will get a fatal signal for writing in read-only memory. @xref{Program
Error Signals}. Even if the operation of @code{strtok} or @code{wcstok}
would not require a modification of the string (e.g., if there is
exactly one token), the string can (and in the @glibcadj{} case will) be
modified.

This is a special case of a general principle: if a part of a program
does not have as its purpose the modification of a certain data
structure, then it is error-prone to modify the data structure
temporarily.

The function @code{strtok} is not reentrant, whereas @code{wcstok} is.
@xref{Nonreentrancy}, for a discussion of where and why reentrancy is
important.

Here is a simple example showing the use of @code{strtok}.

@comment Yes, this example has been tested.
@smallexample
#define _GNU_SOURCE   /* For strdupa.  */
#include <string.h>
#include <stddef.h>

@dots{}

const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *token, *cp;

@dots{}

cp = strdupa (string);                /* Make writable copy.  */
token = strtok (cp, delimiters);      /* token => "words" */
token = strtok (NULL, delimiters);    /* token => "separated" */
token = strtok (NULL, delimiters);    /* token => "by" */
token = strtok (NULL, delimiters);    /* token => "spaces" */
token = strtok (NULL, delimiters);    /* token => "and" */
token = strtok (NULL, delimiters);    /* token => "punctuation" */
token = strtok (NULL, delimiters);    /* token => NULL */
@end smallexample

@Theglibc{} contains two more functions for tokenizing a string
which overcome the limitation of non-reentrancy. They are not
available for wide strings.

@comment string.h
@comment POSIX
@deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Just like @code{strtok}, this function splits the string into several
tokens which can be accessed by successive calls to @code{strtok_r}.
The difference is that, as in @code{wcstok}, the information about the
next token is stored in the space pointed to by the third argument,
@var{save_ptr}, which is a pointer to a string pointer. Calling
@code{strtok_r} with a null pointer for @var{newstring} and leaving
@var{save_ptr} between the calls unchanged does the job without
hindering reentrancy.

This function is defined in POSIX.1 and can be found on many systems
which support multi-threading.
@end deftypefun
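Because each series of @code{strtok_r} calls keeps its position in its own @var{save_ptr}, two tokenizing loops can be nested, which is impossible with @code{strtok}'s single hidden state. A minimal sketch that parses @samp{key=value} pairs separated by @samp{;}; the helper @code{sum_values} and the input format are hypothetical, and values are parsed with @code{strtol} without error checking.

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r is a POSIX function.  */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: S holds "key=value" pairs separated by ';'.
   Return the sum of all values.  The outer loop splits S into pairs
   and the inner calls split each pair at '='; each series of calls
   uses its own save pointer, so they do not interfere.  S is
   modified in place.  */
long
sum_values (char *s)
{
  char *outer_save, *inner_save;
  long total = 0;

  for (char *pair = strtok_r (s, ";", &outer_save);
       pair != NULL;
       pair = strtok_r (NULL, ";", &outer_save))
    {
      char *key = strtok_r (pair, "=", &inner_save);
      char *value = strtok_r (NULL, "=", &inner_save);

      if (key == NULL || value == NULL)
        continue;   /* Skip pairs without a value.  */
      total += strtol (value, NULL, 10);
    }
  return total;
}
```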

@comment string.h
@comment BSD
@deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{strtok_r}, with the
@var{newstring} argument replaced by the @var{save_ptr} argument. The
initialization of the moving pointer has to be done by the user.
Successive calls to @code{strsep} move the pointer along the tokens
separated by @var{delimiter}, returning the address of the next token
and updating @var{string_ptr} to point to the beginning of the next
token. After the last token has been returned, @code{*@var{string_ptr}}
is a null pointer, and a further call returns a null pointer.

One difference between @code{strsep} and @code{strtok_r} is that if the
input string contains more than one byte from @var{delimiter} in a
row, @code{strsep} returns an empty string for each pair of bytes
from @var{delimiter}. This means that a program normally should test
for @code{strsep} returning an empty string before processing it.

This function was introduced in 4.3BSD and therefore is widely available.
@end deftypefun

Here is how the above example looks when @code{strsep} is used.

@comment Yes, this example has been tested.
@smallexample
#define _GNU_SOURCE   /* For strdupa.  */
#include <string.h>
#include <stddef.h>

@dots{}

const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *running;
char *token;

@dots{}

running = strdupa (string);
token = strsep (&running, delimiters);    /* token => "words" */
token = strsep (&running, delimiters);    /* token => "separated" */
token = strsep (&running, delimiters);    /* token => "by" */
token = strsep (&running, delimiters);    /* token => "spaces" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "and" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "punctuation" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => NULL */
@end smallexample
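In practice, @code{strsep} is usually driven by a loop that runs until it returns a null pointer and skips the empty strings produced by adjacent delimiters, as recommended above. A minimal sketch; the helper @code{count_nonempty_fields} is hypothetical.

```c
#define _DEFAULT_SOURCE   /* strsep is a BSD function.  */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-empty fields that strsep
   extracts from the writable string S.  Adjacent delimiters make
   strsep return empty strings, which are skipped.  The loop ends when
   strsep returns a null pointer, after the last field has been
   returned.  S is modified in place.  */
size_t
count_nonempty_fields (char *s, const char *delims)
{
  char *running = s;
  char *token;
  size_t count = 0;

  while ((token = strsep (&running, delims)) != NULL)
    if (*token != '\0')
      count++;
  return count;
}
```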

@comment string.h
@comment GNU
@deftypefun {char *} basename (const char *@var{filename})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The GNU version of the @code{basename} function returns the last
component of the path in @var{filename}. This function is the preferred
usage, since it does not modify the argument, @var{filename}, and
respects trailing slashes: if @var{filename} ends in a slash, the result
is an empty string. The prototype for @code{basename} can be
found in @file{string.h}. Note that this function is overridden by the XPG
version if @file{libgen.h} is included.

Example of using GNU @code{basename}:

@smallexample
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog = basename (argv[0]);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}
@}
@end smallexample

@strong{Portability Note:} This function may produce different results
on different systems.

@end deftypefun

@comment libgen.h
@comment XPG
@deftypefun {char *} basename (char *@var{path})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is the standard XPG-defined @code{basename}. It is similar in
spirit to the GNU version, but may modify the @var{path} by removing
trailing '/' bytes. If the @var{path} is made up entirely of '/'
bytes, then "/" will be returned. Also, if @var{path} is
@code{NULL} or an empty string, then "." is returned. The prototype for
the XPG version can be found in @file{libgen.h}.

Example of using XPG @code{basename}:

@smallexample
#define _GNU_SOURCE   /* For strdupa.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libgen.h>

int
main (int argc, char *argv[])
@{
  char *prog;
  char *path = strdupa (argv[0]);

  prog = basename (path);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}

@}
@end smallexample
@end deftypefun

@comment libgen.h
@comment XPG
@deftypefun {char *} dirname (char *@var{path})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{dirname} function is the complement to the XPG version of
@code{basename}. It returns the parent directory of the file specified
by @var{path}. If @var{path} is @code{NULL}, an empty string, or
contains no '/' bytes, then "." is returned. Like the XPG
@code{basename}, it may modify the string that @var{path} points to.
The prototype for this function can be found in @file{libgen.h}.
@end deftypefun
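Because both XPG functions may modify their argument and may return a pointer into it, a program that needs both parts of a path should give each call its own copy and copy the result out. A minimal sketch; the helper @code{split_path}, its fixed 4096-byte scratch buffer, and its -1 error convention are hypothetical choices for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <libgen.h>   /* XPG dirname and basename.  */

/* Hypothetical helper: store the results of dirname and basename for
   PATH in DIR and BASE, each SIZE bytes long.  Both functions may
   modify their argument and return a pointer into it, so each one
   works on a fresh copy and its result is copied out immediately.
   Returns 0, or -1 if PATH does not fit in the scratch buffer.  */
int
split_path (const char *path, char *dir, char *base, size_t size)
{
  char scratch[4096];

  if (strlen (path) >= sizeof scratch)
    return -1;

  strcpy (scratch, path);
  snprintf (dir, size, "%s", dirname (scratch));

  strcpy (scratch, path);   /* dirname may have modified the copy.  */
  snprintf (base, size, "%s", basename (scratch));
  return 0;
}
```

For example, @samp{/usr/lib} splits into @samp{/usr} and @samp{lib}, while @samp{usr} splits into @samp{.} and @samp{usr}.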

@node strfry
@section strfry

The function below addresses the perennial programming quandary: ``How do
I take good data in string form and painlessly turn it into garbage?''
This is actually a fairly simple task for C programmers who do not use
@theglibc{} string functions, but for programs based on @theglibc{},
the @code{strfry} function is the preferred method for
destroying string data.

The prototype for this function is in @file{string.h}.

@comment string.h
@comment GNU
@deftypefun {char *} strfry (char *@var{string})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c Calls initstate_r, time, getpid, strlen, and random_r.

@code{strfry} creates a pseudorandom anagram of a string, replacing the
input with the anagram in place. For each position in the string,
@code{strfry} swaps it with a position in the string selected at random
(from a uniform distribution). The two positions may be the same.

The return value of @code{strfry} is always @var{string}.

@strong{Portability Note:} This function is unique to @theglibc{}.

@end deftypefun

@node Trivial Encryption
@section Trivial Encryption
@cindex encryption

The @code{memfrob} function converts an array of data to something
unrecognizable and back again. It is not encryption in its usual sense
since it is easy for someone to convert the encrypted data back to clear
text. The transformation is analogous to Usenet's ``Rot13'' encryption
method for obscuring offensive jokes from sensitive eyes and such.
Unlike Rot13, @code{memfrob} works on arbitrary binary data, not just
text.
@cindex Rot13

For true encryption, see @ref{Cryptographic Functions}.

This function is declared in @file{string.h}.
@pindex string.h

@comment string.h
@comment GNU
@deftypefun {void *} memfrob (void *@var{mem}, size_t @var{length})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}

@code{memfrob} transforms (frobnicates) each byte of the data structure
at @var{mem}, which is @var{length} bytes long, by bitwise exclusive
oring it with binary 00101010 (decimal 42). It does the transformation
in place and its return value is always @var{mem}.

Calling @code{memfrob} a second time on the same data structure
returns it to its original state.

This is a good function for hiding information from someone who doesn't
want to see it or doesn't want to see it very much. To really prevent
people from retrieving the information, use stronger encryption such as
that described in @ref{Cryptographic Functions}.

@strong{Portability Note:} This function is unique to @theglibc{}.

@end deftypefun
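Since @code{memfrob} is unique to @theglibc{}, a program that must also run elsewhere can apply the same transformation directly. A minimal sketch; @code{frob_bytes} is a hypothetical name, not a library function, and it performs the exclusive or with 42 described above.

```c
#include <stddef.h>

/* Hypothetical portable equivalent of memfrob: XOR each of the LENGTH
   bytes at MEM with 42 (binary 00101010), in place, and return MEM.
   Because x ^ 42 ^ 42 == x, applying it twice restores the data.  */
void *
frob_bytes (void *mem, size_t length)
{
  unsigned char *p = mem;

  for (size_t i = 0; i < length; i++)
    p[i] ^= 42;
  return mem;
}
```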

@node Encode Binary Data
@section Encode Binary Data

To store or transfer binary data in environments that support only text,
one has to encode the binary data by mapping the input bytes to
bytes in the range allowed for storing or transferring. SVID
systems (and nowadays XPG compliant systems) provide minimal support for
this task.

@comment stdlib.h
@comment XPG
@deftypefun {char *} l64a (long int @var{n})
@safety{@prelim{}@mtunsafe{@mtasurace{:l64a}}@asunsafe{}@acsafe{}}
This function encodes a 32-bit input value using bytes from the
basic character set. It returns a pointer to a 7-byte buffer which
contains an encoded version of @var{n}. To encode a series of bytes, the
user must copy the returned string to a destination buffer. It returns
the empty string if @var{n} is zero, which is somewhat bizarre but
mandated by the standard.@*
@strong{Warning:} Since a static buffer is used, this function should not
be used in multi-threaded programs. There is no thread-safe alternative
to this function in the C library.@*
@strong{Compatibility Note:} The XPG standard states that the return
value of @code{l64a} is undefined if @var{n} is negative. In the GNU
implementation, @code{l64a} treats its argument as unsigned, so it will
return a sensible encoding for any nonzero @var{n}; however, portable
programs should not rely on this.

To encode a large buffer, @code{l64a} must be called in a loop, once for
each 32-bit word of the buffer. For example, one could do something
like this:

@smallexample
#define _GNU_SOURCE   /* @r{For @code{mempcpy}.} */
#include <stdlib.h>
#include <string.h>

char *
encode (const void *buf, size_t len)
@{
  /* @r{We know in advance how long the buffer has to be.} */
  const unsigned char *in = buf;
  char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
  char *cp = out, *p;

  if (out == NULL)
    return NULL;

  /* @r{Encode the length.} */
  /* @r{`l64a' encodes the numeric value itself, least significant}
     @r{digit first, so the result does not depend on the byte order}
     @r{of the machine and no conversion such as `htonl' is needed.}
     @r{`l64a' can return a string shorter than 6 bytes, so}
     @r{we pad it with encoding of 0 (}'.'@r{) at the end by}
     @r{hand.} */

  p = stpcpy (cp, l64a (len));
  cp = mempcpy (p, "......", 6 - (p - cp));

  while (len > 3)
    @{
      unsigned long int n = *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      len -= 4;
      p = stpcpy (cp, l64a (n));
      cp = mempcpy (p, "......", 6 - (p - cp));
    @}
  if (len > 0)
    @{
      unsigned long int n = *in++;
      if (--len > 0)
        @{
          n = (n << 8) | *in++;
          if (--len > 0)
            n = (n << 8) | *in;
        @}
      cp = stpcpy (cp, l64a (n));
    @}
  *cp = '\0';
  return out;
@}
@end smallexample

It is strange that the library does not provide the complete
functionality needed, but so be it.

@end deftypefun

To decode data produced with @code{l64a}, the following function should be
used.

@comment stdlib.h
@comment XPG
@deftypefun {long int} a64l (const char *@var{string})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The parameter @var{string} should contain a string which was produced by
a call to @code{l64a}. The function processes at most 6 bytes of
this string, and decodes the bytes it finds according to the table
below. It stops decoding when it finds a byte not in the table,
rather like @code{atoi}; if you have a buffer which has been broken into
lines, you must be careful to skip over the end-of-line bytes.

The decoded number is returned as a @code{long int} value.
@end deftypefun
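Passing the result of @code{l64a} straight to @code{a64l} should return the original value for any nonnegative input that fits in 32 bits. Because @code{l64a} returns a static buffer, the string must be consumed before the next @code{l64a} call. A minimal sketch; the helper @code{round_trip} is hypothetical.

```c
#define _DEFAULT_SOURCE   /* l64a and a64l are XPG functions.  */
#include <stdlib.h>

/* Hypothetical check: encode N with l64a and decode it again with
   a64l.  a64l reads l64a's static buffer immediately, before any
   other l64a call could overwrite it (which also means this is not
   safe to call from several threads at once).  */
long
round_trip (long n)
{
  return a64l (l64a (n));
}
```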

The @code{l64a} and @code{a64l} functions use a base 64 encoding, in
which each byte of an encoded string represents six bits of an
input word. These symbols are used for the base 64 digits:

@multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx}
@item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7
@item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1}
@tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5}
@item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9}
@tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D}
@item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H}
@tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L}
@item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P}
@tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T}
@item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X}
@tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b}
@item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f}
@tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j}
@item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n}
@tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r}
@item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v}
@tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z}
@end multitable
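The table follows a simple pattern: @samp{.} and @samp{/} for 0 and 1, then the ten decimal digits for 2 to 11, the uppercase letters for 12 to 37, and the lowercase letters for 38 to 63. A minimal sketch of that mapping; the helper @code{l64a_digit} is hypothetical, and it assumes an execution character set such as ASCII in which the letters are contiguous.

```c
/* Hypothetical helper: map a 6-bit value (0 to 63) to the base 64
   digit from the table above.  Returns '\0' for values outside that
   range.  Assumes 'A'..'Z' and 'a'..'z' are contiguous, as in ASCII;
   '0'..'9' are contiguous in every C character set.  */
char
l64a_digit (int v)
{
  if (v < 0 || v > 63)
    return '\0';
  if (v < 2)
    return v == 0 ? '.' : '/';
  if (v < 12)
    return '0' + (v - 2);
  if (v < 38)
    return 'A' + (v - 12);
  return 'a' + (v - 38);
}
```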

This encoding scheme is not standard. There are some other encoding
methods which are much more widely used (UU encoding, MIME encoding).
Generally, it is better to use one of these encodings.

@node Argz and Envz Vectors
@section Argz and Envz Vectors

@cindex argz vectors (string vectors)
@cindex string vectors, null-byte separated
@cindex argument vectors, null-byte separated
@dfn{argz vectors} are vectors of strings in a contiguous block of
memory, each element separated from its neighbors by null bytes
(@code{'\0'}).

@cindex envz vectors (environment vectors)
@cindex environment vectors, null-byte separated
@dfn{Envz vectors} are an extension of argz vectors where each element is a
name-value pair, separated by a @code{'='} byte (as in a Unix
environment).
| 2591 | |
| 2592 | @menu |
| 2593 | * Argz Functions:: Operations on argz vectors. |
| 2594 | * Envz Functions:: Additional operations on environment vectors. |
| 2595 | @end menu |
| 2596 | |
| 2597 | @node Argz Functions, Envz Functions, , Argz and Envz Vectors |
| 2598 | @subsection Argz Functions |
| 2599 | |
| 2600 | Each argz vector is represented by a pointer to the first element, of |
| 2601 | type @code{char *}, and a size, of type @code{size_t}, both of which can |
| 2602 | be initialized to @code{0} to represent an empty argz vector. All argz |
| 2603 | functions accept either a pointer and a size argument, or pointers to |
| 2604 | them, if they will be modified. |

The argz functions use @code{malloc}/@code{realloc} to allocate/grow
argz vectors, and so any argz vector created using these functions may
be freed by using @code{free}; conversely, any argz function that may
grow a string expects that string to have been allocated using
@code{malloc} (those argz functions that only examine their arguments or
modify them in place will work on any sort of memory).
@xref{Unconstrained Allocation}.

All argz functions that do memory allocation have a return type of
@code{error_t}; they return @code{0} on success and @code{ENOMEM} if an
allocation error occurs.

@pindex argz.h
These functions are declared in the standard include file @file{argz.h}.

@comment argz.h
@comment GNU
@deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_create} function converts the Unix-style argument vector
@var{argv} (a vector of pointers to normal C strings, terminated by
@code{(char *)0}; @pxref{Program Arguments}) into an argz vector with
the same elements, which is returned in @var{argz} and @var{argz_len}.
@end deftypefun
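A minimal sketch of @code{argz_create} in use. The wrapper @code{argv_to_argz} is a hypothetical helper, not part of @file{argz.h}; it starts from the empty vector so that both outputs are defined even if allocation fails:

```c
#include <argz.h>
#include <stddef.h>
#include <stdlib.h>   /* free, used by the caller */
#include <string.h>

/* Hypothetical helper (not part of argz.h): convert a null-terminated
   argv array into a freshly allocated argz vector.  Returns 0 on
   success or ENOMEM; on success the caller owns *ARGZ and releases it
   with free.  */
error_t
argv_to_argz (char *const argv[], char **argz, size_t *argz_len)
{
  /* Begin with the empty argz vector: a null pointer and a size of 0.  */
  *argz = NULL;
  *argz_len = 0;
  return argz_create (argv, argz, argz_len);
}
```

For the elements @code{"ls"}, @code{"-l"} and @code{"/tmp"}, the result holds the 11 bytes @code{ls\0-l\0/tmp\0}, because every element keeps its terminating null byte.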

@comment argz.h
@comment GNU
@deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_create_sep} function converts the string
@var{string} into an argz vector (returned in @var{argz} and
@var{argz_len}) by splitting it into elements at every occurrence of the
byte @var{sep}.
@end deftypefun
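A sketch of @code{argz_create_sep} splitting a search path; the helper name @code{split_path} is hypothetical. The input used with it deliberately avoids empty fields (a leading, trailing, or doubled @code{':'}), whose handling is not described here:

```c
#include <argz.h>
#include <stddef.h>
#include <stdlib.h>   /* free, used by the caller */
#include <string.h>

/* Hypothetical helper: split a colon-separated search path such as
   "/bin:/usr/bin" into one argz element per directory.  Returns 0 or
   ENOMEM; on success the caller frees *ARGZ.  */
error_t
split_path (const char *path, char **argz, size_t *argz_len)
{
  *argz = NULL;
  *argz_len = 0;
  return argz_create_sep (path, ':', argz, argz_len);
}
```

For @code{"/bin:/usr/bin:/usr/local/bin"} the result has three elements. Each @code{':'} becomes a @code{'\0'} and the final null byte is kept, so @var{argz_len} is the string's length plus one, here 29.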

@comment argz.h
@comment GNU
@deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns the number of elements in the argz vector @var{argz} and
@var{argz_len}.
@end deftypefun
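Since @code{argz_count} only examines its argument, it also works on an argz vector that was never allocated with @code{malloc}, such as a static array. In this sketch @code{fruits} and @code{count_fruits} are illustrative names; @code{sizeof} supplies the length because it counts every byte, including the final null byte:

```c
#include <argz.h>
#include <stddef.h>

/* A static argz vector with three elements.  The string literal's
   implicit terminator supplies the last '\0', so sizeof fruits (15)
   is exactly the argz length.  */
static const char fruits[] = "apple\0fig\0kiwi";

/* Hypothetical helper: count the elements without any allocation.  */
size_t
count_fruits (void)
{
  return argz_count (fruits, sizeof fruits);
}
```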

@comment argz.h
@comment GNU
@deftypefun {void} argz_extract (const char *@var{argz}, size_t @var{argz_len}, char **@var{argv})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_extract} function converts the argz vector @var{argz} and
@var{argz_len} into a Unix-style argument vector stored in @var{argv},
by putting pointers to every element in @var{argz} into successive
positions in @var{argv}, followed by a terminator of @code{0}.
@var{Argv} must be pre-allocated with enough space to hold all the
elements in @var{argz} plus the terminating @code{(char *)0}
(@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)}
bytes should be enough). Note that the string pointers stored into
@var{argv} point into @var{argz}---they are not copies---and so
@var{argz} must be copied if it will be changed while @var{argv} is
still active. This function is useful for passing the elements in
@var{argz} to an exec function (@pxref{Executing a File}).
@end deftypefun
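The following sketch pairs @code{argz_count} with @code{argz_extract} to allocate a pointer array of the size given above. The helper @code{make_argv} is hypothetical; it copies only pointers, so the argz vector must outlive the returned array:

```c
#include <argz.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a NULL-terminated argv array whose
   entries point into ARGZ (the strings are not copied).  Returns NULL
   if the pointer array cannot be allocated.  The caller frees the
   array with free, and must not modify or free ARGZ while the array
   is still in use.  */
char **
make_argv (const char *argz, size_t argz_len)
{
  /* One slot per element plus the terminating null pointer.  */
  size_t n = argz_count (argz, argz_len);
  char **vec = malloc ((n + 1) * sizeof (char *));

  if (vec != NULL)
    argz_extract (argz, argz_len, vec);
  return vec;
}
```

An array built this way is suitable for an exec function such as @code{execv}.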

@comment argz.h
@comment GNU
@deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_stringify} function converts @var{argz} into a normal
string with the elements separated by the byte @var{sep}, by replacing
each @code{'\0'} inside @var{argz} (except the last one, which
terminates the string) with @var{sep}. This is handy for printing
@var{argz} in a readable manner.
@end deftypefun
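A small sketch, assuming a writable, non-empty argz vector: because @code{argz_stringify} rewrites the vector in place, the same buffer can be used as a C string afterwards. The helper @code{join_with_spaces} is hypothetical:

```c
#include <argz.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turn a non-empty argz vector into one
   space-separated C string, in place, and return it.  An empty
   vector (length 0) is left untouched, so callers should check
   ARGZ_LEN before treating the result as a string.  */
char *
join_with_spaces (char *argz, size_t argz_len)
{
  argz_stringify (argz, argz_len, ' ');
  return argz;
}
```

Applied to the 14-byte vector @code{cc\0-O2\0main.c\0}, this yields the string @code{"cc -O2 main.c"}. The conversion is lossy if an element itself contains @var{sep}, so it is best reserved for display.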

@comment argz.h
@comment GNU
@deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls strlen and argz_append.
The @code{argz_add} function adds the string @var{str} to the end of the
argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and
@code{*@var{argz_len}} accordingly.
@end deftypefun
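Because an empty argz vector is just a null pointer and a size of @code{0}, a vector can be built one element at a time with @code{argz_add}. In this illustrative sketch, @code{build_command} and its word list are hypothetical; the partial vector is freed if an allocation fails:

```c
#include <argz.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the argz vector "grep", "-n", "TODO" one
   element at a time, starting from the empty vector (NULL, 0).
   Returns 0 or ENOMEM; on failure the partial vector is freed.  */
error_t
build_command (char **argz, size_t *argz_len)
{
  static const char *const words[] = { "grep", "-n", "TODO", NULL };
  error_t err = 0;

  *argz = NULL;
  *argz_len = 0;
  for (size_t i = 0; words[i] != NULL && err == 0; i++)
    err = argz_add (argz, argz_len, words[i]);   /* grows the vector */

  if (err != 0)
    {
      free (*argz);
      *argz = NULL;
      *argz_len = 0;
    }
  return err;
}
```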

@comment argz.h
@comment GNU
@deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_add_sep} function is similar to @code{argz_add}, but
@var{str} is split into separate elements in the result at occurrences of
the byte @var{delim}. This is useful, for instance, for
adding the components of a Unix search path to an argz vector, by using
a value of @code{':'} for @var{delim}.
@end deftypefun
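A sketch of the search-path use just mentioned; @code{add_search_path} is a hypothetical name. Unlike @code{argz_create_sep}, it appends to an existing vector rather than creating a new one:

```c
#include <argz.h>
#include <stddef.h>
#include <stdlib.h>   /* free, used by the caller */
#include <string.h>

/* Hypothetical helper: append each directory of a colon-separated
   PATH-style string as a separate element of *ARGZ.  Returns 0 or
   ENOMEM.  */
error_t
add_search_path (char **argz, size_t *argz_len, const char *path)
{
  return argz_add_sep (argz, argz_len, path, ':');
}
```

Starting from a vector holding @code{"."}, adding @code{"/bin:/usr/bin"} gives the three elements @code{"."}, @code{"/bin"} and @code{"/usr/bin"}.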

@comment argz.h
@comment GNU
@deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_append} function appends @var{buf_len} bytes starting at
@var{buf} to the argz vector @code{*@var{argz}}, reallocating
@code{*@var{argz}} to accommodate it, and adding @var{buf_len} to
@code{*@var{argz_len}}.
@end deftypefun
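Since an argz length already counts every null byte, appending the raw bytes of one argz vector to another with @code{argz_append} produces a well-formed vector, whereas @code{argz_add} would stop at the first @code{'\0'}. A minimal sketch with the hypothetical helper @code{append_argz}:

```c
#include <argz.h>
#include <stddef.h>
#include <stdlib.h>   /* free, used by the caller */
#include <string.h>

/* Hypothetical helper: append the argz vector SRC (SRC_LEN bytes,
   null bytes included) to *DST.  Because SRC_LEN already counts every
   '\0', the combined buffer is again a well-formed argz vector.
   Returns 0 or ENOMEM.  */
error_t
append_argz (char **dst, size_t *dst_len, const char *src, size_t src_len)
{
  if (src_len == 0)
    return 0;   /* Nothing to append; skip the reallocation.  */
  return argz_append (dst, dst_len, src, src_len);
}
```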

@comment argz.h
@comment GNU
@deftypefun {void} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls free if no argument is left.
If @var{entry} points to the beginning of one of the elements in the
argz vector @code{*@var{argz}}, the @code{argz_delete} function will
remove this entry and reallocate @code{*@var{argz}}, modifying
@code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as
destructive argz functions usually reallocate their argz argument,
pointers into argz vectors such as @var{entry} will then become invalid.
@end deftypefun
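The invalidation rule above matters when deleting while iterating. This sketch, using the hypothetical helper @code{remove_all_equal}, restarts the scan after every deletion instead of reusing @var{entry}; it is quadratic in the worst case but never touches a stale pointer:

```c
#include <argz.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: delete every element of *ARGZ equal to NAME.
   After argz_delete, pointers into the vector (including ENTRY) are
   treated as invalid, so the scan restarts from the first element
   after each deletion.  When the last element goes, *ARGZ becomes 0.  */
void
remove_all_equal (char **argz, size_t *argz_len, const char *name)
{
  char *entry = NULL;

  while ((entry = argz_next (*argz, *argz_len, entry)) != NULL)
    {
      if (strcmp (entry, name) == 0)
        {
          argz_delete (argz, argz_len, entry);
          entry = NULL;   /* Restart: ENTRY is no longer valid.  */
        }
    }
}
```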

@comment argz.h
@comment GNU
@deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls argz_add or realloc and memmove.
The @code{argz_insert} function inserts the string @var{entry} into the
argz vector @code{*@var{argz}} at a point just before the existing
element pointed to by @var{before}, reallocating @code{*@var{argz}} and
updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before}
is @code{0}, @var{entry} is added to the end instead (as if by
@code{argz_add}). Since the first element is in fact the same as
@code{*@var{argz}}, passing in @code{*@var{argz}} as the value of
@var{before} will result in @var{entry} being inserted at the beginning.
@end deftypefun
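As a sketch of the @var{before} conventions just described, the hypothetical helper @code{prepend_entry} always passes @code{*@var{argz}} as @var{before}. For an empty vector that value is @code{0}, so the entry is simply added:

```c
#include <argz.h>
#include <stddef.h>
#include <stdlib.h>   /* free, used by the caller */
#include <string.h>

/* Hypothetical helper: insert ENTRY before the first element.  The
   first element starts at *ARGZ, so *ARGZ is passed as BEFORE; for an
   empty vector *ARGZ is 0 and argz_insert then behaves like argz_add.  */
error_t
prepend_entry (char **argz, size_t *argz_len, const char *entry)
{
  return argz_insert (argz, argz_len, *argz, entry);
}
```

Prepending @code{"world"} and then @code{"hello"} to an empty vector yields @code{hello\0world\0}.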

@comment argz.h
@comment GNU
@deftypefun {char *} argz_next (const char *@var{argz}, size_t @var{argz_len}, const char *@var{entry})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_next} function provides a convenient way of iterating
over the elements in the argz vector @var{argz}. It returns a pointer
to the next element in @var{argz} after the element @var{entry}, or
@code{0} if there are no elements following @var{entry}. If @var{entry}
is @code{0}, the first element of @var{argz} is returned.

This behavior suggests two styles of iteration:

@smallexample
char *entry = 0;
while ((entry = argz_next (@var{argz}, @var{argz_len}, entry)))
  @var{action};
@end smallexample

(the double parentheses suppress the warning that some C compilers
issue when an assignment is used as the @code{while} condition) and:

@smallexample
char *entry;
for (entry = @var{argz};
     entry;
     entry = argz_next (@var{argz}, @var{argz_len}, entry))
  @var{action};
@end smallexample

Note that the latter depends on @var{argz} having a value of @code{0} if
it is empty (rather than a pointer to an empty block of memory); this
invariant is maintained for argz vectors created by the functions here.
@end deftypefun
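A self-contained instance of the first iteration style, written as a sketch: the helper @code{total_chars} is hypothetical and sums the lengths of all elements, excluding their null terminators. @var{entry} is declared @code{const char *} because the vector is only read:

```c
#include <argz.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sum the lengths of all elements (excluding
   their '\0' terminators) using the while-loop iteration style.  */
size_t
total_chars (const char *argz, size_t argz_len)
{
  size_t total = 0;
  const char *entry = 0;

  while ((entry = argz_next (argz, argz_len, entry)))
    total += strlen (entry);
  return total;
}
```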

@comment argz.h
@comment GNU
@deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_replace} function replaces every occurrence of the
string @var{str} in @code{*@var{argz}} with @var{with}, reallocating
@code{*@var{argz}} as necessary and updating @code{*@var{argz_len}}.
If @var{replace_count} is not a null pointer, @code{*@var{replace_count}}
is incremented by the number of replacements performed.
@end deftypefun
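A sketch of @code{argz_replace} rewriting a directory name in every element. The helper @code{replace_counted} is hypothetical; it zeroes the counter first because, as described above, @code{*@var{replace_count}} is incremented rather than set. The vector must come from @code{malloc}, since it may be reallocated:

```c
#include <argz.h>
#include <stddef.h>
#include <stdlib.h>   /* free, used by the caller */
#include <string.h>

/* Hypothetical helper: replace FROM with TO in every element of the
   malloc'd vector *ARGZ.  argz_replace only increments *COUNT, so it
   is reset to 0 first.  Returns 0 or ENOMEM.  */
error_t
replace_counted (char **argz, size_t *argz_len, const char *from,
                 const char *to, unsigned *count)
{
  *count = 0;
  return argz_replace (argz, argz_len, from, to, count);
}
```

Applied to the elements @code{"/usr/lib"}, @code{"/usr/bin"} and @code{"/opt"} with @var{from} @code{"/usr"} and @var{to} @code{"/home"}, the first two become @code{"/home/lib"} and @code{"/home/bin"}, the third is unchanged, and the counter ends at 2.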

@node Envz Functions, , Argz Functions, Argz and Envz Vectors
@subsection Envz Functions

Envz vectors are just argz vectors with additional constraints on the form
of each element; as such, argz functions can also be used on them, where it
makes sense.

Each element in an envz vector is a name-value pair, separated by a @code{'='}
byte; if multiple @code{'='} bytes are present in an element, those
after the first are considered part of the value, and treated like all other
non-@code{'\0'} bytes.

If @emph{no} @code{'='} bytes are present in an element, that element is
considered the name of a ``null'' entry, as distinct from an entry with an
empty value: @code{envz_get} will return @code{0} if given the name of a null
entry, whereas an entry with an empty value would result in a value of
@code{""}; @code{envz_entry} will still find such entries, however. Null
entries can be removed with the @code{envz_strip} function.

As with argz functions, envz functions that may allocate memory (and thus
fail) have a return type of @code{error_t}, and return either @code{0} or
@code{ENOMEM}.

@pindex envz.h
These functions are declared in the standard include file @file{envz.h}.

@comment envz.h
@comment GNU
@deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{envz_entry} function finds the entry in @var{envz} with the name
@var{name}, and returns a pointer to the whole entry---that is, the argz
element which begins with @var{name} followed by a @code{'='} byte, or
which consists of just @var{name} in the case of a null entry. If
there is no entry with that name, @code{0} is returned.
@end deftypefun

@comment envz.h
@comment GNU
@deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{envz_get} function finds the entry in @var{envz} with the name
@var{name} (like @code{envz_entry}), and returns a pointer to the value
portion of that entry (following the @code{'='}). If there is no entry with
that name (or only a null entry), @code{0} is returned.
@end deftypefun
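The following sketch distinguishes the three cases described above (no entry, a null entry, and an entry whose value may be empty) by combining @code{envz_entry} and @code{envz_get}. The helper @code{classify_name} and its return codes are hypothetical. Because neither function allocates, the envz vector can be a static array:

```c
#include <envz.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper and return codes: 0 if NAME is absent, 1 if it
   is a null entry (no '='), 2 if it has a value, which may be "".  */
int
classify_name (const char *envz, size_t envz_len, const char *name)
{
  if (envz_entry (envz, envz_len, name) == NULL)
    return 0;   /* No element with this name.  */
  if (envz_get (envz, envz_len, name) == NULL)
    return 1;   /* Found, but it is a null entry.  */
  return 2;     /* NAME=VALUE; VALUE may be the empty string.  */
}
```

For the vector @code{HOME=/root\0EMPTY=\0FLAG\0}, both @code{HOME} and @code{EMPTY} classify as 2 (with @code{envz_get} returning @code{"/root"} and @code{""}), @code{FLAG} as 1, and any other name as 0.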

@comment envz.h
@comment GNU
@deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls envz_remove, which calls envz_entry and argz_delete, and then
@c argz_add or equivalent code that reallocs and appends name=value.
The @code{envz_add} function adds an entry to @code{*@var{envz}}
(updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name
@var{name}, and value @var{value}. If an entry with the same name
already exists in @var{envz}, it is removed first. If @var{value} is
@code{0}, then the new entry will be a null entry (described above).
@end deftypefun
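Because @code{envz_add} first removes any entry of the same name, adding a name twice leaves only the later value. A sketch with the hypothetical helper @code{build_env}, where a null pointer in @code{values} produces a null entry:

```c
#include <argz.h>
#include <envz.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build an envz vector from N parallel NAMES and
   VALUES (a null VALUES[i] adds a null entry).  envz_add removes any
   existing entry with the same name first, so later pairs win.
   Returns 0 or ENOMEM; on failure the partial vector is freed.  */
error_t
build_env (const char *const names[], const char *const values[],
           size_t n, char **envz, size_t *envz_len)
{
  error_t err = 0;

  *envz = NULL;
  *envz_len = 0;
  for (size_t i = 0; i < n && err == 0; i++)
    err = envz_add (envz, envz_len, names[i], values[i]);

  if (err != 0)
    {
      free (*envz);
      *envz = NULL;
      *envz_len = 0;
    }
  return err;
}
```

With the pairs @code{LANG=C}, @code{TERM=xterm}, @code{LANG=en_US.UTF-8} and a null @code{DEBUG}, the result has three entries, and @code{LANG} has the value @code{"en_US.UTF-8"}.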

@comment envz.h
@comment GNU
@deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz},
as if with @code{envz_add}, updating @code{*@var{envz}} and
@code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2}
will supersede those with the same name in @var{envz}; otherwise, entries
already present in @var{envz} are kept unchanged.

Null entries are treated just like other entries in this respect, so a null
entry in @var{envz} can prevent an entry of the same name in @var{envz2} from
being added to @var{envz}, if @var{override} is false.
@end deftypefun
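A sketch of @code{envz_merge} applying a set of defaults without overriding settings that are already present; @code{apply_defaults} is a hypothetical helper:

```c
#include <argz.h>
#include <envz.h>
#include <stddef.h>
#include <stdlib.h>   /* free, used by the caller */
#include <string.h>

/* Hypothetical helper: merge DEFAULTS into *ENV with override = 0, so
   names that *ENV already contains (including null entries) keep
   their current state.  Returns 0 or ENOMEM.  */
error_t
apply_defaults (char **env, size_t *env_len,
                const char *defaults, size_t defaults_len)
{
  return envz_merge (env, env_len, defaults, defaults_len, 0);
}
```

For instance, if @var{env} holds @code{EDITOR=emacs} and a null entry @code{TZ}, merging the defaults @code{PAGER=less}, @code{EDITOR=vi} and @code{TZ=UTC} adds only @code{PAGER=less}: the null @code{TZ} entry blocks its default just as @code{EDITOR=emacs} does.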

@comment envz.h
@comment GNU
@deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{envz_strip} function removes any null entries from @var{envz},
updating @code{*@var{envz}} and @code{*@var{envz_len}}.
@end deftypefun
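Since @code{envz_strip} is marked as neither allocating nor freeing memory, it can operate on any writable storage. A sketch with the hypothetical helper @code{strip_and_count}:

```c
#include <argz.h>
#include <envz.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove null entries (elements without '=')
   in place and return how many NAME=VALUE elements remain.  *ENVZ may
   point to any writable storage, not only to malloc'd memory.  */
size_t
strip_and_count (char **envz, size_t *envz_len)
{
  envz_strip (envz, envz_len);
  return argz_count (*envz, *envz_len);
}
```

Applied to the 11-byte vector @code{A=1\0B\0C=\0D\0}, this leaves the 7-byte vector @code{A=1\0C=\0}: the null entries @code{B} and @code{D} are dropped, while @code{C=} (an empty value) is kept.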

@comment envz.h
@comment GNU
@deftypefun {void} envz_remove (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{envz_remove} function removes an entry named @var{name} from
@var{envz}, updating @code{*@var{envz}} and @code{*@var{envz_len}}.
@end deftypefun
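A sketch that unsets several names at once; @code{unset_names} is a hypothetical helper. Removing a name that is not present leaves the vector unchanged, so no lookup is needed first:

```c
#include <envz.h>
#include <stddef.h>
#include <stdlib.h>   /* free, used by the caller */
#include <string.h>

/* Hypothetical helper: remove every name in the NULL-terminated list
   NAMES.  Names that are not present are simply skipped by
   envz_remove, so no envz_entry lookup is done beforehand.  */
void
unset_names (char **envz, size_t *envz_len, const char *const names[])
{
  for (size_t i = 0; names[i] != NULL; i++)
    envz_remove (envz, envz_len, names[i]);
}
```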

@c FIXME these are undocumented:
@c strcasecmp_l @safety{@mtsafe{}@assafe{}@acsafe{}} see strcasecmp