| 1 | @node Pattern Matching, I/O Overview, Searching and Sorting, Top |
| 2 | @c %MENU% Matching shell ``globs'' and regular expressions |
| 3 | @chapter Pattern Matching |
| 4 | |
| 5 | @Theglibc{} provides pattern matching facilities for two kinds of |
| 6 | patterns: regular expressions and file-name wildcards. The library also |
| 7 | provides a facility for expanding variable and command references and |
| 8 | parsing text into words in the way the shell does. |
| 9 | |
| 10 | @menu |
| 11 | * Wildcard Matching:: Matching a wildcard pattern against a single string. |
| 12 | * Globbing:: Finding the files that match a wildcard pattern. |
| 13 | * Regular Expressions:: Matching regular expressions against strings. |
| 14 | * Word Expansion:: Expanding shell variables, nested commands, |
| 15 | arithmetic, and wildcards. |
| 16 | This is what the shell does with shell commands. |
| 17 | @end menu |
| 18 | |
| 19 | @node Wildcard Matching |
| 20 | @section Wildcard Matching |
| 21 | |
| 22 | @pindex fnmatch.h |
| 23 | This section describes how to match a wildcard pattern against a |
| 24 | particular string. The result is a yes or no answer: does the |
| 25 | string fit the pattern or not. The symbols described here are all |
| 26 | declared in @file{fnmatch.h}. |
| 27 | |
| 28 | @comment fnmatch.h |
| 29 | @comment POSIX.2 |
| 30 | @deftypefun int fnmatch (const char *@var{pattern}, const char *@var{string}, int @var{flags}) |
| 31 | @safety{@prelim{}@mtsafe{@mtsenv{} @mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
| 32 | @c fnmatch @mtsenv @mtslocale @ascuheap @acsmem |
| 33 | @c strnlen dup ok |
| 34 | @c mbsrtowcs |
| 35 | @c memset dup ok |
| 36 | @c malloc dup @ascuheap @acsmem |
| 37 | @c mbsinit dup ok |
| 38 | @c free dup @ascuheap @acsmem |
| 39 | @c FCT = internal_fnwmatch @mtsenv @mtslocale @ascuheap @acsmem |
| 40 | @c FOLD @mtslocale |
| 41 | @c towlower @mtslocale |
| 42 | @c EXT @mtsenv @mtslocale @ascuheap @acsmem |
| 43 | @c STRLEN = wcslen dup ok |
| 44 | @c getenv @mtsenv |
| 45 | @c malloc dup @ascuheap @acsmem |
| 46 | @c MEMPCPY = wmempcpy dup ok |
| 47 | @c FCT dup @mtsenv @mtslocale @ascuheap @acsmem |
| 48 | @c STRCAT = wcscat dup ok |
| 49 | @c free dup @ascuheap @acsmem |
| 50 | @c END @mtsenv |
| 51 | @c getenv @mtsenv |
| 52 | @c MEMCHR = wmemchr dup ok |
| 53 | @c getenv @mtsenv |
| 54 | @c IS_CHAR_CLASS = is_char_class @mtslocale |
| 55 | @c wctype @mtslocale |
| 56 | @c BTOWC ok |
| 57 | @c ISWCTYPE ok |
| 58 | @c auto findidx dup ok |
| 59 | @c elem_hash dup ok |
| 60 | @c memcmp dup ok |
| 61 | @c collseq_table_lookup dup ok |
| 62 | @c NO_LEADING_PERIOD ok |
| 63 | This function tests whether the string @var{string} matches the pattern |
| 64 | @var{pattern}. It returns @code{0} if they do match; otherwise, it |
| 65 | returns the nonzero value @code{FNM_NOMATCH}. The arguments |
| 66 | @var{pattern} and @var{string} are both strings. |
| 67 | |
| 68 | The argument @var{flags} is a combination of flag bits that alter the |
| 69 | details of matching. See below for a list of the defined flags. |
| 70 | |
| 71 | In @theglibc{}, @code{fnmatch} might sometimes report ``errors'' by |
| 72 | returning nonzero values that are not equal to @code{FNM_NOMATCH}. |
| 73 | @end deftypefun |
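The three kinds of return value described above can be told apart as in
the following minimal sketch.  The helper name @code{match_status} and its
tri-state result are assumptions made for this illustration; they are not
part of the library.

```c
#include <fnmatch.h>
#include <stdio.h>

/* Classify the result of fnmatch: 1 for a match, 0 for no match,
   and -1 for any other (error) return value.  */
int
match_status (const char *pattern, const char *string)
{
  int rc = fnmatch (pattern, string, 0);

  if (rc == 0)
    return 1;
  if (rc == FNM_NOMATCH)
    return 0;

  /* Neither 0 nor FNM_NOMATCH: treat it as an error report.  */
  fprintf (stderr, "fnmatch failed on pattern '%s' (code %d)\n",
           pattern, rc);
  return -1;
}
```

Comparing against @code{FNM_NOMATCH} explicitly, rather than testing only
for a nonzero result, keeps a failed call from being mistaken for a simple
non-match.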
| 74 | |
| 75 | These are the available flags for the @var{flags} argument: |
| 76 | |
| 77 | @table @code |
| 78 | @comment fnmatch.h |
| 79 | @comment GNU |
| 80 | @item FNM_FILE_NAME |
| 81 | Treat the @samp{/} character specially, for matching file names. If |
| 82 | this flag is set, wildcard constructs in @var{pattern} cannot match |
| 83 | @samp{/} in @var{string}. Thus, the only way to match @samp{/} is with |
| 84 | an explicit @samp{/} in @var{pattern}. |
| 85 | |
| 86 | @comment fnmatch.h |
| 87 | @comment POSIX.2 |
| 88 | @item FNM_PATHNAME |
| 89 | This is an alias for @code{FNM_FILE_NAME}; it comes from POSIX.2. We |
| 90 | don't recommend this name because we don't use the term ``pathname'' for |
| 91 | file names. |
| 92 | |
| 93 | @comment fnmatch.h |
| 94 | @comment POSIX.2 |
| 95 | @item FNM_PERIOD |
| 96 | Treat the @samp{.} character specially if it appears at the beginning of |
| 97 | @var{string}. If this flag is set, wildcard constructs in @var{pattern} |
| 98 | cannot match @samp{.} as the first character of @var{string}. |
| 99 | |
| 100 | If you set both @code{FNM_PERIOD} and @code{FNM_FILE_NAME}, then the |
| 101 | special treatment applies to @samp{.} following @samp{/} as well as to |
| 102 | @samp{.} at the beginning of @var{string}. (The shell uses the |
| 103 | @code{FNM_PERIOD} and @code{FNM_FILE_NAME} flags together for matching |
| 104 | file names.) |
| 105 | |
| 106 | @comment fnmatch.h |
| 107 | @comment POSIX.2 |
| 108 | @item FNM_NOESCAPE |
| 109 | Don't treat the @samp{\} character specially in patterns. Normally, |
| 110 | @samp{\} quotes the following character, turning off its special meaning |
| 111 | (if any) so that it matches only itself. When quoting is enabled, the |
| 112 | pattern @samp{\?} matches only the string @samp{?}, because the question |
| 113 | mark in the pattern acts like an ordinary character. |
| 114 | |
| 115 | If you use @code{FNM_NOESCAPE}, then @samp{\} is an ordinary character. |
| 116 | |
| 117 | @comment fnmatch.h |
| 118 | @comment GNU |
| 119 | @item FNM_LEADING_DIR |
| 120 | Ignore a trailing sequence of characters starting with a @samp{/} in |
| 121 | @var{string}; that is to say, test whether @var{string} starts with a |
| 122 | directory name that @var{pattern} matches. |
| 123 | |
| 124 | If this flag is set, either @samp{foo*} or @samp{foobar} as a pattern |
| 125 | would match the string @samp{foobar/frobozz}. |
| 126 | |
| 127 | @comment fnmatch.h |
| 128 | @comment GNU |
| 129 | @item FNM_CASEFOLD |
| 130 | Ignore case in comparing @var{string} to @var{pattern}. |
| 131 | |
| 132 | @comment fnmatch.h |
| 133 | @comment GNU |
| 134 | @item FNM_EXTMATCH |
| 135 | @cindex Korn Shell |
| 136 | @pindex ksh |
| 137 | Besides the normal patterns, also recognize the extended patterns |
| 138 | introduced in @file{ksh}. The patterns are written in the form |
| 139 | explained in the following table, where @var{pattern-list} is a |
| 140 | list of patterns separated by @code{|}. |
| 141 | |
| 142 | @table @code |
| 143 | @item ?(@var{pattern-list}) |
| 144 | The pattern matches if zero or one occurrences of any of the patterns |
| 145 | in the @var{pattern-list} allow matching the input string. |
| 146 | |
| 147 | @item *(@var{pattern-list}) |
| 148 | The pattern matches if zero or more occurrences of any of the patterns |
| 149 | in the @var{pattern-list} allow matching the input string. |
| 150 | |
| 151 | @item +(@var{pattern-list}) |
| 152 | The pattern matches if one or more occurrences of any of the patterns |
| 153 | in the @var{pattern-list} allow matching the input string. |
| 154 | |
| 155 | @item @@(@var{pattern-list}) |
| 156 | The pattern matches if exactly one occurrence of any of the patterns in |
| 157 | the @var{pattern-list} allows matching the input string. |
| 158 | |
| 159 | @item !(@var{pattern-list}) |
| 160 | The pattern matches if the input string cannot be matched with any of |
| 161 | the patterns in the @var{pattern-list}. |
| 162 | @end table |
| 163 | @end table |
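The flag descriptions above can be checked with a handful of calls.  The
following is only a sketch: the helper names @code{matches} and
@code{count_flag_surprises} are invented for this illustration, and it
assumes that @code{_GNU_SOURCE} is defined before @file{fnmatch.h} is
included so that the GNU flags are visible.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1  /* assumed necessary to expose the GNU flags */
#endif
#include <fnmatch.h>

/* Return 1 if STRING matches PATTERN under FLAGS, else 0.  */
static int
matches (const char *pattern, const char *string, int flags)
{
  return fnmatch (pattern, string, flags) == 0;
}

/* Return the number of expectations below that do not hold;
   0 means every call behaved as the flag descriptions say.  */
int
count_flag_surprises (void)
{
  int bad = 0;

  /* FNM_PATHNAME (alias FNM_FILE_NAME): wildcards stop at '/'.  */
  bad += !matches ("*", "a/b", 0);
  bad += matches ("*", "a/b", FNM_PATHNAME);
  bad += !matches ("*/*", "a/b", FNM_PATHNAME);

  /* FNM_PERIOD: a leading '.' must be matched explicitly.  */
  bad += matches ("*", ".profile", FNM_PERIOD);
  bad += !matches (".*", ".profile", FNM_PERIOD);
  bad += matches ("*/*", "dir/.hidden", FNM_PATHNAME | FNM_PERIOD);

  /* FNM_NOESCAPE: backslash becomes an ordinary character.  */
  bad += !matches ("\\?", "?", 0);
  bad += matches ("\\?", "x", 0);
  bad += !matches ("\\?", "\\x", FNM_NOESCAPE);

  /* GNU flags.  */
  bad += matches ("foobar", "foobar/frobozz", 0);
  bad += !matches ("foobar", "foobar/frobozz", FNM_LEADING_DIR);
  bad += !matches ("*.TXT", "notes.txt", FNM_CASEFOLD);
  bad += !matches ("@(foo|bar).c", "bar.c", FNM_EXTMATCH);
  bad += !matches ("+(ab)", "ababab", FNM_EXTMATCH);
  bad += matches ("!(*.o)", "main.o", FNM_EXTMATCH);
  bad += !matches ("!(*.o)", "main.c", FNM_EXTMATCH);

  return bad;
}
```

Each pair of lines contrasts the same pattern with and without a flag, so
the effect of the flag is visible in isolation.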
| 164 | |
| 165 | @node Globbing |
| 166 | @section Globbing |
| 167 | |
| 168 | @cindex globbing |
| 169 | The archetypal use of wildcards is for matching against the files in a |
| 170 | directory, and making a list of all the matches. This is called |
| 171 | @dfn{globbing}. |
| 172 | |
| 173 | You could do this using @code{fnmatch}, by reading the directory entries |
| 174 | one by one and testing each one with @code{fnmatch}. But that would be |
| 175 | slow (and complex, since you would have to handle subdirectories by |
| 176 | hand). |
| 177 | |
| 178 | The library provides a function @code{glob} to make this particular use |
| 179 | of wildcards convenient. @code{glob} and the other symbols in this |
| 180 | section are declared in @file{glob.h}. |
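For comparison, the manual approach mentioned above might look like the
following sketch for a single directory.  The function name
@code{count_dir_matches} is hypothetical, and the sketch deliberately does
not descend into subdirectories, which is part of the work that
@code{glob} takes over.

```c
#include <dirent.h>
#include <fnmatch.h>
#include <stddef.h>

/* Count the entries of directory DIR whose names match PATTERN,
   or return -1 if DIR cannot be opened.  Subdirectories are not
   descended into.  */
long
count_dir_matches (const char *dir, const char *pattern)
{
  DIR *d = opendir (dir);
  struct dirent *entry;
  long count = 0;

  if (d == NULL)
    return -1;

  while ((entry = readdir (d)) != NULL)
    /* FNM_PERIOD keeps wildcards from matching hidden entries.  */
    if (fnmatch (pattern, entry->d_name, FNM_PERIOD) == 0)
      count++;

  closedir (d);
  return count;
}
```

A pattern that contains a directory part, such as @samp{src/*.c}, would
need one such loop per pattern component.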
| 181 | |
| 182 | @menu |
| 183 | * Calling Glob:: Basic use of @code{glob}. |
| 184 | * Flags for Globbing:: Flags that enable various options in @code{glob}. |
| 185 | * More Flags for Globbing:: GNU specific extensions to @code{glob}. |
| 186 | @end menu |
| 187 | |
| 188 | @node Calling Glob |
| 189 | @subsection Calling @code{glob} |
| 190 | |
| 191 | The result of globbing is a vector of file names (strings). To return |
| 192 | this vector, @code{glob} uses a special data type, @code{glob_t}, which |
| 193 | is a structure. You pass @code{glob} the address of the structure, and |
| 194 | it fills in the structure's fields to tell you about the results. |
| 195 | |
| 196 | @comment glob.h |
| 197 | @comment POSIX.2 |
| 198 | @deftp {Data Type} glob_t |
| 199 | This data type holds a pointer to a word vector. More precisely, it |
| 200 | records both the address of the word vector and its size. The GNU |
| 201 | implementation contains some more fields which are non-standard |
| 202 | extensions. |
| 203 | |
| 204 | @table @code |
| 205 | @item gl_pathc |
| 206 | The number of elements in the vector, excluding the initial null entries |
| 207 | if the @code{GLOB_DOOFFS} flag is used (see @code{gl_offs} below). |
| 208 | |
| 209 | @item gl_pathv |
| 210 | The address of the vector. This field has type @w{@code{char **}}. |
| 211 | |
| 212 | @item gl_offs |
| 213 | The offset of the first real element of the vector, from its nominal |
| 214 | address in the @code{gl_pathv} field. Unlike the other fields, this |
| 215 | is always an input to @code{glob}, rather than an output from it. |
| 216 | |
| 217 | If you use a nonzero offset, then that many elements at the beginning of |
| 218 | the vector are left empty. (The @code{glob} function fills them with |
| 219 | null pointers.) |
| 220 | |
| 221 | The @code{gl_offs} field is meaningful only if you use the |
| 222 | @code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero |
| 223 | regardless of what is in this field, and the first real element comes at |
| 224 | the beginning of the vector. |
| 225 | |
| 226 | @item gl_closedir |
| 227 | The address of an alternative implementation of the @code{closedir} |
| 228 | function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in |
| 229 | the flag parameter. The type of this field is |
| 230 | @w{@code{void (*) (void *)}}. |
| 231 | |
| 232 | This is a GNU extension. |
| 233 | |
| 234 | @item gl_readdir |
| 235 | The address of an alternative implementation of the @code{readdir} |
| 236 | function used to read the contents of a directory. It is used if the |
| 237 | @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of |
| 238 | this field is @w{@code{struct dirent *(*) (void *)}}. |
| 239 | |
| 240 | This is a GNU extension. |
| 241 | |
| 242 | @item gl_opendir |
| 243 | The address of an alternative implementation of the @code{opendir} |
| 244 | function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in |
| 245 | the flag parameter. The type of this field is |
| 246 | @w{@code{void *(*) (const char *)}}. |
| 247 | |
| 248 | This is a GNU extension. |
| 249 | |
| 250 | @item gl_stat |
| 251 | The address of an alternative implementation of the @code{stat} function |
| 252 | to get information about an object in the filesystem. It is used if the |
| 253 | @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of |
| 254 | this field is @w{@code{int (*) (const char *, struct stat *)}}. |
| 255 | |
| 256 | This is a GNU extension. |
| 257 | |
| 258 | @item gl_lstat |
| 259 | The address of an alternative implementation of the @code{lstat} |
| 260 | function to get information about an object in the filesystem, not |
| 261 | following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit |
| 262 | is set in the flag parameter. The type of this field is @code{@w{int |
| 263 | (*) (const char *,} @w{struct stat *)}}. |
| 264 | |
| 265 | This is a GNU extension. |
| 266 | |
| 267 | @item gl_flags |
| 268 | The flags used when @code{glob} was called. In addition, @code{GLOB_MAGCHAR} |
| 269 | might be set. See @ref{Flags for Globbing} for more details. |
| 270 | |
| 271 | This is a GNU extension. |
| 272 | @end table |
| 273 | @end deftp |
| 274 | |
| 275 | For use in the @code{glob64} function, @file{glob.h} contains another |
| 276 | definition for a very similar type. @code{glob64_t} differs from |
| 277 | @code{glob_t} only in the types of the members @code{gl_readdir}, |
| 278 | @code{gl_stat}, and @code{gl_lstat}. |
| 279 | |
| 280 | @comment glob.h |
| 281 | @comment GNU |
| 282 | @deftp {Data Type} glob64_t |
| 283 | This data type holds a pointer to a word vector. More precisely, it |
| 284 | records both the address of the word vector and its size. The GNU |
| 285 | implementation contains some more fields which are non-standard |
| 286 | extensions. |
| 287 | |
| 288 | @table @code |
| 289 | @item gl_pathc |
| 290 | The number of elements in the vector, excluding the initial null entries |
| 291 | if the @code{GLOB_DOOFFS} flag is used (see @code{gl_offs} below). |
| 292 | |
| 293 | @item gl_pathv |
| 294 | The address of the vector. This field has type @w{@code{char **}}. |
| 295 | |
| 296 | @item gl_offs |
| 297 | The offset of the first real element of the vector, from its nominal |
| 298 | address in the @code{gl_pathv} field. Unlike the other fields, this |
| 299 | is always an input to @code{glob64}, rather than an output from it. |
| 300 | |
| 301 | If you use a nonzero offset, then that many elements at the beginning of |
| 302 | the vector are left empty. (The @code{glob64} function fills them with |
| 303 | null pointers.) |
| 304 | |
| 305 | The @code{gl_offs} field is meaningful only if you use the |
| 306 | @code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero |
| 307 | regardless of what is in this field, and the first real element comes at |
| 308 | the beginning of the vector. |
| 309 | |
| 310 | @item gl_closedir |
| 311 | The address of an alternative implementation of the @code{closedir} |
| 312 | function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in |
| 313 | the flag parameter. The type of this field is |
| 314 | @w{@code{void (*) (void *)}}. |
| 315 | |
| 316 | This is a GNU extension. |
| 317 | |
| 318 | @item gl_readdir |
| 319 | The address of an alternative implementation of the @code{readdir64} |
| 320 | function used to read the contents of a directory. It is used if the |
| 321 | @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of |
| 322 | this field is @w{@code{struct dirent64 *(*) (void *)}}. |
| 323 | |
| 324 | This is a GNU extension. |
| 325 | |
| 326 | @item gl_opendir |
| 327 | The address of an alternative implementation of the @code{opendir} |
| 328 | function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in |
| 329 | the flag parameter. The type of this field is |
| 330 | @w{@code{void *(*) (const char *)}}. |
| 331 | |
| 332 | This is a GNU extension. |
| 333 | |
| 334 | @item gl_stat |
| 335 | The address of an alternative implementation of the @code{stat64} function |
| 336 | to get information about an object in the filesystem. It is used if the |
| 337 | @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of |
| 338 | this field is @w{@code{int (*) (const char *, struct stat64 *)}}. |
| 339 | |
| 340 | This is a GNU extension. |
| 341 | |
| 342 | @item gl_lstat |
| 343 | The address of an alternative implementation of the @code{lstat64} |
| 344 | function to get information about an object in the filesystem, not |
| 345 | following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit |
| 346 | is set in the flag parameter. The type of this field is @code{@w{int |
| 347 | (*) (const char *,} @w{struct stat64 *)}}. |
| 348 | |
| 349 | This is a GNU extension. |
| 350 | |
| 351 | @item gl_flags |
| 352 | The flags used when @code{glob64} was called. In addition, @code{GLOB_MAGCHAR} |
| 353 | might be set. See @ref{Flags for Globbing} for more details. |
| 354 | |
| 355 | This is a GNU extension. |
| 356 | @end table |
| 357 | @end deftp |
| 358 | |
| 359 | @comment glob.h |
| 360 | @comment POSIX.2 |
| 361 | @deftypefun int glob (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob_t *@var{vector-ptr}) |
| 362 | @safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @ascuplugin{} @asucorrupt{} @ascuheap{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} |
| 363 | @c glob @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @asucorrupt @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 364 | @c strlen dup ok |
| 365 | @c strchr dup ok |
| 366 | @c malloc dup @ascuheap @acsmem |
| 367 | @c mempcpy dup ok |
| 368 | @c next_brace_sub ok |
| 369 | @c free dup @ascuheap @acsmem |
| 370 | @c globfree dup @asucorrupt @ascuheap @acucorrupt @acsmem |
| 371 | @c glob_pattern_p ok |
| 372 | @c glob_pattern_type dup ok |
| 373 | @c getenv dup @mtsenv |
| 374 | @c GET_LOGIN_NAME_MAX ok |
| 375 | @c getlogin_r dup @mtasurace:utent @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 376 | @c GETPW_R_SIZE_MAX ok |
| 377 | @c getpwnam_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 378 | @c realloc dup @ascuheap @acsmem |
| 379 | @c memcpy dup ok |
| 380 | @c memchr dup ok |
| 381 | @c *pglob->gl_stat user-supplied |
| 382 | @c stat64 dup ok |
| 383 | @c S_ISDIR dup ok |
| 384 | @c strdup dup @ascuheap @acsmem |
| 385 | @c glob_pattern_type ok |
| 386 | @c glob_in_dir @mtsenv @mtslocale @asucorrupt @ascuheap @acucorrupt @acsfd @acsmem |
| 387 | @c strlen dup ok |
| 388 | @c glob_pattern_type dup ok |
| 389 | @c malloc dup @ascuheap @acsmem |
| 390 | @c mempcpy dup ok |
| 391 | @c *pglob->gl_stat user-supplied |
| 392 | @c stat64 dup ok |
| 393 | @c free dup @ascuheap @acsmem |
| 394 | @c *pglob->gl_opendir user-supplied |
| 395 | @c opendir dup @ascuheap @acsmem @acsfd |
| 396 | @c dirfd dup ok |
| 397 | @c *pglob->gl_readdir user-supplied |
| 398 | @c CONVERT_DIRENT_DIRENT64 ok |
| 399 | @c readdir64 ok [protected by exclusive use of the stream] |
| 400 | @c REAL_DIR_ENTRY ok |
| 401 | @c DIRENT_MIGHT_BE_DIR ok |
| 402 | @c fnmatch dup @mtsenv @mtslocale @ascuheap @acsmem |
| 403 | @c DIRENT_MIGHT_BE_SYMLINK ok |
| 404 | @c link_exists_p ok |
| 405 | @c link_exists2_p ok |
| 406 | @c strlen dup ok |
| 407 | @c mempcpy dup ok |
| 408 | @c *pglob->gl_stat user-supplied |
| 409 | @c fxstatat64 dup ok |
| 410 | @c realloc dup @ascuheap @acsmem |
| 411 | @c pglob->gl_closedir user-supplied |
| 412 | @c closedir @ascuheap @acsmem @acsfd |
| 413 | @c prefix_array dup @asucorrupt @ascuheap @acucorrupt @acsmem |
| 414 | @c strlen dup ok |
| 415 | @c malloc dup @ascuheap @acsmem |
| 416 | @c free dup @ascuheap @acsmem |
| 417 | @c mempcpy dup ok |
| 418 | @c strcpy dup ok |
| 419 | The function @code{glob} does globbing using the pattern @var{pattern} |
| 420 | in the current directory. It puts the result in a newly allocated |
| 421 | vector, and stores the size and address of this vector into |
| 422 | @code{*@var{vector-ptr}}. The argument @var{flags} is a combination of |
| 423 | bit flags; see @ref{Flags for Globbing}, for details of the flags. |
| 424 | |
| 425 | The result of globbing is a sequence of file names. The function |
| 426 | @code{glob} allocates a string for each resulting word, then |
| 427 | allocates a vector of type @code{char **} to store the addresses of |
| 428 | these strings. The last element of the vector is a null pointer. |
| 429 | This vector is called the @dfn{word vector}. |
| 430 | |
| 431 | To return this vector, @code{glob} stores both its address and its |
| 432 | length (number of elements, not counting the terminating null pointer) |
| 433 | into @code{*@var{vector-ptr}}. |
| 434 | |
| 435 | Normally, @code{glob} sorts the file names alphabetically before |
| 436 | returning them. You can turn this off with the flag @code{GLOB_NOSORT} |
| 437 | if you want to get the information as fast as possible. Usually it's |
| 438 | a good idea to let @code{glob} sort them---if you process the files in |
| 439 | alphabetical order, the users will have a feel for the rate of progress |
| 440 | that your application is making. |
| 441 | |
| 442 | If @code{glob} succeeds, it returns 0. Otherwise, it returns one |
| 443 | of these error codes: |
| 444 | |
| 445 | @vtable @code |
| 446 | @comment glob.h |
| 447 | @comment POSIX.2 |
| 448 | @item GLOB_ABORTED |
| 449 | There was an error opening a directory, and you used the flag |
| 450 | @code{GLOB_ERR} or your specified @var{errfunc} returned a nonzero |
| 451 | value. |
| 452 | @iftex |
| 453 | See below |
| 454 | @end iftex |
| 455 | @ifinfo |
| 456 | @xref{Flags for Globbing}, |
| 457 | @end ifinfo |
| 458 | for an explanation of the @code{GLOB_ERR} flag and @var{errfunc}. |
| 459 | |
| 460 | @comment glob.h |
| 461 | @comment POSIX.2 |
| 462 | @item GLOB_NOMATCH |
| 463 | The pattern didn't match any existing files. If you use the |
| 464 | @code{GLOB_NOCHECK} flag, then you never get this error code, because |
| 465 | that flag tells @code{glob} to @emph{pretend} that the pattern matched |
| 466 | at least one file. |
| 467 | |
| 468 | @comment glob.h |
| 469 | @comment POSIX.2 |
| 470 | @item GLOB_NOSPACE |
| 471 | It was impossible to allocate memory to hold the result. |
| 472 | @end vtable |
| 473 | |
| 474 | In the event of an error, @code{glob} stores information in |
| 475 | @code{*@var{vector-ptr}} about all the matches it has found so far. |
| 476 | |
| 477 | It is important to notice that the @code{glob} function will not fail if |
| 478 | it encounters directories or files which cannot be handled without the |
| 479 | LFS interfaces. The implementation of @code{glob} is supposed to use |
| 480 | these functions internally. This, at least, is the assumption made by |
| 481 | the Unix standard. The GNU extension of allowing the user to provide their |
| 482 | own directory handling and @code{stat} functions complicates things a |
| 483 | bit. If these callback functions are used and a large file or directory |
| 484 | is encountered, @code{glob} @emph{can} fail. |
| 485 | @end deftypefun |
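A typical call sequence might look like the following sketch.  The names
@code{report_error} and @code{list_matches} are invented for this
illustration.  The sketch also uses @code{globfree}, which is declared in
@file{glob.h} and releases the storage that @code{glob} allocated.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Error callback: report the directory that could not be read and
   return 0 so that glob keeps going.  */
static int
report_error (const char *filename, int error_code)
{
  fprintf (stderr, "glob: %s: %s\n", filename, strerror (error_code));
  return 0;
}

/* Print every file name matching PATTERN and return how many there
   were: 0 if nothing matched, -1 if glob reported GLOB_ABORTED or
   GLOB_NOSPACE.  */
long
list_matches (const char *pattern)
{
  glob_t results;
  long count;
  int rc = glob (pattern, 0, report_error, &results);

  if (rc == 0)
    {
      /* gl_pathv holds gl_pathc strings followed by a null pointer.  */
      for (size_t i = 0; i < results.gl_pathc; i++)
        puts (results.gl_pathv[i]);
      count = (long) results.gl_pathc;
    }
  else if (rc == GLOB_NOMATCH)
    count = 0;
  else
    count = -1;

  /* Partial results may have been stored even on error.  */
  globfree (&results);
  return count;
}
```

Because the callback returns zero, an unreadable directory is reported but
does not by itself make @code{glob} return @code{GLOB_ABORTED}.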
| 486 | |
| 487 | @comment glob.h |
| 488 | @comment GNU |
| 489 | @deftypefun int glob64 (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob64_t *@var{vector-ptr}) |
| 490 | @safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @asucorrupt{} @ascuheap{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} |
| 491 | @c Same code as glob, but with glob64_t #defined as glob_t. |
| 492 | The @code{glob64} function was added as part of the Large File Summit |
| 493 | extensions but is not part of the original LFS proposal. The reason for |
| 494 | this is simple: the proposal did not need it. The need for a @code{glob64} |
| 495 | function arises from the extensions of the GNU @code{glob} |
| 496 | implementation, which allow the user to provide their own directory handling |
| 497 | and @code{stat} functions. The @code{readdir} and @code{stat} functions |
| 498 | depend on the choice of @code{_FILE_OFFSET_BITS}, since the definitions |
| 499 | of the types @code{struct dirent} and @code{struct stat} change |
| 500 | depending on that choice. |
| 501 | |
| 502 | Apart from this difference, @code{glob64} works just like @code{glob} in |
| 503 | all aspects. |
| 504 | |
| 505 | This function is a GNU extension. |
| 506 | @end deftypefun |
| 507 | |
| 508 | @node Flags for Globbing |
| 509 | @subsection Flags for Globbing |
| 510 | |
| 511 | This section describes the standard flags that you can specify in the |
| 512 | @var{flags} argument to @code{glob}. Choose the flags you want, |
| 513 | and combine them with the C bitwise OR operator @code{|}. |
| 514 | |
| 515 | Note that there are @ref{More Flags for Globbing} available as GNU extensions. |
| 516 | |
| 517 | @vtable @code |
| 518 | @comment glob.h |
| 519 | @comment POSIX.2 |
| 520 | @item GLOB_APPEND |
| 521 | Append the words from this expansion to the vector of words produced by |
| 522 | previous calls to @code{glob}. This way you can effectively expand |
| 523 | several words as if they were concatenated with spaces between them. |
| 524 | |
| 525 | In order for appending to work, you must not modify the contents of the |
| 526 | word vector structure between calls to @code{glob}. And, if you set |
| 527 | @code{GLOB_DOOFFS} in the first call to @code{glob}, you must also |
| 528 | set it when you append to the results. |
| 529 | |
| 530 | Note that the pointer stored in @code{gl_pathv} may no longer be valid |
| 531 | after you call @code{glob} the second time, because @code{glob} might |
| 532 | have relocated the vector. So always fetch @code{gl_pathv} from the |
| 533 | @code{glob_t} structure after each @code{glob} call; @strong{never} save |
| 534 | the pointer across calls. |
| 535 | |
| 536 | @comment glob.h |
| 537 | @comment POSIX.2 |
| 538 | @item GLOB_DOOFFS |
| 539 | Leave blank slots at the beginning of the vector of words. |
| 540 | The @code{gl_offs} field says how many slots to leave. |
| 541 | The blank slots contain null pointers. |
| 542 | |
| 543 | @comment glob.h |
| 544 | @comment POSIX.2 |
| 545 | @item GLOB_ERR |
| 546 | Give up right away and report an error if there is any difficulty |
| 547 | reading the directories that must be read in order to expand @var{pattern} |
| 548 | fully. Such difficulties might include a directory in which you don't |
| 549 | have the requisite access. Normally, @code{glob} tries its best to keep |
| 550 | on going despite any errors, reading whatever directories it can. |
| 551 | |
| 552 | You can exercise even more control than this by specifying an |
| 553 | error-handler function @var{errfunc} when you call @code{glob}. If |
| 554 | @var{errfunc} is not a null pointer, then @code{glob} doesn't give up |
| 555 | right away when it can't read a directory; instead, it calls |
| 556 | @var{errfunc} with two arguments, like this: |
| 557 | |
| 558 | @smallexample |
| 559 | (*@var{errfunc}) (@var{filename}, @var{error-code}) |
| 560 | @end smallexample |
| 561 | |
| 562 | @noindent |
| 563 | The argument @var{filename} is the name of the directory that |
| 564 | @code{glob} couldn't open or couldn't read, and @var{error-code} is the |
| 565 | @code{errno} value that was reported to @code{glob}. |
| 566 | |
| 567 | If the error handler function returns nonzero, then @code{glob} gives up |
| 568 | right away. Otherwise, it continues. |
| 569 | |
| 570 | @comment glob.h |
| 571 | @comment POSIX.2 |
| 572 | @item GLOB_MARK |
| 573 | If the pattern matches the name of a directory, append @samp{/} to the |
| 574 | directory's name when returning it. |
| 575 | |
| 576 | @comment glob.h |
| 577 | @comment POSIX.2 |
| 578 | @item GLOB_NOCHECK |
| 579 | If the pattern doesn't match any file names, return the pattern itself |
| 580 | as if it were a file name that had been matched. (Normally, when the |
| 581 | pattern doesn't match anything, @code{glob} returns that there were no |
| 582 | matches.) |
| 583 | |
| 584 | @comment glob.h |
| 585 | @comment POSIX.2 |
| 586 | @item GLOB_NOESCAPE |
| 587 | Don't treat the @samp{\} character specially in patterns. Normally, |
| 588 | @samp{\} quotes the following character, turning off its special meaning |
| 589 | (if any) so that it matches only itself. When quoting is enabled, the |
| 590 | pattern @samp{\?} matches only the string @samp{?}, because the question |
| 591 | mark in the pattern acts like an ordinary character. |
| 592 | |
| 593 | If you use @code{GLOB_NOESCAPE}, then @samp{\} is an ordinary character. |
| 594 | |
| 595 | @code{glob} does its work by calling the function @code{fnmatch} |
| 596 | repeatedly. It handles the flag @code{GLOB_NOESCAPE} by turning on the |
| 597 | @code{FNM_NOESCAPE} flag in calls to @code{fnmatch}. |
| 598 | |
| 599 | @comment glob.h |
| 600 | @comment POSIX.2 |
| 601 | @item GLOB_NOSORT |
| 602 | Don't sort the file names; return them in no particular order. |
| 603 | (In practice, the order will depend on the order of the entries in |
| 604 | the directory.) The only reason @emph{not} to sort is to save time. |
| 605 | @end vtable |
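The interplay of @code{GLOB_DOOFFS}, @code{GLOB_APPEND} and
@code{GLOB_NOCHECK} can be sketched as follows.  The sketch builds an
argument vector for a hypothetical @code{ls -l} invocation.  The function
name @code{build_ls_argv} and its parameters are assumptions made for this
illustration.

```c
#include <glob.h>
#include <stddef.h>

/* Build an argument vector "ls -l <matches of PAT1> <matches of PAT2>"
   in *G.  Returns 0 on success or the nonzero glob error code.
   GLOB_NOCHECK is used so that a pattern without matches is passed
   through literally.  */
int
build_ls_argv (glob_t *g, const char *pat1, const char *pat2)
{
  int rc;

  g->gl_offs = 2;   /* reserve two leading slots; an input to glob */

  rc = glob (pat1, GLOB_DOOFFS | GLOB_NOCHECK, NULL, g);
  if (rc != 0)
    return rc;

  /* GLOB_DOOFFS must be repeated when appending.  */
  rc = glob (pat2, GLOB_DOOFFS | GLOB_NOCHECK | GLOB_APPEND, NULL, g);
  if (rc != 0)
    return rc;

  /* Fetch gl_pathv only now: the second call may have moved the
     vector.  */
  g->gl_pathv[0] = "ls";
  g->gl_pathv[1] = "-l";
  return 0;
}
```

On success the caller could pass @code{g->gl_pathv} to a function such as
@code{execvp}.  When the vector is no longer needed, or after a failed
call, the caller should release it with @code{globfree}.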
| 606 | |
| 607 | @node More Flags for Globbing |
| 608 | @subsection More Flags for Globbing |
| 609 | |
| 610 | Besides the flags described in the last section, the GNU implementation of |
| 611 | @code{glob} allows a few more flags which are also defined in the |
| 612 | @file{glob.h} file. Some of the extensions implement functionality |
| 613 | which is available in modern shell implementations. |
| 614 | |
| 615 | @vtable @code |
| 616 | @comment glob.h |
| 617 | @comment GNU |
| 618 | @item GLOB_PERIOD |
| 619 | The @code{.} character (period) is treated specially. It cannot be |
| 620 | matched by wildcards. @xref{Wildcard Matching}, @code{FNM_PERIOD}. |
| 621 | |
| 622 | @comment glob.h |
| 623 | @comment GNU |
| 624 | @item GLOB_MAGCHAR |
| 625 | The @code{GLOB_MAGCHAR} value is not to be given to @code{glob} in the |
| 626 | @var{flags} parameter. Instead, @code{glob} sets this bit in the |
| 627 | @code{gl_flags} element of the @code{glob_t} structure provided as the |
| 628 | result if the pattern used for matching contains any wildcard character. |
| 629 | |
| 630 | @comment glob.h |
| 631 | @comment GNU |
| 632 | @item GLOB_ALTDIRFUNC |
| 633 | Instead of using the normal functions for accessing the |
| 634 | filesystem, the @code{glob} implementation uses the user-supplied |
| 635 | functions specified in the structure pointed to by the @var{vector-ptr} |
| 636 | parameter. For more information about these functions, refer to the |
| 637 | sections about directory handling, @ref{Accessing Directories}, and |
| 638 | file attributes, @ref{Reading Attributes}. |
| 639 | |
| 640 | @comment glob.h |
| 641 | @comment GNU |
| 642 | @item GLOB_BRACE |
| 643 | If this flag is given, the handling of braces in the pattern is changed. |
| 644 | Braces must then appear correctly grouped; i.e., for each |
| 645 | opening brace there must be a closing one. Braces can be nested, |
| 646 | so it is possible to define one brace expression inside |
| 647 | another one. It is important to note that the range of each brace |
| 648 | expression is completely contained in the outer brace expression (if |
| 649 | there is one). |
| 650 | |
| 651 | The string between the matching braces is separated into single |
| 652 | expressions by splitting at @code{,} (comma) characters. The commas |
| 653 | themselves are discarded. Because brace expressions can be nested, |
| 654 | the commas used to separate the subexpressions must |
| 655 | be at the same level. Commas inside a nested brace expression are not |
| 656 | split at the outer level; they are used during expansion of the brace |
| 657 | expression of the deeper level. For example, the call |
| 658 | |
| 659 | @smallexample |
| 660 | glob ("@{foo/@{,bar,biz@},baz@}", GLOB_BRACE, NULL, &result) |
| 661 | @end smallexample |
| 662 | |
| 663 | @noindent |
| 664 | is equivalent to the sequence |
| 665 | |
| 666 | @smallexample |
| 667 | glob ("foo/", GLOB_BRACE, NULL, &result) |
| 668 | glob ("foo/bar", GLOB_BRACE|GLOB_APPEND, NULL, &result) |
| 669 | glob ("foo/biz", GLOB_BRACE|GLOB_APPEND, NULL, &result) |
| 670 | glob ("baz", GLOB_BRACE|GLOB_APPEND, NULL, &result) |
| 671 | @end smallexample |
| 672 | |
| 673 | @noindent |
| 674 | if we leave aside error handling. |
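
As a minimal sketch of a complete call, the following C helper runs a
brace pattern through @code{glob} once and reports how many file names
it produced. The name @code{count_brace_matches} is hypothetical, and
the sketch assumes a @file{glob.h} that defines @code{GLOB_BRACE}.

```c
#define _GNU_SOURCE 1  /* GLOB_BRACE is a GNU extension.  */
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: expand PATTERN with GLOB_BRACE and return the
   number of matching file names, 0 if nothing matches, or -1 if glob
   reports any other error.  */
static long
count_brace_matches (const char *pattern)
{
  glob_t result;
  long count;
  int rc = glob (pattern, GLOB_BRACE, NULL, &result);

  if (rc == GLOB_NOMATCH)
    return 0;
  if (rc != 0)
    return -1;

  count = (long) result.gl_pathc;
  globfree (&result);
  return count;
}
```

Because every alternative is expanded by the same call, the results of
all alternatives end up in one @code{glob_t} object and are released by
a single @code{globfree}.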
| 675 | |
| 676 | @comment glob.h |
| 677 | @comment GNU |
| 678 | @item GLOB_NOMAGIC |
| 679 | If the pattern contains no wildcard constructs (it is a literal file name), |
| 680 | return it as the sole ``matching'' word, even if no file exists by that name. |
| 681 | |
| 682 | @comment glob.h |
| 683 | @comment GNU |
| 684 | @item GLOB_TILDE |
| 685 | If this flag is used the character @code{~} (tilde) is handled specially |
| 686 | if it appears at the beginning of the pattern. Instead of being taken |
| 687 | verbatim it is used to represent the home directory of a known user. |
| 688 | |
| 689 | If @code{~} is the only character in the pattern or it is followed by a |
| 690 | @code{/} (slash), the home directory of the process owner is |
| 691 | substituted. Using @code{getlogin} and @code{getpwnam} the information |
| 692 | is read from the system databases. As an example take user @code{bart} |
| 693 | with his home directory at @file{/home/bart}. For him a call like |
| 694 | |
| 695 | @smallexample |
| 696 | glob ("~/bin/*", GLOB_TILDE, NULL, &result) |
| 697 | @end smallexample |
| 698 | |
| 699 | @noindent |
| 700 | would return the contents of the directory @file{/home/bart/bin}. |
| 701 | Instead of referring to one's own home directory it is also possible to |
| 702 | name the home directory of other users. To do so one has to append the |
| 703 | user name after the tilde character. So the contents of user |
| 704 | @code{homer}'s @file{bin} directory can be retrieved by |
| 705 | |
| 706 | @smallexample |
| 707 | glob ("~homer/bin/*", GLOB_TILDE, NULL, &result) |
| 708 | @end smallexample |
| 709 | |
| 710 | If the user name is not valid or the home directory cannot be determined |
| 711 | for some reason the pattern is left untouched and itself used as the |
| 712 | result. I.e., if in the last example @code{homer} is not available the |
| 713 | tilde expansion yields @code{"~homer/bin/*"} and @code{glob} is not |
| 714 | looking for a directory named @code{~homer}. |
| 715 | |
| 716 | This functionality is equivalent to what is available in C-shells if the |
| 717 | @code{nonomatch} flag is set. |
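
The following C helper is a minimal sketch of a tilde-expanding lookup.
The name @code{first_tilde_match} is hypothetical, and the sketch
assumes a @file{glob.h} that defines @code{GLOB_TILDE}. It returns only
the first match, which is enough to see what a pattern such as
@code{"~"} expands to.

```c
#define _GNU_SOURCE 1  /* GLOB_TILDE is a GNU extension.  */
#include <glob.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the first file name
   matching PATTERN after tilde expansion, or NULL if glob fails or
   memory runs out.  The caller frees the result.  */
static char *
first_tilde_match (const char *pattern)
{
  glob_t result;
  char *copy = NULL;

  if (glob (pattern, GLOB_TILDE, NULL, &result) != 0)
    return NULL;

  if (result.gl_pathc > 0)
    {
      size_t len = strlen (result.gl_pathv[0]) + 1;

      copy = malloc (len);
      if (copy != NULL)
        memcpy (copy, result.gl_pathv[0], len);
    }

  globfree (&result);
  return copy;
}
```

The copy is made before @code{globfree} is called because the strings in
@code{gl_pathv} belong to the @code{glob_t} object.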
| 718 | |
| 719 | @comment glob.h |
| 720 | @comment GNU |
| 721 | @item GLOB_TILDE_CHECK |
| 722 | If this flag is used @code{glob} behaves as if @code{GLOB_TILDE} were |
| 723 | given. The only difference is that if the user name is not available or |
| 724 | the home directory cannot be determined for other reasons this leads to |
| 725 | an error. @code{glob} will return @code{GLOB_NOMATCH} instead of using |
| 726 | the pattern itself as the name. |
| 727 | |
| 728 | This functionality is equivalent to what is available in C-shells if |
| 729 | the @code{nonomatch} flag is not set. |
| 730 | |
| 731 | @comment glob.h |
| 732 | @comment GNU |
| 733 | @item GLOB_ONLYDIR |
| 734 | If this flag is used the globbing function takes this as a |
| 735 | @strong{hint} that the caller is only interested in directories |
| 736 | matching the pattern. If the information about the type of the file |
| 737 | is easily available non-directories will be rejected but no extra |
| 738 | work will be done to determine the information for each file. I.e., |
| 739 | the caller must still be able to filter out non-directories itself. |
| 740 | |
| 741 | This functionality is only available with the GNU @code{glob} |
| 742 | implementation. It is mainly used internally to increase the |
| 743 | performance but might be useful for a user as well and therefore is |
| 744 | documented here. |
| 745 | @end vtable |
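
Since @code{GLOB_ONLYDIR} is only a hint, a caller that needs
directories has to check each result itself. The following C helper is a
minimal sketch of that pattern; the name @code{count_matching_dirs} is
hypothetical, and the sketch assumes a @file{glob.h} that defines
@code{GLOB_ONLYDIR}.

```c
#define _GNU_SOURCE 1  /* GLOB_ONLYDIR is a GNU extension.  */
#include <glob.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: count the directories matching PATTERN.
   GLOB_ONLYDIR is only a hint, so every result is checked again with
   stat.  Returns 0 if nothing matches and -1 on any other glob
   error.  */
static long
count_matching_dirs (const char *pattern)
{
  glob_t result;
  long count = 0;
  size_t i;
  int rc = glob (pattern, GLOB_ONLYDIR, NULL, &result);

  if (rc == GLOB_NOMATCH)
    return 0;
  if (rc != 0)
    return -1;

  for (i = 0; i < result.gl_pathc; i++)
    {
      struct stat st;

      if (stat (result.gl_pathv[i], &st) == 0 && S_ISDIR (st.st_mode))
        count++;
    }

  globfree (&result);
  return count;
}
```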
| 746 | |
| 747 | Calling @code{glob} will in most cases allocate resources which are used |
| 748 | to represent the result of the function call. If the same object of |
| 749 | type @code{glob_t} is used in multiple calls to @code{glob} the resources |
| 750 | are freed or reused so that no leaks appear. But the resources still |
| 751 | held once all @code{glob} calls are done must be released explicitly. |
| 752 | |
| 753 | @comment glob.h |
| 754 | @comment POSIX.2 |
| 755 | @deftypefun void globfree (glob_t *@var{pglob}) |
| 756 | @safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} |
| 757 | @c globfree dup @asucorrupt @ascuheap @acucorrupt @acsmem |
| 758 | @c free dup @ascuheap @acsmem |
| 759 | The @code{globfree} function frees all resources allocated by previous |
| 760 | calls to @code{glob} associated with the object pointed to by |
| 761 | @var{pglob}. This function should be called whenever the |
| 762 | @code{glob_t} object currently in use is no longer needed. |
| 763 | @end deftypefun |
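
The following C helper is a minimal sketch of the usual life cycle:
several @code{glob} calls accumulate results in one @code{glob_t} object
by means of @code{GLOB_APPEND}, and a single @code{globfree} releases
everything at the end. The name @code{count_all_matches} is
hypothetical.

```c
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: collect the matches of N patterns in a single
   glob_t object, free it once, and return the total number of matches,
   or -1 if glob reports an error other than GLOB_NOMATCH.  */
static long
count_all_matches (const char *const *patterns, size_t n)
{
  glob_t result;
  int have_result = 0;
  long total;
  size_t i;

  for (i = 0; i < n; i++)
    {
      /* Only append once an earlier call has filled RESULT.  */
      int flags = have_result ? GLOB_APPEND : 0;
      int rc = glob (patterns[i], flags, NULL, &result);

      if (rc == 0)
        have_result = 1;
      else if (rc != GLOB_NOMATCH)
        {
          if (have_result)
            globfree (&result);
          return -1;
        }
    }

  if (!have_result)
    return 0;

  total = (long) result.gl_pathc;
  globfree (&result);
  return total;
}
```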
| 764 | |
| 765 | @comment glob.h |
| 766 | @comment GNU |
| 767 | @deftypefun void globfree64 (glob64_t *@var{pglob}) |
| 768 | @safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} |
| 769 | This function is equivalent to @code{globfree} but it frees records of |
| 770 | type @code{glob64_t} which were allocated by @code{glob64}. |
| 771 | @end deftypefun |
| 772 | |
| 773 | |
| 774 | @node Regular Expressions |
| 775 | @section Regular Expression Matching |
| 776 | |
| 777 | @Theglibc{} supports two interfaces for matching regular |
| 778 | expressions. One is the standard POSIX.2 interface, and the other is |
| 779 | what @theglibc{} has had for many years. |
| 780 | |
| 781 | Both interfaces are declared in the header file @file{regex.h}. |
| 782 | If you define @w{@code{_POSIX_C_SOURCE}}, then only the POSIX.2 |
| 783 | functions, structures, and constants are declared. |
| 784 | @c !!! we only document the POSIX.2 interface here!! |
| 785 | |
| 786 | @menu |
| 787 | * POSIX Regexp Compilation:: Using @code{regcomp} to prepare to match. |
| 788 | * Flags for POSIX Regexps:: Syntax variations for @code{regcomp}. |
| 789 | * Matching POSIX Regexps:: Using @code{regexec} to match the compiled |
| 790 | pattern that you get from @code{regcomp}. |
| 791 | * Regexp Subexpressions:: Finding which parts of the string were matched. |
| 792 | * Subexpression Complications:: Fine points of which parts were matched. |
| 793 | * Regexp Cleanup:: Freeing storage; reporting errors. |
| 794 | @end menu |
| 795 | |
| 796 | @node POSIX Regexp Compilation |
| 797 | @subsection POSIX Regular Expression Compilation |
| 798 | |
| 799 | Before you can actually match a regular expression, you must |
| 800 | @dfn{compile} it. This is not true compilation---it produces a special |
| 801 | data structure, not machine instructions. But it is like ordinary |
| 802 | compilation in that its purpose is to enable you to ``execute'' the |
| 803 | pattern fast. (@xref{Matching POSIX Regexps}, for how to use the |
| 804 | compiled regular expression for matching.) |
| 805 | |
| 806 | There is a special data type for compiled regular expressions: |
| 807 | |
| 808 | @comment regex.h |
| 809 | @comment POSIX.2 |
| 810 | @deftp {Data Type} regex_t |
| 811 | This type of object holds a compiled regular expression. |
| 812 | It is actually a structure. It has just one field that your programs |
| 813 | should look at: |
| 814 | |
| 815 | @table @code |
| 816 | @item re_nsub |
| 817 | This field holds the number of parenthetical subexpressions in the |
| 818 | regular expression that was compiled. |
| 819 | @end table |
| 820 | |
| 821 | There are several other fields, but we don't describe them here, because |
| 822 | only the functions in the library should use them. |
| 823 | @end deftp |
| 824 | |
| 825 | After you create a @code{regex_t} object, you can compile a regular |
| 826 | expression into it by calling @code{regcomp}. |
| 827 | |
| 828 | @comment regex.h |
| 829 | @comment POSIX.2 |
| 830 | @deftypefun int regcomp (regex_t *restrict @var{compiled}, const char *restrict @var{pattern}, int @var{cflags}) |
| 831 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} |
| 832 | @c All of the issues have to do with memory allocation and multi-byte |
| 833 | @c character handling present in the input string, or implied by ranges |
| 834 | @c or inverted character classes. |
| 835 | @c (re_)malloc @ascuheap @acsmem |
| 836 | @c re_compile_internal @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 837 | @c (re_)realloc @ascuheap @acsmem [no @asucorrupt @acucorrupt for we zero the buffer] |
| 838 | @c init_dfa @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 839 | @c (re_)malloc @ascuheap @acsmem |
| 840 | @c calloc @ascuheap @acsmem |
| 841 | @c _NL_CURRENT ok |
| 842 | @c _NL_CURRENT_WORD ok |
| 843 | @c btowc @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 844 | @c libc_lock_init ok |
| 845 | @c re_string_construct @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 846 | @c re_string_construct_common ok |
| 847 | @c re_string_realloc_buffers @ascuheap @acsmem |
| 848 | @c (re_)realloc dup @ascuheap @acsmem |
| 849 | @c build_wcs_upper_buffer @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 850 | @c isascii ok |
| 851 | @c mbsinit ok |
| 852 | @c toupper ok |
| 853 | @c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 854 | @c iswlower @mtslocale |
| 855 | @c towupper @mtslocale |
| 856 | @c wcrtomb dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 857 | @c (re_)malloc dup @ascuheap @acsmem |
| 858 | @c build_upper_buffer ok (@mtslocale but optimized) |
| 859 | @c islower ok |
| 860 | @c toupper ok |
| 861 | @c build_wcs_buffer @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 862 | @c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 863 | @c re_string_translate_buffer ok |
| 864 | @c parse @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 865 | @c fetch_token @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 866 | @c peek_token @mtslocale |
| 867 | @c re_string_eoi ok |
| 868 | @c re_string_peek_byte ok |
| 869 | @c re_string_cur_idx ok |
| 870 | @c re_string_length ok |
| 871 | @c re_string_peek_byte_case @mtslocale |
| 872 | @c re_string_peek_byte dup ok |
| 873 | @c re_string_is_single_byte_char ok |
| 874 | @c isascii ok |
| 875 | @c re_string_peek_byte dup ok |
| 876 | @c re_string_wchar_at ok |
| 877 | @c re_string_skip_bytes ok |
| 878 | @c re_string_skip_bytes dup ok |
| 879 | @c parse_reg_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 880 | @c parse_branch @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 881 | @c parse_expression @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 882 | @c create_token_tree dup @ascuheap @acsmem |
| 883 | @c re_string_eoi dup ok |
| 884 | @c re_string_first_byte ok |
| 885 | @c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 886 | @c create_tree dup @ascuheap @acsmem |
| 887 | @c parse_sub_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 888 | @c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 889 | @c parse_reg_exp dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 890 | @c postorder() @ascuheap @acsmem |
| 891 | @c free_tree @ascuheap @acsmem |
| 892 | @c free_token dup @ascuheap @acsmem |
| 893 | @c create_tree dup @ascuheap @acsmem |
| 894 | @c parse_bracket_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 895 | @c _NL_CURRENT dup ok |
| 896 | @c _NL_CURRENT_WORD dup ok |
| 897 | @c calloc dup @ascuheap @acsmem |
| 898 | @c (re_)free dup @ascuheap @acsmem |
| 899 | @c peek_token_bracket ok |
| 900 | @c re_string_eoi dup ok |
| 901 | @c re_string_peek_byte dup ok |
| 902 | @c re_string_first_byte dup ok |
| 903 | @c re_string_cur_idx dup ok |
| 904 | @c re_string_length dup ok |
| 905 | @c re_string_skip_bytes dup ok |
| 906 | @c bitset_set ok |
| 907 | @c re_string_skip_bytes ok |
| 908 | @c parse_bracket_element @mtslocale |
| 909 | @c re_string_char_size_at ok |
| 910 | @c re_string_wchar_at dup ok |
| 911 | @c re_string_skip_bytes dup ok |
| 912 | @c parse_bracket_symbol @mtslocale |
| 913 | @c re_string_eoi dup ok |
| 914 | @c re_string_fetch_byte_case @mtslocale |
| 915 | @c re_string_fetch_byte ok |
| 916 | @c re_string_first_byte dup ok |
| 917 | @c isascii ok |
| 918 | @c re_string_char_size_at dup ok |
| 919 | @c re_string_skip_bytes dup ok |
| 920 | @c re_string_fetch_byte dup ok |
| 921 | @c re_string_peek_byte dup ok |
| 922 | @c re_string_skip_bytes dup ok |
| 923 | @c peek_token_bracket dup ok |
| 924 | @c auto build_range_exp @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 925 | @c auto lookup_collation_sequence_value @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 926 | @c btowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 927 | @c collseq_table_lookup ok |
| 928 | @c auto seek_collating_symbol_entry dup ok |
| 929 | @c (re_)realloc dup @ascuheap @acsmem |
| 930 | @c collseq_table_lookup dup ok |
| 931 | @c bitset_set dup ok |
| 932 | @c (re_)realloc dup @ascuheap @acsmem |
| 933 | @c build_equiv_class @mtslocale @ascuheap @acsmem |
| 934 | @c _NL_CURRENT ok |
| 935 | @c auto findidx ok |
| 936 | @c bitset_set dup ok |
| 937 | @c (re_)realloc dup @ascuheap @acsmem |
| 938 | @c auto build_collating_symbol @ascuheap @acsmem |
| 939 | @c auto seek_collating_symbol_entry ok |
| 940 | @c bitset_set dup ok |
| 941 | @c (re_)realloc dup @ascuheap @acsmem |
| 942 | @c build_charclass @mtslocale @ascuheap @acsmem |
| 943 | @c (re_)realloc dup @ascuheap @acsmem |
| 944 | @c bitset_set dup ok |
| 945 | @c isalnum ok |
| 946 | @c iscntrl ok |
| 947 | @c isspace ok |
| 948 | @c isalpha ok |
| 949 | @c isdigit ok |
| 950 | @c isprint ok |
| 951 | @c isupper ok |
| 952 | @c isblank ok |
| 953 | @c isgraph ok |
| 954 | @c ispunct ok |
| 955 | @c isxdigit ok |
| 956 | @c bitset_not ok |
| 957 | @c bitset_mask ok |
| 958 | @c create_token_tree dup @ascuheap @acsmem |
| 959 | @c create_tree dup @ascuheap @acsmem |
| 960 | @c free_charset dup @ascuheap @acsmem |
| 961 | @c init_word_char @mtslocale |
| 962 | @c isalnum ok |
| 963 | @c build_charclass_op @mtslocale @ascuheap @acsmem |
| 964 | @c calloc dup @ascuheap @acsmem |
| 965 | @c build_charclass dup @mtslocale @ascuheap @acsmem |
| 966 | @c (re_)free dup @ascuheap @acsmem |
| 967 | @c free_charset dup @ascuheap @acsmem |
| 968 | @c bitset_set dup ok |
| 969 | @c bitset_not dup ok |
| 970 | @c bitset_mask dup ok |
| 971 | @c create_token_tree dup @ascuheap @acsmem |
| 972 | @c create_tree dup @ascuheap @acsmem |
| 973 | @c parse_dup_op @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 974 | @c re_string_cur_idx dup ok |
| 975 | @c fetch_number @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 976 | @c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 977 | @c re_string_set_index ok |
| 978 | @c postorder() @ascuheap @acsmem |
| 979 | @c free_tree dup @ascuheap @acsmem |
| 980 | @c mark_opt_subexp ok |
| 981 | @c duplicate_tree @ascuheap @acsmem |
| 982 | @c create_token_tree dup @ascuheap @acsmem |
| 983 | @c create_tree dup @ascuheap @acsmem |
| 984 | @c postorder() @ascuheap @acsmem |
| 985 | @c free_tree dup @ascuheap @acsmem |
| 986 | @c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 987 | @c parse_branch dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 988 | @c create_tree dup @ascuheap @acsmem |
| 989 | @c create_tree @ascuheap @acsmem |
| 990 | @c create_token_tree @ascuheap @acsmem |
| 991 | @c (re_)malloc dup @ascuheap @acsmem |
| 992 | @c analyze @ascuheap @acsmem |
| 993 | @c (re_)malloc dup @ascuheap @acsmem |
| 994 | @c preorder() @ascuheap @acsmem |
| 995 | @c optimize_subexps ok |
| 996 | @c calc_next ok |
| 997 | @c link_nfa_nodes @ascuheap @acsmem |
| 998 | @c re_node_set_init_1 @ascuheap @acsmem |
| 999 | @c (re_)malloc dup @ascuheap @acsmem |
| 1000 | @c re_node_set_init_2 @ascuheap @acsmem |
| 1001 | @c (re_)malloc dup @ascuheap @acsmem |
| 1002 | @c postorder() @ascuheap @acsmem |
| 1003 | @c lower_subexps @ascuheap @acsmem |
| 1004 | @c lower_subexp @ascuheap @acsmem |
| 1005 | @c create_tree dup @ascuheap @acsmem |
| 1006 | @c calc_first @ascuheap @acsmem |
| 1007 | @c re_dfa_add_node @ascuheap @acsmem |
| 1008 | @c (re_)realloc dup @ascuheap @acsmem |
| 1009 | @c re_node_set_init_empty ok |
| 1010 | @c calc_eclosure @ascuheap @acsmem |
| 1011 | @c calc_eclosure_iter @ascuheap @acsmem |
| 1012 | @c re_node_set_alloc @ascuheap @acsmem |
| 1013 | @c (re_)malloc dup @ascuheap @acsmem |
| 1014 | @c duplicate_node_closure @ascuheap @acsmem |
| 1015 | @c re_node_set_empty ok |
| 1016 | @c duplicate_node @ascuheap @acsmem |
| 1017 | @c re_dfa_add_node dup @ascuheap @acsmem |
| 1018 | @c re_node_set_insert @ascuheap @acsmem |
| 1019 | @c (re_)realloc dup @ascuheap @acsmem |
| 1020 | @c search_duplicated_node ok |
| 1021 | @c re_node_set_merge @ascuheap @acsmem |
| 1022 | @c (re_)realloc dup @ascuheap @acsmem |
| 1023 | @c re_node_set_free @ascuheap @acsmem |
| 1024 | @c (re_)free dup @ascuheap @acsmem |
| 1025 | @c re_node_set_insert dup @ascuheap @acsmem |
| 1026 | @c re_node_set_free dup @ascuheap @acsmem |
| 1027 | @c calc_inveclosure @ascuheap @acsmem |
| 1028 | @c re_node_set_init_empty dup ok |
| 1029 | @c re_node_set_insert_last @ascuheap @acsmem |
| 1030 | @c (re_)realloc dup @ascuheap @acsmem |
| 1031 | @c optimize_utf8 ok |
| 1032 | @c create_initial_state @ascuheap @acsmem |
| 1033 | @c re_node_set_init_copy @ascuheap @acsmem |
| 1034 | @c (re_)malloc dup @ascuheap @acsmem |
| 1035 | @c re_node_set_init_empty dup ok |
| 1036 | @c re_node_set_contains ok |
| 1037 | @c re_node_set_merge dup @ascuheap @acsmem |
| 1038 | @c re_acquire_state_context @ascuheap @acsmem |
| 1039 | @c calc_state_hash ok |
| 1040 | @c re_node_set_compare ok |
| 1041 | @c create_cd_newstate @ascuheap @acsmem |
| 1042 | @c calloc dup @ascuheap @acsmem |
| 1043 | @c re_node_set_init_copy dup @ascuheap @acsmem |
| 1044 | @c (re_)free dup @ascuheap @acsmem |
| 1045 | @c free_state @ascuheap @acsmem |
| 1046 | @c re_node_set_free dup @ascuheap @acsmem |
| 1047 | @c (re_)free dup @ascuheap @acsmem |
| 1048 | @c NOT_SATISFY_PREV_CONSTRAINT ok |
| 1049 | @c re_node_set_remove_at ok |
| 1050 | @c register_state @ascuheap @acsmem |
| 1051 | @c re_node_set_alloc dup @ascuheap @acsmem |
| 1052 | @c re_node_set_insert_last dup @ascuheap @acsmem |
| 1053 | @c (re_)realloc dup @ascuheap @acsmem |
| 1054 | @c re_node_set_free dup @ascuheap @acsmem |
| 1055 | @c free_workarea_compile @ascuheap @acsmem |
| 1056 | @c (re_)free dup @ascuheap @acsmem |
| 1057 | @c re_string_destruct @ascuheap @acsmem |
| 1058 | @c (re_)free dup @ascuheap @acsmem |
| 1059 | @c free_dfa_content @ascuheap @acsmem |
| 1060 | @c free_token @ascuheap @acsmem |
| 1061 | @c free_charset @ascuheap @acsmem |
| 1062 | @c (re_)free dup @ascuheap @acsmem |
| 1063 | @c (re_)free dup @ascuheap @acsmem |
| 1064 | @c (re_)free dup @ascuheap @acsmem |
| 1065 | @c re_node_set_free dup @ascuheap @acsmem |
| 1066 | @c re_compile_fastmap @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1067 | @c re_compile_fastmap_iter @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1068 | @c re_set_fastmap ok |
| 1069 | @c tolower ok |
| 1070 | @c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1071 | @c wcrtomb dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1072 | @c towlower @mtslocale |
| 1073 | @c _NL_CURRENT ok |
| 1074 | @c (re_)free @ascuheap @acsmem |
| 1075 | The function @code{regcomp} ``compiles'' a regular expression into a |
| 1076 | data structure that you can use with @code{regexec} to match against a |
| 1077 | string. The compiled regular expression format is designed for |
| 1078 | efficient matching. @code{regcomp} stores it into @code{*@var{compiled}}. |
| 1079 | |
| 1080 | It's up to you to allocate an object of type @code{regex_t} and pass its |
| 1081 | address to @code{regcomp}. |
| 1082 | |
| 1083 | The argument @var{cflags} lets you specify various options that control |
| 1084 | the syntax and semantics of regular expressions. @xref{Flags for POSIX |
| 1085 | Regexps}. |
| 1086 | |
| 1087 | If you use the flag @code{REG_NOSUB}, then @code{regcomp} omits from |
| 1088 | the compiled regular expression the information necessary to record |
| 1089 | how subexpressions actually match. In this case, you might as well |
| 1090 | pass @code{0} for the @var{matchptr} and @var{nmatch} arguments when |
| 1091 | you call @code{regexec}. |
| 1092 | |
| 1093 | If you don't use @code{REG_NOSUB}, then the compiled regular expression |
| 1094 | does have the capacity to record how subexpressions match. Also, |
| 1095 | @code{regcomp} tells you how many subexpressions @var{pattern} has, by |
| 1096 | storing the number in @code{@var{compiled}->re_nsub}. You can use that |
| 1097 | value to decide how long an array to allocate to hold information about |
| 1098 | subexpression matches. |
| 1099 | |
| 1100 | @code{regcomp} returns @code{0} if it succeeds in compiling the regular |
| 1101 | expression; otherwise, it returns a nonzero error code (see the table |
| 1102 | below). You can use @code{regerror} to produce an error message string |
| 1103 | describing the reason for a nonzero value; see @ref{Regexp Cleanup}. |
| 1104 | |
| 1105 | @end deftypefun |
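
As a minimal sketch of the compile step, the following C helper compiles
a pattern as an extended regular expression, reads @code{re_nsub}, and
releases the compiled pattern again with @code{regfree}. The name
@code{count_subexpressions} is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN as an extended regular
   expression and return its number of parenthetical subexpressions,
   or -1 if regcomp rejects the pattern.  */
static long
count_subexpressions (const char *pattern)
{
  regex_t compiled;
  long nsub;

  if (regcomp (&compiled, pattern, REG_EXTENDED) != 0)
    return -1;

  nsub = (long) compiled.re_nsub;
  regfree (&compiled);
  return nsub;
}
```

A caller that wants the offsets of the whole match and of every
subexpression would allocate @code{re_nsub + 1} elements of type
@code{regmatch_t} before calling @code{regexec}.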
| 1106 | |
| 1107 | Here are the possible nonzero values that @code{regcomp} can return: |
| 1108 | |
| 1109 | @table @code |
| 1110 | @comment regex.h |
| 1111 | @comment POSIX.2 |
| 1112 | @item REG_BADBR |
| 1113 | There was an invalid @samp{\@{@dots{}\@}} construct in the regular |
| 1114 | expression. A valid @samp{\@{@dots{}\@}} construct must contain either |
| 1115 | a single number, or two numbers in increasing order separated by a |
| 1116 | comma. |
| 1117 | |
| 1118 | @comment regex.h |
| 1119 | @comment POSIX.2 |
| 1120 | @item REG_BADPAT |
| 1121 | There was a syntax error in the regular expression. |
| 1122 | |
| 1123 | @comment regex.h |
| 1124 | @comment POSIX.2 |
| 1125 | @item REG_BADRPT |
| 1126 | A repetition operator such as @samp{?} or @samp{*} appeared in a bad |
| 1127 | position (with no preceding subexpression to act on). |
| 1128 | |
| 1129 | @comment regex.h |
| 1130 | @comment POSIX.2 |
| 1131 | @item REG_ECOLLATE |
| 1132 | The regular expression referred to an invalid collating element (one not |
| 1133 | defined in the current locale for string collation). @xref{Locale |
| 1134 | Categories}. |
| 1135 | |
| 1136 | @comment regex.h |
| 1137 | @comment POSIX.2 |
| 1138 | @item REG_ECTYPE |
| 1139 | The regular expression referred to an invalid character class name. |
| 1140 | |
| 1141 | @comment regex.h |
| 1142 | @comment POSIX.2 |
| 1143 | @item REG_EESCAPE |
| 1144 | The regular expression ended with @samp{\}. |
| 1145 | |
| 1146 | @comment regex.h |
| 1147 | @comment POSIX.2 |
| 1148 | @item REG_ESUBREG |
| 1149 | There was an invalid number in the @samp{\@var{digit}} construct. |
| 1150 | |
| 1151 | @comment regex.h |
| 1152 | @comment POSIX.2 |
| 1153 | @item REG_EBRACK |
| 1154 | There were unbalanced square brackets in the regular expression. |
| 1155 | |
| 1156 | @comment regex.h |
| 1157 | @comment POSIX.2 |
| 1158 | @item REG_EPAREN |
| 1159 | An extended regular expression had unbalanced parentheses, |
| 1160 | or a basic regular expression had unbalanced @samp{\(} and @samp{\)}. |
| 1161 | |
| 1162 | @comment regex.h |
| 1163 | @comment POSIX.2 |
| 1164 | @item REG_EBRACE |
| 1165 | The regular expression had unbalanced @samp{\@{} and @samp{\@}}. |
| 1166 | |
| 1167 | @comment regex.h |
| 1168 | @comment POSIX.2 |
| 1169 | @item REG_ERANGE |
| 1170 | One of the endpoints in a range expression was invalid. |
| 1171 | |
| 1172 | @comment regex.h |
| 1173 | @comment POSIX.2 |
| 1174 | @item REG_ESPACE |
| 1175 | @code{regcomp} ran out of memory. |
| 1176 | @end table |
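
The following C helper is a minimal sketch of how these error codes are
usually reported: it compiles a basic regular expression and, if that
fails, asks @code{regerror} for the corresponding message. The name
@code{describe_regcomp_error} is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: try to compile PATTERN as a basic regular
   expression.  On success store an empty string in BUF and return 0;
   on failure store the message from regerror in BUF and return the
   error code from regcomp.  */
static int
describe_regcomp_error (const char *pattern, char *buf, size_t buflen)
{
  regex_t compiled;
  int rc = regcomp (&compiled, pattern, 0);

  if (rc == 0)
    {
      regfree (&compiled);
      if (buflen > 0)
        buf[0] = '\0';
      return 0;
    }

  regerror (rc, &compiled, buf, buflen);
  return rc;
}
```

The text of the message depends on the implementation and the locale,
so programs should compare the returned code, not the message.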
| 1177 | |
| 1178 | @node Flags for POSIX Regexps |
| 1179 | @subsection Flags for POSIX Regular Expressions |
| 1180 | |
| 1181 | These are the bit flags that you can use in the @var{cflags} operand when |
| 1182 | compiling a regular expression with @code{regcomp}. |
| 1183 | |
| 1184 | @table @code |
| 1185 | @comment regex.h |
| 1186 | @comment POSIX.2 |
| 1187 | @item REG_EXTENDED |
| 1188 | Treat the pattern as an extended regular expression, rather than as a |
| 1189 | basic regular expression. |
| 1190 | |
| 1191 | @comment regex.h |
| 1192 | @comment POSIX.2 |
| 1193 | @item REG_ICASE |
| 1194 | Ignore case when matching letters. |
| 1195 | |
| 1196 | @comment regex.h |
| 1197 | @comment POSIX.2 |
| 1198 | @item REG_NOSUB |
| 1199 | Don't bother storing the contents of the @var{matchptr} array. |
| 1200 | |
| 1201 | @comment regex.h |
| 1202 | @comment POSIX.2 |
| 1203 | @item REG_NEWLINE |
| 1204 | Treat a newline in @var{string} as dividing @var{string} into multiple |
| 1205 | lines, so that @samp{$} can match before the newline and @samp{^} can |
| 1206 | match after. Also, don't permit @samp{.} to match a newline, and don't |
| 1207 | permit @samp{[^@dots{}]} to match a newline. |
| 1208 | |
| 1209 | Otherwise, newline acts like any other ordinary character. |
| 1210 | @end table |
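
The effect of these flags can be tried out with a small C helper such as
the following sketch. The name @code{matches_with_flags} is
hypothetical. The helper always adds @code{REG_NOSUB}, because it only
reports whether there is a match.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN, compiled with CFLAGS,
   matches somewhere in STRING, 0 if it does not, and -1 if the pattern
   does not compile.  */
static int
matches_with_flags (const char *pattern, const char *string, int cflags)
{
  regex_t compiled;
  int rc;

  if (regcomp (&compiled, pattern, cflags | REG_NOSUB) != 0)
    return -1;

  rc = regexec (&compiled, string, 0, NULL, 0);
  regfree (&compiled);
  return rc == 0;
}
```

For example, the pattern @code{"^b$"} does not match the string
@code{"a\nb\nc"} when @var{cflags} is zero, but it does match when
@code{REG_NEWLINE} is given, because the newlines then delimit lines.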
| 1211 | |
| 1212 | @node Matching POSIX Regexps |
| 1213 | @subsection Matching a Compiled POSIX Regular Expression |
| 1214 | |
| 1215 | Once you have compiled a regular expression, as described in @ref{POSIX |
| 1216 | Regexp Compilation}, you can match it against strings using |
| 1217 | @code{regexec}. A match anywhere inside the string counts as success, |
| 1218 | unless the regular expression contains anchor characters (@samp{^} or |
| 1219 | @samp{$}). |
| 1220 | |
| 1221 | @comment regex.h |
| 1222 | @comment POSIX.2 |
| 1223 | @deftypefun int regexec (const regex_t *restrict @var{compiled}, const char *restrict @var{string}, size_t @var{nmatch}, regmatch_t @var{matchptr}[restrict], int @var{eflags}) |
| 1224 | @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} |
| 1225 | @c libc_lock_lock @asulock @aculock |
| 1226 | @c re_search_internal @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1227 | @c re_string_allocate @ascuheap @acsmem |
| 1228 | @c re_string_construct_common dup ok |
| 1229 | @c re_string_realloc_buffers dup @ascuheap @acsmem |
| 1230 | @c match_ctx_init @ascuheap @acsmem |
| 1231 | @c (re_)malloc dup @ascuheap @acsmem |
| 1232 | @c re_string_byte_at ok |
| 1233 | @c re_string_first_byte dup ok |
| 1234 | @c check_matching @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1235 | @c re_string_cur_idx dup ok |
| 1236 | @c acquire_init_state_context dup @ascuheap @acsmem |
| 1237 | @c re_string_context_at ok |
| 1238 | @c re_string_byte_at dup ok |
| 1239 | @c bitset_contain ok |
| 1240 | @c re_acquire_state_context dup @ascuheap @acsmem |
| 1241 | @c check_subexp_matching_top @ascuheap @acsmem |
| 1242 | @c match_ctx_add_subtop @ascuheap @acsmem |
| 1243 | @c (re_)realloc dup @ascuheap @acsmem |
| 1244 | @c calloc dup @ascuheap @acsmem |
| 1245 | @c transit_state_bkref @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1246 | @c re_string_cur_idx dup ok |
| 1247 | @c re_string_context_at dup ok |
| 1248 | @c NOT_SATISFY_NEXT_CONSTRAINT ok |
| 1249 | @c get_subexp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1250 | @c re_string_get_buffer ok |
| 1251 | @c search_cur_bkref_entry ok |
| 1252 | @c clean_state_log_if_needed @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1253 | @c extend_buffers @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1254 | @c re_string_realloc_buffers dup @ascuheap @acsmem |
| 1255 | @c (re_)realloc dup @ascuheap @acsmem |
| 1256 | @c build_wcs_upper_buffer dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1257 | @c build_upper_buffer dup ok (@mtslocale but optimized) |
| 1258 | @c build_wcs_buffer dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1259 | @c re_string_translate_buffer dup ok |
| 1260 | @c get_subexp_sub @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1261 | @c check_arrival @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1262 | @c (re_)realloc dup @ascuheap @acsmem |
| 1263 | @c re_string_context_at dup ok |
| 1264 | @c re_node_set_init_1 dup @ascuheap @acsmem |
| 1265 | @c check_arrival_expand_ecl @ascuheap @acsmem |
| 1266 | @c re_node_set_alloc dup @ascuheap @acsmem |
| 1267 | @c find_subexp_node ok |
| 1268 | @c re_node_set_merge dup @ascuheap @acsmem |
| 1269 | @c re_node_set_free dup @ascuheap @acsmem |
| 1270 | @c check_arrival_expand_ecl_sub @ascuheap @acsmem |
| 1271 | @c re_node_set_contains dup ok |
| 1272 | @c re_node_set_insert dup @ascuheap @acsmem |
| 1273 | @c re_node_set_free dup @ascuheap @acsmem |
| 1274 | @c re_node_set_init_copy dup @ascuheap @acsmem |
| 1275 | @c re_node_set_init_empty dup ok |
| 1276 | @c expand_bkref_cache @ascuheap @acsmem |
| 1277 | @c search_cur_bkref_entry dup ok |
| 1278 | @c re_node_set_contains dup ok |
| 1279 | @c re_node_set_init_1 dup @ascuheap @acsmem |
| 1280 | @c check_arrival_expand_ecl dup @ascuheap @acsmem |
| 1281 | @c re_node_set_merge dup @ascuheap @acsmem |
| 1282 | @c re_node_set_init_copy dup @ascuheap @acsmem |
| 1283 | @c re_node_set_insert dup @ascuheap @acsmem |
| 1284 | @c re_node_set_free dup @ascuheap @acsmem |
| 1285 | @c re_acquire_state @ascuheap @acsmem |
| 1286 | @c calc_state_hash dup ok |
| 1287 | @c re_node_set_compare dup ok |
| 1288 | @c create_ci_newstate @ascuheap @acsmem |
| 1289 | @c calloc dup @ascuheap @acsmem |
| 1290 | @c re_node_set_init_copy dup @ascuheap @acsmem |
| 1291 | @c (re_)free dup @ascuheap @acsmem |
| 1292 | @c register_state dup @ascuheap @acsmem |
| 1293 | @c free_state dup @ascuheap @acsmem |
| 1294 | @c re_acquire_state_context dup @ascuheap @acsmem |
| 1295 | @c re_node_set_merge dup @ascuheap @acsmem |
| 1296 | @c check_arrival_add_next_nodes @mtslocale @ascuheap @acsmem |
| 1297 | @c re_node_set_init_empty dup ok |
| 1298 | @c check_node_accept_bytes @mtslocale @ascuheap @acsmem |
| 1299 | @c re_string_byte_at dup ok |
| 1300 | @c re_string_char_size_at dup ok |
| 1301 | @c re_string_elem_size_at @mtslocale |
| 1302 | @c _NL_CURRENT_WORD dup ok |
| 1303 | @c _NL_CURRENT dup ok |
| 1304 | @c auto findidx dup ok |
| 1305 | @c _NL_CURRENT_WORD dup ok |
| 1306 | @c _NL_CURRENT dup ok |
| 1307 | @c collseq_table_lookup dup ok |
| 1308 | @c find_collation_sequence_value @mtslocale |
| 1309 | @c _NL_CURRENT_WORD dup ok |
| 1310 | @c _NL_CURRENT dup ok |
| 1311 | @c auto findidx dup ok |
| 1312 | @c wcscoll @mtslocale @ascuheap @acsmem |
| 1313 | @c re_node_set_empty dup ok |
| 1314 | @c re_node_set_merge dup @ascuheap @acsmem |
| 1315 | @c re_node_set_free dup @ascuheap @acsmem |
| 1316 | @c re_node_set_insert dup @ascuheap @acsmem |
| 1317 | @c re_acquire_state dup @ascuheap @acsmem |
| 1318 | @c check_node_accept ok |
| 1319 | @c re_string_byte_at dup ok |
| 1320 | @c bitset_contain dup ok |
| 1321 | @c re_string_context_at dup ok |
| 1322 | @c NOT_SATISFY_NEXT_CONSTRAINT dup ok |
| 1323 | @c match_ctx_add_entry @ascuheap @acsmem |
| 1324 | @c (re_)realloc dup @ascuheap @acsmem |
| 1325 | @c (re_)free dup @ascuheap @acsmem |
| 1326 | @c clean_state_log_if_needed dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1327 | @c extend_buffers dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1328 | @c find_subexp_node dup ok |
| 1329 | @c calloc dup @ascuheap @acsmem |
| 1330 | @c check_arrival dup *** |
| 1331 | @c match_ctx_add_sublast @ascuheap @acsmem |
| 1332 | @c (re_)realloc dup @ascuheap @acsmem |
| 1333 | @c re_acquire_state_context dup @ascuheap @acsmem |
| 1334 | @c re_node_set_init_union @ascuheap @acsmem |
| 1335 | @c (re_)malloc dup @ascuheap @acsmem |
| 1336 | @c re_node_set_init_copy dup @ascuheap @acsmem |
| 1337 | @c re_node_set_init_empty dup ok |
| 1338 | @c re_node_set_free dup @ascuheap @acsmem |
| 1339 | @c check_subexp_matching_top dup @ascuheap @acsmem |
| 1340 | @c check_halt_state_context ok |
| 1341 | @c re_string_context_at dup ok |
| 1342 | @c check_halt_node_context ok |
| 1343 | @c NOT_SATISFY_NEXT_CONSTRAINT dup ok |
| 1344 | @c re_string_eoi dup ok |
| 1345 | @c extend_buffers dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1346 | @c transit_state @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1347 | @c transit_state_mb @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1348 | @c re_string_context_at dup ok |
| 1349 | @c NOT_SATISFY_NEXT_CONSTRAINT dup ok |
| 1350 | @c check_node_accept_bytes dup @mtslocale @ascuheap @acsmem |
| 1351 | @c re_string_cur_idx dup ok |
| 1352 | @c clean_state_log_if_needed @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1353 | @c re_node_set_init_union dup @ascuheap @acsmem |
| 1354 | @c re_acquire_state_context dup @ascuheap @acsmem |
| 1355 | @c re_string_fetch_byte dup ok |
| 1356 | @c re_string_context_at dup ok |
| 1357 | @c build_trtable @ascuheap @acsmem |
| 1358 | @c (re_)malloc dup @ascuheap @acsmem |
| 1359 | @c group_nodes_into_DFAstates @ascuheap @acsmem |
| 1360 | @c bitset_empty dup ok |
| 1361 | @c bitset_set dup ok |
| 1362 | @c bitset_merge dup ok |
| 1363 | @c bitset_set_all ok |
| 1364 | @c bitset_clear ok |
| 1365 | @c bitset_contain dup ok |
| 1366 | @c bitset_copy ok |
| 1367 | @c re_node_set_init_copy dup @ascuheap @acsmem |
| 1368 | @c re_node_set_insert dup @ascuheap @acsmem |
| 1369 | @c re_node_set_init_1 dup @ascuheap @acsmem |
| 1370 | @c re_node_set_free dup @ascuheap @acsmem |
| 1371 | @c re_node_set_alloc dup @ascuheap @acsmem |
| 1372 | @c malloc dup @ascuheap @acsmem |
| 1373 | @c free dup @ascuheap @acsmem |
| 1374 | @c re_node_set_free dup @ascuheap @acsmem |
| 1375 | @c bitset_empty ok |
| 1376 | @c re_node_set_empty dup ok |
| 1377 | @c re_node_set_merge dup @ascuheap @acsmem |
| 1378 | @c re_acquire_state_context dup @ascuheap @acsmem |
| 1379 | @c bitset_merge ok |
| 1380 | @c calloc dup @ascuheap @acsmem |
| 1381 | @c bitset_contain dup ok |
| 1382 | @c merge_state_with_log @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1383 | @c re_string_cur_idx dup ok |
| 1384 | @c re_node_set_init_union dup @ascuheap @acsmem |
| 1385 | @c re_string_context_at dup ok |
| 1386 | @c re_node_set_free dup @ascuheap @acsmem |
| 1387 | @c check_subexp_matching_top @ascuheap @acsmem |
| 1388 | @c match_ctx_add_subtop dup @ascuheap @acsmem |
| 1389 | @c transit_state_bkref dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1390 | @c find_recover_state |
| 1391 | @c re_string_cur_idx dup ok |
| 1392 | @c re_string_skip_bytes dup ok |
| 1393 | @c merge_state_with_log dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd |
| 1394 | @c check_halt_state_context dup ok |
| 1395 | @c prune_impossible_nodes @mtslocale @ascuheap @acsmem |
| 1396 | @c (re_)malloc dup @ascuheap @acsmem |
| 1397 | @c sift_ctx_init ok |
| 1398 | @c re_node_set_init_empty dup ok |
| 1399 | @c sift_states_backward @mtslocale @ascuheap @acsmem |
| 1400 | @c re_node_set_init_1 dup @ascuheap @acsmem |
| 1401 | @c update_cur_sifted_state @mtslocale @ascuheap @acsmem |
| 1402 | @c add_epsilon_src_nodes @ascuheap @acsmem |
| 1403 | @c re_acquire_state dup @ascuheap @acsmem |
| 1404 | @c re_node_set_alloc dup @ascuheap @acsmem |
| 1405 | @c re_node_set_merge dup @ascuheap @acsmem |
| 1406 | @c re_node_set_add_intersect @ascuheap @acsmem |
| 1407 | @c (re_)realloc dup @ascuheap @acsmem |
| 1408 | @c check_subexp_limits @ascuheap @acsmem |
| 1409 | @c sub_epsilon_src_nodes @ascuheap @acsmem |
| 1410 | @c re_node_set_init_empty dup ok |
| 1411 | @c re_node_set_contains dup ok |
| 1412 | @c re_node_set_add_intersect dup @ascuheap @acsmem |
| 1413 | @c re_node_set_free dup @ascuheap @acsmem |
| 1414 | @c re_node_set_remove_at dup ok |
| 1415 | @c re_node_set_contains dup ok |
| 1416 | @c re_acquire_state dup @ascuheap @acsmem |
| 1417 | @c sift_states_bkref @mtslocale @ascuheap @acsmem |
| 1418 | @c search_cur_bkref_entry dup ok |
| 1419 | @c check_dst_limits ok |
| 1420 | @c search_cur_bkref_entry dup ok |
| 1421 | @c check_dst_limits_calc_pos ok |
| 1422 | @c check_dst_limits_calc_pos_1 ok |
| 1423 | @c re_node_set_init_copy dup @ascuheap @acsmem |
| 1424 | @c re_node_set_insert dup @ascuheap @acsmem |
| 1425 | @c sift_states_backward dup @mtslocale @ascuheap @acsmem |
| 1426 | @c merge_state_array dup @ascuheap @acsmem |
| 1427 | @c re_node_set_remove ok |
| 1428 | @c re_node_set_contains dup ok |
| 1429 | @c re_node_set_remove_at dup ok |
| 1430 | @c re_node_set_free dup @ascuheap @acsmem |
| 1431 | @c re_node_set_free dup @ascuheap @acsmem |
| 1432 | @c re_node_set_empty dup ok |
| 1433 | @c build_sifted_states @mtslocale @ascuheap @acsmem |
| 1434 | @c sift_states_iter_mb @mtslocale @ascuheap @acsmem |
| 1435 | @c check_node_accept_bytes dup @mtslocale @ascuheap @acsmem |
| 1436 | @c check_node_accept dup ok |
| 1437 | @c check_dst_limits dup ok |
| 1438 | @c re_node_set_insert dup @ascuheap @acsmem |
| 1439 | @c re_node_set_free dup @ascuheap @acsmem |
| 1440 | @c check_halt_state_context dup ok |
| 1441 | @c merge_state_array @ascuheap @acsmem |
| 1442 | @c re_node_set_init_union dup @ascuheap @acsmem |
| 1443 | @c re_acquire_state dup @ascuheap @acsmem |
| 1444 | @c re_node_set_free dup @ascuheap @acsmem |
| 1445 | @c (re_)free dup @ascuheap @acsmem |
| 1446 | @c set_regs @ascuheap @acsmem |
| 1447 | @c (re_)malloc dup @ascuheap @acsmem |
| 1448 | @c re_node_set_init_empty dup ok |
| 1449 | @c free_fail_stack_return @ascuheap @acsmem |
| 1450 | @c re_node_set_free dup @ascuheap @acsmem |
| 1451 | @c (re_)free dup @ascuheap @acsmem |
| 1452 | @c update_regs ok |
| 1453 | @c re_node_set_free dup @ascuheap @acsmem |
| 1454 | @c pop_fail_stack @ascuheap @acsmem |
| 1455 | @c re_node_set_free dup @ascuheap @acsmem |
| 1456 | @c (re_)free dup @ascuheap @acsmem |
| 1457 | @c (re_)free dup @ascuheap @acsmem |
| 1458 | @c (re_)free dup @ascuheap @acsmem |
| 1459 | @c match_ctx_free @ascuheap @acsmem |
| 1460 | @c match_ctx_clean @ascuheap @acsmem |
| 1461 | @c (re_)free dup @ascuheap @acsmem |
| 1462 | @c (re_)free dup @ascuheap @acsmem |
| 1463 | @c re_string_destruct dup @ascuheap @acsmem |
| 1464 | @c libc_lock_unlock @aculock |
| 1465 | This function tries to match the compiled regular expression |
| 1466 | @code{*@var{compiled}} against @var{string}. |
| 1467 | |
| 1468 | @code{regexec} returns @code{0} if the regular expression matches; |
| 1469 | otherwise, it returns a nonzero value. See the table below for |
| 1470 | what nonzero values mean. You can use @code{regerror} to produce an |
| 1471 | error message string describing the reason for a nonzero value; |
| 1472 | see @ref{Regexp Cleanup}. |
| 1473 | |
| 1474 | The argument @var{eflags} is a word of bit flags that enable various |
| 1475 | options. |
| 1476 | |
| 1477 | If you want to get information about what part of @var{string} actually |
| 1478 | matched the regular expression or its subexpressions, use the arguments |
| 1479 | @var{matchptr} and @var{nmatch}. Otherwise, pass @code{0} for |
| 1480 | @var{nmatch}, and @code{NULL} for @var{matchptr}. @xref{Regexp |
| 1481 | Subexpressions}. |
| 1482 | @end deftypefun |
| 1483 | |
| 1484 | You must match the regular expression with the same set of current |
| 1485 | locales that were in effect when you compiled the regular expression. |
| 1486 | |
| 1487 | The function @code{regexec} accepts the following flags in the |
| 1488 | @var{eflags} argument: |
| 1489 | |
| 1490 | @table @code |
| 1491 | @comment regex.h |
| 1492 | @comment POSIX.2 |
| 1493 | @item REG_NOTBOL |
| 1494 | Do not regard the beginning of the specified string as the beginning of |
| 1495 | a line; more generally, don't make any assumptions about what text might |
| 1496 | precede it. |
| 1497 | |
| 1498 | @comment regex.h |
| 1499 | @comment POSIX.2 |
| 1500 | @item REG_NOTEOL |
| 1501 | Do not regard the end of the specified string as the end of a line; more |
| 1502 | generally, don't make any assumptions about what text might follow it. |
| 1503 | @end table |
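One common use of @code{REG_NOTBOL} is searching a string repeatedly, resuming after each match. The following is a minimal sketch of that idea, not a definitive implementation; the function name @code{count_matches} is hypothetical, the pattern is compiled as a basic regular expression, and an empty match is skipped by a single byte, which assumes a single-byte character set.

```c
#include <regex.h>

/* Count the non-overlapping matches of PATTERN in STRING.
   Returns -1 if PATTERN cannot be compiled.  */
int
count_matches (const char *pattern, const char *string)
{
  regex_t compiled;
  regmatch_t match;
  const char *p = string;
  int eflags = 0;
  int count = 0;

  if (regcomp (&compiled, pattern, 0) != 0)
    return -1;

  while (regexec (&compiled, p, 1, &match, eflags) == 0)
    {
      count++;
      if (match.rm_eo > match.rm_so)
        p += match.rm_eo;
      else if (p[match.rm_eo] != '\0')
        p += match.rm_eo + 1;   /* Empty match: skip one byte.  */
      else
        break;                  /* Empty match at the end.  */

      /* P no longer points at the start of the original string,
         so `^' must not match there.  */
      eflags = REG_NOTBOL;
    }

  regfree (&compiled);
  return count;
}
```

Without the flag, the pattern @samp{^ab} would be counted three times in @samp{ababab}, because every resumed search would treat its starting point as the beginning of a line.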
| 1504 | |
| 1505 | Here are the possible nonzero values that @code{regexec} can return: |
| 1506 | |
| 1507 | @table @code |
| 1508 | @comment regex.h |
| 1509 | @comment POSIX.2 |
| 1510 | @item REG_NOMATCH |
| 1511 | The pattern didn't match the string. This isn't really an error. |
| 1512 | |
| 1513 | @comment regex.h |
| 1514 | @comment POSIX.2 |
| 1515 | @item REG_ESPACE |
| 1516 | @code{regexec} ran out of memory. |
| 1517 | @end table |
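As an illustration of these return values, here is a small sketch that reduces a match attempt to a yes, no, or error answer. The name @code{string_matches} is hypothetical, and the sketch assumes the pattern is a POSIX basic regular expression compiled with @code{REG_NOSUB}, since no match offsets are wanted.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN matches somewhere in STRING, 0 if it does
   not, and -1 if compiling or matching fails.  */
int
string_matches (const char *pattern, const char *string)
{
  regex_t compiled;
  int result;

  if (regcomp (&compiled, pattern, REG_NOSUB) != 0)
    return -1;

  /* No offsets are requested, so pass 0 and NULL.  */
  result = regexec (&compiled, string, 0, NULL, 0);
  regfree (&compiled);

  if (result == 0)
    return 1;
  if (result == REG_NOMATCH)
    return 0;                   /* Not really an error.  */
  return -1;                    /* For example, REG_ESPACE.  */
}
```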
| 1518 | |
| 1519 | @node Regexp Subexpressions |
| 1520 | @subsection Match Results with Subexpressions |
| 1521 | |
| 1522 | When @code{regexec} matches parenthetical subexpressions of |
| 1523 | @var{pattern}, it records which parts of @var{string} they match. It |
| 1524 | returns that information by storing the offsets into an array whose |
| 1525 | elements are structures of type @code{regmatch_t}. The first element of |
| 1526 | the array (index @code{0}) records the part of the string that matched |
| 1527 | the entire regular expression. Each other element of the array records |
| 1528 | the beginning and end of the part that matched a single parenthetical |
| 1529 | subexpression. |
| 1530 | |
| 1531 | @comment regex.h |
| 1532 | @comment POSIX.2 |
| 1533 | @deftp {Data Type} regmatch_t |
| 1534 | This is the data type of the @var{matchptr} array that you pass to
| 1535 | @code{regexec}. It contains two structure fields, as follows: |
| 1536 | |
| 1537 | @table @code |
| 1538 | @item rm_so |
| 1539 | The offset in @var{string} of the beginning of a substring. Add this |
| 1540 | value to @var{string} to get the address of that part. |
| 1541 | |
| 1542 | @item rm_eo |
| 1543 | The offset in @var{string} of the end of the substring. |
| 1544 | @end table |
| 1545 | @end deftp |
| 1546 | |
| 1547 | @comment regex.h |
| 1548 | @comment POSIX.2 |
| 1549 | @deftp {Data Type} regoff_t |
| 1550 | @code{regoff_t} is an alias for another signed integer type. |
| 1551 | The fields of @code{regmatch_t} have type @code{regoff_t}. |
| 1552 | @end deftp |
| 1553 | |
| 1554 | The @code{regmatch_t} elements correspond to subexpressions |
| 1555 | positionally; the element at index @code{1} records where the first
| 1556 | subexpression matched, the element at index @code{2} records the second
| 1557 | subexpression, and so on. The order of the subexpressions is the order |
| 1558 | in which they begin. |
| 1559 | |
| 1560 | When you call @code{regexec}, you specify how long the @var{matchptr} |
| 1561 | array is, with the @var{nmatch} argument. This tells @code{regexec} how |
| 1562 | many elements to store. Element @code{0} always describes the whole match, so if the regular expression has
| 1563 | @var{nmatch} or more subexpressions, then you won't get offset information about
| 1564 | the rest of them. But this doesn't alter whether the pattern matches a |
| 1565 | particular string or not. |
| 1566 | |
| 1567 | If you don't want @code{regexec} to return any information about where |
| 1568 | the subexpressions matched, you can either supply @code{0} for |
| 1569 | @var{nmatch}, or use the flag @code{REG_NOSUB} when you compile the |
| 1570 | pattern with @code{regcomp}. |
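The following sketch shows one way to turn the recorded offsets into text. It is illustrative only: the function name @code{extract_field} and the fixed pattern @samp{\([a-z]*\)=\([0-9]*\)} are assumptions made for the example. Three array elements are supplied, one for the whole match and one for each of the two subexpressions.

```c
#include <regex.h>
#include <string.h>

/* Match STRING against `\([a-z]*\)=\([0-9]*\)' and copy the text
   recorded in element INDEX (0 for the whole match) into OUT,
   which holds OUTSIZE bytes.  Return 0 on success, -1 otherwise.  */
int
extract_field (const char *string, size_t index,
               char *out, size_t outsize)
{
  regex_t compiled;
  regmatch_t matches[3];        /* Whole match plus two subexpressions.  */
  int status = -1;

  if (index >= 3 || outsize == 0)
    return -1;
  if (regcomp (&compiled, "\\([a-z]*\\)=\\([0-9]*\\)", 0) != 0)
    return -1;

  if (regexec (&compiled, string, 3, matches, 0) == 0
      && matches[index].rm_so != -1)
    {
      size_t len = (size_t) (matches[index].rm_eo
                             - matches[index].rm_so);
      if (len < outsize)
        {
          /* Adding rm_so to STRING gives the start of the part.  */
          memcpy (out, string + matches[index].rm_so, len);
          out[len] = '\0';
          status = 0;
        }
    }

  regfree (&compiled);
  return status;
}
```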
| 1571 | |
| 1572 | @node Subexpression Complications |
| 1573 | @subsection Complications in Subexpression Matching |
| 1574 | |
| 1575 | Sometimes a subexpression matches a substring of no characters. This |
| 1576 | happens when @samp{f\(o*\)} matches the string @samp{fum}. (It really |
| 1577 | matches just the @samp{f}.) In this case, both of the offsets identify |
| 1578 | the point in the string where the null substring was found. In this |
| 1579 | example, the offsets are both @code{1}. |
| 1580 | |
| 1581 | Sometimes the entire regular expression can match without using some of |
| 1582 | its subexpressions at all---for example, when @samp{ba\(na\)*} matches the |
| 1583 | string @samp{ba}, the parenthetical subexpression is not used. When |
| 1584 | this happens, @code{regexec} stores @code{-1} in both fields of the |
| 1585 | element for that subexpression. |
| 1586 | |
| 1587 | Sometimes matching the entire regular expression can match a particular |
| 1588 | subexpression more than once---for example, when @samp{ba\(na\)*} |
| 1589 | matches the string @samp{bananana}, the parenthetical subexpression |
| 1590 | matches three times. When this happens, @code{regexec} usually stores |
| 1591 | the offsets of the last part of the string that matched the |
| 1592 | subexpression. In the case of @samp{bananana}, these offsets are |
| 1593 | @code{6} and @code{8}. |
| 1594 | |
| 1595 | But the last match is not always the one that is chosen. It's more |
| 1596 | accurate to say that the last @emph{opportunity} to match is the one |
| 1597 | that takes precedence. What this means is that when one subexpression |
| 1598 | appears within another, then the results reported for the inner |
| 1599 | subexpression reflect whatever happened on the last match of the outer |
| 1600 | subexpression. For an example, consider @samp{\(ba\(na\)*s \)*} matching |
| 1601 | the string @samp{bananas bas }. The last time the inner expression |
| 1602 | actually matches is near the end of the first word. But it is |
| 1603 | @emph{considered} again in the second word, and fails to match there. |
| 1604 | @code{regexec} reports nonuse of the ``na'' subexpression. |
| 1605 | |
| 1606 | Another place where this rule applies is when the regular expression |
| 1607 | @smallexample |
| 1608 | \(ba\(na\)*s \|nefer\(ti\)* \)* |
| 1609 | @end smallexample |
| 1610 | @noindent |
| 1611 | matches @samp{bananas nefertiti }. The ``na'' subexpression does match
| 1612 | in the first word, but it doesn't match in the second word because the |
| 1613 | other alternative is used there. Once again, the second repetition of |
| 1614 | the outer subexpression overrides the first, and within that second |
| 1615 | repetition, the ``na'' subexpression is not used. So @code{regexec} |
| 1616 | reports nonuse of the ``na'' subexpression. |
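The simpler cases described in this section can be checked with a short helper. This is a sketch under stated assumptions: the name @code{first_subexp_offsets} is hypothetical, the pattern is compiled as a basic regular expression, and only the first subexpression is examined.

```c
#include <regex.h>

/* Match STRING against PATTERN and store the offsets recorded for
   subexpression 1 in *SO and *EO.  Return the value of regexec,
   or -1 if PATTERN cannot be compiled.  */
int
first_subexp_offsets (const char *pattern, const char *string,
                      regoff_t *so, regoff_t *eo)
{
  regex_t compiled;
  regmatch_t m[2];              /* Whole match and subexpression 1.  */
  int result;

  if (regcomp (&compiled, pattern, 0) != 0)
    return -1;

  result = regexec (&compiled, string, 2, m, 0);
  if (result == 0)
    {
      *so = m[1].rm_so;
      *eo = m[1].rm_eo;
    }

  regfree (&compiled);
  return result;
}
```

With this helper, @samp{f\(o*\)} against @samp{fum} should yield two equal offsets for the empty match, @samp{ba\(na\)*} against @samp{ba} should yield @code{-1} in both fields, and @samp{ba\(na\)*} against @samp{bananana} should yield the offsets of the last repetition.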
| 1617 | |
| 1618 | @node Regexp Cleanup |
| 1619 | @subsection POSIX Regexp Matching Cleanup |
| 1620 | |
| 1621 | When you are finished using a compiled regular expression, you can |
| 1622 | free the storage it uses by calling @code{regfree}. |
| 1623 | |
| 1624 | @comment regex.h |
| 1625 | @comment POSIX.2 |
| 1626 | @deftypefun void regfree (regex_t *@var{compiled}) |
| 1627 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
| 1628 | @c (re_)free dup @ascuheap @acsmem |
| 1629 | @c free_dfa_content dup @ascuheap @acsmem |
| 1630 | Calling @code{regfree} frees all the storage that @code{*@var{compiled}} |
| 1631 | points to. This includes various internal fields of the @code{regex_t} |
| 1632 | structure that aren't documented in this manual. |
| 1633 | |
| 1634 | @code{regfree} does not free the object @code{*@var{compiled}} itself. |
| 1635 | @end deftypefun |
| 1636 | |
| 1637 | You should always free the space in a @code{regex_t} structure with |
| 1638 | @code{regfree} before using the structure to compile another regular |
| 1639 | expression. |
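As a sketch of this rule, the following function reuses a single @code{regex_t} object for several patterns and calls @code{regfree} after each successful compilation, before the next call to @code{regcomp}. The name @code{first_matching_pattern} is hypothetical, and skipping patterns that fail to compile is a choice made only for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Return the index of the first of the COUNT basic regular
   expressions in PATTERNS that matches STRING, or -1 if none
   does.  Patterns that fail to compile are skipped.  */
int
first_matching_pattern (const char *const *patterns, size_t count,
                        const char *string)
{
  regex_t compiled;

  for (size_t i = 0; i < count; i++)
    {
      int result;

      if (regcomp (&compiled, patterns[i], REG_NOSUB) != 0)
        continue;

      result = regexec (&compiled, string, 0, NULL, 0);

      /* Release the storage before COMPILED is used again.  */
      regfree (&compiled);

      if (result == 0)
        return (int) i;
    }

  return -1;
}
```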
| 1640 | |
| 1641 | When @code{regcomp} or @code{regexec} reports an error, you can use |
| 1642 | the function @code{regerror} to turn it into an error message string. |
| 1643 | |
| 1644 | @comment regex.h |
| 1645 | @comment POSIX.2 |
| 1646 | @deftypefun size_t regerror (int @var{errcode}, const regex_t *restrict @var{compiled}, char *restrict @var{buffer}, size_t @var{length}) |
| 1647 | @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} |
| 1648 | @c regerror calls gettext, strcmp and mempcpy or memcpy. |
| 1649 | This function produces an error message string for the error code |
| 1650 | @var{errcode}, and stores the string in @var{length} bytes of memory |
| 1651 | starting at @var{buffer}. For the @var{compiled} argument, supply the |
| 1652 | same compiled regular expression structure that @code{regcomp} or |
| 1653 | @code{regexec} was working with when it got the error. Alternatively, |
| 1654 | you can supply @code{NULL} for @var{compiled}; you will still get a |
| 1655 | meaningful error message, but it might not be as detailed. |
| 1656 | |
| 1657 | If the error message can't fit in @var{length} bytes (including a |
| 1658 | terminating null character), then @code{regerror} truncates it. |
| 1659 | The string that @code{regerror} stores is always null-terminated |
| 1660 | even if it has been truncated. |
| 1661 | |
| 1662 | The return value of @code{regerror} is the minimum length needed to |
| 1663 | store the entire error message, including its terminating null character. If this is no greater than @var{length}, then
| 1664 | the error message was not truncated, and you can use it. Otherwise, you |
| 1665 | should call @code{regerror} again with a larger buffer. |
| 1666 | |
| 1667 | Here is a function which uses @code{regerror}, but always dynamically |
| 1668 | allocates a buffer for the error message: |
| 1669 | |
| 1670 | @smallexample |
| 1671 | char *get_regerror (int errcode, regex_t *compiled) |
| 1672 | @{ |
| 1673 | size_t length = regerror (errcode, compiled, NULL, 0); |
| 1674 | char *buffer = xmalloc (length); |
| 1675 | (void) regerror (errcode, compiled, buffer, length); |
| 1676 | return buffer; |
| 1677 | @} |
| 1678 | @end smallexample |
| 1679 | @end deftypefun |
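The example above relies on an @code{xmalloc} helper that the program must supply. A variant that uses only @code{malloc} might look like the following sketch; the name @code{alloc_regerror} is hypothetical, and the caller is assumed to check for a null pointer and to free the result.

```c
#include <regex.h>
#include <stdlib.h>

/* Return a newly allocated error message for ERRCODE, or NULL if
   memory cannot be allocated.  The caller frees the result.  */
char *
alloc_regerror (int errcode, const regex_t *compiled)
{
  /* With a zero-length buffer, regerror stores nothing and only
     reports the size needed, terminating null included.  */
  size_t length = regerror (errcode, compiled, NULL, 0);
  char *buffer = malloc (length);

  if (buffer != NULL)
    regerror (errcode, compiled, buffer, length);
  return buffer;
}
```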
| 1680 | |
| 1681 | @node Word Expansion |
| 1682 | @section Shell-Style Word Expansion |
| 1683 | @cindex word expansion |
| 1684 | @cindex expansion of shell words |
| 1685 | |
| 1686 | @dfn{Word expansion} means the process of splitting a string into |
| 1687 | @dfn{words} and substituting for variables, commands, and wildcards |
| 1688 | just as the shell does. |
| 1689 | |
| 1690 | For example, when you write @samp{ls -l foo.c}, this string is split |
| 1691 | into three separate words---@samp{ls}, @samp{-l} and @samp{foo.c}. |
| 1692 | This is the most basic function of word expansion. |
| 1693 | |
| 1694 | When you write @samp{ls *.c}, this can become many words, because |
| 1695 | the word @samp{*.c} can be replaced with any number of file names. |
| 1696 | This is called @dfn{wildcard expansion}, and it is also a part of |
| 1697 | word expansion. |
| 1698 | |
| 1699 | When you use @samp{echo $PATH} to print your path, you are taking |
| 1700 | advantage of @dfn{variable substitution}, which is also part of word |
| 1701 | expansion. |
| 1702 | |
| 1703 | Ordinary programs can perform word expansion just like the shell by |
| 1704 | calling the library function @code{wordexp}. |
| 1705 | |
| 1706 | @menu |
| 1707 | * Expansion Stages:: What word expansion does to a string. |
| 1708 | * Calling Wordexp:: How to call @code{wordexp}. |
| 1709 | * Flags for Wordexp:: Options you can enable in @code{wordexp}. |
| 1710 | * Wordexp Example:: A sample program that does word expansion. |
| 1711 | * Tilde Expansion:: Details of how tilde expansion works. |
| 1712 | * Variable Substitution:: Different types of variable substitution. |
| 1713 | @end menu |
| 1714 | |
| 1715 | @node Expansion Stages |
| 1716 | @subsection The Stages of Word Expansion |
| 1717 | |
| 1718 | When word expansion is applied to a sequence of words, it performs the |
| 1719 | following transformations in the order shown here: |
| 1720 | |
| 1721 | @enumerate |
| 1722 | @item |
| 1723 | @cindex tilde expansion |
| 1724 | @dfn{Tilde expansion}: Replacement of @samp{~foo} with the name of |
| 1725 | the home directory of @samp{foo}. |
| 1726 | |
| 1727 | @item |
| 1728 | Next, three different transformations are applied in the same step, |
| 1729 | from left to right: |
| 1730 | |
| 1731 | @itemize @bullet |
| 1732 | @item |
| 1733 | @cindex variable substitution |
| 1734 | @cindex substitution of variables and commands |
| 1735 | @dfn{Variable substitution}: Environment variables are substituted for |
| 1736 | references such as @samp{$foo}. |
| 1737 | |
| 1738 | @item |
| 1739 | @cindex command substitution |
| 1740 | @dfn{Command substitution}: Constructs such as @w{@samp{`cat foo`}} and |
| 1741 | the equivalent @w{@samp{$(cat foo)}} are replaced with the output from |
| 1742 | the inner command. |
| 1743 | |
| 1744 | @item |
| 1745 | @cindex arithmetic expansion |
| 1746 | @dfn{Arithmetic expansion}: Constructs such as @samp{$(($x-1))} are |
| 1747 | replaced with the result of the arithmetic computation. |
| 1748 | @end itemize |
| 1749 | |
| 1750 | @item |
| 1751 | @cindex field splitting |
| 1752 | @dfn{Field splitting}: subdivision of the text into @dfn{words}. |
| 1753 | |
| 1754 | @item |
| 1755 | @cindex wildcard expansion |
| 1756 | @dfn{Wildcard expansion}: The replacement of a construct such as @samp{*.c} |
| 1757 | with a list of @samp{.c} file names. Wildcard expansion applies to an |
| 1758 | entire word at a time, and replaces that word with 0 or more file names |
| 1759 | that are themselves words. |
| 1760 | |
| 1761 | @item |
| 1762 | @cindex quote removal |
| 1763 | @cindex removal of quotes |
| 1764 | @dfn{Quote removal}: The deletion of string-quotes, now that they have |
| 1765 | done their job by inhibiting the above transformations when appropriate. |
| 1766 | @end enumerate |
| 1767 | |
| 1768 | For the details of these transformations, and how to write the constructs |
| 1769 | that use them, see @w{@cite{The BASH Manual}} (to appear). |
| 1770 | |
| 1771 | @node Calling Wordexp |
| 1772 | @subsection Calling @code{wordexp} |
| 1773 | |
| 1774 | All the functions, constants and data types for word expansion are |
| 1775 | declared in the header file @file{wordexp.h}. |
| 1776 | |
| 1777 | Word expansion produces a vector of words (strings). To return this |
| 1778 | vector, @code{wordexp} uses a special data type, @code{wordexp_t}, which |
| 1779 | is a structure. You pass @code{wordexp} the address of the structure, |
| 1780 | and it fills in the structure's fields to tell you about the results. |
| 1781 | |
| 1782 | @comment wordexp.h |
| 1783 | @comment POSIX.2 |
| 1784 | @deftp {Data Type} {wordexp_t} |
| 1785 | This data type holds a pointer to a word vector. More precisely, it |
| 1786 | records both the address of the word vector and its size. |
| 1787 | |
| 1788 | @table @code |
| 1789 | @item we_wordc |
| 1790 | The number of elements in the vector. |
| 1791 | |
| 1792 | @item we_wordv |
| 1793 | The address of the vector. This field has type @w{@code{char **}}. |
| 1794 | |
| 1795 | @item we_offs |
| 1796 | The offset of the first real element of the vector, from its nominal |
| 1797 | address in the @code{we_wordv} field. Unlike the other fields, this |
| 1798 | is always an input to @code{wordexp}, rather than an output from it. |
| 1799 | |
| 1800 | If you use a nonzero offset, then that many elements at the beginning of |
| 1801 | the vector are left empty. (The @code{wordexp} function fills them with |
| 1802 | null pointers.) |
| 1803 | |
| 1804 | The @code{we_offs} field is meaningful only if you use the |
| 1805 | @code{WRDE_DOOFFS} flag. Otherwise, the offset is always zero |
| 1806 | regardless of what is in this field, and the first real element comes at |
| 1807 | the beginning of the vector. |
| 1808 | @end table |
| 1809 | @end deftp |
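The role of @code{we_offs} as an input can be sketched as follows. The function name @code{expand_with_reserved_slots} is hypothetical, and combining @code{WRDE_DOOFFS} with @code{WRDE_NOCMD} is a choice made only for this example.

```c
#include <stddef.h>
#include <wordexp.h>

/* Expand WORDS into *RESULT, leaving the first RESERVED elements
   of the vector empty.  Return the value of wordexp.  */
int
expand_with_reserved_slots (const char *words, size_t reserved,
                            wordexp_t *result)
{
  /* we_offs is read by wordexp, so set it before the call.  */
  result->we_offs = reserved;
  return wordexp (words, result, WRDE_DOOFFS | WRDE_NOCMD);
}
```

After a successful call with @var{reserved} equal to 1 and the string @samp{-l foo.c}, @code{we_wordc} should be 2, element 0 of @code{we_wordv} should be a null pointer, and the expanded words should start at element 1. A caller could then store a program name in element 0 before using the vector as an argument list.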
| 1810 | |
| 1811 | @comment wordexp.h |
| 1812 | @comment POSIX.2 |
| 1813 | @deftypefun int wordexp (const char *@var{words}, wordexp_t *@var{word-vector-ptr}, int @var{flags}) |
| 1814 | @safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtasuconst{:@mtsenv{}} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @ascuplugin{} @ascuintl{} @ascuheap{} @asucorrupt{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} |
| 1815 | @c wordexp @mtasurace:utent @mtasuconst:@mtsenv @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuintl @ascuheap @asucorrupt @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1816 | @c w_newword ok |
| 1817 | @c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem |
| 1818 | @c calloc dup @ascuheap @acsmem |
| 1819 | @c getenv dup @mtsenv |
| 1820 | @c strcpy dup ok |
| 1821 | @c parse_backslash @ascuheap @acsmem |
| 1822 | @c w_addchar dup @ascuheap @acsmem |
| 1823 | @c parse_dollars @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1824 | @c w_addchar dup @ascuheap @acsmem |
| 1825 | @c parse_arith @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1826 | @c w_newword dup ok |
| 1827 | @c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1828 | @c parse_backtick dup @ascuplugin @ascuheap @aculock @acsfd @acsmem |
| 1829 | @c parse_qtd_backslash dup @ascuheap @acsmem |
| 1830 | @c eval_expr @mtslocale |
| 1831 | @c eval_expr_multidiv @mtslocale |
| 1832 | @c eval_expr_val @mtslocale |
| 1833 | @c isspace dup @mtslocale |
| 1834 | @c eval_expr dup @mtslocale |
| 1835 | @c isspace dup @mtslocale |
| 1836 | @c isspace dup @mtslocale |
| 1837 | @c free dup @ascuheap @acsmem |
| 1838 | @c w_addchar dup @ascuheap @acsmem |
| 1839 | @c w_addstr dup @ascuheap @acsmem |
| 1840 | @c itoa_word dup ok |
| 1841 | @c parse_comm @ascuplugin @ascuheap @aculock @acsfd @acsmem |
| 1842 | @c w_newword dup ok |
| 1843 | @c pthread_setcancelstate @ascuplugin @ascuheap @acsmem |
| 1844 | @c (disable cancellation around exec_comm; it may do_cancel the |
| 1845 | @c second time, if async cancel is enabled) |
| 1846 | @c THREAD_ATOMIC_CMPXCHG_VAL dup ok |
| 1847 | @c CANCEL_ENABLED_AND_CANCELED_AND_ASYNCHRONOUS dup ok |
| 1848 | @c do_cancel @ascuplugin @ascuheap @acsmem |
| 1849 | @c THREAD_ATOMIC_BIT_SET dup ok |
| 1850 | @c pthread_unwind @ascuplugin @ascuheap @acsmem |
| 1851 | @c Unwind_ForcedUnwind if available @ascuplugin @ascuheap @acsmem |
| 1852 | @c libc_unwind_longjmp otherwise |
| 1853 | @c cleanups |
| 1854 | @c exec_comm @ascuplugin @ascuheap @aculock @acsfd @acsmem |
| 1855 | @c pipe2 dup ok |
| 1856 | @c pipe dup ok |
| 1857 | @c fork dup @ascuplugin @aculock |
| 1858 | @c close dup @acsfd |
| 1859 | @c on child: exec_comm_child -> exec or abort |
| 1860 | @c waitpid dup ok |
| 1861 | @c read dup ok |
| 1862 | @c w_addmem dup @ascuheap @acsmem |
| 1863 | @c strchr dup ok |
| 1864 | @c w_addword dup @ascuheap @acsmem |
| 1865 | @c w_newword dup ok |
| 1866 | @c w_addchar dup @ascuheap @acsmem |
| 1867 | @c free dup @ascuheap @acsmem |
| 1868 | @c kill dup ok |
| 1869 | @c free dup @ascuheap @acsmem |
| 1870 | @c parse_param @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1871 | @c reads from __libc_argc and __libc_argv without guards |
| 1872 | @c w_newword dup ok |
| 1873 | @c isalpha dup @mtslocale^^ |
| 1874 | @c w_addchar dup @ascuheap @acsmem |
| 1875 | @c isalnum dup @mtslocale^^ |
| 1876 | @c isdigit dup @mtslocale^^ |
| 1877 | @c strchr dup ok |
| 1878 | @c itoa_word dup ok |
| 1879 | @c atoi dup @mtslocale |
| 1880 | @c getpid dup ok |
| 1881 | @c w_addstr dup @ascuheap @acsmem |
| 1882 | @c free dup @ascuheap @acsmem |
| 1883 | @c strlen dup ok |
| 1884 | @c malloc dup @ascuheap @acsmem |
| 1885 | @c stpcpy dup ok |
| 1886 | @c w_addword dup @ascuheap @acsmem |
| 1887 | @c strdup dup @ascuheap @acsmem |
| 1888 | @c getenv dup @mtsenv |
| 1889 | @c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1890 | @c parse_tilde dup @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1891 | @c fnmatch dup @mtsenv @mtslocale @ascuheap @acsmem |
| 1892 | @c mempcpy dup ok |
| 1893 | @c _ dup @ascuintl |
| 1894 | @c fxprintf dup @aculock |
| 1895 | @c setenv dup @mtasuconst:@mtsenv @ascuheap @asulock @acucorrupt @aculock @acsmem |
| 1896 | @c strspn dup ok |
| 1897 | @c strcspn dup ok |
| 1898 | @c parse_backtick @ascuplugin @ascuheap @aculock @acsfd @acsmem |
| 1899 | @c w_newword dup ok |
| 1900 | @c exec_comm dup @ascuplugin @ascuheap @aculock @acsfd @acsmem |
| 1901 | @c free dup @ascuheap @acsmem |
| 1902 | @c parse_qtd_backslash dup @ascuheap @acsmem |
| 1903 | @c parse_backslash dup @ascuheap @acsmem |
| 1904 | @c w_addchar dup @ascuheap @acsmem |
| 1905 | @c parse_dquote @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1906 | @c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1907 | @c parse_backtick dup @ascuplugin @ascuheap @aculock @acsfd @acsmem |
| 1908 | @c parse_qtd_backslash dup @ascuheap @acsmem |
| 1909 | @c w_addchar dup @ascuheap @acsmem |
| 1910 | @c w_addword dup @ascuheap @acsmem |
| 1911 | @c strdup dup @ascuheap @acsmem |
| 1912 | @c realloc dup @ascuheap @acsmem |
| 1913 | @c free dup @ascuheap @acsmem |
| 1914 | @c parse_squote dup @ascuheap @acsmem |
| 1915 | @c w_addchar dup @ascuheap @acsmem |
| 1916 | @c parse_tilde @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1917 | @c strchr dup ok |
| 1918 | @c w_addchar dup @ascuheap @acsmem |
| 1919 | @c getenv dup @mtsenv |
| 1920 | @c w_addstr dup @ascuheap @acsmem |
| 1921 | @c strlen dup ok |
| 1922 | @c w_addmem dup @ascuheap @acsmem |
| 1923 | @c realloc dup @ascuheap @acsmem |
| 1924 | @c free dup @ascuheap @acsmem |
| 1925 | @c mempcpy dup ok |
| 1926 | @c getuid dup ok |
| 1927 | @c getpwuid_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1928 | @c getpwnam_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1929 | @c parse_glob @mtasurace:utent @mtasuconst:@mtsenv @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1930 | @c strchr dup ok |
| 1931 | @c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem |
| 1932 | @c parse_qtd_backslash @ascuheap @acsmem |
| 1933 | @c w_addchar dup @ascuheap @acsmem |
| 1934 | @c parse_backslash dup @ascuheap @acsmem |
| 1935 | @c w_addchar dup @ascuheap @acsmem |
| 1936 | @c w_addword dup @ascuheap @acsmem |
| 1937 | @c w_newword dup ok |
| 1938 | @c do_parse_glob @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @aculock @acsfd @acsmem |
| 1939 | @c glob dup @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @aculock @acsfd @acsmem [auto glob_t avoids @asucorrupt @acucorrupt] |
| 1940 | @c w_addstr dup @ascuheap @acsmem |
| 1941 | @c w_addchar dup @ascuheap @acsmem |
| 1942 | @c globfree dup @ascuheap @acsmem [auto glob_t avoids @asucorrupt @acucorrupt] |
| 1943 | @c free dup @ascuheap @acsmem |
| 1944 | @c w_newword dup ok |
| 1945 | @c strdup dup @ascuheap @acsmem |
| 1946 | @c w_addword dup @ascuheap @acsmem |
| 1947 | @c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem |
| 1948 | @c strchr dup ok |
| 1949 | @c w_addchar dup @ascuheap @acsmem |
| 1950 | @c realloc dup @ascuheap @acsmem |
| 1951 | @c free dup @ascuheap @acsmem |
| 1952 | @c free dup @ascuheap @acsmem |
| 1953 | Perform word expansion on the string @var{words}, putting the result in |
| 1954 | a newly allocated vector, and store the size and address of this vector |
| 1955 | into @code{*@var{word-vector-ptr}}. The argument @var{flags} is a |
| 1956 | combination of bit flags; see @ref{Flags for Wordexp}, for details of |
| 1957 | the flags. |
| 1958 | |
| 1959 | You shouldn't use any of the characters @samp{|&;<>} in the string |
| 1960 | @var{words} unless they are quoted; likewise for newline. If you use |
| 1961 | these characters unquoted, you will get the @code{WRDE_BADCHAR} error |
| 1962 | code. Don't use parentheses or braces unless they are quoted or part of |
| 1963 | a word expansion construct. If you use quotation characters @samp{'"`}, |
| 1964 | they should come in pairs that balance. |
| 1965 | |
| 1966 | The results of word expansion are a sequence of words. The function |
| 1967 | @code{wordexp} allocates a string for each resulting word, then |
| 1968 | allocates a vector of type @code{char **} to store the addresses of |
| 1969 | these strings. The last element of the vector is a null pointer. |
| 1970 | This vector is called the @dfn{word vector}. |
| 1971 | |
| 1972 | To return this vector, @code{wordexp} stores both its address and its |
| 1973 | length (number of elements, not counting the terminating null pointer) |
| 1974 | into @code{*@var{word-vector-ptr}}. |
| 1975 | |
| 1976 | If @code{wordexp} succeeds, it returns 0. Otherwise, it returns one |
| 1977 | of these error codes: |
| 1978 | |
| 1979 | @table @code |
| 1980 | @comment wordexp.h |
| 1981 | @comment POSIX.2 |
| 1982 | @item WRDE_BADCHAR |
| 1983 | The input string @var{words} contains an unquoted invalid character such |
| 1984 | as @samp{|}. |
| 1985 | |
| 1986 | @comment wordexp.h |
| 1987 | @comment POSIX.2 |
| 1988 | @item WRDE_BADVAL |
| 1989 | The input string refers to an undefined shell variable, and you used the flag |
| 1990 | @code{WRDE_UNDEF} to forbid such references. |
| 1991 | |
| 1992 | @comment wordexp.h |
| 1993 | @comment POSIX.2 |
| 1994 | @item WRDE_CMDSUB |
| 1995 | The input string uses command substitution, and you used the flag |
| 1996 | @code{WRDE_NOCMD} to forbid command substitution. |
| 1997 | |
| 1998 | @comment wordexp.h |
| 1999 | @comment POSIX.2 |
| 2000 | @item WRDE_NOSPACE |
| 2001 | It was impossible to allocate memory to hold the result. In this case, |
| 2002 | @code{wordexp} can store part of the results---as much as it could |
| 2003 | allocate room for. |
| 2004 | |
| 2005 | @comment wordexp.h |
| 2006 | @comment POSIX.2 |
| 2007 | @item WRDE_SYNTAX |
| 2008 | There was a syntax error in the input string. For example, an unmatched |
| 2009 | quoting character is a syntax error. This error code is also used to |
| 2010 | signal division by zero and overflow in arithmetic expansion. |
| 2011 | @end table |
| 2012 | @end deftypefun |
| 2013 | |
| 2014 | @comment wordexp.h |
| 2015 | @comment POSIX.2 |
| 2016 | @deftypefun void wordfree (wordexp_t *@var{word-vector-ptr}) |
| 2017 | @safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} |
| 2018 | @c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem |
| 2019 | @c free dup @ascuheap @acsmem |
| 2020 | Free the storage used for the word-strings and vector that |
| 2021 | @code{*@var{word-vector-ptr}} points to. This does not free the |
| 2022 | structure @code{*@var{word-vector-ptr}} itself---only the other |
| 2023 | data it points to. |
| 2024 | @end deftypefun |
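
The calling sequence described above can be sketched as follows. This is only a minimal illustration, not part of the library: the helper name @code{print_expansion} is hypothetical, and the sketch assumes that printing one word per line is an acceptable way to consume the word vector.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <wordexp.h>

/* Hypothetical helper: expand WORDS and print one resulting word per
   line.  Returns the number of words, or -1 if wordexp failed.  */
int
print_expansion (const char *words)
{
  wordexp_t result;
  int count;
  int err;

  assert (words != NULL);
  err = wordexp (words, &result, 0);
  if (err != 0)
    {
      /* Only after WRDE_NOSPACE may part of the result be allocated.  */
      if (err == WRDE_NOSPACE)
        wordfree (&result);
      return -1;
    }

  /* The vector holds we_wordc strings followed by a null pointer.  */
  for (size_t i = 0; i < result.we_wordc; i++)
    puts (result.we_wordv[i]);

  count = (int) result.we_wordc;
  wordfree (&result);
  return count;
}
```

An unquoted @samp{|} in the input makes the helper return -1, because @code{wordexp} rejects it with @code{WRDE_BADCHAR}.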
| 2025 | |
| 2026 | @node Flags for Wordexp |
| 2027 | @subsection Flags for Word Expansion |
| 2028 | |
| 2029 | This section describes the flags that you can specify in the |
| 2030 | @var{flags} argument to @code{wordexp}. Choose the flags you want, |
| 2031 | and combine them with the C operator @code{|}. |
| 2032 | |
| 2033 | @table @code |
| 2034 | @comment wordexp.h |
| 2035 | @comment POSIX.2 |
| 2036 | @item WRDE_APPEND |
| 2037 | Append the words from this expansion to the vector of words produced by |
| 2038 | previous calls to @code{wordexp}. This way you can effectively expand |
| 2039 | several words as if they were concatenated with spaces between them. |
| 2040 | |
| 2041 | In order for appending to work, you must not modify the contents of the |
| 2042 | word vector structure between calls to @code{wordexp}. And, if you set |
| 2043 | @code{WRDE_DOOFFS} in the first call to @code{wordexp}, you must also |
| 2044 | set it when you append to the results. |
| 2045 | |
| 2046 | @comment wordexp.h |
| 2047 | @comment POSIX.2 |
| 2048 | @item WRDE_DOOFFS |
| 2049 | Leave blank slots at the beginning of the vector of words. |
| 2050 | The @code{we_offs} field says how many slots to leave. |
| 2051 | The blank slots contain null pointers. |
| 2052 | |
| 2053 | @comment wordexp.h |
| 2054 | @comment POSIX.2 |
| 2055 | @item WRDE_NOCMD |
| 2056 | Don't do command substitution; if the input requests command substitution, |
| 2057 | report an error. |
| 2058 | |
| 2059 | @comment wordexp.h |
| 2060 | @comment POSIX.2 |
| 2061 | @item WRDE_REUSE |
| 2062 | Reuse a word vector made by a previous call to @code{wordexp}. |
| 2063 | Instead of allocating a new vector of words, this call to @code{wordexp} |
| 2064 | will use the vector that already exists (making it larger if necessary). |
| 2065 | |
| 2066 | Note that the vector may move, so it is not safe to save an old pointer |
| 2067 | and use it again after calling @code{wordexp}. You must fetch |
| 2068 | @code{we_wordv} anew after each call. |
| 2069 | |
| 2070 | @comment wordexp.h |
| 2071 | @comment POSIX.2 |
| 2072 | @item WRDE_SHOWERR |
| 2073 | Do show any error messages printed by commands run by command substitution. |
| 2074 | More precisely, allow these commands to inherit the standard error output |
| 2075 | stream of the current process. By default, @code{wordexp} gives these |
| 2076 | commands a standard error stream that discards all output. |
| 2077 | |
| 2078 | @comment wordexp.h |
| 2079 | @comment POSIX.2 |
| 2080 | @item WRDE_UNDEF |
| 2081 | If the input refers to a shell variable that is not defined, report an |
| 2082 | error. |
| 2083 | @end table |
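
As a rough sketch of how these flags combine, the following two helpers exercise @code{WRDE_DOOFFS} with @code{WRDE_APPEND}, and @code{WRDE_NOCMD} with @code{WRDE_UNDEF}. The names @code{count_appended} and @code{strict_status} are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <wordexp.h>

/* Hypothetical helper: expand FIRST and then SECOND into one vector,
   leaving RESERVED null slots at the front.  Returns the number of
   expanded words (we_wordc), or -1 on error.  */
int
count_appended (const char *first, const char *second, size_t reserved)
{
  wordexp_t result;
  int count = -1;
  int err;

  assert (first != NULL && second != NULL);
  result.we_offs = reserved;
  err = wordexp (first, &result, WRDE_DOOFFS | WRDE_NOCMD);
  if (err != 0)
    {
      if (err == WRDE_NOSPACE)
        wordfree (&result);
      return -1;
    }

  /* WRDE_DOOFFS was set in the first call, so it is repeated here.  */
  if (wordexp (second, &result,
               WRDE_DOOFFS | WRDE_APPEND | WRDE_NOCMD) == 0)
    {
      count = (int) result.we_wordc;
      /* The words start after the reserved slots.  */
      for (size_t i = 0; i < result.we_wordc; i++)
        puts (result.we_wordv[reserved + i]);
    }

  wordfree (&result);
  return count;
}

/* Hypothetical helper: return the wordexp status for WORDS when
   command substitution and undefined variables are both forbidden.  */
int
strict_status (const char *words)
{
  wordexp_t result;
  int err = wordexp (words, &result, WRDE_NOCMD | WRDE_UNDEF);

  if (err == 0 || err == WRDE_NOSPACE)
    wordfree (&result);
  return err;
}
```

With these flags, input that requests command substitution is expected to yield @code{WRDE_CMDSUB}, and a reference to an undefined variable is expected to yield @code{WRDE_BADVAL}.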
| 2084 | |
| 2085 | @node Wordexp Example |
| 2086 | @subsection @code{wordexp} Example |
| 2087 | |
| 2088 | Here is an example of using @code{wordexp} to expand several strings |
| 2089 | and use the results to run a shell command. It also shows the use of |
| 2090 | @code{WRDE_APPEND} to concatenate the expansions and of @code{wordfree} |
| 2091 | to free the space allocated by @code{wordexp}. |
| 2092 | |
| 2093 | @smallexample |
| 2094 | int |
| 2095 | expand_and_execute (const char *program, const char **options) |
| 2096 | @{ |
| 2097 |   wordexp_t result; |
| 2098 |   pid_t pid; |
| 2099 |   int status, i; |
| 2100 | |
| 2101 |   /* @r{Expand the string for the program to run.} */ |
| 2102 |   switch (wordexp (program, &result, 0)) |
| 2103 |     @{ |
| 2104 |     case 0: /* @r{Successful}. */ |
| 2105 |       break; |
| 2106 |     case WRDE_NOSPACE: |
| 2107 |       /* @r{If the error was @code{WRDE_NOSPACE},} |
| 2108 |          @r{then perhaps part of the result was allocated.} */ |
| 2109 |       wordfree (&result); /* @r{Then fall through to the error return.} */ |
| 2110 |     default: /* @r{Some other error.} */ |
| 2111 |       return -1; |
| 2112 |     @} |
| 2113 | |
| 2114 |   /* @r{Expand the strings specified for the arguments.} */ |
| 2115 |   for (i = 0; options[i] != NULL; i++) |
| 2116 |     @{ |
| 2117 |       if (wordexp (options[i], &result, WRDE_APPEND)) |
| 2118 |         @{ |
| 2119 |           wordfree (&result); |
| 2120 |           return -1; |
| 2121 |         @} |
| 2122 |     @} |
| 2123 | |
| 2124 |   pid = fork (); |
| 2125 |   if (pid == 0) |
| 2126 |     @{ |
| 2127 |       /* @r{This is the child process. Execute the command.} */ |
| 2128 |       execv (result.we_wordv[0], result.we_wordv); |
| 2129 |       exit (EXIT_FAILURE); |
| 2130 |     @} |
| 2131 |   else if (pid < 0) |
| 2132 |     /* @r{The fork failed. Report failure.} */ |
| 2133 |     status = -1; |
| 2134 |   else |
| 2135 |     /* @r{This is the parent process. Wait for the child to complete.} */ |
| 2136 |     if (waitpid (pid, &status, 0) != pid) |
| 2137 |       status = -1; |
| 2138 | |
| 2139 |   wordfree (&result); |
| 2140 |   return status; |
| 2141 | @} |
| 2142 | @end smallexample |
| 2143 | |
| 2144 | @node Tilde Expansion |
| 2145 | @subsection Details of Tilde Expansion |
| 2146 | |
| 2147 | It's a standard part of shell syntax that you can use @samp{~} at the |
| 2148 | beginning of a file name to stand for your own home directory. You |
| 2149 | can use @samp{~@var{user}} to stand for @var{user}'s home directory. |
| 2150 | |
| 2151 | @dfn{Tilde expansion} is the process of converting these abbreviations |
| 2152 | to the directory names that they stand for. |
| 2153 | |
| 2154 | Tilde expansion applies to the @samp{~} plus all following characters up |
| 2155 | to whitespace or a slash. It takes place only at the beginning of a |
| 2156 | word, and only if none of the characters to be transformed is quoted in |
| 2157 | any way. |
| 2158 | |
| 2159 | Plain @samp{~} uses the value of the environment variable @code{HOME} |
| 2160 | as the proper home directory name. @samp{~} followed by a user name |
| 2161 | uses @code{getpwnam} to look up that user in the user database, and |
| 2162 | uses whatever directory is recorded there. Thus, @samp{~} followed |
| 2163 | by your own name can give different results from plain @samp{~}, if |
| 2164 | the value of @code{HOME} is not really your home directory. |
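
The effect of @code{HOME} on plain @samp{~} can be observed with a small sketch such as the one below. The helper @code{expand_single} is a hypothetical name introduced only for this illustration; it assumes that the input expands to exactly one word.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: expand WORDS, which is expected to yield exactly
   one word, and return a malloc'd copy of that word.  Returns NULL on
   any failure.  WRDE_NOCMD keeps the sketch from running commands.  */
char *
expand_single (const char *words)
{
  wordexp_t result;
  char *copy = NULL;
  int err;

  assert (words != NULL);
  err = wordexp (words, &result, WRDE_NOCMD);
  if (err != 0)
    {
      if (err == WRDE_NOSPACE)
        wordfree (&result);
      return NULL;
    }

  if (result.we_wordc == 1)
    copy = strdup (result.we_wordv[0]);

  wordfree (&result);
  return copy;
}
```

If @code{HOME} is set to an arbitrary value such as @file{/tmp/example-home}, then @code{expand_single ("~/notes.txt")} should return that directory followed by @file{/notes.txt}, while the quoted form @code{"'~/notes.txt'"} should come back with the tilde unchanged. The caller frees the returned string.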
| 2165 | |
| 2166 | @node Variable Substitution |
| 2167 | @subsection Details of Variable Substitution |
| 2168 | |
| 2169 | Part of ordinary shell syntax is the use of @samp{$@var{variable}} to |
| 2170 | substitute the value of a shell variable into a command. This is called |
| 2171 | @dfn{variable substitution}, and it is one part of doing word expansion. |
| 2172 | |
| 2173 | There are two basic ways you can write a variable reference for |
| 2174 | substitution: |
| 2175 | |
| 2176 | @table @code |
| 2177 | @item $@{@var{variable}@} |
| 2178 | If you write braces around the variable name, then it is completely |
| 2179 | unambiguous where the variable name ends. You can concatenate |
| 2180 | additional letters onto the end of the variable value by writing them |
| 2181 | immediately after the close brace. For example, if the value of |
| 2182 | @code{foo} is @samp{tractor}, then @samp{$@{foo@}s} expands into @samp{tractors}. |
| 2183 | |
| 2184 | @item $@var{variable} |
| 2185 | If you do not put braces around the variable name, then the variable |
| 2186 | name consists of all the alphanumeric characters and underscores that |
| 2187 | follow the @samp{$}. The next punctuation character ends the variable |
| 2188 | name. Thus, @samp{$foo-bar} refers to the variable @code{foo} and expands |
| 2189 | into @samp{tractor-bar}. |
| 2190 | @end table |
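
The two forms of reference can be compared with a sketch like the following, which assumes the environment variable @code{foo} has been set to @samp{tractor} by the caller. The helper @code{expands_to} is a hypothetical name used only here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: return 1 if WORDS expands to exactly one word
   equal to EXPECTED, and 0 otherwise (including on any error).  */
int
expands_to (const char *words, const char *expected)
{
  wordexp_t result;
  int match;
  int err;

  assert (words != NULL && expected != NULL);
  err = wordexp (words, &result, WRDE_NOCMD);
  if (err != 0)
    {
      if (err == WRDE_NOSPACE)
        wordfree (&result);
      return 0;
    }

  match = (result.we_wordc == 1
           && strcmp (result.we_wordv[0], expected) == 0);
  wordfree (&result);
  return match;
}
```

After @code{setenv ("foo", "tractor", 1)}, both @code{expands_to ("$@{foo@}s", "tractors")} and @code{expands_to ("$foo-bar", "tractor-bar")} should return 1. Without the braces, @samp{$foos} names a different variable, @code{foos}, so it does not produce @samp{tractors} unless that variable happens to hold it.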
| 2191 | |
| 2192 | When you use braces, you can also use various constructs to modify the |
| 2193 | value that is substituted, or test it in various ways. |
| 2194 | |
| 2195 | @table @code |
| 2196 | @item $@{@var{variable}:-@var{default}@} |
| 2197 | Substitute the value of @var{variable}, but if that is empty or |
| 2198 | undefined, use @var{default} instead. |
| 2199 | |
| 2200 | @item $@{@var{variable}:=@var{default}@} |
| 2201 | Substitute the value of @var{variable}, but if that is empty or |
| 2202 | undefined, use @var{default} instead and set the variable to |
| 2203 | @var{default}. |
| 2204 | |
| 2205 | @item $@{@var{variable}:?@var{message}@} |
| 2206 | If @var{variable} is defined and not empty, substitute its value. |
| 2207 | |
| 2208 | Otherwise, print @var{message} as an error message on the standard error |
| 2209 | stream, and consider word expansion a failure. |
| 2210 | |
| 2211 | @c ??? How does wordexp report such an error? |
| 2212 | @c WRDE_BADVAL is returned. |
| 2213 | |
| 2214 | @item $@{@var{variable}:+@var{replacement}@} |
| 2215 | Substitute @var{replacement}, but only if @var{variable} is defined and |
| 2216 | nonempty. Otherwise, substitute nothing for this construct. |
| 2217 | @end table |
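
The default and replacement forms can be tried out with a sketch along these lines. The helper @code{expand_into} and the variable name @code{editor} are assumptions made for the illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: expand WORDS and copy the first resulting word
   into BUF, truncating if necessary.  BUF becomes the empty string if
   the expansion produces no words.  Returns the wordexp status.  */
int
expand_into (const char *words, char *buf, size_t size)
{
  wordexp_t result;
  int err;

  assert (words != NULL && buf != NULL && size > 0);
  buf[0] = '\0';

  err = wordexp (words, &result, WRDE_NOCMD);
  if (err == 0)
    {
      if (result.we_wordc > 0)
        {
          strncpy (buf, result.we_wordv[0], size - 1);
          buf[size - 1] = '\0';
        }
      wordfree (&result);
    }
  else if (err == WRDE_NOSPACE)
    wordfree (&result);

  return err;
}
```

While @code{editor} is unset, @samp{$@{editor:-vi@}} should give @samp{vi} and @samp{$@{editor:+custom@}} should give no words at all. Once @code{editor} is set to a nonempty value, the first form gives that value and the second gives @samp{custom}.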
| 2218 | |
| 2219 | @table @code |
| 2220 | @item $@{#@var{variable}@} |
| 2221 | Substitute a numeral which expresses in base ten the number of |
| 2222 | characters in the value of @var{variable}. @samp{$@{#foo@}} stands for |
| 2223 | @samp{7}, because @samp{tractor} is seven characters. |
| 2224 | @end table |
| 2225 | |
| 2226 | These variants of variable substitution let you remove part of the |
| 2227 | variable's value before substituting it. The @var{prefix} and |
| 2228 | @var{suffix} are not mere strings; they are wildcard patterns, just |
| 2229 | like the patterns that you use to match multiple file names. But |
| 2230 | in this context, they match against parts of the variable value |
| 2231 | rather than against file names. |
| 2232 | |
| 2233 | @table @code |
| 2234 | @item $@{@var{variable}%%@var{suffix}@} |
| 2235 | Substitute the value of @var{variable}, but first discard from that |
| 2236 | variable any portion at the end that matches the pattern @var{suffix}. |
| 2237 | |
| 2238 | If there is more than one alternative for how to match against |
| 2239 | @var{suffix}, this construct uses the longest possible match. |
| 2240 | |
| 2241 | Thus, @samp{$@{foo%%r*@}} substitutes @samp{t}, because the largest |
| 2242 | match for @samp{r*} at the end of @samp{tractor} is @samp{ractor}. |
| 2243 | |
| 2244 | @item $@{@var{variable}%@var{suffix}@} |
| 2245 | Substitute the value of @var{variable}, but first discard from that |
| 2246 | variable any portion at the end that matches the pattern @var{suffix}. |
| 2247 | |
| 2248 | If there is more than one alternative for how to match against |
| 2249 | @var{suffix}, this construct uses the shortest possible alternative. |
| 2250 | |
| 2251 | Thus, @samp{$@{foo%r*@}} substitutes @samp{tracto}, because the shortest |
| 2252 | match for @samp{r*} at the end of @samp{tractor} is just @samp{r}. |
| 2253 | |
| 2254 | @item $@{@var{variable}##@var{prefix}@} |
| 2255 | Substitute the value of @var{variable}, but first discard from that |
| 2256 | variable any portion at the beginning that matches the pattern @var{prefix}. |
| 2257 | |
| 2258 | If there is more than one alternative for how to match against |
| 2259 | @var{prefix}, this construct uses the longest possible match. |
| 2260 | |
| 2261 | Thus, @samp{$@{foo##*t@}} substitutes @samp{or}, because the largest |
| 2262 | match for @samp{*t} at the beginning of @samp{tractor} is @samp{tract}. |
| 2263 | |
| 2264 | @item $@{@var{variable}#@var{prefix}@} |
| 2265 | Substitute the value of @var{variable}, but first discard from that |
| 2266 | variable any portion at the beginning that matches the pattern @var{prefix}. |
| 2267 | |
| 2268 | If there is more than one alternative for how to match against |
| 2269 | @var{prefix}, this construct uses the shortest possible alternative. |
| 2270 | |
| 2271 | Thus, @samp{$@{foo#*t@}} substitutes @samp{ractor}, because the shortest |
| 2272 | match for @samp{*t} at the beginning of @samp{tractor} is just @samp{t}. |
| 2273 | |
| 2274 | @end table |
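
The length and pattern-removal examples above can be checked mechanically with a sketch like this one. The function name @code{count_matching_examples} and the table layout are hypothetical; the expected strings are the ones given in the text for a value of @samp{tractor}.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

struct example
{
  const char *expr;      /* Input handed to wordexp.  */
  const char *expected;  /* Result stated in the text for "tractor".  */
};

/* Hypothetical checker: set the variable foo to VALUE, expand each
   example, and return how many produce the expected single word.
   Returns -1 if the variable cannot be set.  */
int
count_matching_examples (const char *value)
{
  static const struct example examples[] = {
    { "${#foo}", "7" },
    { "${foo%%r*}", "t" },
    { "${foo%r*}", "tracto" },
    { "${foo##*t}", "or" },
    { "${foo#*t}", "ractor" },
  };
  size_t n = sizeof examples / sizeof examples[0];
  int matches = 0;

  assert (value != NULL);
  if (setenv ("foo", value, 1) != 0)
    return -1;

  for (size_t i = 0; i < n; i++)
    {
      wordexp_t result;
      int err = wordexp (examples[i].expr, &result, WRDE_NOCMD);

      if (err != 0)
        {
          if (err == WRDE_NOSPACE)
            wordfree (&result);
          continue;
        }

      if (result.we_wordc == 1)
        {
          printf ("%s -> %s\n", examples[i].expr, result.we_wordv[0]);
          if (strcmp (result.we_wordv[0], examples[i].expected) == 0)
            matches++;
        }
      wordfree (&result);
    }

  return matches;
}
```

Calling @code{count_matching_examples ("tractor")} should report that all five examples match. With any other value the expected strings in the table no longer apply.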