@node Message Translation, Searching and Sorting, Locales, Top
@c %MENU% How to make the program speak the user's language
@chapter Message Translation

The program's interface with the user should be designed to ease the user's
task.  One way to ease the user's task is to use messages in whatever
language the user prefers.

Printing messages in different languages can be implemented in different
ways.  One could add all the different languages in the source code and
choose among the variants every time a message has to be printed.  This is
certainly not a good solution since extending the set of languages is
cumbersome (the code must be changed) and the code itself can become
really big with dozens of message sets.

A better solution is to keep the message sets for each language
in separate files which are loaded at runtime depending on the language
selection of the user.

@Theglibc{} provides two different sets of functions to support
message translation.  The problem is that neither of the interfaces is
officially defined by the POSIX standard.  The @code{catgets} family of
functions is defined in the X/Open standard, but this interface is derived
from industry decisions and therefore not necessarily based on reasonable
decisions.

As mentioned above, the message catalog handling provides easy
extensibility by using external data files which contain the message
translations.  That is, for each message used in the program these files
contain a translation into the appropriate language.  So the tasks
of the message handling functions are to

@itemize @bullet
@item
locate the external data file with the appropriate translations
@item
load the data and make it possible to address the messages
@item
map a given key to the translated message
@end itemize

The two approaches mainly differ in the implementation of this last
step.  Decisions made in the last step influence the rest of the design.

@menu
* Message catalogs a la X/Open::  The @code{catgets} family of functions.
* The Uniforum approach::         The @code{gettext} family of functions.
@end menu


@node Message catalogs a la X/Open
@section X/Open Message Catalog Handling

The @code{catgets} functions are based on a simple scheme:

@quotation
Associate every message to translate in the source code with a unique
identifier.  Only this identifier is used to retrieve a message from a
catalog file.
@end quotation

This means that the author of the program has to make sure that the
meaning of each identifier is always the same in the program code and
in the message catalogs.

Before a message can be translated the catalog file must be located.
The user of the program must be able to guide the responsible function
to find whatever catalog the user wants.  This choice is independent of
what the programmer had in mind.

All the types, constants and functions for the @code{catgets} functions
are defined/declared in the @file{nl_types.h} header file.

@menu
* The catgets Functions::      The @code{catgets} function family.
* The message catalog files::  Format of the message catalog files.
* The gencat program::         How to generate message catalog files which
                               can be used by the functions.
* Common Usage::               How to use the @code{catgets} interface.
@end menu


@node The catgets Functions
@subsection The @code{catgets} function family

@comment nl_types.h
@comment X/Open
@deftypefun nl_catd catopen (const char *@var{cat_name}, int @var{flag})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c catopen @mtsenv @ascuheap @acsmem
@c  strchr ok
@c  setlocale(,NULL) ok
@c  getenv @mtsenv
@c  strlen ok
@c  alloca ok
@c  stpcpy ok
@c  malloc @ascuheap @acsmem
@c  __open_catalog @ascuheap @acsmem
@c   strchr ok
@c   open_not_cancel_2 @acsfd
@c   strlen ok
@c   ENOUGH ok
@c    alloca ok
@c    memcpy ok
@c   fxstat64 ok
@c   __set_errno ok
@c   mmap @acsmem
@c   malloc dup @ascuheap @acsmem
@c   read_not_cancel ok
@c   free dup @ascuheap @acsmem
@c   munmap ok
@c   close_not_cancel_no_status ok
@c  free @ascuheap @acsmem
The @code{catopen} function tries to locate the message data file named
@var{cat_name} and loads it when found.  The return value is of an
opaque type and can be used in calls to the other functions to refer to
this loaded catalog.

The return value is @code{(nl_catd) -1} in case the function failed and
no catalog was loaded.  The global variable @var{errno} contains a code
for the error causing the failure.  But even if the function call
succeeded this does not mean that all messages can be translated.

Locating the catalog file must happen in a way which lets the user of
the program influence the decision.  It is up to the user to decide
about the language to use and sometimes it is useful to use alternate
catalog files.  All this can be specified by the user by setting some
environment variables.

The first problem is to find out where all the message catalogs are
stored.  Every program could have its own place to keep all the
different files but usually the catalog files are grouped by languages
and the catalogs for all programs are kept in the same place.

@cindex NLSPATH environment variable
To tell the @code{catopen} function where the catalog for the program
can be found the user can set the environment variable @code{NLSPATH} to
a value which describes her/his choice.  Since this value must be usable
for different languages and locales it cannot be a simple string.
Instead it is a format string (similar to @code{printf}'s).  An example
is

@smallexample
/usr/share/locale/%L/%N:/usr/share/locale/%L/LC_MESSAGES/%N
@end smallexample

First one can see that more than one directory can be specified (with
the usual syntax of separating them by colons).  The next things to
observe are the format elements, @code{%L} and @code{%N} in this case.
The @code{catopen} function knows about several of them, and each one
is replaced by a different value:

@table @code
@item %N
This format element is substituted with the name of the catalog file.
This is the value of the @var{cat_name} argument given to
@code{catopen}.

@item %L
This format element is substituted with the name of the currently
selected locale for translating messages.  How this is determined is
explained below.

@item %l
(This is the lowercase ell.) This format element is substituted with the
language element of the locale name.  The string describing the selected
locale is expected to have the form
@code{@var{lang}[_@var{terr}[.@var{codeset}]]} and this format uses the
first part @var{lang}.

@item %t
This format element is substituted by the territory part @var{terr} of
the name of the currently selected locale.  See the explanation of the
format above.

@item %c
This format element is substituted by the codeset part @var{codeset} of
the name of the currently selected locale.  See the explanation of the
format above.

@item %%
Since @code{%} is used as a meta character there must be a way to
express the @code{%} character in the result itself.  Using @code{%%}
does this just like it works for @code{printf}.
@end table


Using @code{NLSPATH} allows arbitrary directories to be searched for
message catalogs while still allowing different languages to be used.
If the @code{NLSPATH} environment variable is not set, the default value
is

@smallexample
@var{prefix}/share/locale/%L/%N:@var{prefix}/share/locale/%L/LC_MESSAGES/%N
@end smallexample

@noindent
where @var{prefix} is given to @code{configure} while installing @theglibc{}
(this value is in many cases @code{/usr} or the empty string).

The remaining problem is to decide which locale name must be used, since
this value determines the substitution of the format elements mentioned
above.  First of all the user can specify a path in the message catalog
name (i.e., the name contains a slash character).  In this situation the
@code{NLSPATH} environment variable is not used.  The catalog must exist
as specified in the program, perhaps relative to the current working
directory.  This situation is not desirable and catalog names should
never be written this way.  Besides this, this behavior is not portable
to all other platforms providing the @code{catgets} interface.

@cindex LC_ALL environment variable
@cindex LC_MESSAGES environment variable
@cindex LANG environment variable
Otherwise the values of environment variables from the standard
environment are examined (@pxref{Standard Environment}).  Which
variables are examined is decided by the @var{flag} parameter of
@code{catopen}.  If the value is @code{NL_CAT_LOCALE} (which is defined
in @file{nl_types.h}) then the @code{catopen} function uses the name of
the locale currently selected for the @code{LC_MESSAGES} category.

If @var{flag} is zero the @code{LANG} environment variable is examined.
This is a left-over from the early days when the concept of locales
had not even reached the level of POSIX locales.

The environment variable and the locale name should have a value of the
form @code{@var{lang}[_@var{terr}[.@var{codeset}]]} as explained above.
If no environment variable is set the @code{"C"} locale is used, which
prevents any translation.

Once the catalog is open, messages are retrieved with the @code{catgets}
function described below, and its return value is in any case a valid
string.  Either it is a translation from the message catalog or it is
the same as the @var{string} parameter passed to @code{catgets}.  So a
piece of code which decides whether a translation actually happened
must look like this:

@smallexample
@{
  char *trans = catgets (desc, set, msg, input_string);
  if (trans == input_string)
    @{
      /* No translation was found.  */
    @}
@}
@end smallexample

@noindent
When an error occurs, @code{catgets} sets the global variable
@var{errno} to one of the following values:

@table @var
@item EBADF
The catalog does not exist.
@item ENOMSG
The set/message tuple does not name an existing element in the
message catalog.
@end table

While it is sometimes useful to test for errors, programs normally
avoid any test.  If the translation is not available it is no big
problem if the original, untranslated message is printed.  Either the
user understands this as well or s/he will look for the reason why the
messages are not translated.
@end deftypefun

Please note that when @var{flag} is zero the locale name does not depend
on a call to the @code{setlocale} function: @code{catopen} reads the
@code{LANG} environment variable directly, so the locale data files for
this locale need not exist and calling @code{setlocale} need not succeed.
With @code{NL_CAT_LOCALE}, however, the locale currently selected for the
@code{LC_MESSAGES} category is used, so the program normally has to call
@code{setlocale} first.


@deftypefun {char *} catgets (nl_catd @var{catalog_desc}, int @var{set}, int @var{message}, const char *@var{string})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{catgets} is used to access the message catalog
previously opened using the @code{catopen} function.  The
@var{catalog_desc} parameter must be a value previously returned by
@code{catopen}.

The next two parameters, @var{set} and @var{message}, reflect the
internal organization of the message catalog files.  This will be
explained in detail below.  For now it is enough to know that a
catalog can consist of several sets and that the messages in each set
are individually numbered.  Neither the set numbers nor the message
numbers need to be consecutive; they can be arbitrarily chosen.
But each message (unless equal to another one) must have its own unique
pair of set and message number.

Since it is not guaranteed that the message catalog for the language
selected by the user exists, the last parameter @var{string} helps to
handle this case gracefully.  If no matching string can be found,
@var{string} is returned.  This means for the programmer that

@itemize @bullet
@item
the @var{string} parameters should contain reasonable text (this also
helps to understand the program, since otherwise there would be no hint
about the string which is expected to be returned).
@item
all @var{string} arguments should be written in the same language.
@end itemize
@end deftypefun

It is somewhat uncomfortable to write a program using the @code{catgets}
functions if no supporting functionality is available.  Since each
set/message number tuple must be unique, the programmer must keep lists
of the messages at the same time the code is written.  And the work
between several people working on the same project must be coordinated.
We will see how these problems can be relaxed a bit (@pxref{Common
Usage}).

@deftypefun int catclose (nl_catd @var{catalog_desc})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
@c catclose @ascuheap @acucorrupt @acsmem
@c  __set_errno ok
@c  munmap ok
@c  free @ascuheap @acsmem
The @code{catclose} function can be used to free the resources
associated with a message catalog which previously was opened by a call
to @code{catopen}.  If the resources can be successfully freed the
function returns @code{0}.  Otherwise it returns @code{@minus{}1} and the
global variable @var{errno} is set.  Errors can occur if the catalog
descriptor @var{catalog_desc} is not valid, in which case @var{errno} is
set to @code{EBADF}.
@end deftypefun


@node The message catalog files
@subsection Format of the message catalog files

The only reasonable way to translate all the messages of a program and
store the result in a message catalog file which can be read by the
@code{catopen} function is to give all the message texts to the
translator and let her/him translate them all.  I.e., we must have a
file with entries which associate the set/message tuple with a specific
translation.  This file format is specified in the X/Open standard and
is as follows:

@itemize @bullet
@item
Lines containing only whitespace characters or empty lines are ignored.

@item
Lines which contain as the first non-whitespace character a @code{$}
followed by a whitespace character are comments and are also ignored.

@item
If a line contains as the first non-whitespace characters the sequence
@code{$set} followed by a whitespace character an additional argument
is required to follow.  This argument can either be:

@itemize @minus
@item
a number.  In this case the value of this number determines the set
to which the following messages are added.

@item
an identifier consisting of alphanumeric characters plus the underscore
character.  In this case the set is automatically assigned a number.
This value is one added to the largest set number which so far appeared.

How to use the symbolic names is explained in section @ref{Common Usage}.

It is an error if a symbol name appears more than once.  All following
messages are placed in a set with this number.
@end itemize

@item
If a line contains as the first non-whitespace characters the sequence
@code{$delset} followed by a whitespace character an additional argument
is required to follow.  This argument can either be:

@itemize @minus
@item
a number.  In this case the value of this number determines the set
which will be deleted.

@item
an identifier consisting of alphanumeric characters plus the underscore
character.  This symbolic identifier must match a name for a set which
previously was defined.  It is an error if the name is unknown.
@end itemize

In both cases all messages in the specified set will be removed.  They
will not appear in the output.  But if this set is later selected again
with a @code{$set} command, messages can be added again and these
messages will appear in the output.

@item
If a line contains after leading whitespace the sequence
@code{$quote}, the quoting character used for this input file is
changed to the first non-whitespace character following the
@code{$quote}.  If no non-whitespace character is present before the
line ends, quoting is disabled.

By default no quoting character is used.  In this mode strings are
terminated with the first unescaped line break.  If there is a
@code{$quote} sequence present, newlines need not be escaped.  Instead a
string is terminated with the first unescaped appearance of the quote
character.

A common usage of this feature would be to set the quote character to
@code{"}.  Then any appearance of the @code{"} in the strings must
be escaped using the backslash (i.e., @code{\"} must be written).

@item
Any other line must start with a number or an alphanumeric identifier
(with the underscore character included).  The following characters
(starting after the first whitespace character) will form the string
which gets associated with the currently selected set and the message
number represented by the number and identifier respectively.

If the start of the line is a number the message number is obvious.  It
is an error if the same message number already appeared for this set.

If the leading token was an identifier the message number gets
automatically assigned.  The value is the current maximum message
number for this set plus one.  It is an error if the identifier was
already used for a message in this set.  It is OK to reuse the
identifier for a message in another set.  How to use the symbolic
identifiers will be explained below (@pxref{Common Usage}).  There is
one limitation with the identifier: it must not be @code{Set}.  The
reason will be explained below.

The text of the messages can contain escape characters.  The usual bunch
of characters known from the @w{ISO C} language are recognized
(@code{\n}, @code{\t}, @code{\v}, @code{\b}, @code{\r}, @code{\f},
@code{\\}, and @code{\@var{nnn}}, where @var{nnn} is the octal coding of
a character code).
@end itemize

@strong{Important:} The handling of identifiers instead of numbers for
the set and messages is a GNU extension.  Systems strictly following the
X/Open specification do not have this feature.  An example of a message
catalog file is this:

@smallexample
$ This is a leading comment.
$quote "

$set SetOne
1 Message with ID 1.
two "   Message with ID \"two\", which gets the value 2 assigned"

$set SetTwo
$ Since the last set got the number 1 assigned this set has number 2.
4000 "The numbers can be arbitrary, they need not start at one."
@end smallexample

This small example shows various aspects:
@itemize @bullet
@item
Lines 1 and 9 are comments since they start with @code{$} followed by
a whitespace.
@item
The quoting character is set to @code{"}.  Otherwise the quotes in the
message definition would have to be omitted and in this case the
message with the identifier @code{two} would lose its leading whitespace.
@item
Mixing numbered messages with messages having symbolic names is no
problem and the numbering happens automatically.
@end itemize


While this file format is pretty easy it is not the best possible for
use in a running program.  The @code{catopen} function would have to
parse the file and handle syntactic errors gracefully.  This is not so
easy and the whole process is pretty slow.  Therefore the @code{catgets}
functions expect the data in another more compact and ready-to-use file
format.  There is a special program @code{gencat} which is explained in
detail in the next section.

Files in this other format are not human readable.  To be easy to use by
programs it is a binary file.  But the format is byte order independent
so translation files can be shared by systems of arbitrary architecture
(as long as they use @theglibc{}).

Details about the binary file format are not important to know since
these files are always created by the @code{gencat} program.  The
sources of @theglibc{} also provide the sources for the
@code{gencat} program and so the interested reader can look through
these source files to learn about the file format.


@node The gencat program
@subsection Generate Message Catalog Files

@cindex gencat
The @code{gencat} program is specified in the X/Open standard and the
GNU implementation follows this specification and so processes
all correctly formed input files.  Additionally some extensions are
implemented which help to work in a more reasonable way with the
@code{catgets} functions.

The @code{gencat} program can be invoked in two ways:

@smallexample
gencat [@var{Option}]@dots{} [@var{Output-File} [@var{Input-File}]@dots{}]
@end smallexample

This is the interface defined in the X/Open standard.  If no
@var{Input-File} parameter is given, input will be read from standard
input.  Multiple input files will be read as if they are concatenated.
If @var{Output-File} is also missing, the output will be written to
standard output.  To provide the interface familiar from other
programs, a second interface is provided:

@smallexample
gencat [@var{Option}]@dots{} -o @var{Output-File} [@var{Input-File}]@dots{}
@end smallexample

The option @samp{-o} is used to specify the output file and all file
arguments are used as input files.

Besides this, one can use @file{-} or @file{/dev/stdin} for
@var{Input-File} to denote the standard input.  Correspondingly, one can
use @file{-} or @file{/dev/stdout} for @var{Output-File} to denote
standard output.  Using @file{-} as a file name is allowed in X/Open
while using the device names is a GNU extension.

The @code{gencat} program works by concatenating all input files and
then @strong{merging} the resulting collection of message sets with a
possibly existing output file.  This is done by removing all messages
with set/message number tuples matching any of the generated messages
from the output file and then adding all the new messages.  To
regenerate a catalog file while ignoring the old contents therefore
requires removing the output file if it exists.  If the output is
written to standard output no merging takes place.

@noindent
The following table shows the options understood by the @code{gencat}
program.  The X/Open standard does not specify any option for the
program so all of these are GNU extensions.

@table @samp
@item -V
@itemx --version
Print the version information and exit.
@item -h
@itemx --help
Print a usage message listing all available options, then exit successfully.
@item --new
Never merge the new messages from the input files with the old content
of the output files.  The old content of the output file is discarded.
@item -H
@itemx --header=@var{name}
This option is used to emit the symbolic names given to sets and
messages in the input files for use in the program.  Details about how
to use this are given in the next section.  The @var{name} parameter to
this option specifies the name of the output file.  It will contain a
number of C preprocessor @code{#define}s to associate a name with a
number.

Please note that the generated file only contains the symbols from the
input files.  If the output is merged with the previous content of the
output file, the possibly existing symbols from the file(s) which
generated the old output files are not in the generated header file.
@end table


@node Common Usage
@subsection How to use the @code{catgets} interface

The @code{catgets} functions can be used in two different ways: by
strictly following the X/Open specification without relying on the
extensions, or by using the GNU extensions.  We will take a look at the
former method first to understand the benefits of the extensions.

@subsubsection Not using symbolic names

Since the X/Open format of the message catalog files does not allow
symbol names we have to work with numbers all the time.  When we start
writing a program we have to replace all appearances of translatable
strings with something like

@smallexample
catgets (catdesc, set, msg, "string")
@end smallexample

@noindent
@var{catdesc} is retrieved from a call to @code{catopen} which is
normally done once at the program start.  The @code{"string"} is the
string we want to translate.  The problems start with the set and
message numbers.

In a bigger program several programmers usually work on the program at
the same time, so coordinating the allocation of numbers is crucial.
While no two different strings may be indexed by the same pair of set
and message numbers, it is highly desirable to reuse the numbers for
equal strings with equal translations (please note that there might be
strings which are equal in one language but have different translations
due to different contexts).

The allocation process can be relaxed a bit by using different set
numbers for different parts of the program, which reduces the number of
developers who have to coordinate the allocation.  But lists must still
be kept to track the allocation, and errors can easily happen.  These
errors cannot be discovered by the compiler or by the @code{catgets}
functions.  Only the user of the program might see wrong messages
printed.  In the worst case the wrong messages look so plausible that
they are not recognized as wrong.  Think about the translations for
@code{"true"} and @code{"false"} being exchanged.  This could result in
a disaster.
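
The following minimal sketch illustrates the problem.  It assumes
hand-allocated numbers; @code{SET_BOOL}, @code{MSG_TRUE},
@code{MSG_FALSE} and @code{bool_string} are hypothetical names, not part
of any interface.  Only these numbers tie the code to the catalog, and
nothing checks that they agree with the catalog source file.

```c
#include <nl_types.h>

/* Hypothetical hand-allocated numbers.  Nothing checks that they agree
   with the $set and message numbers in the catalog source file.  */
#define SET_BOOL  1
#define MSG_TRUE  1
#define MSG_FALSE 2

/* Return the (possibly translated) word for VALUE.  If MSG_TRUE and
   MSG_FALSE were swapped in the catalog, users would silently get the
   wrong word.  */
static const char *
bool_string (nl_catd catdesc, int value)
{
  /* The last argument is returned if no translation is found.  */
  return catgets (catdesc, SET_BOOL, value ? MSG_TRUE : MSG_FALSE,
                  value ? "true" : "false");
}
```

If the catalog cannot be opened, @code{bool_string} still returns the
untranslated default, so a mismatch between these numbers and the
catalog is never reported by the library.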


@subsubsection Using symbolic names

The problems mentioned in the last section derive from the fact that:

@enumerate
@item
the numbers are allocated once, and because they may be used in many
places it is difficult to change a number later.
@item
the numbers give no hint about the string they refer to, and therefore
collisions can easily happen.
@end enumerate

By consistently using symbolic names and by providing a method which
maps the string content to a symbolic name (however this is done), one
can prevent both problems above.  The cost of this is that the
programmer has to write a complete message catalog file while s/he is
writing the program itself.

This is necessary since the symbolic names must be mapped to numbers
before the program sources can be compiled.  The last section described
how to generate a header containing the mapping of the names.  E.g.,
for the example message file given in the last section we could call
the @code{gencat} program as follows (assuming @file{ex.msg} contains
the sources):

@smallexample
gencat -H ex.h -o ex.cat ex.msg
@end smallexample

@noindent
This generates a header file with the following content:

@smallexample
#define SetTwoSet 0x2   /* ex.msg:8 */

#define SetOneSet 0x1   /* ex.msg:4 */
#define SetOnetwo 0x2   /* ex.msg:6 */
@end smallexample

As can be seen, the various symbols given in the source file are mangled
to generate unique identifiers, and these identifiers get numbers
assigned.  Reading the source file and knowing the rules makes it
possible to predict the content of the header file (it is
deterministic), but this is not necessary.  The @code{gencat} program
takes care of everything.  All the programmer has to do is to put the
generated header file in the dependency list of the source files of
her/his project and to add a rule to regenerate the header if any of
the input files change.

One word about the symbol mangling.  Every symbol consists of two parts:
the name of the message set plus the name of the message or the special
string @code{Set}.  So @code{SetOnetwo} means this macro can be used to
access the translation with identifier @code{two} in the message set
@code{SetOne}.

The other names denote the names of the message sets.  The special
string @code{Set} is used in the place of the message identifier.
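
As an illustration of the naming rule only (this is not code taken
from @code{gencat}), the hypothetical helper @code{mangled_name} below
concatenates the set name with either the message identifier or the
string @code{Set}:

```c
#include <stdio.h>

/* Illustration only, not gencat's implementation: build the macro name
   for message MSGID in set SETNAME, or for the set itself when MSGID is
   NULL.  Returns 0 on success, -1 if BUF is too small.  */
static int
mangled_name (char *buf, size_t size, const char *setname,
              const char *msgid)
{
  int n = snprintf (buf, size, "%s%s", setname,
                    msgid != NULL ? msgid : "Set");
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

For the set @code{SetOne} this yields @code{SetOneSet} for the set
itself and @code{SetOnetwo} for the message @code{two}, matching the
generated header shown above.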

If in the code the second string of the set @code{SetOne} is used the C
code should look like this:

@smallexample
catgets (catdesc, SetOneSet, SetOnetwo,
         "   Message with ID \"two\", which gets the value 2 assigned")
@end smallexample

Writing the function call this way makes it possible to change the
message number and even the set number without requiring any change in
the C source code.  (The text of the string is normally not the same;
this is only for this example.)


@subsubsection How this simplifies development

To illustrate the usual way to work with the symbolic names, here is a
little example.  Assume we want to write the very complex and famous
greeting program.  We start by writing the code as usual:

@smallexample
#include <stdio.h>
int
main (void)
@{
  printf ("Hello, world!\n");
  return 0;
@}
@end smallexample

Now we want to internationalize the message and therefore replace the
message with whatever the user wants.

@smallexample
#include <nl_types.h>
#include <stdio.h>
#include "msgnrs.h"
int
main (void)
@{
  nl_catd catdesc = catopen ("hello.cat", NL_CAT_LOCALE);
  printf (catgets (catdesc, SetMainSet, SetMainHello,
                   "Hello, world!\n"));
  catclose (catdesc);
  return 0;
@}
@end smallexample

We see how the catalog object is opened and the returned descriptor is
used in the other function calls.  It is not really necessary to check
for failure of any of the functions since even in these situations the
functions behave reasonably: if no catalog could be opened,
@code{catgets} simply returns the untranslated default string.
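
A minimal sketch of an optional check, assuming a hypothetical helper
named @code{open_catalog}: it prints a note when @code{catopen} returns
the failure value @code{(nl_catd) -1}, but the program keeps working
because @code{catgets} then returns its default string.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: open catalog NAME and print a note if it is
   unavailable.  The check is optional, because catgets also accepts
   the failure value (nl_catd) -1 and then returns its default.  */
static nl_catd
open_catalog (const char *name)
{
  nl_catd catdesc = catopen (name, NL_CAT_LOCALE);
  if (catdesc == (nl_catd) -1)
    fprintf (stderr, "note: message catalog %s not found, "
             "using built-in messages\n", name);
  return catdesc;
}
```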

What remains unspecified here are the constants @code{SetMainSet} and
@code{SetMainHello}.  These are the symbolic names describing the
message.  To get the actual definitions which match the information in
the catalog file we have to create the message catalog source file and
process it using the @code{gencat} program.

@smallexample
$ Messages for the famous greeting program.
$quote "

$set SetMain
Hello "Hallo, Welt!\n"
@end smallexample

Now we can start building the program (assume the message catalog source
file is named @file{hello.msg} and the program source file @file{hello.c}):

@smallexample
% gencat -H msgnrs.h -o hello.cat hello.msg
% cat msgnrs.h
#define SetMainSet 0x1     /* hello.msg:4 */
#define SetMainHello 0x1   /* hello.msg:5 */
% gcc -o hello hello.c -I.
% cp hello.cat /usr/share/locale/de/LC_MESSAGES
% echo $LC_ALL
de
% ./hello
Hallo, Welt!
%
@end smallexample

The call of the @code{gencat} program creates the missing header file
@file{msgnrs.h} as well as the message catalog binary.  The former is
used in the compilation of @file{hello.c} while the latter is placed in
a directory in which the @code{catopen} function will try to locate it.
Please check the @code{LC_ALL} environment variable and the default path
for @code{catopen} presented in the description above.


@node The Uniforum approach
@section The Uniforum approach to Message Translation

Sun Microsystems tried to standardize a different approach to message
translation in the Uniforum group.  There never was a real standard
defined, but the interface was still used in Sun's operating systems.
Since this approach fits better in the development process of free
software, it is also used throughout the GNU project, and the GNU
@file{gettext} package provides support for it outside @theglibc{}.

The code of the @file{libintl} from GNU @file{gettext} is the same as
the code in @theglibc{}.  So the documentation in the GNU
@file{gettext} manual is also valid for the functionality here.  The
following text describes the library functions in detail, but the
numerous helper programs are not described in this manual.  Instead,
readers should consult the GNU @file{gettext} manual
(@pxref{Top,,GNU gettext utilities,gettext,Native Language Support Library and Tools}).
We will only give a short overview.

Though the @code{catgets} functions are available by default on more
systems, the @code{gettext} interface is at least as portable as the
former.  The GNU @file{gettext} package can be used wherever the
functions are not available.


@menu
* Message catalogs with gettext::  The @code{gettext} family of functions.
* Helper programs for gettext::    Programs to handle message catalogs
                                    for @code{gettext}.
@end menu


@node Message catalogs with gettext
@subsection The @code{gettext} family of functions

The paradigm underlying the @code{gettext} approach to message
translation is different from that of the @code{catgets} functions, but
the basic functionality is equivalent.  There are functions of the
following categories:

@menu
* Translation with gettext::       What has to be done to translate a message.
* Locating gettext catalog::       How to determine which catalog to be used.
* Advanced gettext functions::     Additional functions for more complicated
                                    situations.
* Charset conversion in gettext::  How to specify the output character set
                                    @code{gettext} uses.
* GUI program problems::           How to use @code{gettext} in GUI programs.
* Using gettextized software::     The possibilities of the user to influence
                                    the way @code{gettext} works.
@end menu

@node Translation with gettext
@subsubsection What has to be done to translate a message?

The @code{gettext} functions have a very simple interface.  The most
basic function just takes the string which shall be translated as the
argument and it returns the translation.  This is fundamentally
different from the @code{catgets} approach where an extra key is
necessary and the original string is only used for the error case.

If the string which has to be translated is the only argument, this of
course means the string itself is the key.  I.e., the translation will
be selected based on the original string.  The message catalogs must
therefore contain the original strings plus one translation for any such
string.  The task of the @code{gettext} function is to compare the
argument string with the available strings in the catalog and return the
appropriate translation.  Of course this lookup is optimized so that it
is not more expensive than an access using an atomic key like in
@code{catgets}.

The @code{gettext} approach has some advantages but also some
disadvantages.  Please see the GNU @file{gettext} manual for a detailed
discussion of the pros and cons.

All the definitions and declarations for @code{gettext} can be found in
the @file{libintl.h} header file.  On systems where these functions are
not part of the C library they can be found in a separate library named
@file{libintl.a} (or the corresponding shared library).
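
Because the original string is the key, call sites only need to wrap
the string.  The sketch below uses the @code{_} shorthand macro, a
convention described in the GNU @file{gettext} manual rather than
something @file{libintl.h} itself defines; @code{greet} is a
hypothetical function.

```c
#include <libintl.h>
#include <stdio.h>

/* Conventional shorthand (a convention, not part of libintl.h): it
   keeps call sites short and marks translatable strings for
   extraction tools.  */
#define _(String) gettext (String)

static void
greet (void)
{
  /* The original English text is both the lookup key and the fallback.  */
  fputs (_("Hello, world!\n"), stdout);
}
```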

@comment libintl.h
@comment GNU
@deftypefun {char *} gettext (const char *@var{msgid})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Wrapper for dcgettext.
The @code{gettext} function searches the currently selected message
catalogs for a string which is equal to @var{msgid}.  If there is such a
string available, its translation is returned.  Otherwise the argument
string @var{msgid} is returned.

Please note that although the return value is @code{char *} the
returned string must not be changed.  This broken type results from the
history of the function and does not reflect the way the function should
be used.

Please note that above we wrote ``message catalogs'' (plural).  This is
a specialty of the GNU implementation of these functions and we will
say more about this when we talk about the ways message catalogs are
selected (@pxref{Locating gettext catalog}).

The @code{gettext} function does not modify the value of the global
@var{errno} variable.  This is necessary to make it possible to write
something like

@smallexample
  printf (gettext ("Operation failed: %m\n"));
@end smallexample

Here the @var{errno} value is used in the @code{printf} function while
processing the @code{%m} format element, and if the @code{gettext}
function changed this value (it is called before @code{printf} is
called) we would get a wrong message.

So there is no easy way to detect a missing message catalog besides
comparing the argument string with the result.  But it is normally the
task of the user to react to missing catalogs.  The program cannot guess
when a message catalog is really necessary, since a user who speaks the
language the program was developed in does not need any translation.
@end deftypefun
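
The two properties described above can be checked with a small sketch.
It assumes the GNU behavior that @code{gettext} returns its argument
pointer unchanged when no translation exists; @code{has_translation}
and @code{gettext_keeps_errno} are hypothetical helper names.

```c
#include <errno.h>
#include <libintl.h>
#include <stdbool.h>

/* Hypothetical helper: true if gettext found a translation for MSGID.
   It relies on gettext returning its argument pointer unchanged when no
   translation is available.  */
static bool
has_translation (const char *msgid)
{
  return gettext (msgid) != msgid;
}

/* Hypothetical check that gettext leaves errno alone, which is what
   makes the %m example above work.  */
static bool
gettext_keeps_errno (const char *msgid)
{
  errno = ENOENT;
  (void) gettext (msgid);
  return errno == ENOENT;
}
```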

The remaining two functions to access the message catalog add some
functionality to select a message catalog which is not the default one.
This is important if parts of the program are developed independently.
Every part can have its own message catalog and all of them can be used
at the same time.  The C library itself is an example: internally it
uses the @code{gettext} functions, but since it must not depend on the
currently selected default message catalog it must always specify its
own domain explicitly.

@comment libintl.h
@comment GNU
@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Wrapper for dcgettext.
The @code{dgettext} function acts just like the @code{gettext}
function.  It only takes an additional first argument @var{domainname}
which guides the selection of the message catalogs which are searched
for the translation.  If the @var{domainname} parameter is the null
pointer, the @code{dgettext} function is exactly equivalent to
@code{gettext} since the default value for the domain name is used.

As for @code{gettext} the return value type is @code{char *} which is an
anachronism.  The returned string must never be modified.
@end deftypefun
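
A minimal sketch of how an independently developed library might use
its own domain, assuming a hypothetical domain name @code{mylib} and a
hypothetical helper @code{mylib_message}:

```c
#include <libintl.h>

/* Hypothetical domain of an independently developed library.  */
#define MYLIB_DOMAIN "mylib"

/* Look up MSGID in the library's own catalogs, independent of whatever
   default domain the application selected with textdomain.  */
static const char *
mylib_message (const char *msgid)
{
  return dgettext (MYLIB_DOMAIN, msgid);
}
```

Because the domain is named explicitly, the application can switch its
own default domain without affecting the library's messages.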

@comment libintl.h
@comment GNU
@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c dcgettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c  dcigettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c   libc_rwlock_rdlock @asulock @aculock
@c   current_locale_name ok [protected from @mtslocale]
@c   tfind ok
@c   libc_rwlock_unlock ok
@c   plural_lookup ok
@c    plural_eval ok
@c    rawmemchr ok
@c   DETERMINE_SECURE ok, nothing
@c   strcmp ok
@c   strlen ok
@c   getcwd @ascuheap @acsmem @acsfd
@c   strchr ok
@c   stpcpy ok
@c   category_to_name ok
@c   guess_category_value @mtsenv
@c    getenv @mtsenv
@c    current_locale_name dup ok [protected from @mtslocale by dcigettext]
@c    strcmp ok
@c   ENABLE_SECURE ok
@c   _nl_find_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c    libc_rwlock_rdlock dup @asulock @aculock
@c    _nl_make_l10nflist dup @ascuheap @acsmem
@c    libc_rwlock_unlock dup ok
@c    _nl_load_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c     libc_lock_lock_recursive @aculock
@c     libc_lock_unlock_recursive @aculock
@c     open->open_not_cancel_2 @acsfd
@c     fstat ok
@c     mmap dup @acsmem
@c     close->close_not_cancel_no_status @acsfd
@c     malloc dup @ascuheap @acsmem
@c     read->read_not_cancel ok
@c     munmap dup @acsmem
@c     W dup ok
@c     strlen dup ok
@c     get_sysdep_segment_value ok
@c     memcpy dup ok
@c     hash_string dup ok
@c     free dup @ascuheap @acsmem
@c     libc_rwlock_init ok
@c     _nl_find_msg dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c     libc_rwlock_fini ok
@c     EXTRACT_PLURAL_EXPRESSION @ascuheap @acsmem
@c      strstr dup ok
@c      isspace ok
@c      strtoul ok
@c      PLURAL_PARSE @ascuheap @acsmem
@c       malloc dup @ascuheap @acsmem
@c       free dup @ascuheap @acsmem
@c      INIT_GERMANIC_PLURAL ok, nothing
@c        the pre-C99 variant is @acucorrupt [protected from @mtuinit by dcigettext]
@c    _nl_expand_alias dup @ascuheap @asulock @acsmem @acsfd @aculock
@c    _nl_explode_name dup @ascuheap @acsmem
@c    libc_rwlock_wrlock dup @asulock @aculock
@c    free dup @asulock @aculock @acsfd @acsmem
@c   _nl_find_msg @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c    _nl_load_domain dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c    strlen ok
@c    hash_string ok
@c    W ok
@c     SWAP ok
@c      bswap_32 ok
@c    strcmp ok
@c    get_output_charset @mtsenv @ascuheap @acsmem
@c     getenv dup @mtsenv
@c     strlen dup ok
@c     malloc dup @ascuheap @acsmem
@c     memcpy dup ok
@c    libc_rwlock_rdlock dup @asulock @aculock
@c    libc_rwlock_unlock dup ok
@c    libc_rwlock_wrlock dup @asulock @aculock
@c    realloc @ascuheap @acsmem
@c    strdup @ascuheap @acsmem
@c    strstr ok
@c    strcspn ok
@c    mempcpy dup ok
@c    norm_add_slashes dup ok
@c    gconv_open @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c     [protected from @mtslocale by dcigettext locale lock]
@c    free dup @ascuheap @acsmem
@c    libc_lock_lock @asulock @aculock
@c    calloc @ascuheap @acsmem
@c    gconv dup @acucorrupt [protected from @mtsrace and @asucorrupt by lock]
@c    libc_lock_unlock ok
@c   malloc @ascuheap @acsmem
@c   mempcpy ok
@c   memcpy ok
@c   strcpy ok
@c   libc_rwlock_wrlock @asulock @aculock
@c   tsearch @ascuheap @acucorrupt @acsmem [protected from @mtsrace and @asucorrupt]
@c    transcmp ok
@c     strcmp dup ok
@c   free @ascuheap @acsmem
The @code{dcgettext} function adds another argument to those which
@code{dgettext} takes.  This argument @var{category} specifies the last
piece of information needed to locate the message catalog.  I.e., the
domain name and the locale category exactly specify which message
catalog has to be used (relative to a given directory, see below).

The @code{dgettext} function can be expressed in terms of
@code{dcgettext} by using

@smallexample
dcgettext (domain, string, LC_MESSAGES)
@end smallexample

@noindent
instead of

@smallexample
dgettext (domain, string)
@end smallexample

This also shows which values are expected for the third parameter.  One
has to use the available selectors for the categories available in
@file{locale.h}.  Normally the available values are @code{LC_CTYPE},
@code{LC_COLLATE}, @code{LC_MESSAGES}, @code{LC_MONETARY},
@code{LC_NUMERIC}, and @code{LC_TIME}.  Please note that @code{LC_ALL}
must not be used, and even though the names might suggest this, there is
no relation to the environment variables of this name.

The @code{dcgettext} function is only implemented for compatibility with
other systems which have @code{gettext} functions.  There is not really
any situation where it is necessary (or useful) to use a value other
than @code{LC_MESSAGES} for the @var{category} parameter.  We are
dealing with messages here and any other choice can only be irritating.

As for @code{gettext} the return value type is @code{char *} which is an
anachronism.  The returned string must never be modified.
@end deftypefun
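
The equivalence above can be written as a small wrapper.  This is a
sketch under the stated rule; @code{dgettext_via_dcgettext} is a
hypothetical name, and @file{locale.h} is included explicitly for
@code{LC_MESSAGES}.

```c
#include <libintl.h>
#include <locale.h>

/* dgettext expressed through dcgettext, as described above.  The
   category is always LC_MESSAGES; LC_ALL must not be passed.  */
static char *
dgettext_via_dcgettext (const char *domain, const char *msgid)
{
  return dcgettext (domain, msgid, LC_MESSAGES);
}
```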

When using the three functions above in a program it is a frequent case
that the @var{msgid} argument is a constant string.  So it is worthwhile
to optimize this case.  Thinking about this briefly, one will realize
that as long as no new message catalog is loaded the translation of a
message will not change.  This optimization is actually implemented by
the @code{gettext}, @code{dgettext} and @code{dcgettext} functions.


@node Locating gettext catalog
@subsubsection How to determine which catalog to be used

The functions to retrieve the translations for a given message have a
remarkably simple interface.  But to still give the user of the program
the opportunity to select exactly the translation s/he wants, and to
give the programmer the possibility to influence how the catalog files
are searched for, there is a quite complicated underlying mechanism
which controls all this.  The code is complicated but the use is easy.

Basically we have two different tasks to perform which can also be
performed by the @code{catgets} functions:

@enumerate
@item
Locate the set of message catalogs.  There are a number of files for
different languages which all belong to the package.  Usually they
are all stored in the filesystem below a certain directory.

There can be arbitrarily many packages installed and they can follow
different guidelines for the placement of their files.

@item
Relative to the location specified by the package the actual translation
files must be searched, based on the wishes of the user.  I.e., for each
language the user selects the program should be able to locate the
appropriate file.
@end enumerate

This is the functionality required by the specifications for
@code{gettext} and this is also what the @code{catgets} functions are
able to do.  But there are some problems unresolved:

@itemize @bullet
@item
The language to be used can be specified in several different ways.
There is no generally accepted standard for this and the user always
expects the program to understand what s/he means.  E.g., to select the
German translation one could write @code{de}, @code{german}, or
@code{deutsch} and the program should always react the same.

@item
Sometimes the specification of the user is too detailed.  If s/he, e.g.,
specifies @code{de_DE.ISO-8859-1}, which means German, spoken in Germany,
coded using the @w{ISO 8859-1} character set, there is the possibility
that a message catalog matching this exactly is not available.  But
there could be a catalog matching @code{de} and if the character set
used on the machine is always @w{ISO 8859-1} there is no reason why this
latter message catalog should not be used.  (We call this @dfn{message
inheritance}.)

@item
If a catalog for a wanted language is not available it is not always the
second best choice to fall back on the language of the developer and
simply not translate any message.  Instead a user might be better able
to read the messages in another language and so the user of the program
should be able to define a precedence order of languages.
@end itemize

We can divide the configuration actions in two parts: one is
performed by the programmer, the other by the user.  We will start with
the functions the programmer can use since the user configuration will
be based on this.

As the functions described in the last sections already mention, separate
sets of messages can be selected by a @dfn{domain name}.  This is a
simple string which should be unique for each program part which uses a
separate domain.  It is possible to use arbitrarily many domains in one
program at the same time.  E.g., @theglibc{} itself uses a domain
named @code{libc} while the program using the C Library could use a
domain named @code{foo}.  The important point is that at any time
exactly one domain is the default domain.  This is controlled with the
following function.

@comment libintl.h
@comment GNU
@deftypefun {char *} textdomain (const char *@var{domainname})
@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}}
@c textdomain @asulock @ascuheap @aculock @acsmem
@c  libc_rwlock_wrlock @asulock @aculock
@c  strcmp ok
@c  strdup @ascuheap @acsmem
@c  free @ascuheap @acsmem
@c  libc_rwlock_unlock ok
The @code{textdomain} function sets the default domain, which is used in
all future @code{gettext} calls, to @var{domainname}.  Please note that
@code{dgettext} and @code{dcgettext} calls are not influenced if the
@var{domainname} parameter of these functions is not the null pointer.

Before the first call to @code{textdomain} the default domain is
@code{messages}.  This is the name specified in the specification of
the @code{gettext} API.  This name is as good as any other name.  No
program should ever really use a domain with this name since this can
only lead to problems.

The function returns the value which is from now on taken as the default
domain.  If the system ran out of memory the returned value is
@code{NULL} and the global variable @var{errno} is set to @code{ENOMEM}.
Despite the return value type being @code{char *} the returned string
must not be changed.  It is allocated internally by the
@code{textdomain} function.

If the @var{domainname} parameter is the null pointer no new default
domain is set.  Instead the currently selected default domain is
returned.

If the @var{domainname} parameter is the empty string the default domain
is reset to its initial value, the domain with the name @code{messages}.
This possibility is of questionable use since the domain @code{messages}
really should never be used.
@end deftypefun
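
A minimal sketch of selecting the default domain with error handling,
assuming a hypothetical helper @code{select_domain} and the example
domain name @code{foo} used above:

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: make DOMAIN the default domain for gettext.
   Returns 0 on success, -1 if textdomain failed (errno is then
   ENOMEM).  */
static int
select_domain (const char *domain)
{
  const char *current = textdomain (domain);
  if (current == NULL)
    {
      perror ("textdomain");
      return -1;
    }
  /* The returned string belongs to the library: do not modify or free
     it.  */
  printf ("default domain is now \"%s\"\n", current);
  return 0;
}
```

Calling @code{textdomain (NULL)} afterwards only queries the current
default domain; passing the empty string resets it to @code{messages}.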

@comment libintl.h
@comment GNU
@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c bindtextdomain @ascuheap @acsmem
@c  set_binding_values @ascuheap @acsmem
@c   libc_rwlock_wrlock dup @asulock @aculock
@c   strcmp dup ok
@c   strdup dup @ascuheap @acsmem
@c   free dup @ascuheap @acsmem
@c   malloc dup @ascuheap @acsmem
The @code{bindtextdomain} function can be used to specify the directory
which contains the message catalogs for domain @var{domainname} for the
different languages.  More precisely, this is the top-level directory of
the hierarchy in which the catalogs are expected.  Details are explained
below.

For the programmer it is important to note that the translations which
come with the program have to be placed in a directory hierarchy starting
at, say, @file{/foo/bar}.  The program should then make a
@code{bindtextdomain} call to bind the domain for the current program to
this directory.  This ensures that the catalogs are found: a correctly
running program does not depend on the user setting an environment
variable.

The @code{bindtextdomain} function can be called several times.  Binding
a different @var{domainname} does not overwrite the bindings of the
previously bound domains.

If a program which uses @code{bindtextdomain} changes the current working
directory with the @code{chdir} function at some point, it is important
that the @var{dirname} strings be absolute pathnames.  Otherwise the
directory they refer to might change over time.

If the @var{dirname} parameter is the null pointer, @code{bindtextdomain}
returns the currently selected directory for the domain with the name
@var{domainname}.

The @code{bindtextdomain} function returns a pointer to a string
containing the name of the selected directory.  The string is
allocated internally in the function and must not be changed by the
user.  If the system runs out of memory during the execution of
@code{bindtextdomain}, the return value is @code{NULL} and the global
variable @var{errno} is set accordingly.
@end deftypefun
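
The following sketch binds a hypothetical domain @code{example-app} to an
equally hypothetical absolute directory @file{/opt/example/share/locale}
(in a real package the directory would normally come from the build
configuration) and then reads the binding back by passing a null
@var{dirname}.  The helper name @code{bind_example_domain} is also
hypothetical.

@smallexample
#include <libintl.h>
#include <string.h>

/* Hypothetical installation directory.  It is absolute so that a
   later chdir does not change which catalogs are found.  */
#define EXAMPLE_LOCALEDIR "/opt/example/share/locale"

/* Bind "example-app" to EXAMPLE_LOCALEDIR and verify the binding.
   Returns 0 on success, -1 on failure (bindtextdomain returns NULL
   and sets errno when it runs out of memory).  */
int
bind_example_domain (void)
@{
  const char *dir = bindtextdomain ("example-app", EXAMPLE_LOCALEDIR);
  if (dir == NULL)
    return -1;
  /* A null DIRNAME only queries the current binding.  */
  dir = bindtextdomain ("example-app", NULL);
  return (dir != NULL && strcmp (dir, EXAMPLE_LOCALEDIR) == 0) ? 0 : -1;
@}
@end smallexample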


@node Advanced gettext functions
@subsubsection Additional functions for more complicated situations

The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches: the
handling of plural forms.

Looking through Unix source code before the time anybody thought about
internationalization (and, sadly, even afterwards) one can often find
code similar to the following:

@smallexample
   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
@end smallexample

@noindent
After the first complaints from people internationalizing the code,
people either completely avoided formulations like this or used strings
like @code{"file(s)"}.  Both look unnatural and should be avoided.  First
attempts to solve the problem correctly looked like this:

@smallexample
   if (n == 1)
     printf ("%d file deleted", n);
   else
     printf ("%d files deleted", n);
@end smallexample

But this does not solve the problem.  It helps languages where the
plural form of a noun is not simply constructed by adding an `s' but
that is all.  Once again people fell into the trap of believing the
rules their language is using are universal.  But the handling of plural
forms differs widely between the language families.  There are two
things in which languages differ (even within a language family):
@itemize @bullet
@item
The way plural forms are built differs.  This is a problem with
languages which have many irregularities.  German, for instance, is a
drastic case.  Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an `s') is hardly found in German.

@item
The number of plural forms differs.  This is somewhat surprising for
those who only have experience with Romance and Germanic languages,
since here the number is the same (there are two).

But other language families have only one form or many forms.  More
information on this is given below.
@end itemize

The consequence of this is that application writers should not try to
solve the problem in their code.  Doing so would be localization rather
than internationalization, since the result is only usable for certain,
hardcoded language environments.  Instead the extended @code{gettext}
interface should be used.

Instead of a single key string, these extra functions take two strings
and a numerical argument.  The idea is that, using the numerical
argument and the first string as a key, the implementation can select
the right plural form using rules specified by the translator.  The two
string arguments are then used to provide a return value in case no
message catalog is found (similar to the normal @code{gettext}
behavior).  In this case the rules for Germanic languages are used and
it is assumed that the first string argument is the singular form, the
second the plural form.

This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language.  This is a limitation, but since @theglibc{}
(as well as the GNU @code{gettext} package) is written as part of the
GNU project and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.

@comment libintl.h
@comment GNU
@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Wrapper for dcngettext.
The @code{ngettext} function is similar to the @code{gettext} function
in that it finds the message catalogs in the same way.  But it takes two
extra arguments.  The @var{msgid1} parameter must contain the singular
form of the string to be converted.  It is also used as the key for the
search in the catalog.  The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form.  If no
message catalog is found, @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.

An example for the use of this function is:

@smallexample
printf (ngettext ("%d file removed", "%d files removed", n), n);
@end smallexample

Please note that the numeric value @var{n} has to be passed to the
@code{printf} function as well.  It is not sufficient to pass it only to
@code{ngettext}.
@end deftypefun
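
Because @var{n} has the type @code{unsigned long int}, the sketch below
uses @code{%lu} rather than @code{%d} in the format strings, and it
writes the message into a buffer with @code{snprintf}.  The helper name
@code{format_removed} is hypothetical.  When no catalog is found, for
example in the @code{"C"} locale, the result is @samp{1 file removed}
for one and @samp{3 files removed} for three.

@smallexample
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: format "N file(s) removed" into BUF.
   N is used twice: ngettext selects the plural form with it, and
   snprintf then prints it.  Returns what snprintf returns.  */
int
format_removed (char *buf, size_t size, unsigned long int n)
@{
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
@}
@end smallexample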

@comment libintl.h
@comment GNU
@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Wrapper for dcngettext.
The @code{dngettext} function is similar to the @code{dgettext} function
in the way the message catalog is selected.  The difference is that it
takes two extra parameters to provide the correct plural form.  These two
parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun

@comment libintl.h
@comment GNU
@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Wrapper for dcigettext.
The @code{dcngettext} function is similar to the @code{dcgettext}
function in the way the message catalog is selected.  The difference is
that it takes two extra parameters to provide the correct plural form.
These two parameters are handled in the same way @code{ngettext} handles
them.
@end deftypefun
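
One plausible use of @code{dngettext} is in a library which keeps its
messages in its own domain, so that they are found no matter which
default domain the calling program selected with @code{textdomain}.  In
the sketch below the domain @code{example-lib} and both helpers are
hypothetical; the second helper performs the same lookup through
@code{dcngettext} with @code{LC_MESSAGES} as the category.

@smallexample
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical domain holding the library's own catalogs.  */
#define EXAMPLE_LIB_DOMAIN "example-lib"

/* Format "N entry/entries skipped" using the library's domain.  */
int
lib_format_skipped (char *buf, size_t size, unsigned long int n)
@{
  return snprintf (buf, size,
                   dngettext (EXAMPLE_LIB_DOMAIN, "%lu entry skipped",
                              "%lu entries skipped", n),
                   n);
@}

/* The same lookup with the category given explicitly.  */
int
lib_format_skipped_cat (char *buf, size_t size, unsigned long int n)
@{
  return snprintf (buf, size,
                   dcngettext (EXAMPLE_LIB_DOMAIN, "%lu entry skipped",
                               "%lu entries skipped", n, LC_MESSAGES),
                   n);
@}
@end smallexample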

@subsubheading The problem of plural forms

A description of the problem can be found at the beginning of this
section.  The question is how to solve it.  Without the input of
linguists (which was not available) it was not possible to determine
whether there are only a few different ways in which plural forms are
formed or whether the number can increase with every new supported
language.

Therefore the solution implemented is to allow the translator to specify
the rules of how to select the plural form.  Since the formula varies
with every language, this is the only viable solution except for
hardcoding the information in the code (which still would require the
possibility of extensions so as not to prevent the use of new
languages).  The details are explained in the GNU @code{gettext} manual.
Here only a bit of information is provided.

The information about the plural form selection has to be stored in the
header entry (the one with the empty @code{msgid} string).  It looks
like this:

@smallexample
Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
@end smallexample

The @code{nplurals} value must be a decimal number which specifies how
many different plural forms exist for this language.  The string
following @code{plural} is an expression using the C language syntax.
Exceptions are that no negative numbers are allowed, numbers must be
decimal, and the only variable allowed is @code{n}.  This expression
will be evaluated whenever one of the functions @code{ngettext},
@code{dngettext}, or @code{dcngettext} is called.  The numeric value
passed to these functions is then substituted for all uses of the
variable @code{n} in the expression.  The resulting value then must be
greater than or equal to zero and smaller than the value given as the
value of @code{nplurals}.
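
To illustrate how the value of the expression is used, the sketch below
writes the example expression as a C function and uses its result to
index the @code{nplurals} translations stored for a message.  This is
not how @theglibc{} implements it (the library parses the expression
from the catalog's header entry).  The names @code{plural_two_forms} and
@code{select_form} are hypothetical, as is the choice to fall back to
the first form when the index is out of range.

@smallexample
/* The expression "n == 1 ? 0 : 1" from the header above.  */
unsigned long int
plural_two_forms (unsigned long int n)
@{
  return n == 1 ? 0 : 1;
@}

/* Hypothetical: choose the translation for N among the NPLURALS
   forms in FORMS, using the plural expression PLURAL.  A result
   outside [0, NPLURALS) means the header entry is inconsistent;
   this sketch then falls back to the first form.  */
const char *
select_form (const char *const forms[], unsigned long int nplurals,
             unsigned long int (*plural) (unsigned long int),
             unsigned long int n)
@{
  unsigned long int form = plural (n);
  return form < nplurals ? forms[form] : forms[0];
@}
@end smallexample

With the German forms @code{"%lu Datei"} and @code{"%lu Dateien"}, a
value of 1 selects the first form and every other value the second.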

@noindent
The following rules are known at this point.  The languages are listed
together with their families.  But this does not necessarily mean the
information can be generalized for the whole family (as can be easily
seen in the table below).@footnote{Additions are welcome.  Send appropriate information to
@email{bug-glibc-manual@@gnu.org}.}

@table @asis
@item Only one form
Some languages only require one single form.  There is no distinction
between the singular and plural form.  An appropriate header entry
would look like this:

@smallexample
Plural-Forms: nplurals=1; plural=0;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Finno-Ugric family
Hungarian
@item Asian family
Japanese, Korean
@item Turkic/Altaic family
Turkish
@end table

@item Two forms, singular used for one only
This is the form used in most existing programs since it is what English
uses.  A header entry would look like this:

@smallexample
Plural-Forms: nplurals=2; plural=n != 1;
@end smallexample

(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)

@noindent
Languages with this property include:

@table @asis
@item Germanic family
Danish, Dutch, English, German, Norwegian, Swedish
@item Finno-Ugric family
Estonian, Finnish
@item Latin/Greek family
Greek
@item Semitic family
Hebrew
@item Romance family
Italian, Portuguese, Spanish
@item Artificial
Esperanto
@end table

@item Two forms, singular used for zero and one
This is an exceptional case within its language family.  The header
entry would be:

@smallexample
Plural-Forms: nplurals=2; plural=n>1;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Romance family
French, Brazilian Portuguese
@end table

@item Three forms, special case for zero
The header entry would be:

@smallexample
Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Baltic family
Latvian
@end table

@item Three forms, special cases for one and two
The header entry would be:

@smallexample
Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Celtic family
Gaeilge (Irish)
@end table

@item Three forms, special case for numbers ending in 1[2-9]
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Baltic family
Lithuanian
@end table

@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10>3 ? 2 : 1;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Croatian, Russian, Ukrainian
@end table

@item Three forms, special cases for 1 and 2, 3, 4
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=(n==1) ? 1 : (n>=2 && n<=4) ? 2 : 0;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Czech, Slovak
@end table

@item Three forms, special case for one and some numbers ending in 2, 3, or 4
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Polish
@end table

@item Four forms, special case for one and all numbers ending in 02, 03, or 04
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Slovenian
@end table
@end table
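
Since the @code{plural} expressions use C syntax, one way to check a
formula before putting it into a header entry is to compile it as a C
function and evaluate it for a few sample values.  The sketch below does
this for some of the expressions from the table.  The function names
are hypothetical; the bodies are copied from the header entries above
with only the line breaks changed.

@smallexample
/* Latvian: special case for zero.  */
unsigned long int
plural_latvian (unsigned long int n)
@{
  return n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
@}

/* Lithuanian: special case for numbers ending in 1[2-9].  */
unsigned long int
plural_lithuanian (unsigned long int n)
@{
  return n%10==1 && n%100!=11 ? 0
         : n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
@}

/* Croatian, Russian, Ukrainian.  */
unsigned long int
plural_russian (unsigned long int n)
@{
  return n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10>3 ? 2 : 1;
@}

/* Polish.  */
unsigned long int
plural_polish (unsigned long int n)
@{
  return n==1 ? 0
         : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
@}

/* Slovenian: four forms.  */
unsigned long int
plural_slovenian (unsigned long int n)
@{
  return n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
@}
@end smallexample

For example, @code{plural_russian} yields 0 for 1 and 21, 1 for 3 and
22, and 2 for 5 and 11.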


@node Charset conversion in gettext
@subsubsection How to specify the output character set @code{gettext} uses

@code{gettext} not only looks up a translation in a message catalog; it
also converts the translation on the fly to the desired output character
set.  This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.

The output character set is, by default, the value of @code{nl_langinfo
(CODESET)}, which depends on the @code{LC_CTYPE} part of the current
locale.  But programs which store strings in a locale-independent way
(e.g., UTF-8) can request that @code{gettext} and related functions
return the translations in that encoding, by use of the
@code{bind_textdomain_codeset} function.

Note that the @var{msgid} argument to @code{gettext} is not subject to
character set conversion.  Also, when @code{gettext} does not find a
translation for @var{msgid}, it returns @var{msgid} unchanged,
independently of the current output character set.  It is therefore
recommended that all @var{msgid}s be US-ASCII strings.

@comment libintl.h
@comment GNU
@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c bind_textdomain_codeset @ascuheap @acsmem
@c  set_binding_values dup @ascuheap @acsmem
The @code{bind_textdomain_codeset} function can be used to specify the
output character set for message catalogs for domain @var{domainname}.
The @var{codeset} argument must be a valid codeset name which can be used
for the @code{iconv_open} function, or a null pointer.

If the @var{codeset} parameter is the null pointer,
@code{bind_textdomain_codeset} returns the currently selected codeset
for the domain with the name @var{domainname}.  It returns @code{NULL} if
no codeset has yet been selected.

The @code{bind_textdomain_codeset} function can be used several times.
If used multiple times with the same @var{domainname} argument, the
later call overrides the settings made by the earlier one.

The @code{bind_textdomain_codeset} function returns a pointer to a
string containing the name of the selected codeset.  The string is
allocated internally in the function and must not be changed by the
user.  If the system runs out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @var{errno} is set accordingly.
@end deftypefun
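
For a program which keeps all of its strings in UTF-8, a minimal sketch
could request UTF-8 translations for its domain and read the setting
back with a null @var{codeset}.  The helper name
@code{use_utf8_translations} and the domain names used with it are
hypothetical.

@smallexample
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: request UTF-8 translations for DOMAIN and
   verify the setting.  Returns 0 on success, -1 on failure.  */
int
use_utf8_translations (const char *domain)
@{
  if (bind_textdomain_codeset (domain, "UTF-8") == NULL)
    return -1;
  /* A null CODESET only queries the current setting.  */
  const char *codeset = bind_textdomain_codeset (domain, NULL);
  return (codeset != NULL && strcmp (codeset, "UTF-8") == 0) ? 0 : -1;
@}
@end smallexample

For a domain for which no codeset has been selected, the query with a
null @var{codeset} returns @code{NULL}, as described above.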


@node GUI program problems
@subsubsection How to use @code{gettext} in GUI programs

One place where the @code{gettext} functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs).  The
problem is that many of the strings which have to be translated are very
short.  They have to appear in pull-down menus, which restricts their
length.  But strings which do not contain entire sentences or at least
large fragments of a sentence may appear in more than one situation in
the program and might need a different translation in each.  This is
especially true for the one-word strings which are frequently used in
GUI programs.

As a consequence many people say that the @code{gettext} approach is
wrong and that @code{catgets}, which indeed does not have this problem,
should be used instead.  But there is a very simple and powerful method
to handle this kind of problem with the @code{gettext} functions.

@noindent
As an example consider the following fictional situation.  A GUI program
has a menu bar with the following entries:

@smallexample
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
@end smallexample

To have the strings @code{File}, @code{Printer}, @code{Open},
@code{New}, @code{Select}, and @code{Connect} translated there has to be
at some point in the code a call to a function of the @code{gettext}
family.  But in two places the string passed into the function would be
@code{Open}.  The translations might not be the same and therefore we
are in the dilemma described above.

One solution to this problem is to artificially lengthen the strings
to make them unambiguous.  But what would the program do if no
translation is available?  The lengthened string is not what should be
printed.  So we should use a slightly modified version of the functions.

To lengthen the strings a uniform method should be used.  E.g., in the
example above the strings could be chosen as

@smallexample
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
@end smallexample

Now all the strings are different, and if the following little wrapper
function is used instead of @code{gettext}, everything works just
fine:

@cindex sgettext
@smallexample
#include <libintl.h>
#include <string.h>

char *
sgettext (const char *msgid)
@{
  char *msgval = gettext (msgid);
  if (msgval == msgid)
    @{
      /* Untranslated: skip the prefix up to the last '|'.  */
      const char *sep = strrchr (msgid, '|');
      if (sep != NULL)
        msgval = (char *) sep + 1;
    @}
  return msgval;
@}
@end smallexample

What this little function does is to recognize the case when no
translation is available.  This can be done very efficiently by a
pointer comparison, since in that case the return value is the input
value.  If there is no translation, the input string is expected to be
in the format we used for the menu entries and therefore to contain a
@code{|} character.  We simply search for the last occurrence of this
character and return a pointer to the character following it.  That's
it!

If one now consistently uses the lengthened string form and replaces
the @code{gettext} calls with calls to @code{sgettext} (this is normally
limited to very few places in the GUI implementation) then it is
possible to produce a program which can be internationalized.

With compilers supporting GNU C extensions, one can write the
@code{sgettext} function as an inline function or, using a statement
expression, as a macro like this:

@cindex sgettext
@smallexample
#define sgettext(msgid) \
  (@{ const char *__msgid = (msgid);                     \
     char *__msgstr = gettext (__msgid);                \
     if (__msgstr == __msgid)                           \
       __msgstr = (char *) strrchr (__msgid, '|') + 1;  \
     __msgstr; @})
@end smallexample

The other @code{gettext} functions (@code{dgettext}, @code{dcgettext}
and the @code{ngettext} equivalents) can and should have corresponding
functions as well which look almost identical, except for the parameters
and the call to the underlying function.
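
As a sketch of such a corresponding function, a hypothetical
@code{sngettext} for @code{ngettext} has to compare the result with both
of its string arguments, since without a translation @code{ngettext}
returns whichever of the two matches @var{n}.  Like @code{sgettext}
above, it assumes @code{|} as the separator; the example strings
@code{"Dialog|%lu file"} and @code{"Dialog|%lu files"} are made up.

@smallexample
#include <libintl.h>
#include <string.h>

/* Hypothetical plural counterpart of sgettext.  */
char *
sngettext (const char *msgid1, const char *msgid2, unsigned long int n)
@{
  char *msgval = ngettext (msgid1, msgid2, n);
  /* Untranslated: ngettext returned one of its arguments.  */
  if (msgval == msgid1 || msgval == msgid2)
    @{
      char *sep = strrchr (msgval, '|');
      if (sep != NULL)
        msgval = sep + 1;
    @}
  return msgval;
@}
@end smallexample

Without a catalog, @code{sngettext ("Dialog|%lu file", "Dialog|%lu
files", n)} yields @code{"%lu file"} for @code{n == 1} and
@code{"%lu files"} otherwise.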

Now there is of course the question why such functions do not exist in
@theglibc{}.  There are two parts to the answer.

@itemize @bullet
@item
They are easy to write and therefore can be provided by the project they
are used in.  This is not an answer by itself and must be seen together
with the second part, which is:

@item
There is no way the C library can contain a version which can work
everywhere.  The problem is the selection of the character which
separates the prefix from the actual string in the lengthened string.
The examples above used @code{|}, which is quite a good choice because
it resembles a notation frequently used in this context and it is also a
character not often used in message strings.

But what if the character is used in message strings?  Or what if the
chosen character is not available in the character set of the machine
one compiles on?  (E.g., @code{|} is missing from some national variants
of the ISO 646 character set; this is why the @file{iso646.h} file
exists in @w{ISO C} programming environments.)
@end itemize

There is only one more comment to make.  The wrapper function above
requires that the translated strings are not lengthened themselves.
This is only logical.  There is no need to disambiguate the translations
(since they are never used as keys for a search) and one also saves
quite some memory and disk space by doing this.
|  | 1736 |  | 
|  | 1737 | @node Using gettextized software | 
|  | 1738 | @subsubsection User influence on @code{gettext} | 
|  | 1739 |  | 
|  | 1740 | The last sections described what the programmer can do to | 
|  | 1741 | internationalize the messages of the program.  But it is finally up to | 
|  | 1742 | the user to select the message s/he wants to see.  S/He must understand | 
|  | 1743 | them. | 
|  | 1744 |  | 
|  | 1745 | The POSIX locale model uses the environment variables @code{LC_COLLATE}, | 
|  | 1746 | @code{LC_CTYPE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, @code{LC_NUMERIC}, | 
|  | 1747 | and @code{LC_TIME} to select the locale which is to be used.  This way | 
|  | 1748 | the user can influence lots of functions.  As we mentioned above the | 
|  | 1749 | @code{gettext} functions also take advantage of this. | 
|  | 1750 |  | 
|  | 1751 | To understand how this happens it is necessary to take a look at the | 
|  | 1752 | various components of the filename which gets computed to locate a | 
|  | 1753 | message catalog.  It is composed as follows: | 
|  | 1754 |  | 
|  | 1755 | @smallexample | 
|  | 1756 | @var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo | 
|  | 1757 | @end smallexample | 

The default value for @var{dir_name} is system specific.  It is computed
from the value given as the prefix while configuring the C library.
This value is normally @file{/usr} or @file{/}.  For the former the
complete @var{dir_name} is:

@smallexample
/usr/share/locale
@end smallexample

We can use @file{/usr/share} since the @file{.mo} files containing the
message catalogs are system independent, so all systems can use the same
files.  If the program executed the @code{bindtextdomain} function for
the message domain that is currently handled, the @var{dir_name}
component is exactly the value which was given to the function as
the second parameter.  I.e., @code{bindtextdomain} allows overriding
the only system-dependent and fixed value, which makes it possible to
address files anywhere in the filesystem.
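
When the second argument of @code{bindtextdomain} is a null pointer, the
function returns the directory currently bound to the domain without
changing it, so a program can check which @var{dir_name} will be used.
A minimal sketch, with the hypothetical helper name
@code{bind_and_query}:

@smallexample
#include <libintl.h>
#include <stddef.h>

/* Hypothetical helper: bind DOMAIN to the directory DIR and then
   query the binding.  Returns NULL if the binding could not be
   recorded (for example, because memory ran out).  */
const char *
bind_and_query (const char *domain, const char *dir)
@{
  if (bindtextdomain (domain, dir) == NULL)
    return NULL;
  /* With a null second argument nothing is changed.  */
  return bindtextdomain (domain, NULL);
@}
@end smallexample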

The @var{category} is the name of the locale category which was selected
in the program code.  For @code{gettext} and @code{dgettext} this is
always @code{LC_MESSAGES}; for @code{dcgettext} it is selected by the
value of the third parameter.  As said above, using any category other
than @code{LC_MESSAGES} should be avoided.
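
For instance, the two calls in the following sketch look up the same
translation, because @code{dgettext} always uses @code{LC_MESSAGES}
while @code{dcgettext} receives the category explicitly.  The function
name @code{same_lookup} is hypothetical.

@smallexample
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical check: returns 1 if dgettext and dcgettext with
   LC_MESSAGES yield the same string for MSGID in DOMAIN.  Both
   return the translation, or MSGID itself if none is found.  */
int
same_lookup (const char *domain, const char *msgid)
@{
  const char *a = dgettext (domain, msgid);
  const char *b = dcgettext (domain, msgid, LC_MESSAGES);
  return strcmp (a, b) == 0;
@}
@end smallexample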

The @var{locale} component is computed based on the category used.  Just
as for the @code{setlocale} function, this is where the user's selection
comes into play.  Several environment variables are examined in a fixed
order, and the first one that is set determines the result of the
lookup.  In detail, for the category @code{LC_xxx} the following
variables are examined in this order:

@table @code
@item LANGUAGE
@item LC_ALL
@item LC_xxx
@item LANG
@end table
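
The order can be modeled in a few lines of C.  This is a simplified
sketch under stated assumptions: the function name
@code{lookup_locale_value} is hypothetical, a variable set to the empty
string is treated as unset, and refinements of the real implementation
(such as how @code{LANGUAGE} interacts with the @code{C} locale) are not
modeled.

@smallexample
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the lookup order: return the value of the
   first variable that is set to a non-empty string, or NULL if none
   is.  CATEGORY_VAR names the category variable, e.g. "LC_MESSAGES".  */
const char *
lookup_locale_value (const char *category_var)
@{
  const char *vars[] = @{ "LANGUAGE", "LC_ALL", category_var, "LANG" @};

  for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++)
    @{
      const char *value = getenv (vars[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    @}
  return NULL;
@}
@end smallexample

If, say, @code{LC_MESSAGES} is @code{de_DE} and @code{LANG} is
@code{fr_FR}, the sketch returns @code{de_DE} for @code{"LC_MESSAGES"};
setting @code{LANGUAGE} as well overrides both.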

This looks very familiar.  With the exception of the @code{LANGUAGE}
environment variable this is exactly the lookup order the
@code{setlocale} function uses.  But why introduce the @code{LANGUAGE}
variable?

The reason is that the syntax of the values these variables can have
differs from what the @code{setlocale} function expects.  If we set
@code{LC_ALL} to a value following the extended syntax, the
@code{setlocale} function would no longer be able to use the value of
this variable.  An additional variable removes this problem; in
addition, it lets the user select the language independently of the
locale setting, which is sometimes useful.

While for the @code{LC_xxx} variables the value should consist of
exactly one locale specification, the value of the @code{LANGUAGE}
variable can consist of a colon-separated list of locale names.  The
attentive reader will realize that this is the way we manage to
implement one of our additional demands above: we want to be able to
specify an ordered list of languages.
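
Roughly speaking, the @code{gettext} functions try the entries of such a
list from left to right and use the first one whose catalog provides a
translation of the message.  The sketch below only shows how a value
like @code{sv:de:en} splits into that ordered list; the names
@code{split_language_list} and @code{LANG_NAME_LEN} are hypothetical,
and empty entries are skipped.

@smallexample
#include <stddef.h>
#include <string.h>

/* Hypothetical limit on the length of one entry, including the
   terminating null byte.  */
#define LANG_NAME_LEN 64

/* Hypothetical helper: split a LANGUAGE value such as "sv:de:en" into
   at most MAX entries, in order of preference.  Empty entries are
   skipped and overlong ones truncated.  Returns the number stored.  */
int
split_language_list (const char *value, char out[][LANG_NAME_LEN],
                     int max)
@{
  int count = 0;

  while (*value != '\0' && count < max)
    @{
      size_t len = strcspn (value, ":");
      if (len > 0)
        @{
          size_t n = len < LANG_NAME_LEN - 1 ? len : LANG_NAME_LEN - 1;
          memcpy (out[count], value, n);
          out[count][n] = '\0';
          count++;
        @}
      value += len;
      if (*value == ':')
        value++;
    @}
  return count;
@}
@end smallexample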

Back to the constructed filename: only one component is still missing.
The @var{domain_name} part is the name which was either registered using
the @code{textdomain} function or which was given to @code{dgettext} or
@code{dcgettext} as the first parameter.  Now it becomes obvious that a
good choice for the domain name in the program code is a string which is
closely related to the program/package name.  E.g., for @theglibc{}
the domain name is @code{libc}.

@noindent
A short piece of example code shows how the programmer is supposed
to work:

@smallexample
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

int
main (void)
@{
  setlocale (LC_ALL, "");
  textdomain ("test-package");
  bindtextdomain ("test-package", "/usr/local/share/locale");
  puts (gettext ("Hello, world!"));
  return 0;
@}
@end smallexample

At program start the default domain is @code{messages}, and the
default locale is @code{"C"}.  The @code{setlocale} call sets the locale
according to the user's environment variables; remember that correct
functioning of @code{gettext} relies on the correct setting of the
@code{LC_MESSAGES} locale (for looking up the message catalog) and
of the @code{LC_CTYPE} locale (for the character set conversion).
The @code{textdomain} call changes the default domain to
@code{test-package}.  The @code{bindtextdomain} call specifies that
the message catalogs for the domain @code{test-package} can be found
below the directory @file{/usr/local/share/locale}.
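
The default domain can be observed directly: @code{textdomain} with a
null pointer argument returns the current default domain without
changing it.  The sketch below uses the hypothetical name
@code{default_domain_is}.

@smallexample
#include <libintl.h>
#include <string.h>

/* Hypothetical check: returns 1 if the current default domain is
   DOMAIN.  textdomain (NULL) only reports the domain and does not
   change it.  */
int
default_domain_is (const char *domain)
@{
  const char *current = textdomain (NULL);
  return current != NULL && strcmp (current, domain) == 0;
@}
@end smallexample

Called before the @code{textdomain} call of the example above, it
returns 1 for @code{"messages"}; called afterwards, it returns 1 for
@code{"test-package"}.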

If the user now sets the environment variable @code{LANGUAGE} to
@code{de}, the @code{gettext} function will try to use the
translations from the file

@smallexample
/usr/local/share/locale/de/LC_MESSAGES/test-package.mo
@end smallexample

From the above descriptions it should be clear which component of this
filename is determined by which source.

In the above example we assumed that the @code{LANGUAGE} environment
variable is set to @code{de}.  This might be an appropriate selection,
but what happens if the user wants to use @code{LC_ALL} because of its
wider effect, and the required value there is @code{de_DE.ISO-8859-1}?
We already mentioned above that a situation like this is not infrequent.
E.g., a person might prefer reading a dialect and, if this is not
available, fall back on the standard language.

The @code{gettext} functions know about situations like this and can
handle them gracefully.  The functions recognize the format of the value
of the environment variable.  They can split the value into different
pieces and, by leaving out one part or another, construct new values.
This happens, of course, in a predictable way.  To understand this one
must know the format of the environment variable value.  There is one
more or less standardized form, originally from the X/Open
specification:

@code{language[_territory[.codeset]][@@modifier]}
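
The following sketch splits a name of this form into its components.
The structure @code{locale_parts}, the macro @code{PART_SIZE} and both
functions are hypothetical, parts longer than 31 bytes are truncated,
and the code only illustrates the syntax; it is not the parser used by
@theglibc{}.

@smallexample
#include <string.h>

#define PART_SIZE 32   /* Hypothetical size of each part buffer.  */

/* Hypothetical container for the parts of
   language[_territory[.codeset]][@@modifier]; absent parts are "".  */
struct locale_parts
@{
  char language[PART_SIZE], territory[PART_SIZE];
  char codeset[PART_SIZE], modifier[PART_SIZE];
@};

/* Copy N bytes of SRC into DST as a string, truncating if needed.  */
static void
copy_part (char *dst, const char *src, size_t n)
@{
  if (n > PART_SIZE - 1)
    n = PART_SIZE - 1;
  memcpy (dst, src, n);
  dst[n] = '\0';
@}

/* Peel off the modifier, then the codeset, then the territory;
   what remains is the language.  */
void
split_locale_name (const char *name, struct locale_parts *p)
@{
  const char *end = name + strlen (name);
  const char *mark;

  p->territory[0] = p->codeset[0] = p->modifier[0] = '\0';
  if ((mark = memchr (name, '@@', end - name)) != NULL)
    @{
      copy_part (p->modifier, mark + 1, end - mark - 1);
      end = mark;
    @}
  if ((mark = memchr (name, '.', end - name)) != NULL)
    @{
      copy_part (p->codeset, mark + 1, end - mark - 1);
      end = mark;
    @}
  if ((mark = memchr (name, '_', end - name)) != NULL)
    @{
      copy_part (p->territory, mark + 1, end - mark - 1);
      end = mark;
    @}
  copy_part (p->language, name, end - name);
@}
@end smallexample

For @code{de_DE.ISO-8859-1@@euro} the sketch yields the language
@code{de}, the territory @code{DE}, the codeset @code{ISO-8859-1} and
the modifier @code{euro}; parts that are absent are left empty.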

To obtain less specific locale names, components are stripped off in
the order of the following list:

@enumerate
@item
@code{codeset}
@item
@code{normalized codeset}
@item
@code{territory}
@item
@code{modifier}
@end enumerate

The @code{language} field will never be dropped for obvious reasons.

The only new thing is the @code{normalized codeset} entry.  This is
another goodie which is introduced to help reduce the chaos that
derives from people's inability to standardize the names of
character sets.  Instead of @w{ISO-8859-1} one can often see @w{8859-1},
@w{88591}, @w{iso8859-1}, or @w{iso_8859-1}.  The @code{normalized
codeset} value is generated from the user-provided character set name by
applying the following rules:

@enumerate
@item
Remove all characters besides numbers and letters.
@item
Fold letters to lowercase.
@item
If the result contains only digits, prepend the string @code{"iso"}.
@end enumerate

@noindent
So all of the above names will be normalized to @code{iso88591}.  This
gives the program user much more freedom in choosing the locale name.
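
The three rules translate directly into C.  The following is a minimal
sketch with the hypothetical name @code{normalize_codeset}; it assumes
the caller provides an output buffer of at least
@code{strlen (name) + 4} bytes.

@smallexample
#include <ctype.h>
#include <string.h>

/* Hypothetical implementation of the rules above: keep only letters
   and digits, fold letters to lowercase, and prepend "iso" if only
   digits remain.  OUT must hold strlen (NAME) + 4 bytes.  */
void
normalize_codeset (const char *name, char *out)
@{
  size_t len = 0;
  int only_digits = 1;

  for (const char *s = name; *s != '\0'; s++)
    @{
      unsigned char c = (unsigned char) *s;
      if (isalnum (c))
        @{
          if (isalpha (c))
            only_digits = 0;
          out[len++] = (char) tolower (c);
        @}
    @}
  out[len] = '\0';
  if (only_digits)
    @{
      /* Move the digits (and the null byte) right by three places
         to make room for the prefix.  */
      memmove (out + 3, out, len + 1);
      memcpy (out, "iso", 3);
    @}
@}
@end smallexample

With this sketch all five spellings listed above yield
@code{iso88591}, while a name such as @code{UTF-8} becomes @code{utf8}.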

Even this extended functionality still does not help to solve the
problem that completely different names can be used to denote the same
locale (e.g., @code{de} and @code{german}).  To help in this situation,
both the locale implementation and the @code{gettext} functions know
about aliases.

The file @file{/usr/share/locale/locale.alias} (replace @file{/usr} with
whatever prefix you used for configuring the C library) contains a
mapping of alternative names to more regular names.  The system
administrator is free to add new entries to suit her/his own needs.  The
selected locale from the environment is compared with the entries in the
first column of this file, ignoring case.  If they match, the value in
the second column is used instead for further handling.
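
The comparison step can be pictured with the following sketch.  It is
not how the C library itself reads the file; the name
@code{lookup_alias} is hypothetical, and it assumes one alias and its
value per line, separated by white space, with lines starting with
@samp{#} treated as comments.

@smallexample
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical alias lookup: scan FP for a line whose first column
   matches NAME, ignoring case.  On a match the second column is
   copied to OUT (of OUTSIZE bytes) and 1 is returned; otherwise 0.  */
int
lookup_alias (FILE *fp, const char *name, char *out, size_t outsize)
@{
  char line[256], alias[128], value[128];

  while (fgets (line, sizeof line, fp) != NULL)
    @{
      if (line[0] == '#')
        continue;               /* Comment line.  */
      if (sscanf (line, "%127s %127s", alias, value) != 2)
        continue;               /* Empty or malformed line.  */
      if (strcasecmp (alias, name) == 0 && strlen (value) < outsize)
        @{
          strcpy (out, value);
          return 1;
        @}
    @}
  return 0;
@}
@end smallexample

A program would pass a stream opened on @file{locale.alias}; with an
entry @samp{german de_DE.ISO-8859-1}, looking up @code{GERMAN} yields
@code{de_DE.ISO-8859-1}.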

In the description of the format of the environment variables we already
mentioned the character set as a factor in the selection of the message
catalog.  In fact, only catalogs which contain text written using the
character set of the system/program can be used (directly; a solution
for this may come some day).  This means that the user always has to
take care of this.  If the collection of message catalogs contains files
for the same language but encoded using different character sets, the
user has to be careful.


@node Helper programs for gettext
@subsection Programs to handle message catalogs for @code{gettext}

@Theglibc{} does not contain the source code for the programs to
handle message catalogs for the @code{gettext} functions.  As part of
the GNU project, the GNU gettext package contains everything the
developer needs.  The functionality provided by the tools in this
package by far exceeds the abilities of the @code{gencat} program
described above for the @code{catgets} functions.

The @code{msgfmt} program is the equivalent of the @code{gencat}
program.  It generates, from the human-readable and editable form of the
message catalog, a binary file which can be used by the @code{gettext}
functions.  But there are several more programs available.

The @code{xgettext} program can be used to automatically extract the
translatable messages from a source file.  I.e., the programmer need not
maintain the list of messages which have to be translated.  S/He will
simply wrap the translatable strings in calls to @code{gettext} and the
related functions, and the rest will be done by @code{xgettext}.  This
program has a lot of options which help to customize the output or
help to understand the input better.
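
A widespread convention, assumed here and not required by the
@code{gettext} functions, is to wrap translatable strings in short
macros.  Since @code{xgettext} must know these names, packages typically
pass them with its @option{--keyword} option.  The macro names @code{_}
and @code{N_}, the array @code{colors} and the function
@code{print_colors} are illustrative only.

@smallexample
#include <libintl.h>
#include <stdio.h>

/* Conventional shorthands (an assumption, not part of the library):
   _ translates at run time; N_ only marks a string for extraction,
   e.g. in a static initializer where no function can be called.  */
#define _(String) gettext (String)
#define N_(String) (String)

static const char *const colors[] = @{ N_("red"), N_("green") @};

void
print_colors (void)
@{
  for (size_t i = 0; i < sizeof colors / sizeof colors[0]; i++)
    /* The marked strings are translated when they are used.  */
    puts (_(colors[i]));
@}
@end smallexample

@code{N_} expands to its argument unchanged, so it can appear in a
static initializer; the actual translation happens when @code{_} is
applied at run time.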

Other programs help to manage the development cycle when new messages appear
in the source files or when a new translation of the messages appears.
Here it should only be noted that using all the tools in GNU gettext it
is possible to @emph{completely} automate the handling of message
catalogs.  Besides marking the translatable strings in the source code and
producing the translations, the developers do not have anything to do
themselves.