@node Message Translation, Searching and Sorting, Locales, Top
@c %MENU% How to make the program speak the user's language
@chapter Message Translation

The program's interface with the user should be designed to ease the user's
task. One way to ease the user's task is to use messages in whatever
language the user prefers.

Printing messages in different languages can be implemented in different
ways. One could add all the different languages in the source code and
choose among the variants every time a message has to be printed. This is
certainly not a good solution since extending the set of languages is
cumbersome (the code must be changed) and the code itself can become
really big with dozens of message sets.

A better solution is to keep the message sets for each language
in separate files which are loaded at runtime depending on the language
selection of the user.

@Theglibc{} provides two different sets of functions to support
message translation. The problem is that neither of the interfaces is
officially defined by the POSIX standard. The @code{catgets} family of
functions is defined in the X/Open standard, but this standard is derived
from industry decisions and is therefore not necessarily based on
reasonable design decisions.

As mentioned above, the message catalog handling provides easy
extensibility by using external data files which contain the message
translations. That is, for each message used in the program these files
contain a translation into the appropriate language. So the tasks of
the message handling functions are to

@itemize @bullet
@item
locate the external data file with the appropriate translations
@item
load the data and make it possible to address the messages
@item
map a given key to the translated message
@end itemize

The two approaches mainly differ in the implementation of this last
step. Decisions made in the last step influence the rest of the design.

@menu
* Message catalogs a la X/Open::  The @code{catgets} family of functions.
* The Uniforum approach::         The @code{gettext} family of functions.
@end menu


@node Message catalogs a la X/Open
@section X/Open Message Catalog Handling

The @code{catgets} functions are based on the simple scheme:

@quotation
Associate every message to translate in the source code with a unique
identifier. Only this identifier is used to retrieve a message from a
catalog file.
@end quotation

This means that the author of the program has to make sure that the
meaning of an identifier is always the same in the program code and in
the message catalogs.

Before a message can be translated the catalog file must be located.
The user of the program must be able to guide the responsible function
to find whatever catalog the user wants. This is independent of what
the programmer had in mind.

All the types, constants and functions for the @code{catgets} functions
are defined/declared in the @file{nl_types.h} header file.

@menu
* The catgets Functions::      The @code{catgets} function family.
* The message catalog files::  Format of the message catalog files.
* The gencat program::         How to generate message catalog files which
                                can be used by the functions.
* Common Usage::               How to use the @code{catgets} interface.
@end menu


@node The catgets Functions
@subsection The @code{catgets} function family

@comment nl_types.h
@comment X/Open
@deftypefun nl_catd catopen (const char *@var{cat_name}, int @var{flag})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c catopen @mtsenv @ascuheap @acsmem
@c strchr ok
@c setlocale(,NULL) ok
@c getenv @mtsenv
@c strlen ok
@c alloca ok
@c stpcpy ok
@c malloc @ascuheap @acsmem
@c __open_catalog @ascuheap @acsmem
@c strchr ok
@c open_not_cancel_2 @acsfd
@c strlen ok
@c ENOUGH ok
@c alloca ok
@c memcpy ok
@c fxstat64 ok
@c __set_errno ok
@c mmap @acsmem
@c malloc dup @ascuheap @acsmem
@c read_not_cancel ok
@c free dup @ascuheap @acsmem
@c munmap ok
@c close_not_cancel_no_status ok
@c free @ascuheap @acsmem
The @code{catopen} function tries to locate the message data file named
@var{cat_name} and loads it when found. The return value is of an
opaque type and can be used in calls to the other functions to refer to
this loaded catalog.

The return value is @code{(nl_catd) -1} in case the function failed and
no catalog was loaded. The global variable @var{errno} contains a code
for the error causing the failure. But even if the function call
succeeded this does not mean that all messages can be translated.
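
The failure value can be checked right after the call. The following is a minimal sketch; the wrapper @code{open_catalog_verbose} is hypothetical and not part of @theglibc{}. It compares the result of @code{catopen} with @code{(nl_catd) -1} and reports @var{errno} on standard error, while still handing the descriptor back to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <nl_types.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper, not a glibc function: open CAT_NAME with FLAG
   and print a diagnostic if catopen fails.  The descriptor is returned
   unchanged, so (nl_catd) -1 still signals failure to the caller.  */
nl_catd
open_catalog_verbose (const char *cat_name, int flag)
{
  assert (cat_name != NULL);
  nl_catd cd = catopen (cat_name, flag);
  if (cd == (nl_catd) -1)
    {
      int err = errno;   /* Save it before fprintf can change it.  */
      fprintf (stderr, "catopen (\"%s\"): %s\n", cat_name, strerror (err));
      errno = err;
    }
  return cd;
}
```

A program would typically call such a wrapper once at startup, for example as @code{open_catalog_verbose ("hello.cat", NL_CAT_LOCALE)}, and pass the result to every later @code{catgets} call.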

Locating the catalog file must happen in a way which lets the user of
the program influence the decision. It is up to the user to decide
about the language to use, and sometimes it is useful to use alternate
catalog files. All this can be specified by the user by setting some
environment variables.

The first problem is to find out where all the message catalogs are
stored. Every program could have its own place to keep all the
different files, but usually the catalog files are grouped by languages
and the catalogs for all programs are kept in the same place.

@cindex NLSPATH environment variable
To tell the @code{catopen} function where the catalog for the program
can be found, the user can set the environment variable @code{NLSPATH} to
a value which describes her/his choice. Since this value must be usable
for different languages and locales it cannot be a simple string.
Instead it is a format string (similar to @code{printf}'s). An example
is

@smallexample
/usr/share/locale/%L/%N:/usr/share/locale/%L/LC_MESSAGES/%N
@end smallexample

First one can see that more than one directory can be specified (with
the usual syntax of separating them by colons). The next things to
observe are the format elements, @code{%L} and @code{%N} in this case.
The @code{catopen} function knows about several of them, and each of
them is replaced by a different value.

@table @code
@item %N
This format element is substituted with the name of the catalog file.
This is the value of the @var{cat_name} argument given to
@code{catopen}.

@item %L
This format element is substituted with the name of the currently
selected locale for translating messages. How this is determined is
explained below.

@item %l
(This is the lowercase ell.) This format element is substituted with the
language element of the locale name. The string describing the selected
locale is expected to have the form
@code{@var{lang}[_@var{terr}[.@var{codeset}]]} and this format uses the
first part @var{lang}.

@item %t
This format element is substituted by the territory part @var{terr} of
the name of the currently selected locale. See the explanation of the
format above.

@item %c
This format element is substituted by the codeset part @var{codeset} of
the name of the currently selected locale. See the explanation of the
format above.

@item %%
Since @code{%} is used as a meta character there must be a way to
express the @code{%} character in the result itself. Using @code{%%}
does this just like it works for @code{printf}.
@end table
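
To make the substitution rules concrete, here is a sketch that expands a single @code{NLSPATH} element (no colons) for a given catalog name and locale name. The function @code{expand_nlspath_element} and its helper @code{nls_append} are hypothetical and do not mirror the internals of @code{catopen}. The sketch assumes the locale name has the form @code{@var{lang}[_@var{terr}[.@var{codeset}]]}; copying unknown @code{%} sequences unchanged is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append N bytes of SRC to OUT (capacity OUTLEN) at *POS, keeping OUT
   NUL-terminated.  Returns -1 if the result would not fit.  */
static int
nls_append (char *out, size_t outlen, size_t *pos, const char *src, size_t n)
{
  if (*pos + n >= outlen)
    return -1;
  memcpy (out + *pos, src, n);
  *pos += n;
  out[*pos] = '\0';
  return 0;
}

/* Hypothetical helper, not a glibc function: expand one NLSPATH element
   (no ':' separators) for catalog CAT_NAME and locale name LOCALE, which
   is assumed to have the form lang[_terr[.codeset]].  Returns 0 on
   success, -1 if the result does not fit into OUT.  */
int
expand_nlspath_element (const char *tmpl, const char *cat_name,
                        const char *locale, char *out, size_t outlen)
{
  assert (tmpl != NULL && cat_name != NULL && locale != NULL);
  if (outlen == 0)
    return -1;
  out[0] = '\0';

  /* Split LOCALE into its language, territory and codeset parts.  */
  size_t lang_len = strcspn (locale, "_.");
  const char *terr = "", *codeset = "";
  size_t terr_len = 0, codeset_len = 0;
  const char *p = locale + lang_len;
  if (*p == '_')
    {
      terr = p + 1;
      terr_len = strcspn (terr, ".");
      p = terr + terr_len;
    }
  if (*p == '.')
    {
      codeset = p + 1;
      codeset_len = strlen (codeset);
    }

  size_t pos = 0;
  for (; *tmpl != '\0'; ++tmpl)
    {
      int r;
      if (*tmpl != '%' || tmpl[1] == '\0')
        r = nls_append (out, outlen, &pos, tmpl, 1);   /* Plain character.  */
      else
        switch (*++tmpl)   /* The character after '%'.  */
          {
          case 'N':
            r = nls_append (out, outlen, &pos, cat_name, strlen (cat_name));
            break;
          case 'L':
            r = nls_append (out, outlen, &pos, locale, strlen (locale));
            break;
          case 'l':
            r = nls_append (out, outlen, &pos, locale, lang_len);
            break;
          case 't':
            r = nls_append (out, outlen, &pos, terr, terr_len);
            break;
          case 'c':
            r = nls_append (out, outlen, &pos, codeset, codeset_len);
            break;
          case '%':
            r = nls_append (out, outlen, &pos, "%", 1);
            break;
          default:
            /* Assumption: unknown elements are copied unchanged.  */
            r = nls_append (out, outlen, &pos, tmpl - 1, 2);
            break;
          }
      if (r != 0)
        return -1;
    }
  return 0;
}
```

For example, expanding @code{/usr/share/locale/%l/LC_MESSAGES/%N} for the catalog @code{hello.cat} and the locale @code{de_DE.UTF-8} gives @code{/usr/share/locale/de/LC_MESSAGES/hello.cat}. A complete search would presumably split @code{NLSPATH} at the colons and try each expanded name in turn; that loop is left out here.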


Using @code{NLSPATH} allows arbitrary directories to be searched for
message catalogs while still allowing different languages to be used.
If the @code{NLSPATH} environment variable is not set, the default value
is

@smallexample
@var{prefix}/share/locale/%L/%N:@var{prefix}/share/locale/%L/LC_MESSAGES/%N
@end smallexample

@noindent
where @var{prefix} is given to @code{configure} while installing @theglibc{}
(this value is in many cases @code{/usr} or the empty string).

The remaining problem is to decide which locale must be used. Its name
determines the substitution of the format elements mentioned above.
First of all, a path can be specified in the message catalog name
(i.e., the name contains a slash character). In this situation the
@code{NLSPATH} environment variable is not used. The catalog must exist
as specified in the program, perhaps relative to the current working
directory. This situation is not desirable and catalog names should
never be written this way. Besides this, this behavior is not portable
to all other platforms providing the @code{catgets} interface.

@cindex LC_ALL environment variable
@cindex LC_MESSAGES environment variable
@cindex LANG environment variable
Otherwise the values of environment variables from the standard
environment are examined (@pxref{Standard Environment}). Which
variables are examined is decided by the @var{flag} parameter of
@code{catopen}. If the value is @code{NL_CAT_LOCALE} (which is defined
in @file{nl_types.h}) then the @code{catopen} function uses the name of
the locale currently selected for the @code{LC_MESSAGES} category.

If @var{flag} is zero the @code{LANG} environment variable is examined.
This is a left-over from the early days when the concept of locales
had not even reached the level of POSIX locales.

The environment variable and the locale name should have a value of the
form @code{@var{lang}[_@var{terr}[.@var{codeset}]]} as explained above.
If no environment variable is set the @code{"C"} locale is used, which
prevents any translation.

The string returned by @code{catgets} (described below) is in any case a
valid string. Either it is a translation from a message catalog or it is
the same as the @var{string} parameter passed to @code{catgets}. So a
piece of code to decide whether a translation actually happened must look
like this:

@smallexample
@{
  char *trans = catgets (desc, set, msg, input_string);
  if (trans == input_string)
    @{
      /* No translation was found: catgets returned input_string.  */
    @}
@}
@end smallexample

@noindent
When an error occurred the global variable @var{errno} is set to

@table @code
@item EBADF
The catalog does not exist.
@item ENOMSG
The set/message tuple does not name an existing element in the
message catalog.
@end table

While it can sometimes be useful to test for errors, programs normally
avoid any test. If the translation is not available it is no big
problem if the original, untranslated message is printed. Either the
user understands this as well or s/he will look for the reason why the
messages are not translated.
@end deftypefun
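
Please note that when @var{flag} is @code{NL_CAT_LOCALE}, the locale name
is the one currently selected for the @code{LC_MESSAGES} category, as
@code{setlocale (LC_MESSAGES, NULL)} reports it, so a program normally
has to call @code{setlocale} before @code{catopen} for the user's
settings to take effect. When @var{flag} is zero, @code{catopen} reads
the @code{LANG} environment variable directly; in this case the locale
data files for this locale need not exist and no call to
@code{setlocale} has to succeed.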


@deftypefun {char *} catgets (nl_catd @var{catalog_desc}, int @var{set}, int @var{message}, const char *@var{string})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{catgets} has to be used to access the message catalog
previously opened using the @code{catopen} function. The
@var{catalog_desc} parameter must be a value previously returned by
@code{catopen}.

The next two parameters, @var{set} and @var{message}, reflect the
internal organization of the message catalog files. This will be
explained in detail below. For now it is interesting to know that a
catalog can consist of several sets and the messages in each set are
individually numbered. Neither the set numbers nor the message numbers
need to be consecutive. They can be arbitrarily chosen. But each
message (unless equal to another one) must have its own unique pair of
set and message number.

Since it is not guaranteed that the message catalog for the language
selected by the user exists, the last parameter @var{string} helps to
handle this case gracefully. If no matching string can be found,
@var{string} is returned. This means for the programmer that

@itemize @bullet
@item
the @var{string} parameters should contain reasonable text (this also
helps to understand the program since otherwise there would be no hint
on the string which is expected to be returned).
@item
all @var{string} arguments should be written in the same language.
@end itemize
@end deftypefun
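
The fallback rule lends itself to a small helper that tells the caller whether a translation was found. The function @code{lookup_message} below is a hypothetical sketch, not a @theglibc{} interface. Like the test shown for @code{catopen} above, it compares the returned pointer with the default text; skipping @code{catgets} for a failed descriptor is an extra precaution for portability rather than a requirement of @theglibc{}.

```c
#include <assert.h>
#include <nl_types.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not a glibc interface: look up SET/MESSAGE in CD,
   falling back to DEFAULT_TEXT.  If TRANSLATED is not NULL it is set to
   true only when catgets returned something other than DEFAULT_TEXT.  */
const char *
lookup_message (nl_catd cd, int set, int message, const char *default_text,
                bool *translated)
{
  assert (default_text != NULL);
  const char *result = default_text;
  /* Skip catgets for a failed catopen; glibc copes with (nl_catd) -1,
     but this keeps the helper independent of that behavior.  */
  if (cd != (nl_catd) -1)
    result = catgets (cd, set, message, default_text);
  if (translated != NULL)
    *translated = (result != default_text);
  return result;
}
```

Because only the pointers are compared, a translation whose text happens to equal the default string is still reported as translated.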

It is somewhat uncomfortable to write a program using the @code{catgets}
functions if no supporting functionality is available. Since each
set/message number tuple must be unique, the programmer must keep lists
of the messages at the same time the code is written. And the work
between several people working on the same project must be coordinated.
We will see how these problems can be relaxed a bit (@pxref{Common
Usage}).

@deftypefun int catclose (nl_catd @var{catalog_desc})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
@c catclose @ascuheap @acucorrupt @acsmem
@c __set_errno ok
@c munmap ok
@c free @ascuheap @acsmem
The @code{catclose} function can be used to free the resources
associated with a message catalog which previously was opened by a call
to @code{catopen}. If the resources can be successfully freed the
function returns @code{0}. Otherwise it returns @code{@minus{}1} and the
global variable @var{errno} is set. Errors can occur if the catalog
descriptor @var{catalog_desc} is not valid, in which case @var{errno} is
set to @code{EBADF}.
@end deftypefun


@node The message catalog files
@subsection Format of the message catalog files

The only reasonable way to translate all the messages of a program and
store the result in a message catalog file which can be read by the
@code{catopen} function is to give all the message texts to the
translator and let her/him translate them all. That is, we must have a
file with entries which associate each set/message tuple with a specific
translation. This file format is specified in the X/Open standard and
is as follows:

@itemize @bullet
@item
Lines containing only whitespace characters or empty lines are ignored.

@item
Lines which contain as the first non-whitespace character a @code{$}
followed by a whitespace character are comments and are also ignored.

@item
If a line contains as the first non-whitespace characters the sequence
@code{$set} followed by a whitespace character, an additional argument
is required to follow. This argument can either be:

@itemize @minus
@item
a number. In this case the value of this number determines the set
to which the following messages are added.

@item
an identifier consisting of alphanumeric characters plus the underscore
character. In this case the set is automatically assigned a number.
This value is one added to the largest set number which so far appeared.

How to use the symbolic names is explained in section @ref{Common Usage}.

It is an error if a symbol name appears more than once. All following
messages are placed in a set with this number.
@end itemize

@item
If a line contains as the first non-whitespace characters the sequence
@code{$delset} followed by a whitespace character, an additional argument
is required to follow. This argument can either be:

@itemize @minus
@item
a number. In this case the value of this number determines the set
which will be deleted.

@item
an identifier consisting of alphanumeric characters plus the underscore
character. This symbolic identifier must match a name for a set which
previously was defined. It is an error if the name is unknown.
@end itemize

In both cases all messages in the specified set will be removed. They
will not appear in the output. But if this set is later selected again
with a @code{$set} command, messages can be added again and these
messages will appear in the output.

@item
If a line contains after leading whitespace the sequence
@code{$quote}, the quoting character used for this input file is
changed to the first non-whitespace character following the
@code{$quote}. If no non-whitespace character is present before the
line ends, quoting is disabled.

By default no quoting character is used. In this mode strings are
terminated with the first unescaped line break. If there is a
@code{$quote} sequence present, newlines need not be escaped. Instead a
string is terminated with the first unescaped appearance of the quote
character.

A common usage of this feature would be to set the quote character to
@code{"}. Then any appearance of the @code{"} in the strings must
be escaped using the backslash (i.e., @code{\"} must be written).

@item
Any other line must start with a number or an alphanumeric identifier
(with the underscore character included). The following characters
(starting after the first whitespace character) will form the string
which gets associated with the currently selected set and the message
number represented by the number or identifier, respectively.

If the start of the line is a number the message number is obvious. It
is an error if the same message number already appeared for this set.

If the leading token was an identifier the message number gets
automatically assigned. The value is the current maximum message
number for this set plus one. It is an error if the identifier was
already used for a message in this set. It is OK to reuse the
identifier for a message in another set. How to use the symbolic
identifiers will be explained below (@pxref{Common Usage}). There is
one limitation with the identifier: it must not be @code{Set}. The
reason will be explained below.

The text of the messages can contain escape characters. The usual bunch
of characters known from the @w{ISO C} language are recognized
(@code{\n}, @code{\t}, @code{\v}, @code{\b}, @code{\r}, @code{\f},
@code{\\}, and @code{\@var{nnn}}, where @var{nnn} is the octal coding of
a character code).
@end itemize
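
The escape sequences listed above can be decoded with a few lines of C. The function @code{decode_msg_escapes} is a hypothetical sketch, not the code used by @code{gencat}; treating any other backslash sequence (such as @code{\"}) as the character that follows the backslash is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: decode the ISO C escapes accepted in message
   texts, \n \t \v \b \r \f \\ and \nnn (up to three octal digits).
   Assumption: any other backslash sequence keeps the character after
   the backslash.  DST must be at least as large as SRC, because decoding
   never makes the text longer.  Returns the decoded length.  */
size_t
decode_msg_escapes (const char *src, char *dst)
{
  assert (src != NULL && dst != NULL);
  size_t n = 0;
  while (*src != '\0')
    {
      if (*src != '\\' || src[1] == '\0')
        {
          dst[n++] = *src++;   /* Ordinary character or trailing '\'.  */
          continue;
        }
      ++src;   /* Skip the backslash.  */
      switch (*src)
        {
        case 'n': dst[n++] = '\n'; ++src; break;
        case 't': dst[n++] = '\t'; ++src; break;
        case 'v': dst[n++] = '\v'; ++src; break;
        case 'b': dst[n++] = '\b'; ++src; break;
        case 'r': dst[n++] = '\r'; ++src; break;
        case 'f': dst[n++] = '\f'; ++src; break;
        case '0': case '1': case '2': case '3':
        case '4': case '5': case '6': case '7':
          {
            /* Up to three octal digits form one character code.  */
            int value = 0;
            for (int i = 0; i < 3 && *src >= '0' && *src <= '7'; ++i)
              value = value * 8 + (*src++ - '0');
            dst[n++] = (char) value;
            break;
          }
        default:
          /* Covers \\ and, as an assumption, \" or any other character.  */
          dst[n++] = *src++;
          break;
        }
    }
  dst[n] = '\0';
  return n;
}
```

For example, the message text @code{Hallo, Welt!\n} from a catalog source decodes to twelve printable characters followed by a newline.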

@strong{Important:} The handling of identifiers instead of numbers for
the set and messages is a GNU extension. Systems strictly following the
X/Open specification do not have this feature. An example for a message
catalog file is this:

@smallexample
$ This is a leading comment.
$quote "

$set SetOne
1 Message with ID 1.
two " Message with ID \"two\", which gets the value 2 assigned"

$set SetTwo
$ Since the last set got the number 1 assigned this set has number 2.
4000 "The numbers can be arbitrary, they need not start at one."
@end smallexample

This small example shows various aspects:
@itemize @bullet
@item
Lines 1 and 9 are comments since they start with @code{$} followed by
a whitespace.
@item
The quoting character is set to @code{"}. Otherwise the quotes in the
message definition would have to be omitted and in this case the
message with the identifier @code{two} would lose its leading whitespace.
@item
Mixing numbered messages with messages having symbolic names is no
problem and the numbering happens automatically.
@end itemize
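
The automatic numbering can be modeled as follows. The function @code{assign_message_numbers} is a hypothetical sketch covering the messages of one set, not the code used by @code{gencat}: a numeric key is used as is, and an identifier receives one more than the largest number seen so far in that set. The same rule applied to @code{$set} lines is why @code{SetTwo} in the example gets the number 2.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not gencat's code: assign message numbers within
   one set.  KEYS holds the leading token of each message line in input
   order; a numeric key is used as is, an identifier gets the largest
   number assigned so far in this set plus one.  The results are stored
   in NUMBERS.  Returns 0, or -1 if a number or identifier repeats.  */
int
assign_message_numbers (const char *const keys[], size_t count,
                        int numbers[])
{
  assert (count == 0 || (keys != NULL && numbers != NULL));
  int max = 0;
  for (size_t i = 0; i < count; ++i)
    {
      if (isdigit ((unsigned char) keys[i][0]))
        numbers[i] = atoi (keys[i]);
      else
        {
          /* An identifier may be used only once per set.  */
          for (size_t j = 0; j < i; ++j)
            if (strcmp (keys[j], keys[i]) == 0)
              return -1;
          numbers[i] = max + 1;
        }
      /* The same message number must not appear twice in a set.  */
      for (size_t j = 0; j < i; ++j)
        if (numbers[j] == numbers[i])
          return -1;
      if (numbers[i] > max)
        max = numbers[i];
    }
  return 0;
}
```

For the set @code{SetOne} in the example, the keys @code{1} and @code{two} receive the numbers 1 and 2, which matches the value the text above gives for @code{two}.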


While this file format is pretty easy, it is not the best possible for
use in a running program. The @code{catopen} function would have to
parse the file and handle syntactic errors gracefully. This is not so
easy and the whole process is pretty slow. Therefore the @code{catgets}
functions expect the data in another, more compact and ready-to-use file
format. A special program, @code{gencat}, which is explained in detail
in the next section, converts the message files into this format.

Files in this other format are not human readable. To be easy to use by
programs it is a binary file. But the format is byte order independent,
so translation files can be shared by systems of arbitrary architecture
(as long as they use @theglibc{}).

Details about the binary file format are not important to know since
these files are always created by the @code{gencat} program. The
sources of @theglibc{} also provide the sources for the
@code{gencat} program, so the interested reader can look through
these source files to learn about the file format.


@node The gencat program
@subsection Generate Message Catalog Files

@cindex gencat
The @code{gencat} program is specified in the X/Open standard and the
GNU implementation follows this specification and so processes
all correctly formed input files. Additionally some extensions are
implemented which help to work in a more reasonable way with the
@code{catgets} functions.

The @code{gencat} program can be invoked in two ways:

@example
`gencat [@var{Option}]@dots{} [@var{Output-File} [@var{Input-File}]@dots{}]`
@end example

This is the interface defined in the X/Open standard. If no
@var{Input-File} parameter is given, input will be read from standard
input. Multiple input files will be read as if they are concatenated.
If @var{Output-File} is also missing, the output will be written to
standard output. To provide the interface one is used to from other
programs, a second interface is provided.

@smallexample
`gencat [@var{Option}]@dots{} -o @var{Output-File} [@var{Input-File}]@dots{}`
@end smallexample

The option @samp{-o} is used to specify the output file and all file
arguments are used as input files.

Besides this, one can use @file{-} or @file{/dev/stdin} for
@var{Input-File} to denote the standard input. Correspondingly, one can
use @file{-} or @file{/dev/stdout} for @var{Output-File} to denote
standard output. Using @file{-} as a file name is allowed in X/Open
while using the device names is a GNU extension.

The @code{gencat} program works by concatenating all input files and
then @strong{merging} the resulting collection of message sets with a
possibly existing output file. This is done by removing all messages
with set/message number tuples matching any of the generated messages
from the output file and then adding all the new messages. To
regenerate a catalog file while ignoring the old contents therefore
requires removing the output file if it exists (or using the
@samp{--new} option described below). If the output is
written to standard output no merging takes place.
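
The merge step can be pictured with a small in-memory model. The structure @code{catalog_entry} and the function @code{merge_catalog} are hypothetical and only illustrate the rule stated above; the real @code{gencat} works on its own data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory model of one catalog message.  */
struct catalog_entry
{
  int set;
  int message;
  const char *text;
};

/* Sketch of the merge rule: drop every entry of OLD whose set/message
   tuple also occurs in NEW_ENTRIES, then append all NEW_ENTRIES.  OUT
   must have room for OLD_COUNT + NEW_COUNT entries.  Returns the number
   of entries stored in OUT.  */
size_t
merge_catalog (const struct catalog_entry *old, size_t old_count,
               const struct catalog_entry *new_entries, size_t new_count,
               struct catalog_entry *out)
{
  assert (out != NULL);
  size_t n = 0;
  for (size_t i = 0; i < old_count; ++i)
    {
      int replaced = 0;
      for (size_t j = 0; j < new_count && !replaced; ++j)
        replaced = (old[i].set == new_entries[j].set
                    && old[i].message == new_entries[j].message);
      if (!replaced)
        out[n++] = old[i];   /* Keep old messages not regenerated.  */
    }
  for (size_t j = 0; j < new_count; ++j)
    out[n++] = new_entries[j];
  return n;
}
```

Discarding the old content, as with @samp{--new}, corresponds to calling this function with an empty @var{old} list.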

@noindent
The following table shows the options understood by the @code{gencat}
program. The X/Open standard does not specify any option for the
program, so all of these are GNU extensions.

@table @samp
@item -V
@itemx --version
Print the version information and exit.
@item -h
@itemx --help
Print a usage message listing all available options, then exit successfully.
@item --new
Never merge the new messages from the input files with the old content
of the output file. The old content of the output file is discarded.
@item -H
@itemx --header=@var{name}
This option is used to emit the symbolic names given to sets and
messages in the input files for use in the program. Details about how
to use this are given in the next section. The @var{name} parameter to
this option specifies the name of the output file. It will contain a
number of C preprocessor @code{#define}s to associate a name with a
number.

Please note that the generated file only contains the symbols from the
input files. If the output is merged with the previous content of the
output file, the possibly existing symbols from the file(s) which
generated the old output file are not in the generated header file.
@end table


@node Common Usage
@subsection How to use the @code{catgets} interface

The @code{catgets} functions can be used in two different ways: by
following the X/Open specification strictly and not relying on the
extensions, or by using the GNU extensions. We will take a look at the
former method first to understand the benefits of the extensions.

@subsubsection Not using symbolic names

Since the X/Open format of the message catalog files does not allow
symbol names, we have to work with numbers all the time. When we start
writing a program we have to replace all appearances of translatable
strings with something like

@smallexample
catgets (catdesc, set, msg, "string")
@end smallexample

@noindent
@var{catdesc} is retrieved from a call to @code{catopen} which is
normally done once at the program start. The @code{"string"} is the
string we want to translate. The problems start with the set and
message numbers.

In a bigger program several programmers usually work at the same time on
the program and so coordinating the number allocation is crucial.
While no two different strings may be indexed by the same tuple of
numbers, it is highly desirable to reuse the numbers for equal strings
with equal translations (please note that there might be strings which
are equal in one language but have different translations due to
different contexts).

The allocation process can be relaxed a bit by using different set
numbers for different parts of the program. So the number of developers
who have to coordinate the allocation can be reduced. But still lists
must be kept to track the allocation and errors can easily happen.
These errors cannot be discovered by the compiler or the @code{catgets}
functions. Only the user of the program might see wrong messages
printed. In the worst cases the messages are so misleading that they
cannot be recognized as wrong. Think about the translations for
@code{"true"} and @code{"false"} being exchanged. This could result in
a disaster.


@subsubsection Using symbolic names

The problems mentioned in the last section derive from the fact that:

@enumerate
@item
the numbers are allocated once and due to the possibly frequent use of
them it is difficult to change a number later.
@item
the numbers do not allow one to guess anything about the string and
therefore collisions can easily happen.
@end enumerate

By consistently using symbolic names and by providing a method which maps
the string content to a symbolic name (however this will happen) one can
prevent both problems above. The cost of this is that the programmer
has to write a complete message catalog file while s/he is writing the
program itself.

This is necessary since the symbolic names must be mapped to numbers
before the program sources can be compiled. In the last section it was
described how to generate a header containing the mapping of the names.
E.g., for the example message file given earlier we could call the
@code{gencat} program as follows (assume @file{ex.msg} contains the
sources).

@smallexample
gencat -H ex.h -o ex.cat ex.msg
@end smallexample

@noindent
This generates a header file with the following content:

@smallexample
#define SetTwoSet 0x2 /* ex.msg:8 */

#define SetOneSet 0x1 /* ex.msg:4 */
#define SetOnetwo 0x2 /* ex.msg:6 */
@end smallexample

As can be seen, the various symbols given in the source file are mangled
to generate unique identifiers and these identifiers get numbers
assigned. Reading the source file and knowing about the rules will
allow one to predict the content of the header file (it is
deterministic) but this is not necessary. The @code{gencat} program can
take care of everything. All the programmer has to do is to put the
generated header file in the dependency list of the source files of
her/his project and to add a rule to regenerate the header if any of the
input files change.

One word about the symbol mangling. Every symbol consists of two parts:
the name of the message set plus the name of the message or the special
string @code{Set}. So @code{SetOnetwo} means this macro can be used to
access the translation with identifier @code{two} in the message set
@code{SetOne}.

The other names denote the names of the message sets. The special
string @code{Set} is used in the place of the message identifier.
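
The mangling rule is simple enough to state in code. The function @code{mangle_catalog_symbol} is a hypothetical sketch, not part of @code{gencat}; it concatenates the set name with the message identifier, or with @code{Set} for the macro naming the set. It also shows why a message may not be called @code{Set}: its macro would clash with the one naming the set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: build the macro name for message MSG_NAME in set
   SET_NAME, or the macro for the set itself when MSG_NAME is NULL.
   Returns 0, or -1 if the name does not fit into OUT.  */
int
mangle_catalog_symbol (const char *set_name, const char *msg_name,
                       char *out, size_t outlen)
{
  assert (set_name != NULL && out != NULL);
  /* The set macro uses the special string "Set" as its second part.  */
  const char *second = msg_name != NULL ? msg_name : "Set";
  size_t len1 = strlen (set_name);
  size_t len2 = strlen (second);
  if (len1 + len2 + 1 > outlen)
    return -1;
  memcpy (out, set_name, len1);
  memcpy (out + len1, second, len2 + 1);   /* Includes the final NUL.  */
  return 0;
}
```

Applied to the greeting program used as the next example, the set @code{Main} and the message @code{Hello} yield @code{MainSet} and @code{MainHello}.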

If in the code the second string of the set @code{SetOne} is used, the C
code should look like this:

@smallexample
catgets (catdesc, SetOneSet, SetOnetwo,
         " Message with ID \"two\", which gets the value 2 assigned")
@end smallexample

Writing the function call this way allows changing the message number
and even the set number without requiring any change in the C source
code. (The text of the string is normally not the same; this is only
for this example.)


@subsubsection How this helps development

To illustrate the usual way to work with symbolic names, here is a
little example. Assume we want to write the very complex and famous
greeting program. We start by writing the code as usual:

@smallexample
#include <stdio.h>
int
main (void)
@{
  printf ("Hello, world!\n");
  return 0;
@}
@end smallexample

Now we want to internationalize the message and therefore replace the
message with whatever the user wants.

@smallexample
#include <locale.h>
#include <nl_types.h>
#include <stdio.h>
#include "msgnrs.h"
int
main (void)
@{
  /* Select the user's locale; NL_CAT_LOCALE uses LC_MESSAGES.  */
  setlocale (LC_ALL, "");
  nl_catd catdesc = catopen ("hello.cat", NL_CAT_LOCALE);
  /* fputs, not printf: a translation must not act as a format string.  */
  fputs (catgets (catdesc, MainSet, MainHello, "Hello, world!\n"), stdout);
  catclose (catdesc);
  return 0;
@}
@end smallexample

We see how the catalog object is opened and the returned descriptor used
in the other function calls. It is not really necessary to check for
failure of any of the functions since even in these situations the
functions will behave reasonably. In particular, @code{catgets} then
simply returns the untranslated default string.

What remains unspecified here are the constants @code{MainSet} and
@code{MainHello}. These are the symbolic names describing the
message. To get the actual definitions which match the information in
the catalog file we have to create the message catalog source file and
process it using the @code{gencat} program.

@smallexample
$ Messages for the famous greeting program.
$quote "

$set Main
Hello "Hallo, Welt!\n"
@end smallexample

Now we can start building the program (assume the message catalog source
file is named @file{hello.msg} and the program source file @file{hello.c}):

@smallexample
% gencat -H msgnrs.h -o hello.cat hello.msg
% cat msgnrs.h
#define MainSet 0x1 /* hello.msg:4 */
#define MainHello 0x1 /* hello.msg:5 */
% gcc -o hello hello.c -I.
% cp hello.cat /usr/share/locale/de/LC_MESSAGES
% echo $LC_ALL
de
% ./hello
Hallo, Welt!
%
@end smallexample

The call of the @code{gencat} program creates the missing header file
@file{msgnrs.h} as well as the message catalog binary. The former is
used in the compilation of @file{hello.c} while the latter is placed in a
directory in which the @code{catopen} function will try to locate it.
Please check the @code{LC_ALL} environment variable and the default path
for @code{catopen} presented in the description above.


@node The Uniforum approach
@section The Uniforum approach to Message Translation

Sun Microsystems tried to standardize a different approach to message
translation in the Uniforum group. There never was a real standard
defined but still the interface was used in Sun's operating systems.
Since this approach fits better in the development process of free
software, it is also used throughout the GNU project and the GNU
@file{gettext} package provides support for this outside @theglibc{}.

The code of the @file{libintl} from GNU @file{gettext} is the same as
the code in @theglibc{}. So the documentation in the GNU
@file{gettext} manual is also valid for the functionality here. The
following text will describe the library functions in detail. But the
numerous helper programs are not described in this manual. Instead
people should read the GNU @file{gettext} manual
(@pxref{Top,,GNU gettext utilities,gettext,Native Language Support Library and Tools}).
We will only give a short overview.

Though the @code{catgets} functions are available by default on more
systems, the @code{gettext} interface is at least as portable as the
former. The GNU @file{gettext} package can be used wherever the
functions are not available.
777
778
@menu
* Message catalogs with gettext:: The @code{gettext} family of functions.
* Helper programs for gettext::   Programs to handle message catalogs
                                   for @code{gettext}.
@end menu


@node Message catalogs with gettext
@subsection The @code{gettext} family of functions

The paradigm underlying the @code{gettext} approach to message
translation is different from that of the @code{catgets} functions, but
the basic functionality is equivalent. There are functions of the
following categories:

@menu
* Translation with gettext::       What has to be done to translate a message.
* Locating gettext catalog::       How to determine which catalog to be used.
* Advanced gettext functions::     Additional functions for more complicated
                                    situations.
* Charset conversion in gettext::  How to specify the output character set
                                    @code{gettext} uses.
* GUI program problems::           How to use @code{gettext} in GUI programs.
* Using gettextized software::     The possibilities of the user to influence
                                    the way @code{gettext} works.
@end menu

@node Translation with gettext
@subsubsection What has to be done to translate a message?

The @code{gettext} functions have a very simple interface. The most
basic function just takes the string to be translated as its argument
and returns the translation. This is fundamentally different from the
@code{catgets} approach, where an extra key is necessary and the
original string is only used in the error case.

If the string to be translated is the only argument, this of course
means the string itself is the key. I.e., the translation is selected
based on the original string. The message catalogs must therefore
contain the original strings plus one translation for each such string.
The task of the @code{gettext} function is to compare the argument
string with the available strings in the catalog and return the
appropriate translation. Of course this lookup is optimized so that it
is not more expensive than an access using an atomic key, as in
@code{catgets}.
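
To illustrate the idea that the original string is the key, the
following sketch looks up a translation in a small catalog held in
memory. The table, its contents, and the functions @code{compare_entries}
and @code{lookup} are hypothetical and only mimic the observable behavior
of @code{gettext}: a found entry yields its translation, and a missing
entry yields the argument itself. The sketch keeps the table sorted and
uses @code{bsearch}; the real catalog files and the lookup code in the C
library are more elaborate.

@smallexample
#include <stdlib.h>
#include <string.h>

/* One entry of a hypothetical in-memory catalog.  */
struct catalog_entry
@{
  const char *msgid;   /* Original string, used as the key.  */
  const char *msgstr;  /* Its translation.  */
@};

/* Hypothetical German catalog; the entries are sorted by msgid
   so that bsearch can be used.  */
static const struct catalog_entry german_catalog[] =
@{
  @{ "Hello, world!\n", "Hallo, Welt!\n" @},
  @{ "Operation failed", "Vorgang fehlgeschlagen" @},
@};

/* Compare the searched msgid (the key) with the msgid of an entry.  */
static int
compare_entries (const void *key, const void *elem)
@{
  const struct catalog_entry *entry = elem;
  return strcmp (key, entry->msgid);
@}

/* Return the translation of MSGID, or MSGID itself if the catalog
   has no entry for it, like the fallback of gettext.  */
static const char *
lookup (const char *msgid)
@{
  const struct catalog_entry *found
    = bsearch (msgid, german_catalog,
               sizeof german_catalog / sizeof german_catalog[0],
               sizeof german_catalog[0], compare_entries);
  return found != NULL ? found->msgstr : msgid;
@}
@end smallexample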

The @code{gettext} approach has some advantages but also some
disadvantages. Please see the GNU @file{gettext} manual for a detailed
discussion of the pros and cons.

All the definitions and declarations for @code{gettext} can be found in
the @file{libintl.h} header file. On systems where these functions are
not part of the C library, they can be found in a separate library named
@file{libintl.a} (or a correspondingly named shared library).

@comment libintl.h
@comment GNU
@deftypefun {char *} gettext (const char *@var{msgid})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Wrapper for dcgettext.
The @code{gettext} function searches the currently selected message
catalogs for a string which is equal to @var{msgid}. If there is such a
string available, its translation is returned. Otherwise the argument
string @var{msgid} is returned.

Please note that although the return value is @code{char *}, the
returned string must not be changed. This broken type results from the
history of the function and does not reflect the way the function should
be used.

Please note that above we wrote ``message catalogs'' (plural). This is
a specialty of the GNU implementation of these functions, and we will
say more about this when we talk about the ways message catalogs are
selected (@pxref{Locating gettext catalog}).

The @code{gettext} function does not modify the value of the global
@var{errno} variable. This is necessary to make it possible to write
something like

@smallexample
  printf (gettext ("Operation failed: %m\n"));
@end smallexample

Here the @var{errno} value is used by the @code{printf} function while
processing the @code{%m} format element, and if the @code{gettext}
function changed this value (it is called before @code{printf} is
called), we would get a wrong message.

Consequently, there is no easy way to detect a missing message catalog
other than comparing the argument string with the result. But it is
normally the task of the user to react to missing catalogs. The program
cannot guess when a message catalog is really necessary, since a user
who speaks the language the program was developed in does not need any
translation.
@end deftypefun
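
The two properties described above can be checked with a short sketch.
The function names @code{has_translation} and
@code{gettext_preserves_errno} are hypothetical. Because @code{gettext}
returns its argument itself when no translation exists, comparing the
pointers (not the string contents) tells whether a translation was found,
even if a translation happens to be spelled like the original.

@smallexample
#include <errno.h>
#include <libintl.h>
#include <stdbool.h>

/* Return true if gettext found a translation for MSGID.  Without a
   translation, gettext returns the very pointer it was given.  */
static bool
has_translation (const char *msgid)
@{
  return gettext (msgid) != msgid;
@}

/* Return true if a call to gettext left errno unchanged, which is what
   makes the printf example with %m safe.  */
static bool
gettext_preserves_errno (void)
@{
  errno = ENOENT;
  (void) gettext ("Operation failed: %m\n");
  return errno == ENOENT;
@}
@end smallexample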

The remaining two functions to access the message catalog add some
functionality to select a message catalog which is not the default one.
This is important if parts of the program are developed independently.
Every part can have its own message catalog, and all of them can be used
at the same time. The C library itself is an example: internally it
uses the @code{gettext} functions, but since it must not depend on the
currently selected default message catalog, it must explicitly specify
all information that would otherwise be ambiguous.

@comment libintl.h
@comment GNU
@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Wrapper for dcgettext.
The @code{dgettext} function acts just like the @code{gettext}
function. It only takes an additional first argument @var{domainname},
which guides the selection of the message catalogs which are searched
for the translation. If the @var{domainname} parameter is the null
pointer, the @code{dgettext} function is exactly equivalent to
@code{gettext}, since the default value for the domain name is used.

As for @code{gettext}, the return value type is @code{char *}, which is
an anachronism. The returned string must never be modified.
@end deftypefun

@comment libintl.h
@comment GNU
@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c dcgettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c dcigettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c libc_rwlock_rdlock @asulock @aculock
@c current_locale_name ok [protected from @mtslocale]
@c tfind ok
@c libc_rwlock_unlock ok
@c plural_lookup ok
@c plural_eval ok
@c rawmemchr ok
@c DETERMINE_SECURE ok, nothing
@c strcmp ok
@c strlen ok
@c getcwd @ascuheap @acsmem @acsfd
@c strchr ok
@c stpcpy ok
@c category_to_name ok
@c guess_category_value @mtsenv
@c getenv @mtsenv
@c current_locale_name dup ok [protected from @mtslocale by dcigettext]
@c strcmp ok
@c ENABLE_SECURE ok
@c _nl_find_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c libc_rwlock_rdlock dup @asulock @aculock
@c _nl_make_l10nflist dup @ascuheap @acsmem
@c libc_rwlock_unlock dup ok
@c _nl_load_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c libc_lock_lock_recursive @aculock
@c libc_lock_unlock_recursive @aculock
@c open->open_not_cancel_2 @acsfd
@c fstat ok
@c mmap dup @acsmem
@c close->close_not_cancel_no_status @acsfd
@c malloc dup @ascuheap @acsmem
@c read->read_not_cancel ok
@c munmap dup @acsmem
@c W dup ok
@c strlen dup ok
@c get_sysdep_segment_value ok
@c memcpy dup ok
@c hash_string dup ok
@c free dup @ascuheap @acsmem
@c libc_rwlock_init ok
@c _nl_find_msg dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c libc_rwlock_fini ok
@c EXTRACT_PLURAL_EXPRESSION @ascuheap @acsmem
@c strstr dup ok
@c isspace ok
@c strtoul ok
@c PLURAL_PARSE @ascuheap @acsmem
@c malloc dup @ascuheap @acsmem
@c free dup @ascuheap @acsmem
@c INIT_GERMANIC_PLURAL ok, nothing
@c the pre-C99 variant is @acucorrupt [protected from @mtuinit by dcigettext]
@c _nl_expand_alias dup @ascuheap @asulock @acsmem @acsfd @aculock
@c _nl_explode_name dup @ascuheap @acsmem
@c libc_rwlock_wrlock dup @asulock @aculock
@c free dup @asulock @aculock @acsfd @acsmem
@c _nl_find_msg @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c _nl_load_domain dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
@c strlen ok
@c hash_string ok
@c W ok
@c SWAP ok
@c bswap_32 ok
@c strcmp ok
@c get_output_charset @mtsenv @ascuheap @acsmem
@c getenv dup @mtsenv
@c strlen dup ok
@c malloc dup @ascuheap @acsmem
@c memcpy dup ok
@c libc_rwlock_rdlock dup @asulock @aculock
@c libc_rwlock_unlock dup ok
@c libc_rwlock_wrlock dup @asulock @aculock
@c realloc @ascuheap @acsmem
@c strdup @ascuheap @acsmem
@c strstr ok
@c strcspn ok
@c mempcpy dup ok
@c norm_add_slashes dup ok
@c gconv_open @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c [protected from @mtslocale by dcigettext locale lock]
@c free dup @ascuheap @acsmem
@c libc_lock_lock @asulock @aculock
@c calloc @ascuheap @acsmem
@c gconv dup @acucorrupt [protected from @mtsrace and @asucorrupt by lock]
@c libc_lock_unlock ok
@c malloc @ascuheap @acsmem
@c mempcpy ok
@c memcpy ok
@c strcpy ok
@c libc_rwlock_wrlock @asulock @aculock
@c tsearch @ascuheap @acucorrupt @acsmem [protected from @mtsrace and @asucorrupt]
@c transcmp ok
@c strcmp dup ok
@c free @ascuheap @acsmem
The @code{dcgettext} function adds another argument to those which
@code{dgettext} takes. This argument, @var{category}, specifies the
last piece of information needed to locate the message catalog. I.e.,
the domain name and the locale category exactly specify which message
catalog has to be used (relative to a given directory, see below).

The @code{dgettext} function can be expressed in terms of
@code{dcgettext} by using

@smallexample
dcgettext (domain, string, LC_MESSAGES)
@end smallexample

@noindent
instead of

@smallexample
dgettext (domain, string)
@end smallexample

This also shows which values are expected for the third parameter. One
has to use the selectors for the categories available in
@file{locale.h}. Normally the available values are @code{LC_CTYPE},
@code{LC_COLLATE}, @code{LC_MESSAGES}, @code{LC_MONETARY},
@code{LC_NUMERIC}, and @code{LC_TIME}. Please note that @code{LC_ALL}
must not be used, and even though the names might suggest this, there is
no relation to the environment variables of the same name.

The @code{dcgettext} function is only implemented for compatibility with
other systems which have @code{gettext} functions. There is not really
any situation where it is necessary (or useful) to use a value other
than @code{LC_MESSAGES} for the @var{category} parameter. We are
dealing with messages here, and any other choice can only be irritating.

As for @code{gettext}, the return value type is @code{char *}, which is
an anachronism. The returned string must never be modified.
@end deftypefun
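
As a sketch of how an independently developed part of a program, such
as a library, can use its own catalog regardless of the default domain,
the following functions translate messages in a fixed domain. The
domain name @code{mylib} and both function names are hypothetical. The
two functions are equivalent; the second merely spells out the
@code{LC_MESSAGES} category that @code{dgettext} uses implicitly.

@smallexample
#include <libintl.h>
#include <locale.h>

/* Hypothetical domain of a library with its own message catalog.  */
#define MYLIB_DOMAIN "mylib"

/* Translate MSGID using the library's catalog, independent of the
   default domain selected by the program.  */
static const char *
mylib_gettext (const char *msgid)
@{
  return dgettext (MYLIB_DOMAIN, msgid);
@}

/* The same lookup written with dcgettext and an explicit category.  */
static const char *
mylib_dcgettext (const char *msgid)
@{
  return dcgettext (MYLIB_DOMAIN, msgid, LC_MESSAGES);
@}
@end smallexample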

When using the three functions above in a program, the @var{msgid}
argument is frequently a constant string, so it is worth optimizing
this case. Thinking about this briefly, one will realize that as long
as no new message catalog is loaded, the translation of a message will
not change. This optimization is actually implemented by the
@code{gettext}, @code{dgettext} and @code{dcgettext} functions.


@node Locating gettext catalog
@subsubsection How to determine which catalog to be used

The functions to retrieve the translations for a given message have a
remarkably simple interface. But to give the user of the program the
opportunity to select exactly the translation s/he wants, and to give
the programmer the possibility to influence how the catalog files are
searched for, there is a quite complicated underlying mechanism which
controls all this. The code is complicated, but the use is easy.

Basically we have two different tasks to perform, which can also be
performed by the @code{catgets} functions:

@enumerate
@item
Locate the set of message catalogs. There are a number of files for
different languages which all belong to the package. Usually they are
all stored in the filesystem below a certain directory.

There can be arbitrarily many packages installed, and they can follow
different guidelines for the placement of their files.

@item
Relative to the location specified by the package, the actual
translation files must be searched, based on the wishes of the user.
I.e., for each language the user selects, the program should be able to
locate the appropriate file.
@end enumerate

This is the functionality required by the specifications for
@code{gettext}, and this is also what the @code{catgets} functions are
able to do. But there are some unresolved problems:

@itemize @bullet
@item
The language to be used can be specified in several different ways.
There is no generally accepted standard for this, and the user always
expects the program to understand what s/he means. E.g., to select the
German translation one could write @code{de}, @code{german}, or
@code{deutsch}, and the program should always react the same.

@item
Sometimes the specification of the user is too detailed. If s/he, e.g.,
specifies @code{de_DE.ISO-8859-1}, which means German, spoken in Germany,
coded using the @w{ISO 8859-1} character set, there is the possibility
that a message catalog matching this exactly is not available. But
there could be a catalog matching @code{de}, and if the character set
used on the machine is always @w{ISO 8859-1}, there is no reason why this
latter message catalog should not be used. (We call this @dfn{message
inheritance}.)

@item
If a catalog for a wanted language is not available, falling back on
the language of the developer and simply not translating any message is
not always the second best choice. Instead, a user might be better able
to read the messages in another language, so the user of the program
should be able to define a precedence order of languages.
@end itemize
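
Message inheritance can be pictured as trying progressively less
specific locale names until a catalog is found. The following sketch is
a simplified, hypothetical illustration (the names
@code{locale_fallbacks}, @code{catalog_file_name} and
@code{LOCALE_NAME_LEN} are invented): it only handles names of the form
@var{language}[_@var{territory}][.@var{codeset}], whereas the search in
@theglibc{} also considers modifiers and normalized codeset names. The
file names follow the conventional
@file{@var{dirname}/@var{locale}/LC_MESSAGES/@var{domain}.mo} layout.

@smallexample
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define LOCALE_NAME_LEN 64

/* Store in CANDIDATES the locale name NAME and its less specific
   forms, most specific first, by dropping first the codeset and then
   the territory.  Returns the number of names stored.  */
static size_t
locale_fallbacks (const char *name, char candidates[][LOCALE_NAME_LEN],
                  size_t max)
@{
  size_t count = 0;
  size_t len = strlen (name);

  while (count < max && len > 0)
    @{
      snprintf (candidates[count++], LOCALE_NAME_LEN, "%.*s",
                (int) len, name);

      const char *dot = memchr (name, '.', len);
      const char *underscore = memchr (name, '_', len);
      if (dot != NULL)
        len = dot - name;          /* de_DE.ISO-8859-1 -> de_DE */
      else if (underscore != NULL)
        len = underscore - name;   /* de_DE -> de */
      else
        break;                     /* Only the language is left.  */
    @}
  return count;
@}

/* Build the file name that would be tried for LOCALE under the
   conventional catalog layout.  */
static void
catalog_file_name (char *buf, size_t size, const char *dirname,
                   const char *locale, const char *domain)
@{
  snprintf (buf, size, "%s/%s/LC_MESSAGES/%s.mo", dirname, locale, domain);
@}
@end smallexample

For @code{de_DE.ISO-8859-1} the sketch yields @code{de_DE.ISO-8859-1},
@code{de_DE}, and @code{de}, in this order.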

We can divide the configuration actions into two parts: one is
performed by the programmer, the other by the user. We will start with
the functions the programmer can use, since the user configuration will
be based on them.

As the functions described in the last sections already mention,
separate sets of messages can be selected by a @dfn{domain name}. This
is a simple string which should be unique for each program part which
uses a separate domain. It is possible to use arbitrarily many domains
in one program at the same time. E.g., @theglibc{} itself uses a
domain named @code{libc} while the program using the C Library could
use a domain named @code{foo}. The important point is that at any time
exactly one domain is the default domain, the one used by
@code{gettext}. This is controlled with the following function.

@comment libintl.h
@comment GNU
@deftypefun {char *} textdomain (const char *@var{domainname})
@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}}
@c textdomain @asulock @ascuheap @aculock @acsmem
@c libc_rwlock_wrlock @asulock @aculock
@c strcmp ok
@c strdup @ascuheap @acsmem
@c free @ascuheap @acsmem
@c libc_rwlock_unlock ok
The @code{textdomain} function sets the default domain, which is used in
all future @code{gettext} calls, to @var{domainname}. Please note that
@code{dgettext} and @code{dcgettext} calls are not influenced if the
@var{domainname} parameter of these functions is not the null pointer.

Before the first call to @code{textdomain} the default domain is
@code{messages}. This is the name specified in the specification of
the @code{gettext} API. Although this name is as good as any other, no
program should ever really use a domain with this name, since every
program relying on it would share the same catalogs, which can only
lead to problems.

The function returns the value which is from now on taken as the default
domain. If the system ran out of memory, the returned value is
@code{NULL} and the global variable @var{errno} is set to @code{ENOMEM}.
Despite the return value type being @code{char *}, the returned string
must not be changed. It is allocated internally by the
@code{textdomain} function.

If the @var{domainname} parameter is the null pointer, no new default
domain is set. Instead, the currently selected default domain is
returned.

If the @var{domainname} parameter is the empty string, the default
domain is reset to its initial value, the domain with the name
@code{messages}. This possibility is of questionable use, since the
domain @code{messages} really should never be used.
@end deftypefun
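
The following sketch exercises the three cases described above: a query
with the null pointer, setting a new default domain, and resetting it
with the empty string. The domain name @code{foo} is only an example,
and the function names are hypothetical. Since the returned string
belongs to the library, the sketch only compares it and never stores or
modifies it.

@smallexample
#include <libintl.h>
#include <string.h>

/* Hypothetical domain name of the program.  */
#define PROGRAM_DOMAIN "foo"

/* Return nonzero if NAME is the current default domain.  Passing the
   null pointer to textdomain only queries the setting.  */
static int
is_default_domain (const char *name)
@{
  const char *current = textdomain (NULL);
  return current != NULL && strcmp (current, name) == 0;
@}

/* Make PROGRAM_DOMAIN the default domain.  Returns 0 on success and
   -1 if textdomain failed (errno is then ENOMEM).  */
static int
select_program_domain (void)
@{
  return textdomain (PROGRAM_DOMAIN) != NULL ? 0 : -1;
@}

/* Reset the default domain to its initial value, messages.  */
static void
reset_default_domain (void)
@{
  textdomain ("");
@}
@end smallexample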

@comment libintl.h
@comment GNU
@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c bindtextdomain @ascuheap @acsmem
@c set_binding_values @ascuheap @acsmem
@c libc_rwlock_wrlock dup @asulock @aculock
@c strcmp dup ok
@c strdup dup @ascuheap @acsmem
@c free dup @ascuheap @acsmem
@c malloc dup @ascuheap @acsmem
The @code{bindtextdomain} function can be used to specify the directory
which contains the message catalogs for domain @var{domainname} for the
different languages. To be precise, this is the root of the directory
hierarchy in which the catalogs are expected. Details are explained
below.

For the programmer it is important to note that the translations which
come with the program have to be placed in a directory hierarchy
starting at, say, @file{/foo/bar}. Then the program should make a
@code{bindtextdomain} call to bind the domain for the current program to
this directory. This makes sure the catalogs are found. A correctly
running program does not depend on the user setting an environment
variable.

The @code{bindtextdomain} function can be used several times, and if the
@var{domainname} argument is different, the previously bound domains
will not be overwritten.

If a program which uses @code{bindtextdomain} at some point changes the
current working directory with the @code{chdir} function, it is
important that the @var{dirname} strings be absolute pathnames.
Otherwise the addressed directory might vary over time.

If the @var{dirname} parameter is the null pointer, @code{bindtextdomain}
returns the currently selected directory for the domain with the name
@var{domainname}.

The @code{bindtextdomain} function returns a pointer to a string
containing the name of the selected directory. The string is allocated
internally in the function and must not be changed by the user. If the
system ran out of memory during the execution of @code{bindtextdomain},
the return value is @code{NULL} and the global variable @var{errno} is
set accordingly.
@end deftypefun
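
Putting @code{bindtextdomain} and @code{textdomain} together, the start of
a program typically looks like the following sketch. The package name
@code{hello}, the directory @file{/opt/hello/share/locale}, and the
function names are assumptions chosen for illustration; a real package
would use its own name and installation directory. The call to
@code{setlocale} selects the locale from the user's environment; without
it the program runs in the @code{C} locale and messages are not
translated.

@smallexample
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical package name and absolute catalog directory.  */
#define PACKAGE   "hello"
#define LOCALEDIR "/opt/hello/share/locale"

/* Set up message translation at program start.  Returns 0 on success
   and -1 if the library ran out of memory.  */
static int
init_translation (void)
@{
  /* Take the locale from the environment.  A null result only means
     the requested locale is unavailable; messages then stay
     untranslated, so this is not treated as an error.  */
  setlocale (LC_ALL, "");

  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;
  if (textdomain (PACKAGE) == NULL)
    return -1;
  return 0;
@}

/* Return nonzero if PACKAGE is bound to LOCALEDIR.  A null dirname
   only queries the current binding.  */
static int
binding_is_installed (void)
@{
  const char *dir = bindtextdomain (PACKAGE, NULL);
  return dir != NULL && strcmp (dir, LOCALEDIR) == 0;
@}
@end smallexample

After @code{init_translation} has run, a call such as
@code{gettext ("Hello, world!\n")} is looked up in the catalogs of the
domain @code{hello} below @file{/opt/hello/share/locale}.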


@node Advanced gettext functions
@subsubsection Additional functions for more complicated situations

The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches. What
is meant here is the handling of plural forms.

Looking through Unix source code from before the time anybody thought
about internationalization (and, sadly, even afterwards), one can often
find code similar to the following:

@smallexample
  printf ("%d file%s deleted", n, n == 1 ? "" : "s");
@end smallexample

@noindent
After the first complaints from people internationalizing the code,
people either completely avoided formulations like this or used strings
like @code{"file(s)"}. Both look unnatural and should be avoided.
First attempts to solve the problem correctly looked like this:

@smallexample
  if (n == 1)
    printf ("%d file deleted", n);
  else
    printf ("%d files deleted", n);
@end smallexample

But this does not solve the problem. It helps languages where the
plural form of a noun is not simply constructed by adding an `s', but
that is all. Once again people fell into the trap of believing that the
rules their language uses are universal. But the handling of plural
forms differs widely between language families. There are two aspects
in which they differ (and they differ even inside language families):

@itemize @bullet
@item
The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an `s') is hardly found in German.

@item
The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romance and Germanic languages,
since here the number is the same (there are two).

But other language families have only one form or many forms. More
information on this is given below.
@end itemize

The consequence of this is that application writers should not try to
solve the problem in their code. This would be localization rather than
internationalization, since it is only usable for certain, hardcoded
language environments. Instead the extended @code{gettext} interface
should be used.

These extra functions take two strings and a numerical argument instead
of the one key string. The idea behind this is that, using the
numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the
translator. The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behavior). In this case the rules for Germanic languages
are used, and it is assumed that the first string argument is the
singular form and the second the plural form.

This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation, but since @theglibc{}
(as well as the GNU @code{gettext} package) is written as part of the
GNU project, and the coding standards for the GNU project require
programs to be written in English, this solution nevertheless fulfills
its purpose.

@comment libintl.h
@comment GNU
@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Wrapper for dcngettext.
The @code{ngettext} function is similar to the @code{gettext} function
in that it finds the message catalogs in the same way. But it takes two
extra arguments. The @var{msgid1} parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form. If no
message catalog is found, @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.

An example of the use of this function is:

@smallexample
  printf (ngettext ("%d file removed", "%d files removed", n), n);
@end smallexample

Please note that the numeric value @var{n} has to be passed to the
@code{printf} function as well. It is not sufficient to pass it only to
@code{ngettext}.
@end deftypefun
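
The fallback described above can be observed in a program run without
any catalog, for instance in the @code{C} locale. The function name
@code{format_removed} is hypothetical. It uses @code{%lu} because
@var{n} has type @code{unsigned long int} here, and it passes @var{n}
both to @code{ngettext}, which selects the form, and to @code{snprintf},
which prints the number.

@smallexample
#include <libintl.h>
#include <stdio.h>

/* Write "N file removed" or "N files removed" into BUF, choosing the
   plural form through ngettext.  Returns the value of snprintf.  */
static int
format_removed (char *buf, size_t size, unsigned long int n)
@{
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
@}
@end smallexample

With no catalog available, an @var{n} of 1 selects the first string, and
every other value, including 0, selects the second.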

@comment libintl.h
@comment GNU
@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Wrapper for dcngettext.
The @code{dngettext} function is similar to the @code{dgettext} function
in the way the message catalog is selected. The difference is that it
takes two extra parameters to provide the correct plural form. These
two parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun

@comment libintl.h
@comment GNU
@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Wrapper for dcigettext.
The @code{dcngettext} function is similar to the @code{dcgettext}
function in the way the message catalog is selected. The difference is
that it takes two extra parameters to provide the correct plural form.
These two parameters are handled in the same way @code{ngettext} handles
them.
@end deftypefun

@subsubheading The problem of plural forms

A description of the problem can be found at the beginning of this
section. Now there is the question how to solve it. Without the input
of linguists (which was not available) it was not possible to determine
whether there are only a few different ways in which plural forms are
formed or whether the number can increase with every new supported
language.

Therefore the solution implemented is to allow the translator to specify
the rules of how to select the plural form. Since the formula varies
with every language, this is the only viable solution except for
hardcoding the information in the code (which still would require the
possibility of extensions to not prevent the use of new languages). The
details are explained in the GNU @code{gettext} manual. Here only a
bit of information is provided.

The information about the plural form selection has to be stored in the
header entry (the one with the empty @code{msgid} string). It looks
like this:

@smallexample
Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
@end smallexample

The @code{nplurals} value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following @code{plural} is an expression using the C language syntax.
Exceptions are that no negative numbers are allowed, numbers must be
decimal, and the only variable allowed is @code{n}. This expression
will be evaluated whenever one of the functions @code{ngettext},
@code{dngettext}, or @code{dcngettext} is called. The numeric value
passed to these functions is then substituted for all uses of the
variable @code{n} in the expression. The resulting value must then be
greater than or equal to zero and smaller than the value given as the
value of @code{nplurals}.
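
As an illustration of how such an expression maps @var{n} to the index
of a translation, the following sketch writes two of the expressions
from this section as ordinary C functions: the example header above, and
the three-form rule listed below for Russian. The function names are
hypothetical; the C library contains no such functions but parses the
expression from the catalog header at run time.

@smallexample
/* Index selected by "nplurals=2; plural=n == 1 ? 0 : 1;".  */
static unsigned long int
plural_index_two (unsigned long int n)
@{
  return n == 1 ? 0 : 1;
@}

/* Index selected by the three-form rule listed for Russian:
   plural=n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10>3 ? 2 : 1;
   Index 0 is used for 1, 21, 31, ...; index 1 for 2-4, 22-24, ...;
   index 2 for 0, 5-20, 25-30, 111-114, ...  */
static unsigned long int
plural_index_russian (unsigned long int n)
@{
  return n % 100 / 10 == 1 ? 2
         : n % 10 == 1 ? 0
         : (n + 9) % 10 > 3 ? 2
         : 1;
@}
@end smallexample

Every value these functions return is smaller than the corresponding
@code{nplurals} (2 and 3), as required above.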

@noindent
The following rules are known at this point. The languages are listed
with their families. But this does not necessarily mean the information
can be generalized for the whole family (as can easily be seen in the
table below).@footnote{Additions are welcome. Send appropriate
information to @email{bug-glibc-manual@@gnu.org}.}

@table @asis
@item Only one form
Some languages only require one single form. There is no distinction
between the singular and plural form. An appropriate header entry
would look like this:

@smallexample
Plural-Forms: nplurals=1; plural=0;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Finno-Ugric family
Hungarian
@item Asian family
Japanese, Korean
@item Turkic/Altaic family
Turkish
@end table

@item Two forms, singular used for one only
This is the form used in most existing programs, since it is what
English uses. A header entry would look like this:

@smallexample
Plural-Forms: nplurals=2; plural=n != 1;
@end smallexample

(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)

@noindent
Languages with this property include:

@table @asis
@item Germanic family
Danish, Dutch, English, German, Norwegian, Swedish
@item Finno-Ugric family
Estonian, Finnish
@item Latin/Greek family
Greek
@item Semitic family
Hebrew
@item Romance family
Italian, Portuguese, Spanish
@item Artificial
Esperanto
@end table

@item Two forms, singular used for zero and one
An exceptional case within the Romance language family. The header
entry would be:

@smallexample
Plural-Forms: nplurals=2; plural=n>1;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Romance family
French, Brazilian Portuguese
@end table

@item Three forms, special case for zero
The header entry would be:

@smallexample
Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Baltic family
Latvian
@end table

@item Three forms, special cases for one and two
The header entry would be:

@smallexample
Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Celtic family
Gaeilge (Irish)
@end table

@item Three forms, special case for numbers ending in 1[2-9]
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Baltic family
Lithuanian
@end table

@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10>3 ? 2 : 1;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Croatian, Russian, Ukrainian
@end table

@item Three forms, special cases for 1 and 2, 3, 4
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Czech, Slovak
@end table

@item Three forms, special case for one and some numbers ending in 2, 3, or 4
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Polish
@end table

@item Four forms, special case for one and all numbers ending in 02, 03, or 04
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Slovenian
@end table
@end table
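
Because the @code{plural} expressions are ordinary C expressions in
@code{n}, they can be checked by compiling them. The following sketch
transcribes three of the headers above into C functions (the function
names are hypothetical and chosen only for this illustration); each
returns the index of the @code{msgstr} form that would be selected for a
count @var{n}, assuming @var{n} is an @code{unsigned long int} as in
@code{ngettext}.

@smallexample
/* Two forms, singular used for one only (e.g. English).  */
static int
plural_english (unsigned long int n)
@{
  return n != 1;
@}

/* Three forms, special cases for numbers ending in 1 and 2, 3, 4,
   except those ending in 1[1-4] (e.g. Russian).  */
static int
plural_russian (unsigned long int n)
@{
  return n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10>3 ? 2 : 1;
@}

/* Three forms, special case for one and some numbers ending in
   2, 3, or 4 (Polish).  */
static int
plural_polish (unsigned long int n)
@{
  return n==1 ? 0
         : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
@}
@end smallexample

For example, @code{plural_russian} yields 0 for 1 and 21, 1 for 2 and
22, and 2 for 5, 11, and 12, which is a quick way to check a header
before shipping a catalog.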


@node Charset conversion in gettext
@subsubsection How to specify the output character set @code{gettext} uses

@code{gettext} not only looks up a translation in a message catalog, it
also converts the translation on the fly to the desired output character
set. This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.

The output character set is, by default, the value of @code{nl_langinfo
(CODESET)}, which depends on the @code{LC_CTYPE} part of the current
locale. But programs which store strings in a locale-independent way
(e.g.@: UTF-8) can request that @code{gettext} and related functions
return the translations in that encoding by using the
@code{bind_textdomain_codeset} function.

Note that the @var{msgid} argument to @code{gettext} is not subject to
character set conversion. Also, when @code{gettext} does not find a
translation for @var{msgid}, it returns @var{msgid} unchanged,
regardless of the current output character set. It is therefore
recommended that all @var{msgid}s be US-ASCII strings.

@comment libintl.h
@comment GNU
@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c bind_textdomain_codeset @ascuheap @acsmem
@c set_binding_values dup @ascuheap @acsmem
The @code{bind_textdomain_codeset} function can be used to specify the
output character set for message catalogs for domain @var{domainname}.
The @var{codeset} argument must be a valid codeset name which can be used
for the @code{iconv_open} function, or a null pointer.

If the @var{codeset} parameter is the null pointer,
@code{bind_textdomain_codeset} returns the currently selected codeset
for the domain with the name @var{domainname}. It returns @code{NULL} if
no codeset has yet been selected.

The @code{bind_textdomain_codeset} function can be used several times.
If used multiple times with the same @var{domainname} argument, the
later call overrides the settings made by the earlier one.

The @code{bind_textdomain_codeset} function returns a pointer to a
string containing the name of the selected codeset. The string is
allocated internally in the function and must not be changed by the
user. If the system runs out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @code{errno} is set accordingly.
@end deftypefun
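
As an illustration, the following sketch asks for UTF-8 output for one
domain and then reads the setting back by passing a null pointer. The
helper @code{use_utf8_messages} is hypothetical and the domain name
@code{test-package} is only an example.

@smallexample
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: request translations for DOMAIN in UTF-8,
   independently of the LC_CTYPE category.  Returns the codeset now in
   effect for DOMAIN, or NULL if the request failed.  */
static const char *
use_utf8_messages (const char *domain)
@{
  if (bind_textdomain_codeset (domain, "UTF-8") == NULL)
    @{
      perror ("bind_textdomain_codeset");
      return NULL;
    @}
  /* A null CODESET argument only queries the current setting.  */
  return bind_textdomain_codeset (domain, NULL);
@}
@end smallexample

A program that keeps its strings in UTF-8 might make such a call once
per domain, next to its @code{bindtextdomain} call.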


@node GUI program problems
@subsubsection How to use @code{gettext} in GUI programs

One place where the @code{gettext} functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts their
length. But strings which do not contain entire sentences, or at least
large fragments of a sentence, may appear in more than one situation in
the program and might need a different translation in each. This is
especially true for the one-word strings which are frequently used in
GUI programs.

As a consequence many people say that the @code{gettext} approach is
wrong and that @code{catgets} should be used instead, which indeed does
not have this problem. But there is a very simple and powerful method to
handle this kind of problem with the @code{gettext} functions.

@noindent
As an example consider the following fictional situation. A GUI program
has a menu bar with the following entries:

@smallexample
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
@end smallexample

To have the strings @code{File}, @code{Printer}, @code{Open},
@code{New}, @code{Select}, and @code{Connect} translated, there has to be
a call to a function of the @code{gettext} family at some point in the
code. But in two places the string passed into the function would be
@code{Open}. The translations might not be the same and therefore we
are in the dilemma described above.

One solution to this problem is to artificially lengthen the strings
to make them unambiguous. But what would the program do if no
translation is available? The lengthened string is not what should be
printed. So we should use a slightly modified version of the functions.

To lengthen the strings a uniform method should be used. E.g., in the
example above the strings could be chosen as

@smallexample
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
@end smallexample

Now all the strings are different, and if the following little wrapper
function is used instead of @code{gettext}, everything works just
fine:

@cindex sgettext
@smallexample
#include <libintl.h>
#include <string.h>

char *
sgettext (const char *msgid)
@{
  char *msgval = gettext (msgid);
  /* gettext returns MSGID itself when no translation exists.  */
  if (msgval == msgid)
    /* Assumes MSGID contains a '|'; skip past the last one.  */
    msgval = strrchr (msgid, '|') + 1;
  return msgval;
@}
@end smallexample

What this little function does is recognize the case when no
translation is available. This can be done very efficiently by a
pointer comparison, since in that case the return value is the input
value. If there is no translation we know that the input string is in
the format we used for the menu entries and therefore contains a
@code{|} character. We simply search for the last occurrence of this
character and return a pointer to the character following it. That's it!

If one now consistently uses the lengthened string form and replaces
the @code{gettext} calls with calls to @code{sgettext} (this is normally
limited to very few places in the GUI implementation), then it is
possible to produce a program which can be internationalized.

With advanced compilers (such as GNU C) one can write the
@code{sgettext} function as an inline function or as a macro like this:

@cindex sgettext
@smallexample
#define sgettext(msgid) \
  (@{ const char *__msgid = (msgid); \
     char *__msgval = gettext (__msgid); \
     if (__msgval == __msgid) \
       __msgval = strrchr (__msgid, '|') + 1; \
     __msgval; @})
@end smallexample

The other @code{gettext} functions (@code{dgettext}, @code{dcgettext}
and the @code{ngettext} equivalents) can and should have corresponding
functions as well, which look almost identical except for the parameters
and the call to the underlying function.
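
Sketches of two such variants follow. The names @code{sdgettext} and
@code{sngettext} and the helper @code{strip_context} are hypothetical
and not part of @theglibc{}. Like @code{sgettext}, they assume @code{|}
as the separator; unlike it, they return the key unchanged when it
contains no separator instead of computing a pointer from a null result.

@smallexample
#include <libintl.h>
#include <string.h>

/* If RESULT is KEY itself (no translation was found), return the part
   of KEY after its last '|', or KEY if it has none.  */
static char *
strip_context (const char *key, char *result)
@{
  if (result == key)
    @{
      const char *sep = strrchr (key, '|');
      if (sep != NULL)
        result = (char *) sep + 1;
    @}
  return result;
@}

/* Like dgettext, for keys such as "Menu|File|Open".  */
char *
sdgettext (const char *domainname, const char *msgid)
@{
  return strip_context (msgid, dgettext (domainname, msgid));
@}

/* Like ngettext; either key may come back untranslated.  */
char *
sngettext (const char *msgid1, const char *msgid2, unsigned long int n)
@{
  char *msgval = ngettext (msgid1, msgid2, n);
  return strip_context (msgval == msgid1 ? msgid1 : msgid2, msgval);
@}
@end smallexample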

Now there is of course the question why such functions do not exist in
@theglibc{}. There are two parts to the answer.

@itemize @bullet
@item
They are easy to write and therefore can be provided by the project they
are used in. This is not an answer by itself and must be seen together
with the second part, which is:

@item
There is no way the C library can contain a version which can work
everywhere. The problem is the selection of the character that separates
the prefix from the actual string in the lengthened string. The
examples above used @code{|}, which is quite a good choice because it
resembles a notation frequently used in this context and it is also a
character not often used in message strings.

But what if the character is used in message strings? Or what if the
chosen character is not available in the character set of the machine
one compiles on? (E.g., @code{|} is not part of the invariant subset of
@w{ISO 646}; this is why trigraphs and the @file{iso646.h} header exist
in @w{ISO C} programming environments.)
@end itemize

There is only one more comment to make. The wrapper function above
requires that the translation strings are not lengthened themselves.
This is only logical: there is no need to disambiguate the translations
(since they are never used as keys for a search), and one also saves
quite some memory and disk space this way.


@node Using gettextized software
@subsubsection User influence on @code{gettext}

The previous sections described what the programmer can do to
internationalize the messages of the program. But in the end it is up
to the user to select the language of the messages to be shown, since
the user must be able to understand them.

The POSIX locale model uses the environment variables @code{LC_COLLATE},
@code{LC_CTYPE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, @code{LC_NUMERIC},
and @code{LC_TIME} to select the locale which is to be used. This way
the user can influence lots of functions. As we mentioned above, the
@code{gettext} functions also take advantage of this.

To understand how this happens it is necessary to take a look at the
various components of the filename which is computed to locate a
message catalog. It is composed as follows:

@smallexample
@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
@end smallexample

The default value for @var{dir_name} is system specific. It is computed
from the value given as the prefix while configuring the C library.
This value normally is @file{/usr} or @file{/}. For the former the
complete @var{dir_name} is:

@smallexample
/usr/share/locale
@end smallexample

We can use @file{/usr/share} since the @file{.mo} files containing the
message catalogs are system independent, so all systems can use the same
files. If the program executed the @code{bindtextdomain} function for
the message domain that is currently handled, the @var{dir_name}
component is exactly the value which was given to the function as
the second parameter. I.e., @code{bindtextdomain} allows overriding
the only system-dependent and fixed value to make it possible to
address files anywhere in the filesystem.

The @var{category} is the name of the locale category which was selected
in the program code. For @code{gettext} and @code{dgettext} this is
always @code{LC_MESSAGES}; for @code{dcgettext} it is selected by the
value of the third parameter. As said above, using a category other
than @code{LC_MESSAGES} should be avoided.

The @var{locale} component is computed based on the category used. Just
like for the @code{setlocale} function, this is where the user's
selection comes into play. Some environment variables are examined in a
fixed order, and the first environment variable that is set determines
the return value of the lookup process. In detail, for the category
@code{LC_xxx} the following variables are examined in this order:

@table @code
@item LANGUAGE
@item LC_ALL
@item LC_xxx
@item LANG
@end table
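
The lookup order in this table can be sketched as follows. The function
@code{locale_for_category} is hypothetical and simplified: it treats a
variable set to the empty string as unset, and it ignores details of
the real implementation, such as the list syntax of @code{LANGUAGE}
described below and any special handling of the @code{"C"} locale.

@smallexample
#include <stdlib.h>

/* CATEGORY_NAME is e.g. "LC_MESSAGES".  Returns the value of the first
   non-empty variable among LANGUAGE, LC_ALL, CATEGORY_NAME and LANG,
   or NULL if none of them is set.  */
static const char *
locale_for_category (const char *category_name)
@{
  const char *names[] = @{ "LANGUAGE", "LC_ALL", category_name, "LANG" @};

  for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
    @{
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    @}
  return NULL;
@}
@end smallexample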

This looks very familiar. With the exception of the @code{LANGUAGE}
environment variable this is exactly the lookup order the
@code{setlocale} function uses. But why introduce the @code{LANGUAGE}
variable?

The reason is that the syntax of the values these variables can have is
different from what is expected by the @code{setlocale} function. If we
set @code{LC_ALL} to a value following the extended syntax, the
@code{setlocale} function would never be able to use the value of this
variable. An additional variable removes this problem, and in addition
we can select the language independently of the locale setting, which
is sometimes useful.

While for the @code{LC_xxx} variables the value should consist of
exactly one specification of a locale, the @code{LANGUAGE} variable's
value can consist of a colon-separated list of locale names. The
attentive reader will realize that this is the way we manage to
implement one of our additional demands above: we want to be able to
specify an ordered list of languages.
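
Walking such a list takes only a few lines. In the sketch below, the
function @code{next_language} is hypothetical; it hands out the entries
of a value such as @code{de_AT:de:en} one at a time, in order, and skips
empty entries. A caller would try the catalog for each entry in turn and
stop at the first one that exists.

@smallexample
#include <stddef.h>
#include <string.h>

/* Copy the next non-empty entry of the colon-separated list *LIST into
   BUF, truncated to SIZE - 1 bytes (SIZE must be at least 1), and
   advance *LIST past it.  Returns 1 if an entry was found, 0 at the
   end of the list.  */
static int
next_language (const char **list, char *buf, size_t size)
@{
  const char *p = *list;

  /* Skip separators, including empty entries as in "de_AT::de".  */
  while (*p == ':')
    ++p;
  if (*p == '\0')
    @{
      *list = p;
      return 0;
    @}

  size_t len = strcspn (p, ":");
  size_t copy = len < size - 1 ? len : size - 1;
  memcpy (buf, p, copy);
  buf[copy] = '\0';
  *list = p + len;
  return 1;
@}
@end smallexample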

Back to the constructed filename: only one component is missing.
The @var{domain_name} part is the name which was either registered using
the @code{textdomain} function or given to @code{dgettext} or
@code{dcgettext} as the first parameter. Now it becomes obvious that a
good choice for the domain name in the program code is a string which is
closely related to the program/package name. E.g., for @theglibc{}
the domain name is @code{libc}.
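
Putting the four components together is plain string formatting. The
function @code{catalog_path} below is a hypothetical sketch that only
concatenates the components; it does not perform the locale fallback or
alias handling that @code{gettext} applies when it searches for a
catalog.

@smallexample
#include <stddef.h>
#include <stdio.h>

/* Write DIR_NAME/LOCALE/LC_CATEGORY/DOMAIN_NAME.mo into BUF, where
   CATEGORY is the category name without its "LC_" prefix, e.g.
   "MESSAGES".  Returns 0 on success, -1 if BUF is too small.  */
static int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
@{
  int len = snprintf (buf, size, "%s/%s/LC_%s/%s.mo",
                      dir_name, locale, category, domain_name);
  return len >= 0 && (size_t) len < size ? 0 : -1;
@}
@end smallexample

With @code{"/usr/share/locale"}, @code{"de"}, @code{"MESSAGES"} and
@code{"libc"} it produces
@file{/usr/share/locale/de/LC_MESSAGES/libc.mo}.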

@noindent
A short piece of example code should show how the programmer is supposed
to work:

@smallexample
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
int
main (void)
@{
  setlocale (LC_ALL, "");
  textdomain ("test-package");
  bindtextdomain ("test-package", "/usr/local/share/locale");
  puts (gettext ("Hello, world!"));
  return 0;
@}
@end smallexample

At program start the default domain is @code{messages}, and the
default locale is @code{"C"}. The @code{setlocale} call sets the locale
according to the user's environment variables; remember that correct
functioning of @code{gettext} relies on the correct setting of the
@code{LC_MESSAGES} locale (for looking up the message catalog) and
of the @code{LC_CTYPE} locale (for the character set conversion).
The @code{textdomain} call changes the default domain to
@code{test-package}. The @code{bindtextdomain} call specifies that
the message catalogs for the domain @code{test-package} can be found
below the directory @file{/usr/local/share/locale}.

If the user now sets the variable @code{LANGUAGE} to @code{de} in
her/his environment, the @code{gettext} function will try to use the
translations from the file

@smallexample
/usr/local/share/locale/de/LC_MESSAGES/test-package.mo
@end smallexample

From the above descriptions it should be clear which component of this
filename is determined by which source.

In the above example we assumed that the @code{LANGUAGE} environment
variable is set to @code{de}. This might be an appropriate selection, but
what happens if the user wants to use @code{LC_ALL} because of its wider
usability, and here the required value is @code{de_DE.ISO-8859-1}? We
already mentioned above that a situation like this is not infrequent.
E.g., a person might prefer reading a dialect and, if this is not
available, fall back on the standard language.

The @code{gettext} functions know about situations like this and can
handle them gracefully. The functions recognize the format of the value
of the environment variable. They can split the value into different
pieces and, by leaving out one or the other part, construct new values.
This happens, of course, in a predictable way. To understand this one
must know the format of the environment variable value. There is one
more or less standardized form, originally from the X/Open
specification:

@code{language[_territory[.codeset]][@@modifier]}
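
To make the format concrete, the following sketch splits a locale name
into these four components. The structure @code{locale_parts} and the
function @code{split_locale_name} are hypothetical; @theglibc{} uses its
own internal code for this.

@smallexample
#include <stddef.h>
#include <string.h>

/* The components of language[_territory[.codeset]][@@modifier];
   absent components are empty strings.  */
struct locale_parts
@{
  const char *language;
  const char *territory;
  const char *codeset;
  const char *modifier;
@};

/* Copy NAME into BUF (SIZE bytes) and split the copy in place.
   Returns 0 on success, -1 if BUF is too small.  */
static int
split_locale_name (const char *name, char *buf, size_t size,
                   struct locale_parts *parts)
@{
  char *p;

  if (strlen (name) >= size)
    return -1;
  strcpy (buf, name);
  parts->territory = parts->codeset = parts->modifier = "";

  /* Cut from the right: modifier first, then codeset, then territory.  */
  if ((p = strchr (buf, '@@')) != NULL)
    @{
      *p = '\0';
      parts->modifier = p + 1;
    @}
  if ((p = strchr (buf, '.')) != NULL)
    @{
      *p = '\0';
      parts->codeset = p + 1;
    @}
  if ((p = strchr (buf, '_')) != NULL)
    @{
      *p = '\0';
      parts->territory = p + 1;
    @}
  parts->language = buf;
  return 0;
@}
@end smallexample

For @code{de_DE.ISO-8859-1@@euro} this yields the language @code{de},
the territory @code{DE}, the codeset @code{ISO-8859-1} and the modifier
@code{euro}.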

Less specific locale names are obtained by stripping components in the
order of the following list:

@enumerate
@item
@code{codeset}
@item
@code{normalized codeset}
@item
@code{territory}
@item
@code{modifier}
@end enumerate

The @code{language} field will never be dropped for obvious reasons.

The only new thing is the @code{normalized codeset} entry. This is
another goodie which is introduced to help reduce the chaos which
derives from the inability of people to standardize the names of
character sets. Instead of @w{ISO-8859-1} one can often see @w{8859-1},
@w{88591}, @w{iso8859-1}, or @w{iso_8859-1}. The @code{normalized
codeset} value is generated from the user-provided character set name by
applying the following rules:

@enumerate
@item
Remove all characters besides numbers and letters.
@item
Fold letters to lowercase.
@item
If the name only contains digits, prepend the string @code{"iso"}.
@end enumerate

@noindent
So all of the above names will be normalized to @code{iso88591}. This
gives the program user much more freedom in choosing the locale name.
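
The three rules translate directly into C. The function
@code{normalize_codeset} below is a hypothetical sketch, not the
function @theglibc{} uses internally, and it assumes the caller supplies
a large enough buffer.

@smallexample
#include <ctype.h>
#include <string.h>

/* Apply the normalization rules to CODESET, writing the result to BUF,
   which must have room for strlen (CODESET) + 4 bytes.  Returns BUF.  */
static char *
normalize_codeset (const char *codeset, char *buf)
@{
  const char *p;
  char *out = buf;
  int only_digits = 1;

  for (p = codeset; *p != '\0'; ++p)
    if (isalpha ((unsigned char) *p))
      only_digits = 0;

  /* Rule 3: a purely numeric name gets the prefix "iso".  */
  if (only_digits)
    @{
      memcpy (out, "iso", 3);
      out += 3;
    @}

  /* Rules 1 and 2: keep letters and digits, folding letters to
     lowercase.  */
  for (p = codeset; *p != '\0'; ++p)
    if (isalnum ((unsigned char) *p))
      *out++ = tolower ((unsigned char) *p);
  *out = '\0';
  return buf;
@}
@end smallexample

All five spellings listed above come out as @code{iso88591}, while
@code{UTF-8} becomes @code{utf8}.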

Even this extended functionality still does not solve the problem that
completely different names can be used to denote the same locale (e.g.,
@code{de} and @code{german}). To help in this situation, the locale
implementation and also the @code{gettext} functions know about aliases.

The file @file{/usr/share/locale/locale.alias} (replace @file{/usr} with
whatever prefix you used for configuring the C library) contains a
mapping of alternative names to more regular names. The system manager
is free to add new entries to fit her/his own needs. The selected
locale from the environment is compared with the entries in the first
column of this file, ignoring case. If they match, the value of the
second column is used instead for further handling.
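
A lookup in such a table could be sketched as follows. The function
@code{lookup_locale_alias} is hypothetical and works on the contents of
an alias file already read into memory; it assumes one alias and its
replacement per line, separated by blanks or tabs, skips lines starting
with @code{#}, and compares the first column ignoring case. The parser
in @theglibc{} may handle details not shown here.

@smallexample
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Search TABLE for NAME.  On a match, copy the second column into BUF
   (truncated to SIZE - 1 bytes, SIZE at least 1) and return BUF;
   otherwise return NULL.  */
static char *
lookup_locale_alias (const char *table, const char *name,
                     char *buf, size_t size)
@{
  size_t name_len = strlen (name);

  while (*table != '\0')
    @{
      size_t line_len = strcspn (table, "\n");
      size_t alias_len = strcspn (table, " \t\n");

      if (*table != '#' && alias_len == name_len
          && strncasecmp (table, name, name_len) == 0)
        @{
          const char *value = table + alias_len;
          value += strspn (value, " \t");
          size_t value_len = strcspn (value, " \t\n");
          if (value_len > 0)
            @{
              size_t copy = value_len < size - 1 ? value_len : size - 1;
              memcpy (buf, value, copy);
              buf[copy] = '\0';
              return buf;
            @}
        @}
      table += line_len;
      if (*table == '\n')
        ++table;
    @}
  return NULL;
@}
@end smallexample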

In the description of the format of the environment variables we already
mentioned the character set as a factor in the selection of the message
catalog. In fact, only catalogs which contain text written using the
character set of the system/program can be used (directly; there will
come a solution for this some day). This means that the user always has
to take care of this. If the collection of message catalogs contains
files for the same language but coded using different character sets,
the user has to be careful.


@node Helper programs for gettext
@subsection Programs to handle message catalogs for @code{gettext}

@Theglibc{} does not contain the source code for the programs to
handle message catalogs for the @code{gettext} functions. As part of
the GNU project, the GNU gettext package contains everything the
developer needs. The functionality provided by the tools in this
package far exceeds the abilities of the @code{gencat} program
described above for the @code{catgets} functions.

There is a program @code{msgfmt} which is the equivalent of the
@code{gencat} program. It generates from the human-readable and
-editable form of the message catalog a binary file which can be used by
the @code{gettext} functions. But there are several more programs
available.

The @code{xgettext} program can be used to automatically extract the
translatable messages from a source file. I.e., the programmer need not
maintain the list of messages which have to be translated by hand. The
programmer simply wraps the translatable strings in calls to
@code{gettext} et al.@: and the rest is done by @code{xgettext}. This
program has a lot of options which help to customize the output or
help to understand the input better.

Other programs help to manage the development cycle when new messages appear
in the source files or when a new translation of the messages appears.
Here it should only be noted that using all the tools in GNU gettext it
is possible to @emph{completely} automate the handling of message
catalogs. Besides marking the translatable strings in the source code and
generating the translations, the developers do not have anything to do
themselves.
1970themselves.