@node Message Translation, Searching and Sorting, Locales, Top
@c %MENU% How to make the program speak the user's language
@chapter Message Translation

The program's interface with the user should be designed to ease the user's
task. One way to ease the user's task is to use messages in whatever
language the user prefers.

Printing messages in different languages can be implemented in different
ways. One could add all the different languages to the source code and
choose among the variants every time a message has to be printed. This is
certainly not a good solution since extending the set of languages is
cumbersome (the code must be changed) and the code itself can become
very large with dozens of message sets.

A better solution is to keep the message sets for each language
in separate files which are loaded at runtime depending on the language
selection of the user.

@Theglibc{} provides two different sets of functions to support
message translation. The problem is that neither of the interfaces is
officially defined by the POSIX standard. The @code{catgets} family of
functions is defined in the X/Open standard, but this standard is derived
from industry decisions and is therefore not necessarily based on
reasonable design decisions.

As mentioned above, the message catalog handling provides easy
extensibility by using external data files which contain the message
translations. That is, these files contain, for each of the messages used
in the program, a translation into the appropriate language. So the tasks
of the message handling functions are to

@itemize @bullet
@item
locate the external data file with the appropriate translations,
@item
load the data and make it possible to address the messages, and
@item
map a given key to the translated message.
@end itemize

The two approaches mainly differ in the implementation of this last
step. Decisions made in the last step influence the rest of the design.

@menu
* Message catalogs a la X/Open:: The @code{catgets} family of functions.
* The Uniforum approach:: The @code{gettext} family of functions.
@end menu


@node Message catalogs a la X/Open
@section X/Open Message Catalog Handling

The @code{catgets} functions are based on a simple scheme:

@quotation
Associate every message to translate in the source code with a unique
identifier. To retrieve a message from a catalog file, only the
identifier is used.
@end quotation

This means that the author of the program has to make sure that the
meaning of each identifier is always the same in the program code and in
the message catalogs.

Before a message can be translated, the catalog file must be located.
The user of the program must be able to guide the responsible function
to whatever catalog the user wants, independently of what the
programmer had in mind.

All the types, constants and functions for the @code{catgets} functions
are defined or declared in the @file{nl_types.h} header file.

@menu
* The catgets Functions:: The @code{catgets} function family.
* The message catalog files:: Format of the message catalog files.
* The gencat program:: How to generate message catalog files which
                         can be used by the functions.
* Common Usage:: How to use the @code{catgets} interface.
@end menu


@node The catgets Functions
@subsection The @code{catgets} function family

@comment nl_types.h
@comment X/Open
@deftypefun nl_catd catopen (const char *@var{cat_name}, int @var{flag})
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c catopen @mtsenv @ascuheap @acsmem
@c strchr ok
@c setlocale(,NULL) ok
@c getenv @mtsenv
@c strlen ok
@c alloca ok
@c stpcpy ok
@c malloc @ascuheap @acsmem
@c __open_catalog @ascuheap @acsmem
@c strchr ok
@c open_not_cancel_2 @acsfd
@c strlen ok
@c ENOUGH ok
@c alloca ok
@c memcpy ok
@c fxstat64 ok
@c __set_errno ok
@c mmap @acsmem
@c malloc dup @ascuheap @acsmem
@c read_not_cancel ok
@c free dup @ascuheap @acsmem
@c munmap ok
@c close_not_cancel_no_status ok
@c free @ascuheap @acsmem
The @code{catopen} function tries to locate the message data file named
@var{cat_name} and loads it if found. The return value is of an
opaque type and can be used in calls to the other functions to refer to
this loaded catalog.

The return value is @code{(nl_catd) -1} if the function failed and
no catalog was loaded. The global variable @var{errno} then contains a code
for the error causing the failure. But even if the function call
succeeded, this does not mean that all messages can be translated.

Locating the catalog file must happen in a way which lets the user of
the program influence the decision. It is up to the user to decide
which language to use, and sometimes it is useful to use alternate
catalog files. All this can be specified by the user by setting some
environment variables.

The first problem is to find out where all the message catalogs are
stored. Every program could have its own place to keep all the
different files, but usually the catalog files are grouped by language
and the catalogs for all programs are kept in the same place.

@cindex NLSPATH environment variable
To tell the @code{catopen} function where the catalog for the program
can be found, the user can set the environment variable @code{NLSPATH} to
a value which describes their choice. Since this value must be usable
for different languages and locales, it cannot be a simple string.
Instead it is a format string (similar to @code{printf}'s). An example
is

@smallexample
/usr/share/locale/%L/%N:/usr/share/locale/%L/LC_MESSAGES/%N
@end smallexample

First, one can see that more than one directory can be specified (with
the usual syntax of separating them by colons). The next things to
observe are the format elements, @code{%L} and @code{%N} in this case.
The @code{catopen} function knows about several of them, and each of them
is replaced by a different value.

@table @code
@item %N
This format element is substituted with the name of the catalog file.
This is the value of the @var{cat_name} argument given to
@code{catopen}.

@item %L
This format element is substituted with the name of the currently
selected locale for translating messages. How this is determined is
explained below.

@item %l
(This is the lowercase ell.) This format element is substituted with the
language element of the locale name. The string describing the selected
locale is expected to have the form
@code{@var{lang}[_@var{terr}[.@var{codeset}]]} and this format uses the
first part @var{lang}.

@item %t
This format element is substituted by the territory part @var{terr} of
the name of the currently selected locale. See the explanation of the
format above.

@item %c
This format element is substituted by the codeset part @var{codeset} of
the name of the currently selected locale. See the explanation of the
format above.

@item %%
Since @code{%} is used as a meta character, there must be a way to
express the @code{%} character in the result itself. Using @code{%%}
does this just like it works for @code{printf}.
@end table


Using @code{NLSPATH} allows arbitrary directories to be searched for
message catalogs while still allowing different languages to be used.
If the @code{NLSPATH} environment variable is not set, the default value
is

@smallexample
@var{prefix}/share/locale/%L/%N:@var{prefix}/share/locale/%L/LC_MESSAGES/%N
@end smallexample

@noindent
where @var{prefix} is given to @code{configure} while installing @theglibc{}
(this value is in many cases @code{/usr} or the empty string).
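
The substitution rules above can be illustrated with a short sketch. The
function @code{expand_nlspath_element} below is hypothetical and is not the
code @theglibc{} uses internally: it expands a single template element (the
part between two colons), assumes the locale name has the form
@code{@var{lang}[_@var{terr}[.@var{codeset}]]}, and, as an assumption of this
sketch only, copies unknown format elements unchanged. A complete lookup
would split the @code{NLSPATH} value at the colons and try each expanded
file name in turn.

@smallexample
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append LEN bytes of SRC to OUT (capacity SIZE),
   keep OUT null-terminated and advance *POS.  Returns -1 if the
   result would not fit.  */
static int
append (char *out, size_t size, size_t *pos, const char *src, size_t len)
@{
  if (*pos + len >= size)
    return -1;
  memcpy (out + *pos, src, len);
  *pos += len;
  out[*pos] = '\0';
  return 0;
@}

/* Hypothetical sketch: expand one NLSPATH element such as
   "/usr/share/locale/%L/%N" into OUT (SIZE bytes).  LOCALE is assumed
   to have the form lang[_terr[.codeset]].  Unknown elements are
   copied unchanged (an assumption).  Returns 0, or -1 on overflow.  */
int
expand_nlspath_element (const char *tmpl, const char *cat_name,
                        const char *locale, char *out, size_t size)
@{
  /* Split the locale name into its lang, terr and codeset parts.  */
  size_t lang_len = strcspn (locale, "_.");
  const char *terr = locale + lang_len;
  size_t terr_len = 0;
  const char *codeset = strchr (locale, '.');
  size_t pos = 0;

  if (*terr == '_')
    @{
      ++terr;                   /* skip the underscore */
      terr_len = strcspn (terr, ".");
    @}
  codeset = codeset != NULL ? codeset + 1 : "";
  if (size == 0)
    return -1;
  out[0] = '\0';

  for (const char *p = tmpl; *p != '\0'; ++p)
    @{
      int rc;
      if (*p != '%' || p[1] == '\0')
        rc = append (out, size, &pos, p, 1);    /* ordinary character */
      else
        switch (*++p)
          @{
          case 'N': rc = append (out, size, &pos, cat_name, strlen (cat_name)); break;
          case 'L': rc = append (out, size, &pos, locale, strlen (locale)); break;
          case 'l': rc = append (out, size, &pos, locale, lang_len); break;
          case 't': rc = append (out, size, &pos, terr, terr_len); break;
          case 'c': rc = append (out, size, &pos, codeset, strlen (codeset)); break;
          case '%': rc = append (out, size, &pos, "%", 1); break;
          default:  rc = append (out, size, &pos, p - 1, 2); break;
          @}
      if (rc != 0)
        return -1;
    @}
  return 0;
@}
@end smallexample

For example, with the locale name @code{de_DE.UTF-8} and the catalog name
@code{hello.cat}, the element @code{/usr/share/locale/%L/%N} expands to
@code{/usr/share/locale/de_DE.UTF-8/hello.cat}, and @code{%l} alone expands
to @code{de}.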

The remaining problem is to decide which locale value must be used, since
this value determines the substitution of the format elements mentioned
above. First of all, the message catalog name can contain a path
(i.e., the name contains a slash character). In this situation the
@code{NLSPATH} environment variable is not used. The catalog must exist
as specified in the program, perhaps relative to the current working
directory. This situation is not desirable and catalog names should
never be written this way. Besides this, this behavior is not portable
to all other platforms providing the @code{catgets} interface.

@cindex LC_ALL environment variable
@cindex LC_MESSAGES environment variable
@cindex LANG environment variable
Otherwise the values of environment variables from the standard
environment are examined (@pxref{Standard Environment}). Which
variables are examined is decided by the @var{flag} parameter of
@code{catopen}. If the value is @code{NL_CAT_LOCALE} (which is defined
in @file{nl_types.h}), the @code{catopen} function uses the name of
the locale currently selected for the @code{LC_MESSAGES} category.

If @var{flag} is zero, the @code{LANG} environment variable is examined.
This is a left-over from the early days when the concept of locales
had not even reached the level of POSIX locales.

The environment variable and the locale name should have a value of the
form @code{@var{lang}[_@var{terr}[.@var{codeset}]]} as explained above.
If no environment variable is set, the @code{"C"} locale is used, which
prevents any translation.
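
The decision rules above can be summarized in a sketch. The functions
@code{catalog_locale_name} and @code{catalog_uses_nlspath} are hypothetical
and only restate the documented behavior. For @code{NL_CAT_LOCALE} the sketch
assumes the usual POSIX precedence @code{LC_ALL}, then @code{LC_MESSAGES},
then @code{LANG}; the library's actual lookup may differ in detail.

@smallexample
#include <nl_types.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the value of environment variable VAR, or NULL
   if it is unset or empty.  */
static const char *
nonempty_env (const char *var)
@{
  const char *value = getenv (var);
  return value != NULL && *value != '\0' ? value : NULL;
@}

/* Hypothetical sketch of the documented rules, not the library code:
   return the locale name used for %L, %l, %t and %c.  */
const char *
catalog_locale_name (int flag)
@{
  const char *name;
  if (flag == NL_CAT_LOCALE)
    @{
      /* Assumed POSIX precedence for the LC_MESSAGES category.  */
      name = nonempty_env ("LC_ALL");
      if (name == NULL)
        name = nonempty_env ("LC_MESSAGES");
      if (name == NULL)
        name = nonempty_env ("LANG");
    @}
  else
    /* flag == 0: the historical behavior, only LANG is examined.  */
    name = nonempty_env ("LANG");
  /* Nothing set: use "C", which prevents any translation.  */
  return name != NULL ? name : "C";
@}

/* A catalog name containing a slash is used as a path; NLSPATH is
   consulted only for plain names.  */
int
catalog_uses_nlspath (const char *cat_name)
@{
  return strchr (cat_name, '/') == NULL;
@}
@end smallexample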

The return value of @code{catgets} is in any case a valid string. Either
it is a translation from a message catalog or it is the same as the
@var{string} parameter. So a piece of code to decide whether a
translation actually happened must look like this:

@smallexample
@{
  char *trans = catgets (desc, set, msg, input_string);
  if (trans == input_string)
    @{
      /* Something went wrong.  */
    @}
@}
@end smallexample

@noindent
When an error occurs, the global variable @var{errno} is set to

@table @var
@item EBADF
The catalog does not exist.
@item ENOMSG
The set/message tuple does not name an existing element in the
message catalog.
@end table

While it can sometimes be useful to test for errors, programs normally
avoid any test. If the translation is not available, it is no big
problem if the original, untranslated message is printed. Either the
user understands it as well, or they will look for the reason why the
messages are not translated.
@end deftypefun

Please note that the currently selected locale does not depend on a call
to the @code{setlocale} function. It is not necessary that the locale
data files for this locale exist or that calling @code{setlocale} succeeds.
The @code{catopen} function directly reads the values of the environment
variables.
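
A program that wants to notice a missing catalog can compare the return
value of @code{catopen} with @code{(nl_catd) -1} and look at @var{errno}.
The wrapper @code{open_catalog_checked} below is a hypothetical sketch; it
only reports the problem and still returns the failure value, assuming (as
is the case in @theglibc{}) that @code{catgets} then simply returns its
default strings.

@smallexample
#include <errno.h>
#include <nl_types.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: open CAT_NAME for the locale selected for
   LC_MESSAGES.  On failure it prints a note with the errno text and
   returns the (nl_catd) -1 failure value unchanged.  */
nl_catd
open_catalog_checked (const char *cat_name)
@{
  nl_catd catd = catopen (cat_name, NL_CAT_LOCALE);
  if (catd == (nl_catd) -1)
    @{
      int err = errno;          /* save before calling stdio */
      fprintf (stderr, "note: catalog %s not loaded: %s\n",
               cat_name, strerror (err));
    @}
  return catd;
@}
@end smallexample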


@deftypefun {char *} catgets (nl_catd @var{catalog_desc}, int @var{set}, int @var{message}, const char *@var{string})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{catgets} has to be used to access the message catalog
previously opened using the @code{catopen} function. The
@var{catalog_desc} parameter must be a value previously returned by
@code{catopen}.

The next two parameters, @var{set} and @var{message}, reflect the
internal organization of the message catalog files. This will be
explained in detail below. For now it is interesting to know that a
catalog can consist of several sets and the messages in each set are
individually numbered. Neither the set numbers nor the message numbers
need to be consecutive. They can be arbitrarily chosen.
But each message (unless equal to another one) must have its own unique
pair of set and message numbers.

Since it is not guaranteed that the message catalog for the language
selected by the user exists, the last parameter @var{string} helps to
handle this case gracefully. If no matching string can be found,
@var{string} is returned. This means for the programmer that

@itemize @bullet
@item
the @var{string} parameters should contain reasonable text (this also
helps to understand the program, since otherwise there would be no hint
about the string which is expected to be returned);
@item
all @var{string} arguments should be written in the same language.
@end itemize
@end deftypefun
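
Because @code{catgets} returns its @var{string} argument itself when no
translation is found, a caller can detect a missing translation by comparing
pointers rather than string contents. The wrapper @code{lookup_message}
below is a hypothetical sketch of that check. Its usage with a failed
@code{catopen} relies on the behavior of @theglibc{}, which is assumed here
to accept @code{(nl_catd) -1} and simply return @var{string}.

@smallexample
#include <nl_types.h>
#include <stddef.h>

/* Hypothetical helper: fetch message MSG of set SET and store in
   *TRANSLATED whether a translation was found.  catgets returns the
   DEFLT pointer itself when it has nothing better, so comparing the
   pointers (not the string contents) is the reliable test.  */
const char *
lookup_message (nl_catd catd, int set, int msg, const char *deflt,
                int *translated)
@{
  const char *text = catgets (catd, set, msg, deflt);
  if (translated != NULL)
    *translated = (text != deflt);
  return text;
@}
@end smallexample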

It is somewhat uncomfortable to write a program using the @code{catgets}
functions if no supporting functionality is available. Since each
set/message number tuple must be unique, the programmer must keep lists
of the messages at the same time the code is written. And the work
between several people working on the same project must be coordinated.
We will see how these problems can be relaxed a bit (@pxref{Common
Usage}).

@deftypefun int catclose (nl_catd @var{catalog_desc})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
@c catclose @ascuheap @acucorrupt @acsmem
@c __set_errno ok
@c munmap ok
@c free @ascuheap @acsmem
The @code{catclose} function can be used to free the resources
associated with a message catalog which previously was opened by a call
to @code{catopen}. If the resources can be successfully freed, the
function returns @code{0}. Otherwise it returns @code{@minus{}1} and the
global variable @var{errno} is set. Errors can occur if the catalog
descriptor @var{catalog_desc} is not valid, in which case @var{errno} is
set to @code{EBADF}.
@end deftypefun
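
Since @code{catclose} reports an error for a descriptor that does not refer
to an open catalog, code that may run after a failed @code{catopen} can
guard the call. The helper @code{close_catalog_once} below is a hypothetical
sketch that skips the @code{(nl_catd) -1} failure value and resets the
descriptor, so calling it twice is harmless.

@smallexample
#include <nl_types.h>

/* Hypothetical helper: close *CATD if it refers to an open catalog and
   mark it closed.  Returns the catclose result, or 0 if there was
   nothing to close.  */
int
close_catalog_once (nl_catd *catd)
@{
  int result = 0;
  if (*catd != (nl_catd) -1)
    result = catclose (*catd);
  *catd = (nl_catd) -1;
  return result;
@}
@end smallexample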


@node The message catalog files
@subsection Format of the message catalog files

The only reasonable way to translate all the messages of a program and
store the result in a message catalog file which can be read by the
@code{catopen} function is to hand all the message texts to the
translator and let them translate them all. I.e., we must have a
file with entries which associate the set/message tuple with a specific
translation. This file format is specified in the X/Open standard and
is as follows:

@itemize @bullet
@item
Lines containing only whitespace characters or empty lines are ignored.

@item
Lines which contain as the first non-whitespace character a @code{$}
followed by a whitespace character are comments and are also ignored.

@item
If a line contains as the first non-whitespace characters the sequence
@code{$set} followed by a whitespace character, an additional argument
is required to follow. This argument can either be:

@itemize @minus
@item
a number. In this case the value of this number determines the set
to which the following messages are added.

@item
an identifier consisting of alphanumeric characters plus the underscore
character. In this case the set is automatically assigned a number.
This value is one more than the largest set number which has appeared
so far.

How to use the symbolic names is explained in section @ref{Common Usage}.

It is an error if a symbol name appears more than once.
@end itemize

All following messages are placed in the set with this number.

@item
If a line contains as the first non-whitespace characters the sequence
@code{$delset} followed by a whitespace character, an additional argument
is required to follow. This argument can either be:

@itemize @minus
@item
a number. In this case the value of this number determines the set
which will be deleted.

@item
an identifier consisting of alphanumeric characters plus the underscore
character. This symbolic identifier must match a name for a set which
previously was defined. It is an error if the name is unknown.
@end itemize

In both cases all messages in the specified set will be removed. They
will not appear in the output. But if this set is later selected again
with a @code{$set} command, messages can be added again and these
messages will appear in the output.

@item
If a line contains after leading whitespace the sequence
@code{$quote}, the quoting character used for this input file is
changed to the first non-whitespace character following
@code{$quote}. If no non-whitespace character is present before the
line ends, quoting is disabled.

By default no quoting character is used. In this mode strings are
terminated with the first unescaped line break. If there is a
@code{$quote} sequence present, newlines need not be escaped. Instead a
string is terminated with the first unescaped appearance of the quote
character.

A common usage of this feature would be to set the quote character to
@code{"}. Then any appearance of the @code{"} in the strings must
be escaped using the backslash (i.e., @code{\"} must be written).

@item
Any other line must start with a number or an alphanumeric identifier
(with the underscore character included). The following characters
(starting after the first whitespace character) will form the string
which gets associated with the currently selected set and the message
number represented by the number or identifier, respectively.

If the start of the line is a number, the message number is obvious. It
is an error if the same message number already appeared for this set.

If the leading token was an identifier, the message number gets
automatically assigned. The value is the current maximum message
number for this set plus one. It is an error if the identifier was
already used for a message in this set. It is OK to reuse the
identifier for a message in another set. How to use the symbolic
identifiers will be explained below (@pxref{Common Usage}). There is
one limitation with the identifier: it must not be @code{Set}. The
reason will be explained below.

The text of the messages can contain escape characters. The usual bunch
of characters known from the @w{ISO C} language are recognized
(@code{\n}, @code{\t}, @code{\v}, @code{\b}, @code{\r}, @code{\f},
@code{\\}, and @code{\@var{nnn}}, where @var{nnn} is the octal coding of
a character code).
@end itemize
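
The escape handling described in the last item can be sketched as follows.
The function @code{unescape_message} is hypothetical and is not taken from
the @code{gencat} sources; it reads at most three octal digits per
@code{\@var{nnn}} escape and, as an assumption of this sketch, keeps any other
escaped character while dropping its backslash (so @code{\"} yields
@code{"}). It returns the decoded length because @code{\000} can place a null
character inside a message.

@smallexample
#include <stddef.h>

/* Hypothetical sketch of the escape handling described above (not the
   gencat source).  Decodes IN into OUT, which needs strlen (IN) + 1
   bytes, and returns the number of decoded bytes.  */
size_t
unescape_message (const char *in, char *out)
@{
  char *start = out;
  while (*in != '\0')
    @{
      if (*in != '\\' || in[1] == '\0')
        @{
          *out++ = *in++;       /* ordinary character */
          continue;
        @}
      ++in;                     /* skip the backslash */
      switch (*in)
        @{
        case 'n': *out++ = '\n'; ++in; break;
        case 't': *out++ = '\t'; ++in; break;
        case 'v': *out++ = '\v'; ++in; break;
        case 'b': *out++ = '\b'; ++in; break;
        case 'r': *out++ = '\r'; ++in; break;
        case 'f': *out++ = '\f'; ++in; break;
        case '\\': *out++ = '\\'; ++in; break;
        default:
          if (*in >= '0' && *in <= '7')
            @{
              /* \nnn: up to three octal digits.  */
              int value = 0;
              for (int i = 0; i < 3 && *in >= '0' && *in <= '7'; ++i)
                value = value * 8 + (*in++ - '0');
              *out++ = (char) value;
            @}
          else
            *out++ = *in++;     /* assumption: keep other escaped chars */
          break;
        @}
    @}
  *out = '\0';
  return (size_t) (out - start);
@}
@end smallexample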

@strong{Important:} The handling of identifiers instead of numbers for
the set and messages is a GNU extension. Systems strictly following the
X/Open specification do not have this feature. An example of a message
catalog file is this:

@smallexample
$ This is a leading comment.
$quote "

$set SetOne
1 Message with ID 1.
two " Message with ID \"two\", which gets the value 2 assigned"

$set SetTwo
$ Since the last set got the number 1 assigned this set has number 2.
4000 "The numbers can be arbitrary, they need not start at one."
@end smallexample

This small example shows various aspects:
@itemize @bullet
@item
Lines 1 and 9 are comments since they start with @code{$} followed by
a whitespace.
@item
The quoting character is set to @code{"}. Otherwise the quotes in the
message definition would have to be omitted, and in this case the
message with the identifier @code{two} would lose its leading whitespace.
@item
Mixing numbered messages with messages having symbolic names is no
problem and the numbering happens automatically.
@end itemize
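
The automatic numbering used in this example can be modeled with a small
sketch. The structure @code{numbering} and the function @code{assign_number}
are hypothetical and are not the @code{gencat} implementation: one instance
tracks the set numbers of a file and one instance per set tracks its message
numbers; numeric tokens stand for themselves and identifiers receive the
current maximum plus one.

@smallexample
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES 64            /* arbitrary limit for this sketch */

/* Hypothetical model of the numbering within one scope (the sets of a
   file, or the messages of one set).  NAMES stores the caller's
   pointers, not copies, so the tokens must outlive the table.  */
struct numbering
@{
  int max;                      /* largest number seen so far */
  int count;
  const char *names[MAX_NAMES]; /* identifiers already used */
@};

/* Return the number for TOKEN, or -1 for a reused identifier or a
   full table.  Duplicate numeric tokens are not checked here.  */
int
assign_number (struct numbering *n, const char *token)
@{
  char *end;
  long value = strtol (token, &end, 10);
  if (end != token && *end == '\0')
    @{
      /* A number determines the value directly.  */
      if (value > n->max)
        n->max = (int) value;
      return (int) value;
    @}
  for (int i = 0; i < n->count; ++i)
    if (strcmp (n->names[i], token) == 0)
      return -1;
  if (n->count == MAX_NAMES)
    return -1;
  n->names[n->count++] = token;
  return ++n->max;              /* identifier: maximum plus one */
@}
@end smallexample

Feeding it the tokens of the example file in order yields 1 for
@code{SetOne}, 1 and 2 for its two messages, 2 for @code{SetTwo}, and 4000
for the last message.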


While this file format is pretty easy, it is not the best possible for
use in a running program. The @code{catopen} function would have to
parse the file and handle syntactic errors gracefully. This is not so
easy and the whole process is pretty slow. Therefore the @code{catgets}
functions expect the data in another, more compact and ready-to-use file
format. There is a special program @code{gencat} which is explained in
detail in the next section.

Files in this other format are not human readable. To be easy to use by
programs, it is a binary format. But the format is byte order independent,
so translation files can be shared by systems of arbitrary architecture
(as long as they use @theglibc{}).

Details about the binary file format are not important to know since
these files are always created by the @code{gencat} program. The
sources of @theglibc{} also provide the sources for the
@code{gencat} program, so the interested reader can look through
these source files to learn about the file format.


@node The gencat program
@subsection Generate Message Catalog Files

@cindex gencat
The @code{gencat} program is specified in the X/Open standard, and the
GNU implementation follows this specification and so processes
all correctly formed input files. Additionally some extensions are
implemented which help to work in a more reasonable way with the
@code{catgets} functions.

The @code{gencat} program can be invoked in two ways:

@example
`gencat [@var{Option}]@dots{} [@var{Output-File} [@var{Input-File}]@dots{}]`
@end example

This is the interface defined in the X/Open standard. If no
@var{Input-File} parameter is given, input will be read from standard
input. Multiple input files will be read as if they were concatenated.
If @var{Output-File} is also missing, the output will be written to
standard output. To provide the interface one is used to from other
programs, a second interface is provided.

@smallexample
`gencat [@var{Option}]@dots{} -o @var{Output-File} [@var{Input-File}]@dots{}`
@end smallexample

The option @samp{-o} is used to specify the output file and all file
arguments are used as input files.

Besides this, one can use @file{-} or @file{/dev/stdin} for
@var{Input-File} to denote the standard input. Correspondingly, one can
use @file{-} and @file{/dev/stdout} for @var{Output-File} to denote
standard output. Using @file{-} as a file name is allowed in X/Open,
while using the device names is a GNU extension.

The @code{gencat} program works by concatenating all input files and
then @strong{merging} the resulting collection of message sets with a
possibly existing output file. This is done by removing all messages
with set/message number tuples matching any of the generated messages
from the output file and then adding all the new messages. To
regenerate a catalog file while ignoring the old contents therefore
requires removing the output file if it exists. If the output is
written to standard output, no merging takes place.

@noindent
The following table shows the options understood by the @code{gencat}
program. The X/Open standard does not specify any options for the
program, so all of these are GNU extensions.

@table @samp
@item -V
@itemx --version
Print the version information and exit.
@item -h
@itemx --help
Print a usage message listing all available options, then exit successfully.
@item --new
Never merge the new messages from the input files with the old content
of the output file. The old content of the output file is discarded.
@item -H
@itemx --header=name
This option is used to emit the symbolic names given to sets and
messages in the input files for use in the program. Details about how
to use this are given in the next section. The @var{name} parameter to
this option specifies the name of the output file. It will contain a
number of C preprocessor @code{#define}s to associate a name with a
number.

Please note that the generated file only contains the symbols from the
input files. If the output is merged with the previous content of the
output file, the possibly existing symbols from the file(s) which
generated the old output file are not in the generated header file.
@end table


@node Common Usage
@subsection How to use the @code{catgets} interface

The @code{catgets} functions can be used in two different ways: by
strictly following the X/Open specification without relying on the
extensions, or by using the GNU extensions. We will take a look at the
former method first to understand the benefits of the extensions.

@subsubsection Not using symbolic names

Since the X/Open format of the message catalog files does not allow
symbol names, we have to work with numbers all the time. When we start
writing a program, we have to replace all appearances of translatable
strings with something like

@smallexample
catgets (catdesc, set, msg, "string")
@end smallexample

@noindent
@var{catdesc} is retrieved from a call to @code{catopen}, which is
normally done once at program start. The @code{"string"} is the
string we want to translate. The problems start with the set and
message numbers.

In a bigger program several programmers usually work on the program at
the same time, so coordinating the number allocation is crucial.
Though no two different strings may be indexed by the same tuple of
numbers, it is highly desirable to reuse the numbers for equal strings
with equal translations (please note that there might be strings which
are equal in one language but have different translations due to
different contexts).

The allocation process can be relaxed a bit by using different set
numbers for different parts of the program. This reduces the number of
developers who have to coordinate the allocation. But lists must still
be kept to track the allocation, and errors can easily happen. These
errors cannot be discovered by the compiler or the @code{catgets}
functions. Only the user of the program might see wrong messages
printed. In the worst cases the messages are so misleading that they
cannot be recognized as wrong. Think about the translations for
@code{"true"} and @code{"false"} being exchanged. This could result in
a disaster.
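
The following sketch shows the purely numeric style and why its errors stay
invisible. The macro names, their numbers and the function @code{bool_text}
are hypothetical and would have to be kept in sync with the catalog source
by hand; neither the compiler nor @code{catgets} can tell if the numbers for
@code{"true"} and @code{"false"} were exchanged in one of the two places.

@smallexample
#include <nl_types.h>

/* Hand-maintained numbers in the purely numeric X/Open style; the
   values are hypothetical.  Nothing checks them against the catalog
   source: swapping MSG_TRUE and MSG_FALSE here, or in the catalog,
   silently swaps the translations.  */
#define MSG_SET_BOOL 1
#define MSG_TRUE     1
#define MSG_FALSE    2

const char *
bool_text (nl_catd catd, int value)
@{
  return value
    ? catgets (catd, MSG_SET_BOOL, MSG_TRUE, "true")
    : catgets (catd, MSG_SET_BOOL, MSG_FALSE, "false");
@}
@end smallexample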


@subsubsection Using symbolic names

The problems mentioned in the last section derive from the fact that:

@enumerate
@item
the numbers are allocated once and, due to their possibly frequent use,
it is difficult to change a number later;
@item
the numbers do not allow one to guess anything about the string and
therefore collisions can easily happen.
@end enumerate

By consistently using symbolic names and by providing a method which maps
the string content to a symbolic name (however this will happen), one can
prevent both problems above. The cost of this is that the programmer
has to write a complete message catalog file while writing the
program itself.

This is necessary since the symbolic names must be mapped to numbers
before the program sources can be compiled. In the last section it was
described how to generate a header containing the mapping of the names.
E.g., for the example message file given above we could
call the @code{gencat} program as follows (assuming @file{ex.msg} contains
the sources):

@smallexample
gencat -H ex.h -o ex.cat ex.msg
@end smallexample

@noindent
This generates a header file with the following content:

@smallexample
#define SetTwoSet 0x2   /* ex.msg:8 */

#define SetOneSet 0x1   /* ex.msg:4 */
#define SetOnetwo 0x2   /* ex.msg:6 */
@end smallexample

As can be seen, the various symbols given in the source file are mangled
to generate unique identifiers and these identifiers get numbers
assigned. Reading the source file and knowing about the rules will
allow one to predict the content of the header file (it is deterministic),
but this is not necessary. The @code{gencat} program can take care of
everything. All the programmer has to do is to put the generated header
file in the dependency list of the source files of the project and
to add a rule to regenerate the header if any of the input files
change.

One word about the symbol mangling. Every symbol consists of two parts:
the name of the message set plus the name of the message or the special
string @code{Set}. So @code{SetOnetwo} means this macro can be used to
access the translation with identifier @code{two} in the message set
@code{SetOne}.

The other names denote the names of the message sets. The special
string @code{Set} is used in the place of the message identifier.
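
The naming rule can be written down as a sketch. The function
@code{mangle_symbol} is hypothetical and is not the @code{gencat} source; it
simply concatenates the set name with the message identifier, or with
@code{Set} for the set itself. This also shows why a message identifier must
not be @code{Set}: in the set @code{SetOne} it would produce
@code{SetOneSet}, the same name as the macro for the set number.

@smallexample
#include <stdio.h>

/* Hypothetical sketch of the naming rule described above: the macro
   name is the set name followed by the message identifier, or by
   "Set" when MSG_NAME is NULL (the macro for the set itself).
   Returns the length of the name, or -1 if OUT (SIZE bytes) is too
   small.  */
int
mangle_symbol (const char *set_name, const char *msg_name,
               char *out, size_t size)
@{
  int n = snprintf (out, size, "%s%s", set_name,
                    msg_name != NULL ? msg_name : "Set");
  return n >= 0 && (size_t) n < size ? n : -1;
@}
@end smallexample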
				660
If the code uses the second string of the set @code{SetOne}, the C code
should look like this:
				663
				664	@smallexample
				665	catgets (catdesc, SetOneSet, SetOnetwo,
				666	" Message with ID \"two\", which gets the value 2 assigned")
				667	@end smallexample
				668
Writing the call this way allows the message number and even the set
number to change without requiring any change in the C source code.
(The text of the string is normally not the same; this is only for this
example.)
				673
				674
@subsubsection How does this help development?
				676
To illustrate the usual way to work with the symbolic message names,
here is a little example. Assume we want to write the very complex and
famous greeting program. We start by writing the code as usual:
				680
				681	@smallexample
				682	#include <stdio.h>
				683	int
				684	main (void)
				685	@{
  printf ("Hello, world!\n");
  return 0;
				688	@}
				689	@end smallexample
				690
Now we want to internationalize the message and therefore replace the
literal string with a lookup that returns whatever translation the user
wants.
				693
				694	@smallexample
				695	#include <nl_types.h>
				696	#include <stdio.h>
				697	#include "msgnrs.h"
				698	int
				699	main (void)
				700	@{
  nl_catd catdesc = catopen ("hello.cat", NL_CAT_LOCALE);
  printf (catgets (catdesc, MainSet, MainHello,
                   "Hello, world!\n"));
  catclose (catdesc);
  return 0;
@}
@end smallexample

We see how the catalog object is opened and the returned descriptor is
used in the other function calls. It is not really necessary to check
for failure of any of the functions since even in these situations the
functions behave reasonably: if no catalog or no matching message is
found, @code{catgets} simply returns its last argument, the untranslated
string.

What remains unspecified here are the constants @code{MainSet} and
@code{MainHello}. These are the symbolic names describing the
				717	message. To get the actual definitions which match the information in
				718	the catalog file we have to create the message catalog source file and
				719	process it using the @code{gencat} program.
				720
				721	@smallexample
				722	$ Messages for the famous greeting program.
				723	$quote "
				724
				725	$set Main
				726	Hello "Hallo, Welt!\n"
				727	@end smallexample
				728
				729	Now we can start building the program (assume the message catalog source
				730	file is named @file{hello.msg} and the program source file @file{hello.c}):
				731
				732	@smallexample
				733	% gencat -H msgnrs.h -o hello.cat hello.msg
				734	% cat msgnrs.h
				735	#define MainSet 0x1 /* hello.msg:4 */
				736	#define MainHello 0x1 /* hello.msg:5 */
				737	% gcc -o hello hello.c -I.
				738	% cp hello.cat /usr/share/locale/de/LC_MESSAGES
				739	% echo $LC_ALL
				740	de
				741	% ./hello
				742	Hallo, Welt!
				743	%
				744	@end smallexample
				745
The call of the @code{gencat} program creates the missing header file
@file{msgnrs.h} as well as the message catalog binary. The former is
used in the compilation of @file{hello.c} while the latter is placed in
a directory in which the @code{catopen} function will try to locate it.
Please check the @code{LC_ALL} environment variable and the default path
for @code{catopen} presented in the description above.
				752
				753
				754	@node The Uniforum approach
				755	@section The Uniforum approach to Message Translation
				756
Sun Microsystems tried to standardize a different approach to message
translation in the Uniforum group. A real standard was never defined,
but the interface was still used in Sun's operating systems. Since this
approach fits better into the development process of free software, it
is also used throughout the GNU project, and the GNU @file{gettext}
package provides support for it outside @theglibc{}.
				763
The code of @file{libintl} from GNU @file{gettext} is the same as the
code in @theglibc{}, so the documentation in the GNU @file{gettext}
manual is also valid for the functionality described here. The
following text describes the library functions in detail, but the
numerous helper programs are not described in this manual. Instead,
readers should consult the GNU @file{gettext} manual
(@pxref{Top,,GNU gettext utilities,gettext,Native Language Support Library and Tools}).
We will only give a short overview.
				772
Though the @code{catgets} functions are available by default on more
systems, the @code{gettext} interface is at least as portable as the
former. The GNU @file{gettext} package can be used wherever the
functions are not available.
				777
				778
				779	@menu
				780	* Message catalogs with gettext:: The @code{gettext} family of functions.
				781	* Helper programs for gettext:: Programs to handle message catalogs
				782	for @code{gettext}.
				783	@end menu
				784
				785
				786	@node Message catalogs with gettext
				787	@subsection The @code{gettext} family of functions
				788
The paradigm underlying the @code{gettext} approach to message
translation is different from that of the @code{catgets} functions, but
the basic functionality is equivalent. There are functions of the
following categories:
				793
				794	@menu
				795	* Translation with gettext:: What has to be done to translate a message.
				796	* Locating gettext catalog:: How to determine which catalog to be used.
				797	* Advanced gettext functions:: Additional functions for more complicated
				798	situations.
				799	* Charset conversion in gettext:: How to specify the output character set
				800	@code{gettext} uses.
				801	* GUI program problems:: How to use @code{gettext} in GUI programs.
				802	* Using gettextized software:: The possibilities of the user to influence
				803	the way @code{gettext} works.
				804	@end menu
				805
				806	@node Translation with gettext
				807	@subsubsection What has to be done to translate a message?
				808
				809	The @code{gettext} functions have a very simple interface. The most
				810	basic function just takes the string which shall be translated as the
				811	argument and it returns the translation. This is fundamentally
				812	different from the @code{catgets} approach where an extra key is
				813	necessary and the original string is only used for the error case.
				814
If the string to be translated is the only argument, this of course
means the string itself is the key. I.e., the translation is selected
based on the original string. The message catalogs must therefore
contain the original strings plus one translation for each such string.
The task of the @code{gettext} function is to compare the argument
string with the available strings in the catalog and return the
appropriate translation. Of course this process is optimized so that it
is not more expensive than an access using an atomic key like in
@code{catgets}.
				824
				825	The @code{gettext} approach has some advantages but also some
				826	disadvantages. Please see the GNU @file{gettext} manual for a detailed
				827	discussion of the pros and cons.
				828
All the definitions and declarations for @code{gettext} can be found in
the @file{libintl.h} header file. On systems where these functions are
not part of the C library they can be found in a separate library named
@file{libintl.a} (or the corresponding shared library).
				833
				834	@comment libintl.h
				835	@comment GNU
@deftypefun {char *} gettext (const char *@var{msgid})
				837	@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
				838	@c Wrapper for dcgettext.
				839	The @code{gettext} function searches the currently selected message
				840	catalogs for a string which is equal to @var{msgid}. If there is such a
				841	string available it is returned. Otherwise the argument string
				842	@var{msgid} is returned.
				843
Please note that although the return value is @code{char *}, the
returned string must not be changed. This broken type results from the
history of the function and does not reflect the way the function should
be used.
				848
				849	Please note that above we wrote ``message catalogs'' (plural). This is
				850	a specialty of the GNU implementation of these functions and we will
				851	say more about this when we talk about the ways message catalogs are
				852	selected (@pxref{Locating gettext catalog}).
				853
				854	The @code{gettext} function does not modify the value of the global
				855	@var{errno} variable. This is necessary to make it possible to write
				856	something like
				857
				858	@smallexample
				859	printf (gettext ("Operation failed: %m\n"));
				860	@end smallexample
				861
Here the @var{errno} value is used in the @code{printf} function while
processing the @code{%m} format element, and if the @code{gettext}
function changed this value (it is called before @code{printf} is
called) we would get a wrong message.
				866
So there is no easy way to detect a missing message catalog besides
comparing the argument string with the result. But it is normally the
task of the user to react to missing catalogs. The program cannot guess
when a message catalog is really necessary, since a user who speaks the
language the program was developed in does not need any translation.
				872	@end deftypefun
				873
The remaining two functions to access the message catalog add some
functionality to select a message catalog which is not the default one.
This is important if parts of the program are developed independently.
Every part can have its own message catalog and all of them can be used
at the same time. The C library itself is an example: internally it
uses the @code{gettext} functions, but since it must not depend on the
currently selected default message catalog, it must specify all
ambiguous information.
				882
				883	@comment libintl.h
				884	@comment GNU
@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
				886	@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
				887	@c Wrapper for dcgettext.
The @code{dgettext} function acts just like the @code{gettext}
function. It only takes an additional first argument @var{domainname}
which guides the selection of the message catalogs which are searched
for the translation. If the @var{domainname} parameter is the null
pointer, the @code{dgettext} function is exactly equivalent to
@code{gettext} since the default value for the domain name is used.
				894
As for @code{gettext}, the return value type is @code{char *}, which is
an anachronism. The returned string must never be modified.
				897	@end deftypefun
				898
				899	@comment libintl.h
				900	@comment GNU
@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
				902	@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
				903	@c dcgettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
				904	@c dcigettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
				905	@c libc_rwlock_rdlock @asulock @aculock
				906	@c current_locale_name ok [protected from @mtslocale]
				907	@c tfind ok
				908	@c libc_rwlock_unlock ok
				909	@c plural_lookup ok
				910	@c plural_eval ok
				911	@c rawmemchr ok
				912	@c DETERMINE_SECURE ok, nothing
				913	@c strcmp ok
				914	@c strlen ok
				915	@c getcwd @ascuheap @acsmem @acsfd
				916	@c strchr ok
				917	@c stpcpy ok
				918	@c category_to_name ok
				919	@c guess_category_value @mtsenv
				920	@c getenv @mtsenv
				921	@c current_locale_name dup ok [protected from @mtslocale by dcigettext]
				922	@c strcmp ok
				923	@c ENABLE_SECURE ok
				924	@c _nl_find_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
				925	@c libc_rwlock_rdlock dup @asulock @aculock
				926	@c _nl_make_l10nflist dup @ascuheap @acsmem
				927	@c libc_rwlock_unlock dup ok
				928	@c _nl_load_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
				929	@c libc_lock_lock_recursive @aculock
				930	@c libc_lock_unlock_recursive @aculock
				931	@c open->open_not_cancel_2 @acsfd
				932	@c fstat ok
				933	@c mmap dup @acsmem
				934	@c close->close_not_cancel_no_status @acsfd
				935	@c malloc dup @ascuheap @acsmem
				936	@c read->read_not_cancel ok
				937	@c munmap dup @acsmem
				938	@c W dup ok
				939	@c strlen dup ok
				940	@c get_sysdep_segment_value ok
				941	@c memcpy dup ok
				942	@c hash_string dup ok
				943	@c free dup @ascuheap @acsmem
				944	@c libc_rwlock_init ok
				945	@c _nl_find_msg dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
				946	@c libc_rwlock_fini ok
				947	@c EXTRACT_PLURAL_EXPRESSION @ascuheap @acsmem
				948	@c strstr dup ok
				949	@c isspace ok
				950	@c strtoul ok
				951	@c PLURAL_PARSE @ascuheap @acsmem
				952	@c malloc dup @ascuheap @acsmem
				953	@c free dup @ascuheap @acsmem
				954	@c INIT_GERMANIC_PLURAL ok, nothing
				955	@c the pre-C99 variant is @acucorrupt [protected from @mtuinit by dcigettext]
				956	@c _nl_expand_alias dup @ascuheap @asulock @acsmem @acsfd @aculock
				957	@c _nl_explode_name dup @ascuheap @acsmem
				958	@c libc_rwlock_wrlock dup @asulock @aculock
				959	@c free dup @asulock @aculock @acsfd @acsmem
				960	@c _nl_find_msg @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
				961	@c _nl_load_domain dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
				962	@c strlen ok
				963	@c hash_string ok
				964	@c W ok
				965	@c SWAP ok
				966	@c bswap_32 ok
				967	@c strcmp ok
				968	@c get_output_charset @mtsenv @ascuheap @acsmem
				969	@c getenv dup @mtsenv
				970	@c strlen dup ok
				971	@c malloc dup @ascuheap @acsmem
				972	@c memcpy dup ok
				973	@c libc_rwlock_rdlock dup @asulock @aculock
				974	@c libc_rwlock_unlock dup ok
				975	@c libc_rwlock_wrlock dup @asulock @aculock
				976	@c realloc @ascuheap @acsmem
				977	@c strdup @ascuheap @acsmem
				978	@c strstr ok
				979	@c strcspn ok
				980	@c mempcpy dup ok
				981	@c norm_add_slashes dup ok
				982	@c gconv_open @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
				983	@c [protected from @mtslocale by dcigettext locale lock]
				984	@c free dup @ascuheap @acsmem
				985	@c libc_lock_lock @asulock @aculock
				986	@c calloc @ascuheap @acsmem
				987	@c gconv dup @acucorrupt [protected from @mtsrace and @asucorrupt by lock]
				988	@c libc_lock_unlock ok
				989	@c malloc @ascuheap @acsmem
				990	@c mempcpy ok
				991	@c memcpy ok
				992	@c strcpy ok
				993	@c libc_rwlock_wrlock @asulock @aculock
				994	@c tsearch @ascuheap @acucorrupt @acsmem [protected from @mtsrace and @asucorrupt]
				995	@c transcmp ok
				996	@c strmp dup ok
				997	@c free @ascuheap @acsmem
The @code{dcgettext} function adds another argument to those which
@code{dgettext} takes. This argument @var{category} specifies the last
piece of information needed to locate the message catalog. I.e., the
domain name and the locale category exactly specify which message
catalog has to be used (relative to a given directory, see below).
				1003
				1004	The @code{dgettext} function can be expressed in terms of
				1005	@code{dcgettext} by using
				1006
				1007	@smallexample
				1008	dcgettext (domain, string, LC_MESSAGES)
				1009	@end smallexample
				1010
				1011	@noindent
				1012	instead of
				1013
				1014	@smallexample
				1015	dgettext (domain, string)
				1016	@end smallexample
				1017
This also shows which values are expected for the third parameter. One
has to use the selectors for the categories available in
@file{locale.h}. Normally the available values are @code{LC_CTYPE},
@code{LC_COLLATE}, @code{LC_MESSAGES}, @code{LC_MONETARY},
@code{LC_NUMERIC}, and @code{LC_TIME}. Please note that @code{LC_ALL}
must not be used and, even though the names might suggest this, there is
no relation to the environment variables of this name.
				1025
The @code{dcgettext} function is only implemented for compatibility with
other systems which have @code{gettext} functions. There is not really
any situation where it is necessary (or useful) to use a value other
than @code{LC_MESSAGES} for the @var{category} parameter. We are
dealing with messages here and any other choice can only be irritating.
				1031
As for @code{gettext}, the return value type is @code{char *}, which is
an anachronism. The returned string must never be modified.
				1034	@end deftypefun
				1035
When using the three functions above in a program, the @var{msgid}
argument is frequently a constant string, so it is worth optimizing this
case. A moment's thought shows that as long as no new message catalog
is loaded, the translation of a message will not change. The
@code{gettext}, @code{dgettext} and @code{dcgettext} functions actually
implement this optimization.
				1042
				1043
				1044	@node Locating gettext catalog
				1045	@subsubsection How to determine which catalog to be used
				1046
The functions to retrieve the translations for a given message have a
remarkably simple interface. But to still give the user of the program
the opportunity to select exactly the translation s/he wants, and to
give the programmer the possibility to influence how the search for
catalog files proceeds, there is a quite complicated underlying
mechanism which controls all this. The code is complicated, but the use
is easy.
				1054
				1055	Basically we have two different tasks to perform which can also be
				1056	performed by the @code{catgets} functions:
				1057
				1058	@enumerate
				1059	@item
Locate the set of message catalogs. There are a number of files for
different languages, which all belong to the package. Usually they are
all stored in the filesystem below a certain directory.

There can be arbitrarily many packages installed and they can follow
different guidelines for the placement of their files.

@item
Relative to the location specified by the package, the actual
translation files must be searched based on the wishes of the user.
I.e., for each language the user selects, the program should be able to
locate the appropriate file.
				1072	@end enumerate
				1073
This is the functionality required by the specifications for
@code{gettext} and this is also what the @code{catgets} functions are
able to do. But some problems remain unresolved:
				1077
				1078	@itemize @bullet
				1079	@item
The language to be used can be specified in several different ways.
There is no generally accepted standard for this and the user always
expects the program to understand what s/he means. E.g., to select the
German translation one could write @code{de}, @code{german}, or
@code{deutsch} and the program should always react the same.

@item
Sometimes the specification of the user is too detailed. If s/he, e.g.,
specifies @code{de_DE.ISO-8859-1}, which means German, spoken in Germany,
coded using the @w{ISO 8859-1} character set, there is the possibility
that a message catalog matching this exactly is not available. But
there could be a catalog matching @code{de}, and if the character set
used on the machine is always @w{ISO 8859-1} there is no reason why this
latter message catalog should not be used. (We call this @dfn{message
inheritance}.)

@item
If a catalog for a wanted language is not available, falling back on the
language of the developer and simply not translating any message is not
always the second best choice. Instead a user might be better able to
read the messages in another language, so the user of the program should
be able to define a precedence order of languages.
				1102	@end itemize
				1103
We can divide the configuration actions into two parts: one is
performed by the programmer, the other by the user. We will start with
the functions the programmer can use since the user configuration will
be based on this.
				1108
As the functions described in the last sections already mention,
separate sets of messages can be selected by a @dfn{domain name}. This
is a simple string which should be unique for each program part that
uses a separate domain. It is possible to use arbitrarily many domains
in one program at the same time. E.g., @theglibc{} itself uses a domain
named @code{libc} while the program using the C Library could use a
domain named @code{foo}. The important point is that at any time
exactly one domain is active as the default. This is controlled with
the following function.
				1118
				1119	@comment libintl.h
				1120	@comment GNU
@deftypefun {char *} textdomain (const char *@var{domainname})
				1122	@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}}
				1123	@c textdomain @asulock @ascuheap @aculock @acsmem
				1124	@c libc_rwlock_wrlock @asulock @aculock
				1125	@c strcmp ok
				1126	@c strdup @ascuheap @acsmem
				1127	@c free @ascuheap @acsmem
				1128	@c libc_rwlock_unlock ok
				1129	The @code{textdomain} function sets the default domain, which is used in
				1130	all future @code{gettext} calls, to @var{domainname}. Please note that
				1131	@code{dgettext} and @code{dcgettext} calls are not influenced if the
				1132	@var{domainname} parameter of these functions is not the null pointer.
				1133
				1134	Before the first call to @code{textdomain} the default domain is
				1135	@code{messages}. This is the name specified in the specification of
				1136	the @code{gettext} API. This name is as good as any other name. No
				1137	program should ever really use a domain with this name since this can
				1138	only lead to problems.
				1139
The function returns the value which is from now on taken as the default
domain. If the system ran out of memory, the returned value is
@code{NULL} and the global variable @var{errno} is set to @code{ENOMEM}.
Despite the return value type being @code{char *}, the returned string
must not be changed. It is allocated internally by the @code{textdomain}
function.
				1146
				1147	If the @var{domainname} parameter is the null pointer no new default
				1148	domain is set. Instead the currently selected default domain is
				1149	returned.
				1150
If the @var{domainname} parameter is the empty string, the default
domain is reset to its initial value, the domain with the name
@code{messages}. Using this possibility is questionable since the domain
@code{messages} really should never be used.
				1155	@end deftypefun
				1156
				1157	@comment libintl.h
				1158	@comment GNU
@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
				1160	@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
				1161	@c bindtextdomain @ascuheap @acsmem
				1162	@c set_binding_values @ascuheap @acsmem
				1163	@c libc_rwlock_wrlock dup @asulock @aculock
				1164	@c strcmp dup ok
				1165	@c strdup dup @ascuheap @acsmem
				1166	@c free dup @ascuheap @acsmem
				1167	@c malloc dup @ascuheap @acsmem
The @code{bindtextdomain} function can be used to specify the directory
which contains the message catalogs for domain @var{domainname} for the
different languages. To be precise, this is the directory where the
hierarchy of directories is expected. Details are explained below.
				1172
For the programmer it is important to note that the translations which
come with the program have to be placed in a directory hierarchy
starting at, say, @file{/foo/bar}. Then the program should make a
@code{bindtextdomain} call to bind the domain for the current program to
this directory. This ensures that the catalogs are found. A correctly
running program does not depend on the user setting an environment
variable.
				1180
The @code{bindtextdomain} function can be used several times, and if the
@var{domainname} argument is different, the previously bound domains
will not be overwritten.

If a program which uses @code{bindtextdomain} changes the current
working directory at some point (using the @code{chdir} function), it is
important that the @var{dirname} strings be absolute pathnames.
Otherwise the addressed directory might vary over time.
				1190
If the @var{dirname} parameter is the null pointer, @code{bindtextdomain}
returns the currently selected directory for the domain with the name
@var{domainname}.

The @code{bindtextdomain} function returns a pointer to a string
containing the name of the selected directory. The string is allocated
internally in the function and must not be changed by the user. If the
system ran out of memory during the execution of @code{bindtextdomain},
the return value is @code{NULL} and the global variable @var{errno} is
set accordingly.
				1201	@end deftypefun
				1202
				1203
				1204	@node Advanced gettext functions
				1205	@subsubsection Additional functions for more complicated situations
				1206
The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches. What
is meant here is the handling of plural forms.
				1211
				1212	Looking through Unix source code before the time anybody thought about
				1213	internationalization (and, sadly, even afterwards) one can often find
				1214	code similar to the following:
				1215
				1216	@smallexample
				1217	printf ("%d file%s deleted", n, n == 1 ? "" : "s");
				1218	@end smallexample
				1219
				1220	@noindent
After the first complaints from people internationalizing the code,
people either completely avoided formulations like this or used strings
like @code{"file(s)"}. Both look unnatural and should be avoided. First
attempts to solve the problem correctly looked like this:
				1225
				1226	@smallexample
if (n == 1)
  printf ("%d file deleted", n);
else
  printf ("%d files deleted", n);
				1231	@end smallexample
				1232
But this does not solve the problem. It helps languages where the
plural form of a noun is not simply constructed by adding an `s', but
that is all. Once again people fell into the trap of believing the
rules their language uses are universal. But the handling of plural
forms differs widely between the language families. There are two
aspects in which languages differ (even inside language families):
				1239
				1240	@itemize @bullet
				1241	@item
The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an `s') is hardly found in German.

@item
The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romance and Germanic languages
since here the number is the same (there are two).

But other language families have only one form or many forms. More
information on this follows in an extra section.
				1255	@end itemize
				1256
The consequence of this is that application writers should not try to
solve the problem in their code. Doing so would amount to localization,
since the result is only usable for certain hardcoded language
environments. Instead the extended @code{gettext} interface should be
used.
				1261
Instead of the one key string, these extra functions take two strings
and a numerical argument. The idea behind this is that, using the
numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the
translator. The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behavior). In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the
singular form, the second the plural form.
				1271
This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation, but since @theglibc{} (as
well as the GNU @code{gettext} package) is written as part of the GNU
project, and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.
				1279
				1280	@comment libintl.h
				1281	@comment GNU
@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
				1283	@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
				1284	@c Wrapper for dcngettext.
The @code{ngettext} function is similar to the @code{gettext} function
as it finds the message catalogs in the same way. But it takes two
extra arguments. The @var{msgid1} parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form. If no
message catalog is found, @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.

An example of the use of this function is:
				1295
				1296	@smallexample
				1297	printf (ngettext ("%d file removed", "%d files removed", n), n);
				1298	@end smallexample
				1299
				1300	Please note that the numeric value @var{n} has to be passed to the
				1301	@code{printf} function as well. It is not sufficient to pass it only to
				1302	@code{ngettext}.
				1303	@end deftypefun
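The following is a minimal sketch, not taken from the original text, that wraps the @code{ngettext} call above in a helper function. The name @code{format_removed} is hypothetical. It uses @code{%lu} because @var{n} has type @code{unsigned long int}, and it passes the count a second time to @code{snprintf}, as the note above requires. Without a message catalog (for example in the @code{"C"} locale) the English fallback rule applies.

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the "N file(s) removed" message into BUF.
   N is passed twice: ngettext uses it only to choose the plural form,
   while snprintf actually prints it.  Returns the snprintf result.  */
int
format_removed (char *buf, size_t len, unsigned long int n)
{
  return snprintf (buf, len,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```

With no catalog installed, a count of 1 yields @samp{1 file removed}, and any other count, including 0, selects the plural string.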
				1304
				1305	@comment libintl.h
				1306	@comment GNU
@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
				1308	@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
				1309	@c Wrapper for dcngettext.
The @code{dngettext} function is similar to the @code{dgettext} function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
				1314	@end deftypefun
				1315
				1316	@comment libintl.h
				1317	@comment GNU
@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
				1319	@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
				1320	@c Wrapper for dcigettext.
The @code{dcngettext} function is similar to the @code{dcgettext} function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
				1325	@end deftypefun
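The following sketch shows a library that looks up its plural messages in its own domain, independently of the domain the application selected with @code{textdomain}. The domain name @code{example-lib} and the function names are hypothetical; a real library would also call @code{bindtextdomain} for its domain. Passing @code{LC_MESSAGES} to @code{dcngettext} selects the same category that @code{dngettext} uses implicitly.

```c
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical domain name used only for illustration.  */
#define EXAMPLE_DOMAIN "example-lib"

/* Look the message up in EXAMPLE_DOMAIN, whatever the default domain is.  */
const char *
lib_items_msg (unsigned long int n)
{
  return dngettext (EXAMPLE_DOMAIN, "%lu item", "%lu items", n);
}

/* The same lookup with the category given explicitly; LC_MESSAGES is the
   category dngettext uses implicitly.  */
const char *
lib_items_msg_category (unsigned long int n)
{
  return dcngettext (EXAMPLE_DOMAIN, "%lu item", "%lu items", n,
                     LC_MESSAGES);
}
```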
				1326
				1327	@subsubheading The problem of plural forms
				1328
A description of the problem can be found at the beginning of the last
section. The question now is how to solve it. Without the input
of linguists (which was not available) it was not possible to determine
whether there are only a few different ways in which plural forms are
formed or whether the number can increase with every newly supported
language.
				1335
Therefore the solution implemented is to allow the translator to specify
the rules for selecting the plural form. Since the formula varies
with every language, this is the only viable solution short of
hardcoding the information in the code (which would still have to be
extensible so as not to prevent the use of new languages). The
details are explained in the GNU @code{gettext} manual. Here only a
bit of information is provided.
				1343
				1344	The information about the plural form selection has to be stored in the
header entry (the one with the empty @code{msgid} string). It looks
				1346	like this:
				1347
				1348	@smallexample
				1349	Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
				1350	@end smallexample
				1351
				1352	The @code{nplurals} value must be a decimal number which specifies how
				1353	many different plural forms exist for this language. The string
				1354	following @code{plural} is an expression which is using the C language
syntax. Exceptions are that no negative numbers are allowed, numbers
				1356	must be decimal, and the only variable allowed is @code{n}. This
				1357	expression will be evaluated whenever one of the functions
				1358	@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The
				1359	numeric value passed to these functions is then substituted for all uses
				1360	of the variable @code{n} in the expression. The resulting value then
must be greater than or equal to zero and smaller than the value given as the
				1362	value of @code{nplurals}.
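To illustrate how the value of the @code{plural} expression is used, here is a sketch in which the expression is written directly as a C function instead of being parsed from the catalog header at run time. The type @code{plural_fn} and the functions are hypothetical. The result indexes the array of translated forms; this sketch falls back to index 0 if the expression produces an out-of-range value.

```c
/* Hypothetical: a Plural-Forms expression compiled by hand into C.  */
typedef unsigned long int (*plural_fn) (unsigned long int n);

/* The rule from the header above: plural=n == 1 ? 0 : 1;  */
unsigned long int
plural_english (unsigned long int n)
{
  return n == 1 ? 0 : 1;
}

/* Pick one of NPLURALS translated forms for the number N.  An index
   outside [0, nplurals) means expression and nplurals disagree; use
   form 0 rather than reading past the end of FORMS.  */
const char *
select_plural (const char *const forms[], unsigned long int nplurals,
               plural_fn plural, unsigned long int n)
{
  unsigned long int idx = plural (n);
  if (idx >= nplurals)
    idx = 0;
  return forms[idx];
}
```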
				1363
				1364	@noindent
The following rules are known at this point. The languages are grouped
by family, but this does not necessarily mean the information can be
generalized for the whole family (as can easily be seen in the table
				1368	below).@footnote{Additions are welcome. Send appropriate information to
				1369	@email{bug-glibc-manual@@gnu.org}.}
				1370
				1371	@table @asis
				1372	@item Only one form:
				1373	Some languages only require one single form. There is no distinction
				1374	between the singular and plural form. An appropriate header entry
				1375	would look like this:
				1376
				1377	@smallexample
				1378	Plural-Forms: nplurals=1; plural=0;
				1379	@end smallexample
				1380
				1381	@noindent
				1382	Languages with this property include:
				1383
				1384	@table @asis
				1385	@item Finno-Ugric family
				1386	Hungarian
				1387	@item Asian family
				1388	Japanese, Korean
				1389	@item Turkic/Altaic family
				1390	Turkish
				1391	@end table
				1392
				1393	@item Two forms, singular used for one only
This is the form used in most existing programs since it is what English
uses. A header entry would look like this:
				1396
				1397	@smallexample
				1398	Plural-Forms: nplurals=2; plural=n != 1;
				1399	@end smallexample
				1400
(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)
				1403
				1404	@noindent
				1405	Languages with this property include:
				1406
				1407	@table @asis
				1408	@item Germanic family
				1409	Danish, Dutch, English, German, Norwegian, Swedish
				1410	@item Finno-Ugric family
				1411	Estonian, Finnish
				1412	@item Latin/Greek family
				1413	Greek
				1414	@item Semitic family
				1415	Hebrew
				1416	@item Romance family
				1417	Italian, Portuguese, Spanish
				1418	@item Artificial
				1419	Esperanto
				1420	@end table
				1421
				1422	@item Two forms, singular used for zero and one
This is an exceptional case within the language family. The header entry would be:
				1424
				1425	@smallexample
				1426	Plural-Forms: nplurals=2; plural=n>1;
				1427	@end smallexample
				1428
				1429	@noindent
				1430	Languages with this property include:
				1431
				1432	@table @asis
@item Romance family
				1434	French, Brazilian Portuguese
				1435	@end table
				1436
				1437	@item Three forms, special case for zero
				1438	The header entry would be:
				1439
				1440	@smallexample
				1441	Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
				1442	@end smallexample
				1443
				1444	@noindent
				1445	Languages with this property include:
				1446
				1447	@table @asis
				1448	@item Baltic family
				1449	Latvian
				1450	@end table
				1451
				1452	@item Three forms, special cases for one and two
				1453	The header entry would be:
				1454
				1455	@smallexample
				1456	Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
				1457	@end smallexample
				1458
				1459	@noindent
				1460	Languages with this property include:
				1461
				1462	@table @asis
				1463	@item Celtic
				1464	Gaeilge (Irish)
				1465	@end table
				1466
				1467	@item Three forms, special case for numbers ending in 1[2-9]
				1468	The header entry would look like this:
				1469
				1470	@smallexample
				1471	Plural-Forms: nplurals=3; \
				1472	plural=n%10==1 && n%100!=11 ? 0 : \
n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
				1474	@end smallexample
				1475
				1476	@noindent
				1477	Languages with this property include:
				1478
				1479	@table @asis
				1480	@item Baltic family
				1481	Lithuanian
				1482	@end table
				1483
				1484	@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
				1485	The header entry would look like this:
				1486
				1487	@smallexample
				1488	Plural-Forms: nplurals=3; \
				1489	plural=n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10>3 ? 2 : 1;
				1490	@end smallexample
				1491
				1492	@noindent
				1493	Languages with this property include:
				1494
				1495	@table @asis
				1496	@item Slavic family
				1497	Croatian, Czech, Russian, Ukrainian
				1498	@end table
				1499
				1500	@item Three forms, special cases for 1 and 2, 3, 4
				1501	The header entry would look like this:
				1502
				1503	@smallexample
				1504	Plural-Forms: nplurals=3; \
				1505	plural=(n==1) ? 1 : (n>=2 && n<=4) ? 2 : 0;
				1506	@end smallexample
				1507
				1508	@noindent
				1509	Languages with this property include:
				1510
				1511	@table @asis
				1512	@item Slavic family
				1513	Slovak
				1514	@end table
				1515
				1516	@item Three forms, special case for one and some numbers ending in 2, 3, or 4
				1517	The header entry would look like this:
				1518
				1519	@smallexample
				1520	Plural-Forms: nplurals=3; \
				1521	plural=n==1 ? 0 : \
n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
				1523	@end smallexample
				1524
				1525	@noindent
				1526	Languages with this property include:
				1527
				1528	@table @asis
				1529	@item Slavic family
				1530	Polish
				1531	@end table
				1532
				1533	@item Four forms, special case for one and all numbers ending in 02, 03, or 04
				1534	The header entry would look like this:
				1535
				1536	@smallexample
				1537	Plural-Forms: nplurals=4; \
plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
				1539	@end smallexample
				1540
				1541	@noindent
				1542	Languages with this property include:
				1543
				1544	@table @asis
				1545	@item Slavic family
				1546	Slovenian
				1547	@end table
				1548	@end table
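Because the expressions use C syntax, they can be transcribed almost verbatim into C. The following hand-written sketch does this for a few rows of the table above (the function names are hypothetical), which can be useful for checking a formula against sample numbers before putting it into a catalog header.

```c
/* Each function returns an index in [0, nplurals) for its language.  */

/* Two forms, singular used for one only: plural=n != 1;  */
unsigned long int
plural_two_forms (unsigned long int n)
{
  return n != 1;
}

/* Two forms, singular used for zero and one: plural=n>1;  */
unsigned long int
plural_french (unsigned long int n)
{
  return n > 1;
}

/* Three forms (Croatian, Czech, Russian, Ukrainian).  */
unsigned long int
plural_slavic (unsigned long int n)
{
  return n % 100 / 10 == 1 ? 2 : n % 10 == 1 ? 0 : (n + 9) % 10 > 3 ? 2 : 1;
}

/* Three forms (Polish).  */
unsigned long int
plural_polish (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* Four forms (Slovenian).  */
unsigned long int
plural_slovenian (unsigned long int n)
{
  return n % 100 == 1 ? 0
         : n % 100 == 2 ? 1
         : n % 100 == 3 || n % 100 == 4 ? 2
         : 3;
}
```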
				1549
				1550
				1551	@node Charset conversion in gettext
				1552	@subsubsection How to specify the output character set @code{gettext} uses
				1553
				1554	@code{gettext} not only looks up a translation in a message catalog. It
				1555	also converts the translation on the fly to the desired output character
				1556	set. This is useful if the user is working in a different character set
				1557	than the translator who created the message catalog, because it avoids
				1558	distributing variants of message catalogs which differ only in the
				1559	character set.
				1560
				1561	The output character set is, by default, the value of @code{nl_langinfo
				1562	(CODESET)}, which depends on the @code{LC_CTYPE} part of the current
				1563	locale. But programs which store strings in a locale independent way
				1564	(e.g. UTF-8) can request that @code{gettext} and related functions
				1565	return the translations in that encoding, by use of the
				1566	@code{bind_textdomain_codeset} function.
				1567
				1568	Note that the @var{msgid} argument to @code{gettext} is not subject to
				1569	character set conversion. Also, when @code{gettext} does not find a
				1570	translation for @var{msgid}, it returns @var{msgid} unchanged --
				1571	independently of the current output character set. It is therefore
				1572	recommended that all @var{msgid}s be US-ASCII strings.
				1573
				1574	@comment libintl.h
				1575	@comment GNU
@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
				1577	@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
				1578	@c bind_textdomain_codeset @ascuheap @acsmem
				1579	@c set_binding_values dup @ascuheap @acsmem
				1580	The @code{bind_textdomain_codeset} function can be used to specify the
				1581	output character set for message catalogs for domain @var{domainname}.
				1582	The @var{codeset} argument must be a valid codeset name which can be used
				1583	for the @code{iconv_open} function, or a null pointer.
				1584
				1585	If the @var{codeset} parameter is the null pointer,
				1586	@code{bind_textdomain_codeset} returns the currently selected codeset
				1587	for the domain with the name @var{domainname}. It returns @code{NULL} if
				1588	no codeset has yet been selected.
				1589
				1590	The @code{bind_textdomain_codeset} function can be used several times.
				1591	If used multiple times with the same @var{domainname} argument, the
				1592	later call overrides the settings made by the earlier one.
				1593
				1594	The @code{bind_textdomain_codeset} function returns a pointer to a
				1595	string containing the name of the selected codeset. The string is
				1596	allocated internally in the function and must not be changed by the
user. If the system runs out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @code{errno} is set accordingly.
				1600	@end deftypefun
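A minimal sketch of requesting UTF-8 output for one domain and then querying the setting with a null @var{codeset}. The function name @code{use_utf8_messages} is hypothetical; the domain name is whatever the caller passes in.

```c
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: make gettext return DOMAIN's translations in
   UTF-8, independent of LC_CTYPE.  Returns the codeset now in effect,
   or NULL if the call failed (errno then says why).  */
const char *
use_utf8_messages (const char *domain)
{
  if (bind_textdomain_codeset (domain, "UTF-8") == NULL)
    return NULL;
  /* A null CODESET only queries the current setting.  */
  return bind_textdomain_codeset (domain, NULL);
}
```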
				1601
				1602
				1603	@node GUI program problems
				1604	@subsubsection How to use @code{gettext} in GUI programs
				1605
One place where the @code{gettext} functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts their
length. But strings which do not contain entire sentences, or at
least large fragments of a sentence, may appear in more than one
situation in the program and might need different translations. This is
especially true for the one-word strings which are frequently used in
GUI programs.
				1615
				1616	As a consequence many people say that the @code{gettext} approach is
				1617	wrong and instead @code{catgets} should be used which indeed does not
				1618	have this problem. But there is a very simple and powerful method to
handle this kind of problem with the @code{gettext} functions.
				1620
				1621	@noindent
				1622	As an example consider the following fictional situation. A GUI program
				1623	has a menu bar with the following entries:
				1624
				1625	@smallexample
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
				1633	@end smallexample
				1634
				1635	To have the strings @code{File}, @code{Printer}, @code{Open},
				1636	@code{New}, @code{Select}, and @code{Connect} translated there has to be
				1637	at some point in the code a call to a function of the @code{gettext}
				1638	family. But in two places the string passed into the function would be
				1639	@code{Open}. The translations might not be the same and therefore we
				1640	are in the dilemma described above.
				1641
One solution to this problem is to artificially lengthen the strings
to make them unambiguous. But what would the program do if no
translation is available? The lengthened string is not what should be
printed. So we should use a slightly modified version of the functions.
				1646
To lengthen the strings a uniform method should be used. E.g., in the
				1648	example above the strings could be chosen as
				1649
				1650	@smallexample
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
				1658	@end smallexample
				1659
				1660	Now all the strings are different and if now instead of @code{gettext}
				1661	the following little wrapper function is used, everything works just
				1662	fine:
				1663
				1664	@cindex sgettext
				1665	@smallexample
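#include <libintl.h>
#include <string.h>

char *
sgettext (const char *msgid)
@{
  char *msgval = gettext (msgid);
  if (msgval == msgid)
    @{
      /* No translation: use the part after the last '|', if any.  */
      char *sep = strrchr (msgid, '|');
      if (sep != NULL)
        msgval = sep + 1;
    @}
  return msgval;
@}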
				1674	@end smallexample
				1675
What this little function does is to recognize the case when no
translation is available. This can be done very efficiently by a
pointer comparison, since in that case the return value is the input
pointer itself. If there is no translation we know that the input string
is in the format we used for the menu entries and therefore contains a
@code{|} character. We simply search for the last occurrence of this
character and return a pointer to the character following it. That's it!
				1683
If one now consistently uses the lengthened string form and replaces
				1685	the @code{gettext} calls with calls to @code{sgettext} (this is normally
				1686	limited to very few places in the GUI implementation) then it is
				1687	possible to produce a program which can be internationalized.
				1688
				1689	With advanced compilers (such as GNU C) one can write the
@code{sgettext} function as an inline function or as a macro like this:
				1691
				1692	@cindex sgettext
				1693	@smallexample
#define sgettext(msgid) \
  (@{ const char *__msgid = (msgid);          \
     char *__msgval = gettext (__msgid);       \
     if (__msgval == __msgid)                  \
       @{                                       \
         char *__sep = strrchr (__msgid, '|'); \
         if (__sep != NULL)                    \
           __msgval = __sep + 1;               \
       @}                                       \
     __msgval; @})
				1700	@end smallexample
				1701
				1702	The other @code{gettext} functions (@code{dgettext}, @code{dcgettext}
				1703	and the @code{ngettext} equivalents) can and should have corresponding
				1704	functions as well which look almost identical, except for the parameters
				1705	and the call to the underlying function.
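As an illustration, here is a sketch of two such variants, @code{sdgettext} and @code{sngettext}. Both names are hypothetical and not part of @theglibc{}. They also cope with a @var{msgid} that contains no @code{|}, returning it unchanged. For @code{sngettext} the untranslated result can be either of the two @var{msgid} arguments, so both are compared.

```c
#include <libintl.h>
#include <string.h>

/* Return the part of S after the last '|', or S itself if there is none.  */
static const char *
strip_prefix (const char *s)
{
  const char *sep = strrchr (s, '|');
  return sep != NULL ? sep + 1 : s;
}

/* Hypothetical dgettext counterpart of sgettext.  */
const char *
sdgettext (const char *domain, const char *msgid)
{
  const char *msgval = dgettext (domain, msgid);
  return msgval == msgid ? strip_prefix (msgid) : msgval;
}

/* Hypothetical ngettext counterpart of sgettext.  */
const char *
sngettext (const char *msgid1, const char *msgid2, unsigned long int n)
{
  const char *msgval = ngettext (msgid1, msgid2, n);
  if (msgval == msgid1 || msgval == msgid2)
    return strip_prefix (msgval);
  return msgval;
}
```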
				1706
Now there is of course the question of why such functions do not exist in
@theglibc{}. There are two parts to the answer.
				1709
				1710	@itemize @bullet
				1711	@item
				1712	They are easy to write and therefore can be provided by the project they
				1713	are used in. This is not an answer by itself and must be seen together
				1714	with the second part which is:
				1715
				1716	@item
				1717	There is no way the C library can contain a version which can work
				1718	everywhere. The problem is the selection of the character to separate
the prefix from the actual string in the lengthened string. The
examples above used @code{|}, which is quite a good choice because it
				1721	resembles a notation frequently used in this context and it also is a
				1722	character not often used in message strings.
				1723
But what if the character is used in message strings? Or if the chosen
character is not available in the character set on the machine one
compiles on (e.g., @code{|} is not required to exist for @w{ISO C}; this is
why the @file{iso646.h} file exists in @w{ISO C} programming environments)?
				1728	@end itemize
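One way a project could address the second point is to make the separator a parameter. The following sketch, with the hypothetical name @code{sgettext_sep}, shows this; each project would then pass the character it has chosen.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical variant of sgettext with a caller-chosen separator SEP.  */
const char *
sgettext_sep (const char *msgid, int sep)
{
  const char *msgval = gettext (msgid);
  if (msgval == msgid)
    {
      /* No translation: drop everything up to the last separator.  */
      const char *p = strrchr (msgid, sep);
      if (p != NULL)
        msgval = p + 1;
    }
  return msgval;
}
```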
				1729
There is only one more comment to make. The wrapper function above
requires that the translated strings are not lengthened themselves.
This is only logical. There is no need to disambiguate the strings
(since they are never used as keys for a search) and one also saves
quite some memory and disk space by doing this.
				1735
				1736
				1737	@node Using gettextized software
				1738	@subsubsection User influence on @code{gettext}
				1739
The last sections described what the programmer can do to
internationalize the messages of the program. But in the end it is up to
the user to select the language of the messages s/he wants to see, since
s/he must understand them.
				1744
				1745	The POSIX locale model uses the environment variables @code{LC_COLLATE},
				1746	@code{LC_CTYPE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, @code{LC_NUMERIC},
				1747	and @code{LC_TIME} to select the locale which is to be used. This way
				1748	the user can influence lots of functions. As we mentioned above the
				1749	@code{gettext} functions also take advantage of this.
				1750
				1751	To understand how this happens it is necessary to take a look at the
				1752	various components of the filename which gets computed to locate a
				1753	message catalog. It is composed as follows:
				1754
				1755	@smallexample
				1756	@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
				1757	@end smallexample
				1758
				1759	The default value for @var{dir_name} is system specific. It is computed
				1760	from the value given as the prefix while configuring the C library.
				1761	This value normally is @file{/usr} or @file{/}. For the former the
				1762	complete @var{dir_name} is:
				1763
				1764	@smallexample
				1765	/usr/share/locale
				1766	@end smallexample
				1767
				1768	We can use @file{/usr/share} since the @file{.mo} files containing the
				1769	message catalogs are system independent, so all systems can use the same
				1770	files. If the program executed the @code{bindtextdomain} function for
the message domain that is currently handled, the @var{dir_name}
component is exactly the value which was given to the function as
the second parameter. I.e., @code{bindtextdomain} allows overriding
the only system-dependent and fixed value to make it possible to
				1775	address files anywhere in the filesystem.
				1776
				1777	The @var{category} is the name of the locale category which was selected
				1778	in the program code. For @code{gettext} and @code{dgettext} this is
				1779	always @code{LC_MESSAGES}, for @code{dcgettext} this is selected by the
value of the third parameter. As mentioned above, using a category
other than @code{LC_MESSAGES} should be avoided.
				1782
The @var{locale} component is computed based on the category used. Just
as for the @code{setlocale} function, this is where the user's selection
comes into play. Some environment variables are examined in a fixed order
and the first environment variable set determines the return value of
the lookup process. In detail, for the category @code{LC_xxx} the
following variables are examined in this order:
				1789
				1790	@table @code
				1791	@item LANGUAGE
				1792	@item LC_ALL
				1793	@item LC_xxx
				1794	@item LANG
				1795	@end table
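The lookup order above can be sketched in C as follows. This is a simplification, not the actual library code: the function name is hypothetical, empty variables are treated as unset, and details such as @code{LANGUAGE} being ignored while the locale is @code{"C"} are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical: return the value that determines the locale component
   for CATEGORY (e.g. "LC_MESSAGES"), or NULL if none of the variables is
   set to a non-empty value.  */
const char *
lookup_locale_var (const char *category)
{
  const char *names[] = { "LANGUAGE", "LC_ALL", category, "LANG" };
  for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
    {
      const char *val = getenv (names[i]);
      if (val != NULL && val[0] != '\0')
        return val;
    }
  return NULL;
}
```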
				1796
				1797	This looks very familiar. With the exception of the @code{LANGUAGE}
				1798	environment variable this is exactly the lookup order the
@code{setlocale} function uses. But why introduce the @code{LANGUAGE}
variable?
				1801
The reason is that the syntax of the values these variables can have is
different from what is expected by the @code{setlocale} function. If we
set @code{LC_ALL} to a value following the extended syntax, the
@code{setlocale} function would no longer be able to use the value of
this variable. An additional variable removes this problem, and it also
lets us select the language independently of the locale setting, which
is sometimes useful.
				1809
				1810	While for the @code{LC_xxx} variables the value should consist of
				1811	exactly one specification of a locale the @code{LANGUAGE} variable's
				1812	value can consist of a colon separated list of locale names. The
				1813	attentive reader will realize that this is the way we manage to
				1814	implement one of our additional demands above: we want to be able to
specify an ordered list of languages.
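The following sketch shows how such a colon-separated value, for example @code{de_AT:de:en}, can be split into individual locale names. The function name is hypothetical; empty entries are skipped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: copy the IDX-th entry (counting from 0) of a
   colon-separated LIST into OUT.  Returns 1 on success, 0 if there is no
   such entry or it does not fit into OUTLEN bytes.  */
int
language_list_entry (const char *list, size_t idx, char *out, size_t outlen)
{
  while (*list != '\0')
    {
      size_t len = strcspn (list, ":");
      if (len > 0)
        {
          if (idx == 0)
            {
              if (len >= outlen)
                return 0;
              memcpy (out, list, len);
              out[len] = '\0';
              return 1;
            }
          --idx;
        }
      list += len;
      if (*list == ':')
        ++list;
    }
  return 0;
}
```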
				1816
Returning to the constructed filename, only one component is missing.
				1818	The @var{domain_name} part is the name which was either registered using
				1819	the @code{textdomain} function or which was given to @code{dgettext} or
				1820	@code{dcgettext} as the first parameter. Now it becomes obvious that a
				1821	good choice for the domain name in the program code is a string which is
				1822	closely related to the program/package name. E.g., for @theglibc{}
				1823	the domain name is @code{libc}.
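Putting the components together is plain string concatenation. The following sketch (hypothetical function name) builds the filename from its four components; the real lookup additionally tries the alternative locale names described later in this section.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: build DIR_NAME/LOCALE/CATEGORY/DOMAIN_NAME.mo into BUF,
   where CATEGORY is a name such as "LC_MESSAGES".  Returns 1 on success,
   0 if the result does not fit into LEN bytes.  */
int
catalog_path (char *buf, size_t len, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  int n = snprintf (buf, len, "%s/%s/%s/%s.mo",
                    dir_name, locale, category, domain_name);
  return n >= 0 && (size_t) n < len;
}
```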
				1824
				1825	@noindent
A small piece of example code should show how the programmer is supposed
to work:
				1828
				1829	@smallexample
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

int
main (void)
@{
  setlocale (LC_ALL, "");
  textdomain ("test-package");
  bindtextdomain ("test-package", "/usr/local/share/locale");
  puts (gettext ("Hello, world!"));
@}
				1836	@end smallexample
				1837
				1838	At the program start the default domain is @code{messages}, and the
				1839	default locale is "C". The @code{setlocale} call sets the locale
				1840	according to the user's environment variables; remember that correct
				1841	functioning of @code{gettext} relies on the correct setting of the
				1842	@code{LC_MESSAGES} locale (for looking up the message catalog) and
				1843	of the @code{LC_CTYPE} locale (for the character set conversion).
				1844	The @code{textdomain} call changes the default domain to
				1845	@code{test-package}. The @code{bindtextdomain} call specifies that
				1846	the message catalogs for the domain @code{test-package} can be found
				1847	below the directory @file{/usr/local/share/locale}.
				1848
If the user now sets the variable @code{LANGUAGE} in her/his environment
to @code{de}, the @code{gettext} function will try to use the
translations from the file
				1852
				1853	@smallexample
				1854	/usr/local/share/locale/de/LC_MESSAGES/test-package.mo
				1855	@end smallexample
				1856
				1857	From the above descriptions it should be clear which component of this
				1858	filename is determined by which source.
				1859
In the above example we assumed that the @code{LANGUAGE} environment
variable is set to @code{de}. This might be an appropriate selection but what
				1862	happens if the user wants to use @code{LC_ALL} because of the wider
				1863	usability and here the required value is @code{de_DE.ISO-8859-1}? We
				1864	already mentioned above that a situation like this is not infrequent.
				1865	E.g., a person might prefer reading a dialect and if this is not
				1866	available fall back on the standard language.
				1867
The @code{gettext} functions know about situations like this and can
handle them gracefully. The functions recognize the format of the value
of the environment variable. They can split the value into different
pieces and, by leaving out one or the other part, construct new
values. This happens of course in a predictable way. To understand
this one must know the format of the environment variable value. There
is one more or less standardized form, originally from the X/Open
specification:
				1876
				1877	@code{language[_territory[.codeset]][@@modifier]}
				1878
Less specific locale names are obtained by stripping off components in
the order of the following list:
				1881
				1882	@enumerate
				1883	@item
				1884	@code{codeset}
				1885	@item
				1886	@code{normalized codeset}
				1887	@item
				1888	@code{territory}
				1889	@item
				1890	@code{modifier}
				1891	@end enumerate
				1892
				1893	The @code{language} field will never be dropped for obvious reasons.
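Before components can be stripped, the value has to be split into its parts. The following sketch, with hypothetical names, performs only that first step; it does not generate the list of fallback names. As in the format above, everything after @samp{@@} is taken as the modifier.

```c
#include <stddef.h>
#include <string.h>

#define FIELD_LEN 32 /* Hypothetical fixed size for each part.  */

struct locale_parts
{
  char language[FIELD_LEN];
  char territory[FIELD_LEN];
  char codeset[FIELD_LEN];
  char modifier[FIELD_LEN];
};

/* Copy LEN bytes of SRC into DST as a string; 0 if it does not fit.  */
static int
copy_part (char *dst, const char *src, size_t len)
{
  if (len >= FIELD_LEN)
    return 0;
  memcpy (dst, src, len);
  dst[len] = '\0';
  return 1;
}

/* Split language[_territory[.codeset]][@modifier]; absent parts are "".  */
int
split_locale_name (const char *name, struct locale_parts *p)
{
  size_t len = strcspn (name, "_.@"); /* Language ends at _, . or @.  */
  if (!copy_part (p->language, name, len))
    return 0;
  name += len;
  p->territory[0] = p->codeset[0] = p->modifier[0] = '\0';

  if (*name == '_')
    {
      len = strcspn (++name, ".@");
      if (!copy_part (p->territory, name, len))
        return 0;
      name += len;
    }
  if (*name == '.')
    {
      len = strcspn (++name, "@");
      if (!copy_part (p->codeset, name, len))
        return 0;
      name += len;
    }
  if (*name == '@' && !copy_part (p->modifier, name + 1, strlen (name + 1)))
    return 0;
  return 1;
}
```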
				1894
				1895	The only new thing is the @code{normalized codeset} entry. This is
another goodie which is introduced to help reduce the chaos which
derives from the inability of people to standardize the names of
				1898	character sets. Instead of @w{ISO-8859-1} one can often see @w{8859-1},
				1899	@w{88591}, @w{iso8859-1}, or @w{iso_8859-1}. The @code{normalized
				1900	codeset} value is generated from the user-provided character set name by
				1901	applying the following rules:
				1902
				1903	@enumerate
				1904	@item
Remove all characters other than digits and letters.
				1906	@item
				1907	Fold letters to lowercase.
				1908	@item
If the result contains only digits, prepend the string @code{"iso"}.
				1910	@end enumerate
				1911
				1912	@noindent
So all of the above names will be normalized to @code{iso88591}. This
gives the user much more freedom in choosing the locale name.
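The three rules translate directly into C. The following is a sketch with a hypothetical function name, not the library's internal implementation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: normalize CODESET into OUT (OUTLEN bytes).  Returns 1 on
   success, 0 if OUT is too small.  */
int
normalize_codeset (const char *codeset, char *out, size_t outlen)
{
  size_t n = 0;
  int only_digits = 1;

  if (outlen == 0)
    return 0;
  /* Rule 3 depends on whether any letter survives rule 1.  */
  for (const char *s = codeset; *s != '\0'; ++s)
    if (isalpha ((unsigned char) *s))
      only_digits = 0;

  for (const char *prefix = only_digits ? "iso" : ""; *prefix != '\0'; ++prefix)
    {
      if (n + 1 >= outlen)
        return 0;
      out[n++] = *prefix;
    }
  for (const char *s = codeset; *s != '\0'; ++s)
    {
      unsigned char c = (unsigned char) *s;
      if (!isalnum (c))
        continue;                      /* Rule 1: drop other characters.  */
      if (n + 1 >= outlen)
        return 0;
      out[n++] = (char) tolower (c);   /* Rule 2: fold to lowercase.  */
    }
  out[n] = '\0';
  return 1;
}
```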
				1915
				1916	Even this extended functionality still does not help to solve the
				1917	problem that completely different names can be used to denote the same
				1918	locale (e.g., @code{de} and @code{german}). To be of help in this
				1919	situation the locale implementation and also the @code{gettext}
				1920	functions know about aliases.
				1921
				1922	The file @file{/usr/share/locale/locale.alias} (replace @file{/usr} with
				1923	whatever prefix you used for configuring the C library) contains a
				1924	mapping of alternative names to more regular names. The system manager
is free to add new entries to fit her/his own needs. The selected
				1926	locale from the environment is compared with the entries in the first
				1927	column of this file ignoring the case. If they match the value of the
				1928	second column is used instead for the further handling.
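A sketch of the comparison described above, applied to a single line of such a file. The function name is hypothetical and only the simple form @samp{alias locale-name} is handled, with @samp{#} starting a comment line; the real alias file parser may accept more.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical: if the first column of LINE equals NAME ignoring case,
   copy the second column into OUT and return 1; otherwise return 0.  */
int
match_alias_line (const char *line, const char *name, char *out,
                  size_t outlen)
{
  const char *p = line;
  size_t len;

  while (isspace ((unsigned char) *p))
    ++p;
  if (*p == '\0' || *p == '#')
    return 0;                       /* Blank or comment line.  */

  len = strcspn (p, " \t\n");       /* First column: the alias.  */
  if (len != strlen (name) || strncasecmp (p, name, len) != 0)
    return 0;
  p += len;

  p += strspn (p, " \t");           /* Second column: the locale name.  */
  len = strcspn (p, " \t\n");
  if (len == 0 || len >= outlen)
    return 0;
  memcpy (out, p, len);
  out[len] = '\0';
  return 1;
}
```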
				1929
				1930	In the description of the format of the environment variables we already
				1931	mentioned the character set as a factor in the selection of the message
				1932	catalog. In fact, only catalogs which contain text written using the
				1933	character set of the system/program can be used (directly; there will
come a solution for this some day). This means that the user always
has to take care of this. If the collection of message catalogs
contains files for the same language but coded using different
character sets, the user has to be careful.
				1938
				1939
				1940	@node Helper programs for gettext
				1941	@subsection Programs to handle message catalogs for @code{gettext}
				1942
				1943	@Theglibc{} does not contain the source code for the programs to
				1944	handle message catalogs for the @code{gettext} functions. As part of
				1945	the GNU project the GNU gettext package contains everything the
				1946	developer needs. The functionality provided by the tools in this
				1947	package by far exceeds the abilities of the @code{gencat} program
				1948	described above for the @code{catgets} functions.
				1949
				1950	There is a program @code{msgfmt} which is the equivalent program to the
				1951	@code{gencat} program. It generates from the human-readable and
				1952	-editable form of the message catalog a binary file which can be used by
				1953	the @code{gettext} functions. But there are several more programs
				1954	available.
				1955
				1956	The @code{xgettext} program can be used to automatically extract the
				1957	translatable messages from a source file. I.e., the programmer need not
				1958	take care of the translations and the list of messages which have to be
				1959	translated. S/He will simply wrap the translatable string in calls to
@code{gettext} et al.@: and the rest will be done by @code{xgettext}. This
				1961	program has a lot of options which help to customize the output or
				1962	help to understand the input better.
				1963
				1964	Other programs help to manage the development cycle when new messages appear
				1965	in the source files or when a new translation of the messages appears.
				1966	Here it should only be noted that using all the tools in GNU gettext it
				1967	is possible to @emph{completely} automate the handling of message
catalogs. Besides marking the translatable strings in the source code and
				1969	generating the translations the developers do not have anything to do
				1970	themselves.