@node I/O Overview, I/O on Streams, Pattern Matching, Top
@c %MENU% Introduction to the I/O facilities
@chapter Input/Output Overview

Most programs need to do either input (reading data) or output (writing
data), or most frequently both, in order to do anything useful. @Theglibc{}
provides such a large selection of input and output functions
that the hardest part is often deciding which function is most
appropriate!

This chapter introduces concepts and terminology relating to input
and output. Other chapters relating to the GNU I/O facilities are:

@itemize @bullet
@item
@ref{I/O on Streams}, which covers the high-level functions
that operate on streams, including formatted input and output.

@item
@ref{Low-Level I/O}, which covers the basic I/O and control
functions on file descriptors.

@item
@ref{File System Interface}, which covers functions for operating on
directories and for manipulating file attributes such as access modes
and ownership.

@item
@ref{Pipes and FIFOs}, which includes information on the basic interprocess
communication facilities.

@item
@ref{Sockets}, which covers a more complicated interprocess communication
facility with support for networking.

@item
@ref{Low-Level Terminal Interface}, which covers functions for changing
how input and output to terminals or other serial devices are processed.
@end itemize


@menu
* I/O Concepts:: Some basic information and terminology.
* File Names:: How to refer to a file.
@end menu

@node I/O Concepts, File Names, , I/O Overview
@section Input/Output Concepts

Before you can read or write the contents of a file, you must establish
a connection or communications channel to the file. This process is
called @dfn{opening} the file. You can open a file for reading, writing,
or both.
@cindex opening a file

The connection to an open file is represented either as a stream or as a
file descriptor. You pass this as an argument to the functions that do
the actual read or write operations, to tell them which file to operate
on. Certain functions expect streams, and others are designed to
operate on file descriptors.

When you have finished reading from or writing to the file, you can
terminate the connection by @dfn{closing} the file. Once you have
closed a stream or file descriptor, you cannot do any more input or
output operations on it.
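
As a minimal sketch of the open, transfer, and close sequence described
above, the following function uses the stream functions @code{fopen},
@code{getc} and @code{fclose}. The helper name @code{count_bytes} is
hypothetical and exists only for this illustration.

```c
#include <stdio.h>

/* Count the bytes in the file called NAME (a hypothetical helper).
   Return -1 if the file cannot be opened or an I/O error occurs.  */
long
count_bytes (const char *name)
{
  FILE *stream = fopen (name, "r");   /* Open: establish the connection.  */
  long count = 0;
  int failed;

  if (stream == NULL)
    return -1;

  while (getc (stream) != EOF)        /* Read through the connection.  */
    count++;

  failed = ferror (stream);           /* EOF may also mean a read error.  */
  if (fclose (stream) != 0)           /* Close: terminate the connection.  */
    failed = 1;

  return failed ? -1 : count;
}
```

After @code{fclose} returns, the value of @code{stream} must not be
passed to any further input or output function.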

@menu
* Streams and File Descriptors:: The GNU C Library provides two ways
                                 to access the contents of files.
* File Position:: The number of bytes from the
                  beginning of the file.
@end menu

@node Streams and File Descriptors, File Position, , I/O Concepts
@subsection Streams and File Descriptors

When you want to do input or output to a file, you have a choice of two
basic mechanisms for representing the connection between your program
and the file: file descriptors and streams. File descriptors are
represented as objects of type @code{int}, while streams are represented
as @code{FILE *} objects.

File descriptors provide a primitive, low-level interface to input and
output operations. Both file descriptors and streams can represent a
connection to a device (such as a terminal), or a pipe or socket for
communicating with another process, as well as a normal file. But, if
you want to do control operations that are specific to a particular kind
of device, you must use a file descriptor; there are no facilities to
use streams in this way. You must also use file descriptors if your
program needs to do input or output in special modes, such as
nonblocking (or polled) input (@pxref{File Status Flags}).

Streams provide a higher-level interface, layered on top of the
primitive file descriptor facilities. The stream interface treats all
kinds of files pretty much alike---the sole exception being the three
styles of buffering that you can choose (@pxref{Stream Buffering}).

The main advantage of using the stream interface is that the set of
functions for performing actual input and output operations (as opposed
to control operations) on streams is much richer and more powerful than
the corresponding facilities for file descriptors. The file descriptor
interface provides only simple functions for transferring blocks of
characters, but the stream interface also provides powerful formatted
input and output functions (@code{printf} and @code{scanf}) as well as
functions for character- and line-oriented input and output.
@c !!! glibc has dprintf, which lets you do printf on an fd.
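
The difference can be sketched by writing the same line of text through
each interface. With a descriptor, the program formats the text into a
buffer itself and then transfers the block with @code{write}; with a
stream, @code{fprintf} does both steps. The function names
@code{report_fd} and @code{report_stream} are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Write "value = N" and a newline to descriptor FD.
   The text has to be formatted into a buffer first, because
   write only transfers a block of bytes.
   A short write is treated as a failure in this sketch.  */
int
report_fd (int fd, int value)
{
  char buf[64];
  int len = snprintf (buf, sizeof buf, "value = %d\n", value);

  if (len < 0 || (size_t) len >= sizeof buf)
    return -1;
  return write (fd, buf, (size_t) len) == len ? 0 : -1;
}

/* Write the same line to STREAM: formatting and output in one call.  */
int
report_stream (FILE *stream, int value)
{
  return fprintf (stream, "value = %d\n", value) < 0 ? -1 : 0;
}
```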

Since streams are implemented in terms of file descriptors, you can
extract the file descriptor from a stream and perform low-level
operations directly on the file descriptor. You can also initially open
a connection as a file descriptor and then make a stream associated with
that file descriptor.
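
Both directions can be sketched with the POSIX functions @code{fileno}
(stream to descriptor) and @code{fdopen} (descriptor to stream). The
helper names below are hypothetical, and the file mode @code{0644} is
only an assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open NAME for writing as a descriptor, then associate a stream
   with that descriptor.  Return NULL on failure.  */
FILE *
open_as_stream (const char *name)
{
  int fd = open (name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  FILE *stream;

  if (fd == -1)
    return NULL;

  stream = fdopen (fd, "w");
  if (stream == NULL)
    close (fd);         /* fdopen failed, so the descriptor is still ours.  */
  return stream;
}

/* Return nonzero if the descriptor underlying STREAM is a terminal.
   isatty is a descriptor-level operation with no stream counterpart.  */
int
stream_is_terminal (FILE *stream)
{
  return isatty (fileno (stream));
}
```

Closing the stream with @code{fclose} also closes the underlying
descriptor, so the descriptor should not be closed separately.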

In general, you should stick with using streams rather than file
descriptors, unless there is some specific operation you want to do that
can only be done on a file descriptor. If you are a beginning
programmer and aren't sure what functions to use, we suggest that you
concentrate on the formatted input functions (@pxref{Formatted Input})
and formatted output functions (@pxref{Formatted Output}).

If you are concerned about portability of your programs to systems other
than GNU, you should also be aware that file descriptors are not as
portable as streams. You can expect any system that supports @w{ISO C} to
support streams, but @nongnusystems{} may not support file descriptors at
all, or may only implement a subset of the GNU functions that operate on
file descriptors. Most of the file descriptor functions in @theglibc{}
are included in the POSIX.1 standard, however.

@node File Position, , Streams and File Descriptors, I/O Concepts
@subsection File Position

One of the attributes of an open file is its @dfn{file position}, which
keeps track of where in the file the next character is to be read or
written. On @gnusystems{}, and all POSIX.1 systems, the file position
is simply an integer representing the number of bytes from the beginning
of the file.

The file position is normally set to the beginning of the file when it
is opened, and each time a character is read or written, the file
position is incremented. In other words, access to the file is normally
@dfn{sequential}.
@cindex file position
@cindex sequential-access files
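
A small sketch of sequential access: each character read advances the
file position by one, which @code{ftell} can report. The helper name
@code{position_after_reading} is hypothetical.

```c
#include <stdio.h>

/* Read up to N characters from STREAM, stopping early at end of file,
   and return the resulting file position as reported by ftell
   (-1 if the position cannot be determined).  */
long
position_after_reading (FILE *stream, int n)
{
  while (n-- > 0 && getc (stream) != EOF)
    ;                                 /* Each getc advances the position.  */
  return ftell (stream);
}
```

On a freshly opened six-byte file, reading three characters this way
leaves the file position at 3, and the next character read is the
fourth one in the file.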

Ordinary files permit read or write operations at any position within
the file. Some other kinds of files may also permit this. Files which
do permit this are sometimes referred to as @dfn{random-access} files.
You can change the file position using the @code{fseek} function on a
stream (@pxref{File Positioning}) or the @code{lseek} function on a file
descriptor (@pxref{I/O Primitives}). If you try to change the file
position on a file that doesn't support random access, you get the
@code{ESPIPE} error.
@cindex random-access files
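
The following sketch shows both calls. The first function probes
whether a descriptor supports random access by asking @code{lseek} for
the current position, which fails with @code{ESPIPE} on a pipe; the
second uses @code{fseek} to read a character at an arbitrary offset.
The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Return 1 if FD supports random access, 0 if it does not (ESPIPE),
   and -1 for any other error, such as an invalid descriptor.
   Seeking by zero bytes from SEEK_CUR does not move the position.  */
int
fd_is_seekable (int fd)
{
  if (lseek (fd, 0, SEEK_CUR) != (off_t) -1)
    return 1;
  return errno == ESPIPE ? 0 : -1;
}

/* Return the character at byte OFFSET of STREAM, or EOF if the
   position cannot be set or lies at or beyond the end of the file.  */
int
char_at (FILE *stream, long offset)
{
  if (fseek (stream, offset, SEEK_SET) != 0)
    return EOF;
  return getc (stream);
}
```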

Streams and descriptors that are opened for @dfn{append access} are
treated specially for output: output to such files is @emph{always}
appended sequentially to the @emph{end} of the file, regardless of the
file position. However, the file position is still used to control where in
the file reading is done.
@cindex append-access files
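
As an illustration, the sketch below opens a file with the
@code{O_APPEND} flag, deliberately moves the file position back to the
beginning, and then writes. The data still lands at the end of the
file. The helper name @code{append_bytes} and the mode @code{0644} are
assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Append the LEN bytes at DATA to the file called NAME, creating the
   file if necessary.  Return 0 on success and -1 on failure.  */
int
append_bytes (const char *name, const void *data, size_t len)
{
  int fd = open (name, O_WRONLY | O_APPEND | O_CREAT, 0644);
  int ok;

  if (fd == -1)
    return -1;

  /* Moving the file position does not change where output goes:
     with O_APPEND, every write is placed at the end of the file.  */
  lseek (fd, 0, SEEK_SET);

  ok = write (fd, data, len) == (ssize_t) len;
  if (close (fd) != 0)
    ok = 0;
  return ok ? 0 : -1;
}
```

Calling this function twice on the same file, first with @code{"abc"}
and then with @code{"def"}, leaves the file containing @code{abcdef}.
Without @code{O_APPEND}, the second call would overwrite the first
three bytes.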

If you think about it, you'll realize that several programs can read a
given file at the same time. In order for each program to be able to
read the file at its own pace, each program must have its own file
position, which is not affected by anything the other programs do.

In fact, each opening of a file creates a separate file position.
Thus, if you open a file twice even in the same program, you get two
streams or descriptors with independent file positions.

By contrast, if you open a descriptor and then duplicate it to get
another descriptor, these two descriptors share the same file position:
changing the file position of one descriptor will affect the other.
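
The two cases can be compared directly. In this sketch, one descriptor
is moved with @code{lseek} and the position is then read back through
a second descriptor, obtained either by a second @code{open} or by
@code{dup}. All three function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Move descriptor A to offset N, then return the file position seen
   through descriptor B.  Both descriptors are closed before returning.
   Return -1 if either descriptor is invalid or a seek fails.  */
static off_t
move_one_report_other (int a, int b, off_t n)
{
  off_t pos = -1;

  if (a != -1 && b != -1 && lseek (a, n, SEEK_SET) != (off_t) -1)
    pos = lseek (b, 0, SEEK_CUR);
  if (a != -1)
    close (a);
  if (b != -1)
    close (b);
  return pos;
}

/* Two separate openings: the second position stays where it was.  */
off_t
position_after_second_open (const char *name, off_t n)
{
  int a = open (name, O_RDONLY);
  int b = open (name, O_RDONLY);
  return move_one_report_other (a, b, n);
}

/* A duplicated descriptor: the position is shared with the original.  */
off_t
position_after_dup (const char *name, off_t n)
{
  int a = open (name, O_RDONLY);
  int b = a == -1 ? -1 : dup (a);
  return move_one_report_other (a, b, n);
}
```

For an existing file and an offset of 3, the first function returns 0
and the second returns 3.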

@node File Names, , I/O Concepts, I/O Overview
@section File Names

In order to open a connection to a file, or to perform other operations
such as deleting a file, you need some way to refer to the file. Nearly
all files have names that are strings---even files which are actually
devices such as tape drives or terminals. These strings are called
@dfn{file names}. You specify the file name to say which file you want
to open or operate on.

This section describes the conventions for file names and how the
operating system works with them.
@cindex file name

@menu
* Directories:: Directories contain entries for files.
* File Name Resolution:: A file name specifies how to look up a file.
* File Name Errors:: Error conditions relating to file names.
* File Name Portability:: File name portability and syntax issues.
@end menu


@node Directories, File Name Resolution, , File Names
@subsection Directories

In order to understand the syntax of file names, you need to understand
how the file system is organized into a hierarchy of directories.

@cindex directory
@cindex link
@cindex directory entry
A @dfn{directory} is a file that contains information to associate other
files with names; these associations are called @dfn{links} or
@dfn{directory entries}. Sometimes, people speak of ``files in a
directory'', but in reality, a directory only contains pointers to
files, not the files themselves.

@cindex file name component
The name of a file contained in a directory entry is called a @dfn{file
name component}. In general, a file name consists of a sequence of one
or more such components, separated by the slash character (@samp{/}). A
file name which is just one component names a file with respect to its
directory. A file name with multiple components names a directory, and
then a file in that directory, and so on.

Some other documents, such as the POSIX standard, use the term
@dfn{pathname} for what we call a file name, and either @dfn{filename}
or @dfn{pathname component} for what this manual calls a file name
component. We don't use this terminology because a ``path'' is
something completely different (a list of directories to search), and we
think that ``pathname'' used for something else will confuse users. We
always use ``file name'' and ``file name component'' (or sometimes just
``component'', where the context is obvious) in GNU documentation. Some
macros use the POSIX terminology in their names, such as
@code{PATH_MAX}. These macros are defined by the POSIX standard, so we
cannot change their names.

You can find more detailed information about operations on directories
in @ref{File System Interface}.

@node File Name Resolution, File Name Errors, Directories, File Names
@subsection File Name Resolution

A file name consists of file name components separated by slash
(@samp{/}) characters. On the systems that @theglibc{} supports,
multiple successive @samp{/} characters are equivalent to a single
@samp{/} character.

@cindex file name resolution
The process of determining what file a file name refers to is called
@dfn{file name resolution}. This is performed by examining the
components that make up a file name in left-to-right order, and locating
each successive component in the directory named by the previous
component. Of course, each of the files that are referenced as
directories must actually exist, be directories instead of regular
files, and have the appropriate permissions to be accessible by the
process; otherwise the file name resolution fails.

@cindex root directory
@cindex absolute file name
If a file name begins with a @samp{/}, the first component in the file
name is located in the @dfn{root directory} of the process (usually all
processes on the system have the same root directory). Such a file name
is called an @dfn{absolute file name}.
@c !!! xref here to chroot, if we ever document chroot. -rm

@cindex relative file name
Otherwise, the first component in the file name is located in the
current working directory (@pxref{Working Directory}). This kind of
file name is called a @dfn{relative file name}.

@cindex parent directory
The file name components @file{.} (``dot'') and @file{..} (``dot-dot'')
have special meanings. Every directory has entries for these file name
components. The file name component @file{.} refers to the directory
itself, while the file name component @file{..} refers to its
@dfn{parent directory} (the directory that contains the link for the
directory in question). As a special case, @file{..} in the root
directory refers to the root directory itself, since it has no parent;
thus @file{/..} is the same as @file{/}.

Here are some examples of file names:

@table @file
@item /a
The file named @file{a}, in the root directory.

@item /a/b
The file named @file{b}, in the directory named @file{a} in the root directory.

@item a
The file named @file{a}, in the current working directory.

@item /a/./b
This is the same as @file{/a/b}.

@item ./a
The file named @file{a}, in the current working directory.

@item ../a
The file named @file{a}, in the parent directory of the current working
directory.
@end table

@c An empty string may ``work'', but I think it's confusing to
@c try to describe it. It's not a useful thing for users to use--rms.
A file name that names a directory may optionally end in a @samp{/}.
You can specify a file name of @file{/} to refer to the root directory,
but the empty string is not a meaningful file name. If you want to
refer to the current working directory, use a file name of @file{.} or
@file{./}.
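
One way to check that two file names resolve to the same file is to
compare the device and file serial numbers reported by @code{stat}.
The sketch below assumes a POSIX system. The helper name
@code{same_file} is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Return 1 if the file names A and B resolve to the same file,
   0 if they resolve to different files, and -1 if either name
   cannot be resolved.  Two names refer to the same file when both
   the device number and the file serial number match.  */
int
same_file (const char *a, const char *b)
{
  struct stat sa, sb;

  if (stat (a, &sa) != 0 || stat (b, &sb) != 0)
    return -1;
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With this function, @code{same_file ("/", "/..")} and
@code{same_file (".", "./")} both report that the names are equivalent,
as do names that differ only in repeated @samp{/} characters between
components, such as @file{.} and @file{.//.}.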

Unlike some other operating systems, @gnusystems{} don't have any
built-in support for file types (or extensions) or file versions as part
of their file name syntax. Many programs and utilities use conventions
for file names---for example, files containing C source code usually
have names suffixed with @samp{.c}---but there is nothing in the file
system itself that enforces this kind of convention.

@node File Name Errors, File Name Portability, File Name Resolution, File Names
@subsection File Name Errors

@cindex file name errors
@cindex usual file name errors

Functions that accept file name arguments usually detect these
@code{errno} error conditions relating to the file name syntax or
trouble finding the named file. These errors are referred to throughout
this manual as the @dfn{usual file name errors}.

@table @code
@item EACCES
The process does not have search permission for a directory component
of the file name.

@item ENAMETOOLONG
This error is used when either the total length of a file name is
greater than @code{PATH_MAX}, or an individual file name component
has a length greater than @code{NAME_MAX}. @xref{Limits for Files}.

On @gnuhurdsystems{}, there is no imposed limit on overall file name
length, but some file systems may place limits on the length of a
component.

@item ENOENT
This error is reported when a file referenced as a directory component
in the file name doesn't exist, or when a component is a symbolic link
whose target file does not exist. @xref{Symbolic Links}.

@item ENOTDIR
A file that is referenced as a directory component in the file name
exists, but it isn't a directory.

@item ELOOP
Too many symbolic links were resolved while trying to look up the file
name. The system has an arbitrary limit on the number of symbolic links
that may be resolved in looking up a single file name, as a primitive
way to detect loops. @xref{Symbolic Links}.
@end table
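
As a rough sketch of how a program observes these errors, the function
below asks @code{stat} to resolve a file name and returns the resulting
@code{errno} value. The helper name @code{file_name_error} is
hypothetical; any function that takes a file name argument could be
used in place of @code{stat}.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/* Try to resolve NAME.  Return 0 if resolution succeeds; otherwise
   return the errno value that stat reported, for example ENOENT,
   ENOTDIR, EACCES, ELOOP or ENAMETOOLONG.  */
int
file_name_error (const char *name)
{
  struct stat st;

  return stat (name, &st) == 0 ? 0 : errno;
}
```

For example, a name whose leading directory does not exist yields
@code{ENOENT}, while a name that uses a regular file as a directory
component yields @code{ENOTDIR}.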


@node File Name Portability, , File Name Errors, File Names
@subsection Portability of File Names

The rules for the syntax of file names discussed in @ref{File Names},
are the rules normally used by @gnusystems{} and by other POSIX
systems. However, other operating systems may use other conventions.

There are two reasons why it can be important for you to be aware of
file name portability issues:

@itemize @bullet
@item
If your program makes assumptions about file name syntax, or contains
embedded literal file name strings, it is more difficult to get it to
run under other operating systems that use different syntax conventions.

@item
Even if you are not concerned about running your program on machines
that run other operating systems, it may still be possible to access
files that use different naming conventions. For example, you may be
able to access file systems on another computer running a different
operating system over a network, or read and write disks in formats used
by other operating systems.
@end itemize

The @w{ISO C} standard says very little about file name syntax, only that
file names are strings. In addition to varying restrictions on the
length of file names and what characters can validly appear in a file
name, different operating systems use different conventions and syntax
for concepts such as structured directories and file types or
extensions. Some concepts such as file versions might be supported by
some operating systems and not by others.

The POSIX.1 standard allows implementations to put additional
restrictions on file name syntax, concerning which characters are
permitted in file names and the length of file name and file name
component strings. However, on @gnusystems{}, any character except
the null character is permitted in a file name string, and
on @gnuhurdsystems{} there are no limits on the length of file name
strings.
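
Because the length restrictions vary between systems and even between
file systems, a portable program can ask for the limit at run time
rather than assume one. The sketch below uses the POSIX function
@code{pathconf} with @code{_PC_NAME_MAX}. The helper name
@code{max_component_length} and its return convention are assumptions
made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Return the maximum length of a file name component in directory DIR.
   Return -1 if the system imposes no definite limit there, and -2 if
   the limit could not be determined because of an error.
   pathconf returns -1 in both of those cases; they are told apart by
   checking whether errno was changed.  */
long
max_component_length (const char *dir)
{
  long limit;

  errno = 0;
  limit = pathconf (dir, _PC_NAME_MAX);
  if (limit == -1 && errno != 0)
    return -2;
  return limit;
}
```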