Blame - src/kernel/linux/v4.19/Documentation/filesystems/autofs.txt - T800

xj	b04a402	2021-11-25 15:01:52 +0800	[diff] [blame]	1	<head>
				2	<style> p { max-width:50em} ol, ul {max-width: 40em}</style>
				3	</head>
				4
				5	autofs - how it works
				6	=====================
				7
				8	Purpose
				9	-------
				10
				11	The goal of autofs is to provide on-demand mounting and race-free
				12	automatic unmounting of various other filesystems. This provides two
				13	key advantages:
				14
				15	1. There is no need to delay boot until all filesystems that
				16	might be needed are mounted. Processes that try to access those
				17	slow filesystems might be delayed but other processes can
				18	continue freely. This is particularly important for
				19	network filesystems (e.g. NFS) or filesystems stored on
				20	media with a media-changing robot.
				21
				22	2. The names and locations of filesystems can be stored in
				23	a remote database and can change at any time. The content
				24	in that database at the time of access will be used to provide
				25	a target for the access. The interpretation of names in the
				26	filesystem can even be programmatic rather than database-backed,
				27	allowing wildcards for example, and can vary based on the user who
				28	first accessed a name.
				29
				30	Context
				31	-------
				32
				33	The "autofs" filesystem module is only one part of an autofs system.
				34	There also needs to be a user-space program which looks up names
				35	and mounts filesystems. This will often be the "automount" program,
				36	though other tools including "systemd" can make use of "autofs".
				37	This document describes only the kernel module and the interactions
				38	required with any user-space program. Subsequent text refers to this
				39	as the "automount daemon" or simply "the daemon".
				40
				41	"autofs" is a Linux kernel module which provides the "autofs"
				42	filesystem type. Several "autofs" filesystems can be mounted and they
				43	can each be managed separately, or all managed by the same daemon.
				44
				45	Content
				46	-------
				47
				48	An autofs filesystem can contain 3 sorts of objects: directories,
				49	symbolic links and mount traps. Mount traps are directories with
				50	extra properties as described in the next section.
				51
				52	Objects can only be created by the automount daemon: symlinks are
				53	created with a regular `symlink` system call, while directories and
				54	mount traps are created with `mkdir`. The determination of whether a
				55	directory should be a mount trap or not is quite _ad hoc_, largely for
				56	historical reasons, and is determined in part by the
				57	direct/indirect/offset mount options, and the maxproto mount option.
				58
				59	If neither the direct nor the offset mount option is given (so the
				60	mount is considered to be indirect), then the root directory is
				61	always a regular directory, otherwise it is a mount trap when it is
				62	empty and a regular directory when not empty. Note that direct and
				63	offset are treated identically so a concise summary is that the root
				64	directory is a mount trap only if the filesystem is mounted direct
				65	and the root is empty.
				66
				67	Directories created in the root directory are mount traps only if the
				68	filesystem is mounted indirect and they are empty.
				69
				70	Directories further down the tree depend on the maxproto mount
				71	option and particularly whether it is less than five or not.
				72	When maxproto is five, no directories further down the
				73	tree are ever mount traps; they are always regular directories. When
				74	the maxproto is four (or three), these directories are mount traps
				75	precisely when they are empty.
				76
				77	So: non-empty (i.e. non-leaf) directories are never mount traps. Empty
				78	directories are sometimes mount traps, and sometimes not depending on
				79	where in the tree they are (root, top level, or lower), the maxproto,
				80	and whether the mount was indirect or not.
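
The rules above can be condensed into a single predicate. The following is only a sketch of the summary in this section, not kernel code: the `model_` names, the enum, and the use of a numeric depth (0 for the root, 1 for a directory created in the root, 2 or more for anything lower) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: neither the enum nor the function is kernel API. */
enum model_mount_type { MODEL_INDIRECT, MODEL_DIRECT, MODEL_OFFSET };

/*
 * depth 0 is the root of the autofs filesystem, depth 1 is a directory
 * created in the root, depth 2 or more is anything further down the tree.
 */
static bool model_is_mount_trap(enum model_mount_type type, int depth,
                                bool empty, int maxproto)
{
    assert(depth >= 0);

    if (!empty)
        return false;                   /* non-empty directories are never traps */
    if (depth == 0)
        return type != MODEL_INDIRECT;  /* direct and offset are treated alike */
    if (depth == 1)
        return type == MODEL_INDIRECT;
    return maxproto < 5;                /* lower levels: only with maxproto 3 or 4 */
}
```

For example, `model_is_mount_trap(MODEL_INDIRECT, 1, true, 5)` is true, while the same empty directory one level further down is a trap only when `maxproto` is less than five.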
				81
				82	Mount Traps
				83	-----------
				84
				85	A core element of the implementation of autofs is the mount trap,
				86	a facility provided by the Linux VFS. Any directory provided by a
				87	filesystem can be designated as a trap. This involves two separate
				88	features that work together to allow autofs to do its job.
				89
				90	DCACHE_NEED_AUTOMOUNT
				91
				92	If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if
				93	the inode has S_AUTOMOUNT set, or can be set directly) then it is
				94	(potentially) a mount trap. Any access to this directory beyond a
				95	"`stat`" will (normally) cause the `d_op->d_automount()` dentry operation
				96	to be called. The task of this method is to find the filesystem that
				97	should be mounted on the directory and to return it. The VFS is
				98	responsible for actually mounting the root of this filesystem on the
				99	directory.
				100
				101	autofs doesn't find the filesystem itself but sends a message to the
				102	automount daemon asking it to find and mount the filesystem. The
				103	autofs `d_automount` method then waits for the daemon to report that
				104	everything is ready. It will then return "`NULL`" indicating that the
				105	mount has already happened. The VFS doesn't try to mount anything but
				106	follows down the mount that is already there.
				107
				108	This functionality is sufficient for some users of mount traps such
				109	as NFS which creates traps so that mountpoints on the server can be
				110	reflected on the client. However it is not sufficient for autofs. As
				111	mounting onto a directory is considered to be "beyond a `stat`", the
				112	automount daemon would not be able to mount a filesystem on the 'trap'
				113	directory without some way to avoid getting caught in the trap. For
				114	that purpose there is another flag.
				115
				116	DCACHE_MANAGE_TRANSIT
				117
				118	If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but
				119	related behaviors are invoked, both using the `d_op->d_manage()`
				120	dentry operation.
				121
				122	Firstly, before checking to see if any filesystem is mounted on the
				123	directory, d_manage() will be called with the `rcu_walk` parameter set
				124	to `false`. It may return one of three things:
				125
				126	- A return value of zero indicates that there is nothing special
				127	about this dentry and normal checks for mounts and automounts
				128	should proceed.
				129
				130	autofs normally returns zero, but first waits for any
				131	expiry (automatic unmounting of the mounted filesystem) to
				132	complete. This avoids races.
				133
				134	- A return value of `-EISDIR` tells the VFS to ignore any mounts
				135	on the directory and to not consider calling `->d_automount()`.
				136	This effectively disables the DCACHE_NEED_AUTOMOUNT flag
				137	causing the directory not to be a mount trap after all.
				138
				139	autofs returns this if it detects that the process performing the
				140	lookup is the automount daemon and that the mount has been
				141	requested but has not yet completed. How it determines this is
				142	discussed later. This allows the automount daemon not to get
				143	caught in the mount trap.
				144
				145	There is a subtlety here. It is possible that a second autofs
				146	filesystem can be mounted below the first and for both of them to
				147	be managed by the same daemon. For the daemon to be able to mount
				148	something on the second it must be able to "walk" down past the
				149	first. This means that d_manage cannot always return -EISDIR for
				150	the automount daemon. It must only return it when a mount has
				151	been requested, but has not yet completed.
				152
				153	`d_manage` also returns `-EISDIR` if the dentry shouldn't be a
				154	mount trap, either because it is a symbolic link or because it is
				155	not empty.
				156
				157	- Any other negative value is treated as an error and returned
				158	to the caller.
				159
				160	autofs can return
				161
				162	- -ENOENT if the automount daemon failed to mount anything,
				163	- -ENOMEM if it ran out of memory,
				164	- -EINTR if a signal arrived while waiting for expiry to
				165	complete
				166	- or any other error sent down by the automount daemon.
				167
				168
				169	The second use case only occurs during an "RCU-walk" and so `rcu_walk`
				170	will be set.
				171
				172	An RCU-walk is a fast and lightweight process for walking down a
				173	filename path (i.e. it is like running on tip-toes). RCU-walk cannot
				174	cope with all situations so when it finds a difficulty it falls back
				175	to "REF-walk", which is slower but more robust.
				176
				177	RCU-walk will never call `->d_automount`; the filesystems must already
				178	be mounted or RCU-walk cannot handle the path.
				179	To determine if a mount-trap is safe for RCU-walk mode it calls
				180	`->d_manage()` with `rcu_walk` set to `true`.
				181
				182	In this case `d_manage()` must avoid blocking and should avoid taking
				183	spinlocks if at all possible. Its sole purpose is to determine if it
				184	would be safe to follow down into any mounted directory and the only
				185	reason that it might not be is if an expiry of the mount is
				186	underway.
				187
				188	In the `rcu_walk` case, `d_manage()` cannot return -EISDIR to tell the
				189	VFS that this is a directory that doesn't require d_automount. If
				190	RCU-walk sees a dentry with DCACHE_NEED_AUTOMOUNT set but nothing
				191	mounted, it will fall back to REF-walk. `d_manage()` cannot make the
				192	VFS remain in RCU-walk mode, but can only tell it to get out of
				193	RCU-walk mode by returning `-ECHILD`.
				194
				195	So `d_manage()`, when called with `rcu_walk` set, should return
				196	`-ECHILD` if there is any reason to believe it is unsafe to enter the
				197	mounted filesystem, and should otherwise return 0.
				198
				199	autofs will return `-ECHILD` if an expiry of the filesystem has been
				200	initiated or is being considered, otherwise it returns 0.
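
The return values described in this section can be summarised as a small decision function. This is a user-space model under stated assumptions, not the autofs implementation: `struct model_dentry`, its fields and `model_d_manage()` are hypothetical, the order of the REF-walk checks is a guess, and the blocking wait for expiry is reduced to a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of what autofs knows about one dentry. */
struct model_dentry {
    bool is_symlink;
    bool is_empty;
    bool expiring;       /* an expiry has been initiated or is being considered */
    bool mount_pending;  /* a mount was requested and has not yet completed */
};

static int model_d_manage(const struct model_dentry *d, bool caller_is_daemon,
                          bool rcu_walk)
{
    assert(d != NULL);

    /* RCU-walk: must not block and cannot answer -EISDIR. */
    if (rcu_walk)
        return d->expiring ? -ECHILD : 0;

    /* REF-walk: the daemon must not be caught by its own pending mount. */
    if (caller_is_daemon && d->mount_pending)
        return -EISDIR;

    /* Symlinks and non-empty directories are not mount traps. */
    if (d->is_symlink || !d->is_empty)
        return -EISDIR;

    /* The real code would first wait here for any expiry to complete. */
    return 0;
}
```

The model leaves out the error returns (`-ENOENT`, `-ENOMEM`, `-EINTR`), which depend on the outcome of waiting for the daemon.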
				201
				202
				203	Mountpoint expiry
				204	-----------------
				205
				206	The VFS has a mechanism for automatically expiring unused mounts,
				207	much as it can expire any unused dentry information from the dcache.
				208	This is guided by the MNT_SHRINKABLE flag. This only applies to
				209	mounts that were created by `d_automount()` returning a filesystem to be
				210	mounted. As autofs doesn't return such a filesystem but leaves the
				211	mounting to the automount daemon, it must involve the automount daemon
				212	in unmounting as well. This also means that autofs has more control
				213	of expiry.
				214
				215	The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to
				216	the `umount` system call. Unmounting with MNT_EXPIRE will fail unless
				217	a previous attempt had been made, and the filesystem has been inactive
				218	and untouched since that previous attempt. autofs does not depend on
				219	this but has its own internal tracking of whether filesystems were
				220	recently used. This allows individual names in the autofs directory
				221	to expire separately.
				222
				223	With version 4 of the protocol, the automount daemon can try to
				224	unmount any filesystems mounted on the autofs filesystem or remove any
				225	symbolic links or empty directories any time it likes. If the unmount
				226	or removal is successful the filesystem will be returned to the state
				227	it was before the mount or creation, so that any access of the name
				228	will trigger normal auto-mount processing. In particular, `rmdir` and
				229	`unlink` do not leave negative entries in the dcache as a normal
				230	filesystem would, so an attempt to access a recently-removed object is
				231	passed to autofs for handling.
				232
				233	With version 5, this is not safe except for unmounting from top-level
				234	directories. As lower-level directories are never mount traps, other
				235	processes will see an empty directory as soon as the filesystem is
				236	unmounted. So it is generally safest to use the autofs expiry
				237	protocol described below.
				238
				239	Normally the daemon only wants to remove entries which haven't been
				240	used for a while. For this purpose autofs maintains a "`last_used`"
				241	time stamp on each directory or symlink. For symlinks it genuinely
				242	does record the last time the symlink was "used" or followed to find
				243	out where it points to. For directories the field is a slight
				244	misnomer. It actually records the last time that autofs checked if
				245	the directory or one of its descendants was busy and found that it
				246	was. This is just as useful and doesn't require updating the field so
				247	often.
				248
				249	The daemon is able to ask autofs if anything is due to be expired,
				250	using an `ioctl` as discussed later. For a direct mount, autofs
				251	considers if the entire mount-tree can be unmounted or not. For an
				252	indirect mount, autofs considers each of the names in the top level
				253	directory to determine if any of those can be unmounted and cleaned
				254	up.
				255
				256	There is an option with indirect mounts to consider each of the leaves
				257	that has been mounted on instead of considering the top-level names.
				258	This is intended for compatibility with version 4 of autofs and should
				259	be considered as deprecated.
				260
				261	When autofs considers a directory it checks the `last_used` time and
				262	compares it with the "timeout" value set when the filesystem was
				263	mounted, though this check is ignored in some cases. It also checks if
				264	the directory or anything below it is in use. For symbolic links,
				265	only the `last_used` time is ever considered.
				266
				267	If both appear to support expiring the directory or symlink, an action
				268	is taken.
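
The two checks can be pictured as follows. This is a sketch only: `struct model_entry` and `model_can_expire()` are hypothetical, times are plain seconds, the comparison uses "at least `timeout` seconds idle", and treating `ignore_last_used` as applying to symlinks too is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_entry {
    bool is_symlink;
    bool busy;                /* the directory, or something below it, is in use */
    unsigned long last_used;  /* seconds, on the same clock as `now` */
};

static bool model_can_expire(const struct model_entry *e, unsigned long now,
                             unsigned long timeout, bool ignore_last_used)
{
    bool idle_long_enough;

    assert(e != NULL && now >= e->last_used);

    idle_long_enough = ignore_last_used || (now - e->last_used >= timeout);

    if (e->is_symlink)
        return idle_long_enough;          /* only last_used is considered */
    return idle_long_enough && !e->busy;  /* both checks must agree */
}
```

With a 60 second timeout, a directory last used at time 100 is not a candidate at time 150, becomes one at time 160 if nothing below it is busy, and is a candidate at once when the `last_used` check is ignored.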
				269
				270	There are two ways to ask autofs to consider expiry. The first is to
				271	use the AUTOFS_IOC_EXPIRE ioctl. This only works for indirect
				272	mounts. If it finds something in the root directory to expire it will
				273	return the name of that thing. Once a name has been returned the
				274	automount daemon needs to unmount any filesystems mounted below the
				275	name normally. As described above, this is unsafe for non-toplevel
				276	mounts in a version-5 autofs. For this reason the current `automountd`
				277	does not use this ioctl.
				278
				279	The second mechanism uses either the AUTOFS_DEV_IOCTL_EXPIRE_CMD or
				280	the AUTOFS_IOC_EXPIRE_MULTI ioctl. This will work for both direct and
				281	indirect mounts. If it selects an object to expire, it will notify
				282	the daemon using the notification mechanism described below. This
				283	will block until the daemon acknowledges the expiry notification.
				284	This implies that the "`EXPIRE`" ioctl must be sent from a different
				285	thread than the one which handles notification.
				286
				287	While the ioctl is blocking, the entry is marked as "expiring" and
				288	`d_manage` will block until the daemon affirms that the unmount has
				289	completed (together with removing any directories that might have been
				290	necessary), or has been aborted.
				291
				292	Communicating with autofs: detecting the daemon
				293	-----------------------------------------------
				294
				295	There are several forms of communication between the automount daemon
				296	and the filesystem. As we have already seen, the daemon can create and
				297	remove directories and symlinks using normal filesystem operations.
				298	autofs knows whether a process requesting some operation is the daemon
				299	or not based on its process-group id number (see getpgid(2)).
				300
				301	When an autofs filesystem is mounted the pgid of the mounting
				302	process is recorded unless the "pgrp=" option is given, in which
				303	case that number is recorded instead. Any request arriving from a
				304	process in that process group is considered to come from the daemon.
				305	If the daemon ever has to be stopped and restarted a new pgid can be
				306	provided through an ioctl as will be described below.
				307
				308	Communicating with autofs: the event pipe
				309	-----------------------------------------
				310
				311	When an autofs filesystem is mounted, the 'write' end of a pipe must
				312	be passed using the 'fd=' mount option. autofs will write
				313	notification messages to this pipe for the daemon to respond to.
				314	For version 5, the format of the message is:
				315
				316	struct autofs_v5_packet {
				317	int proto_version; /* Protocol version */
				318	int type; /* Type of packet */
				319	autofs_wqt_t wait_queue_token;
				320	__u32 dev;
				321	__u64 ino;
				322	__u32 uid;
				323	__u32 gid;
				324	__u32 pid;
				325	__u32 tgid;
				326	__u32 len;
				327	char name[NAME_MAX+1];
				328	};
				329
				330	where the type is one of
				331
				332	autofs_ptype_missing_indirect
				333	autofs_ptype_expire_indirect
				334	autofs_ptype_missing_direct
				335	autofs_ptype_expire_direct
				336
				337	so messages can indicate that a name is missing (something tried to
				338	access it but it isn't there) or that it has been selected for expiry.
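
A daemon reading these packets typically starts by sorting them into "mount something" and "expire something". The sketch below shows that dispatch under stated assumptions: the `MODEL_` names and their numeric values are placeholders for illustration, and real code should use the `autofs_ptype_*` constants from `<linux/auto_fs.h>`.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values, for illustration only. */
enum model_ptype {
    MODEL_PTYPE_MISSING_INDIRECT = 3,
    MODEL_PTYPE_EXPIRE_INDIRECT,
    MODEL_PTYPE_MISSING_DIRECT,
    MODEL_PTYPE_EXPIRE_DIRECT
};

enum model_action { MODEL_DO_MOUNT, MODEL_DO_EXPIRE, MODEL_UNKNOWN };

/* Map the `type` field of a packet to what the daemon has to do. */
static enum model_action model_action_for(int type)
{
    switch (type) {
    case MODEL_PTYPE_MISSING_INDIRECT:
    case MODEL_PTYPE_MISSING_DIRECT:
        return MODEL_DO_MOUNT;   /* a name was accessed but is not there */
    case MODEL_PTYPE_EXPIRE_INDIRECT:
    case MODEL_PTYPE_EXPIRE_DIRECT:
        return MODEL_DO_EXPIRE;  /* a name was selected for expiry */
    default:
        return MODEL_UNKNOWN;
    }
}

/* Only meaningful for the four known packet types. */
static bool model_is_direct(int type)
{
    assert(model_action_for(type) != MODEL_UNKNOWN);
    return type == MODEL_PTYPE_MISSING_DIRECT ||
           type == MODEL_PTYPE_EXPIRE_DIRECT;
}
```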
				339
				340	The pipe will be set to "packet mode" (equivalent to passing
				341	`O_DIRECT` to _pipe2(2)_) so that a read from the pipe will return at
				342	most one packet, and any unread portion of a packet will be discarded.
				343
				344	The `wait_queue_token` is a unique number which can identify a
				345	particular request to be acknowledged. When a message is sent over
				346	the pipe the affected dentry is marked as either "active" or
				347	"expiring" and other accesses to it block until the message is
				348	acknowledged using one of the ioctls below and the relevant
				349	`wait_queue_token`.
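
The `fd=` and `pgrp=` options mentioned above are passed to the kernel as part of an ordinary mount option string. As a rough illustration, a daemon might assemble that string as shown here. The helper name and its parameters are hypothetical, and only options named in this document are used; the result would be handed to mount(2) as the data argument, which is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the option string for mounting an autofs filesystem.
 * `type` is expected to be "direct", "indirect" or "offset".
 * Returns the length of the string, or -1 if `buf` is too small.
 */
static int model_autofs_options(char *buf, size_t size, int pipe_write_fd,
                                int pgrp, int maxproto, const char *type)
{
    int n;

    assert(buf != NULL && type != NULL);

    n = snprintf(buf, size, "fd=%d,pgrp=%d,maxproto=%d,%s",
                 pipe_write_fd, pgrp, maxproto, type);
    if (n < 0 || (size_t)n >= size)
        return -1;  /* output error or truncated */
    return n;
}
```

A daemon would typically create the pipe first, pass the descriptor of the write end as `pipe_write_fd`, and keep the read end to receive notifications.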
				350
				351	Communicating with autofs: root directory ioctls
				352	------------------------------------------------
				353
				354	The root directory of an autofs filesystem will respond to a number of
				355	ioctls. The process issuing the ioctl must have the CAP_SYS_ADMIN
				356	capability, or must be the automount daemon.
				357
				358	The available ioctl commands are:
				359
				360	- AUTOFS_IOC_READY: a notification has been handled. The argument
				361	to the ioctl command is the "wait_queue_token" number
				362	corresponding to the notification being acknowledged.
				363	- AUTOFS_IOC_FAIL: similar to above, but indicates failure with
				364	the error code `ENOENT`.
				365	- AUTOFS_IOC_CATATONIC: Causes the autofs to enter "catatonic"
				366	mode meaning that it stops sending notifications to the daemon.
				367	This mode is also entered if a write to the pipe fails.
				368	- AUTOFS_IOC_PROTOVER: This returns the protocol version in use.
				369	- AUTOFS_IOC_PROTOSUBVER: Returns the protocol sub-version which
				370	is really a version number for the implementation. It is
				371	currently 2.
				372	- AUTOFS_IOC_SETTIMEOUT: This passes a pointer to an unsigned
				373	long. The value is used to set the timeout for expiry, and
				374	the current timeout value is stored back through the pointer.
				375	- AUTOFS_IOC_ASKUMOUNT: Returns, in the pointed-to `int`, 1 if
				376	the filesystem could be unmounted. This is only a hint as
				377	the situation could change at any instant. This call can be
				378	used to avoid a more expensive full unmount attempt.
				379	- AUTOFS_IOC_EXPIRE: as described above, this asks if there is
				380	anything suitable to expire. A pointer to a packet:
				381
				382	struct autofs_packet_expire_multi {
				383	int proto_version; /* Protocol version */
				384	int type; /* Type of packet */
				385	autofs_wqt_t wait_queue_token;
				386	int len;
				387	char name[NAME_MAX+1];
				388	};
				389
				390	is required. This is filled in with the name of something
				391	that can be unmounted or removed. If nothing can be expired,
				392	`errno` is set to `EAGAIN`. Even though a `wait_queue_token`
				393	is present in the structure, no "wait queue" is established
				394	and no acknowledgment is needed.
				395	- AUTOFS_IOC_EXPIRE_MULTI: This is similar to
				396	AUTOFS_IOC_EXPIRE except that it causes notification to be
				397	sent to the daemon, and it blocks until the daemon acknowledges.
				398	The argument is an integer which can contain two different flags.
				399
				400	AUTOFS_EXP_IMMEDIATE causes `last_used` time to be ignored
				401	and objects are expired if they are not in use.
				402
				403	AUTOFS_EXP_LEAVES will select a leaf rather than a top-level
				404	name to expire. This is only safe when maxproto is 4.
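
The integer argument to AUTOFS_IOC_EXPIRE_MULTI is a bit mask, so a daemon can combine the two flags. The sketch below builds such a mask and drops the leaf flag unless maxproto is 4, following the safety note above. The `MODEL_` bit values are placeholders rather than values taken from a kernel header, and the helper itself is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values, for illustration only. */
#define MODEL_EXP_IMMEDIATE 1u
#define MODEL_EXP_LEAVES    2u

static unsigned int model_expire_flags(bool immediate, bool leaves,
                                       int maxproto)
{
    unsigned int how = 0;

    assert(maxproto >= 3 && maxproto <= 5);

    if (immediate)
        how |= MODEL_EXP_IMMEDIATE;  /* ignore last_used */
    if (leaves && maxproto == 4)
        how |= MODEL_EXP_LEAVES;     /* select a leaf; only safe with maxproto 4 */
    return how;
}
```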
				405
				406	Communicating with autofs: char-device ioctls
				407	---------------------------------------------
				408
				409	It is not always possible to open the root of an autofs filesystem,
				410	particularly a direct mounted filesystem. If the automount daemon
				411	is restarted there is no way for it to regain control of existing
				412	mounts using any of the above communication channels. To address this
				413	need there is a "miscellaneous" character device (major 10, minor 235)
				414	which can be used to communicate directly with the autofs filesystem.
				415	It requires CAP_SYS_ADMIN for access.
				416
				417	The `ioctl`s that can be used on this device are described in a separate
				418	document `autofs-mount-control.txt`, and are summarized briefly here.
				419	Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure:
				420
				421	struct autofs_dev_ioctl {
				422	__u32 ver_major;
				423	__u32 ver_minor;
				424	__u32 size; /* total size of data passed in
				425	* including this struct */
				426	__s32 ioctlfd; /* automount command fd */
				427
				428	/* Command parameters */
				429	union {
				430	struct args_protover protover;
				431	struct args_protosubver protosubver;
				432	struct args_openmount openmount;
				433	struct args_ready ready;
				434	struct args_fail fail;
				435	struct args_setpipefd setpipefd;
				436	struct args_timeout timeout;
				437	struct args_requester requester;
				438	struct args_expire expire;
				439	struct args_askumount askumount;
				440	struct args_ismountpoint ismountpoint;
				441	};
				442
				443	char path[0];
				444	};
				445
				446	For the OPEN_MOUNT and IS_MOUNTPOINT commands, the target
				447	filesystem is identified by the `path`. All other commands identify
				448	the filesystem by the `ioctlfd` which is a file descriptor open on the
				449	root, and which can be returned by OPEN_MOUNT.
				450
				451	The `ver_major` and `ver_minor` are in/out parameters which check that
				452	the requested version is supported, and report the maximum version
				453	that the kernel module can support.
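
Because `path` is a trailing array, the structure has to be allocated with room for the path and `size` must cover both. The following sketch shows that bookkeeping on a simplified stand-in structure. It is an illustration under assumptions: the command-parameter union is replaced by a fixed-size placeholder, so the layout does not match the kernel's, the version numbers are placeholders, and all `model_` names are hypothetical. Real code should use the definitions in `<linux/auto_dev-ioctl.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct autofs_dev_ioctl (layout differs). */
struct model_dev_ioctl {
    uint32_t ver_major;
    uint32_t ver_minor;
    uint32_t size;     /* total size passed in, including this struct */
    int32_t  ioctlfd;  /* descriptor open on the autofs root, or -1 */
    uint64_t args[2];  /* placeholder for the command-parameter union */
    char     path[];
};

/* Placeholder version numbers; the kernel reports what it supports. */
#define MODEL_VER_MAJOR 1
#define MODEL_VER_MINOR 0

/* Allocate and initialise a request; `path` may be NULL. Caller frees. */
static struct model_dev_ioctl *model_dev_ioctl_new(const char *path)
{
    size_t path_len = path ? strlen(path) + 1 : 0;
    size_t total = sizeof(struct model_dev_ioctl) + path_len;
    struct model_dev_ioctl *p;

    assert(total <= UINT32_MAX);

    p = calloc(1, total);
    if (!p)
        return NULL;
    p->ver_major = MODEL_VER_MAJOR;
    p->ver_minor = MODEL_VER_MINOR;
    p->size = (uint32_t)total;
    p->ioctlfd = -1;  /* nothing open yet; OPEN_MOUNT can return one */
    if (path)
        memcpy(p->path, path, path_len);
    return p;
}
```

A path-based command such as OPEN_MOUNT would be prepared with a non-NULL `path`; commands that identify the filesystem by `ioctlfd` would pass NULL and fill in the descriptor afterwards.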
				454
				455	Commands are:
				456
				457	- AUTOFS_DEV_IOCTL_VERSION_CMD: does nothing, except validate and
				458	set version numbers.
				459	- AUTOFS_DEV_IOCTL_OPENMOUNT_CMD: return an open file descriptor
				460	on the root of an autofs filesystem. The filesystem is identified
				461	by name and device number, which is stored in `openmount.devid`.
				462	Device numbers for existing filesystems can be found in
				463	`/proc/self/mountinfo`.
				464	- AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD: same as `close(ioctlfd)`.
				465	- AUTOFS_DEV_IOCTL_SETPIPEFD_CMD: if the filesystem is in
				466	catatonic mode, this can provide the write end of a new pipe
				467	in `setpipefd.pipefd` to re-establish communication with a daemon.
				468	The process group of the calling process is used to identify the
				469	daemon.
				470	- AUTOFS_DEV_IOCTL_REQUESTER_CMD: `path` should be a
				471	name within the filesystem that has been auto-mounted on.
				472	On successful return, `requester.uid` and `requester.gid` will be
				473	the UID and GID of the process which triggered that mount.
				474	- AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD: Check if path is a
				475	mountpoint of a particular type - see separate documentation for
				476	details.
				477	- AUTOFS_DEV_IOCTL_PROTOVER_CMD:
				478	- AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD:
				479	- AUTOFS_DEV_IOCTL_READY_CMD:
				480	- AUTOFS_DEV_IOCTL_FAIL_CMD:
				481	- AUTOFS_DEV_IOCTL_CATATONIC_CMD:
				482	- AUTOFS_DEV_IOCTL_TIMEOUT_CMD:
				483	- AUTOFS_DEV_IOCTL_EXPIRE_CMD:
				484	- AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD: These all have the same
				485	function as the similarly named AUTOFS_IOC ioctls, except
				486	that FAIL can be given an explicit error number in `fail.status`
				487	instead of assuming `ENOENT`, and this EXPIRE command
				488	corresponds to AUTOFS_IOC_EXPIRE_MULTI.
				489
				490	Catatonic mode
				491	--------------
				492
				493	As mentioned, an autofs mount can enter "catatonic" mode. This
				494	happens if a write to the notification pipe fails, or if it is
				495	explicitly requested by an `ioctl`.
				496
				497	When entering catatonic mode, the pipe is closed and any pending
				498	notifications are acknowledged with the error `ENOENT`.
				499
				500	Once in catatonic mode attempts to access non-existing names will
				501	result in `ENOENT` while attempts to access existing directories will
				502	be treated in the same way as if they came from the daemon, so mount
				503	traps will not fire.
				504
				505	When the filesystem is mounted a _uid_ and _gid_ can be given which
				506	set the ownership of directories and symbolic links. When the
				507	filesystem is in catatonic mode, any process with a matching UID can
				508	create directories or symlinks in the root directory, but not in other
				509	directories.
				510
				511	Catatonic mode can only be left via the
				512	AUTOFS_DEV_IOCTL_OPENMOUNT_CMD ioctl on the `/dev/autofs` device.
				513
				514	autofs, name spaces, and shared mounts
				515	--------------------------------------
				516
				517	With bind mounts and name spaces it is possible for an autofs
				518	filesystem to appear at multiple places in one or more filesystem
				519	name spaces. For this to work sensibly, the autofs filesystem should
				520	always be mounted "shared". e.g.
				521
				522	> `mount --make-shared /autofs/mount/point`
				523
				524	The automount daemon is only able to manage a single mount location for
				525	an autofs filesystem and if mounts on that are not 'shared', other
				526	locations will not behave as expected. In particular access to those
				527	other locations will likely result in the `ELOOP` error:
				528
				529	> Too many levels of symbolic links