====================
Credentials in Linux
====================

By: David Howells <dhowells@redhat.com>

.. contents:: :local:

Overview
========

There are several parts to the security check performed by Linux when one
object acts upon another:

1. Objects.

   Objects are things in the system that may be acted upon directly by
   userspace programs. Linux has a variety of actionable objects, including:

   - Tasks
   - Files/inodes
   - Sockets
   - Message queues
   - Shared memory segments
   - Semaphores
   - Keys

   As a part of the description of all these objects there is a set of
   credentials. What's in the set depends on the type of object.

2. Object ownership.

   Amongst the credentials of most objects, there will be a subset that
   indicates the ownership of that object. This is used for resource
   accounting and limitation (disk quotas and task rlimits for example).

   In a standard UNIX filesystem, for instance, this will be defined by the
   UID marked on the inode.

3. The objective context.

   Also amongst the credentials of those objects, there will be a subset that
   indicates the 'objective context' of that object. This may or may not be
   the same set as in (2); in standard UNIX files, for instance, this is
   defined by the UID and the GID marked on the inode.

   The objective context is used as part of the security calculation that is
   carried out when an object is acted upon.

4. Subjects.

   A subject is an object that is acting upon another object.

   Most of the objects in the system are inactive: they don't act on other
   objects within the system. Processes/tasks are the obvious exception:
   they do stuff; they access and manipulate things.

   Objects other than tasks may under some circumstances also be subjects.
   For instance, an open file may send SIGIO to a task using the UID and EUID
   given to it by a task that called ``fcntl(F_SETOWN)`` upon it. In this
   case, the file struct will have a subjective context too.

5. The subjective context.

   A subject has an additional interpretation of its credentials. A subset
   of its credentials forms the 'subjective context'. The subjective context
   is used as part of the security calculation that is carried out when a
   subject acts.

   A Linux task, for example, has the FSUID, FSGID and the supplementary
   group list for when it is acting upon a file, which are quite separate
   from the real UID and GID that normally form the objective context of the
   task.

6. Actions.

   Linux has a number of actions available that a subject may perform upon an
   object. The set of actions available depends on the nature of the subject
   and the object.

   Actions include reading, writing, creating and deleting files; forking or
   signalling and tracing tasks.

7. Rules, access control lists and security calculations.

   When a subject acts upon an object, a security calculation is made. This
   involves taking the subjective context, the objective context and the
   action, and searching one or more sets of rules to see whether the subject
   is granted or denied permission to act in the desired manner on the
   object, given those contexts.

   There are two main sources of rules:

   a. Discretionary access control (DAC):

      Sometimes the object will include sets of rules as part of its
      description. This is an 'Access Control List' or 'ACL'. A Linux
      file may supply more than one ACL.

      A traditional UNIX file, for example, includes a permissions mask that
      is an abbreviated ACL with three fixed classes of subject ('user',
      'group' and 'other'), each of which may be granted certain privileges
      ('read', 'write' and 'execute', whatever those map to for the object
      in question). UNIX file permissions do not allow the arbitrary
      specification of subjects, however, and so are of limited use.

      A Linux file might also sport a POSIX ACL. This is a list of rules
      that grants various permissions to arbitrary subjects.

   b. Mandatory access control (MAC):

      The system as a whole may have one or more sets of rules that get
      applied to all subjects and objects, regardless of their source.
      SELinux and Smack are examples of this.

      In the case of SELinux and Smack, each object is given a label as part
      of its credentials. When an action is requested, they take the
      subject label, the object label and the action and look for a rule
      that says that this action is either granted or denied.
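
The following userspace sketch is a rough illustration of the two rule sources. It is not kernel code. It first checks a request against a traditional owner/group/other mode mask and then against a tiny label-based rule table, and it grants access only if both agree. Every name in it is hypothetical: ``struct subject``, ``struct object``, ``dac_permits()``, ``mac_permits()``, the ``MAY_*`` values (chosen to line up with the rwx bits) and the example labels. The sketch deliberately leaves out superuser overrides, capabilities, POSIX ACLs and real SELinux or Smack policy semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

enum { MAY_EXEC = 1, MAY_WRITE = 2, MAY_READ = 4 };   /* hypothetical values */

struct subject {                /* hypothetical subjective context */
    uid_t fsuid;
    gid_t fsgid;
    const gid_t *groups;        /* supplementary groups */
    size_t ngroups;
    const char *label;          /* MAC label */
};

struct object {                 /* hypothetical objective context */
    uid_t uid;
    gid_t gid;
    unsigned int mode;          /* low 9 bits: rwx for user, group, other */
    const char *label;
};

static bool in_group(const struct subject *s, gid_t gid)
{
    if (s->fsgid == gid)
        return true;
    for (size_t i = 0; i < s->ngroups; i++)
        if (s->groups[i] == gid)
            return true;
    return false;
}

/* DAC: pick exactly one class (user, group or other), then test its bits. */
static bool dac_permits(const struct subject *s, const struct object *o, int mask)
{
    unsigned int bits;

    if (s->fsuid == o->uid)
        bits = (o->mode >> 6) & 7;
    else if (in_group(s, o->gid))
        bits = (o->mode >> 3) & 7;
    else
        bits = o->mode & 7;
    return (bits & (unsigned int)mask) == (unsigned int)mask;
}

/* MAC: a system-wide rule table; any pair not listed is denied. */
struct mac_rule { const char *subj, *obj; int allowed; };

static const struct mac_rule policy[] = {
    { "user_t", "user_home_t", MAY_READ | MAY_WRITE },
    { "user_t", "etc_t",       MAY_READ },
};

static bool mac_permits(const struct subject *s, const struct object *o, int mask)
{
    for (size_t i = 0; i < sizeof(policy) / sizeof(policy[0]); i++)
        if (strcmp(policy[i].subj, s->label) == 0 &&
            strcmp(policy[i].obj, o->label) == 0)
            return (policy[i].allowed & mask) == mask;
    return false;
}

/* Both rule sources must grant the access. */
static bool may_access(const struct subject *s, const struct object *o, int mask)
{
    return dac_permits(s, o, mask) && mac_permits(s, o, mask);
}
```

The DAC part selects exactly one class. A subject that owns the file is judged only by the 'user' bits, even when the 'group' or 'other' bits would be more generous.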


Types of Credentials
====================

The Linux kernel supports the following types of credentials:

1. Traditional UNIX credentials.

   - Real User ID
   - Real Group ID

   The UID and GID are carried by most, if not all, Linux objects, even if in
   some cases they have to be invented (FAT or CIFS files for example, which
   are derived from Windows). These (mostly) define the objective context of
   that object, with tasks being slightly different in some cases.

   - Effective, Saved and FS User ID
   - Effective, Saved and FS Group ID
   - Supplementary groups

   These are additional credentials used by tasks only. Usually, the
   EUID/EGID/GROUPS will be used as the subjective context, and the real
   UID/GID will be used as the objective context. For tasks, it should be
   noted that this is not always true.

2. Capabilities.

   - Set of permitted capabilities
   - Set of inheritable capabilities
   - Set of effective capabilities
   - Capability bounding set

   These are only carried by tasks. They indicate superior capabilities
   granted piecemeal to a task that an ordinary task wouldn't otherwise have.
   These are manipulated implicitly by changes to the traditional UNIX
   credentials, but can also be manipulated directly by the ``capset()``
   system call.

   The permitted capabilities are those caps that the process might grant
   itself in its effective set through ``capset()``; the permitted set itself
   can be reduced but not enlarged that way. The inheritable set might also
   be so constrained.

   The effective capabilities are the ones that a task is actually allowed to
   make use of itself.

   The inheritable capabilities are the ones that may get passed across
   ``execve()``.

   The bounding set limits the capabilities that may be inherited across
   ``execve()``, especially when a binary is executed that will execute as
   UID 0.

3. Secure management flags (securebits).

   These are only carried by tasks. These govern the way the above
   credentials are manipulated and inherited over certain operations such as
   ``execve()``. They aren't used directly as objective or subjective
   credentials.

4. Keys and keyrings.

   These are only carried by tasks. They carry and cache security tokens
   that don't fit into the other standard UNIX credentials. They are for
   making such things as network filesystem keys available to the file
   accesses performed by processes, without the necessity of ordinary
   programs having to know about security details involved.

   Keyrings are a special type of key. They carry sets of other keys and can
   be searched for the desired key. Each process may subscribe to a number
   of keyrings:

   - Per-thread keyring
   - Per-process keyring
   - Per-session keyring

   When a process accesses a key, if not already present, it will normally be
   cached on one of these keyrings for future accesses to find.

   For more information on using keys, see ``Documentation/security/keys/*``.

5. LSM

   The Linux Security Module (LSM) framework allows extra controls to be
   placed over the operations that a task may do. Currently Linux supports
   several LSM options.

   Some work by labelling the objects in a system and then applying sets of
   rules (policies) that say what operations a task with one label may do to
   an object with another label.

6. AF_KEY

   This is a socket-based approach to credential management for networking
   stacks [RFC 2367]. It isn't discussed by this document as it doesn't
   interact directly with task and file credentials; rather it keeps
   system-level credentials.
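
The ``capset()`` restrictions on the capability sets described in item 2 can be written as subset tests on bitmasks. The sketch below is a simplified userspace model. It assumes 64-bit masks and uses a hypothetical ``has_setpcap`` flag to stand in for the caller holding ``CAP_SETPCAP``. It ignores ambient capabilities, securebits and LSM hooks, and it only reads the bounding set, never changes it. ``struct capsets`` and ``check_capset()`` are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t capmask_t;     /* one bit per capability (assumed width) */

struct capsets {                /* hypothetical per-task capability state */
    capmask_t permitted, inheritable, effective, bounding;
};

static bool is_subset(capmask_t a, capmask_t b)
{
    return (a & ~b) == 0;       /* every bit of a is also set in b */
}

/* Returns 0 if "next" is an acceptable capset() request from "cur",
 * otherwise -EPERM. next->bounding is ignored: capset() cannot change it. */
static int check_capset(const struct capsets *cur, const struct capsets *next,
                        bool has_setpcap)
{
    /* Without CAP_SETPCAP, new inheritable bits must come from pI or pP. */
    if (!has_setpcap &&
        !is_subset(next->inheritable, cur->inheritable | cur->permitted))
        return -EPERM;
    /* Even with it, nothing new outside the bounding set. */
    if (!is_subset(next->inheritable, cur->inheritable | cur->bounding))
        return -EPERM;
    /* The permitted set may only shrink. */
    if (!is_subset(next->permitted, cur->permitted))
        return -EPERM;
    /* The effective set must stay within the new permitted set. */
    if (!is_subset(next->effective, next->permitted))
        return -EPERM;
    return 0;
}
```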


When a file is opened, part of the opening task's subjective context is
recorded in the file struct created. This allows operations using that file
struct to use those credentials instead of the subjective context of the task
that issued the operation. An example of this would be a file opened on a
network filesystem where the credentials of the opened file should be presented
to the server, regardless of who is actually doing a read or a write upon it.


File Markings
=============

Files on disk or obtained over the network may have annotations that form the
objective security context of that file. Depending on the type of filesystem,
this may include one or more of the following:

* UNIX UID, GID, mode;
* Windows user ID;
* Access control list;
* LSM security label;
* UNIX exec privilege escalation bits (SUID/SGID);
* File capabilities exec privilege escalation bits.

These are compared to the task's subjective security context, and certain
operations allowed or disallowed as a result. In the case of ``execve()``, the
privilege escalation bits come into play, and may allow the resulting process
extra privileges, based on the annotations on the executable file.
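
The sketch below shows how the SUID/SGID bits might change a task's IDs on ``execve()``. It is a simplified userspace model built on a hypothetical ``struct ids`` and ``apply_setid_bits()``. It uses the real ``S_ISUID``, ``S_ISGID`` and ``S_IXGRP`` constants from ``<sys/stat.h>``. It does not model file capabilities, ``nosuid`` mounts, ptrace or the LSM hooks that are also consulted. The real IDs stay unchanged. The saved and FS IDs are set from the resulting effective IDs.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

struct ids {                    /* hypothetical subset of a task's credentials */
    uid_t uid, euid, suid, fsuid;
    gid_t gid, egid, sgid, fsgid;
};

/* IDs a task would run with after exec'ing a file owned by f_uid:f_gid
 * with the given mode. Only the effective (and hence saved/FS) IDs move. */
static struct ids apply_setid_bits(struct ids cur, uid_t f_uid, gid_t f_gid,
                                   mode_t mode)
{
    struct ids next = cur;

    if (mode & S_ISUID)
        next.euid = f_uid;
    /* SGID only takes effect if the group-execute bit is also set. */
    if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
        next.egid = f_gid;

    next.suid = next.fsuid = next.euid;
    next.sgid = next.fsgid = next.egid;
    return next;
}
```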


Task Credentials
================

In Linux, all of a task's credentials are held either directly (uid, gid) or
through pointers (groups, keys, LSM security) in a refcounted structure of type
'struct cred'. Each task points to its credentials by a pointer called 'cred'
in its task_struct.

Once a set of credentials has been prepared and committed, it may not be
changed, barring the following exceptions:

1. its reference count may be changed;

2. the reference count on the group_info struct it points to may be changed;

3. the reference count on the security data it points to may be changed;

4. the reference count on any keyrings it points to may be changed;

5. any keyrings it points to may be revoked, expired or have their security
   attributes changed; and

6. the contents of any keyrings to which it points may be changed (the whole
   point of keyrings being a shared set of credentials, modifiable by anyone
   with appropriate access).

To alter anything in the cred struct, the copy-and-replace principle must be
adhered to. First take a copy, then alter the copy and then use RCU to change
the task pointer to make it point to the new copy. There are wrappers to aid
with this (see below).
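
The copy-and-replace principle can be shown with a minimal userspace model built on C11 atomics. A release store stands in for ``rcu_assign_pointer()`` and an acquire load stands in for ``rcu_dereference()``. ``struct mini_cred``, ``task_cred``, ``read_cred()`` and ``replace_euid()`` are hypothetical names. Real RCU also defers freeing the old copy until all pre-existing readers have finished. This single-threaded sketch does not do that: it hands the old copy back to the caller and never modifies it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>

struct mini_cred {              /* hypothetical cut-down struct cred */
    uid_t uid, euid;
};

/* The task's published credentials pointer (stand-in for task->cred). */
static _Atomic(const struct mini_cred *) task_cred;

static const struct mini_cred *read_cred(void)
{
    /* Acquire pairs with the release below: a reader sees either the old
     * copy or the fully written new copy. */
    return atomic_load_explicit(&task_cred, memory_order_acquire);
}

/* Copy, alter the copy, then publish it. Returns the old copy, which must
 * not be freed while a reader might still use it; returns NULL if the
 * allocation fails, leaving the published pointer unchanged. */
static const struct mini_cred *replace_euid(uid_t euid)
{
    const struct mini_cred *old = read_cred();
    struct mini_cred *new = malloc(sizeof(*new));

    assert(old != NULL);
    if (!new)
        return NULL;
    *new = *old;                /* 1. take a copy */
    new->euid = euid;           /* 2. alter the copy, never *old */
    atomic_store_explicit(&task_cred, new, memory_order_release); /* 3. publish */
    return old;
}
```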

A task may only alter its _own_ credentials; it is no longer permitted for a
task to alter another's credentials. This means the ``capset()`` system call
is no longer permitted to take any PID other than the one of the current
process. Also, the ``keyctl_instantiate()`` and ``keyctl_negate()`` functions
no longer permit attachment to process-specific keyrings in the requesting
process as the instantiating process may need to create them.


Immutable Credentials
---------------------

Once a set of credentials has been made public (by calling ``commit_creds()``
for example), it must be considered immutable, barring two exceptions:

1. The reference count may be altered.

2. Whilst the keyring subscriptions of a set of credentials may not be
   changed, the keyrings subscribed to may have their contents altered.

To catch accidental credential alteration at compile time, struct task_struct
has _const_ pointers to its credential sets, as does struct file. Furthermore,
certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
pointers, thus rendering casts unnecessary for their callers, but internally
they have to temporarily cast away the const qualification to be able to
alter the reference count.
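
Here is a sketch of that const-pointer arrangement. ``get_ref()`` and ``put_ref()`` take ``const`` pointers, so callers need no casts, and only inside them is the const qualifier dropped to touch the counter. This is a hypothetical userspace model with C11 atomics. The kernel's own ``get_cred()`` and ``put_cred()`` also carry debugging checks and free through RCU, and neither is modelled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct rcred {                  /* hypothetical credential record */
    atomic_int usage;           /* reference count: the one mutable field */
    unsigned int euid;
};

static struct rcred *alloc_rcred(unsigned int euid)
{
    struct rcred *c = malloc(sizeof(*c));

    if (c) {
        atomic_init(&c->usage, 1);
        c->euid = euid;
    }
    return c;
}

static const struct rcred *get_ref(const struct rcred *cred)
{
    struct rcred *nonconst = (struct rcred *)cred;   /* ditch const only here */

    atomic_fetch_add(&nonconst->usage, 1);
    return cred;
}

/* Returns 1 if this call dropped the last reference and freed the record. */
static int put_ref(const struct rcred *cred)
{
    struct rcred *nonconst = (struct rcred *)cred;

    assert(atomic_load(&nonconst->usage) > 0);
    if (atomic_fetch_sub(&nonconst->usage, 1) == 1) {
        free(nonconst);
        return 1;
    }
    return 0;
}
```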


Accessing Task Credentials
--------------------------

A task being able to alter only its own credentials permits the current process
to read or replace its own credentials without the need for any form of
locking, which simplifies things greatly. It can just call::

    const struct cred *current_cred()

to get a pointer to its credentials structure, and it doesn't have to release
it afterwards.

There are convenience wrappers for retrieving specific aspects of a task's
credentials (the value is simply returned in each case)::

    kuid_t current_uid(void)                Current's real UID
    kgid_t current_gid(void)                Current's real GID
    kuid_t current_euid(void)               Current's effective UID
    kgid_t current_egid(void)               Current's effective GID
    kuid_t current_fsuid(void)              Current's file access UID
    kgid_t current_fsgid(void)              Current's file access GID
    kernel_cap_t current_cap(void)          Current's effective capabilities
    void *current_security(void)            Current's LSM security pointer
    struct user_struct *current_user(void)  Current's user account

There are also convenience wrappers for retrieving specific associated pairs of
a task's credentials::

    void current_uid_gid(kuid_t *, kgid_t *);
    void current_euid_egid(kuid_t *, kgid_t *);
    void current_fsuid_fsgid(kuid_t *, kgid_t *);

which return these pairs of values through their arguments after retrieving
them from the current task's credentials.
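
The pair wrappers could be written as macros that fetch the credentials pointer once and then fill in both values through the caller's pointers. The sketch below shows that pattern. It uses a hypothetical ``my_cred`` record and a ``current_cred_ptr`` variable standing in for ``current_cred()``, with plain userspace ``uid_t``/``gid_t`` types.

```c
#include <assert.h>
#include <sys/types.h>

struct my_cred {                /* hypothetical stand-in for struct cred */
    uid_t uid, euid;
    gid_t gid, egid;
};

/* Hypothetical stand-in for what current_cred() returns. */
static const struct my_cred *current_cred_ptr;

/* Fetch the credentials pointer once and return both values through the
 * caller's pointers; do { } while (0) lets the macro be used as a single
 * statement, e.g. as the body of an unbraced if. */
#define my_current_euid_egid(_euid, _egid)                  \
    do {                                                    \
        const struct my_cred *cred_ = current_cred_ptr;     \
        assert(cred_ != NULL);                              \
        *(_euid) = cred_->euid;                             \
        *(_egid) = cred_->egid;                             \
    } while (0)
```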


In addition, there is a function for obtaining a reference on the current
process's current set of credentials::

    const struct cred *get_current_cred(void);

and functions for getting references to one of the credentials that don't
actually live in struct cred::

    struct user_struct *get_current_user(void);
    struct group_info *get_current_groups(void);

which get references to the current process's user accounting structure and
supplementary groups list respectively.

Once a reference has been obtained, it must be released with ``put_cred()``,
``free_uid()`` or ``put_group_info()`` as appropriate.


Accessing Another Task's Credentials
------------------------------------

Whilst a task may access its own credentials without the need for locking, the
same is not true of a task wanting to access another task's credentials. It
must use the RCU read lock and ``rcu_dereference()``.

The ``rcu_dereference()`` is wrapped by::

    const struct cred *__task_cred(struct task_struct *task);

This should be used inside the RCU read lock, as in the following example::

    void foo(struct task_struct *t, struct foo_data *f)
    {
        const struct cred *tcred;
        ...
        rcu_read_lock();
        tcred = __task_cred(t);
        f->uid = tcred->uid;
        f->gid = tcred->gid;
        f->groups = get_group_info(tcred->group_info);
        rcu_read_unlock();
        ...
    }

Should it be necessary to hold another task's credentials for a long period of
time, and possibly to sleep whilst doing so, then the caller should get a
reference on them using::

    const struct cred *get_task_cred(struct task_struct *task);

This does all the RCU magic inside of it. The caller must call ``put_cred()``
on the credentials so obtained when they're finished with.

.. note::
   The result of ``__task_cred()`` should not be passed directly to
   ``get_cred()`` as this may race with ``commit_creds()``.
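
The race arises because, between reading the pointer and taking the reference, the task may commit new credentials and drop the last reference to the old set. ``get_task_cred()`` avoids this by taking a reference only while the count is still non-zero and re-reading the pointer otherwise. Below is a userspace sketch of that "increment unless zero" step using C11 atomics. ``inc_not_zero()``, ``struct ucred_model``, ``task_real_cred`` and ``get_task_cred_model()`` are hypothetical names. The sketch assumes, rather than models, the RCU read-side protection that keeps the memory valid while the loop retries.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ucred_model {            /* hypothetical credential record */
    atomic_int usage;
};

/* Take a reference only if the object is still live (usage > 0).
 * Returns false if the count had already reached zero. */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/* Stand-in for the task's credentials pointer. */
static _Atomic(struct ucred_model *) task_real_cred;

static struct ucred_model *get_task_cred_model(void)
{
    struct ucred_model *cred;

    /* rcu_read_lock() would be held here in the kernel. */
    do {
        cred = atomic_load(&task_real_cred);
    } while (!inc_not_zero(&cred->usage));  /* dying set: re-read pointer */
    /* ...and rcu_read_unlock() here. */
    return cred;
}
```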

There are a couple of convenience functions to access bits of another task's
credentials, hiding the RCU magic from the caller::

    kuid_t task_uid(task)     Task's real UID
    kuid_t task_euid(task)    Task's effective UID

If the caller is holding the RCU read lock at the time anyway, then::

    __task_cred(task)->uid
    __task_cred(task)->euid

should be used instead. Similarly, if multiple aspects of a task's credentials
need to be accessed, the RCU read lock should be taken, ``__task_cred()``
called, the result stored in a temporary pointer and then the credential
fields read from that before dropping the lock. This prevents the potentially
expensive RCU magic from being invoked multiple times.

Should some other single aspect of another task's credentials need to be
accessed, then this can be used::

    task_cred_xxx(task, member)

where 'member' is a non-pointer member of the cred struct. For instance::

    kuid_t task_cred_xxx(task, suid);

will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
magic. This may not be used for pointer members as what they point to may
disappear the moment the RCU read lock is dropped.
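
The sketch below shows how a macro like ``task_cred_xxx()`` can return a copy of any non-pointer member named by the caller. It uses ``__typeof__`` and statement expressions, GNU C extensions supported by GCC and Clang. The ``model_*`` names and the read-lock stubs are hypothetical. The stubs only count nesting depth, so the example can show that the lock is dropped before the copied value is returned. They provide none of RCU's guarantees.

```c
#include <assert.h>
#include <sys/types.h>

struct model_cred { uid_t uid, euid, suid; gid_t gid; };   /* hypothetical */
struct model_task { const struct model_cred *cred; };      /* hypothetical */

static int read_lock_depth;     /* stub: tracks nesting only, no real RCU */
static void model_rcu_read_lock(void)   { read_lock_depth++; }
static void model_rcu_read_unlock(void) { read_lock_depth--; }

/* Copy one member out by value while "locked". The copy stays valid after
 * unlock, which is exactly why pointer members must not be fetched this way:
 * the pointer would survive but its target might not. */
#define model_task_cred_xxx(task, member)                          \
    ({                                                             \
        __typeof__(((struct model_cred *)0)->member) val_;         \
        model_rcu_read_lock();                                     \
        val_ = (task)->cred->member;                               \
        model_rcu_read_unlock();                                   \
        val_;                                                      \
    })
```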


Altering Credentials
--------------------

As previously mentioned, a task may only alter its own credentials, and may not
alter those of another task. This means that it doesn't need to use any
locking to alter its own credentials.

To alter the current process's credentials, a function should first prepare a
new set of credentials by calling::

    struct cred *prepare_creds(void);

This locks ``current->cred_replace_mutex`` and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful. It returns NULL if not successful (out of memory).

The mutex prevents ``ptrace()`` from altering the ptrace state of a process
whilst security checks on credentials construction and changing are taking
place, as the ptrace state may alter the outcome, particularly in the case of
``execve()``.

The new credentials set should be altered appropriately, and any security
checks and hooks done. Both the current and the proposed sets of credentials
are available for this purpose, as ``current_cred()`` will still return the
current set at this point.

When replacing the group list, the new list must be sorted before it
is added to the credential, as a binary search is used to test for
membership. In practice, this means :c:func:`groups_sort` should be
called before :c:func:`set_groups` or :c:func:`set_current_groups`.
:c:func:`groups_sort` must not be called on a ``struct group_info`` which
is shared, as it may permute elements as part of the sorting process
even if the array is already sorted.
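
Here is a userspace sketch of the sort-then-binary-search idea, using ``qsort()`` and ``bsearch()`` from the C standard library. ``sort_groups()`` and ``in_group_list()`` are hypothetical names and are not the kernel's ``groups_sort()`` or its search helper. The comparator compares values rather than subtracting them, because ``gid_t`` is unsigned and a subtraction could wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

static int gid_cmp(const void *a, const void *b)
{
    gid_t x = *(const gid_t *)a, y = *(const gid_t *)b;

    return (x > y) - (x < y);   /* -1, 0 or 1 without unsigned wraparound */
}

/* Sorts in place, so it must only be used on a private, unshared copy. */
static void sort_groups(gid_t *groups, size_t n)
{
    qsort(groups, n, sizeof(*groups), gid_cmp);
}

/* Binary search: only correct once the list has been sorted. */
static bool in_group_list(const gid_t *groups, size_t n, gid_t gid)
{
    return bsearch(&gid, groups, n, sizeof(*groups), gid_cmp) != NULL;
}
```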

When the credential set is ready, it should be committed to the current process
by calling::

    int commit_creds(struct cred *new);

This will alter various aspects of the credentials and the process, giving the
LSM a chance to do likewise, then it will use ``rcu_assign_pointer()`` to
actually commit the new credentials to ``current->cred``, it will release
``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and it
will notify the scheduler and others of the changes.

This function is guaranteed to return 0, so that it can be tail-called at the
end of such functions as ``sys_setresuid()``.

Note that this function consumes the caller's reference to the new credentials.
The caller should _not_ call ``put_cred()`` on the new credentials afterwards.

Furthermore, once this function has been called on a new set of credentials,
those credentials may _not_ be changed further.


Should the security checks fail or some other error occur after
``prepare_creds()`` has been called, then the following function should be
invoked::

    void abort_creds(struct cred *new);

This releases the lock on ``current->cred_replace_mutex`` that
``prepare_creds()`` got and then releases the new credentials.


A typical credentials alteration function would look something like this::

    int alter_suid(kuid_t suid)
    {
        struct cred *new;
        int ret;

        new = prepare_creds();
        if (!new)
            return -ENOMEM;

        new->suid = suid;
        ret = security_alter_suid(new);
        if (ret < 0) {
            abort_creds(new);
            return ret;
        }

        return commit_creds(new);
    }
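
The same prepare/abort/commit life cycle can be modelled outside the kernel to show the reference-count bookkeeping. Preparing makes a private copy that holds one reference. Aborting drops that reference. Committing hands it to the task and drops the task's reference on the old set. Everything here is hypothetical: the ``sim_*`` names, a global ``sim_current`` in place of ``current->cred``, and ``sim_security_check()`` standing in for a security hook. The model leaves out the mutex and the RCU-deferred freeing described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

struct sim_cred {
    int usage;                  /* plain int: single-threaded model */
    uid_t uid, euid, suid;
};

static struct sim_cred *sim_current;    /* stand-in for current->cred */

static void sim_put(struct sim_cred *c)
{
    if (--c->usage == 0)
        free(c);
}

static struct sim_cred *sim_prepare(void)
{
    struct sim_cred *new = malloc(sizeof(*new));

    if (!new)
        return NULL;
    *new = *sim_current;        /* duplicate the current credentials */
    new->usage = 1;             /* the caller owns the only reference */
    return new;
}

static void sim_abort(struct sim_cred *new)
{
    sim_put(new);               /* discard the unpublished copy */
}

static int sim_commit(struct sim_cred *new)
{
    struct sim_cred *old = sim_current;

    sim_current = new;          /* consumes the caller's reference */
    sim_put(old);               /* drop the task's reference to the old set */
    return 0;
}

/* Hypothetical policy hook: refuse a saved UID of 0. */
static int sim_security_check(const struct sim_cred *new)
{
    return new->suid == 0 ? -EPERM : 0;
}

static int sim_alter_suid(uid_t suid)
{
    struct sim_cred *new = sim_prepare();
    int ret;

    if (!new)
        return -ENOMEM;
    new->suid = suid;
    ret = sim_security_check(new);
    if (ret < 0) {
        sim_abort(new);
        return ret;
    }
    return sim_commit(new);
}
```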


Managing Credentials
--------------------

There are some functions to help manage credentials:

- ``void put_cred(const struct cred *cred);``

  This releases a reference to the given set of credentials. If the
  reference count reaches zero, the credentials will be scheduled for
  destruction by the RCU system.

- ``const struct cred *get_cred(const struct cred *cred);``

  This gets a reference on a live set of credentials, returning a pointer to
  that set of credentials.

- ``struct cred *get_new_cred(struct cred *cred);``

  This gets a reference on a set of credentials that is under construction
  and is thus still mutable, returning a pointer to that set of credentials.


Open File Credentials
=====================

When a new file is opened, a reference is obtained on the opening task's
credentials and this is attached to the file struct as ``f_cred`` in place of
``f_uid`` and ``f_gid``. Code that used to access ``file->f_uid`` and
``file->f_gid`` should now access ``file->f_cred->fsuid`` and
``file->f_cred->fsgid``.

It is safe to access ``f_cred`` without the use of RCU or locking because the
pointer will not change over the lifetime of the file struct, and nor will the
contents of the cred struct pointed to, barring the exceptions listed above
(see the Task Credentials section).
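
A minimal model of why ``f_cred`` is useful follows. The file pins a reference to the opener's credentials at open time, and later operations consult those credentials rather than the caller's. ``struct fcred``, ``struct otask``, ``struct ofile``, ``model_open()`` and ``server_uid_for_write()`` are hypothetical names. The model represents a task replacing its credentials by simply repointing it at a different record.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

struct fcred { int usage; uid_t fsuid; gid_t fsgid; };  /* hypothetical */
struct otask { const struct fcred *cred; };
struct ofile { const struct fcred *f_cred; };

static const struct fcred *fcred_get(const struct fcred *c)
{
    ((struct fcred *)c)->usage++;   /* only the refcount is ever modified */
    return c;
}

/* At open time, pin the opener's credentials for the file's lifetime. */
static struct ofile *model_open(const struct otask *t)
{
    struct ofile *f = malloc(sizeof(*f));

    if (f)
        f->f_cred = fcred_get(t->cred);
    return f;
}

/* E.g. a network filesystem choosing which identity to present to the
 * server: it uses the file's credentials, not the caller's. */
static uid_t server_uid_for_write(const struct ofile *f,
                                  const struct otask *caller)
{
    (void)caller;               /* deliberately ignored */
    return f->f_cred->fsuid;
}
```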


Overriding the VFS's Use of Credentials
=======================================

Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as
``vfs_mkdir()`` with a different set of credentials. This is done in the
following places:

* ``sys_faccessat()``.
* ``do_coredump()``.
* nfs4recover.c.
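
For temporary overrides the kernel provides ``override_creds()``, which installs a new set of subjective credentials and returns the previous set, and ``revert_creds()``, which restores it. ``sys_faccessat()`` is one user. ``access()`` must check with the real UID/GID rather than the FS IDs, so a modified copy is installed for the duration of the check. The sketch below models only that swap-and-restore pattern in userspace. The ``ov_*`` names are hypothetical, and reference counting is left out.

```c
#include <assert.h>
#include <sys/types.h>

struct ov_cred { uid_t uid, fsuid; gid_t gid, fsgid; };  /* hypothetical */

static const struct ov_cred *ov_current;   /* stand-in for current->cred */

static const struct ov_cred *ov_override(const struct ov_cred *new)
{
    const struct ov_cred *old = ov_current;

    ov_current = new;
    return old;                 /* must be handed back to ov_revert() */
}

static void ov_revert(const struct ov_cred *old)
{
    ov_current = old;
}

/* Hypothetical VFS operation that only looks at the current FS UID. */
static uid_t ov_fs_identity(void)
{
    return ov_current->fsuid;
}

/* access()-style check: act with the real IDs as the FS IDs for one call. */
static uid_t ov_check_as_real(void)
{
    struct ov_cred tmp = *ov_current;
    const struct ov_cred *old;
    uid_t seen;

    tmp.fsuid = tmp.uid;
    tmp.fsgid = tmp.gid;
    old = ov_override(&tmp);
    seen = ov_fs_identity();    /* the VFS sees the overriding set */
    ov_revert(old);             /* restore before tmp goes out of scope */
    return seen;
}
```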