Blame - marvell/linux/Documentation/filesystems/locking.rst - T108

blob: b14ed014c6c4d341f984ad0a1ec6303d55c34468

b.liu	e958203	2025-04-17 19:18:16 +0800	[diff] [blame^]	1	=======
				2	Locking
				3	=======
				4
				5	The text below describes the locking rules for VFS-related methods.
				6	It is believed to be up to date. If you change anything in the prototypes
				7	or locking protocols, please update this file, and update the relevant
				8	instances in the tree; don't leave that to maintainers of filesystems, devices,
				9	etc. At the very least, put the list of dubious cases at the end of this file.
				10	Don't turn it into a log: maintainers of out-of-tree code are supposed to
				11	be able to use diff(1).
				12
				13	Thing currently missing here: socket operations. Alexey?
				14
				15	dentry_operations
				16	=================
				17
				18	prototypes::
				19
				20		int (*d_revalidate)(struct dentry *, unsigned int);
				21		int (*d_weak_revalidate)(struct dentry *, unsigned int);
				22		int (*d_hash)(const struct dentry *, struct qstr *);
				23		int (*d_compare)(const struct dentry *,
				24				unsigned int, const char *, const struct qstr *);
				25		int (*d_delete)(struct dentry *);
				26		int (*d_init)(struct dentry *);
				27		void (*d_release)(struct dentry *);
				28		void (*d_iput)(struct dentry *, struct inode *);
				29		char *(*d_dname)(struct dentry *dentry, char *buffer, int buflen);
				30		struct vfsmount *(*d_automount)(struct path *path);
				31		int (*d_manage)(const struct path *, bool);
				32		struct dentry *(*d_real)(struct dentry *, const struct inode *);
				33
				34	locking rules:
				35
				36	================== =========== ======== ============== ========
				37	ops                rename_lock ->d_lock may block      rcu-walk
				38	================== =========== ======== ============== ========
				39	d_revalidate:      no          no       yes (ref-walk) maybe
				40	d_weak_revalidate: no          no       yes            no
				41	d_hash:            no          no       no             maybe
				42	d_compare:         yes         no       no             maybe
				43	d_delete:          no          yes      no             no
				44	d_init:            no          no       yes            no
				45	d_release:         no          no       yes            no
				46	d_prune:           no          yes      no             no
				47	d_iput:            no          no       yes            no
				48	d_dname:           no          no       no             no
				49	d_automount:       no          no       yes            no
				50	d_manage:          no          no       yes (ref-walk) maybe
				51	d_real:            no          no       yes            no
				52	================== =========== ======== ============== ========
				53
				54	inode_operations
				55	================
				56
				57	prototypes::
				58
				59		int (*create) (struct inode *, struct dentry *, umode_t, bool);
				60		struct dentry *(*lookup) (struct inode *, struct dentry *, unsigned int);
				61		int (*link) (struct dentry *, struct inode *, struct dentry *);
				62		int (*unlink) (struct inode *, struct dentry *);
				63		int (*symlink) (struct inode *, struct dentry *, const char *);
				64		int (*mkdir) (struct inode *, struct dentry *, umode_t);
				65		int (*rmdir) (struct inode *, struct dentry *);
				66		int (*mknod) (struct inode *, struct dentry *, umode_t, dev_t);
				67		int (*rename) (struct inode *, struct dentry *,
				68				struct inode *, struct dentry *, unsigned int);
				69		int (*readlink) (struct dentry *, char __user *, int);
				70		const char *(*get_link) (struct dentry *, struct inode *, struct delayed_call *);
				71		void (*truncate) (struct inode *);
				72		int (*permission) (struct inode *, int, unsigned int);
				73		int (*get_acl)(struct inode *, int);
				74		int (*setattr) (struct dentry *, struct iattr *);
				75		int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
				76		ssize_t (*listxattr) (struct dentry *, char *, size_t);
				77		int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, u64 len);
				78		void (*update_time)(struct inode *, struct timespec *, int);
				79		int (*atomic_open)(struct inode *, struct dentry *,
				80				struct file *, unsigned open_flag,
				81				umode_t create_mode);
				82		int (*tmpfile) (struct inode *, struct dentry *, umode_t);
				83
				84	locking rules:
				85	all may block
				86
				87	============ =============================================
				88	ops          i_rwsem(inode)
				89	============ =============================================
				90	lookup:      shared
				91	create:      exclusive
				92	link:        exclusive (both)
				93	mknod:       exclusive
				94	symlink:     exclusive
				95	mkdir:       exclusive
				96	unlink:      exclusive (both)
				97	rmdir:       exclusive (both) (see below)
				98	rename:      exclusive (both parents, some children) (see below)
				99	readlink:    no
				100	get_link:    no
				101	setattr:     exclusive
				102	permission:  no (may not block if called in rcu-walk mode)
				103	get_acl:     no
				104	getattr:     no
				105	listxattr:   no
				106	fiemap:      no
				107	update_time: no
				108	atomic_open: exclusive
				109	tmpfile:     no
				110	============ =============================================
				111
				112
				113	Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem
				114	exclusive on the victim.
				115	Cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
				116	->unlink() and ->rename() have ->i_rwsem exclusive on all non-directories
				117	involved.
				118	->rename() has ->i_rwsem exclusive on any subdirectory that changes parent.
				119
				120	See Documentation/filesystems/directory-locking.rst for more detailed discussion
				121	of the locking scheme for directory operations.
				122
				123	xattr_handler operations
				124	========================
				125
				126	prototypes::
				127
				128		bool (*list)(struct dentry *dentry);
				129		int (*get)(const struct xattr_handler *handler, struct dentry *dentry,
				130				struct inode *inode, const char *name, void *buffer,
				131				size_t size, int flags);
				132		int (*set)(const struct xattr_handler *handler, struct dentry *dentry,
				133				struct inode *inode, const char *name, const void *buffer,
				134				size_t size, int flags);
				135
				136	locking rules:
				137	all may block
				138
				139	===== ==============
				140	ops   i_rwsem(inode)
				141	===== ==============
				142	list: no
				143	get:  no
				144	set:  exclusive
				145	===== ==============
				146
				147	super_operations
				148	================
				149
				150	prototypes::
				151
				152		struct inode *(*alloc_inode)(struct super_block *sb);
				153		void (*free_inode)(struct inode *);
				154		void (*destroy_inode)(struct inode *);
				155		void (*dirty_inode) (struct inode *, int flags);
				156		int (*write_inode) (struct inode *, struct writeback_control *wbc);
				157		int (*drop_inode) (struct inode *);
				158		void (*evict_inode) (struct inode *);
				159		void (*put_super) (struct super_block *);
				160		int (*sync_fs)(struct super_block *sb, int wait);
				161		int (*freeze_fs) (struct super_block *);
				162		int (*unfreeze_fs) (struct super_block *);
				163		int (*statfs) (struct dentry *, struct kstatfs *);
				164		int (*remount_fs) (struct super_block *, int *, char *);
				165		void (*umount_begin) (struct super_block *);
				166		int (*show_options)(struct seq_file *, struct dentry *);
				167		ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
				168		ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
				169		int (*bdev_try_to_free_page)(struct super_block *, struct page *, gfp_t);
				170
				171	locking rules:
				172	All may block [not true, see below]
				173
				174	====================== ============ ========================
				175	ops                    s_umount     note
				176	====================== ============ ========================
				177	alloc_inode:
				178	free_inode:                         called from RCU callback
				179	destroy_inode:
				180	dirty_inode:
				181	write_inode:
				182	drop_inode:                         !!!inode->i_lock!!!
				183	evict_inode:
				184	put_super:             write
				185	sync_fs:               read
				186	freeze_fs:             write
				187	unfreeze_fs:           write
				188	statfs:                maybe(read)  (see below)
				189	remount_fs:            write
				190	umount_begin:          no
				191	show_options:          no           (namespace_sem)
				192	quota_read:            no           (see below)
				193	quota_write:           no           (see below)
				194	bdev_try_to_free_page: no           (see below)
				195	====================== ============ ========================
				196
				197	->statfs() has s_umount (shared) when called by ustat(2) (native or
				198	compat), but that's an accident of bad API; s_umount is used to pin
				199	the superblock down when we only have dev_t given us by userland to
				200	identify the superblock. Everything else (statfs(), fstatfs(), etc.)
				201	doesn't hold it when calling ->statfs() - superblock is pinned down
				202	by resolving the pathname passed to syscall.
				203
				204	->quota_read() and ->quota_write() functions are both guaranteed to
				205	be the only ones operating on the quota file by the quota code (via
				206	dqio_sem) (unless an admin really wants to screw up something and
				207	writes to quota files with quotas on). For other details about locking
				208	see also dquot_operations section.
				209
				210	->bdev_try_to_free_page is called from the ->releasepage handler of
				211	the block device inode. See there for more details.
				212
				213	file_system_type
				214	================
				215
				216	prototypes::
				217
				218		struct dentry *(*mount) (struct file_system_type *, int,
				219				const char *, void *);
				220		void (*kill_sb) (struct super_block *);
				221
				222	locking rules:
				223
				224	======= =========
				225	ops     may block
				226	======= =========
				227	mount   yes
				228	kill_sb yes
				229	======= =========
				230
				231	->mount() returns ERR_PTR or the root dentry; its superblock should be locked
				232	on return.
				233
				234	->kill_sb() takes a write-locked superblock, does all shutdown work on it,
				235	unlocks and drops the reference.
				236
				237	address_space_operations
				238	========================
				239	prototypes::
				240
				241		int (*writepage)(struct page *page, struct writeback_control *wbc);
				242		int (*readpage)(struct file *, struct page *);
				243		int (*writepages)(struct address_space *, struct writeback_control *);
				244		int (*set_page_dirty)(struct page *page);
				245		int (*readpages)(struct file *filp, struct address_space *mapping,
				246				struct list_head *pages, unsigned nr_pages);
				247		int (*write_begin)(struct file *, struct address_space *mapping,
				248				loff_t pos, unsigned len, unsigned flags,
				249				struct page **pagep, void **fsdata);
				250		int (*write_end)(struct file *, struct address_space *mapping,
				251				loff_t pos, unsigned len, unsigned copied,
				252				struct page *page, void *fsdata);
				253		sector_t (*bmap)(struct address_space *, sector_t);
				254		void (*invalidatepage) (struct page *, unsigned int, unsigned int);
				255		int (*releasepage) (struct page *, int);
				256		void (*freepage)(struct page *);
				257		int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
				258		bool (*isolate_page) (struct page *, isolate_mode_t);
				259		int (*migratepage)(struct address_space *, struct page *, struct page *);
				260		void (*putback_page) (struct page *);
				261		int (*launder_page)(struct page *);
				262		int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
				263		int (*error_remove_page)(struct address_space *, struct page *);
				264		int (*swap_activate)(struct file *);
				265		int (*swap_deactivate)(struct file *);
				266
				267	locking rules:
				268	All except set_page_dirty and freepage may block
				269
				270	====================== ======================== =========
				271	ops                    PageLocked(page)         i_rwsem
				272	====================== ======================== =========
				273	writepage:             yes, unlocks (see below)
				274	readpage:              yes, unlocks
				275	writepages:
				276	set_page_dirty:        no
				277	readpages:
				278	write_begin:           locks the page           exclusive
				279	write_end:             yes, unlocks             exclusive
				280	bmap:
				281	invalidatepage:        yes
				282	releasepage:           yes
				283	freepage:              yes
				284	direct_IO:
				285	isolate_page:          yes
				286	migratepage:           yes (both)
				287	putback_page:          yes
				288	launder_page:          yes
				289	is_partially_uptodate: yes
				290	error_remove_page:     yes
				291	swap_activate:         no
				292	swap_deactivate:       no
				293	====================== ======================== =========
				294
				295	->write_begin(), ->write_end() and ->readpage() may be called from
				296	the request handler (/dev/loop).
				297
				298	->readpage() unlocks the page, either synchronously or via I/O
				299	completion.
				300
				301	->readpages() populates the pagecache with the passed pages and starts
				302	I/O against them. They come unlocked upon I/O completion.
				303
				304	->writepage() is used for two purposes: for "memory cleansing" and for
				305	"sync". These are quite different operations and the behaviour may differ
				306	depending upon the mode.
				307
				308	If writepage is called for sync (wbc->sync_mode != WBC_SYNC_NONE) then
				309	it must start I/O against the page, even if that would involve
				310	blocking on in-progress I/O.
				311
				312	If writepage is called for memory cleansing (sync_mode ==
				313	WBC_SYNC_NONE) then its role is to get as much writeout underway as
				314	possible. So writepage should try to avoid blocking against
				315	currently-in-progress I/O.
				316
				317	If the filesystem is not called for "sync" and it determines that it
				318	would need to block against in-progress I/O to be able to start new I/O
				319	against the page the filesystem should redirty the page with
				320	redirty_page_for_writepage(), then unlock the page and return zero.
				321	This may also be done to avoid internal deadlocks, but rarely.
				322
				323	If the filesystem is called for sync then it must wait on any
				324	in-progress I/O and then start new I/O.
				325
				326	The filesystem should unlock the page synchronously, before returning to the
				327	caller, unless ->writepage() returns special WRITEPAGE_ACTIVATE
				328	value. WRITEPAGE_ACTIVATE means that page cannot really be written out
				329	currently, and VM should stop calling ->writepage() on this page for some
				330	time. VM does this by moving page to the head of the active list, hence the
				331	name.
				332
				333	Unless the filesystem is going to redirty_page_for_writepage(), unlock the page
				334	and return zero, writepage must run set_page_writeback() against the page,
				335	followed by unlocking it. Once set_page_writeback() has been run against the
				336	page, write I/O can be submitted and the write I/O completion handler must run
				337	end_page_writeback() once the I/O is complete. If no I/O is submitted, the
				338	filesystem must run end_page_writeback() against the page before returning from
				339	writepage.
				340
				341	That is: after 2.5.12, pages which are under writeout are not locked. Note,
				342	if the filesystem needs the page to be locked during writeout, that is ok, too,
				343	the page is allowed to be unlocked at any point in time between the calls to
				344	set_page_writeback() and end_page_writeback().
				345
				346	Note, failure to run either redirty_page_for_writepage() or the combination of
				347	set_page_writeback()/end_page_writeback() on a page submitted to writepage
				348	will leave the page itself marked clean but it will be tagged as dirty in the
				349	radix tree. This incoherency can lead to all sorts of hard-to-debug problems
				350	in the filesystem like having dirty inodes at umount and losing written data.
				351
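The ->writepage() rules above can be restated as a small userspace model.
This is only a sketch: ``struct model_page``, ``model_writepage()`` and the
other ``model_*`` names are hypothetical stand-ins, not kernel definitions.
The model assumes the page arrives locked with its dirty flag already
cleared by the caller, and it leaves WRITEPAGE_ACTIVATE out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
enum model_sync_mode { MODEL_SYNC_NONE, MODEL_SYNC_ALL };

struct model_page {
	bool locked;      /* models PageLocked() */
	bool dirty;       /* models the page's dirty flag */
	bool writeback;   /* models PageWriteback() */
	bool io_busy;     /* older I/O still in flight against this page */
};

/* Models redirty_page_for_writepage(): keep the page dirty for a later pass. */
static void model_redirty(struct model_page *p)
{
	p->dirty = true;
}

/*
 * Models the decision a ->writepage() instance makes.
 * Returns 0 in both branches; *submitted reports whether I/O was started.
 */
static int model_writepage(struct model_page *p, enum model_sync_mode mode,
			   bool *submitted)
{
	assert(p->locked);           /* the page is handed over locked */
	*submitted = false;

	if (p->io_busy && mode == MODEL_SYNC_NONE) {
		/* Memory cleansing: do not block, hand the page back dirty. */
		model_redirty(p);
		p->locked = false;
		return 0;
	}

	/* Sync, or nothing to wait for: wait for old I/O, then start new I/O. */
	p->io_busy = false;          /* stands in for waiting on the old I/O */
	p->writeback = true;         /* the set_page_writeback() step */
	p->locked = false;           /* unlock before returning */
	*submitted = true;
	return 0;
}

/* Models the I/O completion handler: the end_page_writeback() step. */
static void model_end_io(struct model_page *p)
{
	p->writeback = false;
}
```

In the model, a page with I/O in flight that is passed in with
``MODEL_SYNC_NONE`` comes back dirty and unlocked with no writeback flag set,
while the same page passed in with ``MODEL_SYNC_ALL`` comes back unlocked and
under writeback until ``model_end_io()`` runs.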
				352	->writepages() is used for periodic writeback and for syscall-initiated
				353	sync operations. The address_space should start I/O against at least
				354	``nr_to_write`` pages. ``nr_to_write`` must be decremented for each page
				355	which is written. The address_space implementation may write more (or less)
				356	pages than ``*nr_to_write`` asks for, but it should try to be reasonably close.
				357	If nr_to_write is NULL, all dirty pages must be written.
				358
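The ``nr_to_write`` accounting described above can be sketched in userspace C.
The names ``struct model_wbc`` and ``model_writepages()`` are hypothetical, the
budget is modelled as a plain ``long``, and "starting I/O" is reduced to
clearing a flag; this is an illustration under those assumptions, not how any
filesystem implements ->writepages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one writeback-control field used here. */
struct model_wbc {
	long nr_to_write;   /* pages the caller would like written */
};

/*
 * Walk an array of dirty flags, "write" the dirty pages, and decrement
 * nr_to_write once per page written.  Stops once the budget is used up.
 * Returns the number of pages written.
 */
static size_t model_writepages(bool *dirty, size_t npages,
			       struct model_wbc *wbc)
{
	size_t written = 0;

	assert(dirty != NULL && wbc != NULL);

	for (size_t i = 0; i < npages && wbc->nr_to_write > 0; i++) {
		if (!dirty[i])
			continue;           /* clean pages cost no budget */
		dirty[i] = false;           /* stands in for starting I/O */
		wbc->nr_to_write--;
		written++;
	}
	return written;
}
```

A caller can inspect the remaining ``nr_to_write`` afterwards to see how much
of its request was left unserved.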
				359	writepages should _only_ write pages which are present on
				360	mapping->io_pages.
				361
				362	->set_page_dirty() is called from various places in the kernel
				363	when the target page is marked as needing writeback. It may be called
				364	under spinlock (it cannot block) and is sometimes called with the page
				365	not locked.
				366
				367	->bmap() is currently used by legacy ioctl() (FIBMAP) provided by some
				368	filesystems and by the swapper. The latter will eventually go away. Please,
				369	keep it that way and don't breed new callers.
				370
				371	->invalidatepage() is called when the filesystem must attempt to drop
				372	some or all of the buffers from the page when it is being truncated. It
				373	returns zero on success. If ->invalidatepage is zero, the kernel uses
				374	block_invalidatepage() instead.
				375
				376	->releasepage() is called when the kernel is about to try to drop the
				377	buffers from the page in preparation for freeing it. It returns zero to
				378	indicate that the buffers are (or may be) freeable. If ->releasepage is zero,
				379	the kernel assumes that the fs has no private interest in the buffers.
				380
				381	->freepage() is called when the kernel is done dropping the page
				382	from the page cache.
				383
				384	->launder_page() may be called prior to releasing a page if
				385	it is still found to be dirty. It returns zero if the page was successfully
				386	cleaned, or an error value if not. Note that in order to prevent the page
				387	getting mapped back in and redirtied, it needs to be kept locked
				388	across the entire operation.
				389
				390	->swap_activate will be called with a non-zero argument on
				391	files backing (non block device backed) swapfiles. A return value
				392	of zero indicates success, in which case this file can be used for
				393	backing swapspace. The swapspace operations will be proxied to the
				394	address space operations.
				395
				396	->swap_deactivate() will be called in the sys_swapoff()
				397	path after ->swap_activate() returned success.
				398
				399	file_lock_operations
				400	====================
				401
				402	prototypes::
				403
				404		void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
				405		void (*fl_release_private)(struct file_lock *);
				406
				407
				408	locking rules:
				409
				410	=================== ============= =========
				411	ops                 inode->i_lock may block
				412	=================== ============= =========
				413	fl_copy_lock:       yes           no
				414	fl_release_private: maybe         maybe[1]_
				415	=================== ============= =========
				416
				417	.. [1]:
				418	   ->fl_release_private for flock or POSIX locks is currently allowed
				419	   to block. Leases however can still be freed while the i_lock is held and
				420	   so fl_release_private called on a lease should not block.
				421
				422	lock_manager_operations
				423	=======================
				424
				425	prototypes::
				426
				427		void (*lm_notify)(struct file_lock *);  /* unblock callback */
				428		int (*lm_grant)(struct file_lock *, struct file_lock *, int);
				429		void (*lm_break)(struct file_lock *); /* break_lease callback */
				430		int (*lm_change)(struct file_lock **, int);
				431
				432	locking rules:
				433
				434	========== ============= ================= =========
				435	ops        inode->i_lock blocked_lock_lock may block
				436	========== ============= ================= =========
				437	lm_notify: yes           yes               no
				438	lm_grant:  no            no                no
				439	lm_break:  yes           no                no
				440	lm_change: yes           no                no
				441	========== ============= ================= =========
				442
				443	buffer_head
				444	===========
				445
				446	prototypes::
				447
				448		void (*b_end_io)(struct buffer_head *bh, int uptodate);
				449
				450	locking rules:
				451
				452	Called from interrupts, so extreme care is needed here. The bh is locked,
				453	but that is the only guarantee we have. Currently only RAID1, highmem,
				454	fs/buffer.c, and fs/ntfs/aops.c provide these. Block devices call this
				455	method upon I/O completion.
				456
				457	block_device_operations
				458	=======================
				459	prototypes::
				460
				461		int (*open) (struct block_device *, fmode_t);
				462		int (*release) (struct gendisk *, fmode_t);
				463		int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
				464		int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
				465		int (*direct_access) (struct block_device *, sector_t, void **,
				466				unsigned long *);
				467		int (*media_changed) (struct gendisk *);
				468		void (*unlock_native_capacity) (struct gendisk *);
				469		int (*revalidate_disk) (struct gendisk *);
				470		int (*getgeo)(struct block_device *, struct hd_geometry *);
				471		void (*swap_slot_free_notify) (struct block_device *, unsigned long);
				472
				473	locking rules:
				474
				475	======================= ===================
				476	ops                     bd_mutex
				477	======================= ===================
				478	open:                   yes
				479	release:                yes
				480	ioctl:                  no
				481	compat_ioctl:           no
				482	direct_access:          no
				483	media_changed:          no
				484	unlock_native_capacity: no
				485	revalidate_disk:        no
				486	getgeo:                 no
				487	swap_slot_free_notify:  no (see below)
				488	======================= ===================
				489
				490	media_changed, unlock_native_capacity and revalidate_disk are called only from
				491	check_disk_change().
				492
				493	swap_slot_free_notify is called with swap_lock and sometimes the page lock
				494	held.
				495
				496
				497	file_operations
				498	===============
				499
				500	prototypes::
				501
				502		loff_t (*llseek) (struct file *, loff_t, int);
				503		ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
				504		ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
				505		ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
				506		ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
				507		int (*iterate) (struct file *, struct dir_context *);
				508		int (*iterate_shared) (struct file *, struct dir_context *);
				509		__poll_t (*poll) (struct file *, struct poll_table_struct *);
				510		long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
				511		long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
				512		int (*mmap) (struct file *, struct vm_area_struct *);
				513		int (*open) (struct inode *, struct file *);
				514		int (*flush) (struct file *);
				515		int (*release) (struct inode *, struct file *);
				516		int (*fsync) (struct file *, loff_t start, loff_t end, int datasync);
				517		int (*fasync) (int, struct file *, int);
				518		int (*lock) (struct file *, int, struct file_lock *);
				519		ssize_t (*readv) (struct file *, const struct iovec *, unsigned long,
				520				loff_t *);
				521		ssize_t (*writev) (struct file *, const struct iovec *, unsigned long,
				522				loff_t *);
				523		ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t,
				524				void __user *);
				525		ssize_t (*sendpage) (struct file *, struct page *, int, size_t,
				526				loff_t *, int);
				527		unsigned long (*get_unmapped_area)(struct file *, unsigned long,
				528				unsigned long, unsigned long, unsigned long);
				529		int (*check_flags)(int);
				530		int (*flock) (struct file *, int, struct file_lock *);
				531		ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *,
				532				size_t, unsigned int);
				533		ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *,
				534				size_t, unsigned int);
				535		int (*setlease)(struct file *, long, struct file_lock **, void **);
				536		long (*fallocate)(struct file *, int, loff_t, loff_t);
				537
				538	locking rules:
				539	All may block.
				540
				541	->llseek() locking has moved from llseek to the individual llseek
				542	implementations. If your fs is not using generic_file_llseek, you
				543	need to acquire and release the appropriate locks in your ->llseek().
				544	For many filesystems, it is probably safe to acquire the inode
				545	mutex or just to use i_size_read() instead.
				546	Note: this does not protect file->f_pos against concurrent modifications,
				547	since that is something userspace has to take care of.
				548
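As a rough illustration of an ->llseek() that does its own locking, the
following userspace sketch takes a lock around the size read and the position
update. Everything named ``model_*`` is hypothetical, the "lock" is a plain
flag rather than a real mutex, and offset overflow is ignored; it only shows
where the acquire and release sit relative to the work.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

struct model_inode {
	bool locked;        /* stands in for the inode lock */
	long long size;     /* file size protected by that lock */
};

struct model_file {
	struct model_inode *inode;
	long long pos;      /* models file->f_pos */
};

static void model_inode_lock(struct model_inode *i)
{
	assert(!i->locked);  /* the model lock is not recursive */
	i->locked = true;
}

static void model_inode_unlock(struct model_inode *i)
{
	i->locked = false;
}

/* Returns the new position, or -EINVAL for a bad whence or offset. */
static long long model_llseek(struct model_file *f, long long off, int whence)
{
	long long newpos;

	model_inode_lock(f->inode);     /* the method takes its own lock */
	switch (whence) {
	case SEEK_SET:
		newpos = off;
		break;
	case SEEK_CUR:
		newpos = f->pos + off;
		break;
	case SEEK_END:
		newpos = f->inode->size + off;
		break;
	default:
		newpos = -1;            /* unsupported whence */
		break;
	}
	if (newpos >= 0)
		f->pos = newpos;
	else
		newpos = -EINVAL;
	model_inode_unlock(f->inode);   /* and drops it before returning */
	return newpos;
}
```

As the note above says, a lock like this serialises the size read against
size changes but does nothing about two callers racing on the same file
position.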
				549	->iterate() is called with i_rwsem exclusive.
				550
				551	->iterate_shared() is called with i_rwsem at least shared.
				552
				553	->fasync() is responsible for maintaining the FASYNC bit in filp->f_flags.
				554	Most instances call fasync_helper(), which does that maintenance, so it's
				555	not normally something one needs to worry about. Return values > 0 will be
				556	mapped to zero in the VFS layer.
				557
				558	->readdir() and ->ioctl() on directories must be changed. Ideally we would
				559	move ->readdir() to inode_operations and use a separate method for directory
				560	->ioctl() or kill the latter completely. One of the problems is that for
				561	anything that resembles union-mount we won't have a struct file for all
				562	components. And there are other reasons why the current interface is a mess...
				563
				564	->read on directories probably must go away - we should just enforce -EISDIR
				565	in sys_read() and friends.
				566
				567	->setlease operations should call generic_setlease() before or after setting
				568	the lease within the individual filesystem to record the result of the
				569	operation.
				570
				571	dquot_operations
				572	================
				573
				574	prototypes::
				575
				576		int (*write_dquot) (struct dquot *);
				577		int (*acquire_dquot) (struct dquot *);
				578		int (*release_dquot) (struct dquot *);
				579		int (*mark_dirty) (struct dquot *);
				580		int (*write_info) (struct super_block *, int);
				581
				582	These operations are intended to be more or less wrapping functions that ensure
				583	a proper locking wrt the filesystem and call the generic quota operations.
				584
				585	What filesystem should expect from the generic quota functions:
				586
				587	============== ============ =========================
				588	ops            FS recursion Held locks when called
				589	============== ============ =========================
				590	write_dquot:   yes          dqonoff_sem or dqptr_sem
				591	acquire_dquot: yes          dqonoff_sem or dqptr_sem
				592	release_dquot: yes          dqonoff_sem or dqptr_sem
				593	mark_dirty:    no           -
				594	write_info:    yes          dqonoff_sem
				595	============== ============ =========================
				596
				597	FS recursion means calling ->quota_read() and ->quota_write() from superblock
				598	operations.
				599
				600	More details about quota locking can be found in fs/dquot.c.
				601
				602	vm_operations_struct
				603	====================
				604
				605	prototypes::
				606
				607		void (*open)(struct vm_area_struct *);
				608		void (*close)(struct vm_area_struct *);
				609		vm_fault_t (*fault)(struct vm_area_struct *, struct vm_fault *);
				610		vm_fault_t (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *);
				611		vm_fault_t (*pfn_mkwrite)(struct vm_area_struct *, struct vm_fault *);
				612		int (*access)(struct vm_area_struct *, unsigned long, void *, int, int);
				613
				614	locking rules:
				615
				616	============= ======== ===========================
				617	ops           mmap_sem PageLocked(page)
				618	============= ======== ===========================
				619	open:         yes
				620	close:        yes
				621	fault:        yes      can return with page locked
				622	map_pages:    yes
				623	page_mkwrite: yes      can return with page locked
				624	pfn_mkwrite:  yes
				625	access:       yes
				626	============= ======== ===========================
				627
				628	->fault() is called when a previously not present pte is about
				629	to be faulted in. The filesystem must find and return the page associated
				630	with the passed in "pgoff" in the vm_fault structure. If it is possible that
				631	the page may be truncated and/or invalidated, then the filesystem must lock
				632	the page, then ensure it is not already truncated (the page lock will block
				633	subsequent truncate), and then return with VM_FAULT_LOCKED, and the page
				634	locked. The VM will unlock the page.
				635
				636	->map_pages() is called when the VM asks to map easily accessible pages.
				637	The filesystem should find and map pages associated with offsets from "start_pgoff"
				638	till "end_pgoff". ->map_pages() is called with the page table locked and must
				639	not block. If it's not possible to reach a page without blocking,
				640	the filesystem should skip it. The filesystem should use do_set_pte() to set up
				641	the page table entry. A pointer to the entry associated with the page is passed in
				642	the "pte" field of the vm_fault structure. Pointers to entries for other offsets
				643	should be calculated relative to "pte".
				644
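The "relative to pte" arithmetic mentioned for ->map_pages() can be sketched
as follows. ``model_pte_t`` and ``model_pte_for()`` are hypothetical userspace
names, the page table is modelled as a flat array, and the window from
"start_pgoff" till "end_pgoff" is assumed to be inclusive at both ends.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long model_pte_t;   /* hypothetical stand-in for pte_t */

/*
 * Given the entry pointer that belongs to fault_pgoff, return the entry
 * pointer for another offset, or NULL if that offset lies outside the
 * [start_pgoff, end_pgoff] window.
 */
static model_pte_t *model_pte_for(model_pte_t *pte, unsigned long fault_pgoff,
				  unsigned long pgoff,
				  unsigned long start_pgoff,
				  unsigned long end_pgoff)
{
	assert(pte != NULL);

	if (pgoff < start_pgoff || pgoff > end_pgoff)
		return NULL;
	if (pgoff >= fault_pgoff)
		return pte + (pgoff - fault_pgoff);
	return pte - (fault_pgoff - pgoff);
}
```

With the faulting offset 13 sitting at index 3 of an eight-entry array, the
entry for offset 10 is index 0 and the entry for offset 17 is index 7.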
				645	->page_mkwrite() is called when a previously read-only pte is
				646	about to become writeable. The filesystem again must ensure that there are
				647	no truncate/invalidate races, and then return with the page locked. If
				648	the page has been truncated, the filesystem should not look up a new page
				649	like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which
				650	will cause the VM to retry the fault.
				651
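The truncate check that ->fault() and ->page_mkwrite() share, and the point
where they differ, can be modelled in userspace. All ``model_*`` and
``MODEL_*`` names are hypothetical, the page lock is a plain flag, and
"truncated" is modelled as the page no longer pointing at its mapping; treat
it as a sketch of the ordering only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; the names and values are not the kernel's. */
enum model_ret {
	MODEL_LOCKED,      /* return with the page locked */
	MODEL_LOOKUP_NEW,  /* ->fault(): go and find the current page */
	MODEL_NOPAGE       /* ->page_mkwrite(): let the VM retry the fault */
};

struct model_page {
	bool locked;
	const void *mapping;   /* NULL once the page has been truncated */
};

/* Lock the page first, then check that it still belongs to the mapping. */
static bool model_lock_and_check(struct model_page *page, const void *mapping)
{
	assert(!page->locked);
	page->locked = true;        /* a later truncate now has to wait */
	if (page->mapping == mapping)
		return true;
	page->locked = false;       /* already truncated: drop the lock */
	return false;
}

static enum model_ret model_fault(struct model_page *page,
				  const void *mapping)
{
	return model_lock_and_check(page, mapping) ? MODEL_LOCKED
						   : MODEL_LOOKUP_NEW;
}

static enum model_ret model_page_mkwrite(struct model_page *page,
					 const void *mapping)
{
	/* Unlike the fault path, no replacement page is looked up here. */
	return model_lock_and_check(page, mapping) ? MODEL_LOCKED
						   : MODEL_NOPAGE;
}
```

Both paths take the lock before testing for truncation; checking first and
locking afterwards would leave a window in which truncate could slip in.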
				652	->pfn_mkwrite() is the same as page_mkwrite but when the pte is
				653	VM_PFNMAP or VM_MIXEDMAP with a page-less entry. Expected return is
				654	VM_FAULT_NOPAGE or one of the VM_FAULT_ERROR types. The default behavior
				655	after this call is to make the pte read-write, unless pfn_mkwrite returns
				656	an error.
				657
				658	->access() is called when get_user_pages() fails in
				659	access_process_vm(), typically used to debug a process through
				660	/proc/pid/mem or ptrace. This function is needed only for
				661	VM_IO \| VM_PFNMAP VMAs.
				662
				663	--------------------------------------------------------------------------------
				664
				665	Dubious stuff
				666
				667	(if you break something or notice that it is broken and do not fix it yourself
				668	- at least put it here)