| rjw | 1f88458 | 2022-01-06 17:20:42 +0800 | 1 | Changes since 2.5.0: | 
 | 2 |  | 
 | 3 | --- | 
 | 4 | [recommended] | 
 | 5 |  | 
 | 6 | New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(), | 
 | 7 | 	sb_set_blocksize() and sb_min_blocksize(). | 
 | 8 |  | 
 | 9 | Use them. | 
 | 10 |  | 
 | 11 | (sb_find_get_block() replaces 2.4's get_hash_table()) | 
 | 12 |  | 
 | 13 | --- | 
 | 14 | [recommended] | 
 | 15 |  | 
 | 16 | New methods: ->alloc_inode() and ->destroy_inode(). | 
 | 17 |  | 
 | 18 | Remove inode->u.foo_inode_i | 
 | 19 | Declare | 
 | 20 | 	struct foo_inode_info { | 
 | 21 | 		/* fs-private stuff */ | 
 | 22 | 		struct inode vfs_inode; | 
 | 23 | 	}; | 
 | 24 | 	static inline struct foo_inode_info *FOO_I(struct inode *inode) | 
 | 25 | 	{ | 
 | 26 | 		return container_of(inode, struct foo_inode_info, vfs_inode); | 
 | 27 | 	} | 
 | 28 |  | 
 | 29 | Use FOO_I(inode) instead of &inode->u.foo_inode_i; | 
 | 30 |  | 
 | 31 | Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate | 
 | 32 | foo_inode_info and return the address of ->vfs_inode, the latter should free | 
 | 33 | FOO_I(inode) (see in-tree filesystems for examples). | 
 | 34 |  | 
 | 35 | Make them ->alloc_inode and ->destroy_inode in your super_operations. | 
 | 36 |  | 
 | 37 | Keep in mind that now you need explicit initialization of private data | 
 | 38 | typically between calling iget_locked() and unlocking the inode. | 
 | 39 |  | 
 | 40 | At some point that will become mandatory. | 
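
The embedding pattern described above can be tried out in user space. What
follows is a minimal sketch, not kernel code: struct inode is a one-field
stand-in, calloc()/free() replace whatever allocator a real filesystem would
use, container_of is defined locally, and private_flags is a made-up field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct inode; the single field is illustrative. */
struct inode {
	unsigned long i_ino;
};

/* Local definition for this sketch; the kernel has its own container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo_inode_info {
	int private_flags;		/* fs-private stuff (hypothetical) */
	struct inode vfs_inode;		/* embedded VFS inode */
};

static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* Allocate the containing object and hand back the embedded inode. */
static struct inode *foo_alloc_inode(void)
{
	struct foo_inode_info *fi = calloc(1, sizeof(*fi));

	if (!fi)
		return NULL;
	return &fi->vfs_inode;
}

/* Free the containing object, not the embedded inode itself. */
static void foo_destroy_inode(struct inode *inode)
{
	assert(inode != NULL);
	free(FOO_I(inode));
}
```

The point of the layout is that the VFS only ever sees &fi->vfs_inode, while
the filesystem gets back to its private data with FOO_I() at no extra cost.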
 | 41 |  | 
 | 42 | --- | 
 | 43 | [mandatory] | 
 | 44 |  | 
 | 45 | Change of file_system_type method (->read_super to ->get_sb) | 
 | 46 |  | 
 | 47 | ->read_super() is no more.  Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV. | 
 | 48 |  | 
 | 49 | Turn your foo_read_super() into a function that would return 0 in case of | 
 | 50 | success and negative number in case of error (-EINVAL unless you have more | 
 | 51 | informative error value to report).  Call it foo_fill_super().  Now declare | 
 | 52 |  | 
 | 53 | int foo_get_sb(struct file_system_type *fs_type, | 
 | 54 | 	int flags, const char *dev_name, void *data, struct vfsmount *mnt) | 
 | 55 | { | 
 | 56 | 	return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super, | 
 | 57 | 			   mnt); | 
 | 58 | } | 
 | 59 |  | 
 | 60 | (or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of | 
 | 61 | filesystem). | 
 | 62 |  | 
 | 63 | Replace DECLARE_FSTYPE... with explicit initializer and have ->get_sb set as | 
 | 64 | foo_get_sb. | 
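
The return convention for foo_fill_super() can be illustrated with a small
user-space mock. This is only a sketch under stated assumptions: struct
super_block is a stand-in, FOO_MAGIC and the use of the data pointer as an
"on-disk magic" are invented for the example, and mock_get_sb() merely forwards
to the callback instead of doing what get_sb_bdev() really does.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for struct super_block; the field is illustrative. */
struct super_block {
	unsigned long s_magic;
};

#define FOO_MAGIC 0x464f4fUL	/* hypothetical magic number */

/*
 * Former foo_read_super(): return 0 on success and a negative errno on
 * failure instead of returning the superblock or NULL.
 */
static int foo_fill_super(struct super_block *sb, void *data, int silent)
{
	const unsigned long *on_disk_magic = data;

	(void)silent;
	if (!on_disk_magic || *on_disk_magic != FOO_MAGIC)
		return -EINVAL;	/* nothing more informative to report */
	sb->s_magic = *on_disk_magic;
	return 0;
}

/* Mock of a get_sb_*() helper: it only forwards to the callback. */
static int mock_get_sb(void *data,
		       int (*fill_super)(struct super_block *, void *, int),
		       struct super_block *sb)
{
	return fill_super(sb, data, 0);
}
```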
 | 65 |  | 
 | 66 | --- | 
 | 67 | [mandatory] | 
 | 68 |  | 
 | 69 | Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames. | 
 | 70 | Most likely there is no need to change anything, but if you relied on | 
 | 71 | global exclusion between renames for some internal purpose - you need to | 
 | 72 | change your internal locking.  Otherwise exclusion guarantees remain the | 
 | 73 | same (i.e. parents and victim are locked, etc.). | 
 | 74 |  | 
 | 75 | --- | 
 | 76 | [informational] | 
 | 77 |  | 
 | 78 | Now we have the exclusion between ->lookup() and directory removal (by | 
 | 79 | ->rmdir() and ->rename()).  If you used to need that exclusion and provided | 
 | 80 | it by internal locking (most filesystems couldn't care less), you | 
 | 81 | can relax your locking. | 
 | 82 |  | 
 | 83 | --- | 
 | 84 | [mandatory] | 
 | 85 |  | 
 | 86 | ->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(), | 
 | 87 | ->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename() | 
 | 88 | and ->readdir() are called without BKL now.  Grab it on entry, drop upon return | 
 | 89 | - that will guarantee the same locking you used to have.  If your method or its | 
 | 90 | parts do not need BKL - better yet, now you can shift lock_kernel() and | 
 | 91 | unlock_kernel() so that they would protect exactly what needs to be | 
 | 92 | protected. | 
 | 93 |  | 
 | 94 | --- | 
 | 95 | [mandatory] | 
 | 96 |  | 
 | 97 | BKL is also moved from around sb operations. BKL should have been shifted into | 
 | 98 | individual fs sb_op functions.  If you don't need it, remove it. | 
 | 99 |  | 
 | 100 | --- | 
 | 101 | [informational] | 
 | 102 |  | 
 | 103 | check for ->link() target not being a directory is done by callers.  Feel | 
 | 104 | free to drop it... | 
 | 105 |  | 
 | 106 | --- | 
 | 107 | [informational] | 
 | 108 |  | 
 | 109 | ->link() callers hold ->i_mutex on the object we are linking to.  Some of your | 
 | 110 | problems might be over... | 
 | 111 |  | 
 | 112 | --- | 
 | 113 | [mandatory] | 
 | 114 |  | 
 | 115 | new file_system_type method - kill_sb(superblock).  If you are converting | 
 | 116 | an existing filesystem, set it according to ->fs_flags: | 
 | 117 | 	FS_REQUIRES_DEV		-	kill_block_super | 
 | 118 | 	FS_LITTER		-	kill_litter_super | 
 | 119 | 	neither			-	kill_anon_super | 
 | 120 | FS_LITTER is gone - just remove it from fs_flags. | 
 | 121 |  | 
 | 122 | --- | 
 | 123 | [mandatory] | 
 | 124 |  | 
 | 125 | 	FS_SINGLE is gone (actually, that had happened back when ->get_sb() | 
 | 126 | went in - and hadn't been documented ;-/).  Just remove it from fs_flags | 
 | 127 | (and see ->get_sb() entry for other actions). | 
 | 128 |  | 
 | 129 | --- | 
 | 130 | [mandatory] | 
 | 131 |  | 
 | 132 | ->setattr() is called without BKL now.  Caller _always_ holds ->i_mutex, so | 
 | 133 | watch for ->i_mutex-grabbing code that might be used by your ->setattr(). | 
 | 134 | Callers of notify_change() need ->i_mutex now. | 
 | 135 |  | 
 | 136 | --- | 
 | 137 | [recommended] | 
 | 138 |  | 
 | 139 | New super_block field "struct export_operations *s_export_op" for | 
 | 140 | explicit support for exporting, e.g. via NFS.  The structure is fully | 
 | 141 | documented at its declaration in include/linux/fs.h, and in | 
 | 142 | Documentation/filesystems/nfs/Exporting. | 
 | 143 |  | 
 | 144 | Briefly it allows for the definition of decode_fh and encode_fh operations | 
 | 145 | to encode and decode filehandles, and allows the filesystem to use | 
 | 146 | a standard helper function for decode_fh, and provide file-system specific | 
 | 147 | support for this helper, particularly get_parent. | 
 | 148 |  | 
 | 149 | It is planned that this will be required for exporting once the code | 
 | 150 | settles down a bit. | 
 | 151 |  | 
 | 152 | [mandatory] | 
 | 153 |  | 
 | 154 | s_export_op is now required for exporting a filesystem. | 
 | 155 | isofs, ext2, ext3, reiserfs, fat | 
 | 156 | can be used as examples of very different filesystems. | 
 | 157 |  | 
 | 158 | --- | 
 | 159 | [mandatory] | 
 | 160 |  | 
 | 161 | iget4() and the read_inode2 callback have been superseded by iget5_locked() | 
 | 162 | which has the following prototype: | 
 | 163 |  | 
 | 164 |     struct inode *iget5_locked(struct super_block *sb, unsigned long ino, | 
 | 165 | 				int (*test)(struct inode *, void *), | 
 | 166 | 				int (*set)(struct inode *, void *), | 
 | 167 | 				void *data); | 
 | 168 |  | 
 | 169 | 'test' is an additional function that can be used when the inode | 
 | 170 | number is not sufficient to identify the actual file object. 'set' | 
 | 171 | should be a non-blocking function that initializes those parts of a | 
 | 172 | newly created inode to allow the test function to succeed. 'data' is | 
 | 173 | passed as an opaque value to both test and set functions. | 
 | 174 |  | 
 | 175 | When the inode has been created by iget5_locked(), it will be returned with the | 
 | 176 | I_NEW flag set and will still be locked.  The filesystem then needs to finalize | 
 | 177 | the initialization. Once the inode is initialized it must be unlocked by | 
 | 178 | calling unlock_new_inode(). | 
 | 179 |  | 
 | 180 | The filesystem is responsible for setting (and possibly testing) i_ino | 
 | 181 | when appropriate. There is also a simpler iget_locked function that | 
 | 182 | just takes the superblock and inode number as arguments and does the | 
 | 183 | test and set for you. | 
 | 184 |  | 
 | 185 | e.g. | 
 | 186 | 	inode = iget_locked(sb, ino); | 
 | 187 | 	if (inode->i_state & I_NEW) { | 
 | 188 | 		err = read_inode_from_disk(inode); | 
 | 189 | 		if (err < 0) { | 
 | 190 | 			iget_failed(inode); | 
 | 191 | 			return err; | 
 | 192 | 		} | 
 | 193 | 		unlock_new_inode(inode); | 
 | 194 | 	} | 
 | 195 |  | 
 | 196 | Note that if the process of setting up a new inode fails, then iget_failed() | 
 | 197 | should be called on the inode to render it dead, and an appropriate error | 
 | 198 | should be passed back to the caller. | 
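
The roles of the test and set callbacks can be seen in a user-space mock of
the lookup-or-create logic. This is a sketch, not the kernel implementation:
mock_iget5_locked() keeps inodes on a plain linked list, the I_NEW value is
illustrative, there is no real locking, and struct foo_key with its generation
field is a hypothetical example of identity that the inode number alone does
not capture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define I_NEW 0x1UL	/* illustrative value */

struct inode {
	unsigned long i_ino;
	unsigned long i_state;
	unsigned long generation;	/* hypothetical extra identity */
	unsigned long hash;		/* mock: key the inode was hashed under */
	struct inode *next;
};

struct foo_key {
	unsigned long ino;
	unsigned long generation;
};

static struct inode *mock_cache;	/* singly linked mock inode cache */

static int foo_test(struct inode *inode, void *data)
{
	struct foo_key *key = data;

	return inode->i_ino == key->ino &&
	       inode->generation == key->generation;
}

/* Initialize exactly the fields foo_test() looks at; must not block. */
static int foo_set(struct inode *inode, void *data)
{
	struct foo_key *key = data;

	inode->i_ino = key->ino;
	inode->generation = key->generation;
	return 0;
}

/* Mock of iget5_locked(): find by hash + test(), else create + set(). */
static struct inode *mock_iget5_locked(unsigned long hashval,
				       int (*test)(struct inode *, void *),
				       int (*set)(struct inode *, void *),
				       void *data)
{
	struct inode *inode;

	assert(test && set);
	for (inode = mock_cache; inode; inode = inode->next)
		if (inode->hash == hashval && test(inode, data))
			return inode;

	inode = calloc(1, sizeof(*inode));
	if (!inode)
		return NULL;
	if (set(inode, data)) {
		free(inode);
		return NULL;
	}
	inode->hash = hashval;
	inode->i_state = I_NEW;		/* caller must finish initialization */
	inode->next = mock_cache;
	mock_cache = inode;
	return inode;
}

/* Mock of unlock_new_inode(): initialization is complete. */
static void mock_unlock_new_inode(struct inode *inode)
{
	inode->i_state &= ~I_NEW;
}
```

Two keys with the same inode number but different generations end up as two
distinct inodes, which is the case the extra test function exists for.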
 | 199 |  | 
 | 200 | --- | 
 | 201 | [recommended] | 
 | 202 |  | 
 | 203 | ->getattr() is finally getting used.  See instances in nfs, minix, etc. | 
 | 204 |  | 
 | 205 | --- | 
 | 206 | [mandatory] | 
 | 207 |  | 
 | 208 | ->revalidate() is gone.  If your filesystem had it - provide ->getattr() | 
 | 209 | and let it call whatever you had as ->revalidate() + (for symlinks that | 
 | 210 | had ->revalidate()) add calls in ->follow_link()/->readlink(). | 
 | 211 |  | 
 | 212 | --- | 
 | 213 | [mandatory] | 
 | 214 |  | 
 | 215 | ->d_parent changes are not protected by BKL anymore.  Read access is safe | 
 | 216 | if at least one of the following is true: | 
 | 217 | 	* filesystem has no cross-directory rename() | 
 | 218 | 	* we know that parent had been locked (e.g. we are looking at | 
 | 219 | ->d_parent of ->lookup() argument). | 
 | 220 | 	* we are called from ->rename(). | 
 | 221 | 	* the child's ->d_lock is held | 
 | 222 | Audit your code and add locking if needed.  Notice that any place that is | 
 | 223 | not protected by the conditions above is risky even in the old tree - you | 
 | 224 | had been relying on BKL and that's prone to screwups.  Old tree had quite | 
 | 225 | a few holes of that kind - unprotected access to ->d_parent leading to | 
 | 226 | anything from oops to silent memory corruption. | 
 | 227 |  | 
 | 228 | --- | 
 | 229 | [mandatory] | 
 | 230 |  | 
 | 231 | 	FS_NOMOUNT is gone.  If you use it - just set SB_NOUSER in flags | 
 | 232 | (see rootfs for one kind of solution and bdev/socket/pipe for another). | 
 | 233 |  | 
 | 234 | --- | 
 | 235 | [recommended] | 
 | 236 |  | 
 | 237 | 	Use bdev_read_only(bdev) instead of is_read_only(kdev).  The latter | 
 | 238 | is still alive, but only because of the mess in drivers/s390/block/dasd.c. | 
 | 239 | As soon as it gets fixed is_read_only() will die. | 
 | 240 |  | 
 | 241 | --- | 
 | 242 | [mandatory] | 
 | 243 |  | 
 | 244 | ->permission() is called without BKL now. Grab it on entry, drop upon | 
 | 245 | return - that will guarantee the same locking you used to have.  If | 
 | 246 | your method or its parts do not need BKL - better yet, now you can | 
 | 247 | shift lock_kernel() and unlock_kernel() so that they would protect | 
 | 248 | exactly what needs to be protected. | 
 | 249 |  | 
 | 250 | --- | 
 | 251 | [mandatory] | 
 | 252 |  | 
 | 253 | ->statfs() is now called without BKL held.  BKL should have been | 
 | 254 | shifted into individual fs sb_op functions where it's not clear that | 
 | 255 | it's safe to remove it.  If you don't need it, remove it. | 
 | 256 |  | 
 | 257 | --- | 
 | 258 | [mandatory] | 
 | 259 |  | 
 | 260 | 	is_read_only() is gone; use bdev_read_only() instead. | 
 | 261 |  | 
 | 262 | --- | 
 | 263 | [mandatory] | 
 | 264 |  | 
 | 265 | 	destroy_buffers() is gone; use invalidate_bdev(). | 
 | 266 |  | 
 | 267 | --- | 
 | 268 | [mandatory] | 
 | 269 |  | 
 | 270 | 	fsync_dev() is gone; use fsync_bdev().  NOTE: lvm breakage is | 
 | 271 | deliberate; as soon as struct block_device * is propagated in a reasonable | 
 | 272 | way by that code fixing will become trivial; until then nothing can be | 
 | 273 | done. | 
 | 274 |  | 
 | 275 | [mandatory] | 
 | 276 |  | 
 | 277 | 	block truncation on error exit from ->write_begin and ->direct_IO has | 
 | 278 | moved from generic methods (block_write_begin, cont_write_begin, | 
 | 279 | nobh_write_begin, blockdev_direct_IO*) to callers.  Take a look at | 
 | 280 | ext2_write_failed and callers for an example. | 
 | 281 |  | 
 | 282 | [mandatory] | 
 | 283 |  | 
 | 284 | 	->truncate is gone.  The whole truncate sequence needs to be | 
 | 285 | implemented in ->setattr, which is now mandatory for filesystems | 
 | 286 | implementing on-disk size changes.  Start with a copy of the old inode_setattr | 
 | 287 | and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence to | 
 | 288 | be in order of zeroing blocks using block_truncate_page or similar helpers, | 
 | 289 | size update and finally on-disk truncation, which should not fail. | 
 | 290 | setattr_prepare (which used to be inode_change_ok) now includes the size checks | 
 | 291 | for ATTR_SIZE and must be called in the beginning of ->setattr unconditionally. | 
 | 292 |  | 
 | 293 | [mandatory] | 
 | 294 |  | 
 | 295 | 	->clear_inode() and ->delete_inode() are gone; ->evict_inode() should | 
 | 296 | be used instead.  It gets called whenever the inode is evicted, whether it has | 
 | 297 | remaining links or not.  Caller does *not* evict the pagecache or inode-associated | 
 | 298 | metadata buffers; the method has to use truncate_inode_pages_final() to get rid | 
 | 299 | of those. Caller makes sure async writeback cannot be running for the inode while | 
 | 300 | (or after) ->evict_inode() is called. | 
 | 301 |  | 
 | 302 | 	->drop_inode() returns int now; it's called on final iput() with | 
 | 303 | inode->i_lock held and it returns true if the filesystem wants the inode to be | 
 | 304 | dropped.  As before, generic_drop_inode() is still the default and it's been | 
 | 305 | updated appropriately.  generic_delete_inode() is also alive and it consists | 
 | 306 | simply of return 1.  Note that all actual eviction work is done by caller after | 
 | 307 | ->drop_inode() returns. | 
 | 308 |  | 
 | 309 | 	As before, clear_inode() must be called exactly once on each call of | 
 | 310 | ->evict_inode() (as it used to be for each call of ->delete_inode()).  Unlike | 
 | 311 | before, if you are using inode-associated metadata buffers (i.e. | 
 | 312 | mark_buffer_dirty_inode()), it's your responsibility to call | 
 | 313 | invalidate_inode_buffers() before clear_inode(). | 
 | 314 |  | 
 | 315 | 	NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out | 
 | 316 | if it's zero is not *and* *never* *had* *been* enough.  Final unlink() and iput() | 
 | 317 | may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly | 
 | 318 | free the on-disk inode, you may end up doing that while ->write_inode() is writing | 
 | 319 | to it. | 
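
The int-returning ->drop_inode() convention can be sketched in user space.
This is a mock under stated assumptions, not the kernel source: struct inode
is a stand-in, the unhashed field replaces a real hash-membership check, and
the policy in mock_generic_drop_inode() (drop when there are no links left or
the inode is unhashed) is my reading of the generic helper rather than
something this document spells out.

```c
/* Stand-in for struct inode; both fields are illustrative. */
struct inode {
	unsigned int i_nlink;
	int unhashed;		/* mock replacement for a hash-membership check */
};

/* Assumed default policy: drop when unlinked or no longer hashed. */
static int mock_generic_drop_inode(struct inode *inode)
{
	return !inode->i_nlink || inode->unhashed;
}

/* As described above: consists simply of "return 1". */
static int mock_generic_delete_inode(struct inode *inode)
{
	(void)inode;
	return 1;
}
```

Either function only answers the question "should this inode be dropped?";
the eviction itself happens in the caller after the method returns.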
 | 320 |  | 
 | 321 | --- | 
 | 322 | [mandatory] | 
 | 323 |  | 
 | 324 | 	.d_delete() now only advises the dcache as to whether or not to cache | 
 | 325 | unreferenced dentries, and is now only called when the dentry refcount goes to | 
 | 326 | 0. Even on 0 refcount transition, it must be able to tolerate being called 0, | 
 | 327 | 1, or more times (e.g. constant, idempotent). | 
 | 328 |  | 
 | 329 | --- | 
 | 330 | [mandatory] | 
 | 331 |  | 
 | 332 | 	.d_compare() calling convention and locking rules are significantly | 
 | 333 | changed. Read updated documentation in Documentation/filesystems/vfs.txt (and | 
 | 334 | look at examples of other filesystems) for guidance. | 
 | 335 |  | 
 | 336 | --- | 
 | 337 | [mandatory] | 
 | 338 |  | 
 | 339 | 	.d_hash() calling convention and locking rules are significantly | 
 | 340 | changed. Read updated documentation in Documentation/filesystems/vfs.txt (and | 
 | 341 | look at examples of other filesystems) for guidance. | 
 | 342 |  | 
 | 343 | --- | 
 | 344 | [mandatory] | 
 | 345 | 	dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c | 
 | 346 | for details of what locks to replace dcache_lock with in order to protect | 
 | 347 | particular things. Most of the time, a filesystem only needs ->d_lock, which | 
 | 348 | protects *all* the dcache state of a given dentry. | 
 | 349 |  | 
 | 350 | -- | 
 | 351 | [mandatory] | 
 | 352 |  | 
 | 353 | 	Filesystems must RCU-free their inodes, if they can have been accessed | 
 | 354 | via rcu-walk path walk (basically, if the file can have had a path name in the | 
 | 355 | vfs namespace). | 
 | 356 |  | 
 | 357 | 	Even though i_dentry and i_rcu share storage in a union, we will | 
 | 358 | initialize the former in inode_init_always(), so just leave it alone in | 
 | 359 | the callback.  It used to be necessary to clean it there, but not anymore | 
 | 360 | (starting at 3.2). | 
 | 361 |  | 
 | 362 | -- | 
 | 363 | [recommended] | 
 | 364 | 	vfs now tries to do path walking in "rcu-walk mode", which avoids | 
 | 365 | atomic operations and scalability hazards on dentries and inodes (see | 
 | 366 | Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes | 
 | 367 | (above) are examples of the changes required to support this. For more complex | 
 | 368 | filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so | 
 | 369 | no changes are required to the filesystem. However, this is costly and loses | 
 | 370 | the benefits of rcu-walk mode. We will begin to add filesystem callbacks that | 
 | 371 | are rcu-walk aware, shown below. Filesystems should take advantage of this | 
 | 372 | where possible. | 
 | 373 |  | 
 | 374 | -- | 
 | 375 | [mandatory] | 
 | 376 | 	d_revalidate is a callback that is made on every path element (if | 
 | 377 | the filesystem provides it), which used to require dropping out of rcu-walk mode. This | 
 | 378 | may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be | 
 | 379 | returned if the filesystem cannot handle rcu-walk. See | 
 | 380 | Documentation/filesystems/vfs.txt for more details. | 
 | 381 |  | 
 | 382 | 	permission is an inode permission check that is called on many or all | 
 | 383 | directory inodes on the way down a path walk (to check for exec permission). It | 
 | 384 | must now be rcu-walk aware (mask & MAY_NOT_BLOCK).  See | 
 | 385 | Documentation/filesystems/vfs.txt for more details. | 
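
A rough user-space sketch of an rcu-walk aware permission check follows. It is
not taken from any in-tree filesystem: struct inode is a stand-in, the
attrs_cached and allow_exec fields are hypothetical, the flag values are
illustrative, and the assumption is that returning -ECHILD makes the VFS retry
outside rcu-walk mode, as it does for d_revalidate.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative values for this sketch. */
#define MAY_EXEC	0x01
#define MAY_NOT_BLOCK	0x80

struct inode {
	int attrs_cached;	/* hypothetical: permission bits are in core */
	int allow_exec;		/* hypothetical: exec/search is permitted */
};

static int foo_permission(struct inode *inode, int mask)
{
	assert(inode != NULL);
	if (!inode->attrs_cached) {
		/* Refreshing would sleep (e.g. a round trip to a server). */
		if (mask & MAY_NOT_BLOCK)
			return -ECHILD;	/* cannot do this in rcu-walk mode */
		inode->attrs_cached = 1;	/* stands in for the blocking refresh */
	}
	if ((mask & MAY_EXEC) && !inode->allow_exec)
		return -EACCES;
	return 0;
}
```

The fast path (attributes already cached) never blocks, so it can answer in
rcu-walk mode; only the slow path bails out.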
 | 386 |   | 
 | 387 | -- | 
 | 388 | [mandatory] | 
 | 389 | 	In ->fallocate() you must check the mode option passed in.  If your | 
 | 390 | filesystem does not support hole punching (deallocating space in the middle of a | 
 | 391 | file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode. | 
 | 392 | Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set, | 
 | 393 | so the i_size should not change when hole punching, even when punching the end of | 
 | 394 | a file off. | 
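
The mode check can be sketched as a standalone helper. This is an illustration
for a hypothetical filesystem without hole punching; the flag values are
defined locally (they are intended to match <linux/falloc.h>) so that the
sketch compiles anywhere, and rejecting every other unknown bit is my choice,
not something this document requires.

```c
#include <errno.h>

/* Defined locally to keep the sketch portable. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif

/*
 * Mode check for a filesystem that cannot punch holes.
 * Returns 0 if the mode is acceptable, a negative errno otherwise.
 */
static int foo_fallocate_check_mode(int mode)
{
	if (mode & FALLOC_FL_PUNCH_HOLE)
		return -EOPNOTSUPP;
	if (mode & ~FALLOC_FL_KEEP_SIZE)
		return -EOPNOTSUPP;	/* unknown bits: refuse as well */
	return 0;
}
```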
 | 395 |  | 
 | 396 | -- | 
 | 397 | [mandatory] | 
 | 398 | 	->get_sb() is gone.  Switch to use of ->mount().  Typically it's just | 
 | 399 | a matter of switching from calling get_sb_... to mount_... and changing the | 
 | 400 | function type.  If you were doing it manually, just switch from setting ->mnt_root | 
 | 401 | to some pointer to returning that pointer.  On errors return ERR_PTR(...). | 
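
The "return the pointer, or ERR_PTR(...) on error" convention can be tried in
user space. The helpers below are local re-creations written for this sketch
(they assume pointers and long have the same width, as on Linux), struct dentry
is a stand-in, and foo_mount() with its dev_name check is purely hypothetical.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in a pointer value and decode it again. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct dentry {
	const char *name;	/* stand-in field */
};

static struct dentry foo_root = { "/" };

/* Shape of a ->mount() instance: the root dentry or an encoded error. */
static struct dentry *foo_mount(const char *dev_name)
{
	if (!dev_name || !*dev_name)
		return ERR_PTR(-EINVAL);
	return &foo_root;
}
```

A caller checks the result with IS_ERR() and extracts the error with PTR_ERR();
NULL is never used to signal failure.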
 | 402 |  | 
 | 403 | -- | 
 | 404 | [mandatory] | 
 | 405 | 	->permission() and generic_permission() have lost flags | 
 | 406 | argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask. | 
 | 407 | 	generic_permission() has also lost the check_acl argument; ACL checking | 
 | 408 | has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl | 
 | 409 | to read an ACL from disk. | 
 | 410 |  | 
 | 411 | -- | 
 | 412 | [mandatory] | 
 | 413 | 	If you implement your own ->llseek() you must handle SEEK_HOLE and | 
 | 414 | SEEK_DATA.  You can handle this by returning -EINVAL, but it would be nicer to | 
 | 415 | support it in some way.  The generic handler assumes that the entire file is | 
 | 416 | data and there is a virtual hole at the end of the file.  So if the provided | 
 | 417 | offset is less than i_size and SEEK_DATA is specified, return the same offset. | 
 | 418 | If the above is true for the offset and you are given SEEK_HOLE, return the end | 
 | 419 | of the file.  If the offset is i_size or greater return -ENXIO in either case. | 
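
The generic policy described above ("the whole file is data, with a virtual
hole at the end") fits in a few lines. The function below is a user-space
sketch with a made-up name and a simplified signature (plain offsets instead
of a struct file); SEEK_DATA and SEEK_HOLE are defined locally in case the
libc headers do not expose them.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Returns the resulting offset, or a negative errno.
 * Only SEEK_DATA and SEEK_HOLE are handled here.
 */
static long long foo_seek_hole_data(long long offset, long long i_size,
				    int whence)
{
	assert(i_size >= 0);
	if (whence != SEEK_DATA && whence != SEEK_HOLE)
		return -EINVAL;
	if (offset < 0 || offset >= i_size)
		return -ENXIO;
	/* Data starts right here; the only hole is the one at EOF. */
	return whence == SEEK_DATA ? offset : i_size;
}
```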
 | 420 |  | 
 | 421 | [mandatory] | 
 | 422 | 	If you have your own ->fsync() you must make sure to call | 
 | 423 | filemap_write_and_wait_range() so that all dirty pages are synced out properly. | 
 | 424 | You must also keep in mind that ->fsync() is not called with i_mutex held | 
 | 425 | anymore, so if you require i_mutex locking you must make sure to take it and | 
 | 426 | release it yourself. | 
 | 427 |  | 
 | 428 | -- | 
 | 429 | [mandatory] | 
 | 430 | 	d_alloc_root() is gone, along with a lot of bugs caused by code | 
 | 431 | misusing it.  Replacement: d_make_root(inode).  The difference is, | 
 | 432 | d_make_root() drops the reference to inode if dentry allocation fails.   | 
 | 433 |  | 
 | 434 | -- | 
 | 435 | [mandatory] | 
 | 436 | 	The witch is dead!  Well, 2/3 of it, anyway.  ->d_revalidate() and | 
 | 437 | ->lookup() do *not* take struct nameidata anymore; just the flags. | 
 | 438 | -- | 
 | 439 | [mandatory] | 
 | 440 | 	->create() doesn't take struct nameidata *; unlike the previous | 
 | 441 | two, it gets "is it an O_EXCL or equivalent?" boolean argument.  Note that | 
 | 442 | local filesystems can ignore that argument - they are guaranteed that the | 
 | 443 | object doesn't exist.  It's remote/distributed ones that might care... | 
 | 444 | -- | 
 | 445 | [mandatory] | 
 | 446 | 	FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate() | 
 | 447 | in your dentry operations instead. | 
 | 448 | -- | 
 | 449 | [mandatory] | 
 | 450 | 	vfs_readdir() is gone; switch to iterate_dir() instead | 
 | 451 | -- | 
 | 452 | [mandatory] | 
 | 453 | 	->readdir() is gone now; switch to ->iterate() | 
 | 454 | [mandatory] | 
 | 455 | 	vfs_follow_link has been removed.  Filesystems must use nd_set_link | 
 | 456 | 	from ->follow_link for normal symlinks, or nd_jump_link for magic | 
 | 457 | 	/proc/<pid> style links. | 
 | 458 | -- | 
 | 459 | [mandatory] | 
 | 460 | 	iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be | 
 | 461 | 	called with both ->i_lock and inode_hash_lock held; the former is *not* | 
 | 462 | 	taken anymore, so verify that your callbacks do not rely on it (none | 
 | 463 | 	of the in-tree instances did).  inode_hash_lock is still held, | 
 | 464 | 	of course, so they are still serialized wrt removal from inode hash, | 
 | 465 | 	as well as wrt set() callback of iget5_locked(). | 
 | 466 | -- | 
 | 467 | [mandatory] | 
 | 468 | 	d_materialise_unique() is gone; d_splice_alias() does everything you | 
 | 469 | 	need now.  Remember that they have opposite orders of arguments ;-/ | 
 | 470 | -- | 
 | 471 | [mandatory] | 
 | 472 | 	f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid | 
 | 473 | 	it entirely. | 
 | 474 | -- | 
 | 475 | [mandatory] | 
 | 476 | 	never call ->read() and ->write() directly; use __vfs_{read,write} or | 
 | 477 | 	wrappers; instead of checking for ->write or ->read being NULL, look for | 
 | 478 | 	FMODE_CAN_{WRITE,READ} in file->f_mode. | 
 | 479 | -- | 
 | 480 | [mandatory] | 
 | 481 | 	do _not_ use new_sync_{read,write} for ->read/->write; leave it NULL | 
 | 482 | 	instead. | 
 | 483 | -- | 
 | 484 | [mandatory] | 
 | 485 | 	->aio_read/->aio_write are gone.  Use ->read_iter/->write_iter. | 
 | 486 | --- | 
 | 487 | [recommended] | 
 | 488 | 	for embedded ("fast") symlinks just set inode->i_link to wherever the | 
 | 489 | 	symlink body is and use simple_follow_link() as ->follow_link(). | 
 | 490 | -- | 
 | 491 | [mandatory] | 
 | 492 | 	calling conventions for ->follow_link() have changed.  Instead of returning | 
 | 493 | 	cookie and using nd_set_link() to store the body to traverse, we return | 
 | 494 | 	the body to traverse and store the cookie using explicit void ** argument. | 
 | 495 | 	nameidata isn't passed at all - nd_jump_link() doesn't need it and | 
 | 496 | 	nd_[gs]et_link() is gone. | 
 | 497 | -- | 
 | 498 | [mandatory] | 
 | 499 | 	calling conventions for ->put_link() have changed.  It gets inode instead of | 
 | 500 | 	dentry,  it does not get nameidata at all and it gets called only when cookie | 
 | 501 | 	is non-NULL.  Note that link body isn't available anymore, so if you need it, | 
 | 502 | 	store it as cookie. | 
 | 503 | -- | 
 | 504 | [mandatory] | 
 | 505 | 	__fd_install() & fd_install() can now sleep. Callers should not | 
 | 506 | 	hold a spinlock or other resources that do not allow a schedule. | 
 | 507 | -- | 
 | 508 | [mandatory] | 
 | 509 | 	any symlink that might use page_follow_link_light/page_put_link() must | 
 | 510 | 	have inode_nohighmem(inode) called before anything might start playing with | 
 | 511 | 	its pagecache.  No highmem pages should end up in the pagecache of such | 
 | 512 | 	symlinks.  That includes any preseeding that might be done during symlink | 
 | 513 | 	creation.  __page_symlink() will honour the mapping gfp flags, so once | 
 | 514 | 	you've done inode_nohighmem() it's safe to use, but if you allocate and | 
 | 515 | 	insert the page manually, make sure to use the right gfp flags. | 
 | 516 | -- | 
 | 517 | [mandatory] | 
 | 518 | 	->follow_link() is replaced with ->get_link(); same API, except that | 
 | 519 | 		* ->get_link() gets inode as a separate argument | 
 | 520 | 		* ->get_link() may be called in RCU mode - in that case NULL | 
 | 521 | 		  dentry is passed | 
 | 522 | -- | 
 | 523 | [mandatory] | 
 | 524 | 	->get_link() gets struct delayed_call *done now, and should do | 
 | 525 | 	set_delayed_call() where it used to set *cookie. | 
 | 526 | 	->put_link() is gone - just give the destructor to set_delayed_call() | 
 | 527 | 	in ->get_link(). | 
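
The delayed-call convention can be sketched in user space as follows. The
struct and the two helpers are local re-creations written for this example,
foo_get_link() takes the link target directly instead of a dentry and inode,
and foo_bodies_freed exists only so that a caller can observe the destructor
running; all of these are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct delayed_call {
	void (*fn)(void *);
	void *arg;
};

static inline void set_delayed_call(struct delayed_call *call,
				    void (*fn)(void *), void *arg)
{
	call->fn = fn;
	call->arg = arg;
}

/* Run the destructor, if one was registered. */
static inline void do_delayed_call(struct delayed_call *call)
{
	if (call->fn)
		call->fn(call->arg);
}

static int foo_bodies_freed;	/* lets a caller observe the destructor */

/* What used to live in ->put_link(). */
static void foo_free_link(void *arg)
{
	free(arg);
	foo_bodies_freed++;
}

/* Shape of ->get_link(): return the body, register its destructor. */
static const char *foo_get_link(const char *target, struct delayed_call *done)
{
	char *body;

	assert(done != NULL);
	body = malloc(strlen(target) + 1);
	if (!body)
		return NULL;	/* a real instance would report -ENOMEM */
	strcpy(body, target);
	set_delayed_call(done, foo_free_link, body);
	return body;
}
```

The caller walks the returned body and then invokes do_delayed_call() once it
is finished with it, which is where the old ->put_link() work now happens.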
 | 528 | -- | 
 | 529 | [mandatory] | 
 | 530 | 	->getxattr() and xattr_handler.get() get dentry and inode passed separately. | 
 | 531 | 	dentry might be yet to be attached to inode, so do _not_ use its ->d_inode | 
 | 532 | 	in the instances.  Rationale: !@#!@# security_d_instantiate() needs to be | 
 | 533 | 	called before we attach dentry to inode. | 
 | 534 | -- | 
 | 535 | [mandatory] | 
 | 536 | 	symlinks are no longer the only inodes that do *not* have i_bdev/i_cdev/ | 
 | 537 | 	i_pipe/i_link union zeroed out at inode eviction.  As the result, you can't | 
 | 538 | 	assume that non-NULL value in ->i_link at ->destroy_inode() implies that | 
 | 539 | 	it's a symlink.  Checking ->i_mode is really needed now.  In-tree we had | 
 | 540 | 	to fix shmem_destroy_callback() that used to take that kind of shortcut; | 
 | 541 | 	watch out, since that shortcut is no longer valid. | 
 | 542 | -- | 
 | 543 | [mandatory] | 
 | 544 | 	->i_mutex is replaced with ->i_rwsem now.  inode_lock() et al. work as | 
 | 545 | 	they used to - they just take it exclusive.  However, ->lookup() may be | 
 | 546 | 	called with parent locked shared.  Its instances must not | 
 | 547 | 		* use d_instantiate() and d_rehash() separately - use d_add() or | 
 | 548 | 		  d_splice_alias() instead. | 
 | 549 | 		* use d_rehash() alone - call d_add(new_dentry, NULL) instead. | 
 | 550 | 		* in the unlikely case when (read-only) access to filesystem | 
 | 551 | 		  data structures needs exclusion for some reason, arrange it | 
 | 552 | 		  yourself.  None of the in-tree filesystems needed that. | 
 | 553 | 		* rely on ->d_parent and ->d_name not changing after dentry has | 
 | 554 | 		  been fed to d_add() or d_splice_alias().  Again, none of the | 
 | 555 | 		  in-tree instances relied upon that. | 
 | 556 | 	We are guaranteed that lookups of the same name in the same directory | 
 | 557 | 	will not happen in parallel ("same" in the sense of your ->d_compare()). | 
 | 558 | 	Lookups on different names in the same directory can and do happen in | 
 | 559 | 	parallel now. | 
 | 560 | -- | 
 | 561 | [recommended] | 
 | 562 | 	->iterate_shared() is added; it's a parallel variant of ->iterate(). | 
 | 563 | 	Exclusion on struct file level is still provided (as well as that | 
 | 564 | 	between it and lseek on the same struct file), but if your directory | 
 | 565 | 	has been opened several times, you can get these called in parallel. | 
 | 566 | 	Exclusion between that method and all directory-modifying ones is | 
 | 567 | 	still provided, of course. | 
 | 568 |  | 
 | 569 | 	Often enough ->iterate() can serve as ->iterate_shared() without any | 
 | 570 | 	changes - it is a read-only operation, after all.  If you have any | 
 | 571 | 	per-inode or per-dentry in-core data structures modified by ->iterate(), | 
 | 572 | 	you might need something to serialize the access to them.  If you | 
 | 573 | 	do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for | 
 | 574 | 	that; look for in-tree examples. | 
 | 575 |  | 
 | 576 | 	Old method is only used if the new one is absent; eventually it will | 
 | 577 | 	be removed.  Switch while you still can; the old one won't stay. | 
 | 578 | -- | 
 | 579 | [mandatory] | 
 | 580 | 	->atomic_open() calls without O_CREAT may happen in parallel. | 
 | 581 | -- | 
 | 582 | [mandatory] | 
 | 583 | 	->setxattr() and xattr_handler.set() get dentry and inode passed separately. | 
 | 584 | 	dentry might be yet to be attached to inode, so do _not_ use its ->d_inode | 
 | 585 | 	in the instances.  Rationale: !@#!@# security_d_instantiate() needs to be | 
 | 586 | 	called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack | 
 | 587 | 	->d_instantiate() uses not just ->getxattr() but ->setxattr() as well. | 
 | 588 | -- | 
 | 589 | [mandatory] | 
 | 590 | 	->d_compare() doesn't get parent as a separate argument anymore.  If you | 
 | 591 | 	used it for finding the struct super_block involved, dentry->d_sb will | 
 | 592 | 	work just as well; if it's something more complicated, use dentry->d_parent. | 
 | 593 | 	Just be careful not to assume that fetching it more than once will yield | 
 | 594 | 	the same value - in RCU mode it could change under you. | 
 | 595 | -- | 
 | 596 | [mandatory] | 
 | 597 | 	->rename() has an added flags argument.  Any flags not handled by the | 
 | 598 | 	filesystem should result in -EINVAL being returned. | 
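
A flags check of this kind is short enough to show in full. The sketch below
assumes a hypothetical filesystem that implements only RENAME_NOREPLACE; the
flag values are defined locally so the example builds without kernel headers,
and foo_rename_check_flags() is a made-up helper name.

```c
#include <errno.h>

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE	(1 << 0)
#endif
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE		(1 << 1)
#endif
#ifndef RENAME_WHITEOUT
#define RENAME_WHITEOUT		(1 << 2)
#endif

/* Flags this hypothetical filesystem knows how to handle. */
#define FOO_RENAME_SUPPORTED	RENAME_NOREPLACE

/* Call first thing in ->rename(); anything unknown is rejected. */
static int foo_rename_check_flags(unsigned int flags)
{
	if (flags & ~FOO_RENAME_SUPPORTED)
		return -EINVAL;
	return 0;
}
```

Masking against the supported set, rather than testing for specific unwanted
flags, also rejects flags that are added later.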
 | 599 | -- | 
 | 600 | [recommended] | 
 | 601 | 	->readlink is optional for symlinks.  Don't set, unless filesystem needs | 
 | 602 | 	to fake something for readlink(2). | 
 | 603 | -- | 
 | 604 | [mandatory] | 
 | 605 | 	->getattr() is now passed a struct path rather than a vfsmount and | 
 | 606 | 	dentry separately, and it now has request_mask and query_flags arguments | 
 | 607 | 	to specify the fields and sync type requested by statx.  Filesystems not | 
 | 608 | 	supporting any statx-specific features may ignore the new arguments. | 
 | 609 | -- | 
 | 610 | [mandatory] | 
 | 611 |  | 
 | 612 | 	[should've been added in 2016] stale comment in finish_open() | 
 | 613 | 	notwithstanding, failure exits in ->atomic_open() instances should | 
 | 614 | 	*NOT* fput() the file, no matter what.  Everything is handled by the | 
 | 615 | 	caller. |