===============================
FS-CACHE NETWORK FILESYSTEM API
===============================

There's an API by which a network filesystem can make use of the FS-Cache
facilities.  This is based around a number of principles:

 (1) Caches can store a number of different object types.  There are two main
     object types: indices and files.  The first is a special type used by
     FS-Cache to make finding objects faster and to make retiring of groups of
     objects easier.

 (2) Every index, file or other object is represented by a cookie.  This cookie
     may or may not have anything associated with it, but the netfs doesn't
     need to care.

 (3) Barring the top-level index (one entry per cached netfs), the index
     hierarchy for each netfs is structured according to the whim of the netfs.

This API is declared in <linux/fscache.h>.

This document contains the following sections:

	 (1) Network filesystem definition
	 (2) Index definition
	 (3) Object definition
	 (4) Network filesystem (un)registration
	 (5) Cache tag lookup
	 (6) Index registration
	 (7) Data file registration
	 (8) Miscellaneous object registration
	 (9) Setting the data file size
	(10) Page alloc/read/write
	(11) Page uncaching
	(12) Index and data file consistency
	(13) Cookie enablement
	(14) Miscellaneous cookie operations
	(15) Cookie unregistration
	(16) Index invalidation
	(17) Data file invalidation
	(18) FS-Cache specific page flags.


=============================
NETWORK FILESYSTEM DEFINITION
=============================

FS-Cache needs a description of the network filesystem.  This is specified
using a record of the following structure:

	struct fscache_netfs {
		uint32_t			version;
		const char			*name;
		struct fscache_cookie		*primary_index;
		...
	};

The first two fields should be filled in before registration, and the third
will be filled in by the registration function; any other fields should just be
ignored and are for internal use only.

The fields are:

 (1) The name of the netfs (used as the key in the toplevel index).

 (2) The version of the netfs (if the name matches but the version doesn't, the
     entire in-cache hierarchy for this netfs will be scrapped and begun
     afresh).

 (3) The cookie representing the primary index will be allocated according to
     another parameter passed into the registration function.

For example, kAFS (linux/fs/afs/) uses the following definitions to describe
itself:

	struct fscache_netfs afs_cache_netfs = {
		.version	= 0,
		.name		= "afs",
	};
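
The name/version rule in field (2) can be pictured with a small userspace
model.  This is only a sketch of the rule as described above, not FS-Cache's
code: struct cached_netfs_entry, enum cache_action and netfs_match() are
hypothetical names invented for the illustration and do not appear in
<linux/fscache.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an entry in the top-level index; not a kernel type. */
struct cached_netfs_entry {
	const char	*name;
	uint32_t	version;
};

enum cache_action {
	CACHE_NO_MATCH,		/* a different netfs: the entry is left alone */
	CACHE_REUSE,		/* name and version match: keep the hierarchy */
	CACHE_SCRAP,		/* name matches, version differs: begin afresh */
};

/* Decide what happens to an existing entry when a netfs registers. */
enum cache_action netfs_match(const struct cached_netfs_entry *entry,
			      const char *name, uint32_t version)
{
	assert(entry != NULL && name != NULL);

	if (strcmp(entry->name, name) != 0)
		return CACHE_NO_MATCH;
	return entry->version == version ? CACHE_REUSE : CACHE_SCRAP;
}
```

Under this model, bumping .version in the netfs definition is what causes the
existing in-cache hierarchy to be discarded.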


================
INDEX DEFINITION
================

Indices are used for two purposes:

 (1) To aid the finding of a file based on a series of keys (such as AFS's
     "cell", "volume ID", "vnode ID").

 (2) To make it easier to discard a subset of all the files cached based around
     a particular key - for instance to mirror the removal of an AFS volume.

However, since it's unlikely that any two netfs's are going to want to define
their index hierarchies in quite the same way, FS-Cache tries to impose as few
restraints as possible on how an index is structured and where it is placed in
the tree.  The netfs can even mix indices and data files at the same level, but
it's not recommended.

Each index entry consists of a key of indeterminate length plus some auxiliary
data, also of indeterminate length.

There are some limits on indices:

 (1) Any index containing non-index objects should be restricted to a single
     cache.  Any such objects created within an index will be created in the
     first cache only.  The cache in which an index is created can be
     controlled by cache tags (see below).

 (2) The entry data must be atomically journallable, so it is limited to about
     400 bytes at present.  At least 400 bytes will be available.

 (3) The depth of the index tree should be judged with care as the search
     function is recursive.  Too many layers will run the kernel out of stack.


=================
OBJECT DEFINITION
=================

To define an object, a structure of the following type should be filled out:

	struct fscache_cookie_def
	{
		uint8_t name[16];
		uint8_t type;

		struct fscache_cache_tag *(*select_cache)(
			const void *parent_netfs_data,
			const void *cookie_netfs_data);

		enum fscache_checkaux (*check_aux)(void *cookie_netfs_data,
						   const void *data,
						   uint16_t datalen,
						   loff_t object_size);

		void (*get_context)(void *cookie_netfs_data, void *context);

		void (*put_context)(void *cookie_netfs_data, void *context);

		void (*mark_pages_cached)(void *cookie_netfs_data,
					  struct address_space *mapping,
					  struct pagevec *cached_pvec);
	};

This has the following fields:

 (1) The type of the object [mandatory].

     This is one of the following values:

        (*) FSCACHE_COOKIE_TYPE_INDEX

            This defines an index, which is a special FS-Cache type.

        (*) FSCACHE_COOKIE_TYPE_DATAFILE

            This defines an ordinary data file.

        (*) Any other value between 2 and 255

            This defines an extraordinary object such as an XATTR.

 (2) The name of the object type (NUL terminated unless all 16 chars are used)
     [optional].

 (3) A function to select the cache in which to store an index [optional].

     This function is invoked when an index needs to be instantiated in a cache
     during the instantiation of a non-index object.  Only the immediate index
     parent for the non-index object will be queried.  Any indices above that
     in the hierarchy may be stored in multiple caches.  This function does not
     need to be supplied for any non-index object or any index that will only
     have index children.

     If this function is not supplied or if it returns NULL then the first
     cache in the parent's list will be chosen, or failing that, the first
     cache in the master list.

 (4) A function to check the auxiliary data [optional].

     This function will be called to check that a match found in the cache for
     this object is valid.  For instance with AFS it could check the auxiliary
     data against the data version number returned by the server to determine
     whether the index entry in a cache is still valid.

     If this function is absent, it will be assumed that matching objects in a
     cache are always valid.

     The function is also passed the cache's idea of the object size and may
     use this to manage coherency also.

     If present, the function should return one of the following values:

	(*) FSCACHE_CHECKAUX_OKAY		- the entry is okay as is
	(*) FSCACHE_CHECKAUX_NEEDS_UPDATE	- the entry requires update
	(*) FSCACHE_CHECKAUX_OBSOLETE		- the entry should be deleted

     This function can also be used to extract data from the auxiliary data in
     the cache and copy it into the netfs's structures.

 (5) A pair of functions to manage contexts for the completion callback
     [optional].

     The cache read/write functions are passed a context which is then passed
     to the I/O completion callback function.  To ensure this context remains
     valid until after the I/O completion is called, two functions may be
     provided: one to get an extra reference on the context, and one to drop a
     reference to it.

     If the context is not used or is a type of object that won't go out of
     scope, then these functions are not required.  These functions are not
     required for indices as indices may not contain data.  These functions may
     be called in interrupt context and so may not sleep.

 (6) A function to mark a page as retaining cache metadata [optional].

     This is called by the cache to indicate that it is retaining in-memory
     information for this page and that the netfs should uncache the page when
     it has finished.  This does not indicate whether there's data on the disk
     or not.  Note that several pages at once may be presented for marking.

     The PG_fscache bit is set on the pages before this function would be
     called, so the function need not be provided if this is sufficient.

     This function is not required for indices as they're not permitted data.

 (7) A function to unmark all the pages retaining cache metadata [mandatory].

     This is called by FS-Cache to indicate that a backing store is being
     unbound from a cookie and that all the marks on the pages should be
     cleared to prevent confusion.  Note that the cache will have torn down all
     its tracking information so that the pages don't need to be explicitly
     uncached.

     This function is not required for indices as they're not permitted data.
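
As an illustration of field (4), a check_aux() operation that compares a
stored data version with the one the netfs currently holds might look roughly
like the following.  This is a userspace sketch under several assumptions:
struct my_vnode, struct my_aux and my_check_aux() are hypothetical, the enum
is a local mirror of the three values listed above so that the example builds
outside the kernel, int64_t stands in for loff_t, and treating a size mismatch
as obsolete is one possible policy rather than a requirement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented return values (userspace stand-in). */
enum fscache_checkaux {
	FSCACHE_CHECKAUX_OKAY,
	FSCACHE_CHECKAUX_NEEDS_UPDATE,
	FSCACHE_CHECKAUX_OBSOLETE,
};

/* Hypothetical netfs-side inode state. */
struct my_vnode {
	uint64_t	data_version;	/* version last reported by the server */
	int64_t		size;		/* file size known to the netfs */
};

/* Hypothetical layout of the auxiliary data stored with the cache object. */
struct my_aux {
	uint64_t	data_version;
};

enum fscache_checkaux my_check_aux(void *cookie_netfs_data, const void *data,
				   uint16_t datalen, int64_t object_size)
{
	const struct my_vnode *vnode = cookie_netfs_data;
	struct my_aux aux;

	assert(vnode != NULL);

	/* A blob of unexpected length cannot be interpreted. */
	if (data == NULL || datalen != sizeof(aux))
		return FSCACHE_CHECKAUX_OBSOLETE;
	memcpy(&aux, data, sizeof(aux));	/* the blob may be unaligned */

	if (aux.data_version != vnode->data_version)
		return FSCACHE_CHECKAUX_OBSOLETE;
	if (object_size != vnode->size)		/* assumed policy, see above */
		return FSCACHE_CHECKAUX_OBSOLETE;
	return FSCACHE_CHECKAUX_OKAY;
}
```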


===================================
NETWORK FILESYSTEM (UN)REGISTRATION
===================================

The first step is to declare the network filesystem to the cache.  This also
involves specifying the layout of the primary index (for AFS, this would be the
"cell" level).

The registration function is:

	int fscache_register_netfs(struct fscache_netfs *netfs);

It just takes a pointer to the netfs definition.  It returns 0 or an error as
appropriate.

For kAFS, registration is done as follows:

	ret = fscache_register_netfs(&afs_cache_netfs);

The last step is, of course, unregistration:

	void fscache_unregister_netfs(struct fscache_netfs *netfs);


================
CACHE TAG LOOKUP
================

FS-Cache permits the use of more than one cache.  To permit particular index
subtrees to be bound to particular caches, the second step is to look up cache
representation tags.  This step is optional; it can be left entirely up to
FS-Cache as to which cache should be used.  The problem with doing that is that
FS-Cache will always pick the first cache that was registered.

To get the representation for a named tag:

	struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);

This takes a text string as the name and returns a representation of a tag.  It
will never return an error.  It may return a dummy tag, however, if it runs out
of memory; this will inhibit caching with this tag.

Any representation so obtained must be released by passing it to this function:

	void fscache_release_cache_tag(struct fscache_cache_tag *tag);

The tag will be retrieved by FS-Cache when it calls the object definition
operation select_cache().


==================
INDEX REGISTRATION
==================

The third step is to inform FS-Cache about part of an index hierarchy that can
be used to locate files.  This is done by requesting a cookie for each index in
the path to the file:

	struct fscache_cookie *
	fscache_acquire_cookie(struct fscache_cookie *parent,
			       const struct fscache_cookie_def *def,
			       const void *index_key,
			       size_t index_key_len,
			       const void *aux_data,
			       size_t aux_data_len,
			       void *netfs_data,
			       loff_t object_size,
			       bool enable);

This function creates an index entry in the index represented by parent,
filling in the index entry by calling the operations pointed to by def.

A unique key that represents the object within the parent must be pointed to by
index_key and is of length index_key_len.

An optional blob of auxiliary data that is to be stored within the cache can be
pointed to with aux_data and should be of length aux_data_len.  This would
typically be used for storing coherency data.

The netfs may pass an arbitrary value in netfs_data and this will be presented
to it in the event of any calling back.  This may also be used in tracing or
logging of messages.

The cache tracks the size of the data attached to an object and this is set to
be object_size.  For indices, this should be 0.  This value will be passed to
the ->check_aux() callback.

Note that this function never returns an error - all errors are handled
internally.  It may, however, return NULL to indicate no cookie.  It is quite
acceptable to pass this token back to this function as the parent to another
acquisition (or even to the relinquish cookie, read page and write page
functions - see below).

Note also that no indices are actually created in a cache until a non-index
object needs to be created somewhere down the hierarchy.  Furthermore, an index
may be created in several different caches independently at different times.
This is all handled transparently, and the netfs doesn't see any of it.

A cookie will be created in the disabled state if enable is false.  A cookie
must be enabled to do anything with it.  A disabled cookie can be enabled by
calling fscache_enable_cookie() (see below).

For example, with AFS, a cell would be added to the primary index.  This index
entry would have a dependent inode containing volume mappings within this cell:

	cell->cache =
		fscache_acquire_cookie(afs_cache_netfs.primary_index,
				       &afs_cell_cache_index_def,
				       cell->name, strlen(cell->name),
				       NULL, 0,
				       cell, 0, true);

And then a particular volume could be added to that index by ID, creating
another index for vnodes (AFS inode equivalents):

	volume->cache =
		fscache_acquire_cookie(volume->cell->cache,
				       &afs_volume_cache_index_def,
				       &volume->vid, sizeof(volume->vid),
				       NULL, 0,
				       volume, 0, true);


======================
DATA FILE REGISTRATION
======================

The fourth step is to request a data file be created in the cache.  This is
identical to index cookie acquisition.  The only difference is that the type in
the object definition should be something other than index type.

	vnode->cache =
		fscache_acquire_cookie(volume->cache,
				       &afs_vnode_cache_object_def,
				       &key, sizeof(key),
				       &aux, sizeof(aux),
				       vnode, vnode->status.size, true);


=================================
MISCELLANEOUS OBJECT REGISTRATION
=================================

An optional step is to request an object of miscellaneous type be created in
the cache.  This is almost identical to index cookie acquisition.  The only
difference is that the type in the object definition should be something other
than index type.  Whilst the parent object could be an index, it's more likely
it would be some other type of object such as a data file.

	xattr->cache =
		fscache_acquire_cookie(vnode->cache,
				       &afs_xattr_cache_object_def,
				       &xattr->name, strlen(xattr->name),
				       NULL, 0,
				       xattr, strlen(xattr->val), true);

Miscellaneous objects might be used to store extended attributes or directory
entries for example.


==========================
SETTING THE DATA FILE SIZE
==========================

The fifth step is to set the physical attributes of the file, such as its size.
This doesn't automatically reserve any space in the cache, but permits the
cache to adjust its metadata for data tracking appropriately:

	int fscache_attr_changed(struct fscache_cookie *cookie);

The cache will return -ENOBUFS if there is no backing cache or if there is no
space to allocate any extra metadata required in the cache.

Note that attempts to read or write data pages in the cache over this size may
be rebuffed with -ENOBUFS.

This operation schedules an attribute adjustment to happen asynchronously at
some point in the future, and as such, it may happen after the function returns
to the caller.  The attribute adjustment excludes read and write operations.


=====================
PAGE ALLOC/READ/WRITE
=====================

And the sixth step is to store and retrieve pages in the cache.  There are
three functions that are used to do this.

Note:

 (1) A page should not be re-read or re-allocated without uncaching it first.

 (2) A read or allocated page must be uncached when the netfs page is released
     from the pagecache.

 (3) A page should only be written to the cache if previously read or
     allocated.

This permits the cache to maintain its page tracking in proper order.
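
The three rules amount to a small per-page state machine.  The following
userspace model is only a sketch to make the ordering concrete: struct
page_track and the track_*() helpers are hypothetical, and returning -EINVAL
on a rule violation is a choice made for this illustration (the text above
does not say how FS-Cache itself reacts to a violation).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page tracking state; the real cache keeps its own. */
struct page_track {
	bool	tracked;	/* read or allocated, and not yet uncached */
};

/* Rule (1): no re-read or re-allocation without uncaching first. */
int track_read_or_alloc(struct page_track *p)
{
	assert(p != NULL);
	if (p->tracked)
		return -EINVAL;
	p->tracked = true;
	return 0;
}

/* Rule (3): only write a page that was previously read or allocated. */
int track_write(const struct page_track *p)
{
	assert(p != NULL);
	return p->tracked ? 0 : -EINVAL;
}

/* Rule (2): uncache when the netfs page is released from the pagecache. */
void track_uncache(struct page_track *p)
{
	assert(p != NULL);
	p->tracked = false;
}
```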


PAGE READ
---------

Firstly, the netfs should ask FS-Cache to examine the caches and read the
contents cached for a particular page of a particular file if present, or else
allocate space to store the contents if not:

	typedef
	void (*fscache_rw_complete_t)(struct page *page,
				      void *context,
				      int error);

	int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
				       struct page *page,
				       fscache_rw_complete_t end_io_func,
				       void *context,
				       gfp_t gfp);

The cookie argument must specify a cookie for an object that isn't an index,
the page specified will have the data loaded into it (and is also used to
specify the page number), and the gfp argument is used to control how any
memory allocations made are satisfied.

If the cookie indicates the inode is not cached:

 (1) The function will return -ENOBUFS.

Else if there's a copy of the page resident in the cache:

 (1) The mark_pages_cached() cookie operation will be called on that page.

 (2) The function will submit a request to read the data from the cache's
     backing device directly into the page specified.

 (3) The function will return 0.

 (4) When the read is complete, end_io_func() will be invoked with:

     (*) The netfs data supplied when the cookie was created.

     (*) The page descriptor.

     (*) The context argument passed to the above function.  This will be
         maintained with the get_context/put_context functions mentioned above.

     (*) An argument that's 0 on success or negative for an error code.

     If an error occurs, it should be assumed that the page contains no usable
     data.  fscache_readpages_cancel() may need to be called.

     end_io_func() will be called in process context if the read results in
     an error, but it might be called in interrupt context if the read is
     successful.

Otherwise, if there's not a copy available in cache, but the cache may be able
to store the page:

 (1) The mark_pages_cached() cookie operation will be called on that page.

 (2) A block may be reserved in the cache and attached to the object at the
     appropriate place.

 (3) The function will return -ENODATA.

This function may also return -ENOMEM or -EINTR, in which case it won't have
read any data from the cache.
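
A caller typically has to branch on these return values.  The sketch below
maps each documented result onto the next step a netfs might take; enum
read_next_step and classify_read_result() are hypothetical names, and treating
-ENOMEM, -EINTR and any other error as a plain failure is an assumption of
this sketch (a netfs could equally fall back to reading from the server).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical set of follow-up actions for the netfs read path. */
enum read_next_step {
	READ_WAIT_FOR_CACHE,	/* 0: read dispatched, end_io_func() will run */
	READ_FETCH_AND_STORE,	/* -ENODATA: fetch from server, then write to cache */
	READ_FETCH_UNCACHED,	/* -ENOBUFS: fetch from server, don't cache the page */
	READ_FAIL,		/* anything else, e.g. -ENOMEM or -EINTR */
};

/* Map a return value of fscache_read_or_alloc_page() to a next step. */
enum read_next_step classify_read_result(int ret)
{
	assert(ret <= 0);	/* the function returns 0 or a negative errno */

	switch (ret) {
	case 0:
		return READ_WAIT_FOR_CACHE;
	case -ENODATA:
		return READ_FETCH_AND_STORE;
	case -ENOBUFS:
		return READ_FETCH_UNCACHED;
	default:
		return READ_FAIL;
	}
}
```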


PAGE ALLOCATE
-------------

Alternatively, if there's not expected to be any data in the cache for a page
because the file has been extended, a block can simply be allocated instead:

	int fscache_alloc_page(struct fscache_cookie *cookie,
			       struct page *page,
			       gfp_t gfp);

This is similar to the fscache_read_or_alloc_page() function, except that it
never reads from the cache.  It will return 0 if a block has been allocated,
rather than -ENODATA as the other would.  One or the other must be performed
before writing to the cache.

The mark_pages_cached() cookie operation will be called on the page if
successful.


PAGE WRITE
----------

Secondly, if the netfs changes the contents of the page (either due to an
initial download or if a user performs a write), then the page should be
written back to the cache:

	int fscache_write_page(struct fscache_cookie *cookie,
			       struct page *page,
			       loff_t object_size,
			       gfp_t gfp);

The cookie argument must specify a data file cookie, the page specified should
contain the data to be written (and is also used to specify the page number),
object_size is the revised size of the object and the gfp argument is used to
control how any memory allocations made are satisfied.

The page must have first been read or allocated successfully and must not have
been uncached before writing is performed.

If the cookie indicates the inode is not cached then:

 (1) The function will return -ENOBUFS.

Else if space can be allocated in the cache to hold this page:

 (1) PG_fscache_write will be set on the page.

 (2) The function will submit a request to write the data to the cache's
     backing device directly from the page specified.

 (3) The function will return 0.

 (4) When the write is complete PG_fscache_write is cleared on the page and
     anyone waiting for that bit will be woken up.

Else if there's no space available in the cache, -ENOBUFS will be returned.  It
is also possible for the PG_fscache_write bit to be cleared when no write took
place if unforeseen circumstances arose (such as a disk error).

Writing takes place asynchronously.


MULTIPLE PAGE READ
------------------

A facility is provided to read several pages at once, as requested by the
readpages() address space operation:

	int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
					struct address_space *mapping,
					struct list_head *pages,
					int *nr_pages,
					fscache_rw_complete_t end_io_func,
					void *context,
					gfp_t gfp);

This works in a similar way to fscache_read_or_alloc_page(), except:

 (1) Any page it can retrieve data for is removed from pages and nr_pages and
     dispatched for reading to the disk.  Reads of adjacent pages on disk may
     be merged for greater efficiency.

 (2) The mark_pages_cached() cookie operation will be called on several pages
     at once if they're being read or allocated.

 (3) If there was a general error, then that error will be returned.

     Else if some pages couldn't be allocated or read, then -ENOBUFS will be
     returned.

     Else if some pages couldn't be read but were allocated, then -ENODATA will
     be returned.

     Otherwise, if all pages had reads dispatched, then 0 will be returned, the
     list will be empty and *nr_pages will be 0.

 (4) end_io_func will be called once for each page being read as the reads
     complete.  It will be called in process context if error != 0, but it may
     be called in interrupt context if there is no error.

Note that a return of -ENODATA, -ENOBUFS or any other error does not preclude
some of the pages being read and some being allocated.  Those pages will have
been marked appropriately and will need uncaching.
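
The precedence of return values described in (3) can be written out as a
short function.  This is a model of the documented ordering only, not the
kernel's implementation: readpages_result() and its three parameters are
hypothetical and exist purely to show which outcome wins when several apply.

```c
#include <assert.h>
#include <errno.h>

/*
 * general_error: 0, or a negative errno for a failure of the whole call.
 * nr_nobufs:     pages that could be neither read nor allocated.
 * nr_allocated:  pages that could not be read but were allocated.
 */
int readpages_result(int general_error, unsigned int nr_nobufs,
		     unsigned int nr_allocated)
{
	assert(general_error <= 0);

	if (general_error)
		return general_error;	/* a general error takes priority */
	if (nr_nobufs > 0)
		return -ENOBUFS;
	if (nr_allocated > 0)
		return -ENODATA;
	return 0;			/* every page had a read dispatched */
}
```

Because only the highest-priority outcome is reported, a caller cannot infer
the state of individual pages from the return value alone, which is why the
note above says that marked pages still need uncaching.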


CANCELLATION OF UNREAD PAGES
----------------------------

If one or more pages are passed to fscache_read_or_alloc_pages() but not then
read from the cache and also not read from the underlying filesystem then
those pages will need to have any marks and reservations removed.  This can be
done by calling:

	void fscache_readpages_cancel(struct fscache_cookie *cookie,
				      struct list_head *pages);

prior to returning to the caller.  The cookie argument should be as passed to
fscache_read_or_alloc_pages().  Every page in the pages list will be examined
and any that have PG_fscache set will be uncached.


==============
PAGE UNCACHING
==============

To uncache a page, this function should be called:

	void fscache_uncache_page(struct fscache_cookie *cookie,
				  struct page *page);

This function permits the cache to release any in-memory representation it
might be holding for this netfs page.  This function must be called once for
each page on which the read or write page functions above have been called to
make sure the cache's in-memory tracking information gets torn down.

Note that pages can't be explicitly deleted from a data file.  The whole
data file must be retired (see the relinquish cookie function below).

Furthermore, note that this does not cancel the asynchronous read or write
operation started by the read/alloc and write functions, so the page
invalidation functions must use:

	bool fscache_check_page_write(struct fscache_cookie *cookie,
				      struct page *page);

to see if a page is being written to the cache, and:

	void fscache_wait_on_page_write(struct fscache_cookie *cookie,
					struct page *page);

to wait for it to finish if it is.


When releasepage() is being implemented, a special FS-Cache function exists to
manage the heuristics of coping with vmscan trying to eject pages, which may
conflict with the cache trying to write pages to the cache (which may itself
need to allocate memory):

	bool fscache_maybe_release_page(struct fscache_cookie *cookie,
					struct page *page,
					gfp_t gfp);

This takes the netfs cookie, and the page and gfp arguments as supplied to
releasepage().  It returns false if the page cannot be released yet; if it
returns true, the page has been uncached and can now be released.

To make a page available for release, this function may wait for an outstanding
storage request to complete, or it may attempt to cancel the storage request -
in which case the page will not be stored in the cache this time.
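
A releasepage()-style hook built on this might look roughly like the sketch
below.  It is a userspace model under stated assumptions: gfp_t, the
MOCK_GFP_CAN_WAIT flag, the struct layouts and the body of
fscache_maybe_release_page() are toy stand-ins (the real heuristics live in
the cache), and netfs_release_page() is a hypothetical netfs function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int gfp_t;		/* stand-in for the kernel type */
#define MOCK_GFP_CAN_WAIT 0x1u		/* hypothetical flag for this sketch */

struct fscache_cookie { int unused; };
struct page {
	bool fscache;		/* models PG_fscache */
	bool being_written;	/* models an outstanding storage request */
};

/*
 * Toy model of fscache_maybe_release_page(): if a store is outstanding and
 * the caller may not wait, refuse; otherwise finish up and uncache.
 */
static bool fscache_maybe_release_page(struct fscache_cookie *cookie,
				       struct page *page, gfp_t gfp)
{
	(void)cookie;
	if (page->being_written) {
		if (!(gfp & MOCK_GFP_CAN_WAIT))
			return false;
		page->being_written = false;	/* "waited" for the store */
	}
	page->fscache = false;
	return true;
}

/* Hypothetical hook: returns 1 if the page may be released, 0 to keep it. */
static int netfs_release_page(struct fscache_cookie *cookie,
			      struct page *page, gfp_t gfp)
{
	assert(page != NULL);

	if (page->fscache && !fscache_maybe_release_page(cookie, page, gfp))
		return 0;
	return 1;
}
```

The netfs only consults the cache when the page is marked as known to it; a
false return is passed straight back so that the page is kept for now.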


BULK INODE PAGE UNCACHE
-----------------------

A convenience routine is provided to perform an uncache on all the pages
attached to an inode.  This assumes that the pages on the inode correspond on a
1:1 basis with the pages in the cache.

	void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
					     struct inode *inode);

This takes the netfs cookie that the pages were cached with and the inode that
the pages are attached to.  This function will wait for pages to finish being
written to the cache and for the cache to finish with the page generally.  No
error is returned.


===============================
INDEX AND DATA FILE CONSISTENCY
===============================

To find out whether auxiliary data for an object is up to date within the
cache, the following function can be called:

	int fscache_check_consistency(struct fscache_cookie *cookie,
				      const void *aux_data);

This will call back to the netfs to check whether the auxiliary data associated
with a cookie is correct; if aux_data is non-NULL, it will update the auxiliary
data buffer first.  It returns 0 if the data is correct and -ESTALE if it
isn't; it may also return -ENOMEM or -ERESTARTSYS.
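
One way a netfs might act on the return value is sketched below.  This is a
userspace model, not kernel code: the cookie layout and both fscache_*()
bodies are stand-ins, the auxiliary data is assumed to be a single version
number, and the policy of invalidating the object on -ESTALE is an assumption
of this example rather than something the API requires.
netfs_revalidate_cache() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cookie: the aux data is assumed to be one version number. */
struct fscache_cookie {
	unsigned long aux;	/* cookie's auxiliary data buffer */
	unsigned long stored;	/* version the cache holds on disk */
	bool invalidated;
};

/* Stub: update the aux buffer if asked, then compare with the cache. */
static int fscache_check_consistency(struct fscache_cookie *cookie,
				     const void *aux_data)
{
	if (aux_data)
		cookie->aux = *(const unsigned long *)aux_data;
	return cookie->aux == cookie->stored ? 0 : -ESTALE;
}

/* Stub: discard the cached data and record the new aux data. */
static void fscache_invalidate(struct fscache_cookie *cookie)
{
	cookie->invalidated = true;
	cookie->stored = cookie->aux;
}

/*
 * Hypothetical netfs helper.  Returns 0 if the cache can be used from now
 * on, or a negative error (such as -ENOMEM) passed up from the check.
 */
static int netfs_revalidate_cache(struct fscache_cookie *cookie,
				  unsigned long server_version)
{
	int ret;

	assert(cookie != NULL);

	ret = fscache_check_consistency(cookie, &server_version);
	if (ret == -ESTALE) {
		/* Assumed policy: throw the stale object's data away. */
		fscache_invalidate(cookie);
		return 0;
	}
	return ret;
}
```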

To request an update of the index data for an index or other object, the
following function should be called:

	void fscache_update_cookie(struct fscache_cookie *cookie,
				   const void *aux_data);

This function will update the cookie's auxiliary data buffer from aux_data if
that is non-NULL and then schedule this to be stored on disk.  The update
method in the parent index definition will be called to transfer the data.

Note that partial updates may happen automatically at other times, such as when
data blocks are added to a data file object.


=================
COOKIE ENABLEMENT
=================

Cookies exist in one of two states: enabled and disabled.  If a cookie is
disabled, it ignores all attempts to acquire child cookies; check, update or
invalidate its state; allocate, read or write backing pages - though it is
still possible to uncache pages and relinquish the cookie.

The initial enablement state is set by fscache_acquire_cookie(), but the cookie
can be enabled or disabled later.  To disable a cookie, call:

	void fscache_disable_cookie(struct fscache_cookie *cookie,
				    const void *aux_data,
				    bool invalidate);

If the cookie is not already disabled, this locks the cookie against other
enable and disable ops, marks the cookie as being disabled, discards or
invalidates any backing objects and waits for cessation of activity on any
associated object before unlocking the cookie.

All possible failures are handled internally.  The caller should consider
calling fscache_uncache_all_inode_pages() afterwards to make sure all page
markings are cleared up.
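
The disable-then-uncache sequence suggested above could be sketched like
this.  It is a userspace model only: the struct layouts (including the fixed
NR_PAGES array standing in for an inode's pages) and the fscache_*() bodies
are stand-ins, the auxiliary data is assumed to be a version number, and
netfs_stop_caching() is a hypothetical netfs function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 3		/* arbitrary size for this sketch */

struct fscache_cookie {
	bool enabled;
	unsigned long aux;	/* auxiliary data, assumed to be a version */
};
struct page { bool fscache; };	/* models PG_fscache */
struct inode { struct page pages[NR_PAGES]; };

/* Stub: update the aux buffer if given, then mark the cookie disabled. */
static void fscache_disable_cookie(struct fscache_cookie *cookie,
				   const void *aux_data, bool invalidate)
{
	(void)invalidate;
	if (aux_data)
		cookie->aux = *(const unsigned long *)aux_data;
	cookie->enabled = false;
}

/* Stub: clear the cache's marking on every page of the inode. */
static void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
					    struct inode *inode)
{
	(void)cookie;
	for (int i = 0; i < NR_PAGES; i++)
		inode->pages[i].fscache = false;
}

/* Hypothetical netfs helper: stop caching an inode and clean its pages. */
static void netfs_stop_caching(struct fscache_cookie *cookie,
			       struct inode *inode, unsigned long version)
{
	assert(cookie != NULL && inode != NULL);

	fscache_disable_cookie(cookie, &version, true);
	fscache_uncache_all_inode_pages(cookie, inode);
}
```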

Cookies can be enabled or reenabled with:

	void fscache_enable_cookie(struct fscache_cookie *cookie,
				   const void *aux_data,
				   loff_t object_size,
				   bool (*can_enable)(void *data),
				   void *data);

If the cookie is not already enabled, this locks the cookie against other
enable and disable ops, invokes can_enable() and, if the cookie is not an index
cookie, will begin the procedure of acquiring backing objects.

The optional can_enable() function is passed the data argument and returns a
ruling as to whether or not enablement should actually be permitted to begin.

All possible failures are handled internally.  The cookie will only be marked
as enabled if provisional backing objects are allocated.

The object's data size is updated from object_size and is passed to the
->check_aux() function.

In both cases, the cookie's auxiliary data buffer is updated from aux_data if
that is non-NULL inside the enablement lock before proceeding.
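
The role of can_enable() can be illustrated with the userspace sketch below.
Everything in it other than the shape of the fscache_enable_cookie()
prototype is an assumption: mock_loff_t stands in for the kernel's loff_t,
the cookie and struct netfs_inode layouts are invented, the function body is
a toy (no locking, no backing objects), and the policy of refusing to enable
caching while a file is open for writing is just one example of a ruling a
netfs might give.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long long mock_loff_t;	/* stands in for the kernel's loff_t */

struct fscache_cookie {
	bool enabled;
	unsigned long aux;		/* assumed to be a version number */
	mock_loff_t object_size;
};

/* Toy model: update aux data, ask can_enable(), then mark as enabled. */
static void fscache_enable_cookie(struct fscache_cookie *cookie,
				  const void *aux_data,
				  mock_loff_t object_size,
				  bool (*can_enable)(void *data),
				  void *data)
{
	if (aux_data)
		cookie->aux = *(const unsigned long *)aux_data;
	if (cookie->enabled)
		return;
	if (can_enable && !can_enable(data))
		return;		/* the netfs vetoed enablement */
	cookie->object_size = object_size;
	cookie->enabled = true;
}

/* Hypothetical per-inode netfs state. */
struct netfs_inode {
	int writers;
	unsigned long version;
	mock_loff_t size;
	struct fscache_cookie cookie;
};

/* Example ruling: only permit caching while nobody has the file open
 * for writing. */
static bool netfs_can_enable(void *data)
{
	struct netfs_inode *ni = data;

	return ni->writers == 0;
}

/* Hypothetical open path: try to (re)enable caching on each open. */
static void netfs_open(struct netfs_inode *ni, bool for_write)
{
	assert(ni != NULL);

	if (for_write)
		ni->writers++;
	fscache_enable_cookie(&ni->cookie, &ni->version, ni->size,
			      netfs_can_enable, ni);
}
```

Note that in this model, as in the text above, the auxiliary data is updated
even when can_enable() refuses the request.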


===============================
MISCELLANEOUS COOKIE OPERATIONS
===============================

There are a number of operations that can be used to control cookies:

 (*) Cookie pinning:

	int fscache_pin_cookie(struct fscache_cookie *cookie);
	void fscache_unpin_cookie(struct fscache_cookie *cookie);

     These operations permit data cookies to be pinned into the cache and to
     have the pinning removed.  They are not permitted on index cookies.

     The pinning function will return 0 if successful, -ENOBUFS if the cookie
     isn't backed by a cache, -EOPNOTSUPP if the cache doesn't support pinning,
     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
     -EIO if there's any other problem.

 (*) Data space reservation:

	int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);

     This permits a netfs to request cache space be reserved to store up to the
     given amount of a file.  It is permitted to ask for more than the current
     size of the file to allow for future file expansion.

     If size is given as zero then the reservation will be cancelled.

     The function will return 0 if successful, -ENOBUFS if the cookie isn't
     backed by a cache, -EOPNOTSUPP if the cache doesn't support reservations,
     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
     -EIO if there's any other problem.

     Note that this doesn't pin an object in a cache; it can still be culled to
     make space if it's not in use.


=====================
COOKIE UNREGISTRATION
=====================

To get rid of a cookie, this function should be called:

	void fscache_relinquish_cookie(struct fscache_cookie *cookie,
				       const void *aux_data,
				       bool retire);

If retire is true, then the object will be marked for recycling, and all
copies of it will be removed from all active caches in which it is present.
Not only that but all child objects will also be retired.

If retire is false, then the object may be available again when next the
acquisition function is called.  Retirement here will overrule the pinning on a
cookie.

The cookie's auxiliary data will be updated from aux_data if that is non-NULL
so that the cache can lazily update it on disk.

One very important note - relinquish must NOT be called for a cookie unless all
the cookies for "child" indices, objects and pages have been relinquished
first.
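
The ordering rule can be pictured with a small userspace model in which the
netfs walks its own tree and relinquishes children before their parent.  The
cookie layout, the child counting inside the fscache_relinquish_cookie()
stand-in, struct netfs_node and both netfs_*() helpers are all assumptions
made for this sketch; the assert() in the stub merely models the rule and is
not a claim about how the kernel enforces it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_KIDS 4		/* arbitrary limit for this sketch */

struct fscache_cookie {
	struct fscache_cookie *parent;
	int n_children;		/* children not yet relinquished */
	bool relinquished;
	bool retired;
};

/* Stand-in: refuses (via assert) to drop a cookie that still has children. */
static void fscache_relinquish_cookie(struct fscache_cookie *cookie,
				      const void *aux_data, bool retire)
{
	(void)aux_data;
	assert(cookie->n_children == 0);
	cookie->relinquished = true;
	cookie->retired = retire;
	if (cookie->parent)
		cookie->parent->n_children--;
}

/* Hypothetical netfs object owning one cookie and some children. */
struct netfs_node {
	struct fscache_cookie cookie;
	struct netfs_node *kids[MAX_KIDS];
	int n_kids;
};

static void netfs_add_child(struct netfs_node *parent,
			    struct netfs_node *child)
{
	assert(parent->n_kids < MAX_KIDS);
	parent->kids[parent->n_kids++] = child;
	child->cookie.parent = &parent->cookie;
	parent->cookie.n_children++;
}

/* Post-order walk: every child is relinquished before its parent. */
static void netfs_relinquish_tree(struct netfs_node *node, bool retire)
{
	for (int i = 0; i < node->n_kids; i++)
		netfs_relinquish_tree(node->kids[i], retire);
	fscache_relinquish_cookie(&node->cookie, NULL, retire);
}
```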


==================
INDEX INVALIDATION
==================

There is no direct way to invalidate an index subtree.  To do this, the caller
should relinquish and retire the cookie they have, and then acquire a new one.


======================
DATA FILE INVALIDATION
======================

Sometimes it will be necessary to invalidate an object that contains data.
Typically this will be necessary when the server tells the netfs of a foreign
change - at which point the netfs has to throw away all the state it had for an
inode and reload from the server.

To indicate that a cache object should be invalidated, the following function
can be called:

	void fscache_invalidate(struct fscache_cookie *cookie);

This can be called with spinlocks held as it defers the work to a thread pool.
All extant storage, retrieval and attribute change ops at this point are
cancelled and discarded.  Some future operations will be rejected until the
cache has had a chance to insert a barrier in the operations queue.  After
that, operations will be queued again behind the invalidation operation.

The invalidation operation will perform an attribute change operation and an
auxiliary data update operation as it is very likely these will have changed.

Using the following function, the netfs can wait for the invalidation operation
to have reached a point at which it can start submitting ordinary operations
once again:

	void fscache_wait_on_invalidate(struct fscache_cookie *cookie);
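
The split between the two calls might be used as in the following userspace
sketch: the invalidation is requested where the foreign change is noticed,
and the wait is done later, before new operations are submitted.  The cookie
layout and the bodies of both fscache_*() functions are toy stand-ins (this
model rejects every operation until the wait has been done, which is
stricter than the behaviour described above), and mock_submit_op() and the
netfs_*() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fscache_cookie {
	int queued_ops;		/* operations currently queued */
	bool invalidating;	/* invalidation not yet past its barrier */
};

/* Stand-in: cancel extant operations and start the invalidation. */
static void fscache_invalidate(struct fscache_cookie *cookie)
{
	cookie->queued_ops = 0;
	cookie->invalidating = true;
}

/* Stand-in: the real function would sleep until ops can be queued again. */
static void fscache_wait_on_invalidate(struct fscache_cookie *cookie)
{
	cookie->invalidating = false;
}

/* Hypothetical: try to queue an operation; rejected while invalidating. */
static bool mock_submit_op(struct fscache_cookie *cookie)
{
	if (cookie->invalidating)
		return false;
	cookie->queued_ops++;
	return true;
}

/* Hypothetical handler for a change notification from the server. */
static void netfs_foreign_change(struct fscache_cookie *cookie)
{
	assert(cookie != NULL);
	fscache_invalidate(cookie);
}

/* Hypothetical reload path: wait first, then resume normal operations. */
static bool netfs_reload(struct fscache_cookie *cookie)
{
	assert(cookie != NULL);
	fscache_wait_on_invalidate(cookie);
	return mock_submit_op(cookie);
}
```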


===========================
FS-CACHE SPECIFIC PAGE FLAG
===========================

FS-Cache makes use of a page flag, PG_private_2, for its own purpose.  This is
given the alternative name PG_fscache.

PG_fscache is used to indicate that the page is known by the cache, and that
the cache must be informed if the page is going to go away.  It's an indication
to the netfs that the cache has an interest in this page, where an interest may
be a pointer to it, resources allocated or reserved for it, or I/O in progress
upon it.

The netfs can use this information in methods such as releasepage() to
determine whether it needs to uncache a page or update it.

Furthermore, if this bit is set, releasepage() and invalidatepage() operations
will be called on a page to get rid of it, even if PG_private is not set.  This
allows caching to be attempted on a page before read_cache_pages() is called
after fscache_read_or_alloc_pages(), as the former will try to release pages it
was given under certain circumstances.

This bit does not overlap with other flags such as PG_private.  This means that
FS-Cache can be used with a filesystem that uses the block buffering code.

There are a number of operations defined on this flag:

	int PageFsCache(struct page *page);
	void SetPageFsCache(struct page *page);
	void ClearPageFsCache(struct page *page);
	int TestSetPageFsCache(struct page *page);
	int TestClearPageFsCache(struct page *page);

These functions are, respectively, bit test, bit set, bit clear, bit test and
set, and bit test and clear operations on PG_fscache.
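
The semantics of the five operations can be modelled in userspace as below.
This is only a sketch: struct page is reduced to a flags word, the bit
position MOCK_PG_FSCACHE_BIT is arbitrary, and the operations here are plain
read-modify-write sequences with none of the atomicity a kernel
implementation would need.  The test-and-modify variants are assumed to
return the value the bit had beforehand.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PG_FSCACHE_BIT 0	/* arbitrary bit position for this model */

struct page { unsigned long flags; };	/* stand-in for the kernel struct */

/* Bit test: non-zero if the cache has an interest in the page. */
static int PageFsCache(struct page *page)
{
	assert(page != NULL);
	return (int)((page->flags >> MOCK_PG_FSCACHE_BIT) & 1UL);
}

/* Bit set. */
static void SetPageFsCache(struct page *page)
{
	page->flags |= 1UL << MOCK_PG_FSCACHE_BIT;
}

/* Bit clear. */
static void ClearPageFsCache(struct page *page)
{
	page->flags &= ~(1UL << MOCK_PG_FSCACHE_BIT);
}

/* Bit test and set: returns the old value, leaves the bit set. */
static int TestSetPageFsCache(struct page *page)
{
	int old = PageFsCache(page);

	SetPageFsCache(page);
	return old;
}

/* Bit test and clear: returns the old value, leaves the bit clear. */
static int TestClearPageFsCache(struct page *page)
{
	int old = PageFsCache(page);

	ClearPageFsCache(page);
	return old;
}
```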