| xj | b04a402 | 2021-11-25 15:01:52 +0800 | 1 | Reference counting in pnfs: | 
|  | 2 | ========================== | 
|  | 3 |  | 
|  | 4 | There are several inter-related caches.  We have layouts which can | 
|  | 5 | reference multiple devices, each of which can reference multiple data servers. | 
|  | 6 | Each data server can be referenced by multiple devices.  Each device | 
|  | 7 | can be referenced by multiple layouts.  To keep all of this straight, | 
|  | 8 | we need to reference count. | 
|  | 9 |  | 
|  | 10 |  | 
|  | 11 | struct pnfs_layout_hdr | 
|  | 12 | ---------------------- | 
|  | 13 | The on-the-wire command LAYOUTGET corresponds to struct | 
|  | 14 | pnfs_layout_segment, usually referred to by the variable name lseg. | 
|  | 15 | Each nfs_inode may hold a pointer to a cache of these layout | 
|  | 16 | segments in nfsi->layout, of type struct pnfs_layout_hdr. | 
|  | 17 |  | 
|  | 18 | We reference the header for the inode pointing to it, across each | 
|  | 19 | outstanding RPC call that references it (LAYOUTGET, LAYOUTRETURN, | 
|  | 20 | LAYOUTCOMMIT), and for each lseg held within. | 
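The counting rule above can be modelled in plain userspace C.  This is
only a sketch under stated assumptions: struct layout_hdr, hdr_alloc(),
hdr_get(), hdr_put() and the rpc_*/lseg_* helpers are hypothetical names
invented for illustration, not the kernel's functions, and a plain int
stands in for an atomic reference count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct pnfs_layout_hdr. */
struct layout_hdr {
    int refcount;   /* inode pointer + in-flight RPCs + cached lsegs */
    int nr_lsegs;   /* number of lsegs currently held in the header */
};

/* Allocate a header; the initial reference belongs to the inode. */
static struct layout_hdr *hdr_alloc(void)
{
    struct layout_hdr *lo = calloc(1, sizeof(*lo));

    if (lo)
        lo->refcount = 1;
    return lo;
}

static void hdr_get(struct layout_hdr *lo)
{
    assert(lo->refcount > 0);
    lo->refcount++;
}

/* Drop one reference; returns 1 if this freed the header, else 0. */
static int hdr_put(struct layout_hdr *lo)
{
    assert(lo->refcount > 0);
    if (--lo->refcount == 0) {
        free(lo);
        return 1;
    }
    return 0;
}

/* An outstanding RPC (e.g. LAYOUTGET) pins the header while in flight. */
static void rpc_start(struct layout_hdr *lo)
{
    hdr_get(lo);
}

static int rpc_done(struct layout_hdr *lo)
{
    return hdr_put(lo);
}

/* Each lseg stored in the header holds one reference on it. */
static void lseg_add(struct layout_hdr *lo)
{
    hdr_get(lo);
    lo->nr_lsegs++;
}

static int lseg_remove(struct layout_hdr *lo)
{
    lo->nr_lsegs--;
    return hdr_put(lo);
}
```

In this model a header with one RPC in flight and one cached lseg has a
count of three, and it is freed only when the inode's own reference is
dropped last.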
|  | 21 |  | 
|  | 22 | Each header is also (when non-empty) put on a list associated with | 
|  | 23 | struct nfs_client (cl_layouts).  Being put on this list does not bump | 
|  | 24 | the reference count, as the layout is kept around by the lseg that | 
|  | 25 | keeps it in the list. | 
|  | 26 |  | 
|  | 27 | deviceid_cache | 
|  | 28 | -------------- | 
|  | 29 | lsegs reference device ids, which are resolved per nfs_client and | 
|  | 30 | layout driver type.  The device ids are held in an RCU cache (struct | 
|  | 31 | nfs4_deviceid_cache).  The cache itself is referenced across each | 
|  | 32 | mount.  The entries (struct nfs4_deviceid) themselves are held across | 
|  | 33 | the lifetime of each lseg referencing them. | 
|  | 34 |  | 
|  | 35 | RCU is used because the deviceid is basically a write once, read many | 
|  | 36 | data structure.  The hlist size of 32 buckets needs better | 
|  | 37 | justification, but seems reasonable given that we can have multiple | 
|  | 38 | deviceids per filesystem, and multiple filesystems per nfs_client. | 
|  | 39 |  | 
|  | 40 | The hash code is copied from the nfsd code base.  A discussion of | 
|  | 41 | hashing and variations of this algorithm can be found at: | 
|  | 42 | http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809 | 
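A 32-bucket, write-once/read-many cache of this kind can be sketched in
userspace C as follows.  The sketch makes several assumptions: struct
dev_entry, struct dev_cache, devid_hash(), dev_insert() and dev_lookup()
are hypothetical names, the 16-byte id length and the multiplier 37 are
chosen for illustration, and no RCU is used, so it is only safe from a
single thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEVID_SIZE   16                 /* assumed device id length in bytes */
#define HASH_BUCKETS 32                 /* bucket count mentioned in the text */
#define HASH_MASK    (HASH_BUCKETS - 1)

/* Hypothetical cache entry; the kernel entry is struct nfs4_deviceid. */
struct dev_entry {
    unsigned char id[DEVID_SIZE];
    int refcount;                       /* one reference per lseg using it */
    struct dev_entry *next;             /* bucket chain */
};

struct dev_cache {
    struct dev_entry *bucket[HASH_BUCKETS];
};

/* Multiply-and-add hash over the id bytes, masked to a bucket index. */
static uint32_t devid_hash(const unsigned char *id)
{
    uint32_t x = 0;

    for (size_t i = 0; i < DEVID_SIZE; i++)
        x = x * 37 + id[i];
    return x & HASH_MASK;
}

/* Write once: link a new entry at the head of its bucket. */
static void dev_insert(struct dev_cache *c, struct dev_entry *d)
{
    uint32_t h = devid_hash(d->id);

    d->next = c->bucket[h];
    c->bucket[h] = d;
}

/* Read many: find an entry and take a reference for the caller. */
static struct dev_entry *dev_lookup(struct dev_cache *c,
                                    const unsigned char *id)
{
    struct dev_entry *d;

    for (d = c->bucket[devid_hash(id)]; d; d = d->next) {
        if (memcmp(d->id, id, DEVID_SIZE) == 0) {
            d->refcount++;
            return d;
        }
    }
    return NULL;
}
```

Because entries are only ever linked at the head of a chain and never
modified in place, readers can walk a chain without seeing a partially
updated entry, which is the property that makes RCU a natural fit.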
|  | 43 |  | 
|  | 44 | data server cache | 
|  | 45 | ----------------- | 
|  | 46 | File driver devices refer to data servers, which are kept in a | 
|  | 47 | module-level cache.  A reference to each data server is held over the | 
|  | 48 | lifetime of the deviceid pointing to it. | 
|  | 49 |  | 
|  | 50 | lseg | 
|  | 51 | ---- | 
|  | 52 | lseg maintains an extra reference corresponding to the NFS_LSEG_VALID | 
|  | 53 | bit which holds it in the pnfs_layout_hdr's list.  When the final lseg | 
|  | 54 | is removed from the pnfs_layout_hdr's list, the NFS_LAYOUT_DESTROYED | 
|  | 55 | bit is set, preventing any new lsegs from being added. | 
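The interaction between the valid bit, the lseg reference count and the
destroyed bit can be modelled as below.  This is a sketch, not the kernel
code: the flag macros, struct layout, struct lseg and the lseg_* helpers
are hypothetical, a counter stands in for the header's list, and there is
no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags modelled on NFS_LSEG_VALID and NFS_LAYOUT_DESTROYED. */
#define LSEG_VALID        (1u << 0)
#define LAYOUT_DESTROYED  (1u << 0)

struct layout {
    unsigned int flags;
    int nr_lsegs;           /* lsegs currently on the header's list */
};

struct lseg {
    unsigned int flags;
    int refcount;
    struct layout *lo;
};

/* Add an lseg to the layout's list; refused once the layout is destroyed. */
static bool lseg_insert(struct layout *lo, struct lseg *ls)
{
    if (lo->flags & LAYOUT_DESTROYED)
        return false;
    ls->lo = lo;
    ls->flags |= LSEG_VALID;
    ls->refcount = 1;       /* the extra reference tied to the valid bit */
    lo->nr_lsegs++;
    return true;
}

/* A user of the lseg (e.g. I/O in flight) takes its own reference. */
static void lseg_get(struct lseg *ls)
{
    assert(ls->refcount > 0);
    ls->refcount++;
}

static void lseg_put(struct lseg *ls)
{
    assert(ls->refcount > 0);
    if (--ls->refcount > 0)
        return;
    /* Last reference gone: the lseg leaves the header's list. */
    if (--ls->lo->nr_lsegs == 0)
        ls->lo->flags |= LAYOUT_DESTROYED;
}

/* Clear the valid bit once and drop the reference it stood for. */
static void lseg_invalidate(struct lseg *ls)
{
    if (ls->flags & LSEG_VALID) {
        ls->flags &= ~LSEG_VALID;
        lseg_put(ls);
    }
}
```

An invalidated lseg that still has users stays on the list until its last
user calls lseg_put(); once the list empties, lseg_insert() fails.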
|  | 56 |  | 
|  | 57 | layout drivers | 
|  | 58 | -------------- | 
|  | 59 |  | 
|  | 60 | pNFS uses what are called layout drivers.  The standard defines four basic | 
|  | 61 | layout types: "files", "objects", "blocks", and "flexfiles".  For each | 
|  | 62 | of these types there is a layout driver with a common function-vector | 
|  | 63 | table, whose entries are called by the nfs-client pnfs-core to implement | 
|  | 64 | the different layout types. | 
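The function-vector arrangement described above might look roughly like
this in C.  All names here (enum layout_type, struct layout_driver,
register_driver(), core_read() and the files_* callbacks) are hypothetical
and chosen for illustration; the real table has different and many more
members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for the layout types named in the text. */
enum layout_type { LT_FILES, LT_OBJECTS, LT_BLOCKS, LT_FLEXFILES, LT_MAX };

/* Hypothetical function-vector table shared by every driver. */
struct layout_driver {
    enum layout_type type;
    const char *name;
    int (*read_pages)(size_t count);    /* returns pages handled */
    int (*write_pages)(size_t count);
};

/* Toy "files" callbacks that simply report every page as handled. */
static int files_read(size_t count)
{
    return (int)count;
}

static int files_write(size_t count)
{
    return (int)count;
}

static const struct layout_driver files_driver = {
    .type = LT_FILES,
    .name = "files",
    .read_pages = files_read,
    .write_pages = files_write,
};

/* Registry indexed by layout type; NULL means no driver is registered. */
static const struct layout_driver *drivers[LT_MAX];

static int register_driver(const struct layout_driver *ld)
{
    if (ld->type >= LT_MAX || drivers[ld->type])
        return -1;                      /* bad type or already registered */
    drivers[ld->type] = ld;
    return 0;
}

/* Core code dispatches through the table without knowing the driver. */
static int core_read(enum layout_type type, size_t count)
{
    const struct layout_driver *ld = type < LT_MAX ? drivers[type] : NULL;

    return ld ? ld->read_pages(count) : -1;
}
```

The core only ever sees the table, so supporting another layout type
means registering another struct layout_driver rather than changing the
callers.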
|  | 65 |  | 
|  | 66 | Files-layout-driver code is in: fs/nfs/filelayout/.. directory | 
|  | 67 | Blocks-layout-driver code is in: fs/nfs/blocklayout/.. directory | 
|  | 68 | Flexfiles-layout-driver code is in: fs/nfs/flexfilelayout/.. directory | 
|  | 69 |  | 
|  | 70 | blocks-layout setup | 
|  | 71 | ------------------- | 
|  | 72 |  | 
|  | 73 | TODO: Document the setup needs of the blocks layout driver |