Introduction
============

The more sophisticated device-mapper targets require complex metadata
that is managed in the kernel.  In late 2010 we saw that various
targets were each rolling their own data structures, for example:

- Mikulas Patocka's multisnap implementation
- Heinz Mauelshagen's thin provisioning target
- Another btree-based caching target posted to dm-devel
- Another multi-snapshot target based on a design by Daniel Phillips

Maintaining these data structures takes a lot of work, so we would
like to reduce their number where possible.

The persistent-data library is an attempt to provide a reusable
framework for people who want to store metadata in device-mapper
targets.  It's currently used by the thin-provisioning target and an
upcoming hierarchical storage target.

Overview
========

The main documentation is in the header files, which can all be found
under drivers/md/persistent-data.

The block manager
-----------------

dm-block-manager.[hc]

This provides access to the data on disk in fixed-size blocks.  A
read/write locking interface prevents conflicting concurrent accesses
and keeps blocks that are in use in the cache.

Clients of persistent-data are unlikely to use this directly.
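
To make the locking rules concrete, here is a minimal single-threaded sketch in C of a block cache with per-block read/write locks.  Everything in it (struct block_manager, bm_read_lock(), bm_write_lock(), bm_unlock(), the block size and the cache size) is hypothetical and is not the interface declared in dm-block-manager.h.  Where a real implementation would typically wait for a conflicting lock to be released, this sketch simply returns BM_BUSY, and it never touches a disk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters, chosen only for illustration. */
#define BM_BLOCK_SIZE 4096
#define BM_CACHE_SLOTS 4
#define BM_OK 0
#define BM_BUSY (-1)

/* One cached block and its lock state. */
struct bm_slot {
	uint64_t where;   /* block number on disk */
	int valid;        /* slot currently caches a block */
	int readers;      /* number of read-lock holders */
	int writer;       /* 1 while write-locked */
	unsigned char data[BM_BLOCK_SIZE];
};

/* Hypothetical block manager: a tiny fixed-size cache. */
struct block_manager {
	struct bm_slot slots[BM_CACHE_SLOTS];
};

/* Return the slot caching block b, or claim an unlocked slot for it.
 * Slots with lock holders are never evicted.  NULL if all are pinned. */
static struct bm_slot *bm_get_slot(struct block_manager *bm, uint64_t b)
{
	struct bm_slot *victim = NULL;

	for (int i = 0; i < BM_CACHE_SLOTS; i++) {
		struct bm_slot *s = &bm->slots[i];

		if (s->valid && s->where == b)
			return s;
		if (!victim && s->readers == 0 && !s->writer)
			victim = s;
	}
	if (victim) {
		/* A real cache would write back the old block and read b from disk. */
		memset(victim, 0, sizeof(*victim));
		victim->valid = 1;
		victim->where = b;
	}
	return victim;
}

/* Shared lock: fails if a writer holds the block or the cache is full. */
int bm_read_lock(struct block_manager *bm, uint64_t b, unsigned char **data)
{
	struct bm_slot *s = bm_get_slot(bm, b);

	if (!s || s->writer)
		return BM_BUSY;
	s->readers++;
	*data = s->data;
	return BM_OK;
}

/* Exclusive lock: fails if anyone else holds the block. */
int bm_write_lock(struct block_manager *bm, uint64_t b, unsigned char **data)
{
	struct bm_slot *s = bm_get_slot(bm, b);

	if (!s || s->writer || s->readers)
		return BM_BUSY;
	s->writer = 1;
	*data = s->data;
	return BM_OK;
}

/* Release one lock on block b. */
void bm_unlock(struct block_manager *bm, uint64_t b)
{
	for (int i = 0; i < BM_CACHE_SLOTS; i++) {
		struct bm_slot *s = &bm->slots[i];

		if (s->valid && s->where == b) {
			assert(s->writer || s->readers > 0); /* caller must hold a lock */
			if (s->writer)
				s->writer = 0;
			else
				s->readers--;
			return;
		}
	}
}
```

Any number of readers may share a block, a writer needs it exclusively, and a slot with a lock holder is never picked for eviction, which is what keeps in-use data in the cache.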

The transaction manager
-----------------------

dm-transaction-manager.[hc]

This restricts access to blocks and enforces copy-on-write semantics.
The only way you can get hold of a writable block through the
transaction manager is by shadowing an existing block (i.e. doing
copy-on-write) or allocating a fresh one.  Shadowing is elided within
the same transaction, so a block that has already been shadowed need
not be copied again and performance is reasonable.  The commit method
ensures that all data is flushed before it writes the superblock.
On power failure your metadata will be as it was when last committed.
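
The shadowing rules can be sketched in a few lines of C.  This is a minimal in-memory model under stated assumptions: the "disk" is an array, the allocator is a simple in-use flag, old blocks are never freed (a real transaction manager would drop their reference counts in the space map), and struct txn_manager, tm_new_block(), tm_shadow() and tm_commit() are hypothetical names rather than the interface in dm-transaction-manager.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, tiny simulated device. */
#define TM_NR_BLOCKS 16
#define TM_BLOCK_SIZE 64

struct txn_manager {
	unsigned char disk[TM_NR_BLOCKS][TM_BLOCK_SIZE]; /* simulated block device */
	int in_use[TM_NR_BLOCKS];   /* crude allocator: 1 if allocated */
	int shadowed[TM_NR_BLOCKS]; /* 1 if created in the current transaction */
	uint64_t committed_root;    /* root recorded in the on-disk superblock */
};

/* Allocate a fresh block; it is writable because it is new in this transaction. */
static int64_t tm_new_block(struct txn_manager *tm)
{
	for (int64_t b = 0; b < TM_NR_BLOCKS; b++) {
		if (!tm->in_use[b]) {
			tm->in_use[b] = 1;
			tm->shadowed[b] = 1;
			memset(tm->disk[b], 0, TM_BLOCK_SIZE);
			return b;
		}
	}
	return -1; /* out of space */
}

/* Get a writable version of block orig.  If orig was already allocated or
 * shadowed in this transaction it is returned as-is: the copy is elided. */
static int64_t tm_shadow(struct txn_manager *tm, uint64_t orig)
{
	int64_t copy;

	assert(orig < TM_NR_BLOCKS && tm->in_use[orig]);
	if (tm->shadowed[orig])
		return (int64_t)orig;
	copy = tm_new_block(tm);
	if (copy < 0)
		return -1;
	memcpy(tm->disk[copy], tm->disk[orig], TM_BLOCK_SIZE);
	/* orig is left untouched (and, in this sketch, never freed). */
	return copy;
}

/* Commit: data must be durable before the superblock points at it. */
static void tm_commit(struct txn_manager *tm, uint64_t new_root)
{
	/* A real implementation would flush every dirty block here. */
	tm->committed_root = new_root;                 /* superblock write comes last */
	memset(tm->shadowed, 0, sizeof(tm->shadowed)); /* next transaction starts */
}
```

In this model, after a block has been shadowed and modified, committed_root still points at the unmodified original until the next tm_commit(), which mirrors the guarantee that a power failure leaves the metadata as it was when last committed.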

The space maps
--------------

dm-space-map.h
dm-space-map-metadata.[hc]
dm-space-map-disk.[hc]

These are on-disk data structures that keep track of the reference
counts of blocks; they also act as the allocator of new blocks.  There
are currently two implementations: a simpler one for managing blocks
on a different device (e.g. thinly-provisioned data blocks), and one
for managing the metadata space.  The latter is complicated by the
need to store its own data within the space it's managing.
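
A minimal in-memory sketch of what a space map does: one reference count per block, where allocating means "find a block whose count is zero and set it to one".  The names (struct space_map, sm_new_block(), sm_inc(), sm_dec(), sm_nr_free()) and the wrap-around search hint are assumptions for illustration.  Because the sketch keeps its counts in RAM, it sidesteps the metadata space map's problem of storing them inside the space being managed.

```c
#include <assert.h>
#include <stdint.h>

#define SM_NR_BLOCKS 8 /* hypothetical device size in blocks */

struct space_map {
	uint32_t ref_count[SM_NR_BLOCKS]; /* 0 means the block is free */
	uint64_t search_hint;             /* where the next allocation scan starts */
};

/* Allocate a free block: find one with a zero count and set it to 1. */
static int sm_new_block(struct space_map *sm, uint64_t *result)
{
	for (uint64_t i = 0; i < SM_NR_BLOCKS; i++) {
		uint64_t b = (sm->search_hint + i) % SM_NR_BLOCKS;

		if (sm->ref_count[b] == 0) {
			sm->ref_count[b] = 1;
			sm->search_hint = (b + 1) % SM_NR_BLOCKS;
			*result = b;
			return 0;
		}
	}
	return -1; /* no free space */
}

/* Another user now shares block b. */
static void sm_inc(struct space_map *sm, uint64_t b)
{
	sm->ref_count[b]++;
}

/* Drop one reference; the block is free again once its count reaches 0. */
static void sm_dec(struct space_map *sm, uint64_t b)
{
	assert(sm->ref_count[b] > 0);
	sm->ref_count[b]--;
}

static uint64_t sm_nr_free(const struct space_map *sm)
{
	uint64_t n = 0;

	for (uint64_t b = 0; b < SM_NR_BLOCKS; b++)
		if (sm->ref_count[b] == 0)
			n++;
	return n;
}
```

Sharing a block (for example between a thin device and its snapshot) is just an increment; the block only returns to the free pool once every user has dropped its reference.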

The data structures
-------------------

dm-btree.[hc]
dm-btree-remove.c
dm-btree-spine.c
dm-btree-internal.h

Currently there is only one data structure, a hierarchical btree.
There are plans to add more.  For example, something with an
array-like interface would see a lot of use.

The btree is 'hierarchical' in that you can define it to be composed
of nested btrees and look values up with multiple keys, one per level.
For example, the thin-provisioning target uses a btree with two levels
of nesting.  The first level maps a device id to a mapping tree, which
in turn maps a virtual block to a physical block.
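
The sketch below illustrates how a two-level lookup consumes one key per level, in the spirit of the thin-provisioning mapping.  To stay short, it replaces each btree with a sorted array searched by binary search, so it shows the nesting rather than a real btree.  struct tree, tree_lookup(), hierarchical_lookup() and the example device and block numbers are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one btree: sorted (key, value) pairs. */
struct tree {
	size_t nr_entries;
	const uint64_t *keys;   /* sorted ascending */
	const uint64_t *values; /* inner level: index of a subtree; last level: result */
};

/* Binary search in one tree.  Returns 0 and sets *value if key is present. */
static int tree_lookup(const struct tree *t, uint64_t key, uint64_t *value)
{
	size_t lo = 0, hi = t->nr_entries;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (t->keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo < t->nr_entries && t->keys[lo] == key) {
		*value = t->values[lo];
		return 0;
	}
	return -1;
}

/* Walk 'levels' nested trees, consuming keys[level] at each level. */
static int hierarchical_lookup(const struct tree *trees, size_t root,
			       const uint64_t *keys, unsigned levels,
			       uint64_t *result)
{
	uint64_t v = root;

	assert(levels > 0);
	for (unsigned level = 0; level < levels; level++)
		if (tree_lookup(&trees[v], keys[level], &v))
			return -1;
	*result = v;
	return 0;
}

/* Example data.  Second level: virtual block -> physical block, per device. */
static const uint64_t dev1_keys[] = {0, 5};
static const uint64_t dev1_vals[] = {100, 105};
static const uint64_t dev3_keys[] = {2};
static const uint64_t dev3_vals[] = {200};
/* First level: device id -> index of that device's mapping tree. */
static const uint64_t top_keys[] = {1, 3};
static const uint64_t top_vals[] = {1, 2};

static const struct tree trees[] = {
	{2, top_keys, top_vals},   /* index 0: first level */
	{2, dev1_keys, dev1_vals}, /* index 1: device 1 */
	{1, dev3_keys, dev3_vals}, /* index 2: device 3 */
};
```

With keys {1, 5}, the first level resolves device 1 to its mapping tree and the second resolves virtual block 5 to physical block 105; a missing device id or unmapped block makes the lookup return -1.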

Values stored in the btrees can be of arbitrary size.  Keys are always
64 bits, although nesting allows you to use multiple keys.
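
One way to picture fixed 64-bit keys alongside values of arbitrary size is a leaf node whose keys live in a uint64_t array while values are packed back to back in a byte array, using a value size chosen when the tree is created.  This is a minimal sketch under that assumption.  struct leaf, leaf_init(), leaf_insert(), leaf_value() and the node capacity are hypothetical and do not describe dm-btree's on-disk format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node limits. */
#define LEAF_MAX_ENTRIES 8
#define LEAF_MAX_VALUE_SIZE 64

struct leaf {
	uint32_t nr_entries;
	uint32_t value_size;             /* bytes per value, fixed per tree */
	uint64_t keys[LEAF_MAX_ENTRIES]; /* keys are always 64-bit, kept sorted */
	unsigned char values[LEAF_MAX_ENTRIES * LEAF_MAX_VALUE_SIZE]; /* packed values */
};

static void leaf_init(struct leaf *n, uint32_t value_size)
{
	assert(value_size > 0 && value_size <= LEAF_MAX_VALUE_SIZE);
	memset(n, 0, sizeof(*n));
	n->value_size = value_size;
}

/* Address of the i-th value: values sit back to back, value_size bytes each. */
static unsigned char *leaf_value(struct leaf *n, uint32_t i)
{
	return n->values + (size_t)i * n->value_size;
}

/* Insert (key, value) keeping keys sorted; overwrite if the key exists. */
static int leaf_insert(struct leaf *n, uint64_t key, const void *value)
{
	uint32_t i = 0;

	while (i < n->nr_entries && n->keys[i] < key)
		i++;
	if (i < n->nr_entries && n->keys[i] == key) {
		memcpy(leaf_value(n, i), value, n->value_size);
		return 0;
	}
	if (n->nr_entries == LEAF_MAX_ENTRIES)
		return -1; /* a real btree would split the node here */

	/* Shift later keys and values up by one slot to make room at i. */
	memmove(&n->keys[i + 1], &n->keys[i],
		(n->nr_entries - i) * sizeof(uint64_t));
	memmove(leaf_value(n, i + 1), leaf_value(n, i),
		(size_t)(n->nr_entries - i) * n->value_size);
	n->keys[i] = key;
	memcpy(leaf_value(n, i), value, n->value_size);
	n->nr_entries++;
	return 0;
}
```

In this sketch every value in a tree has the same size, so the i-th value is located by arithmetic (i * value_size) rather than by storing a length per value.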