| dm-log-writes | 
 | ============= | 
 |  | 
 | This target takes two devices: one to pass all IO to normally, and one to log | 
 | all of the write operations to.  It is intended for file system developers who | 
 | want to verify the integrity of metadata or data as the file system is written | 
 | to. | 
 | There is a log_write_entry written for every WRITE request and the target is | 
 | able to take arbitrary data from userspace to insert into the log.  The data | 
 | that is in the WRITE requests is copied into the log to make the replay happen | 
 | exactly as it happened originally. | 
 |  | 
 | Log Ordering | 
 | ============ | 
 |  | 
 | We log things in order of completion once we are sure the write is no longer in | 
 | cache.  This means that normal WRITE requests are not actually logged until the | 
 | next REQ_PREFLUSH request.  This lets userspace replay the log in a way that | 
 | correlates to what is on disk rather than what is in cache, which makes it | 
 | easier to detect improper waiting/flushing. | 
 |  | 
 | This works by attaching all WRITE requests to a list once the write completes. | 
 | Once we see a REQ_PREFLUSH request we splice this list onto the request and once | 
 | the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only | 
 | completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to | 
 | simulate the worst case scenario with regard to power failures.  Consider the | 
 | following example (W means write, C means complete): | 
 |  | 
 | W1,W2,W3,C3,C2,Wflush,C1,Cflush | 
 |  | 
 | The log would show the following: | 
 |  | 
 | W3,W2,flush,W1.... | 
 |  | 
 | Again, this is to simulate what is actually on disk.  It allows us to detect | 
 | cases where a power failure at a particular point in time would create an | 
 | inconsistent file system. | 
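
The ordering rules above can be illustrated with a small toy model. This is
only a sketch in plain shell, not the kernel implementation: the function name
simulate_log and the event names are made up for illustration, and it models
only ordinary WRITEs and a single flush at a time (no FUA or DISCARD handling).

```shell
# Toy model of the log ordering described above; NOT the kernel code.
# Events (hypothetical names): W<n> = write n issued, C<n> = write n completed,
#                              Wflush = flush issued, Cflush = flush completed.
simulate_log() {
    completed=""   # completed writes not yet attached to a flush
    attached=""    # writes spliced onto the in-flight flush
    log=""         # what has been logged so far, in order
    for ev in "$@"; do
        case "$ev" in
            # Flush issued: splice the completed-write list onto the flush.
            Wflush) attached=$completed; completed="" ;;
            # Flush completed: log the attached writes, then the flush.
            Cflush) log="${log:+$log,}${attached:+$attached,}flush"; attached="" ;;
            # Write completed: remember it in completion order.
            C*)     completed="${completed:+$completed,}W${ev#C}" ;;
            # Write issued: nothing is logged yet.
            W*)     ;;
        esac
    done
    echo "logged: $log"
    echo "pending: $completed"
}

simulate_log W1 W2 W3 C3 C2 Wflush C1 Cflush
```

For the example sequence this prints "logged: W3,W2,flush" and "pending: W1".
In this model W1 stays pending because it completed after the flush was
issued, so it would only be logged by a later flush.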
 |  | 
 | Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as | 
 | they complete as those requests will obviously bypass the device cache. | 
 |  | 
 | Any REQ_DISCARD requests are treated like WRITE requests: they are held until | 
 | the next flush rather than logged as they complete.  Otherwise the log would | 
 | contain all of the DISCARD requests first, then the WRITE requests, and then | 
 | the FLUSH request.  Consider the following example: | 
 |  | 
 | WRITE block 1, DISCARD block 1, FLUSH | 
 |  | 
 | If we logged DISCARD when it completed, the replay would look like this: | 
 |  | 
 | DISCARD 1, WRITE 1, FLUSH | 
 |  | 
 | which isn't quite what happened and wouldn't be caught during the log replay. | 
 |  | 
 | Target interface | 
 | ================ | 
 |  | 
 | i) Constructor | 
 |  | 
 |    log-writes <dev_path> <log_dev_path> | 
 |  | 
 |    dev_path	: Device that all of the IO will go to normally. | 
 |    log_dev_path : Device where the log entries are written to. | 
 |  | 
 | ii) Status | 
 |  | 
 |     <#logged entries> <highest allocated sector> | 
 |  | 
 |     #logged entries	       : Number of logged entries | 
 |     highest allocated sector   : Highest sector allocated on the log device | 
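
As a rough sketch of how a test script might consume these two fields, the
helper below splits one line of "dmsetup status <name>" output. It assumes the
usual dmsetup layout in which the target's status is preceded by the start
sector, the length and the target name. The function name
parse_log_writes_status is hypothetical and the sample numbers are made up.

```shell
# Split one `dmsetup status <name>` line into fields.
# Assumed layout: <start> <length> <target> <#logged entries> <highest sector>
parse_log_writes_status() {
    # Deliberate word splitting: break the line into positional parameters.
    set -- $1
    if [ "$3" != "log-writes" ]; then
        echo "not a log-writes target: $3" >&2
        return 1
    fi
    # Any fields after the fifth are ignored.
    echo "entries=$4 highest_sector=$5"
}

# Sample line for illustration only; on a live system one would pass
# "$(dmsetup status log)" instead.
parse_log_writes_status "0 2097152 log-writes 1520 98303"
```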
 |  | 
 | iii) Messages | 
 |  | 
 |     mark <description> | 
 |  | 
 | 	You can use a dmsetup message to set an arbitrary mark in a log. | 
 | 	For example, say you want to fsck a file system after every | 
 | 	write.  You first need to replay up to the mkfs to make sure | 
 | 	you are fsck'ing something reasonable, so you would do | 
 | 	something like this: | 
 |  | 
 | 	  mkfs.btrfs -f /dev/mapper/log | 
 | 	  dmsetup message log 0 mark mkfs | 
 | 	  <run test> | 
 |  | 
 | 	This would allow you to replay the log up to the mkfs mark and | 
 | 	then replay from that point on doing the fsck check in the | 
 | 	interval that you want. | 
 |  | 
 | 	Every log has a mark at the end labeled "dm-log-writes-end". | 
 |  | 
 | Userspace component | 
 | =================== | 
 |  | 
 | There is a userspace tool that will replay the log for you in various ways. | 
 | It can be found here: https://github.com/josefbacik/log-writes | 
 |  | 
 | Example usage | 
 | ============= | 
 |  | 
 | Say you want to test fsync on your file system.  You would do something like | 
 | this: | 
 |  | 
 | TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" | 
 | dmsetup create log --table "$TABLE" | 
 | mkfs.btrfs -f /dev/mapper/log | 
 | dmsetup message log 0 mark mkfs | 
 |  | 
 | mount /dev/mapper/log /mnt/btrfs-test | 
 | <some test that does fsync at the end> | 
 | dmsetup message log 0 mark fsync | 
 | md5sum /mnt/btrfs-test/foo | 
 | umount /mnt/btrfs-test | 
 |  | 
 | dmsetup remove log | 
 | replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync | 
 | mount /dev/sdb /mnt/btrfs-test | 
 | md5sum /mnt/btrfs-test/foo | 
 | <verify the md5sums match> | 
 |  | 
 | Another option is to do a complicated file system operation and verify the file | 
 | system is consistent during the entire operation.  You could do this with: | 
 |  | 
 | TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" | 
 | dmsetup create log --table "$TABLE" | 
 | mkfs.btrfs -f /dev/mapper/log | 
 | dmsetup message log 0 mark mkfs | 
 |  | 
 | mount /dev/mapper/log /mnt/btrfs-test | 
 | <fsstress to dirty the fs> | 
 | btrfs filesystem balance /mnt/btrfs-test | 
 | umount /mnt/btrfs-test | 
 | dmsetup remove log | 
 |  | 
 | replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs | 
 | btrfsck /dev/sdb | 
 | replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \ | 
 | 	--fsck "btrfsck /dev/sdb" --check fua | 
 |  | 
 | This will replay the log until it sees a FUA request and run the fsck command. | 
 | If the fsck passes, it will replay to the next FUA and repeat, until the log is | 
 | completed or the fsck command exits abnormally. | 