| 1 | ============ |
| 2 | dm-integrity |
| 3 | ============ |
| 4 | |
| 5 | The dm-integrity target emulates a block device that has additional |
| 6 | per-sector tags that can be used for storing integrity information. |
| 7 | |
| 8 | A general problem with storing integrity tags with every sector is that |
| 9 | writing the sector and the integrity tag must be atomic - i.e. in case of a |
| 10 | crash, either both the sector and the integrity tag are written or neither is. |
| 11 | |
| 12 | To guarantee write atomicity, the dm-integrity target uses a journal: it |
| 13 | writes sector data and integrity tags into the journal, commits the journal |
| 14 | and then copies the data and integrity tags to their respective locations. |
| 15 | |
| 16 | The dm-integrity target can be used with the dm-crypt target - in this |
| 17 | situation the dm-crypt target creates the integrity data and passes them |
| 18 | to the dm-integrity target via bio_integrity_payload attached to the bio. |
| 19 | In this mode, the dm-crypt and dm-integrity targets provide authenticated |
| 20 | disk encryption - if the attacker modifies the encrypted device, an I/O |
| 21 | error is returned instead of random data. |
| 22 | |
| 23 | The dm-integrity target can also be used as a standalone target. In this |
| 24 | mode it calculates and verifies the integrity tags internally, so the |
| 25 | dm-integrity target can be used to detect silent data |
| 26 | corruption on the disk or in the I/O path. |
| 27 | |
| 28 | There's an alternate mode of operation where dm-integrity uses a bitmap |
| 29 | instead of a journal. If a bit in the bitmap is 1, the corresponding |
| 30 | region's data and integrity tags are not synchronized - if the machine |
| 31 | crashes, the unsynchronized regions will be recalculated. The bitmap mode |
| 32 | is faster than the journal mode, because we don't have to write the data |
| 33 | twice, but it is also less reliable, because if data corruption happens |
| 34 | when the machine crashes, it may not be detected. |
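A minimal sketch of the dirty-region idea described above, written in Python for illustration only. The class name DirtyBitmap and its methods are hypothetical and do not mirror the kernel's data structures; the only value taken from this document is sectors_per_bit, the number of 512-byte sectors covered by one bitmap bit.

```python
class DirtyBitmap:
    """Toy model of bitmap mode: one bit per region of sectors_per_bit sectors."""

    def __init__(self, device_sectors, sectors_per_bit):
        if device_sectors < 1 or sectors_per_bit < 1:
            raise ValueError("sizes must be positive")
        self.sectors_per_bit = sectors_per_bit
        nbits = -(-device_sectors // sectors_per_bit)  # ceiling division
        self.bits = [0] * nbits

    def mark_write(self, sector, count):
        """Set the bit of every region touched by sectors [sector, sector + count)."""
        if count < 1:
            raise ValueError("count must be positive")
        first = sector // self.sectors_per_bit
        last = (sector + count - 1) // self.sectors_per_bit
        for region in range(first, last + 1):
            self.bits[region] = 1  # data and tags may now be out of sync

    def mark_synced(self, region):
        """Clear a bit once data and tags of that region agree again."""
        self.bits[region] = 0

    def regions_to_recalculate(self):
        """Regions whose tags would be recalculated after a crash."""
        return [i for i, bit in enumerate(self.bits) if bit]


bitmap = DirtyBitmap(device_sectors=1024, sectors_per_bit=256)
bitmap.mark_write(sector=250, count=10)   # spans regions 0 and 1
print(bitmap.regions_to_recalculate())
```

The sketch also shows why this mode is weaker than journaling: any corruption inside a region that is marked dirty at crash time is covered up by the recalculation.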
| 35 | |
| 36 | When loading the target for the first time, the kernel driver will format |
| 37 | the device. But it will only format the device if the superblock contains |
| 38 | zeroes. If the superblock is neither valid nor zeroed, the dm-integrity |
| 39 | target can't be loaded. |
| 40 | |
| 41 | To use the target for the first time: |
| 42 | |
| 43 | 1. overwrite the superblock with zeroes |
| 44 | 2. load the dm-integrity target with a size of one sector; the kernel driver |
| 45 | will format the device |
| 46 | 3. unload the dm-integrity target |
| 47 | 4. read the "provided_data_sectors" value from the superblock |
| 48 | 5. load the dm-integrity target with the target size |
| 49 | "provided_data_sectors" |
| 50 | 6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target |
| 51 | with the size "provided_data_sectors" |
| 52 | |
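Step 4 can be sketched as follows. This is an illustration under stated assumptions, not a reference parser: the field order follows the superblock layout listed later in this document, but the little-endian packing and the field widths in SB_FORMAT are assumptions, as are all function names. It also assumes the superblock starts right after the reserved sectors and occupies 4 KiB.

```python
import struct

SECTOR_SIZE = 512
SUPERBLOCK_SIZE = 4096

# ASSUMPTION: little-endian, unpadded packing of the leading fields:
# magic (8 bytes), version (u8), log2(interleave sectors) (u8),
# integrity tag size (u16), journal sections (u32),
# provided data sectors (u64), flags (u32), log2(sectors per block) (u8).
SB_FORMAT = "<8sBBHIQIB"
SB_FIELDS = (
    "magic", "version", "log2_interleave_sectors", "integrity_tag_size",
    "journal_sections", "provided_data_sectors", "flags",
    "log2_sectors_per_block",
)


def read_superblock(path, reserved_sectors):
    """Read the raw superblock that follows the reserved sectors."""
    with open(path, "rb") as dev:
        dev.seek(reserved_sectors * SECTOR_SIZE)
        return dev.read(SUPERBLOCK_SIZE)


def is_zeroed(raw):
    """True if the superblock is all zeroes, i.e. the driver would format it."""
    return not any(raw)


def parse_superblock(raw):
    """Decode the leading superblock fields into a dictionary."""
    values = struct.unpack_from(SB_FORMAT, raw)
    return dict(zip(SB_FIELDS, values))
```

A caller would pass parse_superblock(read_superblock(path, reserved))["provided_data_sectors"] as the target size in step 5.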
| 53 | |
| 54 | Target arguments: |
| 55 | |
| 56 | 1. the underlying block device |
| 57 | |
| 58 | 2. the number of reserved sectors at the beginning of the device - the |
| 59 | dm-integrity target won't read or write these sectors |
| 60 | |
| 61 | 3. the size of the integrity tag (if "-" is used, the size is taken from |
| 62 | the internal-hash algorithm) |
| 63 | |
| 64 | 4. mode: |
| 65 | |
| 66 | D - direct writes (without journal) |
| 67 | in this mode, journaling is |
| 68 | not used and data sectors and integrity tags are written |
| 69 | separately. In case of a crash, it is possible that the data |
| 70 | and integrity tag don't match. |
| 71 | J - journaled writes |
| 72 | data and integrity tags are written to the |
| 73 | journal and atomicity is guaranteed. In case of a crash, |
| 74 | either both data and tag are written or neither is. The |
| 75 | journaled mode roughly halves write throughput because the |
| 76 | data have to be written twice. |
| 77 | B - bitmap mode - data and metadata are written without any |
| 78 | synchronization, the driver maintains a bitmap of dirty |
| 79 | regions where data and metadata don't match. This mode can |
| 80 | only be used with internal hash. |
| 81 | R - recovery mode - in this mode, the journal is not replayed, |
| 82 | checksums are not checked and writes to the device are not |
| 83 | allowed. This mode is useful for data recovery if the |
| 84 | device cannot be activated in any of the other standard |
| 85 | modes. |
| 86 | |
| 87 | 5. the number of additional arguments |
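The five positional arguments can be assembled into a device-mapper table line as sketched below. The sketch assumes the generic table-line layout "start length target-name target-arguments" that dmsetup accepts; the helper name integrity_table and the device path /dev/sdb are hypothetical.

```python
def integrity_table(device, size_sectors, reserved_sectors=0,
                    tag_size="-", mode="J", options=()):
    """Build a table line for the integrity target.

    The arguments appear in the documented order: device, reserved sectors,
    tag size, mode, number of additional arguments, additional arguments.
    """
    if mode not in ("D", "J", "B", "R"):
        raise ValueError("mode must be one of D, J, B, R")
    options = list(options)
    fields = [0, size_sectors, "integrity", device, reserved_sectors,
              tag_size, mode, len(options), *options]
    return " ".join(str(field) for field in fields)


# One-sector table used for the initial format (step 2 of first-time setup).
print(integrity_table("/dev/sdb", 1, options=["internal_hash:crc32"]))
# 0 1 integrity /dev/sdb 0 - J 1 internal_hash:crc32
```

Counting the options inside the helper keeps the fifth argument consistent with the list that follows it, which is easy to get wrong when the line is written by hand.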
| 88 | |
| 89 | Additional arguments: |
| 90 | |
| 91 | journal_sectors:number |
| 92 | The size of the journal. This argument is used only when formatting the |
| 93 | device. If the device is already formatted, the value from the |
| 94 | superblock is used. |
| 95 | |
| 96 | interleave_sectors:number |
| 97 | The number of interleaved sectors. This value is rounded down to |
| 98 | a power of two. If the device is already formatted, the value from |
| 99 | the superblock is used. |
| 100 | |
| 101 | meta_device:device |
| 102 | Don't interleave the data and metadata on one device. Use a |
| 103 | separate device for metadata. |
| 104 | |
| 105 | buffer_sectors:number |
| 106 | The number of sectors in one buffer. The value is rounded down to |
| 107 | a power of two. |
| 108 | |
| 109 | The tag area is accessed using buffers; the buffer size is |
| 110 | configurable. A larger buffer size means that each I/O will |
| 111 | be larger, but fewer I/Os may be issued. |
| 112 | |
| 113 | journal_watermark:number |
| 114 | The journal watermark as a percentage. When the size of the journal |
| 115 | exceeds this watermark, the thread that flushes the journal will |
| 116 | be started. |
| 117 | |
| 118 | commit_time:number |
| 119 | Commit time in milliseconds. When this time passes, the journal is |
| 120 | written. The journal is also written immediately if a FLUSH |
| 121 | request is received. |
| 122 | |
| 123 | internal_hash:algorithm(:key) (the key is optional) |
| 124 | Use internal hash or crc. |
| 125 | When this argument is used, the dm-integrity target won't accept |
| 126 | integrity tags from the upper target, but it will automatically |
| 127 | generate and verify the integrity tags. |
| 128 | |
| 129 | You can use a crc algorithm (such as crc32); the integrity target |
| 130 | will then protect the data against accidental corruption. |
| 131 | You can also use an hmac algorithm (for example |
| 132 | "hmac(sha256):0123456789abcdef"); in this mode it will provide |
| 133 | cryptographic authentication of the data without encryption. |
| 134 | |
| 135 | When this argument is not used, the integrity tags are accepted |
| 136 | from an upper layer target, such as dm-crypt. The upper layer |
| 137 | target should check the validity of the integrity tags. |
| 138 | |
| 139 | recalculate |
| 140 | Recalculate the integrity tags automatically. It is only valid |
| 141 | when using internal hash. |
| 142 | |
| 143 | journal_crypt:algorithm(:key) (the key is optional) |
| 144 | Encrypt the journal using the given algorithm to make sure that the |
| 145 | attacker can't read the journal. You can use a block cipher here |
| 146 | (such as "cbc(aes)") or a stream cipher (for example "chacha20", |
| 147 | "salsa20", "ctr(aes)" or "ecb(arc4)"). |
| 148 | |
| 149 | The journal contains a history of the last writes to the block device; |
| 150 | an attacker reading the journal could see the last sector numbers |
| 151 | that were written. From the sector numbers, the attacker can infer |
| 152 | the size of files that were written. To protect against this |
| 153 | situation, you can encrypt the journal. |
| 154 | |
| 155 | journal_mac:algorithm(:key) (the key is optional) |
| 156 | Protect sector numbers in the journal from accidental or malicious |
| 157 | modification. To protect against accidental modification, use a |
| 158 | crc algorithm; to protect against malicious modification, use an |
| 159 | hmac algorithm with a key. |
| 160 | |
| 161 | This option is not needed when using internal-hash because in this |
| 162 | mode, the integrity of journal entries is checked when replaying |
| 163 | the journal. Thus, a modified sector number would be detected at |
| 164 | this stage. |
| 165 | |
| 166 | block_size:number |
| 167 | The size of a data block in bytes. The larger the block size the |
| 168 | less overhead there is for per-block integrity metadata. |
| 169 | Supported values are 512, 1024, 2048 and 4096 bytes. If not |
| 170 | specified the default block size is 512 bytes. |
| 171 | |
| 172 | sectors_per_bit:number |
| 173 | In the bitmap mode, this parameter specifies the number of |
| 174 | 512-byte sectors that correspond to one bitmap bit. |
| 175 | |
| 176 | bitmap_flush_interval:number |
| 177 | The bitmap flush interval in milliseconds. The metadata buffers |
| 178 | are synchronized when this interval expires. |
| 179 | |
| 180 | legacy_recalculate |
| 181 | Allow recalculating of volumes with HMAC keys. This is disabled by |
| 182 | default for security reasons - an attacker could modify the volume, |
| 183 | set recalc_sector to zero, and the kernel would not detect the |
| 184 | modification. |
| 185 | |
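Two of the rules above are simple arithmetic and can be sketched in a few lines: interleave_sectors and buffer_sectors are rounded down to a power of two, and a larger block_size lowers the relative cost of the per-block tag. The function names are hypothetical, and the overhead figure counts only the tag bytes themselves, ignoring the journal and any padding.

```python
SUPPORTED_BLOCK_SIZES = (512, 1024, 2048, 4096)


def round_down_pow2(value):
    """Largest power of two that is less than or equal to value."""
    if value < 1:
        raise ValueError("value must be positive")
    return 1 << (value.bit_length() - 1)


def tag_overhead(tag_size, block_size=512):
    """Tag bytes stored per data byte for the given block size."""
    if block_size not in SUPPORTED_BLOCK_SIZES:
        raise ValueError("unsupported block size")
    return tag_size / block_size


print(round_down_pow2(100))    # 64
print(tag_overhead(4, 512))    # 0.0078125
print(tag_overhead(4, 4096))   # 0.0009765625
```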
| 186 | |
| 187 | The journal mode (D/J), buffer_sectors, journal_watermark and commit_time can |
| 188 | be changed when reloading the target (load an inactive table and swap the |
| 189 | tables with suspend and resume). The other arguments should not be changed |
| 190 | when reloading the target because the layout of disk data depends on them |
| 191 | and the reloaded target would be non-functional. |
| 192 | |
| 193 | |
| 194 | The layout of the formatted block device: |
| 195 | |
| 196 | * reserved sectors |
| 197 | (they are not used by this target, they can be used for |
| 198 | storing LUKS metadata or for other purposes); the size of the reserved |
| 199 | area is specified in the target arguments |
| 200 | |
| 201 | * superblock (4kiB) |
| 202 | * magic string - identifies that the device was formatted |
| 203 | * version |
| 204 | * log2(interleave sectors) |
| 205 | * integrity tag size |
| 206 | * the number of journal sections |
| 207 | * provided data sectors - the number of sectors that this target |
| 208 | provides (i.e. the size of the device minus the size of all |
| 209 | metadata and padding). The user of this target should not send |
| 210 | bios that access data beyond the "provided data sectors" limit. |
| 211 | * flags |
| 212 | SB_FLAG_HAVE_JOURNAL_MAC |
| 213 | - the flag is set if journal_mac is used |
| 214 | SB_FLAG_RECALCULATING |
| 215 | - recalculating is in progress |
| 216 | SB_FLAG_DIRTY_BITMAP |
| 217 | - journal area contains the bitmap of dirty |
| 218 | blocks |
| 219 | * log2(sectors per block) |
| 220 | * a position where recalculating finished |
| 221 | * journal |
| 222 | The journal is divided into sections, each section contains: |
| 223 | |
| 224 | * metadata area (4kiB), it contains journal entries |
| 225 | |
| 226 | - every journal entry contains: |
| 227 | |
| 228 | * logical sector (specifies where the data and tag should |
| 229 | be written) |
| 230 | * last 8 bytes of data |
| 231 | * integrity tag (the size is specified in the superblock) |
| 232 | |
| 233 | - every metadata sector ends with |
| 234 | |
| 235 | * mac (8 bytes), all the macs in 8 metadata sectors form a |
| 236 | 64-byte value. It is used to store hmac of sector |
| 237 | numbers in the journal section, to protect against a |
| 238 | possibility that the attacker tampers with sector |
| 239 | numbers in the journal. |
| 240 | * commit id |
| 241 | |
| 242 | * data area (the size is variable; it depends on how many journal |
| 243 | entries fit into the metadata area) |
| 244 | |
| 245 | - every sector in the data area contains: |
| 246 | |
| 247 | * data (504 bytes of data, the last 8 bytes are stored in |
| 248 | the journal entry) |
| 249 | * commit id |
| 250 | |
| 251 | To test if the whole journal section was written correctly, every |
| 252 | 512-byte sector of the journal ends with an 8-byte commit id. If the |
| 253 | commit id matches on all sectors in a journal section, then it is |
| 254 | assumed that the section was written correctly. If the commit id |
| 255 | doesn't match, the section was written partially and it should not |
| 256 | be replayed. |
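The commit id check can be sketched as below. This is an illustration, not the driver's code: the commit id is treated as an opaque 8-byte value (how it is derived is not covered here), and the function names are hypothetical. The sketch also shows how a data sector gives up its last 8 bytes to the journal entry so that the commit id fits into the 512-byte journal sector.

```python
SECTOR_SIZE = 512
COMMIT_ID_SIZE = 8
PAYLOAD_SIZE = SECTOR_SIZE - COMMIT_ID_SIZE  # 504 bytes


def pack_data_sector(data, commit_id):
    """Return (journal_sector, last8): 504 data bytes plus the commit id,
    and the displaced last 8 data bytes that go into the journal entry."""
    if len(data) != SECTOR_SIZE or len(commit_id) != COMMIT_ID_SIZE:
        raise ValueError("bad data or commit id length")
    return data[:PAYLOAD_SIZE] + commit_id, data[PAYLOAD_SIZE:]


def unpack_data_sector(journal_sector, last8):
    """Rebuild the original 512-byte data sector."""
    return journal_sector[:PAYLOAD_SIZE] + last8


def section_is_complete(section, commit_id):
    """True if every 512-byte sector of the section ends with commit_id."""
    if not section or len(section) % SECTOR_SIZE:
        return False
    return all(
        section[offset + PAYLOAD_SIZE:offset + SECTOR_SIZE] == commit_id
        for offset in range(0, len(section), SECTOR_SIZE)
    )
```

A section for which section_is_complete() returns False corresponds to a partially written section, which should not be replayed.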
| 257 | |
| 258 | * one or more runs of interleaved tags and data. |
| 259 | Each run contains: |
| 260 | |
| 261 | * tag area - it contains integrity tags. There is one tag for each |
| 262 | sector in the data area |
| 263 | * data area - it contains data sectors. The number of data sectors |
| 264 | in one run must be a power of two. log2 of this value is stored |
| 265 | in the superblock. |
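
The size of one run follows from the two log2 values and the tag size stored in the superblock. The sketch below computes a lower bound only. It assumes one tag per data block when the block size is larger than 512 bytes, and it ignores any padding or rounding the driver may apply to the tag area; the function name is hypothetical.

```python
def run_geometry(log2_interleave_sectors, log2_sectors_per_block, tag_size):
    """Return (data_sectors, tags, min_tag_bytes) for one interleaved run."""
    if log2_sectors_per_block > log2_interleave_sectors:
        raise ValueError("a block cannot be larger than a run")
    data_sectors = 1 << log2_interleave_sectors   # always a power of two
    tags = data_sectors >> log2_sectors_per_block  # one tag per block
    return data_sectors, tags, tags * tag_size


print(run_geometry(15, 0, 4))   # (32768, 32768, 131072)
print(run_geometry(15, 3, 4))   # (32768, 4096, 16384)
```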