		    OCFS2 online file check
		    -----------------------

This document describes the OCFS2 online file check feature.

Introduction
============
OCFS2 is often used in high-availability systems. However, OCFS2 usually
converts the filesystem to read-only when it encounters an error. This may not
be necessary, since turning the filesystem read-only affects other running
processes as well, decreasing availability.
For this reason, a mount option (errors=continue) was introduced. With it, the
-EIO errno is returned to the calling process and further processing is
terminated so that the filesystem is not corrupted further. The filesystem is
not converted to read-only, and the problematic file's inode number is reported
in the kernel log. The user can then try to check/fix this file via the online
file check feature.

Scope
=====
This effort is to check/fix small issues which may hinder day-to-day operations
of a cluster filesystem by turning the filesystem read-only. The scope of
checking/fixing is at the file level, initially for regular files and eventually
for all files (including system files) of the filesystem.

If a directory-to-file link is incorrect, the directory inode is reported as
erroneous.

This feature is not suited for extensive checks which depend on other
components of the filesystem, such as, but not limited to, checking whether the
bits for file blocks in the allocation bitmap have been set. In case of such an
error, running the offline fsck is recommended.

Finally, such an operation/feature should not be automated, lest the filesystem
end up with more damage than before the repair attempt. It therefore has to be
performed with user interaction and consent.

User interface
==============
When there are errors in the OCFS2 filesystem, they are usually accompanied
by the inode number which caused the error. This inode number is the input
used to check/fix the file.

There is a sysfs directory for each mounted OCFS2 filesystem:

  /sys/fs/ocfs2/<devname>/filecheck

Here, <devname> indicates the name of the OCFS2 volume device which has already
been mounted. The files in this directory are used to communicate with kernel
space and tell it which file (inode number) is to be checked or fixed.
Currently, three operations are supported: checking an inode, fixing an inode,
and setting the size of the result record history.

1. If you want to know exactly what error happened to <inode> before fixing, do

  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check
  # cat /sys/fs/ocfs2/<devname>/filecheck/check

The output is like this:
  INO		DONE	ERROR
  39502		1	GENERATION

<INO> lists the inode numbers.
<DONE> indicates whether the operation has been finished.
<ERROR> shows what kind of error was found. For the detailed error numbers,
please refer to the file linux/fs/ocfs2/filecheck.h.
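
The table printed by the check file can also be read from a script. The
following is only a sketch: the helper name filecheck_error is hypothetical,
and it assumes the three whitespace-separated columns shown above. If an inode
appears on several lines, the last matching line is used; whether that is the
most recent record is an assumption.

```shell
# filecheck_error: print the ERROR column for one inode.
# Hypothetical helper; reads the filecheck table (header plus records) on stdin.
# $1 = inode number to look up
filecheck_error() {
    awk -v ino="$1" '
        $1 == "INO" { next }        # skip the header line
        $1 == ino   { err = $3 }    # keep the last matching record
        END {
            if (err == "") exit 1   # inode not found in the history
            print err
        }
    '
}
```

It could be used like this:

  # filecheck_error 39502 < /sys/fs/ocfs2/<devname>/filecheck/check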

2. If you decide to fix this inode, do

  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix
  # cat /sys/fs/ocfs2/<devname>/filecheck/fix

The output is like this:
  INO		DONE	ERROR
  39502		1	SUCCESS

This time, the <ERROR> column indicates whether the fix was successful or not.

3. The record cache is used to store the history of check/fix results. Its
default size is 10, and it can be adjusted within the range of 10 to 100. You
can adjust the size like this:

  # echo "<size>" > /sys/fs/ocfs2/<devname>/filecheck/set
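
A script may want to validate the size before writing it. The sketch below
uses a hypothetical helper, filecheck_set_size, and assumes that the limits 10
and 100 are both inclusive; how the kernel reacts to an out-of-range value is
not covered here.

```shell
# filecheck_set_size: validate a history size, then write it to the set file.
# Hypothetical helper; assumes the range 10..100 is inclusive.
# $1 = path to the set file, $2 = requested size
filecheck_set_size() {
    case "$2" in
        # reject empty input, non-digits, and anything longer than 3 digits
        ''|*[!0-9]*|????*)
            echo "size must be a decimal number of at most three digits" >&2
            return 2 ;;
    esac
    if [ "$2" -lt 10 ] || [ "$2" -gt 100 ]; then
        echo "size must be between 10 and 100" >&2
        return 1
    fi
    echo "$2" > "$1"
}
```

It could be used like this:

  # filecheck_set_size /sys/fs/ocfs2/<devname>/filecheck/set 50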

Fixing stuff
============
On receiving the inode, the filesystem reads the inode and the file metadata.
In case of errors, the filesystem fixes the errors and reports the problems it
fixed in the kernel log. As a precautionary measure, the inode must first be
checked for errors before performing a final fix.
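
The check-before-fix order, together with the user consent required in the
Scope section, can be sketched as a small wrapper. The helper name
filecheck_check_then_fix is hypothetical. The sketch does not wait for the
DONE column to become 1 and does not interpret the ERROR column; it only shows
the check result and asks before writing to the fix file.

```shell
# filecheck_check_then_fix: check an inode, show the result, and fix it only
# after the user explicitly agrees. Hypothetical helper.
# $1 = filecheck directory, $2 = inode number; the answer is read from stdin.
filecheck_check_then_fix() {
    dir=$1
    ino=$2

    # Step 1: always check first and show what was found.
    echo "$ino" > "$dir/check" || return 1
    cat "$dir/check"

    # Step 2: ask for consent; anything other than y/Y means "do not fix".
    printf 'Fix inode %s? [y/N] ' "$ino"
    read -r answer || answer=
    case "$answer" in
        y|Y) ;;
        *) echo "not fixing inode $ino"; return 0 ;;
    esac

    # Step 3: request the fix and show its result.
    echo "$ino" > "$dir/fix" || return 1
    cat "$dir/fix"
}
```

It could be used like this:

  # filecheck_check_then_fix /sys/fs/ocfs2/<devname>/filecheck <inode>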

The inode and the result history are maintained temporarily in a small linked
list buffer which contains the last (N) inodes fixed/checked. The detailed
errors which were fixed/checked are printed in the kernel log.