Candidates should be able to properly maintain a Linux filesystem using system utilities. This objective includes manipulating standard filesystems.
Key Knowledge Areas:
Tools and utilities to manipulate ext2, ext3 and ext4.
Tools and utilities to manipulate xfs.
BTRFS Filesystem awareness.
The following is a partial list of used files, terms, and utilities:
debugfs and debugreiserfs
tune2fs and reiserfstune
xfs_info, xfs_check and xfs_repair
smartmontools: smartd and smartctl
Good disk maintenance requires periodic disk checks. Your best tool is fsck, which should be run at least monthly. Default checks will normally be run after 20 system reboots, but if your system stays up for weeks or months at a time, you'll want to force a check from time to time. Your best bet is performing routine system backups and checking your directories from time to time.
The frequency of the checks at system reboot can be changed with tune2fs. This utility can also be used to change the mount count, which will prevent the system from having to check all filesystems at the 20th reboot (which can take a long time).
The dumpe2fs utility will provide important information regarding hard disk operating parameters found in the superblock, and badblocks will perform surface checking. Finally, surgical procedures to remove areas grown bad on the disk can be accomplished using debugfs.
fsck is called automatically at system startup. If the filesystem is marked “not clean”, the maximum mount count is reached, or the time between checks is exceeded, the filesystem is checked. To change the maximum mount count or the time between checks, use tune2fs.
Frequently used options to fsck include:
-A
Walk through the /etc/fstab file and try to check all filesystems in one run. This option is typically used from the /etc/rc system initialization file, instead of multiple commands for checking a single filesystem.
The root filesystem will be checked first. After that, filesystems will be checked in the order specified by the fs_passno (the sixth) field in the /etc/fstab file. Filesystems with a fs_passno value of 0 are skipped and are not checked at all. If there are multiple filesystems with the same pass number, fsck will attempt to check them in parallel, although it will avoid running multiple filesystem checks on the same physical disk.
Options which are not understood by fsck are passed to the filesystem-specific checker. These options must not take arguments, as there is no way for fsck to be able to properly guess which options take arguments and which don't. Options and arguments which follow the -- are treated as filesystem-specific options to be passed to the filesystem-specific checker.
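As a rough illustration of the pass-number ordering described above, the following sketch lists the entries of an fstab-formatted file that fsck -A would consider, sorted by the sixth field. The function name fsck_order and the sample entries are made up for this example, and the sketch does not model the parallel checking of filesystems that share a pass number.

```shell
# fsck_order: read fstab-formatted text on stdin and print
# "passno device mountpoint" for every entry with a non-zero pass number,
# lowest pass first. (Hypothetical helper, not part of fsck itself.)
fsck_order() {
    awk '
        /^[ \t]*(#|$)/     { next }                # skip comments and blank lines
        NF >= 6 && $6 > 0  { print $6, $1, $2 }    # field 6 is the pass number
    ' | sort -s -n -k1,1
}

# Sample input; the devices are illustrative only.
fsck_order <<'EOF'
# device    mountpoint  type  options   dump  pass
/dev/sda2   /home       ext4  defaults  0     2
/dev/sda1   /           ext4  defaults  0     1
/dev/sdb1   /data       ext4  defaults  0     2
/dev/sda3   none        swap  sw        0     0
EOF
```

Entries with a pass number of 0, such as the swap line, are dropped, which mirrors how fsck skips them.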
The filesystem checker for the ext2 filesystem is called fsck.ext2 or e2fsck. Frequently used options include:
-a
This option does the same thing as the -p option. It is provided for backwards compatibility only; it is suggested that people use the -p option whenever possible.
-C fd
This option causes e2fsck to write completion information to the specified file descriptor so that the progress of the filesystem check can be monitored. This option is typically used by programs which are running e2fsck. If the file descriptor specified is 0, e2fsck will print a completion bar as it goes about its business. This requires that e2fsck is running on a video console or terminal.
-n
Open the filesystem read-only, and assume an answer of “no” to all questions. Allows e2fsck to be used non-interactively. (Note: if the -c, -l, or -L options are specified in addition to the -n option, then the filesystem will be opened read-write, to permit the bad-blocks list to be updated. However, no other changes will be made to the filesystem.)
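When e2fsck is run non-interactively, its exit status is the main way to learn what happened. The status is a sum of bit flags, as documented in the fsck and e2fsck man pages. The sketch below decodes such a status; the helper name describe_fsck_status is an assumption made for this example.

```shell
# describe_fsck_status: print one line per condition encoded in an
# fsck/e2fsck exit status. (Hypothetical helper; the bit values are the
# ones listed in the fsck(8) man page.)
describe_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for entry in \
        "1:filesystem errors corrected" \
        "2:system should be rebooted" \
        "4:filesystem errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit=${entry%%:*}       # number before the colon
        text=${entry#*:}       # description after the colon
        if [ $((code & bit)) -ne 0 ]; then
            echo "$text"
        fi
    done
}

# Example: a status of 5 combines bits 1 and 4.
describe_fsck_status 5

# Typical use as root on an unmounted filesystem (device name is an example):
#   e2fsck -n /dev/sda1; describe_fsck_status $?
```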
The mkfs command is used to create a Linux filesystem. It can create different types of filesystems by specifying the -t fstype option or by giving the mkfs.fstype command. mkfs is a front-end for the several mkfs.fstype commands. Please read the mkfs man pages for more information.
It is also possible to set the mount count to a specific value. This can be used to 'stagger' the mount counts of the different filesystems, which ensures that at reboot not all filesystems will be checked at the same time.
So for a system that contains 5 partitions and is booted approximately once a month you could do the following to stagger the mount counts:
tune2fs -c 5 -C 0 partition1
tune2fs -c 5 -C 1 partition2
tune2fs -c 5 -C 2 partition3
tune2fs -c 5 -C 3 partition4
tune2fs -c 5 -C 4 partition5
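The same staggering can be generated for any number of filesystems. The sketch below only prints the tune2fs commands rather than running them, so the result can be reviewed first. The function name stagger_mount_counts and the device names are assumptions for this example.

```shell
# stagger_mount_counts: print (do not run) one tune2fs command per device.
# Every device gets the same maximum mount count (-c) but a different
# current mount count (-C), so the checks fall on different reboots.
stagger_mount_counts() {
    max=$#          # one "slot" per device
    count=0
    for dev in "$@"; do
        echo "tune2fs -c $max -C $count $dev"
        count=$((count + 1))
    done
}

# Device names are examples only.
stagger_mount_counts /dev/sda1 /dev/sda2 /dev/sda3
```

After checking the printed commands, they could be run as root, for instance by piping the output to sh.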
Frequently used options include:
-c max-mount-counts
Adjust the maximum mount count between two filesystem checks. If max-mount-counts is 0 then the number of times the filesystem is mounted will be disregarded by e2fsck(8) and the kernel. Staggering the mount-counts at which filesystems are forcibly checked will avoid all filesystems being checked at one time when using journaled filesystems.
You should strongly consider the consequences of disabling mount-count-dependent checking entirely. Bad disk drives, cables, memory and kernel bugs could all corrupt a filesystem without marking the filesystem dirty or in error. If you are using journaling on your filesystem, your filesystem will never be marked dirty, so it will not normally be checked. A filesystem error detected by the kernel will still force an fsck on the next reboot, but it may already be too late to prevent data loss at that point.
It is strongly recommended that either -c (mount-count-dependent) or -i (time-dependent) checking be enabled to force periodic full e2fsck(8) checking of the filesystem. Failure to do so may allow filesystem corruption caused by bad disks, cables, memory or kernel bugs to go unnoticed until it results in data loss.
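To see how close a filesystem is to its next mount-count check, the superblock values can be read with tune2fs -l. That option is not covered in the text above, and the field labels used here are the ones e2fsprogs prints. The sketch parses that output from standard input; the helper name mounts_until_check is an assumption.

```shell
# mounts_until_check: read "tune2fs -l" output on stdin and print how many
# mounts remain before the mount-count-dependent check is due.
# (Hypothetical helper; relies on the "Mount count" and
# "Maximum mount count" labels printed by tune2fs.)
mounts_until_check() {
    awk -F': *' '
        $1 == "Mount count"         { cur = $2 }
        $1 == "Maximum mount count" { max = $2 }
        END {
            if (max <= 0) print "mount-count checking disabled"
            else          print max - cur
        }'
}

# Typical use as root (device name is an example):
#   tune2fs -l /dev/sda1 | mounts_until_check
printf 'Mount count:              18\nMaximum mount count:      20\n' | mounts_until_check
```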
Frequently used options to dumpe2fs include:
-b
print the blocks which are reserved as bad in the filesystem.
-h
only display the superblock information and not any of the block group descriptor detail information.
badblocks is a Linux utility to check for damaged sectors on a disk drive. It marks these sectors so that they are not used in the future and thus do not cause corruption of data. It is part of the e2fsprogs project.
It is strongly recommended that badblocks not be run directly, but rather be invoked through the -c option of e2fsck or mke2fs. A commonly used option is:
-o output-file
write the list of bad blocks to output-file.
With debugfs, you can modify the disk with direct disk writes. Since this utility is so powerful, you will normally want to invoke it as read-only until you are ready to actually make changes and write them to the disk. To invoke debugfs in read-only mode, do not use any options. To open in read-write mode, add the -w option. You may also want to include in the command line the device you wish to work on, as in /dev/hda1, etc. Once it is invoked, you should see a debugfs prompt.
debugfs -b 1024 -s 8193 /dev/hda1
This means that the superblock at block 8193 will be used and that the blocksize is 1024. Note that you have to specify the blocksize when you want to use a different superblock. The blocksize and the locations of the backup superblocks can be found with dumpe2fs.
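The values needed for the -b and -s options can be picked out of dumpe2fs output. The sketch below reads that output from standard input and prints the block size and every backup superblock location. The helper name superblock_info is an assumption, and the parsing relies on the "Block size:" and "Backup superblock at" lines as printed by e2fsprogs.

```shell
# superblock_info: read dumpe2fs output on stdin and print the block size
# and the block numbers of the backup superblocks. (Hypothetical helper.)
superblock_info() {
    awk '
        /^Block size:/         { print "blocksize", $3 }
        /Backup superblock at/ { sub(/,$/, "", $4); print "backup", $4 }
    '
}

# Typical use as root (device name is an example):
#   dumpe2fs /dev/hda1 2>/dev/null | superblock_info
# Demonstration with a short excerpt of dumpe2fs-style output:
superblock_info <<'EOF'
Block size:               1024
Group 0: (Blocks 1-8192)
  Primary superblock at 1, Group descriptors at 2-2
Group 1: (Blocks 8193-16384)
  Backup superblock at 8193, Group descriptors at 8194-8194
EOF
```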
The first command to try after invocation of debugfs is params, which shows the mode (read-only or read-write) and the current file system. If you run this command without opening a filesystem, it will almost certainly dump core and exit. Two other commands, open and close, may be of interest when checking more than one filesystem. Close takes no argument, and appropriately enough, it closes the filesystem that is currently open. Open takes the device name as an argument.
To see disk statistics from the superblock, the command stats will display the information by group. The command testb checks whether a block is in use. This can be used to test if any data is lost in the blocks marked as “bad” by the badblocks command. To get the filename for a block, first use the icheck command to get the inode and then ncheck to get the filename. The best course of action with bad blocks is to mark the block “bad” and restore the file from backup.
To get a complete list of all commands, see the man page of debugfs or type ?, lr or list_requests.
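The icheck and ncheck steps can also be run non-interactively with the -R option of debugfs, which executes a single request. The sketch below chains them. The function names are assumptions, and the parsing assumes that icheck prints a header line followed by a "block inode" line.

```shell
# icheck_inode: read the output of an icheck request on stdin and print the
# inode number; prints nothing if the block does not belong to an inode.
icheck_inode() {
    awk 'NR == 2 && $2 ~ /^[0-9]+$/ { print $2 }'
}

# block_to_filename: map a block number to a path name on the given device.
# Opens the filesystem read-only (no -w), so nothing is modified.
block_to_filename() {
    dev=$1
    block=$2
    inode=$(debugfs -R "icheck $block" "$dev" 2>/dev/null | icheck_inode)
    if [ -z "$inode" ]; then
        echo "block $block is not in use by any file"
        return 1
    fi
    debugfs -R "ncheck $inode" "$dev" 2>/dev/null
}

# Typical use as root (device and block number are examples):
#   block_to_filename /dev/hda1 8200
```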
Ext4 is the evolution of the most used Linux filesystem, Ext3. In many ways, Ext4 is a deeper improvement over Ext3 than Ext3 was over Ext2. Ext3 was mostly about adding journaling to Ext2, but Ext4 modifies important data structures of the filesystem such as the ones destined to store the file data. The result is a filesystem with an improved design, better performance, reliability, and features. Therefore converting ext3 to ext4 is not as straightforward and easy as it was converting ext2 to ext3.
To create an ext4 filesystem from scratch, use mkfs.ext4 on the target partition.
Tip: See the mkfs.ext4 man page for more options; edit /etc/mke2fs.conf to view/configure default options.
Be aware that by default, mkfs.ext4 uses a rather low bytes-per-inode ratio to calculate the fixed number of inodes to be created.
Note: Especially for contemporary HDDs (750 GB+) this usually results in far more inodes than needed and thus many wasted GB. The ratio can be set directly via the -i option; a value of 6291456 resulted in 476928 inodes for a 2 TB partition.
Otherwise, ext4 can be manipulated using the same tools that are available for ext2/ext3 filesystems, such as badblocks, dumpe2fs, e2fsck and tune2fs.
Btrfs (an abbreviation of B-tree File System) is a new copy-on-write (CoW) filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, repair and easy administration. Jointly developed by Oracle, Red Hat, Fujitsu, Intel, SUSE, STRATO and many others, Btrfs is licensed under the GPL and open for contribution from anyone. Btrfs has several features characteristic of a storage device. It is designed to make the file system tolerant of errors, and to facilitate the detection and repair of errors when they occur. It uses checksums to ensure the validity of data and metadata, and maintains snapshots of the file system that can be used for backup or repair. The core data structure used by Btrfs is the B-tree, hence the name.
Btrfs is still under heavy development, but every effort is being made to keep the filesystem stable and fast. Because of the speed of development, you should run the latest kernel you can (either the latest release kernel from kernel.org, or the latest -rc kernel).
As of the beginning of the year 2013 Btrfs was included in the default kernel and its tools (btrfs-progs) are part of the default installation. GRUB 2, mkinitcpio, and Syslinux have support for Btrfs and require no additional configuration.
The main Btrfs features available at the moment include:
Extent based file storage
2^64 byte == 16 EiB maximum file size
Space-efficient packing of small files
Space-efficient indexed directories
Dynamic inode allocation
Writable snapshots, read-only snapshots
Subvolumes (separate internal filesystem roots)
Checksums on data and metadata (crc32c)
Compression (zlib and LZO)
Integrated multiple device support
File Striping, File Mirroring, File Striping+Mirroring, Striping with Single and Dual Parity implementations
SSD (Flash storage) awareness (TRIM/Discard for reporting free blocks for reuse) and optimizations (e.g. avoiding unnecessary seek optimizations, sending writes in clusters, even if they are from unrelated files. This results in larger write operations and faster write throughput)
Efficient Incremental Backup
Background scrub process for finding and fixing errors on files with redundant copies
Online filesystem defragmentation
Offline filesystem check
Conversion of existing ext3/4 file systems
Seed devices. Create a (readonly) filesystem that acts as a template to seed other Btrfs filesystems. The original filesystem and devices are included as a readonly starting point for the new filesystem. Using copy on write, all modifications are stored on different devices; the original is unchanged.
Subvolume-aware quota support
Send/receive of subvolume changes
Efficient incremental filesystem mirroring
The most notable (unique) btrfs features are:
Btrfs’s snapshotting is simple to use and understand. The snapshots will show up as normal directories under the snapshotted directory, and you can cd into them and walk around as you would in any directory.
By default, all snapshots are writeable in Btrfs, but you can create read-only snapshots if you so choose. Read-only snapshots are great if you are just going to take a snapshot for a backup and then delete it once the backup completes. Writeable snapshots are handy because you can do things such as snapshot your file system before performing a system update; if the update breaks your system, you can reboot into the snapshot and use it like your normal file system.
When you create a new Btrfs file system, the root directory is a subvolume. Snapshots can only be taken of subvolumes, because a subvolume is the representation of the root of a completely different filesystem tree, and you can only snapshot a filesystem tree.
The simplest way to think of this would be to create a subvolume for /home, so you could snapshot / and /home independently of each other. So you could run the following command to create a subvolume:
btrfs subvolume create /home
And then at some point down the road when you need to snapshot
/home for a backup, you simply run:
btrfs subvolume snapshot /home/ /home-snap
Once you are done with your backup, you can delete the snapshot with the command:
btrfs subvolume delete /home-snap/
The hard work of unlinking the snapshot tree is done in the background, so you may notice I/O happening on a seemingly idle box; this is just Btrfs cleaning up the old snapshot. If you have a lot of snapshots or don’t remember which directories you created as subvolumes, you can run the command:
# btrfs subvolume list /mnt/btrfs-test/
ID 267 top level 5 path home
ID 268 top level 5 path snap-home
ID 270 top level 5 path home/josef
This doesn’t differentiate between a snapshot and a normal subvolume, so you should probably name your snapshots consistently so that later on you can tell which is which.
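With a consistent naming scheme, snapshots can be filtered out of the subvolume listing by name. The sketch below assumes that every snapshot path starts with the prefix snap-, as in the listing above. Both the prefix convention and the helper name list_snapshots are assumptions for this example.

```shell
# list_snapshots: read "btrfs subvolume list" output on stdin and print the
# paths that follow the (assumed) "snap-" naming convention.
list_snapshots() {
    sed -n 's/^ID [0-9]* .* path \(snap-.*\)$/\1/p'
}

# Typical use as root (mount point is an example):
#   btrfs subvolume list /mnt/btrfs-test/ | list_snapshots
# Demonstration with the listing shown above:
list_snapshots <<'EOF'
ID 267 top level 5 path home
ID 268 top level 5 path snap-home
ID 270 top level 5 path home/josef
EOF
```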
A subvolume in btrfs is not the same as an LVM logical volume, or a ZFS subvolume. With LVM, a logical volume is a block device in its own right; this is not the case with btrfs. A btrfs subvolume is not a block device, and cannot be treated as one.
Instead, a btrfs subvolume can be thought of as a POSIX file namespace. This namespace can be accessed via the top-level subvolume of the filesystem, or it can be mounted in its own right. So, given a filesystem structure like this:
toplevel
  `--- dir_a                * just a normal directory
  |         `--- p
  |         `--- q
  `--- subvol_z             * a subvolume
            `--- r
            `--- s
the root of the filesystem can be mounted, and the full filesystem structure will be seen at the mount point; alternatively the subvolume can be mounted (with the mount option subvol=subvol_z), and only the files r and s will be visible at the mount point.
A btrfs filesystem has a default subvolume, which is initially set to be the top-level subvolume. It is the default subvolume which is mounted if no subvol or subvolid option is passed to mount. Changing the default subvolume with btrfs subvolume set-default will make the top level of the filesystem inaccessible, except by use of the subvolid=0 mount option.
Directories and files look the same on disk in Btrfs, which is consistent with the UNIX way of doing things. The ext file system variants have to pre-allocate their inode space when making the file system, so you are limited to the number of files you can create once you create the file system.
With Btrfs we add a couple of items to the B-tree when you create a new file, which limits you only by the amount of metadata space you have in your file system. If you have ever created thousands of files in a directory on an ext file system and then deleted the files, you may have noticed that doing an ls on the directory would take much longer than you’d expect given that there may only be a few files in the directory.
You may have even had to run this command:
e2fsck -D /dev/sda1
to re-optimize your directories in ext. This is due to a flaw in how the directory indexes are stored in ext: they cannot be shrunk. So once you add thousands of files and the internal directory index tree grows to a large size, it will not shrink back down as you remove files. This is not the case with Btrfs.
In Btrfs we store a file index next to the directory inode within the file system B-tree. The B-tree will grow and shrink as necessary, so if you create a billion files in a directory and then remove all of them, an ls will take only as long as if you had just created the directory.
Btrfs also has an index for each file that is based on the name of the file. This is handy because instead of having to search through the containing directory’s file index for a match, we simply hash the name of the file and search the B-tree for this hash value. This item is stored next to the inode item of the file, so looking up the name will usually read in the same block that contains all of the important information you need. Again, this limits the amount of I/O that needs to be done to accomplish basic tasks.
mkswap sets up a Linux swap area on a device or in a file. After creating the swap area, you need to invoke the swapon command to start using it. Usually swap areas are listed in /etc/fstab so that they can be taken into use at boot time by a swapon -a command in some boot script.
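The sequence for a swap file can be sketched as follows. The function name make_swapfile, the path /swapfile and the size are assumptions for this example; activating the swap area with swapon requires root privileges and is therefore only shown as a comment.

```shell
# make_swapfile: create a zero-filled file of the given size in MiB,
# restrict its permissions, and write a swap signature to it.
# (Hypothetical helper; does not activate the swap area.)
make_swapfile() {
    file=$1
    size_mib=$2
    # bs is given in bytes so the command does not depend on size suffixes.
    dd if=/dev/zero of="$file" bs=1048576 count="$size_mib" 2>/dev/null || return 1
    chmod 600 "$file"     # swap contents should not be readable by other users
    mkswap "$file"        # set up the swap area inside the file
}

# Typical use as root (path and size are examples only):
#   make_swapfile /swapfile 512 && swapon /swapfile
#   echo '/swapfile none swap sw 0 0' >> /etc/fstab   # used by "swapon -a" at boot
```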
xfs_info shows the filesystem geometry for an XFS filesystem. xfs_info is equivalent to invoking xfs_growfs with the -n option.
xfs_check checks whether an XFS filesystem is consistent. It is needed only when there is reason to believe that the filesystem has a consistency problem. Since XFS is a journaling filesystem, which allows it to retain filesystem consistency, there should be little need to ever run xfs_check.
xfs_repair repairs corrupt or damaged XFS filesystems. xfs_repair will attempt to find the raw device associated with the specified block device and will use the raw device instead. Regardless, the filesystem to be repaired must be unmounted; otherwise the resulting filesystem may become inconsistent or corrupt.
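Because xfs_repair must only be run on an unmounted filesystem, a script can check the mount table first. The sketch below is one way to do that on Linux, assuming the device appears in /proc/mounts under the same name that is passed in. The helper name is_mounted is an assumption, and the -n option of xfs_repair (check only, no modifications) is taken from its man page rather than from the text above.

```shell
# is_mounted: succeed if the device is listed in the mount table.
# The second argument (default /proc/mounts) exists so the function can be
# tried against a sample file. (Hypothetical helper.)
is_mounted() {
    dev=$1
    table=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Typical use as root (device name is an example):
#   if is_mounted /dev/sdb1; then
#       echo "unmount /dev/sdb1 before running xfs_repair"
#   else
#       xfs_repair -n /dev/sdb1    # report problems without changing anything
#   fi
```

A device mounted through another name (for example a device-mapper path) would not be detected by this simple comparison.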
Two utility programs, smartctl and smartd (available when the smartmontools package is installed), can be used to monitor and control storage systems using the Self-Monitoring, Analysis and Reporting Technology System (SMART). SMART is built into most modern ATA and SCSI hard disks and solid-state drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests.
smartd is a daemon that will attempt to enable SMART monitoring on ATA devices and polls these and SCSI devices every 30 minutes (configurable), logging SMART errors and changes of SMART Attributes via the SYSLOG interface. smartd can also be configured to send email warnings if problems are detected. Depending upon the type of problem, you may want to run self-tests on the disk, back up the disk, replace the disk, or use a manufacturer's utility to force reallocation of bad or unreadable disk sectors.
smartd can be configured at start-up using the configuration file /usr/local/etc/smartd.conf. When the USR1 signal is sent to smartd it will immediately check the status of the disks, and then return to polling the disks every 30 minutes. Please consult the manual page for smartd for specific configuration options.
The smartctl utility controls the SMART system. It can be used to scan devices and print info about them, to enable or disable SMART on specific disks, and to configure what to do when (imminent) errors are detected. Please consult the smartctl manual page for details.
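As a small illustration, the overall health verdict can be extracted from the output of smartctl -H. The sketch below assumes the ATA-style result line printed by smartmontools; SCSI devices report their status with a different line and are not handled. The helper name smart_health and the device name are assumptions.

```shell
# smart_health: read "smartctl -H" output on stdin and print the overall
# health verdict (for example PASSED) of an ATA device.
# (Hypothetical helper; prints nothing if the result line is absent.)
smart_health() {
    sed -n 's/^SMART overall-health self-assessment test result: //p'
}

# Typical use as root (device name is an example):
#   smartctl -H /dev/sda | smart_health
#   smartctl -t short /dev/sda       # start a short self-test
#   smartctl -l selftest /dev/sda    # show the self-test log afterwards
printf '=== START OF READ SMART DATA SECTION ===\nSMART overall-health self-assessment test result: PASSED\n' | smart_health
```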