System recovery (202.2)

Candidates should be able to properly manipulate a Linux system both during the boot process and in recovery mode. This objective includes using both the init utility and init-related kernel options.

Key Knowledge Areas:

The following is a partial list of used files, terms, and utilities:

GRUB explained

The boot loader loads the operating system kernel and transfers control to it.

GRUB understands filesystems and kernel executable formats.

GRUB is able to boot many operating systems, both free and proprietary. Open operating systems, like FreeBSD, NetBSD, OpenBSD, and Linux, are supported by GRUB directly. Proprietary kernels (e.g. DOS, Windows and OS/2) are supported using GRUB's chain-loading function. Chain-loading means that GRUB starts the boot process and in turn loads and runs the proprietary system's own boot loader, which then boots the operating system.

GRUB features both a menu interface and a command-line interface. The command-line interface allows you to execute commands to select a root device (root command), load a kernel from it (kernel command), if necessary load some additional kernel modules (module or modulenounzip command) and subsequently boot the kernel (boot command).

The menu interface offers a way to execute a preconfigured sequence of command-line commands. Both interfaces are available while booting. On boot the menu is displayed, and you can simply choose one of the menu entries; choosing an entry executes its preconfigured commands. You can also switch to the command-line interface and specify the various parameters manually.

GRUB also allows on-the-fly editing of the menu entries. The commands for the menu entries are listed in the file /boot/grub/menu.lst. For compatibility reasons this file is often a link to /boot/grub/grub.conf. Because GRUB reads this file directly from the filesystem at boot time, any change to it takes effect at the next boot without reinstalling the boot loader. If a kernel fails to boot, GRUB users can simply choose another kernel (which GRUB can find by itself, since it understands most common filesystems), boot the system and correct the problem.

GRUB also provides a shell that emulates the boot loader. It can be used to install the boot loader, and it comes in handy for inspecting and modifying your current setup. To start it, simply type grub as root. The following example displays the help screen:

# grub
grub> help
blocklist FILE                         boot
cat FILE                               chainloader [--force] FILE
color NORMAL [HIGHLIGHT]               configfile FILE
device DRIVE DEVICE                    displayapm
displaymem                             find FILENAME
geometry DRIVE [CYLINDER HEAD SECTOR [ halt [--no-apm]
help [--all] [PATTERN ...]             hide PARTITION
initrd FILE [ARG ...]                  kernel [--no-mem-option] [--type=TYPE]
makeactive                             map TO_DRIVE FROM_DRIVE
md5crypt                               module FILE [ARG ...]
modulenounzip FILE [ARG ...]           pager [FLAG]
partnew PART TYPE START LEN            parttype PART TYPE
quit                                   reboot
root [DEVICE [HDBIAS]]                 rootnoverify [DEVICE [HDBIAS]]
serial [--unit=UNIT] [--port=PORT] [-- setkey [TO_KEY FROM_KEY]
setup [--prefix=DIR] [--stage2=STAGE2_ terminal [--dumb] [--timeout=SECS] [--
testvbe MODE                           unhide PARTITION
uppermem KBYTES                        vbeprobe [MODE]

grub>

We already discussed the root, kernel, module and modulenounzip commands briefly. GRUB has many more commands to assist engineers, for example blocklist, which shows on which disk blocks a file is stored, and geometry, which shows the disk geometry. You can create new (primary) partitions using the partnew command, load an initrd image using the initrd command, and much more. All commands are described in the GRUB documentation. GRUB is GNU software and as such is documented primarily through the info system; on most systems a limited man page is available as well.

GRUB uses its own syntax to describe disks. Device names must be enclosed in parentheses, e.g.

(fd0)

denotes the first floppy disk, and

(hd0,1)

denotes the second partition on the first hard disk. Both disks and partitions are counted starting at zero, so the last example references the first disk and its second partition. Linux, by contrast, numbers partitions starting at one, so if (hd0) is /dev/hda, this partition is /dev/hda2.

GRUB uses the computer's BIOS to find out which hard drives are available. However, it cannot always work out the relation between Linux device files and BIOS drives. The special file /boot/grub/device.map can be created to map them, e.g.:

(fd0)  /dev/fd0
(hd0)  /dev/hda
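To make the naming rules concrete, here is a minimal shell sketch that translates a GRUB Legacy device name into a Linux device file using a device.map file like the one above. The function name grub_to_linux is hypothetical, and it assumes classic IDE/SCSI naming where a partition is the disk name followed by its number (e.g. /dev/hda2); names such as /dev/nvme0n1p2 are not handled.

```shell
# grub_to_linux SPEC MAPFILE
# Translates a GRUB Legacy name such as (hd0,1) into a Linux device name,
# looking the drive up in a device.map-style file.
# Assumption: partitions are named <disk><number>, e.g. /dev/hda2.
grub_to_linux() {
    spec=$1
    map=$2
    # Strip the parentheses: "(hd0,1)" -> "hd0,1".
    inner=${spec#\(}
    inner=${inner%\)}
    drive=${inner%%,*}
    case $inner in
        *,*) part=${inner#*,} ;;
        *)   part= ;;
    esac
    # First column of the map is "(hd0)", second is the Linux device.
    dev=$(awk -v d="($drive)" '$1 == d { print $2; exit }' "$map")
    [ -n "$dev" ] || return 1
    if [ -n "$part" ]; then
        # GRUB Legacy counts partitions from 0, Linux counts from 1.
        echo "$dev$((part + 1))"
    else
        echo "$dev"
    fi
}

# Example: grub_to_linux '(hd0,1)' /boot/grub/device.map
```

With the device.map shown above, (hd0,1) maps to /dev/hda2 and (fd0) to /dev/fd0.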

Note that when you are using software RAID-1 (mirroring), you need to set up GRUB on both disks. At boot time the software RAID system is not yet available, so GRUB can only boot from one disk. If you set up GRUB only on the first disk and that disk fails, the system will not be able to boot.
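A minimal sketch of setting up GRUB on every member of a RAID-1 mirror, assuming the member disks are /dev/sda and /dev/sdb (hypothetical names) and that grub-install DISK installs the boot loader on that disk. The function name install_on_all and the DRY_RUN variable are hypothetical; with DRY_RUN=1 the commands are only printed. With GRUB Legacy the second disk may additionally need to be set up so that it boots correctly as the first BIOS disk once the other disk is gone, which this sketch does not handle.

```shell
# install_on_all DISK...
# Installs GRUB on each disk given, so that any member of the mirror can
# boot the system on its own. With DRY_RUN=1 the commands are only printed.
install_on_all() {
    for disk in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "grub-install $disk"
        else
            # Stop at the first disk on which the installation fails.
            grub-install "$disk" || return 1
        fi
    done
}

# Example (as root): install_on_all /dev/sda /dev/sdb
```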

The initial boot process

Upon boot, the BIOS reads the first sector of the hard disk, the so-called MBR (Master Boot Record), loads the data found there into memory and transfers execution to it. If GRUB is used, the MBR contains GRUB's first stage (stage 1), which tries to load stage 2.

To be able to load stage 2, GRUB needs code to handle the filesystem(s). There are many filesystem types, and the code to handle them will not fit within the 512-byte MBR, even less so since the MBR also contains the partition table. The GRUB parts that deal with filesystems are therefore stored in the so-called DOS compatibility region, which consists of the sectors on the same cylinder as the MBR (cylinder 0). In the old days, when disks were addressed using the CHS (Cylinder/Head/Sector) scheme, the MBR typically loaded DOS, and DOS required its image to be on that same cylinder. Therefore, by tradition, the first cylinder of a disk is reserved, and it is this space that GRUB uses to store its filesystem code. That code is referred to as stage 1.5.
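The space pressure on the MBR is easy to quantify: of its 512 bytes, 446 hold boot code, 64 hold the partition table (four 16-byte entries) and the last 2 hold the boot signature 0x55 0xAA. The sketch below checks that signature in a saved copy of the first sector; the function name has_mbr_signature and the file name mbr.bin are hypothetical.

```shell
# has_mbr_signature FILE
# Succeeds if bytes 510 and 511 of FILE are the boot signature 55 aa.
# 446 (boot code) + 4 * 16 (partition table) + 2 (signature) = 512 bytes.
has_mbr_signature() {
    # od prints the two bytes as hex, e.g. " 55 aa"; drop spaces and newlines.
    sig=$(od -An -tx1 -j510 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Example (as root):
#   dd if=/dev/sda of=mbr.bin bs=512 count=1
#   has_mbr_signature mbr.bin && echo "MBR signature present"
```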

In Linux, the grub-install command is used to install stage 1 either to the MBR or to a partition's boot sector. GRUB's configuration file, the stage 2 file and other files must reside on a usable partition. If the configuration file becomes unavailable, GRUB presents the user with a command-line interface instead of a menu.

Stage 2 contains most of the boot logic. It presents a (graphical) menu to the user and an additional command prompt, where the user can specify boot parameters manually. GRUB is typically configured to load a particular kernel automatically after a timeout period. Once the user has made a selection, GRUB loads the selected kernel into memory and passes control to it. At this stage GRUB can instead pass control of the boot process to another loader using chain loading, if the operating system requires it.

Influencing the regular boot process

The regular boot process is the process that normally takes place when (re)booting the system. It can be influenced by interacting with GRUB before it boots the default entry. What can be influenced is discussed in the following sections; in each case you first need to reach the GRUB menu, where you can select an entry, press e to edit it, or press c to get a command prompt.

Choosing another kernel

If you've just compiled a new kernel and it gives you trouble, chances are that you'd like to revert to the old kernel.

Once the GRUB menu appears, use the cursor keys to select the kernel you'd like to boot, and press Enter to boot it.
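To see which entries the menu will offer, you can list the title lines of the menu file. A minimal sketch, assuming the GRUB Legacy menu.lst format in which each entry starts with a title line; list_grub_entries is a hypothetical helper. It also prints each entry's zero-based index, which is the number the default directive in menu.lst refers to, in case you want to make the old kernel the default permanently.

```shell
# list_grub_entries MENU_FILE
# Prints every "title" line of a GRUB Legacy menu file, prefixed with its
# zero-based index (the value the "default" directive uses).
list_grub_entries() {
    awk '$1 == "title" {
        sub(/^[ \t]*title[ \t]+/, "")   # strip the keyword, keep the title text
        i = n++
        print i ": " $0
    }' "$1"
}

# Example: list_grub_entries /boot/grub/menu.lst
```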

Booting into single user mode or a specific runlevel

This can be useful if, for instance, you've installed a graphical environment which isn't functioning properly: either you don't see anything at all, or the system never reaches a stable state because it keeps trying to start X over and over again.

Booting into single user mode or into another runlevel where the graphical environment isn't running will give you access to the system so you can correct the problem.

To boot into single-user mode in GRUB, select the kernel entry you'd like to boot and press e. Then select the line starting with kernel and press e again to edit it. Go to the end of the line and add single. Press Enter to leave editing mode, then press b to boot. The change applies to this boot only; menu.lst is not modified.
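The interactive edit simply appends one word to the kernel line. As an illustration only, the hypothetical helper below applies the same change to menu entry text, which lets you preview the resulting line or apply it to a copy of menu.lst; the kernel path and root= value in the example are made up.

```shell
# add_kernel_param PARAM
# Reads GRUB Legacy menu lines on stdin and appends PARAM to every line whose
# first word is "kernel"; all other lines are passed through unchanged.
add_kernel_param() {
    awk -v p="$1" '$1 == "kernel" { print $0 " " p; next } { print }'
}

printf 'title Linux\nroot (hd0,0)\nkernel /vmlinuz root=/dev/hda1 ro\n' |
    add_kernel_param single
# The last line printed is: kernel /vmlinuz root=/dev/hda1 ro single
```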

Switching runlevels

In Linux it is possible to switch to a runlevel other than the currently active one. This is done with the telinit command. Its syntax is simple: telinit [OPTION] RUNLEVEL, where RUNLEVEL is the number of the runlevel.

On Upstart-based systems, the only option telinit supports is -e KEY=VALUE. It specifies an additional environment variable to be included in the runlevel event along with RUNLEVEL and PREVLEVEL. Usually you won't use this option.

You'll find you use telinit mostly to switch to single-user mode (runlevel 1), for example to be able to umount a filesystem and fsck it. In that case you can use:

# telinit 1
   

Note that telinit on most systems is a symbolic link to the init command.
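Before switching, you may want to check the current runlevel. The runlevel command prints the previous and the current runlevel, e.g. N 5, where N means there was no previous runlevel. A minimal sketch, using a hypothetical helper already_in_runlevel that parses that output; it assumes single-user mode may be reported either as 1 or as S.

```shell
# already_in_runlevel TARGET RUNLEVEL_OUTPUT
# Succeeds if the current runlevel in RUNLEVEL_OUTPUT ("PREV CUR") is TARGET.
already_in_runlevel() {
    target=$1
    set -- $2              # split "PREV CUR" into $1 and $2
    current=$2
    if [ "$current" = "$target" ]; then
        return 0
    fi
    # Single-user mode may be reported as S instead of 1.
    if [ "$target" = 1 ] && [ "$current" = S ]; then
        return 0
    fi
    return 1
}

# Example (as root): switch to single-user mode only when needed.
#   already_in_runlevel 1 "$(runlevel)" || telinit 1
```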

Passing parameters to the kernel

If a device doesn't work

One possible cause is that the device driver in the kernel has to be told to use a different IRQ and/or I/O port. Note that this only applies if support for the device has been compiled into the kernel, not if you're using a loadable module.

As an example, suppose we have a system with two identical Ethernet cards, support for which is compiled into the kernel. By default only one card will be detected, so we need to tell the driver in the kernel to probe for both cards. Suppose the first card is to become eth0 with I/O address 0x300 and IRQ 5, and the second card is to become eth1 with I/O address 0x340 and IRQ 11. In GRUB, you add these parameters the same way as when booting into single-user mode, using the parameters in place of the single keyword.
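On older kernels, drivers for such cards took their settings from the legacy ether= boot parameter, in the form ether=IRQ,BASE_ADDR,NAME. Treat that format as an assumption and check the boot parameter documentation of your own kernel. The sketch below, with a hypothetical ether_param helper, only assembles the string for the two cards in the example:

```shell
# ether_param IRQ IO_ADDR NAME
# Builds one legacy "ether=" boot parameter, assuming the IRQ,BASE_ADDR,NAME form.
ether_param() {
    printf 'ether=%s,%s,%s' "$1" "$2" "$3"
}

# eth0: I/O address 0x300, IRQ 5; eth1: I/O address 0x340, IRQ 11.
params="$(ether_param 5 0x300 eth0) $(ether_param 11 0x340 eth1)"
echo "$params"
# Prints: ether=5,0x300,eth0 ether=11,0x340,eth1
```

These words are then appended to the kernel line in place of single.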

The Rescue Boot process

When fsck is started but fails

During boot, the filesystem check is run by a boot script; on my Debian system this is /etc/rcS.d/S30check.fs. All filesystems are checked based on the contents of /etc/fstab.

If fsck returns an exit status larger than 1, the command has failed. The exit status is the sum of one or more of the following conditions:

0    - No errors
1    - File system errors corrected
2    - System should be rebooted
4    - File system errors left uncorrected
8    - Operational error
16   - Usage or syntax error
128  - Shared library error
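Because the exit status is a sum of independent conditions, each condition corresponds to one bit. A minimal sketch that decodes a status into the conditions from the table above; decode_fsck_status is a hypothetical helper name:

```shell
# decode_fsck_status STATUS
# Prints one line per condition contained in an fsck exit status.
decode_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "No errors"
        return 0
    fi
    # Every condition is a separate bit, so test each bit on its own.
    for bit in 1 2 4 8 16 128; do
        if [ $((status & bit)) -ne 0 ]; then
            case $bit in
                1)   echo "File system errors corrected" ;;
                2)   echo "System should be rebooted" ;;
                4)   echo "File system errors left uncorrected" ;;
                8)   echo "Operational error" ;;
                16)  echo "Usage or syntax error" ;;
                128) echo "Shared library error" ;;
            esac
        fi
    done
}

# Example: fsck -y /dev/sda2; decode_fsck_status $?
```

A status of 3, for instance, means errors were corrected and the system should be rebooted.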
     

If the command has failed you'll get a message:

fsck failed. Please repair manually

"CONTROL-D" will exit from this shell and
continue system startup.
     

If you enter the root password instead of pressing Ctrl-D, you get a root shell (the password prompt comes from /sbin/sulogin). From there you should be able to run fsck and fix the problem, provided the root filesystem is mounted read-only.

Alternatively, as is described in the next section, you can boot from a home-made or distribution-provided boot media.

If your root (/) filesystem is corrupt

Using the distribution's bootmedia

Many distributions come with one or more CDs, or with boot images that can be written to a USB stick. One of these usually offers a 'rescue' option that boots a minimal system without using the hard drive, so you can fix things.

Remember to set the boot order in the BIOS to boot from CD-ROM or USB stick first and then from the hard disk. When booting from a USB stick it may also be necessary to enable 'USB Legacy Support' in the BIOS.

What rescue mode offers is distribution-specific, but it should at least allow you to open a shell with root privileges. From there you can run fsck on the corrupt filesystem while it is unmounted.

Let's assume your root partition was /dev/sda2. You can then check the root filesystem by typing fsck -y /dev/sda2. The -y flag makes fsck answer yes to every question instead of prompting you, which could otherwise mean pressing Enter a great many times.
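Running a repairing fsck on a filesystem that is mounted read-write can cause further damage, so it is worth checking first. A minimal sketch, assuming a mount table in the /proc/mounts format (device, mount point, type, options, ...); mount_state is a hypothetical helper, and some systems list the root filesystem under a name such as /dev/root rather than its real device.

```shell
# mount_state DEVICE [MOUNTS_FILE]
# Prints "unmounted", "ro" or "rw" for DEVICE, reading a file in the
# /proc/mounts format (the fourth field holds the mount options).
mount_state() {
    dev=$1
    mounts=${2:-/proc/mounts}
    awk -v d="$dev" '
        $1 == d {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") { print "ro"; found = 1; exit }
            print "rw"; found = 1; exit
        }
        END { if (!found) print "unmounted" }
    ' "$mounts"
}

# Example: only repair when the filesystem is not mounted read-write.
#   case $(mount_state /dev/sda2) in
#       rw) echo "umount /dev/sda2 or remount it read-only first" ;;
#       *)  fsck -y /dev/sda2 ;;
#   esac
```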

Although the rescue system's root (/) filesystem is held entirely in RAM, you can still mount a filesystem from the hard disk, either on an existing mount point in RAM such as /target, or on a directory you create first.

After you've corrected the errors, don't forget to umount the filesystems you've mounted before you reboot the system. Otherwise you'll get a message during boot that one or more filesystems were not cleanly unmounted, and fsck will check them again.
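To find out what still has to be unmounted, you can list the mount points below the directory you used. A minimal sketch, assuming the /proc/mounts format and a hypothetical helper mounts_under; it prints the mount points in reverse order of mounting, so nested mounts such as /target/boot come before /target. Mount points containing spaces (escaped as \040 in /proc/mounts) are not handled.

```shell
# mounts_under PREFIX [MOUNTS_FILE]
# Prints the mount points equal to or below PREFIX, most recently mounted
# first, which is a safe order in which to unmount them.
mounts_under() {
    prefix=$1
    mounts=${2:-/proc/mounts}
    awk -v p="$prefix" '
        $2 == p || index($2, p "/") == 1 { mp[++n] = $2 }
        END { for (i = n; i >= 1; i--) print mp[i] }
    ' "$mounts"
}

# Example (as root) before rebooting:
#   for mp in $(mounts_under /target); do umount "$mp"; done
```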

Copyright Snow B.V. The Netherlands