Candidates should be able to use system tools to back up important system data.
Key Knowledge Areas:
The following is a partial list of used files, terms, and utilities:
Everyone (more or less) knows that, as a system administrator, it is vital to make backups. Most also know why. Your data is valuable. It will cost you time and effort to re-create it, and that costs money or at least personal grief and tears. Sometimes it can't even be re-created, such as the results of some experiments. Since it is an investment, you should protect it and take steps to avoid losing it.
There are four main reasons why you may lose data: human error, software bugs, hardware failure and natural disasters. Humans are quite unreliable: they might make a mistake or be malicious and destroy data on purpose. Modern software does not even pretend to be reliable. A rock-solid program is an exception, not a rule. Hardware is more reliable, but may break seemingly spontaneously, often at the worst possible time. Nature may not be evil, but can nevertheless be very destructive.
In general, you want to back up as much as possible. Major exceptions are the /proc and
/sys filesystems. Since these only
contain data that the kernel generates automatically, it is never a good idea to back them up.
The /proc/kcore file is especially unnecessary, since it is just an
image of your current physical memory; it's pretty large as well. Some special files that
are constantly changed by the operating system should
not be restored, and hence should not be backed up. There may be others on your system.
Gray areas include the news spool, log files and many other things in
/var. You must decide what you consider important. Also, consider
what to do with the device files in
/dev. Most backup solutions can
back up and restore these special files, but you may want to re-generate them with a script.
The obvious things to back up are user files (
/home) and system
configuration files (
/etc, but possibly other things scattered elsewhere on the system).
Generally, it is a good idea to back up everything needed to rebuild the system as fast as required after a failure.
How often to back up depends on the rate of change of the data; it may be anything from daily to almost never. The latter applies to firewall systems, where only the log files change daily (but logging should happen on a different system anyway). On such systems, for example, a backup is only necessary when the system is updated or after a security update.
On normal systems, a daily backup is often best.
Backups can take several hours to complete, but, for a successful backup strategy, human interaction should be minimal, preferably just a matter of placing a tape in the tape device.
While the brand or the technology of the hardware and software used for backups is not important, there are, nevertheless, important considerations in selecting them. Imagine, for example, that the restore software breaks and the publisher has since gone out of business.
No matter how you create your backups, the two most important parts of a backup strategy are verifying the backup and testing the restore procedure.
To verify a backup, the safest method is to read back the entire backup and compare it with the original files. This is very time-consuming and often not an option. A faster and relatively safe method is to create a table of contents (which should contain a checksum per file) during the backup. Afterwards, read the contents of the tape and compare the two.
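The table-of-contents approach can be sketched with standard tools. This is only a minimal sketch: it assumes GNU coreutils' sha256sum, uses a tar file as a stand-in for the tape, and all directory and file names (src, restore, manifest.sha256, backup.tar) are made up for illustration.

```shell
# Scratch area with some sample data to back up (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/src/docs" "$work/restore"
echo "alpha" > "$work/src/a.txt"
echo "beta"  > "$work/src/docs/b.txt"

# 1. During the backup: record one checksum per file (the table of contents).
(cd "$work/src" && find . -type f -exec sha256sum {} +) > "$work/manifest.sha256"

# 2. Write the backup itself; a tar file stands in for the tape here.
tar -cf "$work/backup.tar" -C "$work/src" .

# 3. Afterwards: read the medium back and compare it against the manifest.
tar -xf "$work/backup.tar" -C "$work/restore"
(cd "$work/restore" && sha256sum -c --quiet "$work/manifest.sha256") \
    && echo "backup verified"
```

Keeping the manifest outside the directory being backed up avoids a checksum file that tries to describe itself.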
Testing the restore means that you must have a restore procedure. This restore procedure has to specify how to restore anything from a single file to the whole system. Every few months, you should test this procedure by doing a restore.
If something fails during a backup, the medium will not contain anything useful. If this was your only medium, you are screwed. So you should have at least two sets of backup media. But if you store both sets in the same building, the first disaster that comes along will destroy all your precious backups along with the running system.
So you should have at least one set stored at a remote site. Depending on the nature of your data, you could store weekly or daily sets remotely.
One important thing to be aware of is that, as noted earlier, you need to be able to rebuild your system (or restore certain files) as fast as required. In some environments, restore times can run to hours because of slow network lines or other causes. The time lost may be too much and can defeat the purpose of the backup system. In such cases, other solutions for continuity of service, like cluster/failover systems, are recommended.
There are different types of backup media, each with their own benefits and drawbacks. The choice of medium, however, will often be based on total cost. The main types are:
Tape is one of the most widely used media for backup in enterprise environments. It is low cost and, because tapes store data passively, they have a low chance of failure and consume little power on standby. A disadvantage of tape is that it is a streaming medium, which means high access times, especially when a tape robot is used for accessing multiple tapes. Bandwidth can be high if data is provided/requested in a continuous stream. Tape is especially suitable for long-term backup and archiving. If many small files have to be restored often, this medium is not suitable.
Local disk storage is rarely used for backup (though it is used for network storage, see below). The advantages are high bandwidth, low latency and a reasonable price compared to capacity. But it is not suitable for off-site backup (or the disk has to be manually disconnected and transported to a safe location). And since disks are always connected and running, the chance of failure is high. Though not suitable for off-site backup, it is sometimes used as an intermediate (buffer) medium between the backup system and an off-site backup server. The advantage is that fast recovery of recent files is possible and the production systems won't be occupied by long backup transfers.
Optical media such as CD-R and DVD-R discs are mostly used to back up systems on which few files change. Often a complete image of the system is saved to disc for fast recovery. Optical discs are low cost and have a high reliability when stored correctly. They can easily be transported off-site. Disadvantages are that most are write-once and the storage capacity is low. Access time and bandwidth are moderate, although the discs mostly have to be handled manually by the operator.
Network storage is mostly remote disk storage (NAS or SAN). Data protection techniques like RAID reduce the impact of the unreliability of disks. Most modern network storage systems use compression and deduplication to increase potential capacity. Most systems can also emulate tape drives, which makes it easy to migrate from tape. The cost of the systems can be high depending on the features, reliability and capacity. Power costs should also be considered, because a network storage system is always on. This type of medium is thus not preferred for long-term backup and archives. Access time and bandwidth vary with the infrastructure, but performance is mostly good.
Rsync is a utility to copy/synchronise files from one location to another while
keeping the required bandwidth low. It looks at the files to copy and the files already
present at the destination and uses timestamps, file sizes and an advanced algorithm to
calculate which (portions of) files need to be transferred. Source and destination can be
local or remote; for a remote server, either SSH or the rsync protocol can be used for the network
transfer. Rsync is invoked much like
cp. Recursive mode is enabled with
-r and archive mode with
-a. A simple example
to copy files from a local directory to a remote directory via SSH:
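rsync -av -e "ssh" /snow/ remote:/snow/
The trailing slash on the source makes rsync copy the contents of /snow; without it, the files would end up in /snow/snow on the remote side.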
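How rsync limits a transfer to what has changed can be observed with two local directories. The following is a sketch under assumptions: rsync must be installed, the paths are throwaway names created under a temporary directory, and -i (--itemize-changes) is used only to make rsync list what it transfers.

```shell
work=$(mktemp -d)
mkdir -p "$work/src"
echo "one" > "$work/src/one.txt"
echo "two" > "$work/src/two.txt"

# First run: both files are new, so both are transferred.
# The trailing slash on the source means "the contents of src",
# so the files land directly in dst/ rather than in dst/src/.
rsync -a "$work/src/" "$work/dst/"

# Change one file and run again with -i to list what is transferred.
# Only the changed file should show up in the itemized output.
echo "two, changed" > "$work/src/two.txt"
rsync -ai "$work/src/" "$work/dst/"
```

The same invocation works against a remote host by writing the destination as host:path, as in the example above.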
The tar utility is used to combine multiple files/directories into a continuous stream of bytes (and to revert the stream into files/directories). This stream can be compressed, transferred over network connections, saved to a file or streamed onto a tape device. When reading files from the stream, permissions, modes, times and other information can be restored. Tar is the most basic way of transferring files and directories to and from tape, either with or without compression.
An example of extracting a gzipped tar archive, with verbose output and input data read from a file:
tar xvzf snow.tgz
Extracting a tar archive from a SCSI tape drive:
tar xvf /dev/st0
Creating an archive file from the directory /snow:
cd /; tar cvf /tmp/snow.tar snow
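The create, list and extract steps can be tried safely on scratch data before pointing tar at real directories or a tape. This is a minimal sketch; the temporary directory and the sample files in it are invented for the illustration, and -C is used in place of the cd in the example above.

```shell
work=$(mktemp -d)
mkdir -p "$work/snow/flakes" "$work/restore" "$work/copy"
echo "cold" > "$work/snow/readme.txt"
echo "six-sided" > "$work/snow/flakes/shape.txt"

# Create a gzip-compressed archive. -C changes directory first, so the
# archive stores the relative path "snow/..." and can be restored anywhere.
tar -czf "$work/snow.tgz" -C "$work" snow

# List the contents without extracting anything.
tar -tzf "$work/snow.tgz"

# Extract into a different directory and compare with the original.
tar -xzf "$work/snow.tgz" -C "$work/restore"
diff -r "$work/snow" "$work/restore/snow" && echo "restore matches original"

# The same byte stream can be piped instead of saved, e.g. to copy a tree.
tar -cf - -C "$work" snow | tar -xf - -C "$work/copy"
```

Storing relative paths is what allows the archive to be unpacked into a scratch location for a test restore without touching the live files.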
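Traditionally, the tar utility uses a (SCSI) tape device as its medium when no archive is named with the f option. The actual default depends on how tar was built, so it is safest to always specify the archive explicitly.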
As can be seen in the example above, SCSI tape devices are found under /dev (for example /dev/st0).
Each drive also has a non-rewinding device, which means that the tape does not
rewind automatically after each operation. This is an important feature
for backups, because otherwise, when using multiple tar
commands for backups, each backup but the last would be overwritten by the next one.
Tapes can be controlled by the mt command (magnetic tape). The syntax of this command is: mt [-h] [-f device] command [count]. The option -h (help) lists all possible commands. If the device is not specified by the -f option, the command will use the environment variable TAPE as default. More information can be found in the manual pages.
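Writing several archives to one tape and getting a specific one back could look roughly like the sketch below. Several things here are assumptions rather than part of the text above: /dev/nst0 is the usual Linux name for the first non-rewinding SCSI tape device, and run, backup_dirs_to_tape and restore_nth_archive are hypothetical helper names. By default the script only prints the commands it would run.

```shell
# Assumed device: the non-rewinding counterpart of /dev/st0 on Linux.
TAPE_DEV=${TAPE_DEV:-/dev/nst0}
# Safety default: only print the commands. Set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Write one tar archive per directory, back to back on the same tape.
backup_dirs_to_tape() {
    run mt -f "$TAPE_DEV" rewind
    for dir in "$@"; do
        run tar -cf "$TAPE_DEV" "$dir"
    done
    run mt -f "$TAPE_DEV" rewind
}

# Extract the Nth archive (counting from 0) by skipping N file marks.
restore_nth_archive() {
    run mt -f "$TAPE_DEV" rewind
    if [ "$1" -gt 0 ]; then
        run mt -f "$TAPE_DEV" fsf "$1"
    fi
    run tar -xf "$TAPE_DEV"
}

backup_dirs_to_tape /etc /home
restore_nth_archive 1
```

With the rewinding device, each tar command would start at the beginning of the tape again, which is exactly the overwriting problem described above.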
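Using the dd utility, whole disks/partitions can be transferred from/to files or other
disks/partitions. With dd, whole filesystems can be backed up at once. dd copies data at
byte level. Common options to dd include if= (input file), of= (output file), bs= (block size in bytes) and count= (number of blocks to copy).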
An example of dd usage to transfer a 1GB partition to file:
dd if=/dev/hda1 of=/tmp/disk.img bs=1024 count=1048576
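In the example above, bs and count together determine the amount copied: 1024 bytes × 1048576 blocks = 1 GiB. The sketch below shows the same arithmetic on a small scale and checks the result. It uses a regular file as a stand-in for a partition, and the file names are made up for the illustration.

```shell
work=$(mktemp -d)

# Stand-in for a partition: 4 MiB of random data (4096 blocks of 1024 bytes).
dd if=/dev/urandom of="$work/partition.raw" bs=1024 count=4096 2>/dev/null

# Image it with a larger block size: 4 blocks of 1 MiB is the same 4 MiB,
# but needs far fewer read/write calls.
dd if="$work/partition.raw" of="$work/disk.img" bs=1048576 count=4 2>/dev/null

# A byte-for-byte comparison confirms the image is an exact copy.
cmp "$work/partition.raw" "$work/disk.img" && echo "image is identical"
```

When imaging a real partition, it should be unmounted or mounted read-only, otherwise the image may capture the filesystem in an inconsistent state.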
The cpio utility is used to copy files to and from archives. It can read/write various
archive formats, including tar and (depending on the implementation) zip. Although it predates
tar, it is
less well known. Cpio has three modes: input mode (
-i) to read an
archive and extract the files, output mode (
-o) to read a list of
files and combine them into an archive, and pass-through mode (
-p),
which reads a list of files and copies these to the destination directory. The file list is
read from stdin and is often provided by find.
An example of archiving a directory into a cpio archive (written outside the directory, so that the archive does not end up containing itself):
cd /snow; find . | cpio -o > /tmp/snow.cpio
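The three cpio modes can be compared side by side on scratch data. This is a sketch that assumes cpio is installed; the directory names are invented, and -d (create directories as needed) and -t (list only) are standard cpio options added for the demonstration.

```shell
work=$(mktemp -d)
mkdir -p "$work/snow/flakes" "$work/restore" "$work/copy"
echo "cold" > "$work/snow/readme.txt"
echo "six-sided" > "$work/snow/flakes/shape.txt"

# Output mode (-o): find supplies the file list on stdin.
(cd "$work/snow" && find . -depth -print | cpio -o > "$work/snow.cpio")

# Input mode with -t: list the archive without extracting it.
cpio -it < "$work/snow.cpio"

# Input mode (-i): extract; -d creates leading directories as needed.
(cd "$work/restore" && cpio -id < "$work/snow.cpio")

# Pass-through mode (-p): copy the listed files straight to a directory.
(cd "$work/snow" && find . -depth -print | cpio -pd "$work/copy")
```

Because cpio takes its file list from stdin, any find expression (for example, one selecting only recently modified files) can decide what goes into the archive.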
Complete backup solutions exist which help simplify the administration and configuration of backups in larger environments. These solutions can automate backup(s) of multiple servers and/or clients to multiple backup media. Many different solutions exist, each with their own strengths and weaknesses. Below you'll find some examples of these solutions.
AMANDA, the Advanced Maryland Automatic Network Disk Archiver, is a backup solution that allows the IT administrator to set up a single master backup server to back up multiple hosts over network to tape drives/changers or disks or optical media. Amanda uses native utilities and formats (e.g. dump and/or GNU tar) and can back up a large number of servers and workstations running multiple versions of Linux or Unix.
Bacula is a set of open source, enterprise-ready computer programs that permit you (or the system administrator) to manage backup, recovery, and verification of computer data across a network of computers of different kinds. Bacula is relatively easy to use and efficient, while offering many advanced storage management features that make it easy to find and recover lost or damaged files. In technical terms, it is a network-based backup program. According to SourceForge statistics (rank and downloads), Bacula is by far the most popular enterprise-grade open source program.
BackupPC is a high-performance, enterprise-grade system for backing up Linux, WinXX and MacOSX PCs and laptops to a server's disk. BackupPC is highly configurable and easy to install and maintain.