Chapter 14. Capacity Planning (214)

This text is part of the (not-finalized) October 2013 objectives. Do not rely on this text and the list of objectives. Do not use this text as a preparation for the pre-October 2013 exam.

Revision: 947 (Thu, 19 Jul 2012, 14:03:32 +0200)

This topic has a total weight of 12 points and contains the following objectives:

Objective 214.1; Measure Resource Usage (4 points)

Candidates should be able to measure hardware resource and network bandwidth usage.

Objective 214.2; Troubleshoot Resource Problems (4 points)

Candidates should be able to identify and troubleshoot resource problems.

Objective 214.3; Analyze Demand (1 point)

Candidates should be able to analyze capacity demands.

Objective 214.4; Predict Future Resource Needs (3 points)

Candidates should be able to monitor resource usage to predict future resource needs.

Measure Resource Usage (214.1)

Candidates should be able to measure hardware resource and network bandwidth usage.

Key Knowledge Areas:

  • Measure CPU Usage

  • Measure memory usage

  • Measure disk I/O

  • Measure firewalling and routing throughput

  • Map client bandwidth usage

The following is a partial list of used files, terms and utilities:

  • iostat

  • vmstat

  • netstat

  • pstree, ps

  • w

  • lsof

  • top

  • uptime

  • sar

iostat

The iostat command is used for monitoring system input/output (IO) device load. This is done by observing the time the devices are active in relation to their average transfer rates. Usage:

$ iostat [options] [device] [interval [count]]

Example:

$ iostat
Linux 3.2.0-4-686-pae (debian) 	05/07/2013 	_i686_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.25    0.32    3.76    0.20    0.00   94.46

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              12.21       214.81        17.38     333479      26980

$ iostat -c
Linux 3.2.0-4-686-pae (debian) 	05/07/2013 	_i686_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.16    0.05   17.01    1.37    0.00   72.42
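
When iostat output is collected by a script rather than read by eye, it helps to select a value by its column name instead of its position. The following is a minimal sketch and not part of the sysstat package: the function name iostat_cpu_field is made up for this example, and it assumes the avg-cpu layout shown above, with the values on the line directly after the header.

```shell
# Hypothetical helper: print one avg-cpu value from iostat output read on
# standard input. $1 is the column name as printed by iostat, e.g. %idle.
iostat_cpu_field() {
    awk -v want="$1" '
        /^avg-cpu:/ {
            col = 0
            # Header field i describes value field i-1 ("avg-cpu:" is field 1).
            for (i = 2; i <= NF; i++) if ($i == want) col = i - 1
            # The values are on the line after the header.
            if (col && (getline) > 0) print $col
        }'
}

# Example: iostat -c | iostat_cpu_field %iowait
```

Applied to the iostat -c output above, iostat_cpu_field %idle prints 72.42.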

vmstat

The vmstat command reports virtual memory statistics about processes, memory, paging, block IO, traps, and CPU utilization. Usage:

$ vmstat [options] [delay [count]]

Example:

$ vmstat 2 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 109112  33824 242204    0    0   603    26  196  516  7 11 81  1
 0  0      0 109152  33824 242204    0    0     0     2  124  239  0  1 98  0

Note that the first row always shows averages since the machine was booted and should therefore be disregarded. Values related to memory and I/O are expressed in kilobytes (1024 bytes). Old (pre-2.6) kernels might report blocks of 512, 2048 or 4096 bytes instead. Values related to CPU measurements are expressed as a percentage of total CPU time; keep this in mind when interpreting measurements from a multi-CPU system. Together, the CPU fields should add up to 100, apart from rounding differences. vmstat accepts a delay in seconds and a count (the number of repetitions) as arguments. The process and memory values are always instantaneous, regardless of the delay.
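
Because the first row reports averages since boot, scripts that collect vmstat samples usually drop it. A minimal sketch of such a filter follows. The function name vmstat_samples is made up for this example, and the filter assumes that data rows start with a number while header rows do not.

```shell
# Hypothetical helper: pass vmstat output through, keeping only the data rows
# after the first one (the first row holds averages since boot).
vmstat_samples() {
    awk '$1 ~ /^[0-9]+$/ { if (seen++) print }'
}

# Example: vmstat 2 5 | vmstat_samples
```

For the vmstat 2 2 example above, only the second data row is printed.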

The first process column, r, lists the number of processes in the processor run queue. These processes are runnable: they are either running or waiting for processor run time, also known as CPU time.

The second process column, b, lists the number of blocked processes. These processes are in uninterruptible sleep, which means they are waiting for a device to complete input or output (I/O).

The first memory column, swpd, lists the amount of virtual memory in use, expressed in kilobytes (1024 bytes). Virtual memory consists of swap space on disk, which is considerably slower than physical memory located in memory chips.

The second memory column, free, lists the amount of memory that is currently not in use, not cached and not buffered, expressed in kilobytes.

The third memory column, buff, lists the amount of memory currently allocated to buffers. Buffered memory contains raw disk blocks.

The fourth memory column, cache, lists the amount of memory currently allocated to caching. Cached memory contains files.

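The inact and active memory columns are only shown when vmstat is run with the -a option, in which case they take the place of the buff and cache columns.

The inact column lists the amount of inactive memory.

The active column lists the amount of active memory.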

The first swap column, si, lists the amount of memory swapped in from disk per second.

The second swap column, so, lists the amount of memory swapped out to disk per second.

The first io column, bi, lists the number of blocks per second received from a block device.

The second io column, bo, lists the number of blocks per second sent to a block device.

The first system column, in, lists the number of interrupts per second (including the clock).

The second system column, cs, lists the number of context switches per second.

The cpu columns are expressed as percentages of total CPU time.

The first cpu column, us, shows the percentage of time spent running non-kernel code.

The second cpu column, sy, shows the percentage of time spent running kernel code.

The third cpu column, id, shows the percentage of idle time.

The fourth cpu column, wa, shows the percentage of time spent waiting for I/O (input/output).

The fifth cpu column, st, shows the percentage of time stolen from a virtual machine.
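
Since the set of columns differs between vmstat versions and options, a script can look a column up by the name printed in the header row rather than by a fixed position. This is a minimal sketch under that assumption; the function name vmstat_column is made up for this example.

```shell
# Hypothetical helper: print one named column (e.g. wa, si, free) for every
# data row of vmstat output read on standard input.
vmstat_column() {
    awk -v want="$1" '
        $1 == "r" {                      # the row holding the column names
            col = 0
            for (i = 1; i <= NF; i++) if ($i == want) col = i
            next
        }
        col && $1 ~ /^[0-9]+$/ { print $col }'
}

# Example: vmstat 2 5 | vmstat_column wa
```

For the vmstat 2 2 output shown earlier, vmstat_column id prints 81 and 98, one value per line; the first value is the average since boot.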

netstat

The netstat command shows network connections, routing tables, interface statistics, masquerade connections and multicast memberships. The results depend on the first argument:

  • (no argument given) - all active sockets of all configured address families will be listed.

  • --route, -r - the kernel routing tables are shown; the output is identical to that of route -e (note: route might require elevated privileges, whereas netstat -r can be run with user privileges).

  • --groups, -g - lists multicast group membership information for IPv4 and IPv6

  • --interfaces, -i - lists all network interfaces and certain specific properties

  • --statistics, -s - lists a summary of statistics for each protocol, similar to SNMP output

  • --masquerade, -M - lists masqueraded connections on pre-2.4 kernels. On newer kernels, use cat /proc/net/ip_conntrack instead. In order for this to work, the ipt_MASQUERADE kernel module has to be loaded.

Usage:

$ netstat [address_family_options] [options] 
$ netstat -aln
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:34236           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:389             0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN     
tcp6       0      0 :::22                   :::*                    LISTEN     
tcp6       0      0 ::1:25                  :::*                    LISTEN     
tcp6       0      0 :::32831                :::*                    LISTEN
     
$ netstat -al
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 *:ssh                   *:*                     LISTEN     
tcp        0      0 localhost:smtp          *:*                     LISTEN     
tcp        0      0 *:34236                 *:*                     LISTEN     
tcp        0      0 *:ldap                  *:*                     LISTEN     
tcp        0      0 *:sunrpc                *:*                     LISTEN     
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN     
tcp6       0      0 localhost:smtp          [::]:*                  LISTEN     
tcp6       0      0 [::]:32831              [::]:*                  LISTEN     
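
For capacity planning it is often enough to know how many listening sockets exist per protocol. The following minimal sketch counts them from netstat output; the function name count_listeners is made up for this example, and it assumes that the state is the last column (so netstat is run without -p).

```shell
# Hypothetical helper: count sockets in LISTEN state per protocol, reading
# netstat output from standard input.
count_listeners() {
    awk '$NF == "LISTEN" { n[$1]++ }
         END { for (proto in n) print proto, n[proto] }' | LC_ALL=C sort
}

# Example: netstat -aln | count_listeners
```

For the netstat -aln output above, this prints "tcp 5" and "tcp6 3".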

ps

Usage:

$ ps [options]

The ps command shows a list of the processes currently running. These are the same processes which are shown by the top command. The GNU version of ps accepts three different kinds of options:

  1. UNIX options - these may be grouped and must be preceded by a dash

  2. BSD options - these may be grouped and must be used without a dash

  3. GNU long options - these are preceded by two dashes

These options may be mixed on GNU ps to some extent, but bear in mind that, depending on the version of Linux you are working on, you might encounter a less flexible variant of ps. The ps manpage can be, depending on the distribution in question, up to nearly 900 lines long. Because of its versatile nature, you are encouraged to read through the manpage and try out some of the options ps has to offer.

Example:

$ ps ef
  PID TTY      STAT   TIME COMMAND
 4417 pts/0    Ss     0:00 bash DISPLAY=:0 PWD=/home/user HOME=/home/user SESSI
 4522 pts/0    R+     0:00  \_ ps ef SSH_AGENT_PID=4206 GPG_AGENT_INFO=/home/user/

$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 02:02 ?        00:00:01 init [2]  
root         2     0  0 02:02 ?        00:00:00 [kthreadd]
root         3     2  0 02:02 ?        00:00:01 [ksoftirqd/0]
root         4     2  0 02:02 ?        00:00:01 [kworker/0:0]
root         6     2  0 02:02 ?        00:00:00 [migration/0]
root         7     2  0 02:02 ?        00:00:00 [watchdog/0]
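
The PPID column of the ps -ef output makes it possible to find the children of a process without drawing the whole tree. A minimal sketch, assuming the column order shown above (UID, PID, PPID, ...); the function name children_of is made up for this example.

```shell
# Hypothetical helper: print the PIDs whose parent (PPID, third column of
# ps -ef output) equals $1. The header row is skipped.
children_of() {
    awk -v ppid="$1" 'NR > 1 && $3 == ppid { print $2 }'
}

# Example: ps -ef | children_of 2
```

For the ps -ef output above, children_of 2 prints 3, 4, 6 and 7, one per line: the kernel threads started by kthreadd.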

pstree

The pstree command shows the same processes as ps and top, but the output is formatted as a tree. The tree is rooted at pid (or at init if pid is omitted); if a username is specified instead, a tree is shown for each process owned by that user. pstree provides an easy way to trace a process back to the process ID (PID) of its parent. Identical branches of processes are grouped together between square brackets and prefixed by a number, which represents the repetition count. Grouped child threads are shown between square brackets as well, with the process name additionally enclosed in curly braces. In the example below, the last line shows three threads of gnome-terminal grouped together. Usage:

$ pstree [options] [pid|username]
$ pstree 3655
gnome-terminal─┬─bash───pstree
               ├─bash───man───pager
               ├─gnome-pty-helpe
               └─3*[{gnome-terminal}]

w

The w command displays information about the users currently logged on to the machine, their processes and the same statistics as provided by the uptime command. Usage:

$ w [-husfV] [user]

Example:

$ w -s
 02:52:10 up 49 min,  2 users,  load average: 0.11, 0.10, 0.13
USER     TTY      FROM              IDLE WHAT
user     tty9     :0               49:51  gdm-session-worker [pam/gdm3]
user     pts/0    :0                0.00s w -s

lsof

The lsof command is used to list information about open files and their corresponding processes. Not only regular files can be examined this way: lsof can just as well handle directories, block special files, character special files, executing text references, libraries, streams and network files. By default, lsof prints a columnar listing for human readers. The -F switch plays an important role when the output has to be interpreted by other programs: it makes lsof print each field on a separate line, prefixed by a character that identifies the field. Usage:

$ lsof [options] names 

names acts as a filter here. Without options or names, lsof shows all open files belonging to all active processes.

$ sudo lsof /var/run/utmp 
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
gdm-simpl 4040 root   10u   REG   0,14     5376  636 /run/utmp

$ sudo lsof +d /var/log
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
rsyslogd 2039 root    1w   REG    8,1    44399 262162 /var/log/syslog
rsyslogd 2039 root    2w   REG    8,1   180069 271272 /var/log/messages
rsyslogd 2039 root    3w   REG    8,1    54012 271269 /var/log/auth.log
rsyslogd 2039 root    6w   REG    8,1   232316 271268 /var/log/kern.log
rsyslogd 2039 root    7w   REG    8,1   447350 271267 /var/log/daemon.log
rsyslogd 2039 root    8w   REG    8,1    68368 271271 /var/log/debug
rsyslogd 2039 root    9w   REG    8,1     7888 271270 /var/log/user.log
Xorg     4041 root    0r   REG    8,1    31030 262393 /var/log/Xorg.0.log

free

The free command shows an overview of the total amount of both physical and virtual memory of a system at the moment it is run, as well as the amount of memory which is free, which is used, and which is buffered or cached by the kernel.

The fourth column, called shared, has become obsolete and should be disregarded. Usage:

$ free [options]

Example:

$ free -h
             total       used       free     shared    buffers     cached
Mem:          502M       489M        13M         0B        44M       290M
-/+ buffers/cache:       154M       347M
Swap:         382M       3.9M       379M
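
The 347M in the free column of the "-/+ buffers/cache" row is the sum of the free, buffers and cached values of the Mem row (13M + 44M + 290M): memory that is either unused or can largely be reclaimed from buffers and cache. A minimal sketch computing the same figure from plain free output follows. The function name and the column positions are assumptions tied to the older output layout shown above; newer versions of free print different columns.

```shell
# Hypothetical helper: free + buffers + cached from the Mem row, in the unit
# free printed (kilobytes by default). Assumes the column order
# total, used, free, shared, buffers, cached and numeric output (no -h).
mem_free_plus_cache() {
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}

# Example: free | mem_free_plus_cache
```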

top

$ top [options]

The top command provides a 'dynamic real-time view' of a running system.

top - 03:01:24 up 59 min,  2 users,  load average: 0.15, 0.19, 0.16
Tasks: 117 total,   2 running, 115 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.9 us,  4.5 sy,  0.1 ni, 94.3 id,  0.1 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem:    514332 total,   497828 used,    16504 free,    63132 buffers
KiB Swap:   392188 total,        0 used,   392188 free,   270552 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 4041 root      20   0  106m  31m 9556 R  30.4  6.3   3:05.58 Xorg
 4262 user      20   0  527m  71m  36m S  18.2 14.3   2:04.42 gnome-shell

Because top runs interactively, the most important keys while operating it are the help keys h or ? and the quit key q. The following table provides an overview of the most important navigation keys and their alternatives:

              key      equivalent-key-combinations
              Up       alt + \      or  alt + k
              Down     alt + /      or  alt + j
              Left     alt + <      or  alt + h
              Right    alt + >      or  alt + l (lower case L)
              PgUp     alt + Up     or  alt + ctrl + k
              PgDn     alt + Down   or  alt + ctrl + j
              Home     alt + Left   or  alt + ctrl + h
              End      alt + Right  or  alt + ctrl + l

uptime

The uptime command shows how long the system has been running, how many users are logged on, the system load averages for the past 1, 5 and 15 minutes and the current time. Usage:

$ uptime [-V]

$ uptime
 03:03:12 up  1:00,  2 users,  load average: 0.17, 0.18, 0.16 
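
To log the load averages over time, they can be cut out of the uptime line. This is a minimal sketch; the function name load_averages is made up for this example, and it assumes a locale that prints a decimal point (for instance by running LC_ALL=C uptime), because a decimal comma would clash with the comma that separates the three values.

```shell
# Hypothetical helper: print the 1, 5 and 15 minute load averages found on
# standard input, separated by spaces.
load_averages() {
    awk -F'load average: ' 'NF > 1 {
        n = split($2, avg, /, */)
        if (n == 3) print avg[1] + 0, avg[2] + 0, avg[3] + 0
    }'
}

# Example: uptime | load_averages
```

For the uptime output above, this prints "0.17 0.18 0.16". The same function works on the first line of the output of w, which has the same layout.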

sar

The sar command collects, reports or saves system activity information. Usage:

$ sar [options] [interval [count]]

$ sar
Linux 3.2.0-4-686-pae (debian) 	05/07/2013 	_i686_	(2 CPU)

02:02:34 AM       LINUX RESTART

02:05:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
02:15:01 AM     all      0.15      0.00      1.06      0.23      0.00     98.56
02:25:01 AM     all      0.98      0.83      3.84      0.04      0.00     94.31
02:35:01 AM     all      0.46      0.00      4.84      0.04      0.00     94.66
02:45:01 AM     all      0.90      0.00      5.29      0.01      0.00     93.80
02:55:01 AM     all      0.66      0.00      4.64      0.03      0.00     94.67
03:05:02 AM     all      0.66      0.00      5.57      0.01      0.00     93.76
Average:        all      0.64      0.14      4.19      0.06      0.00     94.98

Without options, sar will output the statistics above.
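
When looking for busy periods in a long sar report, the rows of interest are those with a low %idle value. A minimal sketch that filters them out; the function name sar_low_idle is made up for this example, and it assumes 12-hour timestamps (AM/PM) and %idle as the last column, as in the output above.

```shell
# Hypothetical helper: print time and %idle for every "all" CPU row whose
# %idle is below the threshold $1. Header, restart and Average rows are
# skipped because their second or third field does not match.
sar_low_idle() {
    awk -v min="$1" '
        $2 ~ /^(AM|PM)$/ && $3 == "all" && $NF + 0 < min + 0 {
            print $1, $2, $NF
        }'
}

# Example: sar | sar_low_idle 94
```

For the output above and a threshold of 94, this prints "02:45:01 AM 93.80" and "03:05:02 AM 93.76".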

Using the -d option switch sar will output disk statistics.

$ sar -d 
06:45:01 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
06:55:01 AM    dev8-0      6.89    227.01     59.67     41.59      0.02      2.63      1.38      0.95
07:05:01 AM    dev8-0      2.08     17.73     17.78     17.06      0.00      2.19      0.94      0.20
07:15:01 AM    dev8-0      1.50     12.16     12.96     16.69      0.00      1.35      0.68      0.10
Average:       dev8-0      3.49     85.63     30.14     33.15      0.01      2.36      1.19      0.42

The -b option switch shows output related to I/O and transfer rate statistics:

$ sar -b
06:45:01 AM       tps      rtps      wtps   bread/s   bwrtn/s
06:55:01 AM      6.89      4.52      2.38    227.01     59.67
07:05:01 AM      2.08      0.95      1.13     17.73     17.78
07:15:01 AM      1.50      0.50      1.00     12.16     12.96
Average:         3.49      1.99      1.50     85.63     30.14

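Apart from the options shown above, the sysstat version of sar found on Linux accepts, among others, the following switches:

  • -B Paging statistics

  • -W Swapping statistics

  • -w Task creation and context switching activity

  • -q Run queue length and load averages

  • -r and -S Memory and swap space utilization over time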

Copyright Snow B.V. The Netherlands