Troubleshooting network issues (205.3)

Revision: $Revision: 955 $

Candidates should be able to identify and correct common network setup issues, to include knowledge of locations for basic configuration files and commands.

This objective has a weight of 5 points.

Key Knowledge Areas:

The following is a partial list of used files, terms, and utilities:

Something on network troubleshooting in general

The purpose of this chapter is not to provide solutions to all possible network problems, but to point out that you have to troubleshoot methodically and with knowledge of the surrounding environment.

Knowledge of the surrounding environment is an absolute necessity because you can't solve a problem if you have no idea whatsoever where problems can occur. You have to know how you are connected to the network because every component that plays a role in facilitating your network access can fail.

The key files, terms and utilities will not be described in this chapter since most of them have already been described in Chapter 5, Networking Configuration (205) and other chapters.

An example situation

Let's assume you're working on a PC which is connected to the Internet by means of a local area network and a firewall. Suddenly you can't view a particular web page anymore.

The first step in determining the problem is to make a list of the components that are involved. This could be something like:

+ your PC with eth0 network interface to the LAN
+ the firewall with eth0 interface to the LAN and eth1 interface to the ISP
+ the site you are trying to reach

Then think about how everything works together. You enter the URL in the browser. Your machine uses DNS to find the IP address of the web site you are trying to reach, and so on.

Packets travel through your eth0 interface over the LAN to the eth0 interface of the firewall and out the eth1 interface of the firewall to the ISP and from the ISP in some way to the web server.

Now that you know how the different components interact, you can take steps to determine the source of the malfunction.

The graphic below should not be treated as the way to solve all problems. The purpose of the graphic is to give an example of step-by-step troubleshooting given a certain situation, which in this case is fictitious :)

S

The cause of the problem has been determined and can be S(olved).

1

Can we reach other machines on the Internet? Try another URL or try pinging another machine on the Internet. Be careful with ping though: your firewall could be blocking ICMP echo requests and replies.
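
This check could be sketched as a small shell function. The name can_reach is made up for this example, and the sketch assumes that the iputils version of ping and the curl command are installed on your PC:

```shell
# Hypothetical helper: report whether a host answers ICMP echo or HTTP.
# Assumes iputils ping (-c count, -W reply timeout in seconds) and curl.
can_reach() {
    host=$1
    if ping -c 3 -W 2 "$host" >/dev/null 2>&1; then
        echo "$host: answers ping"
    elif curl -s -o /dev/null -m 10 "http://$host/"; then
        # ICMP may be filtered by the firewall, so HTTP is tried as well.
        echo "$host: no ping reply, but HTTP works"
    else
        echo "$host: unreachable by ping and HTTP"
        return 1
    fi
}
```

Calling can_reach with the name of a site other than the one that fails tells you whether the problem is limited to one target or affects your Internet access as a whole.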

2

Is the machine we are trying to reach, the target, down? Try reaching the machine via another network, ask a friend to try reaching the machine, call the person responsible for the machine etc.

3

Can we reach the firewall? Try pinging the firewall, logging in to it etc.

4

Is there a hop down? Use traceroute to find out what the hops are between you and the target host. The route from my machine to LPI's web server, for instance, can be determined by issuing the command traceroute -I www.lpi.org (the -I option makes traceroute probe with ICMP echo requests):

# traceroute -I www.lpi.org
traceroute to www.lpi.org (209.167.177.93), 30 hops max, 38 byte packets
 1  fertuut.snowgroningen (192.168.2.1)  0.555 ms  0.480 ms  0.387 ms
 2  wc-1.r-195-85-156.essentkabel.com (195.85.156.1)  30.910 ms  26.352 ms  19.406 ms
 3  HgvL-WebConHgv.castel.nl (195.85.153.145)  19.296 ms  28.656 ms  29.204 ms
 4  S-AMS-IxHgv.castel.nl (195.85.155.2)  172.813 ms  199.017 ms  95.894 ms
 5  f04-08.ams-icr-03.carrier1.net (212.4.194.13)  118.879 ms  84.262 ms  130.855 ms
 6  g02-00.amd-bbr-01.carrier1.net (212.4.211.197)  30.790 ms  45.073 ms  28.631 ms
 7  p08-00.lon-bbr-02.carrier1.net (212.4.193.165)  178.978 ms  211.696 ms  301.321 ms
 8  p13-02.nyc-bbr-01.carrier1.net (212.4.200.89)  189.606 ms  413.708 ms  194.794 ms
 9  g01-00.nyc-pni-02.carrier1.net (212.4.193.198)  134.624 ms  182.647 ms  411.876 ms
10  500.POS2-1.GW14.NYC4.ALTER.NET (157.130.94.249)  199.503 ms  139.083 ms  158.804 ms
11  578.ATM3-0.XR2.NYC4.ALTER.NET (152.63.26.242)  122.309 ms  191.783 ms  297.066 ms
12  188.at-1-0-0.XR2.NYC8.ALTER.NET (152.63.18.90)  212.805 ms  193.841 ms  94.278 ms
13  0.so-2-2-0.XL2.NYC8.ALTER.NET (152.63.19.33)  131.535 ms  131.768 ms  152.717 ms
14  0.so-2-0-0.TL2.NYC8.ALTER.NET (152.63.0.185)  198.645 ms  136.199 ms  274.059 ms
15  0.so-3-0-0.TL2.TOR2.ALTER.NET (152.63.2.86)  232.886 ms  188.511 ms  166.256 ms
16  POS1-0.XR2.TOR2.ALTER.NET (152.63.2.78)  153.015 ms  157.076 ms  150.759 ms
17  POS7-0.GW4.TOR2.ALTER.NET (152.63.131.141)  143.956 ms  146.313 ms  141.405 ms
18  akainn-gw.customer.alter.net (209.167.167.118)  384.687 ms  310.406 ms  302.744 ms
19  new.lpi.org (209.167.177.93)  348.981 ms  356.486 ms  328.069 ms
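
When a hop does not answer, traceroute prints asterisks instead of round-trip times. A minimal sketch for picking those hops out of a long listing follows; the function name silent_hops is an assumption made for this example:

```shell
# Hypothetical helper: read traceroute output on stdin and print the
# numbers of the hops that gave no reply to any of the three probes.
silent_hops() {
    awk '$2 == "*" && $3 == "*" && $4 == "*" { print $1 }'
}

# Usage:
#   traceroute -I www.lpi.org | silent_hops
```

Keep in mind that a silent hop is not necessarily down. Some routers simply do not answer traceroute probes, so a hop is only suspect when all hops after it stay silent as well.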
      

5

Can other machines in the network reach the firewall? Use ping, or log in to the firewall from that machine or try viewing a web page on the Internet from that machine.

6

Does the firewall block the traffic to that particular machine? Maybe someone doesn't want you to look at that particular site and has instructed the firewall to block traffic to and/or from that site.

7

Inspect the firewall. The problem seems to be on the firewall. Test the interfaces on the firewall, inspect the firewalling rules, check the cabling etc.

8

Is our eth0 interface up? This can be tested by issuing the command ifconfig eth0.
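
On systems without ifconfig, the command ip link show eth0 gives similar information. The state of an interface can also be read from the sysfs file system, as the sketch below does. The function name iface_state and the SYSFS_NET variable are made up for this example; the variable only exists so that the function can be pointed at another directory:

```shell
# Hypothetical helper: print the kernel's view of an interface
# ("up", "down", "unknown", ...) or "missing" if it does not exist.
iface_state() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/operstate" 2>/dev/null || echo "missing"
}

# Usage:
#   iface_state eth0
```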

9

Are your route definitions as they should be? Think of things like the default gateway. The route table can be viewed by issuing the command route -n.
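
In the output of route -n the default route is the line with destination 0.0.0.0, genmask 0.0.0.0 and the G flag. The sketch below extracts the gateway and interface from that line; the function name default_gateway is an assumption made for this example:

```shell
# Hypothetical helper: read `route -n` output on stdin and print the
# gateway address and interface of the default route.
default_gateway() {
    awk '$1 == "0.0.0.0" && $3 == "0.0.0.0" && $4 ~ /G/ { print $2, $8; found = 1 }
         END { if (!found) { print "no default route"; exit 1 } }'
}

# Usage:
#   route -n | default_gateway
```

In the example situation the address printed should be that of the eth0 interface of the firewall.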

10

Is there a physical reason for the problem? Check if the problem is in the cabling. This could be a defective cable or a badly shielded one. Putting power supply cabling and data cabling through the same tube without metal shielding between the two of them can cause unpredictable, hard-to-reproduce errors in the data transmission.

Name resolution problems

Besides the checklist given above, there are some other reasons why the connection to another machine might fail. These possibilities are listed below.

Name resolution is the translation of a hostname into an IP address. If a user tries to connect to a machine based on the hostname of that machine and the hostname resolution doesn't function properly, then no connection will be made.

The file /etc/resolv.conf contains the IP addresses of the nameservers. The nameservers are the servers that do the name resolution for an external network. For small (local) networks a local lookup table can be made by using the /etc/hosts file. This file contains a list of aliases or FQDNs (or both) per IP address.
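
To see quickly which nameservers a machine uses, the nameserver lines can be picked out of the file. This is only a sketch, and the function name nameservers is made up for this example:

```shell
# Hypothetical helper: print the nameserver addresses from a
# resolv.conf-style file (default: /etc/resolv.conf).
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage:
#   nameservers
#   getent hosts crystal.snow.nl   # resolves the way the system itself does
```

The getent hosts command follows the lookup order configured in /etc/nsswitch.conf, so it normally takes entries in /etc/hosts into account as well as the nameservers.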

Checking the name resolution can be done with the commands /usr/bin/dig (dig is an acronym for Domain Information Groper) or /usr/bin/host. Both of these commands return the IP address associated with the hostname.

# host crystal.snow.nl
crystal.snow.nl is an alias for imap.snow.nl.
imap.snow.nl has address 213.154.248.156
    
# dig crystal.snow.nl

; <<>> DiG 9.8.0-P1-RedHat-9.8.0-3.P1.fc15 <<>> crystal.snow.nl
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13354
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 3, ADDITIONAL: 6

;; QUESTION SECTION:
;crystal.snow.nl.  IN A

;; ANSWER SECTION:
crystal.snow.nl. 62664 IN CNAME imap.snow.nl.
imap.snow.nl.  67198 IN A 213.154.248.156

;; AUTHORITY SECTION:
snow.nl.  2344 IN NS ns2.transip.eu.
snow.nl.  2344 IN NS ns0.transip.net.
snow.nl.  2344 IN NS ns1.transip.nl.

;; ADDITIONAL SECTION:
ns0.transip.net. 67199 IN A 80.69.67.67
ns0.transip.net. 67199 IN AAAA 2a01:7c8:a::53
ns1.transip.nl.  75705 IN A 80.69.69.69
ns1.transip.nl.  314 IN AAAA 2a01:7c8:b::53
ns2.transip.eu.  67198 IN A 217.115.203.194
ns2.transip.eu.  67198 IN AAAA 2001:14a0:100:6::53

;; Query time: 2 msec
;; SERVER: 213.154.248.156#53(213.154.248.156)
;; WHEN: Mon May 16 15:43:13 2011
;; MSG SIZE  rcvd: 283
    

The information the dig command returns is more elaborate than what the host command returns, but both commands return the same IP address.
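
If only the address is of interest, the answer section of the dig output can be filtered. The sketch below does this for output read from standard input; the function name answer_addresses is an assumption made for this example:

```shell
# Hypothetical helper: read full dig output on stdin and print only the
# IPv4 addresses (A records) found in the ANSWER SECTION.
answer_addresses() {
    awk '/^;; ANSWER SECTION:/ { in_answer = 1; next }
         /^;;/ || /^$/         { in_answer = 0 }
         in_answer && $4 == "A" { print $5 }'
}

# Usage:
#   dig crystal.snow.nl | answer_addresses
```

The dig command can produce a short form itself: dig +short crystal.snow.nl prints just the answers, including the CNAME target.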

The hostname of a machine itself is stored in the file /etc/hostname or /etc/HOSTNAME on Debian-based systems. On Fedora systems the name is stored in the file /etc/sysconfig/network. On all systems the hostname can be displayed with the command /bin/hostname. When given no argument, this command prints the hostname of the machine. When an argument is given, the hostname of the machine is changed to that argument.
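
A mismatch between the stored name and the running name can be spotted with a comparison like the one sketched below. It assumes a file that contains nothing but the hostname, as /etc/hostname does; the function name hostname_matches is made up for this example:

```shell
# Hypothetical helper: succeed if the running hostname equals the name
# stored in the given file (default: /etc/hostname).
hostname_matches() {
    file=${1:-/etc/hostname}
    [ "$(hostname)" = "$(cat "$file")" ]
}

# Usage:
#   hostname_matches || echo "running hostname differs from /etc/hostname"
```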

Incorrect initialization of the system

Another possible cause of network problems can be the incorrect initialization of the system. To find any initialization errors, check the file /var/log/messages or read the kernel ring buffer by using the /bin/dmesg command.
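
Both sources can be long, so it helps to filter them on the interface name. The sketch below keeps only lines that mention the interface together with a word that often signals trouble. The function name iface_messages and the list of keywords are assumptions made for this example, not a complete list of possible error messages:

```shell
# Hypothetical helper: read log lines on stdin and print those that
# mention the given interface followed by a suspicious keyword.
iface_messages() {
    grep -i -E -- "$1.*(error|fail|down|timeout)"
}

# Usage:
#   dmesg | iface_messages eth0
#   iface_messages eth0 < /var/log/messages
```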

Security settings

Security settings can also be a source of connection problems. The server may deny or allow access from certain clients using the files /etc/hosts.deny and /etc/hosts.allow, respectively.
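
These files are named hosts.allow and hosts.deny and are read by the TCP wrappers library. To see which rules could apply to a service, the lines that start with the daemon name or with ALL can be listed. This is a rough sketch; the function name wrapper_rules is made up for this example and the daemon name is used as a regular expression:

```shell
# Hypothetical helper: print the hosts.allow and hosts.deny lines whose
# daemon list starts with the given daemon name or with ALL.
# The second argument is the directory holding the files (default: /etc).
wrapper_rules() {
    daemon=$1
    dir=${2:-/etc}
    for f in "$dir/hosts.allow" "$dir/hosts.deny"; do
        [ -r "$f" ] || continue
        grep -H -E -- "^[[:space:]]*($daemon|ALL)[[:space:],:]" "$f"
    done
}

# Usage:
#   wrapper_rules sshd
```

The file hosts.allow is consulted first and hosts.deny second. If no rule in either file matches, access is granted.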

Network configuration

The network configuration may also cause connectivity problems, for instance if the computer uses fixed network settings from another site. You can check these settings in the files in the directory /etc/sysconfig/network-scripts on Fedora-based systems or in the file /etc/network/interfaces on Debian-based systems.
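
Which of the two locations applies can be found out with a sketch like the following. It assumes that the Fedora settings for an interface are kept in a file named ifcfg- followed by the interface name and that the Debian settings are kept in /etc/network/interfaces. The function name iface_config_file and its optional second argument (an alternative root directory) are made up for this example:

```shell
# Hypothetical helper: print the path of the configuration file that
# holds the settings for the given interface, if one can be read.
iface_config_file() {
    iface=$1
    root=${2:-}
    for f in "$root/etc/sysconfig/network-scripts/ifcfg-$iface" \
             "$root/etc/network/interfaces"; do
        if [ -r "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Usage:
#   cat "$(iface_config_file eth0)"
```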

Copyright Snow B.V. The Netherlands