Chapter 8. Web Services (208)

Revision: $Revision: 911 $ ($Date: 2012-05-24 15:53:19 +0200 (Thu, 24 May 2012) $)

This topic has a weight of 6 points and contains the following objectives:

Objective 208.1; Implementing a Web Server (3 points)

Candidates should be able to install and configure a web server. This objective includes monitoring the server's load and performance, restricting client-user access, configuring support for scripting languages as modules and setting up client user authentication. Also included is configuring server options to restrict usage of resources.

Objective 208.2; Maintaining a Web Server (2 points)

Candidates should be able to configure a web server to use virtual hosts, Secure Sockets Layer (SSL) and customise file access.

Objective 208.3; Implementing a Proxy Server (1 point)

Candidates should be able to install and configure a proxy server, including access policies, authentication and resource usage.

Implementing a Web Server (208.1)

Candidates should be able to install and configure a web server. This objective includes monitoring the server's load and performance, restricting client user access, configuring support for scripting languages as modules and setting up client user authentication. Also included is configuring server options to restrict usage of resources.

Key Knowledge Areas

Apache 2.x configuration files, terms and utilities

Apache log files configuration and content

Access restriction methods and files

mod_perl and PHP configuration

Client user authentication files and utilities

Configuration of maximum requests, minimum and maximum servers and clients

Key files, terms and utilities include:

access.log
error.log
.htaccess
httpd.conf
mod_auth
htpasswd
htgroup
apache2ctl
httpd

Resources: LinuxRef06; Coar00; Poet99; Wilson00; Engelschall00; PerlRef01; Krause01; the man pages for the various commands.

Installing the Apache web-server

Apache can be built from source, but in most cases it is already part of a Linux distribution. The next step is to edit the configuration files for the server: depending on your distribution, these files are typically located in a directory such as /etc/apache2 or /etc/httpd/conf. The main configuration file is usually called httpd.conf. This can be, again depending on your distribution, one big file with the configuration of all modules, or a small generic file with references to other configuration files. Some configuration options can be set on a per-directory basis by placing a file called .htaccess in the directory. Options set in a .htaccess file are valid for the directory containing the file and all its subdirectories.

First, edit httpd.conf. This sets up the general attributes of the server: the port number, the user it runs as, etc. The default configuration file is meant to be self explanatory. The configuration option AllowOverride can be set in a Directory context and specifies which values can be overridden by .htaccess files.

Modularity

Apache, like many other successful open source projects, has a modular source code architecture. This means that, to add or modify functionality, you do not need to know the whole code base. You can custom build the server with only the modules you need and include your own modules.

Extending Apache can be done in C or in a variety of other languages using the appropriate modules. These modules expose Apache's internal functionality to various programming languages such as Perl or Tcl. There are many modules available, too many to list in this book. If you have any questions about the development of an Apache module, you should join the Apache-modules mailing list at http://modules.apache.org. Remember to do your homework first: research past messages and check all the documentation on the Apache site. Chances are that someone has already written a module that solves the problem you are experiencing.

The modular structure of Apache's source code should not be confused with the run-time loading of Apache modules. Run-time modules are loaded after the core functionality of Apache has started and are a relatively new feature. In older versions, a module had to be compiled in during the build phase before its functionality could be used. Current implementations of Apache are capable of loading modules at run-time; see the section on DSO.

Run-time loading of modules (DSO)

Most modern Unix derivatives include a mechanism called dynamic linking/loading of Dynamic Shared Objects (DSO). This provides a way to build a piece of program code in a special format for loading at run-time into the address space of an executable program. This loading can usually be done in two ways: either automatically by a system program called ld.so when an executable program is started, or manually, from within the executing program, via a programming interface to the Unix loader: the functions dlopen() and dlsym().

In the latter method the DSOs are usually called shared objects or DSO files and can be named with an arbitrary extension (although the canonical name is foo.so). These files are usually installed in a program-specific directory. The executable program manually loads the DSO at run-time into its address space via dlopen().
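
The manual loading method can be tried out from a scripting language. The following is a minimal sketch, not part of Apache: it assumes a Linux system on which the C math library is available as a shared object (the fallback name libm.so.6 is an assumption that holds for glibc-based systems), and it uses Python's ctypes module, which calls dlopen() and dlsym() on your behalf. The helper name load_shared_object is made up for this illustration.

```python
import ctypes
import ctypes.util


def load_shared_object(short_name, fallback):
    """Load a shared object into this process at run-time.

    ctypes.CDLL() performs the equivalent of dlopen(); looking up an
    attribute on the returned object performs the equivalent of dlsym().
    """
    # find_library() may return None on minimal systems, hence the fallback
    # (the fallback file name is an assumption, see the text above).
    path = ctypes.util.find_library(short_name) or fallback
    return ctypes.CDLL(path)


libm = load_shared_object("m", "libm.so.6")

# Describe the C signature "double cos(double)" before calling it.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

result = libm.cos(0.0)
print(result)
```

Apache does essentially the same thing when it processes a LoadModule line: it opens the named .so file and looks up the module's entry point by name.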

The fact that Apache already uses a module concept to extend its functionality, and internally uses a dispatch-list-based approach to link external modules into the Apache core functionality has predestined Apache for using DSO's. Starting with version 1.3 Apache began using the DSO mechanism to extend its functionality at run-time. Since then the configuration system supports two optional features for taking advantage of the modular DSO approach: compilation of the Apache core program into a DSO library for shared usage and compilation of the Apache modules into DSO files for explicit loading at run-time.

Tip

While Apache is installed by default in most Linux distributions, not all versions support dynamic modules. To see whether your version of Apache supports these, execute the command httpd -l, which lists the modules that have been compiled into Apache. If mod_so.c appears in the list of modules, then your Apache server can make use of dynamic modules.

APache eXtenSion (APXS) support tool

The APXS is a new support tool from Apache 1.3 onwards which can be used to build an Apache module as a DSO outside the Apache source-tree. It knows the platform dependent build parameters for making DSO files and provides an easy way to run the build commands with them.

Encrypted webservers: SSL

Apache can be extended to support the Secure Sockets Layer (SSL) for secure online communication. SSL is a protocol layer which may be placed between a reliable connection-oriented transport protocol (e.g., TCP) and the application protocol layer (e.g., HTTP). SSL provides secure communication between client and server by allowing mutual authentication, the use of digital signatures for integrity and encryption for privacy. There are currently two versions of SSL still in use: versions 2 and 3. Additionally, there is the successor to SSL, TLS (version 1, which is based on SSL), designed by the IETF.

Public key cryptography

SSL uses Public Key Cryptography (PKC), also known as asymmetric cryptography. Public key cryptography is used in situations where the sender and receiver do not share a common secret, e.g., between browsers and web-servers, but wish to establish a trusted channel for their communication.

PKC defines an algorithm which uses two keys, each of which may be used to encrypt a message. If one key is used to encrypt a message, then the other must be used to decrypt it. This makes it possible to receive secure messages by simply publishing one key (the public key) and keeping the other secret (the private key). Anyone may encrypt a message using the public key, but only the owner of the private key will be able to read it. For example, Joan may send private messages to the owner of a key-pair (e.g., your web-server), by encrypting the messages using the public key your server publishes. Only the server will be able to decrypt it.
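
The two-key principle can be illustrated with a toy version of RSA, one well-known public key algorithm. This is only a sketch for building intuition: the primes and the message value below are arbitrary small numbers chosen for this example, the function names are made up, and real implementations use very large keys plus padding. Never use code like this to protect real data.

```python
# Toy key pair built from two tiny primes (illustrative values only).
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (requires Python 3.8+)

public_key = (e, n)          # this half is published
private_key = (d, n)         # this half is kept secret


def apply_key(number, key):
    """Transform a number with one half of the key pair."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


message = 65                 # a message encoded as a number smaller than n

# Anyone can encrypt with the public key ...
ciphertext = apply_key(message, public_key)
# ... but only the holder of the private key can reverse it.
recovered = apply_key(ciphertext, private_key)

# The keys also work the other way around, which is the basis of
# digital signatures: transform with the private key, check with the public.
signature = apply_key(message, private_key)
checked = apply_key(signature, public_key)

print(recovered == message, checked == message)
```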

A secure web-server (e.g., Apache/SSL) uses HTTP over SSL, using port 443 by default (this can be configured in httpd.conf). Within the browser, this is signified by the use of the https scheme in the URL. The public key is exchanged during the set-up of the communication between server and client (browser). That public key is part of a certificate, which carries the digital signature of a so-called CA (Certificate Authority). The browser contains a number of so-called root certificates, which can be used to verify the signatures of the CAs that signed the certificate.

Various Apache and SSL related projects

A number of solutions are available to enable Apache to use SSL on Linux:

  • commercially licensed: Raven, Stronghold

  • Apache with SSLeay or OpenSSL, also known as Apache-SSL

  • mod_ssl

It can be quite confusing to find out the difference between SSLeay, OpenSSL, Apache/SSL and mod_ssl. The relation between these components is as follows.

Eric A. Young and Tim J. Hudson created SSLeay - a library containing encryption functions. Another team, led by Ralf Engelschall and Ben Laurie, used this library as a starting point to create a complementary set of cryptography software named OpenSSL. A team led by Ben Laurie combined OpenSSL with the Apache webserver to create Apache-SSL. In 1998, Ralf Engelschall and his team derived mod_ssl from Apache-SSL.

mod_ssl is not a replacement for Apache-SSL - it is an alternative. It is a matter of personal choice as to which you run. mod_ssl is what is known as a split - i.e., it was originally derived from Apache-SSL, but has been extensively redeveloped. Many people find it very easy to install.

There are a number of commercial products available: Red Hat's Secure Web Server (which is based on mod_ssl), Covalent's Raven SSL Module (also based on mod_ssl) and C2Net's product Stronghold (based on a different evolution branch named Sioux up to Stronghold 2.x and based on mod_ssl since Stronghold 3.x).

Apache with mod_ssl

To use mod_ssl you will need to acquire and install Apache, patch it, and install and configure the module. You will also need to acquire and install OpenSSL, generate a key-pair, and either sign the public part of it yourself, thus creating a certificate, or have it signed by a commercial Certificate Authority (CA).

The mod_ssl package consists of the SSL module itself and, surprisingly, a set of patches for Apache itself. This may puzzle you at first: why do we need to patch Apache to install the mod_ssl module? Well, the standard API that Apache uses for its modules is unable to communicate with the SSL module. Therefore, the source patches add the Extended API (EAPI). In other words: you can only use the mod_ssl module when Apache's core code contains the Extended API. When building mod_ssl, the Apache source tree is automatically altered for you, adding the Extended API.

After installation of the software you will need to configure Apache to use mod_ssl. Some additional directives should be used to configure the secure server, for example the location of the key-files. It is beyond the scope of this book to document these directives; you can find them in the mod_ssl documentation and on the mod_ssl web-site.

mod_ssl can be used to authenticate clients by using client certificates. These client certificates can be signed by your own CA; mod_ssl will validate the certificates against this CA. To enable this functionality, use the SSLVerifyClient directive. It accepts the values none, optional, require and optional_no_ca. With none, which is the default, client certificate verification is turned off; with require, the client must present a valid certificate.

Monitoring Apache load and performance

Apache is a generic web-server and has been designed to be correct first and fast second. (Even so, its performance is quite satisfactory.) Most sites have less than 10 Mbit/s of outgoing bandwidth, which Apache can fill using only a low-end Pentium-based computer. In practice, sites with more bandwidth require more than one machine to fill the bandwidth due to other constraints (such as CGI or database transaction overhead). For these reasons, the development focus has stayed on correctness and configurability.

The single biggest hardware issue affecting web-server performance is RAM. A web-server should never have to swap: swapping increases the latency of each request beyond the point that users consider fast enough. This causes users to hit stop and reload, further increasing the load. You can, and should, control the MaxClients setting so that your server does not spawn so many children that it starts swapping.

An Open Source system that can be used to periodically load-test pages of web-servers is Cricket. Cricket can be easily set up to record page-load times, and it has a web-based grapher that will generate charts to display the data in several formats. It is based on RRDtool, whose ancestor is MRTG (short for Multi-Router Traffic Grapher). RRDtool (Round Robin Data Tool) is a package that collects data in round robin databases; each data file is fixed in size so that running Cricket does not slowly fill up your disks. The database tables are sized when created and do not grow larger over time. As the data ages, it is averaged.

Apache access_log file

The access_log gives a generic overview of the access to your web-server. The format of the access log is highly configurable. The format is specified using a format string that looks much like a C-style printf format string. A typical configuration for the access log might look like the following.

LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common
      

This defines the nickname common and associates it with a particular log format string. The above configuration will write log entries in a format known as the Common Log Format (CLF). This standard format can be produced by many different web servers and read by many log analysis programs. The log file entries produced in CLF will look something like this:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
      

and contain the following fields:

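  1. IP address of the client (%h)

  2. RFC 1413 identity determined by identd (%l); a hyphen means the information is not available

  3. userid of the person requesting the document, as determined by HTTP authentication (%u)

  4. time the request was received (%t)

  5. request line sent by the client (%r)

  6. status code the server sent to the client (%>s)

  7. size of the object returned to the client in bytes, not including the response headers (%b).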
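
To make the mapping between format string and log entry concrete, the sample entry above can be taken apart with a small script. This is a minimal sketch and not a tool that ships with Apache: the regular expression and the names CLF_PATTERN and parse_clf are made up for this example and only cover the Common Log Format shown here.

```python
import re

# One named group per CLF field: %h %l %u %t "%r" %>s %b
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Return a dict with the seven CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # %b logs "-" instead of 0 when no body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
entry = parse_clf(sample)
print(entry["host"], entry["user"], entry["status"], entry["size"])
```

Log analysis programs work along the same lines, which is why sticking to a standard format such as CLF pays off.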

Restricting client user access

Apache is aware of two methods of access control:

discretionary access control (DAC)

checks the validity of the credentials given by the user, e.g. username/password (you can change these at your discretion);

mandatory access controls (MAC)

validate aspects that the user cannot control, for example, your DNA sequence, fingerprint or retinal patterns.

In Web terms in general, and Apache terms in particular, discretionary controls are based on usernames and passwords, and mandatory controls are based on things such as the IP address of the requesting client.

Apache uses modules to authenticate and authorise users. Generally, modules let you store the valid credential information in one format or another. The mod_auth module, for instance, looks in normal text files for the username and password info, and mod_auth_dbm looks in a DBM database for it.

Below is a list of the security-related modules that are included as part of the standard Apache distribution.

mod_auth

This is the basis for most Apache security modules; it uses ordinary text files for the authentication database.

mod_access

This is the only module in the standard Apache distribution which applies mandatory controls. It allows you to list hosts, domains, and/or IP addresses or networks that are permitted or denied access to documents.

mod_auth_anon

This module mimics the behaviour of anonymous FTP - rather than having a database of valid credentials, it recognizes a list of valid usernames (i.e., the way an FTP server recognizes ftp and anonymous) and grants access to any of those with virtually any password. This module is more useful for logging access to resources and keeping robots out than it is for actual access control.

mod_auth_dbm

Like mod_auth_db, save that credentials are stored in a DBM file.

mod_auth_db

This module is essentially the same as mod_auth, except that the authentication credentials are stored in a Berkeley DB file format. The directives contain the additional letters DB (e.g., AuthDBUserFile).

mod_auth_digest

Whereas the other discretionary control modules supplied with Apache all support Basic authentication, mod_auth_digest is currently the sole supporter of the Digest mechanism. It underwent some serious revamping in 1999. Like mod_auth, the credentials used by this module are stored in a text file. Digest database files are managed with the htdigest tool. Setting up mod_auth_digest is more involved than setting up Basic authentication; please refer to the module documentation for details.

Configuring authentication modules

The security modules are told which authentication databases to use via directives in the Apache configuration files, such as AuthUserFile or AuthDBMGroupFile. An alternate approach is the use of .htaccess files, which can be placed in the directories to be protected.

The first approach.  The resource being protected is determined by the placement of the directives in the configuration files; in this example:

<Directory /home/johnson/public_html>
  <Files foo.bar>
    AuthName "Foo for Thought"
    AuthType Basic
    AuthUserFile /home/johnson/foo.htpasswd
    Require valid-user
  </Files>
</Directory>
      

The resource being protected is any file named foo.bar in the /home/johnson/public_html directory or any subdirectory thereof. Likewise, the identification of which users are authorized to access foo.bar is stated by the directives -- in this case, any user with valid credentials in the /home/johnson/foo.htpasswd file can access the file.

The other approach.  Create an .htaccess file. The server behaviour can be configured on a directory-by-directory basis by using .htaccess files in directories accessed by the server (provided you configured this in the httpd.conf file using the AllowOverride option).

Note

By using an .htaccess file you can also restrict or grant access for specific users to documents within a directory or a subdirectory thereof. Documents to be restricted should be placed in directories separate from those you want unrestricted.

The authentication part of an .htaccess file contains two sections.

The first section of .htaccess can contain lines to determine which authentication type to use. It can also contain the name of the password file to be used, or the group file to be used, e.g.:

AuthUserFile {path to passwd file}
AuthGroupFile {path to group file}
AuthName {title for dialog box}
AuthType Basic
     

The second section of .htaccess defines the access rights for the current directory. The following ensures that only user {username} can access the current directory:

<Limit GET>
  require user {username} 
</Limit>
     

The Limit section can contain other directives, that allow access from only certain IP addresses or only for users who are part of a certain set of users, a group.

As an example of the usage of both access control types (DAC and MAC), the following would permit any client on the local network (IP addresses 10.*.*.*) to access the foo.html page without hindrance, but require a username and password for anyone else:

<Files foo.html>
  Order Deny,Allow
  Deny from All
  Allow from 10.0.0.0/255.0.0.0
  AuthName "Insiders Only"
  AuthType Basic
  AuthUserFile /usr/local/web/apache/.htpasswd-foo
  Require valid-user
  Satisfy Any
</Files>
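
The effect of the Satisfy directive in this example can be spelled out as a small decision function. This is a sketch of the logic only, not of Apache's implementation: the function names are made up, the credential check is reduced to a boolean, and the network is the 10.0.0.0/255.0.0.0 range from the example.

```python
import ipaddress

# The network named in the "Allow from" line of the example.
LOCAL_NETWORK = ipaddress.ip_network("10.0.0.0/255.0.0.0")


def host_allowed(client_ip):
    """Mandatory control: is the client inside the allowed network?"""
    return ipaddress.ip_address(client_ip) in LOCAL_NETWORK


def access_granted(client_ip, has_valid_credentials, satisfy="any"):
    """Combine the host check and the user check the way Satisfy does.

    satisfy="any": passing either check is enough.
    satisfy="all": both checks must pass.
    """
    host_ok = host_allowed(client_ip)
    if satisfy == "any":
        return host_ok or has_valid_credentials
    return host_ok and has_valid_credentials


print(access_granted("10.1.2.3", False))      # local client, no password
print(access_granted("192.168.1.5", False))   # outside client, no password
print(access_granted("192.168.1.5", True))    # outside client with password
```

With Satisfy All instead of Satisfy Any, even clients on the local network would have to supply a username and password.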
   

User files

The mod_auth module uses plain text user files. Its entries are of the form username:password; additional fields may follow the password, separated from it by a colon, but these are ignored. The password field should be encrypted. To create and update the flat-files used to store usernames and passwords for basic authentication of HTTP users, you can use the command htpasswd. This program can only be used when the usernames are stored in a flat-file. htpasswd encrypts passwords using either a version of MD5 modified for Apache, or the system's crypt() routine. It is permissible to have some user records using MD5-encrypted passwords, while others in the same file may have passwords encrypted with crypt(). To manage a DBM database (as used by mod_auth_dbm) you may use dbmmanage. For other types of user files/databases, please consult the documentation that comes with the chosen module.

Group files

It is possible to deny or allow access to a group of users. This is done by creating a group file. It contains a list of groups and members. The group names are completely arbitrary. The usernames should occur in the password file. The format looks like this:

{group1}:{username1} {username2} etc... 
{group2}:{username1} {username3} etc... 
     

In this example, group1 and group2 are different group names and username1, username2, and username3 are usernames from the password file. Be sure to put each group entry on its own line.
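
The format is simple enough to read with a few lines of script, which can be handy for checking a group file before pointing Apache at it. This is a minimal sketch: the function names and the sample users alice, bob and carol are made up for this illustration.

```python
def parse_group_file(text):
    """Map each group name to its list of member usernames."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip empty lines
        name, _, members = line.partition(":")
        groups[name.strip()] = members.split()
    return groups


def groups_of(username, groups):
    """Return the sorted names of all groups that list the user."""
    return sorted(name for name, members in groups.items()
                  if username in members)


sample = "group1: alice bob\ngroup2: alice carol\n"
groups = parse_group_file(sample)
print(groups_of("alice", groups))
print(groups_of("carol", groups))
```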

Note

A username can be in more than one group entry. This simply means that the user is a member of both groups.

The last step is to make sure that the read permissions for the group file have been set for everyone (i.e., owner, group and other). The file is referenced by an AuthGroupFile directive in either the master configuration file or the .htaccess file.

Example 8.1. Restricting access to members of a group

To ensure that only users of the group mygroup, as defined in file /etc/httpd/groups/groupfile, can access a directory, you would use the following directives in .htaccess:

AuthGroupFile /etc/httpd/groups/groupfile
<Limit GET>
  require group mygroup
</Limit>
     


Configuring mod_perl

mod_perl is another module for Apache. You can configure it either at compile-time, or as a DSO. With mod_perl it is possible to write Apache modules entirely in Perl, letting you easily do things that are more difficult or impossible in CGI programs. In addition, the persistent Perl interpreter embedded in the module saves the overhead of starting an external interpreter. Another important feature is code-caching: modules and scripts are loaded and compiled only once, and for the rest of the server's life they are served from the cache. Thus, the server spends its time only running already loaded and compiled code, which is very fast.

You have full access to the inner workings of the web server and can intervene at any stage of request-processing. This allows for customized processing of (to name just a few of the phases) URI->filename translation, authentication, response generation and logging. There is very little run-time overhead.

The standard Common Gateway Interface (CGI) within Apache can be replaced entirely with Perl code that handles the response generation phase of request processing. mod_perl includes two general purpose modules for this purpose: Apache::Registry, which can transparently run existing perl CGI scripts and Apache::PerlRun, which does a similar job, but allows you to run dirtier (to some extent) scripts.

You can configure your httpd server and handlers in Perl (using PerlSetVar, and <Perl> sections). You can also define your own configuration directives.

There are many ways to install mod_perl, e.g. as a DSO, either using APXS or not, from source or from RPMs. Most of the possible scenarios can be found in the mod_perl Guide (PerlRef01). As an example we describe a scenario for building Apache and mod_perl from source code.

You should have the Apache source code, the source code for mod_perl and have unpacked these in the same directory [9]. You'll need a recent version of perl installed on your system. To build the module, in most cases, these commands will suffice:

$ cd ${the-name-of-the-directory-with-the-sources-for-the-module}
$ perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
                DO_HTTPD=1 USE_APACI=1 EVERYTHING=1
$ make && make test && make install
     

After building the module, you should also build the Apache server. This can be done using the commands:

$ cd ${the-name-of-the-directory-with-the-sources-for-Apache}
$ make install
       

All that's left then is to add a few configuration lines to httpd.conf (the Apache configuration file) and start the server. Which lines you should add depends on the specific type of installation, but usually a few LoadModule and AddModule lines suffice.

As an example, these are the lines you would need to add to use mod_perl as a DSO:

LoadModule perl_module modules/libperl.so
AddModule mod_perl.c
PerlModule Apache::Registry 

Alias /perl/ /home/httpd/perl/ 
<Location /perl>
  SetHandler perl-script 
  PerlHandler Apache::Registry 
  Options +ExecCGI
  PerlSendHeader On 
</Location>

The first two lines will add the mod_perl module when Apache starts. At start-up, the PerlModule directive ensures that the named Perl module is read in too. This usually is a Perl package file ending in .pm. The Alias keyword reroutes requests for URIs in the form http://www.host.com/perl/file.pl to the directory /home/httpd/perl. Next, we define settings for that location. With SetHandler, all requests for a file under /perl are passed to mod_perl's perl-script handler, and the PerlHandler line names Apache::Registry as the Perl module that generates the response. The next line simply allows execution of CGI scripts in the specified location. Any script requested through a URI of the form http://www.host.com/perl/file.pl will now be compiled once and cached in memory. The memory image will be refreshed by recompiling the Perl routine whenever its source is updated on disk.

Configuring mod_php support

PHP is a server-side, cross-platform, HTML embedded scripting language. PHP started as a quick Perl hack written by Rasmus Lerdorf in late 1994. Over the next two to three years, it evolved into what we today know as PHP/FI 2.0. PHP/FI started to get a lot of users, but things didn't start flying until Zeev Suraski and Andi Gutmans suddenly came along with a new parser in the summer of 1997, leading to PHP 3.0. PHP 3.0 defined the syntax and semantics used in both versions 3 and 4.

PHP can be called from the CGI interface, but the most common approach is to configure PHP in the Apache web server as a (dynamic DSO) module. To do this, you can either use pre-built modules extracted from RPMs or roll your own from the source code[10]. You need to configure the make-process first. To tell configure to build the module as a DSO, you need to tell it to use APXS:

./configure --with-apxs
     

.. or, in case you want to specify the location for the apxs binary:

./configure --with-apxs={path-to-apxs}/apxs
     

Next, you can compile PHP by running the make command. Once all the source files are successfully compiled, install PHP by using the make install command.

Before Apache can use PHP, it has to know about the PHP module and when to use it. The apxs program took care of telling Apache about the PHP module, so all that is left to do is tell Apache about .php files. File types are controlled in the httpd.conf file, and it usually includes lines about PHP that are commented out. You want to search for these lines and uncomment them:

AddType application/x-httpd-php .php
     

.. and restart Apache by issuing the apachectl restart command.

To test whether it actually works, create the following page:

<HTML>
<HEAD><TITLE>PHP Test </TITLE></HEAD>
<BODY>
<?php phpinfo(); ?>
</BODY>
</HTML>
      

Notice that PHP commands are contained by <?php and ?> tags. Save the file as test.php in Apache's htdocs directory and aim your browser at http://localhost/test.php. A page should appear with the PHP logo and additional information about your PHP configuration.

Configuring Apache server options

The httpd.conf file contains a number of sections that allow you to configure the behavior of the Apache server. A number of keywords/sections are listed below.

MaxKeepAliveRequests

The maximum number of requests to allow during a persistent connection. Set to 0 to allow an unlimited number.

StartServers

The number of servers to start initially.

MinSpareServers, MaxSpareServers

Used for server-pool size regulation. Rather than making you guess how many server processes you need, Apache dynamically adapts to the load it sees. That is, it tries to maintain enough server processes to handle the current load, plus a few spare servers to handle transient load spikes (e.g., multiple simultaneous requests from a single Netscape browser). It does this by periodically checking how many servers are waiting for a request. If there are fewer than MinSpareServers, it creates a new spare. If there are more than MaxSpareServers, some of the spares die off.

MaxClients

Limit on total number of servers running, i.e., limit on the number of clients that can simultaneously connect. If this limit is ever reached, clients will be locked out, so it should not be set too low. It is intended mainly as a brake to keep a runaway server from taking the system with it as it spirals down.
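
The interplay of MinSpareServers, MaxSpareServers and MaxClients can be sketched as a small simulation. This is a deliberately simplified model, not Apache's actual algorithm (the real server, for instance, adjusts the rate at which it creates children). The function names and the default values used below are assumptions chosen for this illustration.

```python
def regulate(total, busy, min_spare, max_spare, max_clients):
    """Return the number of server processes after one periodic check."""
    idle = total - busy
    if idle < min_spare and total < max_clients:
        return total + 1          # too few spares: create one more
    if idle > max_spare:
        return total - 1          # too many spares: let one die off
    return total


def settle(total, busy, min_spare=5, max_spare=10, max_clients=150):
    """Repeat the check under a constant load until the pool stops changing."""
    while True:
        new_total = regulate(total, busy, min_spare, max_spare, max_clients)
        if new_total == total:
            return total
        total = new_total


print(settle(total=5, busy=4))      # grows until 5 spares are idle
print(settle(total=40, busy=3))     # shrinks until only 10 spares are idle
print(settle(total=148, busy=148))  # growth stops at the MaxClients limit
```

The last call shows why MaxClients acts as a brake: once the limit is reached, no further processes are created, however many requests are waiting.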

Note

In most Red Hat derivatives the Apache configuration is split into two subdirectories. The main configuration file httpd.conf is located in /etc/httpd/conf. The configuration of Apache modules is located in /etc/httpd/conf.d. Files in that directory with the suffix .conf are added to the Apache configuration during startup.



[9] The mod_perl module can be obtained at perl.apache.org, the source code for Apache at www.apache.org

[10] The source code for PHP4 can be obtained at www.php.net

Copyright Snow B.V. The Netherlands