Key Knowledge Areas:
The following is a partial list of used files, terms and utilities:
Using the tools and knowledge presented in the previous two chapters, it should be possible to diagnose the usage of resources for specific components or processes. One of the tools mentioned was sar, which is capable of recording measurements over a longer period of time. Being able to use the recorded data for trend analysis is one of the advantages of using a tool that is able to log measurements.
One of the tools that can be used to monitor an IT infrastructure is collectd. collectd is a daemon that periodically collects performance statistics about the system it runs on and provides mechanisms to store the values in a variety of ways. Those statistics can then be used to find current performance bottlenecks (performance analysis) and to predict future system load (capacity planning).
Key differentiators for collectd are:
It's written in C for performance and portability, allowing it to run on systems without a scripting language or cron daemon, such as embedded systems
It includes optimizations and features to handle hundreds of thousands of data sets
It comes with over 90 plugins which range from standard cases to very specialized and advanced topics
It provides powerful networking features and is extensible in numerous ways
By observing and analyzing measurement data over time, it should be possible to predict the statistical growth of resource needs. We deliberately say statistical growth here, because many circumstances can influence resource needs. The demand for fax machines and phone lines has suffered from the introduction of e-mail, for instance. Numerical or statistical growth estimates also suffer from a lack of linearity: when capacity is expanded due to increased demand, the expansion often involves a variety of services, and the demand for these services usually does not grow at the same speed. This means that measurement data should not just be analyzed, but also evaluated to judge its relevance.
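The simplest form of such a growth estimate is a straight-line fit through recorded measurements. The following is a minimal sketch, not a recommended forecasting method: it assumes linear growth, which, as noted above, often does not hold in practice. The function names, the sample values, and the 200 GB capacity are hypothetical and chosen only for illustration.

```python
from statistics import mean


def linear_trend(samples):
    """Least-squares fit y = slope * x + intercept over (x, y) pairs."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in samples) / sxx
    return slope, y_bar - slope * x_bar


def periods_until(samples, capacity):
    """Periods after the last sample until the fitted line reaches capacity.

    Returns None when the fitted trend is flat or shrinking.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    return (capacity - intercept) / slope - samples[-1][0]


# Hypothetical data: (month, disk usage in GB), e.g. exported from recorded logs.
disk_usage = [(0, 100), (1, 110), (2, 120), (3, 130)]

if __name__ == "__main__":
    slope, intercept = linear_trend(disk_usage)
    print(f"growth per month: {slope:.1f} GB")
    print(f"months until 200 GB: {periods_until(disk_usage, 200):.1f}")
```

A result like this is only a starting point. Whether the trend is relevant still has to be judged against what is known about the services behind the numbers.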
Future needs can be predicted with the following steps:
Decide what to measure.
Use the appropriate tools to measure and record relevant data to meet your goals.
Analyze the measurement results, starting with the biggest fluctuations.
Predict future needs based on the analysis.
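To illustrate the analysis step, the sketch below ranks recorded metrics by how strongly they fluctuate, so that the most volatile ones can be examined first. It uses the coefficient of variation (standard deviation divided by mean) as the measure of fluctuation. That choice is an assumption made here, not something the steps above prescribe, and the metric names and values are hypothetical.

```python
from statistics import mean, pstdev


def rank_by_fluctuation(series):
    """Return (name, cv) pairs sorted by coefficient of variation, largest first.

    series maps a metric name to a list of recorded values.
    """
    ranked = []
    for name, values in series.items():
        avg = mean(values)
        # Avoid dividing by zero for metrics that are constantly zero.
        cv = pstdev(values) / avg if avg else 0.0
        ranked.append((name, cv))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical recorded utilization percentages.
measurements = {
    "disk_percent": [40, 40, 40, 40],
    "cpu_percent": [20, 80, 20, 80],
    "mem_percent": [50, 52, 50, 52],
}

if __name__ == "__main__":
    for name, cv in rank_by_fluctuation(measurements):
        print(f"{name}: {cv:.3f}")
```

Dividing by the mean makes metrics with different units or scales comparable. A plain standard deviation would favor whichever metric happens to have the largest numbers.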
A resource is exhausted when it can no longer serve requests in an orderly fashion. Demand and delivery are no longer aligned, and the availability of the resource becomes a problem. Resource exhaustion can lead to a denial of service. Apart from disrupting the availability of a resource, exhaustion can also be used to trick devices that are configured to 'fail open'. For example, some switches fall back to forwarding all traffic to all ports when their MAC address table is flooded.
Most of the time, a single exhausted resource can be identified from the collected measurement data. This is what we call a bottleneck: a single point within the system that narrows throughput and slows down everything downstream. It is important to keep the bigger picture in mind here. Simply resolving a specific bottleneck will only shift the problem: if you increase the capacity of one component, another component will become the limiting factor as soon as it hits its capacity limit.
Therefore it is important to identify as many bottlenecks as possible right away during analysis.
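The way a bottleneck shifts can be shown with a toy model in which every request passes through all components, so that overall throughput is limited by the component with the lowest capacity. This is a simplified sketch under that assumption. The component names and capacities (in requests per second) are hypothetical.

```python
def bottleneck(capacities):
    """Return (name, capacity) of the component that limits throughput."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


def bottleneck_order(capacities):
    """List components in the order in which they would become the limit."""
    return sorted(capacities, key=capacities.get)


# Hypothetical capacities in requests per second.
system = {"network": 1000, "disk": 150, "cpu": 400, "database": 300}

if __name__ == "__main__":
    print(bottleneck(system))        # ('disk', 150)

    # Upgrading only the disk moves the limit to the next-weakest component.
    upgraded = dict(system, disk=600)
    print(bottleneck(upgraded))      # ('database', 300)

    # Listing all components by capacity shows the upcoming bottlenecks at once.
    print(bottleneck_order(system))  # ['disk', 'database', 'cpu', 'network']
```

Quadrupling the disk capacity in this model only doubles throughput, from 150 to 300, because the database becomes the new limit. This is why the whole ordering matters more than the first entry alone.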