SQUID and MRTG: to SNMP or not SNMP?

If you can't measure it, it isn't science.
-William Thomson, Lord Kelvin

Why monitor Squid with MRTG?

After all, there are a number of scripts that analyze Squid logs and tell you what has been happening through the day. And real-time analysis has a strong disadvantage: it involves either analyzing information on the cache server (which places a further burden on a busy machine) or sending a lot of information to another machine, which places a considerable burden on busy network interfaces.

Certainly, tools that analyze logs after the fact are invaluable to anybody who has to manage a cache. However, these tools mostly present averages or means - they do not present a dynamic picture. In other words, if half your users are eating steak, and the other half are eating cabbages, a log analysis tool might make you think they are all eating sarma.

MRTG samples data every five minutes to present a dynamic view. This view can be used not only for end of the day analysis, but for active monitoring that can sometimes avert disaster.

An additional benefit of MRTG is that it can present the data it has gathered not only in a daily graph - one that shows activity in the last 36 hours - but also in weekly, monthly and yearly graphs, thus making it much easier to see long-term trends.

Example of a MRTG information page

Cache hit rate by volume

This is the toplevel cache for Slovenia, serving both for per-user HTTP access and as ICP parent for several smaller caches.

 The hit rate is calculated for the last 5 minutes (green) and the last hour (blue). Hit rate is calculated by volume.


The statistics were last updated Thursday, 18 February 1999 at 23:36

`Daily' Graph (5 Minute Average)

             Max       Average   Current
5 minute     74.0 %    32.0 %    45.0 %
60 minute    59.0 %    34.0 %    37.0 %

`Weekly' Graph (30 Minute Average)

             Max       Average   Current
5 minute     54.0 %    27.0 %    35.0 %
60 minute    60.0 %    29.0 %    36.0 %

`Monthly' Graph (2 Hour Average)

             Max       Average   Current
5 minute     49.0 %    26.0 %    35.0 %
60 minute    58.0 %    28.0 %    35.0 %

`Yearly' Graph (1 Day Average)

             Max       Average   Current
5 minute     46.0 %    28.0 %    26.0 %
60 minute    47.0 %    28.0 %    27.0 %

GREEN ### %
BLUE ### %


2.5.2alpha-1998/02/06 Tobias Oetiker <oetiker@ee.ethz.ch> and Dave Rand <dlr@bungi.com>

There are several important details in the above page. One can see how the hit rate changes throughout the day - it is higher during peak working hours and falls off toward zero as the cache becomes more idle. One can also see how the cache stopped serving pages at 21:30, when the log files filled the disk.

The yearly graph shows the declining hit rate, brought on, perhaps, by the growth of the internet, and also the improvement brought by Squid 2.0.

MRTG was primarily meant to monitor traffic through routers, which is why it best supports receiving data through SNMP.

MRTG (version 2) is not the ideal tool: its storage model can only hold integers, and when monitoring a large number of variables (more than a few tens) it puts a significant load on the machine. There are several ways of reducing MRTG's load: the packages Orca, Cricket, and the UseRRDTool patch. All three use RRDTool (where RRD stands for Round Robin Database), but only the UseRRDTool patch uses standard MRTG configuration files. However, even with UseRRDTool you need to install a special CGI program which will generate your graphs on the fly.

Basics of SNMP

SNMP is short for Simple Network Management Protocol. It supports not only acquisition of data from network devices, but also their management. However, there don't appear to be any plans to make Squid manageable through SNMP. Squid also does not use SNMP TRAP to send notifications of catastrophic or other events.

SNMP is normally transported through UDP packets. That makes it a light protocol - there is no connection setup/teardown expense. The disadvantage is that on a busy network, packets may be lost and higher level software has to handle timeouts and retransmissions.
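To make the cost of UDP concrete, here is a minimal sketch of the timeout and retransmission loop that higher level software ends up carrying. The function name, the retry count and the timeout are my own illustrative choices, and the payload is treated as opaque bytes; building a real SNMP request is left to an SNMP library.

```python
import socket


def request_with_retries(sock, payload, addr, retries=3, timeout=2.0):
    """Send payload over UDP and wait for a reply, retransmitting on timeout.

    Returns the reply bytes and the number of the attempt that succeeded.
    """
    sock.settimeout(timeout)
    for attempt in range(1, retries + 1):
        sock.sendto(payload, addr)
        try:
            reply, _sender = sock.recvfrom(65535)
            return reply, attempt
        except socket.timeout:
            # The request or the reply was lost; send the request again.
            continue
    raise TimeoutError(f"no reply from {addr} after {retries} attempts")
```

A lost packet costs a full timeout before the retry, which is why a busy network shows up as slow or missing samples rather than as an error.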

When data is prepared for presentation over SNMP, it is organised into a MIB (management information base) tree. The nodes of the tree are the available items of information. Each node can be labeled with a word or with a number, and the full name of an SNMP variable is the sequence of nodes that are traversed between the root of the MIB and the node.

For instance, the Squid MIB can be addressed by its root, which is 1.3.6.1.4.1.3495.1. In symbolic names, that is iso.org.dod.internet.private.enterprises.nlanr.squid. Symbolic names are used only when interacting with people - the packet only contains the numeric MIB label.
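The correspondence between the two forms of the name can be written out node by node. This is only an illustration of the naming scheme; the list and the two helper functions are names I made up for the example.

```python
# Numeric label and symbolic name of each node on the path from the root
# of the MIB tree to the Squid MIB, as given in the text above.
SQUID_MIB_PATH = [
    (1, "iso"),
    (3, "org"),
    (6, "dod"),
    (1, "internet"),
    (4, "private"),
    (1, "enterprises"),
    (3495, "nlanr"),
    (1, "squid"),
]


def numeric_oid(path):
    """The form that travels in the packet."""
    return ".".join(str(number) for number, _name in path)


def symbolic_oid(path):
    """The form shown to people."""
    return ".".join(name for _number, name in path)


print(numeric_oid(SQUID_MIB_PATH))
print(symbolic_oid(SQUID_MIB_PATH))
```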

SNMP has some drawbacks: There is very little concern for security in basic SNMP. Instead of a cryptographic handshake we have come to expect from the more modern protocols, SNMP uses a simple password scheme called "community string". Only request packets with the appropriate community string are accepted by the monitored device.

Needless to say, such security is broken the moment someone connects a "sniffer" to your network. SNMPv2 and SNMPv3 provide better security, but are still not widely used. Therefore, most router jockeys prefer to rely on IP controls to prevent unauthorized monitoring of their routers. Squid also makes IP restrictions for SNMP access possible. Also, Squid from version 2.2 on will log failed SNMP attempts to cache.log.

Squid data available over SNMP

The squid documentation, including the FAQ, which has a nice SNMP entry, does not list the variables that Squid makes available through SNMP.

The only reference to available SNMP variables is mib.txt, which the install places in squid's /etc directory. This is not the most readable document imaginable. The first few entries have nice descriptions. Later, however, descriptions aren't provided. Here is an example:

        cacheProtoAggregateStats OBJECT IDENTIFIER ::= { cacheProtoStats 1 }


        cacheClientHttpRequests  OBJECT-TYPE
                SYNTAX Counter32
                MAX-ACCESS read-only
                STATUS current
        ::= { cacheProtoAggregateStats 1 }

        cacheHttpHits OBJECT-TYPE
                SYNTAX Counter32
                MAX-ACCESS read-only
                STATUS current
        ::= { cacheProtoAggregateStats 2 }

Parsing the MIB file by hand is dull work and doesn't tell us what units are used. Sometimes the only way to find out is to start graphing the data and then compare it with the same data from another source.
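Some of the dull work can be handed to a script. The following is a rough sketch that lists the OBJECT-TYPE entries in a MIB excerpt like the one above; it is a regular expression hack, not an ASN.1 parser, and it only knows the layout shown in the example. It still cannot tell you the units.

```python
import re

MIB_EXCERPT = """
        cacheClientHttpRequests  OBJECT-TYPE
                SYNTAX Counter32
                MAX-ACCESS read-only
                STATUS current
        ::= { cacheProtoAggregateStats 1 }

        cacheHttpHits OBJECT-TYPE
                SYNTAX Counter32
                MAX-ACCESS read-only
                STATUS current
        ::= { cacheProtoAggregateStats 2 }
"""

OBJECT_TYPE = re.compile(
    r"(\w+)\s+OBJECT-TYPE\s+"          # variable name
    r"SYNTAX\s+(\w+)"                  # data type
    r".*?"                             # remaining clauses, skipped
    r"::=\s*\{\s*(\w+)\s+(\d+)\s*\}",  # parent node and index under it
    re.DOTALL,
)


def list_objects(mib_text):
    """Return one dict per OBJECT-TYPE entry found in mib_text."""
    return [
        {"name": name, "syntax": syntax, "parent": parent, "index": int(index)}
        for name, syntax, parent, index in OBJECT_TYPE.findall(mib_text)
    ]


for entry in list_objects(MIB_EXCERPT):
    print(entry["parent"], entry["index"], entry["name"], entry["syntax"])
```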

MRTG can now parse Squid's mib.txt (located in src/mib.txt) for you, so you only need to add the LoadMIBs: /path/to/mib.txt directive to your MRTG configuration file, and from then on you can use symbolic names for the variables you want to display with MRTG. Here is a short list of some of the available variables. (The list was taken from Andreas Pabst's contribution to MRTG, but the comments are mine.)

All in all, there have been some small changes in the data available from Squid over SNMP: some variables that didn't work in 2.1 work in 2.2. There don't seem to be any significant differences between 2.2stable5 and 2.3stable1. You can get SOME idea of what is going on in your cache just by using SNMP, and it is the easiest method to set up. But if you are really curious about what is happening, it is not enough.

Squid data NOT available over SNMP

In the effort to get a standard MIB defined that could be used for monitoring different caches, regardless of design, a number of interesting variables were left out of the Squid MIB.

Such things as number of ftpget processes are understandably missing, since ftpget is no longer external in squid 2. The object and volume hit rate can be calculated with some effort.
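The "some effort" for the object hit rate is mostly bookkeeping: take two samples of the hit counter and the request counter (cacheHttpHits and cacheClientHttpRequests in the MIB excerpt earlier), subtract, and divide. A minimal sketch follows, assuming both are Counter32 values that wrap at 2^32; the function names are mine. The volume hit rate would work the same way on byte counters, which I do not name here.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current):
    """Difference between two Counter32 samples, allowing for one wrap."""
    return (current - previous) % COUNTER32_MODULUS


def object_hit_rate(prev_hits, cur_hits, prev_requests, cur_requests):
    """Percentage of requests in the interval that were hits.

    Returned as an integer, since MRTG version 2 can only store integers.
    """
    requests = counter_delta(prev_requests, cur_requests)
    if requests == 0:
        return 0  # idle interval: nothing to divide by
    hits = counter_delta(prev_hits, cur_hits)
    return round(100 * hits / requests)
```

For example, 642 new hits out of 1000 new requests between two samples gives `object_hit_rate(1000, 1642, 5000, 6000)`, which is 64.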

But there is still plenty of data available through other means that is not available through the SNMP interface. This is probably intentional: if everything that is available through cachemgr pages were to be forced into the cache MIB, nobody would want to implement it.

Getting data by parsing cachemgr reports

Cachemgr is just a formatting tool for pages which can be requested from a special URL available from a running Squid process.

The human readable pages are very informative:

Squid Object Cache: Version 2.1.PATCH2

 Start Time:   Wed, 17 Feb 1999 21:09:26 GMT
 Current Time: Fri, 19 Feb 1999 01:32:06 GMT


Connection information for squid:
        Number of clients accessing cache:      777
        Number of HTTP requests received:       312425
        Number of ICP messages received:        243757
        Number of ICP messages sent:    244696
        Number of queued ICP replies:   0
        Request failure ratio:   0.00%
        HTTP requests per minute:       183.5
        ICP messages per minute:        286.9
        Select loop called: 12194937 times, 8.377 ms avg
Cache information for squid:
        Request Hit Ratios:     5min: 64.2%, 60min: 46.2%
        Byte Hit Ratios:        5min: 35.9%, 60min: 27.9%
        Storage Swap size:      15917088 KB
        Storage Mem size:       44212 KB
        Storage LRU Expiration Age:      13.46 days
        Mean Object Size:       15.95 KB
        Requests given to unlinkd:      0
Median Service Times (seconds)  5 min    60 min:
        HTTP Requests (All):   0.05331  0.28853
        Cache Misses:          0.68577  0.58309
        Cache Hits:            0.01955  0.02899
        Near Hits:             0.37825  0.32154
        Not-Modified Replies:  0.01235  0.01847
        DNS Lookups:           0.06083  0.18639
        ICP Queries:           0.01494  0.01586
Resource usage for squid:
        UP Time:        102160.743 seconds
        CPU Time:       14389.014 seconds
        CPU Usage:      14.08%
        CPU Usage, 5 minute avg:        4.40%
        CPU Usage, 60 minute avg:       6.06%
        Maximum Resident Size: 0 KB
        Page faults with physical i/o: 148003
Memory accounted for:
        Total accounted:       124632 KB
File descriptor usage for squid:
        Maximum number of file descriptors:   4096
        Largest file desc currently in use:    120
        Number of file desc currently in use:   50
        Available number of file descriptors: 4046
        Files queued for open:                   0
        Reserved number of file descriptors:   100
        Disk files open:                         4
Internal Data Structures:
        998195 StoreEntries
          9155 StoreEntries with MemObjects
          9152 Hot Object Cache Items
        997915 Filemap bits set
        997911 on-disk objects

It turns out that cachemgr only does very minor reformatting. The page can be requested by any client, and even in its raw state it has sufficient syntactic fluff. Thus a small program equipped with appropriate regular expressions can reliably extract data.
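To show how little is needed, here is a minimal sketch that pulls a few values out of a fragment of the page above. The pattern names and the `extract` function are made up for the example; this is not the tool described in this article.

```python
import re

PAGE = """\
Cache information for squid:
        Request Hit Ratios:     5min: 64.2%, 60min: 46.2%
        Byte Hit Ratios:        5min: 35.9%, 60min: 27.9%
        Storage Swap size:      15917088 KB
"""

# The label text in front of each number is the "syntactic fluff"
# that makes the match unambiguous.
PATTERNS = {
    "request_hit_5min": r"Request Hit Ratios:\s+5min:\s+([\d.]+)%",
    "byte_hit_5min": r"Byte Hit Ratios:\s+5min:\s+([\d.]+)%",
    "byte_hit_60min": r"Byte Hit Ratios:\s+5min:\s+[\d.]+%,\s+60min:\s+([\d.]+)%",
    "swap_kb": r"Storage Swap size:\s+(\d+) KB",
}


def extract(page, patterns):
    """Apply every pattern to the same page text; skip patterns that miss."""
    values = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, page)
        if match:
            values[name] = float(match.group(1))
    return values


print(extract(PAGE, PATTERNS))
```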

I wrote a small tool which requests a page, and then caches it so that various regular expressions can extract data from it. Once all the data has been collected, it is sent as a UDP packet to an MRTG machine.
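The sending half of such a tool can be very small. The sketch below is an illustration only: the 'name=value' wire format, the function names, and the host and port arguments are all invented for the example, and something on the MRTG machine would have to listen for the datagram and understand it.

```python
import socket


def encode_report(values):
    """Encode name/value pairs as 'name=value' lines (a made-up format).

    Values are rounded to integers, since MRTG version 2 stores integers.
    """
    lines = [f"{name}={round(value)}" for name, value in sorted(values.items())]
    return "\n".join(lines).encode("ascii")


def send_report(values, host, port):
    """Send all collected values in one datagram; no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_report(values), (host, port))
```

Sending everything in one packet keeps the network cost of a sample small, at the price of losing the whole sample if that packet is dropped.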

Since writing regular expressions soon became tedious, I wrote another small tool, which requests a page and then turns it into a bunch of regular expressions. Simply pick out the right one, and paste it into the first tool.
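The idea behind that second tool can be sketched in a few lines: keep the text of each line as a literal, and replace every number with a capture group. This is a reconstruction of the idea under my own assumptions, not the original program, and the function names are hypothetical.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")
CAPTURE = r"(\d+(?:\.\d+)?)"


def escape_literal(text):
    """Escape literal text, letting any run of whitespace match any other."""
    pieces = re.split(r"(\s+)", text)
    return "".join(r"\s+" if piece.isspace() else re.escape(piece)
                   for piece in pieces)


def line_to_regex(line):
    """Turn one report line into a regex that captures each number in it."""
    parts = []
    position = 0
    for match in NUMBER.finditer(line):
        parts.append(escape_literal(line[position:match.start()]))
        parts.append(CAPTURE)
        position = match.end()
    parts.append(escape_literal(line[position:]))
    return "".join(parts)


def page_to_regexes(page):
    """One regex per line that contains at least one number."""
    return [line_to_regex(line.strip()) for line in page.splitlines()
            if NUMBER.search(line)]
```

Note that digits inside labels are captured too: in a line such as "5min: 64.2%" the 5 becomes a group of its own, so picking out the right group is still a job for a human.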

Getting data by parsing logs

There are some data that cannot be gleaned even from Squid's comprehensive cachemgr pages. Such variables include the server load average and the total and swapped-in size of the process. Also, watching Squid with an external process can sometimes provide data about what is happening even when Squid has stopped responding. Truth be told, however, most of the data that could once only be obtained through parsing logs is now available through cachemgr pages or even through SNMP.
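As an illustration of the external-process approach, here is a minimal sketch that reports the server load average. It assumes a Unix host (os.getloadavg() does not exist elsewhere) and assumes the values are handed to MRTG through its external-program interface, which reads four lines: two values, an uptime string and a target name. The function names and the "unknown" uptime placeholder are mine.

```python
import os


def mrtg_lines(value_a, value_b, uptime_text, target_name):
    """Four-line report: two integer values, an uptime string, a name."""
    return "\n".join([str(int(value_a)), str(int(value_b)),
                      uptime_text, target_name])


def load_average_report(name="cache host"):
    # os.getloadavg() returns the 1, 5 and 15 minute load averages.
    one_minute, five_minute, _fifteen_minute = os.getloadavg()
    # Scale by 100 so two decimal places survive MRTG's integer storage.
    return mrtg_lines(round(one_minute * 100), round(five_minute * 100),
                      "unknown", name)


if __name__ == "__main__":
    print(load_average_report())
```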

Other measurements

Jens Voeckler has done some remarkable things while measuring Germany's network of academic caches.

Also be sure to check out another Jens from Germany for some very nice web and cache analysis tools.

Please send questions, ideas, accolades :-) to Matija Grabnar

Last updated: Fri Feb 19 03:52:09 CET 1999