
Squid Configuration Basics


For Squid, the default configuration file is probably right for 90% of installations - once you have
Squid running, you should change the configuration file one option at a time. Don't get over-
ambitious in your changes quite yet! Leave things like refresh rules until you have experimented
with the basic options - what port you want to accept your requests on, what user to run as, and
where to keep cached pages on your drives.
So that you can get Squid running, this chapter works through the basic Squid options, giving you
background information and introducing you to some of the basic concepts. In later chapters you'll
move on to more advanced topics.
The Squid config file is not arranged in the same order as this book. The config file also does not
progress from basic to advanced config options in any specific order, but instead consists of related
sections, with all hierarchy settings in a specific section of the file, all access controls in another and
so forth.
To make changes detailed in this chapter you are going to have to skip around in the config file a
bit. It's probably easiest to simply search for the options discussed in each subsection of this
chapter, but if you have some time it will be best if you read through the config file, so that you
have an idea of how sections fit together.
The chapter also points out options that may have to be changed on the other 10% of machines. If
you have a firewall, for example, you will almost certainly have to configure Squid differently from
someone who doesn't.

Version Control Systems


I recommend that you put all Squid configuration files and startup scripts under revision control. If
you are like me, you love to play with new software. You change an option, get the program to re-
read the configuration file, and see what difference it makes. By repeating this process, I learn what
each option does, and at the same time I gain experience, and discover why the program is written
the way it is. Quite often configuration files make no sense until you discover the overall structure
of the underlying program.
The best way for you to understand each of the options in the Squid config file (and to understand
Squid itself) is to experiment with the multitude of options. At some point during this
experimentation, you will find that you break something. It's useful to be able to revert to a previous
version (or simply to be reminded of what changes you have made).
Many readers will already have used a Revision Control System. The RCS system is included with
many Unix systems, and source is freely available. For the few that haven't used RCS, however, it's
worth including some pointers to some manual pages:
ci(1)
co(1)
rcs(1)
rcsdiff(1)
rlog(1)

One of the wonders of Unix is the ability to create scripts which reduce the number of commands
that you have to type to get something done. I have a short script on all the machines I maintain
called rvi. Using rvi instead of vi allows me to use one command to edit files under RCS (as
opposed to the customary four). Put this file somewhere in your path and make it executable
(chmod +x rvi). You can then simply use a command like rvi squid.conf to edit files that are under
revision control. This is a lot quicker than running each of the co, rcsdiff and ci commands.
#!/bin/sh
# Check out and lock the file, edit it, show what changed, then
# check it back in (keeping a read-only working copy).
co -l "$1"
${VISUAL:-vi} "$1"
rcsdiff -u "$1"
ci -u "$1"

The Configuration File


All Squid configuration files are kept in the directory /usr/local/squid/etc. Though there is more
than one file in this directory, only one file is important to most administrators, the squid.conf file.
Though there are (as of this writing) 125 option tags in this file, you should only need to change
eight options to get Squid up and running. The other 117 options give you amazing flexibility, but
you can learn about them once you have Squid running, by playing with the options or by reading
the descriptions in chapter 10.
Squid assumes that you wish to use the default value if there is no occurrence of a tag in the
squid.conf file. Theoretically, you could even run Squid with a zero length configuration file.
The remainder of this chapter works through the options that you may need to change to get Squid
to run. Most people will not need to change all of these settings. You will need to change at least
one part of the configuration file though: the default squid.conf denies access to all browsers. If you
don't change this, Squid will not be very useful!

Setting Squid's HTTP Port


The first option in the squid.conf file sets the HTTP port(s) that Squid will listen to for incoming
requests.
Network services listen on particular ports. Ports below 1024 can only be used by the system
administrator, and are used by programs that provide basic Internet services: SMTP, POP, DNS and
HTTP (web). Ports above 1024 are used for untrusted services (where a service does not run as
administrator), and for transient connections, such as outgoing data requests.
Typically, web servers listen for incoming web requests (using the HyperText Transfer Protocol -
HTTP) on port 80.
Squid's default HTTP port is 3128. Many people run their cache servers on a port which is easier to
remember: something like 80 or 8080. If you choose a low-numbered port, you will have to start
Squid as root (otherwise you are considered untrusted, and you will not be able to start Squid).
Many ISPs use port 8080, making it an accepted pseudo-standard.
If you wish, you can use multiple ports by appending a second port number to the http_port variable.
Here is an example:
http_port 3128 8080
It is very important to refer to your cache server by a generic DNS name. Simply because you
only have one server now does not mean that you should not plan for the future. It is a good idea to
set up a DNS hostname for your proxy server. Do this right away! A simple DNS entry can save
many hours further down the line. Configuring client machines to access the cache server by IP
address is asking for a long, painful transition down the road. Generally people add a hostname like
cache.mydomain.com to the DNS. Other people prefer the name proxy, and create a name like
proxy.mydomain.com.
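As a sketch (the names here are placeholders), the DNS entry can be as simple as an alias in your
zone file pointing at the machine's existing name:

; in the mydomain.com zone file (hypothetical hostnames)
cache   IN  CNAME   server1.mydomain.com.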

Using Port 80


HTTP defines the format of both the request for information and the format of the server response.
The basic aspects of the protocol are quite straight forward: a client (such as your browser) connects
to port 80 and asks for the file by supplying the full path and filename that it wishes to download.
The client also specifies the version of the HTTP protocol it wishes to use for the retrieval.
With a proxy request the format is only a little different. The client specifies the whole URL instead
of just the path to the file. The proxy server then connects to the web server specified in the URL,
and sends a normal HTTP request for the page.
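For example, here is the essence of both request forms, using HTTP/1.0 and a placeholder host. A
client talking directly to www.example.com on port 80 sends just the path:

GET /index.html HTTP/1.0

The same request sent to a proxy carries the whole URL:

GET http://www.example.com/index.html HTTP/1.0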
Since the format of proxy requests is so similar to a normal HTTP request, it is not especially
surprising that many web servers can function as proxy servers too. Changing a web server program
to function as a proxy normally involves comparatively small changes to the code, especially if the
code is written in a modular manner - as is the Apache web server. In many cases the resulting
server is not as fast, or as configurable, as a dedicated cache server can be.
The CERN web server httpd was the first widely available web proxy server. The whole WWW
system was initially created to give people easy access to CERN data, and CERN HTTPD was thus
the de-facto test-bed for new additions to the initial informal HTTP specification. Most (and
certainly at one stage all) of the early web sites ran the CERN server. Many system administrators
who wanted a proxy server simply used their standard CERN web server (listening on port 80) as
their proxy server, since it could function as one. It is easy for the web server to distinguish a proxy
request from a normal web page request, since it simply has to check whether a full URL is given
instead of simply a path name. Given the choice (even today) many system administrators would
choose port 80 as their proxy server port, simply because 'port 80 is the standard port for web requests'.
There are, however, good reasons for you to choose a port other than 80.
Running both services on the same port meant that if the system administrator wanted to install a
different web server package (for extra features available in the new software) they would be
limited to software that could perform both as a web server and as a proxy. Similarly, if the same
sysadmin found that their web server's low-end proxy module could not handle the load of their
ever-expanding local client base, they would be restricted to a proxy server that could function as a
web server. The only alternative is to re-configure all the clients, which normally involves
spending a few days apologizing to users and helping them through the steps involved in changing
over.
Microsoft used their web server (IIS) as a basis for their proxy server component, and Microsoft
Proxy thus only accepts incoming proxy requests on port 80. If you are installing a Squid system
to replace either CERN, Apache or IIS running in both web-server and cache-server modes on the
same port, you will have to set http_port to 80. Squid is written only as a high-performance proxy
server, so there is no way for it to function as a web server: Squid has no support for reading files
from a local disk, running CGI scripts and so forth. There is, however, a workaround.
If you have both services running on the same port, and you cannot change your client PCs, do not
despair. Squid can accept requests in web-server format and forward them to another server. If you
have only one machine, and you can get your web server software to accept incoming requests on a
non-default port (for example 81), Squid can be configured to forward incoming web requests to
that port. This is called accelerator mode (since its initial purpose was to speed up very slow web
servers). Squid effectively does some translation on the original request, and then simply acts as if
the request were a proxy request and connects to the host: the fact that it's not a remote host is
irrelevant. Accelerator mode is discussed in more detail in chapter 9. Until then, get Squid installed
and running on another port, and work your way through the first couple of chapters of this book,
until you have a working pilot-phase system. Once Squid is stable and tested you can move on to
changing web server settings. If you feel adventurous, however, you can skip ahead right away!
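As a taste of what is to come, a rough sketch using the Squid 2.x accelerator tags (and assuming
the web server has been moved to port 81 on the same machine) looks like this; read chapter 9
before relying on it:

http_port 80
httpd_accel_host localhost
httpd_accel_port 81
httpd_accel_with_proxy on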

Where to Store Cached Data


Cached data has to be kept somewhere. In the section on hardware sizing, we discussed the size
and number of drives to use for caching. Squid cannot autodetect where to store this data, though,
so you need to let Squid know which directories it can use for data storage.
The cache_dir operator in the squid.conf file is used to configure specific storage areas. If you use
more than one disk for cached data, you may need more than one mount point (for example
/usr/local/squid/cache1 for the first disk, /usr/local/squid/cache2 for the second). Squid allows you
to have more than one cache_dir option in your config file.
Let's consider only one cache_dir entry in the meantime. Here I am using the default values from
the standard squid.conf.
cache_dir ufs /usr/local/squid/var/cache/ 100 16 256

The first option to the cache_dir tag sets the directory where data will be stored. The default is
simply the install prefix with the cache directory name appended (here, /usr/local/squid/var/cache).
This directory is also created by the make install command that we used earlier.
The next option to cache_dir is straightforward: it's a size value. Squid will store up to that amount
of data in that directory. The value is in megabytes; the default is 100 megabytes.
The other two options are more complex: they set the number of subdirectories (first and second
tier) to create in this directory. Squid makes lots of directories and stores a few files in each of them
in an attempt to speed up disk access (finding the correct entry in a directory with one million files
in it is not efficient: it's better to split the files up into lots of smaller sets of files... don't worry too
much about this for the moment). I suggest that you use the default values for these options in the
meantime: if you have a very large cache store you may want to increase these values, but this is
covered in a later section.
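As a sketch, the two-disk layout mentioned above (with a made-up size of 1000 megabytes per
disk) would simply use two cache_dir lines:

cache_dir ufs /usr/local/squid/cache1 1000 16 256
cache_dir ufs /usr/local/squid/cache2 1000 16 256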

Email for the Cache Administrator


If Squid dies, an email is sent to the address specified with the cache_mgr tag. This address is also
appended to the end of error pages returned to users if, for example, the remote machine is
unreachable.
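For example (a placeholder address; use one that reaches a real person):

cache_mgr squid-admin@mydomain.example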

Effective User and Group ID


Squid can only bind to low numbered ports (such as port 80) if it is started as root. Squid is
normally started by your system's rc scripts when the machine boots. Since these scripts run as root,
Squid is started as root at bootup time.
Once Squid has been started, however, there is no need to run it as root. Good security practice is to
run programs as root only when it's absolutely necessary, and for this reason Squid changes user and
group ID's once it has bound to the incoming network port.
The cache_effective_user and cache_effective_group tags tell Squid what IDs to change to. The
Unix security system would be useless if it allowed all users to change their IDs at will, so Squid
only attempts to change IDs if the main program is started as root.
If you do not have root access to the machine, and are thus not starting Squid as root, you can
simply leave this option commented out. Squid will then run with whatever user ID starts the actual
Squid binary.
As discussed in chapter 2, this book assumes that you have created both a squid user and a squid
group on your cache machine. The above tags should thus both be set to "squid".
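Assuming you created those accounts, the relevant lines read:

cache_effective_user squid
cache_effective_group squid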

FTP login information


Squid can act as a proxy server for various Internet protocols. The most commonly used protocol is
HTTP, but the File Transfer Protocol (FTP) is still alive and well.
FTP was written for authenticated file transfer (it requires a username and password). To provide
public access, a special account is created: the anonymous user. When you log into an FTP server
you use this as your username. As a password you generally use your email address. Most browsers
these days automatically enter a useless email address.
It's polite to give an address that works, though: if one of your users abuses a site, it allows the site
admin to get hold of you easily.
Squid allows you to set the email address that is used with the ftp_user tag. You should probably
create a squid@yourdomain.example email address specifically for people to contact you on.
There is another reason to enter a proper address here: some servers require a real email address.
For your proxy to log into these ftp servers you will have to enter a real email address here.
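For example, using the address suggested above:

ftp_user squid@yourdomain.example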

Access Control Lists and Access Control Operators

Squid could not be used in an ISP environment without a sophisticated access control system.
Indeed, Squid should not be used in ANY environment without some kind of basic authentication
system. It is amazing how fast other Internet users will find out that they can relay requests through
your cache, and then proceed to do so.
Why? Sometimes to obfuscate their real identity, and other times since they have a fast line to you,
but a slow line to the remainder of the Internet.

Simple Access Control


In many cases only the most basic level of access control is needed. If you have a small network,
and do not wish to use things like user/password authentication or blocking by destination domain,
you may find that this small section is sufficient for all your access control setup. If not, you should
read chapter 7, where access control is discussed in detail.
The simplest way of restricting access is to only allow IPs that are on your network. If you wish to
implement different access control, it's suggested that you put this in place later, after Squid is
running. In the meantime, set it up, but only allow access from your PC's IP address.
Example access control entries are included in the default squid.conf. The included entries should
help you avoid some of the more obscure problems, such as bandwidth-chewing loops, cache
tunneling with SSL CONNECTs and other strange access problems. In chapter 7 we work through
the config file's default config options, since some of them are pretty complex.
Access control is done on a per-protocol basis: when Squid accepts an HTTP request, the list of
HTTP controls is checked. Similarly, when an ICP request is accepted, the ICP list is checked
before a reply is sent.
Assume that you have a list of IP addresses that are to have access to your cache. If you want them
to be able to access your cache with both HTTP and ICP, you would have to enter the list of IP
addresses twice: you would have lines something like this:
acl localnet src 192.168.1.0/255.255.255.0
..
http_access allow localnet
icp_access allow localnet

Rule sets like the above are great for small organisations: they are straightforward. Note that as
http_access and icp_access rules are processed in the order they appear in the file, you will need to
place the http_access and icp_access entries as is appropriate.
For large organizations, though, things are more convenient if you can create classes of users. You
can then allow or deny classes of users in more complex relationships. Let's look at an example like
this, where we duplicate the above example with classes of users:
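A minimal sketch of the idea (with placeholder networks) defines the classes once and reuses them
for both protocols:

acl office1 src 192.168.1.0/255.255.255.0
acl office2 src 192.168.2.0/255.255.255.0
http_access allow office1
http_access allow office2
icp_access allow office1
icp_access allow office2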
Sure, it's more complex for this example. The benefits only become apparent if you have large
access lists, or when you want to integrate refresh-times (which control how long objects are kept)
and the sources of incoming requests. I am getting quite far ahead of myself, though, so let's skip
back.
We need some terminology to discuss access control lists, otherwise this could become a rather long
chapter. So: lines beginning with acl are (appropriately, I believe) acl lines. The lines that use these
acls (such as http_access and icp_access in the above example) are called acl-operators. An acl-
operator can either allow or deny a request.
So, to recap: acls are used to define classes. When Squid accepts a request it checks the list of acl-
operators specific to the type of request: an HTTP request causes the http_access lines to be
checked; an ICP request checks the icp_access lists.
Acl-operators are checked in the order that they occur in the file (ie from top to bottom). The first
acl-operator line that matches causes Squid to drop out of the acl list. Squid will not check through
all acl-operators if the first denies the request.
In the previous example, we used a src acl: this checks that the source of the request is within the
given IP range. The src acl-type accepts IP address lists in many formats, though we used the
subnet/netmask in the earlier example. CIDR (Classless Inter-Domain Routing) notation can also
be used here. Here is an example of the same address range in either notation:
CIDR: 192.168.1.0/24
Subnet/Netmask (Dot Notation): 192.168.1.0/255.255.255.0

Access control lists inherit permissions when there is no matching acl. If all acl-operators in the file
are checked, and no match is found, the last acl-operator checked determines whether the request is
allowed or denied. This can be confusing, so it's normally a good idea to place a final catch-all acl-
operator at the end of the list. The simplest way to create such an operator is to create an acl that
matches any IP address. This is done with a src acl with a netmask of all 0's. When the netmask
arithmetic is done, Squid will find that any IP matches this acl.
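In squid.conf terms, the conventional catch-all acl (usually named all) and its closing operator look
like this:

acl all src 0.0.0.0/0.0.0.0
http_access deny all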
Your cache server's own address may well fall within the ranges placed in the relevant allow lists
on your cache, and if you were thus to run the client on the cache machine (as opposed to another
machine somewhere on your network) the above acl and http_access rules would allow you to test
the cache. In many cases, however, a program running on the cache server will end up connecting
to (and from) the address 127.0.0.1 (also known as localhost). Your cache should thus allow
requests to come from the address 127.0.0.1/255.255.255.255. In the below example we don't allow
ICP requests from the localhost address, since there is no reason to run two caches on the same machine.
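A sketch of such a setup:

acl localhost src 127.0.0.1/255.255.255.255
http_access allow localhost
# note: no matching icp_access allow line for localhost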
The squid.conf file that comes with Squid includes acls that deny all HTTP requests. To use your
cache, you need to explicitly allow incoming requests from the appropriate range. The squid.conf
file includes text that reads:
#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#

To allow your client machines access, you need to add rules similar to the below in this space. The
default access-control rules stop people exploiting your cache, so it's best to leave them in.
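For instance, assuming your clients sit on the 192.168.1.0/24 network (adjust to match your own):

acl mynetwork src 192.168.1.0/255.255.255.0
http_access allow mynetwork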

Ensuring Direct Access to Internal Machines


Acl-operator lines are not only used for authentication. In an earlier section we discussed
communication with other cache servers. Acl lines are used to ensure that requests for specific
URLs are handled by your cache, not passed on to another (further away) cache.
If you don't have a parent cache (such as a firewall proxy, or a parent cache at your ISP) you can
probably skip this section.
Let's assume that you connect to your ISP's cache server as a parent. A client machine (on your local
network) connects to your cache and requests http://www.yourdomain.example/. Your cache server
will look in the local cache store. If the page is not there, Squid will connect to its configured parent
(your ISP's cache: across your serial link), and request the page from there. The problem, though, is
that there is no need to connect across your internet line: the web server is sitting a few feet from
your cache in the machine room.
Squid cannot know that it's being very inefficient unless you give it a list of sites that are "near by".
This is not the only way around this problem, though: your browser could be configured to ignore
the cache for certain IPs and domains, so that the request never reaches the cache in the first place.
Browser config is covered in Chapter 5, but in the meantime here is some info on how to configure
Squid to communicate directly with internal machines.
The acl-operators always_direct and never_direct determine whether to pass the connection to a
parent or to proceed directly.
The following set of operators is based on the final configuration created in the previous
section, but uses the never_direct and always_direct operators. It is assumed that all servers that you
wish to connect to directly are in the address ranges specified with the my-iplist directives. In
some cases you may run a web server on the same machine as the cache server, and the localhost
acl is thus also considered local.
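A minimal sketch along those lines (placeholder ranges; all is the catch-all acl described earlier):

acl localhost src 127.0.0.1/255.255.255.255
acl my-iplist dst 192.168.1.0/255.255.255.0
always_direct allow localhost
always_direct allow my-iplist
never_direct allow all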
The always_direct and never_direct tags are covered in more detail in Chapter 7, where we discuss
hierarchies.

Squid always attempts to cache pages. If you have a large Intranet system, it's a waste of cache-store
disk space to cache your Intranet. Controlling which URLs and IP ranges not to cache is covered
in detail in chapter 6, using the no_cache acl-operator.
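As a quick sketch (with a hypothetical Intranet range):

acl intranet dst 192.168.1.0/255.255.255.0
no_cache deny intranet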
Communicating with other proxy servers
Squid supports the concept of a hierarchy of proxies. If your proxy does not have an object on disk,
its default action is to connect to the origin web server and retrieve the page. In a hierarchy, your
proxy can communicate with other proxies (in the hope that one of these servers will have the
relevant page). You will, obviously, only peer with servers that are 'close' to you, otherwise you
would end up slowing down access. If access to the origin server is faster than access to
neighboring cache servers it is not a good idea to get the page from the slower link!
Having the ability to treat other caches as siblings is very useful in some interactions. For example:
if you often do business with another company, and have a permanent link to their premises, you
can configure your cache to communicate with their cache. This will reduce overall latency: it's
almost certainly faster to get the page from them than from the other side of the country.
When querying more than one cache, Squid does not query each in turn, and wait for a reply from
the first before querying the second (since this would create a linear slowdown as you add more
siblings, and if the first server stops responding, you would slow down all incoming requests).
Squid thus sends all ICP queries together - without waiting for replies. Squid then puts the client's
request on hold until the first positive reply from a sibling cache is received, and will retrieve the
object from the fastest-replying cache server. Since the earliest returning reply packet is usually on
the fastest link (and from the least loaded sibling server), your server gets the page fast.
Squid will always get the page from the fastest-responding cache - be it a parent or a sibling.
The cache_peer option allows you to specify proxy servers that your server is to communicate with.
The first line of the following example configures Squid to query the cache machine
cache.myparent.example as a parent. Squid will communicate with the parent on HTTP port 3128,
and will use ICP to query the server using port 3130. Configuring Squid to query more than one
server is easy: simply add another cache_peer line. The second line configures
cache.sibling.example as a sibling, listening for HTTP requests on port 8080 and ICP queries on port
3130.
cache_peer cache.myparent.example parent 3128 3130
cache_peer cache.sibling.example sibling 8080 3130

If you do not wish to query any other caches, simply leave all cache_peer lines commented out: the
default is to talk directly to origin servers.
Cache peering and hierarchy interactions are discussed in quite some detail in this book. In some
cases hierarchy setups are the most difficult part of your cache setup process (especially in a
distributed environment like a nationwide ISP). In depth discussion of hierarchies is beyond the
scope of this chapter, so much more information is given in chapter 8. There are cases where you
need at least one hierarchy line to get Squid to work at all. This section covers the basics, just for
those setups.
You only need to read this material if one of the following scenarios applies to you:
• You have to use your Internet Service Provider's cache.
• You have a firewall.

Your ISP's cache


If you have to use your Internet Service Provider's cache, you will have to configure Squid to query
that machine as a parent. Configuring their cache as a sibling would probably return error pages for
every URL that they do not already have in their cache.
Squid will attempt to contact parent caches with ICP for each request; this is essentially a ping. If
there is no response to this query, Squid will attempt to go direct to the origin server. Since (in this
case, at least) you cannot bypass your ISP's cache, you may want to reduce the latency added by
this extra query. To do this, place the default and no-query keywords at the end of your cache_peer
line:
cache_peer cache.myisp.example parent 3128 3130 default no-query

The default option essentially tells Squid "Go through this cache for all requests. If it's down, return
an error message to the client: you cannot go direct".
The no-query option gets Squid to ignore the given ICP port (leaving the port number out will
return an error), and never to attempt to query the cache with ICP.

Firewall Interactions


Firewalls can make cache configuration hairy. Inter-cache protocols generally use packets which
firewalls inherently distrust. Most caches (Squid included) use ICP, which is a layer on top of UDP.
UDP is difficult to make secure, and firewall administrators generally disable it if at all possible.
It's suggested that you place your cache server on your DMZ (if you have one). There are a few
advantages to this:
• Your cache server is kept secure.
• The firewall can be configured to hand off requests to the cache server, assuming it is
capable.
• You will be able to peer with other, outside, caches (like your ISP's), since DMZ networks
generally have less rigid rule sets.
The remainder of this section should help you get Squid and your firewall to co-operate. A few
cases are covered for each type of firewall: the cache inside the firewall; the cache outside the
firewall; and, finally, the cache on the DMZ.

Proxying Firewalls


The vast majority of firewalls know nothing about ICP. If, on the other hand, your firewall does not
support HTTP, it's a good time to have a serious talk with the buyer who had an all-expenses-paid
weekend courtesy of the firewall supplier. Configuring the firewall to understand ICP is likely to be
painful, but HTTP should be easy.
If you are using a proxy-level firewall, your client machines are probably configured to use the
firewall's internal IP address as their proxy server. Your firewall could also be running in
transparent mode, where it automatically picks up outgoing web requests. If you have a fair number
of client machines, you may not relish the idea of reconfiguring all of them. If you fall into this
category, you may wish to put Squid on the outside (or on the DMZ) and configure the firewall to
pass requests to the cache, rather than reconfiguring all client machines.

Inside
The cache is considered a trusted host, and is protected by the firewall. You will configure client
machines to use the cache server in their browser proxy settings, and when a request is made, the
cache server will pass the outgoing request to a firewall, treating the firewall as a parent proxy
server. The firewall will then connect to the destination server. If you have a large number of clients
configured to use the firewall as their proxy server, you could get the firewall to hand-off incoming
HTTP requests back into the network, to the cache server. This is less efficient though, since the
cache will then have to re-pass these requests through the firewall to get to the outside, using the
parent option to cache_peer. Since the latter involves traffic passing through the firewall twice, your
load is very likely to increase. You should also beware of loops, with the cache server parenting to
the firewall and the firewall handing-off the cache's request back to the cache!
As described in chapter 1, Squid will also send ICP queries to parents. Firewalls don't care for UDP
packets, and normally log (and then discard) such packets.
When Squid does not receive a response from a configured parent, it will mark the parent as down,
and proceed to go directly.
Whenever Squid is setup to use a parent that does not support ICP, the cache_peer line should
include the "default" and "no-query" options. These options stop Squid from attempting to go direct
when all caches are considered down, and specify that Squid is not to send ICP requests to that
parent.
Here is an example config entry:
cache_peer inside.fw.address.domain parent 3128 3130 default no-query

Outside
There are only two major reasons for you to put your cache outside the firewall:
One: Although Squid can be configured to do authentication, this can lead to duplication of
effort (you will encounter the "add new staff to 500 servers" syndrome). If you want to continue to
authenticate users on the firewall, you will have to put your cache on the outside or on the DMZ.
The firewall will thus accept requests from clients, authenticate them, and then pass them on to the
cache server.
Two: Communicating with cache hierarchies is easy. The cache server can communicate with other
systems using any protocol. Sibling caches, for example, are difficult to contact through a proxying
firewall.
You can only place your cache outside if your firewall supports hand-offs. Browsers inside will
connect to the firewall and request a URL, and the firewall will connect to the outside cache and
request the page.
If you place your cache outside your firewall, you may find that your client PCs have problems
connecting to internal web servers (your intranet, for example, may be unreachable). The problem is
that the cache is unable to connect back through to your internal network (which is actually a good
thing: don't change that). The best thing to do here is to add exclusions to your browser settings: this
is described in Chapter 5 - you should specifically have a look at the section on browser autoconfig.
In the meantime, let's just get Squid going, and we will configure browsers once you have a cache
to talk to.
Since the cache is not protected by the firewall, it must be very carefully configured - it must only
accept requests from the firewall, and must not run any strange services. If possible, you should
disable telnet, and use something like SSH (Secure SHell) instead. The access control lists (which
you will setup shortly) must only allow the firewall, otherwise people will be able to relay their
requests through your cache, using your bandwidth.
If you place the cache outside the firewall, your client PCs will be configured to use the firewall as
their proxy server (this is probably the case already). The firewall must be configured to hand-off
client HTTP requests to the cache server. The cache must be configured to only allow HTTP
requests when from the firewall's outside IP address. If not configured this way, other Internet users
could use your cache server as a relay, using your bandwidth and hardware resources for
illegitimate (and possibly illegal) purposes.
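In squid.conf terms, with 192.0.2.1 standing in for the firewall's outside address, that restriction
looks like this:

acl firewall src 192.0.2.1/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
http_access allow firewall
http_access deny all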
With your cache server on the outside network, you should treat the machine as a completely
untrusted host, lest a cracker find a hole somewhere on the system. It is recommended that you
place the cache server on a dedicated firewall network card, or on a switched ethernet port. This
way, if your cache server were to be cracked, the cracker would only be able to read passing HTTP
data. Since the majority of sensitive information is sent via email, this would reduce the potential
for sensitive data loss.
Since your cache server only accepts requests from the firewall, there is no cache_peer line needed
in the squid.conf. If you have to talk to your ISP's cache you will, of course, need one: see the
section on this a bit further back.

DMZ
The best place for a cache is your DMZ.
If you are concerned with the security of your cache server, and want to be able to communicate
with outside cache servers (using ICP), you may want to put your cache on the DMZ.
With Squid in your DMZ, internal client PCs are setup to proxy to the firewall. The firewall is then
responsible for handing-off these HTTP requests to the cache server (so the firewall in fact treats the
cache server as a parent).
Since your cache server is (essentially) on the outside of the firewall, the cache doesn't need to treat
the firewall as a parent or sibling: it only accepts requests from the firewall, and never passes them
back to it.
If your cache is outside your firewall, you will need to configure your client PCs not to use the
firewall as a proxy server for internal hosts. This is quite easy, and is discussed in the chapter on
browser configuration.
Since the firewall is acting as a filter between your cache and the outside world, you are going to
have to open up some ports on the firewall. The cache will need to be able to connect to port 80 on
any machine on the outside world. Since some valid web servers will run on ports other than 80,
you should consider allowing connections to any port from the cache server. In short, allow
connections to:
• Port 80 (for normal HTTP requests)
• Port 443 (for HTTPS requests)
• Ports higher than 1024 (site search engines often use high-numbered ports)
If you are going to communicate with a cache server outside the firewall, you will need even more
ports opened. If you are going to communicate with ICP, you will need to allow UDP traffic from
and to your cache machine on port 3130. You may find that the cache server that you are peering
with uses different ports for reply packets. It's probably a bad idea to open all UDP traffic, though.

Packet Filtering firewalls


Squid will normally live on the inside of your packet-filtering firewall. If you have a DMZ, it may
be best to put your cache on this network, as you may want to allow UDP traffic to and from the
cache server (to communicate with other caches).
To configure your firewall correctly, you should make the minimum number of holes in your filter
set. In the remainder of this section we assume that your internal machines can connect to the cache
server unimpeded. If your cache is on the DMZ (or outside the firewall altogether) you will need to
allow TCP connections from your internal network (on a random source port) to the HTTP port that
Squid will be accepting requests on (this is the port that you set a bit earlier, in the Setting Squid's
HTTP Port section of this chapter).
First, let's consider the firewall setup when you do not query any outside caches. On accepting a
request, Squid will attempt to connect to a machine on the Internet at large. Almost always, the
destination port will be the default HTTP port, port 80. A few percent of the time, however, the
request will be destined for a high-numbered port (any port number higher than 1023 is a high-
numbered port). Squid always sources TCP requests from a high-numbered port, so you will thus
need to allow TCP requests (all HTTP is TCP-based) from a random high-numbered port to both
port 80 and any high-numbered port.
There is another low-numbered port that you will probably need to open. The HTTPS port (used for
secure Internet transactions) is normally listening on TCP port 443, so this should also be opened.
In the second situation, let's look at cache-peering. If you are planning to interact with other caches,
you will need to open a few more ports. First, let's look at ICP. As mentioned previously, ICP is
UDP-based. Almost all ICP-compliant caches listen for ICP requests on UDP port 3130. Squid will
always source requests from port 3130 too, though other ICP-compliant caches may source their
requests from a different port.
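Pulling the above together, here is a rough sketch using Linux iptables ($CACHE and $PEER are
hypothetical addresses, and the single FORWARD chain is an oversimplification; translate the same
holes into whatever filter you actually run):

# HTTP and HTTPS from the cache to anywhere
iptables -A FORWARD -p tcp -s $CACHE --dport 80 -j ACCEPT
iptables -A FORWARD -p tcp -s $CACHE --dport 443 -j ACCEPT
# web servers listening on high-numbered ports
iptables -A FORWARD -p tcp -s $CACHE --dport 1024:65535 -j ACCEPT
# return traffic for connections the cache opened
iptables -A FORWARD -d $CACHE -m state --state ESTABLISHED,RELATED -j ACCEPT
# ICP, restricted to the specific peer
iptables -A FORWARD -p udp -s $CACHE --sport 3130 -d $PEER --dport 3130 -j ACCEPT
iptables -A FORWARD -p udp -s $PEER --sport 3130 -d $CACHE --dport 3130 -j ACCEPT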
It's probably not a good idea to allow these UDP packets no matter what source address they come
from. Your filter should specify the IP addresses of each of the caches that you wish to peer with,
rather than allowing UDP packets from any source address. That should be it: you should now be
able to save the config file, and get ready to start the Squid program.

Starting Squid

Running Squid


Squid should now be configured, and the directories should have the correct permissions. We
should now be able to start Squid, and you can try to access the cache with a web browser. Squid
is normally run by starting the RunCache script. RunCache (as mentioned earlier) restarts Squid if it
dies for some reason, but at this stage we are merely testing that it will run properly: we can add it
to startup scripts at a later stage.
Programs which handle network requests (such as inetd and sendmail) normally run in the
background. They are run at startup, and log any messages to a file (instead of printing it to a screen
or terminal, as most user-level programs do.) These programs are often referred to as daemon
programs. Squid is such a program: when you run the squid binary, you should be immediately
returned to the command line. While it looks as if the program ran and did nothing, it's actually
sitting in the background waiting for incoming requests. We want to be able to see that Squid's
actually doing something useful, so we increase the debug level (using -d 1) and tell it not to
disappear into the background (using -N). If your machine is not connected to the Internet (you are
doing a trial Squid install on your home machine, for example) you should use the -D flag too, since
Squid tries to do DNS lookups for a few common domains, and dies with an error if it is not able to
resolve them.
The following output is that printed by a default install of Squid:
cache1:~ # /usr/local/squid/sbin/squid -N -d 1 -D

Squid reads the config file, and changes user IDs here:


1999/06/12 19:16:20| Starting Squid Cache version 2.2.DEVEL3 for i586-pc-linux-gnu...
1999/06/12 19:16:20| Process ID 4121

Each concurrent incoming request uses at least one file descriptor. 256 file descriptors are only
enough for a small, lightly loaded cache server; see Chapter 12 for more details. Most of the
following output is diagnostic:
1999/06/12 19:16:20| With 256 file descriptors available
1999/06/12 19:16:20| helperOpenServers: Starting 5 'dnsserver' processes
1999/06/12 19:16:20| Unlinkd pipe opened on FD 13
1999/06/12 19:16:20| Swap maxSize 10240 KB, estimated 787 objects
1999/06/12 19:16:20| Target number of buckets: 15
1999/06/12 19:16:20| Using 8192 Store buckets, replacement runs every 10 seconds
1999/06/12 19:16:20| Max Mem size: 8192 KB
1999/06/12 19:16:20| Max Swap size: 10240 KB
1999/06/12 19:16:20| Rebuilding storage in Cache Dir #0 (DIRTY)

When you connect to an ftp server without a cache, your browser chooses icons to match the files
based on their filenames. When you connect through a cache server, the browser assumes that the
page returned will be in html form, and that it will include tags to load any images so that the
directory listing looks normal. Squid adds these tags, and has a collection of icons that it refers
clients to. These icons are stored in /usr/local/squid/etc/icons/. If Squid has permission problems
here, you need to make sure that these files are owned by the appropriate users (in the previous
section we set permissions on the files in this directory.)
1999/06/12 19:16:20| Loaded Icons.

The next few lines are the most important. Once you see the Ready to serve requests line, you
should be able to start using the cache server. The HTTP port is where Squid is waiting for browser
connections, and should be the same as whatever we set it to in the previous chapter. The ICP port
should be 3130, the default, and if you have included other protocols (such as HTCP) you should
see them here. If you see permission denied errors here, it's possible that you are trying to bind to a
low-numbered port (like 80) as a normal user. Try running the startup command as root, or (if you
don't have root access on the machine) choose a high-numbered port. Another common error message at
this stage is Address already in use. This occurs when another process is already listening to the
given port. This could be because Squid is already started (perhaps you are upgrading from an older
version which is being restarted by the RunCache script) or you have some other process listening
on the same port (such as a web server.)
1999/06/12 19:16:20| Accepting HTTP connections on port 3128, FD 35.
1999/06/12 19:16:20| Accepting ICP messages on port 3130, FD 36.
1999/06/12 19:16:20| Accepting HTCP messages on port 4827, FD 37.
1999/06/12 19:16:20| Ready to serve requests.

Once Squid is up-and-running, it reads the cache-store. Since we are starting Squid for the first
time, you should see only zeros for all the numbers below:
1999/06/12 19:16:20| storeRebuildFromDirectory: DIR #0 done!
1999/06/12 19:16:25| Finished rebuilding storage disk.
1999/06/12 19:16:25| 0 Entries read from previous logfile.
1999/06/12 19:16:25| 0 Entries scanned from swap files.
1999/06/12 19:16:25| 0 Invalid entries.
1999/06/12 19:16:25| 0 With invalid flags.
1999/06/12 19:16:25| 0 Objects loaded.
1999/06/12 19:16:25| 0 Objects expired.
1999/06/12 19:16:25| 0 Objects cancelled.
1999/06/12 19:16:25| 0 Duplicate URLs purged.
1999/06/12 19:16:25| 0 Swapfile clashes avoided.
1999/06/12 19:16:25| Took 5 seconds ( 0.0 objects/sec).
1999/06/12 19:16:25| Beginning Validation Procedure
1999/06/12 19:16:26| storeLateRelease: released 0 objects
1999/06/12 19:16:27| Completed Validation Procedure
1999/06/12 19:16:27| Validated 0 Entries
1999/06/12 19:16:27| store_swap_size = 21k

Testing Squid


If all has gone well, we can begin to test the cache. True browser access is only covered in the next
chapter, and there is a whole chapter devoted to configuring your browser. Until then, testing is
done with the client program, which is included with the Squid source, and is in the
/usr/local/squid/bin directory.
The client program connects to a cache, requests a page, and prints out useful timing information.
Since client is available on all systems that Squid runs on, and has the same interface on all of them,
we use it for the initial testing.
At this stage Squid should be in the foreground, logging everything to your terminal. Since client is
a unix program, you need access to a command prompt to run it. At this stage it's probably easiest to
simply start another session (this way you can see if errors are printed in the main window).
The client program is compiled to connect to localhost on port 3128 (you can override these
defaults from the command line, see the output of client -h for more details.)
If you are running client on the cache server, and are using port 3128 for incoming requests, you
should be able to type a command like this, and the client program will retrieve the page through
the cache server:
client http://squid.nlanr.net/

If your cache is running on a different machine you will have to use the -h and -p options. The
following command will connect to the machine cache.qualica.com on port 8080 and retrieve the
above web page.
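With the -h (host) and -p (port) options, the command looks like this:

client -h cache.qualica.com -p 8080 http://squid.nlanr.net/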
The client program can also be used to access web sites directly. As you may remember from
reading Chapter 2, the protocol that clients use to access pages through a cache is part of the HTTP
specification. The client program can be used to send both "normal" and "cache" HTTP requests. To
check that your cache machine can actually connect to the outside world, it's a good idea to test
access to an outside web server.
The next example will retrieve the page at http://www.qualica.com/, and send the html contents of
the page to your terminal.
If you have a firewall between you and the internet, the request may not work, since the firewall
may require authentication (or, if it's a proxy-level firewall and is not doing transparent proxying of
the data, you may explicitly have to tell client to connect to the machine.) To test requests through
the firewall, look at the next section.
A note about the syntax of the next request: you are telling client to connect directly to the remote
site, and request the page /. With a request through a cache server, you connect to the cache (as you
would expect) and request a whole url instead of just the path to a file. In essence, both normal-
HTTP and cache-HTTP requests are identical; one just happens to refer to a whole URL, the other
to a file.
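Requested directly from the origin server, the command and its bare / path look like this:

client -h www.qualica.com -p 80 /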

Client can also print out timing information for the download of a page. In this mode, the contents
of the page aren't printed: only the timing information is. The zero in the below example indicates
that client is to retrieve the page repeatedly until interrupted (with Control-C or Break). If you want
to retrieve the page a limited number of times, simply replace the zero with a number.
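Assuming the ping mode of the bundled client (the -g option in later versions, which takes an
iteration count), the command would look roughly like this:

client -g 0 http://www.qualica.com/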

Testing a Cache or Proxy Server with Client


Now that you have client working, you can use it to send a request through the cache itself.
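Assuming Squid runs on the local machine on port 3128, a request through the cache looks like this:

client -h localhost -p 3128 http://www.qualica.com/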
If the request through the cache returned the same page as you retrieved with direct access (you
didn't receive an error message from Squid), Squid should be up and running. Congratulations! If
things aren't going so well for you, you will have received an error message here. Normally, this is
because of the acls described in the previous chapter. First, you should have a look at the terminal
where you are running Squid (Or, if you are skipping ahead and have put Squid in the background,
in the /usr/local/squid/logs/cache.log file.) If Squid encountered some sort of problem, there should
be an error or warning in this file. If there are no messages here, you should look at the
/usr/local/squid/logs/access.log file next. We haven't covered the details of this file yet, but they are
covered in the next section of this chapter. First, though, let's see if your cache can process requests
to internal servers. There are many cases where a request will work to internal servers but not to
external machines.

Testing Intranet Access


If you have a proxy-based firewall, Squid should be configured to pass outgoing requests to the
proxy running on the firewall. This quite often presents a problem when an internal client is
attempting to connect to an internal (Intranet) server, as discussed in section 2.2.5.2. To ensure that
the acl-operator lists created in section 2.2.5.2 are working, you should use client to attempt to
connect to a machine on the local network through the cache.
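For example (intranet.mydomain.example is a placeholder for one of your internal servers):

client -h localhost -p 3128 http://intranet.mydomain.example/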
If you didn't get an error message from a command like the above, access to local servers should be
working. It is possible, however, that the request is being passed from the local cache to
the parent (across a serial line), with the parent connecting back into the local network,
slowing the connection enormously. The only way to ensure that the connection is not passing
through your parent is to check the access logs, and see which server the connection is being passed
to.

Access.log basics


The access.log file logs all incoming requests. Chapter 11 covers the fields in the access.log in
detail. The most important fields are the URL (field 7) and the hierarchy access type (field 9).
Note that a "-" indicates that there is no data for that field.
The following example access.log entries indicate the changes in log output when connecting to
another server, without a cache, with a single parent, and with multiple parents.
Though fields are separated by spaces, fields can contain sub-fields, where a "/" indicates the split.
When connecting directly to a destination server, field 9 contains two subfields - the key word
"DIRECT", followed by the name of the server that it is connecting to. Access to local servers (on
your network) should always be DIRECT, even if you have a firewall, as discussed in section 3.1.2.
The acl operator always_direct controls this behaviour.
When you have configured only one parent cache, the hierarchy access type indicates this, and
includes the name of that cache.
There are many more types that can appear in the hierarchy access information field, but these are
covered in chapter 11.
Another useful field is the 'Log Tag' field, field four. In the following example this is the field
"TCP_MISS/200".
A MISS indicates that the request was not already stored in the cache (or that the page contained
headers indicating that the page was not to be cached). A HIT would indicate that the page was
already stored in the cache. In the latter case the request time for a remote page should be
substantially less than on its first occurrence in the logs.
The time that Squid took to service the request is the second field. This value is in milliseconds.
This value should approach that returned by examining a client request, but given operating system
buffering there is likely to be a discrepancy.
The fifth field is the size of the page returned to the client. Note that an aborted request can end up
downloading more than this from the origin server if the quick_abort feature set is turned on in the
Squid config file.
Here is an example request direct from the origin server:
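The entry below is a sketch with made-up timestamp, address and size values:

945812345.056   1532 192.168.1.10 TCP_MISS/200 8280 GET http://www.qualica.com/ - DIRECT/www.qualica.com text/html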
If we use client to fetch the page a short time later, a HIT is returned, and the time is reduced
hugely.
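Again a sketch; note the much smaller service time (field two) and the slightly larger size:

945812399.154     21 192.168.1.10 TCP_HIT/200 8295 GET http://www.qualica.com/ - NONE/- text/html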
Some of you will have noticed that the size of the hit has increased slightly. If you have checked the
size of a request from the origin server and compared it to that of the same page through the cache,
you will also note that the size of the returned data has increased very slightly. Extra headers are
added to pages passing through the cache, indicating which peer the page was returned from (if
applicable), age information and other information. Clients never see this information, but it can be
useful for debugging.
Since Squid 1.2 has support for HTTP/1.1, extra features can be used by clients accessing a copy of
a page that Squid already has. Certain extra headers are included into the HTTP headers returned in
HITS, indicating support for features which are not available to clients when returning MISSes. In
the above example Squid has included a header in the page indicating that range-request are
supported.
If Squid is performing correctly, you should shut Squid down and add it to your startup files.
Since Squid maintains an in-memory index of all objects in the cache, a kill -9 could cause
corruption, and should never be used. The correct way to shut down Squid is to use the command
below (the binary path is the one used earlier in this chapter):
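/usr/local/squid/sbin/squid -k shutdown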
Squid command-line options are covered in chapter 10.

Addition to Startup Files


The location of startup files varies from system to system. The location and naming scheme of these
files is beyond the scope of this book.
If you already have a local startup file, it's a pretty good idea to simply add the RunCache program
to that file. Note that you should place RunCache in the background on startup, which is normally
done by placing an '&' after the command.
The RunCache program attempts to restart Squid if it dies for some reason, and logs basic Squid
debug output both to the file "/usr/local/squid/squid.out" and to syslog.
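On many systems a line like the following in the local startup script (rc.local here; the RunCache
path may differ on your install) is sufficient:

/usr/local/squid/bin/RunCache &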

Windows
On NT-based Windows systems, Squid NT can be installed as a native service. Simply unzip it in
the root of C: and run c:\squid\sbin\squid -i. Rename and edit the files in c:\squid\etc, then run net
start squid or start Squid via services.msc. Also, make sure to create c:\squid\var\cache and run
squid -z to create the swap directories (or you might spend a long time trying to figure out the
cryptic "abnormal program termination" message like I did! :) )

Access Control and Access Control Operators




Access control lists (acls) are often the most difficult part of the configuration of a Squid cache: the
layout and concept is not immediately obvious to most people. Hang on to your hat!
Unless Chapter 4 is still fresh in your mind, you may wish to skip back and review the access
control section of that chapter before you continue. This chapter assumes that you understand the
difference between an acl and an acl-operator.

Uses of ACLs


The primary use of the acl system is to implement simple access control: to stop other people using
your cache infrastructure. (There are other uses of acls, described later in this chapter; in the
meantime we are going to discuss only the access control function of acls.) Most people implement
only very basic access control, denying access to people that are not on their network. Squid's
access system is incredibly flexible, but 99% of administrators only use the most basic elements. In
this chapter some examples of the less common uses of acls are covered: hopefully you will
discover some Squid feature which suits your organization - and which you didn't think was part of
Squid before.

Access Classes and Operators


There are two elements to access control: classes and operators. Classes are defined with the acl
squid.conf tag, while the names of the operators vary: the most common operator used is
http_access.
Let's work through the below example line-by-line. Here, a systems administrator is in the process
of installing a cache, and doesn't want other staff to access it while it's being installed, since it's
likely to ping-pong up and down during the installation. Once the administrator is happy with the
config, the whole network will be allowed access. The admin's PC is at the IP 10.0.0.3.
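Such a configuration (don't use it as-is; we will see why shortly) would look roughly like this (the
netmasks are assumptions):

acl myIP src 10.0.0.3/255.255.255.255
acl myNet src 10.0.0.0/255.255.0.0
http_access allow myIP
http_access deny myNet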
If the admin connects to the cache from the PC, Squid does the following:
• Accepts the (HTTP) connection and reads the request
• Checks the line that reads http_access allow myIP.
• Since your IP address matches the IP defined in the myIP acl, access is allowed. Remember
that Squid drops out of the operator list on the first match.
If you connect from a different PC (on the 10.0.*.* network) things are very similar:
• Accepts the connection and reads the request
• The source of the connection doesn't match the myIP acl, so the next http_access line is
checked.
• The myNet acl matches the source of the connection, so access is denied. An error page is
returned to the user instead of the requested page.
If someone reaches your cache from another netblock (from, say, 192.168.*.*), the above access list
will not block access. The reason for this is quite complicated. If Squid works through a set of acl-
operators and finds no match, it defaults to using the opposite of the last match (if the previous
operator is an allow, the default is to deny; if it's a deny, the default is to allow). This seems a bit
strange at first, but let's look at an example where this behaviour is used: it's more sensible than it
seems.
The following acl example is nice and simple: it's something a first-time cache admin could create.
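acl myNet src 10.0.0.0/255.255.0.0
http_access allow myNet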
A config file with no access lists will allow cache access without any restrictions. An administrator
using the above access lists obviously wishes to allow only his network access to the cache. Given
the Squid behavior of inverting the last decision, we have an invisible line reading
http_access deny all

Inverting the last decision is a simple (if not immediately obvious) solution to one of the most
common acl mistakes: not adding a final deny all to the end of your acl list.
With this new knowledge, have a look at the first example in this chapter: you will see why I said
not to use it in your configs. Given that the last operator denies the local network, local people will
not be able to access the cache. The remainder of the Internet, however, will! As discussed in
Chapter 4, the simplest way of creating a catch-all acl is to match requests when they come from
any IP address. When programs do netmask arithmetic a subnet of all zeros will match any IP
address. A corrected version of the first example dispenses with the myNet acl.
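acl all src 0.0.0.0/0.0.0.0
acl myIP src 10.0.0.3
http_access allow myIP
http_access deny all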
Once the cache is considered stable and is moved into production, the config would change.
http_access lines do add a very small amount of overhead, but that's not the only reason to have
simple access rulesets: the fewer rulesets, the easier your setup is to understand. The below example
includes a deny all rule although it doesn't really need one: you may know of the automatic
inversion of the last rule, but someone else working on the cache may not.
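# the production ruleset, with an explicit (if redundant) deny all
acl myNet src 10.0.0.0/255.255.0.0
acl all src 0.0.0.0/0.0.0.0
http_access allow myNet
http_access deny all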
You should always end your access lists with an explicit deny. In Squid-2.1 the default config file
does this for you when you insert your HTTP acl operators in the appropriate place.

[edit] Acl lines


The examples so far have given you an idea of an acl line's layout. Their layout can be symbolized
as follows (? Check! ?):
acl name type (string|"filename") [string2] [string3] ["filename2"]

The acl tag consists of a minimum of three fields: a unique name; an acl type and a decision string.
An acl line can have more than one decision string, hence the [string2] and [string3] in the line
above.

[edit] A unique name


This is supposed to be descriptive. Use a name such as customers or mynet. You have seen this lots
of times before: the word myNet in the above example is one such case.
There must only be one acl with a given name. If you find that you have two or more classes with
similar names, you can append a number to the name: customer1, customer2, etc. I generally avoid
this, instead putting all of the similar data into a file and including the whole file as one
acl. Check the Decision String section for some more info on this.

[edit] Type
So far we have discussed only acls that check the source IP address of the connection. This isn't
sufficient for many people: it may be useful for you to allow connections at only certain times, or to
only specific domains, or by only some users (using usernames and passwords). If you really want
to, you can even combine all of the above: only allow connections from users that have the right
password, have the right destination and are going to the right domain. There are quite a few
different acl types: the next section of this chapter discusses all of the different types in detail. In the
meantime, let's finish the description of the structure of the acl line.

[edit] Decision String


The acl code uses this string to check if the acl matches a given connection. When using this field,
Squid checks the type field of the acl line to decide how to use the decision string. The decision
string could be an IP address range, a regular expression or a list of domains or more. In the next
section (where we discuss the types of acls available) we discuss the different forms of the Decision
String.
If you have another look at the formal definition of the acl line above, you will note that you can
have more than one decision string per acl line. Strings in this format are ORed together; if you were
to specify two IP address ranges on the same line the return result of the acl would be true if either
of the IP addresses match. (If source strings were ANDed together, then an incoming request would
have to come from two IP address ranges at the same time. This is not impossible, but would almost
certainly be pointless.)
Large decision lists can be stored in files, so that your squid.conf doesn't get cluttered. Some of the
caches I have worked on have had in the region of 2000 lines of acl rules, which could lead to a
very cluttered squid.conf file. You can include a file into the decision section of an acl list by
placing the filename (with path) in double-quotes. The file simply contains the data set; one datum
per line. In the next example the file /usr/local/squid/conf/data/myNets can contain any number of
IP ranges, one range per line.
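Assuming one range per line in that file, the acl and its operator would read:
acl myNets src "/usr/local/squid/conf/data/myNets"
http_access allow myNets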
While on the topic of long lists of acls: it's important to note that you can end up slowing your
cache response with very long lists of acls. Checking acls requires CPU time, and long lists can
decrease cache performance, since instead of moving data to clients Squid is busy checking access
lists. What constitutes a long list? Don't worry about lists with a few hundred entries unless you
have a really slow or busy CPU. Lists thousands of lines long can, however, cause problems.

[edit] Types of acl


So far we have only spoken about acls that filter by source IP address. There are numerous other acl
types:
• Source/Destination IP address
• Source/Destination Domain
• Regular Expression match of requested domain
• Words in the requested URL
• Words in the source or destination domain
• Current day/time
• Destination port
• Protocol (FTP, HTTP, SSL)
• Method (HTTP GET or HTTP POST)
• Browser type
• Name (according to the Ident protocol)
• Autonomous System (AS) number
• Username/Password pair
• SNMP Community

[edit] Source/Destination IP address


In the examples earlier in this chapter you saw lines in the following format:
acl myNet src 10.0.0.0/255.255.0.0
http_access allow myNet

The above acl will match when the IP address comes from any IP address between 10.0.0.0 and
10.0.255.255. In recent years more and more people are using Classless Internet Domain Routing
(CIDR) format netmasks, like 10.0.0.0/16. Squid handles both the traditional IP/Netmask and more
recent IP/Bits notation in the src acl type. IP ranges can also be specified in a further format: one
that is Squid specific. (? I need to spend some time hacking around with these: I am not sure of the
layout ?)
acl myNet src addr1-addr2/netmask
http_access allow myNet

Squid can also match connections by destination IP. The layout is very similar: simply replace src
with dst. Here are a couple of examples:
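# hypothetical destination ranges
acl localServers dst 10.0.1.0/255.255.255.0
acl forbiddenServer dst 10.255.1.2
http_access allow localServers
http_access deny forbiddenServer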

[edit] Source/Destination Domain


Squid can also limit requests by their source domain. Though it doesn't always happen in the real
world, network administrators can add reverse DNS entries for each of the hosts on their network.
(These records are normally referred to as PTR records.) Squid can make decisions about the
validity of incoming requests by checking their reverse DNS entries. In the below example, the acl
is true if the request comes from a host with a reverse entry that is in either the qualica.com or
squid-cache.org domains.
acl myDomain srcdomain .qualica.com .squid-cache.org
http_access allow myDomain

Reverse DNS matches should not be used where security is important. A determined attacker (who
controlled the reverse DNS entries for the attacking host) would be able to manipulate these entries
so that the request appears to come from your domain. Squid doesn't attempt to check that reverse
and forward DNS entries match, so this option is not recommended.
Squid can also be configured to deny requests to specific domains. Many people implement these
filter lists for pornographic sites. The legal implications of this filtering are not covered here: there
are many, and the relevant law is in a constant state of flux, so advice here would likely be obsolete
in a very short period of time. I suggest that you consult a good lawyer if you want to do something
like this.
The dst acl type allows one to match accesses by destination domain. This could be used to match
urls for popular adult sites, and refuse access (perhaps during specific times).
If you want to deny access to a set of sites, you will need to find out these site's IP addresses, and
deny access to these IP addresses too. If you just put the URL Domain name in, someone
determined to access a specific site could find out the IP address associated with that hostname and
access it by entering the IP address in their browser.
The above is best described with an example. Here, I assume that you want to restrict access to the
site www.adomain.example. If you use either the host or nslookup commands, you would find that
this server has the IP address 10.255.1.2. It's easiest to just have two acls: one for IPs and one for
domains. If the lists get too large, you can simply place them in a file.
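acl badDomains dstdomain www.adomain.example
acl badIPs dst 10.255.1.2
http_access deny badDomains
http_access deny badIPs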

[edit] Words in the requested URL


Most caches can filter out URLs that contain a set of banned words. Regular expressions allow you
to simply check if a word is in a given URL, but they also allow for more powerful searches of the
URL. With a simple word check you would find it nearly impossible to create a rule that allows
access to sites with the word sex in the URL, but at the same time denies access to all avi files on
that site. With regular expressions this sort of checking becomes easy, once you understand the
regex syntax.

[edit] A Quick introduction to regular expressions


We haven't encountered regular expressions in this book yet. A regular expression (regex) is an
incredibly useful way of matching strings. As they are incredibly powerful they can get a little
complicated. Regexes are often used in string-oriented languages like Perl, where they make
processing of large text files (such as logs) incredibly easy. Squid uses regular expressions for
numerous things: refresh patterns and access control among them.
If you have not used regular expressions before, you might want to have a look at the O'Reilly book
on regular expressions or the appropriate section in the O'Reilly perl book. Instead of going into
detail here, I am just going to give some (hopefully) useful examples. If you have perl installed on
your machine, you could have a look at the perlre manual page to get an idea as to how the various
regex operators (such as .) function.
Regular expressions in Squid are case-sensitive by default. If you want to match both upper- and
lower-case text, you can prefix the regular expression with -i. Have a look at the next example,
where we use this to match sex, SEX (or even SeX):
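# the acl name is arbitrary
acl sexMatch url_regex -i sex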

[edit] Using Regular expressions to match words in the requested URL


Using regular expressions allows you to create more flexible access lists. So far you have only been
able to filter sites by destination domain, where you have to match the entire domain to deny access
to the site. Since regular expressions are used to match text strings, you can use them to match
words, partial words or patterns in URLs or domains.
The most common use of regex filters in ACL lists is for the creation of far-reaching site filters: if
the url or domain contain a set of banned words, access to the site is denied. If you wish to deny
access to sites that contain the word sex in the URL, you would add one acl rule, rather than trying
to find every site that has adult material on it.
The big problem with regex filters is that not all sites that contain the word sex in the URL are
pornographic. By denying these sites you are likely to be infringing people's rights, and you should
refer to a lawyer for advice on the legality of this.
Creating a list of sites that you don't want accessed can be tedious. There are companies that sell
adult/unwanted material lists which plug into Squid, but these can be expensive. If you cannot
justify the cost, you can always build and maintain a simpler list of your own.
The url_regex acl type is used to match any word in the URL. Here is an example (the acl name is arbitrary):
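acl badWords url_regex -i sex
http_access deny badWords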
In places where bandwidth is very expensive, system administrators may have no problem with
people visiting pornograpic sites. They may, however, want to stop people downloading huge avi
files from these sites. The following example would deny downloads of avi files from sites that
contain the word sex in the URL. The regular expression below matches any URL that contains the
word sex AND ends with .avi.
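# hypothetical acl name; matches "sex" followed later by ".avi" at the end of the URL
acl sexAvi url_regex -i sex.*\.avi$
http_access deny sexAvi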
The urlpath_regex acl strips off the url-type and hostname, checking instead only the path and
filename.
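A variant of the above using this type might read:
# matches .avi at the end of the path, regardless of the hostname
acl aviFiles urlpath_regex -i \.avi$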

[edit] Words in the source or destination domain


Regular expressions can also be used for checking the source and destination domains of a request.
The srcdom_regex tag is used to check that a request comes from a specific subdomain, while the
dstdom_regex checks the domain part of the requested URL. (You could check the requested
domain with a url_regex tag, but you could run into interesting problems with sites that refer to
pages with urls like http://www.company.example/www.anothersite.example.)
Here is an example acl set that uses a regular expression (rather than using the srcdomain and
dstdomain tags). This example allows you to deny access to .com or .net sites if the request is from
the .za domain. This could be useful if you are providing a "public peering" infrastructure to other
caches in your geographical region. Note that this example is only a fragment of a complete acl set:
you would presumably want your customers to be able to access any site, and there is no final deny
acl.
acl bad_dst_TLD dstdom_regex \.com$ \.net$
acl good_src_TLD srcdom_regex \.za$
# allow requests FROM the za domain UNLESS they want to go to \.com or \.net
http_access deny good_src_TLD bad_dst_TLD
http_access allow good_src_TLD

[edit] Current day/time


Squid allows one to allow access to specific sites by time. Often businesses wish to filter out
irrelevant sites during work hours. The Squid time acl type allows you to filter by the current day
and time. By combining the dstdomain and time acls you can allow access to specific sites (such as
the sites of suppliers or other associates) during work hours, but allow access to other sites
after work hours.
The layout is quite compact:
acl name time [day-list] [start_hour:minute-end_hour:minute]

Day list is a list of single characters indicating the days that the acl applies to. Using the first letter
of the day would be ambiguous (since, for example, both Tuesday and Thursday start with the same
letter). When the first letter is ambiguous, the second letter is used: T stands for Tuesday, H for
Thursday. Here is a list of the days with their single-letter abreviations:
S - Sunday M - Monday T - Tuesday W - Wednesday H - Thursday F - Friday A - Saturday
Start_hour and end_hour are times written in 24-hour ("military") time (17:00 instead of 5:00).
End_hour must always be larger than start_hour. Unfortunately, this means that you can't simply
write:
acl darkness time 17:00-6:00 # won't work

You have to specify two separate ranges:


acl night time 17:00-24:00
acl early_morning time 00:00-6:00

As you can see from the original definition of the time acl, you can specify the day of the week
(with no time), the time (with no day), or both the time and day (?check!?). You can, for example,
create a rule that specifies weekends without specifying that the day starts at midnight and ends at
the following midnight. The following acl will match on either Saturday or Sunday.
acl weekends time SA

The following example is too basic for real-world use. Unfortunately, creating a good example
requires some of the more advanced features of the http_access line; these are covered (with
examples) in the next section of this chapter.
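# too simple for real life: allows everything during working hours, relying on
# the implicit inversion of the last rule to deny access after hours
acl workhours time MTWHF 08:00-17:00
http_access allow workhours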

[edit] Destination Port


Because of the design of the HTTP protocol, people can connect to things like IRC servers through
your cache servers, even though the two protocols are very different. The same mechanism can be
used to tunnel telnet connections through your cache server. The part of HTTP that allows this is the
CONNECT method, mainly used for securing https connections with SSL.
Since you generally don't want to proxy anything other than the standard supported protocols, you
can restrict the ports that your cache is willing to connect to. Web servers almost always listen for
incoming requests on port 80. Some servers (notably site-specific search engines and unofficial
sites) listen on other ports, such as 8080. Other services (such as IRC) also use high-numbered
ports. The default Squid config file limits standard HTTP requests to the port ranges defined in the
Safe_ports squid.conf acl. SSL CONNECT requests are even more limited, allowing connections to
only ports 443 and 563. However, keep in mind that these port assignments are only a convention
and nothing prevents people from hosting (on machines they control) any type of server on any port
they choose.
Port ranges are limited with the port acl type. If you look in the default squid.conf, you will see
lines like:
acl SSL_ports port 443 563
acl Safe_ports port 80 21 443 563 70 210 1025-65535

The format is pretty straightforward: a destination port of 443 or 563 is matched by the first acl,
while 80, 21, 443, etc. by the second line. The most complicated section of the examples above is
the end of the line: the text that reads "1025-65535".
The "-" character is used in squid to specify a range. The example thus matches any port from 1025
all the way up to 65535. These ranges are inclusive, so the second line matches ports 1025 and
65535 too.
The only low-numbered ports which Squid should need to connect to are 80 (the HTTP port), 21
(the FTP port), 70 (the Gopher port), 210 (wais) and the appropriate SSL ports. All other low-
numbered ports (where common services like telnet run) do not fall into the 1025-65535 range, and
are thus denied.
The following http_access line denies access to URLs that are not in the correct port ranges. You
have not seen the ! http_access operator before: it inverts the decision. The line below would read
"deny access if the request does not fall in the range specified by acl Safe_ports" if it were written
in english. If the port matches one of those specified in the Safe_ports acl line, the next http_access
line is checked. More information on the format of http_access lines is given in the next section
Acl-operator lines.
http_access deny !Safe_ports

[edit] Protocol (FTP, HTTP, SSL)


Some people may wish to restrict their users to specific protocols. The proto acl type allows you to
restrict access by the URL prefix: the http:// or ftp:// bit at the front. The following example will
deny requests that use the FTP protocol.
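# deny any request whose URL starts with ftp://
acl ftp proto FTP
http_access deny ftp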
The default squid.conf file denies access to a special type of URL, those which use the
cache_object protocol. When Squid sees a request for one of these URLs it serves up information
about itself: usage statistics, performance information and the like. The world at large has no need
for this information, and it could be a security risk.
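The relevant fragment of the default config looks something like this:
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
http_access allow manager localhost
http_access deny manager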

[edit] HTTP Method (GET, POST or CONNECT)


HTTP can be used for downloading (GETting data) or uploads (POSTing data to a site). The
CONNECT mode is used for SSL data transfers. When a connection is made to the proxy the client
specifies what kind of request (called a method) it is sending. A GET request looks like this:
GET http://www.qualica.com/ HTTP/1.1
blank-line

If you were connecting using SSL, the GET word would be replaced with the word CONNECT.
You can control what methods are allowed through the cache using the method acl type. The most
common use is to stop CONNECT type requests to non-SSL ports. The CONNECT method allows
data transfer in any direction at any time: if you telnet to a badly configured proxy, and enter
something like:
CONNECT www.domain.example:23 HTTP/1.1
blank-line

you might end up with a telnet connection to www.domain.example just as if you had telnetted there
from the cache server itself. This can be used to get around packet-filters, firewall access lists and
passwords, which is generally considered a bad thing! Since CONNECT requests can be quite easily
exploited, the default squid.conf denies access to SSL requests to non-standard ports (as described
in the section on the port acl-operator.)
Let's assume that you want to stop your clients from POSTing to any sites. Note that doing this is
not a good idea, since people using some search engines (for example) would run into problems; at
this stage this is just an example:
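# hypothetical acl name; denies all POST requests
acl postRequests method POST
http_access deny postRequests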

[edit] Browser type


Companies sometimes have policies as to what browsers people can use. The browser acl type
allows you to specify a regular expression (matched against the browser's User-Agent header) that
can be used to allow or deny access.
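For example, to allow only browsers that identify themselves as Mozilla-compatible (a made-up policy):
acl realBrowsers browser ^Mozilla
http_access allow realBrowsers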

[edit] Username
Logs generally show the source IP address of a connection. When this address is on a multiuser
machine (let's use a Unix machine at a university as an example) you cannot pin down a request as
being from a specific user. There could be hundreds of people logged into the Unix machine, and
they could all be using the cache server. Trying to track down a misbehaver is very difficult in this
case, since you can never be sure which user is actually doing what. To solve this problem, the ident
protocol was created. When the cache server accepts a new connection, it can call back to the
machine the connection came from (on a low-numbered port, so the reply cannot be faked) to find
out who's on the other end of the connection. This doesn't make any sense on single-user systems:
people can just load their own
ident servers (and become daffy duck for a day). If you run multi-user systems then you may want
only certain people on those machines to be able to use the cache. In this case you can use the ident
username to allow or deny access.
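With made-up usernames, such an acl set might read:
acl trustedUsers ident oskar wendy
http_access allow trustedUsers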
One of the best things about Unix is the flexibility you get. If you wanted (for example) only
students from their second year on to have access to the cache servers via your Unix machines, you
could create a replacement ident server. This server could find out which user has connected to
the cache, but instead of returning the username it could return a string like "third_year" or
"postgrad". Rather than maintaining a list of which students are in which year on both the cache
server and the central Unix system, you could use simple Squid rules, and the ident server could do
all the work of checking which user is which.

[edit] Autonomous System (AS) Number


Squid is often used by large ISPs. These ISPs want all of their customers to have access to their
caches without having incredibly long manually-maintained ACL lists (don't forget that such long
lists of IPs generally increase the CPU usage of Squid too). Large ISPs all have AS (Autonomous
System) numbers which are used by other Internet routers which run the BGP (Border Gateway
Protocol) routing protocol.
The whois server whois.ra.net keeps a (supposedly authoritative) list of all the IP ranges that are in
each AS. Squid can query this server and get a list of all IP addresses that the ISP controls, reducing
the number of rules required. The data returned is also stored in a radix tree, for more cpu-friendly
retrieval.
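Matching is done with the src_as (or dst_as) acl type; the AS number below is invented:
# allow any client whose IP falls inside AS 1234's advertised ranges
acl ourCustomers src_as 1234
http_access allow ourCustomers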
Sometimes the whois server is updated only sporadically. This could lead to problems with new
networks being denied access incorrectly. It's probably best to automate the process of adding new
IP ranges to the whois server if you are going to use this function.
If your region has some sort of local whois server that handles queries in the same way, you can use
the as_whois_server Squid config file option to query a different server.

[edit] Username and Password


If you want to track Internet usage it's best to get users to log into the cache server when they want
to use the net. You can then use a stats program to generate per-user reports, no matter which
machine on your network a person is using. Universities and colleges often have labs with many
machines, where it is difficult to tell which user is sitting in front of a machine at any specific time.
By using names and passwords you will solve this problem.
Squid uses modules to do user authentication, rather than including code to do it directly. The
default Squid source does, however, include two standard modules: the first authenticates users
from a file, the other uses SMB (MS Windows) authentication. Since these modules are not
compiled when you compile Squid itself, you will need to cd to the appropriate source directory
(under auth_modules) and run make. If the compile goes well, a make install will place the program
file in the /usr/local/squid/bin/ directory and any config files in the /usr/local/squid/etc/ directory.
NCSA authentication is the easiest to use, since it's self contained. The SMB authentication program
requires that Samba (samba.org) be installed, since it effectively talks to the SMB server through
Samba.
The squid.conf file uses the authenticate_program tag to decide which external program to use to
authenticate users. If Squid were to only start one authentication program, a slow
username/password lookup could slow the whole cache down (while all other connections waited to
be authenticated). Squid thus opens more than one authentication program at a time, sending
pending requests to the second when the first is busy, the third when the second is and so forth. The
actual number started is specified by the authenticate_children squid.conf value. The default is five,
but you will probably need to increase this for a heavily loaded cache server.

[edit] Using the NCSA authentication module


To use the NCSA authentication module, you will need to add the following line to your squid.conf:
authenticate_program /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd

You will also need to create the appropriate password file (/usr/local/squid/etc/passwd in the
example above). This file consists of a username and password pair, one per line, where the
username and password are separated by a colon (:), just as they are in a Unix /etc/passwd file. The
password is encrypted with the same function as the passwords in /etc/passwd (or /etc/shadow on
newer systems) are. Here is an example password line:
oskar:lKdpxbNzhlo.w

Since the encrypted passwords are the same, and the ncsa_auth module understands the /etc/passwd
or /etc/shadow file format, you could simply copy the system password file periodically. If your
users do not already have passwords in Unix crypt format somewhere, you will have to use the
htpasswd program (in /usr/local/squid/bin/) to generate the appropriate user and password pairs.
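Adding a user is a single command; htpasswd will prompt for the password (the username here is an example):
/usr/local/squid/bin/htpasswd /usr/local/squid/etc/passwd oskar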
[edit] Using the SMB authentication module
Very Simple...
authenticate_ip_ttl 5 minutes
auth_param basic children 5
auth_param basic realm Authentication Server
auth_param basic program /usr/lib/squid/smb_auth -W work_group -I server_name

[edit] Using the RADIUS authentication module


Once you have compiled and installed (./configure && make && make install) "Squid_radius_auth" (you can get a
copy here: http://www.squid-cache.org/contrib/squid_radius_auth/), you must add the following lines to
squid.conf (for basic auth):
acl external_traffic proxy_auth REQUIRED
http_access allow external_traffic
auth_param basic program /usr/local/squid/libexec/squid_radius_auth -f
/usr/local/squid/etc/squid_radius_auth.conf
auth_param basic children 5
auth_param basic realm This is the realm
auth_param basic credentialsttl 45 minutes

After you have added these parameters you must edit /usr/local/squid/etc/squid_radius_auth.conf,
change the default RADIUS server hostname (or IP), and change the shared key. Restart Squid
for the changes to take effect.

[edit] SNMP Community


If you have configured Squid to support SNMP, you can also create acls that filter by the requested
SNMP community. By combining source address (with the src acl type) and community filters
(using the snmp_community acl type) you can restrict sensitive SNMP queries to administrative
machines while allowing safer queries from the public. SNMP setup is covered in more detail later
in the chapter, where we discuss the snmp_access acl-operator.

[edit] Acl-operator lines


Acl-operators are the other half of the acl system. For each connection the appropriate acl-operators
are checked (in the order that they appear in the file). You have met the http_access and icp_access
operators before, but they aren't the only Squid acl-operators. All acl-operator lines have the same
format; although the below format mentions http_access specifically, the layout also applies to all
the other acl-operators too.
http_access allow|deny [!]aclname [[!]aclname2 ...]

Let's work through the fields from left to right. The first word is http_access, the actual acl-operator.
The allow and deny words come next. If you want to deny access to a specific class of users, you
can change the customary allow to deny in the acl line. We have seen where a deny line is useful
before, with the final deny of all IP ranges in previous examples.
Let's say that you wanted to deny Internet access to a specific list of IP addresses during the day.
Since acls can only have one type per acl, you could not create an acl line that matches an IP
address during specific times. By combining more than one acl per acl-operator line, though, you
get the same effect. Consider the following acls:
acl dialup src 10.0.0.0/255.255.255.0
acl work time 08:00-17:00

If you could create an acl-operator that was matched when both the dialup and work acls were true,
clients in the range could only connect during the right times. This is where the aclname2 in the
above acl-operator definition comes in. When you specify more than one acl per acl-operator line,
both acls have to be matched for the acl-operator to be true. The acl-operator function AND's the
results from each acl check together to see whether it is to return true or false.
You could thus deny the dialup range cache access during working hours with the following acl
rules:
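# deny cache access when both acls match: dialup users during working hours
http_access deny dialup work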
You can also invert an acl's result value by using an exclamation mark (the traditional NOT value
from many programming languages) before the appropriate acl. In the following example I have
reduced Example 6-4 into one http_access line, taking advantage of the implicit inversion of the last
rule to deny access to all clients.
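http_access deny all !myNet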
Since the above example is quite complicated, let's cover it in more detail:
In the above example an IP from the outside world will match the 'all' acl, but not the 'myNet' acl;
the IP will thus match the http_access line. Consider the binary logic for a request coming in from
the outside world, where the IP is not defined in the myNet acl.
Deny http access if ((true) & (!false))

If you consider the relevant matching of an IP in the 10.0.0.0 range, the myNet value is true, the
binary representation is as follows:
Deny http access if ((true) & (!true))

A 10.0.0.0 range IP will thus not match the only http_access line in the squid config file.
Remembering that Squid will default to the opposite of the last match in the file, accesses will be
allowed from the myNet IP range.

[edit] The other Acl-operators


You have encountered only the http_access and icp_access acl-operators so far. Other acl-operators
are:
• no_cache
• ident_lookup_access
• miss_access
• always_direct, never_direct
• snmp_access (covered in the next section of this chapter)
• delay_classes (covered in the next section of this chapter)
• broken_posts

[edit] The no_cache acl-operator


The no_cache acl-operator is used to ensure freshness of objects in the cache. The default Squid
config file includes an example no_cache line that ejects the results of cgi programs from the cache.
If you want to ensure that cgi pages are not cached, you must un-comment the following lines from
squid.conf:
acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY

The first line uses a regular expression match to find urls that have cgi-bin or ? in the path (since we
are using the urlpath_regex acl type, a site with a name like cgi-bin.qualica.com will not be
matched.) The no_cache acl-operator is then used to eject matching objects from the cache.

[edit] The ident_lookup_access acl-operator


Earlier we discussed using the ident protocol to control cache access. To reduce network overhead,
Squid does an ident lookup only when it needs to. If you are using ident to do access control, Squid
will do an ident lookup for every request, and you don't have to worry about this acl-operator.
Many administrators would like to log the ident value for connections without actually using it
for access control. Squid used to have a simple on/off switch for ident lookups, but this incurred
extra overhead for the cases where the ident lookup wasn't useful (where, for example, the
connection is from a desktop PC).
Let's consider some examples. Assume that you have one Unix server (at IP address 10.0.0.3), and
all remaining IPs in the 10.0.0.0/255.255.255.0 range are desktop PCs. You don't want to log the
ident value from PCs, but you do want to record it when the connection is from the Unix machine.
Here is an example acl set that does this:
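# only the Unix server's connections get ident lookups
acl unixServer src 10.0.0.3
ident_lookup_access allow unixServer
ident_lookup_access deny all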
If a system cracker is attempting to attack your cache, it can be useful to have their ident value
logged. The following example gets Squid not to do ident lookups for machines that are allowed
access, but if a request comes from a disallowed IP range, an ident lookup is done and inserted into
the log.
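# assuming myNet is the allowed range: no lookups for them, lookups for intruders
acl myNet src 10.0.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
ident_lookup_access deny myNet
ident_lookup_access allow all
http_access allow myNet
http_access deny all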

[edit] The miss_access acl-operator


The ICP protocol is used by many caches to find out if objects are in another cache's on-disk store.
If you are peering with other organisation's caches, you may wish them to treat you as a sibling,
where they only get data that you already have stored on disk. If an unscrupulous cache-admin were
to change their cache_peer line to read parent instead of sibling, they could get you to retrieve
objects on their behalf.
To stop this from happening, you can create an acl that contains the peering caches, and use the
miss_access acl-operator to ensure that only hits are served to these caches. In response to all other
requests, an access-denied message is sent (so if a sibling complains that they almost always get
error messages, it's likely that they think that you should be their parent, and you think that they
should be treating you as a sibling.)
When looking at the following example it is important to realise that http_access lines are checked
before any miss_access lines. If the request is denied by the http_access lines, an error page is
returned and the connection closed, so miss_access lines are never checked. This means that the last
miss_access line in the example doesn't allow random IP ranges to access your cache, it only allows
ranges that have passed the http_access test through. This is simpler than having one miss_access
line for each http_access line in the file, and it will reduce CPU usage too, since only two acls are
checked instead of the six we would have instead.
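A sketch of such a setup, with invented sibling addresses:
acl myNet src 10.0.0.0/255.255.0.0
acl siblingCaches src 10.1.0.1 10.2.0.1
acl all src 0.0.0.0/0.0.0.0
http_access allow myNet
http_access allow siblingCaches
http_access deny all
# siblings that passed the http_access test get hits only; local clients get everything
miss_access deny siblingCaches
miss_access allow all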

[edit] The always_direct and never_direct acl-operators


These operators help you make controlled decisions about which servers to connect to directly, and
which to connect through a parent cache/proxy. I previously discussed this set of options briefly in
Chapter 3, during the Basic Installation phase.
These tags are covered in detail in the following chapter, in the Peer Selection section.
[edit] The broken_posts acl-operator
Some servers incorrectly handle POST data, requiring an extra Carriage-Return (CR) and Line-Feed
(LF) after a POST request. Since obeying the HTTP specification will make Squid incompatible
with these servers, there is an option to be non-compliant when talking to a specific set of servers.
This option should be very rarely used. The url_regex acl type should be used for specifying the
broken server.
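A sketch, with an invented broken server:
# send the extra CRLF when POSTing to this (hypothetical) broken server
acl brokenServer url_regex ^http://broken.example.com/
broken_posts allow brokenServer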

[edit] SNMP Configuration


Before we continue: if you wish to use Squid's SNMP functions, you will need to have configured
Squid with the --enable-snmp option, as discussed way back in Chapter 2. The Squid source only
includes SNMP code if it is compiled with the correct options.
Normally a Unix SNMP server (also called an agent) collects data from the various services running
on a machine, returning information about the number of users logged in, the number of sendmail
processes running and so forth. As of this writing, there is no SNMP server which gathers Squid
statistics and makes them available to SNMP management stations for interpretation. Code has thus
been added to Squid to handle SNMP queries directly.
Squid normally listens for incoming SNMP requests on port 3401. The standard SNMP port is 161.
For the moment I am going to assume that your management station can collect SNMP data from a
port other than 161. Squid will thus listen on port 3401, where it will not interfere with any other
SNMP agents running on the machine.
No specific SNMP agent or management station software is covered by this text. A Squid-specific
mib.txt file is included in the /usr/local/squid/etc/ directory. Most management station software
should be able to use this file to construct Squid-specific queries.

[edit] Querying the Squid SNMP server on port 3401


All snmp_access acl-operators are checked when Squid is queried by an SNMP management
station. The default squid.conf file allows SNMP queries from any machine, which is probably not
what you want. Generally you will want only one machine to be able to do SNMP queries of your
cache. Some SNMP information is confidential, and you don't want random people to poke around
your cache settings. To restrict access, simply create a src acl for the appropriate IP address, and use
snmp_access to deny access for every other IP.
Not all Squid SNMP information is confidential. If you want to split SNMP information up into
public and private, you can use an SNMP-specific acl type to allow or deny requests based on
the community the client has requested.

[edit] Running multiple SNMP servers on a cache machine


If you are running multiple SNMP servers on your cache machine, you probably want to see all the
SNMP data returned on one set of graphs or summaries. You don't want to have to query two SNMP
servers on the same machine, since many SNMP analysis tools will not allow you to relate (for
example) load average to number of requests per second when the SNMP data comes from more
than one source.
Let's work through the steps Squid goes through when it receives an SNMP query: The request is
accepted, and access-control lists are checked. If the request is allowed, Squid checks to see if it's a
request for Squid information or a request for something it doesn't understand. Squid handles all
Squid-specific queries internally, but all other SNMP requests are simply passed to the other SNMP
server; Squid essentially acts as an SNMP proxy for SNMP queries it doesn't understand.
This SNMP proxy-mode allows you to run two servers on a machine, but query them both on the
same port. In this mode Squid will normally listen on port 161, and the other SNMP server is
configured to listen on another port (let's use port 3456 for argument's sake). This way the client
software doesn't have to be configured to query a different port, which especially helps when the
client is not under your control.

[edit] Binding the SNMP server to a non-standard port


Getting your SNMP server to listen on a different port may be as easy as changing one line in a
config file. In the worst case, though, you may have to trick it to listen somewhere else. This section
is a bit of a guide to IP server trickery!
Server software can either listen for connections on a hard-coded port (where the port to listen to is
coded into the source and placed directly into the binary on compilation time), or it can use standard
system calls to find the port that it should be listening to. Changing programs that use the second set
of options to use a different port is easy: you edit the /etc/services file, changing the value for the
appropriate port there. If this doesn't work, it probably means that your program uses hard-coded
values, and your only recourse is to recompile from source (if you have it) or speak to your vendor.
You can check that your server is listening to the new port by checking the output of the netstat
command. The following command should show you if some process is listening for UDP data on
port 3456:
cache1:~ $ netstat -na | grep udp | grep 3456
udp 0 0 0.0.0.0:3456 0.0.0.0:*
cache1:~ $

Changing the services port does have implications: client programs (like any SNMP management
station software running on the machine) will also use the services file to find out which port they
should connect to when forming outgoing requests. If you are running anything other than a simple
SNMP agent on the cache machine, you must not change the /etc/services file: if you do you will
encounter all sorts of strange problems!
Squid doesn't use the /etc/services file, but the port to listen to is stored in the standard Squid config
file. Once the other server is listening on port 3456, we need to get Squid to listen on the standard
SNMP port and proxy requests to port 3456.
First, change the snmp_port value in squid.conf to 161. Since we are forwarding requests to another
SNMP server, we also need to set forward_snmpd_port to our other-server port, port 3456.
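The two squid.conf lines would thus read:
snmp_port 161
forward_snmpd_port 3456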

[edit] Access Control with more than one Agent


Since Squid is actually creating all the queries that reach the second SNMP server, using an IP-
based access control system in the second server's config is useless: all requests will come from
localhost. Since the second server cannot find out where the requests came from originally, Squid
will have to take over the access control functions that were handled by the other server.
For the first example, let's assume that you have a single SNMP management station, and you want
this machine to have access to all SNMP functions. Here we assume that the management station is
at IP 10.0.0.2.
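# the management station gets full SNMP access; everyone else gets none
acl snmpManager src 10.0.0.2
snmp_access allow snmpManager
snmp_access deny all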
You may have classes of SNMP stations too: you may wish some machines to be able to inspect
public data, but others are to be considered completely trusted. The special snmp_community acl
type is used to filter requests by destination community. In the following example all local machines
are able to get data in the public SNMP community, but only the snmpManager machine is able to
get other information. In this example we are using the ANDing of the publicCommunity and myNet
acls to ensure that only people on the local network can get even public information.
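acl snmpManager src 10.0.0.2
acl myNet src 10.0.0.0/255.255.0.0
acl publicCommunity snmp_community public
snmp_access allow snmpManager
snmp_access allow publicCommunity myNet
snmp_access deny all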

[edit] Delay Classes


Delay Classes are generally used in places where bandwidth is expensive. They let you slow down
access to specific sites (so that other downloads can happen at a reasonable rate), and they allow
you to stop a small number of users from using all your bandwidth (at the expense of those just
trying to use the Internet for work).
To ensure that some bandwidth is available for work-related downloads, you can use delay-pools.
By classifying downloads into segments, and then allocating these segments a certain amount of
bandwidth (in kilobytes per second), your link can remain uncongested for "useful" traffic.
To use delay-pools you need to have compiled Squid with the appropriate options: you will have to
have used the --enable-delay-pools option when running the configure program back in Chapter 2.

[edit] Slowing down access to specific URLs


An acl-operator (delay_access) is used to split requests into pools. Since we are using acls, you can
split up requests by source address, destination url or more. There is more than one type (or class)
of pool. Each type of pool allows you to limit bandwidth in different ways.

[edit] The First Pool Class


Rather than cover all of the available classes immediately, let's deal with a basic example first. In
this example we have only one pool, and the pool catches all URLs containing the word
abracadabra.
acl magic_words url_regex -i abracadabra
delay_pools 1
delay_class 1 1
delay_parameters 1 16000/16000
delay_access 1 allow magic_words

The first line is a standard ACL: it returns true if the requested URL has the word abracadabra in it.
The -i flag is used to make the search case-insensitive.
The delay_pools variable tells Squid how many delay pools there will be. Here we have only
one pool, so this option is set to 1.
The third line creates a delay pool (delay pool number 1, the first option) of class 1 (the second
option to delay_class).
The first delay class is the simplest: the download rate of all connections in the class are added
together, and Squid keeps this aggregate value below a given maximum value.
The fourth line is the most complex, as you can see. The delay_parameters option allows you to
set speed limits on each pool. The first option is the pool to be manipulated: since we have only one
pool in this example, this is set to 1. The second option consists of two values: the restore and max
values, separated by a forward-slash (/).
If you download a short file at high speed, you create a so-called burst of traffic. Generally these
short bursts of traffic are not a problem: these are normally html or text files, which are not the real
bandwidth consumers. Since we don't want to slow everyone's access down (just the people
downloading comparatively large files), Squid allows you to configure a size that the download is to
start slowing down at. If you download a short file, it arrives at full speed, but when you hit a
certain threshold the file arrives more slowly.
The restore value is used to set the download speed, and the max value lets you set the size at which
the files are to be slowed down from. Restore is in bytes per second, max is in bytes.
In the above example, downloads proceed at full speed until they have downloaded 16000 bytes.
This limit ensures that small files arrive reasonably fast. Once this much data has been transferred,
however, the transfer rate is slowed to 16000 bytes per second. At 8 bits per byte this means that
connections are limited to 128 kilobits per second (16000 * 8).

[edit] The Second Pool Class


As I discussed in this section's introduction, delay pools can help you stop one user from flooding
your links with downloads. You could place each user in their own pool, and then set limits on a
per-user basis, but administrating these lists would become painful almost immediately. By using a
different pool type, you can set rate limits by IP address easily.
Let's consider another example: you have a 128kbit per second line. Since you want some
bandwidth available for things like SMTP, you want to limit web access to 100kbit per second. At
the same time, you don't want a single user to use more than their fair share of sustained bandwidth.
Given that you have 20 staff members, and 100kbit per second remaining bandwidth, each person
should not use more than 5kbit per second of bandwidth. Since it's unlikely that every user will be
surfing at once, we can probably limit people to about four times their limit (that's 20kbit per
second, or 2.5kbytes per second).
In the following example, we change the delay class for pool 1 to 2. Delay class 2 allows us to
specify both an aggregate (overall) bandwidth usage and a per-user usage. In the previous example
the delay_parameters tag only took one set of options, the aggregate peak and burst rates. Given
that we are now using a class-two pool, we have to supply two sets of options to delay_parameters:
the overall speed and the per-IP speed. The 100kbit per second value is converted to bytes per
second by dividing by 8 (giving us the 12500 values), and the per-IP value of 20kbit per second
we arrived at above is converted to bytes per second (giving us the 2500 values.)

EXAMPLE
acl all src 0.0.0.0/0.0.0.0
delay_pools 1
delay_class 1 2
delay_parameters 1 12500/12500 2500/2500
delay_access 1 allow all

[edit] The Third Pool Class


This class is useful to very large organizations like universities. The second pool class lets you stop
individual users from flooding your links. A lab full of students all operating at their maximum
download rate can, however, still flood the link. Since such a lab (or department, if you are not at a
University) will all have IP addresses in the same range, it is useful to be able to put a cap on the
download rate of an entire network range. The third pool class lets you do this. Currently this option
only works on class-C network ranges, so if you are using variable length subnet masks then this
will not help.
In the next example we assume that you have three IP ranges. Each range must not use more than
1/3 of your available bandwidth. For this example I am assuming that you have a 512kbit/s line, and
you want 64kbit/s available for SMTP and other protocols. This will leave you with an overall
download rate cap of 448kbit/s. Each Class-C IP range will have about 150kbit/s available. With 3
ranges of 256 IP addresses each, you could have in the region of 500 PCs, which (if calculated
exactly) gives you less than 1kbit per second per machine. Since it is unlikely that all machines will
be using the net at the same time, you can probably allocate each machine (say) 4kbit per second (a
mere 500 bytes per second).
In this example, we changed the delay class of the pool to 3. The delay_parameters option now
takes four arguments: the pool number; the overall bandwidth rate; the per-network bandwidth rate
and the per-user bandwidth rate.
The 4kbit per second limit for users seems a little low. You can increase the per-user limit, but you
may find that it's a better idea to change the max value instead, so that the limit sets in after only
(say) 16kilobytes or so. This will allow small pages to be downloaded as fast as possible, but large
pages will be brought down without influencing other users.
If you want, you can set the per-user limit to something quite high, or even set them to -1, which
effectively means that there is no limit. Limits work from right to left, so if a user is sitting alone in
a lab they will be limited by their per-user speed. If this value is undefined, they are limited by their
per-network speed, and if that is undefined then they are limited by their overall speed. This means
that you can set the per-user limit higher than you would expect: if the lab is not busy then they will
get good download rates (since they are only limited by the per-network limit).

EXAMPLE:
acl all src 0.0.0.0/0.0.0.0
delay_pools 1
delay_class 1 3
# 56000*8 sets your overall limit at 448kbit/s
# 18750*8 sets your per-network limit at 150kbit/s
# 500*8 sets your per-user limit at 4kbit/s
delay_parameters 1 56000/56000 18750/18750 500/500
delay_access 1 allow all

[edit] Using Delay Pools in Real Life


By combining multiple ACLs, you can do interesting things with delay pools. Here are some
examples:
• By using time-based acls, you can limit people's speed during working hours, but allow
them full-speed access outside hours.
• Again (with time-based acl lists), you can allocate a very small amount of bandwidth to http
during working hours, discouraging people from browsing the Web during office hours.
• By using acls that match specific source IP addresses, you can ensure that sibling caches
have full-speed access to your cache.
• You can prioritize access to a limited set of destination sites by using the dst or dstdomain
acl types, inverting the rules we used to slow access to some sites down.
• You can combine username/password access-lists and speed-limits. You can, for example,
allow users that have not logged into the cache access to the Internet, but at a much slower
speed than users who have logged in. Users that are logged in get access to dedicated
bandwidth, but are charged for their downloads.

[edit] Conclusion
Once your acl system is correctly set up, your cache should essentially be ready to become a
functional part of your infrastructure. If you are going to use some of the advanced Squid features
(like transparent operation mode, for example), the remaining chapters will take you through them.