1 Squid
1.1 A User's Guide
1.1.1 Oskar Pearson
Qualica Technologies (Pty) Ltd, South Africa. Copyright 2000 by Oskar Pearson (oskar@linux.org.za). All rights reserved.
This version of the document (Version 0.1) is not to be mirrored. All trademarks used in this document are owned by their respective companies. This document makes no ownership claim of any trademark(s). If you wish to have your trademark removed from this document, please contact the copyright holder. No disrespect is meant by any use of other companies' trademarks in this document.
Note: This document is not (yet) to be mirrored; copying for personal or company-wide use or printing is perfectly acceptable. Once the document is in a stable state, it will be released under the GNU Free Documentation License (http://www.gnu.org/copyleft/fdl.html), and mirroring will be allowed. There are many mirrors of the old Squid User's Guide out there, all of which are now effectively out of date; don't mirror this documentation at your site unless you are willing to keep it up to date! This document will shortly be released under the GNU Free Documentation License.
Table of Contents
1. Overall Layout (for writers)
2. Terminology and Technology
    What Squid is
    Why Cache?
    What Squid is not
    Supported Protocols
        Supported Client Protocols
        Inter Cache and Management Protocols
        Inter-Cache Communication Protocols
    Firewall Terminology
        The Two Types of Firewall
        Firewalled Segments
        Hand Offs
3. Installing Squid
    Hardware Requirements
        Gathering statistics
        Hard Disks
        RAM requirements
        CPU Power
    Choosing an Operating System
        Experience
        Features
        Compilers
    Basic System Setup
        Default Squid directory structure
        User and Group IDs
    Getting Squid
        Getting the Squid source code
        Getting Binary Versions of Squid
    Compiling Squid
        Compilation Tools
        Unpacking the Source Archive
        Compilation options
        Running configure
        Compiling the Squid Source
        Installing the Squid binary
4. Squid Configuration Basics
    Version Control Systems
    The Configuration File
    Setting Squid's HTTP Port
    Using Port 80
    Email for the Cache Administrator
    Effective User and Group ID
    FTP login information
    Access Control Lists and Access Control Operators
        Simple Access Control
        Ensuring Direct Access to Internal Machines
    Communicating with other proxy servers
        Your ISP's cache
        Firewall Interactions
5. Starting Squid
    Before Running Squid
        Subdirectory Permissions
    Running Squid
    Testing Squid
        Testing a Cache or Proxy Server with Client
6. Browser Configuration
    Browsers
        Basic Configuration
        Advanced Configuration
    Basic Configuration
        Host name
        Browser-cache Interaction
        Testing the Cache
    Cache Auto-config
        Web server config changes for autoconfig files
        Autoconfig Script Coding
        Cache Array Routing Protocol
        cgi generated autoconfig files
    Future directions
        Roaming Browsers
        Transparency
    Ready to Go
7. Access Control and Access Control Operators
    Uses of ACLs
    Access Classes and Operators
    Acl lines
        A unique name
        Type
        Decision String
        Types of acl
    Acl-operator lines
        The other Acl-operators
    SNMP Configuration
        Querying the Squid SNMP server on port 3401
        Running multiple SNMP servers on a cache machine
    Delay Classes
        Slowing down access to specific URLs
        The Second Pool Class
        The Third Pool Class
        Using Delay Pools in Real Life
    Conclusion
8. Cache Hierarchies
    Introduction
    Why Peer
    Peer Configuration
        The cache_peer Option
    Peer Selection
        Selecting by Destination Domain
        Selecting with Acls
        Other Peering Options
    Multicast Cache Communication
        Getting your machine ready for Multicast
        Querying a Multicast Cache
        Accepting Multicast Queries: The mcast_groups option
        Other Multicast Cache Options
    Cache Digests
    Cache Hierarchy Structures
        Two Peering Caches
        Trees
        Meshes
        Load Balancing Servers
    The Cache Array Routing Protocol (CARP)
9. Accelerator Mode
    When to use Accelerator Mode
        Acceleration of a slow server
        Replacing a combination cache/web server with Squid
        Transparent Caching
        Security
    Accelerator Configuration Options
        The httpd_accel_host option
        The httpd_accel_port option
        The httpd_accel_with_proxy option
        The httpd_accel_uses_host_header option
    Related Configuration Options
        The redirect_rewrites_host_header option
        Refresh patterns
        Access Control
    Example Configurations
        Replacing a Combination Web/Cache server
        Accelerating Requests to a Slow Server
10. Transparent Caching
    The Problem with Transparency
    The Transparent Caching Process
        Some Routing Basics
        Packet Flow with Transparent Caches
    Network Layout
    Filtering Traffic
        Unix machines
        Routers (not done)
        Layer-Four Switches (not done)
    Kernel Redirection (not done)
    Squid Settings (not done)
11. Not Yet Done: Squid Config files and options
Overall Layout (for writers)
3.1) note on RCS
3.2) The configuration file:
    3.2.1) HTTP port
    3.2.2) Communicating with other proxy servers
        3.2.2.1) Basic cache hierarchy terminology
        3.2.2.2) Proxy-level firewall
        3.2.2.3) Packet-filter firewall
        3.2.2.4) Source/Destination IP and Port pairs
    3.2.3) Cache Store location
        3.2.3.1) Disk space allocation (? move to chapter 1 ?)
    3.2.4) FTP login information
    3.2.5) acl, http_access
        3.2.5.1) create a basic acl that denies everything but one address range
        3.2.5.2) Intranet access with parents
    3.2.6) cache_mgr
    3.2.7) cache_effective_group
Chapter 4) Starting and Running Squid (15 pages)
4.1) Running Squid for the first time
    4.1.1) Permissions - on each ~squid/* directory
    4.1.2) Creating cache directories
        4.1.2.1) Problems creating Swap Directories - problems:
            - not root
            - squid user id doesn't exist
            - squid user doesn't have write access to the cache dir
            - squid user doesn't have read/exec access to a directory up the tree
4.2) Running Squid
    4.2.1) What is expected in cache.log
4.3) Testing the cache with the included client
    4.3.1) checking if Internet works
    4.3.2) checking if intranet works (if configured with a parent)
    4.3.3) Checking access.log for hits vs misses; include basic fields
4.4) Addition to startup files (? check NT ?)
Chapter 5) Client configuration: (24 pages)
Include some screen shots of the configuration menus
5.1) Basic client configuration
    5.1.1) Netscape
    5.1.2) Internet Explorer
    5.1.3) Unix environment variables (important for both lynx and wget - for prefetching pages)
5.2) Client cache-specific modifications
5.3) Testing client access
5.4) Setting clients to use LOCAL caches
    5.4.1) CARP
    5.4.2) Autoconfigs
    5.4.3) Future directions
        5.2.4.1) DNS destination selection based on
        5.2.4.2) Roaming ability will help
        5.2.4.3) Transparency (see 11.1)
II) Integration
By this point Squid should be installed with a minimum working environment. This section covers changing the cache setup to suit the local network configuration: Access Control, Refresh patterns and Cache-peer relationships. These are the painful sections of the setup. This section also goes through the options in the config file that haven't been covered, so it is essentially a reference guide to the config options.
Chapter 6) ACLs: (38 pages)
Each of these includes a short example that shows how they work. At the end of the chapter there is a nice long complex ACL list that should suit almost everyone.
6.1) Introduction to ACLs
    6.1.1) ACL lines vs Operator lines
    6.1.2) How decisions work
6.2) Data specification:
    6.2.1) regular expressions
    6.2.2) IP address range specifications
    6.2.3) AS numbers
    6.2.4) putting the data in files
6.3) types of acl lines: works through all the acl types (src, srcdomain, dst, dstdomain etc) - must include info on "no_cache", specifically for 3.2.5.2
6.4) Delay classes
6.5) SNMP configuration
6.6) The default acl set - include info on why the SSL setup is the way that it is, and information on the echo/chargen ports
Chapter 7) Hierarchies: (42 pages)
7.1) Inter-cache communication protocols
How each one is suited to specific circumstances. Compatibility notes
(with other programs) are included.
    7.1.1) ICP
    7.1.2) Digests
    7.1.3) HTCP
    7.1.4) CARP
7.2) Various types of hierarchy structures are covered:
    7.2.1) The Tree structure
    7.2.2) Load balancing peer system
    7.2.3) True distributed system
7.3) Configuring Squid to talk to other caches
    7.3.1) The cache_peer config option - all options are covered with examples
    7.3.2) cache_peer_domain config option
    7.3.3) miss_access acl line
Chapter 8) Accelerator mode (11 pages)
(? I haven't used accelerator mode - I am using Miguel A.L. Paraz's page in the Squid Documentation as a guide ?)
8.1) Intro - why use this mode
    8.1.1) performance
    8.1.2) security
8.2) Types of accelerator mode
    8.2.1) Virtual mode (note on security problems)
    8.2.2) Host header
8.3) Options
    8.3.1) http_port
    8.3.2) httpd_accel_host
    8.3.3) httpd_accel_port
    8.3.4) httpd_accel_with_proxy
    8.3.5) httpd_accel_uses_host_header
8.4) Reverse caching using accelerator mode on the return path of an International link - see Transparency
Chapter 9) Transparency: (24 pages)
9.1) TCP basics
9.2) Operating System function
9.3) Squid accept destination sensing
9.4) Special ACLs to stop loops
9.5) FTP transparency problems
9.6) Routing the actual TCP packets to Squid
9.7) Changing hierarchies to work with transparency
Chapter 10) The config file and Squid options (48 pages)
The options list doesn't really belong in section (I). I am, instead, going to cover it here. Also cover the options to client.
This covers ALL the tags in the config file. Where a tag has already been covered it refers people to that section of the book. Arranged in alphabetical order.
III) Maintenance and Site-Specific Optimization
Covers the further development of your cache setup. This covers both maintenance and specialized setups (like transparent caches).
Chapter 11) Refresh Patterns: (24 pages)
11.1) distribution of file types (gifs vs jpg vs html)
11.2) distribution of protocols
11.3) Server-Sent Header fields
    11.3.1) Work through the types of headers
    11.3.2) meta-tags
11.4) Client-Sent Header fields
    11.4.1) If-Modified-Since Requests
    11.4.2) Refresh button
11.5) refresh_pattern tag - first match selection; describes the order of checking each of the fields
Chapter 12) Cache analysis (24 pages)
This section covers the disadvantages and advantages of the various types of cache performance/savings analysis systems.
12.1) access.log fields
12.2) Simple Network Management Protocol (SNMP) - configuring, access control, multiple servers, multiple agent configurations, understanding results. Shew!
12.3) Cache-specific analysis using a squid analysis program
12.4) The cachemgr.cgi script - using the output (eg LRU values) for deciding when to buy more disk space etc
12.5) Using a cache-query-tool
12.6) Using your results - graphing response times over the months, for example
Chapter 13) Standby procedures: (15 pages)
13.1) Hardware failure
    13.1.1) Standby machines
    13.1.2) DNS modification
    13.1.3) Automatic configuration
13.2) Software failure
    13.2.1) We need lots of info on vmstat, iostat, strace -T, (and other stuff like that) here.
        cachemgr: Slowness: queued DNS queries, DNS response times, queued username/password authentication requests; page faults: vmstat
    13.2.2) Consistent crashing
        - filehandles
        - memory
        - all dnsservers busy
        - slow! - latency of local request, comparing with "client" through cache and without it
Chapter 14) Future directions: (18 pages)
14.1) Wide ranging use of Skycache
14.2) Wide ranging use of transparency
14.3) Very heavily used parents - for example at Exchange Points
14.4) compression between server and client - like the Berkeley thing...
4 What Squid is
Squid is a free, high-speed Internet proxy-caching program. So, what is a "proxy cache"? According to Project Gutenberg's online version of Webster's Unabridged Dictionary:

    Proxy. An agent that has authority to act for another.

    Cache. A hiding place for concealing and preserving provisions which it is inconvenient to carry.

Squid acts as an agent, accepting requests from clients (such as browsers) and passing them to the appropriate Internet server. It stores a copy of the returned data in an on-disk cache. The real benefit of Squid emerges when the same data is requested multiple times: a copy of the on-disk data is returned to the client, speeding up Internet access and saving bandwidth. Small amounts of disk space can have a significant impact on bandwidth usage and browsing speed. (?costs?)

Internet firewalls (which are used to protect company networks) often have a proxy component. What makes the Squid proxy different from a firewall proxy? Most firewall proxies do not store copies of the returned data; instead they re-fetch the requested data from the remote Internet server each time. Squid differs from firewall proxies in other ways too:

- Many protocols are supported (firewalls often have specific proxies for specific protocols: it's difficult to ensure code security of a large program).
- Hierarchies of proxies, arranged in complex relationships, are possible.

When, in this book, we refer to a cache, we are referring to a caching proxy - something that keeps copies of returned data. A proxy, on the other hand, is a program that does not cache replies.

The web consists of HTML pages, graphics and sound files (to name but a few!). Since only a very small portion of the web is made up of text, referring to all cached data as pages is misleading. To avoid ambiguity, caches store objects, not pages.

(? trash Many Internet servers support more than one protocol. A given server can support more than one type of query protocol. A web server uses the Hyper Text Transfer Protocol (HTTP) to serve data. An older protocol, the File Transfer Protocol (FTP), often runs on web servers too. Muddling them up would be bad. Caching an FTP response and returning the same data to the client on a subsequent
HTTP request would be incorrect. Squid uses the complete URL to uniquely identify everything stored in the cache.

To avoid returning out-of-date data to clients, objects must be expired. Squid therefore allows you to set refresh times for objects, ensuring that old data is not returned to clients.

Squid is based on software developed for the Harvest project, which developed its cached (pronounced "cache-dee") as a side project. Squid development is funded by the National Laboratory for Applied Network Research (NLANR), who are in turn funded by the National Science Foundation (NSF). Squid is open source software, and although development is done mainly with NSF funding (??), features are added and bugs fixed by a team of online collaborators.
4.1.2.1 Costs
Outside of the USA and Canada, bandwidth is expensive. Saving bandwidth reduces Internet infrastructure costs significantly. Since Internet connectivity is so expensive, ISPs and their customers use caches to reduce their bandwidth requirements.
4.1.2.2 Latency
Although latency reduction is not normally the major reason for introducing caching in these countries, the latency problems experienced in the USA are exacerbated here by the high latency and lower speed of the lines to the USA.
6 Supported Protocols
6.1 Supported Client Protocols
Squid supports the following incoming protocol request types (when the proxy requests are sent in HTTP format):
- HyperText Transfer Protocol (HTTP), the specification on which the WWW is based
- File Transfer Protocol (FTP)
- Gopher
- Wide Area Information Server (WAIS), with the appropriate relay server
- Secure Socket Layer (SSL), which is used for secure online transactions
8 Firewall Terminology
Firewalls are used by many companies to protect their networks. Squid is going to have to interact with your firewall to be useful. So that we are on the same wavelength, I cover some of the terminology here: it makes later descriptions easier if we get all the terms sorted out first.
If you have a large number of client machines set up to talk to the firewall as a proxy, the prospect of having to change all their setups can influence your decision on where to position the cache server. In many cases it's easier to re-configure the firewall to communicate with a parent than to change the proxy server settings on all the client machines.

The vast majority of proxy-level firewalls are able to talk to another proxy server using HTTP. This feature is sometimes called a hand-off, and it is this which allows your firewall to talk to a higher-level firewall or cache server via HTTP. Hand-offs allow you to have a stack of firewalls, with higher-up firewalls protecting your entire company from outside attacks, and with lower-down firewalls protecting your different divisions from one another.

When a firewall hands off a request to another firewall or proxy server, it simply acts as a pipe between the client and the remote firewall. The term hand-off is a little misleading, since it implies that the lower-down firewall is somehow less involved in the data transfer. In reality the proxy process handling such a request is just as involved as when conversing directly with a destination server, since it is channeling data between the client and the firewall the connection was handed to. The lower-down firewall is, in fact, treating the higher-up cache as a parent.
10 Hardware Requirements
Caching stresses certain hardware subsystems more than others. Although the key to good cache performance is good overall system performance, the following list is arranged in order of decreasing importance:
- Disk random seek time
- Amount of system memory
- Sustained disk throughput
- CPU power
Do not drastically underpower any one subsystem, or performance will suffer. In the case of catastrophic hardware failure you must have a ready supply of alternate parts. When your cache is critical, you should have a (working!) standby machine with the operating system and Squid installed. This can be kept ready for nearly instantaneous swap-out. This will, of course, increase your costs, something that you may want to take into account. Chapter 13 covers standby procedures in detail.
When gathering statistics, make sure that you judge the peak number of requests, rather than an average value. You shouldn't simply take the number of requests per day and divide, since your peak (during, for example, lunch hour) can be many times your average number of requests. It's a very good idea to over-estimate hardware requirements. Stay ahead of the growth curve too, since an overloaded cache can spiral out of control due to a transient network problem.

If a cache cannot deal with incoming requests for some reason (say a DNS outage), it still continues to accept incoming requests, in the hope that it can deal with them. If no requests can be handled, the number of concurrent connections will increase at the rate that new requests arrive. If your cache runs close to capacity, a temporary glitch can increase the number of concurrent, waiting requests tremendously. If your cache can't cope with this number of established connections, it may never be able to recover, with current connections never being cleared while it tries to deal with a huge backlog.

Squid 2.0 may be configured to use threads to perform asynchronous Input/Output on operating systems that support Posix threads. Including async-IO can dramatically reduce your cache latency, allowing you to use a less powerful machine. Unfortunately not all systems support Posix threads correctly, so your choice of hardware can depend on the abilities of your operating system. Your choice of operating system is discussed in the next section - check there whether your system will support threads.
requests per second = 1000 / (seek time in milliseconds)

Squid load-balances writes between multiple cache disks, so if you have more than one data disk your seeks-per-second per disk will be lower. Almost all operating systems will increase the aggregate random-seek capacity in a semi-linear fashion as you add more disks, though some may impose a small performance penalty. If you add more disks to the equation, the requests-per-second value becomes even more approximate! To simplify things in the meantime, we are going to assume that you use only disks with the same seek time. Our formula thus becomes:

theoretical requests per second = 1000 / ((seek time) / (number of disks))

Let's consider a less theoretical example: I have three disks, all with 12 ms seek times. I can thus (theoretically, as always) handle:

requests per second = 1000 / (12/3) = 1000/4 = 250 requests per second

While we are on this topic: many people query the use of IDE disks in caches. IDE disks these days generally have very similar seek times to SCSI disks, and (with DMA-compatible IDE controllers) approach SCSI data-transfer speeds without slowing the whole machine down.

Deciding how much disk space to allocate to Squid is difficult. For the pilot project you can simply allocate a few megabytes, but this is unlikely to be useful on a production cache. The amount of disk space required depends on quite a few factors.

Assume that you were to run a cache just for yourself. If you were to allocate 1 gigabyte of disk, and you browse pages at a rate of 10 megabytes per day, it will take at least 100 days for you to fill the cache. You can thus see that the rate of incoming cache queries influences the amount of disk to allocate. If you examine the other end of the scale, where you have 10 megabytes of disk and 10 incoming queries per second, you will realize that at this rate your disk space will not last very long. Objects are likely to be pushed out of the cache as they arrive, so getting a hit would require two people to be downloading the object at almost exactly the same time. That is definitely not impossible, but it happens only occasionally on loaded caches.

The above certainly appears simple, but many people do not extrapolate. The same relationships govern the expulsion of objects from your cache at larger cache-store sizes. When deciding on the amount of disk space to allocate, you should determine approximately how much data will pass through the cache each day. If you are unable to determine this, you could simply use the theoretical maximum transfer rate of your line as a basis. A 1 Mbit/s line can transfer about 125 000 bytes per second. If all clients were set up to access the cache, disk would be used at about 125 KB per second, which translates to about 450 megabytes per hour. If the bulk of your traffic is transferred during the day, you are probably transferring 3.6 gigabytes per day. If your line was 100% used, however, you would probably have upgraded it a while ago, so let's assume you transfer 2 gigabytes per day. If you wanted to keep ALL data for a day, you would have to have 2 gigabytes of disk for Squid.

The feasibility of caching depends on two or more users visiting the same page while the object is still on disk. This is quite likely to happen with the large sites (search engines, and the default home pages in the respective browsers), but the chance of a user visiting the same obscure page is slim, simply due to
the volume of pages. In many cases the obscure pages are on the slowest links, frustrating users. Depending on the number of users requesting pages, you should keep pages for longer, so that the chance of different users accessing the same page twice is higher. Determining this value, however, is difficult, since it also depends on the average object size, which, in turn, depends on user habits.

Some people use RAID systems on their caches. This can dramatically increase availability, but a RAID-5 system can reduce disk throughput significantly. If you are really concerned with uptime, you may find a RAID system useful. Since the actual data in the cache store is not vital, though, you may prefer to manually fail-over the cache, simply re-formatting or replacing drives. Sure, your cache may have a lower hit-ratio for a short while, but you can easily balance this minute cost against what the hardware for automatic failover would have cost you.

You should probably base your purchase on the bandwidth discussion above, and use the data discussed in chapter 11 to decide when to add more disk.
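To give a concrete (if hypothetical) picture of how the space you settle on ends up in the configuration file, a cache_dir line for the 2 gigabyte example above might look something like the sketch below. The path and the two directory-structure numbers are only common defaults, and the exact cache_dir syntax varies slightly between Squid versions (newer releases add a storage-type field), so check the comments in your own squid.conf.

# allocate roughly 2 gigabytes (2000 megabytes) of this disk to the cache store
cache_dir /usr/local/squid/cache 2000 16 256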
11.1 Experience
If you normally work on a specific operating system, you should probably not use your cache as a system to experiment with a new flavor of Unix. If you have the most experience with a particular operating system, you should use that system as the basis for your cache server. Customers rapidly turn off caching when a cache stops accepting requests (while you learn your way around some feature).

Your cache system will almost certainly form a core part of your network as soon as it is stable. You must be able to return the system to working order in minimal time in the event of a system failure, and this is where your existing experience becomes crucial. If the failure happens out of business hours you may not be able to get technical support from your vendor. A dialup ISP's hours of business differ dramatically from those of operating system vendors.
11.2 Features
Though most operating systems support similar features, there are often no standards for some of the less commonly used operating-system functions that Squid relies on. One example is transparency: many operating systems can now support transparent redirection to a local program, but almost all of them do it in a different way, since there is no real standard for the way an operating system is supposed to behave in this scenario.

If you are unable to find information about Squid on your operating system, you may want to organize a trial hardware installation (assuming that you are using a commercial operating system) as a test. Only when you have the system running can you be sure that your operating system supports the required features.

Squid works on the following systems: (? List ?)

If you are using Squid without extensions like transparency and ARP access control lists, you should not have problems. For your convenience a table of operating system support for specific features is included. Since Squid is constantly being developed, it's likely that this list will change.
11.3 Compilers
Squid is written on Digital Unix (? version ?) machines running the GNU C compiler (GCC). GCC is included with free operating systems such as Linux and FreeBSD, and is easily available for many other operating systems and hardware platforms. The GNU compiler adheres as closely to the ANSI C standard as possible, so if a different compiler is included with your operating system, it may (or may not) have trouble interpreting Squid's source code, depending on its level of ANSI compliance. In practice, most compilers work fine.

Some commercial compilers choose backward compatibility with older versions over ANSI compliance. These compilers generally support an option that turns on ANSI-compliant mode. If you have trouble compiling Squid you may have to turn this mode on. (? is this still valid? I remember things like this back in the Borland C days - though I seem to remember this on a Unix system too... ?)

In the worst possible scenario you may have to compile GCC with your existing compiler and use GCC to compile Squid. If you do not have a compiler, you may be able to find a precompiled version of GCC for your system on the Internet. Be very careful when installing software from untrusted sources; this is discussed shortly in the "precompiled binary" section. If you cannot find a version of GCC for your platform, you may have to factor in the cost of a compiler when deciding on your operating system and hardware.
When you upgrade to the latest version of Squid, it's a good idea to keep the old working compiled source tree somewhere. If you upgrade to the latest Squid and encounter problems, simply kill Squid, change to the previous source directory and reinstall the old binaries. This is a lot faster than trying to remember which source tree you were running, downloading it, compiling it, applying local patches and then reinstalling.
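A minimal sketch of that fallback procedure (the version number and paths below are only examples; substitute wherever you actually keep your old tree):

# stop the misbehaving new Squid
/usr/local/squid/bin/squid -k shutdown
# change to the old, still-compiled source tree and reinstall its binaries
cd /usr/local/src/squid-2.2.STABLE5
make install
# then start Squid again (for example via the RunCache script)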
13 Getting Squid
Now that your machine is ready for your Squid install, you need to download and install the Squid program. This can be done in two ways: you can download a source version and compile it, or you can download a precompiled binary version and install that, relying on someone else to do the compilation for you.

Binary versions of Squid are generally easier to install than source code versions, especially if your operating system vendor distributes a package which you can simply install. Installing Squid from source code is nevertheless recommended. This method allows you to turn on compile-time options that may not be included in distributed binary versions (one of many examples: SNMP support is not compiled in unless it is specifically enabled, and most available binary versions do not include SNMP support). If your operating system has been tuned so that Squid can run better (let's say you have increased the number of open filehandles per process), a precompiled binary will not take advantage of this tuning, since your compiler header files are probably different from the ones on the machine where the binaries were compiled.

It's also a little worrying running binaries that other people distribute (unless, of course, they are officially supplied by your operating system vendor): what if they have placed a trojan in the binary version? To ensure the security of your system it is recommended that you compile from the official source tree.

Since we recommend installing from source code, we cover that first. If you have to download a Squid binary from somewhere, simply skip to the next sub-section: Getting a binary version of Squid.
Squid source is normally available via FTP (the File Transfer Protocol), so you should be able to download Squid source by using the ftp program, available on almost every Unix system. If you are not familiar with ftp, you can simply select the mirror closest to you with your browser and save the Squid source to your disk by right-clicking on the filename and selecting save as (do not simply click on the filename - many browsers attempt to extract compressed files, printing the tar file to your browser window: this is definitely not what you want!). Once the download is complete, transfer the file to the cache machine.
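If you prefer the command line, the download might look something like this (a sketch only: the mirror host, path and version number are placeholders, and wget and scp are assumed to be available on your system):

# fetch the source archive from your nearest mirror (placeholder URL)
wget ftp://ftp.example-mirror.net/pub/squid/squid-2.3.STABLE1-src.tar.gz
# if you downloaded it on another machine, copy it across to the cache box
scp squid-2.3.STABLE1-src.tar.gz cache1:/usr/local/src/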
14 Compiling Squid
Compiling Squid is quite easy: you just need the right tools for the job. First, let's go through getting the tools; then you can extract the source code package, include optional Squid components (using the configure command) and then actually compile the distributed code into a binary format. The overall sequence is sketched below.

A word of warning, though: this is the stage where most people run into problems. If you haven't compiled source before, try to follow the next sections in order - it shouldn't be too bad. If you don't manage to get Squid running, at least you will have gained experience.
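For orientation, the whole process looks something like this (a sketch only: the archive name is a placeholder, and the configure options you may want are discussed in the following sections):

# unpack the source archive (use gunzip and tar separately if your tar has no z option)
tar xzf squid-2.3.STABLE1-src.tar.gz
cd squid-2.3.STABLE1
# choose compile-time options, build, and install
./configure
make
make install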
The first time you run configure, you should run it in verbose mode. The configure process can take a while on slower machines, so this also gives you an idea of how long it will take to run. Should you need to submit a bug report, you should always include as much information as possible, including the full configure output.
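One simple way to keep that full configure output around for a possible bug report (the filename here is arbitrary):

# run configure and keep a copy of everything it prints
./configure 2>&1 | tee configure-output.txt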
./configure --enable-dl-malloc
14.3.5 Asynchronous IO
Squid 2.0 includes a major performance increase in the form of Async-IO. It's important to remember that Squid is one process. In many Internet daemons, more than one copy runs at a time, so if one process is blocked by a system call, it does not affect the other running copies. Squid, though, is only one process: if the main loop stops running for some reason, all connections are slowed.

In all versions of Squid, the main loop uses the select and poll system calls to decide which connections to service. As Squid receives data from the server, it writes the data to disk and to the client. To write data to disk, a file has to be opened on the cache drive. When lots of clients are opening and closing connections to a busy cache, the main loop has to make lots of calls to open and close network and disk filehandles (note that the word filehandle can refer to both a network connection and an on-disk file). These two system calls block the flow of all data through the cache: while waiting for open to return, Squid cannot perform any other functions.

When you enable Async-IO, Squid 2.0 uses threads to open and close filedescriptors. A thread is part of the main Squid program in most ways, except that if it makes use of a blocking system call (such as open), only the thread stops, not the main loop or other threads. Note that there is not one thread per connection.

Using threads to make blocking function calls reduces the latency that a cache adds to each request. (People sometimes worry about the latency that caches add, but if you have a fast enough cache the latency is not an issue - the client sees no noticeable overhead. Network overhead normally outweighs Squid overhead.) Async-IO drastically reduces cache overhead when you have a loaded cache.

Unfortunately Posix threads aren't available on all operating systems. This ties your hardware choice to your choice of operating system, since if your operating system does not support threads there may be no choice but to use a faster system, or even to split the load between multiple machines. (? need a table of machines that work ?)

You should probably try to run Squid with Async-IO enabled if you have a few thousand requests per hour. Some systems only support threads properly with a fair amount of initial setup; if your load is low and Async-IO doesn't work straight away you can leave Squid in the default configuration.
Use the --enable-async-io configure option to include the async-IO code in Squid.
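For example (add whatever other configure options you need, and rebuild afterwards):

./configure --enable-async-io
make
make install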
To make the changes detailed in this chapter you are going to have to skip around in the config file a bit. It's probably easiest to simply search for the options discussed in each subsection of this chapter, but if you have some time it is best to read through the whole config file, so that you have an idea of how the sections fit together. The chapter also points out options that may have to be changed on the other 10% of machines. If you have a firewall, for example, you will almost certainly have to configure Squid differently from someone who doesn't.
Since the format of a proxy request is so similar to that of a normal HTTP request, it is not especially surprising that many web servers can function as proxy servers too. Changing a web server program to function as a proxy normally involves comparatively small changes to the code, especially if the code is written in a modular manner - as is the Apache web server. In many cases the resulting server is not as fast, or as configurable, as a dedicated cache server can be.

The CERN web server, httpd, was the first widely available web proxy server. The whole WWW system was initially created to give people easy access to CERN data, and CERN httpd was thus the de-facto test-bed for new additions to the initial, informal HTTP specification. Most (and certainly at one stage all) of the early web sites ran the CERN server. Many system administrators who wanted a proxy server simply used their standard CERN web server (listening on port 80) as their proxy server, since it could function as one. It is easy for the web server to distinguish a proxy request from a normal web page request, since it simply has to check whether the full URL is given instead of just a path name. Given the choice (even today) many system administrators would choose port 80 as their proxy server port, simply because port 80 is the standard port for web requests. There are, however, good reasons for you to choose a port other than 80.

Running both services on the same port meant that if the system administrator wanted to install a different web server package (for extra features available in the new software) they would be limited to software that could perform both as a web server and as a proxy. Similarly, if the same sysadmin found that their web server's low-end proxy module could not handle the load of their ever-expanding local client base, they would be restricted to a proxy server that could also function as a web server. The only other alternative is to re-configure all the clients, which normally involves spending a few days apologizing to users and helping them through the steps involved in changing over.

Microsoft use their web server (IIS) as the basis for their proxy server component, and Microsoft Proxy thus only (? tried once - let's see if it's changed since ?) accepts incoming proxy requests on port 80. If you are installing a Squid system to replace either CERN, Apache or IIS running in both web-server and cache-server modes on the same port, you will have to set http_port to 80.

Squid is written only as a high-performance proxy server, so there is no way for it to function as a web server: Squid has no support for reading files from a local disk, running CGI scripts and so forth. There is, however, a workaround. If you have both services running on the same port, and you cannot change your client PCs, do not despair. Squid can accept requests in web-server format and forward them to another server. If you have only one machine, and you can get your web server software to accept incoming requests on a non-default port (for example 81), Squid can be configured to forward incoming web requests to that port. This is called accelerator mode (since its initial purpose was to speed up very slow web servers). Squid effectively does some translation on the original request, and then simply acts as if the request were a proxy request and connects to the host: the fact that it's not a remote host is irrelevant. Accelerator mode is discussed in more detail in chapter 9.
Until then, get Squid installed and running on another port, and work your way through the first couple of chapters of this book, until you have a working pilot-phase system. Once Squid is stable and tested you can move on to changing web server settings. If you feel adventurous, however, you can skip there shortly!
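As a rough sketch of the arrangement just described (assuming the web server has been moved to port 81 on the cache machine itself; these options are covered properly in chapter 9, so treat this only as a preview):

# accept requests on the standard web port
http_port 80
# hand web-server-style requests to the real web server, now listening on port 81
httpd_accel_host localhost
httpd_accel_port 81
# keep acting as a normal proxy for proxy-style requests as well
httpd_accel_with_proxy on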
Rule sets like the above are great for small organizations: they are straightforward. For large organizations, though, things are more convenient if you can create classes of users. You can then allow or deny classes of users in more complex relationships. Let's look at an example like this, where we duplicate the above example with classes of users:
Sure, it's more complex for this example. The benefits only become apparent if you have large access lists, or when you want to integrate refresh-times (which control how long objects are kept) with the sources of incoming requests. I am getting quite far ahead of myself, though, so let's skip back.

We need some terminology to discuss access control lists, otherwise this could become a rather long chapter. So: lines beginning with acl are (appropriately, I believe) acl lines. The lines that use these acls (such as http_access and icp_access in the above example) are called acl-operators. An acl-operator can either allow or deny a request.

So, to recap: acls are used to define classes. When Squid accepts a request it checks the list of acl-operators specific to the type of request: an HTTP request causes the http_access lines to be checked; an ICP request checks the icp_access lists. Acl-operators are checked in the order that they occur in the file (i.e. from top to bottom). The first acl-operator line that matches causes Squid to drop out of the acl list; Squid will not check the remaining acl-operators once one has allowed or denied the request.

In the previous example, we used a src acl: this checks that the source of the request is within the given IP range. The src acl-type accepts IP address lists in many formats, though we used the subnet/netmask form in the earlier example. CIDR (Classless Inter-Domain Routing) notation can also be used here. Here is an example of address ranges given in each notation:

Example 4-4. CIDR vs Netmask Source-IP Notation
acl mynet1 src 10.1.0.0/255.255.0.0
acl mynet2 src 10.2.0.0/16
Access control lists inherit permissions when there is no matching acl

If all acl-operators in the file are checked and no match is found, the last acl-operator checked determines whether the request is allowed or denied. This can be confusing, so it's normally a good idea to place a final "catch-all" acl-operator at the end of the list. The simplest way to create such an operator is to create an acl that matches any IP address. This is done with a src acl that has a netmask of all 0s: when the netmask arithmetic is done, Squid will find that any IP address matches this acl.

Your cache server's own address may well fall within the ranges in the relevant allow lists on your cache, and if you were to run the client program on the cache machine itself (as opposed to another machine somewhere on your network) the above acl and http_access rules would allow you to test the cache. In many cases, however, a program running on the cache server will end up connecting to (and from) the address 127.0.0.1 (also known as localhost). Your cache should thus allow requests to come from the address 127.0.0.1/255.255.255.255. In the example below we don't allow ICP requests from the localhost address, since there is no reason to run two caches on the same machine.
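In squid.conf terms, the two acls just described look something like this (only a sketch here; the same lines appear in the fuller examples below):

# matches requests coming from the cache machine itself
acl localhost src 127.0.0.1/255.255.255.255
# a netmask of all zeros matches every possible source address
acl all src 0.0.0.0/0.0.0.0
# a final catch-all operator: deny anything not explicitly allowed above
http_access deny all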
The squid.conf file that comes with Squid includes acls that deny all HTTP requests. To use your cache, you need to explicitly allow incoming requests from the appropriate range. The squid.conf file includes text that reads:

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#

To allow your client machines access, you need to add rules similar to those below in this space. The default access-control rules stop people exploiting your cache, so it's best to leave them in.

Example 4-5. Example Complete ACL list
#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#
# acls for my network addresses
acl my-iplist-1 src 192.168.1.0/24
acl my-iplist-2 src 10.0.0.0/255.255.0.0
# Check that requests are from users on our network
http_access allow my-iplist-1
http_access allow my-iplist-2
icp_access allow my-iplist-1
icp_access allow my-iplist-2
# allow requests from the local machine (for testing and the like)
http_access allow localhost
# End of locally-inserted rules
http_access deny all
The following set of operators is based on the final configuration created in the previous section, but adds never_direct and always_direct operators. It is assumed that all servers that you wish to connect to directly are in the address ranges specified with the my-iplist acls. In some cases you may run a web server on the same machine as the cache server, and the localhost acl is thus also considered local. The always_direct and never_direct tags are covered in more detail in Chapter 7, where hierarchies are discussed.

Example 4-6. Using always_direct and never_direct
# acls for my network addresses
acl my-iplist-1 src 192.168.1.0/24
acl my-iplist-2 src 10.0.0.0/255.255.0.0
# Various programs running on the cache box connect to Squid, so it's
# useful to allow connections from the localhost address.
acl localhost src 127.0.0.1/255.255.255.255
# used to deny all requests: since the netmask is all 0s, any request
# matches this acl
acl all src 0.0.0.0/0.0.0.0
# Check that requests are from users on our network
http_access allow my-iplist-1
http_access allow my-iplist-2
icp_access allow my-iplist-1
icp_access allow my-iplist-2
# check the localhost acl as a special case
http_access allow localhost
# If the request comes from any other IP, deny all access.
http_access deny all
# always go direct to local machines
always_direct allow my-iplist-1
always_direct allow my-iplist-2
# never go direct to other hosts
never_direct allow all
Squid always attempts to cache pages. If you have a large Intranet system, it's a waste of cache-store disk space to cache your Intranet. Controlling which URLs and IP ranges are not to be cached is covered in detail in chapter 6, using the no_cache acl-operator.
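As a small preview of that (the Intranet address range below is only an example; substitute your own):

# an example Intranet address range
acl intranet dst 10.0.0.0/255.0.0.0
# don't store replies from Intranet servers in the cache
no_cache deny intranet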
You only need to read this material if one of the following scenarios applies to you:
- You have to use your Internet Service Provider's cache.
- You have a firewall.
If you are using a proxy-level firewall, your client machines are probably configured to use the firewall's internal IP address as their proxy server. Your firewall could also be running in transparent mode, where it automatically picks up outgoing web requests. If you have a fair number of client machines, you may not relish the idea of reconfiguring all of them. If you fall into this category, you may wish to put your cache on the outside (or on the DMZ) and configure the firewall to pass requests to the cache, rather than reconfiguring all client machines.
22.2.1.1 Inside
The cache is considered a trusted host, and is protected by the firewall. You will configure client machines to use the cache server in their browser proxy settings, and when a request is made, the cache server will pass the outgoing request to the firewall, treating the firewall as a parent proxy server. The firewall will then connect to the destination server.

If you have a large number of clients configured to use the firewall as their proxy server, you could get the firewall to hand off incoming HTTP requests back into the network, to the cache server. This is less efficient, though, since the cache will then have to re-pass these requests through the firewall to get to the outside, using the parent option to cache_peer. Since the latter involves traffic passing through the firewall twice, your load is very likely to increase. You should also beware of loops, with the cache server parenting to the firewall and the firewall handing the cache's requests back to the cache!

As described in chapter 1, Squid will also send ICP queries to parents. Firewalls don't care for UDP packets, and normally log (and then discard) such packets. When Squid does not receive a response from a configured parent, it will mark the parent as down and proceed to go direct. Whenever Squid is set up to use a parent that does not support ICP, the cache_peer line should include the "default" and "no-query" options. These options stop Squid from attempting to go direct when all caches are considered down, and specify that Squid is not to send ICP requests to that parent. Here is an example config entry:

cache_peer inside.fw.address.domain parent 3128 3130 default no-query
22.2.1.2 Outside
There are only two major reasons for you to put your cache outside the firewall:

One: Although Squid can be configured to do authentication, this can lead to a duplication of effort (you will encounter the "add new staff to 500 servers" syndrome). If you want to continue to authenticate users on the firewall, you will have to put your cache on the outside or on the DMZ. The firewall will thus accept requests from clients, authenticate them, and then pass them on to the cache server.

Two: Communicating with cache hierarchies is easy. The cache server can communicate with other systems using any protocol. Sibling caches, for example, are difficult to contact through a proxying firewall.

You can only place your cache outside if your firewall supports hand-offs. Browsers inside will connect to the firewall and request a URL, and the firewall will connect to the outside cache and request the page.
If you place your cache outside your firewall, you may find that your client PCs have problems connecting to internal web servers (your intranet, for example, may be unreachable). The problem is that the cache is unable to connect back through to your internal network (which is actually a good thing: don't change that). The best thing to do here is to add exclusions to your browser settings: this is described in Chapter 5 - you should specifically have a look at the section on browser autoconfig. In the meantime, let's just get Squid going, and we will configure browsers once you have a cache to talk to.

Since the cache is not protected by the firewall, it must be very carefully configured: it must only accept requests from the firewall, and must not run any strange services. If possible, you should disable telnet, and use something like SSH (Secure Shell) instead. The access control lists (which you will set up shortly) must only allow the firewall, otherwise people will be able to relay their requests through your cache, using your bandwidth.

If you place the cache outside the firewall, your client PCs will be configured to use the firewall as their proxy server (this is probably the case already). The firewall must be configured to hand off client HTTP requests to the cache server. The cache must be configured to only allow HTTP requests from the firewall's outside IP address. If not configured this way, other Internet users could use your cache server as a relay, using your bandwidth and hardware resources for illegitimate (and possibly illegal) purposes.

With your cache server on the outside network, you should treat the machine as a completely untrusted host, lest a cracker find a hole somewhere on the system. It is recommended that you place the cache server on a dedicated firewall network card, or on a switched Ethernet port. This way, if your cache server were to be cracked, the cracker would only be able to read passing HTTP data. Since the majority of sensitive information is sent via email, this would reduce the potential for sensitive data loss.

Since your cache server only accepts requests from the firewall, there is no cache_peer line needed in squid.conf. If you have to talk to your ISP's cache you will, of course, need one: see the section on this a bit further back.
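A minimal sketch of that restriction (the firewall address below is just an example; use your firewall's real outside address):

# the firewall's outside address - substitute your own
acl firewall src 192.0.2.1/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
# accept HTTP requests from the firewall only
http_access allow firewall
http_access deny all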
22.2.1.3 DMZ
The best place for a cache is on your DMZ. If you are concerned with the security of your cache server, and want to be able to communicate with outside cache servers (using ICP), you may want to put your cache on the DMZ. With Squid on your DMZ, internal client PCs are set up to proxy to the firewall. The firewall is then responsible for handing off these HTTP requests to the cache server (so the firewall in fact treats the cache server as a parent). Since your cache server is (essentially) on the outside of the firewall, the cache doesn't need to treat the firewall as a parent or sibling: it only accepts requests from the firewall and never passes requests to it.

If your cache is outside your firewall, you will need to configure your client PCs not to use the firewall as a proxy server for internal hosts. This is quite easy, and is discussed in the chapter on browser configuration.
Since the firewall is acting as a filter between your cache and the outside world, you are going to have to open up some ports on the firewall. The cache will need to be able to connect to port 80 on any machine in the outside world. Since some valid web servers will run on ports other than 80, you should consider allowing connections to any port from the cache server. In short, allow connections from the cache to:
- Port 80 (for normal HTTP requests)
- Port 443 (for HTTPS requests)
- Ports higher than 1024 (site search engines often use high-numbered ports)

If you are going to communicate with a cache server outside the firewall, you will need even more ports opened. If you are going to communicate with ICP, you will need to allow UDP traffic from and to your cache machine on port 3130. You may find that the cache server that you are peering with uses different ports for reply packets. It's probably a bad idea to open all UDP traffic, though.
directory to (say) bin.off, and creating a new bin directory which contains their own squid binary. Use the following commands to set the permissions on this directory correctly:

chown root:root /usr/local/squid/
chmod 755 /usr/local/squid/

Since we have already introduced the /usr/local/squid/bin directory, let's set its permissions correctly next. If the directory itself were writeable by malicious users, we would have the same problem that we described above. Let's change it to be owned by root, group root, and make sure that only root can write to the directory. We also need the files in this directory to be readable (and executable) by everyone, so that normal users can run programs like client. There are no setuid binaries in this directory, and if the rest of the files have the correct permissions, there is no reason not to let users into this directory.

cd /usr/local/squid/bin
chown root:root .
chown root:root *
chmod 755 . *

Config files all live in the /usr/local/squid/etc/ directory. If a user can write to these files, they can almost certainly do malicious things. Because of this, you should not let normal users edit these files: only users who already have root access should be allowed to edit squid.conf. Earlier in the book, we created a squidadm group for these users. The /usr/local/squid/etc/ directory should be owned by root, group squidadm, so that squid administrators are able to create and update config files.

Many of you will not have encountered chmod commands which use more than three digits before. The leading 2 in the following command sets the set-group-id bit on the directory. Let's assume that my primary group-id is staff (not squidadm). On some systems, any file that I create will be owned by group staff, even if the directory is owned by the squidadm group. On these systems this would be a security problem: if I create the squid.conf file, people in the staff group may be able to make changes to the file. With the set-group-id bit set on the directory, any files I create will be owned by the squidadm group. As I have said, this isn't necessary on some operating systems, but these permissions shouldn't have any adverse effect.

cd /usr/local/squid/etc
chmod 2775 .
chown root:squidadm . *

When you use RCS (introduced in Chapter 2), the revision history of a file is stored in an RCS logfile. These files will normally be created in the current directory (the ci command appends ,v to the filename to decide the name of the logfile, leading to filenames like squid.conf,v). If you don't want your directory cluttered with these files, you can create an RCS directory and move the RCS files into it. The Revision Control System only stores logfiles in the current directory if an RCS directory doesn't exist; if one does, all new log files are created in it. If someone can gain access to the log files, they essentially have write access to the original file, since when you check a file out (to make changes to it) the log file is considered to be the authoritative source. Don't forget to change the permissions on the RCS log files. The Squid install doesn't create an RCS directory automatically; we create it in the example below.
# first, make the RCS directory
cd /usr/local/squid/etc
mkdir RCS
# move any RCS logfiles into the RCS directory, so that they don't
# clutter the config-file directory
mv *,v RCS
# make sure that the RCS directory is owned by the right people, and
# writeable by them
chown root:squidadm RCS
chmod 2770 RCS
# change the permissions of the files in the RCS directory to match
# newly created files
chown root:squidadm RCS/*
chmod 770 RCS/*

Cache log files should be confidential. You (and other Squid administrators) may have to look at them occasionally, but other users should have no access to the files. Squid runs as the squid user, though, and needs to create the logs, so any directory we make needs to be writeable by the squid user too.

chown squid:squidadm /usr/local/squid/logs
chmod 770 /usr/local/squid/logs
Let's change the permissions on the cache store so that only squid administrators can access files in it. Note that you are going to have to repeat this process for every cache_dir in the squid.conf file.

mkdir /usr/local/squid/cache/
chown squid:squidadm /usr/local/squid/cache/
chmod 770 /usr/local/squid/cache/

Once the permissions on the cache directories are set correctly, you can run squid -z. Your output should look something like this:

cache1:~ # /usr/local/squid/bin/squid -z
1999/06/12 19:15:34| Creating Swap Directories
cache1:~ #
Use the chmod o+rx /usr/local/ command to make the directory readable and executable by everyone.
25 Running Squid
Squid should now be configured, and the directories should have the correct permissions. We should now be able to start Squid, and you can try to access the cache with a web browser. Squid is normally run by starting the RunCache script. RunCache (as mentioned earlier) restarts Squid if it dies for some reason, but at this stage we are merely testing that it will run properly: we can add it to startup scripts at a later stage. Programs which handle network requests (such as inetd and sendmail) normally run in the background. They are run at startup, and log any messages to a file (instead of printing them to a screen or terminal, as most user-level programs do). These programs are often referred to as daemon programs. Squid is such a program: when you run the squid binary, you should be immediately returned to the command line. While it looks as if the program ran and did nothing, it's actually sitting in the background waiting for incoming requests. We want to be able to see that Squid is actually doing something useful, so we increase the debug level (using -d 1) and tell it not to disappear into the background (using -N). If your machine is not connected to the Internet (you are doing a trial squid-install on your home machine, for example) you should use the -D flag too, since Squid tries to do DNS lookups for a few common domains, and dies with an error if it is not able to resolve them. The following output is that printed by a default install of Squid:

cache1:~ # /usr/local/squid/bin/squid -N -d 1 -D

Squid reads the config file, and changes user-ids here:

1999/06/12 19:16:20| Starting Squid Cache version 2.2.DEVEL3 for i586-pc-linux-gnu...
1999/06/12 19:16:20| Process ID 4121

Each concurrent incoming request uses at least one filedescriptor. 256 filedescriptors is only enough for a small, lightly loaded cache server; see Chapter 12 for more details. Most of the following is diagnostic:

1999/06/12 19:16:20| With 256 file descriptors available
1999/06/12 19:16:20| helperOpenServers: Starting 5 dnsserver processes
1999/06/12 19:16:20| Unlinkd pipe opened on FD 13
1999/06/12 19:16:20| Swap maxSize 10240 KB, estimated 787 objects
1999/06/12 19:16:20| Target number of buckets: 15
1999/06/12 19:16:20| Using 8192 Store buckets, replacement runs every 10 seconds
1999/06/12 19:16:20| Max Mem size: 8192 KB
1999/06/12 19:16:20| Max Swap size: 10240 KB
1999/06/12 19:16:20| Rebuilding storage in Cache Dir #0 (DIRTY)

When you connect to an ftp server without a cache, your browser chooses icons to match the files based on their filenames. When you connect through a cache server, it assumes that the page returned will be in html form, and will include tags to load any images so that the directory listing looks
normal. Squid adds these tags, and has a collection of icons that it refers clients to. These icons are stored in /usr/local/squid/etc/icons/. If Squid has permission problems here, you need to make sure that these files are owned by the appropriate users (in the previous section we set permissions on the files in this directory).

1999/06/12 19:16:20| Loaded Icons.

The next few lines are the most important. Once you see the Ready to serve requests line, you should be able to start using the cache server. The HTTP port is where Squid is waiting for browser connections, and should be the same as whatever we set it to in the previous chapter. The ICP port should be 3130, the default, and if you have included other protocols (such as HTCP) you should see them here. If you see permission denied errors here, it's possible that you are trying to bind to a low-numbered port (like 80) as a normal user. Try running the startup command as root, or (if you don't have root access on the machine) choose a high-numbered port. Another common error message at this stage is Address already in use. This occurs when another process is already listening on the given port. This could be because Squid is already started (perhaps you are upgrading from an older version which is being restarted by the RunCache script) or you have some other process listening on the same port (such as a web server).

1999/06/12 19:16:20| Accepting HTTP connections on port 3128, FD 35.
1999/06/12 19:16:20| Accepting ICP messages on port 3130, FD 36.
1999/06/12 19:16:20| Accepting HTCP messages on port 4827, FD 37.
1999/06/12 19:16:20| Ready to serve requests.

Once Squid is up and running, it reads the cache store. Since we are starting Squid for the first time, you should see only zeros for all the numbers below:

1999/06/12 19:16:20| storeRebuildFromDirectory: DIR #0 done!
1999/06/12 19:16:25| Finished rebuilding storage disk.
1999/06/12 19:16:25| 0 Entries read from previous logfile.
1999/06/12 19:16:25| 0 Entries scanned from swap files.
1999/06/12 19:16:25| 0 Invalid entries.
1999/06/12 19:16:25| 0 With invalid flags.
1999/06/12 19:16:25| 0 Objects loaded.
1999/06/12 19:16:25| 0 Objects expired.
1999/06/12 19:16:25| 0 Objects cancelled.
1999/06/12 19:16:25| 0 Duplicate URLs purged.
1999/06/12 19:16:25| 0 Swapfile clashes avoided.
1999/06/12 19:16:25| Took 5 seconds ( 0.0 objects/sec).
1999/06/12 19:16:25| Beginning Validation Procedure
1999/06/12 19:16:26| storeLateRelease: released 0 objects
1999/06/12 19:16:27| Completed Validation Procedure
1999/06/12 19:16:27| Validated 0 Entries
1999/06/12 19:16:27| store_swap_size = 21k
26 Testing Squid
If all has gone well, we can begin to test the cache. True browser access is only covered in the next chapter, and there is a whole chapter devoted to configuring your browser. Until then, testing is done with the client program, which is included with the Squid source, and is in the /usr/local/squid/bin directory. The client program connects to a cache, requests a page, and prints out useful timing information. Since client is available on all systems that Squid runs on, and has the same interface on all of them, we use it for the initial testing. At this stage Squid should be in the foreground, logging everything to your terminal. Since client is a unix program, you need access to a command prompt to run it. At this stage it's probably easiest to simply start another session (this way you can see if errors are printed in the main window). The client program is compiled to connect to localhost on port 3128 (you can override these defaults from the command line; see the output of client -h for more details). If you are running client on the cache server, and are using port 3128 for incoming requests, you should be able to type a command like this, and the client program will retrieve the page through the cache server:

client http://squid.nlanr.net/

If your cache is running on a different machine you will have to use the -h and -p options. The following command will connect to the machine cache.qualica.com on port 8080 and retrieve the above web page. Example 5-1. Using the -h and -p client Options
cache1:~ $ /usr/local/squid/bin/client -h cache.qualica.com -p 8080 http://www.ora.com/
The client program can also be used to access web sites directly. As you may remember from reading Chapter 2, the protocol that clients use to access pages through a cache is part of the HTTP specification. The client program can be used to send both "normal" and "cache" HTTP requests. To check that your cache machine can actually connect to the outside world, it's a good idea to test access to an outside web server. The next example will retrieve the page at http://www.qualica.com/, and send the html contents of the page to your terminal. If you have a firewall between you and the Internet, the request may not work, since the firewall may require authentication (or, if it's a proxy-level firewall and is not doing transparent proxying of the data, you may explicitly have to tell client to connect to the machine). To test requests through the firewall, look at the next section.
A note about the syntax of the next request: you are telling client to connect directly to the remote site, and request the page /. With a request through a cache server, you connect to the cache (as you would expect) and request a whole url instead of just the path to a file. In essence, both normal-HTTP and cache-HTTP requests are identical; one just happens to refer to a whole URL, the other to a file. Example 5-2. Retrieving Pages directly from a remote site with client
cache1:~ $ /usr/local/squid/bin/client -h www.ora.com -p 80 /
Client can also print out timing information for the download of a page. In this mode, the contents of the page aren't printed: only the timing information is. The zero in the below example indicates that client is to retrieve the page repeatedly until interrupted (with Control-C or Break). If you want to retrieve the page a limited number of times, simply replace the zero with a number. Example 5-3. Printing timing information for a page download
cache1:~ $ /usr/local/squid/bin/client -g 0 -h www.ora.com -p 80 /
If the request through the cache returned the same page as you retrieved with direct access (and you didn't receive an error message from Squid), Squid should be up and running. Congratulations! If things aren't going so well for you, you will have received an error message here. Normally, this is because of the acls described in the previous chapter. First, you should have a look at the terminal where you are running Squid (or, if you are skipping ahead and have put Squid in the background, at the /usr/local/squid/logs/cache.log file). If Squid encountered some sort of problem, there should be an error or warning in this file. If there are no messages here, you should look at the /usr/local/squid/logs/access.log file next. We haven't covered the details of this file yet, but they are covered in the next section of this chapter. First, though, let's see if your cache can process requests to internal servers. There are many cases where a request will work to internal servers but not to external machines.
Although fields are separated by spaces, fields can contain sub-fields, where a "/" indicates the split. When connecting directly to a destination server, field 9 contains two subfields: the keyword "DIRECT", followed by the name of the server that it is connecting to. Access to local servers (on your network) should always be DIRECT, even if you have a firewall, as discussed in section 3.1.2. The acl operator always_direct controls this behaviour.

905144366.259 1010 127.0.0.1 TCP_MISS/200 20868 GET http://www.ora.com/ - DIRECT/www.ora.com text/html

When you have configured only one parent cache, the hierarchy access type indicates this, and includes the name of that cache.

905144426.435 289 127.0.0.1 TCP_MISS/200 20868 GET http://www.ora.com/ - SINGLE_PARENT/cache1.ora.com text/html

There are many more types that can appear in the hierarchy access information field, but these are covered in chapter 11. Another useful field is the Log Tag field, field four. In the following example this is the field "TCP_MISS/200".

905225025.225 609 127.0.0.1 TCP_MISS/200 10089 GET http://www.is.co.za/ - DIRECT/www.is.co.za text/html

A MISS indicates that the requested page was not already stored in the cache (or that the page contained headers indicating that the page was not to be cached). A HIT would indicate that the page was already stored in the cache. In the latter case the request time for a remote page should be substantially less than the first occurrence in the logs. The time that Squid took to service the request is the second field. This value is in milliseconds. This value should approach that returned by examining a client request, but given operating system buffering there is likely to be a discrepancy. The fifth field is the size of the page returned to the client. Note that an aborted request can end up downloading more than this from the origin server if the quick_abort feature is turned on in the Squid config file. Here is an example request direct from the origin server:

905230201.136 6642 127.0.0.1 TCP_MISS/200 20847 GET http://www.ora.com/ - DIRECT/www.ora.com text/html

If we use client to fetch the page a short time later, a HIT is returned, and the time is reduced hugely.

905230209.899 151 127.0.0.1 TCP_HIT/200 20869 GET http://www.ora.com/ - NONE/- text/html

Some of you will have noticed that the size of the hit has increased slightly. If you compare the size of a request from the origin server with that of the same page through the cache, you will note that the size of the returned data has increased very slightly. Extra headers are added to pages passing through the cache, indicating which peer the page was returned from (if applicable), age information and other information. Clients never see this information, but it can be useful for debugging. Since Squid 1.2 has support for HTTP/1.1, extra features can be used by clients accessing a copy of a page that Squid already has. Certain extra headers are included in the HTTP headers returned on HITs, indicating support for features which are not available to clients when returning MISSes. In the above example Squid has included a header in the page indicating that range requests are supported. If Squid is performing correctly, you should shut Squid down and add it to your startup files. Since Squid maintains an in-memory index of all objects in the cache, a kill -9 could cause corruption, and should never be used.
The correct way to shut Squid down is to use the command:

cache1:~ # ~squid/bin/squid -k shutdown

Squid command-line options are covered in chapter 10.

Addition to Startup Files

The location of startup files varies from system to system. The location and naming scheme of these files is beyond the scope of this book. If you already have a local startup file, it's a pretty good idea to simply add the RunCache program to that file. Note that you should place RunCache in the background on startup, which is normally done by placing an & after the command:

/usr/local/squid/bin/RunCache &

The RunCache program attempts to restart Squid if it dies for some reason, and logs basic Squid debug output both to the file /usr/local/squid/squid.out and to syslog.
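For example, on a system that uses a simple rc.local-style local startup file (the exact path is an assumption here and varies by system), the addition is a single line:

# start the Squid cache at boot via the RunCache wrapper
/usr/local/squid/bin/RunCache &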
28 Browsers
Squid is the server half of a client-server relationship. Though you have configured Squid, your client (the browser) is still configured to talk to the menagerie of servers that make up the Internet. You have already used the client program included with Squid to test that the cache is working. Browsers are more complicated to configure than client, especially since there are so many different types of browser. This chapter covers the three most common browsers. It also includes information on the proxy configuration of Unix tools, since you may wish to use these for automatic download of pages. Once your browser is configured, some of the proxy-oriented features of browsers are covered. Many browsers allow you to force your cache server to reload the page, and have other proxy-specific features. So that you can skip sections in this chapter that you dont need to read, browsers are configured in the following order: Netscape Communicator, Microsoft Internet Explorer, Opera and finally Unix Clients. You can configure most browsers in more than one way. The first method is the simplest for a sysadmin, the second is simplest for the user. Since this book is written for system administrators, we term the first basic configuration, the second advanced configuration.
You will probably wish to exclude all local sites too. Since the exception list allows you to use a * character for what is known as a wildcard match, you can add *.localdomain.example, and all hosts in your domain will be accessed directly. Many people access local sites by IP address, rather than by name. Since the exception list matches against the URL (??), these will still pass through the cache, and you will need to add an IP address range to the list of hosts to exclude: 192.168.0.* should do nicely. To reduce the local browser cache space (as discussed in the Netscape section above):

View | Options | General

In the Temporary Internet files section, click the Settings button. Move the slider all the way to the left. Since Squid-2.0 and above handle HTTP/1.1 correctly, you should also configure Internet Explorer to use HTTP/1.1 when communicating with the proxy server:

View | Internet Options | Advanced tab

Scroll down until you see HTTP 1.1 Settings, and tick "Use HTTP 1.1 through proxy server".

(? I believe that opera is the third most common browser ?) (? I don't have a machine with it on... since I run Linux ?)
export ftp_proxy

tcsh, the C Shell:

setenv http_proxy http://cache.domain.example:3128/
OR
setenv ftp_proxy http://cache.domain.example:3128/

(? ksh, others ?)
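The Bourne-shell form, the tail end of which appears above (export ftp_proxy), would in full look something like this sketch, using the same placeholder cache name and port:

# sh, bash: set the proxy variables and export them to child programs
http_proxy=http://cache.domain.example:3128/
ftp_proxy=http://cache.domain.example:3128/
export http_proxy ftp_proxy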
29 Browser-cache Interaction
The Internet is a transient place, and at some stage a server that does not correctly handle caching will be found. You can easily add the server to the appropriate do-not-cache lists, but most browsers give users a way of forcing a web page reload.

Netscape Communicator. Pressing the Reload button forces the cache server to reload the HTML page that is being viewed. Holding down the shift key and pressing Reload forces a reload of all the objects on the current page. Right-clicking on a page or graphic brings up a menu where you can select reload, which forces a re-get of the page. If you right-click in a frame, you can reload only the frame.

Microsoft Internet Explorer 4. With Internet Explorer there is no difference between a plain reload and a shift-reload. A reload also does a different type of request, which essentially checks if the cache considers the page to be fresh. If the refresh rules on the cache are set to refresh in a long time, the page will come from the cache, and will not be re-fetched from the origin server.

Lynx. Pressing Ctrl-R will force a reload of the page.
31 Cache Auto-config
Client browsers can have all options configured manually, or they can be configured to download an autoconfig file (every time they start up), which provides all of the information about your cache setup. Each URL referenced (be it the URL that you typed, or the URL for a graphic on the page yet to be retrieved) is checked against the list of rules. You should keep the list of rules as short as possible, otherwise you could end up slowing down page loads - not at the cache level, but in the browser.
31.1.1 Apache
On some systems Apache already defines the autoconfig mime type. The Apache config file mime.types is used to associate filename extensions with mime types. This file is normally stored in the apache conf directory. This directory also contains the access.conf and httpd.conf files, which you may be more familiar with editing. As you can probably see, the mime.types file consists of two fields: a mime type on the left, the associated filename extension on the right. Since this file is only read at startup or reconfigure, you will need to send a HUP signal to the parent apache process for your changes to take effect. The following line should be added to the file, assuming that it is not already included:

application/x-ns-proxy-autoconfig    pac

Example 6-1. Restarting Apache
cd /usr/local/lib/httpd/logs
kill -HUP `cat httpd.pid`
31.1.3 Netscape
(? or here ?)
The next example gets the browser to connect to the cache server named cache.domain.example on port 3128. If the machine is down for some reason, an error message will be returned to the user. Example 6-3. Connecting to a cache server
function FindProxyForURL(url, host)
{
    return "PROXY cache.domain.example:3128";
}
As you may be able to guess from the above, returning text with a semicolon (;) splits the answer returned into two sub-strings. If the first cache server is unavailable, the second will be tried. This provides you with a failover mechanism: you can attempt a local proxy server first and, if it is down, try another proxy. If all are down, a direct attempt will be made. After a short period of time, the proxy will be retried. A third return type is included, for SOCKS proxies, and is in the same format as the HTTP type:

return "SOCKS socks.domain.example:3128";

If you have no intranet, and require no exclusions, you should use the above autoconfig file. Configuring machines with the above autoconfig file allows you to add future required exclusions very easily.
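As a sketch of the failover behaviour just described (the two cache hostnames here are placeholders, not machines used elsewhere in this book), a function returning two proxies with a direct fallback could look like this:

function FindProxyForURL(url, host)
{
    // try the first cache; if it is down, try the second; if both are down, go direct
    return "PROXY cache1.domain.example:3128; PROXY cache2.domain.example:3128; DIRECT";
}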
31.2.2.1 dnsDomainIs
Returns true if the first argument (normally specified as the variable host, which is defined in the autoconfig function by default) is in the domain specified in the second argument. Checks if a host is in a domain. Example 6-5. dnsDomainIs
if (dnsDomainIs(host, ".mydomain.example"))
{
    return "DIRECT";
}
You can check more than one domain by using the || JavaScript operator. Since this is a standard JavaScript operator, you can use the layout described in this example in any combination. Example 6-6. Using multiple dnsDomainIs calls
if (dnsDomainIs(host, ".mydomain.example") ||
    dnsDomainIs(host, ".anotherdomain.example"))
{
    return "DIRECT";
}
31.2.2.2 isInNet
Sometimes you will wish to check if a host is in your local IP address range. To do this, the browser resolves the name to find the IP address. Do not use more than one isInNet call if you can help it: each call causes the browser to resolve the hostname all over again, which takes time. A string of these calls can reduce browser performance noticeably. The isInNet function takes three arguments: the hostname, and a subnet/netmask pair. Example 6-7. using the isInNet call
if (isInNet(host, "192.168.0.0", "255.255.0.0"))
{
    return "DIRECT";
}
31.2.2.3 isPlainHostName
Simply checks that there is no full-stop in the hostname (the only argument for this call). Many people refer to local machines simply by hostname, since the resolver library will automatically attempt to look up host.domain.example if you simply attempt to connect to host. For example: typing www in your browser should bring up your web site. Many people connect to internal web servers (such as one sitting on their co-worker's desk) by typing in the hostname of the machine. These connections should not pass through the cache server, so many people use a function like the following: Example 6-8. Using isPlainHostName to decide if the connection should be direct
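A minimal sketch of such a check (assuming it sits inside the usual FindProxyForURL function, with the cache hostname being a placeholder) might be:

if (isPlainHostName(host))
{
    // no dots in the hostname: this is a local machine, so go direct
    return "DIRECT";
}
return "PROXY cache.domain.example:3128";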
31.2.2.4 myIpAddress
Returns the IP address of the machine that the browser is running on, and requires no arguments. On a network with more than one cache, your script can use this information to decide which cache to communicate with. In the next subsection we look at different ways of communicating with a local proxy (with minimal manual user intervention), so the example here is comparatively basic. The below example assumes that you have at least two networks: one with a private address range (10.0.0.*), the others with real IP addresses. If a client machine is in the private address range, it cannot connect directly to the destination server, so if the cache is down for some reason it cannot access the Internet. A machine with a real IP address, on the other hand, should attempt to connect directly to the origin server if the cache is down. (? need to check it will work too! ?) Since myIpAddress requires no arguments, we can simply place it where we would have put host in the isInNet function call. Example 6-9. myIpAddress
if (isInNet(myIpAddress(), "10.0.0.0", "255.255.255.0"))
{
    return "PROXY cache.mydomain.example:3128";
}
else
{
    return "DIRECT";
}
31.2.2.5 shExpMatch
The shExpMatch function accepts two arguments: a string and a shell expression. Shell expressions are similar to regular expressions, though they are more limited. This function is often used to check if the url or host variables have a specific word in them. If you are configuring an ISP-wide script, this function can be quite useful. Since you do not know if a customer will call their machine "intranet" or "intra" or "admin", you can chain many shExpMatch checks together. Note that the below example uses a single "intra*" shell expression to match both "intranet" and "intra.mydomain.example". Example 6-10. shExpMatch
if (shExpMatch(host, "intra*") ||
    shExpMatch(host, "admin*"))
{
    return "DIRECT";
}
else
{
    return "PROXY cache.mydomain.example:3128";
}
31.2.2.6 url.substring
This function doesn't take the same form as those described above. Since Squid does not support all possible protocols, you need a way of comparing the first few characters of the destination URL with the list of possible protocols. The function takes two arguments: the first is a starting position, the second the position to stop at (since we start at position 0 below, this is effectively the number of characters retrieved). Note that (as in C) strings start at position 0, rather than at
position 1. All of this is best demonstrated with an example. The following attempts to connect to the cache for the most common URL types (http, ftp and gopher), but goes direct for protocols that Squid doesn't recognize. Example 6-11. url.substring
if (url.substring(0, 5) == "http:" ||
    url.substring(0, 4) == "ftp:" ||
    url.substring(0, 7) == "gopher:")
    return "PROXY cache.is.co.za:8080; DIRECT";
else
    return "DIRECT";
CARP is used by some cache servers (most notably Microsoft Proxy and Squid) to decide which parent cache to send a request to. Browsers can also use CARP to decide which cache to talk to, using a JavaScript auto-config script. For more information (and an example script), you should look at the web page http://naragw.sharp.co.jp/sps/
33 Future directions
There has recently been a move towards a standard for the automatic configuration of proxy-caches. New versions of Netscape and Internet Explorer are expected to use this (still evolving) standard to automatically change their proxy settings. This allows you to manipulate your cache server settings without inconveniencing clients.
33.1 Roaming
Roaming customers have to remove their configured caches, since your access control lists should stop them accessing your cache from another network. Although both problems can be reduced by the cgi-generated configs (discussed above), a firewall between the browser and your cgi server would still mean that roaming users cannot access the Internet. There are changes on the horizon that should help. As more and more protocols take roaming users into account, standards will evolve that make Internet usage plug-and-play. If you are in Tanzania today, plug in your modem and use the Internet. If you are in France in a week's time, plug in again and (without config changes) you will be ready to go. Progress on standards for the autoconfiguration of Internet applications is underway, which will allow administrators to specify config files depending on where a user connects from, without something like the cgi kludge above.
33.2 Browsers
Browser support for CARP is not at the stage where it is tremendously useful: once there is a proper standard for setup, its likely to be included into the main browsers. At some stage, expect support for ICP and cache-digests in browsers. The browser will then be able to make intelligent decisions as to which cache to talk to. Since ICP requests are efficient, a browser could send requests for each of the links on a page once it has retrieved the HTML source.
33.3 Transparency
Currently there is a major trend towards transparent caching, not only in the "Outer Internet" (where bandwidth is very expensive), but also in the USA. (Transparency is covered in detail in chapter 12.) Transparency has one major advantage: users do not have to configure their browsers to access the cache.
To backbone providers this means that they can cache all passing traffic. A local ISP would configure their clients to talk to their cache; a backbone provider could then ask their ISP clients to use theirs as parents. But transparent caching has another advantage. A backbone provider is acting as transit for requests that originate on other backbone providers' networks. With transparency, a backbone provider reduces this traffic as well as requests from their network to other backbone providers. Assume you place a cache one hop before a major peering point. Here the cache intercepts both incoming requests (from other providers to web servers on your network) and outgoing requests (from your network to web servers on other providers' networks). This will reduce your peering-point usage (by caching outgoing requests for pages), and will also reduce the money you spend serving other people's customers, since you reduce the cost of moving data out of your network. The latter cost may be minimal, but in times of network trouble it can reduce your latency noticeably. As more and more backbone providers cache pages, more local ISPs will cache ("since it's cached further along the path, we may as well implement caching here - it's not going to change anything"). Though this will probably cause a drop in the hit rate of the backbone providers, their ever increasing user base may make up for it. Backbone providers are caching centrally: with large numbers of edge caches (local ISP caches), they are likely to see fewer hits. Certain Inter-University networks have already noticed such a hit rate decline. As more and more universities add local caches, their hit rate falls. Since the Universities are large, it's likely that their users will surf the same web page twice. Previously the Inter-University network would have returned the hit for that page; now the University's local cache does. This reduces the number of requests reaching the central cache, and hence its hit rate.
34 Ready to Go
If all has gone well, you should be ready to use your cache, at least on a trial basis. Browsers around your office or division can now be configured to use the cache, and once you are happy with its performance and stability, you can make it a proper service.
36 Uses of ACLs
The primary use of the acl system is to implement simple access control: to stop other people using your cache infrastructure. (There are other uses of acls, described later in this chapter; in the meantime we are going to discuss only the access control function of acls.) Most people implement only very basic access control, denying access to people that are not on their network. Squid's access system is incredibly flexible, but 99% of administrators only use the most basic elements. In this chapter some examples of the less common uses of acls are covered: hopefully you will discover some Squid feature which suits your organization - and which you didn't think was part of Squid before.
If the admin connects to the cache from the PC, Squid does the following:

Accepts the (HTTP) connection and reads the request.
Checks the line that reads http_access allow myIP. Since your IP address matches the IP defined in the myIP acl, access is allowed. Remember that Squid drops out of the operator list on the first match.

If you connect from a different PC (on the 10.0.*.* network) things are very similar:

Accepts the connection and reads the request.
The source of the connection doesn't match the myIP acl, so the next http_access line is checked.
The myNet acl matches the source of the connection, so access is denied. An error page is returned to the user instead of the requested page. If someone reaches your cache from another netblock (from, say, 192.168.*.*), the above access list will not block access. The reason for this is quite complicated. If Squid works through a set of acl-operators and finds no match, it defaults to using the opposite of the last operator in the list (if that operator is an allow, the default is to deny; if it's a deny, the default is to allow). This seems a bit strange at first, but let's look at an example where this behaviour is used: it's more sensible than it seems. The following acl example is nice and simple: it's something a first-time cache admin could create. Example 7-2. Only an allow acl-operator
acl myNet src 10.0.0.0/255.255.0.0
http_access allow myNet
A config file with no access lists will allow cache access without any restrictions. An administrator using the above access list obviously wishes to allow only his network access to the cache. Given the Squid behavior of inverting the last decision, we have an invisible line reading

http_access deny all

Inverting the last decision is a simple (if not immediately obvious) solution to one of the most common acl mistakes: not adding a final deny all to the end of your acl list. With this new knowledge, have a look at the first example in this chapter: you will see why I said not to use it in your configs. Given that the last operator denies the local network, local people will not be able to access the cache; the remainder of the Internet, however, will! As discussed in chapter 1, the simplest way of creating a catch-all acl is to match requests when they come from any IP address: when programs do netmask arithmetic, a subnet of all zeros will match any IP address. A corrected version of the first example dispenses with the myNet acl. Example 7-3. Corrected example 6-1, explicit deny all
acl myIP src 10.0.0.3/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
http_access allow myIP
http_access deny all
Once the cache is considered stable and is moved into production, the config would change. http_access lines do add a very small amount of overhead, but that's not the only reason to have simple access rulesets: the fewer the rulesets, the easier your setup is to understand. The below example includes a deny all rule even though it doesn't really need one: you may know of the automatic inversion of the last rule, but someone else working on the cache may not. Example 7-4. Example 6-1 once the cache is considered stable
acl myNet src 10.0.0.0/255.255.0.0
acl all src 0.0.0.0/0.0.0.0
http_access allow myNet
http_access deny all
You should always end your access lists with an explicit deny. In Squid-2.1 the default config file does this for you when you insert your HTTP acl operators in the appropriate place.
38 Acl lines
The examples so far have given you an idea of an acl line's layout. Their layout can be symbolized as follows (? Check! ?):

acl name type (string|"filename") [string2] [string3] ["filename2"]

The acl tag consists of a minimum of three fields: a unique name, an acl type and a decision string. An acl line can have more than one decision string, hence the [string2] and [string3] in the line above.
38.2 Type
So far we have discussed only acls that check the source IP address of the connection. This isnt sufficient for many people: it may be useful for you to allow connections at only certain times, or to only specific domains, or by only some users (using usernames and passwords). If you really want to, you can even combine all of the above: only allow connections from users that have the right password, have the right destination and are going to the right domain. There are quite a few different acl types: the next section of this chapter discusses all of the different types in detail. In the meantime, lets finish the description of the structure of the acl line.
# This line will match requests from either address range:
# 10.0.0.0/255.255.255.0 OR 10.1.0.0/255.255.255.0
acl myNets src 10.0.0.0/255.255.255.0 10.1.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
http_access allow myNets
http_access deny all
Large decision lists can be stored in files, so that your squid.conf doesn't get cluttered. Some of the caches I have worked on have had in the region of 2000 lines of acl rules, which could lead to a very cluttered squid.conf file. You can include a file into the decision section of an acl list by placing the filename (with path) in double-quotes. The file simply contains the data set, one datum per line. In the next example the file /usr/local/squid/conf/data/myNets can contain any number of IP ranges, one range per line. Example 7-6.
acl myNets src "/usr/local/squid/conf/data/myNets"
acl all src 0.0.0.0/0.0.0.0
http_access allow myNets
http_access deny all
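As an illustration, the file referenced above might contain entries like the following (the ranges shown are simply the ones used earlier in this section), one per line:

10.0.0.0/255.255.255.0
10.1.0.0/255.255.255.0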
While on the topic of long lists of acls: it's important to note that you can end up slowing your cache's responses with very long lists of acls. Checking acls requires CPU time, and long lists can decrease cache performance, since instead of moving data to clients Squid is busy checking access lists. What constitutes a long list? Don't worry about lists with a few hundred entries unless you have a really slow or busy CPU. Lists thousands of lines long can, however, cause problems.
The dstdomain acl type allows one to match requests by destination domain. This could be used to match urls for popular adult sites, and refuse access (perhaps during specific times). If you want to deny access to a set of sites, you will need to find out these sites' IP addresses, and deny access to these IP addresses too. If you just put the domain names in, someone determined to access a specific site could find out the IP address associated with that hostname and access it by entering the IP address in their browser. The above is best described with an example. Here, I assume that you want to restrict access to the site www.adomain.example. If you use either the host or nslookup commands, you would find that this server has the IP address 10.255.1.2. It's easiest to just have two acls: one for IPs and one for domains. If the lists get too large, you can simply place them in a file. Example 7-8. Filtering out unwanted destination sites
acl badDomains dstdomain adomain.example
acl badIPs dst 10.255.1.2
http_access deny badDomains
http_access deny badIPs
http_access allow myNet
http_access deny all
Regex acl types let you match partial words or patterns in URLs or domains. The most common use of regex filters in ACL lists is for the creation of far-reaching site filters: if the url or domain contains a set of banned words, access to the site is denied. If you wish to deny access to sites that contain the word sex in the URL, you would add one acl rule, rather than trying to find every site that has adult material on it. The big problem with regex filters is that not all sites that contain the word sex in the URL are pornographic. By denying these sites you are likely to be infringing people's rights, and you should refer to a lawyer for advice on the legality of this. Creating a list of sites that you don't want accessed can be tedious. There are companies that sell adult/unwanted material lists which plug into Squid, but these can be expensive. If you cannot justify the cost, you can build your own filters using the regex acl types. The url_regex acl type is used to match any word in the URL. Here is an example: Example 7-9. Denying access to sites with the word sex in the URL
acl badURL url_regex -i sex
http_access deny badURL
http_access allow myNet
http_access deny all
In places where bandwidth is very expensive, system administrators may have no problem with people visiting pornographic sites. They may, however, want to stop people downloading huge avi files from these sites. The following example would deny downloads of avi files from sites that contain the word sex in the URL. The regular expression below matches any URL that contains the word sex AND ends with .avi. Example 7-10.
acl badURL url_regex -i sex.*\.avi$
http_access deny badURL
http_access allow myNet
http_access deny all
The urlpath_regex acl strips off the url-type and hostname, checking instead only the path and filename.
acl bad_dst_TLD dstdom_regex \.com$ \.net$
acl good_src_TLD srcdom_regex \.za$
# allow requests FROM the za domain UNLESS they want to go to \.com or \.net
http_access deny bad_dst_TLD
http_access allow good_src_TLD
acl ftp proto FTP
acl myNet src 10.0.0.0/16
acl all src 0.0.0.0/0.0.0.0
http_access deny ftp
http_access allow myNet
http_access deny all
The default squid.conf file denies access to a special type of URL: URLs which use the cache_object protocol. When Squid sees a request for one of these URLs it serves up information about itself: usage statistics, performance information and the like. The world at large has no need for this information, and it could be a security risk.
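The lines in the default config file look roughly like the following sketch (check your own squid.conf for the exact form); cache_object requests are allowed from the local machine only and denied from everywhere else:

acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
http_access allow manager localhost
http_access deny manager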
One of the best things about Unix is the flexibility you get. If you wanted (for example) only students in their second year and later to have access to the cache servers via your Unix machines, you could create a replacement ident server. This server could find out which user has connected to the cache, but instead of returning the username it could return a string like "third_year" or "postgrad". Rather than maintaining a list of which students are in which year on both the cache server and the central Unix system, you could use simple Squid rules, and the ident server could do all the work of checking which user is which. Example 7-16. Using Ident to classify users, and using Squid to deny classes
acl responsible ident third_year fourth_year postgrad staff
http_access allow responsible
If your region has some sort of local whois server that handles queries in the same way, you can use the as_whois_server Squid config file option to query a different server.
39 Acl-operator lines
Acl-operators are the other half of the acl system. For each connection the appropriate acl-operators are checked (in the order that they appear in the file). You have met the http_access and icp_access operators before, but they aren't the only Squid acl-operators. All acl-operator lines have the same format; although the below format mentions http_access specifically, the layout applies to all the other acl-operators too.

http_access allow|deny [!]aclname [& [!]aclname2 ... ]

Let's work through the fields from left to right. The first word is http_access, the actual acl-operator. The allow and deny words come next. If you want to deny access to a specific class of users, you can change the customary allow to deny in the acl-operator line. We have seen where a deny line is useful before, with the final deny of all IP ranges in previous examples. Let's say that you wanted to deny Internet access to a specific list of IP addresses during the day. Since acls can only have one type per acl, you could not create an acl line that matches an IP address during specific times. By combining more than one acl per acl-operator line, though, you get the same effect. Consider the following acls:

acl dialup src 10.0.0.0/255.255.255.0
acl work time 08:00-17:00

If you could create an acl-operator that was matched when both the dialup and work acls were true, clients in the range could only connect during the right times. This is where the aclname2 in the above acl-operator definition comes in. When you specify more than one acl per acl-operator line, both acls have to be matched for the acl-operator to be true. The acl-operator function ANDs the results from each acl check together to see if it is to return true or false. You could thus deny the dialup range cache access during working hours with the following acl rules: Example 7-17. Using more than one acl operator on an http_access line
acl myNet src 168.209.2.0/255.255.255.0
acl dialup src 10.0.0.0/255.255.255.0
acl work_hours time 08:00-17:00
# If a connection arrives during work hours, dialup is 1, and
# work_hours is 1. When ANDed together the http_access line matches
# and denies the client access.
# during work hours:
#   1 AND 1 = TRUE, so the http_access line matches them and
#   they are denied
# after work hours:
#   1 AND 0 = FALSE, so the line does not match: the next
#   http_access line is checked.
http_access deny dialup work_hours
# If it's not during work hours, the above line will fail, and the
# next http_access line will be checked. You want to allow dialup
# users explicit access here, otherwise they are not caught by the
# myNet acl, and are denied by the final deny line.
http_access allow dialup
http_access allow myNet
http_access deny all
You can also invert an acl's result value by using an exclamation mark (the traditional NOT operator from many programming languages) before the appropriate acl. In the following example I have reduced Example 6-4 into one http_access line, taking advantage of the implicit inversion of the last rule to let local clients through while denying everyone else. Example 7-18. Specifying more than one acl per http_access line
acl myNet src 10.0.0.0/255.255.0.0
acl all src 0.0.0.0/0.0.0.0
# A request from an outside network:
#   1 AND (NOT 0) = True, so the request is denied
# A request from an internal network:
#   1 AND (NOT 1) = False. Because the last definition
#   is inverted (see earlier discussions in this chapter
#   for more detail), the local network is allowed: the
#   deny is inverted.
http_access deny all !myNet
# There is an invisible "http_access allow all" here because of the
# way Squid inverts the last http_access rule.
Since the above example is quite complicated, let's cover it in more detail. In the above example an IP from the outside world will match the all acl, but not the myNet acl; the IP will thus match the http_access line. Consider the binary logic for a request coming in from the outside world, where the IP is not defined in the myNet acl:

Deny http access if ((true) & (!false))

If you consider the relevant matching of an IP in the 10.0.0.0 range, where the myNet value is true, the binary representation is as follows:

Deny http access if ((true) & (!true))

A 10.0.0.0 range IP will thus not match the only http_access line in the squid config file. Remembering that Squid will default to using the inverse of the last match in the file, accesses will be allowed from the myNet IP range.
always_direct, never_direct
snmp_access (covered in the next section of this chapter)
delay_classes (covered in the next section of this chapter)
broken_posts
If a system cracker is attempting to attack your cache, it can be useful to have their ident value logged. The following example gets Squid not to do ident lookups for machines that are allowed access, but if a request comes from a disallowed IP range, an ident lookup is done and inserted into the log. Example 7-20. Doing ident lookups for unknown machines
acl myNet src 10.0.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
http_access allow myNet
http_access deny all
# If the request is from a local machine, don't do an ident query
ident_lookup_access deny myNet
# If the request is from another network, do an ident query
ident_lookup_access allow all
These tags are covered in detail in the following chapter, in the Peer Selection section.
40 SNMP Configuration
Before we continue: if you wish to use Squid's SNMP functions, you will need to have configured Squid with the --enable-snmp option, as discussed way back in Chapter 2. The Squid source only includes SNMP code if it is compiled with the correct options. Normally a Unix SNMP server (also called an agent) collects data from the various services running on a machine, returning information about the number of users logged in, the number of sendmail processes running and so forth. As of this writing, there is no SNMP server which gathers Squid statistics and makes them available to SNMP management stations for interpretation. Code has thus been added to Squid to handle SNMP queries directly. Squid normally listens for incoming SNMP requests on port 3401. The standard SNMP port is 161. For the moment I am going to assume that your management station can collect SNMP data from a port other than 161. Squid will thus listen on port 3401, where it will not interfere with any other SNMP agents running on the machine. No specific SNMP agent or management station software is covered by this text. A Squid-specific mib.txt file is included in the /usr/local/squid/etc/ directory. Most management station software should be able to use this file to construct Squid-specific queries.
First, change the snmp_port value in squid.conf to 161. Since we are forwarding requests to another SNMP server, we also need to set forward_snmpd_port to our other-server port, port 3456.
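In squid.conf that amounts to something like the following two lines (a sketch using the option names given in this section; 3456 is just the example port chosen above):

snmp_port 161
forward_snmpd_port 3456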
You may have classes of SNMP stations too: you may wish some machines to be able to inspect public data, but others are to be considered completely trusted. The special snmp_community acl type is used to filter requests by destination community. In the following example all local machines are able to get data in the public SNMP community, but only the snmpManager machine is able to get other information. In this example we are using the ANDing of the publicCommunity and myNet acls to ensure that only people on the local network can get even public information. Example 7-25. Using the snmp_community acl type
acl myNet src 10.0.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
acl snmpManager src 10.0.0.2/255.255.255.255
acl publicCommunity snmp_community public
http_access allow myNet
http_access deny all
snmp_access allow snmpManager
snmp_access allow publicCommunity myNet
# deny people outside of the local network access to ALL data, even public
snmp_access deny all
41 Delay Classes
Delay Classes are generally used in places where bandwidth is expensive. They let you slow down access to specific sites (so that other downloads can happen at a reasonable rate), and they allow you to stop a small number of users from using all your bandwidth (at the expense of those just trying to use the Internet for work). Many non-US Universities have very small pipes to the Internet. Unfortunately these Universities often end up with huge amounts of their bandwidth being used for surfing that is not study-related. In the US this is fine, since the cost is negligible, but in other countries the cost of this casual surfing is astronomical. To ensure that some bandwidth is available for work-related downloads, you can use delay-pools. By classifying downloads into segments, and then allocating these segments a certain amount of bandwidth (in kilobytes per second), your link can remain uncongested for useful traffic. To use delay-pools you need to have compiled the appropriate code into Squid: you will have to have used the --enable-delay-pools option when running the configure program back in Chapter 2.
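A minimal sketch of the kind of class-1 delay-pool configuration described in the following paragraphs might look like this (the acl name is an assumption, and the option names and 16000 values are the ones discussed below):

acl slowURLs url_regex -i abracadabra
delay_pool_count 1
delay_class 1 1
delay_parameters 1 16000/16000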
The first line is a standard ACL: it returns true if the requested URL has the word abracadabra in it. The -i flag is used to make the search case-insensitive. The delay_pool_count variable tells Squid how many delay pools there will be. Here we have only one pool, so this option is set to 1. The third line creates a delay pool (delay pool number 1, the first option) of class 1 (the second option to delay_class).
The first delay class is the simplest: the download rates of all connections in the class are added together, and Squid keeps this aggregate value below a given maximum value. The fourth line is the most complex, as you can see. The delay_parameters option allows you to set speed limits on each pool. The first option is the pool to be manipulated: since we have only one pool in this example, this is set to 1. The second option consists of two values: the restore and max values, separated by a forward-slash (/). If you download a short file at high speed, you create a so-called burst of traffic. Generally these short bursts of traffic are not a problem: these are normally html or text files, which are not the real bandwidth consumers. Since we don't want to slow everyone's access down (just the people downloading comparatively large files), Squid allows you to configure a size at which the download starts slowing down. If you download a short file, it arrives at full speed, but when you hit a certain threshold the file arrives more slowly. The restore value is used to set the download speed, and the max value sets the size at which files start being slowed down. Restore is in bytes per second, max is in bytes. In the above example, downloads proceed at full speed until they have downloaded 16000 bytes. This limit ensures that small files arrive reasonably fast. Once this much data has been transferred, however, the transfer rate is slowed to 16000 bytes per second. At 8 bits per byte this means that connections are limited to 128 kilobits per second (16000 * 8).
In this example, we changed the delay class of the pool to 3. The delay_parameters option now takes four arguments: the pool number; the overall bandwidth rate; the per-network bandwidth rate and the per-user bandwidth rate. The 4kbit per second limit for users seems a little low. You can increase the per-user limit, but you may find that it's a better idea to change the max value instead, so that the limit only kicks in after (say) 16 kilobytes or so. This will allow small pages to be downloaded as fast as possible, but large pages will be brought down without influencing other users. If you want, you can set the per-user limit to something quite high, or even set it to -1, which effectively means that there is no limit. Limits work from right to left, so if a user is sitting alone in a lab they will be limited by their per-user speed. If this value is undefined, they are limited by their per-network speed, and if that is undefined then they are limited by their overall speed. This means that you can set the per-user limit higher than you would expect: if the lab is not busy then they will get good download rates (since they are only limited by the per-network limit).
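A class-3 delay_parameters line of the kind described here might look like the following sketch. Every value is illustrative only: 500 bytes per second roughly corresponds to the 4 kbit per second per-user figure mentioned above, 16000 to the suggested 16 kilobyte threshold, and the aggregate and per-network figures are arbitrary placeholders.

delay_class 1 3
delay_parameters 1 64000/64000 32000/32000 500/16000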
Again (with time-based acl lists), you can allocate a very small amount of bandwidth to http during working hours, discouraging people from browsing the Web during office hours. By using acls that match specific source IP addresses, you can ensure that sibling caches have full-speed access to your cache. You can prioritize access to a limited set of destination sites by using the dst or dstdomain acl types, inverting the rules we used earlier to slow access to some sites down. You can also combine username/password access-lists and speed-limits. You can, for example, allow users that have not logged into the cache access to the Internet, but at a much slower speed than users who have logged in. Users that are logged in get access to dedicated bandwidth, but are charged for their downloads.
42 Conclusion
Once your acl system is correctly set up, your cache should essentially be ready to become a functional part of your infrastructure. If you are going to use some of the advanced Squid features (like transparent operation mode, for example), you will find them covered in later chapters.
44 Introduction
Squid is particularly good at communicating with other caches and proxies. Numerous inter-cache communication protocols are supported, including ICP (Inter-Cache Protocol), Cache-Digests, HTCP (Hyper-Text Cache Protocol) and CARP (Cache Array Routing Protocol). Each of these protocols has specific strengths and weaknesses; they are more suited to some circumstances than others. In this chapter we look at each of the protocols in detail. We also look at the different ways that you can structure your cache hierarchy, and work through the config options that affect cache hierarchies.
45 Why Peer
The primary function of an inter-cache protocol is to stop object duplication, increasing hit rates. If you have a large network with widely separated caches, you may wish to store objects in each cache even if one of your other caches has them: by keeping objects close to your users, you reduce their network latency (even if you end up "wasting" disk space in the process). Inter-branch traffic can be reduced by placing a cache at each branch. Since caches can avoid duplicating objects between them, each disk you add to a cache adds space to the overall hierarchy, increasing your hierarchy hit-rate. This is a lot better than simply having caches at branches which do not communicate with one another, since with that setup you end up with multiple copies of each cached object, one per server. Clients can also be configured to query another branch's cache if their local one goes down, adding redundancy. If overloaded, a central cache machine can become a network bottleneck. Unlike one cache machine, caches in a hierarchy can be close to all parts of the network; they can also handle a much larger load (with a near-linear increase in performance with each added machine). Loaded caches can thus be replaced with clusters of low-load caches, without wasting disk space. Integrating your caches into a public cache hierarchy can increase your hit rate (since you increase your effective disk space by accessing other machines' object stores). By choosing peers carefully, you can reduce latency, or reduce costs by saving Internet bandwidth (if communicating with your peers is cheaper than going direct to the source). On the other hand, communicating with peers over loaded (or high-latency) lines can slow down your cache. It's best to check your peer response times periodically to see if the peering arrangement is beneficial. You can use the client program to check cache response times, and the cache manager (discussed in Chapter 12) to look at Squid's view of the cache.
46 Peer Configuration
First, let's look at the squid.conf options available for hierarchy configuration. We will then work through the most common hierarchy structures, so that you can see the way that the options are used. You use the cache_peer option to configure the peers that Squid will communicate with; other options are then used to select which peer to pass a request to.
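The cache_peer line that the next paragraph refers to as "the example above" did not survive in this copy of the text; a minimal reconstruction, using the hostname and the default keyword mentioned below, would look like this:

cache_peer cache.domain.example parent 3128 3130 default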
The cache_peer option is split into five fields. The first field (cache.domain.example in the example above) is the hostname or IP address of the cache that is to be queried. The second field indicates the type of relationship, and must be set to parent, sibling or multicast. The third field sets the HTTP port of the destination server, while the fourth sets the ICP (UDP) query port. The fifth field can contain zero or more keywords, although we only use one in the example above; the keyword default specifies that the cache is to be used as the default path to the outside world. If you compiled Squid to support HTCP, your cache will automatically attempt to connect to TCP port 4827 (there is currently no option to change this port value). Cache digests are transferred via the HTTP port specified on the cache_peer line.

Here is a summary of the available cache_peer options:

proxy-only. Data retrieved from this remote cache will not be stored locally, but retrieved again on any subsequent request. By default Squid will store objects it retrieves from other caches: by having the object available locally it can return the object quickly if it's ever requested again. While this is good for latency, it can be a waste of bandwidth, especially if the other cache is on the same piece of ethernet. In the examples section of this chapter, we use this option when load-balancing between two cache servers.

weight. If more than one cache server has an object (based on the result of an ICP query), Squid gets the data from the cache that responded fastest. If you want to prefer one cache over another, you can add a weight value to the preferred cache's config line. Larger values are preferred. Squid times how long each ICP request takes (in milliseconds), and divides the time by the weight value, using the cache with the smallest result. Your weight value should thus not be unreasonably large, or that cache will be chosen almost every time.

ttl. This tag is covered in the multicast section, later in this chapter.

no-query. Squid will normally send ICP requests to all configured caches. The response time is measured, and used to decide which parent to send the HTTP request to. There is another function of these requests: if there is no response to a request, the cache is marked down. If you are communicating with a cache that does not support ICP, you must use the no-query option: if you don't, Squid will consider that cache down, and attempt to go directly to the destination server. (If you want, you can set the ICP port on the config line to point to the echo port, port 7. Squid will then use this port to check if the machine is available. Note that you will have to configure inetd.conf to support the UDP echo port.) This option is normally used in conjunction with the default option.

default. This sets the host to be the proxy of last resort. If no other cache matches a rule (due to acl or domain filtering), this cache is used. If you have only one way of reaching the outside world, and it doesn't support ICP, you can use the default and no-query options to ensure that all queries are passed through it. If this cache is then down, the client will see an error message (without these options, Squid would attempt to route around the problem.)

round-robin. This option must be used on more than one cache_peer line to be useful. Connections to caches configured with this option are spread evenly (round-robined) among them. This can be used by client caches to communicate with a group of loaded parents, so that load is spread evenly. If you have multiple Internet connections, with a parent cache on each side, you can use this option to do some basic load-balancing of the connections.

multicast-responder. This option is covered in the multicast section later in this chapter.

closest-only.

no-netdb-exchange. If your cache was configured to keep ICMP (ping) timing information with the --enable-icmp configure option, your cache will attempt to retrieve the remote machine's ICMP timing information from any peers. If you don't want this to happen (or the remote cache doesn't support it), you can use the no-netdb-exchange option to stop Squid from requesting this information from the cache.

no-delay. Hits from other caches will normally be included in a client's delay-pool information. If you have two caches load-balancing, you don't want the hits from the other cache to be limited. You may also want hits from caches in a nearby hierarchy to come down at full speed, not to be limited as if they were misses. Use the no-delay option to ensure that these requests come down at full speed.

login. Caches can be configured to require usernames and passwords on access. To authenticate with a parent cache, you can enter a username and password using this tag. Note that the HTTP protocol makes authenticating to multiple cache servers impossible: you cannot chain together a string of proxies, each one requiring authentication. You should only use this option if this is a personal proxy.
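To make the combinations above concrete, here is a small sketch (the hostnames are invented for illustration). The first line shows a single non-ICP parent used as the path of last resort; the commented-out pair shows two parents sharing load evenly:

# one parent that does not speak ICP; forwarded requests go through it
cache_peer cache.isp.example parent 3128 3130 default no-query
# alternatively, two load-sharing parents:
#cache_peer parent1.mydomain.example parent 3128 3130 round-robin
#cache_peer parent2.mydomain.example parent 3128 3130 round-robin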
47 Peer Selection
Let's say that you have only one parent cache server: the server at your ISP. In Chapter 4, we configured Squid so that the parent cache server would not be queried for internal hosts, so queries to the internal machines went direct, instead of adding needless load to your parent cache (and the line between you). Squid can use access-control lists to decide which cache to talk to, rather than just the destination domain. With access lists, you can use different caches depending on the source IP, domain, text in the URL and more. The advantages of this flexibility are not immediately obvious (even to me), but some examples are given in the remainder of this chapter. First, however, let's cover filtering by destination domain.
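A minimal sketch of destination-domain selection uses the cache_peer_domain tag (the hostnames below are invented for illustration; the "!" prefix excludes the listed domain, so everything except our own domain goes through the ISP's cache):

cache_peer cache.isp.example parent 3128 3130
# don't send requests for our own domain to the ISP's cache
cache_peer_domain cache.isp.example !.mydomain.example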
valid sites that do contain suspect words in the URL.

Example 8-4. Passing suspect URLs to a filtering cache
acl suspect_url url_regex "/usr/local/squid/etc/suspect-url-list"
acl all src 0.0.0.0/0.0.0.0
cache_peer filtercache.domain.example parent 3128 3130
cache_peer_access filtercache.domain.example allow suspect_url
# all other requests go direct
cache_peer_access filtercache.domain.example deny all
Let's work through the logic that Squid uses in the above example, so that you can work out which cache Squid is going to talk to when you construct your own rules.

First, let's consider a request destined for the web server intranet.mydomain.example. Squid first works through all the always_direct lines; the request is matched by the first (and only) line. The never_direct and always_direct tags are acl-operators, which means that the first match is considered. In this illustration, the matching line instructs Squid to go direct when the acl matches, so all neighboring peers are ignored for this request. If the line used the deny keyword instead of allow, Squid would simply have skipped on to checking the never_direct lines.

Now, the second case: a request arrives for an external host. Squid works through the always_direct lines, and finds that none of them match. The never_direct lines are then checked. The all acl matches the connection, so Squid marks the connection as never to be forwarded directly to the origin server. Squid then works through its list of peers, trying to find the cache that the request is best forwarded to (servers that have the object are more likely to get the request, as are servers that respond fast). The algorithm that Squid uses to decide which of its peers to use is discussed shortly.
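The always_direct and never_direct lines that this walk-through refers to are not present in the surviving text; a minimal sketch of the kind of configuration it assumes (the acl name and domain are guesses for illustration only) would be:

acl intranet dstdomain intranet.mydomain.example
acl all src 0.0.0.0/0.0.0.0
# requests for the internal server bypass all peers
always_direct allow intranet
# everything else must be forwarded through a peer
never_direct allow all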
47.2.6 neighbor_type_domain
You can blur the distinction between parents and siblings with this tag. Let's say that you work for a very large organization, with many regions, some in different countries. These organizations generally have their own network infrastructure: you will install a link to a local regional office, and they will run links to a core backbone. Let's assume that you work for the regional office, and you have an Internet line that your various divisions share. You also have a link to your head office, where they have a large cache, and their own Internet link. You peer with their cache (with them set up as a sibling), and you also peer with your local ISP's server. When you request pages from the outside world, you treat your ISP's cache server as a parent, but when you query web servers in your own domain you want the requests to go to your head office's cache, so that any web sites within your organization are cached there. By using the neighbor_type_domain option, you can specify that requests for your local domain are to be passed to your head office's cache (treating it as a parent for those domains), while all other requests are handled as before. Example 8-7. Changing the Cache Type by Destination Domain
cache_peer core-cache.mydomain.example sibling 3128 3130
cache_peer cache.isp.example parent 3128 3130
neighbor_type_domain core-cache.mydomain.example parent mydomain.example
47.3.1 miss_access
The miss_access tag is an acl-operator. This tag has already been covered in the acls chapter (Chapter 7), but is covered here again for completeness. The miss_access tag allows you to create a list of caches which are only allowed to retrieve hits from your cache. If they request an object that is a miss, Squid will return an error page denying them access. If the example below is not immediately clear, please refer to Chapter 7 for more information. Example 8-8.
acl all src 0.0.0.0/0.0.0.0
acl friendly_company src 10.2.0.3/255.255.255.0
http_access allow friendly_company
icp_access allow friendly_company
# These lines stop the friendly_company machines from fetching misses
# through our cache; they may only retrieve objects we already hold (hits)
miss_access deny friendly_company
miss_access allow all
47.3.2 dead_peer_timeout
If a peer cache has not responded to an ICP request for dead_peer_timeout seconds, the cache will be marked as down, and the object will be retrieved from somewhere else (probably directly from the source).
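The tag simply takes a time value in squid.conf; for example (the value shown is illustrative, not a recommendation):

dead_peer_timeout 10 seconds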
47.3.3 icp_hit_stale
When icp_hit_stale is on, Squid answers ICP queries with a hit even when the object it holds is stale. Turning this option on can therefore cause problems if you peer with anyone: a peer that was promised a hit may end up triggering a miss when the object has to be refreshed, and that miss can then be refused by miss_access.
48 Multicast Cache Communication

cause load on the machine: make sure that the card you buy supports hardware multicast filters). This solution is still not linearly scalable, however, since the reply packets can easily become the bottleneck by themselves.
49 Cache Digests
Cache digests are one of the latest peering developments. Currently they are only supported by Squid, and they have to be turned on at compile time.

Squid keeps its "list" of objects in an in-memory hash. The hash table (which is based on MD5) helps Squid find out if an object is in the cache without using huge amounts of memory or reading files on disk. Periodically Squid takes this table of objects and summarizes it into a small bitmap (suitable for transfer across a modem). If a bit in the map is on, it means that the object is in the store; if it's off, the object is not. This bitmap/summary is available to other caches, which connect on the HTTP port and request a special URL. If the client cache (the one that just collected the bitmap) wants to know if the server has an object, it simply performs the same mathematical function that generated the values in the bitmap. If the server has the object, the appropriate bit in the bitmap will be set.

There are various advantages to this idea. If you have a set of loaded caches, you will find that inter-cache communication can use significant amounts of bandwidth: each request to one cache sparks off a series of requests to all neighboring caches. Each of these queries also causes some server load: the networking stack has to deal with these extra packets, for one thing. With cache digests, however, this load is reduced. The cache digest is generated only once every 10 minutes (the exact value is tunable). The transfer of the digest thus happens fairly seldom, even if the bitmap is rather large (a few hundred kilobytes is common.) If you were to run 10 caches on the same physical network, with each ICP request being a few hundred bytes, the numbers add up. This network load reduction can give your cache time to breathe too, since the kernel will not have to deal with as many small packets.

ICP packets are incredibly simple: they essentially contain only the requested URL. Today, however, a lot of data is transferred in the headers of a request. The contents of a static URL may differ depending on the browser that a user uses, cookie values and more. Since the ICP packet only contains the URL, Squid can only check the URL to see if it has the object, not both the headers and the URL. This can (very occasionally) cause strange problems, with the wrong pages being served. With cache digests, however, the bitmap value depends on both the headers AND the URL, which stops these strange hits of objects that are actually generated on-the-fly (normally these pages contain cgi-bin in their path, but some don't, and cause problems.)

Cache digests can generate a small percentage of false hits: since the list of objects is updated only every 10 minutes, your cache could expire an object a second after you download the summarized index. For the next ten minutes, the client cache would believe your server has data that it doesn't. Some five percent of hits may be false, but they are simply retrieved directly from the origin server if this turns out to be the case.
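Since digest support has to be compiled in, a cache built without it needs to be reconfigured and rebuilt; a minimal sketch of the relevant configure run (the prefix path is only an example) looks like this:

./configure --prefix=/usr/local/squid --enable-cache-digests
make
make install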
With ICP, there is a chance that an object that is hit is dynamically generated (even if the path does not say so). Cache digests fix this problem, which may make their extra bandwidth usage worthwhile.
50.2 Trees
The traditional cache hierarchy structure involves lots of small servers (each with their own disk space, holding the most common objects) which query another set of large parent servers (there can even be only one large server.) These large servers then query the outside world on the client caches' behalf. The large servers keep a copy of the object so that other internal caches requesting the page get it from them. Generally, the little servers have a small amount of disk space, and are connected to the large servers by quite small lines.

This structure generally works well, as long as you can stop the top-level servers from becoming overloaded. If these machines have problems, all performance will suffer. Client caches generally do not talk to one another at all. The parent cache server should have any object that a lower-down cache may have (since it fetched the object on behalf of the lower-down cache). It's invariably faster to communicate with the head office (where the core servers would be situated) than with another region (where another sibling cache is kept). In this case, the smaller servers may as well treat the core servers as default parents, even using the no-query option, to reduce cache latency. If the head office is unreachable, it's quite likely that things will be unusable altogether. (If, on the other hand, your regional offices have their own Internet lines, you can configure the cache as a normal parent: this way Squid will detect that the core servers are down, and try to go direct. If you each have your own Internet link, though, there may not be a reason to use a tree structure; you might want to look at the mesh section instead, which follows shortly.)

To avoid overloading one server, you can use the round-robin option on the cache_peer lines for each core server. This way, the load on each machine should be spread evenly.
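As a sketch of the branch end of such a tree (hostnames invented for illustration), the small servers might carry lines like these, using no-query because the core servers are assumed to always be reachable, and round-robin to spread the load across them:

cache_peer core1.head-office.example parent 3128 3130 no-query round-robin
cache_peer core2.head-office.example parent 3128 3130 no-query round-robin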
50.3 Meshes
Large hierarchies generally use either a tree structure, or they are true meshes. A true mesh considers all machines equal: there is no set of large root machines, mainly since the machines are almost all large. Multicast ICP and cache digests allow large meshes to scale well, but some meshes have been around for a long time, and are still using vanilla ICP. Cache digests seem to be the best option for large mesh setups these days: they involve bulk data transfer, but with plain ICP, as the average mesh size increases, machines have to be more and more powerful to deal with the number of queries coming in. Instead of trying to deal with so many small packets, it is almost certainly better to do a larger transfer every 10 minutes; this way, machines only have to check their local RAM to see which peers have an object. Pure multicast ICP meshes are another alternative: unfortunately there are still many reply packets generated (the replies are unicast), but multicast still effectively halves the number of packets flung around the network.
50.4 Load Balancing Servers

DNS load balancing is the simplest option: in your DNS zone file, you simply add two A records for the cache's hostname (you did use a hostname for the cache when you configured all those thousands of browsers like I told you, right?) The order in which the DNS server returns the addresses is continuously, randomly switched, so a client requesting the lookup will connect to a random server. These server machines can be set up to communicate with one another as peers. By using the proxy-only option, you reduce duplication of objects between the machines, saving disk space (and, hopefully, increasing your hit rate.)

There are other load-balancing options. If you have client caches accessing the overloaded server (rather than client PCs), you can configure Squid on these machines with the round-robin option on the cache_peer lines. You could also use CARP (the Cache Array Routing Protocol) to split the load unevenly: if you have one very powerful machine and two less powerful machines, you can use CARP to load the fast cache twice as much as the other machines.
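A minimal sketch of the peering half of the DNS-balanced pair (hostnames invented for illustration; the matching line, pointing back at cache1, would go in the other machine's squid.conf):

# on cache1.mydomain.example: peer with cache2, but don't keep copies of its objects
cache_peer cache2.mydomain.example sibling 3128 3130 proxy-only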
Now that your cache is integrated into a hierarchy (or is a hierarchy!), we can move on to the next section. Accelerator mode allows your cache to function as a front-end for a real web server, speeding up web page access on old or slow servers. Transparent caches effectively accelerate web servers from a distance (the code to perform both functions is, at least, effectively the same.) If you are going to do transparent proxying, I suggest that you read the next two chapters. If you aren't interested in either of these Squid features, your Squid installation should be up and running. The remainder of the book (Section III) covers cache maintenance and debugging.
53.4 Security
Squid can be placed in front of an insecure web server to protect it from the outside world: not merely to stop unwanted clients from accessing the machine, but also to stop people from exploiting bugs in the server code.
There are a limited number of IP addresses, and they are fast running out. Some systems also have a limited number of IP aliases, which means that you cannot host more than a (fairly arbitrary) number of web sites on a machine. If the client were to pass the destination host name along with the path and filename, the web server could listen on only one IP address, and would find the right destination directories by looking in a simple hostname table.

From version 1.1 on, the HTTP standard supports a special Host header, which is passed along with every outgoing request. This header also makes transparent caching and acceleration easier: by pulling the host value out of the headers, Squid can translate a standard HTTP request into a cache-specific HTTP request, which can then be handled by the standard Squid code. Turning on the httpd_accel_uses_host_header option enables this translation; you will need to use this option when doing transparent caching. It's important to note that acls are checked before this translation. You must combine this option with strict source-address checks, so you cannot use this option to accelerate multiple backend servers (this is certain to change in a later version of Squid).
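Purely for orientation, a minimal sketch of the accelerator tags discussed in this and the surrounding sections (the hostname is an example only; the values will differ per setup):

httpd_accel_host www.mydomain.example
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on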
In the following example, we have changed the config so that the first rule matches (and allows) requests to the machine at IP 10.0.0.5, the accelerated machine. If we did not have the port acl in the rules below, someone could reach other ports on the server with a request that explicitly specifies a non-standard port. Leaving out this rule could let a system cracker poke around the system with requests for things like http://server.mydomain.example:25. Example 9-2. After Accelerator Configuration
# the remote server is at 10.0.0.5, port 80
httpd_accel_host 10.0.0.5
httpd_accel_port 80
acl all src 0.0.0.0/0.0.0.0
acl myNet src 10.0.0.0/255.255.255.0
acl acceleratedHost dst 10.0.0.5
acl acceleratedPort port 80
# requests must be to the right host AND the right port to be allowed:
http_access allow acceleratedHost acceleratedPort
# if they aren't accelerated requests, are they at least from my network?
http_access allow myNet
http_access deny all
56 Example Configurations
Let's cover two example setups: one where you are simply using Squid's accelerator function so that one machine provides both a web server and a cache server on port 80, and one where you are using Squid as an accelerator to speed up a slow machine.
If you know your network inside out, and know exactly who would be accessing a site like this, there is probably no problem with using transparent caching. If this is the case, though, it might be easier to simply change all of your users' settings.

Dialup ISPs generally have little problem implementing transparent caching, since dialup customers almost always get a different IP address whenever they connect. They thus cannot access sites which require a static IP address, so when requests start coming from the cache server there is no problem. ISPs which transparently cache leased-line customers are the most likely to have problems with IP-authenticating servers. If you are phasing transparency in for such an ISP, you must make sure that your customers know all the implications. They must know how to refresh pages (and who to tell if they find out-of-date pages, so that the Squid refresh rules can be changed), and how the source IP address is going to change. You must not simply install the transparent cache and hope for the best!
59 The Transparent Caching Process

You can't simply plug a transparent cache into the network and have it transparently cache pages. The cache server needs to be in a position where it can fake the reply packets (without the real server interrupting the conversation and confusing things): the server needs to be the gateway to the outside world.

Let's look at the simplest transparent cache setup. The client machine (10.0.0.50) treats the cache server's internal interface (10.0.0.1) as its default gateway. This way, all packets arrive on the cache server before they reach the rest of the Internet. The filter looks for port 80 packets and passes them to Squid, but allows all other packets to be passed to the routing layer, which forwards them to the router's IP (172.31.0.2). Once the connection is established, Squid needs to communicate with the client. Squid doesn't do any strange packet assembly: that's left to the transparency layer. When Squid sends reply data to the client, the kernel automatically rewrites the packets' source address, so to the client it appears that the server is simply routing the requests from the outside world. When Squid connects to the remote server, however, the connection comes from the external interface of the cache server (172.31.0.1, in the example). This is where IP authentication breaks: the request comes from the cache's address rather than the client's real address (10.0.0.50).

Effectively, we need to get four things right to get transparency working:

Correct network layout

Filtering out the appropriate packets

Kernel transparency: redirecting port 80 connections to Squid

Squid settings: Squid needs to know that it's supposed to act in transparent mode.
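The filtering and redirection steps are platform specific and are covered in the following sections; purely as an illustration of the idea, here is a sketch using the Linux iptables mechanism (which post-dates the tools of this guide's era; the interface name and Squid's port 3128 are assumptions based on the example network above):

# redirect port 80 traffic arriving from the internal network to Squid on port 3128
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128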
60 Network Layout
For traffic to be filtered, all network traffic needs to pass through a filter device. On smaller networks, the cache server can do the filtering (as it does in the example network above), but many people are now opting for dedicated filter machines. These filter machines can be routers, Unix machines or even so-called layer four switches. Such filtering machines allow for automatic failover (in case of cache failure) and load balancing. At the same time, the CPU load on the cache machine is vastly reduced: it doesn't have to examine every passing packet as well as do the caching. Sometimes, data is load-balanced across multiple Internet lines. You must ensure that all outgoing data is routed through the filter machine: the outgoing packets have to pass through the filter server, so if you are load-balancing outgoing traffic across more than one line, you may have to restructure your network so that packets pass through the filter server before they reach the outside world.
61 Filtering Traffic
Traffic filtering can now be done by numerous devices. A short time ago, only Unix servers (with special modifications) could sort traffic streams by destination port. These days, however, routers, switches and (of course) Unix machines can filter IP traffic. Which device you use to do your filtering depends on your load. For light loads, your cache server can do everything: the filtering, the redirection and the transparent caching. For heavier loads, you may want to use a separate Unix machine, or you may want to get your router to filter the streams for you (only certain routers can do filtering fast, at the hardware level: filtering on other routers will add additional load to their CPUs). You could even get a so-called layer four switch, which can do filtering at gigabit ethernet speeds.
64 Chapter 11. Not Yet Done: Squid Config files and options
3.2: Squid Command Line Options

3.2.1: Help
To get a complete list of Squid's command-line options, with a short description of each option, use the -h option.

3.2.2: HTTP Port
Option: -a
Format: -a port number
Example: squid -a 3128
Squid will normally accept incoming HTTP requests on the port specified in the squid.conf file with the http_port tag. If you wish to override the tag for some reason, you can use the -a option.

3.2.3: Debug Information
Option: -d
Format: -d debug level value
Example: squid -d 3
By default Squid only logs fatal errors to the screen, logging all other errors to the cache.log file. If you wish to log more information (for example debugging information, rather than only errors), the -d option allows you to increase the amount of debug information logged to the screen. If Squid is started from your startup scripts, this output will appear on the console of the machine. If started from a remote login, this output will be written to the screen of your remote session.

3.2.4: Config file
Option: -f
Format: -f path
Example: squid -f /usr/local/etc/squid.conf
This option allows you to specify a different path to the Squid config file. When installing a binary version of Squid, the default path to the squid.conf file may be inappropriate for your system. If you wish to test a different version of the config file, but want to be able to revert to the previous config file in a hurry, you can use this option to refer to a different config file; to change back to the other config file you just have to restart Squid without this option.

3.2.5: Signaling a running Squid
Option: -k
Format: -k action
Example: squid -k rotate
You can communicate with a running copy of Squid by sending it signals. These signals cause Squid to perform maintenance functions, doing things like reloading the config file, rotating the logs (for analysis) and so forth. On some operating systems certain signals are reserved: the threads library on Linux, for example, uses the SIGUSR1 and SIGUSR2 signals for thread communication. Sending the wrong signal to a running Squid is easy, and can have unfortunate consequences. The -k option allows you to use descriptive tags to send a running Squid signals, creating a standardized cross-platform user interface.

Tag: reconfigure
Action: Reloads the squid.conf file.
Description: It's important to note that when Squid re-reads this file it closes all current connections, which means that clients that were downloading files will be cut off mid-download. You should only schedule reloads for after hours, when their impact is minimal.

Tag: rotate
Action: Rotates the cache.log and access.log files
Description: Cache log files get very large. To stop the log files using up all your disk space you should rotate the logs daily. The squid.conf logfile_rotate option sets the maximum number of rotated logs that you wish to keep. The most common use of this action is to rotate the logs just before logfile analysis (see Chapter 10): a crontab entry signals the rotation, sleeps for a short time, and then calls the logfile analysis program.

Tags: shutdown, interrupt
Action: Closes current connections, writes the object index and exits
Description: Squid keeps an index of cache objects in memory. When you wish to shut down Squid you should use this option, rather than simply killing Squid. Shutting down Squid can take a short while, while it writes the object index to disk. Squid writes to the cache.log file while it shuts down, indicating how many objects it has written to the index. Both the shutdown and interrupt tags have the same effect (presumably both exist because there is no kill command on NT).

Tag: kill
Action: Kills the Squid process
Description: The kill tag should only be used if shutdown or interrupt have no effect. Using this tag will kill Squid without giving it a chance to write the cache index file, causing a slow rebuild on the next start.

Tag: debug
Action: Turns on maximum debugging
Description:
At times it is useful to see exactly what the running copy of Squid is doing. Using the debug tag will turn maximum logging on for the main Squid process. The output is very verbose, and with a heavily loaded cache can consume megabytes of disk space, so use this only on a lightly loaded cache, and for short periods of time.

Tag: check
Action: Prints an error message if Squid isn't running
Description: Using this tag sends a kill -0 signal to the running copy of Squid. This doesn't do anything to the running process, other than check that it exists (and that the user running the command has permission to send signals to the process).

3.2.6: Logging to syslog
Option: -s
Format: -s
Example: squid -s
Squid normally logs events and debug information to a special file, normally stored in "/usr/local/squid/logs/cache.log". In some environments you may wish for these events to be logged to a central log server, using syslog. Turning on this flag causes these events to be sent to syslog as well. Logs of client accesses are not sent to syslog; they are stored in the file "/usr/local/squid/logs/access.log".

cache_dir: Squid is designed with the ability to store millions of objects. Given that many operating systems have a limit on file size, it's not feasible for a cross-platform program like Squid to store all objects in one file, though there are patches to allow users to create Squid stores on large files or on raw devices. If you run a news server you will probably have an idea of how slow it is to do a directory listing of a directory with hundreds of thousands of files in it; on almost all filesystems there is a linear slowdown as more files are added to a directory. This rules out the other option, creating unique filenames and storing them all in one directory. Squid therefore uses a hierarchy of directories for file storage. The default setup creates 16 first-tier directories; each of these directories then contains 256 second-tier directories. Files are only stored in the second-tier directories.
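The layout described above corresponds to the last two arguments of the cache_dir tag in squid.conf; a minimal sketch (the path and the 100 Mbyte size are examples only, and on older Squid versions the storage type field "ufs" is omitted):

cache_dir ufs /usr/local/squid/cache 100 16 256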