Transparent Proxying with Squid

Squid is often used as a reverse proxy to spare web servers from repeated requests for the same content. But Squid can also be used to spare you from interminable delays when requesting content. Deployed as a local caching proxy, Squid can reduce your site’s bandwidth consumption and make browsing more responsive. Learn how it works, and start saving time and money today.
If you’ve ever looked into setting up a web cache for your office or campus, you’re probably familiar with Squid. Available from http://www.squid-cache.org, Squid is a full-featured web proxy cache licensed under the GNU General Public License (GPL).
Squid is often used as a reverse proxy, caching and serving a site’s web pages to reduce demands on its web server (see http://www.linux-mag.com/2003-08/lamp_01.html for more details). But Squid is equally effective as a forward proxy: by caching popular content (keeping copies of remote web pages on a local server), Squid can reduce bandwidth consumption and serve cached web pages faster.
Installed as a proxy, Squid can collect statistics about browsing behavior, can prevent users from browsing specific web sites and limit web access to only authorized users, and can even filter specific information from web requests. Squid supports SNMP monitoring, proxying for SSL, and proxying and caching of both HTTP and FTP.
Squid also supports Cisco’s Web Cache Communication Protocol (WCCP) and transparent proxying, a proxy scheme that is literally invisible to users. (For a brief description of WCCP, see the sidebar “Cisco’s WCCP.”) Unlike traditional proxying, where each web browser must be individually configured to use a proxy server (a system administrator’s nightmare), transparent proxying uses a firewall or router to intercept web requests and redirect them to the proxy. With transparent proxying (also called transparent caching), proxying is discreet and easier to maintain. For example, visitors with laptops can simply plug in to your network without adjusting their browser settings.

Taming the Squid

Let’s install Squid and configure the network to use Squid as a transparent proxy. The sample setup uses a Cisco router running WCCP to perform the TCP interception, and Fedora Core 3 running the latest 2.6.10 kernel. (Kernel 2.6.10 added WCCP support to the ip_gre module, which simplifies configuration. If you cannot use kernel 2.6.10 or newer, you have two options: either patch your ip_gre module to support WCCP, or download and compile the separate ip_wccp module.) The machine you install Squid on must also have packet filtering, connection tracking, iptables support, full network address translation (NAT), and REDIRECT target support enabled in the kernel. Fast switching must be disabled, and you’ll need to make sure IP forwarding is enabled.
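If you’re not sure whether your running kernel has these features, one quick sanity check is to grep the kernel configuration file that Fedora installs under /boot. The option names below are the usual ones for a 2.6-era kernel, but treat them as a rough guide rather than an exhaustive list:
$ grep -E "CONFIG_IP_NF_CONNTRACK|CONFIG_IP_NF_IPTABLES|CONFIG_IP_NF_NAT|CONFIG_IP_NF_TARGET_REDIRECT|CONFIG_NET_IPGRE" \
/boot/config-$(uname -r)
Each option should show up set to y or m; anything built as a module can be loaded later with modprobe.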
You can check this by running the command:
$ cat /proc/sys/net/ipv4/ip_forward
If that command returns 0, enable IP forwarding by adding net.ipv4.ip_forward=1 to /etc/sysctl.conf. Since that change won’t take effect until you reboot, you can enable the feature immediately (though not persistently) by running:
# echo 1 > /proc/sys/net/ipv4/ip_forward
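Alternatively, once net.ipv4.ip_forward=1 is in /etc/sysctl.conf, you can apply the setting right away, without rebooting, by reloading that file:
# sysctl -p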
Once you’ve verified the prerequisites, you’re ready to install Squid. As of this writing, the latest stable version of Squid is Squid-2.5.STABLE7. This version has a security vulnerability that affects WCCP. If Squid-2.5.STABLE8 hasn’t been released by the time you read this, you’ll need to download the patch from http://www.squid-cache.org/Versions/v2/2.5/bugs/squid-2.5.STABLE7-wccp_buffer_overflow.patch and patch your source tree before continuing.
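The exact patch invocation depends on where you saved the file and how its paths are written; run from the top of the unpacked source tree, something along these lines usually does the trick (try -p0 if -p1 complains about missing files):
$ cd squid-2.5.STABLE7
$ patch -p1 < ../squid-2.5.STABLE7-wccp_buffer_overflow.patch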
After downloading and possibly patching the code, you can build Squid. Squid has a wide variety of build options, and you should research all of them carefully, since many can greatly impact both security and performance. The options shown here are the minimum for building Squid as a transparent proxy using WCCP.
To compile Squid, run:
$ ./configure --enable-linux-netfilter --enable-wccp && make
Next, run make install as root.
With Squid installed, you can configure it to suit your needs. Edit the squid.conf file, which is located in /usr/local/squid/etc/ by default.
(The squid.conf file is heavily commented and contains a ton of useful information. Read the entire file when you have time.)
For transparent proxying to work, ensure that the following lines are present:
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
Minimally, you’ll also need to adjust the http_access directives to allow traffic from your IP addresses. Depending on your distribution, you may also need to create a Linux user and group based on your cache_effective_user and cache_effective_group directives.
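As a rough sketch, a minimal access-control setup might look like the following; the 192.168.1.0/24 network and the squid user and group are placeholders for your own values:
acl our_network src 192.168.1.0/255.255.255.0
http_access allow our_network
http_access deny all
cache_effective_user squid
cache_effective_group squid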
Once you’re happy with your configuration, run squid -z to initialize the cache directories. Then start Squid by running the included RunCache script. By default, Squid listens on port 3128. If you’ve changed that default, remember which port you’ve chosen, as you’ll need that information in the next step.
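Assuming the default installation prefix of /usr/local/squid (adjust the paths if you passed a different --prefix to configure), initializing the cache and starting Squid looks roughly like this:
# /usr/local/squid/sbin/squid -z
# /usr/local/squid/bin/RunCache &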

Playing Traffic Cop

With Squid up and running, you now need to redirect traffic destined for port 80 to Squid running on port 3128. (While you can configure Squid to listen on port 80 directly, this can cause problems, including endless loops when Squid tries to contact itself.) Use an iptables rule to redirect the traffic instead.
To set up the rule, you’ll need to know which interface the requests to be proxied arrive on (for example, eth0) and the port Squid is listening on. Once you have this information, run the following command:
# iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128
Of course, you’ll also need to add this command to the appropriate init script so that the rule is recreated on subsequent reboots.
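How you do that depends on your distribution; on Fedora, one option (assuming the stock iptables init script is in use) is simply to save the running rules after adding the redirect:
# service iptables save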
Now that traffic is redirected to Squid, it’s time to set up a GRE tunnel between the router and the machine that’s running Squid. To do this, use the following two commands:
# iptunnel add gre1 mode gre remote \
ip-address-of-router local ip-address-of-squid-cache \
dev eth0
# ifconfig gre1 up
The first command creates a GRE tunnel between the router and the machine running Squid via eth0, while the second brings the gre1 interface up so that it can serve as the local endpoint of the tunnel.
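If your system ships the newer ip utility from iproute2 instead of the older iptunnel command, the equivalent commands would be something like:
# ip tunnel add gre1 mode gre remote ip-address-of-router \
local ip-address-of-squid-cache dev eth0
# ip link set gre1 up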
With that, Squid is configured. The next step is configuring your router. The sample setup uses WCCP on a Cisco 2600 router, a very common device for a small or branch office. (If you don’t have a Cisco router, many other devices, including a Linux machine acting as a router, support WCCP. At a minimum, verify that whatever router you’re running supports WCCP version 1.)
To cache all outgoing web requests, run the following commands on the router:
Cisco> enable
Cisco# config t
Cisco(config)# ip wccp version 1
Cisco(config)# ip wccp web-cache
Cisco(config)# int your-outgoing-interface
Cisco(config-if)# ip wccp web-cache redirect out
Cisco(config-if)# end
Cisco# write mem
(The directions for most Cisco products running a 12.x IOS image should be similar, but you may need to consult your Cisco documentation.) Optionally, you can also use an access-list or a route-map to control which client IP addresses get cached.
For example, to cache only requests coming from a particular network block, use the following:
Cisco> enable
Cisco# config t
Cisco(config)# ip wccp version 1
Cisco(config)# ip wccp web-cache redirect-list 150
Cisco(config)# access-list 150 permit tcp your-network-block wildcard-mask any
Cisco(config)# access-list 150 deny tcp any any
Cisco(config)# int your-outgoing-interface
Cisco(config-if)# ip wccp web-cache redirect out
Cisco(config-if)# end
Cisco# write mem

Browse Away

You should now be able to browse the web from a properly configured IP address and have the requests proxied transparently. You can verify the proxy by looking at Squid’s access.log file.
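With a default source install, the log lives under the Squid prefix; a quick way to watch requests as they arrive (the path below is the default and may differ on your system):
# tail -f /usr/local/squid/var/logs/access.log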
If you experience problems, here are a few troubleshooting tips.
* First, make sure that the proper Linux kernel modules are loaded using lsmod. You should see ip_gre, ipt_REDIRECT, iptable_nat, and the other modules listed in the prerequisites. Remember that if you are not using kernel 2.6.10 or newer, you must either patch ip_gre or load the ip_wccp module that you downloaded and compiled.
* Next, verify the iptables redirect rule with iptables -t nat -L. Also make sure that another iptables rule isn’t blocking access.
* You can verify that your GRE tunnel is set up using iptunnel. You should see output similar to:
gre0: gre/ip remote any local any ttl inherit nopmtudisc
gre1: gre/ip remote "router-ip" local "cache-ip" dev eth0 ttl inherit
* Finally, using ifconfig, you can verify that the gre1 interface is up. On the router, you can verify that Squid has registered itself using show ip wccp. You should see something like the following:
Cisco#sho ip wccp
Global WCCP information:
    Router information:
        Router Identifier:              router-ip
        Protocol Version:               1.0

    Service Identifier: web-cache
        Number of Cache Engines:        1
        Number of routers:              1
        Total Packets Redirected:       1074
        Redirect access-list:           150
        Total Packets Denied Redirect:  12837308
        Total Packets Unassigned:       29235
        Group access-list:              -none-
        Total Messages Denied to Group: 0
        Total Authentication failures:  0
You can get more detailed information by typing sho ip wccp web-cache detail. You can go one step further and actually debug WCCP events and WCCP packets in real-time by typing debug ip wccp events or debug ip wccp packets, respectively.

Loads of Cache

You should now have Squid running as a transparent proxy. While Squid has a lot of features built right in, you may find some third-party add-ons and utilities useful.
If you’d like to create comprehensive text and HTML reports for a variety of categories from peak usage to proxy efficiency and bandwidth savings, check out Calamaris, written by Cord Beermann. Written in Perl and licensed under the GPL, Calamaris takes the Squid access log as an input and outputs a report. Calamaris is available from http://cord.de/tools/squid/calamaris/Welcome.html.en.
If you’d prefer dynamic graphs to text reports, squid-rrd, available from http://www.squid-cache.org/~wessels/squid-rrd, may be for you. squid-rrd consists of a poller written in Perl that collects statistics from Squid and stores them in an RRD database. From there, a CGI script does the graphing.
While Squid’s built-in access controls are extremely powerful, if you find yourself looking to do something they can’t accomplish, take a look at squidGuard. Available from http://www.squidguard.org and licensed under the GPL, squidGuard is a combined filter, redirector, and access controller plug-in. It allows you to define different time spaces, group sources, group destinations, rewrite or redirect URLs, use regular expressions, and much more.
You can find a complete list of utilities on the Squid site at http://www.squid-cache.org/related-software.html.
Using Squid as a transparent proxy has the potential to save you both time and money. It improves your users’ web browsing experience by increasing perceived speed. It decreases bandwidth usage, allows you to put access controls on what sites can be viewed, and allows you to better monitor your users’ browsing habits and usage patterns.
As long as you weigh those benefits against the potential for increased help desk calls caused by misbehaving browsers and web applications, you’ll find that in many situations a transparent proxy can enhance your network quite nicely.

Jeremy Garcia is the founder and administrator of LinuxQuestions.org, a free, friendly, and active Linux community. Please send questions and feedback to jeremy@linuxquestions.org.
