
CS498 Systems and Networking Lab

Spring 2012

Lab 3: Server Software Architectures


Instructor: Matthew Caesar Due:

In this assignment, you will learn about the internals of some server software. We will be exploring the source code of the most popular webserver, Apache2. Apache is used in nearly every sort of environment and business, from the smallest personal projects to the largest corporations, and hence is extremely extensible and powerful... if you know how to use it. We will go over setting up existing modules and configuring them for use. We will also dig into the source code and modify it to do something cool!

1 FAQ
1. When Apache starts, it will spew an error saying that it cannot determine the fully qualified domain name. You can ignore this error.

2 Initial Setup
This section will help you get your environment set up so you can complete the lab. Please read everything in this section before moving on, as it is very important that everything is set up properly.

2.1 Accessing the Virtual Machine

We have access to a service called vSphere, which will provide us with a number of local Virtual Machines for you to use. Note that you do not HAVE to use these VMs. If you have your own VM set up, or even a personal computer, you may use that. Please be aware, though, that this lab was set up and written using a recent installation of Ubuntu. If you are using a different operating system, we cannot guarantee everything will work correctly.

To access the VM that we have set up for you, please follow these instructions. Speak with the TA (Mark) to get permissions to a VM. (It is possible you have already done this... good work!) If you are on a Windows machine, and you are on campus, you can download the vSphere client from https://csilvcenter.ad.uiuc.edu/. If you are not on a Windows machine, you can still use vSphere: the campus Windows Terminal servers already have vSphere installed. The remote desktop host is ews-windows-ts.ews.illinois.edu. Your username will be UofI\NetID. Be careful with the UofI domain name... it is the only one that will work. (I have no idea why...) If you are using Linux, you can use the rdesktop command from the command line to remote into the Windows terminal server:

    rdesktop ews-windows-ts.ews.illinois.edu

Lab Matthew Caesar

Launch the vSphere client and connect to csil-vcenter.ad.uiuc.edu. Your username and password should be your NetID and AD password. If you see a home screen, click the VMs and Templates icon. This will show you all the VMs you have access to. Start your VM with the green arrow at the top. You can pull up an X-console using the icon that looks like a computer with an arrow pointing out of it. The default user account is cs498class and the password is cs498class. Please change this password immediately! This account has sudo access. From this console, you can use this machine as you would any other. Please keep in mind that vSphere and your VM will only be accessible inside the UIUC network, or via VPN.

Figure 1: vSphere Main Menu.

2.2 Installing Apache from Source

In order to install Apache, and then modify its source code, we will be compiling and installing Apache 2.2 from source. Note: While it is possible to install the Apache package via apt-get (or yum, on Fedora)... do not do so. The packaged distribution does not come with the source code, and trying to install the source version on top of the distribution version is difficult, to say the least. You will save yourself a headache in this lab by taking the time to install from source.

Download the Apache archive onto your virtual machine. The best place to download it is your home directory.

    wget http://apache.mirrors.pelicantech.com/httpd/httpd-2.2.22.tar.gz

Then, unpack the archive, and move into the new directory that is created.

    tar xzvf httpd-2.2.22.tar.gz
    cd httpd-2.2.22

Next, we will set up the compilation process. To do this, we will run a script that loads the proper modules, checks to make sure we have all the dependencies to compile, and then auto-generates the Makefiles.

    ./configure

Then compile. Note that this will take some time.

    sudo make

Once everything is compiled, you need to move the compiled binaries and libraries into their install location. To do that:

    sudo make install

The default directory for installation is /usr/local/apache2/. Lastly, let's start Apache using the default configuration:

    cd /usr/local/apache2/bin
    sudo ./apachectl start

To make sure this worked, we will try to access the Apache server from the outside. First determine your outward-facing IP address using ifconfig. Then, open a web browser and navigate to that IP address. You should see a page load. If you see any error, something is wrong.

Note: The virtual machines provided to us through vSphere have a limitation with regard to their IP addresses. These VMs are only accessible from inside the campus network. That means you need to be either physically on campus or VPNed into campus in order for a web browser to see your Apache server.

3 The Proxy Server


In this section of the lab, we will enable Apache's proxy server module and hook it up to a web browser.

3.1 Proxy Server Background

A proxy server is a web server that sits in between end users and content providers. It allows you to tunnel traffic through it and perform certain actions on the data. Apache can be configured in both a forward and a reverse proxy mode.

An ordinary forward proxy is an intermediate server that sits between the client and the origin server. In order to get content from the origin server, the client sends a request to the proxy naming the origin server as the target, and the proxy then requests the content from the origin server and returns it to the client. The client must be specially configured to use the forward proxy to access other sites. A typical usage of a forward proxy is to provide Internet access to internal clients that are otherwise restricted by a firewall. It may also be used to restrict Internet access to a subset of users: the proxy could request Internet resources only for certain IP addresses on your network, and filter the rest.

A reverse proxy, by contrast, appears to the client just like an ordinary web server. No special configuration on the client is necessary. The client makes ordinary requests for content in the namespace of the reverse proxy. The reverse proxy then decides where to send those requests, and returns the content as if it were itself the origin. A typical usage of a reverse proxy is to give Internet users access to a server that is behind a firewall. Reverse proxies can also be used to balance load among several back-end servers, or to provide caching for a slower back-end server. In addition, reverse proxies can be used simply to bring several servers into the same URL space.

We are going to configure Apache to be a forward proxy. We will then force our web browser to send all requests through this proxy.

Apache handles a proxy request in very much the same way as it handles a content request, with one minor change. This caveat will help us in section 4. Normally, when you access a webserver, it will translate the URL you request (http://somewebsite.com/file.php) into a path to the specific file on its disk (/var/www/mysite/file.php). The request object in Apache stores this path (in its filename field) and uses it to access the content and return it to you. The proxy server, however, does not have a file to retrieve, but another URL. So, to keep with the idea of translating a URL into something to be used later, it creates a virtual filename that signals to the following modules that this is not a file, but a proxy request. A proxy request for http://someothersite.com/newfile.php will be turned into the filename proxy:http://someothersite.com/newfile.php. Notice that this is not a file path, but just the URL prepended with proxy:. This tells later modules to treat it differently from a normal request. This may seem confusing now, but it will make more sense once we get further into the lab.
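The "proxy:" translation described above can be sketched as a small standalone C helper. This is a hypothetical illustration (make_proxy_filename and is_proxy_filename are invented names, not mod_proxy functions); inside Apache the string would be allocated from the request's memory pool, for example with apr_pstrcat, rather than with malloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the "virtual" filename for a proxied URL by
 * prepending "proxy:". The caller must free() the result.
 * Returns NULL if memory cannot be allocated. */
char *make_proxy_filename(const char *url)
{
    static const char prefix[] = "proxy:";
    assert(url != NULL);

    size_t prefix_len = sizeof prefix - 1;   /* 6, without the '\0' */
    size_t url_len = strlen(url);
    char *name = malloc(prefix_len + url_len + 1);
    if (name == NULL)
        return NULL;

    memcpy(name, prefix, prefix_len);
    memcpy(name + prefix_len, url, url_len + 1);  /* copies the '\0' too */
    return name;
}

/* Later stages only need to check the prefix to tell a proxy request
 * apart from an ordinary file path such as /var/www/mysite/file.php. */
int is_proxy_filename(const char *filename)
{
    return filename != NULL && strncmp(filename, "proxy:", 6) == 0;
}
```

The point of the prefix is that no real file path starts with "proxy:", so a single string comparison is enough for any later module to recognise the request.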

3.2 Configure Apache

First we need to set up our Apache installation to use the proxy module. Navigate back to your download of Apache. Remember how we ran the configure script last time to set up the Makefiles and so on to compile Apache? We are going to do that again, except that this time we will also tell the script to prepare the proxy module for compilation.

    ./configure --enable-proxy --enable-proxy-http --enable-proxy-ftp

You do not need to stop and start Apache before reconfiguring. Once you install, you will have to restart the service for any changes to take effect, but you do not need to restart before you configure. This tells the configuration script to enable the core proxy module, and the modules responsible for proxying HTTP and FTP traffic. (We won't be using the FTP one, but it is good to configure it at this time.) Next, recompile and reinstall the binaries.

    sudo make
    sudo make install

Now it is time to actually modify some configuration files in Apache. First, navigate to the installation folder of Apache. By default it is /usr/local/apache2. Open up the file conf/httpd.conf. You will have to use sudo when you open the file, otherwise you will not be able to write your changes. This gets particularly frustrating when you make a lot of changes and then realize you cannot save them.

The first setting we will want to change is the port number. The traditional web server port is 80. Just so we don't get confused, let's change the port to something else. The setting is called Listen, so find the Listen setting and change it from port 80 to port 5678. Note: You can make this whatever port you want, as long as it is not already in use by another protocol. Remember what port you set this to, as we will need it later.

If you are going to be doing a lot of configuration for Apache, a good habit to get into is putting the configuration for different modules into different files. When Apache loads, it loads httpd.conf only, but from this file you can include other configuration files. This makes management easier. At the bottom of the file, but before the SSL module lines, add the following line:

    Include conf/modules/proxy.conf

Save and close httpd.conf. Next, let's create the configuration file for the proxy module. Create the directory /usr/local/apache2/conf/modules if it does not already exist, and inside that folder, create proxy.conf. First we will go over some config commands, and then put them together to create a configuration file for the proxy module.

As a side note, we will be referring to the word directive in the next couple of sections. This is just Apache's word for a setting: each line in the conf files is called a directive.

    <IfModule mod_proxy.c>
    </IfModule>

This is a directive asking whether the module is even loaded. If it is, Apache applies the directives inside. We will want to put all of our proxy directives inside this block.

    ProxyRequests On

This tells the proxy module to proxy each request to its destination. With this directive set to On, it acts as a forward proxy. If it is set to Off, forward proxying is disabled (reverse proxying configured with ProxyPass still works).

    <Proxy *>
    </Proxy>

This directive is the wrapper for the proxy access settings. It says that the following access requirements apply to all (*) proxied requests.

    Order deny,allow
    Allow from all

These are the two access directives we will set. The first says: process all of the deny rules first, then process all of the allow rules. This directive is not unique to the proxy, and appears in many Apache config files. The second says we will allow all IP addresses to access our proxy. This can be as restrictive as you want it to be.
all is a keyword used here to describe the IP address range 0.0.0.0/0.0.0.0. You may restrict your proxy to only accept requests from a certain range of IP addresses (192.168.0.0/255.255.0.0), or even just one specific IP, if you so choose. Put all of these directives together, and we should get a conf file that looks like this:

    <IfModule mod_proxy.c>
        ProxyRequests On
        <Proxy *>
            Order deny,allow
            Allow from all
        </Proxy>
    </IfModule>

Note: There are more directives that mod_proxy can use. The following website lists them all: http://httpd.apache.org/docs/2.2/mod/mod_proxy.html

That is it, your proxy server is ready to use. Restart Apache using the following command:

    sudo /usr/local/apache2/bin/apachectl restart

Note: Whenever you make a change to a configuration file, you must restart the process in order for those changes to be read in and take effect.
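To see why 0.0.0.0/0.0.0.0 means "everyone", here is a sketch of the network/netmask comparison in C. It is a simplified illustration, not the code Apache runs (in 2.2 the Order/Allow/Deny directives come from mod_authz_host, which also accepts hostnames and CIDR notation); ipv4 and ip_matches are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the dotted quad a.b.c.d into a 32-bit integer (host byte order). */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a < 256 && b < 256 && c < 256 && d < 256);
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* A client matches a network/netmask rule when the address bits selected
 * by the mask agree with the network's bits. A mask of 0.0.0.0 selects no
 * bits, so every address matches; that is what "Allow from all" amounts to. */
int ip_matches(uint32_t client, uint32_t network, uint32_t netmask)
{
    return (client & netmask) == (network & netmask);
}
```

With the mask 255.255.0.0, only the first two octets are compared, so 192.168.12.34 matches 192.168.0.0 while 10.0.0.1 does not; a mask of 255.255.255.255 narrows the rule to a single IP.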

3.3 Configure Web Browser

Next, let's make your browser push all of its requests through your proxy server. In this section, we will use Firefox, as it is the easiest browser in which to set up a proxy server. Open up Firefox either on the VM itself, or on your local machine. If you open Firefox on a local machine, it needs to be on the campus network in order to connect to the proxy. In the Options menu, go to the Advanced tab, then the smaller Network tab, and then click on the Connection Settings button.

Figure 2: Firefox Options Menu.

Click the radio button that says Manual proxy configuration. We will enter information for the HTTP Proxy. The HTTP Proxy: field will be the IP address of your virtual machine, and the Port will be whatever port you made your proxy server Listen on. Save your settings and get out of the options. Navigate to www.google.com and see what happens. Did Google come up? (Note: Firefox may do some weird caching things, and not use the proxy immediately. If this happens, try closing Firefox and reopening it, then refreshing the page.) If everything was set up correctly, Google should have loaded as normal.

Let's test to make sure the proxy is actually doing its work. Go back to your virtual machine and enter the following command:

    sudo /usr/local/apache2/bin/apachectl stop

Then go back to Firefox and try to load Google again. What do you see? If you see something different this time, you know your proxy server was working.

4 Reverse URL
In this section, we will modify some source code of mod_proxy to reverse the URL of each request it gets. While this is not terribly useful in the real world, it is a very funny prank you could pull on someone by pointing their web browser at your modified proxy. (Hint: It is not recommended that you do this to anyone you don't know, or to someone who won't get a good laugh out of it.)

4.1 Background source file information

First, let's go over some basics of how Apache works on the inside. The Apache web server is an event-driven system. That means there really is no single main function from which all of the logic stems. Some basic things are loaded at startup, but for the most part, the code just sits there, waiting to be called by an event. The functions called this way are callback functions. A module registers specific functions to be called when an event happens. For example, the proxy module registers a function for the ap_hook_handler event, which is triggered when a request is ready to be processed. There are other events for pre_read_request, post_read_request, pre_config, post_config, and so on.

There is also the situation where more than one module might want to take action on a specific event. Apache allows this, and even gives a module writer a little control over the order in which it happens. When you register for an event, you can specify which modules should be run before your module, and which modules need to be run after yours. For example, if mod_rewrite is enabled, it needs to run first on the translate_name event, before mod_proxy can convert the URI into its proxied version. We will go into more detail on where and how events are registered below.

Each module also specifies a number of directives that can be used. You remember directives, right? They are the settings used in the *.conf files. During the initial load of a module, those directives are declared. Later, the config files are read in (generating pre_config and post_config events), and those directives are used to set internal settings for the module. Later in the lab, we will be creating our own directive and attaching it to mod_proxy.

Lastly, let's go over some basic information on the source files we will be modifying. There are only two we will be modifying, although there are a few others to be aware of. Inside the root folder of the source code (~/httpd-2.2.22/), we will find the following files:

modules/proxy/mod_proxy.h: This file is the mod_proxy header. It declares each function, as well as a few structures that we will modify later.

modules/proxy/mod_proxy.c: This file is the meat of mod_proxy. It contains all of the event registrations as well as all of the callback functions.

modules/proxy/mod_proxy_http.c: This file handles all of the HTTP traffic for mod_proxy. Once mod_proxy finishes preparing a request to be proxied, if the request uses HTTP, it is handed on to this file.

modules.c: This file is automatically generated when you run ./configure. It loads each module that Apache needs.

server/main.c: This file is the entry point for Apache. It binds the address and opens the sockets, as well as handling the connection pool.

include/httpd.h: This is where the request object definition is. This is a good reference for when we start working with this object.
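The callback-and-ordering model described at the start of this section can be sketched as a toy dispatcher in plain C. Everything in it is hypothetical (register_hook, run_after and the other names are invented for illustration). The real Apache 2.2 registration calls have the form ap_hook_<event>(callback, predecessors, successors, order), where predecessors and successors are NULL-terminated lists of module names and order is a coarse position such as APR_HOOK_FIRST. This sketch keeps only a single "run after" constraint and the "stop at the first callback that does not decline" dispatch used for content handlers; other events, post_read_request among them, run every registered callback instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy return codes; Apache itself uses OK (0) and DECLINED (-1). */
#define HOOK_OK        0
#define HOOK_DECLINED -1
#define MAX_HOOKS      8

typedef int (*hook_fn)(const char *url);

struct hook {
    const char *module;     /* module that registered the callback */
    const char *run_after;  /* module that must run first, or NULL */
    hook_fn fn;             /* the callback itself */
};

static struct hook hooks[MAX_HOOKS];
static size_t n_hooks;

/* Called once per module at startup, like an ap_hook_* registration. */
void register_hook(const char *module, const char *run_after, hook_fn fn)
{
    assert(n_hooks < MAX_HOOKS);
    hooks[n_hooks].module = module;
    hooks[n_hooks].run_after = run_after;
    hooks[n_hooks].fn = fn;
    n_hooks++;
}

/* 1 if some not-yet-placed hook belongs to `module`. */
static int is_waiting_for(const char *module, const int *placed)
{
    for (size_t i = 0; i < n_hooks; i++)
        if (!placed[i] && strcmp(hooks[i].module, module) == 0)
            return 1;
    return 0;
}

/* Fill order[] so each hook comes after the module named in run_after
 * (a small topological sort). Returns how many hooks were placed;
 * fewer than n_hooks means the constraints form a cycle. */
size_t sort_hooks(size_t *order)
{
    int placed[MAX_HOOKS] = {0};
    size_t count = 0;
    int progress = 1;

    while (count < n_hooks && progress) {
        progress = 0;
        for (size_t i = 0; i < n_hooks; i++) {
            if (placed[i])
                continue;
            const char *dep = hooks[i].run_after;
            if (dep == NULL || !is_waiting_for(dep, placed)) {
                order[count++] = i;
                placed[i] = 1;
                progress = 1;
            }
        }
    }
    return count;
}

/* Run callbacks in order until one handles the URL. Returns the index of
 * the hook that handled it, or -1 if every callback declined. */
int run_until_handled(const char *url)
{
    size_t order[MAX_HOOKS];
    size_t count = sort_hooks(order);
    for (size_t i = 0; i < count; i++)
        if (hooks[order[i]].fn(url) != HOOK_DECLINED)
            return (int)order[i];
    return -1;
}

/* Two example callbacks: one never handles anything, one handles URLs
 * that start with "http://". */
int decline_all(const char *url)
{
    (void)url;
    return HOOK_DECLINED;
}

int handle_http(const char *url)
{
    return strncmp(url, "http://", 7) == 0 ? HOOK_OK : HOOK_DECLINED;
}
```

For example, registering handle_http for "mod_proxy.c" with run_after set to "mod_rewrite.c", and then decline_all for "mod_rewrite.c", still runs the mod_rewrite callback first even though it was registered second.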

4.2 The Basic Hack

OK, so let's write the basic proxy hack. Note that this section will have a little less guidance, which will give you a chance to explore this code more on your own. Start by opening the file modules/proxy/mod_proxy.c and find the function register_hooks. This is where all of the events are registered. We will want to do our work in the post_read_request event. This event is where mod_proxy changes the URL of the request into a filename, prepending proxy: to the URL. This event happens before any other processing of the request, and is an ideal place to reverse the URL. Using register_hooks, find the callback function for the ap_hook_post_read_request event. That is where we will reverse the URL. A few hints:

1. In this function, you will want to reverse the URL before it gets prepended with proxy:.
2. Each URL will have a scheme, which we do not want to reverse; otherwise we will get errors we don't want. The only schemes you have to worry about are http and https. (For example, http://google.com should be changed to http://moc.elgoog.)
3. Some URLs will have a trailing /. If this character is reversed, it will mess up the scheme. (Turning http://google.com/ into http:///moc.elgoog is bad.)
4. If you need a reference for all of the member variables of request_rec, see the definition file, include/httpd.h.

Other than these little caveats, it should be a simple string reversal algorithm. Once you are finished writing your reversal code, recompile, install, and restart your server.

    cd ~/httpd-2.2.22
    sudo make
    sudo make install
    cd /usr/local/apache2/bin
    sudo ./apachectl restart

Then go to your web browser (which should still be configured to use your proxy), load a page, and see what happens!
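If you want to test the string manipulation on its own before touching mod_proxy, here is one way to write the reversal as a standalone C function. It is a sketch under stated assumptions: reverse_url is a hypothetical name, it recognises only the http:// and https:// schemes and a single trailing slash (per the hints above), and it writes into a caller-supplied buffer. Inside the post_read_request callback you would apply the same logic to the request's URL before proxy: is prepended, allocating the result from the request pool instead of using a fixed buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `in` into `out` (capacity outlen bytes) with
 * everything after the scheme reversed. Only "http://" and "https://"
 * are recognised as schemes, and a single trailing '/' stays at the end
 * so it cannot land next to the scheme:
 *   http://google.com/  ->  http://moc.elgoog/
 * Returns 0 on success, or -1 if `out` is too small. */
int reverse_url(const char *in, char *out, size_t outlen)
{
    assert(in != NULL && out != NULL);

    size_t len = strlen(in);
    if (len + 1 > outlen)
        return -1;

    /* Length of the scheme prefix that must not be reversed. */
    size_t scheme = 0;
    if (strncmp(in, "http://", 7) == 0)
        scheme = 7;
    else if (strncmp(in, "https://", 8) == 0)
        scheme = 8;

    /* Exclude one trailing slash from the reversed region. */
    size_t end = len;
    if (end > scheme && in[end - 1] == '/')
        end--;

    memcpy(out, in, scheme);                    /* scheme unchanged */
    for (size_t i = scheme; i < end; i++)       /* reverse the middle */
        out[i] = in[end - 1 - (i - scheme)];
    memcpy(out + end, in + end, len - end);     /* trailing '/', if any */
    out[len] = '\0';
    return 0;
}
```

Note that everything after the scheme is reversed, including any path, so http://google.com/search becomes http://hcraes/moc.elgoog.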

4.3 Make it user friendly

So this is a fun little hack... but it isn't exactly user friendly. Currently, the URL will be reversed for every request whenever the proxy module is loaded. This means that anyone who downloads your new proxy module will have to manually dig into the source code if they want to turn this feature off. Let's give an Apache administrator the ability to turn ReverseURL on and off via the configuration file. What we are going to do is create a new directive for mod_proxy called ProxyReverseURL, which will be either On or Off. Our reversal code will then only execute when it is On.


4.3.1 Proxy Server Conf

Because of how Apache is designed, it is very difficult to pass state between one function and another. The event-driven nature of Apache makes it impossible to know which callback will be invoked next. So module state is stored in a configuration structure that all of the module's callbacks can look up. We will use this configuration structure in a few different places. Let's go through them one by one.

First, we need to add a variable to the struct to keep track of the ReverseURL state. Open up modules/proxy/mod_proxy.h. This is where the config struct is located. It is called proxy_server_conf. At the bottom of the struct, add an int variable to keep track of the reverseurl state. Make sure you add it to the bottom of the struct: there are a few functions that use this struct which require the order of the other variables to not change.

Next, we need to set the default value of this state. Open modules/proxy/mod_proxy.c again and find the create_proxy_config function. This function initializes all of the default data in the configuration struct. At the bottom, initialize your new state variable to 0.

The last basic thing we need to do is have our reversal code check this state variable before actually executing. So find your reversal code and first check whether your new state variable is set to 1. The configuration struct should already be looked up in that function, so you should just have to figure out what the variable is called to access the state.

4.3.2 Directives

So there is one last piece to this puzzle. Currently, the reverse URL code only runs when conf->reverseurl is set to 1, but it defaults to 0, and there is no easy way to change that. That is where this part comes in. Remember from above we talked about directives. Those are the statements inside configuration files that Apache loads. We are going to create a new directive just for mod_proxy that turns reverseurl on and off. This will involve two steps.

First, we need to actually declare the directive. Find the variable proxy_cmds[]. This is a table that holds every directive for mod_proxy. We want to add a new AP_INIT_FLAG directive. This is a directive type that takes only one argument, On or Off. AP_INIT_FLAG takes 5 arguments. The first argument is the name of your directive; it is what is typed into a configuration file. The second argument is the callback function that is called whenever the directive is encountered in a file. For our purposes, the third argument should be set to NULL and the fourth argument should be set to RSRC_CONF. The last argument is a description of what the directive does, and is just a string literal. Use the other AP_INIT_FLAG entries as a guideline to set up your own directive.

And finally, we will need to create the callback function to be called whenever we see our new directive (remember the function name you put as argument 2 in the declaration?). So create a new function with that name. Use the function set_proxy_error_override as an example. Once this is all done, recompile your code and reinstall, as described above.

4.3.3 Modify the config file

Finally, we need to add our new directive to the config file. Open up the proxy.conf config file we created earlier. Just after the ProxyRequests On directive, add your new directive and set it to On. Save the file and restart your Apache server. You should see your URLs reversed in your browser. Now go back to the config file, set ProxyReverseURL to Off, and restart Apache. You should see normal proxy operation resume.
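The pieces from 4.3.1 to 4.3.3 fit together roughly as follows. Assuming a callback named set_proxy_reverse_url (a hypothetical name; use whatever you chose), the entry in proxy_cmds[] would look something like AP_INIT_FLAG("ProxyReverseURL", set_proxy_reverse_url, NULL, RSRC_CONF, "Reverse the URL of each proxied request"), and the proxy.conf line is simply ProxyReverseURL On. Because that code only compiles inside the httpd tree, the block below models the same flow in standalone C with invented toy_* names: a config struct with the new field at the bottom, a default of 0, a flag callback, and the guard around the reversal code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for proxy_server_conf. The new member
 * is appended last so that existing members keep their order. */
struct toy_server_conf {
    int req;          /* stands in for the existing members */
    int reverseurl;   /* new: 1 when ProxyReverseURL is On */
};

/* Plays the role of create_proxy_config: give every field its default. */
struct toy_server_conf *toy_create_config(void)
{
    struct toy_server_conf *conf = calloc(1, sizeof *conf);
    if (conf != NULL)
        conf->reverseurl = 0;   /* feature is off unless configured */
    return conf;
}

/* Same shape as a FLAG directive callback: flag is 1 for On, 0 for Off,
 * and returning NULL means "no error". In mod_proxy the real callback
 * would take (cmd_parms *parms, void *dummy, int flag) and fetch the
 * server config with ap_get_module_config, as set_proxy_error_override does. */
const char *toy_set_reverse_url(struct toy_server_conf *conf, int flag)
{
    conf->reverseurl = flag;
    return NULL;
}

/* Case-insensitive string equality, so "on" and "ON" also work. */
static int equals_nocase(const char *a, const char *b)
{
    while (*a && *b && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Roughly what the core does before invoking a FLAG callback: anything
 * other than On or Off is rejected with an error message. */
const char *toy_handle_flag(struct toy_server_conf *conf, const char *arg)
{
    assert(conf != NULL && arg != NULL);
    if (equals_nocase(arg, "On"))
        return toy_set_reverse_url(conf, 1);
    if (equals_nocase(arg, "Off"))
        return toy_set_reverse_url(conf, 0);
    return "ProxyReverseURL must be On or Off";
}

/* The guard around the reversal code: do nothing unless enabled. */
int toy_should_reverse(const struct toy_server_conf *conf)
{
    return conf->reverseurl == 1;
}
```

Because the default is set when the config is created, a server whose proxy.conf never mentions ProxyReverseURL behaves exactly like an unmodified proxy.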


5 The Report
Please submit a report with the following information, and also answer the following questions in your report.

1. Copy your proxy.conf file.
2. Copy the logic for reversing a URL (just the code that actually switches the address).
3. Copy the declaration of your ReverseURL directive.
4. Copy the callback function for your ReverseURL directive.

Please answer the following questions:

1. In what function did you put your ReverseURL logic?
2. With your proxy server turned on, and ReverseURL turned on, go to moc.elgoog.www. What happens?
3. From the question above (after going to moc.elgoog.www), what happens if you do a search? Would there be a way to make this action work?
4. Look in the source code for this question: What does request_rec->server store? How about request_rec->connection? request_rec->filename?
5. Using what you learned in lecture about server architecture, what is the purpose of apr_pool_t *pglobal? (This variable is found in server/main.c.)
6. As we described above, the majority of Apache's logic comes from loaded modules. Even the basic HTTP server requires quite a few modules in order to serve up the simplest of content. For a basic (HTTP) webpage, what module is responsible for processing such a request? What function is called to generate a request_rec object from a connection (conn_rec) object?
7. Refer to Appendix A for this question. Let's say we want to write a module that directs a request to a specific content directory (i.e. a folder on the webserver that holds webpages) based on the connection's IP prefix. (I truly have no idea WHY we would want to do this... but we do.) Which hook(s) would you put your logic in and why?
8. Trace an HTTP GET request through Apache. To do this, first find the function call used to parse an HTTP GET request. Then, list the set of functions, in order, that are used to process the HTTP GET until a response is returned to the client.
9. Look in the code for the location where the select() system call is used. What purpose does the select() system call serve here?
10. Why does Apache use both threads and the select() system call? Describe how select() and threads both achieve the same objective. Then describe why, even though they accomplish the same objective, they are still both used in this code.

Common Hooks

These are a few common hooks used by Apache modules, listed in the relative order in which they run.

ap_hook_pre_config - Last chance to do anything before reading in the conf files.
ap_hook_post_config - First chance to do anything with the configuration after reading in the conf files.
ap_hook_process_connection - Process anything from the socket connection to the client.
ap_hook_pre_read_request - Last chance to do anything before accepting the request object.
ap_hook_post_read_request - First chance to look at the request after accepting it.
ap_hook_translate_name - An opportunity to modify the URI of a request in any way.
ap_hook_map_to_storage - Mapping the URI to an actual file on storage.
ap_hook_fixups - Last chance to look at the request before content generation.
ap_hook_handler - Content generation.
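One way to read this list is as a fixed pipeline that each request walks through. The small C table below is a hypothetical model of that ordering (request_phases, phase_index and runs_before are invented names, not Apache APIs), which can be handy when reasoning about which hook sees a request first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The per-request hooks from the list above, in the order they run.
 * pre_config/post_config run at startup and process_connection once per
 * connection, so they are not part of this per-request table. Real Apache
 * also runs further hooks (access and authentication checks, among others)
 * between map_to_storage and fixups. */
static const char *const request_phases[] = {
    "pre_read_request",
    "post_read_request",
    "translate_name",
    "map_to_storage",
    "fixups",
    "handler",
};

#define N_PHASES (sizeof request_phases / sizeof request_phases[0])

/* Position of a hook in the per-request pipeline, or -1 if not listed. */
int phase_index(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < N_PHASES; i++)
        if (strcmp(request_phases[i], name) == 0)
            return (int)i;
    return -1;
}

/* 1 if a callback on hook `a` sees a request before one on hook `b`. */
int runs_before(const char *a, const char *b)
{
    int ia = phase_index(a);
    int ib = phase_index(b);
    return ia >= 0 && ib >= 0 && ia < ib;
}
```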
