
Foundations of the Web:

HTTP, CGI and Cookies

Ethan Cerami
New York University

10/17/08 HTTP, CGI and Cookies 1


Road Map
 HTTP Overview
 Example HTTP Session
 HTTP 1.0 v. 1.1
 Structure of Client Requests/Server
Responses
 CGI Overview
 Cookies Overview



HTTP Overview



HTTP Overview
 HTTP: HyperText Transfer Protocol
  Developed by Tim Berners-Lee, 1990
 Enables web clients to request
documents from web servers
 Stateless Protocol
 each HTTP request is completely
independent.
 Web Servers do not retain any memory of
related requests.
 (Cookies are actually used to maintain
state, but more on this later…)



HTTP Client/Server
 Client/Server Architecture
 Client: web browser that requests a
document.
 Examples: Microsoft Internet Explorer,
Netscape Navigator
 Server: web server that returns a
document
 Examples: Apache Web Server, Microsoft
IIS, etc.



HTTP Client/Server

Client (Web Browser)  --- Give me /index.html --->  Web Server
Client (Web Browser)  <--- Here you go... --------  Web Server



HTTP via Telnet
 You can run HTTP via the UNIX
Telnet command.
 Instructions
 Log into your UNIX account
 telnet www.yahoo.com 80
 GET /
 Good method to learn the details
of HTTP



Sample Telnet Session
bash-2.04$ telnet www.yahoo.com 80
Trying 216.32.74.50...
Connected to www.yahoo.akadns.net.
Escape character is '^]'.
GET /
HTTP/1.0 200 OK
Content-Length: 15582
Content-Type: text/html

<html><head><title>Yahoo!</title><base href=http://www.yahoo.com/><meta
http-equiv="PICS-Label" content='(PICS-1.1
"http://www.rsac.org/ratingsv01.html" l gen true for
"http://www.yahoo.com" r (n 0 s 0 v 0 l))'></head><body><center><form
action=http://search.yahoo.com/bin/search><map name=m><area
coords="0,0,52,52" href=r/a1><area coords="53,0,121,52"
href=r/p1><area coords="122,0,191,52" href=r/m1><area
...
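
The same exchange can be scripted instead of typed into Telnet. The following is a minimal sketch, assuming only the Python standard library; `build_request` and `fetch` are hypothetical helper names, and the host is whichever server you want to probe.

```python
import socket


def build_request(host, path="/"):
    """Return the bytes you would otherwise type into the Telnet session."""
    # An HTTP/1.0 request: request line, one Host header, then a blank line.
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def fetch(host, port=80, path="/"):
    """Open a TCP connection, send the request, and return the raw reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("www.yahoo.com")` would return the status line, the headers, and the document as raw bytes, much like the session above.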



Example HTTP Session



Example HTTP Session
 Client requests the following URL:
http://hypothetical.ora.com:80/
 Anatomy of the Request:
 http:// HyperText Transfer Protocol;
other options: ftp, mailto, etc.
 hypothetical.ora.com: host name
  :80: Port number. 80 is the default port
for HTTP. Ports range from 1 to 65,535.
 / Root document



The Client Request
 Actual Browser Request:
GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap,
image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE
5.01; Windows NT)
Host: hypothetical.ora.com
Connection: Keep-Alive



Anatomy of the Client
Request
 GET / HTTP/1.1
 Requests the root / document.
 Specifies HTTP version 1.1.
 HTTP Versions: 1.0 and 1.1 (more on this
later…)
 Accept: image/gif, image/x-xbitmap,
image/jpeg, image/pjpeg, */*
 Indicates what type of media the browser
will accept.



Anatomy of the Client
Request
 Accept-Language: en-us
 Browser’s preferred language
 Accept-Encoding: gzip, deflate
 Accepts compressed data (speeds
download times.)
 User-Agent: Mozilla/4.0
(compatible; MSIE 5.01; Windows
NT)
 Indicates the browser type.



Anatomy of the Client
Request
 Host: hypothetical.ora.com
 Required for HTTP 1.1
 Optional for HTTP 1.0
 A Server may host multiple
hostnames. Hence, the browser
indicates the host name here.
 Connection: Keep-Alive
 Enables “persistent connections”.
Faster performance (more later…)
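
To make the structure of the request concrete, here is a rough sketch of how a server might split it into a request line and headers. `parse_request` is a hypothetical helper, not part of any library, and the Host check simply mirrors the rule stated above.

```python
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\n"
    "Accept-Language: en-us\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)\r\n"
    "Host: hypothetical.ora.com\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw request into method, target, version, and a header dict."""
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    if version == "HTTP/1.1" and "host" not in headers:
        raise ValueError("HTTP/1.1 requests must include a Host header")
    return method, target, version, headers
```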



Server Response
HTTP/1.1 200 OK
Date: Wed, 24 Sep 2003 20:54:26 GMT
Server: Apache/1.3.6 (Unix)
Last-Modified: Wed, 24 Sep 2003 14:06:11 GMT
Content-length: 327
Connection: close
Content-type: text/html

<title>Sample Homepage</title>
<img src="/images/oreilly_mast.gif">
<h1>Welcome</h1>Hi there, this is a simple web page.
Granted, it may not be as elegant as some other web
pages you've seen on the net, but there are some
common qualities...



Anatomy of Server
Response
 HTTP/1.1 200 OK
 Server Status Code
 Code 200: Document was found
 We will examine other status codes
shortly.
  Date: Wed, 24 Sep 2003 20:54:26
GMT
 Date on the server.
 GMT (Greenwich Mean Time)
Anatomy of Server
Response
  Last-Modified: Wed, 24 Sep 2003
14:06:11 GMT
 Indicates the time when the
document was last modified.
 Very useful for browser caching.
 If a browser already has the page in
its cache, it may not need to request
the whole document again (more
later…)



Anatomy of Server
Response
 Content-length: 327
 Number of bytes in the document
response.
 Connection: close
 Indicates that the server will close the
connection.
 If the client wants to send another
request, it will need to open another
connection to the server.



Anatomy of Server
Response
 Content-type: text/html
 Indicates the MIME Type of the return
document.
 Multi-Purpose Internet Mail Extensions
 Enables web servers to return binary or text
files.
 Other MIME Categories:
 audio, video, images, xml
 Full list of MIME Types available online at:
http://www.iana.org/assignments/media-types/



Anatomy of Server
Response
The actual HTML document:
<title>Sample Homepage</title>
<img src="/images/oreilly_mast.gif">
<h1>Welcome</h1>Hi there, this is a
simple web page. Granted, it may not
be as elegant as some other web pages
you've seen on the net, but there are
some common qualities…
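
The response has the same overall shape as the request: a first line, headers, a blank line, and then the document. A small sketch of pulling those pieces apart follows; `parse_response` is a hypothetical helper and the sample response is shortened for illustration.

```python
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache/1.3.6 (Unix)\r\n"
    "Connection: close\r\n"
    "Content-type: text/html\r\n"
    "\r\n"
    "<title>Sample Homepage</title>\n"
    "<h1>Welcome</h1>Hi there, this is a simple web page.\n"
)


def parse_response(raw):
    """Return (version, status code, reason, headers, body) from a raw response."""
    head, _, body = raw.partition("\r\n\r\n")  # the blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    pieces = status_line.split(" ", 2)
    version, code = pieces[0], int(pieces[1])
    reason = pieces[2] if len(pieces) > 2 else ""  # the status message is optional
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers, body
```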



HTTP 1.0 v. 1.1



Getting Images
 Once a browser receives an HTML
page, it makes separate
connections to retrieve the
images.
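Client (Web Browser)  --- Give me /index.html ------>  Web Server
Client (Web Browser)  <--- Here you go... -----------  Web Server
Client (Web Browser)  --- Now, give me logo.gif ---->  Web Server
Client (Web Browser)  <--- Here you go... -----------  Web Server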



HTTP 1.0 v. 1.1
 HTTP 1.0:
 For each request, you must open a
new connection with the server.
  HTTP 1.1:
 For each request, the default action is
to maintain an open connection with
the server.
 Faster, Persistent Connections
 Supported by most browsers and
servers.
Example: HTTP 1.0 v. 1.1
 HTTP 1.0: Get HTML Page plus Images
 Open Connection: GET /index.html
 Open Connection: GET /logo.gif
 Open Connection: GET /button.gif
 HTTP 1.1: Get HTML Page plus Images
 Open Persistent Connection: GET
/index.html
 GET /logo.gif
 GET /button.gif
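
The HTTP 1.1 case can be tried out locally. The sketch below is an assumption-laden demo, not a benchmark: it starts a throwaway server on the loopback interface, fetches three documents over one `http.client` connection, and counts how many client sockets the server saw. `CountingHandler` and `fetch_over_one_connection` are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    seen_connections = set()       # (ip, port) of every client socket used

    def do_GET(self):
        self.seen_connections.add(self.client_address)
        body = f"contents of {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_over_one_connection(paths):
    """Fetch every path over a single persistent connection to a local server."""
    CountingHandler.seen_connections.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    statuses = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()  # drain the body so the socket can be reused
            statuses.append(response.status)
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return statuses, len(CountingHandler.seen_connections)
```

With HTTP 1.0 behavior, each of the three requests would have arrived on its own connection.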



Structure of Client
Requests



Client Requests
 Every client request includes three
parts:
  Request line: The method (type of
request), the name of the requested
document, and the HTTP version.
 Header Information: Used to specify
browser version, language, etc.
 Entity Body: Used to specify form
data for POST requests.



Client Methods
 GET:
 This is the same GET that we
discussed for HTML forms.
 POST:
 This is the same POST method that
we discussed for HTML forms.
 Data is sent in the entity portion of
the HTTP request.



One More Client Method
 HEAD:
 Similar to GET, except that the method
requests only the header information.
 Server will return date-modified, but will not
return the data portion of the requested
document.
 Useful for browser caching.
 For example:
  If the browser contains a cached version of a page, it
issues a HEAD request.
 If document has not been modified recently, use
cached version.
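
The caching check described above can be sketched as follows. This is a simplified illustration, assuming the browser remembers when it stored its copy; `head_last_modified` and `cached_copy_is_fresh` are hypothetical helpers built on the standard `http.client` and `email.utils` modules.

```python
import http.client
from email.utils import parsedate_to_datetime


def head_last_modified(host, path="/"):
    """Issue a HEAD request and return the Last-Modified header (or None)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)  # headers only, no document body
        return conn.getresponse().getheader("Last-Modified")
    finally:
        conn.close()


def cached_copy_is_fresh(last_modified, cached_at):
    """True if the document has not changed since the copy was cached."""
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(cached_at)
```

For instance, a page last modified at `Wed, 24 Sep 2003 14:06:11 GMT` and cached at `Wed, 24 Sep 2003 20:54:26 GMT` would be served from the cache.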



Structure of Server
Responses



Server Responses
 Every server response includes
three parts:
 Response line: HTTP version number,
three digit status code, and status
message.
  Headers: Information about the server and the returned document
 Entity Body: The actual data.



Server Status Codes
 100-199 Informational
 200-299 Client Request
Successful
 300-399 Client Request
Redirected
 400-499 Client Request
Incomplete
 500-599 Server Errors
Some Important Status
Codes
 200: OK
 Request was successful.
 301: Moved Permanently
 Server redirects client to a new URL.
 404 Not Found
 Document does not exist
 500 Internal Server Error
 Error within the Web Server
  All other status codes are available
online at:
http://www.w3.org/Protocols/HTTP/HTRESP.html
Common Gateway
Interface
CGI Overview



Common Gateway
Interface
 What is CGI?
 A general framework for creating server
side web applications.
 Instead of returning a static web document,
web server returns the results of a program.
 For example
 browser sends the parameter: name=Ethan.
 Web server passes the request to a Perl program.
 Perl Program returns HTML that says, Hello,
Ethan!



CGI Overview

Web Browser  --- Name=Ethan ----->  Web Server  --- Name=Ethan ----->  C/Perl Program
Web Browser  <--- Hello, Ethan! --  Web Server  <--- Hello, Ethan! --  C/Perl Program



Notes on CGI
 The first mechanism for creating
dynamic web sites.
 What languages can you create
CGI programs in?
 Just about any language: C/C++,
Perl, Java, etc.



CGI Environment Variables
 CGI includes a number of environment
variables.
 REMOTE_ADDR: Address of client browser
 SERVER_NAME: The Server Host Name or IP
Address
 SERVER_SOFTWARE: Name and version of
the server software.
  QUERY_STRING: The URL-encoded form variables
of a GET request (POST data arrives on standard
input instead).
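
As a sketch of how a CGI program might use these variables, here is the "Hello, Ethan!" example written in Python rather than Perl. The `render` function is a hypothetical helper; a real CGI program would simply print its result to standard output, as the last two lines do.

```python
#!/usr/bin/env python3
import html
import os
from urllib.parse import parse_qs


def render(environ):
    """Build the CGI output (header, blank line, HTML) from environment variables."""
    params = parse_qs(environ.get("QUERY_STRING", ""))  # e.g. "name=Ethan"
    name = params.get("name", ["World"])[0]
    client = environ.get("REMOTE_ADDR", "unknown")
    body = (
        f"<p>Hello, {html.escape(name)}!</p>\n"
        f"<p>Your address: {html.escape(client)}</p>\n"
    )
    return "Content-type: text/html\n\n" + body


if __name__ == "__main__":
    print(render(os.environ), end="")
```

Escaping the name with `html.escape` keeps user-supplied text from being interpreted as markup.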



Hello, World CGI
#!/usr/bin/perl
# The header line and a blank line must come before the document itself.
print "Content-type: text/html\n\n";
print "Hello, World!\n";



From CGI to Servlets…
 That’s all you are going to cover on CGI?
 Yes, CGI still represents a good way to
create dynamic web applications.
 Nonetheless, Servlets represent a more
powerful architecture…
 If you want to get more information on
CGI, check out: CGI Programming with
Perl (O’Reilly Press.)



Cookies Overview



What is a Cookie?
 Small piece of data generated by a web
server, stored on the client’s hard drive.
 Serves as an add-on to the HTTP
specification (remember, HTTP by itself is
stateless.)
 Still somewhat controversial, as it enables
web sites to track web users and their
habits…



Example Cookie Use
 Web Site Acme.com wants to track the number of
unique visitors who access its site.
 If Acme.com checks the HTTP Server logs, it can
determine the number of “hits”, but cannot determine
the number of unique visitors.*
 That’s because HTTP is stateless. It retains no
memory regarding individual users.
 Cookies provide a mechanism to solve this problem.

* Actually, you could check the log files for IP addresses, but
you would still have the problem of Internet proxies.



Tracking Unique Visitors
 Step 1: Person A requests home page for acme.com
 Step 2: Acme.com Web Server generates a new
unique ID.
 Step 3: Server returns home page plus a cookie set
to the unique ID.
 Step 4: Each time Person A returns to acme.com,
the browser automatically sends the cookie along
with the GET request.
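
The four steps can be sketched from the server's side as follows. This is only an illustration under stated assumptions: the cookie name `visitor_id`, the `handle_request` helper, and the in-memory set of known visitors are all hypothetical, and cookie parsing is delegated to the standard `http.cookies` module.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def handle_request(cookie_header, known_visitors):
    """Return (visitor id, Set-Cookie value or None) for one incoming request.

    cookie_header is the value of the request's Cookie header, or None.
    """
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)
    if COOKIE_NAME in cookie and cookie[COOKIE_NAME].value in known_visitors:
        # Step 4: a returning visitor, the browser sent the cookie back.
        return cookie[COOKIE_NAME].value, None
    visitor_id = uuid.uuid4().hex            # Step 2: generate a new unique ID
    known_visitors.add(visitor_id)
    return visitor_id, f"{COOKIE_NAME}={visitor_id}; path=/"  # Step 3


visitors = set()
first_id, set_cookie = handle_request(None, visitors)            # first visit
second_id, nothing = handle_request(f"{COOKIE_NAME}={first_id}", visitors)
print(len(visitors), "unique visitor(s)")
```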



Cookie Conversation
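Browser  --- Give me the home page! ------------------->  Server
Browser  <--- Here’s the home page plus a cookie. ------  Server
Browser  --- Now, give me the news page --------------->  Server
             (cookie is sent automatically)
Browser  <--- I’ve seen you before… Here’s the news page.  Server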



Cookie Notes
 Created in 1994 for Netscape 1.1
 Cookies cannot be larger than 4K
 No domain (e.g. netscape.com,
microsoft.com) can have more than 20
cookies.
 Cookies stay on your machine until:
 they automatically expire
 they are explicitly deleted
 Cookies work the same on all browsers. No
cross-browser problems here!



Magical Cookies
 The term cookie comes from an old
programming hack, called Magical
Cookies.
 If a programmer couldn’t make two
parts of a program communicate, he
would create a “magical cookie”, a small
text file containing data to transfer
between program parts.



Cookie Standards
 Version 0 (Netscape):
 The original cookie specification
 Implemented by all browsers and servers
 We will focus on this Version
 Version 1
 A proposed standard of the Internet Engineering
Task Force (IETF)
 Request for Comment 2109
 Unfortunately, not very widely used (hence, we will
stick to Version 0.)



Why use Cookies?
 Tracking unique visitors
 Creating personalized web sites
 Shopping Carts
 Tracking users across your site:
 e.g. do users that visit your sports news
page also visit your sports store?



Cookie Anatomy



Cookie Anatomy
 Version 0 specifies six cookie parts:
 Name
 Value
 Domain
 Path
 Expires
 Secure



Cookie Parts: Name/Value
  Name
  Name of your cookie (Required)
  Cannot contain white spaces, semicolons
or commas.
  Value
  Value of your cookie (Required)
  Cannot contain white spaces, semicolons
or commas.



Cookie Parts: Domain
 Only pages from the domain which created a cookie are allowed
to read the cookie.
 For example, amazon.com cannot read yahoo.com’s cookies
(imagine the security flaws if this were otherwise!)
 By default, the domain is set to the full domain of the web server
that served the web page.
 For example, myserver.mydomain.com would automatically
set the domain to .myserver.mydomain.com



Cookie Parts: Domain
 Note that domains are always prepended with a dot.
 This is a security precaution: all domains must
have at least two periods.
  You can, however, set a higher-level domain.
 For example, myserver.mydomain.com can set
the domain to .mydomain.com. This way
hisserver.mydomain.com and
herserver.mydomain.com can all access the same
cookies.
 No matter what, you cannot set a domain other than
your own.
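
The domain rules above amount to a suffix comparison. The following is a simplified sketch of that check, not the exact algorithm any particular browser uses; `domain_matches` and `may_set_domain` are hypothetical helpers.

```python
def domain_matches(host, cookie_domain):
    """True if a cookie with this dotted domain would be sent to `host`."""
    # Prepending a dot to the host lets ".mydomain.com" match both
    # "mydomain.com" and "hisserver.mydomain.com", but not "notmydomain.com".
    return ("." + host.lower()).endswith(cookie_domain.lower())


def may_set_domain(server_host, cookie_domain):
    """True if `server_host` is allowed to set a cookie for `cookie_domain`."""
    if not cookie_domain.startswith("."):
        return False                      # domains are prepended with a dot
    if cookie_domain.count(".") < 2:
        return False                      # at least two periods, so no ".com"
    return domain_matches(server_host, cookie_domain)  # never another site's domain
```

Under these rules `myserver.mydomain.com` may set `.mydomain.com`, which `hisserver.mydomain.com` and `herserver.mydomain.com` will then match as well.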



Cookie Parts: Path
 Restricts cookie usage within the site.
 By default, the path is set to the path of the
page that created the cookie.
 Example: user requests page from
mymall.com/storea. By default, cookie will
only be returned to pages for or under
/storea.
 If you specify the path to / the cookie will be
returned to all pages (a common practice.)
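
Path matching can be sketched the same way. `path_matches` is a hypothetical helper that follows the "for or under" wording above; real browsers may differ in edge cases.

```python
def path_matches(request_path, cookie_path):
    """True if a cookie restricted to `cookie_path` is sent for `request_path`."""
    if cookie_path == "/":
        return True                       # "/" covers every page on the site
    cookie_path = cookie_path.rstrip("/")
    # Match the page itself or anything underneath it.
    return request_path == cookie_path or request_path.startswith(cookie_path + "/")
```

So a cookie created under `/storea` goes back with requests for `/storea/shoes.html` but not for `/storeb/index.html`.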



Cookie Parts: Expires
 Specifies when the cookie will expire.
 Specified in Greenwich Mean Time (GMT):
  Wdy, DD-Mon-YYYY HH:MM:SS GMT
 If you leave this value blank, browser will
delete the cookie when the user exits the
browser.
  This is known as a session cookie, as opposed to
a persistent cookie.



Cookie Parts: Secure
  The secure flag is designed to ensure that
cookies are encrypted while in transit.
 A secure cookie will only be sent over a
secure connection (such as SSL.)
 In other words, if a cookie is set to
secure, and you only connect via a non-
secure connection, the cookie will not
be sent.
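
Putting the six parts together, a server could assemble a Set-Cookie header roughly as follows. This is a minimal sketch: `format_set_cookie` is a hypothetical helper, the attribute order is an arbitrary choice, and the date layout follows the GMT format used for Expires.

```python
from datetime import datetime, timezone

FORBIDDEN = set(" \t\r\n;,")  # white space, semicolons, and commas


def format_set_cookie(name, value, domain=None, path=None, expires=None, secure=False):
    """Build a Version 0 style Set-Cookie header line from the six cookie parts."""
    for label, text in (("name", name), ("value", value)):
        if not text or any(ch in FORBIDDEN for ch in text):
            raise ValueError(f"cookie {label} is empty or contains a forbidden character")
    parts = [f"{name}={value}"]
    if expires is not None:
        gmt = expires.astimezone(timezone.utc)  # Expires is always given in GMT
        parts.append("expires=" + gmt.strftime("%a, %d-%b-%Y %H:%M:%S GMT"))
    if path is not None:
        parts.append(f"path={path}")
    if domain is not None:
        parts.append(f"domain={domain}")
    if secure:
        parts.append("secure")  # only send over a secure connection
    return "Set-Cookie: " + "; ".join(parts)


header = format_set_cookie(
    "visitor", "abc123",
    domain=".mydomain.com", path="/",
    expires=datetime(2038, 1, 17, 19, 14, 7, tzinfo=timezone.utc),
    secure=True,
)
print(header)
```

Leaving out `expires` produces a session cookie, which the browser discards when the user exits.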



Example Cookie from Google
HTTP/1.1 200 OK
Cache-control: private
Content-Type: text/html
Set-Cookie: PREF=ID=11cebd117082ef7a:TM=1074966051:LM=1074966051:S=CgHQLEJ57-U9oRXn;
    expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Content-Encoding: gzip
Server: GWS/2.1
Content-length: 1216
Date: Sat, 24 Jan 2004 17:40:51 GMT


Example from Amazon.com
HTTP/1.1 302
Date: Sat, 24 Jan 2004 17:58:29 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix)
    amarewrite/0.1 mod_fastcgi/2.2.12
Set-Cookie: session-id-time=1075536000; path=/;
    domain=.amazon.com; expires=Saturday, 31-Jan-2004
    08:00:00 GMT
Set-Cookie: session-id=103-0070896-9210277; path=/;
    domain=.amazon.com; expires=Saturday, 31-Jan-2004
    08:00:00 GMT
Transfer-Encoding: chunked
Content-Type: text/html
