In their simplest form, web caches store temporary copies of web objects. They are
designed primarily to improve the accessibility and availability of this type of data to end
users. Caching is not an alternative to increased connectivity, but instead optimises the
usage of available bandwidth.
• Local caches are the most common type; they sit on the edge of the LAN just
before the Internet connection. All outbound web requests are directed through
them in an effort to fulfil web requests locally before passing traffic over the
Internet connection.
• ISP caches are used on the networks of most Internet Service Providers
(ISPs). They provide customers with improved performance and conserve
bandwidth on their own external connections to the Internet.
• Reverse caches are used to reduce the workload of content providers' web
servers. The cache is positioned between the web server and its Internet
connection, so that when a remote user requests a web page, the request must
first pass through the cache before reaching the web server. If the cache has a
stored copy of the requested item, it delivers it directly rather than passing the
request through to the web server.
• Less bandwidth used – if content is cached locally on the LAN, web requests do
not consume Internet connection bandwidth.
• Wider benefits – caching benefits not only the individual end user but also
content providers, ISPs and other users of the same infrastructure, all of whom gain
from the reduced bandwidth usage.
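The reverse-cache behaviour described above (serve a stored copy if one exists, otherwise pass the request through to the web server) can be sketched in Java as follows. This is a minimal illustration, not a production proxy: the origin server is modelled as a plain function from URL to page bytes, there is no size limit or expiry, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal reverse-cache sketch: requests reach the cache before the web server.
// The origin server is modelled as a Function from URL to page bytes (an
// assumption for illustration; it returns null when the page does not exist).
class ReverseCache {
    private final Map<String, byte[]> store = new HashMap<>();
    private final Function<String, byte[]> origin;
    private long originRequests = 0;       // requests passed through to the web server
    private long bytesServedFromCache = 0; // bytes delivered without contacting the server

    ReverseCache(Function<String, byte[]> origin) {
        this.origin = origin;
    }

    byte[] get(String url) {
        byte[] cached = store.get(url);
        if (cached != null) {
            // Hit: deliver the stored copy directly.
            bytesServedFromCache += cached.length;
            return cached;
        }
        // Miss: pass the request through to the web server and keep a copy.
        originRequests++;
        byte[] body = origin.apply(url);
        if (body != null) {
            store.put(url, body);
        }
        return body;
    }

    long originRequests() { return originRequests; }

    long bytesServedFromCache() { return bytesServedFromCache; }
}
```

Repeated requests for the same URL reach the origin only once; `originRequests()` therefore tracks the workload the web server still has to carry.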
The response time of a WWW service often plays an important role in its success
or demise. From a user's perspective, the response time is the time elapsed from
when a request is initiated at a client to the time that the response is fully loaded
by the client. This paper presents a framework for accurately measuring the
client-perceived response time in a WWW service. Our framework provides
feedback to the service provider and eliminates the uncertainties that are
common in existing methods. This feedback can be used to determine whether
performance expectations are met, and whether additional resources (e.g. more
powerful server or better network connection) are needed. The framework can
also be used when a consolidator provides Web hosting service, in which case the
framework provides quantitative measures to verify the consolidator's compliance
with a specified Service Level Agreement. Our approach assumes the existing
infrastructure of the Internet with its current technologies and protocols. No
modification is necessary to existing browsers or servers, and we accommodate
intermediate proxies that cache documents. The only requirement is to
instrument the documents to be measured, which can be done automatically
using a tool we provide.
The number of servers and the amount of information available in the World Wide Web
have been growing exponentially in the last five years. The use of the World Wide Web as an
information retrieval mechanism has also become popular. As a consequence, popular
Web servers have been receiving an increasing number of requests. Some servers
receive up to 100 million requests daily, which amounts to more than one request per
millisecond on average. Thus, in order for a Web server to be able to respond at such a
rate, it should reduce the overhead of handling the requests to a minimum.
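The "more than one request per millisecond" figure follows from the number of milliseconds in a day:

$$
\frac{100\,000\,000\ \text{requests}}{24 \times 60 \times 60 \times 1000\ \text{ms}}
= \frac{10^{8}}{8.64 \times 10^{7}}\ \frac{\text{requests}}{\text{ms}}
\approx 1.16\ \frac{\text{requests}}{\text{ms}}
$$

That is, on average one request arrives roughly every 0.86 ms.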
Currently, the greatest fraction of server latency for document requests (excluding the
execution of CGI scripts) comes from disk accesses. When there is a request for a
document at a Web server, the server makes one or more file system calls to open, read
and close the requested file. These file system calls result in disk accesses, and when the
file is not on the local disk, file transfers through a network.
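Without a server cache, every request for a static document pays for these file system calls. The sketch below shows that uncached path in Java; the class name and the document-root parameter are assumptions for illustration, not part of the project code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical uncached request path: every call opens, reads and closes the file.
class DiskPageReader {
    private final Path documentRoot;
    private int diskReads = 0; // number of times the file system was accessed

    DiskPageReader(Path documentRoot) {
        this.documentRoot = documentRoot;
    }

    byte[] read(String pageName) {
        Path file = documentRoot.resolve(pageName);
        diskReads++;
        // try-with-resources performs the open, read and close calls in turn.
        try (InputStream in = Files.newInputStream(file)) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int diskReads() { return diskReads; }
}
```

Reading the same page twice costs two full disk accesses here, which is exactly the overhead a main-memory cache is meant to remove.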
Hence, it is worthwhile to cache files in main memory so as to reduce accesses to
the local and remote disks, since RAM is faster than magnetic disk by several orders of
magnitude. This idea has already been used in some software (for example, the Harvest
httpd accelerator), where the mechanism has been called main memory caching or document
caching. In this project we refer to it as server caching. Server caching might appear to
have less impact on the quality of Web applications than client caching (or proxy caching),
which aims at reducing network delay by caching remote files. This does seem to be true in
traditional networks, where the retrieval time of a document is dominated by transfer time
over low-bandwidth interconnections. However, even in such a situation, a significant
portion of the requests to a Web server may come from local users at academic institutions
or large companies. These clients are typically connected to the server through
high-bandwidth LANs (e.g. FDDI or ATM), so the retrieval time is likely to be dominated by
the server's latency. In the near future, with the deployment of ATM WANs, the information
retrieval time is also likely to be dominated by the latency at the server.
While client caching is characterized by relatively low hit rates (from 20% to 40%),
server caching can achieve very high hit rates because of a particularity of Web traffic:
a small number of a Web server's documents dominates client requests. It has been shown,
by analyzing request traces of several Web servers, that even a small amount of main
memory (512 Kbytes) can hold a significant portion (60%) of the documents required.
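Whether a small memory budget covers a large share of requests depends on how skewed document popularity is. The sketch below uses made-up request counts and sizes, not the traces referred to above, and a simple greedy rule chosen for illustration: it places the most-requested documents into a fixed budget and reports the fraction of all requests those documents would serve.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Request count and size of one document in a (hypothetical) server trace.
class DocStats {
    final String name;
    final long sizeBytes;
    final long requests;

    DocStats(String name, long sizeBytes, long requests) {
        this.name = name;
        this.sizeBytes = sizeBytes;
        this.requests = requests;
    }
}

class CoverageEstimate {
    // Fills a memory budget with the most-requested documents first and returns
    // the fraction of all requests that those in-memory documents would serve.
    static double coveredFraction(List<DocStats> docs, long budgetBytes) {
        List<DocStats> byPopularity = new ArrayList<>(docs);
        byPopularity.sort(Comparator.comparingLong((DocStats d) -> d.requests).reversed());
        long used = 0;
        long covered = 0;
        long total = 0;
        for (DocStats d : byPopularity) {
            total += d.requests;
            if (used + d.sizeBytes <= budgetBytes) { // skip documents that no longer fit
                used += d.sizeBytes;
                covered += d.requests;
            }
        }
        return total == 0 ? 0.0 : (double) covered / total;
    }
}
```

With three documents of 100 KB (600 requests), 200 KB (300 requests) and 400 KB (100 requests) and a 512 KB budget, the first two fit and cover 900 of 1000 requests, i.e. 90%.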
In existing systems there is no dedicated cache at the server; most caches are
maintained at the proxies themselves. Moreover, where a server cache is present, existing
systems use the Least Recently Used (LRU) page replacement algorithm, whose throughput
is low compared with that of our caching policy.
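For comparison with the policy proposed later, LRU replacement is commonly implemented in Java with an access-ordered `LinkedHashMap`. This sketch limits the cache by number of entries rather than by bytes, which is a simplifying assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache: an access-ordered LinkedHashMap evicts its least recently used entry.
class LruPageCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruPageCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put(); returning true drops the least recently used entry.
        return size() > maxEntries;
    }
}
```

Note that LRU looks only at recency: a page requested hundreds of times can still be evicted if it was not touched recently, which is the behaviour the LFU policy below is meant to avoid.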
SYSTEM DESIGN
FLOW CHART FOR PROPOSED SYSTEM
[Figure: flow chart of the proposed system. Its boxes read: check for the page in the cache; if not found, check for the page in the server; is the cache full? (yes/no); calculate the total turnaround time; dispatch the requested page to the client; flash "Page not found"; listen for the next request.]
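The flow in the chart can be read as the request handler sketched below. It is an illustration under stated assumptions: pages are Strings, the server is a lookup function returning null for missing pages, the cache holds a fixed number of pages, and the replacement step is a placeholder that drops an arbitrary page (the report's own LFU policy is described in the caching-policy section). All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the flow chart: cache lookup, server lookup, replacement when full,
// dispatch, and turnaround-time measurement.
class RequestFlow {
    static final String NOT_FOUND = "Page not found";

    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> server; // returns null if the server lacks the page
    private final int capacity;                    // maximum number of cached pages (assumed)
    private long lastTurnaroundNanos = 0;

    RequestFlow(Function<String, String> server, int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.server = server;
        this.capacity = capacity;
    }

    String handle(String pageName) {
        long start = System.nanoTime();
        String page = cache.get(pageName);          // check for the page in the cache
        if (page == null) {
            page = server.apply(pageName);          // if not found, check the server
            if (page == null) {
                return NOT_FOUND;                   // flash "Page not found"
            }
            if (cache.size() >= capacity) {         // is the cache full?
                // Placeholder replacement: drops an arbitrary cached page.
                cache.remove(cache.keySet().iterator().next());
            }
            cache.put(pageName, page);
        }
        lastTurnaroundNanos = System.nanoTime() - start; // total turnaround time
        return page;                                // dispatch the page to the client
    }

    long lastTurnaroundNanos() { return lastTurnaroundNanos; }

    int cachedPages() { return cache.size(); }
}
```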
IMPLEMENTATION DETAILS
Tomcat 4.1 is the Jakarta project's web server, which implements the Servlet 2.3 and
JavaServer Pages specifications. It provides a platform for developing and deploying web
applications and web services, and it is used as the web server in our project, handling
all transactions between the client and the server. The cache that we designed inside the
server can be viewed as a middleman between the client and the server, analogous to the
high-speed cache in a computer's memory hierarchy.
In this phase we configure the servlet to handle client requests. Tasks such as
sending pages from the server to the client using input and output streams are implemented;
pages can also contain images. Dynamic file size generation is part of the next step; it
provides the file size details, an important criterion in the later phases. In the next
step, URL mapping is implemented along with URL navigation. The page name is obtained
from the client, the content type of the response is set in the response header, and the
objects for the input and output streams are created. The page is first located and then
fetched either from the cache or from the server (explained in the next phase) and
dispatched to the client.
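The request-setup steps above (obtaining the page name, choosing the content type, and reading the file size) can be sketched with JDK classes only, so the example does not depend on the Servlet API. In a servlet, the resulting content type would typically be passed to `HttpServletResponse.setContentType`. The class name and the document-root parameter are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

// JDK-only helpers for the request-setup step (no Servlet API dependency).
class RequestSetup {
    // Last path segment of a request URI, ignoring any query string:
    // "/app/Home.html?x=1" -> "Home.html".
    static String pageName(String requestUri) {
        int query = requestUri.indexOf('?');
        String path = query >= 0 ? requestUri.substring(0, query) : requestUri;
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Content type guessed from the file extension using the JDK's built-in table.
    static String contentType(String pageName) {
        String type = URLConnection.guessContentTypeFromName(pageName);
        return type != null ? type : "application/octet-stream";
    }

    // File size read at request time ("dynamic file size generation").
    static long fileSize(Path documentRoot, String pageName) {
        try {
            return Files.size(documentRoot.resolve(pageName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```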
Implementing the Processing Logic:
Processing logic is the main phase of the project, in which the caching algorithm is
implemented. The following tasks are performed. First, the timestamp of the page in the
cache (main memory) is compared with that of its original copy on the server (secondary
memory). If the page is not found in the cache, it is fetched from the root directory on
the server, if it is found there, and is stored in the cache. If the cache lacks space for
the new page, a page replacement policy is applied to the cache contents so that the new
page can be accommodated within the same cache space by replacing a selected page.
Even if the page is found in the cache, the following steps are executed before the page
is delivered to the client, to ensure cache consistency.
cache for future use. If the search is unsuccessful, a "Page not found" message is
flashed to the client.
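The timestamp check can be sketched with `java.nio.file` as follows. The sketch assumes the server's copy of each page is a file under a root directory, and it omits the replacement step so that the consistency logic stands out; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.Map;

// Sketch of a timestamp-validated server cache (no size limit or replacement).
class ValidatingCache {
    // A cached copy remembers the file's last-modified time when it was read.
    private static final class Entry {
        final byte[] contents;
        final FileTime lastModified;

        Entry(byte[] contents, FileTime lastModified) {
            this.contents = contents;
            this.lastModified = lastModified;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Path root;
    private int serverReads = 0;

    ValidatingCache(Path root) {
        this.root = root;
    }

    // Returns the page, or null if it does not exist on the server.
    byte[] get(String pageName) {
        Path file = root.resolve(pageName);
        try {
            if (!Files.exists(file)) {
                cache.remove(pageName); // the original is gone; drop any stale copy
                return null;            // caller flashes "Page not found"
            }
            FileTime onServer = Files.getLastModifiedTime(file);
            Entry e = cache.get(pageName);
            if (e != null && e.lastModified.equals(onServer)) {
                return e.contents;      // cached copy matches the server copy
            }
            // Miss or stale copy: read from the server and (re)cache it.
            byte[] contents = Files.readAllBytes(file);
            serverReads++;
            cache.put(pageName, new Entry(contents, onServer));
            return contents;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    int serverReads() { return serverReads; }
}
```

Comparing for equality rather than "newer than" also refreshes the cache when an older version of a file is restored on the server.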
CACHE DESIGN:
After the initial access or download, users can access a single locally stored copy of the
content rather than repeatedly requesting the same content from the origin server.
Here we have constructed a cache of predefined size to hold web pages. The cache is
volatile: it exists only as long as the server runs. When the server is restarted, a new,
empty cache is created, and it is populated again based on user requests.
The cache is divided into two parts, key and value. The key (the page name) acts as the
index to the value (the page contents), as the following diagram shows.
Key (Page name)       Value (Page contents)
Page2.html            1011100101110101010010...
Home.html             1010111101010001010111...
The binary data of the page is packed into a user-defined object, which also contains the
following:
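The exact fields of this object are not listed here. Purely as an illustration, a page object might bundle the contents with the metadata that other parts of this report rely on: a last-modified time for the consistency check, a hit count for LFU, and a size for tie-breaking. These field names are assumptions, not the project's actual class.

```java
// Hypothetical page object stored as the value for each page-name key.
// The fields are assumptions drawn from other parts of this report.
final class CachedPage {
    private final byte[] contents;   // binary data of the page
    private final long lastModified; // server copy's timestamp when cached (ms since epoch)
    private int hits;                // times this page has been served from the cache

    CachedPage(byte[] contents, long lastModified) {
        this.contents = contents.clone(); // defensive copy so callers cannot alter the cache
        this.lastModified = lastModified;
    }

    void recordHit() { hits++; }

    int hits() { return hits; }

    long lastModified() { return lastModified; }

    int size() { return contents.length; }

    byte[] contents() { return contents.clone(); }
}
```

The cache itself is then a `Map<String, CachedPage>` keyed by page name, matching the key/value diagram.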
CACHING POLICY:
In this system we present a cache that can cache static pages, and we apply the Least
Frequently Used (LFU) page replacement algorithm as our caching policy. The cache
intercepts every page requested by the user and checks whether the page is present. If it
is, the timestamp of the cached copy is compared with that of the same page on the server,
and depending on the timestamp the page is fetched either from the server or from the
cache, so that the more recent copy is always served. This check detects whether the page
contents have been modified on the server; if so, there is no point in fetching the page
from the cache, as it holds a stale copy. In this way, whatever page the user receives from
the cache is the same as the copy present on the server. Because the cache size is fixed,
a caching policy is needed to keep the required pages in the cache. We use the LFU
algorithm for page replacement. It suits this environment because cache performance is
measured in hits and misses, and LFU bases its replacement decisions on hit counts, which
justifies its use here. In our observations, both the cache penalty and the number of
misses remained low. We also measure the time difference between requesting a page from
the cache and requesting the same page from the server.
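One way to take the cache-versus-server time measurement is to time each path with `System.nanoTime()`, as in this sketch. The `FetchTimer` class and the use of `Supplier` to represent a fetch are assumptions for illustration.

```java
import java.util.function.Supplier;

// Measures elapsed wall-clock time of page fetches, in nanoseconds.
class FetchTimer {
    static long elapsedNanos(Supplier<?> fetch) {
        long start = System.nanoTime();
        fetch.get();
        return System.nanoTime() - start;
    }

    // Positive result: the server fetch took longer than the cache fetch.
    static long savingNanos(Supplier<?> fromCache, Supplier<?> fromServer) {
        return elapsedNanos(fromServer) - elapsedNanos(fromCache);
    }
}
```

A single measurement is noisy (JIT compilation, garbage collection, OS file caching), so in practice each path would be timed over many requests and the results averaged.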
LFU Algorithm
• Algorithm: Least Frequently Used (LFU).
• The least frequently used documents are removed first.
• Advantages: simplicity; when the goal is to reduce latency so that client requests are
served quickly, LFU would be the best choice.
LFU algorithm: when a document of size S is to be cached and the free space in the cache is
smaller than S, repeatedly replace the least frequently used document until the free cache
space is at least S.
[Figure: LFU algorithm]
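The LFU rule above can be sketched as a byte-budgeted cache in Java. This is an illustration, not the project's code: the class name is hypothetical, a refreshed page keeps its previous hit count, and ties between equally used pages are broken arbitrarily here.

```java
import java.util.HashMap;
import java.util.Map;

// Size-aware LFU sketch: before caching a document of size S, evict the least
// frequently used documents until at least S bytes are free.
class LfuCache {
    private final long capacityBytes;
    private long usedBytes = 0;
    private final Map<String, byte[]> pages = new HashMap<>();
    private final Map<String, Integer> hits = new HashMap<>();

    LfuCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Returns the cached page and counts a hit, or null on a miss.
    byte[] get(String name) {
        byte[] page = pages.get(name);
        if (page != null) {
            hits.merge(name, 1, Integer::sum);
        }
        return page;
    }

    void put(String name, byte[] page) {
        long s = page.length;
        if (s > capacityBytes) {
            return; // can never fit; the page is served without caching
        }
        int previousHits = hits.getOrDefault(name, 0); // keep the count when refreshing a page
        remove(name);
        while (capacityBytes - usedBytes < s) { // free space is smaller than S
            remove(leastFrequentlyUsed());      // replace the LFU document
        }
        pages.put(name, page);
        hits.put(name, previousHits);
        usedBytes += s;
    }

    // Page with the fewest hits; ties are broken arbitrarily in this sketch.
    private String leastFrequentlyUsed() {
        String victim = null;
        int fewest = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            if (e.getValue() < fewest) {
                fewest = e.getValue();
                victim = e.getKey();
            }
        }
        return victim;
    }

    private void remove(String name) {
        byte[] old = pages.remove(name);
        if (old != null) {
            usedBytes -= old.length;
            hits.remove(name);
        }
    }

    boolean contains(String name) { return pages.containsKey(name); }

    long usedBytes() { return usedBytes; }
}
```

With a 10-byte cache holding two 4-byte pages `a` (2 hits) and `b` (1 hit), inserting a third 4-byte page leaves only 2 bytes free, so `b` is replaced.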
MONITORING HITS:
KEY (PAGE NAME)       VALUE (HITS)
Index.html            22
Page2.html            10
Home.html             37
Results.html          500
Admit.html            61
News.html             7
The hit counter decides the fate of a page in the cache: a page with more hits is more
likely to stay in the cache than a page with fewer hits.
Whenever the cache is full and a new page is to be placed in it, the following steps are
taken. The hit counts of all cached pages are obtained, and the page with the lowest hit
count is selected and replaced with the incoming page. If two pages have the same hit
count, the larger page is replaced.
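The selection rule (fewest hits first, and on a tie the larger page) maps naturally onto a Java `Comparator`. In this sketch the hit counts come from the table above, while the page sizes and all names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

// Victim selection for a full cache: fewest hits first; on a tie, the larger page.
class VictimChooser {
    static final class PageStats {
        final String name;
        final int hits;
        final long sizeBytes;

        PageStats(String name, int hits, long sizeBytes) {
            this.name = name;
            this.hits = hits;
            this.sizeBytes = sizeBytes;
        }
    }

    // Orders pages by replacement preference: lowest hit count, then largest size.
    static final Comparator<PageStats> REPLACEMENT_ORDER =
            Comparator.comparingInt((PageStats p) -> p.hits)
                    .thenComparing(Comparator.comparingLong((PageStats p) -> p.sizeBytes).reversed());

    // Returns the page to replace, or null if the cache is empty.
    static PageStats choose(List<PageStats> pages) {
        return pages.stream().min(REPLACEMENT_ORDER).orElse(null);
    }
}
```

With the table's counts, News.html (7 hits) is chosen. If another page also had 7 hits but were larger, that page would be replaced instead, which frees more space with a single eviction.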
SCREEN SHOTS:
INPUT
OUTPUT
CACHE STATISTICS: