
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 2, March-April (2013), pp. 356-365
IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

IMPROVING ACCESS LATENCY OF WEB BROWSER BY USING CONTENT ALIASING IN PROXY CACHE SERVER
Sachin Chavan¹, Nitin Chavan²

¹Department of Computer Engineering, MPSTME, NMIMS, Shirpur
²Department of Information Technology, MPSTME, NMIMS, Shirpur

ABSTRACT
The web community is growing so quickly that the number of clients accessing web servers is increasing tremendously. This rapid growth affects several characteristics of the web. Available network bandwidth is reduced, and latency and response time increase for users who require large-scale web services. This paper considers different types of proxy actions and studies how they influence browser display time. It then proposes a novel design and methodology to address these issues. It also discusses acceptable loading times and the scope of cacheable objects.

The methodology analyses the content in the proxy cache, identifies content aliasing, suppresses duplicates and creates the corresponding soft links. In this way, the solution makes intelligent use of the proxy cache server to overcome these problems. Proxies were designed to enable network administrators to control internet access from within an intranet. When a proxy cache is used, however, the problem of aliasing arises: the same content is stored in the cache several times. The proposed methodology reduces access latency and browser response time. At the same time, it avoids storing the same content in the cache multiple times, which would waste storage space.

KEYWORDS: Access Latency, Cache, Web Proxy, Mirroring, Duplicate Suppression, Content Aliasing.


1. INTRODUCTION
In the field of web server management, researchers have studied aliasing in proxy server caches for a long time. Web caching consists of storing frequently requested objects on a caching server instead of only on the original server. This lets the network bandwidth be used better, reduces the workload on servers, and improves response time for users. The proxy cache also stores all of the images and sub-files of the visited pages. If the user then moves to a new page within the same site that uses, for example, the same images, the proxy cache already holds them. It can deliver them to the user's browser faster than retrieving them again from the website's remote server.

Aliasing means giving multiple names to the same thing. Aliasing in proxy server caches occurs when the same content is stored in the cache multiple times. On the World Wide Web, aliasing commonly occurs when a client makes two requests for different URLs and both responses carry the same payload. Browsers currently perform cache lookups using Uniform Resource Locators (URLs) as identifiers, so such a payload is cached once per URL. Websites that contain the same content are called mirrors. Mirrors are redundancy mechanisms built into the web to serve pages faster, but they cost cache space.

As the amount of web traffic increases, efficient use of network bandwidth becomes increasingly important. Analysing web traffic to understand its characteristics helps to optimize bandwidth use, reduce network latency and improve response time for users [8]. A proxy cache is a shared network device that can carry out Web transactions on behalf of a client. Like the browser, it stores the content it retrieves. Any later request for this content, by the same or any other client of the cache, causes the cache to deliver the locally stored copy. This avoids a repeat download from the original content source [4].
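To illustrate the aliasing problem, the following Python sketch models a proxy cache that, like current browsers, uses the request URL as its lookup key. It is a minimal illustration, not the authors' implementation. The class name UrlKeyedCache and the mirror URLs under example.com and example.org are hypothetical. Two URLs that return the same payload end up as two separate cache entries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class UrlKeyedCache:
    """Hypothetical proxy cache keyed by URL, the way browsers key their caches."""
    store: Dict[str, bytes] = field(default_factory=dict)

    def get(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        """Return the payload for url, calling fetch(url) only on a cache miss."""
        if url not in self.store:
            self.store[url] = fetch(url)  # miss: download from the origin server
        return self.store[url]

    def bytes_stored(self) -> int:
        return sum(len(body) for body in self.store.values())

    def duplicate_bytes(self) -> int:
        """Bytes spent on payloads that are already stored under another URL."""
        seen = set()
        wasted = 0
        for body in self.store.values():
            if body in seen:
                wasted += len(body)
            else:
                seen.add(body)
        return wasted


# Two hypothetical mirrors serving an identical page.
PAGE = b"<html><body>identical mirrored page</body></html>"
ORIGIN = {
    "http://www.example.com/doc.html": PAGE,
    "http://mirror.example.org/doc.html": PAGE,
}

cache = UrlKeyedCache()
for url in ORIGIN:
    cache.get(url, ORIGIN.__getitem__)

print(len(cache.store), cache.bytes_stored(), cache.duplicate_bytes())
```

The cache ends up with two entries, and half of the stored bytes are a duplicate of content already held under the other URL.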

Figure 1. Concept of Caching (Proxy Cache): Proxy Cache Server; Bandwidth Saving and Traffic Reduction


1.1 Advantages of Caching
1. Web caching reduces the workload of the remote Web server.
2. A client can obtain a cached copy from the proxy if the remote server is not available.
3. It provides an opportunity to analyze an organization's usage patterns.

1.2 Disadvantages of Caching
1. A client might see stale data if the proxy is not properly updated.
2. Access latency may increase on a cache miss because of the extra proxy processing.
3. A single proxy cache can become a bottleneck.
4. A single proxy is a single point of failure.

2. RELATED WORK
2.1 The Access Latency
Latency is defined as the delay between a request for a Web page and the receipt of that page in its entirety. The latency problem occurs when users judge the download to take too long. Unacceptable latency does more than reduce user satisfaction: Web pages that load faster are also judged to be significantly more interesting than their slower counterparts [12]. Studies on human cognition have shown that a response time shorter than 0.1 second is unnoticeable, and that a delay of 1 second matches the pace of an interactive dialog. Table 1 shows the transfer rates of different connection types.

Table 1. Transfer rates for different connection types

Connection Type                        Slow      Normal   Maximum
Modem 33k6                             <2.734    3        3.65
Modem 56k                              <4.199    5        6.08
ISDN 64k                               <5.469    6        6.94
Cable                                  <9.766    17       by provider
ADSL                                   <12.21    24       732
Ethernet 10Base-T (10 Megabits/sec)    <73       195      977

Table 1 shows the parameters that affect the access time of the browser. These are the type of connection used and the condition of that connection. The time of day at which the internet is used also affects access latency, because bandwidth is shared among users.

2.2 Web Traffic
Web traffic is the amount of data sent and received by visitors to a website. It is analysed to measure the popularity of websites and of individual pages or sections within a site.
Web traffic can be analyzed by viewing the traffic statistics in the web server log file, an automatically generated list of all the pages served. Traffic analysis is conducted using the access logs of the web proxy server. Each entry in an access log records the following:
1. the URL of the requested document;
2. the date and time of the request;
3. the name of the client host making the request;
4. the number of bytes returned to the requesting client;
5. information describing how the proxy treated the client's request [1].

Processing these log entries produces useful summary statistics about workload volume, document types and sizes, document popularity and proxy cache performance [5].
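The log processing described above can be sketched in Python. The sketch assumes the proxy writes Squid's native access.log format: timestamp, elapsed time, client, result code/status, bytes, method, URL, and then further fields that are ignored here. The paper does not name its proxy software, and the three sample log lines are invented for illustration. As a simplification, a request is counted as a cache hit when its result code contains "HIT" (for example TCP_HIT or TCP_MEM_HIT).

```python
from collections import Counter
from typing import Iterable, NamedTuple


class LogEntry(NamedTuple):
    timestamp: float  # UNIX time of the request
    elapsed_ms: int   # time the proxy took to serve the request
    client: str       # requesting client host
    result: str       # how the proxy treated the request, e.g. TCP_HIT
    status: int       # HTTP status code
    size: int         # bytes returned to the client
    method: str
    url: str


def parse_line(line: str) -> LogEntry:
    """Parse one line in Squid's native access.log format (assumed)."""
    f = line.split()
    result, status = f[3].split("/")
    return LogEntry(float(f[0]), int(f[1]), f[2], result, int(status),
                    int(f[4]), f[5], f[6])


def summarize(lines: Iterable[str], top: int = 3) -> dict:
    """Workload volume, cache performance and document popularity."""
    entries = [parse_line(line) for line in lines if line.strip()]
    hits = [e for e in entries if "HIT" in e.result]  # simplified hit test
    total_bytes = sum(e.size for e in entries)
    hit_bytes = sum(e.size for e in hits)
    return {
        "requests": len(entries),
        "bytes": total_bytes,
        "hit_ratio": len(hits) / len(entries) if entries else 0.0,
        "byte_hit_ratio": hit_bytes / total_bytes if total_bytes else 0.0,
        "top_urls": Counter(e.url for e in entries).most_common(top),
    }


# Invented sample lines in the assumed format.
SAMPLE = """\
1367385600.100    350 10.0.0.5 TCP_MISS/200 5120 GET http://www.example.com/ - DIRECT/192.0.2.10 text/html
1367385601.200     12 10.0.0.7 TCP_HIT/200 5120 GET http://www.example.com/ - NONE/- text/html
1367385602.300    410 10.0.0.5 TCP_MISS/200 2048 GET http://www.example.org/a.png - DIRECT/192.0.2.20 image/png
"""

stats = summarize(SAMPLE.splitlines())
print(stats)
```

Per-URL request counts such as top_urls are also the kind of input that a popularity-based scheme like static caching works from.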

2.3 Static Caching
Static caching is a newer approach to web caching that uses yesterday's log to predict today's user requests. The static caching algorithm defines a fixed set of URLs by analysing the logs of previous periods. It calculates a value for each unique URL, arranges the URLs in descending order of value, and selects those with the highest values. This set of URLs is known as the working set. When a user requests a document that is present in the working set, the request is served from the cache. Otherwise, it is served from the origin server [6].

2.4 Dynamic Caching
Dynamic caching is more complex than static caching and requires detailed knowledge of the application. Candidates for dynamic caching must be chosen carefully because dynamically generated content can, by its very nature, differ depending on the state of the application. It is therefore important to determine under what conditions dynamically generated content can be cached while still returning the correct response. This requires knowledge of the application and its possible states. It also requires other data, such as the parameters that ensure the dynamic data is generated deterministically [3].

2.5 MD5 Algorithm
MD5, published by Ron Rivest in 1992, is a cryptographic hash algorithm designed as a strengthened successor to MD4. MD5 takes an input of any length and generates a fixed-length digest of 128 bits, usually written as 32 hexadecimal characters. Like alternatives such as SHA-1, it is a one-way hash that accepts inputs of any length but still produces a fixed-length output, and it is fast. Two different inputs are highly unlikely to hash to the same digest by accident. However, deliberately constructed MD5 collisions are known, so MD5 is better suited to fingerprinting content than to security-critical uses.
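The fixed-length, deterministic digest can be seen with Python's standard hashlib module. In this sketch MD5 is used only as a content fingerprint for comparing payloads. The helper name md5_hex and the sample payloads are illustrative.

```python
import hashlib


def md5_hex(payload: bytes) -> str:
    """Return the 128-bit MD5 digest of payload as 32 hexadecimal characters."""
    return hashlib.md5(payload).hexdigest()


page_a = md5_hex(b"<html>same page</html>")
page_b = md5_hex(b"<html>same page</html>")   # identical content, e.g. from a mirror
page_c = md5_hex(b"<html>other page</html>")

print(len(page_a), page_a == page_b, page_a == page_c)  # 32 True False
print(md5_hex(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72 (RFC 1321 test vector)
```

The digest depends only on the bytes. Identical pages fetched under different URLs therefore map to the same 32-character key.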
MD5 is also reliable in the sense that the same input string always yields the same output digest [11].

3. EXPERIMENTAL SETUP
3.1 Changing of Proxy Server
In most organizations and institutions, the main server does not support proxy caching, which makes it difficult to use as the cache server. The proxy setting therefore has to be changed from the main server to another server [2]. The following steps switch a machine to the other proxy:
1. Open the browser, for example Internet Explorer.
2. In Internet Explorer, open the Tools menu and click Internet Options...
3. Click the Connections tab.
4. Click the LAN Settings... button.
5. In the Address box, change "proxy1 Address" to "proxy2 Address" (or vice versa) and click OK.
6. Click OK in the Internet Options dialog box to return to the browser. External sites can now be reached through the new proxy.
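A script can be switched between proxies in the same way as the browser, using Python's standard urllib.request module with a ProxyHandler. This is a sketch only. The host names proxy1.example.edu and proxy2.example.edu and the port numbers are hypothetical placeholders for the two proxy addresses in the steps above.

```python
import urllib.request

# Hypothetical addresses standing in for "proxy1 Address" and "proxy2 Address".
PROXY1 = "http://proxy1.example.edu:3128"
PROXY2 = "http://proxy2.example.edu:8080"


def opener_for(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Switch from proxy1 to proxy2, the programmatic analogue of editing the
# Address box in the LAN Settings dialog.
opener = opener_for(PROXY2)
# opener.open("http://www.example.com/")  # would now be routed through proxy2
```

Passing an explicit ProxyHandler to build_opener replaces the default handler, which would otherwise read proxy settings from the environment.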


3.2 Duplication of Data
Duplication of data means storing multiple copies of the same data object. When a web page or object is cached, it is stored in cache memory. When different users request the same page, however, multiple copies of that object or page may be stored, which wastes storage space. Maintaining a cache is expensive, so such waste is not affordable. A duplicate suppression mechanism is therefore needed to avoid duplicating data objects or web pages [7]. Duplicate copies in the proxy cache consume additional storage space; the analysis in [4] shows the effect of duplication on cache space.

3.3 Duplicate Suppression
Storage space requirements can be reduced by avoiding duplicate copies of the same data. Content Engine provides an option to suppress storage of duplicate content elements. Duplicate suppression applies to any kind of content. Incoming content is not added to the storage area if identical content already exists there; only unique content is added [14]. Because the web is so large, most pages will not be referenced more than once by any one cache. The pages that are re-referenced follow a distribution similar to Zipf's law, under which the probability that the K-th most popular page is referenced is proportional to 1/K [9].

3.5 Experimental Results
The experiments were carried out in the lab of our institute. Several popular websites were selected and used to analyse the access latency of the browser under different conditions. Keyword-based searches were also used to measure latency for each type of content, image search or text search.

Table 2. Response time of search engine for Text and Image Search.
            Text Search                         Image Search
Keywords    From Web Server  From Cache Server  From Web Server  From Cache Server
SVKM        250              140                230              200
NMIMS       140              130                300              100
RCPIT       250              120                350              150
CANNON      240              130                250              100
SAMSUNG     210              140                640              200
NOKIA       250              190                240              120
MATLAB      240              160                280              160
OPERA       250              150                120              120
SIEMENS     230              160                310              100
MICROMAX    160              140                190              110
MPSC        170              140                180              100
UPSC        210              150                150              140
IRCTC       160              140                330              90
RRB         260              120                310              70


Table 2 shows the reduction in browser response time when the page is fetched from the cache server instead of the web server. The first column lists the keywords used for the analysis. The same keywords are used for both text search and image search, and the table gives the browser response time for each. The table shows a considerable reduction in access latency when the page is fetched from the proxy cache. Figure 2 compares the response times for text search for several keywords. The first bar shows the response time when the page is fetched from the main web server. The second bar shows the response time when it is fetched from the local proxy cache server, on which the content aliasing algorithm is implemented. Figure 2 also shows a considerable reduction in response time.
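The paper does not list its content aliasing algorithm. The following Python sketch shows one possible way to combine duplicate suppression with soft links, as outlined in the abstract, under stated assumptions:
1. Payloads are fingerprinted with MD5.
2. Each distinct payload is stored once under its digest.
3. Each URL is represented by a symbolic link to that stored file.

The class AliasingCache and its objects/ and urls/ directory layout are hypothetical. The sketch also assumes a POSIX file system on which symbolic links can be created.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


class AliasingCache:
    """Hypothetical on-disk cache with duplicate suppression via soft links.

    objects/<md5 of payload>  holds each distinct payload exactly once;
    urls/<md5 of URL>         is a symbolic link to the matching object file.
    """

    def __init__(self, root: Path):
        self.objects = root / "objects"
        self.urls = root / "urls"
        self.objects.mkdir(parents=True, exist_ok=True)
        self.urls.mkdir(parents=True, exist_ok=True)

    def _link_path(self, url: str) -> Path:
        # URLs are not valid file names, so name each link by a digest of its URL.
        return self.urls / hashlib.md5(url.encode()).hexdigest()

    def put(self, url: str, payload: bytes) -> bool:
        """Cache payload for url; return True if it aliased already-stored content."""
        obj = self.objects / hashlib.md5(payload).hexdigest()
        aliased = obj.exists()
        if not aliased:
            obj.write_bytes(payload)  # first copy: store the content once
        link = self._link_path(url)
        if link.is_symlink() or link.exists():
            link.unlink()             # URL re-cached: replace the old link
        link.symlink_to(obj)          # soft link from the URL to the content
        return aliased

    def get(self, url: str) -> Optional[bytes]:
        link = self._link_path(url)
        return link.read_bytes() if link.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    cache = AliasingCache(Path(tmp))
    page = b"<html>identical mirrored page</html>"
    first = cache.put("http://www.example.com/doc.html", page)      # stored
    second = cache.put("http://mirror.example.org/doc.html", page)  # aliased
    print(first, second, len(list(cache.objects.iterdir())))       # False True 1
```

A real proxy would also need to delete object files whose last link has been removed, and to refresh them when the origin content changes. This sketch omits both steps.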

Figure 2. Response time of search engine for Text Search

From Figure 2 it is clear that, for text search, the response time is generally reduced by 40 percent or more. For some keywords, such as Samsung, IRCTC, RRB and Siemens, the reduction is more than 70 percent. For Opera, SVKM and UPSC, by contrast, the reduction is negligible or at most 10 percent, because those searches return dynamic content. Figure 3 compares the response times for image search for the same keywords. The first bar shows the response time when the page is fetched from the main web server. The second bar shows the response time when it is fetched from the local proxy cache server, on which the content aliasing algorithm is implemented. Figure 3 shows a difference between the two response times.


Figure 3. Response time of search engine for Image Search

From Figure 3 it is clear that image search gives a much smaller reduction in response time, because images are more dynamic than text.

Table 3. Connection time and response time of browser for some websites.

                          From Web Server         From Cache Server
Website                   Connection  Response    Connection  Response
www.nmims.edu             7000        44000       3000        14000
www.rcpit.ac.in           6120        26140       3920        10310
www.mpsc.gov.in           5800        25700       3200        6390
www.upsc.gov.in           1890        4760        320         690
www.unipune.ac.in         2480        8600        1130        1580
www.wipro.com             2300        24750       900         3780
www.infosys.com           1710        18180       770         1980
www.techmahindra.com      990         18000       1260        7250
www.jaihindcollege.com    1210        13230       500         1170
www.jaihindcollege.ac.in  1800        15930       540         1040
www.msbte.com             1800        10170       810         1130
www.msbshse.ac.in         1530        4550        540         1040
www.cbse.nic.in           1130        5580        630         900
www.irctc.com             1710        12960       1670        3240

Table 3 shows the connection time and response time of the browser for various sites. It compares the times when the page is fetched from the cache server and when it is fetched from the web server. The first column lists the websites used for the analysis. The table shows a considerable reduction in access latency when the page is fetched from the proxy cache instead of the main server.
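The percentage reductions discussed for Figures 4 and 5 can be recomputed directly from Table 3. The short Python sketch below uses the table values as given; the table does not state units. It defines the reduction as (web server time - cache server time) / web server time x 100, so a negative value means the cache server was slower.

```python
def reduction(before: float, after: float) -> float:
    """Percentage reduction from before to after; negative means an increase."""
    return 100.0 * (before - after) / before


# site: (web connection, web response, cache connection, cache response), from Table 3
TABLE3 = {
    "www.nmims.edu": (7000, 44000, 3000, 14000),
    "www.rcpit.ac.in": (6120, 26140, 3920, 10310),
    "www.mpsc.gov.in": (5800, 25700, 3200, 6390),
    "www.upsc.gov.in": (1890, 4760, 320, 690),
    "www.unipune.ac.in": (2480, 8600, 1130, 1580),
    "www.wipro.com": (2300, 24750, 900, 3780),
    "www.infosys.com": (1710, 18180, 770, 1980),
    "www.techmahindra.com": (990, 18000, 1260, 7250),
    "www.jaihindcollege.com": (1210, 13230, 500, 1170),
    "www.jaihindcollege.ac.in": (1800, 15930, 540, 1040),
    "www.msbte.com": (1800, 10170, 810, 1130),
    "www.msbshse.ac.in": (1530, 4550, 540, 1040),
    "www.cbse.nic.in": (1130, 5580, 630, 900),
    "www.irctc.com": (1710, 12960, 1670, 3240),
}

for site, (web_conn, web_resp, cache_conn, cache_resp) in TABLE3.items():
    print(f"{site:26s} connection {reduction(web_conn, cache_conn):6.1f}%"
          f"   response {reduction(web_resp, cache_resp):6.1f}%")
```

With the Table 3 values, www.techmahindra.com shows the smallest response-time reduction (59.7 percent) and a connection-time increase of about 27 percent. www.irctc.com shows a connection-time reduction of only 2.3 percent.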

Figure 4. Connection time for different websites

Figure 4 shows the effect of content aliasing on the access time of the web browser in terms of connection time. In most cases the connection time is reduced by more than 50 percent, and in some cases by 30 to 50 percent. For the IRCTC website the reduction in connection time is negligible. For the TECHMAHINDRA website the connection time increased, because that website carries more dynamic content.

Figure 5. Response time for different websites

Figure 5 compares the response time of the browser for different websites. When the web page is fetched from the cache server, the response time is lower. The reduction is roughly 60 percent or more in every case, and in some cases it exceeds 90 percent. Using content aliasing in the proxy cache server therefore saves a significant amount of time, in terms of both response time and connection time.

It is clear that a considerable amount of user time is saved by using content aliasing. This reduction in access latency was achieved while also taking other parameters, such as cache size and stale data, into account.

4. CONCLUSION
The experimental results show the need for a methodology that improves web access performance, enhancing bandwidth utilization and connection speed. The suggested design improves web performance through web caching, in terms of reduced latency, improved user response time and optimal use of the existing bandwidth. Content aliasing was successfully detected using a web-based application, database queries and file system calls. The suggested methodology avoids a considerable amount of duplicate storage, which makes it a very useful mechanism for web proxy caches. The solution also keeps cached pages synchronized with the pages on the web server, checking for new pages when needed.

The work can be further optimized by a daemon process that runs periodically to check the consistency of the cached data against the data at the web server. This process can be scheduled during slack periods with less traffic, so it adds no extra load on the bandwidth. It would also update the TTL (Time to Live) period of the cached data.

REFERENCES
[1] Kartik Bommepally, Glisa T. K., Jeena J. Prakash, Sanasam Ranbir Singh and Hema A. Murthy, "Internet Activity Analysis through Proxy Log," IEEE, 2010.
[2] E-Services Team, "Changing Proxy Server," The Robert Gordon University, Schoolhill, Aberdeen, Scotland, 2006.
[3] W. Chen, P. Martin and H. S. Hassanein, "Caching Dynamic Content on the Web," Canadian Conference on Electrical and Computer Engineering, vol. 2, pp. 947-950, 4-7 May 2003.
[4] Sadhna Ahuja, Tao Wu and Sudhir Dixit, "On the Effects of Content Compression on Web Cache Performance," Proceedings of the International Conference on Information Technology: Computers and Communications, 2003.
[5] Mark S. Squillante, David D. Yao and Li Zhang, "Web Traffic Modeling and Web Server Performance Analysis," Proceedings of the 38th Conference on Decision & Control, Phoenix, Arizona, USA, December 1999.
[6] C. E. Wills and M. Mikhailov, "Studying the Impact of More Complete Server Information on Web Caching," Computer Communications, vol. 24, no. 2, pp. 184-190, May 2000.
[7] J. Wang, "A Survey of Web Caching Schemes for the Internet," Cornell Network Research Group (C/NRG), Department of Computer Science, Cornell University, 1999.
[8] N. Shivakumar and H. Garcia-Molina, "Finding Near-Replicas of Documents on the Web," Proc. Workshop on Web Databases, March 1998.
[9] L. Breslau, P. Cao, L. Fan, G. Phillips and S. Shenker, "Web Caching and Zipf-like Distributions: Evidence and Implications," Proc. INFOCOM '99, New York, NY, March 1999.


[10] C. Guerrero, C. Juiz and R. Puigjaner, "Web Performance and Behavior Ontology," International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2008), pp. 219-225, 4-7 March 2008.
[11] Kimmo Jarvinen, Matti Tommiska and Jorma Skytta, "Hardware Implementation Analysis of the MD5 Hash Algorithm," IEEE Computer Society, 2005.
[12] Andrzej Sieminski, "The Impact of Proxy Caches on Browser Latency," International Journal of Computer Science & Applications, Vol. II, No. II, pp. 5-21, 2005.
[13] S. B. Patil, Sachin Chavan and Preeti Patil, "High Quality Design to Enhance and Improve Performance of Large Scale Web Applications," International Journal of Computer Engineering and Technology (IJCET), Volume 3, Issue 1, January-June 2012, pp. 198-205, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[14] S. Vikram Phaneendra, "Minimizing Client-Server Traffic Based on AJAX," International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 10-16, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[15] A. Suganthy, G. S. Sumithra, J. Hindusha, A. Gayathri and S. Girija, "Semantic Web Services and its Challenges," International Journal of Computer Engineering & Technology (IJCET), Volume 1, Issue 2, 2010, pp. 26-37, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
