
BBA IV Sem/CA II/Unit-1

Unit I
World Wide Web

The World Wide Web (usually just referred to as the Web) is a collection of millions of files
stored on thousands of computers (called web servers) all over the world. These files represent
text documents, pictures, video, sounds, programs, interactive environments and just about any
other kind of information that has ever been recorded in computer files. The Web is probably the
largest and most diverse collection of information ever assembled. The Web was developed at
CERN (the European Laboratory for Particle Physics).
What unites these files is a system for linking one file to another and transmitting them
across the Internet. The HTML language allows a file to contain links to related files. Such a link
(also called a hyperlink) contains the information necessary to locate the related file on the
Internet. When you connect to the Internet and use a Web browser program, you can read, view,
hear, or otherwise interact with the Web without paying attention to whether the information that
you are accessing is stored on a computer down the hall or on the other side of the world. A news
story stored on a computer in Singapore may link you to a stock quote stored in New York, a
picture stored in New Delhi and an audio file stored in Tokyo. The combination of the Web
servers, the Internet, and your Web browser assembles this information seamlessly and presents
it to you as a unified whole.
By following links, you can get from almost any Web document to almost any other Web
document. For this reason, some people like to think of the entire Web as being one big
document. In this view, the links just take you from one part of the document to another.

How the Web Works

The World Wide Web is by far the most popular part of the Internet. Once you spend time on the
Web you will begin to feel that there is no limit to what you can discover. The Web allows rich
and diverse communication by enabling you to access and interact with text, graphics, animation,
photos, audio and video. So just what is this miraculous creation? On the simplest level, the Web
physically consists of your personal computer, web browser software, a connection to an Internet
service provider, computers called servers that host digital data, and routers and switches to
direct the flow of information.
The Web is known as a client-server system. Your computer is
the client; the remote computers that store electronic files are the
servers. Here's how it works:
Let's say you want to visit the Louvre museum website.
First you enter the address or URL of the website in your web
browser (more about this shortly). Then your browser requests
the web page from the web server that hosts the Louvre's site.
The server sends the data over the Internet to your computer.
Your web browser interprets the data, displaying it on your
computer screen.

The Louvre's website also has links to the sites of other museums, such as the Vatican
Museum. When you click on that link, you access the web server for the Vatican Museum. In this
way, information scattered across the globe can be linked together.
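
The request-and-response exchange described above can be sketched in a few lines of Python.
This is only an illustration of the client side of the cycle; the Louvre address simply follows
the example in the text, and any reachable web server would behave the same way.

    # A minimal sketch of the client side of the Web's request-response cycle.
    # The URL follows the example above; any reachable site would work the same way.
    import urllib.request

    url = "https://www.louvre.fr/"       # the address (URL) entered in the browser
    with urllib.request.urlopen(url) as response:   # the request travels to the web server
        html = response.read().decode("utf-8", errors="replace")  # data sent back by the server
    print(response.status)               # 200 means the page was delivered successfully
    print(html[:200])                    # a browser would now interpret and display this HTML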

The "glue" that holds the Web together is called hypertext and hyperlinks. This feature
allows electronic files on the Web to be linked so you can jump easily between them. On the
Web, you navigate through pages of
information--commonly known as browsing
or surfing--based on what interests you at that
particular moment.
To access the Web you need a web browser,
such as Netscape Navigator or Microsoft
Internet Explorer. How does your web
browser distinguish between web pages and
other types of data on the Internet? Web
pages are written in a computer language
called Hypertext Markup Language or
HTML.

Some Web History

The World Wide Web was originally developed in 1990 at CERN, the European
Laboratory for Particle Physics. The original idea came from a young computer scientist, Tim
Berners-Lee. It is now managed by The World Wide Web Consortium.

The WWW Consortium is funded by a large number of corporate members, including AT&T,
Adobe Systems, Inc., Microsoft Corporation and Sun Microsystems, Inc. Its purpose is to
promote the growth of the Web by developing technical specifications and reference software
that will be freely available to everyone. The Consortium is run by MIT with INRIA (The French
National Institute for Research in Computer Science) acting as European host, in collaboration
with CERN.

The National Center for Supercomputing Applications (NCSA) at the University of Illinois at
Urbana-Champaign was instrumental in the development of early graphical software utilizing the
World Wide Web features created by CERN. NCSA focuses on improving the productivity of
researchers by providing software for scientific modeling, analysis, and visualization. The World
Wide Web was an obvious way to fulfill that mission. NCSA Mosaic, one of the earliest web
browsers, was distributed free to the public. It led directly to the phenomenal growth of the
World Wide Web.


World Wide Web vs. Internet

We often do not make a distinction between the Internet and the World Wide Web. Though
they are related to each other, they are not the same. The Internet is a massive network that
connects millions of computers across the globe, whereas the Web is a way in which information
is accessed over the Internet. Information over the Internet travels from computer to computer
via protocols. While sending electronic mail the Internet uses the SMTP protocol; while sharing
files (which can be text, images, video or MP3), it uses the FTP protocol; and while exchanging
web-related information (i.e. hypertext information) it uses the HTTP protocol. The Web uses the
HTTP protocol to transmit data, share web pages (hyperlinked documents) and exchange business
logic. It utilizes a browser, such as Internet Explorer or Netscape Navigator, to display the
hypertext documents. The Web, therefore, can be said to be a portion of the Internet.

Domain Name System (DNS)

Format. Numbered IP addresses are difficult to remember. People are better at remembering
names and mnemonics (symbols, letters, etc.). Therefore, numbered addresses have been mapped
into names, each consisting of a host name and a domain (the group to which the computer
belongs). The general format of a domain name is given below: -
Host Name. Second Level Domain Name. First Level Domain Name

Where,
(a) Host Name is the name of the service provider or network name, e.g., VSNL.
(b) Domain Name signifies the kind of organisation. Some of the organisational and
geographic domain names are given in the table below.

Rules: The rules that are followed for mapping numbered IP addresses into the DNS scheme are: -

(a) The DNS is a distributed, hierarchical naming system.

(b) A node on the DNS can be named by traversing the tree from itself to the root. At
each node, the name is added and a period (.) is appended to it until the root is
reached.

(c) Each node can have any number of child nodes but only one parent node. Child
nodes must have different names to ensure a unique naming system.

(d) All the letters used in the name of a node must be lower case, with no spaces
between the dots (periods).

The figure shows the domain name (address) of a node named apj. Domain name
servers on the Internet keep a directory of all the nodes.
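
The name-to-address lookup performed by domain name servers can be tried from a few lines of
Python, using the operating system's resolver. This is a minimal sketch; the host name is only an
example, and any name registered in the DNS can be substituted.

    # A minimal sketch of DNS name resolution using the operating system's resolver.
    # The host name is illustrative; substitute any name that exists in the DNS.
    import socket

    host_name = "www.example.com"
    ip_address = socket.gethostbyname(host_name)   # asks a DNS server for the host's address record
    print(host_name, "->", ip_address)             # prints the name followed by its numeric IP address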


[Figure: DNS tree with top-level domains com, in, gov, mil, net and edu; example nodes yahoo
and vsnl, with the node apj under vsnl having the domain name apj.vsnl.net.in]
FIG: DOMAIN NAME

Organizational & Geographical Domain Names

com    Commercial Organisation
edu    Educational
gov    Government Agencies
mil    Military Organisation
net    Sites which perform some administrative functions for the net
org    Non-Profit Organisation

au     Australia
ca     Canada
es     Spain
fr     France
hk     Hong Kong
in     India
jp     Japan
uk     United Kingdom
us     United States

IP Addressing

Every host and router on the Internet has a unique IP address, which encodes its network
number and host number. No two machines or routers can have the same IP address. The
addressing scheme on the Internet uses IPv4 (Internet Protocol version four), which is a 32-bit
addressing scheme. In this scheme, the 32 bits are divided into four groups of 8 bits each, joined
by periods (i.e., 8 bits.8 bits.8 bits.8 bits). With eight bits, 256 (2^8) numbers can be represented.
Thus, each eight-bit group can represent numbers from 0 to 255. A typical IP address will appear
like 137.0.2.11. Based on this addressing scheme, networks connected to the Internet have been
classified into five classes, as shown in the figure.

Class   Leading bits   Structure                 Address range                  Range of hosts
A       0              Network + Host            1.0.0.0 to 127.255.255.255     126 networks with 16 million hosts each
B       10             Network + Host            128.0.0.0 to 191.255.255.255   16,382 networks with 64 K hosts each
C       110            Network + Host            192.0.0.0 to 223.255.255.255   2 million networks with 254 hosts each
D       1110           Multicast address         224.0.0.0 to 239.255.255.255
E       11110          Reserved for future use   240.0.0.0 to 247.255.255.255

FIG: IP ADDRESSING SYSTEM
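
Because the class of an address is determined by its leading bits, it can also be read off from
the value of the first octet. The sketch below is a simplified illustration of the table above, not
a complete treatment of IPv4 addressing.

    # A sketch that classifies an IPv4 address into class A-E from its first octet,
    # following the address ranges in the table above.
    def ipv4_class(address: str) -> str:
        first_octet = int(address.split(".")[0])
        if first_octet <= 127:
            return "A"    # 1.0.0.0   - 127.255.255.255
        if first_octet <= 191:
            return "B"    # 128.0.0.0 - 191.255.255.255
        if first_octet <= 223:
            return "C"    # 192.0.0.0 - 223.255.255.255
        if first_octet <= 239:
            return "D"    # multicast addresses
        return "E"        # reserved for future use

    print(ipv4_class("137.0.2.11"))   # B
    print(ipv4_class("10.1.1.1"))     # A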

URL (Uniform Resource Locator)

A string of characters that specifies the address of a Web page.

The browser's display is hypertext that contains pointers to other documents. The
pointers are implemented using a concept that is central to Web browsers, called the Uniform
Resource Locator. A URL can be thought of as a network extension of the standard file-name
concept, except that in this case the file and its directory can exist on any computer on the
network. Typing a URL in the location area and hitting the return key will cause the browser to
attempt to retrieve that page. If the browser is successful in finding the page, the browser will
display it. This high-level explanation does not, however, convey any of the details of what is
happening. To go from a URL to having the Web page displayed, the browser needs to be able to
answer such questions as:

How can the page be accessed?
Where can the page be found?
What is the file name corresponding to the page?

The URL is designed to incorporate sufficient information to resolve these questions.
Quite naturally, then, the URL has three parts. We can view the format of a URL as follows:

how://where/what

In other words, a URL contains three parts: the first describes the type of resource (the
protocol), the second gives the name of the server housing the resource, and the third gives the
full file name of the resource, i.e., directory, subdirectory and file name. The format is:

protocol://domain name of server/directory name/sub-directory name/file name

At this point, it is helpful to consider a sample URL to illustrate the three parts:

http://pubpages.uminn.edu/index.html

Let us break this example down into its components.

1. http: Defines the protocol or scheme by which to access the page. In this case, the
protocol is the Hypertext Transfer Protocol. This protocol is the set of rules by which an
HTML document is transferred over the Web.
2. pubpages.uminn.edu: Identifies the domain name of the computer where the page
resides. The computer is a Web server capable of satisfying page requests. Just as a waiter
serves food, a Web server serves Web pages. The name pubpages.uminn.edu tells the
browser on which computer to find the Web page. In this case, the computer is located at
the University of Minnesota.
3. index.html: Provides the local name (usually a filename) uniquely identifying the
specific page. If no name is specified, the Web server where the page is located may
supply a default file. On many systems, the default file is named index.html or index.htm.

This example demonstrates that the URL consists of a protocol, a Web server's domain name,
and a file name.
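
These three parts can be separated mechanically; Python's standard library performs exactly this
split, as the sketch below shows for the sample URL used above.

    # A sketch that splits a URL into protocol, server domain name, and file path,
    # mirroring the three-part format described above.
    from urllib.parse import urlparse

    parts = urlparse("http://pubpages.uminn.edu/index.html")
    print(parts.scheme)   # http               -> the protocol ("how")
    print(parts.netloc)   # pubpages.uminn.edu -> the server   ("where")
    print(parts.path)     # /index.html        -> the file     ("what")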

Entering a URL in the location field of the browser will bring up the designated Web page,
barring any problems. For example, if the Web page has moved to another machine or has been
removed, or if you type an invalid URL, or if the server you are trying to access is unavailable,
an error message will be displayed. Another way to retrieve a Web page is to mouse over and
click on a hyperlink in the Web page that is currently being displayed.
In the URL example presented earlier, the protocol used to access the page was http, which is
used for transferring an HTML document. Much of the power of browsers is that they are
multiprotocol; that is, they can retrieve and render information from a variety of servers and
sources. The following table provides a summary of other common protocols:

Protocol   Use              Example
ftp        File transfer    ftp://ftp.bio.umaine.edu
gopher     Gopher           gopher://gopher.tc.umn.edu/11/libraries
http       Hypertext        http://www.chem.uab.edu/pauling/argon.html
telnet     Remote login     telnet://www.amnesty.org
mailto     Sending e-mail   mailto:kim_lee@mycompany.com
Concept of Protocol

For any network to exist, there must be connections between computers and agreements
(protocols) about the communication language. However, setting up connections and agreements
between disparate computers (from PCs to mainframes) is complicated by the fact that over the
last decade, systems have become increasingly heterogeneous in their software and hardware as
well as their intended functionality. A range of standards for networking, called protocol stacks,
has been developed.

A protocol standard allows heterogeneous computers to talk to each other. Protocol
stacks are software that performs the variety of actions necessary for data transmission between
computers. Protocol stacks are sets of rules for inter-computer communication that have been
agreed upon and implemented by many vendors, users and standards bodies. A protocol stack
works by residing either in a computer's memory or in the memory of a transmission device such
as a network interface card. When data is ready for transmission, the stack puts the data on the
wire. At the receiving end, it takes the data off the wire and prepares it for the application,
stripping off the error-control information that was added at the transmitting end. The Internet
uses TCP/IP (Transmission Control Protocol/Internet Protocol) as its protocol.

Web Caching

Web caching is the storage of Web objects near the user to allow fast access, thus
improving the experience of the Web surfer. Examples of Web objects are Web pages (the
HTML itself), images in Web pages, and so on. Web objects can be cached locally on the user's
computer or on a server on the Web.

Browser cache: Browsers cache Web objects on the user's machine. A browser first looks for
objects in its cache before requesting them from the website. Caching frequently used Web
objects speeds up Web surfing. For example, I often use google.com and yahoo.com. If their
logos and navigation bars are stored in my browser's cache, then the browser will pick them up
from the cache and will not have to get them from the respective websites. Getting the objects
from the cache is much faster than getting them from the websites.

Web objects can have an expiry time associated with them, after which the object is considered
to be stale. A stale object is not used; if the object in the cache is stale, it is equivalent to the
object not being in the cache. An expiry date can be specified in the HTTP headers of a Web
object, using the Expires and Cache-Control headers.
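
These headers can be seen directly in any HTTP response. The sketch below simply prints them
for an example URL; whether Expires or Cache-Control are actually present depends entirely on
the server contacted.

    # A sketch that reads the caching-related headers of an HTTP response.
    # The URL is only an example; which headers appear depends on the server.
    import urllib.request

    with urllib.request.urlopen("https://www.example.com/") as response:
        print("Expires:      ", response.headers.get("Expires"))
        print("Cache-Control:", response.headers.get("Cache-Control"))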

What are the Advantages of Web Caching?

Web caching has the following advantages:

Faster delivery of Web objects to the end user.
Reduced bandwidth needs and cost, which benefits the user, the service provider and the
website owner.
Reduced load on the website servers.

Web Server

Web servers are computers that deliver (serve up) Web pages. In other words, a web server is a
computer that stores web pages and gives them to a client whenever they are asked for. When a
client (the browser) sends a request message, the server is located by its domain name. Every
Web server has an IP address and usually a domain name. For example, if you enter
the URL http://www.pcwebopedia.com/index.html in your browser, this sends a request to the
Web server whose domain name is pcwebopedia.com. The server then fetches the page
named index.html and sends it to your browser.

Any computer can be turned into a Web server by installing server software and connecting the
machine to the Internet. There are many Web server software applications, including public
domain software from NCSA and Apache, and commercial packages
from Microsoft, Netscape and others.
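
How little is needed can be seen from the tiny web server included in Python's standard library.
The sketch below serves the files in the current directory on port 8000, purely as a
demonstration; it is not one of the production packages mentioned above.

    # A sketch of a minimal web server: it serves files from the current directory
    # on port 8000. Pointing a browser at http://localhost:8000/ requests index.html.
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    server = HTTPServer(("0.0.0.0", 8000), SimpleHTTPRequestHandler)
    print("Serving the current directory on port 8000 ...")
    server.serve_forever()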

Proxy Server

A server that sits between a client application, such as a Web browser, and a real server. It
intercepts all requests to the real server to see if it can fulfill the requests itself. If not, it forwards
the request to the real server.
In computer networks, a proxy server is a server (a computer system or an application) that acts
as an intermediary for requests from clients seeking resources from other servers. A client
connects to the proxy server, requesting some service, such as a file, connection, web page, or
other resource available from a different server. The proxy server evaluates the request according
to its filtering rules. For example, it may filter traffic by IP address or protocol. If the request is
validated by the filter, the proxy provides the resource by connecting to the relevant server and
requesting the service on behalf of the client. A proxy server may optionally alter the client's
request or the server's response, and sometimes it may serve the request without contacting the
specified server. In this case, it 'caches' responses from the remote server, and returns subsequent
requests for the same content directly.

Proxy servers have two main purposes:

Improve Performance: Proxy servers can dramatically improve performance for groups
of users. This is because it saves the results of all requests for a certain amount of time.
Consider the case where both user X and user Y access the World Wide Web through a
proxy server. First user X requests a certain Web page, which we'll call Page 1. Sometime
later, user Y requests the same page. Instead of forwarding the request to the Web server
where Page 1 resides, which can be a time-consuming operation, the proxy server simply
returns the Page 1 that it already fetched for user X. Since the proxy server is often on the
same network as the user, this is a much faster operation. Real proxy servers support
hundreds or thousands of users. The major online services such as America
Online, MSN and Yahoo, for example, employ an array of proxy servers.

Filter Requests: Proxy servers can also be used to filter requests. For example, a
company might use a proxy server to prevent its employees from accessing a specific set
of Web sites.
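
Both behaviours described above, answering repeat requests from a cache and filtering requests
for blocked sites, can be sketched in a few lines of Python. This is a highly simplified
illustration; a real proxy server also honours expiry times, serves many clients at once, and does
much more. The blocked host name is invented for illustration.

    # A simplified sketch of the two proxy-server roles described above: caching and filtering.
    import urllib.request
    from urllib.parse import urlparse

    cache = {}                                   # URL -> content already fetched for some client
    BLOCKED_HOSTS = {"blocked.example.com"}      # sites that requests may not reach (hypothetical)

    def proxy_fetch(url: str) -> bytes:
        if urlparse(url).netloc in BLOCKED_HOSTS:
            raise PermissionError("access to this site is not allowed")
        if url in cache:
            return cache[url]                    # user Y receives the copy already fetched for user X
        with urllib.request.urlopen(url) as response:
            body = response.read()               # first request goes to the real server
        cache[url] = body
        return body

    page = proxy_fetch("https://www.example.com/")   # fetched from the origin server
    page = proxy_fetch("https://www.example.com/")   # answered from the local cache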

Firewall

A system designed to prevent unauthorized access to or from a private network.


Firewalls can be implemented in both hardware and software, or a combination of both.
Firewalls are frequently used to prevent unauthorized Internet users from accessing private
networks connected to the Internet, especially intranets. All messages entering or leaving the
intranet pass through the firewall, which examines each message and blocks those that do not
meet the specified security criteria.

There are several types of firewall techniques:

Packet filter: Looks at each packet entering or leaving the network and accepts or
rejects it based on user-defined rules. Packet filtering is fairly effective and transparent to
users, but it is difficult to configure. In addition, it is susceptible to IP spoofing.
Application gateway: Applies security mechanisms to specific applications, such
as FTP and Telnet servers. This is very effective, but can impose a performance
degradation.
Circuit-level gateway: Applies security mechanisms when a TCP or
UDP connection is established. Once the connection has been made, packets can flow
between the hosts without further checking.
Proxy server: Intercepts all messages entering and leaving the network. The proxy
server effectively hides the true network addresses.

In practice, many firewalls use two or more of these techniques in concert. A firewall is
considered a first line of defense in protecting private information. For greater security, data can
be encrypted.
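
The packet-filter technique in particular amounts to comparing each packet's addresses and port
against a list of user-defined rules. The sketch below illustrates the idea only; the networks,
ports and rule format are invented for illustration, and real firewalls inspect far more fields.

    # A simplified sketch of packet-filter rules: a packet is accepted or rejected by
    # matching its source address and destination port against user-defined rules.
    import ipaddress

    RULES = [
        {"src": "192.168.1.0/24", "port": 80, "action": "accept"},  # allow web traffic from the intranet
        {"src": "0.0.0.0/0",      "port": 23, "action": "reject"},  # block telnet from anywhere
    ]

    def filter_packet(src_ip: str, dst_port: int) -> str:
        for rule in RULES:
            in_network = ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule["src"])
            if in_network and dst_port == rule["port"]:
                return rule["action"]
        return "reject"   # default deny: anything not explicitly allowed is dropped

    print(filter_packet("192.168.1.5", 80))   # accept
    print(filter_packet("203.0.113.9", 23))   # reject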

Web Portal

A Web portal or public portal refers to a Web site or service that offers a broad array of
resources and services, such as e-mail, forums, search engines, and online shopping malls. The
first Web portals were online services, such as AOL, that provided access to the Web, but by now
most of the traditional search engines have transformed themselves into Web portals to attract
and keep a larger audience.

An enterprise portal is a Web-based interface for users of enterprise applications. Enterprise
portals also provide access to enterprise information such as corporate databases, applications
(including Web applications), and systems.

Home Page

This is the starting point or front page of a Web site. This page usually has some sort of
table of contents on it and often describes the purpose of the site. For example,
http://www.apple.com/index.html is the home page of Apple.com. When you type in a basic
URL, such as "http://www.cnet.com," you are typically directed to the home page of the Web
site. Many people have a "personal home page," which is another way the term "home page" can
be used.

Web Page and Web Site

Web pages are what make up the World Wide Web. These documents are written in
HTML (hypertext markup language) and are translated by your Web browser. Web pages can
either be static or dynamic. Static pages show the same content each time they are viewed.
Dynamic pages have content that can change each time they are accessed. These pages are
typically written in scripting languages such as PHP, Perl, ASP, or JSP. The scripts in the pages
run functions on the server that return things like the date and time, and database information. All
the information is returned as HTML code, so when the page gets to your browser, all the
browser has to do is translate the HTML.
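
The difference can be made concrete with a very small server-side script that builds HTML
containing the current date and time on every request. This is a minimal sketch using Python's
standard library rather than PHP, Perl, ASP or JSP, but the principle is the same.

    # A minimal sketch of a dynamic page: the server builds fresh HTML containing the
    # current date and time on every request, so the content changes with each visit.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from datetime import datetime

    class DynamicPage(BaseHTTPRequestHandler):
        def do_GET(self):
            html = f"<html><body><p>Generated at {datetime.now()}</p></body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(html.encode("utf-8"))

    HTTPServer(("0.0.0.0", 8080), DynamicPage).serve_forever()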

Please note that a Web page is not the same thing as a Web site. A Web site is a collection
of pages. A Web page is an individual HTML document. This is a good distinction to know, as
most techies have little tolerance for people who mix up the two terms.

Cookies

A cookie, also known as an HTTP cookie, web cookie, or browser cookie, is used for
an origin website to send state information to a user's browser and for the browser to return the
state information to the origin site. The state information can be used for authentication,
identification of a user session, user's preferences, shopping cart contents, or anything else that
can be accomplished through storing text data on the user's computer.

Cookies cannot be programmed, cannot carry viruses, and cannot install malware on the
host computer. However, they can be used by spyware to track a user's browsing activities, a
major privacy concern that prompted European and US lawmakers to take action. Cookies can
also be stolen by hackers to gain access to a victim's web account.
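
A cookie is ultimately just a short piece of text carried in HTTP headers. The sketch below shows
one being created on the server side and parsed again when the browser sends it back; the cookie
name and value are invented for illustration.

    # A sketch of how cookie text travels in HTTP headers.
    # The cookie name and value are invented for illustration.
    from http.cookies import SimpleCookie

    # What a server might place in a Set-Cookie response header:
    server_cookie = SimpleCookie()
    server_cookie["session_id"] = "abc123"
    print(server_cookie.output())           # Set-Cookie: session_id=abc123

    # What the browser later returns, parsed again on the server side:
    returned = SimpleCookie()
    returned.load("session_id=abc123")
    print(returned["session_id"].value)     # abc123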

Browsers

A Web browser is a program that your computer runs to communicate with the Web
servers on the Internet, which enables it to download and display the Web pages that you request.
A Web browser is an interface between the user and the internal workings of the Internet.
Browsers are referred to as Web clients or universal clients, as they follow the principle of
client-server technology, where the browser is the client.
On typing a URL in the address window or by following hyperlinks, the browser contacts
the server by sending a request for the required information. After receiving this information the
browser displays it on a Web page in the user's window.
At a minimum, a Web browser must understand HTML and display text. In recent years,
however, Internet users have come to expect a lot more. A state-of-the-art Web browser provides
a full multimedia experience, complete with pictures, sound, video, and even 3-D imaging.
Because a Web browser has the ability to interpret or display so many types of files, you
may often use a Web browser even when you are not connected to the Internet. Windows 98, for
example, uses Internet Explorer to open most image files.
There are many types of browsers; you can obtain a comprehensive list from the web site
www.browsers.com. The most popular browsers, by far, are Netscape Navigator and Microsoft
Internet Explorer. Both are state-of-the-art browsers, and the competition between them is fierce.

Both Navigator and Internet Explorer are available over the Internet at no charge. Microsoft
designed Internet Explorer for the Windows operating system, but it is now available for
Macintosh and some UNIX systems as well. Navigator is available for the Windows, Macintosh,
UNIX, and Linux operating systems.

Features of a Good Browser

1. The most important feature of a web browser is the presentation of web pages without
distortion.
2. The browser should support multimedia features like sound, video, etc.
3. It should also support forms and frames. Frames divide web pages into sections, thus
improving readability.
4. A good browser should have the ability to open multiple windows.
5. The latest browsers support ActiveX technology, Java, VRML and other plug-ins.
6. E-mail, news, and FTP support should also be provided.
7. Last but not least, a certain amount of security, such as the ability to block access to
certain Web pages, should also exist.

Internet

The Internet - Interconnected Networks - is the most well-known component of
the Information Super Highway (I-Way) infrastructure. Today, the Internet is an information
distribution system spanning several continents. Its general infrastructure targets not just one
electronic commerce application, such as video-on-demand or home shopping, but a wide range
of computer-based services, such as e-mail, EDI, information publishing, information retrieval
and video conferencing. Simply put, the Internet environment is a unique combination of postal
service, telephone system, research library, supermarket and talk-show centre that enables people
to share and purchase information. The Internet is viewed as a prototype for the emerging I-Way,
of which it will become one component.

The Internet began around 1965, when the US Department of Defense (DoD) financed the design
of a computer network, called the Advanced Research Projects Agency Network (ARPANET), to
link a handful of universities and military research laboratories. In the mid-1980s the National
Science Foundation (NSF) took over control, when defence traffic moved from ARPANET to
MILNET. In 1987, the NSF created NSFNET. In 1991, commercial Internet traffic started using
the NSF backbone. In 1995, NSFNET was decommissioned and the modern Internet came into
existence.

Internet Administration

The Internet, with its roots primarily in the research domain, has evolved and gained a
broader user base with significant commercial activity. Various groups that coordinate Internet
issues have guided its growth and development. The figure shows the general organization of
Internet administration.

Internet Society (ISOC)

The Internet Society (ISOC) is an international, non-profit organization formed in 1992
to provide support for the Internet standards process. ISOC accomplishes this by maintaining and
supporting other Internet administrative bodies such as the IAB, IETF, IRTF and IANA. ISOC
also promotes research and other scholarly activities relating to the Internet.

Internet Architecture Board (IAB)

The Internet Architecture Board (IAB) is the technical advisor to the ISOC. The main
purpose of the IAB is to oversee the continuing development of the TCP/IP protocol suite and to
serve in an advisory capacity to research members of the Internet community. The IAB
accomplishes this through its two primary components, the Internet Engineering Task Force
(IETF) and the Internet Research Task Force (IRTF). Another responsibility of the IAB is the
editorial management of the RFCs. The IAB is also the external liaison between the Internet and
other standards organizations and forums.

Internet Engineering Task Force (IETF)

The Internet Engineering Task Force (IETF) is a forum of working groups managed by
the Internet Engineering Steering Group (IESG). IETF is responsible for identifying operational
problems and proposing solutions to these problems. IETF also develops and reviews
specifications intended as Internet standards. The working groups are collected into areas, and
each area concentrates on a specific topic. Currently nine areas have been defined, although this
is by no means a hard and fast number. The areas are:

Applications
Internet Protocols
Routing
Operations
User Services
Network Management
Transport
Internet protocol next generation (IPng)
Security

Internet Research Task Force

The Internet Research Task Force (IRTF) is a forum of working groups managed by the
Internet Research Steering Group (IRSG). IRTF focuses on long term research topics related to
Internet protocols, applications, architecture, and technology.

Internet Assigned Numbers Authority (IANA) and Internet Corporation for Assigned Names and
Numbers (ICANN)

The Internet Assigned Numbers Authority (IANA), supported by the U.S. government,
was responsible for the management of Internet domain names and addresses until October 1998.
At that time the Internet Corporation for Assigned Names and Numbers (ICANN), a private non-
profit corporation managed by an international board, assumed IANA operations.

Network Information Center (NIC)

The Network Information Center (NIC) is responsible for collecting and distributing
information about TCP/IP protocols.

History of Internet

1960s Telecommunications:- Essential to the early Internet concept was packet
switching, in which data to be transmitted is divided into small packets of information and
labeled to identify the sender and recipient. The packets were sent over a network and then
reassembled at their destination. If any packet did not arrive or was not intact, the original sender
was requested to resend the packet.
ARPANET, 1969:- In 1969, Bolt, Beranek, and Newman, Inc. (BBN) designed a
network called the ARPANET for the United States Department of Defense. The
military created ARPA to enable researchers to share super-computing power. It
was rumored that the military developed ARPANET in response to the threat of a
nuclear attack destroying the country's communication system.

1970s Telecommunications:- In this decade, the ARPANET was used primarily by the
military, some of the larger companies, such as IBM, and universities. The general population
was not yet connected to the system and very few people were on line at work.


The use of Local Area Networks (LANs) became more prevalent during the 1970s. Also the idea
of an open architecture was promoted; that is, networks making up the ARPANET could have
any design. In later years, this concept had a tremendous impact on the growth of the ARPANET.
Twenty-Three Nodes, 1972:- By 1972, the ARPANET was international, with nodes
in Europe at University College in London, England, and the Royal Radar
Establishment in Norway. The number of nodes on the network was up to 23, and the
trend would be for that number to double every year from then on. Ray Tomlinson,
who worked at BBN, invented e-mail.
UUCP, 1976:- AT&T Bell Labs developed UNIX-to-UNIX copy (UUCP). In 1977, UUCP
was distributed with UNIX.
USENET, 1979:- The User Network (USENET) was started by using UUCP to connect
Duke University and the University of North Carolina at Chapel Hill. Newsgroups
emerged from this early development.

1980s Telecommunications:- In this decade, Transmission Control Protocol/Internet
Protocol (TCP/IP), a set of rules governing how the networks making up the ARPANET
communicate, was established. For the first time, the term Internet was being used to describe
the ARPANET. Security became a concern as viruses appeared. As the Internet became larger,
the Domain Name System was developed, to allow the network to expand more easily by
assigning names to host computers in a distributed fashion.
CSNET, 1980:- The Computer Science Network (CSNET) connected all university
computer science departments in the United States. Computer science departments
were relatively new and only a limited number existed in 1980. CSNET joined the
ARPANET in 1981.
BITNET, 1981:- The 'Because It's Time' Network (BITNET) formed at the City
University of New York and connected to Yale University. Many mailing lists
originated with BITNET.
TCP/IP, 1983:- The United States Defense Communication Agency required that
TCP/IP be used for all ARPANET hosts. Since TCP/IP was distributed at no charge,
the Internet became what is called an open system. This allowed the Internet to grow
quickly, as all connected computers were now speaking the same language. Central
administration was no longer necessary to run the network.
NSFNET, 1985:- The National Science Foundation Network (NSFNET) was formed
to connect the National Science Foundation's (NSF's) five super-computing centers.
This allowed researchers to access the most powerful computers in the world, at a
time when large, powerful, and expensive computers were a rarity and generally
inaccessible.

The Internet Worm and IRC, 1988:- The Internet Worm (created by Robert Morris
while he was a computer science graduate student at Cornell University) was
released. It infected 10 percent of all Internet hosts. Also in this year, Internet
Relay Chat (IRC) was written by Jarkko Oikarinen.
NSF Assumes Control of the ARPANET, 1989:- The NSF took over control of the
ARPANET in 1989. This changeover went unnoticed by nearly all users. Also, the
number of hosts on the Internet exceeded the 100,000 mark.

1990s Telecommunications:- During the 1990s, lots of commercial organizations started getting
on-line. This stimulated the growth of the Internet like never before. URLs appeared in television
advertisements and, for the first time, young children went on-line in significant numbers.
Graphical browsing tools were developed, and the programming language HTML
allowed users all over the world to publish on what was called the World Wide Web. Millions of
people went on-line to work, shop, bank, and be entertained. The Internet played a much more
significant role in society, as many nontechnical users from all walks of life got involved with
computers. Computer literacy and Internet courses sprang up all over the world.
Gopher, 1991:- Gopher was developed at the University of Minnesota, whose sports
teams' mascot is the Golden Gopher. Gopher allowed you to 'go for' or fetch files on
the Internet using a menu-based system. Many Gophers sprang up all over the
country, and all types of information could be located on Gopher servers. Gopher is
still available and accessible through Web browsers, but its popularity has faded; for
the most part, it is only of historical interest. (gopher://gopher.well.sf.ca.us/)
World Wide Web, 1991:- The World Wide Web (WWW) was created by Tim Berners-
Lee at CERN (a French acronym for the European Laboratory for Particle Physics),
as a simple way to publish information and make it available on the Internet.
WWW Publicly Available, 1992:- The interesting nature of the Web caused it to
spread, and it became available to the public in 1992. Those who first used the system
were immediately impressed.
Netscape Communications, 1994:- The company called Netscape Communications,
formed by Marc Andreessen and Jim Clark, released Netscape Navigator, a Web
browser that captured the imagination of everyone who used it. The number of users
of this software grew at a phenomenal rate. Netscape made its money largely through
advertising on its Web pages.
Yahoo, 1994:- Stanford graduate students David Filo and Jerry Yang developed their
Internet Search Engine and directory called Yahoo, which is now world famous.
Java, 1995:- The Internet programming environment, Java, was released by Sun
Microsystems, Inc. This language, originally called Oak, allowed programmers to
develop Web pages that were more interactive.
Microsoft Discovers the Internet, 1995:- The software giant committed many of its
resources to developing its browser, Microsoft Internet Explorer, and Internet
applications.

Netscape Releases Source Code, 1998:- Netscape Communications released the
source code for its Web browser.

Internet Services

The Internet provides a mechanism for millions of computers to communicate, but what kind
of information is transmitted? Many services are available over the Internet, and the following
are the most popular ones.

1) E-Mail:- Enables people to send private messages, as well as files, to one or more other
people.

2) Mailing Lists:- Enable groups of people to conduct group conversations by e-mail, and
provide a way of distributing newsletters by e-mail.
3) On-line Chat:- Provides a way for real-time online chatting to occur, whereby participants
read each other's messages within seconds of their being sent.
4) Voice and video conferencing:- Enable two or more people to hear and see each other and
share other applications.
5) The World Wide Web:- A distributed system of interlinked pages that include text,
pictures, sound, and other information.
6) File Transfer:- Lets people download files from public file servers, including a wide
variety of programs.
7) Remote Login:- Programs that allow you to log in to another computer from an account
in which you are already logged in; they let you use and interact with software on the
remote machine. To do this, you will need an account and password on the second
computer that are accessible to you.
8) Internet Telephony:- As the name suggests, Internet telephony involves the use of the
Internet to transmit real-time audio from one personal computer to another (or, in some
instances, to a telephone itself).
9) USENET:- A bulletin board service featuring a large number of discussion groups
involving millions of people around the world.
10) Archie:- An indexing service, rather like a library catalogue. The contents of the large
number of FTP servers are indexed on a number of Archie servers on the Internet.
11) Gopher:- Before the Web came into existence, the University of Minnesota developed a
system called Gopher connecting universities, colleges and government authorities. The
Gopher system is based on sets of related menus. The interconnected Gopher servers are
collectively known as Gopher Space.
12) Veronica:- Provides Archie-like search services for Gopher. Veronica searches are not
necessarily always easy and fast, as Gopher servers are widely distributed.
13) WAIS:- An Internet service which looks for specific information in Internet databases.
Searching is done by keywords, and source documents are indexed for fast retrieval.

Basic Structure of Internet



The Internet is a network of networks. The basic elements of the Internet and its associated
components are shown schematically in the figure. The various terms have the following
meanings: -

(a) Internet Service Provider (ISP):- The ISP acts as an interface between end-users (which
could be stand-alone PCs or LANs) and the Internet. It acts like the main crossing of a town,
which allows traffic to come out of the town and join the national highway. The ISP has
routers and servers through which it connects end-users to the Internet backbone. For all
problems and management at the end-user level, an end-user interacts with the ISP only.

(b) Router:- A special purpose computer that directs the packets of data along a network.

(c) Gateway:- The ISP gets connected to the Internet's backbone through a Gateway, which
functions as a door to enter the Internet backbone. It connects a number of ISPs to the
Internet backbone. In India, VSNL was the sole Gateway service provider until recently;
however, private operators are now permitted to provide Gateway services.

(d) Internet Backbone:- The Internet backbone is high-bandwidth (high-speed) fibre-optic
cable - on which a number of routers are in place - and is managed through the Network
Operations Centres of the Internet. The Internet backbone is of different bandwidths in
various segments.

The basic elements of the Internet are the user (a stand-alone PC or a LAN), ISPs, routers,
gateways and the Internet backbone. Thus, an end-user who wishes to establish a link with
another user on a LAN goes through his LAN - ISP - Gateway and gets connected to the distant
end-user through Gateway - ISP - LAN (refer to the figure).

[Figure: the Internet backbone with routers and a Network Operations Centre, connected through
a Gateway to an ISP (router and server), which in turn serves LANs and stand-alone PCs]

FIG: BASIC ELEMENTS OF INTERNET


Intranet

An intranet is a private network (usually a LAN, but may be larger) that uses TCP/IP and
other Internet standard protocols. Because it uses TCP/IP, the standard Internet communications
protocol, an intranet supports TCP/IP-based protocols, such as HTTP (the protocol that web
browsers use to talk to web servers), and SMTP and POP (the protocols that e-mail programmes
use to send and receive mail).

In other words, an intranet can run web servers, web clients, mail servers, and mail clients.
An intranet is a network for a single organization, with the following features: -

It uses Internet technology - browsers and TCP/IP

All services available on Internet can be implemented on intranet

It could be implemented on a single LAN or a combination of LANs

It could be implemented on a MAN or WAN

An intranet need not be connected to the Internet (outside connectivity, if required, can be
obtained through the Internet)

It is a private Internet of an organisation

Architecture of Intranet

The architecture of an intranet is shown in the figure. A simplified intranet consists of the
following components: -

(a) Workstations & Client Software. A PC with any operating system (Windows 95/98,
Mac, Unix) that supports networking can be connected to an intranet as a workstation. In
addition to other application programmes, workstations run client software that provides
the user with access to network servers. On an intranet the client software will typically
include (depending upon the services provided) a browser (MS Internet Explorer,
Netscape Navigator), an e-mail client (Outlook Express), newsreaders, and chat or FTP
clients. These clients may be integrated with the OS or installed as add-ons.
(b) Servers, NOS & Server Software. This is an important area of an intranet in respect
of hardware and software requirements, viz.,

(i) The servers provide services to the workstations connected to the
intranet. A network server is required to manage the LAN. Besides this,
depending on the services to be provided, further servers would be required, e.g.,
web server, mail server, FTP server, application servers and print server.

(ii) A Network Operating System (Windows NT, Unix or Linux) is required to
run on the network server. The client part of the NOS needs to run on the
workstations.

(iii) Server software includes the web server, mail server, etc. (depending on the
servers and services required). Many intranet server programmes run on Unix and
some on NT. Plenty of freeware and shareware server programmes are available for
Unix. Windows NT Server comes with a web server (MS Internet Information
Server).

(iv) An intranet also needs middleware, the software that provides access to a
database from a web browser, e.g., calls to the database programme to read and
write records.

(c) Network Cards, Cabling, Switches/Hubs. These are the components required to set
up the LAN. The commonly used network adapter card is Ethernet, the most common
LAN configuration is the star topology, and the commonly used cables are CAT-5 or
CAT-6 UTP.
(d) Security Systems (Firewall). If the intranet is connected to the Internet, we need to
control the kind of information that can pass between the intranet and the Internet. The
hardware, software and procedures that provide this access control make up a firewall.
Firewall systems are of two categories, viz.,

[Figure: a corporate intranet with network, WWW, e-mail, news, FTP and application servers on
a switched corporate LAN, connected through a firewall and router to the public Internet and to
external users]

FIG: ARCHITECTURE OF INTRANET

(i) Network-Level Firewalls. These firewalls examine only the headers of
each packet of information passing to or from the Internet. The firewall accepts
or rejects packets based on the packet's sender, receiver and port number (each
Internet service, such as e-mail or the WWW, has a different port number). For
example, a firewall might allow e-mail and Web packets to and from any computer
on the intranet, but allow remote-login packets to and from only selected computers.

(ii) Application-Level Firewalls. These firewalls handle packets for each
Internet service separately, usually by running a programme called a proxy server,
which accepts e-mail, Web, chat, newsgroup and other packets from computers
on the intranet, strips off the information that identifies the packet, and passes it
along to the Internet, or vice versa. When the replies return, the proxy server
passes them back to the computer that sent the original message. To the rest
of the Internet, all packets appear to be from the proxy server, so no information
leaks out about the individual computers on your intranet. A proxy server can
keep a log of all packets that pass by, and can be configured to allow logins in
one direction while disallowing the other.

Advantages and Disadvantages of an Intranet

LANs and intranets both let you share hardware, software, and information by connecting
computers together. You don't need an intranet to share files and printers, or to send e-mail
among the people on your network: a LAN can do those jobs. The following are some reasons
to convert a LAN to an intranet, or to connect your computers together into an intranet: -

(a) Intranets Use Standard Protocols. Internet protocols such as TCP/IP are used on a
huge number of diverse computers. More development is happening for Internet-based
communication than other types of communication. For example, intranet users can
choose from a wide variety of e-mail programmes, because so many have been written
for the Internet.

(b) Intranets are Scalable. TCP/IP works fine on the Internet, which has millions of
host computers, so you don't have to worry about your network outgrowing its
communications protocol.

(c) Intranet Components are Relatively Cheap and Some are Free. Because the
Internet started as an academic and military network (rather than a commercial one),
there is a long tradition of free, cheap, and cooperative software development. Some of
the best Internet software is free, including Apache (the most widely used web server),
and Pegasus and Eudora Lite (two excellent e-mail client programmes).

(d) Intranets enable you to set up Internet-style Information Services. You can have
your own private web, using web servers on your intranet to serve web pages to members
of your organisation only. You can also support chat, Usenet, telnet, FTP, or other
Internet services privately on your network. Push technology (web channels) can deliver
assignments, job status, and group schedules to the user's desktop via his or her browser.

(e) Intranets let People Share their Information. Everyone in your organisation can
make their information available to other employees by creating web pages for the
intranet.
Because many word processing programmes can now save documents as web pages,
creating pages for an intranet does not require a lot of training. Rather than printing and
distributing reports, people can put them on the intranet and send e-mail to tell everyone
where the report is stored.

Of course, intranets have some disadvantages too, including these: -

(a) Intranets Cost Money. You may need to upgrade computers, buy new software,
run new cabling, and teach people to use the new systems.

(b) People in your organization may waste time. If you connect your intranet to the
Internet, people may spend hours a week watching sports results or checking their stock
options. Even if you don't connect to the Internet, people can use the intranet to build
web sites about the company softball team and send e-mail about upcoming baby
showers. You'll need policies in place to determine how the intranet may be used.

What can you do with an Intranet?

Many organisations, especially those with large existing computer systems, have lots of
information that is hard to get at. The intranet can change all that by using Internet tools. Here
are some ideas and ways in which your organisation, large or small, can use an intranet.

(a) E-mail within the organisation and to and from the Internet. People can use one
e-mail programme to exchange mail both with other intranet users and with the Internet.

(b) Private Discussion Groups. Using a mailing list manager or a news server
accessible only to people in your organisation, you can set up mailing lists or newsgroups
to encourage people to share information within departments or across the organisation.

(c) Private Websites. Each department in your organisation can create a website that
is accessible only to people on the intranet. Instead of circulating memos and handbooks,
information can go on these web sites. For example, the human resources department can
post all employee policies, job postings, and upcoming training opportunities. The
marketing department can post information about products, including upcoming release
dates, how products are targeted, and other information that is not appropriate for a public
site on the Internet-based web. Every department can post web pages to share its
information with the other departments in the organisation. By using the intranet instead
of printing on paper, it is economical to publish large documents and documents that
change frequently.

(d) Access to Legacy Databases. If your organisation has information that is locked
away in an inaccessible database, you can convert the information to web pages so that
everyone on the intranet can see it. (Legacy systems are those considered outdated by
whoever is describing the system.) For example, a non-profit organisation might have a
proprietary database containing all of its fundraising and membership information.
By using a programme that can display database information as web pages and enter
information from web-page forms into the database, all the people at the non-profit
organisation can see, and even update, selected information from the database by using
only a web browser. Naturally, the programme would need to limit who could see and
change particular information in the database.

(e) Teleconferencing. Rather than spend huge amounts on video teleconferencing
systems, think about using your intranet (and the Internet) instead. If your organisation
has offices in several locations, you can use the Internet for online chats with text, voice,
and even limited video.

Security Policies

In addition to a firewall, you need to take steps to make sure that the intranet is used
appropriately in your organisation: -

(a) Establish acceptable-use Policies. Post rules for using the intranet, including the
use of e-mail, the web, and discussion groups both within the intranet and on the Internet.

(b) Monitor Usage. This does not mean looking over everyone's shoulders while they
use the intranet, but make sure that someone monitors the content of the intranet's web
sites and discussion groups. Look for copyright infringements, personnel issues, and
security lapses.

(c) Close the Door behind Departing Employees. When someone leaves the
organisation, make sure that a system is in place to close the person's accounts, change
passwords, and deny other access to the intranet.

(d) Be Vigilant about Data in General, not just about the Intranet. The intranet's
connection to the Internet can certainly be a security hazard, but important data can also
walk out your organisation's door on a diskette in someone's pocket, in a fax, or in many
other ways.

Extranet

An extranet is a network that links selected resources of the intranet of a company with
its customers, suppliers and other business partners. Main features of extranet are: -

(a) The link between the intranet and its business partners is achieved through TCP/IP,
the standard Internet protocol.

(b) The extranet is an extended intranet, which isolates business communication from
open Internet through secure solutions.

(c) Extranets provide the privacy and security of an intranet while retaining the global
reach of the Internet.

(d) Extranets use cryptography and authorization procedures for securing data flows
between intranets through the Internet.

An extranet connects the intranets of business partners, suppliers, financial services, distributors,
customers, etc., by agreement between the collaborating partners. The emphasis is on allowing
access to authorized groups through strictly controlled mechanisms.
Extranets have led to a true proliferation of e-commerce and act as an engine for B2B
collaboration. It is the combination of intranets and extranets that has established the virtual
corporation paradigm. This new virtual paradigm of e-commerce allows corporations to take
advantage of any market opportunity anywhere, anytime, offering customized services and
products. It is this combination that provides the technological backbone for strategic advantage
to organizations in terms of reach, intensity, response time and innovative skills.

Architecture of Extranet

The figure shows the basic architecture of an intranet with its extension to one external LAN or
a single user; this is what makes it an extranet. The same logic can be extended to form a general
infrastructure of an extranet plus intranets, as shown in the figure below.
[Figure: the intranets of Company A (locations 1 and 2), Company B and Company C, each
connected through its ISP to the Internet (public domain), together forming an extranet]

FIG: ARCHITECTURE OF EXTRANET

Components of Extranet

Since an extranet is an extension of an intranet, the additional hardware and software needed to
extend an intranet are: -

(a) Firewall servers and their software

(b) Router

(c) Internet connection (at least ISDN)

Basic Level Applications of Extranet

The basic level applications of extranet are given below: -

S No   Service                  Applications
1      Secure e-mail            B2B communications
2      Usenet services          Bulletin board services, one-to-many information exchange, EDI messages, floating tenders
3      Mailing list             Private one-to-many e-mail, online newsletters, discussion groups
4      File transfer (FTP)      Exchange of data between supply chains, between corporate HQ and various companies, customer support and sales data
5      Conferencing & chat      Electronic meetings
6      Remote login (Telnet)    Access to databases and ERP software
7      Calendar                 Scheduling tasks

ISP

An ISP is a company that supplies Internet connectivity to home and business customers.
ISPs support one or more forms of Internet access, ranging from traditional modem dial-up to
DSL and cable modem broadband service to dedicated T1/T3 lines.

More recently, wireless Internet service providers or WISPs have emerged that offer
Internet access through wireless LAN or wireless broadband networks.

In addition to basic connectivity, many ISPs also offer related Internet services such as e-mail,
Web hosting and access to software tools. A few companies also offer free ISP service to those
who need occasional Internet connectivity. These free offerings feature limited connect time and
are often bundled with some other product or service.

ISP Architecture

As stated earlier, to avail of Internet services, each user must be connected to an ISP.
For each modem at the user end, there is a corresponding modem at the ISP. The ISP has a
number of servers for each service that it provides. The versatility of an ISP can be measured by
the number and type of services (in terms of value addition) it provides to its customers. The
figure shows a typical ISP architecture.

[Figure: an ISP with e-mail, WWW, news, Gopher, WAIS and application servers; a modem farm
and terminal servers handling dial-up and ISDN connections; a billing server that verifies user
log-ins and passwords; and a router connecting to the Internet]
FIG: ARCHITECTURE OF AN ISP
Searching

Searching the World Wide Web

With the advent of the World Wide Web came the widespread availability of on-line
information. It is no longer necessary to travel to the library to find the answer to a question or
to engage in research on a specialized topic. Much of what you might want to know is available
through the Web. Since anyone can publish on the Web, the range of topics that can be found is
nearly all-encompassing. However, while a lot of information is available on-line, not all of it is
completely accurate.

In all likelihood, the answers to your questions are somewhere on the Web, but how do
you locate them? In the early days of the Web, unless you knew exactly where to look, you had
trouble finding what you wanted.

Unlike a library, the pages on the Web are not as neatly organized as books on shelves, nor are
Web pages completely cataloged in one central location. Even knowing where to look for
information is not a guarantee that you will find it, since Web page addresses are constantly
changing. Usually, a forwarding address is provided for a page that has moved, but it may only
be available for a short time.

The rapid growth of the Web, as well as its huge size, has ruled out trying to keep track
manually of 'what is what' and 'what is where'. While people were spending their time trying to
find things on the Web, rather than actually reading the material they were after, the first
directories and search engines were developed. These tools allow you to find information
more quickly and easily. You have probably already been using these tools, but perhaps not as
effectively as possible.

Methods of Searching

1. Directories:- The first method of finding and organizing Web information is the directory
approach. A Web directory or Web guide is a hierarchical representation of hyperlinks. The top
level of the directory typically provides a wide range of very general topics, such as arts,
automobiles, education, entertainment, news, science, sports, and so on. Each of these topics is a
hyperlink that leads to more specialized subtopics. They in turn have a number of subtopics, and
so on until you reach a specific web page.
In addition to being very easy to use, another benefit of a directory structure is that you need
not know exactly what you are looking for in order to find something worthwhile. You select the
category for the topic in which you are interested. You continue to move down through the
hierarchy, selecting subcategories and narrowing the search at each level, until you are presented
with a list of hyperlinks that pertain to your topic.
As you begin to zero in on your topic, you may find other interesting items of which
you were previously unaware. On the other hand, you may reach the bottom of the directory
without finding the information you were after. In such a case, you may need to backtrack, going
up several levels and then proceeding down again. Of course, it is possible that the directory you
are searching does not contain the information you want; in this case you may decide to try either
a different directory or a search engine.
When traversing a directory downward, you are moving toward more specific topics.
When going upward, you are heading back to more general topics. Directories are useful if you
want to explore a topic and its related areas, or if you want to research a subject, but not at a very
detailed level.
If you are interested in a very specific topic, you may want to start off by using a search
engine or a meta search engine. Arriving at a very specific topic in a directory structure involves
traversing between five and ten hyperlink levels.

Note that while the directory structure is logically organized as a hierarchy, a specific
Web page may occur in many different parts of the hierarchy. There is usually more than one
way to reach a given page.
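
To make the idea of a hierarchy of hyperlinks concrete, the following minimal Python sketch models
a directory as a nested structure and drills down from a general category to a specific list of
pages. The category names and URLs are invented for illustration; they are not taken from any real
directory.

    # A Web directory modelled as nested dictionaries; leaves hold page links.
    directory = {
        "Science": {
            "Biology": {"Insects": ["http://example.org/honey-bees"]},
            "Physics": {"Optics": ["http://example.org/lenses"]},
        },
        "Sports": {"Cricket": ["http://example.org/cricket-news"]},
    }

    def drill_down(tree, path):
        """Follow a list of category names down the hierarchy and return
        whatever is found at that level (subtopics or page links)."""
        node = tree
        for category in path:
            node = node[category]      # move one level deeper at each step
        return node

    # Traversing Science -> Biology -> Insects yields the hyperlinks at that leaf.
    print(drill_down(directory, ["Science", "Biology", "Insects"]))

Backtracking, as described above, simply means shortening the path and drilling down along a
different branch.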

Popular Directories

AOL NetFind - www.aol.com/netfind
CNET Search.com - www.search.com
Excite - www.excite.com
Infoseek - www.infoseek.com
Looksmart - www.looksmart.com
Lycos - www.lycos.com
Magellan - www.mckinley.com
Yahoo - www.yahoo.com
Rediff - www.rediff.com

2. Search Engine:- The second approach to organizing and locating information on the Web is a
search engine, which is a computer program that does the following:
(a) Allows you to submit a form containing a query that consists of a word or phrase
describing the specific information you are trying to locate on the Web.
(b) Searches its database to try to match your query.
(c) Collates and returns a list of clickable URLs containing presentations that match your
query; the list is usually ordered, with the better matches appearing at the top.
(d) Permits you to revise and resubmit a query.

A number of search engines also provide URLs for related or suggested topics.

Many people find that search engines are not as easy to use as directories. To use a search
engine, you supply a query by entering information into a field on the screen. To be effective,
that is, to have the search engine return a small list of URLs on your topic of interest, you often
need to be very specific. To pose such queries, you must learn the query syntax of the search
engine with which you are working. Learning the syntax so that you can phrase effective and
legal queries often requires that you read and understand the documentation accompanying the
search engine. A hyperlink to the documentation is usually provided next to the query field, and
example queries are often given.
Once you learn to use a specific search engine's query language effectively, you can
quickly zoom in on very narrow topics; this is the advantage of a search engine. The
disadvantages are that you have to learn the query language and you have to learn a search
strategy.
The user-friendliness and power of query languages vary from search engine to search
engine. We recommend you try several of them and then learn the syntax of one search engine's
query language. Since each search engine searches a different database, you would be best off
learning about a search engine that has indexed the kind of material you are interested in. You can
gauge this by posing similar queries to a number of search engines and seeing which one finds the
best matches.

Popular Search Engines

AOL NetFind - www.aol.com/netfind
Excite - www.excite.com
Infoseek - www.infoseek.com
Looksmart - www.looksmart.com
Lycos - www.lycos.com
Magellan - www.mckinley.com
Yahoo - www.yahoo.com
Rediff - www.rediff.com
AltaVista - altavista.digital.com
Hot Bot - www.hotbot.com
Google - www.google.com
Web Crawler - www.webcrawler.com

3. Meta Search Engine:- A meta search engine or all-in-one search engine performs a search by
calling on more than one other search engine to do the actual work. The results are collated,
duplicate retrievals are eliminated, and the results are ranked according to how well they match
your query. You are then presented with a list of URLs.

The advantage of a meta search engine is that you can access a number of different search
engines with a single query. The disadvantage is that you will often have a high noise-to-signal
ratio; that is, a lot of the matches will not be of interest to you. This means you will need to spend
more time evaluating the results and deciding which hyperlinks to follow.

For very specific, hard-to-locate topics, meta search engines can often be a good starting
point. For example, if you try to locate a topic using your favorite search engine but fail to turn
up anything useful, you may want to query a meta search engine.
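
The collating step described above can be sketched in a few lines of Python. The two "engines"
below are stand-in functions that return made-up (URL, relevancy) pairs; a real meta search engine
would send the query to several remote search engines and parse their responses.

    def engine_a(query):
        return [("http://example.org/lions", 90), ("http://example.org/zoo", 60)]

    def engine_b(query):
        return [("http://example.org/zoo", 70), ("http://example.org/tigers", 80)]

    def meta_search(query, engines):
        best = {}                                  # URL -> best score seen so far
        for engine in engines:
            for url, score in engine(query):       # query each engine in turn
                best[url] = max(score, best.get(url, 0))   # duplicates are merged
        # rank the combined result set, best matches first
        return sorted(best.items(), key=lambda item: item[1], reverse=True)

    print(meta_search("lion", [engine_a, engine_b]))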

Popular Meta Search Engines

Meta Search - www.metasearch.com
Meta Crawler - www.metacrawler.com
Meta Find - www.metafind.com
Savvy Search - guaraldi.cs.colostate.edu:2000

4. Web Ring:- A web ring is a community of related Web pages that are organized into a circular
ring. Each page in a ring has links that enable visitors to move to an adjacent site on the ring,
access a ring index or jump to a random site. Web sites are added continuously to web rings.
Each ring is managed from one of the sites. Web rings are fun to visit, but they do not contain the
volume of information of the other search tools. Currently, web rings are available on many
topics, including acrobatics, religion, Spanish hotels, Disneyland and medieval studies. Most web
rings are devoted to games. The Web Ring home page at www.webring.com contains more
information on web rings and how to search them. Another site devoted to web rings is the
Ring Surf site, located at www.ringsurf.com.
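
As a rough illustration, the ring navigation described above can be modelled as a circular list:
"next" from the last member wraps around to the first, and a random jump simply picks any member.
The member URLs below are hypothetical.

    import random

    ring = [
        "http://example.org/castles",
        "http://example.org/knights",
        "http://example.org/manuscripts",
    ]

    def next_site(current):
        return ring[(ring.index(current) + 1) % len(ring)]   # adjacent site, wrapping around

    def previous_site(current):
        return ring[(ring.index(current) - 1) % len(ring)]

    def random_site():
        return random.choice(ring)                           # "jump to a random site"

    print(next_site("http://example.org/manuscripts"))       # wraps back to the first site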

Search Terminology

Here are a few common search-related terms you should know about.

Search Tool:- Any mechanism for locating information on the Web; usually refers to a search
engine, meta search engine or directory.
Query:- Information entered into a form on a search engine's Web page that describes the
information being sought. A query need not be a question; usually a word or a phrase is
used. A phrase is put within quotes, e.g. "Indian Tigers".
Query Syntax:- A set of rules describing what constitutes a legal query. On some search
engines, special symbols may be used in a query. Syntax defines the grammar of query
writing. Each search engine may have different syntax rules, which are available in the
Help pages of the search engine.
Query Semantics:- A set of rules that defines the meaning of a query.
Page View:- The viewing of one specific HTML file, without counting any graphics or
other items on the page, is referred to as a page view.
Hit/Match:- A URL that a search engine returns in response to a query. Commonly
thought of as the number of times a page on a web site is requested by a browser, but this
is not accurate. Hits also include the number of times all other files, such as graphics and
images, are viewed. For example, if your home page has nine graphics on it, each time
someone views your home page the log file registers one hit for the HTML file and nine
hits for the graphics, for a total of ten hits. Because the term "hits" has such an
ambiguous meaning, most people now measure traffic in terms of page views.
Visit:- All the pages viewed by a user within a continuous session; a visit may consist of a
single HTML file or may last for a given duration.
Relevancy Score:- A value that indicates how close a match a URL was to a query;
usually expressed as a value from 1 to 100, with the higher score meaning more relevant.

Search Engine Components

If you understand how a search tool works, there is a good chance you will be able to use
it more effectively. For the most part, these same ideas apply to directories; the main difference is
that the hierarchical organizational structure and categorizations for directories need to be in
place and displayed. The references include additional information about how directories are put
together.
To describe how a search engine works, we split up its functions into a number of
components: user interface, searcher, and evaluator.

User Interface:- The screen in which you type a query and which displays the search results.
Searcher:- The part that searches a database for information to match your query.
Evaluator:- The function that assigns relevancy scores to the information.

In addition, a search engine's database is created using the following.

Gatherer:- The component that traverses the Web, collecting information about pages.
Indexer:- The function that categorizes the data obtained by the gatherer.

For comparison, think of the different facets of a typical library, such as acquisitions,
cataloging, indexing, and on-line searching.

User Interface:- The user interface must provide a mechanism by which a user can submit
queries to the search engine. This is universally done using forms. In addition, the user interface
needs to display the results of the search in a convenient way. The user should be presented with
a list of hits from their search, a relevancy score for each hit and a summary of each page that
was matched. This way, the user can make an informed choice as to which hyperlinks to follow.
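
A rough Python sketch of these two jobs is given below: turning the text typed into the form into a
request the rest of the search engine can handle, and printing the ranked hits with their scores and
summaries. The field name q, the base URL and the sample results are assumptions made purely for
illustration.

    from urllib.parse import urlencode

    def build_request(base_url, query):
        # a form submission typically becomes a query string such as ?q=honey+bees
        return base_url + "?" + urlencode({"q": query})

    def display(results):
        # show each hit with its relevancy score and a short page summary
        for url, score, summary in results:
            print(f"{score:3d}  {url}\n     {summary}")

    print(build_request("http://search.example.org/search", "honey bees"))
    display([("http://example.org/bees", 97, "Everything about honey bees."),
             ("http://example.org/insects", 54, "Ants, honey bees and crickets.")])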

Searcher:- The searcher is a program that uses the search engine's index and database to see if
any matches can be found for the query. Your query must first be transformed into a syntax that
the searcher can process. Since the databases associated with search engines are extremely large
(with perhaps 25,000,000 to 50,000,000 indexed pages), a highly efficient search strategy must
be applied.
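
The sketch below illustrates the searcher's role under a simplifying assumption: the index is a
small in-memory dictionary mapping each keyword to the set of URLs known to contain it. Real
indexes are vastly larger and stored on disk, but the two steps, transforming the query and probing
the index, are the same.

    # hypothetical index: keyword -> set of URLs containing that keyword
    index = {
        "honey": {"http://example.org/bees", "http://example.org/recipes"},
        "bees":  {"http://example.org/bees", "http://example.org/insects"},
    }

    def search(query):
        terms = query.lower().split()          # transform the query into index terms
        result = None
        for term in terms:
            pages = index.get(term, set())     # probe the index for each term
            result = pages if result is None else result & pages
        return result or set()                 # keep only pages containing every term

    print(search("Honey Bees"))                # {'http://example.org/bees'}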

Evaluator:- The searcher locates any URLs that match your query. The hits retrieved by your
query are called the result set of the search. Not all of the hits will match your query equally
well. For example, a query about "honey bees" might be matched by a page containing the
phrase "honey bees" in the following sentence:

Ants, honey bees, and crickets are all insects.

Or by the page title

Everything You Ever Wanted To Know About Honey Bees.

Clearly, in most cases, it would be better to rank this second page much higher, as it
probably contains many more references to Honey Bees.

The ranking process is carried out by the evaluator, a program that assigns a relevancy
score to each page in the result set. The relevancy score is an indication of how well a given page
matched your query.
How is the relevancy score computed by the evaluator? This varies from search engine to
search engine. A number of different factors are involved, and each one contributes a different
percentage towards the overall ranking of a page. Some of the factors typically considered are:
a) How many times the words in the query appear in the page.
b) Whether or not the query words appear in the title.
c) The proximity of the query words to the beginning of the page.
d) Whether the query words appear in the CONTENT attribute of the meta tag.
e) How many of the query words appear in the document.

Some search engines also consider other factors in computing a relevancy score. Each
factor is weighted, and a value is computed that rates the page. The values are usually
normalized and assigned numbers between 1 and 100, with 100 representing the best possible
match. As part of the user interface, the result set and relevancy scores computed by the
evaluator are displayed for the user, with the best matches appearing first. Hyperlinks to each hit
are provided and a short description of the page is usually given.
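
As a rough sketch of this weighting idea, the Python function below combines three of the factors
listed above, word frequency, presence in the title, and query-word coverage, into a single score
clamped to the range 1 to 100. The weights are invented for illustration; every real search engine
chooses its own factors and weights.

    def relevancy(query_words, page_title, page_text):
        text = page_text.lower()
        title = page_title.lower()
        words = [w.lower() for w in query_words]

        freq = sum(text.count(w) for w in words)     # (a) occurrences of the words in the page
        in_title = sum(w in title for w in words)    # (b) query words appearing in the title
        coverage = sum(w in text for w in words)     # (e) how many query words appear at all

        raw = 2 * freq + 10 * in_title + 5 * coverage   # weight each factor
        return max(1, min(100, raw))                    # normalize to the range 1..100

    print(relevancy(["honey", "bees"],
                    "Everything You Ever Wanted To Know About Honey Bees",
                    "Honey bees live in hives. Bees make honey."))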

Gatherer:- A search engine obtains its information by using a gatherer, a program that traverses
the Web and collects information about Web documents. The gatherer does not collect the
information every time a query is made. Rather, the gatherer is run at regular intervals, and it
returns information that is incorporated into the search engine's database and indexed.
Alternate names for a gatherer are bot, crawler, robot, spider, and worm.
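
The sketch below shows the basic loop a gatherer follows: visit a page, record its contents, and
queue the hyperlinks it finds. To keep it self-contained it walks a tiny in-memory "web" (a made-up
dictionary of page text and links) instead of downloading real pages over HTTP.

    tiny_web = {
        "http://example.org/":     ("Welcome to the bee site", ["http://example.org/bees"]),
        "http://example.org/bees": ("Honey bees and hives",    ["http://example.org/"]),
    }

    def gather(start_url):
        collected = {}                      # url -> page text, later handed to the indexer
        to_visit = [start_url]
        while to_visit:
            url = to_visit.pop()
            if url in collected:            # skip pages that were already visited
                continue
            text, links = tiny_web[url]     # "download" the page
            collected[url] = text
            to_visit.extend(links)          # follow the hyperlinks found on the page
        return collected

    print(gather("http://example.org/"))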

Indexer:- Once the gatherer retrieves information about Web pages, the information is put into a
database and indexed. The indexer function creates a set of keys (an index) that organizes the
data, so that high-speed electronic searches can be conducted and the desired information can be
located and retrieved quickly.
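
A minimal sketch of this indexing step is given below: it takes a small hand-made collection of
pages (standing in for the gatherer's output) and builds an inverted index, a set of keys mapping
each word to the URLs that contain it, so that later searches never have to re-read the pages
themselves.

    pages = {
        "http://example.org/bees":    "Honey bees live in hives",
        "http://example.org/insects": "Ants honey bees and crickets are all insects",
    }

    def build_index(pages):
        index = {}
        for url, text in pages.items():
            for word in set(text.lower().split()):       # each distinct word on the page
                index.setdefault(word, set()).add(url)   # key the word to this URL
        return index

    index = build_index(pages)
    print(sorted(index["bees"]))    # both pages contain the word "bees"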

Types of Queries
Two types of queries are generally used for searching (a short sketch illustrating both types is
given after the list below):
(a) Pattern Matching Queries:- This is the most basic type of query. To formulate a
pattern-matching query, a keyword or a group of keywords is typed into the query
submission form. The search engine returns the URL of any page that contains
these keywords. The result set varies from one search engine to another. The search results
may vary if singular or plural forms are used. A space between two words treats them as
two separate words. We can also use (+) and (-) signs to include or exclude a word from the query
words, e.g. the query +Indian +Lion -Tiger will search for the words Indian and Lion but
not Tiger. Any words within quotes are taken as one word or phrase. These syntax
rules may vary between search engines; for details one must go through the Help
support of that search engine.
(b) Boolean Queries:- Boolean queries involve the Boolean operations AND, OR and NOT.
Most search engines permit you to enter Boolean queries. Some examples of Boolean queries
are given below:
(i) Lion AND Tiger: will show all pages that contain both Lion and Tiger.
(ii) Lion OR Tiger: will show all pages that contain either Lion or Tiger or
both, i.e. at least one of the words.
(iii) Lion NOT Tiger: will show all pages that contain information about Lion
but not Tiger. Thus, the Boolean NOT operation is used to exclude a word.
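
The sketch below, referred to above, evaluates both query types against a tiny hand-made
collection of pages. It supports plain keywords, the (+) and (-) signs, and the three Boolean
operators; the pages and URLs are invented, and real search engines each apply their own, richer
syntax.

    pages = {
        "http://example.org/lions":  "indian lion facts",
        "http://example.org/tigers": "indian tiger facts",
        "http://example.org/both":   "lion and tiger compared",
    }

    def contains(word):
        # set of pages whose text contains the given word
        return {url for url, text in pages.items() if word.lower() in text.split()}

    def pattern_query(query):
        result = set(pages)
        for token in query.split():
            if token.startswith("+"):        # +word: the word must appear
                result &= contains(token[1:])
            elif token.startswith("-"):      # -word: the word must not appear
                result -= contains(token[1:])
            else:                            # plain keyword
                result &= contains(token)
        return result

    def boolean_query(left, op, right):
        a, b = contains(left), contains(right)
        return {"AND": a & b, "OR": a | b, "NOT": a - b}[op]

    print(pattern_query("+Indian +Lion -Tiger"))   # only the lion page
    print(boolean_query("Lion", "OR", "Tiger"))    # all three pages
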
Search Strategies

Determining which search engine to use can be challenging. You can begin by testing a
number of different search engines, trying to find one that you believe meets the following
conditions:
Possesses a user-friendly interface.
Has easy-to-understand, comprehensive documentation.
Is convenient to access; that is, you do not have to wait several minutes before being able
to submit a query.
Contains a large database, so that it knows a lot about the information for which you are
searching.
Does a good job of assigning relevancy scores.

If you can find a search engine that meets most of these criteria, you should concentrate on
learning it well, rather than learning a little bit about several different search engines.

Once you have learned the query syntax of that search engine, you can begin to formulate
your search strategy. When you pose queries to the search engine, two common situations can
occur: either your query does not turn up a sufficient number of hits, or your query turns up too
many hits. In the next sections, you will learn strategies for dealing with these situations.

1. Too Few Hits : Search Generalization

Suppose your query returns no hits or only a couple of hits, neither of which is very useful to
you. In this case, you need to generalize your search. The ways to do this include:
If you used a pattern matching query, eliminate one of the more specific keywords from
your query.
If you used a Boolean query, remove one of the keywords or phrases with which you
used AND, or delete a NOT item you specified.
If you restricted your search domain, enlarge it.
If you are still having no luck, try keywords that are more general, or exchange a couple
of the keywords with synonyms.
If this fails, you may decide to use a directory and work your way down to the topic of
interest. Another alternative would be to use a meta search engine.

2. Too Many Hits : Search Specialization

Suppose your query returns more URLs than you could possibly look through. In this case,
you need to specialize your search.
If you started with a pattern matching query, you may want to add more keywords.
If you began with a Boolean query, you might want to AND another keyword, or use the
NOT operator to exclude some pages.
If you are still retrieving too many hits, try capitalizing proper nouns or names.
If nothing seems to work, try reviewing the first 20 or so URLs, since search engines list
the best matches near the top. If they do not contain what you are looking for, the
information they do contain may help you refine your search.
If this fails, you could resort to a directory and work your way down to the topic of
interest.