Unit I
World Wide Web
The World Wide Web (usually just referred to as the Web) is a collection of millions of files
stored on thousands of computers (called web servers) all over the world. These files represent
text documents, pictures, video, sounds, programs, interactive environments, and just about any
other kind of information that has ever been recorded in computer files. The Web is probably the
largest and most diverse collection of information ever assembled. The web was developed by
CERN (The European Laboratory for Particle Physics).
What unites these files is a system for linking one file to another and transmitting them
across the Internet. The HTML language allows a file to contain links to related files. Such a link
(also called a hyperlink) contains the information necessary to locate the related file on the
Internet. When you connect to the Internet and use a Web browser program, you can read, view,
hear, or otherwise interact with the Web without paying attention to whether the information that
you are accessing is stored on a computer down the hall or on the other side of the world. A news
story stored on a computer in Singapore may link you to a stock quote stored in New York, a
picture stored in New Delhi and an audio file stored in Tokyo. The combination of the Web
servers, the Internet, and your Web browser assembles this information seamlessly and presents
it to you as a unified whole.
By following links, you can get from almost any Web document to almost any other Web
document. For this reason, some people like to think of the entire Web as being one big
document. In this view, the links just take you from one part of the document to another.
For example, the Louvre's website has links to the sites of other museums, such as the Vatican
Museum. When you click on that link, you access the web server for the Vatican Museum. In this
way, information scattered across the globe can be linked together.
The "glue" that holds the Web together is called hypertext and hyperlinks. This feature
allows electronic files on the Web to be linked so you can jump easily between them. On the
Web, you navigate through pages of
information--commonly known as browsing
or surfing--based on what interests you at that
particular moment.
To access the Web you need a web browser,
such as Netscape Navigator or Microsoft
Internet Explorer. How does your web
browser distinguish between web pages and
other types of data on the Internet? Web
pages are written in a computer language
called Hypertext Markup Language or
HTML.
The World Wide Web was originally developed in 1990 at CERN, the European
Laboratory for Particle Physics. The original idea came from a young computer scientist, Tim
Berners-Lee. It is now managed by The World Wide Web Consortium.
BBA IV Sem/CA II/Unit-1
Many a time we do not make a distinction between the Internet and the World Wide
Web. Though they are related to each other, they are not the same. The Internet is a massive
network that connects millions of computers across the globe, whereas the Web is a way in which
information is accessed over the Internet. Information over the Internet travels from computer to
computer via protocols. When sending electronic mail, the Internet uses the SMTP protocol; when
sharing files (text, images, video or MP3), it uses the FTP protocol; and when exchanging
web-related (i.e., hypertext) information, it uses the HTTP protocol. The Web uses the HTTP
protocol to transmit data, share web pages (hyperlinked documents) and exchange business logic,
and it uses a browser, such as Internet Explorer or Netscape Navigator, to display the hypertext
documents. The Web, therefore, can be said to be a portion of the Internet.
Domain Name System
Format. Numbered IP addresses are difficult to remember; people are better at remembering
names and mnemonics (symbols, letters, etc.). Therefore, numbered addresses have been mapped
into names, each consisting of a host name and a domain (the group to which the computer
belongs). The general format of the domain name system is given below: -
Host Name. Second Level Domain Name. First Level Domain Name
Where,
(a) Host Name is the name of the service provider or network name, e.g., VSNL.
(b) Domain Name signifies the kind of organisation. Some of the organisational and
geographic domain names are given in the table.
Rules: The rules followed for mapping numbered IP addresses into the DNS scheme are: -
(b) A node on the DNS can be named by traversing the tree from itself to the root. At
each node, the name is added and a period (.) is appended to it until the root is
reached.
(c) Each node can have any number of child nodes but only one parent node. Child
nodes must have different names to ensure a unique naming system.
(d) All the letters used in the name of a node must be lower case with no space
between the dots (periods).
Figure shows the domain name (address) of a node with the name APJ. A domain name
server on the Internet keeps a directory of all the nodes on it.
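The traversal rules above can be sketched in a few lines of code. This is an illustration only, with the node labels taken from the figure below (not part of any real resolver):

```python
# Sketch of rule (b): a node's domain name is built by walking from the
# node up to the root, adding each label and a period along the way.

class DNSNode:
    def __init__(self, label, parent=None):
        self.label = label.lower()   # rule (d): labels are lower case
        self.parent = parent         # rule (c): exactly one parent node

    def domain_name(self):
        parts = []
        node = self
        while node is not None:      # rule (b): traverse up to the root
            parts.append(node.label)
            node = node.parent
        return ".".join(parts)

# the path shown in the figure: root -> in -> net -> vsnl -> apj
n_in = DNSNode("in")
n_net = DNSNode("net", n_in)
n_vsnl = DNSNode("VSNL", n_net)      # stored in lower case per rule (d)
n_apj = DNSNode("apj", n_vsnl)

print(n_apj.domain_name())           # apj.vsnl.net.in
```

Because each node has exactly one parent, the traversal always produces a single, unique name for every node in the tree.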
FIG: DOMAIN NAME - a tree with the first-level domains (com, edu, gov, mil, net, in, ...)
under the root; traversing from the node apj up through vsnl and net to the first-level
domain in gives the domain name apj.vsnl.net.in.
Organisational domains:
com  Commercial organisations
edu  Educational institutions
gov  Government agencies
mil  Military organisations

Geographic domains:
au   Australia
ca   Canada
es   Spain
fr   France
hk   Hong Kong
in   India
jp   Japan
uk   United Kingdom
us   United States
IP Addressing
Every host and router on the Internet has a unique IP address, which encodes its network
number and host number. No two machines or routers can have the same IP address. The
addressing scheme on the Internet uses IPv4 (Internet Protocol version four), a 32-bit
addressing scheme. In this scheme, the 32 bits are divided into four groups of 8 bits each,
joined by periods (i.e., 8 bits.8 bits.8 bits.8 bits). With eight bits, 256 (2^8) numbers can be
represented; thus each eight-bit group can represent numbers from 0 to 255. A typical IP
address looks like 137.0.2.11. Based on this addressing scheme, networks connected to the
Internet have been classified into five types, as shown in the figure.
Class  Leading bits  Format                   Address range                 Range of hosts
A      0             Network . Host           1.0.0.0 to 127.255.255.255    126 networks with 16 million hosts each
B      10            Network . Host           128.0.0.0 to 191.255.255.255  16,382 networks with 64 K hosts each
C      110           Network . Host           192.0.0.0 to 223.255.255.255  2 million networks with 254 hosts each
D      1110          Multicast address        224.0.0.0 to 239.255.255.255
E      11110         Reserved for future use  240.0.0.0 to 247.255.255.255
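The class rules in the table can be expressed as a short function. This is a sketch for illustration, keyed on the value of the first eight-bit group (which is equivalent to testing the leading bits):

```python
# Determine the class of an IPv4 address from its first 8-bit group,
# following the address ranges in the table above.

def ip_class(address):
    first_octet = int(address.split(".")[0])
    if first_octet <= 127:
        return "A"       # leading bit  0
    elif first_octet <= 191:
        return "B"       # leading bits 10
    elif first_octet <= 223:
        return "C"       # leading bits 110
    elif first_octet <= 239:
        return "D"       # leading bits 1110 (multicast)
    else:
        return "E"       # leading bits 11110 (reserved)

print(ip_class("137.0.2.11"))    # B
print(ip_class("192.0.0.1"))     # C
```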
The browser's display is hypertext that contains pointers to other documents. The
pointers are implemented using a concept that is central to Web browsers: the Uniform
Resource Locator (URL). A URL can be thought of as a network extension of the standard
file-name concept, except that in this case the file and its directory can exist on any computer
on the network. Typing a URL in the location area and hitting the return key will cause the
browser to attempt to retrieve that page; if the browser is successful in finding the page, it
will display it. This high-level explanation does not, however, convey any of the details of
what is happening. To go from a URL to having the Web page displayed, the browser needs to
be able to answer such questions as: by what protocol should the page be accessed, on which
computer does it reside, and which file on that computer is wanted?
The URL is designed to incorporate sufficient information to resolve these questions.
Quite naturally, then, the URL has three parts. We can view the format of a URL as follows:
how://where/what
In other words, a URL contains three parts: the first describes the type of resource (the
protocol), the second gives the name of the server housing the resource, and the third gives
the full file name of the resource, i.e., directory, subdirectory and file name.
At this point, it is helpful to consider a sample URL to illustrate the three parts:
http://pubpages.uminn.edu/index.html
1. http: Defines the protocol or scheme by which to access the page. In this case, the
protocol is the Hypertext Transfer Protocol, the set of rules by which an HTML
document is transferred over the Web.
2. pubpages.uminn.edu: Identifies the domain name of the computer where the page
resides. That computer is a Web server, capable of satisfying page requests: just as a
waiter serves food, a Web server serves Web pages. The name pubpages.uminn.edu tells
the browser on which computer to find the Web page. In this case, the computer is
located at the University of Minnesota.
3. index.html: Provides the local name (usually a file name) uniquely identifying the
specific page. If no name is specified, the Web server where the page is located may
supply a default file. On many systems, the default file is named index.html or index.htm.
This example demonstrates that the URL consists of a protocol, a Web server's domain name,
and a file name.
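The how://where/what split can be seen directly with Python's standard urllib module; the sample URL here is the one used in the text:

```python
# Split a URL into its three parts with the standard library.
from urllib.parse import urlparse

url = "http://pubpages.uminn.edu/index.html"
parts = urlparse(url)

print(parts.scheme)   # http               (how:   the protocol)
print(parts.netloc)   # pubpages.uminn.edu (where: the Web server)
print(parts.path)     # /index.html        (what:  the file on that server)
```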
Entering a URL in the location field of the browser will bring up the designated Web page,
barring any problems. For example, if the Web page has moved to another machine or has been
removed, or if you type an invalid URL, or if the server you are trying to access is unavailable,
an error message will be displayed. Another way to retrieve a Web page is to mouse over and
click on a hyperlink in the Web page that is currently being displayed.
In the URL example presented earlier, the protocol used to access the page was http, which
is used for transferring an HTML document. Much of the power of browsers is that they are
multiprotocol: they can retrieve and render information from a variety of servers and
sources. The given table provides a summary of other common protocols:
For any network to exist, there must be connections between computers and agreements
(protocols) about the communication language. However, setting up connections and agreements
between disparate computers (from PCs to mainframes) is complicated by the fact that over the
last decade systems have become increasingly heterogeneous in their software and hardware as
well as in their intended functionality. A range of standards for networking, called protocol
stacks, has been developed.
Web Caching
Web caching is the storage of Web objects near the user to allow fast access, thus
improving the user experience of the Web surfer. Examples of some Web objects are Web pages
(the HTML itself), images in Web pages, etc. Web objects can be cached locally on the user's
computer or on a server on the Web.
Browser cache: Browsers cache Web objects on the user's machine. A browser first looks for
objects in its cache before requesting them from the website. Caching frequently used Web
objects speeds up Web surfing. For example, I often use google.com and yahoo.com. If their
logos and navigation bars are stored in my browser's cache, then the browser will pick them up
from the cache and will not have to get them from the respective websites. Getting the objects
from the cache is much faster than getting them from the websites.
Web objects can have an expiry time associated with them, after which the object is considered
stale. A stale object is not used; if the object in the cache is stale, it is equivalent to
the object not being in the cache at all. An expiry date can be specified in the HTTP header
of a Web object, using the Expires and Cache-Control HTTP headers.
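The staleness rule can be sketched as follows. This is a minimal illustration, not a full HTTP cache: an object whose expiry time has passed is treated exactly as if it were not cached at all.

```python
# A toy cache: url -> (object, expiry_time). A stale entry behaves
# exactly like a missing entry.
from datetime import datetime, timedelta

cache = {}

def cache_get(url, now):
    entry = cache.get(url)
    if entry is None:
        return None                    # not cached
    obj, expires = entry
    if now >= expires:
        return None                    # stale: same as not cached
    return obj

now = datetime(2024, 1, 1, 12, 0)
cache["http://example.com/logo.png"] = ("LOGO", now + timedelta(hours=1))

print(cache_get("http://example.com/logo.png", now))                       # fresh hit
print(cache_get("http://example.com/logo.png", now + timedelta(hours=2)))  # None (stale)
```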
Web Server
Web servers are computers that deliver (serve up) Web pages. In other words, a web server
is a computer that stores web pages and gives them to the client whenever asked. When a client
or browser sends a request message, it locates the server by its domain name. Every
Web server has an IP address and possibly a domain name. For example, if you enter
the URL http://www.pcwebopedia.com/index.html in your browser, this sends a request to the
Web server whose domain name is pcwebopedia.com. The server then fetches the page
named index.html and sends it to your browser.
Any computer can be turned into a Web server by installing server software and connecting the
machine to the Internet. There are many Web server software applications, including public
domain software from NCSA and Apache, and commercial packages
from Microsoft, Netscape and others.
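As a small illustration of the point that any computer can be turned into a Web server, Python's standard library ships a simple file-serving server. The sketch below only creates a server and closes it again; uncommenting serve_forever would actually serve files from the current directory (with index.html as the default file):

```python
# Create a minimal Web server with the standard library.
from http.server import HTTPServer, SimpleHTTPRequestHandler

def make_server(port=0):
    # port 0 asks the operating system for any free port
    return HTTPServer(("127.0.0.1", port), SimpleHTTPRequestHandler)

httpd = make_server()
print("serving on port", httpd.server_address[1])
# httpd.serve_forever()   # uncomment to actually answer browser requests
httpd.server_close()
```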
Proxy Server
A server that sits between a client application, such as a Web browser, and a real server. It
intercepts all requests to the real server to see if it can fulfill the requests itself. If not, it forwards
the request to the real server.
In computer networks, a proxy server is a server (a computer system or an application) that acts
as an intermediary for requests from clients seeking resources from other servers. A client
connects to the proxy server, requesting some service, such as a file, connection, web page, or
other resource available from a different server. The proxy server evaluates the request according
to its filtering rules. For example, it may filter traffic by IP address or protocol. If the request is
validated by the filter, the proxy provides the resource by connecting to the relevant server and
requesting the service on behalf of the client. A proxy server may optionally alter the client's
request or the server's response, and sometimes it may serve the request without contacting the
specified server. In this case, it 'caches' responses from the remote server, and returns subsequent
requests for the same content directly.
Improve Performance: Proxy servers can dramatically improve performance for groups
of users, because a proxy saves the results of all requests for a certain amount of time.
Consider the case where both user X and user Y access the World Wide Web through a
proxy server. First user X requests a certain Web page, which we'll call Page 1. Sometime
later, user Y requests the same page. Instead of forwarding the request to the Web server
where Page 1 resides, which can be a time-consuming operation, the proxy server simply
returns the Page 1 that it already fetched for user X. Since the proxy server is often on the
same network as the user, this is a much faster operation. Real proxy servers support
hundreds or thousands of users. The major online services such as America
Online, MSN and Yahoo, for example, employ an array of proxy servers.
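The user X / user Y scenario above can be sketched in a few lines. Here fetch_from_origin is a stand-in for the slow request to the remote Web server (an assumption for illustration, not a real network call):

```python
# A toy caching proxy: the first request for a page goes to the origin
# server; later requests for the same page are answered from the cache.

origin_fetches = []                   # record of slow, remote operations

def fetch_from_origin(url):
    origin_fetches.append(url)
    return f"<html>page at {url}</html>"

proxy_cache = {}

def proxy_get(url):
    if url not in proxy_cache:
        proxy_cache[url] = fetch_from_origin(url)
    return proxy_cache[url]

proxy_get("http://example.com/page1")   # user X: proxy contacts the origin
proxy_get("http://example.com/page1")   # user Y: served from the cache
print(len(origin_fetches))              # the origin was contacted only once
```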
Filter Requests: Proxy servers can also be used to filter requests. For example, a
company might use a proxy server to prevent its employees from accessing a specific set
of Web sites.
Firewall
A firewall is a system designed to prevent unauthorised access to or from a private
network. Several techniques are used:
Packet filter: Looks at each packet entering or leaving the network and accepts or
rejects it based on user-defined rules. Packet filtering is fairly effective and transparent to
users, but it is difficult to configure. In addition, it is susceptible to IP spoofing.
Application gateway: Applies security mechanisms to specific applications, such
as FTP and Telnet servers. This is very effective, but can impose a performance
degradation.
Circuit-level gateway: Applies security mechanisms when a TCP or
UDP connection is established. Once the connection has been made, packets can flow
between the hosts without further checking.
Proxy server: Intercepts all messages entering and leaving the network. The proxy
server effectively hides the true network addresses.
In practice, many firewalls use two or more of these techniques in concert. A firewall is
considered a first line of defense in protecting private information. For greater security, data can
be encrypted.
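The packet-filter idea above - match each packet against user-defined rules and accept or reject it - can be sketched as follows. The rule set here is invented purely for illustration:

```python
# First-matching-rule packet filter. Each rule is
# (source-address prefix or None, destination port or None, action);
# None means "matches anything".

RULES = [
    ("10.0.0.",  None, "accept"),   # trust the internal network
    (None,       23,   "reject"),   # block Telnet from anywhere else
    (None,       None, "accept"),   # default: allow
]

def filter_packet(src_ip, dst_port):
    for src_prefix, port, action in RULES:
        if src_prefix is not None and not src_ip.startswith(src_prefix):
            continue
        if port is not None and dst_port != port:
            continue
        return action                # first matching rule wins
    return "reject"                  # no rule matched: drop the packet

print(filter_packet("10.0.0.5", 23))      # internal host: accepted
print(filter_packet("203.0.113.9", 23))   # outside Telnet: rejected
print(filter_packet("203.0.113.9", 80))   # outside Web traffic: accepted
```

The order of the rules matters: placing the Telnet rule before the default-allow rule is what makes the filter block that traffic.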
Web Portal
A Web portal or public portal refers to a Web site or service that offers a broad array of
resources and services, such as e-mail, forums, search engines, and online shopping malls. The
first Web portals were online services, such as AOL, that provided access to the Web, but by now
most of the traditional search engines have transformed themselves into Web portals to attract
and keep a larger audience.
Home Page
This is the starting point or front page of a Web site. This page usually has some sort of
table of contents on it and often describes the purpose of the site. For example,
http://www.apple.com/index.html is the home page of Apple.com. When you type in a basic
URL, such as "http://www.cnet.com," you are typically directed to the home page of the Web
site. Many people have a "personal home page," which is another way the term "home page" can
be used.
Web pages are what make up the World Wide Web. These documents are written in
HTML (hypertext markup language) and are translated by your Web browser. Web pages can
either be static or dynamic. Static pages show the same content each time they are viewed.
Dynamic pages have content that can change each time they are accessed. These pages are
typically written in scripting languages such as PHP, Perl, ASP, or JSP. The scripts in the pages
run functions on the server that return things like the date and time, and database information. All
the information is returned as HTML code, so when the page gets to your browser, all the
browser has to do is translate the HTML.
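The static/dynamic distinction can be shown with a tiny sketch (the page contents are invented for illustration): a dynamic page runs a function on the server, and only its HTML output is sent to the browser, which therefore only ever has to translate HTML.

```python
# A static page is the same every time; a dynamic page is generated by
# code on the server, but what travels to the browser is still HTML.
from datetime import datetime

STATIC_PAGE = "<html><body>Welcome!</body></html>"

def dynamic_page(now=None):
    now = now or datetime.now()
    # the script runs on the server; only its HTML result is sent out
    return f"<html><body>The time is {now:%H:%M}</body></html>"

print(STATIC_PAGE)
print(dynamic_page(datetime(2024, 1, 1, 9, 30)))   # contains "09:30"
```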
Please note that a Web page is not the same thing as a Web site. A Web site is a collection
of pages. A Web page is an individual HTML document. This is a good distinction to know, as
most techies have little tolerance for people who mix up the two terms.
Cookies
A cookie, also known as an HTTP cookie, web cookie, or browser cookie, is used for
an origin website to send state information to a user's browser and for the browser to return the
state information to the origin site. The state information can be used for authentication,
identification of a user session, user's preferences, shopping cart contents, or anything else that
can be accomplished through storing text data on the user's computer.
Cookies cannot be programmed, cannot carry viruses, and cannot install malware on the
host computer. However, they can be used by spyware to track a user's browsing activities, a
major privacy concern that prompted European and US lawmakers to take action. Cookies can
also be stolen by hackers to gain access to a victim's web account.
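The round trip described above - the site sends state in a Set-Cookie header, and the browser returns it in a later request - can be sketched with Python's standard http.cookies module:

```python
# Server side: put a piece of state into a Set-Cookie header;
# browser side: parse that header and send the state back.
from http.cookies import SimpleCookie

outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"          # e.g. a user-session identifier
print(outgoing.output())                   # Set-Cookie: session_id=abc123

incoming = SimpleCookie(outgoing.output(header=""))
print(incoming["session_id"].value)        # abc123
```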
Browsers
A Web browser is a program that your computer runs to communicate with the Web
servers on the Internet, which enables it to download and display the Web pages that you request.
A Web browser is an interface between the user and the internal workings of the Internet.
Browsers are referred to as Web clients, or universal clients, as they follow the principle of
client-server technology, in which the browser is the client.
On typing a URL in the address window or by following hyperlinks, the browser contacts
the server by sending a request for the required information. After receiving this information,
the browser displays it on a Web page in the user's window.
At a minimum, a Web browser must understand HTML and display text. In recent years,
however, Internet users have come to expect a lot more. A state-of-the-art Web browser provides
a full multimedia experience, complete with pictures, sound, video, and even 3-D imaging.
Because a Web browser has the ability to interpret or display so many types of files, you
may often use a Web browser even when you are not connected to the Internet. Windows 98, for
example, uses Internet Explorer to open most image files.
There are many types of browsers; you can obtain a comprehensive list from the web site
www.browsers.com. The most popular browsers, by far, are Netscape Navigator and Microsoft
Internet Explorer. Both are state-of-the-art browsers, and the competition between them is
fierce.
Both Navigator and Internet Explorer are available over the Internet at no charge. Microsoft
designed Internet Explorer for the Windows operating system, but it is now available for
Macintosh and some UNIX systems as well. Navigator is available for the Windows, Macintosh,
UNIX, and Linux operating systems.
A good web browser should have the following features:
1. The most important feature of a web browser is the presentation of web pages without
distortion.
2. The browser should support multimedia features like sound, video, etc.
3. It should also support forms and frames. Frames divide web pages into sections, thus
improving readability.
4. A good browser should have the ability to open multiple windows.
5. The latest browsers support ActiveX technology, Java, VRML and other plug-ins.
6. E-mail, News, and FTP support should also be extended.
7. Last but not least, a certain amount of security, such as the ability to block access
to certain Web pages, should also be provided.
Internet
The Internet began around 1965, when the US Department of Defense (DoD) financed the
design of a computer network, called the Advanced Research Projects Agency Network (ARPANET),
to link a handful of universities and military research laboratories. In the mid-1980s the
National Science Foundation (NSF) took over control, when defence traffic moved from ARPANET
to MILNET. In 1987, the NSF created NSFNET. In 1991, commercial Internet traffic started using
the NSF backbone. In 1995, NSFNET was decommissioned and the modern Internet came into
existence.
Internet Administration
The Internet, with its roots primarily in the research domain, has evolved and gained a
broader user base with significant commercial activity. Various groups that coordinate
Internet issues have guided its growth and development. The figure shows the general
organization of Internet administration.
The Internet Architecture Board (IAB) is the technical advisor to the ISOC. The main
purpose of the IAB is to oversee the continuing development of the TCP/IP protocol suite and to
serve in an advisory capacity to research members of the Internet community. The IAB
accomplishes this through its two primary components, the Internet Engineering Task Force
(IETF) and the Internet Research Task Force (IRTF). Another responsibility of the IAB is the
editorial management of the RFCs. The IAB is also the external liaison between the Internet
and other standards organizations and forums.
The Internet Engineering Task Force (IETF) is a forum of working groups managed by
the Internet Engineering Steering Group (IESG). IETF is responsible for identifying operational
problems and proposing solutions to these problems. IETF also develops and reviews
specifications intended as Internet standards. The working groups are collected into areas, and
each area concentrates on a specific topic. Currently nine areas have been defined, although this
is by no means a hard and fast number. The areas are:
Applications
Internet Protocols
Routing
Operations
User Services
Network Management
Transport
Internet protocol next generation (IPng)
Security
The Internet Research Task Force (IRTF) is a forum of working groups managed by the
Internet Research Steering Group (IRSG). IRTF focuses on long term research topics related to
Internet protocols, applications, architecture, and technology.
Internet Assigned Numbers Authority (IANA) and Internet Corporation for Assigned Names and
Numbers (ICANN)
The Network Information Center (NIC) is responsible for collecting and distributing
information about TCP/IP protocols.
History of Internet
1970s Telecommunications:- In this decade, the ARPANET was used primarily by the
military, some of the larger companies, such as IBM, and universities. The general population
was not yet connected to the system and very few people were on line at work.
The use of Local Area Networks (LANs) became more prevalent during the 1970s. Also the idea
of an open architecture was promoted; that is, networks making up the ARPANET could have
any design. In later years, this concept had a tremendous impact on the growth of the ARPANET.
Twenty-Three Nodes, 1972:- By 1972, the ARPANET was international, with nodes
in Europe at the University College in London, England, and the Royal Radar
Establishment in Norway. The number of nodes on the network was up to 23, and the
trend would be for that number to double every year from then on. Ray Tomlinson,
who worked at BBN, invented e-mail.
UUCP, 1976:- AT & T Bell Labs developed UNIX to UNIX copy. In 1977, UUCP
was distributed with UNIX.
USENET, 1979:- The User Network (USENET) was started by using UUCP to connect
Duke University and the University of North Carolina at Chapel Hill. Newsgroups
emerged from this early development.
The Internet Worm and IRC, 1988:- The worm known as the Internet Worm (created by
Robert Morris while he was a computer science graduate student at Cornell
University) was released. It infected 10 percent of all Internet hosts. Also in this
year, Internet Relay Chat (IRC) was written by Jarkko Oikarinen.
NSF Assumes Control of the ARPANET, 1989:- The NSF took over control of the
ARPANET in 1989. This changeover went unnoticed by nearly all users. Also, the
number of hosts on the Internet exceeded the 100,000 mark.
1990s Telecommunications:- During the 1990s, lots of commercial organizations started getting
on-line. This stimulated the growth of the Internet like never before. URLs appeared in television
advertisements and, for the first time, young children went on-line in significant numbers.
Graphical browsing tools were developed, and the programming language HTML
allowed users all over the world to publish on what was called the World Wide Web. Millions of
people went on-line to work, shop, bank, and be entertained. The Internet played a much more
significant role in society, as many nontechnical users from all walks of life got involved with
computers. Computer literacy and Internet courses sprang up all over the world.
Gopher, 1991:- Gopher was developed at the University of Minnesota, whose sports
teams' mascot is the Golden Gopher. Gopher allowed you to "go for," or fetch, files
on the Internet using a menu-based system. Many Gopher servers sprang up all over
the country, and all types of information could be located on them. Gopher is
still available and accessible through Web browsers, but its popularity has faded;
for the most part, it is only of historical interest. (gopher://gopher.well.sf.ca.us/)
World Wide Web, 1991:- The World Wide Web (WWW) was created by Tim Berners-
Lee at CERN (a French acronym for the European Laboratory for Particle Physics),
as a simple way to publish information and make it available on the Internet.
WWW Publicly Available, 1992:- The interesting nature of the Web caused it to
spread, and it became available to the public in 1992. Those who first used the system
were immediately impressed.
Netscape Communications, 1994:- The company called Netscape Communications,
formed by Marc Andreessen and Jim Clark, released Netscape Navigator, a Web
browser that captured the imagination of everyone who used it. The number of users of
this software grew at a phenomenal rate. Netscape made its money largely through
advertising on its Web pages.
Yahoo, 1994:- Stanford graduate students David Filo and Jerry Yang developed their
Internet Search Engine and directory called Yahoo, which is now world famous.
Java, 1995:- The Internet programming environment, Java, was released by Sun
Microsystems, Inc. This language, originally called Oak, allowed programmers to
develop Web pages that were more interactive.
Microsoft Discovers the Internet, 1995:- The software giant committed many of its
resources to developing its browser, Microsoft Internet Explorer, and Internet
applications.
Internet Services
The Internet provides a mechanism for millions of computers to communicate, but what kind
of information is transmitted? Many services are available over the Internet, and the following
are the most popular ones.
1) E-Mail:- Enables people to send private messages, as well as files, to one or more other
people.
2) Mailing Lists:- Enable groups of people to conduct group conversations by e-mail, and
provide a way of distributing newsletters by e-mail.
3) On-line Chat:- Provides a way for real-time online chatting to occur, whereby
participants read each other's messages within seconds of their being sent.
4) Voice and video conferencing:- Enable two or more people to hear and see each other and
share other applications.
5) The World Wide Web:- A distributed system of interlinked pages that include text,
pictures, sound, and other information.
6) File Transfer:- Lets people download files from public file servers, including a wide
variety of programs.
7) Remote Login:- Two programs allow you to log in to another computer from an account
on which you are already logged in; they let you use and interact with software on the
remote machine. To do this, you need an account and password on the second computer.
8) Internet Telephony:- As the name suggests, Internet telephony involves using the
Internet to transmit real-time audio from one personal computer to another (or, in some
instances, to an ordinary telephone).
9) USENET:- A bulletin board service featuring a large number of discussion groups
involving millions of people around the world.
10) Archie:- An indexing service, rather like a library catalogue: the contents of a large
number of FTP servers are indexed and archived on a number of Archie servers on the
Internet.
11) Gopher:- Before the Web came into existence, the University of Minnesota developed a
system called Gopher connecting universities, colleges and government authorities. The
Gopher system is based on a set of related menus. The interconnected Gopher servers are
collectively known as Gopher Space.
12) Veronica:- Provides Archie-like search services for Gopher. Veronica searches are not
necessarily always easy and fast, as Gopher servers are widely distributed.
13) WAIS:- An Internet service that looks for specific information in Internet databases.
Searching is done by keywords, and source documents are indexed for fast retrieval.
(a) Internet Service Provider (ISP):- The ISP acts as an interface between end-users (which
could be stand-alone PCs or LANs) and the Internet. The ISP acts like the main crossing of a
town, which allows traffic to come out of the town and join the national highway. The ISP has
routers and servers, through which it connects end-users to the Internet backbone. For all
problems and management at the end-user level, an end-user interacts with the ISP only.
(b) Router:- A special purpose computer that directs the packets of data along a network.
(c) Gateway:- An ISP is connected to the Internet's backbone through a Gateway, which
functions as a door to the Internet backbone and connects a number of ISPs to it. In India,
VSNL was the sole Gateway service provider until recently; however, private operators are
now permitted to provide Gateway services.
(d) Internet Backbone:- The Internet backbone is a high-bandwidth (high-speed) fibre-optic
cable - on which a number of routers are in place - managed through the Network Operations
Centre of the Internet. The Internet backbone is of different bandwidths in various
segments.
The basic elements of the Internet are a user (a stand-alone PC or a LAN), the ISP,
routers, gateways and the Internet backbone. Thus, an end-user who wishes to establish a link
with another user on a LAN goes through his LAN - ISP - Gateway and gets connected to the
distant end-user through Gateway - ISP - LAN (refer to the figure).
FIG: Elements of the Internet (stand-alone PCs and LANs connect through routers and an
ISP's servers to a Gateway, which joins the Internet backbone; the backbone's routers are
managed by a Network Operations Centre).
An intranet is a private network (usually a LAN, but may be larger) that uses TCP/IP and
other Internet standard protocols. Because it uses TCP/IP, the standard Internet communications
protocol, an intranet supports TCP/IP-based protocols, such as HTTP (the protocol that web
browsers use to talk to web servers), and SMTP and POP (the protocols that e-mail programmes
use to send and receive mail).
In other words, an intranet can run web servers, web clients, mail servers, and mail clients.
An intranet is a network for a single organization with the following features: -
An intranet need not be connected to the Internet (for outside connectivity it can go
through the Internet)
Architecture of Intranet
(a) Workstations & Client Software. A PC with any operating system (Windows 95, 98,
Mac, Unix) that supports networking can be connected to an intranet as a workstation. In
addition to other application programmes, workstations run client software that provides
the user with access to network servers. On an intranet the client software will typically
include (depending upon the services provided) a browser (MS Internet Explorer,
Netscape Navigator), an e-mail client (Outlook Express), newsreaders, and chat or FTP
clients. These clients may be integrated with the OS or added on.
(b) Servers, NOS & Server Software. This is an important area of intranet in respect
of hardware and software requirements, viz.,
(i) The servers provide services to the workstations connected to the
intranet. A network server is required to manage the LAN. Besides this,
(ii) a Network Operating System (Windows NT, Unix, Linux) is required to
run on the network server. The client part of the NOS needs to run on the
workstations.
(iii) Server software includes a web server, mail server etc. (depending on the
server and services required). Many intranet server programmes run on Unix and
some on NT. Lots of freeware and shareware server programmes are available for
Unix. Windows NT Server comes with a web server (MS Internet Information
Server).
(iv) An intranet also needs middleware, the software that provides access to a
database from a web browser, e.g., by making calls to the database programme to
read and write records.
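The middleware idea - a programme sitting between the web browser and the database - can be sketched in a few lines of Python. The sketch below reads every record of a table and renders it as an HTML table; the "employees" table, its columns and the function name are invented for illustration.

```python
# Sketch of middleware: turn database records into a web page.
# The "employees" table and its columns are invented for illustration.
import sqlite3

def records_as_html(conn, table):
    """Read every row of `table` and render an HTML table."""
    cur = conn.execute(f"SELECT * FROM {table}")
    headers = [col[0] for col in cur.description]
    rows = ["<tr>" + "".join(f"<th>{h}</th>" for h in headers) + "</tr>"]
    for row in cur:
        rows.append("<tr>" + "".join(f"<td>{v}</td>" for v in row) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"

conn = sqlite3.connect(":memory:")  # an in-memory database for the demo
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.execute("INSERT INTO employees VALUES ('Asha', 'Marketing')")
html = records_as_html(conn, "employees")
print(html)
```

A real middleware layer would serve this HTML through a web server and accept form submissions to write records back, but the read-and-render step is the heart of it.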
(c) Network Cards, Cabling, Switches/Hubs. These are the components required to
set up the LAN. The commonly used network adapter card is Ethernet, the most common
LAN configuration is the star topology, and the commonly used cables are CAT-5 or
CAT-6 UTP.
(d) Security Systems (Firewall). If the intranet is connected to the Internet, we need to
control the kind of information that can pass between the intranet and the Internet. The
hardware, software and procedures that provide this access control make up a firewall.
Firewall systems are of two categories, viz.,
(i) Network-Level (Packet-Filtering) Firewalls. These firewalls examine the
source and destination addresses and ports of each packet and allow or block it
according to a set of rules.
Figure: A corporate firewall sits between the corporate intranet (LANs and users behind a switch) and the routers that link it, through the public domain, to the Internet and external users.
(ii) Application-Level Firewalls. These firewalls handle packets for each
Internet service separately, usually by running a programme called a proxy server,
which accepts e-mail, Web, Chat, newsgroup and other packets from computers
on the intranet, strips off the information that identifies the packet and passes it
along to the Internet or vice versa. When the replies return, the proxy server
passes the replies back to the computer that sent the original message. To the rest
of the Internet, all packets appear to be from the proxy server, so no information
leaks out about the individual computers on your intranet. A proxy server can
keep a log of all packets that pass by, and can be configured to allow traffic in one
direction while disallowing it in the other.
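The way the proxy hides the identities of internal computers can be shown with a toy simulation. This is not a real proxy - a "packet" here is just a Python dictionary, and the host names are invented - but the strip-log-forward behaviour is the same idea.

```python
# Toy simulation of an application-level firewall's proxy server.
# Real proxies work on actual network traffic; a "packet" here is a dict,
# and all host names are invented for illustration.
log = []

def proxy_outbound(packet, proxy_addr="proxy.example.net"):
    """Log the packet, strip the internal source address, and pass it
    toward the Internet - so all traffic appears to come from the proxy."""
    log.append((packet["src"], packet["dst"]))   # keep an audit trail
    outside = dict(packet, src=proxy_addr)       # hide the real sender
    return outside

pkt = {"src": "pc42.intranet.local", "dst": "www.example.com", "data": "GET /"}
sent = proxy_outbound(pkt)
print(sent["src"])   # the Internet only ever sees the proxy's address
```

Because the log records the original source, the proxy can both conceal internal machines from the outside and account for their traffic internally.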
LANs and intranets both let you share hardware, software, and information by connecting
computers together. You don't need an intranet to share files and printers, or to send e-mail
among the people on your network: a LAN can do those jobs. The following are some reasons
to convert a LAN to an intranet, or to connect your computers together into an intranet: -
(a) Intranets Use Standard Protocols. Internet protocols such as TCP/IP are used on a
huge number of diverse computers. More development is happening for Internet-based
communication than for other types of communication. For example, intranet users can
choose from a wide variety of e-mail programmes, because so many have been written
for the Internet.
(b) Intranets are Scalable. TCP/IP works fine on the Internet, which has millions of
host computers. So you don't have to worry about your network outgrowing its
communications protocol.
(c) Intranet Components are relatively Cheap and some are free. Because the
Internet started as an academic and military network (rather than a commercial one),
there is a long tradition of free, cheap, and cooperative software development. Some of
the best Internet software is free, including Apache (the most widely used web server),
Pegasus, and Eudora Lite (two excellent e-mail client programmes).
(d) Intranets enable you to set up Internet-style Information Services. You can have
your own private web, using web servers on your intranet to serve web pages to members
of your organisation only. You can also support chat, Usenet, telnet, FTP, or other
Internet services privately on your network. Push technology (web channels) can deliver
assignments, job status, and group schedules to the user's desktop via his or her browser.
(e) Intranets let People Share their Information. Everyone in your organisation can
make their information available to other employees by creating web pages for the
intranet.
Because many word processing programmes can now save documents as web pages,
creating pages for an intranet does not require a lot of training. Rather than printing and
distributing reports, people can put them on the intranet and send e-mail to tell everyone
where the report is stored.
(a) Intranets Cost Money. You may need to upgrade computers, buy new software,
run new cabling, and teach people to use the new systems.
(b) People in your organization may waste time. If you connect your intranet to the
Internet, people may spend hours a week watching sports results or checking their stock
options. Even if you don't connect to the Internet, people can use the intranet to build
web sites about the company softball team and send e-mail about upcoming baby
showers. You'll need policies in place to determine how the intranet may be used.
Many organisations, especially those with large existing computer systems, have lots of
information that is hard to get at. The intranet can change all that by using Internet tools. Here
are some ways that your organisation, large or small, can use an intranet: -
(a) E-mail within the organisation and to and from the Internet. People can use one
e-mail programme to exchange mail both with other intranet users and with the Internet.
(b) Private Discussion Groups. Using a mailing list manager or a news server
accessible only to people in your organisation, you can set up mailing lists or newsgroups
to encourage people to share information within departments or across the organisation.
(c) Private Websites. Each department in your organisation can create a website that
is accessible only to people on the intranet. Instead of circulating memos and handbooks,
information can go on these web sites. For example, the human resource department can
post all employee policies, job postings, and upcoming training opportunities. The
marketing department can post information about products, including upcoming release
dates, how products are targeted, and other information that is not appropriate for a public
site on the Internet-based web. Every department can post web pages to share its
information with the other departments in the organisation. By using the intranet instead
of printing on paper, it is economical to publish large documents and documents that
change frequently.
(d) Access to Legacy Databases. If your organisation has information that is locked
away in an inaccessible database, you can convert the information to web pages so that
everyone on the intranet can see it. (Legacy systems are those considered outdated by
whoever is describing the system). For example, a non-profit organisation might have a
proprietary database containing all of its fundraising and membership information.
By using a programme that can display database information as web pages and enter
information from web page forms into the database, all the people at the non-profit
organisation can see, and even update, selected information from the database by using
only a web browser. Naturally, the programme would need to limit who could see and
change particular information in the database.
Security Policies
In addition to a firewall, you need to take steps to make sure that the intranet is used
appropriately in your organisation: -
(a) Establish acceptable-use Policies. Post rules for using the intranet, including the
use of e-mail, the web, and discussion groups both within the intranet and on the Internet.
(b) Monitor usage. This does not mean that you should look over everyone's
shoulder while they use the intranet, but make sure that someone monitors the content of
the intranet's web sites and discussion groups. Look for copyright infringements,
personnel issues, and security lapses.
(c) Close the Door behind Departing Employees. When someone leaves the
organisation, make sure that a system is devised to close the person's accounts, change
passwords, and deny other access to the intranet.
(d) Be Vigilant about Data in General, not just about the Intranet. The intranet's
connection to the Internet can certainly be a security hazard, but important data can also
walk out of your organisation's door on a diskette in someone's pocket, in a fax, or in
many other ways.
Extranet
An extranet is a network that links selected resources of the intranet of a company with
its customers, suppliers and other business partners. Main features of extranet are: -
(a) The link between the intranet and its business partners is achieved through TCP/IP,
the standard Internet protocol.
(b) The extranet is an extended intranet, which isolates business communication from
open Internet through secure solutions.
(c) Extranets provide the privacy and security of an intranet while retaining the global
reach of the Internet.
(d) Extranets use cryptography and authorization procedures for securing data flows
between intranets through the Internet.
Architecture of Extranet
The figure shows the basic architecture of an intranet with its extension to one LAN or a
single user; this makes it an extranet. Similar logic can be extended to make a general
infrastructure of an extranet plus intranets, as shown in figure 2.
Figure: Architecture of an extranet - the intranets of Company A (locations 1 and 2), Company B and Company C each connect through an ISP to the Internet (public domain), linking them into one extranet.
Components of Extranet
Since an extranet is an extension of an intranet, the additional hardware and software
needed to extend the intranet is: -
(b) Router
Typical extranet services and their applications are: -
S No - Service - Applications
1 - Secure e-mail - B2B communications
2 - Usenet services - Bulletin board services, one-to-many information exchange, EDI messages, floating tenders
3 - Mailing list - Private one-to-many e-mail, online newsletters, discussion groups
4 - File transfer (FTP) - Exchange of data between supply chains, between corporate HQ and various companies, customer support and sales data
5 - Conferencing & chat - Electronic meetings
6 - Remote login (Telnet) - Access to databases and ERP software
7 - Calendar - Scheduling tasks
ISP
An ISP is a company that supplies Internet connectivity to home and business customers.
ISPs support one or more forms of Internet access, ranging from traditional modem dial-up to
DSL and cable modem broadband service to dedicated T1/T3 lines.
More recently, wireless Internet service providers or WISPs have emerged that offer
Internet access through wireless LAN or wireless broadband networks.
In addition to basic connectivity, many ISPs also offer related Internet services like e-mail,
Web hosting and access to software tools. A few companies also offer free ISP service to those
who need occasional Internet connectivity. These free offerings feature limited connect time and
are often bundled with some other product or service.
ISP Architecture
As stated earlier, to avail of Internet services, each user must be connected to an ISP.
For each modem at the user end, there is a corresponding modem at the ISP. The ISP has a
number of servers for each service that it provides. The versatility of an ISP can be measured
by the number and type of services (in terms of value addition) provided to its customers. The
figure shows a typical ISP architecture.
Figure: Architecture of an ISP - dial-up and ISDN users connect through a modem farm to terminal servers; a billing server verifies the user's log-in and password, and a router carries the traffic onward to the Internet.
Searching
With the advent of the World Wide Web came the widespread availability of on-line
information. It is no longer necessary to travel to the library to find the answer to a question or
to engage in research on a specialized topic. Much of what you might want to know is available
through the web. Since anyone can publish on the web, the range of topics that can be found is
nearly all encompassing. However, while a lot of information is available on-line, not all of it is
completely accurate.
In all likelihood, the answers to your questions are somewhere on the Web, but how do
you locate them? In the early days of the Web, unless you knew exactly where to look, you had
trouble finding what you wanted.
Unlike a library, the pages on the Web are not as neatly organized as books on shelves, nor are
Web pages completely cataloged in one central location. Even knowing where to look for
information is not a guarantee that you will find it, since Web page addresses are constantly
changing. Usually, a forwarding address is provided for a page that has moved, but it may only
be available for a short time.
The rapid growth of the Web, as well as its huge size, has ruled out trying to keep track
manually of "what is what" and "what is where". As people were spending their time trying to
find things on the Web, rather than actually reading the material they were after, the first
directories and search engines were being developed. These tools allow you to find information
more quickly and easily. You have probably already been using these tools, but perhaps not as
effectively as possible.
Methods of Searching
1. Directories:- The first method of finding and organizing Web information is the directory
approach. A Web directory or Web guide is a hierarchical representation of hyperlinks. The top
level of the directory typically provides a wide range of very general topics, such as arts,
automobiles, education, entertainment, news, science, sports, and so on. Each of these topics is a
hyperlink that leads to more specialized subtopics. They in turn have a number of subtopics, and
so on until you reach a specific web page.
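The drill-down behaviour of a directory can be sketched as a nested structure in Python; the categories and URLs below are invented examples, and the helper function name is ours.

```python
# Sketch of a web directory: nested dicts are categories, lists of URLs
# are the leaves. All entries are invented examples.
directory = {
    "Science": {
        "Biology": {"Insects": ["bees.example.com", "ants.example.com"]},
        "Physics": {"Optics": ["lenses.example.com"]},
    },
}

def drill_down(tree, path):
    """Follow a list of category names down the hierarchy and return
    whatever sits at that level (subtopics or page links)."""
    node = tree
    for category in path:
        node = node[category]
    return node

links = drill_down(directory, ["Science", "Biology", "Insects"])
print(links)
```

Each step of the path narrows the search, just as each click in a directory moves you from a general topic toward a list of specific pages.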
In addition to being very easy to use, another benefit of a directory structure is that you need
not know exactly what you are looking for in order to find something worthwhile. You select the
category for the topic in which you are interested. You continue to move down through the hierarchy,
selecting subcategories and narrowing the search at each level, until you are presented with a list
of hyperlinks that pertain to your topic.
As you begin to zero in on your topic, you may find other interesting items of which
you were previously unaware. On the other hand, you may reach the bottom of the directory
without finding the information you were after. In such a case, you may need to backtrack,
going up several levels and then proceeding down again. Of course, it is possible that the
directory you are searching does not contain the information you want; in this case you may
decide to try either
a different directory or a search engine.
When traversing a directory downward, you are moving toward more specific topics.
When going upward, you are heading back to more general topics. Directories are useful if you
want to explore a topic and its related areas, or if you want to research a subject, but not at a very
detailed level.
If you are interested in a very specific topic, you may want to start off by using a search
engine or a meta search engine. Arriving at a very specific topic in a directory structure involves
traversing between five and ten hyperlink levels.
Note that while the directory structure is logically organized as a hierarchy, a specific
Web page may occur in many different parts of the hierarchy. There is usually more than one
way to reach a given page.
Popular Directories
Excite - www.excite.com
Infoseek - www.infoseek.com
Looksmart - www.looksmart.com
Lycos - www.lycos.com
Magellan - www.mckinley.com
Yahoo - www.yahoo.com
Rediff - www.rediff.com
2. Search Engine:- The second approach to organizing and locating information on
the Web is a search engine, which is a computer program that does the following:
(a) Allows you to submit a form containing a query that consists of a word or phrase
describing the specific information you are trying to locate on the Web.
(b) Searches its database to try to match your query.
(c) Collates and returns a list of clickable URLs containing presentations that match your
query; the list is usually ordered, with the better matches appearing at the top.
(d) Permits you to revise and resubmit a query.
A number of search engines also provide URLs for related or suggested topics.
Many people find that search engines are not as easy to use as directories. To use a search
engine, you supply a query by entering information into a field on the screen. To be effective,
that is, to have the search engine return a small list of URLs on your topic of interest, you often
need to be very specific. To pose such queries, you must learn the query syntax of the search
engine with which you are working. Learning the syntax so that you can phrase effective and
legal queries often requires that you read and understand the documentation accompanying the
search engine. A hyperlink to the documentation is usually provided next to the query field, and
example queries are often given.
Once you learn to use a specific search engine's query language effectively, you can
quickly zoom in on very narrow topics; this is the advantage of a search engine. The
disadvantages are that you have to learn the query language and you have to learn a search
strategy.
The user-friendliness and power of query languages vary from search engine to search
engine. We recommend you try several of them and then learn the syntax of one search engine's
query language. Since each search engine searches a different database, you would be best off
learning about a search engine that has indexed a large share of the Web. You can gauge this
by posing similar queries to a number of search engines and seeing which one finds the best
matches.
Rediff - www.rediff.com
AltaVista - altavista.digital.com
Hot Bot - www.hotbot.com
Google - www.google.com
Web Crawler - www.webcrawler.com
3. Meta Search Engine:- A meta search engine or all-in-one search engine performs a search by
calling on more than one other search engine to do the actual work. The results are collated,
duplicate retrievals are eliminated, and the results are ranked according to how well they match
your query. You are then presented with a list of URLs.
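The collation step just described can be sketched in a few lines: merge the ranked lists returned by several engines, drop duplicate URLs (keeping the best score for each), and present one combined ranking. The engine results and URLs below are invented.

```python
# Sketch of a meta search engine's collation step: merge ranked result
# lists from several engines, de-duplicate URLs, keep the best score.
# The result lists and URLs are invented for illustration.
results_a = [("lions.example.com", 95), ("zoo.example.com", 80)]
results_b = [("zoo.example.com", 90), ("tigers.example.com", 70)]

def collate(*result_lists):
    best = {}
    for results in result_lists:
        for url, score in results:
            best[url] = max(score, best.get(url, 0))   # de-duplicate
    # present a single list, best matches first
    return sorted(best.items(), key=lambda hit: hit[1], reverse=True)

merged = collate(results_a, results_b)
print(merged)
```

Note that "zoo.example.com" appears in both lists but only once in the merged output, carrying the higher of its two scores.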
The advantage of a meta search engine is that you can access a number of different search
engines with a single query. The disadvantage is that you will often have a high noise-to-signal
ratio; that is, a lot of matches will not be of interest to you. This means you will need to spend
more time evaluating the results and deciding which hyperlinks to follow.
For very specific, hard to locate topics, meta search engines can often be a good starting
point. For example, if you try to locate a topic using your favorite search engine, but fail to turn
up anything useful, you may want to query a meta search engine.
4. Web Ring:- A web ring is a community of related Web pages that are organized into a circular
ring. Each page in a ring has links that enable visitors to move to an adjacent site on the ring,
access a ring index or jump to a random site. Web sites are added continuously to the web rings.
Each ring is managed from one of the sites. Web rings are fun to visit, but they do not contain the
volume of information of the other search tools. Currently, web rings are available on many
topics, including acrobatics, religion, Spanish hotels, Disneyland and medieval studies. Most
web rings are devoted to games. The Web Ring home page at www.webring.com contains more
information on web rings and how to search them. Another site devoted to web rings is
RingSurf, located at www.ringsurf.com.
Search Terminology
Here are a few common search-related terms we should know about.
Search Tool:- Any mechanism for locating information on the Web, usually refers to a
search or meta search engine or a directory.
Query:- Information entered into a form on a search engine's Web page that describes the
information being sought. A query need not be a question; invariably a word or a phrase is
used. A phrase is put within quotes, e.g. "Indian Tigers".
Query Syntax:- A set of rules describing what constitutes a legal query. On some search
engines, special symbols may be used in a query. Syntax defines the grammar of query
writing. Each search engine may have different syntax rules, which are available in the
Help menu of that search engine.
Query Semantics:- A set of rules that defines the meaning of a query.
Page View:- The viewing of one specific HTML file, without counting any graphics or
other items on the page, is referred to as a page view.
Hit/Match:- A URL that a search engine returns in response to a query. "Hits" are also
commonly thought of as the number of times a page on a web site is requested by a
browser, but this is not accurate: hits also include requests for all other files, such as
graphics and images. For example, if your home page has nine graphics on it, each time
someone views your home page, the log file registers one hit for the HTML file and nine
hits for the graphics, for a total of ten hits. Because the term "hits" has such an
ambiguous meaning, most people are now measuring traffic in terms of page views.
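The arithmetic of that example can be checked with a short sketch that counts hits and page views from a list of requested files; the file names are invented, and the ".html suffix means a page view" rule is a simplification.

```python
# Count hits versus page views from web-server requests.
# One page request plus nine graphics = ten hits but one page view.
# File names are invented; treating ".html" as "a page" is a simplification.
requests = ["/index.html"] + [f"/img/pic{n}.gif" for n in range(9)]

hits = len(requests)                                  # every file counts
page_views = sum(1 for r in requests if r.endswith(".html"))

print(hits, page_views)
```

The gap between the two numbers is exactly why page views are the preferred traffic measure: hits inflate with every image added to a page.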
Visit:- All the pages viewed by a user within a continuous session - which can comprise a
single HTML file, or last for a given duration - make up a visit.
Relevancy Score:- A value that indicates how close a match a URL was to a query;
usually expressed as a value from 1 to 100, with the higher score meaning more relevant.
If you understand how a search tool works, there is a good chance you will be able to use
it more effectively. For the most part, these same ideas apply to directories; the main difference is
that the hierarchical organizational structure and categorizations for directories need to be in
place and displayed. The references include additional information about how directories are put
together.
To describe how a search engine works, we split up its functions into a number of
components: user interface, searcher, evaluator, gatherer, and indexer.
User Interface:- The screen in which you type a query and which displays the search results.
Searcher:- The part that searches a database for information to match your query.
Gatherer:- The component that traverses the Web, collecting information about pages.
Indexer:- The function that categorizes the data obtained by the gatherer.
For comparison, think of the different facets of a typical library, such as acquisitions,
cataloging, indexing, and on-line searching.
User Interface:- The user interface must provide a mechanism by which a user can submit
queries to the search engine. This is universally done using forms. In addition, the user interface
needs to display the results of the search in a convenient way. The user should be presented with
a list of hits from their search, a relevancy score for each hit and a summary of each page that
was matched. This way, the user can make an informed choice as to which hyperlinks to follow.
Searcher:- The searcher is a program that uses the search engines index and database to see if
any matches can be found for the query. Your query must first be transformed into a syntax that
the searcher can process. Since the databases associated with search engines are extremely large
(with perhaps 25,000,000 to 50,000,000 indexed pages), a highly efficient search strategy must
be applied.
Evaluator:- The searcher locates any URLs that match your query. The hits retrieved by your
query are called the result set of the search. Not all of the hits will match your query equally
well. For example, a query about "Honey Bees" might be matched both by a page that mentions
the phrase only once in passing and by a page devoted entirely to beekeeping.
Clearly, in most cases, it would be better to rank this second page much higher, as it
probably contains many more references to "Honey Bees".
The ranking process is carried out by the evaluator, a program that assigns a relevancy
score to each page in the result set. The relevancy score is an indication of how well a given page
matched your query.
How is the relevancy score computed by the evaluator? This varies from search engine to
search engine. A number of different factors are involved, and each one contributes a different
percentage towards the overall ranking of a page. Some of the factors typically considered are:
a) How many times the words in the query appear in the page.
b) Whether or not the query words appear in the title.
c) The proximity of the query words to the beginning of the page.
d) Whether the query words appear in the CONTENT attribute of the meta tag.
e) How many of the query words appear in the document.
Some search engines also consider other factors in computing a relevancy score. Each
factor is weighted, and a value is computed that rates the page. The values are usually
normalized and are assigned numbers between 1 and 100, with 100 representing the best possible
match. As part of the user interface, the result set and relevancy scores computed by the
evaluator are displayed for the user, with the best matches appearing first. Hyperlinks to each hit
are provided and a short description of the page is usually given.
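A much-simplified evaluator along these lines can be sketched as follows. The weights, the page and the scoring details are invented for illustration; real engines keep their exact formulas private.

```python
# Sketch of an evaluator: combine a few weighted factors into a 1-100
# relevancy score. The weights and page are invented for illustration.
def relevancy(query_words, title, body_words):
    score = 0.0
    for word in query_words:
        score += 5 * body_words.count(word)        # frequency in the page
        if word in title.lower().split():
            score += 30                            # word appears in title
    if body_words and body_words[0] in query_words:
        score += 10                                # query word near the start
    return max(1, min(100, round(score)))          # normalize to 1..100

page_title = "Honey Bees and Beekeeping"
page_body = "honey bees gather nectar and honey bees build combs".split()
print(relevancy(["honey", "bees"], page_title, page_body))
```

Each factor contributes a weighted share of the final score, and the clamping step mirrors the normalization to a 1-100 scale described above.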
Gatherer:- A search engine obtains its information by using a gatherer, a program that traverses
the Web and collects information about web documents. The gatherer does not collect this
information every time a query is made. Rather, the gatherer is run at regular intervals, and it
returns information that is incorporated into the search engine's database and indexed.
Alternate names for gatherer are bot, crawler, robot, spider, and worm.
Indexer:- Once the gatherer retrieves information about Web pages, the information is put into a
database and indexed. The indexer function creates a set of keys (an index) that organizes the
data, so that high-speed electronic searches can be conducted and the desired information can be
located and retrieved quickly.
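The inverted index at the heart of the indexer can be sketched directly: map each word to the set of documents that contain it, so a lookup touches only the index and never the full pages. The documents are invented examples.

```python
# Sketch of an indexer: build an inverted index mapping each word to the
# set of documents containing it. The documents are invented examples.
documents = {
    "doc1": "lions live in africa",
    "doc2": "tigers live in india",
    "doc3": "lions and tigers are big cats",
}

index = {}
for doc_id, text in documents.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

print(sorted(index["lions"]))   # documents mentioning "lions"
```

Answering a query is now a fast dictionary lookup rather than a scan of every stored page, which is what makes searching tens of millions of indexed pages feasible.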
Types of Queries
Two types of queries are generally used for surfing: -
(a) Pattern Matching Queries:- This is the most basic type of query. To formulate a
pattern-matching query, a keyword or a group of keywords is typed into the query
submission form. The search engine returns the URL of any page that contains these
keywords. The result set varies from one search engine to another, and the results may
differ depending on whether singular or plural words are used. A space between two
words treats them as two separate words. We can also use (+) and (-) signs to include or
exclude a word from the query, e.g. the query +Indian +Lion -Tiger will search for pages
containing the words Indian and Lion but not Tiger. Any words within quotes are taken
as one word or phrase. These syntax rules may vary with different search engines; for
details one must go through the Help support of that search engine.
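A simplified version of the (+) and (-) syntax can be sketched as follows; as noted above, real engines' rules differ, and the page text here is invented.

```python
# Sketch of pattern-matching with + (must appear) and - (must not appear).
# Matching is simplified; real engines' syntax rules vary, as noted above.
def matches(query, page_words):
    required = [t[1:] for t in query.split() if t.startswith("+")]
    excluded = [t[1:] for t in query.split() if t.startswith("-")]
    return (all(w in page_words for w in required)
            and not any(w in page_words for w in excluded))

page = "the indian lion lives in the gir forest".split()
print(matches("+indian +lion -tiger", page))
```

A page qualifies only when every (+) word is present and no (-) word appears, which is exactly the Indian-and-Lion-but-not-Tiger behaviour described above.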
(b) Boolean Queries:- Boolean queries involve the Boolean operations AND, OR and NOT.
Most search engines permit you to enter Boolean queries. Some examples of Boolean
queries are given below: -
(i) Lion AND Tiger - Will show all pages that contain both Lion and Tiger.
(ii) Lion OR Tiger - Will show all pages that contain either Lion or Tiger or
both, i.e. at least one of the words.
(iii) Lion NOT Tiger - Will show all pages that contain information about Lion
but not Tiger. Thus, the Boolean NOT operation is used to exclude a word.
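These three Boolean operations can be sketched over a small invented page set; only simple one-operator, two-word queries are handled, for illustration.

```python
# Sketch of Boolean queries: AND, OR and NOT over a tiny invented page
# set. Only simple one-operator, two-word queries are handled.
pages = {
    "p1": {"lion", "tiger"},
    "p2": {"lion"},
    "p3": {"tiger"},
}

def boolean_query(a, op, b):
    if op == "AND":
        return {p for p, words in pages.items() if a in words and b in words}
    if op == "OR":
        return {p for p, words in pages.items() if a in words or b in words}
    if op == "NOT":
        return {p for p, words in pages.items() if a in words and b not in words}
    raise ValueError(op)

print(sorted(boolean_query("lion", "AND", "tiger")))   # both words present
print(sorted(boolean_query("lion", "NOT", "tiger")))   # lion without tiger
```

AND narrows the result set, OR widens it, and NOT carves pages out of it - the same effects the search strategies below exploit to specialize or generalize a search.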
Search Strategies
Determining which search engine to use can be challenging. You can begin by testing a
number of different search engines, trying to find one that you believe meets the following
conditions:
If you can find a search engine that meets most of these criteria, you should concentrate on
learning it well, rather than learning a little bit about several different search engines.
Once you have learned the query syntax of that search engine, you can begin to formulate
your search strategy. When you post queries to the search engine, two common situations can
occur: either your query does not turn up a sufficient number of hits, or your query turns up too
many hits. In the next sections, you will learn strategies for dealing with these situations.
Suppose your query returns no hits or only a couple of hits, neither of which is very useful to
you. In this case, you need to generalize your search. The ways to do this include:
If you used a pattern matching query, eliminate one of the more specific keywords from
your query.
If you used a Boolean query, remove one of the keywords or phrases with which you
used AND, or delete a NOT item you specified.
If you restricted your search domain, enlarge it.
If you are still having no luck, try keywords that are more general, or exchange a couple
of the keywords with synonyms.
If this fails, you may decide to use a directory and work your way down to the topic of
interest. Another alternative would be to use a metasearch engine.
Suppose your query returns more URLs than you could possibly look through. In this case,
you need to specialize your search.
If you started with a pattern matching query, you may want to add more keywords.
If you began with a Boolean query, you might want to AND another keyword, or use the
NOT operator to exclude some pages.
If you are still retrieving too many hits, try capitalizing proper nouns or names.
If nothing seems to work, try reviewing the first 20 or so URLs, since search engines list
the best matches near the top. If they do not contain what you are looking for, the
information they do contain may help you refine your search.
If this fails, you could resort to a directory and work your way down to the topic of
interest.