Class Notes

Unit 1:
The Internet was jumpstarted in the early days of computing history, in 1969, with the U.S. Defense Department's Advanced
Research Projects Agency Network (ARPANET). ARPA-funded researchers developed many of the protocols used for
Internet communication today. This timeline offers a brief history of the Internet's evolution:
Internetworking: Concepts, Architecture, and Protocols
There are different kinds of networks for different needs.
The Concept of Universal Service
It can be difficult or impossible to figure out ways for two computers to communicate when they each belong to a
different and incompatible type of network.
However, people naturally want universal service - they want service that allows any pair of computers to
communicate.
Part of the reason universal service is desirable is that it enables people to be more productive.
Universal Service in a Heterogeneous World
One cannot join incompatible networks simply by connecting media together. Even if the media used are the
same, there will be other incompatibilities such as differing frequencies, voltages, and coding techniques.
Nor is it enough to employ bridge technology to connect incompatible networks. Bridges work by forwarding
frames, and frame formats vary dramatically from network to network.
Internetworking
By using a combination of hardware and software, one may form a network of networks - an
internetwork or internet.
Physical Network Connection with Routers
o The basic unit of hardware used to form an internet is a router.
o A router has some characteristics in common with a bridge:
A router contains a CPU and memory - so like a bridge it is a type of special-purpose computer.
A router has connections to two or more networks.
The interfaces used in a router to connect to networks are ordinary network interfaces - they are just like the
interfaces that all other computers use to connect to that type of network.
o Typically a bridge connects two network segments that are exactly the same type of network.
o Unlike a bridge, a router is typically connected to two or more networks that are NOT the same type of network.
For example, a router might connect:
an Ethernet to a WAN, or
an Ethernet to a WiFi network, or
a WiFi network to an IBM Token Ring network.
Internet Architecture
o The job of a router is to forward packets. To forward a packet, a router receives a packet on one of its
network interfaces and transmits it out over another of its network interfaces.
o A router has to process information about a packet in its CPU to decide on which interface to use to
transmit the packet.
o The more interfaces a router has, the more incoming packets it may be required to process per unit of
time.
o To keep their workload manageable, routers are usually built with a fairly small number of network
interfaces.
o It's good to build an internet with redundant links and routers - so that the network is not likely to be
partitioned if just a handful of routers and/or links fail.
o Internet architects must consider:
need for reliability
need for channel capacity
cost constraints
expected traffic
expected performance characteristics of available router hardware
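The notes say a router must process each packet to decide which interface to send it out on, but they do not say how that decision is made. A common technique in IP routers is longest-prefix matching against a forwarding table. The minimal Python sketch below assumes a small, hypothetical table; the prefixes and interface names (eth0, eth1, wlan0, wan0) are made up for illustration.

```python
import ipaddress

# Hypothetical forwarding table: (destination prefix, outgoing interface).
FORWARDING_TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), "eth0"),
    (ipaddress.ip_network("10.1.0.0/16"), "eth1"),
    (ipaddress.ip_network("192.168.1.0/24"), "wlan0"),
    (ipaddress.ip_network("0.0.0.0/0"), "wan0"),  # default route
]


def choose_interface(destination: str) -> str:
    """Pick the outgoing interface using longest-prefix matching."""
    addr = ipaddress.ip_address(destination)
    # Every prefix that contains the destination address is a candidate.
    candidates = [(net, iface) for net, iface in FORWARDING_TABLE if addr in net]
    # The most specific route (largest prefix length) wins; the default route
    # 0.0.0.0/0 matches everything, so the candidate list is never empty.
    _, iface = max(candidates, key=lambda entry: entry[0].prefixlen)
    return iface


if __name__ == "__main__":
    for dst in ["10.1.2.3", "10.9.9.9", "192.168.1.20", "8.8.8.8"]:
        print(dst, "->", choose_interface(dst))
```

For example, 10.1.2.3 matches both 10.0.0.0/8 and 10.1.0.0/16, and the more specific /16 route sends it out of eth1.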
Achieving Universal Service
o The job of an internet is to allow any pair of computers to communicate.
o Therefore the system of routers has to assure that packets are forwarded all the way from the source to the
destination.
o How can this be done, when the frame formats and addressing techniques on the various networks are
hopelessly incompatible?
o It can be done through the use of protocol software.
A Virtual Network
o Through the use of routers and Internet protocol software the Internet creates a virtual network that
overlays the physical networks it joins together.
o Users and application programs are able to ignore the physical networks and just work with the
functionality provided by the virtual network.
Protocols for Internetworking
The protocol family used to construct the Internet is the TCP/IP protocol suite.
Review of TCP/IP Layering
o The IP layer is layer 3 in the TCP/IP networking model.
o IP specifies Internet packet format and mechanisms for forwarding packets.
o TCP is layer 4. It deals with the messages and procedures that ensure reliable transfer.
Host Computers, Routers, and Protocol Layers
o A host computer (also known simply as a host) is a computer that connects to the Internet and runs
applications.
o The design of the Internet requires that host computers and routers execute TCP/IP protocol software.
Router vs. Switch
Feature | Router | Switch
Function | Directs data in a network. Passes data between home computers, and between computers and the modem. | Allows multiple devices to connect; ports can be managed, VLANs can be created, and security can also be applied.
Layer | Network Layer (Layer 3 device). | Data Link Layer. Network switches operate at Layer 2 of the OSI model.
Data transmission form | Packet | Frame (L2 switch); frame and packet (L3 switch)
Transmission type | At the initial level broadcast, then unicast and multicast | First broadcast; then unicast and multicast as needed
Ports | 2/4/8 | 24/48 ports (a switch is a multiport bridge)
Used in (LAN, MAN, WAN) | LAN, WAN | LAN
Device type | Networking device | Active device (with software) and networking device
Table | Stores IP addresses in a routing table and maintains addresses on its own | Uses a content addressable memory (CAM) table, which is typically accessed by ASICs (application-specific integrated circuits)
Transmission mode | Full duplex | Half/full duplex
Broadcast domain | In a router, every port has its own broadcast domain | A switch has one broadcast domain (unless VLANs are implemented)
Definition | A router is a networking device that connects a local network to other local networks. At the distribution layer of the network, routers direct traffic and perform other functions critical to efficient network operation. | A network switch is a computer networking device used to connect many devices together on a computer network. A switch is considered more advanced than a hub because a switch sends a message only to the device that needs or requests it.
Speed | 1-10 Mbps (wireless); 100 Mbps (wired) | 10/100 Mbps, 1 Gbps
Used for | Connecting two or more networks | Connecting two or more nodes in the same network or in different networks
Device category | Intelligent device | Intelligent device
Bandwidth sharing | Dynamic (enables either static or dynamic bandwidth sharing for modular cable interfaces; the default percent-value is 0 and the percent-value range is 1-96) | No sharing; each port can individually run at 10, 100, 1000, or 10000 Mbps
Address used for data transmission | IP address | MAC address
Routing decision | Takes faster routing decisions | Takes more time for complicated routing decisions
NAT (Network Address Translation) | Routers can perform NAT | Switches cannot perform NAT
Faster | In a different network environment (MAN/WAN), a router is faster than an L3 switch | In a LAN environment, an L3 switch is faster than a router (built-in switching hardware)
Features | Firewall, VPN, dynamic handling of bandwidth | Priority range, on/off setting of ports, VLAN, port mirroring
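The table notes that a switch keeps a CAM table and has a single broadcast domain. The toy Python model below shows the usual learning-switch behavior: it learns which port each source MAC address lives on, forwards known unicast frames out of one port, and floods broadcasts or unknown destinations to every other port. The class name and MAC strings are hypothetical, and a dictionary stands in for the CAM table that a real switch looks up in hardware (the ASIC).

```python
from __future__ import annotations

BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy Layer 2 switch; a dict stands in for the CAM table."""

    def __init__(self, num_ports: int):
        self.ports = list(range(num_ports))
        self.mac_table: dict[str, int] = {}  # MAC address -> port

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the ports the frame is sent out of."""
        # Learn: the sender is reachable through the port the frame came in on.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the same segment: filter the frame
        return [out_port]  # known unicast: exactly one output port


if __name__ == "__main__":
    sw = LearningSwitch(4)
    print(sw.receive(0, "00:00:00:00:00:0a", "00:00:00:00:00:0b"))  # flooded
    print(sw.receive(1, "00:00:00:00:00:0b", "00:00:00:00:00:0a"))  # learned
```

The first frame to ...:0b is flooded because that address has not been learned yet; once ...:0b replies from port 1, later frames to it go out of port 1 only.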
Protocols
When two humans converse, they may have to use the same language but they generally understand each other without
having to adhere to rigid rules of grammar or formal language frameworks. Computers, on the other hand, have to have
everything explicitly defined and structured. If computers wish to communicate with one another, they have to know in
advance exactly how information is to be exchanged and precisely what the format will be. Therefore, standard methods
of transmitting and processing various kinds of information are used and these methods are called "protocols". Protocols
are established by international agreement and ensure that computers everywhere can talk to one another. There are a
variety of protocols for different kinds of information and functions. This article will discuss some of the common
protocols that the average PC user is likely to encounter.
TCP/IP
TCP (Transmission Control Protocol) and IP (Internet Protocol) are two different procedures that are often linked
together. The linking of several protocols is common since the functions of different protocols can be
complementary so that together they carry out some complete task. The combination of several
protocols to carry out a particular task is often called a "stack" because it has layers of operations. In
fact, the term "TCP/IP" is normally used to refer to a whole suite of protocols, each with different
functions. This suite of protocols is what carries out the basic operations of the Web. TCP/IP is also used
on many local area networks. The details of how the Web works are beyond the scope of this article but I
will briefly describe some of the basics of this very important group of protocols. More details can be
found in the references in the last section.
When information is sent over the Internet, it is generally broken up into smaller pieces or "packets". The use of packets
facilitates speedy transmission since different parts of a message can be sent by different routes and then reassembled
at the destination. It is also a safety measure to minimize the chances of losing information in the transmission process.
TCP is the means for creating the packets, putting them back together in the correct order at the end, and checking to
make sure that no packets got lost in transmission. If necessary, TCP will request that a packet be resent.
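The packetizing, reordering, and loss-detection steps described above can be illustrated with a simplified Python sketch. It is not real TCP (no headers, no handshake, no timers); like TCP, though, it numbers each chunk by its byte offset so the receiver can sort chunks back into order and notice gaps. The function names are hypothetical.

```python
from __future__ import annotations

import random


def segment(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split a message into (byte offset, chunk) pairs."""
    return [(offset, message[offset:offset + size])
            for offset in range(0, len(message), size)]


def missing_offsets(received: list[tuple[int, bytes]], length: int, size: int) -> list[int]:
    """Offsets the receiver expected but never got (candidates for a resend)."""
    expected = set(range(0, length, size))
    return sorted(expected - {offset for offset, _ in received})


def reassemble(received: list[tuple[int, bytes]]) -> bytes:
    """Put chunks back in order, ignoring duplicate retransmissions."""
    by_offset = dict(received)
    return b"".join(by_offset[offset] for offset in sorted(by_offset))


if __name__ == "__main__":
    msg = b"Hello, Internet!"
    packets = segment(msg, 5)
    random.shuffle(packets)  # packets may arrive out of order
    arrived = packets[1:]    # ...and one of them may be lost
    lost = missing_offsets(arrived, len(msg), 5)
    print("ask sender to resend offsets:", lost)
    resent = [p for p in segment(msg, 5) if p[0] in lost]
    print(reassemble(arrived + resent))
```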
Internet Protocol (IP) is the method used to route information to the proper address. Every computer on the Internet
has to have its own unique address known as the IP address. Every packet sent will contain an IP address showing where
it is supposed to go. A packet may go through a number of computer routers before arriving at its final destination and
IP controls the process of getting everything to the designated computer. Note that IP does not set up connections
between computers; it relies on TCP for this function. IP is also used in conjunction with other protocols
that create connections.
UDP and ICMP
Another member of the TCP/IP suite is User Datagram Protocol (UDP). (A datagram is almost the same as a packet
except that sometimes a packet will contain more than one datagram.) This protocol is used together with IP when small
amounts of information are involved. It is simpler than TCP and lacks the flow-control and error-recovery functions of
TCP. Thus, it uses fewer system resources.
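To make the contrast with TCP concrete, here is a minimal Python sketch using the standard socket module: a UDP datagram is sent to a local socket and echoed back, with no connection setup beforehand. Everything runs on the loopback address 127.0.0.1, and the function name is hypothetical. UDP itself would not retransmit a lost datagram; the timeout only keeps the demo from hanging.

```python
import socket


def udp_round_trip(payload: bytes) -> bytes:
    """Send one datagram to a local UDP 'server' and get it echoed back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        server.settimeout(2.0)
        client.settimeout(2.0)
        # No handshake: the client just addresses a datagram and sends it.
        client.sendto(payload, server.getsockname())
        data, client_addr = server.recvfrom(4096)
        server.sendto(data, client_addr)  # echo the datagram back
        reply, _ = client.recvfrom(4096)
        return reply


if __name__ == "__main__":
    print(udp_round_trip(b"ping"))
```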
A different type of protocol is Internet Control Message Protocol (ICMP). It defines a small number of messages used for
diagnostic and management purposes. It is also used by Ping and Traceroute.
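Ping works by sending ICMP Echo Request messages (type 8, code 0) and waiting for Echo Replies. The Python sketch below only builds the message bytes, including the 16-bit one's-complement Internet checksum described in RFC 1071. Actually sending the message needs a raw socket, which usually requires administrator privileges, so that part is left out. The function names and the example identifier are hypothetical.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (RFC 1071), as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold any carry bits back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """ICMP Echo Request: type 8, code 0, checksum, identifier, sequence, data."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)  # checksum = 0 first
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload


if __name__ == "__main__":
    packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"abcd")
    print(packet.hex())
```

A receiver can verify the message by running the same checksum over the whole packet: a correct packet sums to zero.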
Mail Protocols POP3 and SMTP
Email requires its own set of protocols and there are a variety, both for sending and for receiving mail. The most
common protocol for sending mail is Simple Mail Transfer Protocol (SMTP). When configuring email clients, an Internet
address for an SMTP server must be entered. The most common protocol used by PCs for receiving mail is Post Office
Protocol (POP). It is now in version 3, so it is called POP3. Email clients require an address for a POP3 server before they
can read mail. The SMTP and POP3 servers may or may not be the same address. Both SMTP and POP3 use TCP for
managing the transmission and delivery of mail across the Internet.
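The notes name SMTP without showing what the client actually sends. The Python sketch below lists the basic client commands of a simplified SMTP session (HELO, MAIL FROM, RCPT TO, DATA, QUIT). The addresses and client host name are hypothetical. The message also omits the header lines (From, To, Subject) a real email would carry, and server replies are not modeled. In practice, Python's smtplib module issues these commands for you.

```python
from __future__ import annotations


def smtp_client_commands(sender: str, recipients: list[str], body: str) -> list[str]:
    """Client side of a simplified SMTP session, one entry per thing sent."""
    commands = ["HELO client.example.com", f"MAIL FROM:<{sender}>"]
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    commands.append("DATA")
    # Dot-stuffing: a body line starting with "." gets an extra "." so the
    # server does not mistake it for the end-of-message marker.
    lines = ["." + line if line.startswith(".") else line for line in body.split("\n")]
    # The message text ends with a line containing only "."
    commands.append("\r\n".join(lines) + "\r\n.")
    commands.append("QUIT")
    return commands


if __name__ == "__main__":
    for cmd in smtp_client_commands("alice@example.com", ["bob@example.org"], "Hi Bob"):
        print(repr(cmd))
```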
A more powerful protocol for reading mail is Internet Message Access Protocol (IMAP; originally Interactive Mail Access
Protocol). This protocol allows for the reading of individual mailboxes at a single account and is more common in business
environments. IMAP also uses TCP to manage the actual transmission of mail.
Hypertext Transfer Protocol
Web pages are constructed according to a standard method called Hypertext Markup Language (HTML). An HTML page
is transmitted over the Web in a standard way and format known as Hypertext Transfer Protocol (HTTP). This protocol
uses TCP/IP to manage the Web transmission.
A related protocol is "Hypertext Transfer Protocol over Secure Socket Layer" (HTTPS), first introduced by Netscape. It
provides for transmission in encrypted form to provide security for sensitive data. A Web page using this protocol
will have https: at the front of its URL.
File Transfer Protocol
File Transfer Protocol (FTP) lives up to its name and provides a method for copying files over a network
from one computer to another. More generally, it provides for some simple file management on
the contents of a remote computer. It is an old protocol and is used less than it was before the
World Wide Web came along. Today, its primary use is uploading files to a Web site. It can also
be used for downloading from the Web but, more often than not, downloading is done via
HTTP. Sites that have a lot of downloading (software sites, for example) will often have an FTP
server to handle the traffic. If FTP is involved, the URL will have ftp: at the front.
IP Addresses
Every computer connected to the Internet is identified by a unique four-part string, known as its Internet Protocol (IP)
address. An IP address consists of four numbers (each between 0 and 255) separated by periods. For example, one
machine at MIT has the IP address 18.72.0.3.
At MIT, most machines have IP addresses beginning with "18". The "18" signifies the main MIT network, whereas the
later numbers identify the specific machine. (At other sites, the first two parts of the IP address identify the network, while
the last two parts identify the computer within that network.)
While we tend to think of the IP address as four numbers separated by periods, the whole string actually represents a single
32-bit number; "dotted decimal" is the notation used to write it. This is why each part can only go up to 255: each part - or
"octet" - is the decimal representation of an 8-bit binary number.
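The arithmetic behind "a single 32-bit number" is easy to check in code. The Python sketch below (function names hypothetical) packs the four octets into one integer by shifting each one 8 bits, unpacks it again, and splits an address into network and host parts at an octet boundary, matching the MIT case (first part) and the "other sites" case (first two parts) described above. The standard ipaddress module is used only to cross-check the result.

```python
from __future__ import annotations

import ipaddress


def dotted_to_int(address: str) -> int:
    """Combine the four octets of a dotted-decimal address into one 32-bit number."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-decimal IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # each octet contributes 8 more bits
    return value


def int_to_dotted(value: int) -> str:
    """Split a 32-bit number back into four 8-bit octets."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def split_network_host(address: str, network_octets: int) -> tuple[str, str]:
    """Split an address into (network part, host part) at an octet boundary."""
    parts = address.split(".")
    return ".".join(parts[:network_octets]), ".".join(parts[network_octets:])


if __name__ == "__main__":
    n = dotted_to_int("18.72.0.3")
    print(n, format(n, "032b"))
    print(int(ipaddress.IPv4Address("18.72.0.3")) == n)  # the standard library agrees
    print(split_network_host("18.72.0.3", 1))  # MIT style: network "18"
```

Worked by hand, 18.72.0.3 is 18 x 2^24 + 72 x 2^16 + 0 x 2^8 + 3 = 301,989,888 + 4,718,592 + 0 + 3 = 306,708,483.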
Host Names and Domain Names
Since IP addresses are rather difficult to remember (and are not particularly descriptive), the Internet also allows you to
specify a computer by a name rather than a number string. For example, the machine at MIT with the IP address 18.72.0.3
can also be referred to as: bitsy.mit.edu.
This whole string is known as the computer's host name. In this string, the first part ("bitsy") is the name of the machine
itself, while everything else ("mit.edu") is the domain name.
The domain name is the name of a network associated with an organization. For sites in the United States, domain names
typically take the form: org-name.org-type
The org-type is usually one of the following:
com indicates a commercial organization (e.g., a company)
edu indicates an educational organization
org indicates a general (often non-commercial) organization
gov indicates a U.S. government agency
mil indicates a U.S. military site
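Splitting a host name into the parts described above is simple string work. The Python sketch below (function name hypothetical) separates the machine name from the domain name and looks up the org-type using the list above. Turning a name into an IP address is a separate DNS lookup. In Python, socket.gethostbyname does this, but it needs network access, and an old name like bitsy.mit.edu may no longer resolve, so it is not called here.

```python
from __future__ import annotations

# The org-type suffixes listed above.
ORG_TYPES = {
    "com": "commercial organization",
    "edu": "educational organization",
    "org": "general (often non-commercial) organization",
    "gov": "U.S. government agency",
    "mil": "U.S. military site",
}


def describe_host(host_name: str) -> dict[str, str]:
    """Split a host name into machine name, domain name, and org-type."""
    machine, _, domain = host_name.partition(".")
    org_type = domain.rsplit(".", 1)[-1]
    return {
        "machine": machine,
        "domain": domain,
        "org_type": ORG_TYPES.get(org_type, "unknown"),
    }


if __name__ == "__main__":
    print(describe_host("bitsy.mit.edu"))
```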
The World Wide Web (abbreviated WWW or the Web) is an information space where documents and
other web resources are identified by Uniform Resource Locators (URLs), interlinked by
hypertext links, and can be accessed via the Internet. English scientist Tim Berners-Lee invented
the World Wide Web in 1989. He wrote the first web browser computer program in 1990 while
employed at CERN in Switzerland. The Web browser was released outside of CERN in 1991, first
to other research institutions starting in January 1991 and to the general public on the Internet
in August 1991.
Architecture and Working of a Web Browser
The browser's main functionality is to fetch files from the server and display them on the screen. It basically
displays HTML files containing images, PDFs, videos, Flash content, etc. in an ordered layout. A browser is a group of structured
code that performs many tasks to display a web page on the screen. This code is separated into different
components according to the tasks they perform. The structure of a browser is shown in the image below.
User Interface: It is the space where interaction between users and the browser occurs. Most browsers have
common inputs for the user interface. Some of them are an address bar, next and back buttons, buttons for home, refresh and
stop, options to bookmark web pages, etc.
Browser Engine: It is the piece of code that passes inputs from the user interface to the rendering engine. It is
responsible for querying and manipulating the rendering engine according to the inputs from the various user interface elements.
Rendering Engine: It is the part entirely responsible for displaying the requested content on the screen. It first
parses the HTML tags and then, using the styles, builds a render tree and finally a render layout, which displays the content
on the screen.
Process HTML markup and build the DOM tree.
Process CSS markup and build the CSSOM tree.
Combine the DOM and CSSOM into a render tree.
Run layout on the render tree to compute geometry of each node.
Paint the individual nodes to the screen.
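Step 1 of the pipeline above, building the DOM tree, can be sketched with Python's standard html.parser module. This is only an illustration under simplifying assumptions. Real browsers follow the much more forgiving HTML5 parsing algorithm, and the Node, DomBuilder, build_dom, and dump names here are hypothetical. Later steps (CSSOM, render tree, layout, paint) are not shown.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag: str, parent: Node | None = None, text: str = ""):
        self.tag = tag  # element name, or "#text" for a text node
        self.text = text
        self.parent = parent
        self.children: list[Node] = []


class DomBuilder(HTMLParser):
    """Builds a simple tree of Nodes from start tags, end tags, and text."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_ELEMENTS:
            self.current = node  # later content goes inside this element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this name and close it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(Node("#text", self.current, data.strip()))


def build_dom(markup: str) -> Node:
    builder = DomBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def dump(node: Node, depth: int = 0) -> list[str]:
    """Indented outline of the tree, one line per node."""
    label = node.text if node.tag == "#text" else node.tag
    lines = ["  " * depth + label]
    for child in node.children:
        lines += dump(child, depth + 1)
    return lines


if __name__ == "__main__":
    tree = build_dom("<html><body><h1>Title</h1><p>Hi<br/>there</p></body></html>")
    print("\n".join(dump(tree)))
```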
Networking: The part of the browser's code responsible for making network calls, for example sending HTTP
requests to the server.
JavaScript Interpreter: The component of the browser that interprets the JavaScript code present in a web
page.
UI Backend: This draws basic widgets on the browser, such as combo boxes and windows.
Data Storage: A small database created on the local drive of the computer where the browser is installed. This
database stores various data such as the cache, cookies, etc.
Overview
A web server is a computer on which web content is stored. Web servers are mainly used to host web sites, but other
kinds of servers also exist, such as gaming, storage, FTP, and email servers.
A web site is a collection of web pages, while a web server is the software that responds to requests for web resources.
Web Server Working
A web server responds to a client request in one of the following two ways:
Sending the client the file associated with the requested URL.
Generating a response by invoking a script and communicating with a database.
Key Points
When a client sends a request for a web page, the web server searches for the requested page. If the page is found,
the server sends it to the client in an HTTP response.
If the requested web page is not found, the web server sends the HTTP response 404 Not Found.
If the client has requested some other resource, the web server contacts the application server and data
store to construct the HTTP response.
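The first two key points (serve the page if it is found, otherwise answer 404 Not Found) can be sketched with Python's standard http.server and http.client modules. The page table, handler, and helper names are hypothetical. A production server would read files from disk or call an application server instead of looking pages up in a dictionary.

```python
from __future__ import annotations

import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: URL path -> (content type, body).
PAGES = {"/index.html": ("text/html", b"<h1>Welcome</h1>")}


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        page = PAGES.get(self.path)
        if page is None:
            self.send_error(404, "Not Found")  # requested page does not exist
            return
        content_type, body = page
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_server() -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port: int, path: str) -> tuple[int, bytes]:
    """Client side: send a GET request and return (status code, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


if __name__ == "__main__":
    srv = start_server()
    port = srv.server_address[1]
    print(fetch(port, "/index.html"))
    print(fetch(port, "/missing.html")[0])
    srv.shutdown()
    srv.server_close()
```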
Architecture
Web server architecture follows one of two approaches:
1. Concurrent Approach
2. Single-Process-Event-Driven Approach.
Concurrent Approach
The concurrent approach allows the web server to handle multiple client requests at the same time. It can be achieved by
the following methods:
Multi-process
Multi-threaded
Hybrid method.
Multi-processing
In this approach, a single process (the parent process) initiates several single-threaded child processes and distributes
incoming requests to these child processes. Each child process is responsible for handling a single request.
It is the responsibility of the parent process to monitor the load and decide whether processes should be killed or forked.
Multi-threaded
Unlike the multi-process approach, it creates multiple threads within a single process, and each thread handles a single request.
Hybrid
It is a combination of the above two approaches. In this approach, multiple processes are created and each process initiates
multiple threads. Each thread handles one connection. Using multiple threads in a single process results in less load
on system resources.
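Python's standard socketserver module offers both styles described above. ThreadingTCPServer starts a new thread per connection (the multi-threaded approach). ForkingTCPServer, available only on POSIX systems, forks a child process per connection (the multi-process approach). The sketch below uses the threaded variant with a hypothetical line-echo handler that reports which thread served each connection.

```python
from __future__ import annotations

import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        line = self.rfile.readline()
        # Tag the reply with the name of the thread that handled this connection.
        name = threading.current_thread().name
        self.wfile.write(name.encode() + b": " + line)


def start_threaded_server() -> socketserver.ThreadingTCPServer:
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
    server.daemon_threads = True  # don't keep the program alive for handler threads
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_line(port: int, text: str) -> str:
    """Client side: send one line and return the server's one-line reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(text.encode() + b"\n")
        with sock.makefile("rb") as reply:
            return reply.readline().decode().rstrip("\n")


if __name__ == "__main__":
    srv = start_threaded_server()
    port = srv.server_address[1]
    print(send_line(port, "hello"))
    print(send_line(port, "again"))  # served by a different thread
    srv.shutdown()
    srv.server_close()
```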
CGI: CGI is the abbreviation of Common Gateway Interface. It is a specification for transferring information between a
World Wide Web server and a CGI program. A CGI program is any program designed to accept and return data that
conforms to the CGI specification. The program could be written in any programming language, including C, Perl, Java,
or Visual Basic.
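A minimal CGI program, sketched in Python under the usual CGI conventions: the web server passes request data in environment variables such as QUERY_STRING, and the program writes a header block, a blank line, and then the body to standard output. The name parameter and the function name are hypothetical. The sketch parses the query string with urllib.parse rather than Python's old cgi module, which was removed in Python 3.13.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def cgi_response(environ) -> str:
    """Build the program's output: a header, a blank line, then the HTML body."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    body = f"<html><body><p>Hello, {html.escape(name)}!</p></body></html>"
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server sets QUERY_STRING (and other variables) before running us.
    sys.stdout.write(cgi_response(os.environ))
```

Because the server starts a fresh process for every request that runs this program, it also illustrates the performance problem described below.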
CGI Programs
CGI programs are the most common way for Web servers to interact dynamically with users. Many HTML pages that
contain forms, for example, use a CGI program to process the form's data once it's submitted. Another increasingly
common way to provide dynamic feedback for Web users is to include scripts or programs that run on the user's
machine rather than the Web server. These programs can be Java applets, JavaScript programs, or ActiveX controls. These
technologies are known collectively as client-side solutions, while the use of CGI is a server-side solution because the
processing occurs on the Web server.
One problem with CGI is that each time a CGI script is executed, a new process is started. For busy websites, this can
slow down the server noticeably. A more efficient solution, but one that is also more difficult to implement, is to use
the server's API, such as ISAPI or NSAPI. Another increasingly popular solution is to use Java servlets.
N tier architecture.
A multitier architecture (often referred to as n-tier architecture) or multilayered architecture is a client-server
architecture in which presentation, application processing, and data management functions are physically separated.
The most widespread use of multitier architecture is the three-tier architecture.
N-tier application architecture provides a model by which developers can create flexible and reusable applications. By
segregating an application into tiers, developers acquire the option of modifying or adding a specific layer, instead of
reworking the entire application. A three-tier architecture is typically composed of a presentation tier, a domain
logic tier, and a data storage tier.
Common layers
In a logical multilayered architecture for an information system with an object-oriented design, the following four are
the most common:
Presentation layer (a.k.a. UI layer, view layer, presentation tier in multitier architecture)
Application layer (a.k.a. service layer[5][6] or GRASP Controller Layer [7])
Business layer (a.k.a. business logic layer (BLL), domain layer)
Data access layer (a.k.a. persistence layer, logging, networking, and other services which are required to support
a particular business layer)
Web development usage
In the web development field, three-tier is often used to refer to websites, commonly electronic commerce websites,
which are built using three tiers:
1. A front-end web server serving static content, and potentially some cached dynamic content. In a web-based
application, the front end is the content rendered by the browser. The content may be static or generated dynamically.
2. A middle dynamic content processing and generation level application server (e.g., ASP.NET, Ruby on
Rails, Django, Laravel, Spring Framework, CodeIgniter, Symfony, Flask)
3. A back-end database or data store, comprising both data sets and the database management system software
that manages and provides access to the data.
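The separation of tiers can be sketched in a few Python functions. In the sketch below, an in-memory SQLite database stands in for the back-end data store, a pricing function plays the application tier, and an HTML renderer plays the presentation tier. All names, products, and the discount rule are hypothetical. In a real three-tier system each tier would run on separate servers; here they share one process only to keep the example runnable.

```python
from __future__ import annotations

import html
import sqlite3


# Data tier: an in-memory SQLite database stands in for the back-end data store.
def create_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price_cents INTEGER)")
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [("Keyboard", 2999), ("Mouse", 1599)])
    return conn


def fetch_products(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    return conn.execute("SELECT name, price_cents FROM products ORDER BY name").fetchall()


# Application (logic) tier: business rules, with no SQL and no HTML.
def price_list(conn: sqlite3.Connection, discount_percent: int = 0) -> list[tuple[str, int]]:
    return [(name, cents * (100 - discount_percent) // 100)
            for name, cents in fetch_products(conn)]


# Presentation tier: turns results into HTML for the browser.
def render(rows: list[tuple[str, int]]) -> str:
    items = "".join(f"<li>{html.escape(name)}: ${cents / 100:.2f}</li>" for name, cents in rows)
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    store = create_store()
    print(render(price_list(store, discount_percent=10)))
```

Changing the discount rule touches only price_list, and swapping SQLite for another database touches only the data-tier functions, which is the flexibility the notes attribute to n-tier designs.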
Web services (sometimes called application services) are services (usually including some combination of programming
and data, but possibly including human resources as well) that are made available from a business's Web server for Web
users or other Web-connected programs. Providers of Web services are generally known as application service
providers. Web services range from such major services as storage management and customer relationship management (CRM)
down to much more limited services such as the furnishing of a stock quote and the checking of bids for an auction item.
The accelerating creation and availability of these services is a major Web trend.
Users can access some Web services through a peer-to-peer arrangement rather than by going to a central server. Some
services can communicate with other services and this exchange of procedures and data is generally enabled by a class
of software known as middleware. Services previously possible only with the older standardized service known as
Electronic Data Interchange (EDI) increasingly are likely to become Web services. Besides the standardization and wide
availability to users and businesses of the Internet itself, Web services are also increasingly enabled by the use of the
Extensible Markup Language (XML) as a means of standardizing data formats and exchanging data. XML is the
foundation for the Web Services Description Language (WSDL).
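Since XML is named as the common data format, here is a small Python sketch of a stock-quote exchange (one of the example services above) using the standard xml.etree.ElementTree module. The element and attribute names (quote, price, symbol, currency) and the sample values are hypothetical. A real service described by WSDL would define its own message schema, often wrapped in a SOAP envelope.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def quote_to_xml(symbol: str, price: str) -> str:
    """Service side: serialize a stock quote as XML text."""
    root = ET.Element("quote", symbol=symbol)
    ET.SubElement(root, "price", currency="USD").text = price
    return ET.tostring(root, encoding="unicode")


def xml_to_quote(document: str) -> tuple[str | None, str | None]:
    """Client side: read the symbol and price back out of the XML."""
    root = ET.fromstring(document)
    return root.get("symbol"), root.findtext("price")


if __name__ == "__main__":
    doc = quote_to_xml("ABC", "12.34")
    print(doc)
    print(xml_to_quote(doc))
```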
As Web services proliferate, concerns include the overall demands on network bandwidth and, for any particular service,
the effect on performance as demands for that service rise. A number of new products have emerged that enable
software developers to create or modify existing applications that can be "published" (made known and potentially
accessible) as Web services.
URL (Uniform Resource Locator), as the name suggests, provides a way to locate a resource on the web, the hypertext
system that operates over the internet. The URL contains the name of the protocol to be used to access the resource
and a resource name. The first part of a URL identifies what protocol to use. The second part identifies the IP address or
domain name where the resource is located.
A URL has two main components:
Protocol identifier: For the URL http://example.com, the protocol identifier is http.
Resource name: For the URL http://example.com, the resource name is example.com.
Note that the protocol identifier and the resource name are separated by a colon and two forward slashes. The protocol
identifier indicates the name of the protocol to be used to fetch the resource. The example uses the Hypertext Transfer
Protocol (HTTP), which is typically used to serve up hypertext documents. HTTP is just one of many different protocols
used to access different types of resources on the net. Other protocols include File Transfer Protocol (FTP), Gopher, File,
and News.
The resource name is the complete address to the resource. The format of the resource name depends entirely on the
protocol used, but for many protocols, including HTTP, the resource name contains one or more of the following
components:
Host Name
The name of the machine on which the resource lives.
Filename
The pathname to the file on the machine.
Port Number
The port number to which to connect (typically optional).
Reference
A reference to a named anchor within a resource that usually identifies a specific location within a file (typically
optional).
For many protocols, the host name and the filename are required, while the port number and reference are optional.
For example, the resource name for an HTTP URL must specify a server on the network (Host Name) and the path to the
document on that machine (Filename); it also can specify a port number and a reference.
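Python's standard urllib.parse.urlsplit function breaks a URL into exactly these pieces. In the sketch below, the example URL (with port 8080 and an #intro reference) is made up for illustration, and the dictionary keys simply reuse the terms from the notes.

```python
from urllib.parse import urlsplit


def url_components(url: str) -> dict:
    """Map the URL parts described above onto urlsplit's fields."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host name": parts.hostname,
        "port number": parts.port,  # None when the URL omits it
        "filename": parts.path,
        "reference": parts.fragment,
    }


if __name__ == "__main__":
    print(url_components("http://example.com:8080/docs/index.html#intro"))
```

For http://example.com, the port number comes back as None and the filename and reference are empty strings, reflecting that those parts are optional.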
HTTP is short for HyperText Transfer Protocol. HTTP is the underlying protocol used by the World Wide Web and this
protocol defines how messages are formatted and transmitted, and what actions Web servers and browsers should take
in response to various commands.
For example, when you enter a URL in your browser, this actually sends an HTTP command to the Web server directing it
to fetch and transmit the requested Web page. The other main standard that controls how the World Wide Web works
is HTML, which covers how Web pages are formatted and displayed.
Basic Features
There are three basic features that make HTTP a simple but powerful protocol:
HTTP is connectionless: the HTTP client, i.e., a browser, opens a connection and sends an HTTP request; the server
processes the request and sends a response back, after which the connection can be closed. No connection has to be kept
open between one request and the next.
HTTP is media independent: any type of data can be sent by HTTP as long as both the client and the
server know how to handle the data content. Both the client and the server are required to specify the content type
using an appropriate MIME type.
HTTP is stateless: as mentioned above, HTTP is connectionless, which goes hand in hand with HTTP being a stateless
protocol. The server and client are aware of each other only during the current request. Afterwards, both of them forget
about each other. Due to this nature of the protocol, neither the client nor the server retains information between
different requests across web pages.
HTTP/1.0 uses a new connection for each request/response exchange, whereas an HTTP/1.1 connection may be used for
one or more request/response exchanges.
Request-response model
Whenever a developer enters the web development world, the first things they come to know about are the client and the
server: the client makes a request and the server sends a response back to the client. So let's first discuss how communication
basically takes place in the web world and what exactly the terms client and server mean.
Client: A client is basically something (a web browser) or someone (a user) that requests some resource from a server.
Server: A server is a combination of a hardware machine and a number of software programs running on that machine. The one
and only duty of a server is to serve the resources requested by the client.
Servers by themselves are capable of serving static resources only. To serve dynamic responses, servers need some extra
technologies, such as servlets, running on them.
How servlets help the server serve dynamic content is covered in a later part of this blog.
Now, to get the client and server in contact and make communication possible, we need a set of rules called
HTTP (Hypertext Transfer Protocol). On the web there are a number of protocols other than HTTP that get the communication
work done, but in the vast majority of applications the requests being made are HTTP requests.
HTTP: HTTP can be thought of as a common interface of interaction that both client and server understand.
Request/Response cycle
The web flow starts with a request made by the user's browser. The request is made as an HTTP request so that the server can
understand it. Based on the request, the server searches for an appropriate resource and sends it back to the client in the form
of an HTTP response.
HTTP Request: An HTTP request basically has three major components.
1 - The HTTP method. There are 7 methods defined in Java servlets, but most of the time you will see either the GET or the POST
method. We will get to know these methods and their usage in a later part of this blog.
2 - The requested page URL, i.e., the page to access, like www.google.com.
3 - Parameters. Parameters (such as id, name, email, etc.) are sent as part of the request, and the response is
generated based on them.
HTTP Response: An HTTP response basically has three major components.
1 - A status code, which tells the browser whether the request was successful or not.
2 - Content type, which tells the browser the type of content the response contains (text, picture,
HTML, etc.).
3 - The content, the last and most important piece of information: the served resource that the user requested.
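The three request components and three response components can be seen directly in the raw text of HTTP/1.1 messages. The Python sketch below builds a GET request with the parameters encoded into the URL's query string, and pulls the status code, content type, and content out of a raw response. The function names, host, and parameters are hypothetical. With POST, the parameters would travel in the request body instead; that case, and error handling for malformed messages, are left out.

```python
from __future__ import annotations

from urllib.parse import urlencode


def build_get_request(host: str, path: str, params: dict) -> str:
    """Method + URL (with parameters in the query string) + headers."""
    query = urlencode(params)
    target = f"{path}?{query}" if query else path
    return (f"GET {target} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n")  # a blank line ends the headers


def parse_response(raw: str) -> tuple[int, str, str]:
    """Return (status code, content type, content) from a raw response."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ")[1])  # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers.get("content-type", ""), body


if __name__ == "__main__":
    print(build_get_request("www.example.com", "/search", {"id": 7, "name": "Ada Lovelace"}))
    print(parse_response("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>Hi</p>"))
```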
HTTPS: This is Hypertext Transfer Protocol with an added security layer in the form of TLS/SSL. Servers and clients
communicate with each other exactly as in HTTP, but over a secure channel.
The SSL layer serves two main purposes:
1) Using HTTPS confirms that you are talking directly to the server you think you are talking to.
2) It also ensures that only the server can read the data you send over the network. No one else can read it.
An SSL connection between client and server is established by a handshake, which focuses on the following:
1) Making sure that the client is talking to the right server.
2) Both parties agreeing on a 'cipher', which includes the encryption they will use to exchange data.
3) Both parties agreeing on a key for this algorithm.
As soon as the connection is established, both parties can use the agreed algorithm and keys to securely send messages to each
other.
===================================================================================
Unit 2: HTML
HTML is the standard markup language for creating Web pages.
HTML stands for Hyper Text Markup Language
HTML describes the structure of Web pages using markup
HTML elements are the building blocks of HTML pages
HTML elements are represented by tags
HTML tags label pieces of content such as "heading", "paragraph", "table", and so on
Browsers do not display the HTML tags, but use them to render the content of the page
An HTML element is defined by a start tag. If the element contains other content, it ends with a closing tag, in which
the element name is preceded by a forward slash, as shown below for a few tags:
Start Tag   Content                       End Tag
<p>         This is paragraph content.    </p>
<h1>        This is heading content.      </h1>
<div>       This is division content.     </div>
<br />
So here <p>....</p> is an HTML element, <h1>...</h1> is another HTML element. There are some HTML elements
which don't need to be closed, such as <img.../>, <hr /> and <br /> elements. These are known as void elements.
HTML documents consist of a tree of these elements and they specify how HTML documents should be built,
and what kind of content should be placed in what part of an HTML document.
HTML Tag vs. Element
An HTML element is defined by a start tag. If the element contains other content, it ends with a closing tag.
For example, <p> is the start tag of a paragraph and </p> is the closing tag of the same paragraph, but <p>This is
paragraph</p> is a paragraph element.
Nested HTML Elements
It is perfectly valid to place one HTML element inside another, for example <p>This is <b>bold</b> text.</p>
XHTML
XHTML syntax is very similar to HTML syntax, and almost all valid HTML elements are valid in XHTML as well. But
when you write an XHTML document, you need to pay a bit more attention to make it compliant with XHTML.
Here are the important points to remember while writing a new XHTML document or converting an existing HTML
document into XHTML:
Write a DOCTYPE declaration at the start of the XHTML document.
Write all XHTML tags and attributes in lower case only.
Close all XHTML tags properly.
Nest all the tags properly.
Quote all the attribute values.
Do not use attribute minimization.
Replace the name attribute with the id attribute.
Do not use the deprecated language attribute of the script tag.
Here is a detailed explanation of the above XHTML rules.
DOCTYPE Declaration
All XHTML documents must have a DOCTYPE declaration at the start. There are three types of DOCTYPE declarations,
which are discussed in detail in the XHTML Doctypes chapter. Here is an example of using DOCTYPE:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Case Sensitivity
XHTML is a case-sensitive markup language. All XHTML tags and attributes must be written in lower case only.
<!-- This is invalid in XHTML -->
<A Href="/xhtml/xhtml_tutorial.html">XHTML Tutorial</A>
<!-- This is the correct XHTML way of writing it -->
<a href="/xhtml/xhtml_tutorial.html">XHTML Tutorial</a>
In the first example, the Href attribute and the anchor tag A are not in lower case, so it is invalid.
Closing the Tags
Each and every XHTML tag must have a matching closing tag; even empty elements must be closed.
Here is an example showing invalid ways of using tags:
<!-- These tags are not closed, which is invalid in XHTML -->
<p>This paragraph is not written according to XHTML syntax.
<img src="/images/xhtml.gif" >
The following shows the correct way of writing the above tags in XHTML. The difference is that both tags are now
closed properly.
<!-- Both tags are closed properly -->
<p>This paragraph is written according to XHTML syntax.</p>
<img src="/images/xhtml.gif" />
Attribute Quotes
All XHTML attribute values must be quoted. Otherwise, the document is considered invalid XHTML. Here is an example
showing both forms:
<!-- Invalid in XHTML: unquoted attribute values -->
<img src="/images/xhtml.gif" width=250 height=50 />
<!-- Valid XHTML -->
<img src="/images/xhtml.gif" width="250" height="50" />
Attribute Minimization
XHTML does not allow attribute minimization. This means you must state both the attribute and its value explicitly.
The following example shows the difference:
<!-- Invalid in XHTML -->
<option selected>
<!-- Valid XHTML -->
<option selected="selected">
Here is a list of the minimized attributes in HTML and the way you need to write them in XHTML
HTML Style XHTML Style
compact compact="compact"
checked checked="checked"
declare declare="declare"
readonly readonly="readonly"
disabled disabled="disabled"
selected selected="selected"
defer defer="defer"
ismap ismap="ismap"
nohref nohref="nohref"
noshade noshade="noshade"
nowrap nowrap="nowrap"
multiple multiple="multiple"
noresize noresize="noresize"
The id Attribute
The id attribute replaces the name attribute. Instead of using name="name", XHTML prefers id="id". The following
example shows how:
<!-- Deprecated in XHTML -->
<img src="/images/xhtml.gif" name="xhtml_logo" />
<!-- Preferred in XHTML -->
<img src="/images/xhtml.gif" id="xhtml_logo" />
The language Attribute
The language attribute of the script tag is deprecated. The following example shows this difference:
<!-- Deprecated: uses the language attribute -->
<script language="JavaScript" type="text/javascript">
document.write("Hello XHTML!");
</script>
<!-- Preferred: uses only the type attribute -->
<script type="text/javascript">
document.write("Hello XHTML!");
</script>
Nested Tags
You must nest all XHTML tags properly. Otherwise, your document is considered an incorrect XHTML document. The
following example shows the syntax:
<!-- Invalid: the tags overlap instead of nesting -->
<b><i> This text is bold and italic</b></i>
<!-- Valid: the inner tag is closed first -->
<b><i> This text is bold and italic</i></b>
Element Prohibitions
The following elements are not allowed to contain certain other elements. This prohibition applies at all depths of
nesting, meaning it covers all descendant elements.
Element     Prohibition
<a>         Must not contain other <a> elements.
<pre>       Must not contain the <img>, <object>, <big>, <small>, <sub>, or <sup> elements.
<button>    Must not contain the <input>, <select>, <textarea>, <label>, <button>, <form>, <fieldset>, <iframe> or
            <isindex> elements.
<label>     Must not contain other <label> elements.
<form>      Must not contain other <form> elements.
A Minimal XHTML Document
The following example shows the minimum content of an XHTML 1.0 document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Every document must have a title</title>
</head>
<body>
...content goes here...
</body>
</html>
Differences between HTML and XHTML
HTML and XHTML are almost exactly the same. However, XHTML has a stricter set of rules for coding tags. These can be
listed as follows:
1. XHTML tags are case sensitive. For instance, <p> and <P> will be read as two different tags.
2. XHTML tags must be closed with a closing tag or a trailing slash. Examples: <p>Some text</p>, <br />. In HTML,
many closing tags are optional and trailing slashes are not required (but they are acceptable).
3. XHTML does not permit overlapping nested tag structures. HTML parsers generally tolerate them and try to repair them.
4. In XHTML, attribute values must be enclosed in quotation marks.
XML
XML stands for eXtensible Markup Language.
XML is a markup language much like HTML.
XML was designed to describe data.
XML tags are not predefined. You must define your own tags.
XML is self-describing.
XML can use a DTD (Document Type Definition) to formally describe the data.
The main difference between XML and HTML

XML is not a replacement for HTML.
XML and HTML were designed with different goals:
XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.
HTML is about displaying information, XML is about describing information.
XML elements can be defined as the building blocks of an XML document. Elements can behave as containers to hold text,
elements, attributes, media objects or all of these.
Each XML document contains one or more elements, each of which is delimited either by start and end tags or, for empty
elements, by an empty-element tag.
Syntax
Following is the syntax to write an XML element:
<element-name attribute1 attribute2>
....content
</element-name>
where
element-name is the name of the element. The name and its case must match in the start and end tags.
attribute1, attribute2 are attributes of the element separated by white spaces. An attribute defines a property
of the element. It associates a name with a value, which is a string of characters. An attribute is written as:
name = "value"
name is followed by an = sign and a string value inside double(" ") or single(' ') quotes.
Empty Element
An empty element (element with no content) has following syntax:
<name attribute1 attribute2.../>
Example of an XML document using various XML elements:
<?xml version="1.0"?>
<contact-info>
<address category="residence">
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
</contact-info>
XML Elements Rules
Following rules are required to be followed for XML elements:
An element name can contain any alphanumeric characters. The only punctuation marks allowed in names are
the hyphen (-), underscore (_) and period (.).
Names are case sensitive. For example, Address, address, and ADDRESS are different names.
The start and end tags of an element must have identical names.
An element, which is a container, can contain text or elements as seen in the above example.
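The rule that start and end tags must match can be checked mechanically with a stack: push each start tag, and pop when an end tag arrives. The sketch below is a simplified, hypothetical checker (tagsBalanced is not a library function) that assumes the input has no comments, CDATA sections, or `>` characters inside attribute values; a real XML parser does far more. It would flag, for instance, an `<address/>` written where the end tag `</address>` belongs.

```javascript
// Simplified sketch: check that start and end tags are properly matched.
// Assumption: input contains only elements, attributes and text, plus an
// optional <?xml ...?> declaration (skipped because it starts with "<?").
function tagsBalanced(xml) {
  const stack = [];
  // Captures: optional "/" (end tag), element name, optional "/" (empty element).
  const tagPattern = /<(\/?)([A-Za-z_][\w.\-]*)[^>]*?(\/?)>/g;
  let m;
  while ((m = tagPattern.exec(xml)) !== null) {
    const [, closing, name, selfClosing] = m;
    if (selfClosing) continue;      // empty-element tag: nothing to match
    if (!closing) {
      stack.push(name);             // start tag: remember it
    } else if (stack.pop() !== name) {
      return false;                 // end tag does not match the last open tag
    }
  }
  return stack.length === 0;        // every start tag was closed
}

console.log(tagsBalanced("<contact-info><name>Tanmay Patil</name></contact-info>")); // true
console.log(tagsBalanced("<Address></address>")); // false: names are case sensitive
```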
An XML attribute has the following syntax:
<element-name attribute1 attribute2 >
....content..
</element-name>
where attribute1 and attribute2 have the following form:
name = "value"
The value has to be in double (" ") or single (' ') quotes. Here, attribute1 and attribute2 are unique attribute labels.
Attributes are used to add a unique label to an element, place the label in a category, add a Boolean flag, or otherwise
associate it with some string of data. Following example demonstrates the use of attributes:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE garden [
<!ELEMENT garden (plants)*>
<!ELEMENT plants (#PCDATA)>
<!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
<plants category="flowers" />
<plants category="shrubs">
</plants>
</garden>
Attributes are used to distinguish among elements of the same name when you do not want to create a new element
for every situation. Hence, an attribute can add a little more detail to differentiate two or more similar
elements.
In the above example, we have categorized the plants by including the attribute category and assigning a different
value to each of the elements. Hence we have two categories of plants, one flowers and the other shrubs, and therefore
two plants elements with different attributes.
You can also observe that we have declared this attribute in the DTD at the beginning of the XML document.
Attribute Types
The following table lists the types of attributes:
Attribute Type   Description
StringType       It takes any literal string as a value. CDATA is a StringType. CDATA is character data, which means
                 any string of non-markup characters is a legal part of the attribute.
TokenizedType    This is a more constrained type. The validity constraints noted in the grammar are applied after the
                 attribute value is normalized. The TokenizedType attributes are:
                 ID : It is used to specify the element as unique.
                 IDREF : It is used to reference an ID that has been named for another element.
                 IDREFS : It is used to reference several IDs, given as a whitespace-separated list of IDREF values.
                 ENTITY : It indicates that the attribute will represent an external entity in the document.
                 ENTITIES : It indicates that the attribute will represent a list of external entities in the document.
                 NMTOKEN : It is similar to CDATA, but restricted to name characters (letters, digits, and a few
                 punctuation marks such as hyphen, underscore, period and colon).
                 NMTOKENS : It is a whitespace-separated list of NMTOKEN values.
EnumeratedType   This has a list of predefined values in its declaration, and the attribute must be assigned one of
                 them. There are two types of enumerated attribute:
                 NotationType : It declares that an element will be referenced to a NOTATION declared somewhere
                 else in the XML document.
                 Enumeration : Enumeration allows you to define a specific list of values that the attribute value
                 must match.
Element Attribute Rules
The following rules must be followed for attributes:
An attribute name must not appear more than once in the same start-tag or empty-element tag.
In a valid document, an attribute must be declared in the Document Type Definition (DTD) using an Attribute-List
Declaration.
Attribute values must not contain direct or indirect entity references to external entities.
The replacement text of any entity referred to directly or indirectly in an attribute value must not contain a
less-than sign (<).
Types of Character Entities
There are three types of character entities:
Predefined Character Entities
Numeric Character Entities
Named Character Entities
Predefined Character Entities
They were introduced to avoid ambiguity when using certain symbols. For example, ambiguity arises when the less-than
( < ) or greater-than ( > ) symbol is used in text, because angle brackets (<>) are used to delimit tags in XML. The
following is the list of predefined character entities from the XML specification. They can be used to express these
characters without ambiguity.
Ampersand: &amp;
Single quote: &apos;
Greater than: &gt;
Less than: &lt;
Double quote: &quot;
Numeric Character Entities
A numeric reference refers to a character by its number in the Unicode character set. A numeric reference can be in
either decimal or hexadecimal format. As there are thousands of numeric references available, they are a bit hard to
remember.
The general syntax for a decimal numeric reference is:
&#decimal-number;
The general syntax for a hexadecimal numeric reference is:
&#xhexadecimal-number;
The following table lists the predefined character entities with their numeric values:
Entity name   Character   Decimal reference   Hexadecimal reference
quot          "           &#34;               &#x22;
amp           &           &#38;               &#x26;
apos          '           &#39;               &#x27;
lt            <           &#60;               &#x3C;
gt            >           &#62;               &#x3E;
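Escaping text for XML amounts to substituting the predefined entities, and a numeric reference is just the character's Unicode code point written in decimal or hexadecimal. A minimal JavaScript sketch (Node.js assumed), with the hypothetical helper names escapeXml and numericRefs:

```javascript
// Minimal sketch: replace the five XML special characters with their
// predefined entity references. escapeXml is a hypothetical helper name.
function escapeXml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };
  // A single pass over the string, so the "&" of an inserted entity
  // is never escaped a second time.
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Build the decimal and hexadecimal numeric references for one character.
function numericRefs(ch) {
  const code = ch.codePointAt(0);
  return { decimal: `&#${code};`, hex: `&#x${code.toString(16).toUpperCase()};` };
}

console.log(escapeXml('a < b && c > "d"')); // a &lt; b &amp;&amp; c &gt; &quot;d&quot;
console.log(numericRefs("<"));              // decimal &#60;, hex &#x3C;
```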
Cascading Style Sheets, fondly referred to as CSS, is a simple design language intended to simplify the process of making
web pages presentable.
CSS handles the look and feel of a web page. Using CSS, you can control the color of the text, the style of fonts, the
spacing between paragraphs, how columns are sized and laid out, what background images or colors are used, layout
designs, variations in display for different devices and screen sizes, as well as a variety of other effects.
CSS is easy to learn and understand but it provides powerful control over the presentation of an HTML document. Most
commonly, CSS is combined with the markup languages HTML or XHTML.
Advantages of CSS
CSS saves time - You can write CSS once and then reuse the same sheet in multiple HTML pages. You can define a
style for each HTML element and apply it to as many web pages as you want.
Pages load faster - If you are using CSS, you do not need to write HTML tag attributes every time. Just write one
CSS rule for a tag and apply it to all occurrences of that tag. Less code means faster download times.
Easy maintenance - To make a global change, simply change the style, and all elements in all the web pages will
be updated automatically.
Superior styles to HTML - CSS has a much wider array of attributes than HTML, so you can give a far better look
to your HTML page than with HTML attributes.
Multiple Device Compatibility - Style sheets allow content to be optimized for more than one type of device. Using
the same HTML document, different versions of a website can be presented for handheld devices such as
PDAs and cell phones, or for printing.
Global web standards - HTML presentational attributes are being deprecated and CSS is recommended instead, so it is a
good idea to use CSS in all HTML pages to make them compatible with future browsers.
Offline Browsing - Together with the browser's offline cache, style sheets can be stored locally so that websites can
be viewed offline. The cache also gives faster loading and better overall performance of the website.
Platform Independence - CSS offers consistent platform independence and is supported by the latest browsers.
There are three types of CSS styles:
Inline styles:
Inline styles are styles that are written directly in the tag on the document. Inline styles affect only the tag they are
applied to.
<a href="#" style="text-decoration: none;">Link without underline</a>
Embedded styles:
Embedded styles are styles that are embedded in the head of the document. Embedded styles affect only the tags
on the page they are embedded in.
<style type="text/css">
p { color: #00f; }
</style>
External styles:
External styles are styles that are written in a separate document and then attached to various Web documents.
External style sheets can affect any document they are attached to.
<link rel="stylesheet" type="text/css" href="styles.css" />
CSS best practice is to use mainly external style sheets for styling web pages, so that you get the most benefit
from the cascade and inheritance.
Document Object Model (DOM) is a cross-platform and language-independent application programming interface that
treats an HTML, XHTML, or XML document as a tree structure wherein each node is an object representing a part of the
document. The objects can be manipulated programmatically, and any visible changes that result may then be
reflected in the display of the document.
The history of the Document Object Model is intertwined with the history of the "browser wars" of the late 1990s
between Netscape Navigator and Microsoft Internet Explorer, as well as with that of JavaScript and JScript, the first
scripting languages to be widely implemented in the layout engines of web browsers.
JavaScript was released by Netscape Communications in 1995 within Netscape Navigator 2.0. Netscape's competitor,
Microsoft, released Internet Explorer 3.0 the following year with a port of JavaScript called JScript. JavaScript and JScript
let web developers create web pages with client-side interactivity. The limited facilities for detecting user-generated
events and modifying the HTML document in the first generation of these languages eventually became known as "DOM
Level 0" or "Legacy DOM." No independent standard was developed for DOM Level 0, but it was partly described in the
specification of HTML 4.
Legacy DOM was limited in the kinds of elements that could be accessed. Form, link and image elements could be
referenced with a hierarchical name that began with the root document object. A hierarchical name could make use of
either the names or the sequential index of the traversed elements. For example, a form input element could be
accessed as either document.formName.inputName or document.forms[0].elements[0].
The Legacy DOM enabled client-side form validation and the popular "rollover" effect.
In 1997, Netscape and Microsoft released version 4.0 of Netscape Navigator and Internet Explorer respectively, adding
support for Dynamic HTML (DHTML), functionality enabling changes to a loaded HTML document. DHTML required
extensions to the rudimentary document object that was available in the Legacy DOM implementations. Although the
Legacy DOM implementations were largely compatible since JScript was based on JavaScript, the DHTML DOM
extensions were developed in parallel by each browser maker and remained incompatible. These versions of the DOM
became known as the "Intermediate DOM."
After the standardization of ECMAScript, the W3C DOM Working Group began drafting a standard DOM specification.
The completed specification, known as "DOM Level 1", was recommended by W3C in late 1998. By 2005, large parts of
W3C DOM were well-supported by common ECMAScript-enabled browsers, including Microsoft Internet Explorer
version 6 (from 2001), Opera, Safari and Gecko-based browsers (like Mozilla, Firefox, SeaMonkey and Camino).
DOM Levels
DOM Level 1 defines the core elements of the Document Object Model. DOM Level 2 extends those elements and adds
events. DOM Level 3 extends DOM Level 2 and adds more elements and events.
Each new level of the DOM adds or changes specific sets of features. When browsers are said to be DOM Level X
compliant, developers can (hopefully) assume that the browser correctly handles the specified DOM API calls.
DOM Tree
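Since the DOM treats a document as a tree of node objects, the idea can be sketched without a browser. This is a simplified model, assuming each node is a plain object with only a nodeName and a childNodes array (in a browser, childNodes also contains text nodes, and the root element is document.documentElement); outline is a hypothetical helper that walks the tree depth-first.

```javascript
// Simplified model of a DOM tree: each node is a plain object with a
// nodeName and an array of childNodes (an assumption for illustration).
const documentTree = {
  nodeName: "html",
  childNodes: [
    { nodeName: "head", childNodes: [{ nodeName: "title", childNodes: [] }] },
    {
      nodeName: "body",
      childNodes: [
        { nodeName: "h1", childNodes: [] },
        { nodeName: "p", childNodes: [] },
      ],
    },
  ],
};

// Depth-first walk: each node's name, indented two spaces per level.
function outline(node, depth = 0) {
  const lines = ["  ".repeat(depth) + node.nodeName];
  for (const child of node.childNodes) {
    lines.push(...outline(child, depth + 1));
  }
  return lines;
}

console.log(outline(documentTree).join("\n"));
```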
DOM events:
When a user clicks the mouse
When a web page has loaded
When an image has been loaded
When the mouse moves over an element
When an input field is changed
When an HTML form is submitted
When a user presses a key
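Browsers deliver these events through the EventTarget interface: a handler is registered with addEventListener and runs whenever a matching event is dispatched. Node.js (version 15 and later) exposes the same EventTarget and Event classes globally, which lets this minimal sketch run outside a browser; a bare EventTarget stands in for a page element, and dispatchEvent simulates the user's clicks.

```javascript
// Minimal sketch of DOM-style event handling with EventTarget.
// In a browser, page elements are EventTargets; here a bare one stands in.
const target = new EventTarget();
const log = [];

// Register a handler; it runs every time a "click" event is dispatched.
target.addEventListener("click", (event) => {
  log.push(`handled ${event.type}`);
});

// Simulate the user clicking twice.
target.dispatchEvent(new Event("click"));
target.dispatchEvent(new Event("click"));

console.log(log);
```

In a page, the same handler would be attached to an element, for example `document.getElementById("myButton").addEventListener("click", handler)`, where myButton is a hypothetical id.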
=====================================================================================
Unit 3: JavaScript
Introduction
JavaScript is a dynamic language that typically executes within a browser. JavaScript code is embedded in an HTML page
using the <script> tag. JavaScript code can be placed in:
An external file
The header of the page
The body of the page
In this example, JavaScript is embedded within the header. As soon as the page is loaded this code is executed.
<html>
<head>
<title>JavaScript Example</title>
<script type="text/javascript">
<!--
document.write("Hello from JavaScript!");
//-->
</script>
</head>
<body>The body</body>
</html>
The document.write method writes the text into the page.
Notice that the JavaScript code is enclosed in HTML comment tags:
<!--
//-->
These were often used to surround JavaScript code. Older browsers did not recognize JavaScript and would otherwise
display the code as page text; enclosing it in a comment made those browsers ignore it. Browsers that support
JavaScript ignore the comment markers and execute the code. All modern browsers support JavaScript, so this technique
is no longer needed.
Internal JavaScript Code
JavaScript code that is not found in a function is executed as the page containing it is loaded. To illustrate this,
JavaScript code is placed in the head and body section of an HTML page.
<html>
<head>
<script type="text/javascript">
document.write("Execute during page load from the head<br>");
</script>
</head>
<body>
<script type="text/javascript">
document.write("Execute during page load from the body<br>");
</script>
</body>
</html>
JavaScript code found in a function is not executed until the function is called. If we modify the previous example by
adding a function that returns a string, the function is defined when the page is loaded, but its code does not run.
<html>
<head>
<script type="text/javascript">
function displayString() {
return "<h1>Main Heading</h1>";
}
document.write("Execute during page load from the head<br>");
</script></head>
<body>
<script type="text/javascript">
document.write("Execute during page load from the body<br>");
</script>
</body>
</html>
The output will be the same as before, because displayString() is never called.
Functions
A function consists of the function keyword followed by the name of the function, a pair of parentheses enclosing an
optional parameter list, and a body enclosed in curly braces.
function functionName(parameterList) {
// body
}
A function uses the return keyword to return a value from a function.
<html>
<head>
<script type="text/javascript">
function getHeader() {
return "<h1>Main Heading</h1>";
}
</script>
</head>
<body>
<script type="text/javascript">
document.write(getHeader());
</script>
</body>
</html>
Parameters are separated by commas in the function declaration.
<html>
<head>
<script type="text/javascript">
function multiply(num1, num2) {
return num1*num2;
}
</script>
</head>
<body>
<script type="text/javascript">
document.write(multiply(2,4));
</script>
</body>
</html>
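The same two functions can be tried outside a browser. This sketch assumes Node.js, so console.log stands in for document.write; the calls counter is a hypothetical addition that only shows a function's body runs when it is called, not when it is defined.

```javascript
// Minimal sketch: the two functions above, run under Node.js.
let calls = 0; // counts how many times getHeader's body actually runs

function getHeader() {
  calls++;
  return "<h1>Main Heading</h1>";
}

function multiply(num1, num2) {
  return num1 * num2;
}

// Defining a function does not run it: calls is still 0 here.
console.log(calls);          // 0
console.log(getHeader());    // runs the body once
console.log(multiply(2, 4)); // 8
console.log(calls);          // 1
```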
External JavaScript Code

It is advantageous to group common functions in an external JavaScript file. This permits the reuse of the functions in
the file in multiple HTML pages.
JavaScript functions are stored in a file with the .js extension. If we place the following functions in a file named
functions.js, we can reference and then use them from an HTML page.
// functions.js
function getHeader() {
return "<h1>Main Heading</h1>"
}
function multiply(num1, num2) {
return num1*num2;
}
Notice that the C++-style // comment can be used in JavaScript. Also notice that the <script> tag is not, and must not
be, used inside a JavaScript file.
In the HTML file, the <script> tag can also be used to indicate the location of a JavaScript file. The src attribute is
assigned the path and filename of the file.
<html>
<head>
<script type="text/javascript" src="functions.js">
</script>
</head>
<body>
<script type="text/javascript">
document.write(multiply(2,4));
</script></body>
</html>
<script> Attributes
There are two attributes of the <script> tag that are of immediate interest:
type - The value assigned to this attribute specifies the scripting language.
src - The location of an external script file.
The src attribute specifies that the code is actually found in a file which should be loaded and then executed. The .js
extension is normally used for JavaScript code files. The following example illustrates the use of these attributes.
<html>
<body>
<script type="text/javascript" src="corefunctions.js">
</script>
</body>
</html>
JavaScript Language Elements
It is useful to discuss JavaScript in terms of its language elements, including:
Variables
Operators
Expressions
Statements
Objects
Functions and methods
Variables
Variables are used to hold data. A JavaScript identifier:
Starts with a letter, underscore or dollar sign ($), and
Is followed by letters, underscores, dollar signs or digits.
JavaScript is a case-sensitive language.
Scope
The scope of an identifier is either:
Global - An identifier that is accessible anywhere on the page
Local - An identifier that is accessible only within the function in which it is declared
A global variable can be created simply by assigning a value to it (in strict mode, assigning to an undeclared
variable is an error instead).
globalVariable = 100;
A local variable is declared within a function using the var keyword.
function someFunction() {
var counter = 0;       // local to someFunction
globalVariable = 100;  // no var, so this refers to the global variable
}
The identifier, counter, is local to the function and can only be used in that function. However, the identifier,
globalVariable, is not preceded by the var keyword and is thus a global variable that can be used anywhere on the page,
inside or outside of the function.
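A minimal sketch of the two scopes, assuming Node.js. It declares globalVariable with var at the top level instead of relying on an implicit global, because assigning to an undeclared name throws a ReferenceError in strict mode.

```javascript
// Minimal sketch of global vs. local scope.
// Declared at the top level, so it is visible everywhere in this script.
var globalVariable = 0;

function someFunction() {
  var counter = 0;      // local: exists only while someFunction runs
  counter++;
  globalVariable = 100; // assigns to the outer (top-level) variable
  return counter;
}

someFunction();
console.log(globalVariable);  // 100: the function changed the outer variable
console.log(typeof counter);  // "undefined": counter is not visible here
```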
Data Types
There are six data types in JavaScript:
Numbers - Integer or floating-point numbers
Booleans - Either true/false or a number (0 being false) can be used for boolean values
Strings - Sequence of characters enclosed in a set of single or double quotes
Objects - Entities that typically represent elements of an HTML page
Null - No value assigned, which is different from 0
Undefined - A special value assigned to an identifier after it has been declared but before a value has
been assigned to it
(Newer versions of the language also add the Symbol (ES2015) and BigInt (ES2020) types.)
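The typeof operator reports these types at run time. A minimal sketch (Node.js assumed); note that typeof null is "object", a long-standing quirk of the language, and that integers and floating-point values share the single type "number".

```javascript
// Minimal sketch: typeof applied to one value of each of the six types.
let declaredOnly; // declared but never assigned, so its value is undefined
const values = [34, 3.14159, true, "Sally", { name: "Sally" }, null, declaredOnly];
const types = values.map((value) => typeof value);
// types is: number, number, boolean, string, object, object, undefined
console.log(types);
```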
JavaScript is a dynamically typed language. The data type of the identifier is not assigned when the identifier is declared.
When a value is assigned to the identifier the identifier takes on that type. The data type of the variable is not important
until an operator is applied to the variable. The behavior of the operator depends on the data type being acted
upon.
For example:
var name = "Sally";
name = 34;
The string "Sally" is first assigned to the variable. Next, the integer 34 is assigned to the variable. Both are legal,
but usage of the identifier is inconsistent. It is better to be consistent when assigning a data type to a variable,
as this leads to less confusing code.
Literals
Literals are simple constants such as:
34
3.14159
"frog beaks"
"\nTitle\n"
true
For strings, escape sequences can be used to embed special characters. An escape sequence consists of the backslash
character followed by a character that has special meaning. Escape sequences recognized by JavaScript include:
Character   Meaning
\b          backspace
\f          form feed
\n          new line
\r          carriage return
\t          tab
\\          backslash character
\"          double quote
\'          single quote
\ddd        octal number (legacy; not allowed in strict mode)
\xdd        two-digit hexadecimal number
\udddd      four-digit hexadecimal Unicode value
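Each escape sequence stands for a single character in the resulting string, which the length property confirms. A minimal sketch, assuming Node.js; the legacy octal form is left out because strict-mode code rejects it.

```javascript
// Minimal sketch: each escape sequence produces exactly one character.
const escapes = {
  newline: "\n",
  tab: "\t",
  backslash: "\\",
  doubleQuote: "\"",
  singleQuote: '\'',
  hex: "\x41",      // two hex digits: the letter A
  unicode: "\u00E9" // four hex digits: e with an acute accent
};

for (const [name, value] of Object.entries(escapes)) {
  // JSON.stringify shows invisible characters such as \n and \t.
  console.log(name, value.length, JSON.stringify(value));
}
```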
Operators
The JavaScript operators include:
Precedence  Operator                    Associativity  Meaning
1           .                           Left-to-right  member access
            []                                         member access (computed)
            new                         Right-to-left  object creation
2           ()                          Left-to-right  function call
3           ++                          n/a            increment by 1
            --                                         decrement by 1
4           !                           Right-to-left  logical not
            ~                                          bitwise not
            +                                          unary plus
            -                                          unary minus
            typeof                                     type of
            void                                       void
            delete                                     delete
5           *                           Left-to-right  multiplication
            /                                          division
            %                                          modulo (remainder)
6           +                           Left-to-right  addition
            -                                          subtraction
7           <<                          Left-to-right  shift left
            >>                                         sign-propagating (arithmetic) shift right
            >>>                                        zero-fill (logical) shift right
8           >                           Left-to-right  greater than
            >=                                         greater than or equal
            <                                          less than
            <=                                         less than or equal
9           ==                          Left-to-right  equality
            !=                                         not equal
            ===                                        strict equality
            !==                                        strict inequality
10          &                           Left-to-right  bitwise and
11          ^                           Left-to-right  bitwise xor
12          |                           Left-to-right  bitwise or
13          &&                          Left-to-right  logical and
14          ||                          Left-to-right  logical or
15          (condition)?value1:value2   Right-to-left  ternary (conditional) operator
16          =  +=  -=  *=  /=  %=       Right-to-left  assignment
            <<=  >>=  >>>=  &=  ^=  |=
17          ,                           Left-to-right  comma operator
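A few lines make the table concrete: multiplication binds tighter than addition, the two right-shift operators differ for negative numbers, and assignment groups right-to-left. A minimal sketch, assuming Node.js:

```javascript
// Minimal sketch: a few consequences of the precedence table.
const results = {
  mulBeforeAdd: 2 + 3 * 4,         // * (level 5) before + (level 6): 14
  parens: (2 + 3) * 4,             // parentheses override precedence: 20
  signedShift: -8 >> 1,            // sign-propagating shift keeps the sign: -4
  zeroFillShift: -8 >>> 1,         // -8 read as unsigned 32-bit, then halved: 2147483644
  looseEquality: "5" == 5,         // == converts types before comparing: true
  strictEquality: "5" === 5,       // === also compares types: false
  ternary: 7 > 3 ? "big" : "small" // > (level 8) is evaluated before ?: (level 15)
};

// Assignment is right-to-left: b = 5 runs first, then a = (b = 5).
let a, b;
a = b = 5;

console.log(results, a, b);
```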
Arrays
Arrays are allocated using the new keyword (an array literal such as [] can also be used).
names = new Array(10);
numbers = new Array(5);
Array indexes start at 0 and extend to the size of the array minus 1. To assign a value to an element of an array,
square brackets are used.
names[0] = "Rabbit"; names[1] = "Happy"; names[9] = "Dover";
The size of an array can be increased dynamically by assigning a value to an element past the end of the array. An
array can also be created with no elements at all. Arrays are not of a fixed size but can grow dynamically.
pictures = new Array();
pictures[35] = "Mona Lisa";
The array pictures initially has no elements. After "Mona Lisa" has been assigned, the array has 36 elements. The
unassigned elements read as undefined.
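A minimal sketch of array growth, assuming Node.js. It uses the array literal [] alongside new Array(); note that new Array with a single numeric argument sets the length, while several arguments become the elements.

```javascript
// Minimal sketch: arrays grow when you assign past the end.
const pictures = [];            // array literal, same as new Array()
pictures[35] = "Mona Lisa";
console.log(pictures.length);   // 36: length is the highest index + 1
console.log(pictures[0]);       // undefined: never assigned
console.log(pictures[35]);      // Mona Lisa

const names = ["Rabbit", "Happy"];
names.push("Dover");            // push appends at index names.length
console.log(names.length);      // 3

const sized = new Array(5);     // one numeric argument: empty array of length 5
const listed = new Array(5, 6); // several arguments: the elements themselves
console.log(sized.length, listed.length); // 5 2
```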
The length property of arrays returns the number of elements in the array.
<html>
<head>
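<script type="text/javascript">
var pictures = new Array();
pictures[35] = "Mona Lisa";
document.write("Length: " + pictures.length);
</script>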
</head>
<body>
</body>
</html>
Converting Between Data Types
There are a number of techniques for converting between data types. To convert from a string, several parse and other
functions are available:
parseFloat - Converts a string to a floating-point number
parseInt - Converts a string to an integer
Number - Converts a string to a number
The last example below uses an arithmetic expression to implicitly convert the string to a number.
<html>
<head>
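<title>JavaScript Data Conversion</title>
<script type="text/javascript">
document.write("<br> parseFloat - " + parseFloat("3.14"));
document.write("<br> parseInt - " + parseInt("42", 10));
document.write("<br> Number - " + Number("7"));
document.write("<br> Implicit - " + ("5" * 2));
</script>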
</head>
<body>
</body>
</html>
A number can be converted to a string or Boolean using the String and Boolean functions.
<html>
<head>
<title> JavaScript Data Conversion </title>
<script type="text/javascript">
document.write("<br> String - " + String(2.34));
document.write("<br> Boolean - " + Boolean(2.34));
</script>
</head>
<body>
</body>
</html>
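The conversion functions differ in how strict they are, and the + operator does not convert the way * does. A minimal sketch, assuming Node.js, with the expected values in comments:

```javascript
// Minimal sketch of the conversion functions described above.
const conversions = {
  float: parseFloat("3.14abc"), // 3.14: parses the leading numeric part
  int: parseInt("42px", 10),    // 42: radix 10 passed explicitly
  strict: Number("42px"),       // NaN: Number rejects trailing characters
  implicit: "5" * 2,            // 10: * converts the string to a number
  concat: "5" + 2,              // "52": + with a string concatenates instead
  str: String(2.34),            // "2.34"
  boolTrue: Boolean(2.34),      // true: any nonzero number
  boolFalse: Boolean(0)         // false
};
console.log(conversions);
```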
Regular Expressions
A regular expression is a way of performing pattern matching. A pattern is defined and then applied to a target string.
The form of a regular expression, and how it is applied to a target string, varies somewhat between languages.
In JavaScript, a regular expression is defined by a series of characters that define the pattern, enclosed in a pair of
forward slashes. For example, \s is used to match white space.
re = /\s/g;
The \s matches any white-space character, and the g flag means the pattern is applied to the entire target string
rather than stopping at the first match. The split function can be used to illustrate this pattern. The split function
is executed against a target string and breaks the target up into individual strings based on its regular expression
argument. The split function returns an array of strings.
<html>
<head>
<script type="text/javascript">
re=/\s/g;
target="Test of the split function";
result = target.split(re);
document.write("Length: " + result.length + "<br>");
for(i=0;i<result.length;i++) {
document.write(result[i]+"<br>");
}
</script>
</head>
<body>
</body>
</html>
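The split example can be run outside a browser with console.log in place of document.write (Node.js assumed). Since split already breaks the string at every match, the g flag makes no difference here; what does matter is that consecutive spaces produce empty strings unless the pattern is \s+.

```javascript
// Minimal sketch: the split example, run under Node.js.
const re = /\s/g;
const target = "Test of the split function";
const result = target.split(re);

console.log("Length: " + result.length); // Length: 5
for (let i = 0; i < result.length; i++) {
  console.log(result[i]);
}

// Two spaces in a row: \s leaves an empty string between them, \s+ does not.
const single = "Test  of the".split(/\s/);     // ["Test", "", "of", "the"]
const collapsed = "Test  of the".split(/\s+/); // ["Test", "of", "the"]
console.log(single, collapsed);
```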
There are several character sequences that have special meaning in a regular expression. The tutorial found at
http://www.zytrax.com/tech/web/regex.htm provides an overview of regular expressions. Here we will look at only a
few.
The backslash (\) is the escape character: it changes the meaning of the character that follows it. Consider the
following example:
re=/s/g;
target="Test of the split function";
The split function split the target based on the presence of the letter s. The \s in the previous example treated the s as a
special character which represented white spaces. Other escape sequences include:
Escape Sequence Meaning

\d Any digit in the range 0-9
\s White space
\w Any word character: 0-9, A-Z, a-z and the underscore (_)
\b A word boundary: the position at the beginning or end of a word (it matches a position, not a character)
These escape sequences are case sensitive. An upper case letter for these escape sequences generally means NOT. That
is, \D matches any character not in the range 0-9, \S any character that is not white space, and \W any non-word character.
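A short sketch applying each escape sequence from the table to one sample string (the string is made up for illustration). It uses match with the g flag, which returns an array of every match:

```javascript
const target = "Room 101 is on floor 3";

const digits = target.match(/\d/g);     // \d: each digit -> [ '1', '0', '1', '3' ]
const nonDigits = target.match(/\D/g);  // \D: every character that is not a digit (18 of them)
const pieces = target.split(/\s/);      // \s: split wherever there is white space
const words = target.match(/\w+/g);     // \w+: runs of word characters, i.e. the words
const oWords = target.match(/\bo\w*/g); // \b: "o" only at the start of a word -> [ 'on' ]

console.log(digits, nonDigits.length, pieces, words, oWords);
```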
Metacharacters also convey special meaning in a regular expression.
Meta Character Meaning

[] Match any character within the brackets
- Is used within brackets to indicate a range [a-d]
^ When used as the first character within brackets it means negation: [^0-9] matches any non-digit
^ When used outside of a set of brackets it means to match only at the beginning of a target: ^First
$ Means to only match at the end of a target: word$
. Match any single character (except a line break) at that position: ton.
Using the regular expression:
re=/[ ]/;
target="Test of the split function";
result=target.split(re);
results in the same output as /\s/ for this example, because the only white space in the target is the space character.
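The brackets and the dash are illustrated for an SSN. Because the dash here does not sit between two characters, it is
treated as a literal - rather than as a range: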
re=/[-]/;
target="254-96-9163";
result=target.split(re);
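Splitting only breaks the SSN apart; to check that a whole string has the SSN layout, the brackets, ranges, ^ and $ from the table can be combined. A minimal sketch, assuming a format check is all that is wanted (it does not verify that a number was actually issued). The {n} quantifier, not covered in the tables above, repeats the preceding item exactly n times; `ssnPattern` and `isSsnFormat` are hypothetical names:

```javascript
// ^ and $ anchor the pattern to the whole string; [0-9] is a bracketed range,
// and {3} means "exactly three of the preceding item".
const ssnPattern = /^[0-9]{3}-[0-9]{2}-[0-9]{4}$/;

function isSsnFormat(text) {
  return ssnPattern.test(text);
}

console.log("254-96-9163".split(/[-]/)); // [ '254', '96', '9163' ]
console.log(isSsnFormat("254-96-9163"));  // true
console.log(isSsnFormat("254-96-91634")); // false: $ rejects the extra digit
console.log(isSsnFormat("254969163"));    // false: the dashes are required
```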
Regular Expression Functions
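Besides split, JavaScript has other functions that use regular expressions, including:
test Returns true or false depending on whether a match occurs
match Returns the match if one is found (null otherwise)
search Returns the index of the first match (-1 if there is none)
replace Replaces matches with a given string
Note that test is called on the regular expression, while match, search, replace and split are called on the string.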
The test function will return a true or a false.
rexp = /at/
if(rexp.test("catalog")) {
document.write("found!<br>");
} else {
document.write("not found!<br>");
}
The match function returns the matched text:
<html>
<head>
<script>
var rexp = /at/;
document.write("catalog".match(rexp));
</script>
</head>
<body>
</body>
</html>
The search function returns the index of the first match, here 1:
rexp = /at/;
document.write("catalog".search(rexp));
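replace is listed above but not shown. A small sketch of it alongside search and test, on a made-up string: without the g flag only the first match is replaced, with it every match is.

```javascript
const text = "catalog of cats";

console.log(text.replace(/at/, "AT"));  // cATalog of cats  (first match only)
console.log(text.replace(/at/g, "AT")); // cATalog of cATs  (every match)
console.log(text.search(/og/));         // 5: the index where "og" begins
console.log(/dog/.test(text));          // false
```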
Math Object
The JavaScript Math object provides several properties and methods that can be useful.
Property Description
E Euler's number (~ 2.718)
LN2 the natural logarithm of 2
LN10 the natural logarithm of 10
LOG2E the base-2 logarithm of E
LOG10E the base-10 logarithm of E
PI The ratio of a circle's circumference to its diameter (~ 3.14159)
SQRT1_2 the square root of 1/2
SQRT2 the square root of 2
Method Description
abs(x) Returns the absolute value of x
acos(x) Returns the arccosine of x (radians)
asin(x) Returns the arcsine of x (radians)
atan(x) Returns the arctangent of x as a value between -PI/2 and PI/2 radians
atan2(y,x) Returns the arctangent of the quotient y/x, as an angle between -PI and PI radians
ceil(x) Returns x, rounded upwards to the nearest integer
cos(x) Returns the cosine of x (radians)
exp(x) Returns the value of E raised to the power x (E^x)
floor(x) Returns x, rounded downwards to the nearest integer
log(x) Returns the natural logarithm (base E) of x
max(x,y,z,...,n) Returns the number with the highest value
min(x,y,z,...,n) Returns the number with the lowest value
pow(x,y) Returns the value of x to the power of y
random() Returns a random number from 0 (inclusive) up to but not including 1
round(x) Rounds x to the nearest integer
sin(x) Returns the sine of x (radians)
sqrt(x) Returns the square root of x
tan(x) Returns the tangent of x (radians)
For example, the following function computes the area of a circle:
function areaOfACircle(radius) {
return Math.PI*radius*radius;
}
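Math.random is usually combined with Math.floor to produce a random whole number in a range. A sketch, assuming integer bounds; `randomInt` is a hypothetical helper name:

```javascript
// Returns a random integer from min to max, inclusive.
// Math.random() gives 0 <= r < 1, so Math.floor(r * (max - min + 1)) is 0 .. max - min.
function randomInt(min, max) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

function areaOfACircle(radius) {
  return Math.PI * radius * radius;
}

console.log(randomInt(1, 6));                   // a die roll: 1 to 6
console.log(areaOfACircle(2).toFixed(2));       // 12.57
console.log(Math.round(2.5), Math.round(-2.5)); // 3 -2 (halves round toward +infinity)
```

Math.floor is used rather than Math.round so that every value in the range is equally likely; rounding would give the two end values only half a chance each.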
JavaScript Objects
There exist a number of predefined objects associated with the web browser and the HTML document loaded. Each of
these objects has certain properties associated with it:
Document, Events, Elements, Anchor, Area, Base, Body, Button, Form, Frame/IFrame, Frameset, Image, Input Button,
Input Checkbox, Input File, Input Hidden, Input Password, Input Radio, Input Reset, Input Submit, Input Text, Link, Meta,
Object, Option, Select, Style, Table, Table Cell, Table Row, Text area
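An object frequently contains sub-elements; the names along the path to a sub-element are separated by periods. Here
the value of the text field text1 in the form myform is referenced: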
document.myform.text1.value
Objects also can have methods which are distinguished from properties by the use of the open and close parentheses.
Here the values associated with the first form are reset.
document.forms[0].reset();
Window
The window object can be used to create new windows and dialog boxes and includes these methods:
open Opens a new window
close Closes the window
alert Displays an alert message box
confirm Displays a confirm dialog box
prompt Displays a prompt dialog box
It also possesses several properties including:
document Returns the Document object
innerHeight The height of the content area of the window
innerWidth The width of the content area of the window
outerHeight The height of the window, including toolbars
outerWidth The width of the whole browser window
Alert Message Box

The alert message box displays a simple message: alert('An Alert Message');
Confirm Dialog Box

The confirm dialog box displays a message with OK and Cancel buttons and then returns true or false depending on which
button is pressed.
var result = confirm("Continue?");
document.write(result);
If Cancel is selected, false is returned.
Prompt Dialog Box

The prompt dialog box provides a way of getting input from the user. The prompt function has two arguments. The first
is the prompt message and the second is an optional default value.
var result = prompt("Name:","");
document.write(result);
The value returned is the text entered by the user, or null if Cancel is pressed.
Document
The Document object provides access to all of the HTML elements of the current page. Useful properties include:
cookie Returns the name/value pairs of the cookies used by the document
domain Returns the domain name of the server
title Returns or sets the title
URL Returns the URL of the document
In addition, it contains a series of arrays that hold the contents of the page. These objects can be accessed and
modified. For example, the forms array contains a list of all of the forms that make up a page. Here the first form is
selected and the value of the third element of the form is returned (array indexes start at 0).
document.forms[0].elements[2].value
URL Property
The URL property is easy to use:
document.write(document.URL);
Frame
The Frame object refers to a frame of the web page. The frames array is a list of the frames that make up a web page.
Properties of a frame include:
frames An array listing the frames that make up the page. Indexes start at 0
length The number of elements in the frames array
self Designates the current frame
name The name of the frame
parent The parent frame of the current frame
Methods of the frame object that are of interest include:
blur Removes the focus from the frame
focus Gives the frame focus
setInterval, clearInterval Start or cancel a function call that repeats every given number of milliseconds
setTimeout, clearTimeout Schedule or cancel a single function call after a given number of milliseconds
JavaScript Events
Many elements of the DOM support events. These events are normally the result of user actions.
Event Meaning
onload Occurs when a window or frame has loaded
onunload Occurs when a document is removed from a window or frame
onclick The mouse is clicked on an element
ondblclick The double click event
onmousedown Mouse down event
onmouseup Mouse up event
onmouseover Mouse moves onto an element
onmousemove Mouse moves over an element
onmouseout Mouse leaves an element
onfocus Element receives focus
onblur Element loses focus
onkeypress Key press event
onkeydown Key is pressed down
onkeyup Key is released
onsubmit A form is submitted (for example, the submit button is pressed)
onreset Form reset event occurs
onselect Some text in an element is selected
onchange Element loses focus and its value changes
onClick Example
<html>
<head>
<title>JavaScript onClick Example</title>
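<script language="JavaScript">
function popup() {
// Display the sum of the two numbers entered in the form
var form = document.forms[0];
var sum = parseFloat(form.num1.value) + parseFloat(form.num2.value);
alert("Sum: " + sum);
}
</script>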
</head>
<body>
<form action="SampleServlet" method="POST">
First Number: <input type="text" name="num1" size="20"><br> Second Number: <input type="text" name="num2"
size="20">
<br><br>
<input type="submit" onclick="popup()" value="Add">
</form>
</body>
</html>
Animation
JavaScript does not have a function such as Java's Thread.sleep method that pauses a task for a specified period of time.
However, JavaScript has two functions that can be used to delay the execution of a function.
setTimeout Will execute a function a specific number of milliseconds in the future

setInterval Will execute a function repeatedly, once every specified number of milliseconds
Both functions take two arguments:
Function The first argument identifies the function to execute
Time The number of milliseconds
setTimeout(someFunction,500); // The function will be executed once, 500 milliseconds
// in the future
setInterval(someFunction,500); // The function will be executed every 500 milliseconds
The use of setTimeout is illustrated here by moving a <div> tag across the screen. The init function sets up the
animation by retrieving a reference to the tag and calling the move function. The move function modifies the position of
the tag and then schedules itself to run again 20 milliseconds later.
function move() {
square.style.left = parseInt(square.style.left)+1+'px';
setTimeout(move,20);
}
function init() {
square = document.getElementById('Square');
square.style.left = '0px';
move();
}
The complete page follows:
<html>
<head>
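<title>JavaScript Animation</title>
<script>
var square = null;
function move() {
square.style.left = parseInt(square.style.left)+1+'px';
setTimeout(move,20);
}
function init() {
square = document.getElementById('Square');
square.style.left = '0px';
move();
}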
window.onload = init;
</script>
</head>
<body>
<br>
<div id="Square" style="position:absolute;
left:0px; top:8em; width:5em;
line-height:3em; background:#99ccff; border:1px solid #003366; white-space:nowrap; padding:0.5em;"
> Moving
</div>
</body>
</html>
The same effect can be created using the setInterval function.
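function move() {
square.style.left = parseInt(square.style.left)+1+'px';
}
function init() {
square = document.getElementById('Square');
square.style.left = '0px';
setInterval(move,20);
}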
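Both versions above run forever. A sketch of an animation that stops itself after a fixed number of steps with clearInterval. To make it runnable outside a browser, `element` is any object with a `style.left` string; in a page it would be the result of document.getElementById('Square'). The names `nextLeft` and `animate` are hypothetical.

```javascript
// Returns the next position: "0px" -> "1px".
function nextLeft(left) {
  return parseInt(left, 10) + 1 + "px";
}

// Moves `element` one pixel every `delay` milliseconds for `steps` steps,
// then cancels the interval with clearInterval and calls onDone (if given).
function animate(element, steps, delay, onDone) {
  let count = 0;
  const id = setInterval(function () {
    element.style.left = nextLeft(element.style.left);
    count++;
    if (count >= steps) {
      clearInterval(id); // without this the interval would keep running
      if (onDone) onDone();
    }
  }, delay);
  return id;
}

// In a page: animate(document.getElementById('Square'), 300, 20);
const box = { style: { left: "0px" } };
animate(box, 5, 10, function () {
  console.log(box.style.left); // 5px
});
```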
Unit 4:
Evolution of web applications
The history of web application development is eventful and unusual. Developers had to find radical and intensive
solutions to the existing problems, and making web apps work smoothly on different operating systems was a considerable
challenge. The earliest computing models were inconvenient. Every app had its own precompiled client program, which had
to be installed separately on every user's PC. Furthermore, the client and server components were tightly bound to a
specific operating system and computer architecture, so it was expensive to port apps to other systems. In the earliest
days of the Web, the client received a web page as a static document, which made an interactive experience difficult.
Any change to the page required a refresh, that is, a round trip back to the server.
The year 1995 was a crucial one in the history of the Internet. Netscape Communications introduced JavaScript, a client-side
scripting language that enables programmers to improve the user interface with dynamic elements. JavaScript made
the Web faster and more productive because data no longer had to be sent to the server to generate the whole web
page; scripts embedded in a downloaded page perform their tasks right on the spot. JavaScript is one of the three core
technologies (with HTML and CSS) of content production for the WWW. Its standard library provides interfaces for working
with text, dates and regular expressions. However, the core language has no input/output of its own for communicating
with the outside world; it relies on the host environment, such as the browser, for that.
In 1996, Macromedia Flash was introduced. It was another revolutionary innovation that made the Web brighter and more
interactive. This vector animation player enabled programmers to enrich web pages with animation. The multimedia
software platform supports animation, browser games, vector graphics, and Internet and mobile applications. It was
solid progress when Adobe Flash added streaming audio and video to its animation. The platform lets the user interact
with the machine through a mouse, microphone and keyboard. Moreover, client-side program interactions no longer
required communication with the server. From 1996 on, interactive online video games grew in popularity thanks to the
technology provided by Macromedia Flash. Before 2000, the majority of websites used embedded interactive multimedia
content on their pages. Soon, however, the popularity of Flash declined and web pages regained a plainer look. Users'
work was no longer interrupted by unexpected ads and streaming videos, which had slowed websites down and consumed
additional traffic. Nevertheless, some cheap websites continued to use Flash on their web pages. Later, Adobe Flash was
mostly used for the creation of video games and interactive applications for smart phones and tablets, until Adobe
ended support for Flash Player at the end of 2020.
In 1999, the concept of the web application appeared in the Java language. Later, in 2005, the term Ajax was introduced
by Jesse James Garrett in his article "Ajax: A New Approach to Web Applications." This group of web development
techniques enabled programmers to build asynchronous web apps. The principle behind it is simple and revolutionary at
the same time: web apps can send data to and retrieve data from the server without interfering with the display and
behavior of the current page, so the whole page does not have to be downloaded again. This made working on the Web
faster and better. The underlying technology first appeared in Internet Explorer, but browsers such as Opera, Mozilla and
Safari soon adopted it too. Google has been using this technique intensively in Gmail and Google Maps since 2005.
HTML5, the latest version of HTML, became a W3C Recommendation in 2014. HTML5 serves to present content on the WWW
and arrange it into logical structures. It was developed as an improvement of the existing HTML standard, to support the
new types of multimedia that are constantly being developed. As for animation, HTML5 is not a self-contained technology:
it does not by itself animate web pages, so to add animation you use HTML5 together with JavaScript. HTML5 is readable
by both humans and computers, and it enables programmers to create web applications that are largely independent of
particular browsers and platforms. As for its popularity, according to one survey at least 34% of the most popular
websites have used HTML5. The importance of this technology can hardly be overestimated.
The history of web application development is quite complicated. Many technologies (Flash, Java, Silverlight,
etc.) have aimed to make working on the Internet as easy as possible. One can listen to audio, watch videos and draw on
the screen with a simple click of a mouse. The interactivity of the Web has become enormous, and it will become even
more effective and varied in the future. Ajax is one of the best examples of a set of technologies that improves the
level of interactivity between the user and the machine. Internet technologies, and web applications in particular, will
no doubt continue to improve rapidly.
History of web application
Earlier, in client-server computing, each application had its own client program, which served as its user interface and
had to be installed on each user's personal computer. Most web applications use HTML/XHTML, which is supported by
practically all browsers, and web pages are displayed to the client as static documents. A web page by itself can only
display static content and let the user navigate through it, whereas a web application provides a more
interactive experience.
Any computer running Servlets or JSP needs to have a container. A container is a piece of software
responsible for loading, executing and unloading the Servlets and JSP. Servlets can be used to extend the
functionality of any Java-enabled server, but they are mostly used to extend web servers, where they are an efficient
replacement for CGI scripts. CGI was one of the earliest and most prominent server-side dynamic content solutions, so
before going forward it is important to know the difference between CGI and Servlets.
Common Gateway Interface (CGI)
The Common Gateway Interface, normally referred to as CGI, was one of the earliest practical techniques developed for
creating dynamic content. With CGI, a web server passes a request to an external program, and the program's output is
sent to the client as the response. When the server receives a request, it creates a new process to run the CGI program.
Creating a process for each request requires significant server resources and time, which limits the number of requests
that can be processed concurrently. CGI applications are platform dependent. There is no doubt that CGI played a major
role in the explosion of the Internet, but its performance and scalability issues make it a less than optimal solution.
Java Servlets
A Java Servlet is a generic server extension: a Java class that can be loaded dynamically to expand the functionality
of a server. Servlets are used with web servers and run inside a Java Virtual Machine (JVM) on the server, so they are
safe and portable. Unlike applets, they do not require support for Java in the web browser. Unlike CGI, servlets don't use
a separate process for each request; requests are handled by separate threads within the same process.
Servlets are also portable and platform independent.
Three Basic Types of Web Documents
Static: A static document resides in a file on a web server.
The server transfers the same file in response to every client request for the URL of the document.
Dynamic: Using a program, the server creates a new version of the document in response to each client request for the
document's URL, so the document can be different for each request. Examples of dynamic website features include a
content management system, an e-commerce system, bulletin/discussion boards, intranet or extranet facilities, the ability
for clients or users to upload documents, and the ability for administrators or users to create content or add information
to a site.
Active: In response to the request from the client, the server sends a program to the client. The client runs the program
to display and interact with the document, and the program can continuously update the display. Examples include forum
discussions, online shopping and ordering, online documentation and content sharing. Active websites generally have some
purpose or theme that users can interact with or share with other users of the site.
Advantages and Disadvantages of Each Document Type
Static: Advantages: simple, reliable, efficient
Disadvantages: inflexible - it can be inconvenient and costly to change static documents.
Dynamic: Advantages: provides current information
Disadvantages: document cannot change after reaching the client;
Creators must have knowledge of programming;
Greater demand is placed on servers;
It tends to take longer for the server to execute the program and transmit the document to the client;
Active: Advantages: can update the browser user's screen continuously
Disadvantages: cost of creating, testing, and running;
Security: active documents can export information from the client's computer;
Requires more sophisticated browser and more powerful computer;
Care must be taken that the programs are portable.
Multi-Tier Application
A three-tier application is a specific type of n-tier architecture. In the case of three-tier architecture, the tiers are as
follows:
Presentation tier (also known as the user interface or the client application)
Business logic tier (also known as the application server)
Data storage tier (also known as the database server)
N-tier denotes a software engineering concept used for the design and implementation of software systems using
client/server architecture divided into multiple tiers. This decouples design and implementation complexity, thus
allowing for the scalability of the deployed system.
In a three-tier application, the user interaction is managed by the presentation tier, which provides an easy-to-operate
front end. The business rules are managed by the business tier, which controls and operates the entire application
framework. The underlying data is stored and served by the data storage tier, also known as data persistence.
The three tiers are loosely coupled to each other, with predetermined and stable interfaces. This decoupling allows for
significant changes to occur within the design, implementation and scale of each tier, without impacting the other tiers.
The business rules are removed from the client and are executed in the application server, also known as the middle
tier. The application server ensures that the business rules are processed correctly. It also serves as an intermediary
between the client application and database server.
The advantage of a three-tier application over a two-tier application is the added modularity. This allows for the
replacement of any tier without affecting the other tiers and the separation of business-related functions from
database-related functions. Finally, a three-tier application makes load balancing easier and significantly improves a
system's scalability, performance and maintainability.
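A minimal sketch of the three tiers in JavaScript, under stated assumptions: an in-memory Map stands in for the database server, and a withdrawal rule stands in for the business rules. All names are hypothetical. Each tier only calls the one below it, so the store could be replaced by a real database without touching the presentation code.

```javascript
// Data storage tier: an in-memory stand-in for the database server.
function createAccountStore() {
  const balances = new Map([["alice", 100]]);
  return {
    getBalance: (id) => balances.get(id),
    setBalance: (id, amount) => { balances.set(id, amount); }
  };
}

// Business logic tier: the rules live here, not in the client.
function createBankService(store) {
  return {
    withdraw(id, amount) {
      const balance = store.getBalance(id);
      if (balance === undefined) throw new Error("unknown account");
      if (amount <= 0) throw new Error("amount must be positive");
      if (amount > balance) throw new Error("insufficient funds");
      store.setBalance(id, balance - amount);
      return balance - amount;
    }
  };
}

// Presentation tier: formats results for the user and holds no business rules.
function withdrawView(service, id, amount) {
  try {
    return "New balance: " + service.withdraw(id, amount);
  } catch (err) {
    return "Error: " + err.message;
  }
}

const service = createBankService(createAccountStore());
console.log(withdrawView(service, "alice", 30));  // New balance: 70
console.log(withdrawView(service, "alice", 500)); // Error: insufficient funds
```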
Introduction To Apache Web Server
Apache is a free, open-source Web server developed by a loosely knit group of programmers. It is often described as
public-domain software, although strictly it is copyrighted and distributed under the Apache License. Public domain refers
to any program that is not copyrighted. Public-domain software is free and can be used without restrictions. The term
public-domain software is often used incorrectly to include freeware, free software that is nevertheless copyrighted. The
first version of Apache, based on the NCSA httpd Web server, was developed in 1995. Because it was developed from
existing NCSA code plus various patches, it was called a patchy server - hence the name Apache Server. It is used to host
more than 50% of all Web sites in the world.
Core development of the Apache Web server is performed by a group of about 20 volunteer programmers, called the
Apache Group. However, because the source code is freely available, anyone can adapt the server for specific needs,
and there is a large public library of Apache add-ons. "Add-on" refers to a product designed to complement another
product.
Apache has been shown to be substantially faster, more stable, and more feature-full than many other web servers.
Apache is run on over 25 million Internet servers (as of December 2006). It has been tested thoroughly by both
developers and users. The Apache Group maintains rigorous standards before releasing new versions of their server, and
the server runs without a hitch on over one half of all WWW servers available on the Internet. When bugs do show up,
the Apache Group releases patches and new versions as soon as they are available.
Features:
DBM Databases for Authentication
It allows you to easily set up password-protected pages with enormous numbers of authorized users, without bogging
down the server.
Customized Responses to Errors and Problems
Allows setting up files, or even CGI scripts, which are returned by the server in response to errors and problems, e.g.
set up a script to intercept 500 Server Errors and perform on-the-fly diagnostics for both users and administrators.
Support for CGI Scripting
Allows scripting of web applications in PHP, Perl, Python and many more languages.
Multiple Directory Index Directives
Allows saying DirectoryIndex index.html index.cgi, which instructs the server to either send back index.html or run
index.cgi when a directory URL is requested, whichever it finds first (in the order listed) in the directory.
Unlimited Flexible URL Rewriting and Aliasing
Apache has no fixed limit on the number of Aliases and Redirects which may be declared in the config files. In
addition, a powerful rewriting engine can be used to solve most URL manipulation problems.
Content Negotiation
The ability to automatically serve clients of varying sophistication and HTML level compliance, with documents which
offer the best representation of information that the client is capable of accepting.
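A simplified sketch of content negotiation on media types: the client's Accept header lists types with optional q (quality) values, and the server picks the available representation the client rates highest, with a more specific range such as text/plain overriding a wildcard such as text/*. This illustrates the idea only and is not Apache's actual algorithm, which also weighs server-side quality values, languages and encodings. The function names are hypothetical.

```javascript
// Splits an Accept header into { type, q } entries; q defaults to 1.
function parseAccept(header) {
  return header.split(",").map(function (part) {
    const pieces = part.trim().split(";");
    let q = 1;
    for (const param of pieces.slice(1)) {
      const [name, value] = param.trim().split("=");
      if (name === "q") q = parseFloat(value);
    }
    return { type: pieces[0].trim().toLowerCase(), q: q };
  });
}

// Quality the client gives `type`: the most specific matching range wins
// (exact type, then major/*, then */*). 0 means "not acceptable".
function qualityFor(type, ranges) {
  const major = type.split("/")[0];
  let quality = 0;
  let specificity = -1;
  for (const range of ranges) {
    let s = -1;
    if (range.type === type) s = 2;
    else if (range.type === major + "/*") s = 1;
    else if (range.type === "*/*") s = 0;
    if (s > specificity) {
      specificity = s;
      quality = range.q;
    }
  }
  return quality;
}

// Picks the available type with the highest quality, or null if none is acceptable.
function negotiate(acceptHeader, availableTypes) {
  const ranges = parseAccept(acceptHeader);
  let choice = null;
  let best = 0;
  for (const type of availableTypes) {
    const q = qualityFor(type, ranges);
    if (q > best) {
      choice = type;
      best = q;
    }
  }
  return choice;
}

console.log(negotiate("text/html;q=0.9, application/xhtml+xml, */*;q=0.1",
  ["text/html", "application/xhtml+xml", "text/plain"])); // application/xhtml+xml
console.log(negotiate("image/png", ["text/html"]));       // null
```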
Virtual Hosts
A much requested feature, sometimes known as multi-homed servers. This allows the server to distinguish between
requests made to different IP addresses or names (mapped to the same machine). Apache also offers dynamically
configurable mass-virtual hosting.
Configurable Reliable Piped Logs
You can configure Apache to generate logs in the format that you want. In addition, on most UNIX architectures, Apache
can send log files to a pipe, allowing for log rotation, hit filtering, real-time splitting of multiple vhosts into separate logs,
and asynchronous DNS resolving on the fly.
Security
Security is fundamentally about protecting assets. Assets may be tangible items, such as a Web page or your customer
database or they may be less tangible, such as your company's reputation.
Security is a path, not a destination. As you analyze your infrastructure and applications, you identify potential threats
and understand that each threat presents a degree of risk. Security is about risk management and implementing
effective countermeasures.
The Foundations of Security
Security relies on the following elements:
Authentication
Authentication addresses the question: who are you? It is the process of uniquely identifying the clients of your
applications and services. These might be end users, other services, processes, or computers. In security parlance,
authenticated clients are referred to as principals.
Authorization
Authorization addresses the question: what can you do? It is the process that governs the resources and operations that
the authenticated client is permitted to access. Resources include files, databases, tables, rows, and so on, together with
system-level resources such as registry keys and configuration data. Operations include performing transactions such as
purchasing a product, transferring money from one account to another, or increasing a customer's credit rating.
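A sketch separating the two questions, using Node.js's built-in crypto module (scryptSync, randomBytes and timingSafeEqual are real Node.js functions). The user table, roles and operation names are hypothetical, modeled on the examples in the paragraph above. Authentication checks a password against a salted hash; authorization then checks what the authenticated principal may do.

```javascript
const crypto = require("crypto");

// Authentication: who are you? Passwords are stored only as salted scrypt hashes.
function hashPassword(password, salt = crypto.randomBytes(16)) {
  return { salt: salt, hash: crypto.scryptSync(password, salt, 64) };
}

// Returns the principal's name if the password is right, otherwise null.
function authenticate(users, name, password) {
  const user = users.get(name);
  if (!user) return null;
  const candidate = crypto.scryptSync(password, user.salt, 64);
  // timingSafeEqual compares in constant time (both buffers are 64 bytes).
  return crypto.timingSafeEqual(candidate, user.hash) ? name : null;
}

// Authorization: what can you do? Each role maps to the operations it may perform.
const permissions = {
  customer: ["view-account", "purchase"],
  teller: ["view-account", "transfer"],
  manager: ["view-account", "transfer", "raise-credit-limit"]
};

function authorize(users, principal, operation) {
  const user = users.get(principal);
  return Boolean(user) && permissions[user.role].includes(operation);
}

// Hypothetical user table.
const users = new Map();
users.set("alice", Object.assign({ role: "teller" }, hashPassword("s3cret")));

const principal = authenticate(users, "alice", "s3cret");
console.log(principal, authorize(users, principal, "transfer")); // alice true
console.log(authorize(users, principal, "raise-credit-limit"));  // false
console.log(authenticate(users, "alice", "wrong"));              // null
```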
Auditing
Effective auditing and logging are the key to non-repudiation. Non-repudiation guarantees that a user cannot deny
performing an operation or initiating a transaction. For example, in an e-commerce system, non-repudiation
mechanisms are required to make sure that a consumer cannot deny ordering 100 copies of a particular book.
Most Common Security Issues
SQL INJECTIONS
SQL injection is a type of web application security vulnerability in which an attacker attempts to use application code to
access or corrupt database content. If successful, this allows the attacker to create, read, update, alter, or delete data
stored in the back-end database. SQL injection is one of the most prevalent types of web application security
vulnerabilities.
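The usual defense is to keep user input out of the SQL text. A sketch that only builds queries (no database is contacted); the table name and function names are hypothetical, and the { text, values } shape with $1 placeholders resembles what the node-postgres (pg) driver accepts, while other drivers use ? placeholders:

```javascript
// Vulnerable: the user's input becomes part of the SQL text itself.
function unsafeOrdersQuery(customer) {
  return "SELECT * FROM orders WHERE customer = '" + customer + "'";
}

// Parameterized: the SQL text is fixed and the value travels separately,
// so the database never parses it as SQL.
function safeOrdersQuery(customer) {
  return {
    text: "SELECT * FROM orders WHERE customer = $1",
    values: [customer]
  };
}

const attack = "x' OR '1'='1";
console.log(unsafeOrdersQuery(attack));
// SELECT * FROM orders WHERE customer = 'x' OR '1'='1'
// The OR condition is always true, so every customer's orders are returned.
console.log(safeOrdersQuery(attack));
```

With the parameterized form, the driver sends the value separately, so the database treats the whole attack string as an ordinary customer name.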
CROSS SITE SCRIPTING (XSS)
Cross-site scripting (XSS) targets an application's users by injecting code, usually a client-side script such as JavaScript,
into a web application's output. The concept of XSS is to manipulate client-side scripts of a web application to execute in
the manner desired by the attacker. XSS allows attackers to execute scripts in the victim's browser which can hijack user
sessions, deface websites, or redirect the user to malicious sites.
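One standard countermeasure is to escape user-supplied text before writing it into a page, so the browser displays it instead of running it. A minimal sketch; `escapeHtml` is a hypothetical helper name:

```javascript
// Escape the characters that are special in HTML so that user input
// is displayed as text instead of being parsed as markup or script.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so the entities below are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const comment = "<script>steal(document.cookie)</script>";
console.log("<p>" + escapeHtml(comment) + "</p>");
// <p>&lt;script&gt;steal(document.cookie)&lt;/script&gt;</p>
```

This escaping suits text placed between tags or inside quoted attribute values; text inserted into scripts, URLs or CSS needs different encoding. In browser code, assigning to an element's textContent avoids HTML parsing altogether.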
BROKEN AUTHENTICATION & SESSION MANAGEMENT
Broken authentication and session management encompass several security issues, all of them having to do with
maintaining the identity of a user. If authentication credentials and session identifiers are not protected at all times, an
attacker can hijack an active session and assume the identity of a user.
INSECURE DIRECT OBJECT REFERENCES
An insecure direct object reference occurs when a web application exposes a reference to an internal implementation object.
Internal implementation objects include files, database records, directories, and database keys. When an application
exposes a reference to one of these objects in a URL, hackers can manipulate it to gain access to a user's personal data.
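The fix is for the server to check, on every request, that the authenticated user is allowed to see the referenced object. A sketch with hypothetical in-memory invoice records, as if requested through a URL such as /invoices?id=2:

```javascript
// Hypothetical records keyed by the id that appears in the URL.
const invoices = new Map([
  [1, { owner: "alice", total: 40 }],
  [2, { owner: "bob", total: 75 }]
]);

// Vulnerable: trusts whatever id the URL contains.
function getInvoiceUnsafe(id) {
  return invoices.get(id);
}

// Safer: checks that the authenticated user owns the record before returning it.
function getInvoice(currentUser, id) {
  const invoice = invoices.get(id);
  if (!invoice || invoice.owner !== currentUser) {
    return null; // same answer for "missing" and "not yours"
  }
  return invoice;
}

console.log(getInvoiceUnsafe(2));    // bob's invoice, leaked to anyone who edits the URL
console.log(getInvoice("alice", 2)); // null
console.log(getInvoice("alice", 1)); // { owner: 'alice', total: 40 }
```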
SECURITY MISCONFIGURATION
Security misconfiguration encompasses several types of vulnerabilities all centered on a lack of maintenance or a lack of
attention to the web application configuration. A secure configuration must be defined and deployed for the application,
frameworks, application server, web server, database server, and platform. Security misconfiguration gives hackers
access to private data or features and can result in a complete system compromise.
CROSS-SITE REQUEST FORGERY (CSRF)
Cross-Site Request Forgery (CSRF) is a malicious attack where a user is tricked into performing an action he or she didn't
intend to do. A third-party website causes the user's browser to send a request to a web application that the user is
already authenticated against (e.g. their bank). The attacker can then access functionality via the victim's already
authenticated browser. Targets include web applications like social media, in-browser email clients, online banking, and
web interfaces for network devices.
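A common defense is the synchronizer token pattern: the server stores a random token in the user's session and includes it in its own forms, and it rejects state-changing requests that do not return the same token. A forged request from another site carries the victim's cookies but not the token. A sketch using Node.js's crypto module; the session object and function names are hypothetical, and a real application would keep the session on the server side. Cookies marked SameSite give additional protection.

```javascript
const crypto = require("crypto");

// Store a fresh random token in the session; the server would also put it
// in a hidden input of its own forms.
function issueCsrfToken(session) {
  session.csrfToken = crypto.randomBytes(32).toString("hex");
  return session.csrfToken;
}

// Accept the request only if it sends back the same token.
function isValidCsrfToken(session, submitted) {
  if (typeof submitted !== "string" || !session.csrfToken) return false;
  const expected = Buffer.from(session.csrfToken);
  const actual = Buffer.from(submitted);
  // timingSafeEqual throws on different lengths, so the length is checked first.
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}

const session = {}; // hypothetical session for one logged-in user
const token = issueCsrfToken(session);
console.log(isValidCsrfToken(session, token));     // true: the request came from our form
console.log(isValidCsrfToken(session, undefined)); // false: a forged request has no token
```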
Proxy Server: A proxy server is a dedicated computer or a software system running on a computer that acts as an
intermediary between an endpoint device, such as a computer, and another server from which a user or client is
requesting a service. The proxy server may exist on the same machine as a firewall server, or it may be on a separate
server which forwards requests through the firewall. An advantage of a proxy server is that its cache can serve all users.
If one or more Internet sites are frequently requested, these are likely to be in the proxy's cache, which will improve
user response time.
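A sketch of the caching idea: responses are kept in a Map keyed by URL and reused while they are younger than a maximum age. `fetchFromOrigin` is a hypothetical stand-in for the real request to the remote server, and the clock is passed in as a parameter so the behavior is easy to follow. A real caching proxy would also honor HTTP caching headers such as Cache-Control and limit the cache size.

```javascript
// Returns a get(url, now) function that serves cached responses while fresh.
function createCachingProxy(fetchFromOrigin, maxAgeMs) {
  const cache = new Map(); // url -> { body, storedAt }
  return function get(url, now = Date.now()) {
    const entry = cache.get(url);
    if (entry && now - entry.storedAt < maxAgeMs) {
      return { body: entry.body, fromCache: true }; // shared by every user of the proxy
    }
    const body = fetchFromOrigin(url);
    cache.set(url, { body: body, storedAt: now });
    return { body: body, fromCache: false };
  };
}

let originRequests = 0;
const proxy = createCachingProxy(function (url) {
  originRequests++;
  return "contents of " + url;
}, 60000);

proxy("http://example.com/", 0);    // first user: fetched from the origin server
proxy("http://example.com/", 1000); // second user: served from the cache
console.log(originRequests);        // 1
```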
A firewall is a network security system, either hardware- or software-based, that uses rules to control incoming and
outgoing network traffic.
Firewall: A firewall acts as a barrier between a trusted network and an untrusted network. A firewall controls access to
the resources of a network through a positive control model. This means that the only traffic allowed onto the network
is defined in the firewall policy; all other traffic is denied.
Types of Firewalls
There are two types of firewalls.
1. Filtering Firewalls - that block selected network packets.
2. Proxy Servers (sometimes called firewalls) - that make network connections for you.
Packet Filtering Firewalls
Packet Filtering is the type of firewall built into the Linux kernel.
A filtering firewall works at the network level. Data is only allowed to leave the system if the firewall rules allow it. As
packets arrive they are filtered by their type, source address, destination address, and port information contained in
each packet.
Many network routers have the ability to perform some firewall services. Filtering firewalls can be thought of as a type
of router. Because of this you need a deep understanding of IP packet structure to work with one.
Because very little data is analyzed and logged, filtering firewalls take less CPU and create less latency in your network.
Filtering firewalls do not provide for password controls. Users cannot identify themselves. The only identity a user has is
the IP address assigned to their workstation. This can be a problem if you are going to use DHCP (dynamic IP
assignment): because rules are based on IP addresses, you will have to adjust the rules as new addresses are
assigned.
Filtering firewalls are more transparent to the user. The user does not have to setup rules in their applications to use the
Internet. With most proxy servers this is not true.
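The filtering described above (by type, source address, destination address and port, with everything else denied) can be sketched as an ordered rule list where the first matching rule decides. The rules, example addresses and names are hypothetical, and real firewalls such as the Linux kernel's also match address ranges, interfaces and connection state:

```javascript
// Rules are checked in order and the first match decides, so the deny rule
// for the blocked host comes first. A field missing from a rule matches anything.
const rules = [
  { action: "deny", source: "203.0.113.7" },                             // blocked host
  { action: "allow", protocol: "tcp", destPort: 80 },                    // web
  { action: "allow", protocol: "tcp", destPort: 443 },                   // secure web
  { action: "allow", protocol: "tcp", destPort: 22, source: "10.0.0.5" } // SSH, admin host only
];

const fields = ["protocol", "source", "destination", "destPort"];

function ruleMatches(rule, packet) {
  return fields.every(function (field) {
    return rule[field] === undefined || rule[field] === packet[field];
  });
}

function filterPacket(packet) {
  for (const rule of rules) {
    if (ruleMatches(rule, packet)) return rule.action;
  }
  return "deny"; // default policy: whatever no rule allows is denied
}

const web = { protocol: "tcp", source: "198.51.100.20", destination: "192.0.2.10", destPort: 80 };
console.log(filterPacket(web));                              // allow
console.log(filterPacket({ ...web, source: "203.0.113.7" })); // deny
console.log(filterPacket({ ...web, destPort: 22 }));          // deny: SSH only from 10.0.0.5
```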
Proxy Servers
Proxies are mostly used to control, or monitor, outbound traffic. Some application proxies cache the requested data.
This lowers bandwidth requirements and decreases the time needed to access the same data for the next user. It also
gives unquestionable evidence of what was transferred.
There are two types of proxy servers.
1. Application Proxies - that do the work for you.
2. SOCKS Proxies - that cross wire ports.
Application Proxy
The best example is a person telnetting to another computer and then telnetting from there to the outside world. With an
application proxy server the process is automated. As you telnet to the outside world, the client sends you to the proxy
first. The proxy then connects to the server you requested (the outside world) and returns the data to you.
Because proxy servers handle all the communications, they can log everything they (you) do. For HTTP (web)
proxies this includes every URL you visit. For FTP proxies this includes every file you download. They can even filter
out "inappropriate" words from the sites you visit or scan for viruses.
Application proxy servers can authenticate users. Before a connection to the outside is made, the server can ask the user
to log in first. To a web user this would make every site look like it required a login.
SOCKS Proxy
A SOCKS server is a lot like an old switchboard. It simply cross-wires your connection through the system to another
outside connection.
Most SOCKS servers only work with TCP connections. Like filtering firewalls, older SOCKS servers (SOCKS4) do not provide
user authentication; SOCKS5 added it. They can, however, record where each user connected to.
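The "cross-wiring" a SOCKS server performs comes down to copying bytes in both directions between the client's connection and the outside connection. The sketch below assumes both sockets are already connected and leaves out the SOCKS handshake in which the client names its destination; `SocksRelay`, `relay` and `crossWire` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class SocksRelay {
    // Copy bytes from one side to the other until the sender closes its end; returns the byte count.
    static long relay(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            out.flush();
            total += n;
        }
        return total;
    }

    // Cross-wire two already-connected sockets: one thread per direction.
    static void crossWire(Socket client, Socket destination) {
        new Thread(() -> pump(client, destination)).start(); // client -> outside world
        new Thread(() -> pump(destination, client)).start(); // outside world -> client
    }

    private static void pump(Socket from, Socket to) {
        try {
            relay(from.getInputStream(), to.getOutputStream());
            to.shutdownOutput(); // pass the end-of-stream on to the other side
        } catch (IOException e) {
            // A real server would log the error and close both sockets here.
        }
    }

    public static void main(String[] args) throws IOException {
        // Demonstrate relay() with in-memory streams instead of real sockets.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] request = "GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        long copied = relay(new ByteArrayInputStream(request), sink);
        System.out.println(copied + " bytes relayed");
    }
}
```

Because the relay only moves bytes, it never needs to understand the application protocol, which is why one SOCKS server can serve many kinds of TCP client.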
Middleware: "The software layer that lies between the operating system and the applications on each side of a
distributed computing system in a network."
Types of middleware:
Hurwitz's classification system organizes the many types of middleware that are currently available. These classifications
are based on scalability and recoverability:
Remote Procedure Call (RPC): The client makes calls to procedures running on remote systems. Calls can be asynchronous or
synchronous.
Message Oriented Middleware (MOM): Messages sent to the client are collected and stored until they are acted upon,
while the client continues with other processing.
Object Request Broker (ORB): This type of middleware makes it possible for applications to send objects and request
services in an object-oriented system.
SQL Oriented Data Access: Middleware between applications and database servers.
CORBA: The Common Object Request Broker Architecture is a standard defined by the Object Management
Group (OMG) designed to facilitate the communication of systems that are deployed on diverse platforms. CORBA
enables collaboration between systems on different operating systems, programming languages, and computing
hardware. CORBA uses an object-oriented model, although the systems that use CORBA do not have to be
object-oriented. CORBA is an example of the distributed object paradigm.
CORBA allows an application to request an operation to be performed by a distributed object and for the results of the
operation to be returned to the application making the request. The application communicates with the distributed
object that is actually performing the operation. This is basic client/server functionality, where a client issues a request
to a server and the server responds to the client. Data can pass from the client to the server and is associated with
a particular operation on a particular object. Data is then returned to the client in the form of a response.
Benefits
CORBA's benefits include language- and OS-independence, freedom from technology-linked implementations, strong
data-typing, high level of tunability, and freedom from the details of distributed data transfers.
Language independence
CORBA was designed to free engineers from the limitations of coupling their designs to a particular software language.
Currently there are many languages supported by various CORBA providers, the most popular being Java and C++. There
are also C++11, C-only, Smalltalk, Perl, Ada, Ruby, and Python implementations, just to mention a few.
OS-independence
CORBA's design is meant to be OS-independent. CORBA is available in Java (OS-independent), as well as natively for
Linux/Unix, Windows, Solaris, OS X, OpenVMS, HPUX, Android, LynxOS, VxWorks, ThreadX, INTEGRITY, and others.
Freedom from technologies
One of the main implicit benefits is that CORBA provides a neutral playing field for engineers to be able to normalize the
interfaces between various new and legacy systems. When integrating C, C++, Object Pascal, Java, Fortran, Python, and
any other language or OS into a single cohesive system design model, CORBA provides the means to level the field and
allow disparate teams to develop systems and unit tests that can later be joined together into a whole system. This does
not rule out the need for basic system engineering decisions, such as threading, timing, object lifetime, etc. These issues
are part of any system regardless of technology. CORBA allows system elements to be normalized into a single cohesive
system model.
For example, the design of a multitier architecture is made simple using Java Servlets in the web server and various
CORBA servers containing the business logic and wrapping the database accesses. This allows the implementations of
the business logic to change, while the interface changes would need to be handled as in any other technology. For
example, a database wrapped by a server can have its database schema change for the sake of improved disk usage or
performance (or even whole-scale database vendor change), without affecting the external interfaces. At the same time,
C++ legacy code can talk to C/Fortran legacy code and Java database code, and can provide data to a web interface.
Data-typing
CORBA provides flexible data typing, for example an "ANY" data type. CORBA also enforces strong, tightly coupled data typing,
reducing human errors. In a situation where name-value pairs are passed around, it is conceivable that a server
provides a number where a string was expected. The CORBA Interface Definition Language (IDL) provides the mechanism to
ensure that user code conforms to method names, return and parameter types, and exceptions.
High Tunability
Many implementations (e.g., ORBexpress, an Ada, C++, and Java implementation[3], and omniORB, an open-source C++ and
Python implementation[4]) have options for tuning the threading and connection management features. Not all ORB
implementations provide the same features.
Freedom from data-transfer details
When handling low-level connection and threading, CORBA provides a high level of detail in error conditions. This is
defined in the CORBA-defined standard exception set and the implementation-specific extended exception set. Through
the exceptions, the application can determine if a call failed for reasons such as "Small problem, so try again", "The
server is dead" or "The reference does not make sense." The general rule is: Not receiving an exception means that the
method call completed successfully. This is a very powerful design feature.
Compression
CORBA marshals its data in a binary form and supports compression. IONA, Remedy IT, and Telefónica have worked on
an extension to the CORBA standard that delivers compression. This extension is called ZIOP, and it is now a formal
OMG standard.
Remote Method Invocation: RMI (Remote Method Invocation) is an API that provides a mechanism to create
distributed applications in Java. RMI allows an object to invoke methods on an object running in another JVM.
RMI provides remote communication between the applications using two objects: the stub and the skeleton.
Understanding stub and skeleton
RMI uses stub and skeleton objects for communication with the remote object.
A remote object is an object whose methods can be invoked from another JVM. Let's look at the stub and skeleton
objects:
Stub
The stub is an object that acts as a gateway for the client side. All the outgoing requests are routed through it. It resides on
the client side and represents the remote object. When the caller invokes a method on the stub object, the stub does the
following:
1. It initiates a connection with the remote Java Virtual Machine (JVM),
2. It writes and transmits (marshals) the parameters to the remote JVM,
3. It waits for the result,
4. It reads (unmarshals) the return value or exception, and
5. Finally, it returns the value to the caller.
Skeleton
The skeleton is an object that acts as a gateway for the server-side object. All the incoming requests are routed through it.
When the skeleton receives an incoming request, it does the following:
1. It reads the parameters for the remote method,
2. It invokes the method on the actual remote object, and
3. It writes and transmits (marshals) the result to the caller.
In the Java 2 SDK, a stub protocol was introduced that eliminates the need for skeletons.
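To make the division of labour between stub and skeleton concrete, here is a simplified, single-process simulation. It is not how `java.rmi` is implemented: the "network" is a byte array handed directly from stub to skeleton, the request format (method name followed by an argument array) is an assumption, and the skeleton finds the method by name and argument count using reflection. The numbered stub and skeleton steps listed above appear as comments.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class RmiSimulation {
    public static void main(String[] args) {
        // The client only sees the Calculator interface; the stub hides the "remote" call.
        Calculator calc = new CalculatorStub(new Skeleton(new CalculatorImpl()));
        System.out.println(calc.add(2, 3)); // 5
    }
}

// Remote interface shared by client and server (hypothetical example).
interface Calculator {
    int add(int a, int b);
}

// The actual remote object, living in the "server JVM".
class CalculatorImpl implements Calculator {
    public int add(int a, int b) { return a + b; }
}

// Client-side gateway (stub).
class CalculatorStub implements Calculator {
    private final Skeleton server; // stands in for the connection to the remote JVM

    CalculatorStub(Skeleton server) { this.server = server; }

    public int add(int a, int b) {
        try {
            ByteArrayOutputStream request = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(request)) {
                out.writeUTF("add");                    // 2. marshal the method name...
                out.writeObject(new Object[] {a, b});   //    ...and the parameters
            }
            // 1. and 3. "connect", transmit the request and wait for the result
            byte[] reply = server.handle(request.toByteArray());
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(reply))) {
                Object result = in.readObject();        // 4. unmarshal the return value or exception
                if (result instanceof RuntimeException) {
                    throw (RuntimeException) result;
                }
                return (Integer) result;                // 5. return the value to the caller
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("remote call failed", e);
        }
    }
}

// Server-side gateway (skeleton).
class Skeleton {
    private final Object target;

    Skeleton(Object target) { this.target = target; }

    byte[] handle(byte[] request) throws IOException, ClassNotFoundException {
        // 1. read the method name and parameters (a ByteArrayInputStream needs no closing)
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(request));
        String methodName = in.readUTF();
        Object[] args = (Object[]) in.readObject();
        Object result;
        try {
            result = findMethod(methodName, args.length).invoke(target, args); // 2. invoke the real object
        } catch (InvocationTargetException e) {
            result = e.getCause();                      // send the object's own exception back
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        ByteArrayOutputStream reply = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(reply)) {
            out.writeObject(result);                    // 3. marshal the result for the caller
        }
        return reply.toByteArray();
    }

    private Method findMethod(String name, int argCount) {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == argCount) {
                return m;
            }
        }
        throw new IllegalArgumentException("no such method: " + name);
    }
}
```

In real RMI the stub is generated for you and, as noted above, skeletons are no longer needed; either way the caller's code looks the same, since it simply calls `add` on something that implements the interface.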
Message-oriented middleware (MOM) is software or hardware infrastructure supporting sending and receiving
messages between distributed systems. MOM allows application modules to be distributed over heterogeneous
platforms and reduces the complexity of developing applications that span multiple operating systems and network
protocols.
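The store-and-forward behaviour of MOM can be illustrated inside a single JVM. This sketch uses `java.util.concurrent.LinkedBlockingQueue` as a stand-in for a real message broker (such as a JMS provider); the `STOP` sentinel and the message names are hypothetical. The sender enqueues messages and carries on, and the consumer acts on them later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class MessageQueueDemo {
    static final String STOP = "STOP"; // hypothetical sentinel telling the consumer to finish

    // Consumer: takes stored messages off the queue until it sees the sentinel.
    static List<String> consume(BlockingQueue<String> queue) throws InterruptedException {
        List<String> handled = new ArrayList<>();
        while (true) {
            String msg = queue.take(); // blocks until a message is available
            if (msg.equals(STOP)) {
                return handled;
            }
            handled.add("processed " + msg);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Producer: sends and moves on without waiting for the receiver.
        queue.put("order-1");
        queue.put("order-2");
        queue.put(STOP);
        System.out.println("producer continues with other work");

        // Consumer runs later, in another thread, and drains the stored messages.
        final List<List<String>> result = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                result.add(consume(queue));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        consumer.join();
        System.out.println(result.get(0)); // [processed order-1, processed order-2]
    }
}
```

With a real broker the queue typically lives in a separate server and can be made persistent, so sender and receiver do not even need to be running at the same time.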
Advantages of middleware:
Real time information access among systems
Streamlines business processes and helps raise organizational efficiency
Maintains information integrity across multiple systems
It covers a wide range of software systems, including distributed objects and components, message-oriented
communication, and mobile application support.
Middleware is anything that helps developers create networked applications
Disadvantage of Middleware:
Prohibitively high development costs
There are few people with experience in the market place
There exist relatively few satisfying standards
The tools are not good enough
Too many platforms to be covered
Middleware often threatens the real-time performance of a system
Middleware products are not very mature
EJB provides an architecture to develop and deploy component-based enterprise applications with robustness,
high scalability and high performance in mind. An EJB application can be deployed on any application server compliant
with the J2EE 1.3 standard specification. We'll be discussing EJB 3.0 in this tutorial.
Benefits
Simplified development of large scale enterprise level application.
The application server/EJB container provides most of the system-level services like transaction handling, logging,
load balancing, persistence mechanism, exception handling and so on. Developers need to focus only on the business
logic of the application.
The EJB container manages the life cycle of EJB instances, so developers do not need to worry about when to
create/delete EJB objects.
Types
EJBs are primarily of three types, briefly described below:
Session Bean: A session bean stores data of a particular user for a single session. It can be stateful or stateless. It
is less resource intensive than an entity bean. A session bean is destroyed as soon as the user session terminates.
Entity Bean: An entity bean represents persistent data storage. User data can be saved to the database via entity
beans and later retrieved from the database into the entity bean.
Message Driven Bean: Message driven beans are used in the context of JMS (Java Message Service). Message driven
beans can consume JMS messages from external entities and act accordingly.
Distributed Component Object Model

DCOM allows processes to be efficiently distributed to multiple computers so that the client and server components of
an application can be placed in optimal locations on the network. Processing occurs transparently to the user because
DCOM handles this function. Thus, the user can access and share information without needing to know where the
application components are located. If the client and server components of an application are located on the same
computer, DCOM can be used to transfer information between processes. DCOM is platform independent and supports
any 32-bit application that is DCOM-aware.
Advantages of Using DCOM
DCOM is a preferred method for developers to use in writing client/server applications for Windows 2000. With DCOM,
interfaces to software objects can be added or upgraded, so applications aren't forced to upgrade each time the
software object changes. Objects are software entities that perform specific functions. These functions are implemented
as dynamic-link libraries so that changes in the functions, including new interfaces or the way the function works, can be
made without rewriting and recompiling the applications that call them.
Windows 2000 supports DCOM by making the implementation of application pointers transparent to the application and
the object. Only the operating system needs to know if the function called is handled in the same process or across the
network. This frees the application from concerns with local or remote procedure calls. Administrators can choose to
run DCOM applications on local or remote computers, and can change the configuration for efficient load balancing.
Your application might support its own set of DCOM features. For more information about configuring your application
to use DCOM, see your application's documentation.
DCOM builds upon remote procedure call (RPC) technology by providing an easy-to-use mechanism for integrating
distributed applications on a network. A distributed application consists of multiple processes that cooperate to
accomplish a single task. Unlike other interprocess communication (IPC) mechanisms, DCOM gives you a high degree of
control over security features, such as permissions and domain authentication. It can also be used to start applications
on other computers or to integrate Web browser applications that run on the Microsoft ActiveX platform.
Entities used in an HTTP request message.
Request-Line
The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with
CRLF. The elements are separated by space SP characters.
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
The term CRLF refers to Carriage Return (ASCII 13, \r) followed by Line Feed (ASCII 10, \n). Together they mark the end of a
line, although today's popular operating systems handle line endings differently.
Request Method
The request method indicates the method to be performed on the resource identified by the given Request-URI. The
method is case-sensitive and should always be written in uppercase. The following list gives the methods defined in
HTTP/1.1.
Request Methods
GET: The GET method is used to retrieve information from the given server using a given URI. Requests using GET should
only retrieve data and should have no other effect on the data.
HEAD: Same as GET, but it transfers the status line and the header section only.
POST: A POST request is used to send data to the server, for example, customer information, file upload, etc. using
HTML forms.
PUT: Replaces all the current representations of the target resource with the uploaded content.
DELETE: Removes all the current representations of the target resource given by URI.
CONNECT: Establishes a tunnel to the server identified by a given URI.
OPTIONS: Describes the communication options for the target resource.
TRACE: Performs a message loop-back test along the path to the target resource.
Request-URI
The Request-URI is a Uniform Resource Identifier and identifies the resource upon which to apply the request.
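The Request-Line grammar above is simple enough to parse by hand. A minimal sketch, assuming the input is exactly one line ending in CRLF; the `RequestLine` class is hypothetical, and it rejects any method outside the eight listed above, whereas a real server would also accept extension methods and enforce length limits.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class RequestLine {
    // The HTTP/1.1 methods listed above; method names are case-sensitive.
    private static final Set<String> METHODS = new HashSet<>(Arrays.asList(
            "GET", "HEAD", "POST", "PUT", "DELETE", "CONNECT", "OPTIONS", "TRACE"));

    final String method;
    final String requestUri;
    final String httpVersion;

    private RequestLine(String method, String requestUri, String httpVersion) {
        this.method = method;
        this.requestUri = requestUri;
        this.httpVersion = httpVersion;
    }

    // Request-Line = Method SP Request-URI SP HTTP-Version CRLF
    static RequestLine parse(String line) {
        if (!line.endsWith("\r\n")) {
            throw new IllegalArgumentException("Request-Line must end with CRLF");
        }
        // Limit -1 keeps empty fields, so doubled spaces are rejected rather than skipped.
        String[] parts = line.substring(0, line.length() - 2).split(" ", -1);
        if (parts.length != 3 || parts[1].isEmpty()) {
            throw new IllegalArgumentException("expected Method SP Request-URI SP HTTP-Version");
        }
        if (!METHODS.contains(parts[0])) {
            throw new IllegalArgumentException("unsupported or wrongly cased method: " + parts[0]);
        }
        if (!parts[2].startsWith("HTTP/")) {
            throw new IllegalArgumentException("malformed HTTP-Version: " + parts[2]);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /index.html HTTP/1.1\r\n");
        System.out.println(r.method + " | " + r.requestUri + " | " + r.httpVersion); // GET | /index.html | HTTP/1.1
    }
}
```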
Server-side scripting is a technique used in web development which involves employing scripts on a web server which
produce a response customized for each user's (client's) request to the website.
Server-side scripting also enables the website owner to hide the source code that generates the interface, whereas with
client-side scripting, the user has access to all the code received by the client. A down-side to the use of server-side
scripting is that the client needs to make further requests over the network to the server in order to show new
information to the user via the web browser. These requests can slow down the experience for the user, place more
load on the server, and prevent use of the application when the user is disconnected from the server.
Web Server: A Web server is a program that uses HTTP (Hypertext Transfer Protocol) to serve the files that form Web
pages to users, in response to their requests, which are forwarded by their computers' HTTP clients. Dedicated
computers and appliances may be referred to as Web servers as well.
Best practices for server deployments.
Keep the installation structure SIMPLE. Files and directories should be kept to a minimum. Don't install anything that's
never going to be used.
Always get rid of old files. When something goes wrong on a production host, the last thing I want to be doing is trawling
through random directories and copies of old files to find what's gone wrong.
Automate it: this almost goes without saying, but deployments should NOT be manual; there's far too much room for
human error. Use a tool for doing deployments, something that supports the native OS operations, like rpm using yum
for Red Hat. Alternatively, if you're deploying to multiple different OSs, try using a scripting language to script the
deployments.
Don't overdo it with symbolic links (symlinks). Use them only if you have to. It's too easy to end up with links pointing to the
wrong place, or to break them altogether. It's also a good idea for your applications themselves not to rely on symlinks. I
would rather enforce standardization of the environments and have my applications use real paths than rely on
symlinks. Links simply add another level of configuration and a reliance on something else which is all too
breakable.
Delete everything first. If you're simply deploying a new directory or package, completely remove the existing one. Take
a backup if necessary, but delete that backup at the end if the deployment is successful. This is similar to the point about
getting rid of old files, but more robust. I think that if at all possible, you shouldn't rely on sync tools like
rsync/xcopy/robocopy to do your deployments. If your time and bandwidth allow, delete everything first and upload the
complete new package, not just a delta.
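The "delete everything first" approach is easy to script. A minimal sketch using `java.nio.file`; `packageDir` (the complete new build) and `targetDir` (the live installation) are hypothetical paths, and the optional backup step is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class CleanDeploy {
    // Remove the existing deployment completely, then copy in the complete new package (not a delta).
    static void deploy(Path packageDir, Path targetDir) throws IOException {
        deleteRecursively(targetDir);
        try (Stream<Path> paths = Files.walk(packageDir)) { // parents are visited before their children
            for (Path src : (Iterable<Path>) paths::iterator) {
                Path dest = targetDir.resolve(packageDir.relativize(src));
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(src, dest);
                }
            }
        }
    }

    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order puts every child before its parent, so directories are empty when deleted.
            for (Path p : (Iterable<Path>) paths.sorted(Comparator.reverseOrder())::iterator) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path pkg = Files.createTempDirectory("pkg");    // stands in for the new build
        Files.createDirectories(pkg.resolve("conf"));
        Files.write(pkg.resolve("conf").resolve("app.properties"), "version=2\n".getBytes(StandardCharsets.UTF_8));

        Path target = Files.createTempDirectory("app"); // stands in for the live installation
        Files.write(target.resolve("stale.log"), "old".getBytes(StandardCharsets.UTF_8));

        deploy(pkg, target);
        System.out.println(Files.exists(target.resolve("stale.log")));                     // false
        System.out.println(Files.exists(target.resolve("conf").resolve("app.properties"))); // true
    }
}
```

Because the target is removed and rebuilt from the package on every run, no stale files from earlier releases can survive a deployment.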
Have a rollback strategy. Things can sometimes go wrong and often the best policy is to roll back to a known-working
version. Keeping a backup of the last known working version locally on the target machine can often be the quickest and
simplest method, but again I would avoid this option if bandwidth allows. I don't like having old versions sitting around
on servers; it leads to cluttered production boxes. I would much rather do a rollback using the same mechanism used
for doing a normal deployment.
Don't make changes to your deploy mechanism or deploy scripts between deploying to different environments. This is
just common sense, but I've seen a process where, in order to actually deploy something onto a production server, the
deploy script had to be manually changed! Suffice to say I wasn't a fan of that idea.
Web Client: It typically refers to the Web browser in the user's machine. It may also refer to plug-ins and helper
applications that enhance the browser to support special services from the site. The term may imply the entire user
machine or refer to a handheld device that provides Web access.
Web Services: Web services are self-contained, modular, distributed, dynamic applications that can be described,
published, located, or invoked over the network to create products, processes, and supply chains. These applications can
be local, distributed, or web-based.
To summarize, a complete web service is, therefore, any service that:
Is available over the Internet or private (intranet) networks
Uses a standardized XML messaging system
Is not tied to any one operating system or programming language
Is self-describing via a common XML grammar
Is discoverable via a simple find mechanism
Components of Web Services
The basic web services platform is XML + HTTP. All the standard web services work using the following components:
SOAP (Simple Object Access Protocol)
UDDI (Universal Description, Discovery and Integration)
WSDL (Web Services Description Language)
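To show what "a standardized XML messaging system" looks like on the wire, the sketch below builds a SOAP 1.1 request envelope using only the JDK's DOM and transformer APIs. The service namespace `urn:example:stock`, the `GetPrice` operation and its `symbol` parameter are hypothetical; in practice, tools usually generate such messages from the service's WSDL description.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SoapEnvelope {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"; // SOAP 1.1 envelope namespace
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";             // namespace of xmlns attributes

    // Wrap one operation call (with a single parameter) in soap:Envelope/soap:Body, returned as XML text.
    static String build(String serviceNs, String operation, String paramName, String paramValue)
            throws ParserConfigurationException, TransformerException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        Element envelope = doc.createElementNS(SOAP_NS, "soap:Envelope");
        envelope.setAttributeNS(XMLNS_NS, "xmlns:soap", SOAP_NS);
        Element body = doc.createElementNS(SOAP_NS, "soap:Body");
        Element call = doc.createElementNS(serviceNs, "m:" + operation);
        call.setAttributeNS(XMLNS_NS, "xmlns:m", serviceNs);
        Element param = doc.createElementNS(serviceNs, "m:" + paramName);
        param.setTextContent(paramValue);

        doc.appendChild(envelope);
        envelope.appendChild(body);
        body.appendChild(call);
        call.appendChild(param);

        // Serialize the DOM tree to a string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("urn:example:stock", "GetPrice", "symbol", "ACME"));
    }
}
```

The printed message is a single `soap:Envelope` whose `soap:Body` carries the `m:GetPrice` element; any platform with an XML parser can read it, which is what makes the service independent of operating system and language.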
Mail Server: A mail server is the computerized equivalent of your friendly neighborhood mailman. Every email that is
sent passes through a series of mail servers along its way to its intended recipient. Although it may seem like a message
is sent instantly - zipping from one PC to another in the blink of an eye - the reality is that a complex series of transfers
takes place. Without this series of mail servers, you would only be able to send emails to people whose email address
domains matched your own - i.e., you could only send messages from one example.com account to another
example.com account.
Types of Mail Servers
Mail servers can be broken down into two main categories: outgoing mail servers and incoming mail servers. Outgoing
mail servers are known as SMTP, or Simple Mail Transfer Protocol, servers. Incoming mail servers come in two main
varieties. POP3 (Post Office Protocol, version 3) servers are best known for downloading messages to the user's
local hard drive. IMAP (Internet Message Access Protocol) servers always store copies of messages on the server. Most
POP3 servers can also leave copies of messages on the server, which is a lot more convenient.
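When a message is handed to an outgoing (SMTP) server, the client sends a short sequence of text commands defined by the SMTP standard (RFC 5321). The sketch below only builds that sequence; it does not open a connection or check the server's numeric replies, and the host name and addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SmtpDialogue {
    // Client-side commands for submitting one message (the server's reply codes are not modelled).
    static List<String> commands(String clientHost, String from, String to, List<String> bodyLines) {
        List<String> out = new ArrayList<>();
        out.add("HELO " + clientHost);        // identify the sending host
        out.add("MAIL FROM:<" + from + ">");  // envelope sender
        out.add("RCPT TO:<" + to + ">");      // envelope recipient
        out.add("DATA");                      // message content follows
        for (String line : bodyLines) {
            // Dot-stuffing: a leading "." is doubled so the line cannot end the message early.
            out.add(line.startsWith(".") ? "." + line : line);
        }
        out.add(".");                         // a line containing only "." ends the message
        out.add("QUIT");
        return out;
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList("Subject: Hello", "", "Hi Bob!");
        for (String line : commands("client.example.com", "alice@example.com", "bob@example.org", body)) {
            System.out.println(line); // on the wire, each line is terminated by CRLF
        }
    }
}
```

The recipient's domain (here `example.org`) is what lets the outgoing server look up which mail server to relay the message to next.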
Proxy server: A proxy server is a dedicated computer or a software system running on a computer that acts as an
intermediary between an endpoint device, such as a computer, and another server from which a user or client is
requesting a service. The proxy server may exist in the same machine as a firewall server or it may be on a separate
server, which forwards requests through the firewall.
Multimedia server: A multimedia server refers either to a dedicated computer appliance or to a specialized application
software, ranging from an enterprise class machine providing video on demand, to, more commonly, a small personal
computer or NAS (Network Attached Storage) for the home, dedicated for storing various digital media (meaning digital
videos/movies, audio/music, and picture files).
Unit 5:
Servlets
Java Servlets are programs that run on a Web or Application server and act as a middle layer between requests coming
from a Web browser or other HTTP client and databases or applications on the HTTP server.
Using Servlets, we can collect input from users through web page forms, present records from a database or another
source, and create web pages dynamically.
Java Servlets often serve the same purpose as programs implemented using the Common Gateway Interface (CGI), but
Servlets offer several advantages compared with CGI:
Performance is significantly better.
Servlets execute within the address space of a Web server. It is not necessary to create a separate process to
handle each client request.
Servlets are platform-independent because they are written in Java.
The Java security manager on the server enforces a set of restrictions to protect the resources on the server machine,
so servlets can be trusted.
The full functionality of the Java class libraries is available to a servlet. It can communicate with applets,
databases, or other software via the sockets and RMI mechanisms.
Servlet for handling HTTP GET request Example

The doGet() method is invoked by the server through the service() method to handle an HTTP GET request. This method also
handles HTTP HEAD requests automatically, since a HEAD request is simply a GET request whose response has no body and
contains only the status line and header fields. To understand how doGet() works, consider the following sample
program, which defines a servlet for handling the HTTP GET request.
A program to define a servlet for handling HTTP GET request
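import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;   // HttpServlet, HttpServletRequest, HttpServletResponse

public class ServletGetExample extends HttpServlet
{
    // Called by the container (through service()) for every HTTP GET request
    public void doGet(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException
    {
        res.setContentType("text/plain");
        PrintWriter out = res.getWriter();
        // Read the form fields sent as URL query parameters
        String login = req.getParameter("loginid");
        String password = req.getParameter("password");
        out.println("Your login ID is: ");
        out.println(login);
        out.println("Your password is: ");   // echoed for demonstration only
        out.println(password);
        out.close();
    }
}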
In this example, the doGet() method of the HttpServlet class is overridden to handle the HTTP GET request. The two
parameters passed to doGet() are req and res, objects implementing the HttpServletRequest and HttpServletResponse
interfaces respectively. The req object lets the servlet read the data provided in the client request, and the res object
is used to build the response to the client.
The corresponding HTML code for this servlet is as follows (the form uses METHOD="get" so that the request reaches doGet()):
<HTML>
<BODY>
<CENTER>
<FORM NAME="Form1" METHOD="get" ACTION="http://localhost:8080/ServletGetExample">
<B>Login ID</B> <INPUT TYPE="text" NAME="loginid" SIZE="30">
<P>
<B>Password</B> <INPUT TYPE="password" NAME="password" SIZE="30">
</P>
<P>
<INPUT TYPE="submit" VALUE="Submit">
</P>
</FORM>
</CENTER>
</BODY>
</HTML>
GET and POST Requests:
The GET Method
In the GET method the data is sent as URL parameters, which are usually strings of name and value pairs separated by
ampersands (&). In general, a URL with GET data will look like this:
http://www.example.com/action.php?name=john&age=24
Here name and age are the GET parameters, and john and 24 are the values of those parameters. More than one
parameter=value pair can be embedded in the URL by concatenating them with ampersands (&). Only simple text
data can be sent via the GET method.
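On the server, the query string has to be split back into name/value pairs and URL-decoded. A minimal sketch using `java.net.URI` and `java.net.URLDecoder` (the `Charset` overload of `decode` needs Java 10 or later); `QueryString.parse` is a hypothetical helper that assumes each name appears only once.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryString {
    // Split "name=john&age=24" into name/value pairs, undoing URL encoding ("+" and %XX escapes).
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps the order the parameters appeared in
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // Decode after splitting, so an encoded "%26" inside a value is not mistaken for a separator.
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://www.example.com/action.php?name=john&age=24");
        System.out.println(parse(uri.getRawQuery())); // {name=john, age=24}
    }
}
```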
Advantages and Disadvantages of Using the GET Method
Since the data sent by the GET method are displayed in the URL, it is possible to bookmark the page with specific query
string values.
The GET method is not suitable for passing sensitive information such as the username and password, because these are
fully visible in the URL query string as well as potentially stored in the client browser's memory as a visited page.
Because the GET data travels in the URL, and URL length is limited (a CGI server, for example, passes the query string to
the script in an environment variable), there is a limit on the total data that can be sent.
The POST Method
In the POST method the data is sent to the processing script in the body of the HTTP request, separately from the URL.
Data sent through the POST method is not visible in the URL.
Advantages and Disadvantages of Using the POST Method
It is more secure than GET because user-entered information is never visible in the URL query string or in the server logs, although the data itself is still sent as plain text unless HTTPS is used.
There is a much larger limit on the amount of data that can be passed and one can send text data as well as binary data
(uploading a file) using POST.
Since the data sent by the POST method is not visible in the URL, it is not possible to bookmark the page with a specific
query.
Redirecting requests in multi-tier applications
The View is responsible for displaying information to the user, as well as any user-interface controls that allow the user
to interact with the application. In a web application, the View therefore consists of HTML, either static or dynamically
generated that is sent from the webserver in an HTTP response message. The View may request data from the Model in
order to display the application's status to the user, but this data is always requested by the View and not spontaneously
sent by the Model (a "pull" instead of a "push"). However, unlike our previous interpretation of the Presentation tier,
the View does not handle any input from the user directly; instead, this is passed to the Controller.
The Controller accepts all input messages from the user. In a web application, these consist of GET and POST HTTP
requests. Based on the nature of the request, the Controller executes functionality in the Model, and based upon the
results, redirects the user to the appropriate View. The Controller takes on responsibilities of both the Presentation and
Business Logic tiers, but really acts as a bridge between the two by translating the user's manipulation of the controls
into the execution of functions exposed by the Model.
Finally, the Model incorporates the bulk of the Business Logic and Data tier responsibilities. This does not mean that the
two layers are commingled (they remain logically distinct), but they both act in concert to supply the services requested
by the Controller. The Model stores the current state of the application (Data tier) and exposes the functions that may
change that state (Business Logic and Data tiers). The View may call upon the Model to retrieve that state, but only the
Controller may execute functions to change the state.
JavaServer Pages
JavaServer Pages (JSP) is a technology for developing web pages that support dynamic content. It helps developers
insert Java code in HTML pages by making use of special JSP tags, most of which start with <% and end with %>.
A JavaServer Pages component is a type of Java servlet that is designed to fulfill the role of a user interface for a Java
web application. Web developers write JSPs as text files that combine HTML or XHTML code, XML elements, and
embedded JSP actions and commands.
Using JSP, you can collect input from users through web page forms, present records from a database or another source,
and create web pages dynamically.
JSP tags can be used for a variety of purposes, such as retrieving information from a database or registering user
preferences, accessing JavaBeans components, passing control between pages and sharing information between
requests, pages etc.
Advantages of JSP:
Following is the list of other advantages of using JSP over other technologies:
Active Server Pages (ASP): The advantages of JSP are twofold. First, the dynamic part is written in Java, not
Visual Basic or another MS-specific language, so it is more powerful and easier to use. Second, it is portable to
other operating systems and non-Microsoft Web servers.
Pure Servlets: It is more convenient to write (and to modify!) regular HTML than to have plenty of println
statements that generate the HTML.
Server-Side Includes (SSI): SSI is really only intended for simple inclusions, not for "real" programs that use form
data, make database connections, and the like.
JavaScript: JavaScript can generate HTML dynamically on the client but can hardly interact with the web server
to perform complex tasks like database access and image processing etc.
Static HTML: Regular HTML, of course, cannot contain dynamic information.
JavaBeans: A JavaBean is a specially constructed Java class written in Java and coded according to the JavaBeans API
specifications.
Following are the unique characteristics that distinguish a JavaBean from other Java classes:
It provides a default, no-argument constructor.
It should be serializable and implement the Serializable interface.
It may have a number of properties which can be read or written.
It may have a number of "getter" and "setter" methods for the properties.
JavaBeans Properties:
A JavaBean property is a named attribute that can be accessed by the user of the object. The attribute can be of any Java
data type, including classes that you define.
A JavaBean property may be read/write, read-only, or write-only. JavaBean properties are accessed through two
methods in the JavaBean's implementation class:
getPropertyName(): For example, if the property name is firstName, your method name would be getFirstName() to read
that property. This method is called an accessor.
setPropertyName(): For example, if the property name is firstName, your method name would be setFirstName() to write
that property. This method is called a mutator.
A read-only attribute will have only a getPropertyName() method, and a write-only attribute will have only a
setPropertyName() method.
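Tools discover a bean's properties purely from these naming conventions. The standard `java.beans.Introspector` class (in the `java.desktop` module) shows this; the nested `Person` bean below is a hypothetical example with one read/write property and one read-only property, and `describe` is a hypothetical helper.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.Map;
import java.util.TreeMap;

class BeanProperties {
    // Hypothetical bean: firstName has a getter and a setter, id has only a getter.
    public static class Person implements Serializable {
        private String firstName;
        private int id = 42;

        public Person() {}

        public String getFirstName() { return firstName; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public int getId() { return id; }
    }

    // Map each property name to "read/write", "read-only" or "write-only", based only on method names.
    static Map<String, String> describe(Class<?> beanClass) throws IntrospectionException {
        // Stop at Object so the inherited getClass() is not reported as a "class" property.
        BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
        Map<String, String> result = new TreeMap<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            boolean canRead = pd.getReadMethod() != null;
            boolean canWrite = pd.getWriteMethod() != null;
            result.put(pd.getName(), canRead && canWrite ? "read/write" : canRead ? "read-only" : "write-only");
        }
        return result;
    }

    public static void main(String[] args) throws IntrospectionException {
        System.out.println(describe(Person.class)); // {firstName=read/write, id=read-only}
    }
}
```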
Java Bean
A Java Bean is a Java class that should follow these conventions:
It should have a no-arg constructor.
It should be Serializable.
It should provide methods to set and get the values of the properties, known as getter and setter methods.
Use of Java Bean
According to the Java white paper, it is a reusable software component. A bean encapsulates many objects into one object,
so we can access this object from multiple places. Moreover, it makes maintenance easier.
Simple example of java bean class
//Employee.java
package mypack;

// A simple JavaBean: serializable, public no-arg constructor, private fields with getters and setters
public class Employee implements java.io.Serializable {
    private int id;
    private String name;

    public Employee() {}

    public void setId(int id) { this.id = id; }
    public int getId() { return id; }

    public void setName(String name) { this.name = name; }
    public String getName() { return name; }
}
Access the java bean class
To access the Java bean class, use its getter and setter methods.
//Test.java
package mypack;

public class Test {
    public static void main(String[] args) {
        Employee e = new Employee();      // create the bean with its no-arg constructor
        e.setName("Arjun");               // set a property through the setter
        System.out.println(e.getName());  // read it back through the getter; prints Arjun
    }
}
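Because a bean is expected to be Serializable, its state can be written to a byte stream and restored later. The sketch below uses only java.io to round-trip an Employee with the same fields as the example above; the class is repeated as a nested class so the sketch compiles on its own, and the names BeanSerializationDemo and roundTrip are illustrative, not part of any framework.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BeanSerializationDemo {

    // Same shape as mypack.Employee above, repeated so this sketch is self-contained.
    public static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        private int id;
        private String name;

        public Employee() {}
        public void setId(int id) { this.id = id; }
        public int getId() { return id; }
        public void setName(String name) { this.name = name; }
        public String getName() { return name; }
    }

    // Serializes the bean to a byte array, then deserializes a new copy from it.
    public static Employee roundTrip(Employee e) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(e);
            } // closing the stream flushes everything into 'bytes'
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Employee) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException("serialization failed", ex);
        }
    }

    public static void main(String[] args) {
        Employee e = new Employee();
        e.setId(101);
        e.setName("Arjun");
        Employee copy = roundTrip(e);
        System.out.println(copy.getId() + " " + copy.getName()); // prints 101 Arjun
        System.out.println(copy != e);                           // prints true: a distinct object
    }
}
```

The explicit serialVersionUID is a common choice so that recompiling the class does not silently change its serialized form.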
Connecting to Excel Spreadsheets Through ODBC

Example
A company stores its employee data in an Excel file called employees.xls. This file contains two worksheets:
employee_details and job_history. You must load the data from the employee_details worksheet into a target table in
Oracle Warehouse Builder.
Solution
To load data stored in an Excel file into a target table, you must first use the Excel file as a source. Oracle Warehouse
Builder enables you to connect to data stored in a non-Oracle source, such as Microsoft Excel, using "Oracle Database
Heterogeneous Services".
Case Study
This case study shows you how to use an Excel file called employees.xls as a source in Oracle Warehouse Builder.
Step 1: Install ODBC Driver for Excel
To read data from Microsoft Excel, you need the ODBC driver for Excel. By default, the ODBC driver for Excel is installed
on a Windows system.
Step 2: Delimit the Data in the Excel File (Optional)
If you want to delimit the data to be imported from the Excel file, then define a name for the range of data being
sourced:
In the employee_details worksheet, highlight the range to query from Oracle.
The range should include the column names and the data. Ensure that the column names conform to the rules
for naming columns in the Oracle Database.
From the Insert menu, select Name and then Define. The Define Name dialog box is displayed. Specify a name
for the range.
Step 3: Create a System DSN
Set up a System Data Source Name (DSN) using the Microsoft ODBC Administrator.
Select Start, Settings, Control Panel, Administrative Tools, Data Sources (ODBC).
This opens the ODBC Data Source Administrator dialog box.
Navigate to the System DSN tab and click Add to open the Create New Data Source dialog box.
Select Microsoft Excel Driver as the driver for which you want to set up the data source.
Click Finish to open the ODBC Microsoft Excel Setup dialog box.
The ODBC Microsoft Excel Setup dialog box is shown in Figure

Figure: ODBC Microsoft Excel Setup Dialog Box
Specify a name for the data source. For example, odbc_excel.
Click Select Workbook to select the Excel file from which you want to extract data.
Verify that the Version field lists the correct version of the source Excel file.
Step 4: Create the Heterogeneous Services Initialization File
To configure the agent, you must set the initialization parameters in the Heterogeneous Services initialization file. Each
agent has its own Heterogeneous Services initialization file. The name of the Heterogeneous Services initialization file is
initSID.ora, where SID is the Oracle system identifier used for the agent. This file is located in the OWB_HOME\hs\admin
directory.
Create the initexcelsid.ora file in the OWB_HOME\hs\admin directory as follows:
HS_FDS_CONNECT_INFO = odbc_excel
HS_AUTOREGISTER = TRUE
HS_DB_NAME = dg4odbc
Here, odbc_excel is the name of the system DSN you created in Step 3. excelsid is the name of the Oracle system
identifier used for the agent.
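If you prefer to script this step, the file can also be generated. A minimal sketch, assuming a Java program is acceptable in your environment: the class name HsInitFileWriter is hypothetical, the parameter values are copied from the example above, and the target directory is passed in rather than hard-coded because the OWB_HOME location varies between installations.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class HsInitFileWriter {

    // Lines of the initialization file; dsnName is the System DSN created in Step 3.
    public static List<String> initLines(String dsnName) {
        return Arrays.asList(
                "HS_FDS_CONNECT_INFO = " + dsnName,
                "HS_AUTOREGISTER = TRUE",
                "HS_DB_NAME = dg4odbc");
    }

    // Writes init<sid>.ora into hsAdminDir (overwriting any existing file) and returns its path.
    public static Path writeInitFile(Path hsAdminDir, String sid, String dsnName) {
        Path file = hsAdminDir.resolve("init" + sid + ".ora");
        try {
            Files.write(file, initLines(dsnName), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    public static void main(String[] args) {
        // Pass your OWB_HOME\hs\admin directory as the first argument; "." is only a placeholder.
        Path hsAdmin = Paths.get(args.length > 0 ? args[0] : ".");
        Path written = writeInitFile(hsAdmin, "excelsid", "odbc_excel");
        System.out.println("Wrote " + written);
    }
}
```

The SID is used only to build the file name, which keeps the rule "the file is named initSID.ora" in a single place.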
Step 5: Modify the listener.ora file
Set up the listener on the agent to listen for incoming requests from the Oracle Database. When a request is received,
the listener spawns a Heterogeneous Services agent. To set up the listener, modify the entries in the listener.ora file
located in the OWB_HOME\network\admin directory as follows:
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = excelsid)
      (ORACLE_HOME = C:\oracle11g\product\11.2.0\db_1)
      (PROGRAM = dg4odbc)
    )
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = C:\oracle11g\product\11.2.0\db_1)
      (PROGRAM = extproc)
    )
  )
For the SID_NAME parameter, use the SID that you specified when creating the Heterogeneous Services initialization
parameter file, which, in this case, is excelsid.
Ensure that the ORACLE_HOME parameter value is the path to your Oracle Database home directory.
The value associated with the PROGRAM keyword defines the name of the agent executable.
Remember to restart the listener after making these modifications.
Note:
Ensure that the initialization parameter GLOBAL_NAMES is set to FALSE in the database's initialization parameter file.
FALSE is the default setting for this parameter.
Step 6: Create an ODBC Source Module
Use the following steps to create an ODBC source module:
From the Projects Navigator, create an ODBC source module.
ODBC is listed under the Databases node. See "Creating an ODBC Module".
To provide connection information, on the Connection Information page, click Edit to open the Edit Non-Oracle
Location dialog box and provide the following details:
Ensure that the service name you provide equals the SID_NAME you specified in the listener.ora file.
Enter the host name and the port number in the Host and Port fields respectively.
Because you are not connecting to an Oracle database, you can provide dummy values for user name and password. The
fields cannot be empty.
The Schema field can be left empty because you are not importing metadata from a schema.
Click Test Connection to verify the connection details.
Step 7: Import Metadata from Excel Using the Metadata Import Wizard
Use the Metadata Import Wizard to import metadata from the Excel file into Oracle Warehouse Builder. Select Tables as
the Filter condition. The wizard displays all the worksheets in the source Excel file under the Tables node in the list of
available objects.
Select employee_details and use the right arrow to move it to the list of selected objects.
Click Finish to import the metadata.
The employee_details worksheet is now available as a table called employee_details in the ODBC source
module.
Step 8: Create a Mapping to Load Data Into the Target Table
Create a mapping in the module that contains the target table. Use the employee_details table imported in the previous
step as the source and map it to the target table.
Figure displays the mapping used to load data into the target table.
Step 9: Deploy the Mapping
Use the Control Center Manager or Design Center to deploy the mapping you created in Step 8. Ensure that you
deploy the source module before you deploy the mapping. See the Oracle Warehouse Builder Data Modeling, ETL, and
Data Quality Guide for more information about mappings.
Troubleshooting
This section lists some errors that you may encounter while providing the connection information.
Error
ORA-28546: connection initialization failed, probable Net8 admin error
ORA-28511: lost RPC connection to heterogeneous remote agent using
SID=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(Host=localhost)(PORT=1521)))(CONNECT_DATA=(SID=oracledb)))
ORA-02063: preceding 2 lines from OWB###
Probable Cause
Providing the same SID name as that of your database.
Action
Provide an SID name different from the SID name of your database.
Error
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Generic Connectivity Using ODBC][H006] The init parameter <HS_FDS_CONNECT_INFO>
is not set. Please set it in init<orasid>.ora file.
Probable Cause
Name mismatch between SID name provided in the listener.ora file and the name of the initSID.ora file in
OWB_HOME\hs\admin.
Action
Ensure that the name of the initSID.ora file and the value provided for the SID_NAME parameter in the listener.ora file
are the same.
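Both errors above come from mismatched names, so they can be caught before restarting the listener. A rough sketch, assuming you can read listener.ora as text and know your database SID: it extracts each SID_NAME with a regular expression (not a full listener.ora parser), then reports a missing init<SID>.ora file and an agent SID that equals the database SID. HsConfigCheck, sidNames, and problems are hypothetical names; the sample values come from the Excel example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HsConfigCheck {

    // Matches entries such as "(SID_NAME = excelsid)"; spacing around '=' is optional.
    private static final Pattern SID_NAME = Pattern.compile(
            "\\(\\s*SID_NAME\\s*=\\s*([^)\\s]+)\\s*\\)", Pattern.CASE_INSENSITIVE);

    // Returns every SID_NAME value found in the text of a listener.ora file, in order.
    public static List<String> sidNames(String listenerOra) {
        List<String> names = new ArrayList<>();
        Matcher m = SID_NAME.matcher(listenerOra);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Lists the problems found for one Heterogeneous Services SID.
    public static List<String> problems(String sid, Path hsAdminDir, String databaseSid) {
        List<String> out = new ArrayList<>();
        // ORA-28500 (HS_FDS_CONNECT_INFO not set): no init<SID>.ora matching SID_NAME.
        if (!Files.exists(hsAdminDir.resolve("init" + sid + ".ora"))) {
            out.add("missing init" + sid + ".ora in " + hsAdminDir);
        }
        // ORA-28546 / ORA-28511: the agent SID is the same as the database SID.
        if (sid.equalsIgnoreCase(databaseSid)) {
            out.add("SID " + sid + " is the same as the database SID");
        }
        return out;
    }

    public static void main(String[] args) {
        String listenerOra = "(SID_DESC = (SID_NAME = excelsid) (PROGRAM = dg4odbc))\n"
                + "(SID_DESC = (SID_NAME = PLSExtProc) (PROGRAM = extproc))";
        // Placeholder directory and database SID; substitute your own values.
        Path hsAdmin = Paths.get(args.length > 0 ? args[0] : ".");
        String databaseSid = "oracledb";
        for (String sid : sidNames(listenerOra)) {
            if (sid.equalsIgnoreCase("PLSExtProc")) {
                continue; // external-procedure entry, not an HS agent, so it has no init file
            }
            System.out.println(sid + ": " + problems(sid, hsAdmin, databaseSid));
        }
    }
}
```

An empty list printed for a SID means neither mismatch was found; it does not prove the DSN or credentials are correct.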
Tip:
Ensure that you restart the listener service whenever you make changes to the listener.ora file.
Connecting to SQL Server Database Through ODBC
Scenario
Your company has data that is stored in SQL Server and you would like to import this into Oracle Warehouse Builder.
Once you import the data, you can perform data profiling to correct anomalies, and then transform the data according
to your requirements by using mappings.
Solution
One of the ways to connect to an SQL Server database from Oracle Warehouse Builder is to use an ODBC gateway. Once
connected, you can import metadata and load data.
Case Study
To connect to SQL Server and import metadata, refer to the following sections:
"Creating an ODBC Data Source"
"Configuring the Oracle Database Server"
"Adding the SQL Server as a Source in Oracle Warehouse Builder"
If you encounter problems implementing this solution, see "Troubleshooting".
Creating an ODBC Data Source
You must create an ODBC data source to connect to the SQL Server database using ODBC. You must set up a System
Data Source Name (DSN):
Select Start, Control Panel, Administrative Tools, Data Sources (ODBC).
This opens the ODBC Data Source Administrator dialog box.
Navigate to the System DSN tab and click Add to open the Create New Data Source dialog box.
Select SQL Server as the driver for which you want to set up the data source.
Click Finish to open the Create A New Data Source to SQL Server Wizard.
In the Name field, specify a name for the data source. For example, sqlsource.
In the Server field, select the server to which you want to connect and click Next.
Specify whether the authentication should be done at the Operating System level or at the server level. Click
Next.
Select the database file and click Next.
Accept the default values in the next screen and click Finish.
Test the data source to verify the connection.
Configuring the Oracle Database Server
Next, you must configure the Oracle database to connect to the SQL Server database. Oracle Warehouse Builder can
then use this configuration to extract metadata from the SQL Server database. This involves the following steps:
"Creating a Heterogeneous Service Configuration File"
"Editing the listener.ora file"
Creating a Heterogeneous Service Configuration File
You must create the Heterogeneous Services initialization file in the OWB_HOME\hs\admin directory. The file name must
follow these conventions:
Must begin with init
Must end with the extension .ora
Must not contain spaces or special characters
For example, you can name the file initsqlserver.ora.
Enter the following in the file:
HS_FDS_CONNECT_INFO = sqlsource
HS_FDS_TRACE_LEVEL = 0
Here, sqlsource is the name of the data source that you specified while creating the ODBC data source.
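The naming rules above, together with the rule given in the listener.ora step that SID_NAME is the file name without the init prefix, can be expressed as a small check. A sketch only: the notes do not define "special characters" precisely, so this version assumes that only ASCII letters and digits are allowed between init and .ora; HsInitFileName and sidFor are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HsInitFileName {

    // Assumption: "special characters" means anything other than ASCII letters and digits.
    private static final Pattern VALID = Pattern.compile("init([A-Za-z0-9]+)\\.ora");

    // Returns the SID to use as SID_NAME in listener.ora (the file name without the
    // "init" prefix and ".ora" extension), or throws if the name breaks the rules.
    public static String sidFor(String fileName) {
        Matcher m = VALID.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("invalid HS init file name: " + fileName);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(sidFor("initsqlserver.ora")); // prints sqlserver
        System.out.println(sidFor("initexcelsid.ora"));  // prints excelsid
    }
}
```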
Editing the listener.ora file
You must add a new SID description in the listener.ora file. This file is stored in the OWB_HOME\network\admin
directory.
Modify the file as shown:
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = sqlserver)
      (ORACLE_HOME = c:\oracle10g\owb_home)
      (PROGRAM = dg4odbc)
    )
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = c:\oracle10g\owb_home)
      (PROGRAM = extproc)
    )
  )
The SID_NAME parameter must contain the name of the configuration file you created in the previous step, without the
init prefix and the .ora extension. For example, if the configuration file you created in the previous step was
initsqlserver.ora, then the value of the SID_NAME parameter should be sqlserver.
ORACLE_HOME must point to the Oracle home location of your database installation.
The value associated with the PROGRAM keyword defines the name of the agent executable, which, in this case, is
dg4odbc.
Restart the listener service after making these modifications.
Adding the SQL Server as a Source in Oracle Warehouse Builder
The final step involves adding an ODBC module in Oracle Warehouse Builder and importing the data from SQL
Server into this module.
To add an ODBC source module in Oracle Warehouse Builder:
Within a project in the Projects Navigator, navigate to the Databases node.
Right-click ODBC and select New ODBC Module.
Create a new ODBC module using the Create Module Wizard.
Use the Connection Information page to provide the location details. To create a new location, click Edit to open
the Edit Non-Oracle Location dialog box.
In the Edit Non-Oracle Location dialog box, ensure that you enter user name and password within double quotation marks
("). For example, if the user name is matt, then enter "matt".
For Service Name, enter the SID name you provided in the listener.ora file. Also select the schema from which
you want to import the metadata.
Click Test Connection to verify the connection details.
To import metadata into the ODBC module:
Right-click the module and select Import.
Import the metadata using the Import Metadata Wizard.
The tables and views available for import depend on the schema you selected when providing the connection
information.
Troubleshooting
Some of the errors that you may encounter while providing the connection information are listed here:
Error
[Generic Connectivity Using ODBC][Microsoft][ODBC Driver Manager] Data source name
not found and no default driver specified (SQL State: IM002; SQL Code: 0)
ORA-02063: preceding 2 lines from OWB_###
Probable Cause
Creating the DSN from the User DSN tab.
Action
Create the DSN from the System DSN tab.
Error
[Generic Connectivity Using ODBC][Microsoft][ODBC SQL Server Driver][SQL Server]Login failed for user 'SA'.
(SQL State: 28000; SQL Code: 18456)
ORA-02063: preceding 2 lines from OWB_###
Probable Cause
The user name and password in the Edit Location dialog box are not enclosed within double quotation marks.
Action
Enter the user name and password within double quotation marks.
Tip:
Ensure that you restart the listener service whenever you make changes to the listener.ora file.
