Powered By Surya
Surya 1
ActiveNET ™ Always first in Race
Introduction:
XML was developed by an XML Working Group formed in 1996
under the auspices of the World Wide Web Consortium (W3C),
chaired by Jon Bosak of Sun Microsystems
Standards:
SGML (ISO 8879:1986) - by construction, XML documents are conforming SGML documents
Unicode and ISO/IEC 10646 - these specifications define the encodings and meanings of the characters
IETF RFC 1738 and RFC 1808 - these define the syntax and semantics of Uniform Resource Locators (URLs)
XML is a subset of SGML
The two best-known markup languages derived from SGML are HTML and XML
XML is used to represent data independently of its presentation (this is a little abstract; the next sections make it concrete).
If the same data has to be presented on different systems, can HTML do it? Certainly not. Each kind of device needs its own markup language: mobile phones support WML (Wireless Markup Language), PDAs (Personal Digital Assistants) require PDAML, and browsers support HTML. So we need a neutral language that holds the data and can be presented on all of these systems.
Instead of embedding data in a static presentation format as HTML does, XML stores the data in a structured format, so that, depending on the device in use, the structured data can be converted into the required presentation format by translators.
XML has many more advantages, which we will discuss in later sections.
XML standards are developed by the W3C (World Wide Web Consortium), a non-profit consortium that sets standards for markup languages. W3C members are professionals and consultants from various organizations.
XML Objects:
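Different objects used in an XML document are:
Node - The general name used to refer to all types of objects in an XML document
Element - A tag enclosed in < > (angle brackets), e.g. <EMP> ... </EMP>
Attribute - A name="value" parameter placed in an element's start tag
Text - Data placed between the start and end tags of an element
PI (Processing Instruction) - <? ... ?> instructions for applications; the XML declaration <?xml version="1.0" encoding="UTF-8"?>, which carries the XML version, encoding, etc., is written the same way
DOCTYPE (Document Type) - Contains the path of the DTD
Comment - <!-- ... -->, used for notes or to comment out unused data
CDATA (Character DATA section) - <![CDATA[ ... ]]>, text that the parser passes through as plain character data without interpreting markup characters such as < and &
Entity - General entities (&name;) and parameter entities (%name;, used inside DTDs)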
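To see these objects the way a parser sees them, here is a minimal sketch using the standard JAXP DOM parser that ships with the JDK (not one of the vendor parsers used later in these notes). The sample document, the entity name co and the class name NodeTypesDemo are made up for illustration. It walks the tree and counts each kind of node; note that the <?xml ...?> declaration itself is not reported as a PI node while <?xml-stylesheet ...?> is, and that by default the parser expands &co; into plain text.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeTypesDemo {
    // Hypothetical sample document that uses most of the objects listed above
    public static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<?xml-stylesheet type=\"text/xsl\" href=\"emps.xsl\"?>\n"
        + "<!DOCTYPE EMPS [ <!ENTITY co \"ActiveNET\"> ]>\n"
        + "<EMPS><!-- first employee -->"
        + "<EMP id=\"E1\"><ENAME>Surya</ENAME>"
        + "<NOTE><![CDATA[salary < 5000 & bonus]]></NOTE>"
        + "<COMPANY>&co;</COMPANY></EMP></EMPS>";

    // Friendly name for each DOM node type we expect to meet
    static String typeName(short type) {
        switch (type) {
            case Node.ELEMENT_NODE: return "Element";
            case Node.ATTRIBUTE_NODE: return "Attribute";
            case Node.TEXT_NODE: return "Text";
            case Node.CDATA_SECTION_NODE: return "CDATA";
            case Node.COMMENT_NODE: return "Comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "PI";
            case Node.DOCUMENT_TYPE_NODE: return "DOCTYPE";
            case Node.ENTITY_REFERENCE_NODE: return "EntityRef";
            default: return "Other";
        }
    }

    // Depth-first walk: count this node, the attributes of an element, then the children
    static void count(Node n, Map<String, Integer> counts) {
        counts.merge(typeName(n.getNodeType()), 1, Integer::sum);
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                counts.merge(typeName(attrs.item(i).getNodeType()), 1, Integer::sum);
            }
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            count(c, counts);
        }
    }

    public static Map<String, Integer> countNodeTypes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, Integer> counts = new TreeMap<>();
        for (Node c = doc.getFirstChild(); c != null; c = c.getNextSibling()) {
            count(c, counts);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countNodeTypes(XML));
    }
}
```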
Even though an XML document is structured, it is still a plain text document.
Reading content from an XML document therefore needs extra effort, which is provided by XML
parsers.
XML Parsers:
An XML parser is a tool which reads, understands and processes the content of an XML document. More
clarity will be given in the coming sections.
First of all, decide whether your requirement is only to read the XML document or also to update it.
Kinds of Parsers:
Validating parser
-Verifies both well-formedness and validity
-That means it also compares the structure used in the XML document with the structure defined
in document structure definition files like DTD/Schema
What is well-formedness?
Well-formedness is the set of rules that every XML document must satisfy. The rules are as follows:
-Document should begin with the XML declaration, written like a PI (<?xml version="1.0"?>); it is optional, but when present it must be the very first thing in the document
-Document must have exactly one root element that encloses all other elements (<EMPS> </EMPS>)
-Every element must have both a start tag and an end tag (<EMP> </EMP>), or use the empty-element form (<EMP/>)
-Attribute values must be placed in quotes (' or ") (<DESIG cadre="01">)
-Elements must be properly nested
(
<DESIG>
<NAME>Director</NAME>
<SENIORITY>1</SENIORITY>
</DESIG>
)
-Element names are case sensitive (start and end tags must use the same case: <EMP> ... </emp> is an error)
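The rules above can be checked with a non-validating parser. This is a minimal sketch assuming the JDK's built-in JAXP DocumentBuilder; the class and method names (WellFormedCheck, check, isWellFormed) are hypothetical. Each bad sample breaks exactly one rule, and the parser rejects it with a fatal error.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {
    /** Returns null when the text is well-formed XML, otherwise the parser's error message. */
    public static String check(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false); // well-formedness only; no DTD/Schema checking
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Report problems by throwing instead of printing "[Fatal Error]" to stderr
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            db.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static boolean isWellFormed(String xml) throws Exception {
        return check(xml) == null;
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {
            "<EMPS><EMP>1</EMP></EMPS>",             // no XML declaration: still well-formed
            "<EMPS><EMP>1</emp></EMPS>",             // end tag uses a different case
            "<EMPS/><EMPS/>",                        // two root elements
            "<DESIG cadre=01/>",                     // attribute value not quoted
            "<DESIG><NAME>Director</DESIG></NAME>"   // improper nesting
        };
        for (String s : samples) {
            String err = check(s);
            System.out.println(s + " -> " + (err == null ? "well-formed" : "ERROR: " + err));
        }
    }
}
```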
What is validity?
-Verifying that the structure used in the XML document matches the structure defined in the DTD/Schema
-The DTD is referenced at the top of the XML document, right after the XML declaration (PI); a Schema is referenced from the root element
For a DTD, the declaration is as follows:
<!DOCTYPE root_ele IDENTIFIER 'FPI' 'dtd_uri'>
There are two IDENTIFIERs: i) PUBLIC ii) SYSTEM
If the PUBLIC identifier is used, the FPI (Formal Public Identifier) must be included before the dtd_uri; with SYSTEM only the dtd_uri is given
Ex:
<!DOCTYPE EMPS SYSTEM 'emps.dtd'>
<!DOCTYPE EMPS PUBLIC '-//Indian Government//DTD Income Tax//EN' 'emps.dtd'>
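For a Schema without a target namespace, the root element refers to it as follows:
<EMPS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="EMPS.xsd">
(If the schema has a target namespace, use xsi:schemaLocation="namespace_uri EMPS.xsd" instead.)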
[Figure: DTD, Schema, XSL, XSL Transformer, WML, PDAML]
DTDs:
Brief look at DTDs:
-Element declaration: <!ELEMENT ele_name (sub_ele1, sub_ele2, sub_ele3)>
Ex: emps.dtd
<!ELEMENT EMPS (EMP*)>
<!ELEMENT EMP (EMPNO, ENAME, SAL, DESIG, ADDRESS, (PHONE|MOBILE), EMAIL)>
<!ELEMENT EMPNO (#PCDATA)>
<!ELEMENT ENAME (#PCDATA)>
<!ELEMENT SAL (#PCDATA)>
<!ELEMENT DESIG (#PCDATA)>
<!ELEMENT ADDRESS (#PCDATA)>
<!ELEMENT PHONE (#PCDATA)>
<!ELEMENT MOBILE (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
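In the above DTD some characters are used as suffixes in the element declarations. They
are:
No sign - exactly one occurrence (1)
? - Zero or one occurrence (0/1)
* - Zero or more occurrences (>=0)
+ - One or more occurrences (>=1)
Inside the parentheses, a comma (,) means the sub elements must appear in that sequence, and a bar (|) means a choice: exactly one of the alternatives, as in (PHONE|MOBILE).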
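As a sketch of how a validating parser applies these occurrence indicators, the following uses JAXP with setValidating(true). Assumptions: the DTD is a cut-down emps.dtd (SAL, DESIG and ADDRESS dropped) placed in an internal subset of the DOCTYPE so that no file is needed, and the names DtdValidation, validate and emp are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdValidation {
    // Cut-down emps.dtd, placed in an internal subset so the example needs no file
    static final String DOCTYPE =
          "<!DOCTYPE EMPS [\n"
        + "<!ELEMENT EMPS (EMP*)>\n"
        + "<!ELEMENT EMP (EMPNO, ENAME, (PHONE|MOBILE), EMAIL)>\n"
        + "<!ELEMENT EMPNO (#PCDATA)>\n"
        + "<!ELEMENT ENAME (#PCDATA)>\n"
        + "<!ELEMENT PHONE (#PCDATA)>\n"
        + "<!ELEMENT MOBILE (#PCDATA)>\n"
        + "<!ELEMENT EMAIL (#PCDATA)>\n"
        + "]>\n";

    /** Returns null if the document is valid against the DTD above, else the first error. */
    public static String validate(String body) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true); // validating parser: well-formedness + validity
        DocumentBuilder db = dbf.newDocumentBuilder();
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }      // not valid
            public void fatalError(SAXParseException e) throws SAXException { throw e; } // not well-formed
        });
        try {
            db.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    // Builds one <EMP>; the caller supplies the PHONE/MOBILE part
    static String emp(String contact) {
        return "<EMP><EMPNO>1</EMPNO><ENAME>Surya</ENAME>" + contact
             + "<EMAIL>surya@example.com</EMAIL></EMP>";
    }

    public static void main(String[] args) throws Exception {
        // EMP* allows an empty EMPS
        System.out.println(validate("<EMPS/>"));
        // exactly one of PHONE or MOBILE
        System.out.println(validate("<EMPS>" + emp("<PHONE>123</PHONE>") + "</EMPS>"));
        // both PHONE and MOBILE: rejected by the choice (PHONE|MOBILE)
        System.out.println(validate("<EMPS>" + emp("<PHONE>1</PHONE><MOBILE>2</MOBILE>") + "</EMPS>"));
    }
}
```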
Attribute declaration:
syntax:
<!ATTLIST ele_name attr1_name attr1_type attr1_constraint>
Some of the examples are:
<!ATTLIST EMP id ID #REQUIRED>
<!ATTLIST EMP last_name CDATA #IMPLIED>
<!ATTLIST temperature units (Celsius|Fahrenheit) #IMPLIED>
<!ATTLIST account MIN_BAL CDATA #FIXED "1000.00">
attr_types:
ID - The attribute value must be unique within the document, and it must be a valid XML name (it cannot start with a digit)
Ex: <EMP id="E1">
<!ATTLIST EMP id ID #REQUIRED>
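IDREF - The value must match the ID of some element in the same document
IDREFS - A whitespace-separated list of IDREF values
CDATA - Any character data (plain text)
NMTOKEN - A single name token made of name characters (letters, digits, '.', '-', '_', ':'), no spaces
NMTOKENS - A whitespace-separated list of name tokens
attr_constraints:
#IMPLIED - Optional; no default value
#REQUIRED - Mandatory; every element must specify it
#FIXED "value" - The value of this type of attribute is FIXED/CONSTANT/FINAL; if it is omitted, the parser supplies it
"value" (a plain default) - Used by the parser when the attribute is omitted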
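A hedged sketch of these attribute types and constraints in action, again with a JAXP validating parser. The bank/account document type, the type attribute and the class name AttlistDemo are hypothetical; MIN_BAL and its #FIXED value come from the example above. Because id is declared with type ID, the DOM's getElementById can find elements by it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class AttlistDemo {
    // Hypothetical <bank> document type using the attribute declarations shown above
    static final String DOCTYPE =
          "<!DOCTYPE bank [\n"
        + "<!ELEMENT bank (account*)>\n"
        + "<!ELEMENT account EMPTY>\n"
        + "<!ATTLIST account id ID #REQUIRED>\n"
        + "<!ATTLIST account type (SAVINGS|CURRENT) #IMPLIED>\n"
        + "<!ATTLIST account MIN_BAL CDATA #FIXED \"1000.00\">\n"
        + "]>\n";

    /** Parses with a validating parser; any validity error is thrown as a SAXException. */
    public static Document parse(String body) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        return db.parse(new InputSource(new StringReader(DOCTYPE + body)));
    }

    public static boolean isValid(String body) throws Exception {
        try {
            parse(body);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<bank><account id=\"A1\"/><account id=\"A2\" type=\"SAVINGS\"/></bank>");
        // MIN_BAL is not written in the document; the parser supplies the #FIXED value
        System.out.println(doc.getElementById("A2").getAttribute("MIN_BAL"));
        System.out.println(isValid("<bank><account id=\"A1\"/><account id=\"A1\"/></bank>")); // duplicate ID
        System.out.println(isValid("<bank><account id=\"1\"/></bank>"));                     // ID must be a name
        System.out.println(isValid("<bank><account id=\"A1\" MIN_BAL=\"500\"/></bank>"));    // #FIXED mismatch
    }
}
```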
Schemas:
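Schema (XML Schema, usually a .xsd file) is another way of defining the structure of an XML document, in place of a DTD. Unlike a DTD, a schema is itself written in XML and supports data types such as xsd:date and xsd:decimal.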
<name>Robert Smith</name>
<street>8 Oak Avenue</street>
<city>Old Town</city>
<state>PA</state>
<zip>95819</zip>
</billTo>
<comment>Hurry, my lawn is going wild!</comment>
<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item>
<item partNum="926-AA">
<productName>Baby Monitor</productName>
<quantity>1</quantity>
<USPrice>39.98</USPrice>
<shipDate>2004-05-21</shipDate>
</item>
</items>
</purchaseOrder>
<!--
po.xsd
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:annotation>
<xsd:documentation xml:lang="en">
Purchase order schema for Example.com.
Copyright 2004 Example.com. All rights reserved.
</xsd:documentation>
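</xsd:annotation>
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
<xsd:element name="comment" type="xsd:string"/>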
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:sequence>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="USAddress">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
</xsd:sequence>
<xsd:attribute name="country" type="xsd:NMTOKEN"
fixed="US"/>
</xsd:complexType>
<xsd:complexType name="Items">
<xsd:sequence>
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="productName" type="xsd:string"/>
<xsd:element name="quantity">
<xsd:simpleType>
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="USPrice" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="partNum" type="SKU" use="required"/>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
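<!-- Stock Keeping Unit, a code for identifying products -->
<xsd:simpleType name="SKU">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>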
The purchase order consists of a main element, purchaseOrder, and the subelements shipTo,
billTo, comment, and items. These subelements (except comment) in turn contain other
subelements, and so on, until a subelement such as USPrice contains a number rather than any
subelements. Elements that contain subelements or carry attributes are said to have complex
types, whereas elements that contain numbers (and strings, and dates, etc.) but do not contain
any subelements are said to have simple types. Some elements have attributes; attributes always
have simple types.
The purchase order schema consists of a schema element and a variety of subelements, most
notably element, complexType, and simpleType which determine the appearance of elements
and their content in instance documents.
Each of the elements in the schema has a prefix xsd: which is associated with the XML Schema
namespace through the declaration, xmlns:xsd="http://www.w3.org/2001/XMLSchema", that
appears in the schema element. The prefix xsd: is used by convention to denote the XML
Schema namespace, although any prefix can be used.
In XML Schema, there is a basic difference between complex types which allow elements in their
content and may carry attributes, and simple types which cannot have element content and
cannot carry attributes.
Defining PurchaseOrderType
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:sequence>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
Occurrence constraints:
Occurrence constraints for Elements and Attributes
Element: minOccurs, maxOccurs, fixed, default
Attribute: use, fixed, default
New simple types are defined by deriving them from existing simple types. We use the
simpleType element to define and name the new simple type. We use the restriction element to
indicate the existing (base) type, and to identify the "facets" that constrain the range of values.
Suppose we wish to create a new type of integer called myInteger whose range of values is
between 10000 and 99999 (inclusive). We base our definition on the built-in simple type integer,
whose range of values also includes integers less than 10000 and greater than 99999. To define
myInteger, we restrict the range of the integer base type by employing two facets called
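minInclusive and maxInclusive:
<xsd:simpleType name="myInteger">
<xsd:restriction base="xsd:integer">
<xsd:minInclusive value="10000"/>
<xsd:maxInclusive value="99999"/>
</xsd:restriction>
</xsd:simpleType>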
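The myInteger restriction can be tried out with the JDK's javax.xml.validation API. This is a sketch assuming a small no-namespace schema that wraps myInteger in a hypothetical code element; the class name MyIntegerFacets is also made up. A value passes only if it is an integer between 10000 and 99999 inclusive.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class MyIntegerFacets {
    // myInteger as described above, plus a hypothetical <code> element that uses it
    static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:simpleType name=\"myInteger\">"
        + "<xsd:restriction base=\"xsd:integer\">"
        + "<xsd:minInclusive value=\"10000\"/>"
        + "<xsd:maxInclusive value=\"99999\"/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "<xsd:element name=\"code\" type=\"myInteger\"/>"
        + "</xsd:schema>";

    // Compile the schema once; a Schema object is immutable and reusable
    static final Schema SCHEMA = compile();

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the value is a valid myInteger; the default error handler throws on invalid content. */
    public static boolean isValid(String value) throws Exception {
        Validator v = SCHEMA.newValidator();
        try {
            v.validate(new StreamSource(new StringReader("<code>" + value + "</code>")));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        for (String s : new String[] {"9999", "10000", "99999", "100000"}) {
            System.out.println(s + " -> " + (isValid(s) ? "valid" : "invalid"));
        }
    }
}
```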
The purchase order schema contains another, more elaborate, example of a simple type
definition. A new simple type called SKU is derived (by restriction) from the simple type string.
Furthermore, we constrain the values of SKU using a facet called pattern in conjunction with the
regular expression "\d{3}-[A-Z]{2}" that is read "three digits followed by a hyphen followed by two
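upper-case ASCII letters":
<xsd:simpleType name="SKU">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:restriction>
</xsd:simpleType>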
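The SKU pattern can also be tried outside a schema with java.util.regex, because this particular expression is written the same way in both notations. Two hedges: an XSD pattern must match the whole value, so the sketch uses matches() rather than find(); and in XSD \d matches any Unicode decimal digit while Java's \d matches only 0-9, so the Java check is slightly stricter. The class name SkuPattern is hypothetical.

```java
import java.util.regex.Pattern;

public class SkuPattern {
    // The same regular expression as the pattern facet of SKU
    private static final Pattern SKU = Pattern.compile("\\d{3}-[A-Z]{2}");

    /** XSD patterns are implicitly anchored, so the whole value must match. */
    public static boolean isSku(String value) {
        return SKU.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"872-AA", "926-AA", "872-aa", "8722-AA", "87-AA"}) {
            System.out.println(s + " -> " + isSku(s));
        }
    }
}
```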
XML Schema defines fifteen facets, which are listed in Appendix B of the W3C XML Schema Primer. Among these, the
enumeration facet is particularly useful and it can be used to constrain the values of almost every
simple type, except the boolean type. The enumeration facet limits a simple type to a set of
distinct values. For example, we can use the enumeration facet to define a new simple type
called USState, derived from string, whose value must be one of the standard US state
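abbreviations:
<xsd:simpleType name="USState">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="AK"/>
<xsd:enumeration value="AL"/>
<xsd:enumeration value="AR"/>
<!-- and so on ... -->
</xsd:restriction>
</xsd:simpleType>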
USState would be a good replacement for the string type currently used in the state element
declaration. By making this replacement, the legal values of a state element, i.e. the state
subelements of billTo and shipTo, would be limited to one of AK, AL, AR, etc. Note that the
enumeration values specified for a particular type must be unique.
List Types:
Creating a List of myInteger Values
<xsd:simpleType name="listOfMyIntType">
<xsd:list itemType="myInteger"/>
</xsd:simpleType>
Several facets can be applied to list types: length, minLength, maxLength, and enumeration. For
example, to define a list of exactly six US states (SixUSStates), we first define a new list type
called USStateList from USState, and then we derive SixUSStates by restricting USStateList to
only six items:
<xsd:simpleType name="SixUSStates">
<xsd:restriction base="USStateList">
<xsd:length value="6"/>
</xsd:restriction>
</xsd:simpleType>
Elements whose type is SixUSStates must have six items, and each of the six items must be one
of the (atomic) values of the enumerated type USState, for example:
<sixStates>PA NY CA NY LA AK</sixStates>
Union Types:
Union Type for Zipcodes
<xsd:simpleType name="zipUnion">
<xsd:union memberTypes="USState listOfMyIntType"/>
</xsd:simpleType>
When we define a union type, the memberTypes attribute value is a list of all the types in the
union.
Now, assuming we have declared an element called zips of type zipUnion, valid instances of the
element are:
<zips>CA</zips>
<zips>95630 95977 95945</zips>
<zips>AK</zips>
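A sketch that puts the list and union definitions above into one schema and validates the sample instances with javax.xml.validation. Assumptions: USState is cut down to seven state codes (AK, AL, AR, CA, LA, NY, PA) so that the examples pass, sixStates and zips are both declared as global elements, and the class name ListUnionTypes is hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ListUnionTypes {
    static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        // USState: a hypothetical seven-state subset of the full enumeration
        + "<xsd:simpleType name=\"USState\"><xsd:restriction base=\"xsd:string\">"
        + "<xsd:enumeration value=\"AK\"/><xsd:enumeration value=\"AL\"/>"
        + "<xsd:enumeration value=\"AR\"/><xsd:enumeration value=\"CA\"/>"
        + "<xsd:enumeration value=\"LA\"/><xsd:enumeration value=\"NY\"/>"
        + "<xsd:enumeration value=\"PA\"/>"
        + "</xsd:restriction></xsd:simpleType>"
        // List type and its length-restricted subtype
        + "<xsd:simpleType name=\"USStateList\"><xsd:list itemType=\"USState\"/></xsd:simpleType>"
        + "<xsd:simpleType name=\"SixUSStates\"><xsd:restriction base=\"USStateList\">"
        + "<xsd:length value=\"6\"/></xsd:restriction></xsd:simpleType>"
        // myInteger (10000..99999), a list of it, and the union of USState and that list
        + "<xsd:simpleType name=\"myInteger\"><xsd:restriction base=\"xsd:integer\">"
        + "<xsd:minInclusive value=\"10000\"/><xsd:maxInclusive value=\"99999\"/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "<xsd:simpleType name=\"listOfMyIntType\"><xsd:list itemType=\"myInteger\"/></xsd:simpleType>"
        + "<xsd:simpleType name=\"zipUnion\">"
        + "<xsd:union memberTypes=\"USState listOfMyIntType\"/></xsd:simpleType>"
        + "<xsd:element name=\"sixStates\" type=\"SixUSStates\"/>"
        + "<xsd:element name=\"zips\" type=\"zipUnion\"/>"
        + "</xsd:schema>";

    static final Schema SCHEMA = compile();

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** Validates a complete instance document such as "<zips>CA</zips>". */
    public static boolean isValid(String instance) throws Exception {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {
            "<sixStates>PA NY CA NY LA AK</sixStates>",
            "<sixStates>PA NY CA</sixStates>",
            "<zips>CA</zips>",
            "<zips>95630 95977 95945</zips>",
            "<zips>AK</zips>",
            "<zips>123</zips>"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (isValid(s) ? "valid" : "invalid"));
        }
    }
}
```

With a union, the value is accepted if any member type accepts it, which is why a single state code and a list of five-digit numbers are both valid zips.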
Mixed Content:
Snippet of Customer Letter
<letterBody>
Empty Content:
Now suppose that we want the internationalPrice element to convey both the unit of currency and
the price as attribute values rather than as separate attribute and content values. For example:
Annotations
XML Schema provides three elements for annotating schemas for the benefit of both human
readers and applications. In the purchase order schema, we put a basic schema description and
copyright information inside the documentation element, which is the recommended location for
human readable material. We recommend you use the xml:lang attribute with any documentation
elements to indicate the language of the information. Alternatively, you may indicate the language
of all information in a schema by placing an xml:lang attribute on the schema element.
The appinfo element, which we did not use in the purchase order schema, can be used to provide
information for tools, stylesheets and other applications.
Both documentation and appinfo appear as subelements of annotation, which may itself appear
at the beginning of most schema constructions. To illustrate, the following example shows
annotation elements appearing at the beginning of an element declaration and a complex type
definition:
<xsd:group name="shipAndBill">
<xsd:sequence>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
</xsd:sequence>
</xsd:group>
An 'All' Group
<xsd:complexType name="PurchaseOrderType">
<xsd:all>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:all>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
Attribute Groups:
Adding Attributes to the Inline Type Definition
<xsd:element name="Item" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="productName" type="xsd:string"/>
<xsd:element name="quantity">
<xsd:simpleType>
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="USPrice" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="partNum" type="SKU" use="required"/>
<!-- add weightKg and shipBy attributes -->
<xsd:attribute name="weightKg" type="xsd:decimal"/>
<xsd:attribute name="shipBy">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="air"/>
<xsd:enumeration value="land"/>
<xsd:enumeration value="any"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:complexType>
</xsd:element>
<xsd:attributeGroup name="ItemDelivery">
<xsd:attribute name="partNum" type="SKU" use="required"/>
<xsd:attribute name="weightKg" type="xsd:decimal"/>
<xsd:attribute name="shipBy">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="air"/>
<xsd:enumeration value="land"/>
<xsd:enumeration value="any"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:attributeGroup>
The DOM interfaces (API) are specified by the W3C, and implementations are provided by vendors like IBM,
Oracle, Sun, Microsoft and Apache. To understand DOM well, read the following paragraphs.
Implementations supplied by
IBM - XML4J - com.ibm.xml.parser
Oracle - XMLParserV2
Sun - JAXP
Microsoft - MSXMLDOM
Apache - Xerces
Node - getChildNodes(), getNodeName(), getNodeValue(), hasChildNodes()
NodeList - getLength(), item()
Element, Document - sub interfaces of Node
Document - createElement(), createAttribute(), createTextNode(), createComment(), createProcessingInstruction()
Element deptEle=doc.createElement("DEPT");
Element deptNoEle=doc.createElement("DEPTNO");
Element dnameEle=doc.createElement("DNAME");
deptNoEle.appendChild(doc.createTextNode("1"));
dnameEle.appendChild(doc.createTextNode("Production"));
deptEle.appendChild(deptNoEle);
deptEle.appendChild(dnameEle);
rootEle.appendChild(deptEle);
doc.appendChild(rootEle);
doc.printWithFormat(new FileWriter("depts.xml"));
}// main()
}// class
The XML document generated by the above code is:
<!--depts.xml-->
<DEPTS>
<DEPT>
<DEPTNO>1</DEPTNO>
<DNAME>Production</DNAME>
</DEPT>
</DEPTS>
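The fragment above relies on the IBM XML4J parser (printWithFormat is not part of the standard DOM API). As a hedged alternative, the same DEPTS document can be built with the JDK's own JAXP DOM and serialized with an identity Transformer, which plays the role of printWithFormat here. DocumentWriterJaxp and its method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DocumentWriterJaxp {
    /** Builds <DEPTS><DEPT><DEPTNO>1</DEPTNO><DNAME>Production</DNAME></DEPT></DEPTS> in memory. */
    public static Document buildDepts() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element rootEle = doc.createElement("DEPTS");
        Element deptEle = doc.createElement("DEPT");
        Element deptNoEle = doc.createElement("DEPTNO");
        Element dnameEle = doc.createElement("DNAME");
        deptNoEle.appendChild(doc.createTextNode("1"));
        dnameEle.appendChild(doc.createTextNode("Production"));
        deptEle.appendChild(deptNoEle);
        deptEle.appendChild(dnameEle);
        rootEle.appendChild(deptEle);
        doc.appendChild(rootEle);
        return doc;
    }

    /** Serializes a DOM tree with the identity Transformer (no stylesheet). */
    public static String toXml(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // To write depts.xml instead, use new StreamResult(new java.io.File("depts.xml"))
        System.out.println(toXml(buildDepts()));
    }
}
```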
// The below application reads XML document content using DOM API
// (IBM XML4J parser, package com.ibm.xml.parser)
// DocumentReader.java
import java.io.*;
import com.ibm.xml.parser.*;
import org.w3c.dom.*;
public class DocumentReader
{
public static void main(String rags[]) throws Exception
{
Parser p=new Parser("err");
Document doc=p.readStream(new FileInputStream("depts.xml"));
Element rootEle=doc.getDocumentElement();
NodeList nl1=rootEle.getChildNodes();
int len1=nl1.getLength();
for(int i=0;i<len1;i++)
{
Node n1=nl1.item(i);
// skip the whitespace text nodes between the <DEPT> elements
if(n1.getNodeType()!=Node.ELEMENT_NODE) continue;
NodeList nl2=n1.getChildNodes();
int len2=nl2.getLength();
for(int j=0;j<len2;j++)
{
Node n2=nl2.item(j);
// print only child elements such as DEPTNO and DNAME
if(n2.getNodeType()==Node.ELEMENT_NODE)
System.out.println(n2.getNodeName()+":"+((Child)n2).getText());
}// for2
System.out.println();
}// for1
}// main()
}// class
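For comparison, a minimal sketch of the same reading loop with the JDK's JAXP parser instead of XML4J's Parser and Child classes. It uses getTextContent() (DOM Level 3) and skips the whitespace text nodes between elements; the class name DocumentReaderJaxp and the method read are hypothetical.

```java
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DocumentReaderJaxp {
    /** Returns "NAME:value" for every child element of every <DEPT>, in document order. */
    public static List<String> read(InputSource src) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src);
        Element rootEle = doc.getDocumentElement();
        List<String> lines = new ArrayList<>();
        NodeList nl1 = rootEle.getChildNodes();
        for (int i = 0; i < nl1.getLength(); i++) {
            Node n1 = nl1.item(i);
            if (n1.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
            NodeList nl2 = n1.getChildNodes();
            for (int j = 0; j < nl2.getLength(); j++) {
                Node n2 = nl2.item(j);
                if (n2.getNodeType() == Node.ELEMENT_NODE) {
                    lines.add(n2.getNodeName() + ":" + n2.getTextContent());
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "depts.xml";
        try (FileInputStream in = new FileInputStream(path)) {
            for (String line : read(new InputSource(in))) System.out.println(line);
        }
    }
}
```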
// SAXExample.java
import org.xml.sax.*;
import java.io.*;
import com.ibm.xml.parser.*;
public class SAXExample extends HandlerBase
{
public void startDocument()
{
System.out.println("Document starts here");
}
public void startElement(String name, AttributeList list)
{
System.out.println("\t"+name+" element starts here");
int len1=list.getLength();
System.out.println("\t\t Attributes");
for(int i=0;i<len1;i++)
{
System.out.println("\t\t"+list.getName(i)+":"+list.getValue(i));
}// for()
System.out.println("\t\t Attributes ends here");
}// startElement()
// DocumentHandler sub class must be registered with SAX parser (event destination)
sax.setDocumentHandler(ex);
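HandlerBase and AttributeList above belong to the original SAX 1 API, which is deprecated. A hedged sketch of the same handler with SAX 2 through JAXP: extend org.xml.sax.helpers.DefaultHandler and pass it to SAXParser.parse(), which plays the role of setDocumentHandler(ex). The class name SAXExampleJaxp is hypothetical, and the events are collected in a list instead of printed directly so they are easy to check.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SAXExampleJaxp extends DefaultHandler {
    // Each SAX event is recorded here in the order the parser fires it
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("Document starts here"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add(qName + " element starts here");
        for (int i = 0; i < atts.getLength(); i++) {
            events.add("  " + atts.getQName(i) + ":" + atts.getValue(i));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add(qName + " element ends here");
    }

    @Override public void endDocument() { events.add("Document ends here"); }

    public static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SAXExampleJaxp handler = new SAXExampleJaxp();
        // The handler is the event destination
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DEPTS><DEPT id=\"D1\"><DNAME>Production</DNAME></DEPT></DEPTS>";
        for (String e : parse(xml)) System.out.println(e);
    }
}
```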
<TABLE>
<TR>
<TD>1</TD>
<TD>ABC</TD>
</TR>
</TABLE>
What did you observe in the above code? What I observed is that we want to replace the
<EMPS> element with a <TABLE> element, the <EMP> element with <TR> and the <EMPNO> element
with <TD>, while the data remains the same. The process of converting XML tags into HTML or any
other markup language format is called transformation. The tool used to transform formats is
called a transformation tool (XSLT - XSL Transformations), and the document which contains the
transformation rules is called an XSL document.
A stylesheet contains a set of template rules. A template rule has two parts:
*a pattern which is matched against nodes in the source tree (XML document)
*and a template which can be instantiated to form part of the result tree (XSL document
instructions).
This allows a stylesheet to be applicable to a wide class of documents that have similar source
tree structures.
XSLT makes use of the expression language defined by [XPath] for selecting elements for
processing, for conditional processing and for generating text.
In every XSL document the XSLT namespace (http://www.w3.org/1999/XSL/Transform) must be declared
in the root element. The root element of an XSL document is "xsl:stylesheet", where xsl is the
namespace prefix and stylesheet is the root element name.
XSLT processors must use the XML namespaces mechanism [XML Names] to recognize
elements and attributes from this namespace. Elements from the XSLT namespace are
recognized only in the stylesheet not in the source document.
The xsl:stylesheet element may contain the following types of elements:
xsl:import
xsl:include
xsl:strip-space
xsl:preserve-space
xsl:output
xsl:key
xsl:decimal-format
xsl:namespace-alias
xsl:attribute-set
xsl:variable
xsl:param
xsl:template
Here is a sample XSL document consisting of these sub elements:
<xsl:param name="...">...</xsl:param>
<xsl:template match="...">
...
</xsl:template>
<xsl:template name="...">
...
</xsl:template>
</xsl:stylesheet>
XSL is mainly used to transform content from one form into another (XML to HTML, XML to XML,
XML to WML, XML to PDAML)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<HTML>
<BODY>
<xsl:apply-templates/>
</BODY>
</HTML>
</xsl:template>
<xsl:template match="DEPTS">
<TABLE border="10" align="center">
<xsl:apply-templates/>
</TABLE>
</xsl:template>
<xsl:template match="DEPT">
<TR>
<xsl:apply-templates/>
</TR>
</xsl:template>
<xsl:template match="DEPT/*">
<TD><xsl:value-of select="."/></TD>
</xsl:template>
</xsl:stylesheet>
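// XSLTExample.java
// usage: java XSLTExample depts.xml depts.xsl depts.html
import javax.xml.transform.*;
import javax.xml.parsers.*;
import java.io.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.xml.sax.*;
import org.w3c.dom.*;
public class XSLTExample
{
public static void main(String rags[]) throws Exception
{
TransformerFactory tf=TransformerFactory.newInstance();
// rags[1] is taken to be the XSL stylesheet (rags[0] = XML input, rags[2] = output file)
Transformer t=tf.newTransformer(new StreamSource(new FileInputStream(rags[1])));
// parse the XML input into a DOM tree and write the transformed result to rags[2]
t.transform(new
DOMSource(DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new
InputSource(new FileInputStream(rags[0])))), new StreamResult(new
FileOutputStream(rags[2])));
}// main()
}// class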
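To try the DEPTS stylesheet without any files, here is a self-contained sketch that keeps both the XML and the XSL in strings and uses TransformerFactory directly. The class name XsltInMemory is hypothetical. Because the result's root element is HTML, an XSLT 1.0 processor selects the html output method by default, so no XML declaration is written.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltInMemory {
    // The same four template rules as the DEPTS stylesheet above
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><HTML><BODY><xsl:apply-templates/></BODY></HTML></xsl:template>"
        + "<xsl:template match=\"DEPTS\"><TABLE border=\"10\" align=\"center\">"
        + "<xsl:apply-templates/></TABLE></xsl:template>"
        + "<xsl:template match=\"DEPT\"><TR><xsl:apply-templates/></TR></xsl:template>"
        + "<xsl:template match=\"DEPT/*\"><TD><xsl:value-of select=\".\"/></TD></xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XML string and returns the HTML result. */
    public static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String depts = "<DEPTS><DEPT><DEPTNO>1</DEPTNO><DNAME>Production</DNAME></DEPT></DEPTS>";
        System.out.println(transform(depts));
    }
}
```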