
eXtensible Markup Language

Lecturer: Phan Vo Minh Thang MSc.


XML: Foundations, Techniques and Applications
XML SAX
(Simple API for XML)
<?xml version=1.0> <material> XML Lectures Notes <section id=08> Simple API for XML </section> </material>
Sample Document
<transaction>
  <account>89-344</account>
  <buy shares="100">
    <ticker exch="NASDAQ">WEBM</ticker>
  </buy>
  <sell shares="30">
    <ticker exch="NYSE">GE</ticker>
  </sell>
</transaction>
DOM Parser
DOM = Document Object Model
Parser creates a tree object out of the document
User accesses data by traversing the tree
The API allows for constructing, accessing and
manipulating the structure and content of XML documents
Document as Tree
transaction
  account
    89-344
  buy
    shares = 100
    ticker
      exch = NASDAQ
      WEBM
  sell
    shares = 30
    ticker
      exch = NYSE
      GE
Methods like:
getRoot
getChildren
getAttributes
etc.
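The tree access described above can be sketched in Java as follows. This is only an illustration, assuming the JAXP DOM parser bundled with the JDK; the slide's getRoot, getChildren and getAttributes are generic names, and in the W3C DOM API for Java the closest equivalents are getDocumentElement, getChildNodes and getAttributes. The class name DomTraversalDemo and the method orders are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the lecture material.
public class DomTraversalDemo {
    // The sample transaction document, with attribute values quoted.
    public static final String SAMPLE =
        "<transaction>"
      + "<account>89-344</account>"
      + "<buy shares=\"100\"><ticker exch=\"NASDAQ\">WEBM</ticker></buy>"
      + "<sell shares=\"30\"><ticker exch=\"NYSE\">GE</ticker></sell>"
      + "</transaction>";

    // Builds the whole tree in memory, then walks it to list the orders.
    public static List<String> orders(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();              // the "getRoot" step
        NodeList tickers = root.getElementsByTagName("ticker");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tickers.getLength(); i++) {
            Element ticker = (Element) tickers.item(i);
            Element order = (Element) ticker.getParentNode();  // <buy> or <sell>
            result.add(order.getTagName() + " " + order.getAttribute("shares")
                + " " + ticker.getTextContent()
                + " (" + ticker.getAttribute("exch") + ")");
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        orders(SAMPLE).forEach(System.out::println);
    }
}
```

Because the tree stays in memory, orders could be called again, or the tree walked in a different order, without reparsing.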
Advantages and Disadvantages
Advantages:
Natural and relatively easy to use
Can repeatedly traverse tree
Disadvantages:
High memory requirements: the whole document is kept in
memory
Must parse the whole document before use
SAX Parser
SAX = Simple API for XML
Parser generates events while reading through the document
Parser calls methods (that you write) to deal with the
events
Similar to an I/O stream: processing goes in one direction only
Document as Events
<transaction>
  <account>89-344</account>
  <buy shares="100">
    <ticker exch="NASDAQ">WEBM</ticker>
  </buy>
  <sell shares="30">
    <ticker exch="NYSE">GE</ticker>
  </sell>
</transaction>
Start tag: transaction
Start tag: account
Text: 89-344
End tag: account
Start tag: buy
Attribute: shares Value: 100
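The event sequence above can be reproduced with a small handler. The following is a sketch only, assuming the SAX parser bundled with the JDK (obtained through JAXP); the class name SaxEventLog is hypothetical. Note that in the SAX API attributes are not separate events: they arrive as an argument of startElement, and the sketch merely logs them on their own lines.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records SAX events as readable strings.
public class SaxEventLog extends DefaultHandler {
    public static final String SAMPLE =
        "<transaction>"
      + "<account>89-344</account>"
      + "<buy shares=\"100\"><ticker exch=\"NASDAQ\">WEBM</ticker></buy>"
      + "<sell shares=\"30\"><ticker exch=\"NYSE\">GE</ticker></sell>"
      + "</transaction>";

    public final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        events.add("Start tag: " + qName);
        // Attributes are delivered together with the start tag.
        for (int i = 0; i < atts.getLength(); i++)
            events.add("Attribute: " + atts.getQName(i)
                + " Value: " + atts.getValue(i));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver text in several chunks; each one is logged.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("Text: " + text);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("End tag: " + qName);
    }

    // Parses the given XML once, front to back, and returns the event log.
    public static List<String> log(String xml) throws Exception {
        SaxEventLog handler = new SaxEventLog();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        log(SAMPLE).forEach(System.out::println);
    }
}
```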
Advantages and Disadvantages
Advantages:
Requires little memory
Fast
Disadvantages:
Cannot reread earlier parts of the document without parsing it again
Less natural for object-oriented programmers (perhaps)
Which should we use? DOM vs. SAX
If your document is very large and you only need a few
elements - use SAX
If you need to manipulate (i.e., change) the XML - use
DOM
If you need to access the XML many times - use DOM
XML Parsers
There are several different ways to categorise parsers:
Validating versus non-validating parsers
DOM parsers versus SAX parsers
Parsers written in a particular language (Java, C++, Perl, etc.)
Validating Parsers
A validating parser makes sure that the document
conforms to the specified DTD
This is time consuming, so a non-validating parser is
faster
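The difference between the two kinds of parser can be sketched as follows. This is an illustration under assumptions: it uses the JAXP SAXParserFactory of the JDK, a made-up document with an internal DTD, and a hypothetical class name ValidationDemo. The document is well-formed but does not conform to its DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: counts validity errors reported by the parser.
public class ValidationDemo {
    // Well-formed, but <from> is not allowed by the DTD (made-up example).
    public static final String INVALID =
        "<?xml version=\"1.0\"?>"
      + "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>"
      + "<note><from>x</from></note>";

    public static int countErrors(String xml, boolean validating)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);   // DTD validation on or off
        final int[] errors = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++;                 // validity errors arrive here
            }
        };
        factory.newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("validating:     " + countErrors(INVALID, true));
        System.out.println("non-validating: " + countErrors(INVALID, false));
    }
}
```

With validation switched off the parser still checks well-formedness, but it reports no validity errors for this document.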
Using an XML Parser
Three basic steps
Create a parser object
Pass the XML document to the parser
Process the results
Generally, writing out XML is not in the scope of parsers
(though some may implement proprietary mechanisms)
The SAX Parser
SAX parser is an event-driven API
An XML document is sent to the SAX parser
The XML file is read sequentially
The parser notifies the registered handler when events occur, including errors
The events are handled by callback methods of the SAX API that the
programmer implements
XMLReaderFactory: used to create a SAX parser
ContentHandler: handles document events (start tag, end tag, etc.)
ErrorHandler: handles parser errors
DTDHandler and EntityResolver: handle DTDs and entities
Problem
The SAX interface is an accepted standard
There are many implementations
We would like to be able to change the implementation used without
changing any code in the program
How is this done?
Factory Design Pattern
Have a Factory class that creates the actual Parsers.
The Factory checks the value of a system property that
states which implementation should be used
In order to change the implementation, simply change the
system property
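The idea can be sketched independently of SAX. Everything in the following example is hypothetical (the Parser interface, the two implementations and the property name demo.parser.impl); it only illustrates how a factory can pick an implementation class from a system property so that calling code never names a concrete class.

```java
// Hypothetical sketch of the factory pattern driven by a system property.
public class ParserFactoryDemo {
    // Made-up property name used only in this sketch.
    public static final String PROPERTY = "demo.parser.impl";

    public interface Parser {
        String name();
    }

    public static class FastParser implements Parser {
        public String name() { return "fast"; }
    }

    public static class StrictParser implements Parser {
        public String name() { return "strict"; }
    }

    // Reads the property, falls back to a default, and instantiates
    // the chosen class reflectively.
    public static Parser createParser() throws Exception {
        String className =
            System.getProperty(PROPERTY, FastParser.class.getName());
        return (Parser) Class.forName(className)
            .getDeclaredConstructor()
            .newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createParser().name());
        // Switching implementation needs no change to the calling code.
        System.setProperty(PROPERTY, StrictParser.class.getName());
        System.out.println(createParser().name());
    }
}
```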
Creating a SAX Parser
Import the following packages:
org.xml.sax.*;
org.xml.sax.helpers.*;
Set the following system property:
System.setProperty("org.xml.sax.driver",
"org.apache.xerces.parsers.SAXParser");
Create the instance from the Factory:
XMLReader reader = XMLReaderFactory.createXMLReader();
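The steps above require Apache Xerces on the classpath, and XMLReaderFactory has been marked deprecated since Java 9. As an alternative sketch, assuming only the parser bundled with the JDK, an XMLReader can also be obtained through the JAXP SAXParserFactory, which applies the same factory idea. The class name CreateReaderDemo is hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical demo class: obtains an XMLReader through JAXP.
public class CreateReaderDemo {
    public static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // JAXP parsers are not namespace-aware unless asked to be.
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        // Prints the concrete implementation class chosen by the factory.
        System.out.println(reader.getClass().getName());
    }
}
```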
Receiving Parsing Information
A SAX Parser calls methods such as startDocument,
startElement, etc., as it runs
In order to react to such events we must:
implement the ContentHandler interface
set the parser's content handler to an instance of our class
ContentHandler
// Methods (partial list)
public void startDocument();
public void endDocument();
public void characters(char[] ch, int start, int length);
public void startElement(String namespaceURI,
String localName, String qName,
Attributes atts);
public void endElement(String namespaceURI,
String localName, String qName);
Namespaces and Element Names
<?xml version='1.0' encoding='utf-8'?>
<forsale date="12/2/03"
xmlns:xhtml = "urn:http://www.w3.org/1999/xhtml">
<book>
<title> <xhtml:em> DBI: </xhtml:em>
The Course I Wish I never Took
</title>
<comment> My <xhtml:b> favorite </xhtml:b> book!
</comment>
</book>
</forsale>
For <xhtml:em>:
namespaceURI = urn:http://www.w3.org/1999/xhtml
localName = em
qName = xhtml:em
For <book>:
namespaceURI = ""
localName = book
qName = book
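The three names can be observed directly with a handler that records the arguments of startElement. This is a sketch, assuming a namespace-aware SAX parser from the JDK; the class name NamespaceNamesDemo and the shortened document are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records the names passed to startElement.
public class NamespaceNamesDemo extends DefaultHandler {
    // Shortened version of the forsale document.
    public static final String SAMPLE =
        "<forsale date=\"12/2/03\""
      + " xmlns:xhtml=\"urn:http://www.w3.org/1999/xhtml\">"
      + "<book><title><xhtml:em>DBI:</xhtml:em> The Course</title></book>"
      + "</forsale>";

    public final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        seen.add("namespaceURI=" + namespaceURI
            + " localName=" + localName + " qName=" + qName);
    }

    public static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // needed for URI and local name
        NamespaceNamesDemo handler = new NamespaceNamesDemo();
        factory.newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        elementNames(SAMPLE).forEach(System.out::println);
    }
}
```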
Receiving Parsing Information (cont.)
An easy way to implement the ContentHandler interface is
to extend DefaultHandler, which implements this
interface (and a few others) with empty methods
To actually parse a document, create an InputSource from
the document and supply the input source to the parse
method of the XMLReader
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class InfoWithSax extends DefaultHandler {

    public static void main(String[] args) {
        // Select the SAX implementation (requires Xerces on the classpath)
        System.setProperty("org.xml.sax.driver",
            "org.apache.xerces.parsers.SAXParser");
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader();
            reader.setContentHandler(new InfoWithSax());
            reader.parse(new InputSource(new FileReader(args[0])));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    // Called once at the start and once at the end of the document
    public void startDocument() throws SAXException {
        System.out.println("START DOCUMENT");
    }

    public void endDocument() throws SAXException {
        System.out.println("END DOCUMENT");
    }

    int depth;               // current element nesting level
    String indent = "   ";   // printed once per nesting level

    private void println(String header, String value) {
        for (int i = 0; i < depth; i++) System.out.print(indent);
        System.out.println(header + ": " + value);
    }
    // Prints non-blank character data
    public void characters(char[] buf, int offset, int len)
            throws SAXException {
        String s = (new String(buf, offset, len)).trim();
        if (!"".equals(s)) println("CHARACTERS", s);
    }

    public void endElement(String namespaceURI,
            String localName, String name)
            throws SAXException {
        depth--;
        String elementName = name;
        if (!"".equals(namespaceURI) && !"".equals(localName))
            elementName = namespaceURI + ":" + localName;
        println("END ELEMENT", elementName);
    }
    public void startElement(String namespaceURI,
            String localName, String name, Attributes attrs)
            throws SAXException {
        String elementName = name;
        if (!"".equals(namespaceURI) && !"".equals(localName))
            elementName = namespaceURI + ":" + localName;
        println("START ELEMENT", elementName);
        if (attrs != null && attrs.getLength() > 0) {
            for (int i = 0; i < attrs.getLength(); i++)
                println("ATTRIBUTE", attrs.getLocalName(i) + "=" +
                    attrs.getValue(i));
        }
        depth++;
    }
}
Bachelor Tags
What do you think happens when the parser parses a
bachelor tag (an empty-element tag)?
<rating stars="five" />
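A small experiment suggests the answer: a SAX parser reports an empty-element tag as a startElement event immediately followed by an endElement event, with no characters event in between. The sketch below assumes the JDK's bundled parser; the class name BachelorTagDemo is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: logs the events produced by an empty element.
public class BachelorTagDemo extends DefaultHandler {
    public final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        events.add("start " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    public static List<String> events(String xml) throws Exception {
        BachelorTagDemo handler = new BachelorTagDemo();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Prints [start rating, end rating]
        System.out.println(events("<rating stars=\"five\" />"));
    }
}
```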
Attributes Interface
Elements may have attributes
There is no distinction between attributes that are specified
explicitly in the document and those that are supplied by the DTD (with a
default value)
Attributes Interface (cont.)
int getLength();
String getQName(int i);
String getType(int i);
String getValue(int i);
String getType(String qname);
String getValue(String qname);
etc.
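The following sketch exercises a few of these methods. It assumes the JDK's bundled SAX parser and a made-up document whose internal DTD declares a default value for stars; the class name AttributesDemo is hypothetical. The defaulted attribute shows up in the Attributes object just like the explicitly written one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: lists name, type and value of each attribute.
public class AttributesDemo extends DefaultHandler {
    // "stars" is not written on the element; the DTD supplies "three".
    public static final String SAMPLE =
        "<!DOCTYPE rating ["
      + "<!ELEMENT rating EMPTY>"
      + "<!ATTLIST rating stars CDATA \"three\" id ID #IMPLIED>"
      + "]>"
      + "<rating id=\"r1\"/>";

    public final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++)
            seen.add(atts.getQName(i) + " " + atts.getType(i)
                + " " + atts.getValue(i));
    }

    public static List<String> attributes(String xml) throws Exception {
        AttributesDemo handler = new AttributesDemo();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        attributes(SAMPLE).forEach(System.out::println);
    }
}
```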
Attribute Types
The following are possible types for attributes:
"CDATA",
"ID",
"IDREF", "IDREFS",
"NMTOKEN", "NMTOKENS",
"ENTITY", "ENTITIES",
"NOTATION"
Setting Features
It is possible to set the features of a parser using the
setFeature method.
Examples:
reader.setFeature("http://xml.org/sax/features/namespaces", true);
reader.setFeature("http://xml.org/sax/features/validation", false);
For a full list, see:
http://www.saxproject.org/?selected=get-set
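A sketch of setting and querying features, assuming an XMLReader obtained from the JDK through JAXP. The class name FeatureDemo and the helper isRecognized are hypothetical; the feature URI under example.org is deliberately made up to show what happens when a parser does not know a feature.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical demo class: sets and queries standard SAX features.
public class FeatureDemo {
    public static final String NAMESPACES =
        "http://xml.org/sax/features/namespaces";
    public static final String VALIDATION =
        "http://xml.org/sax/features/validation";

    public static XMLReader configuredReader() throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true);    // report URIs and local names
        reader.setFeature(VALIDATION, false);   // do not validate
        return reader;
    }

    // Returns false when the parser does not know the feature at all.
    public static boolean isRecognized(XMLReader reader, String feature)
            throws SAXNotSupportedException {
        try {
            reader.getFeature(feature);
            return true;
        } catch (SAXNotRecognizedException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = configuredReader();
        System.out.println(reader.getFeature(NAMESPACES));
        System.out.println(
            isRecognized(reader, "http://example.org/no-such-feature"));
    }
}
```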
ErrorHandler Interface
We implement ErrorHandler to receive error events (similar
to implementing ContentHandler)
DefaultHandler implements ErrorHandler in an empty
fashion, so we can extend it (as before)
An ErrorHandler is registered with
reader.setErrorHandler(handler);
Three methods:
void error(SAXParseException ex);
void fatalError(SAXParseException ex);
void warning(SAXParseException ex);
Extending the InfoWithSax Program
public void warning(SAXParseException err)
        throws SAXException {
    System.out.println("Warning in line " + err.getLineNumber() +
        " and column " + err.getColumnNumber());
}
public void error(SAXParseException err)
        throws SAXException {
    System.out.println("Oy vaavoi, an error!");
}
public void fatalError(SAXParseException err)
        throws SAXException {
    System.out.println("OY VAAVOI, a fatal error!");
}
Will these methods be called in the case of a problem?
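One way to explore the question is the sketch below, which assumes the JDK's bundled parser and a hypothetical class name ErrorHandlerDemo. It suggests that the methods are invoked only when the handler has also been registered with setErrorHandler; setting the content handler alone, as the main method of InfoWithSax does, is not enough.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records which error callbacks are invoked.
public class ErrorHandlerDemo extends DefaultHandler {
    // Not well-formed: <b> is never closed.
    public static final String MALFORMED = "<a><b></a>";

    public final List<String> calls = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) { calls.add("warning"); }

    @Override
    public void error(SAXParseException e) { calls.add("error"); }

    @Override
    public void fatalError(SAXParseException e) { calls.add("fatalError"); }

    public static List<String> parse(String xml, boolean register)
            throws Exception {
        ErrorHandlerDemo handler = new ErrorHandlerDemo();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        if (register) reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Parsing cannot continue after a well-formedness error.
        }
        return handler.calls;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("registered:     " + parse(MALFORMED, true));
        System.out.println("not registered: " + parse(MALFORMED, false));
    }
}
```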
Lexical Events
Lexical events have to do with the way that a document was written
and not with its content
Examples:
A comment is a lexical event (<!-- comment -->)
The use of an entity is a lexical event (&gt;)
These can be dealt with by implementing the LexicalHandler interface,
and set on a parser by
reader.setProperty("http://xml.org/sax/properties/lexical-handler",
mylexicalhandler);
LexicalHandler
// Methods (partial list)
public void startEntity(String name);
public void endEntity(String name);
public void comment(char[] ch, int start,
int length);
public void startCDATA();
public void endCDATA();
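A sketch of receiving lexical events, assuming the JDK's bundled parser. It extends org.xml.sax.ext.DefaultHandler2, which provides empty implementations of the LexicalHandler methods; the class name LexicalDemo and the sample document are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class: records comments and CDATA boundaries.
public class LexicalDemo extends DefaultHandler2 {
    public static final String SAMPLE =
        "<doc><!-- note --><![CDATA[a < b]]></doc>";

    public final List<String> events = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        events.add("comment: " + new String(ch, start, length).trim());
    }

    @Override
    public void startCDATA() { events.add("startCDATA"); }

    @Override
    public void endCDATA() { events.add("endCDATA"); }

    public static List<String> lexicalEvents(String xml) throws Exception {
        LexicalDemo handler = new LexicalDemo();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // The lexical handler is set as a property, not with a setter.
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Prints [comment: note, startCDATA, endCDATA]
        System.out.println(lexicalEvents(SAMPLE));
    }
}
```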
Info
Course name: Special Selected Topic in Information System
Section: Simple API for XML
Number of slides: 36
Updated date: 12/02/2006
Contact: Mr. Phan Vo Minh Thang (minhthangpv@hcmuaf.edu.vn)
