The Simple API for XML (SAX)

Part II

Ethan Cerami
New York University

©Copyright 2003-2004. These slides are based on material from the upcoming book,
“XML and Bioinformatics” (Springer-Verlag) by Ethan Cerami. Please email
cerami@cs.nyu.edu for permission to copy.
10/17/08 Simple API for XML (SAX), Part II 1
Road Map
 Validating Documents with SAX
 SAXValidator.java
 Introduction to XML Namespaces
 Declaring Namespaces
 Qualified Names
 Default Namespaces
 Working with SAX Elements, Attributes and
Namespaces
 SAXElementAttribute.java

Validating Documents with SAX

Error Categories
 The XML 1.0 specification defines three
types of errors:
 Fatal Errors: these are usually errors in
well-formedness. Parser must stop normal
processing if a fatal error is encountered.
 Errors: these are non-fatal errors, usually
related to document validity.
 Warnings: catch-all category for other
minor problems.

Defaults
 By default, the Xerces XML parser (and most
other parsers) will check for well-formedness,
but they will not automatically check for
validity.
 To check for validity, you must follow three
steps:
 Turn the SAX Validation Feature On
 Implement an ErrorHandler interface
 Register your error handler

Validating Documents
 Turn the SAX validation feature on.
try {
    parser.setFeature("http://xml.org/sax/features/validation", true);
} catch (SAXNotRecognizedException e) {
    System.out.println("SAX Feature Not Recognized: " + e.getMessage());
} catch (SAXNotSupportedException e) {
    System.out.println("SAX Feature Not Supported: " + e.getMessage());
}

Working with SAX Features
 The XMLReader interface defines
setFeature()/getFeature() methods.
 The setting of properties or features may
trigger a:
 SAXNotRecognizedException: the requested
feature is not recognized.
 SAXNotSupportedException: the feature is
recognized, but not supported. For example, not
all parsers support the validation feature.
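
The two exceptions above can be observed with a small probe. This is a minimal sketch, assuming the JDK's bundled JAXP parser rather than the Xerces class these slides load directly; the class name FeatureProbe and the made-up feature URI are hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class FeatureProbe {

    /** Tries to switch a feature on and describes the outcome. */
    static String trySetFeature(XMLReader reader, String featureUri) {
        try {
            reader.setFeature(featureUri, true);
            return "set to " + reader.getFeature(featureUri);
        } catch (SAXNotRecognizedException e) {
            return "not recognized";
        } catch (SAXNotSupportedException e) {
            return "not supported";
        }
    }

    /** Obtains an XMLReader from the default JAXP implementation (an assumption). */
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("No SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        // A standard SAX feature, followed by a hypothetical URI no parser should know.
        System.out.println(trySetFeature(reader, "http://xml.org/sax/features/validation"));
        System.out.println(trySetFeature(reader, "http://example.org/no-such-feature"));
    }
}
```

Results depend on which parser is on the classpath, so a different SAX implementation may report the validation feature as unsupported.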

Validating Documents
 Implement a SAX Error Handler Interface.
 The ErrorHandler interface defines three
error methods, corresponding to the three
levels of errors defined within the XML 1.0
specification:
 fatalError()
 error()
 warning()

Error Handler Implementation
 Each of the ErrorHandler methods receives a
SAXParseException parameter. The
exception encapsulates the error and its
location within the document.
 The ErrorHandler implementation has two
main options:
 throw the SAXParseException it received, and
thereby stop normal parsing.
 record the exception (for example, write to a log
file) and return without throwing it.
The parser will therefore continue normal parsing.
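
The difference between the two options can be sketched by counting how many validity errors each kind of handler sees. This is an illustration only, assuming the JDK's bundled JAXP parser; the class names and the small in-memory test document are hypothetical and not part of the course code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorPolicyDemo {

    /** Hypothetical test document: both item elements lack the required id attribute. */
    static final String INVALID_XML =
        "<!DOCTYPE list [\n"
      + "  <!ELEMENT list (item*)>\n"
      + "  <!ELEMENT item (#PCDATA)>\n"
      + "  <!ATTLIST item id CDATA #REQUIRED>\n"
      + "]>\n"
      + "<list><item>a</item><item>b</item></list>";

    /** Records each validity error and returns, so parsing continues. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> errors = new ArrayList<>();

        @Override
        public void error(SAXParseException e) throws SAXException {
            errors.add(e.getLineNumber() + ": " + e.getMessage());
        }
    }

    /** Records the error, then rethrows it, so parsing stops at the first one. */
    static class StrictHandler extends RecordingHandler {
        @Override
        public void error(SAXParseException e) throws SAXException {
            super.error(e);
            throw e;
        }
    }

    /** Parses INVALID_XML with validation on and returns how many errors the handler saw. */
    static int countReportedErrors(RecordingHandler handler) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(INVALID_XML)));
        } catch (SAXException stopped) {
            // Expected with StrictHandler: the rethrown error aborts the parse.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.errors.size();
    }

    public static void main(String[] args) {
        System.out.println("recording: " + countReportedErrors(new RecordingHandler()));
        System.out.println("strict:    " + countReportedErrors(new StrictHandler()));
    }
}
```

With the bundled parser, the recording handler should see both missing-attribute errors, while the strict handler should see only the first before the parse is abandoned.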

Error Handler Implementation
 For example, the following implementation stops
parsing when regular errors are encountered:
/**
 * Receives notification of a recoverable error.
 * Validation Errors are reported here.
 * In this case, validation errors result in SAXExceptions.
 */
public void error(SAXParseException exception) throws SAXException {
    logError(exception);
    throw exception;
}

 Note: Regardless of your implementation, fatal errors
always result in a SAXException being thrown.
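
The note on fatal errors can be checked with a handler that deliberately swallows them. This is a minimal sketch, assuming the JDK's bundled JAXP parser with its default settings; the class names and the malformed input are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class FatalErrorDemo {

    /** Swallows fatal errors on purpose: it only counts them and returns. */
    static class QuietHandler extends DefaultHandler {
        int fatalCount = 0;

        @Override
        public void fatalError(SAXParseException e) {
            fatalCount++;
        }
    }

    /** Returns true if parsing the given document ended in a SAXException. */
    static boolean parseThrows(String xml, QuietHandler handler) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        QuietHandler handler = new QuietHandler();
        // Hypothetical malformed input: the <b> element is never closed.
        boolean threw = parseThrows("<a><b></a>", handler);
        System.out.println("fatalError calls: " + handler.fatalCount + ", parse threw: " + threw);
    }
}
```

Even though fatalError() returns normally here, the parse should still end with an exception, because the parser itself gives up on a document that is not well-formed.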

Registering your Error Handler
 Once you have an implementation of ErrorHandler,
you must register it with your parser:
 parser.setErrorHandler(errorHandler);
 Note: DefaultHandler also provides a default
implementation of the ErrorHandler interface: its
error() and warning() methods do nothing, and its
fatalError() method throws the exception it receives.
 Complete example follows on the next few
slides.

Example: SAXValidator.java
package com.oreilly.bioxml.sax;

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

import java.io.IOException;

/**
 * SAX Validator.
 * Illustrates Basic Error Handling.
 */
public class SAXValidator extends DefaultHandler {
    private boolean isValid = true;

    // Log Errors and Warnings

    /**
     * Receives notification of a recoverable error.
     * Validation Errors are reported here.
     */
    public void error(SAXParseException exception)
            throws SAXException {
        isValid = false;
        reportError("Error", exception);
    }

    /**
     * Receives notification of a warning.
     */
    public void warning(SAXParseException exception)
            throws SAXException {
        reportError("Warning", exception);
    }

    /**
     * Reports SAXParseException Information
     */
    private void reportError(String errorType, SAXParseException exception) {
        System.out.println(errorType + ": " + exception.getMessage());
        System.out.println(" Line: " + exception.getLineNumber());
        System.out.println(" Column: " + exception.getColumnNumber());
    }

    public boolean isValid() {
        return isValid;
    }

    /**
     * Prints Command Line Usage
     */
    private static void printUsage() {
        System.out.println("usage: SAXValidator xml-file");
        System.exit(0);
    }

    /**
     * Main Method
     */
    public static void main(String[] args) {
        if (args.length != 1) {
            printUsage();
        }
        try {
            SAXValidator errorHandler = new SAXValidator();
            XMLReader parser = XMLReaderFactory.createXMLReader
                    ("org.apache.xerces.parsers.SAXParser");

            // Turn Validation On and Set Error Handler
            turnValidationOn(parser);
            parser.setErrorHandler(errorHandler);
            parser.parse(args[0]);

            // If SAXException has not been thrown,
            // document must be well-formed
            System.out.println("The Document is well-formed.");
            if (errorHandler.isValid()) {
                System.out.println("The Document is valid.");
            }
        } catch (SAXException e) {
            System.out.println(e.getMessage());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Turn Validation On
     */
    private static void turnValidationOn(XMLReader parser) {
        try {
            parser.setFeature
                    ("http://xml.org/sax/features/validation", true);
        } catch (SAXNotRecognizedException e) {
            System.out.println("SAX Feature Not Recognized: " + e.getMessage());
        } catch (SAXNotSupportedException e) {
            System.out.println("SAX Feature Not Supported: " + e.getMessage());
        }
    }
}

Example Invalid Document
<?xml version='1.0' standalone='no' ?>
<!DOCTYPE DASDNA SYSTEM
 'http://servlet.sanger.ac.uk:8080/das/dasdna.dtd' >
<DASDNA>
  <SEQUENCE version="8.30" start="1000" stop="1050">
    <DNA length="51">
      taatttctcccattttgtaggttatcacttcactctgttgactttcttttg
    </DNA>
  </SEQUENCE>
  <SEQUENCE id="2" version="8.30" start="1000" stop="1050">
    <DNA length="51">
      taatgcaactaaatccaggcgaagcatttcagcttaaccccgagacttttg
    </DNA>
  </SEQUENCE>
</DASDNA>

Note: This document is invalid because I deleted the id
attribute for the first sequence element.

Example Output
Error: Attribute "id" is required and must be specified for element type "SEQUENCE".
Line: 5
Column: 53
The Document is well-formed.

Introduction to XML Namespaces
 The biggest difference between SAX 1.0 and
SAX 2.0: support for XML Namespaces.
 We therefore need to digress for a while to
introduce the basics of Namespaces.
 Attribution: These namespace slides come
from the XML Namespaces Tutorial at:
http://www.w3schools.com/xml/xml_namespaces

Name Conflicts
 Name conflicts can occur. For
example, consider these two
documents:
 This XML document carries information
in an XHTML table:
<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>

Name Conflicts
 This XML document carries information
about a table (a piece of furniture):

<table>
  <name>Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>

Name Conflicts
 If these two XML documents were added
together, there would be an element name
conflict:
 both documents contain a <table> element with
different content and definition.
 To solve the problem, we use XML
Namespaces.
 Namespaces enable us to distinguish
elements even if they have the same name.
 The Namespaces specification was created about
a year after the regular XML 1.0 spec.

Adding Namespaces
 To add a namespace, you must specify a
namespace attribute:
 xmlns:namespace-prefix="namespace"
 For example:
 This declares a namespace for XHTML:
 <h:table xmlns:h="http://www.w3.org/TR/html4/">
 This declares a namespace for furniture:
 <f:table
xmlns:f="http://www.w3schools.com/furniture">
 The namespace value is usually a URL, but it
doesn’t have to be: any URI works, and the parser
never retrieves it.

Qualified Names
 Once you have declared a namespace,
you specify elements and attributes with
Qualified Names:
 element-prefix:local-name
 The next two slides show complete
examples.

Example #1
<h:table xmlns:h="http://www.w3.org/TR/html4/">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

All qualified names that begin with “h” are within the
XHTML namespace.

Example #2
<f:table xmlns:f="http://www.w3schools.com/furniture">
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>
 All qualified names that begin with “f” are within the
W3Schools Furniture namespace.
 You can now combine both examples into one
document, and you no longer have name
conflicts.
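
As a rough illustration of how a SAX application tells the two tables apart once they share a document, the sketch below parses both examples inside a wrapper element and collects the namespace URI of every element named table. It assumes the JDK's bundled JAXP parser; the class name and the <root> wrapper element are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TableNamespaces {

    /** Both tables from the examples, wrapped in a hypothetical <root> element. */
    static final String COMBINED =
        "<root xmlns:h='http://www.w3.org/TR/html4/'"
      + "      xmlns:f='http://www.w3schools.com/furniture'>"
      + "<h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>"
      + "<f:table><f:name>African Coffee Table</f:name>"
      + "<f:width>80</f:width><f:length>120</f:length></f:table>"
      + "</root>";

    /** Returns the namespace URI of every element whose local name is "table". */
    static List<String> tableNamespaces(String xml) {
        List<String> uris = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // JAXP factories are not namespace-aware unless asked to be.
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if ("table".equals(localName)) {
                            uris.add(uri);
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return uris;
    }

    public static void main(String[] args) {
        tableNamespaces(COMBINED).forEach(System.out::println);
    }
}
```

Both elements share the local name table, so the namespace URI is the only thing that separates the XHTML table from the furniture table.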

Default Namespaces
 You can also specify a default namespace
like this:
<table xmlns="http://www.w3.org/TR/html4/">
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
 The <table> element and all unprefixed elements
within it are considered part of the XHTML namespace.
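
One detail worth seeing in code: a default namespace applies to unprefixed elements but not to unprefixed attributes. The sketch below is an illustration only, assuming the JDK's bundled JAXP parser; the class name and the added width attribute are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DefaultNamespaceDemo {

    /** The default-namespace table, plus a hypothetical unprefixed width attribute. */
    static final String XML =
        "<table xmlns='http://www.w3.org/TR/html4/' width='100%'>"
      + "<tr><td>Apples</td><td>Bananas</td></tr></table>";

    /** Lists each element and attribute with the namespace URI SAX reports for it. */
    static List<String> describe(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        lines.add("element " + qName + " -> [" + uri + "]");
                        for (int i = 0; i < atts.getLength(); i++) {
                            lines.add("attribute " + atts.getQName(i)
                                    + " -> [" + atts.getURI(i) + "]");
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(XML).forEach(System.out::println);
    }
}
```

Every element should be reported with the XHTML namespace URI, while the width attribute should come back with an empty URI.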

Working with SAX Elements,
Attributes and Namespaces

ContentHandler
 Now that we understand Namespaces,
we return to the SAX Content Handler.
 The example on the next few slides
illustrates how to handle elements,
attributes and namespaces.

package com.oreilly.bioxml.sax;

import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.Locator;

import java.io.IOException;

/**
 * SAXElementAttribute.
 * Illustrates Elements, Attributes and Namespace Functionality.
 * Also illustrates use of Document Locator object.
 */
public class SAXElementAttribute extends DefaultHandler {
    private Locator _locator;

    /**
     * Prints out all three name/namespace parameters.
     * Also prints out all attribute information.
     */
    public void startElement(String namespaceURI, String localName,
            String qName, Attributes atts) throws SAXException {
        System.out.println("Start Element: ");
        System.out.println("... Line: " + _locator.getLineNumber());
        System.out.println("... Column: " + _locator.getColumnNumber());
        System.out.println("... Namespace URI: " + namespaceURI);
        System.out.println("... Local Name: " + localName);
        System.out.println("... qName: " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            System.out.println("> Attribute: ");
            System.out.println("... URI: " + atts.getURI(i));
            System.out.println("... Local Name: " + atts.getLocalName(i));
            System.out.println("... QName: " + atts.getQName(i));
            System.out.println("... Type: " + atts.getType(i));
            System.out.println("... Value: " + atts.getValue(i));
        }
    }

    /**
     * Start Prefix Mapping for XML Namespaces
     */
    public void startPrefixMapping(String prefix, String uri)
            throws SAXException {
        System.out.println("Start Prefix Mapping: ");
        System.out.println("... Prefix: " + prefix);
        System.out.println("... URI: " + uri);
    }

    /**
     * End Prefix Mapping for XML Namespaces
     */
    public void endPrefixMapping(String prefix) throws SAXException {
        System.out.println("End Prefix Mapping: " + prefix);
    }

    /**
     * Stores Document Locator
     */
    public void setDocumentLocator(Locator locator) {
        this._locator = locator;
    }

    /**
     * Prints Command Line Usage
     */
    private static void printUsage() {
        System.out.println("usage: SAXElementAttribute xml-file");
        System.exit(0);
    }

    /**
     * Main Method
     */
    public static void main(String[] args) {
        if (args.length != 1) {
            printUsage();
        }
        try {
            SAXElementAttribute saxHandler = new SAXElementAttribute();
            XMLReader parser = XMLReaderFactory.createXMLReader
                    ("org.apache.xerces.parsers.SAXParser");
            parser.setContentHandler(saxHandler);
            parser.parse(args[0]);
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Sample Document
<?xml version="1.0"?>
<xhtml:html xmlns:xhtml='http://www.w3.org/TR/REC-html40'>
   <xhtml:head>
      <xhtml:title>XML and Bioinformatics</xhtml:title>
   </xhtml:head>
   <xhtml:body>
      <xhtml:table xhtml:width="100%">
         <xhtml:tr><xhtml:td>Welcome!</xhtml:td></xhtml:tr>
      </xhtml:table>
   </xhtml:body>
</xhtml:html>

Note: All elements and attributes are within the
XHTML namespace.

Sample Output
Start Prefix Mapping:
... Prefix: xhtml
... URI: http://www.w3.org/TR/REC-html40
Start Element:
... Line: 2
... Column: 59
... Namespace URI: http://www.w3.org/TR/REC-html40
... Local Name: html
... qName: xhtml:html
Start Element:
... Line: 3
... Column: 16
... Namespace URI: http://www.w3.org/TR/REC-html40
... Local Name: head
... qName: xhtml:head
… (output continues…)

Summary
 To validate an XML document with SAX, you
must explicitly turn validation on, implement
a SAX ErrorHandler, and register it with the parser.
 XML Namespaces enable you to distinguish
elements, even if they have the same names.
 The startElement() SAX method passes all
namespace information: the namespace URI, the
local name and the qualified name.
 Review SAXElementAttribute.java for details.
