
Welcome to:

Introduction to Databases and XML

Copyright IBM Corporation 2004


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

3.1

Unit Objectives
After completing this unit, you should be able to:
List the characteristics of an XML document that help determine the right type of database
Define and describe content management databases
Compare relational database structures to XML document structures
List the limitations of relational data tables with structured data
Define and describe what Object-Oriented databases provide
Describe the status of XML-based queries


Considerations
Start with what you are using the database for.
What type of application are you supporting? Is XML being used as a transport between the database and the application?
Are you using legacy data?
Are you more interested in the data or in the document structure?
Are you storing Web pages or Web pages' content?
Is your data used by other, perhaps non-XML, applications?
Are you updating the DB from XML?

Consider whether your XML document is more data-centric or document-centric.

Data-centric versus Document-centric


Data-centric                              Document-centric
Highly regular structure                  Less regular structure
Fine-grained data                         Larger-grained data
Order of elements not significant         Order significant (especially for siblings)
Little or no mixed content                Mixed content
XML used as a data transport              Extensive prose
Often designed for machine consumption    Used for human consumption
Legacy data

Not all documents are purely data-centric or purely document-centric; many are a combination of both.

Types of Databases
Remember: An XML document is a hierarchical, ordered, and untyped document (without a schema, all of its content is text).
Relational Database (RDB) structures are not hierarchical. Much of the world's current data exists in RDBs.
Object-Oriented Databases (OODBs) have been slow to catch on, but show promise for storing XML data as objects. Existing OODBs may have complex relationships.
Native XML databases or content-management systems are designed specifically to store XML. They are oriented toward document-centric XML.
Existing database systems must use some type of add-on or filter to deal with XML data. Many RDB vendors are building this capability into their products.


Typical Java-based Database Interchange Solution
Components: a Java application, XML applications, and a database back end.
1. Connect to the database using JDBC.
2. Retrieve an SQL result set (text string) containing the XML document.
3. Use the XML parser to create an XML DOM from the SQL result set.
4. Use the XSLT processor to extract the required element from the XML source tree.
5. Use an XML editing application to update the element contents of the result DOM.
6. Insert the updated element object into the source DOM.
7. Use XSLT to write the DOM to SQL (XML documents can also be parsed with SAX).
8. Use JDBC to update the database record.
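The flow above can be sketched in Java with only the JDK. This is a minimal sketch, not the course's reference code: the table DEPT_DOCS and its columns DEPT_NBR and XML_DOC are hypothetical, an XPath expression stands in for the XSLT extraction step, and an identity Transformer serializes the updated DOM back to text before the JDBC update.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class InterchangeSketch {

    // Parse the stored text into a DOM, replace the text of the node selected
    // by the XPath expression, and serialize the whole DOM back to a string.
    static String updateElement(String xml, String xpath, String newText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node target = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
            if (target == null) {
                throw new IllegalArgumentException("No node matches " + xpath);
            }
            target.setTextContent(newText);

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException | TransformerException e) {
            throw new IllegalStateException("XML update failed", e);
        }
    }

    // JDBC part: read the stored document, update it, and write it back.
    // DEPT_DOCS, DEPT_NBR and XML_DOC are hypothetical names.
    static void updateStoredDocument(Connection con, String deptNbr,
                                     String xpath, String newText) throws SQLException {
        String xml;
        try (PreparedStatement select = con.prepareStatement(
                "SELECT XML_DOC FROM DEPT_DOCS WHERE DEPT_NBR = ?")) {
            select.setString(1, deptNbr);
            try (ResultSet rs = select.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("No document stored for " + deptNbr);
                }
                xml = rs.getString(1); // many drivers also return CLOB columns this way
            }
        }
        String updated = updateElement(xml, xpath, newText);
        try (PreparedStatement update = con.prepareStatement(
                "UPDATE DEPT_DOCS SET XML_DOC = ? WHERE DEPT_NBR = ?")) {
            update.setString(1, updated);
            update.setString(2, deptNbr);
            update.executeUpdate();
        }
    }
}
```

Keeping the parse/update/serialize logic in a method that takes and returns a plain string lets it be exercised without a database; only `updateStoredDocument` needs a live JDBC connection.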

Challenges Mapping to RDB or OODB


Element and Attribute Mapping
Mapping complex relationships from RDB and OODB into the hierarchical XML structure.
Extracting tabular material from multiple tables into the XML document.
Validation
Character encoding
Conversion from text to data types
Null data
Binary data
Storage of processing instructions and comments
Storing markup
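Two of the challenges above, conversion from text to data types and null data, show up as soon as element text is written to a typed column. A minimal sketch, assuming an INTEGER target column and the convention (a design choice, not a rule) that a missing or empty element becomes SQL NULL:

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

class ColumnValues {

    // Convert the text of an element to a value for an INTEGER column.
    // Assumed convention: a missing element (null) or an empty one maps to SQL NULL.
    static Integer toInteger(String elementText) {
        if (elementText == null) {
            return null;                        // element absent from the document
        }
        String trimmed = elementText.trim();    // indentation whitespace is part of the text
        if (trimmed.isEmpty()) {
            return null;                        // <ID/> or <ID></ID>
        }
        return Integer.valueOf(trimmed);        // NumberFormatException if not numeric
    }

    // Bind the converted value; JDBC needs setNull with the SQL type for NULL.
    static void bindInteger(PreparedStatement ps, int index, Integer value) throws SQLException {
        if (value == null) {
            ps.setNull(index, Types.INTEGER);
        } else {
            ps.setInt(index, value);
        }
    }
}
```

Other designs treat an empty element as an error or as a default value; whichever convention is chosen has to be applied the same way when the data is read back into XML.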


XML to RDB - Where Should the Data Go?


<department>
  <dept-nbr>X333</dept-nbr>
  <department-name>
    XML developers
  </department-name>
  <employee>
    <last>Smith</last>
    <first>John</first>
    <ID>250243</ID>
    <dept-nbr>X333</dept-nbr>
  </employee>
  <employee>
    <last>Adams</last>
    <first>Tom</first>
    <ID>432453</ID>
    <dept-nbr>X333</dept-nbr>
  </employee>
</department>

Department
dept-nbr   department-name
X333       XML developers
Z568       Human Resources
...

Employee
ID       last    first   dept-nbr
250243   Smith   John    X333
432453   Adams   Tom     X333
...      ...     ...     ...

OR

XML_Table
Key    XML_Doc
X333   <department>
         <dept-nbr>X333</dept-nbr>
         <department-name>XML developers</department-name>
         <employee> <last>Smith</last> ...
Z568   <department>
         <dept-nbr>Z568 ....
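The first option on this slide, splitting the document across a Department table and an Employee table, can be sketched with the JDK's DOM parser. This is a minimal sketch: the record types are hypothetical stand-ins for table rows, each `<employee>` child becomes one Employee row keyed back to its department by `dept-nbr`, and the actual INSERT statements are left out. The second option needs no parsing at all: the whole string is stored under its key.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DepartmentShredder {

    // Hypothetical row types mirroring the Department and Employee tables.
    record DepartmentRow(String deptNbr, String departmentName) {}
    record EmployeeRow(String id, String last, String first, String deptNbr) {}
    record Rows(DepartmentRow department, List<EmployeeRow> employees) {}

    // Trimmed text of the first direct child element with the given name, or null.
    static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    static Rows shred(String xml) {
        Element dept;
        try {
            dept = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Cannot parse department document", ex);
        }
        DepartmentRow deptRow = new DepartmentRow(
                childText(dept, "dept-nbr"), childText(dept, "department-name"));

        // Each <employee> child becomes one Employee row; dept-nbr is the foreign key.
        List<EmployeeRow> employees = new ArrayList<>();
        for (Node n = dept.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && e.getTagName().equals("employee")) {
                employees.add(new EmployeeRow(childText(e, "ID"), childText(e, "last"),
                        childText(e, "first"), childText(e, "dept-nbr")));
            }
        }
        return new Rows(deptRow, employees);
    }
}
```

Trimming matters here: the `department-name` element in the example carries line breaks and indentation that are part of its text and would otherwise end up in the column.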


XML to RDB - And If We Have Attributes?


<department deptnbr="X333">
  <department-name>
    XML developers</department-name>
  <employee ID="250243" deptnbr="X333">
    <last>Smith</last>
    <first>John</first>
    <phone>533-4333</phone>
    <e-mail>jsmith@XML5.com</e-mail>
  </employee>
  <employee ID="432453" deptnbr="X333">
    <last>Adams</last>
    <first>Tom</first>
    <phone>544-4444</phone>
    <e-mail>tadams@XML5.com</e-mail>
  </employee>
</department>

Department
dept-nbr   department-name
X333       XML developers
Z568       Human Resources
...

Employee
ID       last    first   dept-nbr
250243   Smith   John    X333
432453   Adams   Tom     X333
...      ...     ...     ...

OR

XML_Table
Key    XML_Doc
X333   <department deptnbr="X333">
         <department-name>XML developers</department-name>
         <employee ID="250243" deptnbr="X333">...
Z568   <department deptnbr="Z568">
         <department-name>...
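With attributes, the same decomposition reads some column values from attributes instead of child elements. A minimal sketch, with a hypothetical row type matching the Employee table above: `ID` and `deptnbr` come from attributes, `last` and `first` from child elements. Because DOM's `getAttribute` returns an empty string for a missing attribute, the sketch maps that case to null (SQL NULL) explicitly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AttributeShredder {

    // Hypothetical Employee row: id and deptNbr from attributes, last and first from elements.
    record EmployeeRow(String id, String last, String first, String deptNbr) {}

    // getAttribute returns "" when the attribute is absent; treat that as NULL.
    static String attr(Element e, String name) {
        return e.hasAttribute(name) ? e.getAttribute(name) : null;
    }

    // Trimmed text of the first direct child element with the given name, or null.
    static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    static List<EmployeeRow> employees(String xml) {
        Element dept;
        try {
            dept = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Cannot parse department document", ex);
        }
        List<EmployeeRow> rows = new ArrayList<>();
        for (Node n = dept.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && e.getTagName().equals("employee")) {
                rows.add(new EmployeeRow(attr(e, "ID"), childText(e, "last"),
                        childText(e, "first"), attr(e, "deptnbr")));
            }
        }
        return rows;
    }
}
```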


Storing XML in an RDB without Mapping


XML documents are typically stored as large text strings. Data types (actual names can be RDBMS-specific) include:
VARCHAR (variable-length character data)
CLOB (Single-Byte Character Large Object)
DBCLOB (Double-Byte Character Large Object)
The database makes no distinction between an XML document and traditional SQL data.
There is no facility for accessing XML elements and attributes.
XML documents are not validated on insert or update.
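Because the database sees only text, any query on element or attribute values has to parse each stored document in the application. A minimal sketch of that cost, assuming a hypothetical (Key, XML_Doc) table represented in memory as a `Map`; the JDK's XPath 1.0 engine evaluates a condition against every document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ClobSearch {

    // xmlTable stands in for the rows of a hypothetical (Key, XML_Doc) table whose
    // XML_Doc column is a VARCHAR or CLOB. Every document is parsed to test the condition.
    static List<String> keysMatching(Map<String, String> xmlTable, String condition) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> row : xmlTable.entrySet()) {
            try {
                Boolean hit = (Boolean) xpath.evaluate(condition,
                        new InputSource(new StringReader(row.getValue())),
                        XPathConstants.BOOLEAN);
                if (hit) {
                    keys.add(row.getKey());
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Cannot evaluate row " + row.getKey(), e);
            }
        }
        return keys;
    }
}
```

For example, `keysMatching(table, "//employee[last='Smith']")` scans and parses every row; no index on the text column can help, which is the gap that XML-aware extensions aim to close.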


Object-Oriented Databases
Object-Oriented Database (OODB) features:
Persistence of objects.
Extend semantics of O-O programming languages.
Unification of data model and database structure.
Requires less code.
Ease of code base maintenance.
Relational Database (RDB) comparison:
Data structures must be flattened to fit joined tables.
Structures maintained in memory.
No built-in object management.
OODB real-world applications:
Risk analysis systems, telecom systems, WWW document structures, design and manufacturing systems, and hospital patient record systems with complex data interrelationships.


XML Native DB/Content-management Systems (1 of 2)
Can be good for document-oriented XML; much less useful for data-oriented XML.
Good for less structured data, which could result in many null columns in an RDB.
Preserve the physical structure of the document.
No need for a schema or DTD.
Limited to XML interfaces.
Do not use them for data that serves a variety of applications.
Provide very fast retrieval of entire documents.
Searches for specific views of the data are likely to be slower than in an RDB.
Can only return data as XML.


XML Native DB/Content-management Systems (2 of 2)
Two main categories:
Text-based storage
  Store the entire document.
  Provide limited DB function against the document.
  Provide an exact roundtrip of the document.
Model-based storage
  Store a DOM representation of the XML document in an existing or custom data store. May use an RDB underneath.
  Roundtrip at the level of the underlying model (can maintain order).
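The difference between the two roundtrips can be seen by pushing a document through a DOM, as a model-based store would. This is a minimal sketch that simulates model-based storage with the JDK's parser and an identity Transformer (not any particular product): element order, text and comments survive, but lexical details the DOM does not record, such as quote style and whitespace inside tags, do not. A text-based store would simply return the original string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RoundTrip {

    // Simulated model-based storage: only what the DOM records survives.
    static String throughModel(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                 | TransformerException e) {
            throw new IllegalArgumentException("Cannot roundtrip document", e);
        }
    }

    public static void main(String[] args) {
        String stored = "<a  y='1'   x=\"2\"><!-- note --><b/></a>";
        System.out.println(stored);               // what a text-based store gives back
        System.out.println(throughModel(stored)); // same model, different characters
    }
}
```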


What about Character Encoding?


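Diagram: documents in different languages, encoded in SBCS or DBCS code pages (such as ASCII or EBCDIC) or in UTF-8, are stored in a database identified by a CCSID (for example, Unicode). Convert? Loss of information?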
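The "loss of information" question can be checked directly by encoding text in a target character set and decoding it again. A minimal sketch using only `java.nio.charset`; the sample strings are illustrative, and the EBCDIC code page IBM037 is used only if the running JDK provides it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingCheck {

    // Encode text in a charset and decode it again. String.getBytes replaces any
    // character the charset cannot represent with the charset's replacement
    // byte ('?' for US-ASCII and ISO-8859-1), so such characters are lost.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String name = "M\u00FCller";       // "Muller" with u-umlaut (U+00FC)
        String japanese = "\u65E5\u672C";  // two characters outside any single-byte Latin code page

        System.out.println(name.getBytes(StandardCharsets.UTF_8).length);      // 7: U+00FC takes 2 bytes
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII));        // M?ller
        System.out.println(roundTrip(japanese, StandardCharsets.ISO_8859_1));  // ??
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese)); // true

        // EBCDIC code page 037 ships with most JDKs, but check before using it.
        if (Charset.isSupported("IBM037")) {
            System.out.println(roundTrip(name, Charset.forName("IBM037")).equals(name));
        }
    }
}
```

A Unicode encoding such as UTF-8 round-trips every character, which is why storing XML in a Unicode database avoids this class of loss; converting into a single-byte code page does not.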


RDB to XML Middleware: DXX


There is help out there
A variety of middleware products exist to help you map between your RDB and XML.
Middleware Example (DB2 with XML Extender):
XML Extender helps integrate the abilities of DB2 with the flexibility of XML.
XML data can be combined with traditional relational data.
XML Extender provides the ability to search XML documents based on XML element or attribute values, in addition to structural text searching.
Add-on to DB2 (no charge).


Other Uses of XML with Databases


Describe databases:
Tables
Columns
Foreign Keys
And so forth
Database exchange format.
Format for loading a database, for example, when moving data between different vendors' products.
Represent result sets from database queries and updates for programs and for humans.
Storing XML documents in databases as objects.
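Representing a query result set as XML can be sketched with JDBC metadata and the DOM. A minimal sketch, with assumptions stated in the comments: each row becomes a `<row>` element, each column a child element named after its label (the "column = child element" convention), and NULL columns are simply omitted, which is only one possible convention. Column labels must be valid XML names for this to work.

```java
import java.io.StringWriter;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ResultSetToXml {

    // Copy a JDBC result set into column-label -> value maps, keeping column order.
    static List<Map<String, String>> readRows(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        List<Map<String, String>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                row.put(md.getColumnLabel(i), rs.getString(i)); // null for SQL NULL
            }
            rows.add(row);
        }
        return rows;
    }

    // One <row> per result row, one child element per non-NULL column.
    static String toXml(List<Map<String, String>> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("rows");
            doc.appendChild(root);
            for (Map<String, String> row : rows) {
                Element rowElement = doc.createElement("row");
                root.appendChild(rowElement);
                for (Map.Entry<String, String> col : row.entrySet()) {
                    if (col.getValue() != null) {
                        Element c = doc.createElement(col.getKey());
                        c.setTextContent(col.getValue()); // the serializer escapes & and <
                        rowElement.appendChild(c);
                    }
                }
            }
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Cannot build XML", e);
        }
    }
}
```

Splitting `readRows` from `toXml` keeps the XML step usable on rows from any source; the same output, read by a person or another program, is the "result sets for programs and for humans" use listed above.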


Meta Data Aspect


XML DTDs and schemas are the metadata for an XML document.
The DB schema is the metadata for the database.
Given the metadata for one system, you can create the metadata for the other system.
Decide on mapping standards, for example:
  Each column maps to a child element
  Column types in the DB map to restrictions in the schema
  Non-nullable columns map to required elements
The result is probably not the optimal schema for the other system, but it is a good starting point, or good enough for your use.
It is likely to ease mapping between the DB and XML documents.
This is a design-time, not run-time, activity.
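The mapping rules above are mechanical enough to automate at design time. A minimal sketch, assuming a hypothetical `Column` description (as might be read from the database catalog) and handling only CHAR and VARCHAR: the column becomes an element, CHAR(n) becomes an `xs:length` restriction, VARCHAR(n) an `xs:maxLength` restriction, and NOT NULL leaves the element required (the `minOccurs` default of 1), while nullable columns are treated as optional. The `xs:` prefix is assumed to be bound to the XML Schema namespace.

```java
class ColumnToSchema {

    // Hypothetical description of one column, e.g. as read from the DB catalog.
    record Column(String name, String sqlType, int length, boolean nullable) {}

    // Column -> element, column type -> restriction, NOT NULL -> required element.
    static String elementFor(Column c) {
        String facet;
        if (c.sqlType().equalsIgnoreCase("CHAR")) {
            facet = "<xs:length value=\"" + c.length() + "\"/>";     // CHAR values are blank-padded to n
        } else if (c.sqlType().equalsIgnoreCase("VARCHAR")) {
            facet = "<xs:maxLength value=\"" + c.length() + "\"/>";  // upper bound only
        } else {
            throw new IllegalArgumentException("Type not handled in this sketch: " + c.sqlType());
        }
        String occurs = c.nullable() ? " minOccurs=\"0\"" : "";      // omitted => required
        return "<xs:element name=\"" + c.name() + "\"" + occurs + ">\n"
                + "  <xs:simpleType>\n"
                + "    <xs:restriction base=\"xs:string\">\n"
                + "      " + facet + "\n"
                + "    </xs:restriction>\n"
                + "  </xs:simpleType>\n"
                + "</xs:element>";
    }

    public static void main(String[] args) {
        System.out.println(elementFor(new Column("EMP_NBR", "CHAR", 10, false)));
        System.out.println(elementFor(new Column("LAST", "VARCHAR", 40, true)));
    }
}
```

As the slide notes, the generated schema is a starting point: a designer would still decide, for example, whether some columns read better as attributes.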


Database Schemas Example

CREATE TABLE employee
  (emp_nbr  Char(10) NOT NULL PRIMARY KEY,
   dept_nbr Char(6),
   type     Varchar(40),
   last     Varchar(40),
   first    Varchar(40));

Database Create Table

<element name="EMPLOYEE" type="ilscs01:EMPLOYEE">
  <key name="EMPLOYEEPRIMKEY">
    <selector xpath="ilscs01:EMPLOYEE"/>
    <field xpath="EMP_NBR"/>
  </key>
</element>
<complexType name="EMPLOYEE">
  <sequence>
    <element name="EMP_NBR">
      <simpleType>
        <restriction base="string">
          <length value="10"/>
        </restriction>
      </simpleType>
    </element>
    <element name="DEPT_NBR">
      <simpleType>
        <restriction base="string">
          <length value="6"/>
        </restriction>
      </simpleType>
    </element>
    ...
  </sequence>
</complexType>

Database Schema Example
...


XML Query Language (XQuery) (1 of 2)


Goals
"The goal of the XML Query WG is to produce a data model for
XML documents, a set of query operators on that data model, and
a query language based on these query operators."
XQuery consists of six Working Drafts:
XQuery Requirements
XQuery Use Cases
XQuery 1.0 and XPath 2.0 Data Model
XQuery 1.0 Formal Semantics
XQuery 1.0: An XML Query Language
XML Syntax for XQuery 1.0 (XQueryX)


XML Query Language (XQuery) (2 of 2)


XQuery Goals/Usage Scenarios
Human-readable documents
Data-oriented documents
Mixed model documents
Administrative data
Stream filtering
DOM queries
Native XML repositories
Catalog search
Multiple syntactic environments


More Information
Putting XML in context with hierarchical, relational, and object-oriented models, by David Mertz:
http://www-106.ibm.com/developerworks/xml/library/x-matters8/index.html
XML and Databases, by Ronald Bourret:
http://www.rpbourret.com/xml/XMLAndDatabases.htm#intro
XML Database products, by Ronald Bourret:
http://www.rpbourret.com/xml/XMLDatabaseProds.htm#xmlservers
XML Structures for Existing Databases, by Kevin Williams and others:
http://www-106.ibm.com/developerworks/library/x-struct/
Mapping DTDs to Databases, by Ronald Bourret:
http://www.xml.com/pub/a/2001/05/09/dtdtodbs.html


Checkpoint Questions (1 of 2)
1. How can an XML document be stored in an RDB? (select all that apply)
A. In a Table column (CLOB)
B. SGML
C. Decomposed into different columns/tables
D. Into a DTD file
E. Compressed into an integer column
2. While RDBs are row-based, XML documents are:
A. Record based
B. Hierarchical
C. Obsolete
D. Rectangular


Checkpoint Questions (2 of 2)
3. I should use an RDB to store my XML if (select all that apply):
A. I have lots of proprietary file formats
B. I need to retrieve a large number of documents based on a specific element
C. I need to exchange data with a business partner
D. I need to represent my data in Esperanto


Unit Summary
In this unit we learned:
How to compare relational database structures to XML document structures.
The limitations of relational data tables with structured data.
What Object-Oriented databases provide.
The status of XML-based queries.

