Overview
HTML Model
XML Model
RDBMS vs. XML
XML Schema
XML Tools
…
What is a Data Model?
Structure of Data
Mathematical representation of data.
Examples: relational model = tables; semistructured model = trees/graphs.
Operations on data.
Constraints.
History of HTML
HTML: Hyper-Text Markup Language
Invented by Tim Berners-Lee and Robert Cailliau at CERN in 1991
What is hyper-text?
A document that contains links to other documents (and
text, sound, images...)
Invented around 1945 by Vannevar Bush
What is a markup language?
A notation for writing text with markup tags (<tag>)
Tags indicate the structure of the text
Tags have names and attributes
Tags may enclose a part of the text
Invented around 1970 by Charles F. Goldfarb (SGML)
HTML
HTML was designed to display data and to focus
on how data looks.
XML
XML is a framework for defining markup languages:
XML was designed to describe data, not format.
There is no fixed collection of markup tags. You must
define your own tags, tailored to your kind of information
Allows tailor-made markup for any imaginable application
domain
XML uses a schema language (e.g., DTD, XML Schema) to
formally describe the data.
XML separates syntax from semantics to provide a
common framework for structuring information
Web browser rendering semantics is separately defined by
stylesheets
XML
XML is not a replacement for HTML:
HTML should ideally be just another XML
language
in fact, XHTML is just that
XHTML is a (very popular) XML language for
hypertext markup
HTML vs. XML
[Figure: HTML and XML versions shown side by side, each rendered in a browser]
XML needs a stylesheet to define browser presentation semantics
Database Perspective
DB must support
Capture
Storage
Retrieval
Exchange
RDBMS vs. XML Example
AFV receives 100+ videos every week
Build a DB to be able to answer the following queries:
Who sent which videos?
Which video has the best rating for the 1st week of Jan?
ER Model
[Figure: ER diagram. Owners (Name, Phone, Gender, Address) are connected to videos (Category, Rating) by a Sends relationship]
ER => RDBMS
[Figure: the ER diagram mapped to relations: Videos (Category, Rating) and Sends]
RDBMS
Video
VID  Category  Date        Rating
100  Comedy    2005/1/1    5
200  Action    2005/1/10   4
300  SF        2004/12/31  5
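The query "which video has the best rating for the 1st week of Jan?" can be sketched against this table. What follows is a minimal sketch using Python's standard sqlite3 module, not a reference design: the lowercase column names, the ISO date format (chosen so that string comparison orders dates correctly), and the week boundaries 2005-01-01 through 2005-01-07 are all assumptions made for illustration.

```python
import sqlite3

# In-memory database holding the three sample rows from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Video (vid INTEGER PRIMARY KEY, category TEXT, date TEXT, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO Video VALUES (?, ?, ?, ?)",
    [
        (100, "Comedy", "2005-01-01", 5),  # dates rewritten as ISO strings (assumption)
        (200, "Action", "2005-01-10", 4),
        (300, "SF", "2004-12-31", 5),
    ],
)

def best_rated(conn, start, end):
    """Return (vid, rating) of the top-rated video with start <= date <= end, or None."""
    return conn.execute(
        "SELECT vid, rating FROM Video WHERE date BETWEEN ? AND ? "
        "ORDER BY rating DESC, vid LIMIT 1",
        (start, end),
    ).fetchone()

print(best_rated(conn, "2005-01-01", "2005-01-07"))
```

With the sample rows, only video 100 falls inside the assumed week, so it is the one returned.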
Changes #1
VHS video => VHS, CD, DVD
100+ videos => 1 million videos
RDBMS
Video
VID      Category  Format  Date        Rating
100      Comedy    VHS     2005/1/1    5
…
1000000  SF        DVD     2004/12/31  5
…
Tom 123-4567 1000000
…
Changes #2
Arbitrary name formats for owners
E.g., J. Doe vs. Dr. “Jonny” John Jay Doe Jr.
100+ different ways to capture owners’ information
“1781 Louisiana St #200, Lawrence, KS, 66046”
adr1=“1781 Louisiana St #200”, adr2=“Lawrence, KS, 66046”
street=“1781 Louisiana St #200”, city=“Lawrence”, state=“KS”, zip=“66046”
100+ different video formats with varying properties => 1000+ attributes for Videos
RDBMS: Finest Granularity
Video
VID  Category  Date        Rating  Att1  Att2  …  Att1000
100  Comedy    2005/1/1    5             10    …  T1
200  Action    2005/1/10   4       20          …
300  SF        2004/12/31  5                   …  S20
RDBMS: Coarsest Granularity
Video
VID  Category  Date        Rating  Att1to1000
100  Comedy    2005/1/1    5       10, T1
200  Action    2005/1/10   4       20
300  SF        2004/12/31  5       S20
RDBMS: Ideal Case
Video
VID  Category  Date        Rating  Att1  Att2  …  Att1000
100  Comedy    2005/1/1    5             10    …  T1
200  Action    2005/1/10   4       20          …
300  SF        2004/12/31  5                   …  S20
XML
Video
VID  Category  Date        Rating  Att1  Att2  …  Att1000
100  Comedy    2005/1/1    5             10    …  T1
200  Action    2005/1/10   4       20          …
300  SF        2004/12/31  5                   …  S20
<VideoTable>
  <Video vid="100" category="comedy" date="2005/1/1"
         rating="5" att2="10" att1000="T1" />
  <Video vid="200" category="action" date="2005/1/10"
         rating="4" att1="20" />
  <Video vid="300" category="SF" date="2004/12/31"
         rating="5" att1000="S20" />
</VideoTable>
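To make the contrast with the sparse table concrete, the sketch below parses the VideoTable document with Python's standard xml.etree.ElementTree and lists, for each video, only the optional attributes it actually carries. The helper name optional_attributes and the choice of which four attributes count as "fixed" are assumptions made for illustration; straight quotes are used because XML attribute values must be delimited by ' or ".

```python
import xml.etree.ElementTree as ET

VIDEO_XML = """
<VideoTable>
  <Video vid="100" category="comedy" date="2005/1/1" rating="5" att2="10" att1000="T1" />
  <Video vid="200" category="action" date="2005/1/10" rating="4" att1="20" />
  <Video vid="300" category="SF" date="2004/12/31" rating="5" att1000="S20" />
</VideoTable>
"""

# Attributes every video has; everything else is treated as optional (assumption).
FIXED = {"vid", "category", "date", "rating"}

def optional_attributes(xml_text):
    """Map each video id to the optional attributes that video actually carries."""
    root = ET.fromstring(xml_text)
    return {
        video.get("vid"): {k: v for k, v in video.attrib.items() if k not in FIXED}
        for video in root.iter("Video")
    }

extras = optional_attributes(VIDEO_XML)
print(extras)
```

Each Video element stores only the attributes it has, so no placeholder is needed for the other 997 or so columns that the relational table would leave empty.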
XML: Owners
[Figure: two Address trees. One has children Street, City, State, Zip; the other has children Adr1 and Adr2, with State under Adr2]
<OwnerTable>
  <Owner phone="564-3456" gender="F">
    <Name FN="Jenny" />
    <Address>
      <Street>1050 Harvard</Street><City>Denver</City>
      <State>CO</State><Zip>66049</Zip>
    </Address>
  </Owner>
  <Owner phone="123-4567" gender="M">
    <Name Prefix="Dr." NN="Jonny" FN="John" MN="Jay" LN="Doe" Suffix="Jr." />
    <Address> <Adr1>1322 W 15th</Adr1> <Adr2><State>CO</State></Adr2> </Address>
  </Owner>
</OwnerTable>
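Because the two owners structure their addresses differently, a query has to tolerate both shapes. The sketch below, using Python's standard xml.etree.ElementTree, looks up each owner's state wherever the State element sits under that owner. The helper name states_by_phone is hypothetical, and the sample is the owner data written with straight quotes and matched tags so that it parses.

```python
import xml.etree.ElementTree as ET

OWNER_XML = """
<OwnerTable>
  <Owner phone="564-3456" gender="F">
    <Name FN="Jenny" />
    <Address>
      <Street>1050 Harvard</Street><City>Denver</City>
      <State>CO</State><Zip>66049</Zip>
    </Address>
  </Owner>
  <Owner phone="123-4567" gender="M">
    <Name Prefix="Dr." NN="Jonny" FN="John" MN="Jay" LN="Doe" Suffix="Jr." />
    <Address> <Adr1>1322 W 15th</Adr1> <Adr2><State>CO</State></Adr2> </Address>
  </Owner>
</OwnerTable>
"""

def states_by_phone(xml_text):
    """Map each owner's phone to the text of the first State element below it."""
    root = ET.fromstring(xml_text)
    result = {}
    for owner in root.iter("Owner"):
        # ".//State" matches State at any depth, so both address layouts work.
        state = owner.find(".//State")
        result[owner.get("phone")] = state.text if state is not None else None
    return result

print(states_by_phone(OWNER_XML))
```

The same descendant search works whether State is a direct child of Address or nested under Adr2, which is the flexibility the slide is illustrating.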
RDBMS vs. XML
[Figure: side-by-side comparison of RDBMS and XML]
A conceptual view of XML
An XML document is an ordered, labeled tree
Character data leaf nodes contain the actual data (text strings)
Element nodes are each labeled with
a name (often called the element type), and
a set of attributes, each consisting of a name and a value,
and can have child nodes
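The tree view can be made tangible by walking a parsed document. This is a minimal sketch with Python's standard xml.etree.ElementTree; the outline helper and the lang attribute on the sample are hypothetical, added only to show an attribute. Note that ElementTree stores character data in each element's text field rather than as separate leaf nodes, so the sketch prints it alongside the element.

```python
import xml.etree.ElementTree as ET

def outline(node, depth=0):
    """Return one line per element node: name, attributes, and any character data."""
    text = (node.text or "").strip()
    line = "  " * depth + node.tag
    if node.attrib:
        line += " " + str(dict(node.attrib))
    if text:
        line += ": " + repr(text)
    lines = [line]
    for child in node:  # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines

doc = ET.fromstring('<note lang="en"><to>Tove</to><from>Jani</from></note>')
print("\n".join(outline(doc)))
```

The indentation of the printed lines mirrors the depth of each element in the tree, and the order of the children follows the order in the document.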
A concrete view of XML
An XML document is a text with markup tags
and other meta-information.
Markup tags denote elements:
..<foo attr="val" ...>bar</foo>...
  |    |              |  |
  |    |              |  a matching element end tag
  |    |              the contents "bar" of the element
  |    an attribute with name attr and value val, enclosed by ' or "
  an element start tag with name foo
An XML document must be well-formed:
start and end tags must match
element tags must be properly nested
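One way to check well-formedness is simply to hand the text to a parser and see whether it is rejected. The sketch below does this with Python's standard xml.etree.ElementTree; the helper name is_well_formed is hypothetical. It tests well-formedness only and does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed('<foo attr="val">bar</foo>'))  # tags match and nest
print(is_well_formed('<a><b></a></b>'))             # improperly nested
print(is_well_formed('<foo>bar</fo>'))              # end tag does not match
```

The first document is accepted; the other two violate the nesting and matching rules and are rejected.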
Example of XML document
<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
Applications of XML
XML is a meta-language for creating other
languages; its main application is
defining new markup languages
XHTML: W3C's XMLization of HTML 4.0.
Applications of XML (cont.)
CML: Chemical Markup Language
<molecule id="METHANOL">
<atomArray>
<stringArray builtin="elementType">C O H H H H</stringArray>
<floatArray builtin="x3" units="pm"> -0.748 0.558 -1.293 -1.263
-0.699 0.716 </floatArray>
</atomArray>
</molecule>
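An application reads such a document by pulling the arrays apart. The sketch below uses Python's standard xml.etree.ElementTree to pair each element symbol with its x3 value. It assumes that the i-th entry of the floatArray belongs to the i-th atom of the stringArray, which the fragment suggests but does not state; the helper name atoms_with_x3 is hypothetical.

```python
import xml.etree.ElementTree as ET

CML = """
<molecule id="METHANOL">
  <atomArray>
    <stringArray builtin="elementType">C O H H H H</stringArray>
    <floatArray builtin="x3" units="pm"> -0.748 0.558 -1.293 -1.263
      -0.699 0.716 </floatArray>
  </atomArray>
</molecule>
"""

def atoms_with_x3(xml_text):
    """Pair each element symbol with its x3 value, assuming positional correspondence."""
    root = ET.fromstring(xml_text)
    symbols = root.find(".//stringArray[@builtin='elementType']").text.split()
    values = root.find(".//floatArray[@builtin='x3']").text.split()
    return list(zip(symbols, (float(v) for v in values)))

for symbol, x3 in atoms_with_x3(CML):
    print(symbol, x3)
```

Splitting on whitespace also absorbs the line break inside the floatArray, so the wrapped values parse the same as if they were on one line.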
Conclusion
XML is an important language that one
should learn
Plenty of research issues for database researchers:
XML query languages
Conversion between XML and other (e.g., relational) models
Storage for native XML databases
Novel indexing
System design and implementation