
<?xml version="1.0"?> <course startdate="February 06, 2006"> <title> eXtensible Markup Language </title> <lecturer>Phan Vo Minh Thang</lecturer> </course>

eXtensible Markup Language


Document Type Definitions (DTDs)


Lecturer: Phan Vo Minh Thang MSc.

<?xml version="1.0"?> <material> XML Lecture Notes

<section id="03"> Document Type Definition </section> </material>

What's a DTD for?


A DTD explains precisely which elements and entities may appear where in the document, and what the elements' contents and attributes are. In other words, it defines the rules for the valid structure of an XML document:
Valid element names
<BOOK><TITLE><AUTHOR><PRICE><ISBN>

Valid attribute names and values
<AUTHOR id="234" sex="F">

Relationship between elements
<BOOK><TITLE></TITLE></BOOK>

Different XML applications can use different DTDs to specify what they do and do not allow.

It's Complex, but More Powerful


With a DTD you can:
Define your own meaningful tags, which are easy to read, easy to search, and easy to transform to other formats

Validate whether your document is structurally correct

An XML document is valid if it has a DTD and the document conforms to that DTD. The document type declaration must appear before the first element in an XML document.


Getting Started with DTD


A DTD can be stored in a separate file from the document it describes
A DTD can be included inside the document it describes
DTDs allow forward, backward, and circular references to other declarations


Document Type Declaration


A valid document includes a reference to the DTD against which it should be compared:
<!DOCTYPE person SYSTEM "http://../person.dtd">
The root element is person. The DTD for this document can be found at "http://../person.dtd".

The document type declaration is placed in the prolog of the XML document, after the XML declaration but before the root element.
You can use a relative URL (or just the file name) instead of the absolute form if the document resides on the same server (or in the same directory) as the DTD.
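The prolog ordering rule can be checked with any XML parser. A minimal sketch, assuming Python's standard-library expat binding (`xml.parsers.expat`, which the notes themselves do not use) and a hypothetical `person.dtd` that expat never actually fetches: only the first document, with the DOCTYPE between the XML declaration and the root element, is well-formed.

```python
import xml.parsers.expat


def is_well_formed(xml_text: str) -> bool:
    """Return True if expat parses the text without a well-formedness error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Correct order: XML declaration, then DOCTYPE, then the root element
good = '<?xml version="1.0"?>\n<!DOCTYPE person SYSTEM "person.dtd">\n<person/>'
# DOCTYPE after the root element: rejected ("junk after document element")
late = '<?xml version="1.0"?>\n<person/>\n<!DOCTYPE person SYSTEM "person.dtd">'
# DOCTYPE before the XML declaration: rejected (misplaced XML declaration)
early = '<!DOCTYPE person SYSTEM "person.dtd">\n<?xml version="1.0"?>\n<person/>'

for label, doc in (("good", good), ("late", late), ("early", early)):
    print(label, is_well_formed(doc))
```

Expat does not load the external DTD by default, so the first document parses even though `person.dtd` does not exist.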


DOCTYPE Declaration
<!DOCTYPE root_element source location [ internal DTD ]>

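Starts with <! to signify a declaration and ends with >
Must include the DOCTYPE keyword and the root_element name
Uses one of two source keywords: SYSTEM or PUBLIC
Gives one or two locations to associate a DTD: SYSTEM is followed by a single URI, while PUBLIC is followed by a public identifier and then a URI
An internal DTD may be included within [ internal DTD ]
The source keyword and location may be omitted when the document uses only an internal DTD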


Internal DTD Subsets


A DTD can be included inside the document it describes. Some document type declarations contain some declarations directly but link in others.
The part of the DTD between the brackets is called the internal DTD subset. All the parts that come from outside this document are called the external DTD subset. As a general rule, the two subsets must be compatible: neither can override the element declarations the other makes.

When you use an external DTD subset, you should give the standalone attribute of the XML declaration the value no (standalone="no").
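A sketch of how a parser reports the pieces discussed above, assuming Python's standard-library `xml.parsers.expat`: the `standalone` value from the XML declaration, the external subset's system identifier, and whether an internal subset is present. The file name `book.dtd` and the helper `describe_prolog` are hypothetical; expat does not load the external subset by default.

```python
import xml.parsers.expat


def describe_prolog(xml_text: str) -> dict:
    """Report the standalone flag and which DTD subsets a document uses."""
    info = {}

    def on_xml_decl(version, encoding, standalone):
        # expat reports standalone as 1 ("yes"), 0 ("no") or -1 (omitted)
        info["standalone"] = {1: "yes", 0: "no", -1: None}[standalone]

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["root"] = name
        info["external_subset"] = system_id  # None when there is no external DTD
        info["has_internal_subset"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return info


doc = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE book SYSTEM "book.dtd" [
  <!ELEMENT author (#PCDATA)>
]>
<book><author>Jane Doe</author></book>"""

print(describe_prolog(doc))
```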


Internal and External DTD


Internal DTD: put the DTD and the XML in one file
<!DOCTYPE RootElement [ <!ELEMENT author (#PCDATA)> .. ]>


DOCTYPE examples
<!DOCTYPE BookCatalog SYSTEM "http://www.wrox.com/DTDs/BookCatalog.dtd">

<!DOCTYPE BookCatalog SYSTEM "file:///DTDs/BookCatalog.dtd">

<!DOCTYPE BookCatalog PUBLIC "PublishingConsortium/BookCatalog">
(allowed in SGML but not in XML: in XML a PUBLIC identifier must be followed by a system URI)

<!DOCTYPE BookCatalog PUBLIC "PublishingConsortium/BookCatalog" "http://www.wrox.com/DTDs/BookCatalog.dtd">
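One way to see the SYSTEM/PUBLIC rule in practice, assuming Python's standard-library `xml.parsers.expat`: the hypothetical helper `doctype_ids` returns the root name and identifiers that expat reports, or `None` when expat rejects the declaration, as it does for a PUBLIC identifier with no system URI.

```python
import xml.parsers.expat


def doctype_ids(doctype: str):
    """Return (root, public_id, system_id), or None if expat rejects the DOCTYPE."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["ids"] = (name, public_id, system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    try:
        # Append a root element so the document is complete
        parser.Parse(doctype + "\n<BookCatalog/>", True)
    except xml.parsers.expat.ExpatError:
        return None
    return found["ids"]


system_only = '<!DOCTYPE BookCatalog SYSTEM "file:///DTDs/BookCatalog.dtd">'
public_only = '<!DOCTYPE BookCatalog PUBLIC "PublishingConsortium/BookCatalog">'
public_and_system = ('<!DOCTYPE BookCatalog PUBLIC "PublishingConsortium/BookCatalog" '
                     '"http://www.wrox.com/DTDs/BookCatalog.dtd">')

for decl in (system_only, public_only, public_and_system):
    print(doctype_ids(decl))
```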



Element Declaration
A DTD is a mechanism to describe every object (element, attribute, etc.) that can appear in an XML document.
Element Declaration
<!ELEMENT element-name content-specification >
The content specification states which children the element may or must have, and in what order
<!ELEMENT address-book (entry+)>

Parentheses are used to group elements in a content specification


<!ELEMENT name (lname, (fname | title))>
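Expat hands each element declaration to a callback as a nested `(type, quant, name, children)` tuple, which makes the grouping done by parentheses visible. A sketch under that assumption (Python's standard-library `xml.parsers.expat` and its `model` constants; `render` and `element_models` are hypothetical helper names) that turns the tree back into normalized DTD syntax:

```python
import xml.parsers.expat
from xml.parsers.expat import model

# expat stores ?, * and + as a "quant" constant on every node
QUANT = {
    model.XML_CQUANT_NONE: "",
    model.XML_CQUANT_OPT: "?",
    model.XML_CQUANT_REP: "*",
    model.XML_CQUANT_PLUS: "+",
}


def render(node) -> str:
    """Turn expat's (type, quant, name, children) tuple back into DTD syntax."""
    ctype, quant, name, children = node
    if ctype == model.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == model.XML_CTYPE_ANY:
        return "ANY"
    if ctype == model.XML_CTYPE_NAME:
        return name + QUANT[quant]
    if ctype == model.XML_CTYPE_MIXED:
        items = ["#PCDATA"] + [render(child) for child in children]
        return "(" + " | ".join(items) + ")" + QUANT[quant]
    # Remaining cases are groups: SEQ joins with ",", CHOICE with "|"
    sep = ", " if ctype == model.XML_CTYPE_SEQ else " | "
    return "(" + sep.join(render(child) for child in children) + ")" + QUANT[quant]


def element_models(internal_subset: str, root: str) -> dict:
    """Map each <!ELEMENT> name in an internal subset to its normalized model."""
    models = {}

    def on_element_decl(name, node):
        models[name] = render(node)

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(f"<!DOCTYPE {root} [{internal_subset}]><{root}/>", True)
    return models


subset = """
<!ELEMENT address-book (entry+)>
<!ELEMENT name (lname,(fname|title))>
<!ELEMENT lname (#PCDATA)>
"""
print(element_models(subset, "address-book"))
```

The inner `(fname|title)` comes back as its own nested group, exactly as the parentheses in the declaration dictate.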


Special Keywords in Content Model


#PCDATA: parsed character data (text)
<!ELEMENT phone_number (#PCDATA)>
CDATA sections can appear anywhere #PCDATA can appear

EMPTY: the element has no content
ANY: the element can contain any mix of character data and elements declared in the DTD

Mixed Content
Content that contains #PCDATA, optionally interleaved with child elements

Element Content
Content that contains only child elements
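Expat's element-declaration callback reports a content type for each declaration, which maps directly onto the four kinds of content above. A sketch, assuming Python's standard-library `xml.parsers.expat`; `content_kinds` is a hypothetical helper. Note that expat reports the text-only model `(#PCDATA)` as mixed content too, since it contains #PCDATA.

```python
import xml.parsers.expat
from xml.parsers.expat import model


def content_kinds(internal_subset: str, root: str) -> dict:
    """Classify each declared element as EMPTY, ANY, mixed, or element content."""
    kinds = {}

    def on_element_decl(name, node):
        ctype = node[0]  # node is (type, quant, name, children)
        if ctype == model.XML_CTYPE_EMPTY:
            kinds[name] = "EMPTY"
        elif ctype == model.XML_CTYPE_ANY:
            kinds[name] = "ANY"
        elif ctype == model.XML_CTYPE_MIXED:
            kinds[name] = "mixed"  # the model contains #PCDATA
        else:
            kinds[name] = "element"  # NAME, SEQ or CHOICE: child elements only

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(f"<!DOCTYPE {root} [{internal_subset}]><{root}/>", True)
    return kinds


subset = """
<!ELEMENT phone_number (#PCDATA)>
<!ELEMENT image EMPTY>
<!ELEMENT note ANY>
<!ELEMENT name (#PCDATA | fname | lname)*>
<!ELEMENT entry (name, tel*)>
"""
print(content_kinds(subset, "entry"))
```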

The Secret of +, *, ?
+, *, ?: Occurrence indicators
No occurrence indicator: appears once and only once
+ : appears one or more times
* : appears zero or more times
? : appears once or not at all

Example
<!ELEMENT entry (name,address*,tel*,fax*,email*)>
<!ELEMENT address (street,region?,postal-code,locality,country)>
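Occurrence indicators behave like regular-expression quantifiers, and `,` behaves like concatenation. The sketch below uses that correspondence to check a list of child element names against an element-only content model. It is a teaching aid written in Python, not a validator: `model_to_regex` and `children_match` are hypothetical helpers, #PCDATA and namespace prefixes are not handled, and XML's rule that content models be deterministic is not checked.

```python
import re

# Tokens of a content model: element names and the punctuation ( ) , | ? * +
TOKEN = re.compile(r"\s*([A-Za-z_][\w.\-]*|[(),|?*+])")


def model_to_regex(content_model: str) -> re.Pattern:
    """Translate an element-only content model into a Python regex.

    Each child name becomes "(?:name )", "," becomes plain concatenation,
    and "|", "?", "*", "+" keep their regex meaning.
    """
    parts, pos, text = [], 0, content_model.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected text at {text[pos:]!r}")
        tok, pos = m.group(1), m.end()
        if tok == ",":
            continue                    # sequence: just concatenate
        elif tok == "(":
            parts.append("(?:")         # non-capturing group
        elif tok in ")|?*+":
            parts.append(tok)           # same meaning in regex syntax
        else:
            parts.append(f"(?:{re.escape(tok)} )")
    return re.compile("".join(parts))


def children_match(content_model: str, children: list) -> bool:
    """True if the list of child element names satisfies the content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(content_model).fullmatch(sequence) is not None


entry = "(name,address*,tel*,fax*,email*)"
address = "(street,region?,postal-code,locality,country)"
print(children_match(entry, ["name", "tel", "tel", "email"]))    # True
print(children_match(entry, ["tel", "name"]))                    # False: wrong order
print(children_match(address, ["street", "postal-code", "locality", "country"]))  # True
print(children_match(address, ["street", "region", "region",
                               "postal-code", "locality", "country"]))  # False
```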


The Secret of , |
, | : Connectors
, : the elements must appear in the order listed
| : exactly one of the alternatives must appear

Examples
<!ELEMENT name (#PCDATA | fname | lname)*>
<!ELEMENT entry (name,address*,tel*,fax*,email*)>

Mixed content: the components must always be separated by |, and the group must repeat (it must end with )*). The only exception is the text-only model (#PCDATA).
<!ELEMENT name (#PCDATA, fname, lname)>      wrong
<!ELEMENT name (#PCDATA | fname | lname)*>   correct
#PCDATA must be the first item in the list
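The mixed-content rules are part of the DTD grammar itself, so even a non-validating parser rejects a badly formed declaration. A sketch assuming Python's standard-library `xml.parsers.expat`, with `declaration_accepted` as a hypothetical helper that wraps one declaration in a throwaway internal subset:

```python
import xml.parsers.expat


def declaration_accepted(content_model: str) -> bool:
    """Return True if expat accepts <!ELEMENT name CONTENT_MODEL> in an internal subset."""
    doc = f"<!DOCTYPE name [<!ELEMENT name {content_model}>]><name/>"
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(doc, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


for cm in ("(#PCDATA | fname | lname)*",  # correct mixed content
           "(#PCDATA)",                   # text only: no * needed
           "(#PCDATA, fname, lname)",     # wrong: "," is not allowed with #PCDATA
           "(#PCDATA | fname | lname)",   # wrong: the group must end with ")*"
           "(fname | #PCDATA)*"):         # wrong: #PCDATA must come first
    print(cm, declaration_accepted(cm))
```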

More Examples
<!ELEMENT cover (title, (author | subtitle))>
<!ELEMENT circle (center, (radius | diameter))>
<!ELEMENT name (last_name | (first_name, ((middle_name+, last_name) | (last_name?))))>
<!ELEMENT paragraph (#PCDATA | name | profession | footnote | emphasize | date)* >
<!ELEMENT image EMPTY>
<image source="bus.jpg" width="152" />


Attribute Declaration
<!ATTLIST element-name attribute-name attribute-type defaultvalue>
<!ATTLIST tel preferred (true|false) "false">
<!ATTLIST email href CDATA #REQUIRED
                preferred (true|false) "false">

Attribute declarations can appear anywhere in the DTD

It is best to list an element's attributes immediately after its element declaration
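A sketch, assuming Python's standard-library `xml.parsers.expat`, that lists what the parser records for each attribute declared above. One ATTLIST can declare several attributes, so the callback fires once per attribute; `attribute_decls` is a hypothetical helper name.

```python
import xml.parsers.expat


def attribute_decls(internal_subset: str, root: str) -> list:
    """List (element, attribute, type, default, required) for every declared attribute."""
    decls = []

    def on_attlist(elname, attname, type_, default, required):
        # default is None for #REQUIRED and #IMPLIED;
        # required is true for #REQUIRED and for #FIXED
        decls.append((elname, attname, type_, default, bool(required)))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(f"<!DOCTYPE {root} [{internal_subset}]><{root}/>", True)
    return decls


subset = """
<!ATTLIST tel preferred (true|false) "false">
<!ATTLIST email href CDATA #REQUIRED
                preferred (true|false) "false">
"""
for decl in attribute_decls(subset, "email"):
    print(decl)
```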


Attribute Type
CDATA: string
ID: identifier unique in the document
IDREF: value of an ID
IDREFS: list of IDREFs separated by spaces
ENTITY: name of an unparsed external entity
ENTITIES: list of ENTITY names
NMTOKEN: word without spaces (name token)
NMTOKENS: list of NMTOKENs
Enumerated type: closed list of NMTOKENs separated by |
NOTATION: name of a notation declared in the DTD

Attribute Types (Cont.)


CDATA
Any text string acceptable in a well-formed XML attribute value
The most general attribute type
Can be used for prices, URIs, email addresses, citations

NMTOKEN (name token)
Consists of the same characters as an XML name, but any of those characters (not only a name-start character) can come first
Examples: 12, .cshrc

NMTOKENS
One or more XML name tokens separated by whitespace
<performances dates="08-21-2001 08-23-2001 08-27-2001"> Kat and the Kings </performances>
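The difference between an XML name and a name token comes down to the first character. This Python sketch uses an ASCII-only approximation of the XML 1.0 productions (the real ones also allow many non-ASCII characters); the helper names are hypothetical.

```python
import re

# ASCII-only approximation: a Name must start with a letter, "_" or ":",
# while an Nmtoken may start with any name character (digits, ".", "-" too).
NAME_START = r"[A-Za-z_:]"
NAME_CHAR = r"[A-Za-z0-9_:.\-]"
NAME = re.compile(f"{NAME_START}{NAME_CHAR}*")
NMTOKEN = re.compile(f"{NAME_CHAR}+")


def is_name(value: str) -> bool:
    return NAME.fullmatch(value) is not None


def is_nmtoken(value: str) -> bool:
    return NMTOKEN.fullmatch(value) is not None


def is_nmtokens(value: str) -> bool:
    """NMTOKENS: one or more name tokens separated by whitespace."""
    tokens = value.split()
    return bool(tokens) and all(is_nmtoken(t) for t in tokens)


print(is_nmtoken("12"), is_name("12"))            # True False
print(is_nmtoken(".cshrc"), is_name(".cshrc"))    # True False
print(is_nmtokens("08-21-2001 08-23-2001 08-27-2001"))  # True
print(is_nmtoken("two words"))                    # False
```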

Attribute Types (Cont.)


Enumeration
A list of all possible values for the attribute, separated by |
Each possible value must be an XML name token
<!ATTLIST date year (2000 | 2001 | 2002 | 2003) #REQUIRED>

ID: assigns unique identifiers to elements
An ID value is an XML name that is unique within the XML document (no other ID type attribute in the document can have the same value)
Each element can have at most one ID type attribute
<!ATTLIST employee ssn ID #REQUIRED>
<employee ssn="_078-05_123" />
The value starts with an underscore because an XML name cannot start with a digit


Attribute Types (Cont.)


IDREF
Refers to the ID type attribute of some element in the document
Used to establish relationships between elements when simple containment won't suffice
An IDREF cannot be restricted to the IDs of one kind of element: for example, a DTD cannot constrain the person attribute of the team_member element to match only employee IDs, or the project_id attribute of the assignment element to match only project IDs

IDREFS
Contains a whitespace-separated list of XML names, each of which must be the ID of an element in the document
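A DTD-aware validator checks these rules for you; the sketch below spells them out by hand with Python's standard-library `xml.etree.ElementTree`. ElementTree does not read attribute types from a DTD, so the `ID_ATTRS`, `IDREF_ATTRS` and `IDREFS_ATTRS` tables and the `check_ids` helper are hypothetical stand-ins for the ATTLIST declarations. The second `team_member`, which points at a project ID, passes: this is exactly the IDREF limitation described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute-type tables standing in for the DTD's ATTLIST
# declarations: ElementTree does not read attribute types from a DTD.
ID_ATTRS = {("employee", "ssn"), ("project", "id")}
IDREF_ATTRS = {("team_member", "person"), ("assignment", "project_id")}
IDREFS_ATTRS = {("team", "members")}


def check_ids(xml_text: str) -> list:
    """Return a list of ID/IDREF/IDREFS problems found in the document."""
    problems, ids, refs = [], set(), []
    for elem in ET.fromstring(xml_text).iter():
        for attr, value in elem.attrib.items():
            key = (elem.tag, attr)
            if key in ID_ATTRS:
                # all ID values share one namespace across element types
                if value in ids:
                    problems.append(f"duplicate ID {value!r}")
                ids.add(value)
            elif key in IDREF_ATTRS:
                refs.append(value)
            elif key in IDREFS_ATTRS:
                refs.extend(value.split())  # whitespace-separated list
    # References may point forward, so resolve them after collecting every ID
    problems += [f"dangling IDREF {ref!r}" for ref in refs if ref not in ids]
    return problems


doc = """<company>
  <employee ssn="_078-05_123"/>
  <project id="p1"/>
  <team_member person="_078-05_123"/>
  <team_member person="p1"/>
  <assignment project_id="p2"/>
  <team members="_078-05_123 p1"/>
</company>"""

print(check_ids(doc))  # ["dangling IDREF 'p2'"]
```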


Attribute Types (Cont.)


ENTITY
Contains the name of an unparsed entity declared elsewhere in the DTD
<!ATTLIST movie source ENTITY #REQUIRED>
<movie source="X-Men-trailer" />

ENTITIES
Contains the names of one or more unparsed entities declared elsewhere in the DTD, separated by whitespace
<!ATTLIST slide_show slides ENTITIES #REQUIRED>
<slide_show slides="slide1 slide2 slide3 slide4" />
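A sketch, assuming Python's standard-library `xml.parsers.expat`, that reads the DTD to find which attributes are ENTITY or ENTITIES typed and resolves their values against the declared unparsed entities. The notation, file names and the `resolve_entity_attributes` helper are hypothetical. Expat does not validate, so an undeclared name (`slide3`) simply resolves to `None` here rather than raising an error.

```python
import xml.parsers.expat


def resolve_entity_attributes(xml_text: str) -> list:
    """Resolve ENTITY/ENTITIES attribute values to (system_id, notation) pairs."""
    unparsed = {}         # entity name -> (system_id, notation)
    entity_attrs = set()  # (element, attribute) pairs typed ENTITY or ENTITIES
    resolved = []

    def on_entity_decl(name, is_param, value, base, system_id, public_id, notation):
        if notation is not None:  # only unparsed entities carry an NDATA notation
            unparsed[name] = (system_id, notation)

    def on_attlist(elname, attname, type_, default, required):
        if type_ in ("ENTITY", "ENTITIES"):
            entity_attrs.add((elname, attname))

    def on_start(tag, attrs):
        for attname, value in attrs.items():
            if (tag, attname) in entity_attrs:
                # split() covers both ENTITY (one name) and ENTITIES (a list)
                for name in value.split():
                    resolved.append((name, unparsed.get(name)))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return resolved


doc = """<!DOCTYPE slide_show [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY slide1 SYSTEM "slide1.gif" NDATA gif>
  <!ENTITY slide2 SYSTEM "slide2.gif" NDATA gif>
  <!ATTLIST slide_show slides ENTITIES #REQUIRED>
]>
<slide_show slides="slide1 slide2 slide3"/>"""

for name, target in resolve_entity_attributes(doc):
    print(name, target)
```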


Attribute Types (Cont.)


NOTATION
Contains the name of a notation declared in the document's DTD
It can be used to associate types with particular elements, and to limit the types that can be associated with an element
<!NOTATION gif  SYSTEM "image/gif">
<!NOTATION tiff SYSTEM "image/tiff">
<!NOTATION jpeg SYSTEM "image/jpeg">
<!NOTATION png  SYSTEM "image/png">

<!ATTLIST image type NOTATION (gif | tiff | jpeg | png) #REQUIRED>
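Because each notation can carry a system identifier, an application can use a NOTATION attribute to look up how to handle an element. A sketch assuming Python's standard-library `xml.parsers.expat`, with a hypothetical `gallery` root element and `image_media_types` helper:

```python
import xml.parsers.expat


def image_media_types(xml_text: str) -> list:
    """Return the system identifier of the notation named by each image's type."""
    notations = {}  # notation name -> system identifier
    found = []

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_start(tag, attrs):
        if tag == "image":
            found.append(notations.get(attrs.get("type")))

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found


doc = """<!DOCTYPE gallery [
  <!NOTATION gif  SYSTEM "image/gif">
  <!NOTATION tiff SYSTEM "image/tiff">
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!NOTATION png  SYSTEM "image/png">
  <!ATTLIST image type NOTATION (gif | tiff | jpeg | png) #REQUIRED>
]>
<gallery><image type="png"/><image type="gif"/></gallery>"""

print(image_media_types(doc))  # ['image/png', 'image/gif']
```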


Default Value
#REQUIRED
The attribute is required. Each instance of the element must provide a value for the attribute. No default value is provided.
<!ATTLIST person name CDATA #REQUIRED>

#IMPLIED
The attribute is optional. Each instance of the element may or may not provide a value for the attribute. No default value is provided.
<!ATTLIST person born CDATA #IMPLIED>
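Whether #REQUIRED is enforced depends on the parser. A sketch assuming Python's standard-library `xml.parsers.expat`, which is non-validating: it accepts a `person` element that omits the required `name` attribute, and it supplies nothing for the #IMPLIED `born` attribute. `attributes_seen` is a hypothetical helper.

```python
import xml.parsers.expat


def attributes_seen(xml_text: str) -> dict:
    """Map each element name to the attributes expat reports for it."""
    seen = {}

    def on_start(tag, attrs):
        seen[tag] = attrs

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return seen


dtd = """<!DOCTYPE person [
  <!ATTLIST person name CDATA #REQUIRED
                   born CDATA #IMPLIED>
]>
"""
print(attributes_seen(dtd + '<person name="Alan Turing"/>'))  # {'person': {'name': 'Alan Turing'}}
# No error: expat does not validate, so the missing #REQUIRED name goes unreported
print(attributes_seen(dtd + "<person/>"))                     # {'person': {}}
```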


Default Value (Cont.)


#FIXED (followed by a value)
The attribute value is constant and immutable. The attribute has the specified value regardless of whether the attribute is explicitly given on an individual instance of the element. If it is included, it must have the specified value.
<!ATTLIST biography xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink">

Literal Value
The attribute takes this value if no value is given in the document
<!ATTLIST web_page protocol NMTOKEN "http">
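Even a non-validating parser applies defaults declared in the internal subset. A sketch assuming Python's standard-library `xml.parsers.expat` (the `site` root element and the `element_attributes` helper are hypothetical): elements that omit an attribute receive the literal or #FIXED value, and setting expat's `specified_attributes` flag reports only the attributes the document actually wrote.

```python
import xml.parsers.expat


def element_attributes(xml_text: str, only_specified: bool = False) -> list:
    """List (tag, attributes) for every element, optionally hiding DTD defaults."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    # When True, expat reports only attributes written in the document itself
    parser.specified_attributes = only_specified

    def on_start(tag, attrs):
        found.append((tag, attrs))

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found


doc = """<!DOCTYPE site [
  <!ATTLIST web_page protocol NMTOKEN "http">
  <!ATTLIST biography xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink">
]>
<site><web_page/><web_page protocol="ftp"/><biography/></site>"""

for tag, attrs in element_attributes(doc):
    print(tag, attrs)
```

Namespace processing is off here (no separator passed to `ParserCreate`), so `xmlns:xlink` is reported as an ordinary attribute.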


General Entity Declaration


A general entity defines replacement text that can be used in XML documents governed by the DTD:
<!ENTITY entity_name "replacement_text">
<!ENTITY super "superman in the jungle">
&super; is replaced by: superman in the jungle
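Entity expansion happens inside the parser, so the application only sees the replacement text. A sketch assuming Python's standard-library `xml.parsers.expat`; the `story` element and the `text_of` helper are hypothetical.

```python
import xml.parsers.expat


def text_of(xml_text: str) -> str:
    """Return all character data, with internal entities already expanded."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append  # may be called several times
    parser.Parse(xml_text, True)
    return "".join(chunks)


doc = """<!DOCTYPE story [
  <!ENTITY super "superman in the jungle">
]>
<story>Today: &super;!</story>"""

print(text_of(doc))  # Today: superman in the jungle!
```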


Limitations of DTD
Content is limited to text
It is difficult to express repetition constraints: only ?, * and + are available, so a specific number of occurrences has to be written out by hand
DTD does not use XML syntax

A DTD does not say the following:
What the root element of the document is
How many instances of each kind of element appear in the document
What the character data inside the elements looks like
The semantic meaning of an element

A DTD never says anything about the length, structure, meaning, allowed values, or other aspects of the text content of an element

Advanced DTD
General Entity Declaration
External Parsed General Entities
External Unparsed Entities and Notations
Parameter Entities
Conditional Inclusion


Validation
A validating parser compares a document to its DTD and lists any places where the document differs from the constraints specified in the DTD
The parser can decide what it wants to do about any violations
A validity error is not necessarily a fatal error

Validation is an optional step in processing XML

Everything not permitted in the DTD is forbidden


Validating a Document
Web browsers do not validate documents; they only check them for well-formedness
Online validating parsers:
http://www.stg.brown.edu/service/xmlvalid
http://www.cogsci.ed.ac.uk/%7Erichard/xml-check.html

Validating parser software

Most XML parser class libraries include a simple program to validate documents
http://www.cogsci.ed.ac.uk/%7Erichard/xml-check.html
http://www.topologi.com


Info

Course name:

Special Selected Topic in Information System


Section: Document Type Definitions (DTDs)
Number of slides: 29
Updated date: 12/02/2006
Contact: Mr. Phan Vo Minh Thang (minhthangpv@hcmuaf.edu.vn)