
<?xml version="1.0"?> <course startdate="February 06, 2006"> <title> eXtensible Markup Language </title> <lecturer>Phan Vo Minh Thang</lecturer> </course>

eXtensible Markup Language


Fundamentals


Lecturer: Phan Vo Minh Thang MSc.

<?xml version="1.0"?> <material> XML Lecture Notes

<section id="02"> XML Fundamentals </section> </material>

XML Documents and XML Files


An XML document contains text, never binary data.
An XML document can be opened with any program that knows how to read a text file.
XML files are customarily given the .xml file-name extension.
MIME media types: application/xml or text/xml.
<person> Alan Turing </person>


Elements, Tags, and Character Data


The previous example consists of a single element named person.
Start-tag: <person>
End-tag: </person>

Everything between the start-tag and the end-tag is called the element's content.

The content holds the actual information. Whitespace is part of the content, although many applications choose to ignore it.

<person> and </person> are markup. The text Alan Turing and its surrounding whitespace are character data.
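To see the split between markup and character data concretely, here is a small sketch using Python's standard xml.etree.ElementTree module (chosen only for illustration; any conforming XML parser treats this input the same way). The parser takes the element name from the tags and keeps the surrounding whitespace as part of the character data.

```python
import xml.etree.ElementTree as ET

# The single-element document from the example above.
doc = "<person> Alan Turing </person>"

root = ET.fromstring(doc)        # parse the text into an element tree
print(root.tag)                  # person: the element name taken from the tags
print(repr(root.text))           # ' Alan Turing ': character data, whitespace included
print(repr(root.text.strip()))   # 'Alan Turing': what a whitespace-ignoring application uses
```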

Tag Syntax
XML tags look like HTML tags.
Start-tags begin with < and end-tags begin with </. In both cases this is followed by the name of the element, and the tag is closed by >.

You are free to make up new XML tags.


Tag names generally reflect the type of content inside the element, not how that content will be formatted.

Case sensitivity: XML names are case-sensitive, so <Person>, <PERSON>, and <person> are three different tags.
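Case sensitivity is easy to observe with a parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name parses is hypothetical and simply reports whether the parser accepted the text.

```python
import xml.etree.ElementTree as ET

def parses(doc: str) -> bool:
    """Return True if the parser accepts doc, False if it reports a syntax error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(parses("<person>Alan Turing</person>"))  # True: start- and end-tag names match exactly
print(parses("<Person>Alan Turing</person>"))  # False: Person and person are different names
print(parses("<PERSON>Alan Turing</PERSON>"))  # True: PERSON is a legal, distinct name
```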


Empty Element
Empty element: an element that has no content
Empty elements are typically used for the values of their attributes

Example
<email href="mailto"></email>

Shorthand notation
<email href="mailto" />
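A sketch of the equivalence, using Python's standard xml.etree.ElementTree. The e-mail address is a hypothetical placeholder: both spellings parse to an element with the same name, the same attributes, and no content.

```python
import xml.etree.ElementTree as ET

# Hypothetical address, used only for illustration.
long_form = '<email href="mailto:alan@example.com"></email>'
short_form = '<email href="mailto:alan@example.com" />'

a = ET.fromstring(long_form)
b = ET.fromstring(short_form)

print(a.tag == b.tag, a.attrib == b.attrib)  # True True: same name, same attributes
print(a.text, b.text, len(a), len(b))        # None None 0 0: no text, no child elements
```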


XML Trees
Elements can contain other elements, which in turn can contain text or elements, and so on.
Start-tags and end-tags must always be balanced, and children are always completely enclosed in their parents: the element opened last must be closed first (last-in, first-out).
<name><fname>Jack</fname><lname>Smith</name></lname>
Illegal

Parent, child, and sibling elements: each element (except the root element) has exactly one parent element.
An XML document is a tree of elements.
Root (document) element: the first element in the document, which contains all other elements.
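The last-in, first-out rule above is exactly what a stack checks. The following is a deliberately simplified sketch in Python (tags_balanced is a hypothetical helper, not a real XML parser): it only looks at start-tags, end-tags, and empty-element tags, and ignores comments, CDATA sections, processing instructions, and > characters inside attribute values.

```python
import re

# Matches a start-tag, end-tag, or empty-element tag and captures:
#   group 1: "/" for an end-tag, group 2: the name, group 3: "/" for <name/>.
# Simplification: comments, CDATA, processing instructions (including the
# XML declaration) and '>' inside attribute values are not handled.
TAG = re.compile(r"<(/?)([^\s/>]+)[^>]*?(/?)>")

def tags_balanced(doc: str) -> bool:
    """Check last-in, first-out nesting of tags with a stack."""
    stack = []
    for closing, name, self_closing in TAG.findall(doc):
        if self_closing:             # <name/> opens and closes at once
            continue
        if not closing:              # start-tag: remember it
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False             # end-tag does not close the most recently opened tag
    return not stack                 # every start-tag must have been closed

print(tags_balanced("<name><fname>Jack</fname><lname>Smith</lname></name>"))  # True
print(tags_balanced("<name><fname>Jack</fname><lname>Smith</name></lname>"))  # False
```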

A Tree Diagram for Example 2-2


Tree of the Address Book

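[Figure: tree of the address book. The root (address book) has two entry children: one with name, address, two tel, and email; the other with name, tel, and email. Deeper nodes: street, locality, region, postalcode, and country under address; fname and lname under name.]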
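The tree in the figure can be reproduced by walking a parsed document. This sketch uses Python's standard xml.etree.ElementTree; the data values, the root name address-book, and the exact placement of each element are assumptions made for illustration, and outline is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book using the element names from the figure.
doc = """<address-book>
  <entry>
    <name><fname>Jack</fname><lname>Smith</lname></name>
    <address>
      <street>1 Main Street</street><locality>Springfield</locality>
      <region>IL</region><postalcode>62701</postalcode><country>USA</country>
    </address>
    <tel>555-0100</tel><tel>555-0101</tel>
    <email>jack@example.com</email>
  </entry>
  <entry>
    <name>Jane Doe</name><tel>555-0199</tel><email>jane@example.com</email>
  </entry>
</address-book>"""

def outline(elem, depth=0):
    """Yield (depth, tag) pairs in document order: a pre-order tree walk."""
    yield depth, elem.tag
    for child in elem:                     # children in document order
        yield from outline(child, depth + 1)

root = ET.fromstring(doc)
for depth, tag in outline(root):
    print("  " * depth + tag)              # indent each element under its parent
```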


Mixed Content
The dichotomy between elements that contain only character data and elements that contain only child elements is common in data-oriented XML documents.
Mixed content: some elements may contain both child elements and raw character data.
Mixed content is common in document-oriented applications: XML documents containing articles, essays, stories, books, novels, reports, and web pages.

Element content types:

Element content: child elements only
Mixed content: child elements and character data
Simple content: character data only
Empty content: no content at all
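A short sketch of how a parser represents mixed content, using Python's standard xml.etree.ElementTree with a hypothetical sentence. ElementTree stores text before the first child in text and text that follows a child in that child's tail.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content paragraph: character data and a child element interleaved.
para = ET.fromstring("<p>XML is <em>extensible</em> markup.</p>")

print(repr(para.text))                 # 'XML is ': text before the first child
em = para[0]
print(em.tag, repr(em.text), repr(em.tail))  # em 'extensible' ' markup.'
print("".join(para.itertext()))        # XML is extensible markup.
```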

Attributes
Attributes attach additional information to elements. An attribute is a name-value pair attached to an element's start-tag.
One element can have more than one attribute.
The name and value are separated by = and optional whitespace.
The attribute value is enclosed in double or single quotation marks: <tel preferred="true">03-5712121</tel>
Attribute order is not significant.

Example 2-4
<person born="1912-06-23" died="1954-06-07"> Alan Turing </person>
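Reading the attributes of Example 2-4 with Python's standard xml.etree.ElementTree, as a sketch. It shows that attribute values come back as plain strings, that order does not matter, and that single quotes work as well as double quotes.

```python
import xml.etree.ElementTree as ET

# Example 2-4: two attributes on one start-tag.
person = ET.fromstring('<person born="1912-06-23" died="1954-06-07"> Alan Turing </person>')
print(person.get("born"))      # 1912-06-23 (a plain string, not a date object)
print(person.get("nickname"))  # None: no such attribute

# Attribute order is not significant: swapping the two gives the same name-value mapping.
swapped = ET.fromstring('<person died="1954-06-07" born="1912-06-23"> Alan Turing </person>')
print(person.attrib == swapped.attrib)  # True

# Single quotes are just as valid as double quotes.
tel = ET.fromstring("<tel preferred='true'>03-5712121</tel>")
print(tel.get("preferred"), tel.text)   # true 03-5712121
```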

Attributes and Elements


When should you use child elements, and when attributes, to hold information?
Attributes are for metadata about the element, while elements are for the information itself.

Each element may have no more than one attribute with a given name.
The value of an attribute is simply a text string, limited in structure.
An element-based structure is much more flexible and extensible.
If you are designing your own XML vocabulary, it is up to you to decide when to use which.

XML Names
Rules for naming elements and attributes:
Names may contain essentially any alphanumeric character, including non-English letters, digits, and ideograms.
Names may contain the underscore (_), period (.), and hyphen (-).
Names may not contain whitespace of any kind.
All names beginning with the string xml (in any combination of case) are reserved for standardization in W3C XML-related specifications.
Names must start with a letter, an ideogram, or an underscore (_).
There is no limit on name length.
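The naming rules above can be approximated with a regular expression. This is a simplified, ASCII-only sketch in Python; is_simple_xml_name and is_reserved are hypothetical helpers. Real XML 1.0 names also allow non-English letters and ideograms (and the colon), which this sketch deliberately ignores.

```python
import re

# ASCII-only approximation: first a letter or underscore,
# then any number of letters, digits, '_', '.', or '-'.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_simple_xml_name(name: str) -> bool:
    """True if name satisfies the simplified rule above."""
    return NAME.fullmatch(name) is not None

def is_reserved(name: str) -> bool:
    """Names starting with 'xml' in any mix of case are reserved for W3C use."""
    return name.lower().startswith("xml")

for n in ["address-book", "AddressBook", "_private", "2nd-entry", "first name", "XmlData"]:
    print(f"{n!r:16} valid={is_simple_xml_name(n)} reserved={is_reserved(n)}")
```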

XML Names (Cont.)


HTML element names are often written in uppercase; XML element names are frequently written in lowercase.
When a name consists of several words, the words are usually separated by a hyphen (-):
address-book

Or the first letter of each word is capitalized, with no separator character:
AddressBook


Element names example


Which tags are named correctly?

<first.name> <xml-tag> <10things> <_oldname> <more=less>

<authors.name> <Chapter Title>

<SUV-rating> <engine:part>

<myFavoriteColor> <thingamabob> <50s.music> <the10popular> <small_version>


Entity References
What if the character data inside an element contains <? Use an entity reference.
When an application parses an XML document, it replaces each entity reference with the actual characters to which the entity reference refers.
Entity references are markup.
XML predefines five entity references; you can define more.
&lt;   the less-than sign (<)
&amp;  the ampersand (&)
&gt;   the greater-than sign (>)
&quot; the straight double quotation mark (")
&apos; the straight single quotation mark, or apostrophe (')
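A sketch of both directions with Python's standard library: the parser (xml.etree.ElementTree) replaces entity references with their characters, and xml.sax.saxutils.escape produces entity references from raw text. escape handles &, <, and > by default; extra replacements such as the double quote are passed in explicitly.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing replaces each predefined entity reference with the character it stands for.
elem = ET.fromstring("<expr>a &lt; b &amp;&amp; c &gt; d</expr>")
print(elem.text)                   # a < b && c > d

# The reverse: escape() replaces &, < and > with entity references.
print(escape("a < b && c > d"))    # a &lt; b &amp;&amp; c &gt; d

# Quotes matter inside attribute values; escape() accepts extra replacements.
print(escape('say "hi"', {'"': "&quot;"}))  # say &quot;hi&quot;
```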


CDATA Sections
What if your character data contains many <, &, ', or " characters? Enclose the character data in a CDATA section:
<![CDATA[ .. ]]>

Everything inside a CDATA section is treated as raw character data, not markup.
The only thing that cannot appear in a CDATA section is its end delimiter: ]]>
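A minimal sketch with Python's standard xml.etree.ElementTree: characters that would otherwise need entity references are taken literally inside a CDATA section. The second part shows why ]]> cannot appear inside one: the section ends at the first ]]>, so a second one lands in ordinary character data, where XML forbids that sequence.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, <, &, > and quotes are plain characters, not markup.
code = ET.fromstring('<code><![CDATA[if (a < b && c > d) { print("ok"); }]]></code>')
print(code.text)  # if (a < b && c > d) { print("ok"); }

# The section ends at the first ]]>, leaving the second ]]> in normal character data.
try:
    ET.fromstring("<code><![CDATA[x ]]> y ]]></code>")
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)
```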


Comments
XML documents can be commented so that coauthors can leave notes for each other and for themselves.
Comments begin with <!-- and end with the first occurrence of -->. The double hyphen -- must not appear anywhere inside the comment except as part of the closing -->.

Comments may appear anywhere in the character data of a document.
Comments may appear before or after the root element.
Comments may not appear inside a tag or inside another comment.
Comments are strictly for making the raw source code of an XML document more legible to human readers.
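The placement rules for comments can be checked with a parser. This sketch uses Python's standard xml.etree.ElementTree; parses is a hypothetical helper. Note also that ElementTree's default tree builder discards comments, consistent with their role as notes for human readers.

```python
import xml.etree.ElementTree as ET

def parses(doc: str) -> bool:
    """Return True if the parser accepts doc as well-formed."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(parses("<!-- before the root --><note>Hi</note><!-- after -->"))  # True
print(parses("<note>Hi <!-- inside character data --> there</note>"))  # True
print(parses("<note <!-- inside a tag --> >Hi</note>"))                 # False
print(parses("<note><!-- a -- b --></note>"))                           # False: -- inside

# By default the comment does not become part of the element tree.
kept = ET.fromstring("<note><!-- reminder --></note>")
print(len(kept))  # 0
```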

The XML Declaration


XML documents should (but do not have to) begin with an XML declaration, for example <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The XML declaration must be the first thing in the document. It must not be preceded by any comments or whitespace.

An XML declaration specifies the XML version and may also specify encoding and standalone.


Encoding: specifies the character encoding used in the XML document.
Defaults to UTF-8 (Unicode) when no encoding is declared.

Standalone: if the value is "no", an application may be required to read an external DTD to determine the proper values for parts of the document.
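A sketch of these rules with Python's standard xml.etree.ElementTree, feeding the parser raw bytes so that the declared encoding is actually used. parses is a hypothetical helper; the sample documents are illustrative.

```python
import xml.etree.ElementTree as ET

def parses(doc: bytes) -> bool:
    """Return True if the parser accepts doc as well-formed."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# A declaration with version, encoding and standalone, followed by the root element.
ok = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note>Hi</note>'
# Without a declaration the document is still well-formed (UTF-8 is assumed).
no_decl = b"<note>Hi</note>"
# Anything before the declaration, even one space or a comment, is an error.
leading_space = b' <?xml version="1.0"?><note>Hi</note>'
leading_comment = b'<!-- c --><?xml version="1.0"?><note>Hi</note>'

for doc in (ok, no_decl, leading_space, leading_comment):
    print(parses(doc))

# The declared encoding tells the parser how to decode the bytes: 0xE9 is 'é' in Latin-1.
latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>Caf\xe9</name>'
print(ET.fromstring(latin1).text)  # Café
```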


Rules for Well-Formed XML


Rule 1: Mandatory closing tags
The set of tags is unlimited, but all container tags must have end-tags.

Example of legal XML


<person> <name> Phan Minh Vo Thang </name> <title> Associate Professor </title> <age> 25 </age> </person>

Rule 2: There must be exactly one root element
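Rules 1 and 2 can be checked with any XML parser. A minimal sketch using Python's standard xml.etree.ElementTree, with well_formed as a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc as well-formed."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Rule 1: every container element needs its end-tag.
print(well_formed("<person><name>Phan Minh Vo Thang</name></person>"))  # True
print(well_formed("<person><name>Phan Minh Vo Thang</person>"))         # False: </name> missing

# Rule 2: exactly one root element.
print(well_formed("<person/><person/>"))                    # False: two top-level elements
print(well_formed("<people><person/><person/></people>"))   # True: wrapped in one root
```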


Rules for Well-Formed XML (Cont.)


Rule 3: Proper element nesting
All tags must be nested correctly. Like HTML, XML can intermix tags and text, but tags may not overlap each other.
Legal XML
<person>Hao-Ren Ke is a <role>pioneer</role> for <service>Computerized Interlibrary Loan</service> in Taiwan</person>

Illegal XML
<person><name>Claven</name> <keypoint><hd>XML provides a data bus</hd> </person><more></more></keypoint>
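Running the two Rule 3 examples through Python's standard xml.etree.ElementTree, as a sketch: the legal one parses, with tags and text intermixed, and the illegal one is rejected because </person> arrives while <keypoint> is still open.

```python
import xml.etree.ElementTree as ET

legal = ("<person>Hao-Ren Ke is a <role>pioneer</role> for "
         "<service>Computerized Interlibrary Loan</service> in Taiwan</person>")
illegal = ("<person><name>Claven</name> <keypoint><hd>XML provides a data bus</hd> "
           "</person><more></more></keypoint>")

root = ET.fromstring(legal)
print([child.tag for child in root])   # ['role', 'service']: tags and text intermix

try:
    ET.fromstring(illegal)
    overlap_rejected = False
except ET.ParseError as err:
    # </person> closes an element while <keypoint> is still open: the tags overlap.
    overlap_rejected = True
    print("not well-formed:", err)
```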


Rules for Well-Formed XML (Cont.)


Rule 4: Attribute values must be single or double quoted
Legal
<tag attribute="value"> <tag attribute='value'>

Illegal
<font size=6>

Rule 5: An element may not have two attributes with the same name
Rule 6: Comments and processing instructions may not appear inside tags
Rule 7: No unescaped < or & signs may occur in the character data of an element or in attribute values
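Rules 4 to 7 can all be tested the same way. The sketch below uses Python's standard xml.etree.ElementTree; well_formed is a hypothetical helper and the test documents are illustrative.

```python
import xml.etree.ElementTree as ET

def well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc as well-formed."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

cases = {
    "rule 4, double quotes":  '<font size="6"/>',
    "rule 4, single quotes":  "<font size='6'/>",
    "rule 4, unquoted":       "<font size=6/>",
    "rule 5, duplicate name": '<font size="6" size="7"/>',
    "rule 6, comment in tag": '<font <!-- big --> size="6"/>',
    "rule 7, raw <":          "<expr>a < b</expr>",
    "rule 7, raw &":          "<expr>a & b</expr>",
    "rule 7, raw < in value": '<expr test="a < b"/>',
    "rule 7, escaped":        '<expr test="a &lt; b">a &amp; b</expr>',
}
for label, doc in cases.items():
    print(f"{label:24} {well_formed(doc)}")
```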

Rules for Well-Formed XML (Cont.)


Rule 8: Empty elements must be closed, either with an end-tag or with the special abbreviated XML syntax <tag/>.
Legal
<BR /> <HR />
<TITLE></TITLE> is equivalent to <TITLE/>

Illegal
<BR> <HR>
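A sketch of Rule 8 with Python's standard xml.etree.ElementTree (well_formed is a hypothetical helper): HTML-style unclosed tags are rejected, and the two legal spellings of an empty element produce the same element and the same serialization.

```python
import xml.etree.ElementTree as ET

def well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc as well-formed."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(well_formed("<page><BR/><HR /></page>"))  # True: abbreviated empty elements
print(well_formed("<page><BR><HR></page>"))     # False: HTML-style unclosed tags

# <TITLE></TITLE> and <TITLE/> produce the same element and serialize identically.
a = ET.tostring(ET.fromstring("<TITLE></TITLE>"))
b = ET.tostring(ET.fromstring("<TITLE/>"))
print(a == b, a)
```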


Four Common Errors


Forget end-tags
Forget that XML is case-sensitive
Introduce spaces in the name of an element
Forget the quotes around attribute values


Exercise
Is it a well-formed XML document?
<?xml version=1.0 standalone=yes?> <Book> <Title>The XML Handbook</title> <Publisher>Prentice Hall PTR </Publisher> <Author>Charles F. Goldfarb</Author> </Book>
<web-class> <Title>XML Basics</Title> <xml-professor>Carolyn Strong</xml-professor> <1st.class>April 17</1st.class > </web-class>

Well-formed checking using tools
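Besides editors and browsers that report well-formedness errors, a short script can do the check. This is a minimal sketch using Python's standard xml.etree.ElementTree, assuming the files to check are passed on the command line; check_file, main, and the output format are illustrative choices, not a standard tool.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List, Optional

def check_file(path: str) -> Optional[str]:
    """Return None if the file at path is well-formed, else a one-line error report."""
    try:
        ET.parse(path)              # raises ParseError at the first well-formedness error
        return None
    except ET.ParseError as err:    # str(err) includes the line and column
        return f"{path}: {err}"

def main(paths: List[str]) -> int:
    """Check each file in turn; return 1 if any file is not well-formed, else 0."""
    status = 0
    for path in paths:
        problem = check_file(path)
        print(problem if problem else f"{path}: well-formed")
        if problem:
            status = 1
    return status

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved as, say, check_wf.py (a hypothetical name), it would be run as `python check_wf.py book.xml web-class.xml`, printing one line per file and exiting with status 1 if any file failed.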


Info

Course name:

Special Selected Topic in Information System


Section: XML Fundamentals
Number of slides: 26
Updated date: 12/02/2006
Contact: Mr. Phan Vo Minh Thang (minhthangpv@hcmuaf.edu.vn)