<?xml version="1.0"?>
<course startdate="February 06, 2006"> <title> eXtensible Markup Language </title> <lecturer>Phan Vo Minh Thang</lecturer> </course>

eXtensible Markup Language


Foundation, Technologies & Applications

Lecturer: Phan Vo Minh Thang MSc.

XML is an acronym for eXtensible Markup Language
XML is fast becoming the new Internet standard for information exchange.
For complex information reuse, XML is the technology of choice

<?xml version="1.0"?> <material> XML Lecture Notes
<section id="01"> Introduction to XML </section> </material>

Markup Language
What's a markup language?
Markup is a collection of characters that group, organize, and label the pieces of content in a document
In XML, markup is primarily meta content: information about information

A markup language is a set of rules for representing data and encoding structures


Types of markup
Procedural markup specifies what to do with data
How to process the data

Presentational markup defines how to display data


Example: Make this BOLD, center this text

Descriptive markup describes what the different data elements are


Especially in relationship to other data elements

Markup: HTML vs. XML


HTML is designed to
display data; the focus is on how data looks
Markup includes structural and presentational tags

XML is designed to
describe data; the focus is on what data is
Markup is primarily descriptive or declarative


HTML vs. XML example

HTML
<html>
  <body>
    <p>333 MHz Pentium II with 256K internal cache, 512K external cache, 32MB standard RAM, 512MB max. RAM</p>
  </body>
</html>

XML
<pcinfo>
  <processor>
    <type>Pentium II</type>
    <speed>333</speed>
    <intcache>256</intcache>
  </processor>
  <extcache>512</extcache>
  <ram>
    <standard>32</standard>
    <max>512</max>
  </ram>
</pcinfo>
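
Because the XML version labels every value, a program can pick out individual fields by name instead of scraping them from a sentence. The following is a minimal sketch in Python, assuming the standard library module xml.etree.ElementTree; the variable names are illustrative only and not part of the lecture.

```python
import xml.etree.ElementTree as ET

PCINFO_XML = """
<pcinfo>
  <processor>
    <type>Pentium II</type>
    <speed>333</speed>
    <intcache>256</intcache>
  </processor>
  <extcache>512</extcache>
  <ram>
    <standard>32</standard>
    <max>512</max>
  </ram>
</pcinfo>
"""

root = ET.fromstring(PCINFO_XML)

# Each value is addressed by the name of the element that describes it.
cpu_type = root.findtext("processor/type")
cpu_speed_mhz = int(root.findtext("processor/speed"))
max_ram_mb = int(root.findtext("ram/max"))

print(cpu_type, cpu_speed_mhz, max_ram_mb)
```

Doing the same with the HTML version would require parsing the free text inside the <p> element.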

Does XML replace HTML?


No, XML is not a replacement for HTML
XML and HTML were designed with different goals
HTML is about displaying information
XML is about describing information


XML doesn't do anything


XML was created as a way to structure, store, and send information in plain text files
It is just pure information wrapped in XML tags
<note>
  <to>Kelly</to>
  <from>Jim</from>
  <heading>Reminder</heading>
  <body>Don't forget class tonight!</body>
</note>

You need other software to send, receive or display XML



Format vs. content


XML separates presentation from content
In fact, XML requires a stylesheet or conversion to HTML in order to be displayed

The advantages are significant


Easy to create multiple views of the same content
Content is independent of any application
Data can be queried and manipulated much like a database application
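
To illustrate "multiple views of the same content", the sketch below renders one XML note in two ways: as plain text and as an HTML fragment. It reuses the <note> example from the earlier slide and assumes Python's standard xml.etree.ElementTree and html modules; the function names as_plain_text and as_html are hypothetical helpers written for this illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

NOTE_XML = """<note>
  <to>Kelly</to>
  <from>Jim</from>
  <heading>Reminder</heading>
  <body>Don't forget class tonight!</body>
</note>"""

def as_plain_text(note):
    """Render the note as a short plain-text message."""
    return (f"To: {note.findtext('to')}\n"
            f"From: {note.findtext('from')}\n"
            f"{note.findtext('heading')}: {note.findtext('body')}")

def as_html(note):
    """Render the same note as an HTML fragment."""
    return (f"<h2>{escape(note.findtext('heading'))}</h2>"
            f"<p>{escape(note.findtext('body'))}</p>")

note = ET.fromstring(NOTE_XML)
text_view = as_plain_text(note)
html_view = as_html(note)
print(text_view)
print(html_view)
```

The content is stored once; each presentation is produced by separate code, which is the role a stylesheet plays in a real XML workflow.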


Why use XML?


It is a universally accepted standard way of structuring data (syntax).
It is a W3C Recommendation (W3C = World Wide Web Consortium).
The marketplace supports it with many free or inexpensive tools.
The alternative to using XML is to define your own proprietary data syntax, and then build your own proprietary tools to support it (not a very appealing idea).

Alphabet Soup
What's with all the different markup languages?
SGML
HTML
XML
XHTML


A Brief History of XML


First SGML (Standard Generalized Markup Language)
The grandfather of all markup languages
An ISO standard for the markup of text, adopted in 1986
SGML describes the logical structure of a document, its components, and their relationship to each other, not how the document should be formatted (it separates content from presentation and format)
Documents conform to a formal model: the DTD (Document Type Definition)
Documents are regarded as having types defined by a DTD
Its roots go back to GML, developed at IBM by Goldfarb, Mosher & Lorie in 1969

Disadvantages of SGML
Comprehensive, but very complex and difficult to learn and apply
SGML tools have traditionally been very expensive
SGML has been primarily a technology for print publishing



A Brief History of XML (Cont.)


Then HTML, the language of the Web: a simplified application of SGML
HTML is a fixed tag set. You cannot add your own tags to HTML.
HTML is designed for display (in a Web browser). It is not effective for print or other formats.
HTML is static. Its display is fixed, so providing information in different ways based on user request is difficult.
HTML is not structural (with the exception of lists and tables). It is primarily a linear presentation markup.
HTML is not really a standard. Browser vendors have created their own proprietary codes, which impedes standardization.


XML, HTML, and SGML


HTML is a specific application of SGML
HTML is defined by a DTD written in SGML

XML is an abbreviated version of SGML

[Figure: SGML, the DTD for HTML, and HTML documents; XML is an abbreviated version of SGML]


XML, HTML, and SGML


XML is an abbreviated version of SGML
Omits the more complex and less-used parts of SGML
Easier to define new document types
Easier to write programs that handle XML documents
More suited to delivery and interoperability over the WWW
XML is more "SGML--" than "HTML++"


What is XML?
XML is NOT a set of tags that you can apply to documents
XML is a specification that sets rules for creating tag sets that you can apply to documents (eXtensible)
XML does not define the tag names; you do

XML is NOT a programming language like C++
XML is NOT a network transport protocol like HTTP or FTP
XML is NOT a database
A database can contain XML data, but the database itself is not an XML document
You can store XML data in a database or retrieve XML data from a database, but you need to run software written in a real programming language such as C or Java

Design Goals of XML


XML will be designed for use on the Internet, but shall support a wide variety of applications (such as Web-based publishing and e-commerce)
XML will be based on and compatible with SGML
XML keeps the best of SGML and reduces the complexity

It will be easy to write programs that process XML documents
XML documents will be easy to create, readable without specialized tools, and reasonably clear
XML is in plain text format

The design of XML will be formal and concise
XML is a precise standard

New Computing Model


Networking - TCP/IP: network-independent transport
Web client - Removes the platform dependency
Java - Write code not specific to an OS
XML - Makes data independent of software

XML is best viewed as the new ASCII of the Internet



XML can be used to


Manage meta content
Provide rich document descriptions
Publish and exchange database contents
Provide a messaging format for communication between applications


HTML has limited structure


Restricted to a fixed set of tags
Isn't extensible for specific applications
Presents barriers for
Reuse in multiple formats
Interchange between applications or programs
Automation or the ability to process programmatically


Revert to using SGML?


SGML overcomes HTML's drawbacks
Information model of freedom and extensibility
Write once, reuse many times
Future-proof, platform-proof
Validation for completeness and correctness
Infinite possibilities for expressing information (infinite tag set)

SGML has its problems:
It's complex and difficult to learn and use
There's no mainstream browser support

Plus, SGML only standardizes structure
Includes no support for styles

XML to the Rescue


XML was invented in 1997 to enable the delivery of SGML information over the Web
Well-behaved subset of SGML
Stricter rules drive consistency
Enables machine-to-machine communications

SGML--, not HTML++
Retains the power and flexibility of SGML without the complexity

A World Wide Web Consortium (W3C) standard
Has overwhelming vendor support


Introduction to World Wide Web Consortium


W3C is responsible for developing the Web specifications (Recommendations) that describe the communication protocols and technologies of the Web.
Role of W3C in defining XML-related specifications
W3C has laid down design goals that XML must meet. Some of these are listed below:
XML must be directly usable over the Internet.
XML must support a wide variety of applications.
XML must be compatible with SGML.

Official website is at http://www.w3c.org


Introduction to World Wide Web Consortium (Contd.)


The number of optional features in XML needs to be kept to the absolute minimum, ideally zero.
XML documents must be human-legible and clear.
XML design must be formal and concise.


XML Specification
XML 1.0 specification released
By the W3C on Feb. 10, 1998
As a Recommendation, the highest level of endorsement possible

Cross-platform, vendor-neutral standard
Developed by the XML Working Group
Many vertical industry initiatives


XML working group


Consists of 14 companies and organizations
Adobe, ArborText, DataChannel, FujiXerox, HP, Inso, Isogen, Microsoft, Netscape, SoftQuad, Sun Microsystems, University of Chicago
Along with a W3C representative and James Clark

Yes, Sun and Microsoft can work together!


XML 1.0 spec has 2 parts


One for XML documents
How to use tagged markup to indicate the meaning of data

One for XML Document Type Definitions (DTDs)


How to indicate the allowable structure for a class of XML documents
Can constrain the pieces of data that might occur, the hierarchy of data, and the number of times each piece of data might occur


Summary: XML Advantages


Over SGML
Faster download
Supported by mainstream browsers
Standard linking
Standard stylesheet

Over HTML
Interchangeable
Reusable
Enables automation
Searchable


What about XHTML?


EXtensible HyperText Markup Language (XHTML)
An XML application
Stricter and cleaner version of HTML
Almost identical to HTML 4.01
Is intended to replace HTML

Better support for multimedia and wireless devices
An official W3C Recommendation since January 26, 2000


Using XHTML
Some don't recommend XHTML because
You have all the pain of XML without any of the gain
Markup is still structural and presentational

But your pages can be validated
And work without a hitch (or you'll receive error messages)

There's a tool to convert HTML to XHTML
HTML Tidy: www.w3.org/People/Raggett/tidy/


Exercise A
1. Metadata is information about information: TRUE
2. XML is a programming language: FALSE
3. XML assumes that a document is structured hierarchically: TRUE
4. XML is platform and language independent: TRUE
5. You can use any text editor to create XML documents: TRUE
6. XHTML lets you define your own tags: FALSE


Exercise A (continued)
7. XML content can be in any language of the world: TRUE
8. Internet Explorer 5 and Netscape 6 support XML: TRUE
9. XML is about adding procedural markup code to documents: FALSE
10. XML provides 50% of the capabilities of SGML: FALSE
11. XML is a good solution to achieve data independence: TRUE
12. A parser checks to see if your markup tags are spelled correctly: FALSE
A Look at XML


A Sample HTML Document


For pure display, HTML is very effective
Simple code, workable output, clear delineation of content elements and their nesting by tags

HTML is a good example of a simple markup language
There are start tags and end tags (<h2></h2>)
The end tag is differentiated from the start tag by the forward slash
The procedure file shows the nesting of the elements of the document

HTML is concerned solely with presentation
It does nothing to help you understand what the information is
You need to interpret the content yourself, not through the HTML tags

A Sample HTML Document Illustration


A Sample XML Document


Tagging for XML is similar to HTML, with start and end tags enclosing content.
But there are no predefined tags (you can define the tags you need)
Specific rules for XML tagging are stricter than for HTML
You must have closing tags for all elements
Tags must be nested, never overlapping
Tag names must match case

The tag names identify what they contain
You define the tag names to suit the information that you are marking up
Instead of generic tags, you get semantic tags
Ideally, the names should come from the semantic model for your information
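
The three tagging rules above can be checked with any XML parser. The sketch below is one way to try them out, assuming Python's standard xml.etree.ElementTree; the helper is_well_formed and the sample strings are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "properly nested": "<note><to>Kelly</to></note>",
    "missing closing tag": "<note><to>Kelly</note>",
    "overlapping tags": "<note><to>Kelly</note></to>",
    "case mismatch": "<note><to>Kelly</To></note>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted; an HTML browser would typically try to display all four.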

A Sample XML Document Illustration


Benefits to Defining Your Own Tag Names


Tag names have meaning for you and your authors
Names can reflect the content
Tag names have nothing to do with formatting
Formatting can be defined later, when you know the exact purposes of the document.
Nothing in the markup will limit the formats to which you can output

You can have as many or as few tags as you need


A First Look at XML


XML makes essentially two changes to HTML
No predefined tags
Stricter syntax


No Predefined Tags
XML:
<price currency="usd">499.00</price>
<toc xlink:href="/pineapple">Pineapplesoft</toc>

HTML:
<P>The Price is $499.00
<TABLE>
  <TR>
    <TD><A HREF="/pineapple"><B>Pineapplesoft</B></A></TD>
  </TR>
</TABLE>

No Predefined Tags (Cont.)


XML is extensible: it predefines no tags
Authors create the tags that are needed for their applications

Issues
How to look at XML documents
WWW browsers?
How to map XML to HTML?

Can you perform operations on XML data?
Can you compare different prices?

How does XML simplify Web site maintenance?
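
On the question of comparing prices: once prices are marked up as <price> elements, a program can read and compare them directly. This is a minimal sketch, assuming Python's standard library; the <catalog> and <product> wrapper elements and the second product are hypothetical, invented for the illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical catalog; only "XML Editor" at 499.00 appears in the lecture.
CATALOG_XML = """<catalog>
  <product><name>XML Editor</name><price currency="usd">499.00</price></product>
  <product><name>XML Viewer</name><price currency="usd">99.50</price></product>
</catalog>"""

catalog = ET.fromstring(CATALOG_XML)

# Collect (name, price) pairs; Decimal avoids float rounding on money values.
prices = [(p.findtext("name"), Decimal(p.findtext("price")))
          for p in catalog.findall("product")]

cheapest = min(prices, key=lambda pair: pair[1])
print(cheapest)
```

The comparison works because the markup says which text is a price; the HTML version offers only "The Price is $499.00" inside a paragraph.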



Stricter
HTML has a very forgiving syntax
Difficult to develop software (like browsers) to process HTML
Software size is large

XML adopts a very strict syntax
Results in smaller, faster, and lighter processing software (like browsers)

Document Structure


Simple E-mail Document


Receiver: John
Sender: Harry
Subject: Harry

Dear John,
Hello .. This is the document for you!
Harry
April 12, 2000

[Figure labels: Structure, Content, Display]



The Structure of a Letter


Structure, Content, Display


A document includes
Structure
Content
Display (presentation)

HTML: defines structure and display
XML: defines only structure
But it relies on special browsers and/or XSL to display it


Benefits of XML
XML is a text-based format that lets developers describe, deliver, and exchange structured data between a range of applications and to clients for local display and manipulation
Information will be more accessible and reusable
XML brings power and flexibility to Web-based applications for exchanging structured data
XML offers the tantalizing possibility of truly cross-platform, long-term data formats


Operation Model of XML

[Figure: operation model of XML. Labels: Schema Language, Structure, XML Parser, XML Document, Content, Processing, Documents with Special Format (HTML, WML), XSL, Display]


Related XML Standards


XML Schema - offers more extensive rules than DTDs
Extensible Stylesheet Language (XSL) - used for displaying XML
XSL Transformations (XSLT) - used to transform from one data format to another
XML Namespaces - used to distinguish one XML vocabulary from another and prevent overlaps in names
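
To show how namespaces prevent overlaps in names, the sketch below uses two vocabularies that both define a title element. It assumes Python's standard xml.etree.ElementTree; the namespace URIs, prefixes, and element content are hypothetical examples, not taken from the lecture.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both use the element name "title".
DOC = """<order xmlns:shop="http://example.com/shop"
       xmlns:book="http://example.com/book">
  <shop:title>Invoice 42</shop:title>
  <book:title>Learning XML</book:title>
</order>"""

# Map prefixes to namespace URIs for use in search paths.
ns = {"shop": "http://example.com/shop", "book": "http://example.com/book"}
root = ET.fromstring(DOC)

shop_title = root.findtext("shop:title", namespaces=ns)
book_title = root.findtext("book:title", namespaces=ns)
print(shop_title, "|", book_title)
```

The prefix is only shorthand; the parser distinguishes the two elements by their namespace URI.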



Related XML Standards


XPath is a query language for addressing parts of an XML document
XLink and XPointer are languages used to link XML documents to each other
XLink describes how to associate two or more resources
XPointer describes how to address a resource
SAX (Simple API for XML) is a simple event-based interface, originally defined for Java
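
As an illustration of addressing parts of a document with XPath, the sketch below uses the limited XPath subset supported by Python's standard xml.etree.ElementTree (not a full XPath implementation). The id attributes and the second record are hypothetical additions to the lecture's <Records> example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data modeled on the lecture's <Records> example.
RECORDS_XML = """<Records>
  <Record id="p1"><Name>XML Editor</Name><Price>499.00</Price></Record>
  <Record id="p2"><Name>XML Parser</Name><Price>0.00</Price></Record>
</Records>"""

root = ET.fromstring(RECORDS_XML)

# All <Name> elements anywhere below the root.
names = [e.text for e in root.findall(".//Name")]

# The <Record> whose id attribute equals "p2".
record_p2 = root.find("./Record[@id='p2']")

# The <Record> that has a <Name> child with the given text.
editor = root.find("./Record[Name='XML Editor']")

print(names, record_p2.findtext("Name"), editor.get("id"))
```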


What About DOM?


Document Object Model (DOM)
is an application programming interface (API) for HTML and XML documents
Allows content to be accessed and manipulated even AFTER it has become part of an HTML or XML document
Published as a Recommendation by W3C
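
A small sketch of what "accessed and manipulated after it has become part of a document" can look like, assuming Python's standard xml.dom.minidom, a lightweight DOM implementation. The replacement name "Sam" is a made-up value for the illustration.

```python
from xml.dom.minidom import parseString

dom = parseString("<note><to>Kelly</to><from>Jim</from></note>")

# Read: navigate the tree through DOM node properties.
to_node = dom.getElementsByTagName("to")[0]
original = to_node.firstChild.data

# Modify: change existing text and append a new element after parsing.
to_node.firstChild.data = "Sam"
heading = dom.createElement("heading")
heading.appendChild(dom.createTextNode("Reminder"))
dom.documentElement.appendChild(heading)

updated = dom.documentElement.toxml()
print(updated)
```

The same method names (getElementsByTagName, createElement, appendChild) are defined by the DOM specification, so equivalent code looks similar in other languages.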


DOM and XML


One of the earliest uses of DOM was Dynamic HTML
Client-side scripts manipulate and display (or redisplay) an HTML document based on user actions

The DOM provides a common way of accessing data structures from structured documents
Opens the door for XML as the lingua franca of data interchange on the Internet


XML Software
XML Browser
View and print XML documents
Microsoft Internet Explorer

XML Editor
Microsoft XML Notepad

XML Parser
Shields programmers from the XML syntax
IBM's XML for Java

XSL Processor
Transforms XML to HTML
LotusXSL

Applications of XML
Document-oriented applications
Intended for human consumption
Document publishing

Data-oriented applications
Intended for software consumption
Data exchange
Data exchange in digital libraries


Document Applications
XML concentrates on the document structure, and this makes it independent of the delivery medium
It is possible to edit and maintain documents in XML and automatically publish them on different media
Many publications are available online and in print
Web sites are optimized for specific viewers: one generic version and one optimized for some users
If done manually, this is very costly

It makes sense to maintain a common version of the documentation in a media-independent format (XML), and to automatically convert it into publishing formats such as HTML, PostScript, RTF, PDF


Document Structure in XML


Document Applications


Example RDBMS Table


Database Structure in XML


Write XML
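printf("<?xml version=\"1.0\"?>\n");
printf("<Records>\n");
/* Embedded SQL (pseudocode): fetch a row into the host variables; repeat for each row. */
SELECT IDENTIFIER, NAME, PRICE FROM TABLE1 INTO :IDENTIFIER, :NAME, :PRICE
printf("<Record>");
printf("<Identifier>%s</Identifier> <Name>%s</Name> <Price>%s</Price>", IDENTIFIER, NAME, PRICE);
printf("</Record>\n");
printf("</Records>\n");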
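
The same idea can be sketched as a runnable program. The version below is an illustration only, assuming Python's standard sqlite3 and xml.etree.ElementTree modules; the in-memory table stands in for the slide's TABLE1, and its schema and single row are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory stand-in for the slide's TABLE1 (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (IDENTIFIER TEXT, NAME TEXT, PRICE TEXT)")
conn.execute("INSERT INTO TABLE1 VALUES ('p1', 'XML Editor', '$499.00')")

records = ET.Element("Records")
for identifier, name, price in conn.execute(
        "SELECT IDENTIFIER, NAME, PRICE FROM TABLE1"):
    record = ET.SubElement(records, "Record")
    ET.SubElement(record, "Identifier").text = identifier
    ET.SubElement(record, "Name").text = name
    ET.SubElement(record, "Price").text = price
conn.close()

# The library escapes special characters such as & and < in the values.
xml_text = ET.tostring(records, encoding="unicode")
print(xml_text)
```

Building the tree with a library rather than printing strings avoids producing malformed XML when a value contains markup characters.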


XML Result
<?xml version="1.0" ?>
<Records>
  <Record>
    <Identifier>p1</Identifier>
    <Name>XML Editor</Name>
    <Price>$499.00</Price>
  </Record>
  <Record> </Record>
</Records>

Applications Exchanging Data Over the Web

Characteristics of XML


Structured Content
Book -- chapter -- title -- section -- summary
When you examine similar information products, the structures are not consistent from product to product
Big problem for reuse

In XML, structure can be defined in a Document Type Definition (DTD) or Schema.
A DTD
Defines all the elements (XML tags) that can be used in a document
Defines the relationship of those elements to other elements
Defines the hierarchy of elements ("a chapter contains"), the order of elements, and the number of elements
Maintains structural consistency

Separation of Content and Format


Traditional tools like Word and HTML editors are WYSIWYG
Not good for reuse
Formatting is specific to the output the tool is designed to support

XML makes documents transportable across systems and applications
XML focuses on the structure of a document, not the presentation.
With XML, the presentation information (style) is maintained in separate files that are associated with the document when it is published or used.


Separation of Content and Format Example


Built-in Metadata
XML is a set of rules for creating markup tags
The tag names themselves offer additional detail about the information.
The tag names become metadata
Attributes can be used to further define metadata

Example
Using attributes to identify the audience for each specific step, option, or even word
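
One way to picture the audience example: each step carries an audience attribute, and a program selects the steps that apply to a given reader. This is a sketch under assumptions; the <procedure> and <step> element names, the audience values, and the helper steps_for are all hypothetical, and the parsing uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical procedure; the attribute marks who each step is for.
PROCEDURE_XML = """<procedure>
  <step audience="all">Open the configuration file.</step>
  <step audience="admin">Set the server password.</step>
  <step audience="all">Save and close the file.</step>
</procedure>"""

def steps_for(root, audience):
    """Return the text of steps meant for everyone or for this audience."""
    return [s.text for s in root.findall("step")
            if s.get("audience") in ("all", audience)]

root = ET.fromstring(PROCEDURE_XML)
user_steps = steps_for(root, "user")
admin_steps = steps_for(root, "admin")
print(user_steps)
print(admin_steps)
```

The content is written once, and the attribute metadata decides which reader sees which steps.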


Database Orientation
XML makes you look at information as data
XML DTD designers are not interested in the actual data values during design.
They are concerned with the type of information, the hierarchy of information, and the relationship of the pieces of information
The result is a structural format that can be stored very easily in a database
Each element can become a field of a table


Use of XML
How to format XML information for presentation?
XSL is a powerful mechanism for both transforming and formatting XML documents
XSL is an XML markup language itself and can
Format content for online display or for paper-based delivery
Add constant text or graphics
Filter content
Sort or reorder text

Three parts of XSL
XPath
XSL Transformations (XSLT)
XSL-FO (Formatting Objects)


Use of XML (Cont.)


XPath identifies specific elements in an XML document
Can select elements for specific formatting or transformation, such as a title following a chapter, the first paragraph in a section, or every other bullet in a list

XSLT
Can manipulate the information to reorder, repeat, filter out information, or even add information based on details in the file
Can transform an XML document into another markup language

XSL-FO
Provides style sheet capabilities for converting XML to paper-based formats such as PDF
Includes page layouts, headers, and footers

Document Applications


Personalization
Personalization is information that can be manipulated to serve the needs of a specific user
Personalization can be user defined, or can be managed by software, based on a user's login information
Personalization that is managed by software may be controlled by observing user behavior, and/or combined with preferences to create a personalized experience
With XML, documents can be broken down, stored as separate physical pieces in a database, and then assembled in any order to meet user demands


HTML Document
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Title content</title>
  </head>
  <body>
    <p align=center>
      Content inside the body <br/> of an HTML document
    </p>
  </body>
</html>

Tags used in HTML pages are predefined.

Slide callouts: Element, Attribute, Text


HTML Document (Cont.)


[Figure: the result displayed when the HTML document is viewed in IE 6.0]


XML Document
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<noidung>
  <phandau>
    <tieude>Title content</tieude>
  </phandau>
  <phanthan>
    <gioithieu>Content inside the body of an XML element</gioithieu>
  </phanthan>
</noidung>

Tags used in XML documents are defined by users.

Slide callouts: Element, Attribute, Text

XML Document (Cont.)


[Figure: the XML document displayed in Grid view in the Altova XMLSpy application]


XML Document (Cont.)


[Figure: the result displayed when the XML document is viewed in IE 6.0]


Problem Statement 1.D.1


CyberShoppe requires a centralized repository of data about the products sold through its e-commerce site. It has three branches, which maintain data on their local computer systems. Data from all three branches must be collated and housed in a centralized location. This data must be made available to the Accounts and Sales sections of these branches, regardless of the hardware and software platforms being used at the branches. The sales personnel also require access to the data from devices such as palmtops and cellular phones.


Problem Statement 1.D.1 (Contd.)


The product details of CyberShoppe consist of the name of the product, a brief description of the product, the price, and the available quantity on hand. Each product is uniquely identified by a product ID.


Task List
Identify the method to store data in a device-independent format.
Identify the structure of the document in which data is to be stored.
Create an XML document to store data.
View the XML document in a browser.

Task 1: Identify the method to store data in a device-independent format.

Result

XML provides a way to store structured data that is capable of being recognized by different kinds of devices. In other words, it enables device-independence.

Task 2: Identify the structure of the document in which data is to be stored.

Before you store data in an XML document, you need to organize it. An XML document is composed of a number of components that can be used for representing information. These components are:
Processing Instruction
An XML document usually begins with the XML declaration, which these notes call the Processing Instruction (PI). It provides information about how the XML file should be processed. Strictly speaking, the XML 1.0 specification defines the XML declaration separately from processing instructions, although the two share the <? ... ?> syntax.


Task 2: Identify the structure (Contd.)


The PI statement can be written as:
<?xml version="1.0" encoding="UTF-8" ?>

In the above example, the PI states that XML version 1.0 is used. The encoding declaration specifies the character encoding in which the XML file is stored.
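The effect of the encoding declaration can be sketched as follows. This is a minimal illustration that assumes Python's standard xml.etree.ElementTree parser; the element names and sample text are hypothetical. The documents are passed as bytes so that the parser, not the program, decides how to decode them.

```python
import xml.etree.ElementTree as ET

# The same kind of document stored in two different encodings.
# The declaration tells the parser how to decode the bytes.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<greeting>Xin chào</greeting>"
).encode("utf-8")

latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<word>café</word>"
).encode("iso-8859-1")

utf8_text = ET.fromstring(utf8_doc).text
latin1_text = ET.fromstring(latin1_doc).text

print(utf8_text)
print(latin1_text)
```

Both documents yield the intended text because each declaration matches the encoding actually used to store the bytes.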


Task 2: Identify the structure (Contd.)


Tag
Tags are used to specify a name for a given piece of information. Tags usually occur in pairs. Each pair consists of a start tag and an end tag. The start tag contains the name of the tag, optionally followed by attributes, while the end tag contains a forward slash (/) before the name of the tag.


Task 2: Identify the structure (Contd.)


Elements
Elements are the basic units that are used to identify and describe data in XML. They are the building blocks of an XML document. Elements are represented using tags. An XML document must always have exactly one root element. All other elements are specified within the opening and closing tags of the root element.
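The root element rule can be checked with a short sketch, assuming Python's standard xml.etree.ElementTree parser. The library and book element names are hypothetical examples, not part of the course material.

```python
import xml.etree.ElementTree as ET

# One root element (library) that encloses all other elements.
root = ET.fromstring(
    "<library><book>XML Basics</book><book>XSLT</book></library>"
)
root_tag = root.tag
child_tags = [child.tag for child in root]

# Two top-level elements: there is no single root, so parsing fails.
second_root_error = None
try:
    ET.fromstring("<book>XML Basics</book><book>XSLT</book>")
except ET.ParseError as exc:
    second_root_error = str(exc)

print(root_tag, child_tags)
print("error:", second_root_error)
```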


Task 2: Identify the structure (Contd.)


Content
The information that is represented by an element of an XML document is referred to as the content of that element.

An element can contain any of the following:

Character or Data Content
An element can contain textual information only.

Element Content
Elements can contain other elements. The elements contained in another element are called child elements. The containing element is called the parent element.

Task 2: Identify the structure (Contd.)


Element Content
A parent element can contain many child elements. All the child elements of a parent element are siblings and are thus related to one another.

Combination
Elements can contain textual information as well as other elements.
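The three kinds of content can be seen side by side in a minimal sketch. It assumes Python's standard xml.etree.ElementTree module, and the order document with its element names is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document showing the three kinds of content.
doc = ET.fromstring(
    "<order>"
    "<note>Deliver before noon</note>"
    "<customer><name>An</name><city>Hue</city></customer>"
    "<remark>Call <phone>555-0100</phone> on arrival</remark>"
    "</order>"
)

note = doc.find("note")          # character content only
customer = doc.find("customer")  # element content only
remark = doc.find("remark")      # combination of text and elements

note_text = note.text
customer_children = [child.tag for child in customer]

# In ElementTree, text before the first child is stored in .text,
# and text that follows a child is stored in that child's .tail.
remark_parts = (remark.text, remark[0].text, remark[0].tail)

print(note_text)
print(customer_children)
print(remark_parts)
```

Here name and city are siblings, because they share the parent element customer.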


Task 2: Identify the structure (Contd.)


Attributes
Attributes provide additional information about the elements for which they are declared. An attribute consists of a name-value pair. Elements can have one or more attributes. When a DTD or schema is used, an attribute can be declared as either mandatory or optional.


Task 2: Identify the structure (Contd.)


Attributes
When deciding whether to represent information as an element or an attribute, you can follow the guidelines given below:
  If the data must be displayed, you can represent it as an element. In general, element attributes are used for intangible, abstract properties such as ID.
  If the data must be updated frequently, it is better represented as an element because it is easier to edit elements than attributes with XML editing tools.
  If the value of a piece of information must be checked frequently, it may be represented as an attribute.
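The guidelines above can be sketched with a small example. It assumes Python's standard xml.etree.ElementTree module; the item element, its id attribute, and the values are hypothetical. The identifier is an abstract property that is looked up often, so it is an attribute, while the displayed and frequently updated data are child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical item: the id is an attribute, displayed data are elements.
item = ET.fromstring(
    '<item id="A101"><name>Notebook</name><price>2.50</price></item>'
)

item_id = item.get("id")           # attribute lookup
name = item.findtext("name")       # element content
old_price = float(item.findtext("price"))

# Updating frequently changing data stored as an element.
item.find("price").text = "2.75"
new_price = float(item.findtext("price"))

print(item_id, name, old_price, new_price)
```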


Task 2: Identify the structure (Contd.)


Entities
An entity can be described as a shortcut to a set of information. It is a name that is associated with a block of data. This data can be a chunk of text or a reference to an external file that contains textual or binary information. XML classifies entities as general or parameter entities, and as internal or external entities.


Task 2: Identify the structure (Contd.)


Internal Entities
An internal entity consists of a name that is associated with a block of information defined within the document itself. A reference to it is easy to identify, because it is always written as an ampersand (&), the entity name, and a terminating semicolon (;).
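A reference to an internal entity can be tried out with the following minimal sketch, which assumes Python's standard xml.etree.ElementTree parser. The entity name shop and the notice element are hypothetical; &amp; is one of the entities predefined by XML.

```python
import xml.etree.ElementTree as ET

# The internal entity "shop" is declared in the document's own DOCTYPE.
# Both the entity name and the notice element are illustrative assumptions.
text = (
    '<!DOCTYPE notice [<!ENTITY shop "CyberShoppe">]>'
    "<notice>Welcome to &shop; &amp; partners</notice>"
)

notice = ET.fromstring(text)
expanded = notice.text  # entity references are replaced by their values

print(expanded)
```

The parser replaces &shop; with the declared text and &amp; with a literal ampersand, so the application sees only the expanded content.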


Task 2: Identify the structure (Contd.)


Comments
Comments are statements that are used to explain the markup. They are written between <!-- and -->. When the document is processed, the parser does not treat comments as part of the document's data. Comments are optional in an XML file.
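How a parser treats comments can be sketched as follows, assuming Python's standard xml.etree.ElementTree module with its default settings, which discard comments while building the tree. The stock and item element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The comment explains the markup but carries no data.
doc = ET.fromstring(
    "<stock><!-- prices reviewed monthly --><item>Pen</item></stock>"
)

# With the default parser settings, only the element survives.
children = [child.tag for child in doc]

print(children)
```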


Task 2: Identify the structure (Contd.)


Result
Structure of the XML document to be used for storing products data:
PRODUCTDATA
  PRODUCT
    PRODUCTNAME
    DESCRIPTION
    PRICE
    QUANTITY
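A document that follows this structure could look like the sketch below, which is read back with Python's standard xml.etree.ElementTree module. The attribute name PRODID for the product ID and all sample values are assumptions made for illustration; the slides name only the elements.

```python
import xml.etree.ElementTree as ET

# Sample data and the PRODID attribute name are hypothetical.
product_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<PRODUCTDATA>
  <PRODUCT PRODID="P001">
    <PRODUCTNAME>Toy car</PRODUCTNAME>
    <DESCRIPTION>A small battery-powered car</DESCRIPTION>
    <PRICE>12.50</PRICE>
    <QUANTITY>40</QUANTITY>
  </PRODUCT>
  <PRODUCT PRODID="P002">
    <PRODUCTNAME>Puzzle</PRODUCTNAME>
    <DESCRIPTION>A 500-piece jigsaw puzzle</DESCRIPTION>
    <PRICE>8.00</PRICE>
    <QUANTITY>15</QUANTITY>
  </PRODUCT>
</PRODUCTDATA>
"""

root = ET.fromstring(product_xml)

# Collect (id, name, price, quantity) for every PRODUCT element.
products = [
    (
        p.get("PRODID"),
        p.findtext("PRODUCTNAME"),
        float(p.findtext("PRICE")),
        int(p.findtext("QUANTITY")),
    )
    for p in root.findall("PRODUCT")
]

for product in products:
    print(product)
```

The product ID is stored as an attribute here because it identifies the product rather than describing it, in line with the guidelines on attributes.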


Task 3: Create an XML document to store data.

Rules for Creating Well-formed XML Documents


Every start tag must have an end tag.
Empty tags must be closed using a forward slash (/).
All attribute values must be enclosed in quotation marks (double or single).
Tags must nest correctly.
XML tags are case-sensitive: the start tag and end tag names must match exactly.
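These rules can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree parser, which rejects any document that is not well-formed; the helper name is_well_formed and the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule, plus a document that follows all of them.
samples = {
    "missing end tag": "<a><b>text</a>",
    "unclosed empty tag": "<a><br></a>",
    "unquoted attribute": "<a id=1/>",
    "bad nesting": "<a><b><c></b></c></a>",
    "case mismatch": "<a><Item>x</Item></A>",
    "well-formed": "<a id='1'><br/><Item>x</Item></a>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}

for name, ok in results.items():
    print(name, "->", ok)
```

Only the last sample is accepted. Note that it uses single quotation marks around the attribute value, which XML allows alongside double quotation marks.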


Task 4: View the XML document in a browser.


Problem Statement 1.P.1


The details of books sold by CyberShoppe need to be stored at a centralized location. This data needs to be made available to the various branches of CyberShoppe, regardless of the platforms used at various branches. The book details consist of the title of the book, the first and last names of the author of the book and the price of the book. Each book is uniquely identified by a book ID.


Summary (Contd.)
An XML document consists of:
  Processing Instructions
  Elements
  Attributes
  Entities
  Comments
  Content


Summary (Contd.)
The rules that govern the creation of a well-formed XML document are as follows:
  Every start tag must have an end tag.
  Empty tags must be closed using a forward slash (/).
  All attribute values must be enclosed in quotation marks (double or single).
  Tags must nest correctly.
  XML tags are case-sensitive: the start tag and end tag names must match exactly.


Info

Course name: Special Selected Topic in Information System
Section: Introduction to XML
Number of slides: 100
Updated date: 12/02/2006
Contact: Mr. Phan Vo Minh Thang (minhthangpv@hcmuaf.edu.vn)