An element can be of the following types:

Empty: Empty elements have no content and are marked up as <emptyelement/>.
Unrestricted: The opposite of an empty element is an unrestricted element, which can contain any element declared elsewhere in the DTD.
Container: Container elements can contain character data and other elements.

Declaring Empty Elements:
An empty element is declared by specifying the content type as EMPTY. Consider the following example:

    <!ELEMENT emptyelement EMPTY>

In this example, the element emptyelement is declared and its content type is specified as EMPTY. In this case, emptyelement can still carry attributes. However, it cannot contain textual content or other elements.

Declaring Unrestricted Elements:
An unrestricted element is declared by specifying the content type as ANY. Consider the following example:

    <!ELEMENT anyelement ANY>

In this example, the element anyelement is declared and its content type is specified as ANY. In this case, anyelement can contain any type of data, including other elements that are declared elsewhere in the DTD.

Declaring Container Elements:
Using an element declaration in a DTD, you can specify which other elements are allowed inside an element, how often they may appear, and in what order. You do this by specifying the element content model. Consider the following structure:

    <BOOK>
        <TITLE>LET US C</TITLE>
        <AUTHOR>YASHWANT KANETKAR</AUTHOR>
    </BOOK>

For this XML document to be valid, you need to create a DTD that contains declarations of three elements: BOOK, TITLE, and AUTHOR. In addition, you need to decide whether TITLE and AUTHOR are mandatory or optional, whether they can be in any order or have to be in a specific order, and the number of times they can appear in an XML document. You can write element declarations for these decisions. For example, if both TITLE and AUTHOR have to be specified and TITLE should be followed by AUTHOR, the DTD would be written as:

    <!ELEMENT BOOK (TITLE, AUTHOR)>   <!-- Element content -->
    <!ELEMENT TITLE (#PCDATA)>        <!-- Character content -->
    <!ELEMENT AUTHOR (#PCDATA)>       <!-- Character content -->

In this code, the BOOK element is declared with TITLE and AUTHOR as child elements. The TITLE and AUTHOR elements have the content type PCDATA (Parsed Character Data). PCDATA is used to represent character content. PCDATA is prefixed with a hash (#) symbol so that it is not confused with a normal element name. Note that no space is allowed between <! and the keyword ELEMENT.

In a DTD, different symbols are used to specify whether an element is mandatory or optional and whether it can occur more than once.
The table given below lists the various symbols used while specifying the element content in a DTD.

Symbol   Meaning                              Example           Description
,        and                                  TITLE,AUTHOR      TITLE and AUTHOR, in that order
|        or                                   TITLE|AUTHOR      TITLE or AUTHOR
?        optional; can occur only once        AUTHOR?           AUTHOR need not be present, but if it is
         within the parent element                              present, it can occur only once
*        zero or more occurrences of the      (TITLE|AUTHOR)*   Any number of TITLE or AUTHOR elements
         element                                                can be present, in any order
+        at least one occurrence of the       AUTHOR+           At least one AUTHOR element must be present;
         element; can have multiple                             multiple AUTHOR elements are allowed
         occurrences within the parent
         element

Declaring Attributes
In addition to declaring elements, you can also declare attributes in a DTD. These declarations are used during validation to check the structure of an XML document. The syntax for declaring attributes in a DTD is:

    <!ATTLIST elementname attributename valuetype [attributetype] [default]>

The attributename valuetype [attributetype] [default] section is repeated as often as necessary to create multiple attributes for an element. Each attribute declaration must include the attribute name and a value type. To assign values to an attribute, you must know the different types of values that attributes can take. The following table shows various value types that can be specified for an attribute in a DTD.

Table: Value Types used in a DTD

Value Type     Description
CDATA          Used to represent plain text values (for attributes the keyword is CDATA; #PCDATA is used only in element content)
ID             Used to assign a unique value to each element in the document; the value must be a valid XML name, so it cannot begin with a digit
(enumerated)   Used to restrict the attribute to a fixed list of values; the values are specified within parentheses, separated by |
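To make the occurrence symbols and the attribute value types more concrete, here is a minimal sketch of a DTD fragment that combines them. The CATALOG and PUBLISHER elements and the BOOKID, LANG, and FORMAT attributes are hypothetical names introduced only for illustration; they do not come from the BOOK example above. #REQUIRED and #IMPLIED mark an attribute as mandatory or optional, and a quoted string supplies a default value.

```xml
<!-- Hypothetical catalog DTD; all names besides BOOK, TITLE and AUTHOR are assumptions -->

<!-- + : a CATALOG holds one or more BOOK elements -->
<!ELEMENT CATALOG (BOOK+)>

<!-- , : fixed order; + : at least one AUTHOR; ? : PUBLISHER may be omitted -->
<!ELEMENT BOOK (TITLE, AUTHOR+, PUBLISHER?)>

<!-- Leaf elements hold character data only -->
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>

<!-- One ATTLIST can declare several attributes for the same element:
     BOOKID is a unique ID that must always be given,
     LANG is optional free text,
     FORMAT is enumerated and falls back to "paperback" when omitted -->
<!ATTLIST BOOK
    BOOKID ID                    #REQUIRED
    LANG   CDATA                 #IMPLIED
    FORMAT (hardcover|paperback) "paperback">
```

Under these declarations, a BOOK with BOOKID="b1", a TITLE, and a single AUTHOR would be valid, whereas a BOOK whose AUTHOR precedes its TITLE, or whose BOOKID is missing or starts with a digit, would be rejected by a validating parser.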
In addition to specifying the value type of an attribute, you also need to specify whether the attribute is optional or mandatory. You can do so by setting the attribute type in a DTD. The attribute types are displayed in the following table.

Table: Attribute Types used in DTD

Attribute Type   Description
REQUIRED         If the attribute of an element is specified as #REQUIRED, then the value for that attribute must be
                 specified each time the element is used in an XML document. If the value for the REQUIRED attribute
                 is not specified, the XML document will be invalid.
FIXED            If the attribute of an element is specified as #FIXED, then the declaration itself supplies the value,
                 and that value cannot be changed in an XML document.
IMPLIED          If the attribute of an element is specified as #IMPLIED, then the attribute is optional. In other
                 words, an IMPLIED attribute need not be used each time its associated element is used. When its
                 value type is CDATA, an IMPLIED attribute can take any text string as its value.

Consider the example:

    <!ATTLIST PRODUCT PRODID ID #REQUIRED>

An attribute called PRODID is declared for the PRODUCT element. The value type for this attribute is set to ID, which indicates that the value of PRODID is unique for each appearance of the PRODUCT element in an XML document. The attribute type #REQUIRED means that every PRODUCT element must specify a PRODID.

Types of DTDs
A DTD can be classified into two types:
1. Internal DTD
2. External DTD

The following table shows the difference between an internal and an external DTD.

Internal DTD                                   External DTD
Is a part of an XML document.                  Is maintained as a separate file. A reference to this
                                               file is included in an XML document.
Can be used only by the document in which      Can be used across multiple documents.
it is created and cannot be used across
multiple documents.

Validating the Structure of Data
To validate the structure of the data stored in an XML document against a DTD, you need to use parsers. Parsers are software programs that check the syntax used in an XML file. There are two types of parsers:
1. Nonvalidating parsers
2. Validating parsers
Nonvalidating Parsers
A nonvalidating parser checks whether a document follows the XML syntax rules. It builds a tree structure from the tags used in an XML document and returns an error only when there is a problem with the syntax of the document. Nonvalidating parsers process a document faster than validating parsers because they do not have to check every element against a DTD. In other words, these parsers check only whether an XML document adheres to the rules of well-formed documents.

Validating Parsers
A validating parser checks the syntax of the elements, builds the tree structure of an XML document, and compares the structure of the XML document with the structure specified in the DTD associated with the document. In other words, in addition to checking whether an XML document is well formed, validating parsers also check whether the XML document adheres to the rules in the DTD used by the XML document.

Example for DTD:

    <?xml version="1.0" encoding="UTF-8"?>
    <DOCUMENT>
        <GREETING>Hello From XML</GREETING>
        <MESSAGE>Welcome to the world of XML.</MESSAGE>
    </DOCUMENT>

Most XML browsers check your document to see whether it is well formed. Some of them can also check whether it is valid. An XML document is valid if there is a Document Type Definition (DTD) associated with it and if the document complies with that DTD. A document's DTD specifies the correct syntax of the document. DTDs can be stored in a separate file or in the document itself, using the <!DOCTYPE> declaration.
The previous example using a DTD becomes:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/css" href="greeting.css"?>
    <!DOCTYPE DOCUMENT [
        <!ELEMENT DOCUMENT (GREETING, MESSAGE)>
        <!ELEMENT GREETING (#PCDATA)>
        <!ELEMENT MESSAGE (#PCDATA)>
    ]>
    <DOCUMENT>
        <GREETING>Hello From XML</GREETING>
        <MESSAGE>Welcome to the world of XML.</MESSAGE>
    </DOCUMENT>

The DTD indicates that a <DOCUMENT> element must contain a <GREETING> element followed by a <MESSAGE> element, that the <DOCUMENT> element is the root element, and that the <GREETING> and <MESSAGE> elements can hold text.
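The same rules can also be kept as an external DTD. The sketch below assumes that the three <!ELEMENT ...> declarations for DOCUMENT, GREETING, and MESSAGE have been saved, unchanged, in a separate file named greeting.dtd in the same directory as the document; the file name is a hypothetical one chosen for illustration. The SYSTEM keyword tells the parser where to find that file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The root element name after DOCTYPE must match the document's root element.
     "greeting.dtd" is an assumed file name holding the ELEMENT declarations. -->
<!DOCTYPE DOCUMENT SYSTEM "greeting.dtd">
<DOCUMENT>
    <GREETING>Hello From XML</GREETING>
    <MESSAGE>Welcome to the world of XML.</MESSAGE>
</DOCUMENT>
```

Because the declarations now live outside the document, any number of XML files can reference greeting.dtd in the same way, which is the reuse advantage of an external DTD over an internal one.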