
Dublin Core Metadata Initiative

Stuart Weibel
OCLC Office of Research; Director, Dublin Core Metadata Initiative

Presentation Outline
Introduction to Metadata
Dublin Core Metadata Initiative
Metadata Registries
Syntax Alternatives for Web Metadata
A Few Strategic Applications

Introduction to Metadata

The Web as an Information System


Search systems are motivated by business models, not user needs
Index coverage is unpredictable and limited
Too much recall, too little precision
Index spam abounds
Resources (and their names) are volatile
Archiving is presently unsolved
Authority and quality of service are spotty
Managing intellectual property rights is hard

Metadata: Part of a Solution


Structured data about data
Organization and management of content
Support discovery
Direct content in channels
Enable automated discovery/manipulation

Internet Commons includes Multiple Communities


Figure: the Internet Commons, shared by multiple communities (Home Pages, Commerce, Library, Geo, Scientific Data, Museums, Whatever...)

Interoperability requires conventions about:
Semantics: the meaning of the elements
Structure: human-readable, machine-parseable
Syntax: grammars to convey semantics and structure

Haven't we done metadata already?

The MARC family of standards is the single most successful resource description standard in the world

What's wrong with this model on the Web?


Expensive
Complex
Professional catalogers required
Bias towards bibliographic artifacts
Anglo-centric
Fixed resources
Incomplete handling of resource evolution and other resource relationships
MARC 21 accounts for of MARC records, but there are other varieties

Dublin Core Metadata Initiative

History of the Dublin Core


1994: Simple tags to describe Web pages
1995: The Dublin Core is one of many vocabularies needed ("Warwick Framework")
1996: The Dublin Core: 13 elements expanded to 15, appropriate for text and images
1997: The Warwick Framework needs formal expression in a Resource Description Framework (RDF)
2000: The Dublin Core Metadata Initiative recommends qualifiers and broadens its organizational scope beyond the Core

Dublin Core Metadata Initiative


The mission of DCMI is to make it easier to find resources using the Internet through the following activities:
Developing metadata standards for discovery across domains (example: the Dublin Core)
Defining frameworks for the interoperation of metadata sets
Facilitating the development of community- or discipline-specific metadata sets

DCMI Organizational Structure


Figure: DCMI organizational chart showing the Board of Trustees; the Directorate (Executive Director, Managing Director); the Advisory Board; the Usage Board; DCMI Subscribers; the DCMI Activity Areas (Standards Development, Infrastructure, User Support and Education), each with its working groups (WGs); and Liaison.

DCMI Activities
Standards development and maintenance
Metadata registry and infrastructure
Technical working groups and periodic workshops
Tutorial materials and user guides
Education and training
Open source software
Liaisons with other standards or user communities

Unqualified Dublin Core is the Pidgin metadata language


Metadata is language.
Dublin Core is a small and simple language, a pidgin, for finding resources across domains using the Internet.
Speakers of different languages naturally "pidginize" to communicate.

Qualifiers and Domain-specific Extensions


The Dublin Core architecture supports more sophisticated metadata solutions through the addition of:
Qualifiers
Domain-specific extensions
Application Profiles involving mixed namespaces (more on this later)

Increased sophistication comes at the cost of some degree of interoperability

Varieties of Qualifiers: Value Encoding Schemes


A value encoding scheme says that the value is:
a term from a controlled vocabulary (e.g., Library of Congress Subject Headings), or
a string formatted in a standard way (e.g., "2001-05-02" means May 2, not February 5)
Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
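A value encoding scheme tells software how to read a string. The sketch below assumes a W3CDTF-style `YYYY-MM-DD` date and shows that, once the scheme is known, "2001-05-02" is unambiguously May 2; when the scheme is unknown, the raw string is still returned and remains usable for discovery. The helper `interpret_value` is hypothetical, and only the full-date form is handled.

```python
from datetime import date
from typing import Optional, Union

def interpret_value(value: str, scheme: Optional[str] = None) -> Union[date, str]:
    """Hypothetical helper: read a DC value according to its encoding scheme."""
    if scheme == "W3CDTF":
        # Assumption: only the complete YYYY-MM-DD form is supported here.
        return date.fromisoformat(value)
    # Unknown or absent scheme: fall back to the plain string.
    return value

d = interpret_value("2001-05-02", "W3CDTF")
print(d.month, d.day)  # 5 2
print(interpret_value("Languages -- Grammar", "LCSH"))  # passed through unchanged
```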

Varieties of Qualifiers: Element Refinements


Make the meaning of an element narrower or more specific:
a Date Created versus a Date Modified
an IsReplacedBy Relation versus a Replaces Relation

If your software does not understand the qualifier, you can safely ignore it.

A Grammar of Dublin Core


http://www.dlib.org/dlib/october00/baker/10baker.html

By design, not as subtle as mother tongues, but easy to learn and useful in practice
Pidgins have small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)
Simple grammars: sentences (statements) follow a simple fixed pattern...

Statement pattern: Resource has property X
Resource: the implied subject
has: the implied verb
property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date...), optionally with qualifiers (adjectives)
X: the property value (an appropriate literal)

Examples:
Resource has Subject "Languages -- Grammar"
Resource has Date "2000-06-13"
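The fixed sentence pattern maps naturally onto a three-part record. This is a minimal sketch, assuming statements are held as `(resource, property, value)` tuples and that only the fifteen unqualified element names are accepted; `Statement`, `make_statement`, and the example page URL are illustrative names, not part of any DCMI specification.

```python
from typing import NamedTuple

# The fifteen unqualified Dublin Core elements (the "special nouns").
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

class Statement(NamedTuple):
    """One sentence: <resource> has <property> <value>."""
    resource: str  # the implied subject
    prop: str      # one of the fifteen properties
    value: str     # an appropriate literal

def make_statement(resource: str, prop: str, value: str) -> Statement:
    # Reject any property outside the fifteen-element vocabulary.
    if prop.lower() not in DC_ELEMENTS:
        raise ValueError(f"not a Dublin Core element: {prop}")
    return Statement(resource, prop.lower(), value)

page = "http://example.org/grammar.html"  # hypothetical resource identifier
stmts = [
    make_statement(page, "Subject", "Languages -- Grammar"),
    make_statement(page, "Date", "2000-06-13"),
]
for s in stmts:
    print(f"Resource has {s.prop.capitalize()} {s.value!r}")
```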

Dumb-Down Principle for Qualifiers


The fifteen elements should be usable and understandable with or without the qualifiers
Qualifiers refine meaning (but may be harder to understand)
Nouns can stand on their own without adjectives
If your software encounters an unfamiliar qualifier, look it up, or just ignore it!
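The dumb-down principle can be sketched as a function that strips qualifiers and keeps the bare element. This assumes a `Element.Refinement` naming style (as in `Date.Created`) plus a small lookup table for bare refinement names; the table `REFINES` is a hypothetical excerpt, and a real application would consult a registry instead. Unfamiliar qualifiers such as `Date.Frobnicated` are simply ignored, leaving a plain Date.

```python
# Hypothetical excerpt: bare refinement names and the element each refines.
REFINES = {
    "created": "date",
    "modified": "date",
    "isReplacedBy": "relation",
    "replaces": "relation",
    "alternative": "title",
}

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def dumb_down(record):
    """Reduce (name, value) pairs to unqualified (element, value) pairs."""
    simple = []
    for name, value in record:
        # "Date.Created" -> "Date"; the qualifier part is ignored.
        element = name.split(".", 1)[0]
        # A bare refinement name is looked up to find the element it refines.
        element = REFINES.get(element, element).lower()
        if element in DC_ELEMENTS:
            simple.append((element, value))
        # Anything that is not one of the fifteen elements is dropped.
    return simple

record = [
    ("Title", "A Grammar of Dublin Core"),
    ("Date.Created", "2000-06-13"),
    ("modified", "2000-10-01"),                          # hypothetical value
    ("Relation.IsReplacedBy", "http://example.org/v2"),  # hypothetical value
    ("Date.Frobnicated", "2000-12-31"),                  # unfamiliar qualifier
]
print(dumb_down(record))
```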

Using DC with other vocabularies


Specialized application profiles may need to:
Use general-purpose Dublin Core elements
Use elements from another, more domain-specific standard
Narrow standard definitions of DC elements for specific local uses
Invent local elements outside the scope of existing standards

What is an Application Profile?


A metadata schema incorporating a set of elements from one or more metadata element sets
A set of policies defining how the elements should be applied to the domain of the application
A set of guidelines that make the policies concerning elements explicit
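One way to make an application profile's policies explicit is to express them as data that software can check. This minimal sketch assumes each rule names an element by namespace and local name and states whether it is required and repeatable; the `ProfileRule` type, the `check` function, the local namespace `http://example.org/thesis-terms/`, and the `degreeLevel` element are all hypothetical.

```python
from dataclasses import dataclass

DC = "http://purl.org/dc/elements/1.1/"
LOCAL = "http://example.org/thesis-terms/"  # hypothetical local namespace

@dataclass(frozen=True)
class ProfileRule:
    namespace: str
    name: str
    required: bool = False
    repeatable: bool = True

# Hypothetical profile: three DC elements plus one locally invented element.
PROFILE = [
    ProfileRule(DC, "title", required=True, repeatable=False),
    ProfileRule(DC, "creator", required=True),
    ProfileRule(DC, "date", repeatable=False),
    ProfileRule(LOCAL, "degreeLevel", repeatable=False),
]

def check(record, profile=PROFILE):
    """Return policy violations for a record of ((namespace, name), value) pairs."""
    problems = []
    allowed = {(r.namespace, r.name): r for r in profile}
    counts = {}
    for key, _value in record:
        if key not in allowed:
            problems.append(f"element not in profile: {key[1]}")
        counts[key] = counts.get(key, 0) + 1
    for key, rule in allowed.items():
        n = counts.get(key, 0)
        if rule.required and n == 0:
            problems.append(f"missing required element: {rule.name}")
        if not rule.repeatable and n > 1:
            problems.append(f"element not repeatable: {rule.name}")
    return problems

record = [
    ((DC, "title"), "Metadata on the Web"),
    ((DC, "creator"), "A. Student"),
    ((LOCAL, "degreeLevel"), "doctoral"),
]
print(check(record))  # []
```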

Multiple Namespace Fragment


xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:co="http://purl.org/rss/1.0/modules/company/"

<dc:publisher>The O'Reilly Network</dc:publisher>
<dc:creator>Rael Dornfest</dc:creator>
<dc:rights>Copyright &#169; 2000 O'Reilly &amp; Associates, Inc.</dc:rights>
<dc:date>2000-01-01T12:00+00:00</dc:date>
<dc:description>
XML is placing increasingly heavy loads on the existing technical infrastructure of the Internet.
</dc:description>
<co:name>XML.com</co:name>
<co:market>NASDAQ</co:market>
<co:symbol>XML</co:symbol>
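Software matches these elements by namespace URI rather than by prefix, so a Dublin Core consumer can pick out its own elements and skip the company module. The sketch below wraps the fragment in a hypothetical `<item>` element so that the namespace declarations have a home and the XML is well formed; the description element is left out for brevity.

```python
import xml.etree.ElementTree as ET

# The fragment, wrapped in a hypothetical <item> element to make it well formed.
FRAGMENT = """<item xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:co="http://purl.org/rss/1.0/modules/company/">
  <dc:publisher>The O'Reilly Network</dc:publisher>
  <dc:creator>Rael Dornfest</dc:creator>
  <dc:rights>Copyright &#169; 2000 O'Reilly &amp; Associates, Inc.</dc:rights>
  <dc:date>2000-01-01T12:00+00:00</dc:date>
  <co:name>XML.com</co:name>
  <co:market>NASDAQ</co:market>
  <co:symbol>XML</co:symbol>
</item>"""

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "co": "http://purl.org/rss/1.0/modules/company/",
}

root = ET.fromstring(FRAGMENT)
# The prefixes in NS are local to this lookup; only the URIs must match.
print(root.findtext("dc:creator", namespaces=NS))  # Rael Dornfest
print(root.findtext("co:symbol", namespaces=NS))   # XML

# A consumer that only understands Dublin Core ignores the other namespace.
dc_only = {
    el.tag.split("}")[1]: el.text
    for el in root
    if el.tag.startswith("{" + NS["dc"] + "}")
}
print(dc_only)
```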

Namespaces and Translation


Dublin Core has been translated into 26 languages:
machine-readable tokens are shared by all
human-readable labels are defined in different languages
translations are distributed and maintained in many countries
translations are eventually linked in the DCMI registry

One concept identifier with labels in many languages:
dc:creator
rdfs:label "Creator" [English]
rdfs:label "Verfasser" [German]
rdfs:label "Pencipta" [Indonesian]
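The shared identifier is what records store; the label is chosen only at display time. This is a minimal sketch, assuming labels are kept per ISO 639-1 language code with English as the fallback; the `LABELS` table and `label_for` function are illustrative, and only the three labels from the slide are included.

```python
# One machine-readable identifier, many human-readable labels (from the slide).
LABELS = {
    "http://purl.org/dc/elements/1.1/creator": {
        "en": "Creator",
        "de": "Verfasser",
        "id": "Pencipta",  # Indonesian
    },
}

def label_for(term_uri: str, lang: str, fallback: str = "en") -> str:
    """Return the label in the requested language, else in the fallback language."""
    labels = LABELS[term_uri]
    return labels.get(lang, labels[fallback])

creator = "http://purl.org/dc/elements/1.1/creator"
print(label_for(creator, "de"))  # Verfasser
print(label_for(creator, "fi"))  # Creator (no Finnish label in this sketch)
```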

Metadata Registries: Dictionaries of Metadata terms and Usage

Metadata is language
Metadata schemas are languages for making statements about resources:
Book has Title "Gone with the Wind". Web page has Publisher "Springer Verlag".

Vocabulary terms (elements) are defined in standards like Dublin Core
Metadata grammars constrain the statements and data models one can form

Metadata languages are Multilingual


Metadata is not a spoken language
The words of metadata ("elements") are symbols that stand for concepts expressible in multiple natural languages
Standards may have dozens of translations
Are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?

Languages Evolve With Use


Inevitably, languages resist stability:
People stretch official definitions
Implementers misunderstand the intended meaning or use of elements
Implementers coin local terms and extensions
If the application does not fit the standard, the standard is often "customized" to fit the application

How do we manage this evolution?


How can we monitor the usage of a language that is:
Never spoken?
Rarely published in a way that can be harvested?

How can dictionary editors help a metadata language evolve and grow in response to usage?
How can this evolution occur across (human) languages?

RDF Schemas (RDFS): W3C standard


A dictionary format for metadata terms:
Simple XML format for namespaces, terms and definitions
Human-readable label and definition
Unique, machine-readable identifiers

Example: "Title" (Dublin Core)
Title: A name given to the resource.
dc:title

Support for cross-references:
between multiple language renditions of a namespace
between terms in related standards
between local adaptations and related standards
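A dictionary entry of this kind can be generated with Python's standard XML library. This simplified sketch serializes one property with an `rdfs:label` and an `rdfs:comment` (the definition) per language, plus an `rdfs:seeAlso` cross-reference; the `describe_property` function and the translation URI are hypothetical, and the published DCMI schemas may be structured differently.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# ElementTree writes this namespace with the reserved "xml" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)

def describe_property(uri, labels, definitions, see_also=()):
    """Serialize one term: labels and definitions per language, plus cross-references."""
    root = ET.Element(f"{{{RDF}}}RDF")
    prop = ET.SubElement(root, f"{{{RDF}}}Property", {f"{{{RDF}}}about": uri})
    for lang, text in labels.items():
        ET.SubElement(prop, f"{{{RDFS}}}label", {XML_LANG: lang}).text = text
    for lang, text in definitions.items():
        ET.SubElement(prop, f"{{{RDFS}}}comment", {XML_LANG: lang}).text = text
    for ref in see_also:
        # Cross-reference to a related term, e.g. a translated rendition.
        ET.SubElement(prop, f"{{{RDFS}}}seeAlso", {f"{{{RDF}}}resource": ref})
    return ET.tostring(root, encoding="unicode")

xml_text = describe_property(
    "http://purl.org/dc/elements/1.1/title",
    labels={"en": "Title"},
    definitions={"en": "A name given to the resource."},
    see_also=["http://example.org/dc-translations/de/title"],  # hypothetical
)
print(xml_text)
```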

Registries can function as dictionaries


Metadata dictionaries can help metadata vocabularies evolve more like other human languages:
Not just top-down, like traditional standards
Also bottom-up, in response to usage

DCMI Metadata Registry


Stores official metadata element definitions in a central database or repository
Managing a namespace (as a standards agency): publish qualifiers as they become available, with version control
Managing translations of the standard in multiple languages

Eventually:
User guide interface
Support for standardisation processes (peer review)
Downloadable input to software tools for generating, editing, and validating DC metadata

Dictionaries as a tool for harmonization


Knowledge of how other projects are using standards will avoid "reinventing the wheel"
Registries can help information providers harmonize their schemas for improved access within domains:
Between countries (Nordic Metadata Project)
Preprint repositories (Open Archives Initiative)
Subject gateways (Renardus)
Theses and dissertations (NDLTD)
Mathematics and physics (MathNet, PhysNet)

A global registry infrastructure?


RDF Schema format suggests a scalable ecology of metadata vocabularies on the Web
Sharing machine-readable elements translated into many languages suggests a global (multilingual) metadata language for digital libraries
Can a well-managed registry infrastructure allow this language to evolve, with flexible innovation in usage alongside more stable standards?

EOR: an RDF Toolkit for Schema Infrastructure

Harvests RDF Schemas:
Schemas distributed on multiple Web servers
Creates a huge database of schemas for searching
Web interface functions as a "metadata browser"
Click on cross-references between linked terms

Downloadable as open source software: http://eor.dublincore.org/
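The harvest-then-search pattern can be sketched without any network access. This assumes schemas have already been fetched from several servers and reduced to simple term records; the `HARVESTED` data, the `local:` terms, and both functions are hypothetical and are not the EOR toolkit's API. The index maps label words to term URIs, and a `seeAlso` list stands in for clickable cross-references.

```python
from collections import defaultdict

# Hypothetical harvested schemas, keyed by the server they came from.
# A real harvester would fetch and parse RDF Schema documents instead.
HARVESTED = {
    "http://purl.org/dc/elements/1.1/": [
        {"uri": "dc:title", "label": "Title", "seeAlso": []},
        {"uri": "dc:creator", "label": "Creator", "seeAlso": []},
    ],
    "http://example.org/local-schema/": [
        {"uri": "local:thesisTitle", "label": "Thesis Title", "seeAlso": ["dc:title"]},
    ],
}

def build_index(harvested):
    """Map each lower-cased label word to the set of term URIs that use it."""
    index = defaultdict(set)
    terms = {}
    for source, entries in harvested.items():
        for term in entries:
            terms[term["uri"]] = dict(term, source=source)
            for word in term["label"].lower().split():
                index[word].add(term["uri"])
    return index, terms

def search(index, query):
    """Return the URIs whose labels contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result

index, terms = build_index(HARVESTED)
print(sorted(search(index, "title")))  # ['dc:title', 'local:thesisTitle']
# Follow a cross-reference, as a "metadata browser" would on a click.
print(terms["local:thesisTitle"]["seeAlso"])  # ['dc:title']
```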

EOR Toolkit
Integrates RDF components for supporting search services, topic maps, site maps, annotation environments and semantic metadata registries
Base-level functionality of this toolkit includes:
Creation, deletion, and management of RDF databases
Ability to infuse RDF instance data into RDF databases
Ability to search RDF databases
Generic interface design capabilities to support RDF applications
Web interface functions as a "metadata browser"

Open Source: http://eor.dublincore.org

Syntax Alternatives for Web Metadata

Syntax Alternatives: HTML


Advantages:
Simple mechanism: META tags embedded in content
Widely deployed infrastructure (the Web)
Public domain tools

Disadvantages:
Limited structural richness (won't easily support hierarchical, tree-structured data or entity distinctions)
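The HTML mechanism is simple enough to read with a standard-library parser. This sketch assumes the common convention of a `<link rel="schema.DC" ...>` declaration plus `<meta name="DC.element" content="...">` tags; the `DCMetaParser` class and the sample page are illustrative. The result is a flat list of name/value pairs, which also shows the disadvantage above: there is nowhere to nest, say, a creator's affiliation inside the creator.

```python
from html.parser import HTMLParser

class DCMetaParser(HTMLParser):
    """Collect Dublin Core META tags of the form <meta name="DC.x" content="...">."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)  # attribute names arrive lower-cased
        name = a.get("name") or ""
        if name.lower().startswith("dc.") and "content" in a:
            self.fields.append((name[3:].lower(), a["content"]))

PAGE = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="Dublin Core Metadata Initiative">
<meta name="DC.creator" content="Stuart Weibel">
<meta name="DC.subject" content="metadata">
<meta name="keywords" content="metadata">
</head><body>...</body></html>"""

parser = DCMetaParser()
parser.feed(PAGE)
print(parser.fields)  # flat pairs only; non-DC tags such as "keywords" are skipped
```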

Syntax Alternatives: XML


The standard for networked text and data
Widespread tool support:
Parsers (DOM and SAX)
Extensibility (namespaces)
Type definition (XML Schema)
Transformation and rendering (XSLT)
Rich linking semantics (XLink)

XML DTDs
Works, but DTDs are a stopgap measure:
Extensibility is problematic
Many ways to say the same thing (too much flexibility)
Interoperability must be pre-coordinated
DTDs cannot evolve gracefully
Granularity is at the level of the DTD

XML Schemas
Rich XML-based language for expressing type semantics
Replaces the arcane and limited DTD (origin in SGML)
Facilities:
Data typing (both complex and primitive)
Constraints
Defaults

Syntax Alternatives: RDF


RDF (Resource Description Framework):
The instantiation of the Warwick Framework on the Web
Rich data model supporting notions of distinct entities and properties
Syntax expressed in XML
Granularity is at the level of the element, not the entire schema as with XML DTDs
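The "distinct entities" point is easiest to see in the triple model: the creator can be a node with its own properties rather than a bare string. This is a minimal sketch, assuming triples are plain Python tuples and a `Node` wrapper separates resources from literals; the `EX` vocabulary for people, the page URL, and the blank-node label `_:creator1` are hypothetical.

```python
from typing import NamedTuple, Set, Union

class Node(NamedTuple):
    """A resource node: a URI or a blank-node label such as "_:creator1"."""
    id: str

# An object is either another node or a plain literal string.
Term = Union[Node, str]

DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/person#"  # hypothetical vocabulary for people

page = Node("http://example.org/report.html")  # hypothetical resource
creator = Node("_:creator1")                   # the creator as its own entity

graph = {
    (page, DC + "title", "Dublin Core Metadata Initiative"),
    (page, DC + "creator", creator),
    (creator, EX + "name", "Stuart Weibel"),
    (creator, EX + "affiliation", "OCLC Office of Research"),
}

def objects(g, subject: Node, predicate: str) -> Set[Term]:
    """All objects of triples matching (subject, predicate, *)."""
    return {o for s, p, o in g if s == subject and p == predicate}

# Walk from the page to its creator, then to the creator's own properties.
for who in objects(graph, page, DC + "creator"):
    print(objects(graph, who, EX + "name"))  # {'Stuart Weibel'}
```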

RDF Components
RDF Model and Syntax WG:
Formal data model
Syntax for interchange of data

RDF Schema (RDFS):
Type system (schema model)

RDF Schemas
Declaration of vocabularies:
properties defined by a particular community
characteristics of properties and/or constraints on corresponding values

Schema Type System: Basic Types
Property, Class, SubClassOf, Domain, Range
Minimal (but extensible) at this time
Minimize significant clashes with the typing system designed by the XML Schema WG
Expressible in the RDF model and syntax
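A small schema using SubClassOf, Domain, and Range already supports useful reasoning. This minimal sketch assumes domain and range are used to infer the classes of the things a property connects, with SubClassOf applied transitively afterwards; treating them instead as constraints to be checked is the other possible reading. All class and property names (`Thesis`, `Document`, `Resource`, `Agent`, `degreeLevel`, `creator`) are hypothetical.

```python
# Toy schema (all names hypothetical).
SUBCLASS = {             # class -> its direct superclasses (SubClassOf)
    "Thesis": {"Document"},
    "Document": {"Resource"},
}
DOMAIN = {"degreeLevel": "Thesis"}  # property -> Domain class
RANGE = {"creator": "Agent"}        # property -> Range class

def superclasses(cls):
    """All classes reachable by following SubClassOf transitively."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def infer_types(triples):
    """Infer class membership from Domain/Range, then close over SubClassOf."""
    types = {}
    for s, p, o in triples:
        if p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            types.setdefault(o, set()).add(RANGE[p])
    for classes in types.values():
        for cls in list(classes):  # iterate over a copy while extending the set
            classes |= superclasses(cls)
    return types

data = [("doc1", "degreeLevel", "doctoral"), ("doc1", "creator", "person1")]
print(sorted(infer_types(data)["doc1"]))  # ['Document', 'Resource', 'Thesis']
```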

RDF: In Summary
RDF metadata transmission:
Embedded (e.g. <META>), transmitted with the resource (HTTP), or from a trusted third party

RDF Data Model:
Supports consistent encoding, exchange and processing of metadata, which is critical when aggregating data from multiple sources

RDF Schema:
Declare, define, reuse vocabularies

Unresolved Issues Concerning RDF and XML Schemas


RDF Schemas and XML Schemas have overlapping functionality:
XML Schemas provide strong data typing, but also support semantic specifications
RDF is focused on a semantic data model and extensible namespace management

Resolution of the overlap and market acceptance will determine the future of each
The Semantic Web Activity in the W3C is chartered to address such issues: http://www.w3.org/2001/sw

A Few Strategic Projects

Open Archives Initiative http://www.openarchives.org


Protocols to support alternative scholarly publishing solutions
Federated repositories for:
ePrints
Libraries
Publishers

OAI archives may contain full text or surrogates (metadata)
Metadata harvesting protocols
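Metadata harvesting works by sending HTTP requests to a repository and reading Dublin Core records out of the XML reply. This sketch assumes the OAI Protocol for Metadata Harvesting's `ListRecords` verb with the `oai_dc` metadata prefix and an optional `from` date for selective harvesting. The base URL is hypothetical, no request is actually sent, and the canned response is a trimmed, hypothetical example that omits the header and other required parts.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "http://archive.example.org/oai"  # hypothetical repository endpoint
DC = "http://purl.org/dc/elements/1.1/"

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL (the request itself is not sent here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)

# Trimmed, hypothetical response containing one unqualified DC record.
RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example E-print</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def harvest_titles(xml_text):
    """Pull every dc:title out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{DC}}}title")]

print(list_records_url(BASE_URL))
print(harvest_titles(RESPONSE))  # ['An Example E-print']
```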

OAI Metadata

OAI archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle. However, interoperability depends on a shared format for exchanging metadata and therefore archives should implement the basic Open Archives Metadata Set.


OAI Metadata Solutions


Adoption of the unqualified Dublin Core Element Set as required metadata
Support for parallel metadata sets maintained by:
EPMS (e-print community)
Others:
Research library community
Museum community

Renardus Project (EU)


http://www.konbib.nl/coop/reynard

Goal: integrated access to subject gateways in Europe
High-level agreement on a simple, Dublin Core-based schema as common denominator

National libraries (Netherlands coordinates)
NDR: National Digital Resource in UK
Die Deutsche Bibliothek

Networked Digital Library of Theses and Dissertations (NDLTD)


http://www.ndltd.org
International consortium of projects putting dissertations online
NDLTD agreement on a small Dublin Core-based set of metadata elements, with extensions to support application-specific needs
http://www.ndltd.org/standards/metadata/current.html

Publishing Requirements for Industry Standard Metadata


PRISM: XML metadata standard for syndicating, aggregating, post-processing and multi-purposing content from magazines, news, catalogs, books and mainstream journals
Uses DC and its relation types as the foundation for its metadata
Adobe, Time Inc., Getty Images, Condé Nast, Sotheby's, Interwoven
http://www.prismstandard.org

Rich Site Summary (RSS) http://purl.org/RSS


Metadata for content syndication (news feeds)
Used in developing media content portals
Built on established vocabularies (DC), using RDF syntax
Layers of application-specific semantics: syndication vocabularies, annotation vocabularies, etc.

For further information....


"Metadata Watch Reports" of the SCHEMAS Project, http://www.schemas-forum.org
Critical overview (with expert commentary) of the metadata landscape as it evolves
Related database of individual activity reports

D-Lib Magazine, http://www.dlib.org/dlib/
Ariadne, http://ariadne.ac.uk
DCMI Homepage, http://dublincore.org

DC-2001
DC-2001 in Tokyo
October 22-26, 2001

Three tracks:
Technical working group meetings
Implementation reports and research papers
General introduction and tutorials for non-experts

How to Participate
Join the DC-General mailing list
Join a working group
Create a working group
Information on lists and working groups is available at http://dublincore.org
