JISC Metadata Application Profiles, Data Models and Interoperability

Pete Johnston, Eduserv Foundation & Rosemary Russell, UKOLN
1 Introduction
The JISC Repositories & Preservation Programme has funded a number of projects to
work on metadata application profiles for the description of a range of resources. The
metadata application profiles developed so far - the Scholarly Works Application
Profile (SWAP)1, the Images Application Profile (IAP)2, the Geospatial Application
Profile (GAP)3 - are all Dublin Core Application Profiles, i.e. they are based explicitly
on the Dublin Core Abstract Model. The project currently working on the profile for
Time-Based Media (TBMAP) is also operating on this basis.
The SWAP, the IAP and the TBMAP each have as their focus the description of a
particular class or genre of resources. The GAP differs slightly in that it is intended to
be used in conjunction with other profiles; and it focuses on a specific set of
characteristics which may be applied to resources of many different types, the
distinguishing characteristic being that they have some relationship with “place”.
This note provides some issues for discussion around the possible uses of the profiles.
It should be emphasised that it is an attempt to raise some tentative questions, rather
than to provide definitive solutions.
2 Dublin Core Application Profiles, the DCMI Abstract Model and Domain Models
The DCMI Abstract Model (DCAM)4 describes an information structure called a DC
“description set” and specifies how those description sets are to be interpreted as
providing information about resources and the relationships between resources. It is a
model for/of metadata.
Although the current DCMI specification probably doesn’t make this as clear as it
should, the DCAM is based on RDF, so a DC description set is, or can be mapped to,
an RDF graph, and the DCAM uses the RDF and RDFS semantics rules for the
merging of data and for inferencing on that data.5
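As a rough illustration of the idea that a description set can be mapped to an RDF graph, the following minimal sketch holds a description set as a set of (subject, property, value) triples in plain Python. It is not a DCAM implementation: the example.org URIs and the literal values are hypothetical, and the distinction the DCAM draws between literal and non-literal value surrogates is collapsed into plain strings.

```python
# Minimal sketch: a description set held as a set of (subject, property, value) triples.
# All example.org URIs and the literal values are hypothetical.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

description_set = {
    ("http://example.org/paper/1", DCTERMS + "title", "An Example Paper"),
    ("http://example.org/paper/1", DCTERMS + "creator", "http://example.org/person/a"),
    ("http://example.org/person/a", FOAF + "name", "A. Author"),
}


def describe(graph, resource_uri):
    """Group the statements about one resource as {property URI: set of values}."""
    description = {}
    for subject, prop, value in graph:
        if subject == resource_uri:
            description.setdefault(prop, set()).add(value)
    return description


paper = describe(description_set, "http://example.org/paper/1")
```

Here `paper` groups the two statements about the paper, while the statement about the person remains a separate description within the same description set.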
Note that the DCAM specifies neither a model of the “world” being described by
any particular description set nor any particular set of metadata terms
to be referenced by a description set. A DC description set is not limited to using the
set of terms defined/owned by DCMI, and indeed it may be the case that a DC
description set references none of those terms, and refers only to terms defined/owned
by agencies other than DCMI.
A Dublin Core Application Profile (DCAP)6 is a specification which describes the
construction of some specific set of DC metadata records. It provides:
• A specification of the requirements to be addressed, what functions/operations
the metadata should support
• A “domain model” of the entities to be described and their attributes and
relationships (based on the requirements to be supported)
• A specification of the structural constraints on the description sets, used to
represent instances of that domain model, i.e. a specification of the resources that
may be described, the properties referenced in statements, and the way value
surrogates are provided
• A set of guidelines for how to apply the profile
• A specification of any concrete syntax(es) to be used
i.e. within the Dublin Core metadata framework, the choice of domain model and the
choice of vocabulary are addressed at the level of the DC Application Profile. It is
important to note that there is no “global” domain model provided by DCMI to
underpin all DC metadata. In particular, “Simple Dublin Core” is just one DCAP,
based on a (not always clearly articulated) “domain model” in which all resources are
treated as having the same set of 15 attributes. The same set of properties used within
the Simple Dublin Core DCAP (i.e. the Dublin Core Metadata Element Set) may be
deployed in whole or in part in the context of other DCAPs based on other domain
models.
It is perhaps also worth emphasising that two different DCAPs might be based on the
same domain model, or on variants of a single domain model, but differ in terms of
the sets of metadata terms referenced and/or the structural constraints imposed by the
description set profile.
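To make the notion of structural constraints a little more concrete, the sketch below checks one description against a set of constraints for a single resource type. The constraint structure (`required` and `allowed` property sets) and the function name are assumptions made for illustration only; they are not the DCMI description set profile format.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical constraints for one resource type in an imagined profile.
WORK_CONSTRAINTS = {
    "required": {DCTERMS + "title"},
    "allowed": {DCTERMS + "title", DCTERMS + "creator", DCTERMS + "subject"},
}


def check_description(description, constraints):
    """Return a sorted list of constraint violations for one description.

    `description` maps property URIs to sets of values.
    """
    used = set(description)
    problems = [f"missing required property: {p}" for p in constraints["required"] - used]
    problems += [f"property not in profile: {p}" for p in used - constraints["allowed"]]
    return sorted(problems)


valid = check_description({DCTERMS + "title": {"An Example Paper"}}, WORK_CONSTRAINTS)
invalid = check_description({DCTERMS + "date": {"2008"}}, WORK_CONSTRAINTS)
```

Two profiles sharing a domain model could differ only in tables of this kind, which is one way to picture the point that the same model may underlie profiles with different constraints.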
3 Linking, Merging and Querying

3.1 Relationships & Linking
References to resources within a DC description set are made using URIs, and there is
no constraint on whether those URIs are owned by the provider of the description set.
A description set can make statements about relationships between any two resources,
i.e. “anyone can say anything about anything”.
For an individual DCAP, the types of relationships supported are determined by the
design of the domain model. So an individual DCAP typically refers to various
properties for expressing specific types of relationships between resources of specific
types: “A resource of type Book is-created-by a resource of type Person”, “a resource
of type Work is-realized-in a resource of type Expression”, and so on.
3.1.1 Linking based on a Single DCAP

Consider the case of two “repository” services exposing metadata based on the same
DC Application Profile, e.g. two repositories based on the Scholarly Works
Application Profile.
Because the DCAP imposes no limitations on the URIs of the resources described
within an individual description set, it is quite possible for a description set exposed
by repository B to express a relationship with a resource described in a description set
exposed by repository A, and vice versa. The only requirements are that the DCAP
supports the required property/relationship type, and any other structural constraints
on the description of the resource are met.
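A minimal sketch of this kind of cross-repository link follows, assuming both repositories expose their description sets as sets of triples. The repository URIs and titles are hypothetical; `dcterms:references` is used simply as an example of a relationship property that a profile might support.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical description sets exposed by two repositories using the same profile.
repository_a = {
    ("http://repo-a.example.org/work/1", DCTERMS + "title", "Original Paper"),
}
repository_b = {
    ("http://repo-b.example.org/work/9", DCTERMS + "title", "Follow-up Paper"),
    ("http://repo-b.example.org/work/9", DCTERMS + "references",
     "http://repo-a.example.org/work/1"),
}


def links_into(source, target):
    """Return statements in `source` whose value is a resource described in `target`."""
    described = {subject for subject, _, _ in target}
    return {statement for statement in source if statement[2] in described}


b_to_a = links_into(repository_b, repository_a)
a_to_b = links_into(repository_a, repository_b)
```

Repository B needs no permission from repository A to make the statement; it only needs the URI of the resource that A describes.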
3.1.2 Linking based on Multiple DCAPs
Now consider the case of two “repository” services exposing metadata based on two
different DCAPs, based on two different domain models. There may be perfectly
good reasons for those differences, based on the requirements the DCAP designers set
out to address. In principle, the same consideration as above applies, and a
description set exposed by repository B can express a relationship with a resource
described in a description set exposed by repository A, and vice versa. The additional
factor to consider here is the compatibility of the domain models underpinning the
two DCAPs.
If, for example, repository B deploys a FRBR-based model, the expectation may be
that the type of relationship in question is expressed between two FRBR Works or
between two FRBR Expressions. This is the case, for example, with whole-part
relationships in FRBR. But if the data exposed by repository A is based on a different
model which does not include concepts of Work and Expression, then it may not be
clear how those relationships can be expressed using the FRBR-based model. The
owner of repository B is faced with the choice of remodelling/redescribing the
resources within the FRBR model, or trying to adapt their model to encompass some
more general relationship types.
For examples of this sort of scenario, the Time-Based Media project has a use case
involving relationships between still images and videos from which the stills are
taken; and similarly, although the Scholarly Works Profile addressed relationships
between papers and presentations (as distinct Works), it may also be useful to capture
the relationships with audio and video captures of presentations.
3.2 Aggregating, Merging, & Querying

If services expose metadata records based on DCAPs such as the SWAP and the IAP,
then other services can aggregate those records and offer functionality across the
merged dataset.
The common use of the DCAM/RDF models provides the basic rules for merging
data, based on the use of URIs to identify resources (though some additional
information may be required if two sources have independently used different URIs to
refer to the same resource, which might happen if e.g. two different repositories using
the SWAP independently catalogue different expressions of the same scholarly work).
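The merging rule, and the extra co-reference information it may need, can be sketched as follows. This assumes triples held as Python tuples and a plain dictionary mapping alternative URIs to a preferred one; in RDF such co-reference is often stated with `owl:sameAs`, which this sketch does not model. All URIs and titles are hypothetical.

```python
DCTERMS = "http://purl.org/dc/terms/"


def merge(graphs, same_as=None):
    """Union several triple sets, rewriting URIs via an optional co-reference map.

    `same_as` maps an alternative URI to the preferred URI for the same resource.
    Caveat: a literal value identical to a mapped URI would also be rewritten.
    """
    same_as = same_as or {}
    merged = set()
    for graph in graphs:
        for subject, prop, value in graph:
            merged.add((same_as.get(subject, subject), prop, same_as.get(value, value)))
    return merged


# Two repositories that have independently minted URIs for the same scholarly work.
source_a = {("http://repo-a.example.org/work/1", DCTERMS + "title", "Shared Work")}
source_b = {("http://repo-b.example.org/work/7", DCTERMS + "creator",
             "http://example.org/person/a")}

naive = merge([source_a, source_b])
reconciled = merge(
    [source_a, source_b],
    same_as={"http://repo-b.example.org/work/7": "http://repo-a.example.org/work/1"},
)
```

Without the mapping the merged data describes two apparently unrelated resources; with it, the title and creator statements attach to a single resource.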
3.2.1 Aggregation/Merging of Metadata based on a Single DCAP

Consider the case of an aggregator service which draws data from two sources, both
based on the same DC Application Profile.
The common use of the metadata vocabularies and structural patterns for the use of
those vocabularies specified by a single DCAP facilitates the use of predictable query
patterns on the aggregated dataset.
So e.g. two datasets based on the Scholarly Works Application Profile can be merged
relatively easily, and the same query patterns can be applied to the aggregated dataset
as to the two individual datasets.
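As a sketch of what a shared query pattern might look like, the function below finds the titles of resources with a given creator. It assumes both sources attach `dcterms:creator` and `dcterms:title` directly to the same resource; the data and URIs are hypothetical and the pattern is not taken from the SWAP itself.

```python
DCTERMS = "http://purl.org/dc/terms/"
PERSON = "http://example.org/person/a"  # hypothetical creator URI


def titles_by_creator(graph, creator_uri):
    """Query pattern: ?r dcterms:creator <creator_uri> . ?r dcterms:title ?title"""
    resources = {s for s, p, o in graph if p == DCTERMS + "creator" and o == creator_uri}
    return sorted(o for s, p, o in graph if p == DCTERMS + "title" and s in resources)


source_a = {
    ("http://repo-a.example.org/work/1", DCTERMS + "creator", PERSON),
    ("http://repo-a.example.org/work/1", DCTERMS + "title", "First Paper"),
}
source_b = {
    ("http://repo-b.example.org/work/2", DCTERMS + "creator", PERSON),
    ("http://repo-b.example.org/work/2", DCTERMS + "title", "Second Paper"),
}
aggregated = source_a | source_b

from_a = titles_by_creator(source_a, PERSON)
from_all = titles_by_creator(aggregated, PERSON)
```

The same function runs unchanged against either source or against the aggregated data, because both sources follow the same structural pattern.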
3.2.2 Aggregation/Merging of Metadata based on Multiple DCAPs

Consider the case of an aggregator service which again draws data from two sources,
but this time based on two different DC Application Profiles.
Again, the common use of the DCAM/RDF model provides the basic rules for
merging data.
And again in this scenario, the aggregator’s knowledge of the vocabularies and
structural patterns specified by the two DCAPs means that query patterns can be
predictably constructed.
What is different in this case, however, is that, depending on the differences between
the two profiles, the aggregator may need to apply multiple query patterns across the
merged data.
If the individual DCAPs are based on a common “domain model”, and use some
common vocabulary of metadata terms, even if the profiles also differ in some more
specific aspects, then some common query patterns may apply alongside some
“profile-specific” patterns. The greater the divergence between the two domain
models, the less likely it is that common query patterns will be usable, and the more
likely it is that “profile-specific” patterns are required.
As the number of DCAPs and the number of different models increases, so the
number of different query patterns to be managed by the aggregator increases. This
doesn’t make such querying impossible: it just increases the level of complexity to be
managed by the aggregator.
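The need for multiple query patterns might be sketched as follows. Two hypothetical profiles are assumed: one attaches creator and title directly to the described item, the other attaches the creator to a Work and the title to an Expression of that Work. The `isExpressionOf` property URI and all data are invented for illustration and do not reproduce the SWAP or IAP vocabularies.

```python
DCTERMS = "http://purl.org/dc/terms/"
CREATOR = DCTERMS + "creator"
TITLE = DCTERMS + "title"
EXPRESSION_OF = "http://example.org/terms/isExpressionOf"  # hypothetical property
PERSON = "http://example.org/person/a"


def direct_pattern(graph, creator):
    """Profile-specific pattern: creator and title on the same resource."""
    items = {s for s, p, o in graph if p == CREATOR and o == creator}
    return {o for s, p, o in graph if p == TITLE and s in items}


def work_expression_pattern(graph, creator):
    """Profile-specific pattern: creator on a Work, title on its Expression."""
    works = {s for s, p, o in graph if p == CREATOR and o == creator}
    expressions = {s for s, p, o in graph if p == EXPRESSION_OF and o in works}
    return {o for s, p, o in graph if p == TITLE and s in expressions}


def titles_by_creator(graph, creator, patterns):
    """Apply every known pattern and combine the results."""
    results = set()
    for pattern in patterns:
        results |= pattern(graph, creator)
    return sorted(results)


image_source = {
    ("http://example.org/img/1", CREATOR, PERSON),
    ("http://example.org/img/1", TITLE, "Harbour at Dawn"),
}
paper_source = {
    ("http://example.org/work/1", CREATOR, PERSON),
    ("http://example.org/expr/1", EXPRESSION_OF, "http://example.org/work/1"),
    ("http://example.org/expr/1", TITLE, "Harbour Erosion Study"),
}
merged = image_source | paper_source

one_pattern = titles_by_creator(merged, PERSON, [direct_pattern])
both_patterns = titles_by_creator(merged, PERSON, [direct_pattern, work_expression_pattern])
```

With only one pattern the aggregator silently misses the paper; each additional domain model adds another pattern to the list it has to maintain.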
4 The JISC DC Application Profiles: Some Questions for Consideration
Both the DCAM and RDF are designed to support the use of diverse domain models
and of multiple independently created vocabularies. They impose no requirement for
the use of any single domain model or any single vocabulary.
A DCAP introduces a set of constraints on DC description sets, based on the
specification of a domain model, selected or designed to meet some set of
requirements.
The JISC-funded DCAPs have been developed on a “per resource type” basis, and
this should facilitate semantic interoperability between services exposing metadata
using a single DCAP, so e.g. the development of services aggregating data describing
ePrints based on SWAP, or data describing images based on IAP, and so on. Similarly
the use of the profiles should enable the expression of relationships between resources
described by different data providers using the same profile.
The other question to be considered is the extent to which it is necessary to perform
operations which “cut across” the different resource types and their descriptions
constructed using the different DCAPs (e.g. how to capture the information that
documents are transcripts of videos, PowerPoint presentations are accompanied by
audio or video records of their delivery, and so on).
So:
• Are instances of these different resource types created, described and used
independently of each other, or are they sometimes created, described and used in
combination?
• Is it necessary to express relationships between resources covered by different
DCAPs?
• Is it necessary to merge and query aggregated data based on different DCAPs?
In these contexts, the similarities and differences between the models underpinning
the different DCAPs may become significant. This might be addressed in various
ways:
• Is it possible to simply extend the individual models?
• Is it necessary/desirable to try to “harmonise” those individual models?
• Is it necessary/desirable to map from those individual models into a separate
model which does try to address the full range of resource types within a single
model? If so, what are the options? Simple Dublin Core? BIBO? FRBR? CIDOC
CRM? Something else?
1 Scholarly Works Application Profile (SWAP) http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
2 Images Application Profile (IAP) http://www.ukoln.ac.uk/repositories/digirep/index/Images_Application_Profile
3 Geospatial Application Profile (GAP) http://www.ukoln.ac.uk/repositories/digirep/index/Geospatial_Application_Profile
4 DCMI Abstract Model (DCAM) http://dublincore.org/documents/2007/06/04/abstract-model/
5 RDF Semantics http://www.w3.org/TR/2004/REC-rdf-mt-20040210/
6 The Singapore Framework for Dublin Core Application Profiles http://dublincore.org/documents/2008/01/14/singapore-framework/
