Project Number: | RE 4004 (RE) |
Project Title: | DESIRE II - Development of a European Service for Information on Research and Education II |
Deliverable Type: | PU |
Deliverable Number: | D3.5 |
Contractual Date of Delivery: | 31.12.99 |
Actual Date of Delivery: | 31.03.00 |
Title of Deliverable: | DESIRE metadata registry framework |
Workpackage(s) contributing to the Deliverable: | WP3 |
Nature of the Deliverable: | RE |
Author: | Rachel Heery, Tracy Gardner, Michael Day, and Manjula Patel |
Contact Details: | UKOLN: the UK Office for Library and Information Networking Tel: +44 1225 826580 |
URL |
Abstract | Metadata registries enable authoritative information about metadata schemes to be declared and thus support the extensibility and evolution of element sets and provide some basis for interoperability. The DESIRE metadata registry demonstrates how a metadata registry might work. Elements from several different metadata element sets, including Dublin Core, have been added. This report gives a detailed technical overview of the DESIRE metadata registry implementation and its data model, additional information on the element sets (namespaces) included in the registry and some comments on metadata mappings and cross-walks. |
Keywords | Metadata registries |
Distribution List: | DESIRE Project Team, European Commission, DESIRE project Web site. |
Issue: | 1.0 |
Reference: | registry-v10.doc |
Total Number of Pages: | 33 |
Issue Number | Issue Date | Reason for Change |
0.1 | 17-02-00 | Version sent to peer-reviewers |
1.0 | 31-03-00 | Slightly revised in accordance with peer reviewers' comments.
Metadata registries are formal systems that can disclose authoritative information about the semantics and structure of the data elements that are included within a particular metadata scheme. Registries would typically define the semantics of metadata elements, give information on any local extensions in use, and provide mappings to other metadata schemes.
This information should ideally be stored in a machine-readable syntax (e.g. XML/RDF) as well as in human-readable form, so that it can be disclosed to both human users and software applications.
In order to demonstrate the feasibility of creating useful and scalable metadata registries, the DESIRE project has developed a prototype metadata registry. The registry has been implemented using a relational database (MySQL), and a standard Web browser provides both the administrative and user interfaces. Access to the registry is available from:
http://desire.ukoln.ac.uk/registry/
The DESIRE registry implementation follows the general principles of the ISO/IEC 11179 standard for the specification and standardisation of data elements. Unlike most ISO/IEC 11179-based registries, however, the DESIRE registry implementation has been designed to present data elements from multiple namespaces in a consistent manner, rather than to maintain authoritative definitions of data elements under a single namespace. This means that, in addition to providing basic registry functions, the DESIRE registry implementation can provide mappings between different metadata schemes. Within the registry, data elements are mapped onto a single semantic layer - in this case the semantic units defined in the ISO Basic Semantics Register (BSR) - so that the mapping process is simplified if and when new metadata vocabularies are added to the registry.
The registry is based on a data model intended to be rich enough to support the registration of elements from multiple namespaces. This data model is influenced by existing data models (e.g. that developed for Dublin Core) but is not based on them directly.
The prototype registry implementation currently presents information in human-readable format only, due to constraints on the time and effort available. The registry is accessible via a Web interface. An index page gives access to browse and search interfaces for all of the entities that can be registered, and to the page that generates cross-walks. For each registered namespace, information is available on its registration authority (who registered or is responsible for it) and the namespace concept to which it belongs (e.g. Dublin Core). Each registered data element is defined within a particular namespace, so that elements that share the same name but belong to different namespaces can be distinguished. An application profile groups together sets of elements for use in a particular context. Within the registry, cross-walks between namespaces are automatically generated via the BSR.
Several different element sets have been included within the demonstrator. The Dublin Core is represented by three namespaces that correspond to Dublin Core 1.0 (RFC 2413, 1998), Dublin Core 1.1 and a form of qualified DC. Examples of what might be seen as forms of extended DC are the BIBLINK Core elements used by the BIBLINK project for metadata conversion, and the simple description elements developed as part of a UK eLib supporting study on collection level description. The registry also includes elements from selected ROADS/IAFA Template-Types used by ROADS-based information gateway services.
This report introduces the concept of metadata registries and describes the development and implementation of a prototype registry as part of the DESIRE project. The objective of the deliverable is to demonstrate the usefulness of metadata registries to the developers of metadata schemes and their implementers. The deliverable should also be of interest to those with a research interest in topics like metadata registries, the ISO/IEC 11179 standard, format conversion and metadata cross-walks.
Several different groups concerned with metadata, e.g. the EU-NSF Working Group on Metadata (1998), have suggested that the development of metadata registries will be important to authoritatively define the semantics of metadata schemes, to promote their use, their extensibility and their interoperability with other schemes. The prototype registry described in this report is important because it demonstrates the implementation of an ISO/IEC 11179 compliant registry for the Dublin Core elements, ROADS Templates and other resource discovery metadata schemes. It also tests the automatic generation of cross-walks using an underlying semantic layer provided by the ISO Basic Semantics Register (BSR). It relates to other DESIRE deliverables, in particular to the chapter on interoperability in the DESIRE Information Gateways Handbook (D3.4). This chapter suggests that registries should be developed to provide canonical definitions of all metadata elements within a particular scheme, to disclose information on local usage, and to publish mappings to other schemes.
http://www.desire.org/handbook/3-7.html
The initial input into the prototype DESIRE metadata registry has been the Dublin Core and BIBLINK Core element sets, the eLib simple collection level description elements and selected ROADS templates. This has allowed the developers to test the structure of the registry database as regards elements, qualifiers, local usage, and permitted values. It will now be possible to populate the registry with other schemes in use within the DESIRE project, e.g. the LDAP directory schema used in D3.3.
This report is best read in conjunction with some exploration of the registry itself; indeed, without using the registry, parts of the report may prove difficult to understand fully. The registry is accessible at:
http://desire.ukoln.ac.uk/registry/
AIHW
Australian Institute of Health and Welfare.
Application Profile
A set of elements with associated descriptions of usage for use in a particular context e.g. in a project, service or group of collaborating services. An application profile may register elements or schemes that are valid for use with particular elements.
BIBLINK
European project providing a flow of metadata between publishers and national bibliographic services.
BSR
Basic Semantics Register. An ISO standard that identifies and defines semantic components for use in data exchange.
BSU
Basic Semantic Unit.
Cedars
CURL Exemplars in Digital Archives. Project run by CURL and funded by eLib to investigate the problems of digital preservation.
Cross Walk
A mapping from the elements of one namespace to the elements of another namespace.
CSS
Cascading Style Sheets.
CURL
Consortium of University Research Libraries.
Data Element / Element
The realisation of a semantic unit in a particular namespace.
DC
Dublin Core.
DCMI
Dublin Core Metadata Initiative.
DDC
Dewey Decimal Classification.
Element
A data element - in ISO/IEC 11179 terms, a "unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes."
Element Usage
A description of the interpretation of a particular element for use in specific contexts. Unlike an element definition, an element usage definition does not introduce a new element name - it describes a local usage of an existing element.
eLib
The Electronic Libraries Programme. A series of UK digital library research projects funded by the JISC.
Enumerated List Scheme
A scheme which specifies a set of valid values - scheme elements. Scheme elements may be registered within the registry, or they may be indicated via a reference to an external definition.
EPA
Environmental Protection Agency.
GILS
Global Information Locator Service.
IAFA
Internet Anonymous FTP Archive.
IEC
International Electrotechnical Commission.
IETF
Internet Engineering Task Force.
IMS
IMS Global Learning Consortium, an international consortium of over 200 educational, commercial and governmental organisations with the aim of promoting technical specifications for management tools and educational content supporting distributed learning. The abbreviation once stood for "Instructional Management Systems."
Indecs
An international initiative of rights owners, creating metadata standards for e-commerce.
ISO
International Organization for Standardization.
JISC
Joint Information Systems Committee of the UK Higher Education Funding Councils.
LCSH
Library of Congress Subject Headings.
LDAP
Lightweight Directory Access Protocol.
MARC
Machine-Readable Cataloguing.
MPEG
Moving Picture Experts Group.
MPEG-7
Formally named "Multimedia Content Description Interface", MPEG-7 aims to create a standard for describing multimedia content data.
Namespace
A scoping device used for uniquely identifying registered entities. Identically named entities in different namespaces can be distinguished. A namespace is identified via a URL including a name and version identifier.
Namespace Concept
The shared basis for different versions of a namespace. A namespace is introduced as a version of a particular namespace concept.
NEDLIB
European project attempting to create the infrastructure underlying a networked European deposit library.
NHIK
National Health Information Knowledgebase. A registry run by the Australian Institute of Health and Welfare.
NSF
National Science Foundation.
PHP
An open-source, cross-platform, HTML-embedded scripting language used to create dynamic web pages.
Qualifier
A term that helps to refine the meaning of an element or attribute value. These are sometimes separated into 'element qualifiers' (that refine the semantics of elements) and 'value qualifiers' (contextual information about an element value, e.g. a 'scheme').
RDF
Resource Description Framework.
Registration Authority
Any organisation authorised to register data elements.
ROADS
Resource Organisation and Discovery in Subject-based Services. Software toolkit for Internet information gateways initially funded as part of eLib.
Rule Set Scheme
A scheme specified by a set of rules that define or describe valid values. The rule set is indicated via a reference to an external definition. The semantics of rule sets cannot be captured in any way within the registry at present.
Schema
Detailed formal descriptions of metadata element sets, e.g. as in RDF Schemas.
Scheme
A description or specification of valid values, e.g. a type of qualifier. One or more schemes may be associated with an element to specify valid values.
Semantic Unit
An informational element described independently of a specific namespace.
Semantic Layer
The set of semantic units registered in a registry.
SQL
Structured Query Language.
Sub Element
An Element which refines the definition of an existing element. The sub element inherits the definition associated with the element it refines.
Value Components Scheme
A scheme which splits a value domain into multiple value components. A valid value is then made up of a tuple of valid values from the value components. Note that it is the tuple that is a valid value - not each of the values associated with value components.
Vocabulary
A list of element terms used by a particular metadata namespace. For example, in RDF terms, an RDF Schema will describe a vocabulary developed to suit specific needs, e.g. the Dublin Core RDF vocabulary.
New patterns for managing metadata are emerging in relation to the various processes of metadata creation, maintenance of metadata repositories and inter-working with other services. Both people and software are involved in these processes, and both need to be able to locate information about the metadata schemas that exist. One way of declaring information about the structure and semantics of metadata element sets is through the development of metadata registries.
Bargmeyer et al. (1997) define a metadata registry as "a formal system that records the semantics, structure, and interchange formats of any type of data." The EU-NSF Working Group on Metadata (1998) has elaborated on their purpose:
The metadata schemas available on the Web will form a global collection of namespaces that will effectively function as a distributed registry. These registries will need to be managed, coordinated, and ultimately connected. Registries will define the elements of metadata schemas in a machine-readable syntax (e.g., RDF) and offer authoritative listings of legal values, local extensions, mappings to other schemas, and guidelines for good usage. They will serve both humans, with readable text, and programs, with structured content that can automatically be parsed. Their role will be both to promote and to inform, thereby encouraging the use of standard formats.
Metadata registries, therefore, are systems that are designed to disclose authoritative information about the structure and semantics of metadata element sets to both human users (as readable text) and software applications (as structured, machine-readable content).
In short, metadata registries permit both the extensibility and interoperability of metadata element sets (Heery, 1997). This is particularly important because a growing number of metadata schemas are now under development. Most have been designed for particular purposes but will need to interoperate with metadata from other schemes. To give an idea of how broad this development work is, metadata schemes have been (or are being) developed for quite different domains and with quite different functional requirements. For example:
A small number of metadata registries have been developed. Some are structured according to the ISO/IEC 11179 standard, "Specification and Standardization of Data Elements".
The Australian Institute of Health and Welfare (AIHW) maintains its National Health Information Knowledgebase (NHIK) as an electronic storage site for Australian health metadata. The registry provides information about the use of particular data elements as well as definitions and information about permitted values. The Knowledgebase has been constructed according to the ISO/IEC 11179 standard.
http://www.aihw.gov.au/services/health/nhik.html
The US Environmental Protection Agency (EPA) has established an ISO/IEC 11179-compliant Environmental Data Registry that permits the retrieval of information about data elements and data concepts found in selected EPA systems. The context of the registry is that of data surveys and data collection, with an acknowledged hierarchy of authority for formulating definitions and permitted values.
Although not compliant with ISO/IEC 11179, the ROADS Template Registry provides authoritative information about existing ROADS Template-Types and enables users of the ROADS software to define new data elements and/or Template-Types.
http://www.ukoln.ac.uk/metadata/roads/templates/
As part of the DESIRE project, UKOLN has built a demonstrator of a metadata registry. Its development was intended to help investigate a registry's functionality, in particular with regard to the authoritative disclosure of metadata usage. For example:
The registry is not designed for the purpose of managing a single namespace, but is intended to provide information across a range of metadata schemes. In this way it differs from work taking place within the Dublin Core Metadata Initiative (DCMI), although we believe it is relevant to questions of data modelling in this context. In time we would expect a variety of registries to evolve. For example, each namespace, such as the Dublin Core (DC), might be registered authoritatively by a registry owned by its own maintenance agency, with 'implementation level' registries linking into such registries as appropriate.
Our initial input into the DESIRE registry has been different variants of Dublin Core, the BIBLINK Core element set, ROADS templates and a simple collection level description scheme. This has allowed the developers to test the structure of the database as regards elements, qualifiers, local usage, and permitted values.
We have used ISO/IEC 11179 as a guide to constructing the registry and the chosen data model has been strongly influenced by this standard. We are using units from the Basic Semantics Register (BSR) as the basis for mapping between schemes. We hope this will ensure that our work fits in with parallel activity in the wider forum of data registries.
The DESIRE metadata registry follows the principles of the ISO/IEC 11179 standard for metadata registries. ISO/IEC 11179 provides standards for the informational and organisational structure of metadata registries. This approach ensures that the DESIRE registry builds on existing best practice and is consistent with other work in the area. Where possible, the DESIRE registry uses terminology from the ISO/IEC 11179 standard.
The key concept in ISO/IEC 11179 is the Data Element:
Data element
A unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes.
The ISO/IEC 11179 standard consists of six parts:
11179-1: Framework for the Specification and Standardization of Data Elements - Part 1 of the standard provides an overview of registry structure and has influenced the design of the DESIRE registry.
11179-2: Classification for Data Elements - This part of the standard is concerned with the structure of the data model and supports the discovery of data elements. The data model and prototype implementation have been influenced by this part of the standard.
11179-3: Basic Attributes of Data Elements - This part of the standard influenced the attribute sets used for defining entities that can be registered within the DESIRE Registry. For the prototype a subset of the required attributes was chosen; this subset was sufficient for a proof-of-concept application.
11179-4: Rules and Guidelines for the Formulation of Data Definitions - Part 4 of the standard discusses best practice for definition writing. This part of the standard was not directly relevant to the DESIRE registry, which was concerned with registering existing definitions.
11179-5: Naming and Identification Principles for Data Elements - Part 5 of the standard influenced the construction of identifiers within the DESIRE registry, but as with definitions, element names were taken from existing metadata vocabularies.
11179-6: Registration of Data Elements - Part 6 provides details of the organisational infrastructure for metadata registries. Terminology from this part of the standard has been employed within the DESIRE registry but this part of the standard is not fully implemented within the prototype.
It is useful to highlight the different uses of the term metadata in ISO/IEC 11179 and within the resource discovery community. In ISO/IEC 11179 the term metadata refers to data that defines the structure of data, for example, a Letter might consist of an Address, Salutation, Body, etc. Descriptions of the Address and other components form the metadata for the Letter.
The elements registered in the DESIRE registry describe the structure of data, which is metadata in the context of resource discovery. In other words we are registering metadata (in the ISO/IEC 11179 sense) about metadata (in the resource discovery sense).
In the DESIRE Metadata Registry data elements represent the elements or attributes within a metadata vocabulary or element set.
The DESIRE registry differs in intent from the majority of ISO/IEC 11179 metadata registries. Typically a registry is responsible for maintaining definitions of data elements under a particular namespace over which the registering organisation has control. In the case of the DESIRE registry, the aim is to present data elements from multiple namespaces in a consistent manner. This distinction means that within the DESIRE registry, namespaces can also be registered, and each data element is associated with a particular namespace.
In the future, the DESIRE registry could be expected to obtain definitions of elements from various namespaces from their associated registration authorities. Currently, those authorities do not maintain machine-accessible metadata registries (although, for example, the Dublin Core Metadata Initiative plans to develop such a registry). Since metadata cannot currently be obtained from external registries, the current approach has been to register elements from multiple namespaces directly.
In addition to basic metadata registry functionality, the DESIRE metadata registry aims to provide mappings between different metadata vocabularies.
One way of achieving this is to use hard-coded mapping tables detailing the relationships between the elements in a source vocabulary and those in a target vocabulary. This approach has a high development and maintenance cost, because a separate mapping must be developed for every pair of vocabularies to provide full coverage.
The DESIRE registry takes an alternative approach. Instead of mapping between every pair of vocabularies, every vocabulary is mapped onto an underlying semantic layer. The aim is that instead of mapping from vocabulary A to vocabulary B, we map from A onto the underlying semantic layer, and then back on to B. The result is that instead of having to create mappings between every pair of vocabularies, it is only necessary to map between each vocabulary and the semantic layer. This means that when introducing a new vocabulary into a registry with 20 vocabularies, it is only necessary to add a single mapping (between the new vocabulary and the underlying semantic layer) rather than 20 mappings, to support translation between the new vocabulary and those already registered.
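The arithmetic behind this argument can be checked with a short sketch. It counts undirected mappings (one per pair of vocabularies, or one per vocabulary when a semantic layer is used); the helper names are hypothetical and are not part of the registry.

```python
def pairwise_mappings(n: int) -> int:
    """Mappings needed when every pair of n vocabularies is mapped directly."""
    return n * (n - 1) // 2


def semantic_layer_mappings(n: int) -> int:
    """Mappings needed when each of n vocabularies is mapped once onto a semantic layer."""
    return n


def cost_of_adding_one(existing: int) -> tuple[int, int]:
    """Extra mappings (pairwise, via semantic layer) when one vocabulary is added."""
    pairwise = pairwise_mappings(existing + 1) - pairwise_mappings(existing)
    layered = semantic_layer_mappings(existing + 1) - semantic_layer_mappings(existing)
    return pairwise, layered


if __name__ == "__main__":
    # The case described in the text: a registry that already holds 20 vocabularies.
    print(cost_of_adding_one(20))  # (20, 1)
    print(pairwise_mappings(21), semantic_layer_mappings(21))  # 210 21
```

The gap widens as the registry grows: the pairwise total grows as n(n-1)/2, while the layered total grows only linearly.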
This approach does have potential disadvantages. Because mappings are not hand-crafted, auto-generated mappings may be of lower quality. To counteract this, it will be necessary to build up a detailed and complex semantic layer so that vocabulary elements can be precisely explained. If the semantic layer is not detailed enough, translation will suffer from information loss.
The DESIRE registry is a pilot application to trial this approach and provides a platform for future research in this area.
The data model of the DESIRE registry is influenced by the models used for existing namespaces (such as Dublin Core and BSR) but does not aim to reproduce them directly. It is intended to be generic enough to support the registration of elements from multiple namespaces and detailed enough to support advanced functionality such as the automatic generation of cross-walks.
A namespace is a scoping construct that supports the definition of unique identifiers. Identifier x in namespace A is distinct from identifier x in namespace B.
Namespaces can be registered in the DESIRE registry, and registered elements can be assigned to a namespace where appropriate. For example, the fifteen Dublin Core elements can be registered as belonging to a Dublin Core namespace. The data elements associated with a particular namespace form a metadata vocabulary.
Since metadata vocabularies evolve over time, the registry needs a mechanism for recording this evolution. To support this, namespaces have an associated version, so that Dublin Core version 1.0 and Dublin Core version 1.1 can both be registered. An underlying 'Namespace Concept' is also registered to tie the versions together; in this case Dublin Core is registered as a namespace concept.
In the initial version of the registry, versioning is supported only at the namespace level. Individual elements cannot be versioned.
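A minimal sketch of the namespace model described above, assuming Python dataclasses as the representation. The identifier format ("dc/1.1") follows the namespace names used elsewhere in this report, but the "#" separator for element names and the "roads/example" namespace are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceConcept:
    """The shared basis for all versions of a namespace, e.g. Dublin Core."""
    name: str


@dataclass(frozen=True)
class Namespace:
    """A single version of a namespace concept; versions are tracked only here."""
    concept: NamespaceConcept
    version: str

    @property
    def identifier(self) -> str:
        return f"{self.concept.name}/{self.version}"


@dataclass(frozen=True)
class DataElement:
    """Identified by name plus namespace; elements carry no version of their own."""
    namespace: Namespace
    name: str

    @property
    def qualified_name(self) -> str:
        # Hypothetical separator; the registry itself may use another form.
        return f"{self.namespace.identifier}#{self.name}"


dc = NamespaceConcept("dc")
dc10, dc11 = Namespace(dc, "1.0"), Namespace(dc, "1.1")
roads = Namespace(NamespaceConcept("roads"), "example")  # hypothetical namespace

title_10 = DataElement(dc10, "title")
title_11 = DataElement(dc11, "title")
roads_title = DataElement(roads, "title")

# Same name, different namespaces (or versions): three distinct registered entities.
assert len({title_10, title_11, roads_title}) == 3
print(title_11.qualified_name)  # dc/1.1#title
```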
The purpose of the semantic layer is to provide an underlying set of concepts onto which registered vocabularies can be mapped - for example, notions of author and abstract will be needed. The registered concepts must be precisely defined and unique. There should never be a situation where multiple registered concepts have the same semantics. If concepts are not unique then the quality of mappings between vocabularies will be reduced: elements may have the same semantics, but if they map to different registered concepts then this information will not be available for use in auto-generation of mappings. The semantic layer must therefore be managed, or be based on a managed namespace with limited scope for extension.
The Basic Semantics Register provides a set of elements suitable for use in the DESIRE registry. Mappings already exist between BSR and the Dublin Core and GILS metadata vocabularies. For the initial prototype DESIRE registry, BSR elements corresponding to the Dublin Core have been registered.
Note that the BSR only supplies data for the DESIRE registry's semantic layer; the registry is not limited to BSR. The data model allows concepts from other namespaces to be registered. However, it should be emphasised that such concepts must come from a managed namespace. If BSR is found to be appropriate for this purpose, then any concepts that need to be added to support registry functionality should either be added to the BSR standard or managed as a namespace extension.
The data model of the BSR also introduces a layer of 'Basic Semantic Units' (BSUs) between vocabulary elements and concepts. This layer allows a 'representation class' to be attached to a concept to further refine its description. A representation class describes the data type associated with the value space of a concept; Name, Text and Code are examples of representation classes.
For some concepts, there is only one appropriate representation class. In this case the BSR introduces only a BSU (since there is a 1-1 mapping between concepts and BSUs in such cases, the introduction of a separate concept is superfluous). In other cases, multiple representation classes are possible - for example a subject classification may be expressed as Text or as a Code.
The DESIRE registry currently has only a single semantic layer that combines concepts and BSUs from BSR. This approach reduces complexity and has provided sufficient modelling power to meet the requirements of the DESIRE registry.
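The combined semantic layer can be sketched as a set of units, each pairing a concept with a representation class, much as a BSR BSU does. The concept labels below are illustrative placeholders, not actual BSR identifiers. Note that a set can only reject exact duplicates; preventing two differently labelled units from carrying the same semantics still depends on the layer being a managed namespace.

```python
from dataclasses import dataclass
from enum import Enum


class RepresentationClass(Enum):
    # Representation classes mentioned in the text.
    NAME = "Name"
    TEXT = "Text"
    CODE = "Code"


@dataclass(frozen=True)
class SemanticUnit:
    """A concept refined by the data type of its value space."""
    concept: str
    representation: RepresentationClass


class SemanticLayer:
    """A managed set of semantic units; each unit may be registered only once."""

    def __init__(self) -> None:
        self._units: set[SemanticUnit] = set()

    def register(self, unit: SemanticUnit) -> SemanticUnit:
        if unit in self._units:
            raise ValueError(f"already registered: {unit}")
        self._units.add(unit)
        return unit

    def representations_of(self, concept: str) -> set[RepresentationClass]:
        return {u.representation for u in self._units if u.concept == concept}


layer = SemanticLayer()
# A concept with one sensible representation class yields a single unit...
layer.register(SemanticUnit("resource title", RepresentationClass.TEXT))
# ...while subject classification may be expressed as Text or as a Code.
layer.register(SemanticUnit("subject classification", RepresentationClass.TEXT))
layer.register(SemanticUnit("subject classification", RepresentationClass.CODE))
```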
Data elements are the units from which metadata vocabularies are built. A metadata vocabulary consists of all the data elements in a particular namespace, or all the elements associated with a particular application profile. For example, Dublin Core version 1.1 can be registered as a namespace with the fifteen DC elements as data elements of that namespace.
A data element is a realisation of a BSU in a specific context.
Application profiles describe data element usage for a particular application. The application may be a specific project, a piece of software, an interchange format, etc.
Application profiles cannot introduce new data elements: every data element must have an associated namespace. Application profiles can, however, group together data elements from multiple vocabularies. An application profile can also associate a scheme with a data element to specify valid values for that data element in a specific application.
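A sketch of an application profile under these constraints, assuming elements are identified by (namespace, name) pairs. The "local/1.0" namespace, its "audience" element and the profile name are hypothetical; LCSH and DDC are used only as example scheme names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataElement:
    namespace: str  # every data element belongs to a namespace, e.g. "dc/1.1"
    name: str


@dataclass
class ApplicationProfile:
    """Records how existing elements are used in one application."""
    name: str
    registered: frozenset[DataElement]  # elements already registered in namespaces
    usage: dict[DataElement, list[str]] = field(default_factory=dict)

    def use(self, element: DataElement, *schemes: str) -> None:
        """Add an element, optionally with schemes specifying its valid values here."""
        # A profile may only reuse registered elements; it cannot introduce new ones.
        if element not in self.registered:
            raise ValueError(f"{element.name!r} is not registered in {element.namespace!r}")
        self.usage.setdefault(element, []).extend(schemes)

    def namespaces(self) -> set[str]:
        """The vocabularies this profile draws its elements from."""
        return {e.namespace for e in self.usage}


subject = DataElement("dc/1.1", "subject")
audience = DataElement("local/1.0", "audience")  # hypothetical local namespace
profile = ApplicationProfile("example-gateway", frozenset({subject, audience}))
profile.use(subject, "LCSH", "DDC")
profile.use(audience)
```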
Schemes provide a mechanism for attaching information about valid values for a particular data element.
There are three kinds of scheme possible in the DESIRE registry: enumerated list schemes, rule set schemes and value components schemes.
Recommended schemes may be associated with data elements. Additionally, schemes may be associated with application profiles to reflect actual usage (strictly, a relationship can be introduced between an application profile, a data element, and a scheme).
Where multiple schemes are permitted for a data element in a specific application, it must be possible to record which scheme was used for a particular value.
Where an element is repeated within a record with the same value components scheme, it must be possible to distinguish the value components associated with one occurrence of the element from those associated with another. That is, there must be some grouping mechanism that combines value components into a tuple, which is the value of the element.
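The three kinds of scheme, and the two requirements just stated, can be sketched as follows. Each element value carries the identifier of the scheme it uses, and a value components value is held as a single tuple so that repeated elements keep their components apart. The "place" scheme, its component lists and the date checker are hypothetical; a real rule set scheme would only reference its external rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class EnumeratedListScheme:
    """Valid values are the listed scheme elements."""
    id: str
    values: frozenset[str]

    def is_valid(self, value: object) -> bool:
        return value in self.values


@dataclass(frozen=True)
class RuleSetScheme:
    """Valid values are defined by external rules; `check` stands in for them."""
    id: str
    reference: str
    check: Callable[[str], bool]

    def is_valid(self, value: object) -> bool:
        return isinstance(value, str) and self.check(value)


@dataclass(frozen=True)
class ValueComponentsScheme:
    """A valid value is a tuple holding one valid value per component."""
    id: str
    components: tuple  # component schemes, in order

    def is_valid(self, value: object) -> bool:
        # Only the whole tuple is a value; a lone component value is not.
        return (isinstance(value, tuple) and len(value) == len(self.components)
                and all(c.is_valid(v) for c, v in zip(self.components, value)))


@dataclass(frozen=True)
class ElementValue:
    element: str
    scheme_id: str  # records which of the permitted schemes this value uses
    value: object


def is_iso_date(text: str) -> bool:
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


schemes = {s.id: s for s in [
    RuleSetScheme("iso-date", "ISO 8601", is_iso_date),
    ValueComponentsScheme("place", (
        EnumeratedListScheme("country", frozenset({"UK", "NL"})),
        EnumeratedListScheme("level", frozenset({"city", "region"})),
    )),
]}

record = [
    ElementValue("date", "iso-date", "2000-03-31"),
    ElementValue("coverage", "place", ("UK", "city")),    # first occurrence
    ElementValue("coverage", "place", ("NL", "region")),  # kept apart as its own tuple
]
print(all(schemes[v.scheme_id].is_valid(v.value) for v in record))  # True
```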
Qualified Dublin Core provides a rich set of elements and schemes that can be represented within the framework of the DESIRE registry.
The BIBLINK project defines a vocabulary, BIBLINK Core, which incorporates Dublin Core elements as well as BIBLINK-specific elements. Vocabularies such as BIBLINK that describe specific usage, and potentially draw elements from multiple namespaces, are referred to as Application Profiles in the DESIRE registry.
The relationship between BIBLINK Core and the Dublin Core is relatively complex. BIBLINK Core elements are one of:
The following diagram illustrates how this is represented within the DESIRE registry.
The registry automatically generates cross-walks between namespaces based on their relationships with the underlying semantic layer, which is currently based on BSR.
BSR includes a mapping to Dublin Core (assumed to be DC 1.0). A mapping from ROADS/IAFA elements to Dublin Core (again DC 1.0) is available at:
http://www.ukoln.ac.uk/metadata/interoperability/dc_iafa.html
The relationships between DC 1.0 elements and the BSR semantic units were registered. Rather than registering relationships between DC 1.0 elements and ROADS elements directly, relationships between ROADS elements and the semantic units were deduced from the ROADS/IAFA to Dublin Core mapping.
A cross-walk from ROADS to DC 1.0 can now be generated based on relationships with the semantic layer.
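The deduction and cross-walk steps can be sketched as two small functions over dictionaries. The element names and the semantic unit labels ("U-title" and so on) are illustrative placeholders, not the registered BSR units or the published ROADS/IAFA to Dublin Core mapping.

```python
from collections import defaultdict


def deduce_links(to_dc: dict[str, str], dc_to_unit: dict[str, str]) -> dict[str, str]:
    """Link source elements to semantic units by composing with a DC mapping."""
    return {src: dc_to_unit[dc] for src, dc in to_dc.items() if dc in dc_to_unit}


def crosswalk(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Map each source element to the target elements sharing its semantic unit."""
    by_unit: dict[str, list[str]] = defaultdict(list)
    for element, unit in target.items():
        by_unit[unit].append(element)
    # Source elements whose unit has no target element are left out (information loss).
    return {src: by_unit[unit] for src, unit in source.items() if unit in by_unit}


# Registered links between DC 1.0 elements and the semantic layer (illustrative).
dc10_links = {"Title": "U-title", "Creator": "U-creator", "Description": "U-description"}

# An existing ROADS/IAFA to DC 1.0 mapping (illustrative subset).
roads_to_dc = {"Title": "Title", "Author-Name": "Creator",
               "Description": "Description", "Keywords": "Subject"}

roads_links = deduce_links(roads_to_dc, dc10_links)
print(crosswalk(roads_links, dc10_links))
# {'Title': ['Title'], 'Author-Name': ['Creator'], 'Description': ['Description']}
```

Keywords drops out because this illustrative DC link table has no unit for Subject, a small instance of the information loss noted earlier. Once ROADS elements are linked to the layer, the same `crosswalk` call would serve for any other vocabulary linked to it.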
The DESIRE metadata registry currently presents information in a human-readable format (via a Web interface). However, it also provides a basis for future work on machine-accessible metadata registries.
The prototype implementation does not provide support for end-user registration of new elements.
The DESIRE registry offers both search and browse interfaces for navigating the registry.
The index is the first registry page that the user is presented with after viewing the introductory explanation text. The index can be returned to from any page within the registry by clicking the DESIRE Registry logo that appears at the top of each page.
The index provides access to the browse and search interfaces for each of the entities that can be registered, and to the page for generating cross-walks.
All of the registered entities of a particular type, for example namespaces, can be viewed via the browse interface. The listings for each entity type can be accessed from the Index, or, for the most widely used entities, from the menu that appears at the foot of each page.
The browse listings show a short description of each registered entity. For entities with further associated information, the full description can be accessed by selecting 'Detail' for a particular item.
The search interface for a particular entity type can be accessed from the Index page. Equality and substring (contains) searches are supported for each entity type.
For example, the Namespace search can be used to find namespaces whose id contains 'dc'.
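The equality and substring searches could be implemented along the following lines. This is a sketch only. Python's built-in `sqlite3` module stands in for the prototype's mySQL database, and the `namespace` table and its columns are assumptions rather than the registry's actual schema. User input is escaped so that `%` and `_` are matched literally in a substring search.

```python
import sqlite3

# In-memory stand-in for the registry database; the table layout is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE namespace (id TEXT PRIMARY KEY, version TEXT, concept TEXT)")
conn.executemany(
    "INSERT INTO namespace VALUES (?, ?, ?)",
    [("dc/1.0", "1.0", "dc"), ("dc/1.1", "1.1", "dc"),
     ("roads/2.0", "2.0", "roads"), ("biblink/1.0", "1.0", "biblink")],
)

SEARCHABLE = {"id", "version", "concept"}


def escape_like(text):
    """Escape LIKE wildcards ('%', '_') using '!' as the escape character."""
    return text.replace("!", "!!").replace("%", "!%").replace("_", "!_")


def search_namespaces(field, term, mode="contains"):
    """Equality or substring search on one column of the namespace table."""
    if field not in SEARCHABLE:  # only whitelisted names reach the SQL text
        raise ValueError(f"unsearchable field: {field}")
    if mode == "equals":
        sql = f"SELECT id FROM namespace WHERE {field} = ? ORDER BY id"
        params = (term,)
    elif mode == "contains":
        # Case sensitivity of LIKE differs between database systems.
        sql = f"SELECT id FROM namespace WHERE {field} LIKE ? ESCAPE '!' ORDER BY id"
        params = ("%" + escape_like(term) + "%",)
    else:
        raise ValueError(f"unknown search mode: {mode}")
    return [row[0] for row in conn.execute(sql, params)]


print(search_namespaces("id", "dc"))  # namespaces whose id contains 'dc'
print(search_namespaces("concept", "roads", "equals"))
```

Search values are always passed as query parameters. Column names cannot be parameterised, which is why they are checked against a whitelist.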
For each major entity type that can be registered within the registry, a short (browse) form is displayed when browsing all entities of a particular type or viewing search results. Where further information is associated with an entity, a full description can be accessed by clicking on 'Detail' in the short form. Minor entities (such as the value elements associated with a scheme) are only accessible through navigating links from major entities.
The registry supports navigation through the data model via hyperlinks. Where one entity refers to another, for example where an element refers to its registration authority, the related entity can be accessed by clicking on its identifier.
According to ISO/IEC 11179 a registration authority is 'any organization authorized to register data elements'. ISO/IEC 11179 provides detailed instructions for the setup and conduct of a registration authority. The registration authorities referred to in the DESIRE registry prototype are not registration authorities in this formal sense although they play the same role. The formal registration of authorities was not considered necessary for the registry prototype although it would be appropriate for a real service.
Within the DESIRE registry, registration authorities are used to indicate the source of the definition of a data element or other entity that is registered. For example, the Dublin Core Metadata Initiative is the registration authority for elements in the dc/1.0 and dc/1.1 namespaces.
For each registration authority, a name and an associated URL are given. A short identifier is also assigned for use in cross-referencing.
Example:
ID | Name | URL |
DC | Dublin Core Metadata Initiative |
Namespace concepts relate different versions of the same namespace. A namespace is formed by associating a version number with a namespace concept. For example dc/1.0 and dc/1.1 are both versions of the Dublin Core (dc) namespace concept.
The full view for namespace concepts includes the following information:
Name | Dublin Core |
ID | dc |
Registration Authority | DC |
Description | The Dublin Core is a simple metadata element set intended to facilitate discovery of electronic resources. |
Status | Community Consensus - RFC |
URL | |
Comment |
The namespaces derived from this namespace concept are also listed:
Namespace |
dc/1.0 |
dc/1.1 |
Every element within the DESIRE registry is defined within a namespace. This means that elements that share the same name, but belong to different namespaces, can be uniquely identified. For example, the element 'Title' appears in both dc/1.0 and dc/1.1.
ID | dc/1.0 |
Version | 1.0 |
Description | Version 1.0 of the Dublin Core Element Set. 15 elements are defined. |
Registration Authority | DC |
Namespace Concept | dc |
Status | RFC |
URL |
The elements and/or semantic units that are defined within the displayed namespace are also shown. This means that the Namespaces provide a useful navigation point for the registry.
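The identifiers used in the examples above follow a simple pattern. A namespace concept plus a version gives a namespace (dc/1.1), and a namespace plus a local part gives an element (dc/1.1/title). The sketch below models that pattern with Python dataclasses. The class and field names are assumptions for illustration; the prototype stores these entities in relational tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Namespace:
    concept: str  # namespace concept, e.g. "dc"
    version: str  # e.g. "1.1"

    @property
    def id(self):
        return f"{self.concept}/{self.version}"


@dataclass(frozen=True)
class Element:
    namespace: Namespace
    local_id: str  # local part of the identifier, e.g. "title"
    name: str      # display name, e.g. "Title"

    @property
    def id(self):
        return f"{self.namespace.id}/{self.local_id}"


dc10 = Namespace("dc", "1.0")
dc11 = Namespace("dc", "1.1")
title_10 = Element(dc10, "title", "Title")
title_11 = Element(dc11, "title", "Title")

# Same name, different namespaces, so the qualified identifiers differ.
print(title_10.id, title_11.id)
```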
Semantic Units define concepts independently of a specific context. The definitions of semantic units are taken from the Basic Semantics Register (BSR). Only definitions that correspond to the 15 Dublin Core elements have been registered for the purposes of the prototype.
Example:
Name | InformationResource.Name |
ID | bsr/1.0/2043 |
Namespace | bsr/1.0 |
Definition | The name given as the distinctive designation of the information resource. Note: GILS name: title, Dublin Core name: title; MARC 245$a |
Status | Under Review |
Comment |
The status 'Under Review' corresponds to the status of ISO BSR.
Elements (also referred to as data elements or metadata elements) are the central items in the registry. All elements should have a corresponding semantic unit in order to support the automatic generation of cross-walks.
Example:
ID | dc/1.1/title |
Name | Title |
Definition | A name given to the resource. |
Datatype | Character String |
Obligation | Optional |
Namespace | dc/1.1 |
URL |
The fields datatype and obligation are required for all elements under ISO/IEC 11179.
The following items are also displayed and can be used for navigation:
Schemes specify valid values for data elements. Schemes encapsulate the representation details that ISO/IEC 11179 requires. Specifying schemes separately from data elements allows multiple schemes to be associated with the same element, which is appropriate when the registered elements are metadata elements with multiple valid schemes. For example, there may be several valid classification schemes associated with a 'subject' data element; any one of them could be used, provided the scheme is specified along with the value.
Example:
ID | RFC1766 |
Description | Language codes |
Registration Authority | IETF |
URL |
The following related items are also shown:
Note that the same scheme may be associated with multiple data elements - for example the ISO8601 scheme for dates may be associated with any data element that can take a date value.
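The many-to-many association between elements and schemes can be sketched as follows, together with the rule that a value must be accompanied by its scheme. The element identifiers and the classification schemes (DDC, UDC, LCSH) listed for 'subject' are hypothetical examples. The check only confirms that the scheme is registered for the element; it does not validate the value's syntax against the scheme.

```python
# Hypothetical many-to-many association between elements and schemes.
ELEMENT_SCHEMES = {
    "dc/1.1/subject": {"DDC", "UDC", "LCSH"},  # illustrative classification schemes
    "dc/1.1/language": {"RFC1766"},
    "dc/1.1/date": {"ISO8601"},
}


def check_value(element_id, scheme_id, value):
    """Accept a value only if it is qualified by a scheme registered for the element.

    The value itself is not checked against the scheme's syntax.
    """
    valid = ELEMENT_SCHEMES.get(element_id, set())
    if scheme_id not in valid:
        raise ValueError(f"{scheme_id!r} is not a registered scheme for {element_id}")
    return (element_id, scheme_id, value)


def elements_using(scheme_id):
    """Reverse lookup: every element that a given scheme is associated with."""
    return sorted(e for e, schemes in ELEMENT_SCHEMES.items() if scheme_id in schemes)


print(check_value("dc/1.1/subject", "UDC", "025.4"))
print(elements_using("ISO8601"))
```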
An application profile groups together a set of elements for use in a particular context. For example, the BIBLINK application profile describes the metadata element set used within the BIBLINK project.
An application profile may use registered elements directly, or may introduce context-specific details. Schemes can also be associated with elements via application profiles. The schemes that apply to data elements directly are assumed to be valid within the application profile (a possible extension to the data model would be to support the overriding of inherited schemes with those within an application profile).
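The scheme-inheritance rule above could be sketched like this: schemes attached to an element remain valid in every profile that uses it, and a profile can add schemes but not override them. The `PROFILE` structure, the function name and the extra `ISO639-2` scheme are hypothetical. Only the `biblinkcore` identifier and the RFC1766 and ISO8601 schemes appear in the registry examples.

```python
# Schemes registered directly against elements (hypothetical fragment).
ELEMENT_SCHEMES = {
    "dc/1.0/language": {"RFC1766"},
    "dc/1.0/date": {"ISO8601"},
}

# Hypothetical profile: the elements it uses plus schemes added in its context.
PROFILE = {
    "id": "biblinkcore",
    "elements": ["dc/1.0/language", "dc/1.0/date"],
    "extra_schemes": {"dc/1.0/language": {"ISO639-2"}},  # illustrative addition
}


def effective_schemes(profile, element_id):
    """Schemes valid for an element within a profile.

    Schemes on the element itself are inherited unchanged; the profile
    can only add to them, since overriding is not supported.
    """
    if element_id not in profile["elements"]:
        raise KeyError(f"{element_id} is not used by profile {profile['id']}")
    inherited = ELEMENT_SCHEMES.get(element_id, set())
    added = profile["extra_schemes"].get(element_id, set())
    return inherited | added


print(sorted(effective_schemes(PROFILE, "dc/1.0/language")))
```

Supporting overrides, the extension suggested above, would mean replacing the union with a lookup that prefers the profile's own schemes when present.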
Example:
ID | biblinkcore |
Name | BIBLINK Core |
Version | 1.0 |
Description | The 19 BIBLINK Core fields. |
Status | Deployed |
Registration Authority | BIBLINK |
Constraints |
The constraints field allows constraints that apply to more than one element to be specified. For example, it would be possible to specify that at least one of the 'publication date' and 'version' fields must appear in a metadata record.
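A record-level constraint of the kind just described could be expressed as a small predicate. In this sketch the field names `publication-date` and `version` are hypothetical stand-ins for the BIBLINK fields. No assumption is made about how the prototype itself represents constraints.

```python
def at_least_one_of(*fields):
    """Build a record-level constraint requiring at least one of the given fields.

    A field that is present but empty does not satisfy the constraint.
    """
    def check(record):
        return any(record.get(f) for f in fields)
    check.description = "at least one of: " + ", ".join(fields)
    return check


# Hypothetical field names standing in for 'publication date' and 'version'.
PROFILE_CONSTRAINTS = [at_least_one_of("publication-date", "version")]


def violations(record, constraints=PROFILE_CONSTRAINTS):
    """Return a description of every constraint the record fails."""
    return [c.description for c in constraints if not c(record)]


ok = {"title": "Annual report", "version": "2.1"}
bad = {"title": "Annual report"}
print(violations(ok))
print(violations(bad))
```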
The following information is also presented:
The DESIRE registry supports the automatic generation of cross-walks between namespaces, via the Basic Semantics Register (BSR).
The user selects the namespaces to map from and to, and clicking on 'Go' generates the cross-walk.
For example, generating a cross-walk from dc/1.0 (Dublin Core version 1 elements) to roads/2.0 (ROADS 2.0 template attributes) gives the following result:
dc/1.0 | via | roads/2.0 |
dc/1.0/title | bsr/1.0/2043 | roads/2.0/Title |
dc/1.0/creator | bsr/1.0/2044 | roads/2.0/Author-Name |
dc/1.0/date | bsr/1.0/2046 | roads/2.0/Creation-Date |
dc/1.0/Subject | bsr/1.0/2050 | roads/2.0/Keywords |
dc/1.0/Description | bsr/1.0/2049 | roads/2.0/Description |
dc/1.0/Publisher | bsr/1.0/2071 | roads/2.0/Publisher-Name |
dc/1.0/Contributor | | |
dc/1.0/Type | bsr/1.0/2069 | roads/2.0/Category |
dc/1.0/Format | bsr/1.0/2094 | roads/2.0/Format |
dc/1.0/Identifier | bsr/1.0/2095 | roads/2.0/URI |
dc/1.0/Source | bsr/1.0/2096 | roads/2.0/Source |
dc/1.0/Language | bsr/1.0/2048 | roads/2.0/Language |
dc/1.0/Relation | | |
dc/1.0/Coverage | | |
dc/1.0/Rights | | |
All elements of the 'from' namespace (dc/1.0 in this case) are listed in the left-hand column. Wherever there is a corresponding element in the 'to' namespace (roads/2.0 in this case), that element is listed together with the semantic unit through which the mapping was made.
Note that direct mappings from dc/1.0 to roads/2.0 are not stored in the registry. The mapping is possible because both namespaces are mapped to semantic units.
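Because the prototype is built on a relational database, the cross-walk can be thought of as a join through a table of element-to-semantic-unit relationships. The sketch below uses Python's `sqlite3` module as a stand-in for mySQL. The `element` and `element_unit` tables are assumed names, and only a few rows from the table above are loaded. A left join keeps every element of the 'from' namespace, so unmapped elements still appear with empty columns, as dc/1.0/Contributor does above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE element (id TEXT PRIMARY KEY, namespace TEXT NOT NULL);
CREATE TABLE element_unit (element_id TEXT NOT NULL, unit_id TEXT NOT NULL);
""")
conn.executemany("INSERT INTO element VALUES (?, ?)", [
    ("dc/1.0/title", "dc/1.0"), ("dc/1.0/creator", "dc/1.0"),
    ("dc/1.0/Contributor", "dc/1.0"), ("dc/1.1/title", "dc/1.1"),
    ("roads/2.0/Title", "roads/2.0"), ("roads/2.0/Author-Name", "roads/2.0"),
])
# Only element-to-unit relationships are stored; no direct DC-to-ROADS rows.
conn.executemany("INSERT INTO element_unit VALUES (?, ?)", [
    ("dc/1.0/title", "bsr/1.0/2043"), ("dc/1.1/title", "bsr/1.0/2043"),
    ("roads/2.0/Title", "bsr/1.0/2043"), ("dc/1.0/creator", "bsr/1.0/2044"),
    ("roads/2.0/Author-Name", "bsr/1.0/2044"),
])

# The subquery restricts candidates to the 'to' namespace before joining,
# so elements of other namespaces sharing a unit do not add extra rows.
CROSSWALK_SQL = """
SELECT f.id, fm.unit_id, t.to_id
FROM element AS f
LEFT JOIN element_unit AS fm ON fm.element_id = f.id
LEFT JOIN (
    SELECT m.unit_id, e.id AS to_id
    FROM element_unit AS m JOIN element AS e ON e.id = m.element_id
    WHERE e.namespace = ?
) AS t ON t.unit_id = fm.unit_id
WHERE f.namespace = ?
ORDER BY f.id
"""


def crosswalk(from_ns, to_ns):
    rows = conn.execute(CROSSWALK_SQL, (to_ns, from_ns)).fetchall()
    # Show the 'via' unit only when it actually led to a 'to' element.
    return [(f, unit if to else "", to or "") for f, unit, to in rows]


for row in crosswalk("dc/1.0", "roads/2.0"):
    print(" | ".join(row))
```

Because the join is symmetric, the same query also produces a ROADS to DC 1.1 cross-walk without any additional stored mappings.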
The glossary within the registry provides definitions of terms used within the DESIRE registry which do not correspond to entities to be registered. These terms include the standards used (such as ISO/IEC 11179), terms used in the data model (such as Element) and terms associated with registered namespaces (such as Dublin Core).
A list of all glossary terms can be viewed via the browse interface (accessible from the Index or the Menu at the foot of each page). Alternatively, specific terms can be searched for via the search interface (accessible from the Index).
For each glossary term an associated definition and URL is provided. The URL links to the external source of the definition, if appropriate, and to the registry itself for terms that have a particular meaning within the DESIRE registry.
The glossary section of this document includes the terms defined in the DESIRE registry glossary but is extended to expand acronyms used within this document.
The DESIRE metadata registry demonstrator does not manage a single namespace but provides information across a range of metadata schemas. Several different element sets have been included within the registry so that it can demonstrate how an application profile can group together a set of elements for use in a particular context. For example, the BIBLINK Core application profile would include all 19 BIBLINK Core elements, made up of elements taken from both the dc/1.0 and biblink/1.0 namespaces.
The following sub-sections give general background information on all of the element sets currently included in the demonstrator.
BIBLINK was a multi-partner project that was funded by European Commission DG XIII/E-4 under the Telematics Applications Programme of the European Union's Fourth Framework Programme. The project commenced in 1996 and was concerned with the establishment of electronic links between national bibliographic agencies and publishers (Day, Heery and Powell, 1999). The project developed a demonstrator system that enabled publishers of digital objects to submit metadata to a BIBLINK Workspace (BW) where it could be converted into the (mostly) MARC-based formats used by the participating national bibliographic agencies. These agencies could then enhance the records (e.g. by the application of authority control for proper names or the addition of subject information) for inclusion in a national bibliography or for returning to the publisher.
In order to help demonstrate the feasibility of the metadata conversion process and to provide a relatively simple format in which publishers could submit metadata, Project BIBLINK defined the semantics of 19 metadata elements - the BIBLINK Core (BC). The BC elements were based on Dublin Core and constituted an extended form of DC (as it was then defined). BC included 12 of the basic 15 DC elements (some qualified), with an additional 7 elements derived from the participating national libraries' own metadata requirements. The semantics of the BC elements are described at:
http://hosted.ukoln.ac.uk/biblink/wp8/fs/bc-semantics.html
The Dublin Core Metadata Initiative (DCMI) is an international and interdisciplinary attempt to define a 'core' set of descriptive metadata elements for resource discovery. The element set was initially developed through a series of workshops sponsored by the Online Computer Library Center (OCLC) and other organisations, the first workshop being held at OCLC's US headquarters in Dublin, Ohio in March 1995. This arrangement has since become more formalised with the creation of a Dublin Core Directorate (hosted by the OCLC Office of Research) with an Executive Committee and an Advisory Committee.
The Dublin Core Metadata Element Set currently consists of 15 elements. These elements were first formally defined in RFC 2413 (1998), but the most recent definition is contained in the Reference Description of the Dublin Core Metadata Element Set Version 1.1 (1999). Both versions have been included in the DESIRE metadata registry.
The Dublin Core home page can be found at:
RFC 2413 (1998) provided definitions of the semantics of the fifteen Dublin Core elements. These definitions are known as the Dublin Core Metadata Element Set Version 1.0. DC Version 1.0 has since been superseded by DC Version 1.1. A Reference Description of DC Version 1.0 can be found at:
http://purl.org/DC/documents/rec-dces-199809.htm
The Dublin Core Metadata Element Set Version 1.1 contains updated definitions for the metadata elements originally defined in RFC 2413 (1998). In Version 1.1 the element definitions are formally described using ten attributes taken from ISO/IEC 11179. A Reference Description of DC Version 1.1 can be found at:
http://purl.org/DC/documents/rec-dces-19990702.htm
From an early stage of the development of Dublin Core it has been envisaged that the basic 15 DC elements would need to be refined by the use of qualifiers. In DC terms, qualifiers are attributes that may be used to further refine (but not extend) the meaning of a DC element. The DC-4 workshop in Canberra defined three types of qualifier known as TYPE (or sub-element), SCHEME and LANGUAGE (Weibel, Iannella and Cathro, 1997). Work on implementing DC in terms of the Resource Description Framework (RDF) revealed, however, that the use of these terms could at times be confusing, as similar tasks can often be "tackled by means of different solutions" (Miller, Miller and Brickley, 1999). As a result, the DC Data Model Working Group began to reconsider DC qualifiers from first principles and then evolved the following structure:
DCMI has set up working groups to identify the qualifiers that pertain to each group's focus (typically a single element). General information on the DCMI Working Group Qualifier Proposals can be found at:
http://purl.org/DC/groups/qualifierlist.htm
In 1997, following a workshop on "Integrating access to resources across domains" (MODELS 4), the UK Electronic Libraries Programme (eLib) commissioned a 'supporting study' on collection level description. Some phase 3 eLib projects had noted the need for a cross-domain collection description standard to aid large-scale resource discovery services (or clumps). A Collection Description Working Group was set up and produced a review of existing practice and a proposal for a core set of collection level description attributes (Powell, 1999). The proposal contained 23 elements (12 of them taken from DC), grouped into those that describe a collection itself and those that describe a service that provides access to a collection.
The proposed set of collection description metadata elements was intended to allow:
Definitions of the semantics of these simple collection level description elements can be found at:
http://www.ukoln.ac.uk/metadata/cld/simple/
A working group (WG 1) investigating the creation of a Basic Semantics Register (BSR) was set up in 1998 by ISO Technical Committee 154 (TC 154) to "act as a central reference to assist in the universal, multilingual understanding of data across commerce, industry and administration." The BSR has been defined (Chapdaniel, 1999) as an "official ISO register of non-ambiguously defined semantic data." BSR data items are identified by numbers (so they do not depend on any particular language) and are intended to describe concepts independently of any particular context. Because BSR concepts are context-independent, BSR-defined semantic units are a useful way to share semantics in a "neutral" way (Chapdaniel, 1999). BSR can also act as a tool for establishing bridges between different data dictionaries. The terms in the register have been defined using the rules laid down in ISO/IEC 11179 (Bryan and Li, 1999).
Both semantic components and semantic units have been proposed as BSR content. Basic Semantic Units (BSUs) have been defined as (Premenos, 1995):
... a concept unambiguously defined and applicable in one or more contexts in an EDI environment. It may be part of a broader concept in which case it shall possess at least all the characteristics of that concept.
A semantic unit description in BSR includes its identification number, its type (in this context, usually a BSU), a definition (including notes of corresponding fields in GILS, DC and MARC) and its name in both English and French. For example, the proposed BSU for language codes takes the following form:
ID | TYPE | DEFINITION | NAME (ENGLISH) | NAME (FRENCH) |
2048 | BSU | The code identifying the language of the information resource. Note: GILS name: language of resource; Dublin Core name: language; MARC 041$a | InformationResource.Language.Code | GILS name: langue de la resource |
Within the DESIRE metadata registry, cross-walks between element sets (namespaces) are produced by mapping all elements (where possible) to the BSR.
More information on BSR and the work of ISO TC 154 WG 1 can be found at:
http://forum.afnor.fr/afnor/WORK/AFNOR/GPN2/TC154WG1/index.htm
More information on the work of ISO TC 154 can be found at:
http://www.iso.ch/meme/TC154.html
The ROADS software is a suite of programs intended to aid in the setting up and day to day running of World Wide Web based catalogues of on-line resources. The UK Electronic Libraries Programme (eLib) initially funded the development of ROADS as part of its Access to Network Resources (ANR) strand. eLib also funded a number of gateway-type services, some of which implemented the ROADS software tools and contributed to its development. A number of Internet subject guides or gateways are currently based on ROADS.
ROADS-based services use a metadata format known as ROADS templates. They are based on the Internet Anonymous FTP Archive (IAFA) templates that were published in an Internet-Draft in 1994 (Deutsch et al., 1994). For this reason they are sometimes referred to as ROADS/IAFA templates. The templates themselves are text (ASCII) based and take the form of simple attribute-value pairs separated by a colon and a space. ROADS templates are currently defined for 15 different resource-types. These are known as Template-Types. Some of these Template-Types (e.g. DOCUMENT, MAILARCHIVE and SERVICE) existed in the original IAFA template specification. Others have been developed specifically for ROADS-based services (e.g. PROJECT). At least one of the others (TRAINMAT) was independently developed and has been published as RFC 2007.
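The attribute-value format described above is simple enough to parse line by line. The following sketch assumes one `Attribute: value` pair per line, with a `Template-Type` line identifying the template. It does not attempt to handle anything beyond single-line pairs, and the sample values are illustrative.

```python
def parse_template(text):
    """Parse one template into (template_type, attributes).

    Assumes each non-blank line is 'Attribute: value'. A repeated
    attribute name overwrites the earlier value in this sketch.
    """
    attributes = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        # Split on the first colon-and-space only, so URLs in values survive.
        name, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"line {line_no}: expected 'Attribute: value'")
        attributes[name.strip()] = value.strip()
    template_type = attributes.pop("Template-Type", None)
    return template_type, attributes


SAMPLE = """Template-Type: DOCUMENT
Title: Mapping between metadata formats
Language-v1: en
URI-v1: http://www.ukoln.ac.uk/metadata/interoperability/
"""

kind, attrs = parse_template(SAMPLE)
print(kind, attrs["Title"])
```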
The DESIRE metadata registry contains the attributes used in the ROADS SERVICE and DOCUMENT Template-Types.
More information about the different Template-Types available and a template registry can be found at:
http://www.ukoln.ac.uk/metadata/roads/templates/
More information about the ROADS project can be found at:
http://www.ilrt.bris.ac.uk/roads/
In order to promote semantic interoperability between metadata formats, metadata registries need to contain mappings and cross-walks between metadata standards. In a resource discovery context, these mappings and cross-walks can be used both for format conversion (converting one metadata format into another) and for enabling the development of systems that enable searching across heterogeneous data sources.
The EU-NSF Working Group on Metadata (1998) has attempted to define what is meant by these terms:
Mappings represent relationships that are unambiguous; they support transparent searching across domains. Crosswalks are more complex frameworks that establish the relationship between schemas that have significantly different syntaxes or semantics.
A number of metadata mappings and cross-walks have been published (e.g. Day, 1996). Perhaps the most well known of these is the Dublin Core, MARC 21 and GILS cross-walk produced by the Library of Congress Network Development and MARC Standards Office (1999).
Producing accurate mappings and cross-walks is not particularly easy. St. Pierre and LaPlant (1998) comment that:
Unfortunately, the specification of a crosswalk is a difficult and error-prone task requiring in-depth knowledge and specialized expertise in the associated metadata standards. Obtaining the expertise to develop a crosswalk is particularly problematic because the metadata standards themselves are often developed independently, and specified differently using specialized terminology, methods and processes.
For this reason, once an authoritative mapping or cross-walk has been developed, a metadata registry provides a logical place to declare the information.
The prototype DESIRE metadata registry implementation does not contain direct mappings between the element sets included within the registry, even where these are available, e.g. for ROADS Templates to Dublin Core (Day, 1996). Instead, all elements should be mapped onto a semantic unit taken from the ISO Basic Semantics Register. For example:
Dublin Core 1.1 | BSR Semantic Unit | ROADS Template |
Language | 2048 InformationResource.Language.Code | Language-v1 |
Using this information, the registry can automatically generate a cross-walk between two namespaces. This approach means that it will be easier to keep the registry up to date and to add new metadata vocabularies.
However, there are disadvantages as well. If the semantic layer itself is not very detailed or consistent, inaccuracies could creep into the generated cross-walks. In particular, if the semantic layer is not detailed enough, the translation will begin to suffer from information loss. On the other hand, if the semantic layer is not general enough, 'coarse-grained' navigation and discovery could begin to suffer. In principle, these problems could be addressed by defining hierarchical relationships within the intermediate layer and defining more detailed relationships between this layer and the elements in a particular scheme.
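The hierarchical approach suggested above could work roughly as follows. If no element in the target scheme shares a source element's semantic unit, the sketch walks up to the nearest broader unit and flags the result as a coarser, lossy match. Everything in this sketch is hypothetical, including the unit names, the `BROADER` table and the two schemes, and the hierarchy is assumed to be acyclic.

```python
# Hypothetical hierarchy in the intermediate layer: unit -> broader unit.
BROADER = {
    "unit/creator.personal-name": "unit/creator",
    "unit/creator.corporate-name": "unit/creator",
}

# Hypothetical element -> unit mappings for two schemes.
SOURCE = {"a/author-personal": "unit/creator.personal-name"}
TARGET = {"b/creator": "unit/creator"}


def map_element(element, source, target, broader):
    """Map via the exact unit if possible, else via the nearest broader unit.

    Returns (target_element, unit_used, exact). exact=False marks a
    coarser match, i.e. a mapping that loses detail. If several target
    elements share a unit, the first one registered is returned.
    """
    by_unit = {}
    for t_elem, unit in target.items():
        by_unit.setdefault(unit, []).append(t_elem)
    unit = source.get(element)
    exact = True
    while unit is not None:
        if unit in by_unit:
            return by_unit[unit][0], unit, exact
        unit = broader.get(unit)  # climb one level; assumes no cycles
        exact = False
    return None, None, False


print(map_element("a/author-personal", SOURCE, TARGET, BROADER))
```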
The prototype DESIRE Registry, described here, is accessible from:
http://desire.ukoln.ac.uk/registry/
The DESIRE registry prototype is built on top of a relational database. This approach is well understood and straightforward to implement. Potential alternatives would have been an XML database or a database storing RDF triples. Alternative approaches to storing metadata are being considered elsewhere in the DESIRE project and the outcomes of that research will influence future development of metadata registries. For the DESIRE metadata registry prototype implementation a simple relational database was sufficient to implement the data model and provide a proof of concept application.
The freely available and widely used mySQL database was used on a Solaris system.
The user interface for the registry was implemented as a web application. The user interface for the prototype provides read-only access to registry data.
PHP was chosen as a scripting language for its rapid development cycle and integrated access to mySQL. The user interface is accessed via a web browser. CSS1 style sheets have been used for layout but the registry is still accessible via older browsers without support for style sheets.
The admin interface to the DESIRE registry prototype was provided by the PHPMyAdmin tool, which provides a web interface for managing mySQL databases. This interface provides read/write access to the database and was used for data entry.
The admin interface requires users to understand the underlying relational database schema and is not suitable for general end users. The prototype does not provide an interface for end-users to update the registry.
Bargmeyer, B., McCarthy, J., Olken, F. and Miller, N., 1997, Joint Workshop on Metadata Registries: workshop report. Draft 1.6, 23 December. http://pueblo.lbl.gov/~olken/Workshop/report.html
Bryan, M. and Li, M.-S., 1999, Electronic Data Interchange (EDI) Standards. Luxembourg: European Commission Information Society DG. http://www2.echo.lu/oii/en/edi.html
Chapdaniel, A., 1999, Basic Semantics Register - tools for federation. Recent Developments in Standards for Electronic Publishing, Paris, 21-22 January. http://inf2.pira.co.uk/agenda9.htm
Day, M., 1996, Mapping between metadata formats. Bath: UKOLN: the UK Office for Library and Information Networking. http://www.ukoln.ac.uk/metadata/interoperability/
Day, M., Heery, R. and Powell, A., 1999, National bibliographic records in the digital information environment: metadata, links and standards. Journal of Documentation, 55 (1), 16-32.
Deutsch, P., Emtage, A., Koster, M. and Stumpf, M., 1994, Publishing information on the Internet with Anonymous FTP. Internet-Draft. http://info.webcrawler.com/mak/projects/iafa/iafa.txt
EU-NSF Working Group on Metadata, 1998, Metadata for digital libraries: a research agenda. Draft 10. Le Chesnay: ERCIM. http://www.ercim.org/publication/ws-proceedings/EU-NSF/metadata.html
Heery, R., 1997, Naming names: metadata registries. Ariadne (Web version), No. 11, September. http://www.ariadne.ac.uk/issue11/metadata/
ISO 8601:1988, Data elements and interchange formats -- Information interchange -- Representation of dates and times. Geneva: International Organization for Standardization.
ISO/IEC 11179 (Parts 1 to 6), Information technology -- Specification and standardisation of data elements. Geneva: International Organization for Standardization.
Miller, E., Miller, P. and Brickley, D., 1999, Guidance on expressing the Dublin Core within the Resource Description Framework (RDF). Dublin Core Metadata Initiative, Working Draft. Bath: UKOLN. http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD-dc-rdf/
Network Development and MARC Standards Office, 1999, Dublin Core/MARC/GILS Crosswalk. Washington, D.C.: Library of Congress, 14 October. http://lcweb.loc.gov/marc/dccross.html
Powell, A., ed., 1999, Simple Collection Description. Draft Version. Bath: UKOLN: the UK Office for Library and Information Networking, 2 August. http://www.ukoln.ac.uk/metadata/cld/simple/
Premenos, 1995, Re: BSR specs? Mail to EDI-L mailing-list, 23 January. http://mlarchive.ima.com/edi-l/1995/0187.html
RFC 2007, 1996, Catalogue of network training materials. Internet Engineering Task Force. http://www.ietf.org/rfc/rfc2007.txt
RFC 2413, 1998, Dublin Core metadata for resource discovery. Internet Engineering Task Force, September. http://www.ietf.org/rfc/rfc2413.txt
St. Pierre, M. and LaPlant, W.P., 1998, Issues in crosswalking content metadata standards. Bethesda, Md.: NISO, 15 October. http://www.niso.org/crsswalk.html
Weibel, S., Iannella, R. and Cathro, W., 1997, The 4th Dublin Core Metadata Workshop report. D-Lib Magazine, June. http://www.dlib.org/dlib/june97/metadata/06weibel.html