
1 Introduction

1.1 Aims and Objectives

This report is the first deliverable of the resource discovery strand of the DESIRE II project. DESIRE II intends to deliver tools to facilitate the development of network information services targeted at the European academic community. The project aims to advance the provision of an infrastructure that will enable access to information, in particular for the purposes of research. DESIRE II will develop tools and prototypes to assist with automatic indexing of research material, to facilitate creation of resource descriptions, to assist with resource selection and to provide content rating.

This work draws upon a number of existing activities undertaken in the DESIRE project. Building chiefly on the DESIRE Quality Guidelines, it explores the application of machine-readable metadata vocabularies to a number of issues relating to information quality. Specifically, this deliverable builds upon the following activities:

· DESIRE Subject Gateway and Web Indexing activity

· DESIRE Quality Guidelines

· W3C PICS standard for labelling/rating, filtering, searching

· W3C Resource Description Framework for metadata

The working assumption here is that the application of machine-readable vocabularies to information quality issues is both useful and important, and that technologies such as PICS and RDF now provide us with the basic infrastructure needed to make a number of quality-related aspects of information management more amenable to automation. Originally, the intention was to create one or more PICS ratings vocabularies with categories drawing on the DESIRE Quality Guidelines for Internet cataloguing resource selection. With the development of the W3C's RDF, and the gradual migration of PICS applications to RDF, the situation becomes more complex: RDF, unlike PICS, is a rich and expressive metadata format adequate for representing full resource descriptions, and not just simple categories and ratings. In this context, the range of applications in this area becomes much broader. This document attempts to provide a guide to some of the various design decisions that arise when considering the application of RDF (and PICS) to information quality issues.
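The contrast between the two formats can be made concrete with a small Python sketch. It models a PICS-style label as a flat mapping from category to numeric rating, and an RDF-style description as a set of (subject, property, value) statements. This is a simplified illustration only: the URLs, category names and property names are invented here and do not come from any published PICS or RDF vocabulary, and neither structure reproduces the actual PICS or RDF syntax.

```python
from typing import NamedTuple

# Hypothetical URLs, invented for this illustration.
RESOURCE = "http://example.org/thesis/"
RATING_SERVICE = "http://example.org/ratings/quality-v1"
LIBRARIAN = "http://example.org/people/librarian"

# PICS-style label: a rating service assigns numeric values on a fixed
# set of categories. Only "category -> number" pairs can be expressed.
pics_label = {
    "service": RATING_SERVICE,
    "for": RESOURCE,
    "ratings": {"authority": 4, "currency": 3},
}


class Statement(NamedTuple):
    """One RDF-style statement: a subject has a property with a value."""
    subject: str
    prop: str
    value: str


# RDF-style description: arbitrary properties, including statements
# about the agency that made the assessment, not just about the resource.
rdf_description = {
    Statement(RESOURCE, "authority", "4"),
    Statement(RESOURCE, "reviewedBy", LIBRARIAN),
    Statement(LIBRARIAN, "role", "subject librarian"),
}


def pics_to_statements(label: dict) -> set[Statement]:
    """Re-express a PICS-style label as a set of statements."""
    return {
        Statement(label["for"], category, str(score))
        for category, score in label["ratings"].items()
    }


migrated = pics_to_statements(pics_label)
```

Every rating in the PICS-style label can be rewritten as a statement, but the reverse does not hold: the statement describing the reviewer's role has no place in a structure limited to categories and scores.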

1.2 Intended Audience

This report assumes a basic level of knowledge of resource discovery issues and mechanisms such as PICS and RDF. It is intended for information gateway managers or those in the process of setting up information gateways. The document is also intended for organisations and researchers who have an interest in the application of PICS to resource discovery, filtering and general quality issues.

1.3 Structure of this Document

This report begins by providing some background to previous DESIRE activity in this area and an overview of the PICS and RDF technologies used in the DESIRE Quality Vocabulary work. The report then presents a breakdown of some example applications, organised according to the agency responsible for making machine-readable assertions about a resource. Quality-related metadata, like any metadata, can be generated by content creators, by end users, or by third party organisations such as libraries and rating services. Within each of these categories a number of application scenarios can be envisaged; there is, however, a range of technological issues which can, to some extent, be discussed without reference to a particular application scenario. An overview of these issues is presented later in the document.

1.4 Motivating Examples

There is a widely acknowledged need to provide selective access to the mass of undifferentiated content on the Internet. The creation of metadata and the provision of search services and access tools based on that metadata enables a variety of selective access routes to Internet resources. The creation of quality ratings for resources is a particular case of metadata provision, and such ratings open up a number of possibilities for adding value to existing services.

The scope of this report is perhaps best characterised with a series of motivating examples. Listed below are a number of scenarios in which properties relating to the quality of a resource (perhaps relative to some user and/or context) might usefully be specified using RDF. These examples cover a wide range of issues, and illustrate the manner in which the ‘Web quality problem’ might in part be addressed by the ability to interchange machine-readable data which makes assertions about the quality-related properties of Web resources.

· I'm looking for peer-reviewed journals (and not merely ‘vanity publishing’).

· I'm looking for resources recommended by a subject-librarian.

· I'm looking for 3rd party descriptions of this resource from metadata servers run according to [some specified] collections policy.

· I'm looking for Web resources matching [some search] which will be useable (by blind users / on a Nokia Communicator browser / without Java enabled).

· I've created a set of Web pages from my PhD thesis; I'd like to include metadata in those pages, which makes it clear that this is well-researched content, so that other people working in this area can discover my document.

· Our pages are listed in the catalogue of the (OMNI/SOSIG/EELS/DutchESS/…) subject gateway; we'd like to include a 'kitemark' logo and a (digitally signed) machine-readable equivalent on those pages so that search engines know that the site has been rated as 'high quality' by a trusted source.

· I want to be able to 'recommend' resources as rating highly on some quality scale to a trusted metadata service, so that those resources might be found more easily by others in my subject community.

· I want to be able to find resources that other subject specialists in my community have rated highly.

· I want to be able to find resources that other PhD researchers in my community have rated highly.

· I want to be able to do an Internet search from a single point of access, and have my query automatically forwarded to appropriate searchable catalogues/databases/gateways/indexes on the Web, prioritising gateways that follow (something like) the DESIRE quality selection criteria.

· I've created a page that uses Macromedia Shockwave; how can this technology dependency be made explicit so that people who can't use Shockwave don't find it when searching?

· I want to be able to have my search results filtered or ranked according to some 'rule' based on a quality-related property of the resources listed.

· I want to find resources matching [some search term], listing those that are freely accessible first.

· I'm an Internet cataloguer and would like to have some automated support tools to help with resource selection and description (e.g. forms pre-populated with mechanically detectable information, or an easy way of finding out whether a site has lots of broken links or makes well-known usability errors).

· I run a large-scale Internet search service, and want to be able to cross-reference from our 'search results' page to 3rd party descriptions, ratings, classifications and reviews available elsewhere; we need to know which of these services are run by information specialists, librarians and subject specialists, and which aren't.
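Several of the scenarios above (filtering out resources that depend on an unavailable technology, listing freely accessible resources first, and preferring ratings from trusted sources) amount to simple rules applied to quality-related metadata. The Python sketch below shows one possible shape for such rules. It assumes the metadata has already been gathered into records; the field names, rater names and URLs are hypothetical and are not drawn from any DESIRE vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """A search hit plus quality-related metadata (hypothetical fields)."""
    url: str
    freely_accessible: bool = False
    requires: set[str] = field(default_factory=set)        # e.g. {"shockwave"}
    ratings: dict[str, int] = field(default_factory=dict)  # rater -> score


def usable_without(results, unavailable):
    """Keep only results that depend on none of the unavailable technologies."""
    return [r for r in results if not (r.requires & unavailable)]


def rank(results, trusted_raters):
    """Order results: freely accessible first, then by the best score
    given by a trusted rater (ratings from other raters are ignored)."""
    def key(r):
        trusted = [s for rater, s in r.ratings.items() if rater in trusted_raters]
        return (not r.freely_accessible, -max(trusted, default=0))
    return sorted(results, key=key)


results = [
    Result("http://example.org/a", requires={"shockwave"}, ratings={"gateway": 5}),
    Result("http://example.org/b", freely_accessible=True, ratings={"gateway": 3}),
    Result("http://example.org/c", ratings={"gateway": 4, "unknown": 5}),
    Result("http://example.org/d", freely_accessible=True, ratings={"unknown": 5}),
]

filtered = usable_without(results, unavailable={"shockwave"})
ranked = rank(filtered, trusted_raters={"gateway"})
ranked_urls = [r.url for r in ranked]
```

The sketch deliberately leaves open the harder questions: where the ratings come from, and how a service decides which raters to trust.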

Scenarios such as these present a considerable challenge: they raise questions about trust and about machine vocabularies, both for describing Web resources and for characterising the agencies which create those descriptions. In addition, these scenarios suggest problems which are more architectural in nature: how, for example, can one service discover which other metadata servers offer useful descriptions for some given URL? The ‘RDF quality vocabulary’ strand of activity in DESIRE attempts to make some contribution towards addressing these issues, and does so in the broader context of the DESIRE Subject Gateway activity and the work on distributed indexing and searching. The scope of the discussion and recommendations which follow is consequently more constrained than the list of ‘motivating scenarios’ given above might suggest. When combined with the technologies, services and recommendations developed elsewhere within DESIRE, the framework outlined here should go some way towards addressing many of the issues raised in the motivating examples above.



Title: Recommendations on Implementation of Quality Ratings in an RDF Environment
Issue: 1.1
Date: 4.2.99