D3.1 Quality Ratings in RDF

This deliverable discusses the use of the W3C Resource Description Framework (RDF) in the context of the DESIRE work on web indexing and quality-assured information gateways.

Peer reviews for earlier drafts of the report are also available:

  • Review by Chris Armstrong, Centre for Information Quality Management.
  • Review by Mark Field, Professional Adviser, Special Libraries and Information Services, The Library Association.

Abstract

This report looks at developing controlled vocabularies in the domain of information quality assurance. It begins by providing some background to previous DESIRE activity and an overview of PICS and RDF technologies in this area. The report then presents a breakdown of some example applications, organised according to the agency responsible for making machine-readable assertions about a resource. It provides recommendations for future directions and presents a number of possible demonstrators.

Keywords

Labelling, filtering, quality rating, PICS, XML/RDF, metadata, Internet catalogues, kitemarking, selection criteria.

Scope of this report (taken from the introduction)

There is a widely acknowledged need to provide selective access to the mass of undifferentiated content on the Internet. The creation of metadata, and the provision of search services and access tools based on that metadata, enable a variety of selective access routes to Internet resources. The creation of quality ratings for resources is a particular case of metadata provision, and such ratings open up a number of possibilities for adding value to existing services.

The scope of this report is perhaps best characterised with a series of motivating examples. Listed below are a number of scenarios in which properties relating to the quality of a resource (perhaps relative to some user and/or context) might usefully be specified using RDF. These examples cover a wide range of issues, and illustrate how the 'Web quality problem' might in part be addressed by the ability to interchange machine-readable data which makes assertions about the quality-related properties of Web resources.

Motivating Scenarios

  • I'm looking for peer-reviewed journals (and not merely 'vanity publishing').
  • I'm looking for resources recommended by a subject-librarian.
  • I'm looking for 3rd party descriptions of this resource from metadata servers run according to [some specified] collections policy.
  • I'm looking for Web resources matching [some search] which will be useable (by blind users / on a Nokia Communicator browser / without Java enabled).
  • I've created a set of Web pages from my PhD thesis; I'd like to include metadata in those pages, which makes it clear that this is well-researched content, so that other people working in this area can discover my document.
  • Our pages are listed in the catalogue of the (OMNI/EELS/DutchESS/SOSIG) subject gateway; we'd like to include a 'kitemark' logo and a (digitally signed) machine-readable equivalent on those pages so that search engines know that the site has been rated as 'high quality' by a trusted source.
  • I want to be able to 'recommend' resources as rating highly on some quality scale to a trusted metadata service, so that those resources might be found more easily by others in my subject community.
  • I want to be able to find resources that other subject specialists in my community have rated highly.
  • I want to be able to find resources that other PhD researchers in my community have rated highly.
  • I want to be able to do an Internet search from a single point of access, and have my query automatically forwarded to appropriate searchable catalogues/databases/gateways/indexes on the Web, prioritising gateways that follow (something like) the DESIRE quality selection criteria.
  • I've created a page that uses Macromedia Shockwave; how can this technology dependency be made explicit so that people who can't use Shockwave don't find it when searching?
  • I want to be able to have my search results filtered or ranked according to some 'rule' based on a quality-related property of the resources listed.
  • I want to find resources matching [some search term], listing those that are freely accessible first.
  • I'm an Internet cataloguer and would like to have some automated support tools to help with resource selection and description (e.g. forms pre-populated with mechanically detectable information, or an easy way of finding out whether a site has lots of broken links or makes well-known usability errors).
  • I run a large-scale Internet search service, and want to be able to cross-reference from our 'search results' page to 3rd party descriptions, ratings, classifications and reviews available elsewhere; we need to know which of these services are run by information specialists, librarians and subject specialists, and which aren't.

Scenarios such as these present a considerable challenge: they raise questions about trust, and about machine vocabularies both for describing Web resources and for characterising the agencies which create those descriptions.

In addition, these scenarios suggest problems which are more architectural in nature: how, for example, can one service discover which other metadata servers offer useful descriptions for some given URL? The 'RDF quality vocabulary' strand of activity in DESIRE attempts to make some contribution towards addressing these issues, and does so in the broader context of the DESIRE Subject Gateway activity and the work on distributed indexing and searching.

The scope of the discussion and recommendations which follow is consequently more constrained than the list of 'motivating scenarios' given above might suggest. When combined with the technologies, services and recommendations developed elsewhere within DESIRE, the framework outlined here should go some way towards addressing many of the issues raised in the motivating examples above.