
2 Background

The DESIRE project involves two categories of resource discovery service: subject based services with controlled selection policies, and regional exhaustive services with a ‘catch-all’ inclusion policy based on blanket coverage. It is our intention in DESIRE to explore the possibilities of using quality ratings in both service models, as well as to consider the use of quality ratings by the research community in other contexts.

The motivation for this activity comes from a wish to extend work done on identifying quality selection criteria in the first phase of the DESIRE project. As an initial task, DESIRE looked at current resource description approaches for selective subject services on the Internet. This was carried out as part of the resource discovery strand of DESIRE, and a series of three studies was produced covering resource description formats (metadata) (Dempsey and Heery 1996), quality resource selection criteria (Hofman and Worsfold 1996) and classification (Koch 1997).

The second study on resource selection started with an initial review, which found that many subject-based services had not produced a formal statement of their selection criteria. It was decided that a formalised list of criteria would be valuable as a tool for present and future subject gateways to share. The first stage in this activity was to produce a comprehensive, structured list of quality criteria used for the purposes of resource collection. An initial list of over 250 criteria was gathered from a comprehensive study of a variety of existing Internet resource discovery services and traditional information services. This list was refined by means of de-duplication, standardising language and grouping criteria thematically. The refined list was field tested by subject service providers participating in DESIRE (SOSIG, EELS, and Koninklijke Bibliotheek). The final list contained over 125 evaluation criteria for Internet resources. The list aimed to be comprehensive to allow subject gateways to 'pick and choose' the criteria that most suited their purpose. The criteria in the list are divided into a number of areas:

Scope policy, content criteria, form criteria, process criteria, collection management policy

Scope Criteria

These are criteria that take into account the needs of the users. In a third party environment the service will need to define its audience and to formally state what is and what is not considered to be part of the scope of their service. An explicit scope policy will allow users to understand what they will find if they use this service and help them choose whether or not they fit within the definition of the service’s audience. The scope policy can then be used as a first set of selection criteria to decide what information is acceptable.

Content Criteria

Content criteria focus on an evaluation of the actual information or data that the resources contain.

Form Criteria

Form criteria focus on the presentation and organisation of the information and the interface through which it is delivered. Whilst these may not be as critical as an evaluation of the content, they nevertheless have an impact on the accessibility of a resource: if resources cannot be accessed by the user, they cannot successfully meet the user’s needs.

Process Criteria

Process criteria focus on the processes that exist to support a resource over time. The Internet is a volatile environment: resources may be edited, moved or deleted without any notice, and archives of previous editions are rarely kept. This lack of integrity can affect the quality of the resource for the user.

Collection Management Criteria

Collection management criteria will change as the collection grows or as resources change or disappear. They take into account the collection of resources as a whole and provide criteria for deselection and editing.

The DESIRE Quality Guidelines were developed to support the creation of subject gateways within a particular environment (to suit the needs of academics and researchers). The guidelines focus on the formulation of a scope policy for subject gateways, which helps to define at a very broad level what is, and what is not, to be included in the service. This is achieved by identifying the target audience and their informational requirements, e.g. subject matter, acceptable sources of information, geographical coverage, cost, accessibility, etc. The list of criteria forms a generic framework which can be adapted for the needs of a particular service (particular criteria will be relevant to any individual service). These criteria were reviewed for use as the basis of a quality vocabulary (see Section 4).

2.1 Technical Context

2.1.1 Introduction to PICS

The Platform for Internet Content Selection (PICS) has been proposed as the basis for providing quality ratings systems of various kinds. Although current and future PICS development is subsumed under the W3C RDF/XML activity, it is useful for us to take into account the overall architecture for rating underlying PICS and, where possible, to leverage PICS compliant tools that have been deployed. PICS has developed specifications for different service models and we should be able to learn from the framework, even if the PICS syntax is not used.

This section will give a brief overview of the current status of PICS.

PICS is specified in recommendations developed by the World Wide Web Consortium (W3C) over the last three years. PICS aims to facilitate distribution of machine readable descriptions of digital resources, in other words metadata, to enable the rating of particular Web sites or individual Web pages against particular criteria. The PICS framework was originally motivated as a way to provide ‘safe’ (inoffensive) access to Internet resources (Resnick 1997), a topic of considerable debate that we will not enter into here. PICS itself is solely concerned with infrastructure; it is neutral as regards the content of ratings and the purpose for rating. As Paul Resnick states:

It (PICS) is values-neutral: it can accommodate any set of labeling dimensions, and any criteria for assigning labels. Any PICS-compatible software can interpret labels from any source, because each source provides a machine-readable description of its labeling dimensions (Resnick and Miller 1996).

PICS provides a standard means of associating labels with Web sites or Web pages. The labels can refer to any characteristic of the content. Typically this might be pornography, violence or language, but the labels could also be used to rate privacy policy, terms and conditions, or indeed quality criteria such as those drawn up in the DESIRE selection list.

W3C makes available stable PICS specifications (i.e. official W3C recommendations) for:

Service description (the format for describing a rating vocabulary and scales)

Label format and distribution (the format for the ‘labels’, or metadata, and methods for distributing labels)

PICS Rules (the format for defining filtering rules so that they can be transported and installed in a standard way; this ensures users' preferences can be expressed in a standard profile which will be understood by a variety of software agents)

PICS also offers a draft specification (i.e. a proposed W3C recommendation) for PICS signed labels (DSig), which defines the syntax and semantics for digital signatures in PICS labels.

PICS rating labels can be created by the originator or manager of the resource (self-labelling). This would be done using one of the self-ratings vocabularies drawn up by organisations such as RSAC and SafeSurf. Labels can be distributed in various ways: embedded in HTML documents using the META tag, distributed in HTTP headers, or by running a 'mini-label bureau' on the host Web server. Labels also can be created by third parties that build up databases of labels and offer a service as a third party label bureau. Some browsers and proxy servers as well as standalone software filters provide a facility to allow every request for a URL to be sent to such bureaux and filtered according to set user or proxy server preferences.
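The embedding and HTTP distribution routes described above can be illustrated with a minimal sketch. It assumes the general shape of a PICS-1.1 label list (service URL, a "for" option naming the resource, then category and value pairs). The rating service URL and the category names "authority" and "currency" are hypothetical and do not belong to any published vocabulary, and the function names are illustrative only.

```python
def format_pics_label(service_url, resource_url, ratings):
    """Build a PICS-1.1 style label string for a single resource.

    `ratings` maps category names to values; the categories used
    here are hypothetical, not part of any published rating service.
    """
    rating_text = " ".join(f"{category} {value}" for category, value in ratings.items())
    return (
        f'(PICS-1.1 "{service_url}" labels for "{resource_url}" '
        f"ratings ({rating_text}))"
    )


def as_meta_tag(label):
    """Wrap a label for embedding in the HEAD of an HTML document."""
    # The attribute is single-quoted because the label itself contains double quotes.
    return f"<META http-equiv=\"PICS-Label\" content='{label}'>"


def as_http_header(label):
    """Render the same label as an HTTP response header line."""
    return f"PICS-Label: {label}"


label = format_pics_label(
    "http://ratings.example.org/quality-v1",   # hypothetical rating service
    "http://www.example.org/report.html",      # hypothetical resource
    {"authority": 3, "currency": 2},
)
print(as_meta_tag(label))
print(as_http_header(label))
```

The same label text serves both routes; only the wrapper differs, which is why a self-labelling author and a server administrator can share one vocabulary.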

Preferences can be set up by users, either in PICS compliant browsers or in standalone filtering software; alternatively, Web administrators can set up preferences in proxy servers through which users' requests are channelled (Salamonsen and Yeo 1997).

A PICS system involves interaction of software in a variety of locations. The W3C categories for PICS compatible products and services reveal the components of the PICS framework:

Client software (browsers or other client software that read labels and filter resources based on PICS rules)

HTTP servers (that distribute labels along with resources)

Proxy servers (that read labels and filter resources based on PICS rules)

Label bureaux (HTTP servers that distribute third party PICS labels)
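The filtering step performed by a client or proxy can be sketched as follows. This is not the PICSRules syntax: the preference profile is simplified to an acceptable range per category, and the category names and the `allow_unlabelled` switch are assumptions made for illustration.

```python
from typing import Mapping, Optional, Tuple


def filter_resource(
    ratings: Optional[Mapping[str, float]],
    limits: Mapping[str, Tuple[float, float]],
    allow_unlabelled: bool = False,
) -> str:
    """Return "allow" or "block" for one requested resource.

    `ratings` holds the values read from a label (None if no label was
    found); `limits` maps each category the user cares about to an
    inclusive (minimum, maximum) range.
    """
    if ratings is None:
        return "allow" if allow_unlabelled else "block"
    for category, (low, high) in limits.items():
        value = ratings.get(category)
        # A missing category is treated like a failed check.
        if value is None or not (low <= value <= high):
            return "block"
    return "allow"


# Hypothetical quality profile: authority must be at least 2 on a 0-4 scale.
profile = {"authority": (2, 4)}
print(filter_resource({"authority": 3, "currency": 1}, profile))
print(filter_resource({"authority": 1}, profile))
print(filter_resource(None, profile))
```

Whether the check runs in the browser or in a proxy, the logic is the same; only the place where the profile is stored changes.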

PICS has attracted a lot of interest from government bodies concerned with public policy for controlling Internet content. However, there has been no significant government investment in the actual creation of labels. Two significant rating services aimed at providing safe access to the Internet are RSACi and SafeSurf. Both have achieved some level of success in marketing and deploying their label vocabularies, and further work might be done to identify the pattern of domain coverage. The Technology Inventory states:

While several PICS-based self-rating systems have emerged, and RSACi in particular has drawn significant media coverage around the world, none of the self-rating systems have achieved near-universal coverage. Alexa Internet reports that in August 1997 they searched a collection of 88,647 Web pages (these were the pages most requested by users of their service) and found 2363 had RSAC labels and 483 had SafeSurf labels. A year later RSAC reported that over 80,000 sites have been rated with RSACi, and that many of these sites consist of large numbers of pages. (Cranor et al 1997)

There are a number of initiatives and proposals advocating the use of PICS labels, although some of these developments appear to be awaiting the anticipated transformation of PICS to an RDF application. For example med-PICS (the Platform for Medical Internet Content Selection) hopes to exploit the potential of third party label bureaux for collaborative reviewing (rating) by health professionals of medical information on the Web. Such collaboration would be directed towards establishing a label bureau providing ratings for 'Critical Appraisal of Medical Information on the Internet'. Eysenbach and Diepgen stress the advantage of a label bureau over consulting a search service, as there will always be multiple options for search services, whereas a ratings service allows ratings to be applied automatically to all resources retrieved. The med-PICS initiative has drawn up its own ratings vocabulary:

This vocabulary contains descriptive categories such as the intended audience (from "kids" to "highly specialised researcher"), which could be used by authors to provide "context," and evaluative categories such as "source rating" (from "highly trustworthy" to "known to provide wrong or misleading information"), which could be used by third party label services (Eysenbach and Diepgen 1998).

The IEEE Computer Society has recognised a common requirement amongst refereed electronic journals and repositories for a standards based peer rating system. A call for collaboration of professional associations involved in this area was instigated in 1997:

The initial work here will be to bring together professional organizations, and others involved in electronic publishing where peer review type characteristics are important "selection" criteria. And to review the PICS method for accomplishing this. Then proposals can be made for application of the method to specific organizational needs, integrated, and a proposed standard be put forward (IEEE Computer Society).

The W3C PICS home page refers to a number of resources for developers as well as lists of PICS compatible products and services. As well as allowing for blocking of access, PICS potentially enables other actions to be taken based on PICS labels. The PICS Technology Inventory identifies these as:

Suggest - to recommend appropriate content.

Search - search services providing filtering against one or more rating vocabularies (AltaVista has partnered with Net Shepherd to provide a search service that returns only "family friendly" matches).

Inform - provide information about retrieved resources (for example in the form of a banner on the Web page) - EvaluWEB and Alexa display such informational banners.

Monitor - recording all accesses and attempted accesses to Web pages.

Warn - to provide information about content before it is displayed.

Block - to prevent access to certain sites or pages.

2.1.2 Moving Forward into RDF

The Resource Description Framework (RDF) is an activity of the W3C instigated in 1998 to provide a framework for the various W3C metadata activities. RDF provides a common syntax for describing Web resources. It facilitates the development of common tools for manipulating a diversity of metadata. The syntax is intended to be sufficiently flexible to express the semantics of metadata created for different functions and according to different schemas. The W3C RDF home page states:

RDF is designed to provide an infrastructure to support metadata across many Web-based activities. RDF is the result of a number of metadata communities bringing together their needs to provide a robust and flexible architecture for supporting metadata on the Internet and WWW. Example applications include sitemaps, content ratings, stream channel definitions, search engine data collection (Web crawling), digital library collections, and distributed authoring.

RDF is a means to express properties of a resource and to associate values with these properties. It does not mandate the use of any particular properties or element names. The creator of the RDF record can choose which particular properties or sets of properties they wish to use.
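The idea of a resource, a property and a value can be shown with a minimal sketch that holds statements as plain tuples rather than in RDF syntax. The Dublin Core namespace is a real one; the quality namespace, its "authority" property and the resource URL are hypothetical.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One assertion: a resource has a property with a value."""
    resource: str
    prop: str
    value: str


DC = "http://purl.org/dc/elements/1.1/"        # Dublin Core element set
QUAL = "http://ratings.example.org/quality#"   # hypothetical quality vocabulary

page = "http://www.example.org/report.html"    # hypothetical resource

statements = [
    Statement(page, DC + "title", "Annual Report"),
    Statement(page, DC + "creator", "A. N. Author"),
    Statement(page, QUAL + "authority", "3"),
]


def properties_of(resource, all_statements):
    """Collect every property and value asserted about one resource."""
    return {s.prop: s.value for s in all_statements if s.resource == resource}


for prop, value in properties_of(page, statements).items():
    print(prop, "=", value)
```

Because each property is named by a full URI, descriptive and evaluative vocabularies can be mixed in one description without the creator having to adopt either vocabulary wholesale.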

RDF and the RDF Schema language were based on metadata research in the Digital Library community. In particular, RDF adopts a modular approach to metadata along the lines of the Warwick Framework [WF]. RDF represents an evolution of the Warwick Framework model in that the Warwick Framework allowed each metadata vocabulary to be represented in a different syntax. Within RDF, all vocabularies are expressed within a single well-defined model and syntax. This allows for a finer grained mixing of machine-processable vocabularies, and addresses the need to create metadata in which statements can draw upon multiple vocabularies that are managed in a decentralised fashion by various communities of expertise. The implications of this new architecture are not yet entirely clear; later in this report we present a more detailed overview of the RDF Schema facilities, and discuss the relationship between the RDF architecture and vocabulary management and creation issues.

PICS will become one particular application of RDF. In the process it may be that PICS is re-defined and will evolve to take account of the flexibility of RDF. The RDF Model and Syntax specification is sufficient to express PICS labels, and the RDF Schema specification shows one generic mechanism for mapping PICS ratings vocabularies into RDF schemas. At this stage it is not clear whether this work will progress any further as a formal standard, i.e. whether sufficient W3C Member organisations value a formal mapping from PICS to RDF. It is possible, for example, that many PICS vocabularies already in existence will be remodelled during the transition to RDF, and therefore make any mechanical PICS-to-RDF mapping redundant.
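To make the notion of a mechanical PICS-to-RDF mapping concrete, the following sketch turns the ratings from one label into resource, property and value statements. It is not the mapping given in the RDF Schema specification: the convention of forming a property URI from the service URL plus "#" plus the category name is an assumption chosen for illustration, as are the service URL and category names.

```python
def pics_to_statements(service_url, resource_url, ratings):
    """Map PICS ratings onto (resource, property, value) statements.

    Assumed convention: each rating category becomes a property whose
    URI is the rating service URL followed by '#' and the category name.
    """
    return [
        (resource_url, f"{service_url}#{category}", value)
        for category, value in ratings.items()
    ]


triples = pics_to_statements(
    "http://ratings.example.org/quality-v1",   # hypothetical rating service
    "http://www.example.org/report.html",      # hypothetical resource
    {"authority": 3, "currency": 2},
)
for triple in triples:
    print(triple)
```

A mapping of this kind preserves the ratings but none of the richer modelling that RDF allows, which is one reason existing vocabularies might be remodelled rather than converted.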

While PICS defines a simple mechanism for declaring information-filtering rules (PICS-Rules), RDF does not yet have a query language. Similarly, PICS defines a framework for querying metadata servers (‘label bureaux’) for labels describing a resource named by URI, whereas RDF does not yet specify how to obtain descriptions from third parties. Standards for querying RDF are currently under discussion within W3C (see for example position papers from the Query Languages for the Web workshop http://www.w3.org/TandS/QL/QL98), while proposals for RDF Services are still rather informal (e.g. the current Netscape browser implements a simple mechanism for retrieving RDF descriptions, but this has not yet been proposed as a formal standard).

2.1.2.1 Signed labels, digital signatures

PICS labels allow for a digital signature to be included; this would provide authentication that the label came from the expected source. PICS labels can also carry a checksum, which can be compared with a checksum computed from the document the label describes, to ensure the document has not changed since the label was created.
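The checksum comparison can be sketched as below. The sketch assumes the checksum is a base64-encoded MD5 digest of the document bytes, which is our understanding of the message integrity check option in PICS 1.1 labels; the function names are illustrative only.

```python
import base64
import hashlib


def md5_checksum(document: bytes) -> str:
    """Return the base64-encoded MD5 digest of the document bytes."""
    return base64.b64encode(hashlib.md5(document).digest()).decode("ascii")


def document_unchanged(document: bytes, label_checksum: str) -> bool:
    """Compare a freshly computed checksum with the one carried in the label."""
    return md5_checksum(document) == label_checksum


original = b"<html><body>Version 1</body></html>"
checksum_in_label = md5_checksum(original)   # recorded when the label was created

print(document_unchanged(original, checksum_in_label))
print(document_unchanged(b"<html><body>Version 2</body></html>", checksum_in_label))
```

A checksum of this kind only shows that the document matches the label; it says nothing about who issued the label, which is the role of the digital signature.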

RDF will eventually include a specification for digitally signing assertions; however the current proposed Model and Syntax specification does not offer recommendations on this issue.

2.1.2.2 Formalising Web annotations

A common scenario on the Web is for one document to offer a critique of another. Currently the Web architecture does not have a notion of annotations, so content creators are reduced to using mechanisms such as HTML frames to offer point-by-point critiques of material held on other servers. Although this is a useful application of HTML, approaches based on PICS, XML and RDF offer further advantages. Web annotations expressed in a machine-processable format such as RDF can be automatically indexed and searched, whereas it is difficult to build sophisticated automated tools for HTML annotations.

With PICS and RDF it is likely that user agents will offer built-in mechanisms for locating and rendering assertions about Web resources. In the example annotations shown below, the creator of the critique in the left-hand frame includes hyperlinks to the page(s) critiqued so that they are displayed in a second browser frame. This clumsy mechanism allows for critique and original content to be displayed simultaneously, but does not provide any means for users browsing the resource on the right to discover the fact that a critique of that material is available. The “what's related” mechanism in Netscape’s browser (see http://www-rl.netscape.com/) suggests one mechanism by which RDF descriptions might be rendered to end users in a more systematic fashion. However, there are as yet no formal proposals for “annotation discovery” in RDF. In other words, we have no mechanism to discover the resources that critique or review a specified resource.
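The missing "annotation discovery" step amounts to a reverse lookup from a resource to the documents that critique it. A minimal sketch, assuming the critiques have already been gathered as pairs of (critique URL, target URL), follows; the URLs and function names are hypothetical and no existing RDF mechanism is implied.

```python
from collections import defaultdict


def build_annotation_index(annotations):
    """Index critiques by the resource they discuss.

    `annotations` is an iterable of (critique_url, target_url) pairs.
    """
    index = defaultdict(list)
    for critique_url, target_url in annotations:
        index[target_url].append(critique_url)
    return dict(index)


def critiques_of(index, target_url):
    """Return the known critiques of one resource (empty list if none)."""
    return index.get(target_url, [])


known_annotations = [
    ("http://reviews.example.org/critique-1.html", "http://www.example.org/paper.html"),
    ("http://reviews.example.org/critique-2.html", "http://www.example.org/paper.html"),
    ("http://reviews.example.org/critique-3.html", "http://www.example.org/other.html"),
]
index = build_annotation_index(known_annotations)
print(critiques_of(index, "http://www.example.org/paper.html"))
```

The difficulty in practice lies in gathering the pairs from many independent servers, not in the lookup itself.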

The RDF demonstrators which follow this report will explore mechanisms to support such a service based on the existing work in DESIRE for federated metadata services which use the Common Indexing Protocol to allow databases to exchange “forward knowledge” about their contents (see http://purl.org/net/rdf/papers/QL98-distributed/ for discussion of related issues). Current technology provides no infrastructure for Web annotation; mechanisms such as HTML Frames can be used to provide a simplistic substitute:

Example of User Annotations Using Frames

2.2 Constraints and Scope of this Document

While this report does suggest some basic vocabularies for making quality-related assertions in RDF, it should not be taken as proposing a single monolithic "quality vocabulary" suitable in all contexts. Different applications and services will require different vocabularies to rate their resources depending on the needs of their users (based on the assumption that the notion of information quality is strongly bound up with the notion of fitness for purpose). The terms and structure of the vocabularies may be more or less complex depending on the application area. In many respects it is difficult to distinguish clearly between the role of a "quality vocabulary" and more general-purpose "resource description" vocabularies. The fact that RDF itself is expressive enough for both purposes complicates things further. This document provides some general vocabularies for making quality-related statements as well as indicating a number of areas in which Internet cataloguing practice might benefit from slightly greater formalism.



Title: Recommendations on Implementation of Quality Ratings in an RDF Environment
Issue: 1.1
Date: 4.2.99