The original intention of this task was to investigate building directly upon the quality criteria guidelines developed in the first phase of DESIRE. To this end, the criteria were reviewed with a view to turning them into a detailed machine-readable vocabulary for describing the quality attributes of Internet resources. Each criterion was reviewed in terms of what it stated about a resource and how that statement could be established. Whilst many of the criteria can be taken as indicators of quality, by themselves they are not enough to establish that a resource is of high quality. For example, knowing that there is a bibliography attached to an article may be an indication of research, but this fact by itself does not establish the document to be of high quality. In practice a range of criteria needs to be taken into account, and these are generally weighed against each other. Subject gateways are characterised by their use of human intermediaries, often librarians, who are skilled in assessing material with the needs of their users in mind, in order to provide a quality-controlled collection of Internet resources. This human factor is integral to the selection, classification and cataloguing of each resource.
The possibilities for, and an approach to, formalising the DESIRE criteria have already been described elsewhere (Armstrong 1998). The large number of criteria (over 125) would make such a vocabulary impractical as a cataloguing tool, or as a metadata label that information providers could be expected to supply. In addition, it was difficult to imagine how such a large number of preferences could be presented simply to the end users of services implementing such a vocabulary. The notion of information quality as that which is "fit for purpose" presents a number of possibilities for the uses of a formal vocabulary. The sections below describe some workable scenarios for developing vocabularies to describe Internet resources. First, we present a high-level overview of the basic capabilities of the RDF schema language, which will be used for defining these vocabularies. We also discuss how the modularity of RDF enables the use of externally managed vocabularies alongside those created within DESIRE, and indicate some areas where it may be prudent to wait for other groups to agree some common base concepts (such as 'Agent').
This section presents a brief summary of the facilities offered by the RDF Schema Specification followed by a discussion of the implications of the RDF vocabulary architecture for Web quality issues.
A core feature of RDF is the ability to produce data drawing upon multiple vocabularies (schemata). This extends the more coarse-grained modularity proposed in the Warwick Framework [http://cs-tr.cs.cornell.edu/Dienst/UI/2.0/Describe/ncstrl.cornell/TR96-1593]. RDF vocabularies are themselves modelled formally in RDF, and can be made available for machine processing using the XML "serialisation syntax" for representing RDF data.
This overview does not discuss the RDF syntax, and presumes a basic familiarity with the RDF abstract data model. The RDF model is described fully in [http://www.w3.org/TR/WD-rdf-syntax]. For the purposes of this overview, it is sufficient to note that RDF models all data using a simple labelled directed graph. This means that any RDF model can be thought of as a collection of simple three-part sentences or statements, each of which identifies some (URI-specified) property of some (URI-specified) resource. The value of such a property is either a simple 'literal' value, such as a textual string, portion of XML mark-up, number or date, or else another Web resource, identified by URI.
This simple but flexible approach to representing complex data structures allows RDF to incorporate data from multiple vocabularies through the use of URI identifiers for the resources, classes and properties that constitute RDF data.
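The three-part statement model described above can be sketched in a few lines of Python. This is an illustration only, not part of the RDF specifications: the `Statement` and `Literal` types and the `properties_of` helper are names assumed for this sketch, and the example values are taken from the 'peerReviewedBy' example used later in this section.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A simple literal value such as a string, number or date."""
    value: str


class Statement(NamedTuple):
    """One three-part statement: a property of a resource, and its value."""
    subject: str              # URI of the resource being described
    predicate: str            # URI of the property
    obj: Union[str, Literal]  # another resource (by URI) or a literal


# Namespace URIs as they appear in this report's examples.
DC = "http://purl.org/dc/elements/1.0/"
EG = "http://rdf.desire.org/vocab/simple1.rdf#"

statements = [
    Statement("/docs/report.html", DC + "Title", Literal("Some Report or Other")),
    Statement("/docs/report.html", EG + "peerReviewedBy", "personid:uk-NIcode:NX930366B"),
]


def properties_of(resource, data):
    """Collect every property/value pair stated about one resource."""
    return {s.predicate: s.obj for s in data if s.subject == resource}
```

Because properties are identified by full URIs, statements drawn from different vocabularies can sit side by side in the same collection without their names colliding.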
The remainder of this section explores the implications of this decentralised, modular and extensible architecture for the creation and management of 'quality-related' RDF vocabularies.
In RDF, it is possible to declare a 'Class' of resources (for example 'LOCALVOCAB:GraphicArtist') to be a subclass (i.e. subset) of another independently-defined class of resources, such as 'XYZVOCAB:Agent'. RDF also provides a mechanism, 'rdf:type', for saying that a resource is an instance of one or more such classes [fn 1].
It is similarly possible to define properties (relations) between classes of resources that are independently defined. In other words, vocabulary creators do not have to start from scratch, and can use RDF to express how the concepts in a new vocabulary relate to items in other vocabularies.
The issue of relating multiple vocabularies in a machine-understandable manner arises when expressing specialisation-relationships between concepts, and when expressing consistency constraints on the allowed use of newly defined constructs. RDF defines two properties, 'rdfs:subPropertyOf' and 'rdfs:subClassOf', which provide the primary mechanism for describing how properties and classes in one vocabulary can be considered specialisations of other classes and properties. Vocabularies can also be intermingled when describing the consistency constraints on the use of properties: a newly defined property might be declared to "make sense" only when connecting resources that are instances of classes defined elsewhere.
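As a rough sketch of how a processor might use specialisation relationships between classes, the Python below follows subclass links transitively to decide whether a resource counts as an instance of a class defined in another vocabulary. The function names and the in-memory set of pairs are assumptions made for illustration; they are not defined by RDF itself.

```python
# Hypothetical schema facts: (subclass, superclass) pairs, as might be
# stated using rdfs:subClassOf across two independent vocabularies.
SUBCLASS_OF = {
    ("LOCALVOCAB:GraphicArtist", "XYZVOCAB:Agent"),
}


def superclasses(cls, subclass_of):
    """Return every class that `cls` specialises, following links transitively."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def is_instance_of(resource_types, cls, subclass_of):
    """True if any declared type of a resource is `cls` or a specialisation of it."""
    return any(t == cls or cls in superclasses(t, subclass_of)
               for t in resource_types)
```

A resource typed only as 'LOCALVOCAB:GraphicArtist' would therefore also be accepted wherever an 'XYZVOCAB:Agent' is expected, but not the other way round.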
Examples
For example, we might define a property called 'UTIL:peerReviewedBy' that makes sense when used to relate a publication (e.g. a resource of type 'Web:Document') to a person or agency (e.g. of type 'USEFUL:Agent'). This is expressed in an RDF schema using the notions of 'domain' and 'range'. The RDF schema for the UTIL vocabulary that defines 'peerReviewedBy' will contain RDF statements to the effect that:
· UTIL:peerReviewedBy is an RDF:Property
· it makes sense when applied to resources which are members of the class of resources known as 'Web:Document'
· it makes sense when the value of the property is a resource that is a member of the class of resources known as 'USEFUL:Agent'.
The following RDF statements could be used in a schema defining the property 'peerReviewedBy'. Note that in the examples below full URIs are used to identify the 'Web' and 'UTIL' vocabularies, and that 'domain' and 'range' are used to express usage constraints on the new property in terms of vocabulary items defined externally. The XML language-tagging facility is used here to assert that all human-readable text is in English (xml:lang='en') and the RDF properties 'label' and 'comment' are used to provide human readable documentation of peerReviewedBy.
Example schema
The following RDF constitutes a schema definition for a new RDF property called 'peerReviewedBy'. This would be made accessible on the Web, for example by saving it as a text/xml file at [http://rdf.desire.org/vocab/simple1.rdf]. Once the vocabulary has been assigned a URI, any RDF data can use the new property simply by referencing that URI in the namespace definitions.
<rdf:RDF xml:lang="en" |
The above RDF introduces a new property into the RDF world. Below is some example data that draws upon the new property. Note that knowledge of the above schema definition allows us to infer that the resource identified as '/docs/report.html' is of type Web:Document and that the resource identified using the (fictional) URI 'personid:uk-NIcode:NX930366B' is of type USEFUL:Agent.
The example data tells us the title and the peer reviewer of the document being described; whilst the former is simple, the latter introduces complexities since it requires the identification of a person. There are as yet no clear conventions for using URIs to identify individuals [fn 2].
<rdf:RDF xml:lang="en"
  xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  xmlns:eg1="http://rdf.desire.org/vocab/simple1.rdf#"
  xmlns:dc="http://purl.org/dc/elements/1.0/"
  xmlns="http://www.w3.org/TR/WD-rdf-schema#" >
  <rdf:Description rdf:about="/docs/report.html" >
    <dc:Title>Some Report or Other</dc:Title>
    <eg1:peerReviewedBy rdf:resource="personid:uk-NIcode:NX930366B" />
  </rdf:Description>
</rdf:RDF>
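To show how example data of this kind reduces to simple statements, the following Python sketch reads the RDF/XML with the standard library XML parser and lists one (resource, property, value) triple per property element. It handles only the small subset of the syntax used in this example and should not be taken as a general RDF parser; the `statements` function name is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"

EXAMPLE = """<rdf:RDF xml:lang="en"
  xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  xmlns:eg1="http://rdf.desire.org/vocab/simple1.rdf#"
  xmlns:dc="http://purl.org/dc/elements/1.0/"
  xmlns="http://www.w3.org/TR/WD-rdf-schema#">
  <rdf:Description rdf:about="/docs/report.html">
    <dc:Title>Some Report or Other</dc:Title>
    <eg1:peerReviewedBy rdf:resource="personid:uk-NIcode:NX930366B"/>
  </rdf:Description>
</rdf:RDF>"""


def statements(xml_text):
    """Return (resource, property URI, value) triples from rdf:Description blocks."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for child in desc:
            # ElementTree names elements as "{namespace}local"; joining the two
            # parts gives the full URI of the property.
            namespace, local = child.tag[1:].split("}")
            # The value is either another resource (rdf:resource) or literal text.
            value = child.get(f"{{{RDF_NS}}}resource") or (child.text or "").strip()
            triples.append((subject, namespace + local, value))
    return triples
```

Note that the property URIs in the output are formed from the namespace declarations, which is how the data refers back to the schema that defines 'peerReviewedBy'.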
These simple facilities - classes, properties, specialisations, range and domain - constitute the core RDF Schema machinery. They provide a mechanism by which RDF processors can check the consistency of data, or infer missing facts to transform inconsistent data into consistent data.
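The kind of inference described here can be sketched briefly in Python. Given a schema that records the domain and range of a property, a processor can derive the types of the resources that the property connects. The dictionary layout and the `infer_types` function are assumptions made for this sketch rather than part of any RDF toolkit.

```python
RDF_TYPE = "rdf:type"

# Hypothetical in-memory form of the schema: property -> (domain, range).
SCHEMA = {
    "UTIL:peerReviewedBy": ("Web:Document", "USEFUL:Agent"),
}

# One statement of example data, as a (resource, property, value) triple.
DATA = [
    ("/docs/report.html", "UTIL:peerReviewedBy", "personid:uk-NIcode:NX930366B"),
]


def infer_types(data, schema):
    """Derive the rdf:type statements implied by each property's domain and range."""
    inferred = set()
    for subject, prop, value in data:
        if prop in schema:
            domain, range_ = schema[prop]
            inferred.add((subject, RDF_TYPE, domain))  # subject falls in the domain
            inferred.add((value, RDF_TYPE, range_))    # value falls in the range
    return inferred
```

Applied to the example data, this yields the two facts noted earlier: the report is a 'Web:Document' and the reviewer is a 'USEFUL:Agent'. A consistency checker could use the same schema in the opposite direction, flagging data whose declared types conflict with the domain or range.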
The basic overview of the RDF Schema facilities presented above introduces a key difference between RDF and preceding metadata proposals such as the Warwick Framework. RDF allows a vocabulary to be defined in such a way as to re-use or specialise concepts formalised in other vocabularies, rather than requiring each application to stand alone. This is a powerful mechanism, but one which presents challenges to vocabulary designers, particularly at this early stage in the deployment of RDF.
In the context of a proposed RDF Quality Ratings Vocabulary, these issues are particularly important. As noted above, the "quality" of some resource is not some simple and intrinsic property analogous to size, shape or price. There is no simple scale from 'very high quality' to 'absolutely lacking in quality' which can uncontroversially characterise the objective quality of an object. Rather, quality is bound up with the notion of fitness for purpose. Different individuals in different contexts will have wildly varying notions of the fitness for their purpose of various Web resources.
The challenge here is to build a modular framework for using RDF statements about resources, people and their 'fitness for purpose' judgements to enable step-by-step improvements in Web usability. For example, a resource which has a technological dependency on the application/Shockwave file format may be of great use to people equipped with a computer that understands Shockwave files, whilst at the same time being a time-wasting distraction to users whose computers are not capable of rendering Shockwave content. It might be useful for a DESIRE Quality Vocabulary to include some notion such as a "has-technology-dependency-on" property, so that resources that require Java, VRML, HTML Frames or Shockwave could be explicitly labelled as such. Whilst we could very easily define such a property for RDF, it is not currently clear whether this would be advisable. At this stage in the development of RDF there are a number of vocabulary creation activities under way which have a direct bearing upon the issues addressed in DESIRE.
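To make the technology-dependency example concrete, the Python sketch below shows how labels of this kind might be used to decide which resources are fit for a given user's purpose. The resource URIs, the `DEPENDS_ON` table and both function names are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical labels, as might be stated using a
# "has-technology-dependency-on" property: resource -> required technologies.
DEPENDS_ON = {
    "http://example.org/tour": {"Shockwave"},
    "http://example.org/model": {"VRML", "Java"},
    "http://example.org/plain": set(),
}


def usable_by(resource, capabilities, depends_on=DEPENDS_ON):
    """A resource is usable if every technology it depends on is available.

    Resources with no stated dependencies are treated as usable by everyone.
    """
    return depends_on.get(resource, set()) <= set(capabilities)


def usable_resources(capabilities, depends_on=DEPENDS_ON):
    """List the labelled resources that a user with these capabilities can use."""
    return sorted(r for r in depends_on if usable_by(r, capabilities, depends_on))
```

The same label thus produces different 'quality' judgements for different users, which is the sense in which fitness for purpose is context-specific.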
The modular nature of RDF, as presented above, makes it possible for a 'quality rating' of a resource to draw upon statements expressed using DESIRE vocabularies as well as those described elsewhere. In the section that follows we present an overview of some other proposed RDF vocabularies, and discuss how these relate to the broad-brush 'motivating examples' which set the scope for the DESIRE RDF quality activity.
Having described the modular architecture of RDF, it is useful to consider how best to exploit it when considering the creation of quality-related vocabularies. For example, it would be a very significant undertaking for the DESIRE vocabulary to attempt to comprehensively characterise the technological dependencies (frames, audio facilities, Java etc) of a Web resource. Yet these are highly relevant to the usability of that resource, and hence to its context-specific, subjective quality for end users. For this reason, a preliminary recommendation of this report is that the ability to make RDF assertions of this nature, whilst useful, is a facility we expect to be enabled using metadata vocabularies defined elsewhere.
The Composite Capability/Preference Profiles (CC/PP) proposal [http://www.w3.org/TR/NOTE-CCPP/], a user-side framework for content negotiation, provides such a vocabulary. Currently at an early stage of development, NOTE-CCPP appears likely to develop into an RDF vocabulary which will make technology-oriented "fitness for purpose" judgements easier. Future DESIRE quality/RDF demonstrators will explore the use of the CC/PP vocabulary in the context of ratings bureaux that describe the properties of specified resources. We consequently do not at this stage anticipate a need to provide vocabulary items within DESIRE for this area.
In this note we describe a method for using RDF, the Resource Description Format of the W3C, to create a general, yet extensible framework for describing user preferences and device capabilities. The user can provide this information to servers and content providers. The servers can use this information describing the user's preferences to customize the service or content provided.[...] http://www.w3.org/TR/NOTE-CCPP/
Similarly, it would be an ambitious undertaking to establish a vocabulary for describing the formal credentials of authors, reviewers, publishers etc., or a taxonomy of such agents. Much is to be gained by sharing effort with others. The P3P (Platform for Privacy and Preferences) project within W3C, for example, is creating an RDF-based framework for Web-based negotiation, representation of user preferences etc. Similarly, the Dublin Core initiative is exploring the use of RDF for representing the various agencies (publishers, creators, contributors) associated with a 'document like' object on the Web. In turn, these agents (people, organisations...) tend to have common sets of properties (e.g., phone numbers, addresses etc).
The VCard project provides a simple and widely supported specification for such properties; however, an RDF representation has not yet been defined. The problem here is that there is an overlapping Web of RDF applications which could all usefully gain from simple base concepts such as 'WebDocument', 'Agent', 'Organisation', 'Person' etc. In the absence of any proposed "top level ontology", vocabulary designers are left in a position of having to invent these categories within application schemas. For example, the 'peerReviewedBy' property sketched above makes sense in the context of RDF classes representing agents and documents. We anticipate the development of such classes, possibly as part of the P3P and Dublin Core activities.
The boundary between descriptive metadata and indicators of quality attributes is blurred. One implementation of a vocabulary may be to enhance the descriptive element of metadata records. Within a subject gateway's catalogue records, much of the useful descriptive information about a resource is typically found in the free text description or abstract field. This is helpful information upon which the user may base a decision about whether or not to visit that resource. However, it would be more beneficial for this to be represented formally in a machine-searchable way. Information such as the provenance of information, the review mechanisms the resource had to go through before being published on the Internet, and the publication status of the resource may all be represented in the vocabulary. These could be used to filter or prioritise resources according to a user's stated preferences. The vocabulary may also be used as a filtering mechanism to aid the selection process for subject gateways. Automated programs could analyse Web pages for certain attributes (such as link integrity or last updated date), which could feed into the decision to accept or reject a resource.
Below are a few examples of attributes for a vocabulary that may be used to rate resources (some of these may already be part of existing metadata formats):
Name of Attribute | Definition | Values |
Provenance | The originating agency for the resource | Academic, Government, Organisation, Commercial, Personal, Unknown |
Intended Audience | The audience that the resource was originally developed for | Researcher/Academic, Practitioner/Professional, Student, General/Popular, Unknown |
Review Mechanism | What editorial process or checks (if any) the resource has gone through | Publisher, Editor, Peer Review, Reader Comments, None, Unknown |
Authorship Verified | Whether or not the author of a resource is verifiable | Yes, No |
Dependency on Technology | If the resource requires special technology to use it e.g. Java | From an enumerated MIME types list |
Registration | If you need to register to use the resource | Yes, No |
Cost Involved | If the resource has a charge associated with its use | Yes, No |
Link Integrity | Whether or not the links within a resource are still live and active | Expressed as a percentage, or as Excellent, Good, Average, Poor |
Last Updated Date | The date that the resource was last updated |
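As an illustration of how attributes like those in the table might be used to filter resources against a user's stated preferences, the Python sketch below models a rating as a small record and checks it against a set of acceptable values per attribute. The `ResourceRating` record, the `matches` function, the example URIs and the choice of attributes are assumptions made for this sketch, not a proposed DESIRE format.

```python
from dataclasses import dataclass


@dataclass
class ResourceRating:
    """A subset of the attributes in the table above, for one resource."""
    uri: str
    provenance: str = "Unknown"
    intended_audience: str = "Unknown"
    review_mechanism: str = "Unknown"
    registration: bool = False
    cost_involved: bool = False


def matches(rating, preferences):
    """True if every preferred attribute has one of the acceptable values.

    `preferences` maps an attribute name to the set of values the user accepts;
    attributes not mentioned are left unconstrained.
    """
    return all(getattr(rating, name) in allowed
               for name, allowed in preferences.items())


# Hypothetical catalogue entries and user preferences.
catalogue = [
    ResourceRating("http://example.org/journal",
                   provenance="Academic", review_mechanism="Peer Review"),
    ResourceRating("http://example.org/homepage",
                   provenance="Personal", review_mechanism="None"),
    ResourceRating("http://example.org/paywalled",
                   provenance="Academic", review_mechanism="Peer Review",
                   cost_involved=True),
]

preferences = {
    "provenance": {"Academic", "Government"},
    "review_mechanism": {"Peer Review", "Editor"},
    "cost_involved": {False},
}

selected = [r.uri for r in catalogue if matches(r, preferences)]
```

Keeping the values to short enumerated lists, as in the table, is what makes this kind of matching straightforward; free text descriptions could not be compared in this way.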
One application of a vocabulary may be to describe the 'form' properties of a resource - these are the properties concerned with the presentation and organisation of a resource and the interface through which it is presented. The aspect of quality described by this vocabulary would be one of accessibility and usability of the resource. The vocabulary would be useful for describing and choosing the accessibility of resources for a whole range of users including people with disabilities, users using new page viewing technologies (mobile and voice), and electronic agents such as indexing robots.
The DESIRE quality guidelines suggest a number of criteria concerned with the form (presentation and organisation) of a resource. However these guidelines have been superseded to some extent by the work of the W3C Web Accessibility Initiative (WAI), who have been co-ordinating with many organisations to develop a comprehensive and unified set of accessibility guidelines. These could be employed to create a standard vocabulary for the format of Internet resources. The working draft of the WAI Accessibility Guidelines on Page Authoring provides a list of guidelines that page authors should follow in order to make their pages more accessible. Conformance to the WAI guidelines would imply that the resource is accessible to the widest possible audience and also provide opportunities for users to filter resources based on these properties e.g. not to offer any resources that are not viewable by the user's access mechanism.
To create machine-readable ratings for accessibility, each of the WAI guidelines might be encoded as a formal classification scheme, or we might have a more general yes/no category such as "meets most of the WAI guidelines". In a usage context, a personalised search environment which knew something about the user's information needs could prioritise search results on the basis of (a) their preferences ("no shockwave", "only highly usable sites", "sites that meet WAI-A.7 only") and (b) classification of those resources by some agency, mechanical or human. The text below is taken directly from the WAI Authoring Guidelines; some or all of these guidelines could be used to generate accessibility ratings for resources.
WAI Page Authoring Guidelines:
A.1. Provide alternative text for all images, applets, and image maps.
A.2. Provide descriptions for important graphics, scripts, or applets if they are not fully described through alternative text or in the document's content.
A.3. Provide textual equivalents (captions) for all audio information.
A.4. Provide verbal descriptions of moving visual information in both auditory and text form.
A.5. Ensure that text and graphics are perceivable and understandable when viewed without colour.
A.6. Indicate structure with structural elements, and control presentation with presentation elements and style sheets.
A.7. The resource should ensure that moving, blinking, scrolling, or auto-updating objects or pages may be paused or frozen.
A.8. Provide supplemental information needed to pronounce or interpret abbreviated or foreign text.
A.9. The resource should ensure that pages using newer W3C features (technologies) will transform gracefully into an accessible form if the feature is not supported or is turned off.
A.10. Elements that contain their own user interface should have accessibility built in.
A.11. Use features that enable activation of page elements via input devices other than a pointing device (e.g., via keyboard, voice, etc.).
A.12. Use interim accessibility solutions so that assistive technologies and older browsers will operate correctly.
B.1. For frames, provide sufficient information to determine the purpose of the frames and how they relate to each other.
B.2. Provide contextual information about the relationship between group controls, selections, and labels into semantic units.
B.3. Ensure that tables (not used for layout) have necessary markup to be properly restructured or presented by accessible browsers and other user agents.
B.4. Wherever possible, create good link phrases that are meaningful out of context.
C.1. Only use technologies defined in a W3C specification and use them in an accessible manner. Where not possible, provide an accessible alternative page that does.
C.2. Provide mechanisms that facilitate navigation within your site.
C.3. Create a single downloadable file for documents that exist as a series of separate pages.
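If each resource were classified against individual guidelines such as those listed above, a personalised search environment could order its results accordingly. The Python sketch below assumes that each search result carries the set of guideline identifiers it has been judged to meet; that representation, the `prioritise` function and the example data are hypothetical.

```python
def prioritise(results, required):
    """Order results so that those meeting the user's required guidelines come first.

    `results` is a list of (uri, guidelines_met) pairs and `required` is the set
    of guideline identifiers the user insists upon. Results missing fewer of the
    required guidelines rank higher; ties are broken in favour of resources that
    meet more guidelines overall.
    """
    def sort_key(item):
        uri, met = item
        return (len(required - met), -len(met))

    return sorted(results, key=sort_key)


# Hypothetical classifications supplied by some agency, mechanical or human.
results = [
    ("http://example.org/a", {"A.1"}),
    ("http://example.org/b", {"A.1", "A.7"}),
    ("http://example.org/c", {"A.1", "A.3", "A.7"}),
]

ordered = [uri for uri, _ in prioritise(results, required={"A.7"})]
```

A stricter service could drop, rather than demote, any result that misses a required guideline; the ordering approach is shown here because it degrades gently when few resources have been classified.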
In addition to being able to describe individual resources there is also a need to be able to rate services that are engaged in the description of those resources. One of the motivating examples at the beginning of the report was a user who was looking for third party descriptions of a resource provided by a metadata server run according to a particular collections policy. The application of a vocabulary to describe the characteristics of particular metadata collections would allow individual users to “sign-up” to a service with known properties and policies. Moreover, it would allow a collaborating mesh of metadata collections to share information about holdings, collection policies, and redirect search queries based on these properties. Some work on the notion of collection level descriptions is underway in the UK [http://www.ukoln.ac.uk/metadata/cld/]. This could be expanded to provide particular characteristics of collection types.
The example below attempts to show how the characteristics of a particular metadata collection could be represented in a machine-readable format. It assumes that it is possible to devise a set of common characteristics to describe the properties of such a collection. The example describes the properties of SOSIG (Social Science Information Gateway) against a set of criteria that DESIRE might recommend as requirements for a subject gateway. The notion of a collection level description could also be used to provide 'kitemarking' of resources, i.e. of all resources that have been catalogued by a DESIRE subject gateway.
DESIRE Gateway Criteria
In order for a gateway to be considered a "DESIRE Gateway" it would be expected to have most or all of the following features:
Scope Policy
Selection Criteria
Content Management Policy
Educational focus
National or international coverage
Accept recommendations from user community
Evaluation procedures
Classification scheme
Browse facility
Abstracts or descriptions of resource
Search mechanism
Machine searchable interface
Text and graphical access
Meets W3 Web Accessibility Initiative guidelines
Link integrity policy
Record integrity policy
Cataloguing rules
Example Description in RDF:
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#"
xmlns:qual="http://www.desire.org/rdf/quality.rdf#"
xmlns:dc="http://purl.org/dc/elements/1.0/"
xmlns:cld="http://www.ukoln.ac.uk/metadata/cld/vocab.rdf#" >
<rdf:Description about="http://www.sosig.ac.uk/">
<rdf:type resource="http://www.desire.org/rdf/quality.rdf#hasscopepolicy"/>
<rdf:type resource="http://www.desire.org/rdf/quality.rdf#hascollectionpolicy"/>
<rdf:type resource="http://www.desire.org/rdf/quality.rdf#hascontentmgtpolicy"/>
<dc:Title>Social Science Information Gateway</dc:Title>
<dc:Title>SOSIG</dc:Title>
<dc:Subject>Economics, Development, Law, Education, Management, Accountancy, Business, Environmental Issues, Philosophy, Demography , Politics, International Relations, Ethnology, Social Anthropology, Psychology, Feminism, Social Science General, Methodology, Geography, Social Welfare, Community, Disability, Education, Sociology, Government, Military Science, Statistics, Demography</dc:Subject>
<dc:Description>SOSIG is an online catalogue of thousands of high quality Internet resources relevant to social science education and research. Every resource has been selected and described by a librarian or subject specialist.</dc:Description>
<dc:Publisher>University of Bristol</dc:Publisher>
<cld:Type>Collection.Catalogue.Internet.Subject</cld:Type>
</rdf:Description>
</rdf:RDF>
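A service receiving a description of this kind might compare the features a gateway declares against the list of DESIRE Gateway Criteria given above. The Python sketch below is one possible way of doing so. It assumes, for simplicity, that declared features are expressed using the same labels as the criteria list, and it reports coverage rather than deciding what counts as "most", since no threshold is specified in this report.

```python
# The criteria as listed in the "DESIRE Gateway Criteria" section above.
DESIRE_CRITERIA = [
    "Scope Policy",
    "Selection Criteria",
    "Content Management Policy",
    "Educational focus",
    "National or international coverage",
    "Accept recommendations from user community",
    "Evaluation procedures",
    "Classification scheme",
    "Browse facility",
    "Abstracts or descriptions of resource",
    "Search mechanism",
    "Machine searchable interface",
    "Text and graphical access",
    "Meets W3 Web Accessibility Initiative guidelines",
    "Link integrity policy",
    "Record integrity policy",
    "Cataloguing rules",
]


def assess(declared):
    """Split the criteria into those a gateway declares and those it does not.

    Returns (met, missing, coverage), where coverage is the fraction of
    criteria met. Deciding how much coverage is enough is left to the caller.
    """
    met = [c for c in DESIRE_CRITERIA if c in declared]
    missing = [c for c in DESIRE_CRITERIA if c not in declared]
    return met, missing, len(met) / len(DESIRE_CRITERIA)


# Hypothetical declaration covering two of the criteria.
met, missing, coverage = assess({"Scope Policy", "Content Management Policy"})
```

In practice the declared features would be read from the RDF description itself, which would require an agreed mapping between vocabulary items and the criteria.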
One of the suggested uses of a vocabulary is to provide users with a 'browsing companion' that supplies information about resources as they browse the Web. This would also allow them to recommend resources they discover to other users. These recommendations would provide a further measure of "fitness for purpose" within a specified community of users, e.g. within a subject gateway community. These recommendations could be used to prioritise the ordering of resources within search results.
An important issue here is the source of the recommendation. Any number of parties may be making recommendations, including third party services, subject librarians, colleagues and peers. It will be important to verify who is making the recommendation, what their qualifications are for making it, evidence of usage and any motivation (or bias) in making the recommendation. These recommendations will need to be aggregated in some way (perhaps combined with usage data or citation analysis). The use of digital signatures will help the verification process.
In terms of a vocabulary this may be as simple as: "Do you recommend this Resource? - Yes/No". Jakob Nielsen refers to a similar idea for recommending resources in his Alertbox Column:
Typically, this would be done by adding two buttons to the interface: a thumbs-up button and a thumbs-down button. A neutral rating would be given by doing nothing (since we want to minimize overhead in the user interface), but when a user encounters something particularly good, he or she would hit the "good" button. Similarly, disappointing services would be punished by a click on "bad." (Nielsen 1998)
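The simplest form of aggregation for such yes/no recommendations can be sketched in Python as follows. This is a deliberately minimal illustration: the `tally` and `ranked` functions and the example URIs are assumptions of the sketch, and it takes no account of who is making each recommendation, which the discussion above identifies as an important issue.

```python
from collections import Counter


def tally(votes):
    """Sum thumbs-up (+1) and thumbs-down (-1) votes for each resource.

    `votes` is an iterable of (resource, vote) pairs. A neutral rating is
    expressed by casting no vote at all, so it never appears here.
    """
    scores = Counter()
    for resource, vote in votes:
        if vote not in (1, -1):
            raise ValueError("a vote must be +1 (good) or -1 (bad)")
        scores[resource] += vote
    return scores


def ranked(votes):
    """Order resources from most to least recommended, breaking ties by URI."""
    scores = tally(votes)
    return sorted(scores, key=lambda resource: (-scores[resource], resource))


# Hypothetical votes gathered from a community of users.
votes = [
    ("http://example.org/good", 1),
    ("http://example.org/good", 1),
    ("http://example.org/mixed", 1),
    ("http://example.org/mixed", -1),
    ("http://example.org/poor", -1),
]
```

A fuller treatment would weight each vote by the verified identity and qualifications of the recommender, or combine the totals with usage data, before using them to order search results.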
Example of a possible browsing companion service