Project Number:

RE 4004 (RE)

Project Title:

DESIRE II - Development of a European Service for Information on Research and Education II

Deliverable Type:

RP

Deliverable Number:

D3.6b

Contractual Date of Delivery:

Internal Deliverable

Actual Date of Delivery:

15 June 2000

Title of Deliverable:

Enhancements to User Interface

Workpackage(s) contributing to the Deliverable:

WP3

Nature of the Deliverable:

RE

Author:

Phil Cross, Dan Brickley, Traugott Koch

Contact Details:

phil.cross@bristol.ac.uk

daniel.brickley@bristol.ac.uk

Institute for Learning & Research Technology, University of Bristol, 8-10 Berkeley Square, Bristol, UK, http://www.ilrt.bris.ac.uk/

traugott.koch@ub2.lu.se

NETLAB, Lund University, Lund S-221 00, Sweden

http://www.lub.lu.se/koch.html

Other Authors:


URL

http://www.desire.org/html/research/deliverables/D3.6/D3.6b.html

Abstract

This deliverable comprises two sections: the first covers suggested methods for implementing cross-browsing as a means of searching across subject gateways; the second is concerned with the encoding, storage, and use of Web-based hierarchical controlled vocabularies, as aids to the keyword searching of subject gateways. Browsing is a commonly provided complementary option to keyword access to resources indexed on subject gateways. Problems with the browsing paradigm arise, however, when cross-browsing between gateways is implemented. In particular, this report looks at two mechanisms whereby resources from different gateways can be retrieved and displayed, and at the issues that arise when the different gateways implement different classification systems, leading to problems in cross-mapping between the sections of the different schemes. The second part of the deliverable is concerned with the structure of both thesauri and classification systems, covering the different relationships that can exist between the concepts they contain. It also looks at a method of encoding thesauri for storage and/or data transfer, providing a candidate for a standard syntax that uses the Resource Description Framework (RDF) and XML.

Keywords

Subject gateways, Browsing, Cross browsing, Classification, Classification schemes, Resource Description Framework, XML, Metadata, Thesauri


Distribution List:

Public usage

Issue:

1.0

Total Number of Pages:

27 pages


Scope Statement

This report covers two fairly separate subjects, which are both of significance to the further development of Subject Gateways. The original DESIRE project championed the development of Subject Gateways as a means of expediting the discovery of high quality research-level information over the Internet. DESIRE II has continued with this theme, particularly through the publication of the Information Gateways Handbook. It now seems likely that the future development of Subject Gateways will depend upon cooperation between gateways, particularly through interoperability, and other projects such as Renardus (www.renardus.org) and Imesh (www.imesh.org) are studying methods for how this can best be achieved. The first section of this report concerns the problems associated with the cross-browsing of subject gateways and two possible mechanisms by which this may be accomplished.

The second part of the report discusses the types of relationships between documents, between terms in thesauri, and between classes in classification systems; the latter two systems are both used within Subject Gateways as a means of improving subject access. We also consider a basic core subset of these relationships that is relevant for hierarchical thesauri and show how these can be encoded using the newly developed Resource Description Framework (RDF) Schema mechanism, and expressed using XML. Hierarchical controlled vocabularies are now seen as a very useful means of aiding subject access to collections of data, and work is underway to see how such vocabularies can be made multilingual and also how different schemes may be cross-mapped. We believe that the system of encoding thesauri we propose can act as a common format to facilitate the transfer of controlled vocabularies between organisations, and will also facilitate the expression of relationships between the terms of different thesauri and between the different language terms within a single thesaurus.


1. Introduction

This deliverable comprises two sections: the first covers suggested methods for implementing cross-browsing as a means of searching across subject gateways; the second is concerned with the encoding, storage, and use of Web-based hierarchical controlled vocabularies, as aids to the keyword searching of subject gateways.

Browsing is a commonly provided complementary option to keyword access to resources indexed on subject gateways. It is particularly popular with end-users and becomes possible when the resources have been categorised using some structured subject scheme. With existing subject gateways, the latter is usually an hierarchical classification scheme, either home-grown, based upon the resources held within the collection, or else based upon a published general or subject-specific system. With individual resources assigned to the sections within the chosen scheme, the scheme’s hierarchy can be displayed to the user through a series of hyperlinks, each section listing those titles assigned to it. Locating resources by this means is most effective for users who are looking for resources on a particular topic without having a specific title in mind. The hierarchical classification system can present a view of the structure of the subject to the user and lead them through a logical series of broader, narrower, and, possibly, related sections, to the subject area closest to their topic of interest.

Problems with the browsing paradigm arise, however, when cross-browsing between gateways is implemented. One set of questions concerns the interface issue of how the fact of cross-browsing a number of gateways should be presented to the user; other questions concern the mechanism whereby resources from different gateways are retrieved and displayed; and a further issue arises when the different gateways implement different classification systems, leading to problems in cross-mapping between the sections of the different schemes.

The second section of this report concerns the use of thesauri as tools to improve keyword searching of gateways. In essence, there are similarities between thesauri and classification systems: both consist of a representation of the relationships between the entities of the system, these being controlled terms in the case of thesauri, and classes in the case of classification systems.

Their differences are due mostly to the different purposes behind their development: classification systems were developed to enable objects to be physically ordered by subject, using a nomenclature assigned to the different sections of the scheme that can express a systematic order; thesauri were developed for use with abstracting journals to improve indexing of the research articles covered. With the application of both systems to online resources, these differences become less obvious, although the different approaches to their development mean that classification systems are best used for grouping documents to support browsing as described above, while thesauri, using more specific terminology, are used for indexing individual documents to support searching.

The second part of this deliverable is thus concerned with the structure of both thesauri and classification systems, covering the different relationships that can exist between the concepts they contain. It also looks at a method of encoding thesauri for storage and/or data transfer, providing a candidate for a standard syntax that uses the Resource Description Framework (RDF) and XML. Although this method could also be applied to classification systems, this has not been attempted for this report.

This approach describes a thesaurus schema that can be used with the RDF Triplestore API presented in the DESIRE II toolkit <ref>. The triplestore is a database designed for storing the (predicate, subject, object) triples that form the basis of the RDF data model, such as:

(title, document URI, "My Homepage").

When used with the thesaurus RDF Schema, this flexible method of storage has the added benefits of providing a mechanism for storing multilingual thesauri, and of providing a structure suitable for representing cross-mapping between thesaurus schemes (for which some kind of cross-mapping schema would then be required).
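As a rough illustration of this storage model, the following minimal in-memory sketch holds (predicate, subject, object) triples and answers simple pattern queries. It is not the DESIRE II Triplestore API: the class name `TripleStore`, its methods, and the example URI are hypothetical and chosen only for this example.

```python
from typing import Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement, ordered as in the example above."""
    predicate: str
    subject: str
    obj: str


class TripleStore:
    """Hypothetical in-memory store; not the DESIRE II Triplestore API."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, predicate: str, subject: str, obj: str) -> None:
        """Store one triple; duplicates are silently ignored."""
        self._triples.add(Triple(predicate, subject, obj))

    def match(self, predicate: Optional[str] = None,
              subject: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for triple in self._triples:
            if predicate is not None and triple.predicate != predicate:
                continue
            if subject is not None and triple.subject != subject:
                continue
            if obj is not None and triple.obj != obj:
                continue
            yield triple


store = TripleStore()
# The example URI is a placeholder for "document URI".
store.add("title", "http://example.org/home", "My Homepage")
store.add("indicator", "concept_5", "term_7")

titles = [t.obj for t in store.match(predicate="title")]
```

Because every statement has the same three-part shape, adding a new kind of relationship (for instance a cross-mapping property) needs no change to the store itself, only new predicate names.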

2. Cross-browsing paradigms

Subject Gateways use browsing as a method of providing systematic subject access, which is what differentiates them from other rather unstructured information gateways. It is this provision of a mechanism for classifying indexed resources into a structured subject hierarchy, and the provision of a user interface into that structure, which is part of the definition of a subject gateway, as envisaged by the DESIRE projects.

Much work is currently being done on the interoperability of subject gateways to provide a broader range of resources through a single user interface. This requires some mechanism for brokering searches to a number of separate gateways, whether this is provided by a central repository of data or through a cross-searching mechanism. Standards such as Z39.50 and WHOIS++ provide for the latter technique and are being explored in a number of projects such as Renardus <ref> and the IMesh project <ref>. However, there has been little work done on investigating methods of cross-browsing such distributed systems.

There is no standard definition of exactly what constitutes cross-browsing, possibly because there are so few actual instances of it in practice. The broadest definition might simply refer to any technique that allowed the browsing search paradigm to be applied across more than one gateway, whether or not this fact is apparent to the end-user. The major problem in implementing such a system comes from the large number of different classification schemes used by subject gateways and the resulting need to cross-map between them. Due to the different structures of classification schemes and the varying degrees of specificity or depth of the different areas within them, such cross-mapping can be problematical.

Aside from the problem of cross-mapping, however, there is the issue of what mechanism is to be used to present a cross-browsing system to the end-user. We describe below two possible mechanisms by which this may be done. The first method consists of populating the sections of the browse tree of one gateway with links to individual resources held on other, remote Gateways. The second method places some form of link from a sub-section of one tree to an equivalent sub-section in the tree of a remote Gateway, thus allowing the user to literally browse across different services.

2.1. Populating a browse-tree with links to remote resources

This technique has been adopted by the Social Science Information Gateway (SOSIG) based at the Institute for Learning and Research Technology. This subject-based gateway is run using the ROADS gateway software (ROADS Liaison, 2000). This allows it to cross-search other ROADS-based gateways using the WHOIS++ protocol. SOSIG currently cross-searches Biz/ed (Biz/ed Team, 2000), a Web site that contains a business and economics subject gateway. Since SOSIG allows browsing of its resources (using the Universal Decimal Classification system, UDC), it was necessary to incorporate Biz/ed resources into the browse structure as well. The technique used for this was not available within the ROADS software and was devised in-house for SOSIG.

With ROADS subject gateways, browse trees consist of static HTML files that are compiled nightly. The process involves parsing each record to extract details including the title of the resource, its URI, unique record ID, and assigned class numbers. Once this information is collected for all records, a separate HTML browse page can be made up for each class number used, containing a list of all resources assigned to that number. Each item within a page displays the resource title, carries the resource’s URI to provide a hyperlink directly to it, and also holds the record ID. The record ID, hidden from the user, allows a second link to send a request via the ROADS interface to the WHOIS++ server for the full details of the record to be displayed. This link must also include the name of the SOSIG WHOIS++ server so that the request for full record details is sent to the correct address.


Figure 1. SOSIG Economic Geography Section showing three records held on the remote Biz/ed Gateway

So that users can cross-browse the Biz/ed catalogue, Biz/ed resources are also parsed for details of title, URI, class numbers, and record ID, with the data collated and transferred to SOSIG as a single file. This information can then be incorporated into the static SOSIG browse pages, only with a different WHOIS++ server name attached to each item. Users selecting a resource title to see further details need not be aware of whether the request is being sent to the local or a remote server.
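The compilation step described above can be sketched roughly as follows. This is not the ROADS or SOSIG code: the record layout, class numbers, server names, and the `fullrecord?` link format are all hypothetical, and the sketch only shows how local and remote records might be grouped by class number with a server name attached to each item.

```python
from collections import defaultdict
from html import escape
from urllib.parse import urlencode

# Hypothetical record layout, class numbers, and server names.
SOSIG_RECORDS = [
    {"title": "Regional Economics Guide", "uri": "http://example.org/reg",
     "record_id": "S1", "classes": ["911.3"]},
]
BIZED_RECORDS = [
    {"title": "Trade Data Archive", "uri": "http://example.org/trade",
     "record_id": "B7", "classes": ["911.3", "33"]},
]


def build_browse_pages(sources):
    """Return a dict mapping each class number to an HTML list fragment.

    `sources` maps a WHOIS++ server name to the records held on it.
    """
    by_class = defaultdict(list)
    for server, records in sources.items():
        for rec in records:
            for class_no in rec["classes"]:
                by_class[class_no].append(
                    (rec["title"], rec["uri"], rec["record_id"], server))

    pages = {}
    for class_no, items in by_class.items():
        lines = ["<ul>"]
        for title, uri, record_id, server in sorted(items):
            # The second link carries the record ID and server name so the
            # request for full details reaches the right server.
            details = "fullrecord?" + urlencode(
                {"server": server, "id": record_id})
            lines.append(
                f'<li><a href="{escape(uri)}">{escape(title)}</a> '
                f'<a href="{escape(details)}">[full record]</a></li>')
        lines.append("</ul>")
        pages[class_no] = "\n".join(lines)
    return pages


pages = build_browse_pages({
    "sosig.example": SOSIG_RECORDS,
    "bized.example": BIZED_RECORDS,
})
```

In this sketch the only difference between a local and a remote item is the server name embedded in the second link, which mirrors the way the user need not know where a record is held.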

This technique raises certain difficulties and other issues.

2.2. Linking between Gateway hierarchies

The second approach to cross-browsing comprises direct links from within the sub-sections of a browse tree to equivalent sub-sections within the hierarchies of remote catalogues. This involves moving to another gateway rather than using some form of 'GET' command to have information transferred from the other gateway.

Cross-mapping is again an issue here, and it may be necessary to point to a more general level within the other scheme if a structure of equivalent detail is not provided. This is a situation that needs to be considered in the creation of cross-mapping schemas: defining different types of cross-mapping relationships to ensure the user understands the nature of the cross-link encountered.

This approach to cross-browsing has some useful advantages, and a couple of possible disadvantages.

2.2.1. A Possible technical implementation

In implementing such a cross-browsing method, hard-coding individual links within each browse section is undesirable. One possible mechanism we investigated was to make use of a bookmarklet that activates a script to look up a list of related remote gateway pages. These are presented to the user in a newly opened window as a list of hyperlinks (see Figure 2).

Bookmarklets are small JavaScript scripts (actually JavaScript URLs), which are inserted as the value of a standard browser bookmark; when the bookmark is selected, the script is activated. With the latest versions of Netscape Navigator and Internet Explorer, bookmarks can be placed onto the browser toolbar, producing the equivalent of a ‘button’ to activate the script. In the experimental setup, the JavaScript sends the URI of the page currently being viewed to a Perl script residing on a remote server. This latter script performs a simple lookup on a text file, which on each line has a list of the URIs of related browse-sections within different gateways, therefore providing a textual representation of the cross-mapping data.

Figure 2. List of equivalent remote browsing sections generated by a cross-browsing bookmarklet
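The lookup performed by the server-side script can be sketched as below. The experimental script was written in Perl; this Python version is only an illustration, and it assumes that each line of the mapping file holds whitespace-separated URIs of equivalent browse sections. The function names and example URIs are hypothetical.

```python
def load_mappings(lines):
    """Parse mapping lines into groups of equivalent browse-section URIs.

    Assumes whitespace-separated URIs, one group per non-empty line.
    """
    groups = []
    for line in lines:
        uris = line.split()
        if uris:
            groups.append(uris)
    return groups


def related_sections(current_uri, groups):
    """Return the URIs that share a line with `current_uri`."""
    related = []
    for uris in groups:
        if current_uri in uris:
            for uri in uris:
                if uri != current_uri and uri not in related:
                    related.append(uri)
    return related


# Placeholder mapping data; a real file would be maintained by hand.
MAPPING_LINES = [
    "http://gateway-a.example/browse/econ.html "
    "http://gateway-b.example/subjects/economics.html",
    "",
    "http://gateway-a.example/browse/law.html "
    "http://gateway-c.example/law/index.html",
]

groups = load_mappings(MAPPING_LINES)
links = related_sections("http://gateway-a.example/browse/econ.html", groups)
```

A URI that does not appear in the file simply yields an empty list, so the page shown to the user would contain no cross-links for that section.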

This approach is very simple but effective. The same basic approach could also be achieved by configuring Netscape Navigator's ‘What’s related’ facility to point to the remote script, or simply to have hard-coded links pointing to the script, rather than to individual Web pages.

A problem with this system is that a configuration file or database, containing related URIs, has to be constructed (and maintained) by hand, because the script works by recognising the specific URI of the page the user is browsing. The database also has to be updated whenever URIs change. A way to reduce this reliance on URIs would be to make use of metadata within browse pages to provide information on the subject scheme used and subject-area covered. A metadata scheme such as Dublin Core could be used for this purpose. With this information present in the browse pages, a spider program such as DESIRE's Combine harvester (Lundberg, 2000) could extract the subject covered and URI from each page of a subject gateway browse-tree. This information would then be stored in a central database. The various subject schemes used by the gateways would need to be mapped to a single standard overarching scheme.

With this information available, the bookmarklet, or other mechanism on the user's browser, could call a remote script that would download and parse the page the user was currently viewing, extract details of the scheme and subject area covered, and then perform a lookup on the database of subject gateway browse pages, returning a list of comparable browse pages to the user. With this approach, a single bookmarklet would work with any gateway, provided that its browsing scheme was encoded as metadata and cross-mapping to the overarching scheme was available.
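A minimal sketch of this metadata-driven lookup is given below. It assumes that browse pages carry a `<meta name="DC.Subject" scheme="..." content="...">` element, and it uses two small hypothetical tables in place of the harvested database and the mapping to an overarching scheme; none of the names, class numbers, or URIs come from an existing service.

```python
from html.parser import HTMLParser


class SubjectMetaParser(HTMLParser):
    """Collect (scheme, class) pairs from DC.Subject meta elements."""

    def __init__(self):
        super().__init__()
        self.subjects = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name == "dc.subject" and content:
            self.subjects.append((attributes.get("scheme") or "", content))


# Hypothetical mapping: (scheme, local class) -> overarching class.
TO_COMMON = {("UDC", "33"): "economics", ("DDC", "330"): "economics"}

# Hypothetical harvested index: overarching class -> browse page URIs.
INDEX = {
    "economics": [
        "http://gateway-a.example/browse/33.html",
        "http://gateway-b.example/browse/330.html",
    ],
}


def comparable_pages(page_html, page_uri):
    """Return URIs of browse pages covering the same overarching class."""
    parser = SubjectMetaParser()
    parser.feed(page_html)
    results = []
    for key in parser.subjects:
        common = TO_COMMON.get(key)
        for uri in INDEX.get(common, []):
            if uri != page_uri and uri not in results:
                results.append(uri)
    return results


PAGE = ('<html><head><meta name="DC.Subject" scheme="UDC" content="33">'
        '</head><body>Economics</body></html>')
matches = comparable_pages(PAGE, "http://gateway-a.example/browse/33.html")
```

Here the lookup key is the subject metadata rather than the page URI, so a change of URI on the viewed page does not break the lookup, provided the harvested index is refreshed.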

3. Thesauri

3.1. Conceptual relationships for RDF encoding of thesauri, classification systems and organised metadata collections

This section proposes an RDF representation of various conceptual relationships typical of controlled vocabularies such as thesauri, classification systems and organised metadata collections. The aim is to explore the use of RDF as a common formalism for representing a variety of different thesauri and classification systems within the same overall framework. By doing so, we expect to leverage generic RDF facilities (such as query and storage software components), and also to have a basis for mapping between subject classifications expressed using these various vocabularies.

The approach taken here is to divide the problem into two stages. Firstly we define a simple core RDF representation of concepts such as 'broader term' and 'narrower term' typically used in classification and thesauri systems. Then we extend this with a range of more semantically meaningful relationships expressed in terms of classes of objects. Many vocabulary systems have a tacit or unarticulated semantic model obscured behind relatively uninformative relationships such as 'broader' and 'narrower'. It is usually impossible to mechanically derive a richer set of relationships from a system based around these vague, generic relation types. General hierarchical relationships are frequently used to indicate one of several actual relationships. The relationships 'is a', 'has instantiation', and 'has part', for example, might all be encoded using the less informative 'narrower' relation.

The simpler 'core' relations are best thought of as being relationships between named concepts or terms, rather than as relations between real world (or abstract) entities. In other words, while we might say that "Fido is a dog" using a rich, semantic relationship, we would say that "the-term-Fido has-broader-term the-term-dog". The vague 'broader term' relation in this case subsumes the more informative 'is a' relation. The proposal in this document separates out these two approaches since it is crucial to remain unambiguous about when a node ('resource') in an RDF data model represents a named concept or term rather than some less abstract entity, i.e. the "thing in itself".
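The loss of information involved can be illustrated with a short sketch. The labels are those defined later in this section; the direction of each mapping is our reading of the "Fido" example above and should be treated as an assumption, as should the example statements.

```python
# Assumed mapping from richer relationship labels to the core labels.
RICH_TO_CORE = {
    "IsA": "BroaderTerm",
    "HasInstantiation": "NarrowerTerm",
    "IsPartOf": "BroaderTerm",
    "HasPart": "NarrowerTerm",
}


def to_core(statement):
    """Replace a rich relation with its core equivalent, if one is known."""
    subject, relation, obj = statement
    return (subject, RICH_TO_CORE.get(relation, relation), obj)


# Hypothetical statements about terms.
rich = [("Fido", "IsA", "dog"), ("wheel", "IsPartOf", "car")]
core = [to_core(statement) for statement in rich]

# Both statements now use the same relation, so the original distinction
# between 'is a' and 'is part of' cannot be recovered from `core` alone.
distinct_relations = {relation for _, relation, _ in core}
```

The mapping is many-to-one, which is why a richer set of relationships cannot be derived mechanically from data that only records 'broader' and 'narrower'.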

A further reason for creating two distinct RDF representations for these vocabulary systems is that RDF itself includes some common core vocabulary elements which have some overlap in functionality with the semantic modelling facilities required to transform simple flat vocabulary systems into richer knowledge bases. In particular, the RDF specifications define notions of 'Class', 'Property', 'subClassOf', 'type', 'domain' and 'range', which may be applicable to the task described above. By first addressing the need to find a simple RDF representation for the broader/narrower/preferred relationships, i.e. those simple relations which make sense in the context of terms/concepts rather than semantically modelled entities, we should be able to make some initial progress without having to solve the entire problem of 'knowledge modelling in RDF'.

The remainder of this section walks through the desired set of relationships; candidates for the simple core RDF vocabulary are flagged with the note '(For inclusion in the simple core vocabulary)'. The following section sketches how a machine-processable RDF representation of the simple term-oriented concepts of a thesaurus might look, and finally, we give a machine-readable RDF Schema for the simple vocabulary.

In the definitions given below, the term Category is used when the relationship applies to classification systems, Term when the relationship applies to thesauri, and Document when the relationship can be used with individual documents.


A HIERARCHICAL RELATIONSHIPS

Label: BroaderTerm

Term: Broader term

Member of: A

Definition: Term one level up in a hierarchy, without specification of the type of hierarchical relationship

Label: NarrowerTerm

Term: Narrower term

Member of: A

Definition: Term one level down in a hierarchy, without specification of type of hierarchical relationship

(For inclusion in the simple core vocabulary)

A1 Generic relationship

Label: IsA

Term: is a (instance of)

Member of: A1

Definition: Term/Category is an instance of a term/category one level up in the hierarchy

Label: HasInstantiation

Term: has instantiation

Member of: A1

Definition: Term/Category has an instantiation one level below in the hierarchy

A2 Whole-part relationships

Label: IsPartOf

Term: is part of

Member of: A2

Definition: Document/Category/Term represents an (unspecified) part of a document/category/term one level up in the hierarchy

Label: HasPart

Term: has part

Member of: A2

Definition: Document/Category/Term has an (unspecified) part one level below in the hierarchy

Label: IsSpatialPartOf

Term: is spatial part of

Member of: A2

Definition: Term/Category represents a spatial/geographical part of a term/category one level up in the hierarchy

Label: HasSpatialPart

Term: has spatial part

Member of: A2

Definition: Term/Category has a spatial/geographical subterm/subcategory one level below in the hierarchy

Label: IsConceptuallyPartOf

Term: is conceptually part of

Member of: A2

Definition: Term/Category is a subconcept to a term/category one level up in the hierarchy

Label: HasConceptualPart

Term: has conceptual part

Member of: A2

Definition: Term/Category has a subterm/subcategory one level below in the hierarchy

Label: IsCollectionMemberOf

Term: is collection member of

Member of: A2

Definition: Document/Category/Term is member of a group of documents/categories/terms

Label: HasCollectionMember

Term: has collection member

Member of: A2

Definition: Group of documents/categories/terms has member

B EQUIVALENCE RELATIONSHIPS

B1 Single directional equivalence

Label: Use

Term: use, see

Member of: B1

Definition: The term/category pointed to should be preferred

Label: UsedFor

Term: used for

Member of: B1

Definition: The term/category pointed to is the non-preferred term/category

(For inclusion in the simple core vocabulary)

Label: IsVersionOf

Term: is version of

Member of: B1

Definition: The document/category/term pointed to is a version of another document/category/term

Label: HasVersion

Term: has version

Member of: B1

Definition: The document/category/term has another version

B2 Bi-directional equivalence

Label: IsSynonymOf

Term: is synonym of

Member of: B2

Definition: The term is a synonym of the one pointed to

Label: IsFormatOf

Term: is format of

Member of: B2

Definition: The document is a result of a format transformation of the one pointed to

C ASSOCIATIVE RELATIONSHIPS

Label: RelatedTerm

Term: related term, see also, similar to

Member of: C

Definition: The document/category/term pointed to is related (in an unspecified way)

(For inclusion in the simple core vocabulary)

Label: IsReferencedBy

Term: is referenced by

Member of: C

Definition: The document is referenced by the document pointed to

Label: References

Term: references

Member of: C

Definition: The document is referencing the document pointed to

Label: IsRequiredBy

Term: is required by

Member of: C

Definition: The document/object is required by/dependent on the document pointed to

Label: Requires

Term: requires

Member of: C

Definition: The document requires/is dependent on the document/object pointed to

Label: IsBasedOn

Term: is based on

Member of: C

Definition: The document/term is based on the document/term/object pointed to

Label: IsBasisFor

Term: is basis for

Member of: C

Definition: The document/term/object is the basis for the document/term pointed to

Label: IsDerivedFrom

Term: is derived from

Member of: C

Definition: The document/term is derived from the document/term pointed to

Label: HasDerivate

Term: has derivate

Member of: C

Definition: The document/term has the derivate pointed to

Label: IsTranslatedFrom

Term: is translated from

Member of: C

Definition: The document/term is translated from the document/term pointed to

Label: HasTranslation

Term: has translation

Member of: C

Definition: The document/term has the translation pointed to

Label: IsInterpretationOf

Term: is interpretation of

Member of: C

Definition: The document is a (creative, artistic) interpretation of the document/object pointed to

Label: HasInterpretation

Term: has interpretation

Member of: C

Definition: The document has a (creative, artistic) interpretation pointed to

Label: IsMappedTo

Term: is mapped to

Member of: C

Definition: The document/category/term is mapped to the document/category/term pointed to

Label: HasMapping

Term: has mapping

Member of: C

Definition: The document/category/term has this document/category/term mapped to it

Label: IsLinkedFrom

Term: is linked from

Member of: C

Definition: The document/category/term is linked to from the document/category/term pointed to

Label: HasLinkTo

Term: has link to

Member of: C

Definition: The document/category/term is linking to the document/category/term pointed to

Label: IsSameLevelNeighbour

Term: is same level neighbour

Member of: C

Definition: The document/category/term is a neighbour on the same level of an organisational structure to the document/category/term pointed to

Label: IsTopologicalNearestNeighbour

Term: is topological nearest neighbour

Member of: C

Definition: The document/category/term is a topologically nearest neighbour in an organisational structure to the document/category/term pointed to

3.2. RDF core vocabulary

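The relationships marked above for inclusion in the simple core vocabulary are candidates for a simple core set of relationships for a thesaurus. They are: Broader term, Narrower term, Use, Used for, and Related term.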

This terminology is taken from ISO 2788: Guidelines for the establishment and development of monolingual thesauri (International Organization for Standardisation, 1986).

The terminology deals with the terms themselves, that is, the lexical representation of concepts. For the creation of an RDF schema for storing structured vocabularies, we decided to differentiate between the lexical representation of a concept and the concept itself. It was felt that the unique resource should be the concept, each concept resource being indicated by one or more term resources. Thus the RDF resource used to represent cats would be indicated by a term whose value was the word "cats". This is represented by the graph in Figure 3 below.

Figure 3. RDF graph representation of the concept representing a cat (concept 5)

In figure 3, concept_5 represents the concept of cats. Its indicator is a term (term_7) whose value is the text string "cats". Another term indicating the concept might have the value "chats".

As a result of the above approach, the RDF schema refers to relationships between concepts rather than between terms, and this is reflected in the vocabulary used below, e.g. broaderConcept rather than broaderTerm.

Whilst the relationships 'broader', 'narrower', and 'related' are still meaningful when considering concepts rather than terms, the relationships 'use' and 'used for' refer only to terms. This is because 'use' and 'used for' indicate which particular term has been chosen to represent the relevant concept when indexing some resource. For the core RDF vocabulary, then, these relationships have instead been represented by a property of the term resources. This property is called 'termUsage' and has the value 'preferred' or 'nonPreferred'. The second issue considered was that, since broaderTerm and narrowerTerm are inverses of one another, i.e.

A narrowerTerm B implies

B broaderTerm A,

utilising both relationships when storing or transferring the vocabulary data would be inefficient. We therefore decided to create a relationship 'broaderConcept' for the RDF Schema but not 'narrowerConcept', as this is implied; it being the responsibility of any application using the data to deduce the opposite relationship and present it to the user.
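An application might deduce the implied relationship along the following lines. This is a minimal sketch, assuming the stored data is available as (concept, broader concept) pairs; the function name and the example concepts are hypothetical.

```python
from collections import defaultdict


def narrower_index(broader_pairs):
    """Invert stored 'broaderConcept' pairs into a narrower-concept lookup.

    `broader_pairs` holds (concept, broader_concept) tuples. The result maps
    each broader concept to a sorted list of its narrower concepts.
    """
    narrower = defaultdict(list)
    for concept, broader in broader_pairs:
        narrower[broader].append(concept)
    return {key: sorted(values) for key, values in narrower.items()}


# Hypothetical stored data: only the 'broader' direction is recorded.
BROADER = [
    ("cats", "mammals"),
    ("dogs", "mammals"),
    ("mammals", "animals"),
]

narrower = narrower_index(BROADER)
```

Only one direction is stored, so the two views of the hierarchy cannot drift out of step with one another.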

The second relationship between concepts chosen for the schema was 'relatedConcept'. This relationship is bi-directional (symmetric), and hence if the relationship

A relatedConcept B exists, then it is implied that

B relatedConcept A is also true.

Hence we only add one of the two possible pairs to the datastore.

A further attribute often used within thesauri is 'top term'. This indicates a term that is at the top of a hierarchy within the thesaurus. Since this is a property that may be deduced by an application from the lack of a broaderConcept property for that concept, this attribute is also left out of the schema.
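Both deductions left to the application, the reverse direction of 'relatedConcept' and the identification of top terms, can be sketched as follows. The helper names and example concepts are hypothetical, and the data is assumed to be held as simple pairs.

```python
def are_related(a, b, related_pairs):
    """Treat 'relatedConcept' as symmetric: check the pair in either order."""
    return (a, b) in related_pairs or (b, a) in related_pairs


def top_concepts(concepts, broader_pairs):
    """Return the concepts that have no 'broaderConcept' property."""
    has_broader = {concept for concept, _ in broader_pairs}
    return sorted(c for c in concepts if c not in has_broader)


# Hypothetical data: each related pair is stored once only.
CONCEPTS = ["animals", "cats", "mammals", "veterinary science"]
BROADER = [("cats", "mammals"), ("mammals", "animals")]
RELATED = {("animals", "veterinary science")}

tops = top_concepts(CONCEPTS, BROADER)
```

Note that under this rule a concept with no hierarchical links at all is also reported as a top concept; whether that is desirable depends on the thesaurus.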

broaderConcept and relatedConcept were therefore selected as the only two core relationships between concepts that would be required for a basic RDF vocabulary schema. Other properties are required, however, to allow the encoding of thesauri, taking into account the recommendations of ISO 2788 (International Organization for Standardisation, 1986) and general thesaurus usage. These are listed in the next section, which describes the proposed RDF thesaurus schema.

3.3. RDF Thesaurus Schema

3.3.1. Resource Description Framework Schemas

The Resource Description Framework (RDF) is a W3C (World Wide Web Consortium, 2000) recommendation for representing structured data on the Web. RDF, like both the Web and thesaurus systems, is based around a strategy of managing information as a collection of links between uniquely named entities. RDF's Web-based information model uses the term 'resource' to refer to the entities that it models, and provides an application-neutral framework within which various kinds of entities and relationships can be described. A general introduction to RDF is beyond the scope of this document. The W3C home page for RDF (Swick, 2000) lists a number of introductory tutorials as well as the RDF specifications.

In this document we describe the application of RDF to the description of thesaurus-like data structures. Specifically, we show how the RDF data model can represent a Web of inter-related concepts and terms from one or more thesauri. To do this, we define a simple RDF vocabulary that uses Web identifiers (Uniform Resource Identifiers) to name some relationships and resource types useful for the description of concepts and terms in a thesaurus. It should be noted that we do not here attempt to model the richer semantic relationships that hold between the entities denoted by such concepts, although RDF itself can also be used to represent this kind of information.

3.3.2. Proposal for an RDF Thesaurus Schema

The XML/RDF thesaurus schema is set out in Appendix A. An example set of XML/RDF thesaurus data is given in Appendix B.

As described above, the schema consists of two main resources: Concept and Term. Concept resources are related by the properties: 'broaderConcept' and 'relatedConcept'. Concepts have a property 'indicator' which points to one or more term resources. The value of each Term resource will be the actual text string.

As noted above, Term resources have an optional property called 'termUsage', which can be used with those thesauri that have non-preferred terms linked to preferred terms through the 'use'/'used for' relationships. The value of termUsage must be one of the two TermUsageValue resources defined in the schema: 'preferred' or 'nonPreferred'.

A second Term property is 'lang', which can be used to indicate the language of the term. A single concept can thus be 'indicated' by both preferred and non-preferred terms, and by terms from different languages (there is likely to be one preferred term for each language). The thesaurus schema therefore provides a mechanism for storing multilingual thesauri. If an English term and a German term both 'indicate' the same concept resource, it is implied that the two terms are either equivalent, or at least are treated as such for indexing purposes.
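As a rough illustration of how an application might use 'termUsage' and 'lang' together, the sketch below selects the preferred term for a requested language from the terms that indicate one concept. The function name, the tuple layout, and the non-preferred and German entries are assumptions made for this example; only 'Friends' comes from the sample data in Appendix B.

```python
from typing import List, NamedTuple, Optional


class Term(NamedTuple):
    value: str   # the text string (rdf:value)
    lang: str    # language code, the 'lang' property
    usage: str   # 'preferred' or 'nonPreferred', the 'termUsage' property


def preferred_term(indicators: List[Term], lang: str) -> Optional[str]:
    """Return the preferred term for `lang`, or None if the concept has none."""
    for term in indicators:
        if term.lang == lang and term.usage == "preferred":
            return term.value
    return None


# Terms indicating a single concept. 'Pals' and 'Freunde' are invented
# for illustration and do not appear in the sample thesaurus data.
friends = [
    Term("Friends", "en", "preferred"),
    Term("Pals", "en", "nonPreferred"),
    Term("Freunde", "de", "preferred"),
]

print(preferred_term(friends, "en"))
print(preferred_term(friends, "de"))
print(preferred_term(friends, "fr"))
```

Because both language versions hang off the same concept, a search interface could switch display language without changing the concept identifiers used for indexing.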

It may be considered necessary to recognise relationships between terms of different languages other than 'exactly equivalent', such as recognising that the equivalent term is broader in meaning, or where a single term in one language can be represented by two or more terms in another. In such a case, separate sets of concepts could be used for the different languages, with a new set of properties devised to indicate the different types of relationship between them, rather than using the 'lang' property.
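The alternative outlined in this paragraph could be sketched as follows, with one set of concepts per language and explicit mapping statements between them. The property names 'exactEquivalent' and 'narrowerEquivalent', the prefixed concept identifiers, and the example concepts are all hypothetical; the schema proposed here defines none of them.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping properties; the schema does not define them.
EXACT = "exactEquivalent"        # target means the same as the source
NARROWER = "narrowerEquivalent"  # target is narrower in meaning than the source

# (source concept, mapping property, target concept)
Mapping = Tuple[str, str, str]

MAPPINGS: List[Mapping] = [
    ("en:Friends", EXACT, "de:Freunde"),
    # One French concept covered by two narrower English concepts.
    ("fr:Mouton", NARROWER, "en:Sheep"),
    ("fr:Mouton", NARROWER, "en:Mutton"),
]


def mapped_concepts(concept: str, mappings: List[Mapping]) -> Dict[str, List[str]]:
    """Group the targets mapped from `concept` by mapping property."""
    result: Dict[str, List[str]] = {}
    for source, prop, target in mappings:
        if source == concept:
            result.setdefault(prop, []).append(target)
    return result


print(mapped_concepts("fr:Mouton", MAPPINGS))
print(mapped_concepts("en:Friends", MAPPINGS))
```

The cost of this approach is that equivalence is no longer implied by a shared concept resource and must be stated for every pair of concepts.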

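There are two further optional properties that are permissible for Concept resources: 'scope' and 'conceptCode'. The value of 'scope' is a ScopeNote resource, that is, a note indicating meaning within the indexing language; 'conceptCode' holds any code assigned to the concept by the thesaurus.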


Appendix A: RDF/XML Thesaurus Schema

<rdf:RDF xml:lang="en"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
    <rdfs:Class rdf:ID="Concept">
       <rdfs:comment>
         A unique concept defined within a thesaurus. Instances 
         use the rdfs:isDefinedBy property with a vocabulary 
         namespace as its value, to indicate the vocabulary to
         which the concept belongs.
       </rdfs:comment>
       <rdfs:subClassOf
         rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Resource"/>
    </rdfs:Class>
    <rdfs:Class rdf:ID="Term">
       <rdfs:comment>
          Instances of this class represent the written forms of 
          Concepts. The string is given by the rdf:value of Term.
       </rdfs:comment>
       <rdfs:subClassOf
          rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Resource"/>
    </rdfs:Class>
    <rdfs:Class rdf:ID="ScopeNote">
       <rdfs:comment>
          The value of this optional resource is a scope note: 
          a note attached to a term to indicate its meaning within 
          an indexing language.
       </rdfs:comment>
       <rdfs:subClassOf
        rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Resource"/>
    </rdfs:Class>
    <rdfs:Class rdf:ID="TermUsageValue">
       <rdfs:comment>
         The value of the property: termUsage. It can take one of two
         values: 'preferred' or 'nonPreferred'.
       </rdfs:comment>
       <rdfs:subClassOf
         rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Resource"/>
    </rdfs:Class>
    <rdf:Property rdf:ID="broaderConcept">
        <rdfs:comment>
          This schema does not define a property 'narrowerConcept', 
          but applications can assume the existence of a property 
          narrowerConcept such that if: 
          {broaderConcept,ConceptA,ConceptB}, then 
          {narrowerConcept,ConceptB,ConceptA} is true.
        </rdfs:comment>
        <rdfs:domain rdf:resource="#Concept"/>
        <rdfs:range rdf:resource="#Concept"/>
    </rdf:Property>
    <rdf:Property rdf:ID="relatedConcept">
        <rdfs:comment>
          The relatedConcept property is symmetric, such that if:
          {relatedConcept,ConceptA,ConceptB}, then
          {relatedConcept,ConceptB,ConceptA} is true.
        </rdfs:comment>
        <rdfs:domain rdf:resource="#Concept"/>
        <rdfs:range rdf:resource="#Concept"/>
    </rdf:Property>
    <rdf:Property rdf:ID="indicator">
        <rdfs:comment>
          A mandatory property of a Concept whose value is 
          the Term instance representing a written form of the 
          Concept. A Concept may have as an indicator more than
          one Term. A Term may only be an indicator of one 
          Concept.
        </rdfs:comment>
        <rdfs:domain rdf:resource="#Concept"/>
        <rdfs:range rdf:resource="#Term"/>
    </rdf:Property>
    <rdf:Property rdf:ID="conceptCode">
        <rdfs:comment>
          An optional property for any code assigned to 
          thesaurus concepts.
        </rdfs:comment>
        <rdfs:domain rdf:resource="#Concept"/>
        <rdfs:range
          rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal"/>
    </rdf:Property>
    <rdf:Property rdf:ID="scope">
        <rdfs:comment>
          This optional property has as its value an instance of
          the resource ScopeNote.
        </rdfs:comment>
        <rdfs:domain rdf:resource="#Concept"/>
        <rdfs:range
          rdf:resource="#ScopeNote"/>
    </rdf:Property>
    <rdf:Property rdf:ID="lang">
       <rdfs:comment>
         Optional property that can be used to give the language
         of a Term instance. The codes from "ISO 639:1988,
         Code for the representation of names of languages" should 
         be used as the values for this property.
       </rdfs:comment>
        <rdfs:domain rdf:resource="#Term"/>
        <rdfs:range
          rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal"/>
    </rdf:Property>
    <rdf:Property rdf:ID="termUsage">
        <rdfs:comment>
          This optional property indicates whether the Term
          instance is the 'preferred' or 'nonPreferred' textual
          expression of the Concept instance that is 'indicated'
          by the Term, for a given language.
        </rdfs:comment>
        <rdfs:domain rdf:resource="#Term"/>
        <rdfs:range rdf:resource="#TermUsageValue"/>
    </rdf:Property>
    <rdf:Description rdf:ID="preferred">
      <rdf:type rdf:resource="#TermUsageValue"/>
    </rdf:Description>
    <rdf:Description rdf:ID="nonPreferred">
      <rdf:type rdf:resource="#TermUsageValue"/>
    </rdf:Description>
</rdf:RDF>
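
The comments on broaderConcept and relatedConcept in the schema above describe two inferences that applications may draw. A minimal sketch of how an application might materialise them is given below; the function name and the tuple representation are assumptions for illustration, with the triples written in the {property, subject, object} order used in the schema comments.

```python
from typing import Set, Tuple

# (property, subject, object), following the order used in the schema comments
Triple = Tuple[str, str, str]


def with_implied(triples: Set[Triple]) -> Set[Triple]:
    """Return the stated triples plus those implied by the schema comments."""
    implied = set(triples)
    for prop, subj, obj in triples:
        if prop == "broaderConcept":
            # The inverse relationship is not stored but may be assumed.
            implied.add(("narrowerConcept", obj, subj))
        elif prop == "relatedConcept":
            # relatedConcept holds in both directions.
            implied.add(("relatedConcept", obj, subj))
    return implied


# Concept identifiers shortened from those used in Appendix B.
stated = {
    ("broaderConcept", "CID_6", "CID_8"),
    ("relatedConcept", "CID_6", "CID_15"),
}

for triple in sorted(with_implied(stated)):
    print(triple)
```

Deriving the inverse statements at load time keeps the stored data small while still allowing a browsing interface to move down the hierarchy as well as up.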


Appendix B: Sample Thesaurus Metadata Expressed Using the RDF/XML Thesaurus Schema

The example below shows the relationships between three concepts, whose term values are: 'Interpersonal Attraction', 'Interpersonal Relations', and 'Friends'. A graph representation of the RDF follows the XML representation (excluding the 'scope' property).

<web:RDF xml:lang="en"
   xmlns:thes="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#"
   xmlns:web="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
<web:Description web:about="http://sosig.ac.uk/hasset/terms/TID_3">
   <web:type web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#Term"/>
   <thes:lang>en</thes:lang>
   <web:value>Interpersonal Attraction</web:value>
   <thes:termUsage web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#preferred"/>
</web:Description>
<web:Description web:about="http://sosig.ac.uk/hasset/concepts/CID_6">
   <web:type web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#Concept"/>
   <rdfs:isDefinedBy web:resource="http://sosig.ac.uk/hasset/concepts/"/>
   <thes:indicator web:resource="http://sosig.ac.uk/hasset/terms/TID_3"/>
   <thes:conceptCode>768</thes:conceptCode>
   <thes:broaderConcept>
      <web:Description web:about="http://sosig.ac.uk/hasset/concepts/CID_8">
         <rdfs:isDefinedBy web:resource="http://sosig.ac.uk/hasset/concepts/"/>
         <thes:indicator web:resource="http://sosig.ac.uk/hasset/terms/TID_15"/>
         <thes:conceptCode>769</thes:conceptCode>
      </web:Description>
   </thes:broaderConcept>
   <thes:relatedConcept web:resource="http://sosig.ac.uk/hasset/concepts/CID_15"/>
</web:Description>
<web:Description web:about="http://sosig.ac.uk/hasset/concepts/CID_15">
   <web:type web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#Concept"/>
   <rdfs:isDefinedBy web:resource="http://sosig.ac.uk/hasset/concepts/"/>
   <thes:indicator web:resource="http://sosig.ac.uk/hasset/terms/TID_21"/>
   <thes:conceptCode>780</thes:conceptCode>
   <thes:scope web:resource="http://sosig.ac.uk/hasset/scopenotes/SN_12"/>
</web:Description>
<web:Description web:about="http://sosig.ac.uk/hasset/terms/TID_15">
   <web:type web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#Term"/>
   <thes:lang>en</thes:lang>
   <web:value>Interpersonal Relations</web:value>
   <thes:termUsage web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#preferred"/>
</web:Description>
<web:Description web:about="http://sosig.ac.uk/hasset/terms/TID_21">
   <web:type web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#Term"/>
   <thes:lang>en</thes:lang>
   <web:value>Friends</web:value>
   <thes:termUsage web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#preferred"/>
</web:Description>
<web:Description web:about="http://sosig.ac.uk/hasset/scopenotes/SN_12">
   <web:type web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#ScopeNote"/>
   <thes:lang>en</thes:lang>
   <web:value>To be used only for platonic relationships</web:value>
</web:Description>
</web:RDF>
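
To show how data of this kind might be consumed, the sketch below reads a cut-down version of the sample with Python's standard xml.etree.ElementTree module and resolves each concept to its term string. It is not a general RDF parser: it assumes flat Description elements, ignores nested descriptions such as the one inside broaderConcept above, and the helper names are ours. The helper that reads attributes accepts both the namespace-qualified and the unqualified forms of 'about' and 'resource'.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
THES = "http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#"

# A reduced extract of the sample data, limited to two concepts and their terms.
SAMPLE = """
<web:RDF xmlns:web="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:thes="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#">
 <web:Description web:about="http://sosig.ac.uk/hasset/terms/TID_3">
  <web:value>Interpersonal Attraction</web:value>
 </web:Description>
 <web:Description web:about="http://sosig.ac.uk/hasset/terms/TID_21">
  <web:value>Friends</web:value>
 </web:Description>
 <web:Description web:about="http://sosig.ac.uk/hasset/concepts/CID_6">
  <thes:indicator web:resource="http://sosig.ac.uk/hasset/terms/TID_3"/>
  <thes:relatedConcept web:resource="http://sosig.ac.uk/hasset/concepts/CID_15"/>
 </web:Description>
 <web:Description web:about="http://sosig.ac.uk/hasset/concepts/CID_15">
  <thes:indicator web:resource="http://sosig.ac.uk/hasset/terms/TID_21"/>
 </web:Description>
</web:RDF>
"""


def attr(elem, name):
    """Read an RDF attribute whether or not it is namespace-qualified."""
    return elem.get("{%s}%s" % (RDF, name)) or elem.get(name)


root = ET.fromstring(SAMPLE.strip())
term_text = {}  # term URI -> text string
triples = []    # (subject URI, property local name, object URI)
for desc in root.iter("{%s}Description" % RDF):
    subject = attr(desc, "about")
    for child in desc:
        if child.tag == "{%s}value" % RDF:
            term_text[subject] = child.text
        elif child.tag.startswith("{%s}" % THES) and attr(child, "resource"):
            triples.append((subject, child.tag.split("}")[1], attr(child, "resource")))


def label(concept):
    """Return the text of the first term that indicates `concept`, if any."""
    for subj, prop, obj in triples:
        if subj == concept and prop == "indicator":
            return term_text.get(obj)
    return None


for subj, prop, obj in triples:
    if prop == "relatedConcept":
        print(label(subj), "--", prop, "->", label(obj))
```

A production system would more likely use a dedicated RDF parser, which would also handle nested descriptions and the other abbreviated forms of the syntax.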



PART IV

References

Biz/ed Team. (Accessed June 2000). Biz/ed. http://www.bized.ac.uk/

Institute for Learning and Research Technology. (Accessed 2000). http://www.ilrt.bristol.ac.uk/

International Organization for Standardisation. (1986). ISO 2788: Guidelines for the establishment and development of monolingual thesauri, 2nd ed. Geneva: ISO.

Lundberg, Sigfrid. (Accessed June 2000). Combine System Homepage. http://www.lub.lu.se/combine/

ROADS Liaison. (Accessed June 2000). ROADS: Resource Organisation and Discovery in the Subject-based Services. http://www.ilrt.bris.ac.uk/roads/

Social Science Information Gateway. (Accessed June 2000). http://www.sosig.ac.uk

Swick, Ralph et al. (Accessed June 2000). W3C Resource Description Framework. http://www.w3.org/RDF/

World Wide Web Consortium. (Accessed June 2000). W3C – The World Wide Web Consortium. http://www.w3.org/


Title:
Issue: 1.0
Date: 15 June 2000