Conceptual relationships for encoding thesauri, classification systems and organised metadata collections and a proposal for encoding a core set of thesaurus relationships using an RDF Schema

Authors: Phil Cross <phil.cross@bristol.ac.uk>, Dan Brickley <daniel.brickley@bristol.ac.uk> (Institute for Learning & Research Technology, University of Bristol, UK), Traugott Koch <traugott.koch@ub2.lu.se> (NETLAB, Lund University, Sweden)

Last modified: 06-Jun-00

1. Introduction

This paper proposes an
RDF representation of various conceptual relationships typical of controlled
vocabularies such as thesauri, classification systems and organised metadata collections.
The aim is to explore the use of RDF as a common formalism for representing a
variety of different thesauri and classification systems within the same
overall framework. By doing so, we expect to leverage generic RDF facilities
(such as query and storage software components), and also to have a basis for
mapping between subject classifications expressed using these various
vocabularies.

The approach taken here
is to divide the problem into two stages. Firstly we define a simple core RDF
representation of concepts such as 'broader term' and 'narrower term'
typically used in classification and thesaurus systems. Then we extend this
with a range of more semantically meaningful relationships expressed in terms
of classes of objects. Many vocabulary systems have a tacit or unarticulated
semantic model obscured behind relatively uninformative relationships such as
'broader' and 'narrower'. It is usually impossible to mechanically derive a
richer set of relationships from a system based around these vague, generic
relation types. General hierarchical relationships are frequently used to
indicate one of several actual relationships. The relationships 'is a', 'has
instantiation', and 'has part', for example, might all be encoded using the
less informative 'narrower' relation.

The simpler 'core'
relations are best thought of as being relationships between named concepts
or terms, rather than as relations between real world (or abstract) entities.
In other words, while we might say that "Fido is a dog" using a
rich, semantic relationship, we would say that "the-term-Fido has-broader-term
the-term-dog". The vague 'broader term' relation in this case subsumes
the more informative 'is a' relation. The proposal in this document separates
out these two approaches since it is crucial to remain unambiguous about when
a node ('resource') in an RDF data model represents a named concept or term
rather than some less abstract entity, i.e. the "thing in itself".
A further reason for
creating two distinct RDF representations for these vocabulary systems is
that RDF itself includes some common core vocabulary elements which have some
overlap in functionality with the semantic modelling facilities required to
transform simple flat vocabulary systems into richer knowledge bases. In
particular, the RDF specifications define notions of 'Class', 'Property',
'subClassOf', 'type', 'domain' and 'range', which may be applicable to the
task described above. By first addressing the need to find a simple RDF
representation for the broader/narrower/preferred relationships, i.e. those
simple relations which make sense in the context of terms/concepts rather
than semantically modelled entities, we should be able to make some initial
progress without having to solve the entire problem of 'knowledge modelling
in RDF'.

The next section walks
through the desired set of relationships, using bold type to indicate
candidates for the simple core RDF vocabulary. The following section sketches
how a machine-processable RDF representation of the simple term-oriented
concepts of a thesaurus might look, and finally, we give a machine-readable
RDF Schema for the simple vocabulary.

In the definitions given
below, the term Category is used when the relationship applies to
classification systems, Term when the relationship applies to
thesauri, and Document when the relationship can be used with
individual documents.

2. Conceptual relationships for encoding thesauri, classification systems and organised metadata collections

A HIERARCHICAL RELATIONSHIPS

Label: BroaderTerm
Term: Broader term
Member of: A
Definition: Term one level up in a hierarchy, without specification of the type of hierarchical relationship

Label: NarrowerTerm
Term: Narrower term
Member of: A
Definition: Term one level down in a hierarchy, without specification of the type of hierarchical relationship

(For inclusion in the simple core vocabulary)

A1 GENERIC RELATIONSHIP

Label: IsA
Term: is a (instance of)
Member of: A1
Definition: Term/Category is an instance of a term/category one level up in the hierarchy

Label: HasInstantiation
Term: has instantiation
Member of: A1
Definition: Term/Category has an instantiation one level below in the hierarchy

A2 Whole-part relationships

Label: IsPartOf
Term: is part of
Member of: A2
Definition: Document/Category/Term represents an (unspecified) part of a document/category/term one level up in the hierarchy

Label: HasPart
Term: has part
Member of: A2
Definition: Document/Category/Term has an (unspecified) part one level below in the hierarchy

Label: IsSpatialPartOf
Term: is spatial part of
Member of: A2
Definition: Term/Category represents a spatial/geographical part of a term/category one level up in the hierarchy

Label: HasSpatialPart
Term: has spatial part
Member of: A2
Definition: Term/Category has a spatial/geographical subterm/subcategory one level below in the hierarchy

Label: IsConceptuallyPartOf
Term: is conceptually part of
Member of: A2
Definition: Term/Category is a subconcept of a term/category one level up in the hierarchy

Label: HasConceptualPart
Term: has conceptual part
Member of: A2
Definition: Term/Category has a subterm/subcategory one level below in the hierarchy

Label: IsCollectionMemberOf
Term: is collection member of
Member of: A2
Definition: Document/Category/Term is a member of a group of documents/categories/terms

Label: HasCollectionMember
Term: has collection member
Member of: A2
Definition: Group of documents/categories/terms has member

B EQUIVALENCE RELATIONSHIPS

B1 Single directional equivalence

Label: Use
Term: use, see
Member of: B1
Definition: The term/category pointed to should be preferred

Label: UsedFor
Term: used for
Member of: B1
Definition: The term/category pointed to is the non-preferred term/category

(For inclusion in the simple core vocabulary)

Label: IsVersionOf
Term: is version of
Member of: B1
Definition: The document/category/term pointed to is a version of another document/category/term

Label: HasVersion
Term: has version
Member of: B1
Definition: The document/category/term has another version

B2 Bi-directional equivalence

Label: IsSynonymOf
Term: is synonym of
Member of: B2
Definition: The term is a synonym of the one pointed to

Label: IsFormatOf
Term: is format of
Member of: B2
Definition: The document is a result of a format transformation of the one pointed to

C ASSOCIATIVE RELATIONSHIPS

Label: RelatedTerm
Term: related term, see also, similar to
Member of: C
Definition: The document/category/term pointed to is related (in an unspecified way)

(For inclusion in the simple core vocabulary)

Label: IsReferencedBy
Term: is referenced by
Member of: C
Definition: The document is referenced by the document pointed to

Label: References
Term: references
Member of: C
Definition: The document is referencing the document pointed to

Label: IsRequiredBy
Term: is required by
Member of: C
Definition: The document/object is required by/dependent on the document pointed to

Label: Requires
Term: requires
Member of: C
Definition: The document requires/is dependent on the document/object pointed to

Label: IsBasedOn
Term: is based on
Member of: C
Definition: The document/term is based on the document/term/object pointed to

Label: IsBasisFor
Term: is basis for
Member of: C
Definition: The document/term/object is the basis for the document/term pointed to

Label: IsDerivedFrom
Term: is derived from
Member of: C
Definition: The document/term is derived from the document/term pointed to

Label: HasDerivate
Term: has derivate
Member of: C
Definition: The document/term has the derivate pointed to

Label: IsTranslatedFrom
Term: is translated from
Member of: C
Definition: The document/term is translated from the document/term pointed to

Label: HasTranslation
Term: has translation
Member of: C
Definition: The document/term has the translation pointed to

Label: IsInterpretationOf
Term: is interpretation of
Member of: C
Definition: The document is a (creative, artistic) interpretation of the document/object pointed to

Label: HasInterpretation
Term: has interpretation
Member of: C
Definition: The document has a (creative, artistic) interpretation pointed to

Label: IsMappedTo
Term: is mapped to
Member of: C
Definition: The document/category/term is mapped to the document/category/term pointed to

Label: HasMapping
Term: has mapping
Member of: C
Definition: The document/category/term has this document/category/term mapped to it

Label: IsLinkedFrom
Term: is linked from
Member of: C
Definition: The document/category/term is linked to from the document/category/term pointed to

Label: HasLinkTo
Term: has link to
Member of: C
Definition: The document/category/term is linking to the document/category/term pointed to

Label: IsSameLevelNeighbour
Term: is same level neighbour
Member of: C
Definition: The document/category/term is a neighbour on the same level of an organisational structure to the document/category/term pointed to

Label: IsTopologicalNearestNeighbour
Term: is topological nearest neighbour
Member of: C
Definition: The document/category/term is a topologically nearest neighbour in an organisational structure to the document/category/term pointed to

3. RDF core vocabulary

The relationships in bold
above are candidates for a simple core set of relationships for a thesaurus.
They are:

· BroaderTerm
· NarrowerTerm
· Use
· UsedFor
· RelatedTerm

This terminology is taken
from ISO 2788: Guidelines for the establishment and development of
monolingual thesauri (International Organization for Standardisation, 1986). The terminology deals
with the terms themselves, that is, the lexical representation of concepts.
For the creation of an RDF schema for storing structured vocabularies, we
decided to differentiate between the lexical representation of a concept and
the concept itself. It was felt that the unique resource should be the concept,
each concept resource being indicated by one or more term
resources. Thus the RDF resource used to represent cats, would be indicated
by a term whose value was the word "cats". This is represented by
the graph in figure 3 below.
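
As a rough illustration of this split between a concept and the terms that indicate it, the following Python sketch models the two kinds of resource. The class names, field names and example URIs are assumptions made for illustration only; the identifiers concept_5 and term_7 follow figure 3, and term_8 is a hypothetical second term.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Term:
    """Lexical representation of a concept (hypothetical model)."""
    uri: str
    value: str  # the text string, e.g. "cats"


@dataclass
class Concept:
    """The unique resource; it is indicated by one or more terms."""
    uri: str
    indicators: List[Term] = field(default_factory=list)

    def labels(self) -> List[str]:
        """Return the text strings of all terms indicating this concept."""
        return [term.value for term in self.indicators]


# Identifiers follow figure 3; the example.org URIs are made up.
term_7 = Term(uri="http://example.org/terms/term_7", value="cats")
term_8 = Term(uri="http://example.org/terms/term_8", value="chats")
concept_5 = Concept(
    uri="http://example.org/concepts/concept_5",
    indicators=[term_7, term_8],
)

print(concept_5.labels())
```

Keeping the text string on the term rather than on the concept means that several written forms can point at one concept without the concept's identity depending on any of them.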

Figure 3. RDF graph representation of the concept representing a cat (concept 5)

In figure 3, concept_5
represents the concept of cats. Its indicator is a term
(term_7) whose value is the text string "cats". Another term
indicating the concept might have the value "chats".

As a result of the above
approach, the RDF schema refers to relationships between concepts rather than
between terms, and this is reflected in the vocabulary used below, e.g.
broaderConcept rather than broaderTerm.

Whilst the relationships 'broader', 'narrower' and 'related' are still meaningful when considering
concepts rather than terms, the relationships 'use' and 'used for' refer only
to terms. This is because 'use' and 'used for' indicate which particular term
has been chosen to represent the relevant concept when indexing
some resource. For the core RDF vocabulary, these relationships have
therefore been represented by a property of the term resources, 'termUsage',
which takes the value 'preferred' or 'nonPreferred'.

The second issue considered was that, since broaderTerm and
narrowerTerm are inverses of each other (A narrowerTerm B implies B broaderTerm A),
storing or transferring both relationships would be redundant.
We therefore decided to create a relationship 'broaderConcept' for
the RDF Schema but not 'narrowerConcept'; the latter is implied, and it is the
responsibility of any application using the data to deduce the inverse
relationship and present it to the user.

The second relationship
between concepts chosen for the schema was 'relatedConcept'. This relationship is
symmetric: if the relationship A relatedConcept B
exists, then it is implied that B relatedConcept A is
also true. Hence we only add one of
the two possible statements to the datastore.

A further attribute often
used within thesauri is 'top term'. This indicates a term that is at the top
of a hierarchy within the thesaurus. Since this is a property that may be
deduced by an application from the lack of a broaderConcept property for that
concept, this attribute is also left out of the schema.

broaderConcept and relatedConcept were
therefore selected as the only two core relationships between concepts that
would be required for a basic RDF vocabulary schema. Other properties are
required, however, to allow the encoding of thesauri, taking into account the
recommendations of ISO 2788 (International Organization for Standardisation,
1986) and general thesaurus usage. These are listed in the next section, which
describes the proposed RDF thesaurus schema.

4. RDF Thesaurus Schema

4.1. Resource Description Framework Schemas

The Resource Description
Framework (RDF) is a W3C (World Wide Web Consortium,
2000) recommendation for representing structured data on the Web. RDF, like
both the Web and thesaurus systems, is based around a strategy of managing
information as a collection of links between uniquely named entities. RDF's
Web-based information model uses the term 'resource' to refer to the entities
that it models, and provides an application-neutral framework within which
various kinds of entities and relationships can be described. A general
introduction to RDF is beyond the scope of this document. The W3C home page
for RDF (Swick, 2000) lists a number of
introductory tutorials as well as the RDF specifications.

In this document we describe
the application of RDF to the description of thesaurus-like data structures.
Specifically, we show how the RDF data model can represent a Web of
inter-related concepts and terms from one or more thesauri. To do this, we
define a simple RDF vocabulary that uses Web identifiers (Uniform Resource
Identifiers) to name some relationships and resource types useful for the
description of concepts and terms in a thesaurus. It should be noted that we
do not here attempt to model the richer semantic relationships that hold
between the entities denoted by such concepts, although RDF itself can
also be used to represent this kind of information.

4.2. Proposal for an RDF Thesaurus Schema

The XML/RDF thesaurus
schema is set out in Appendix A. An example set of XML/RDF thesaurus data is
given in Appendix B.

As described above, the
schema consists of two main resources: Concept and Term. Concept resources
are related by the properties: 'broaderConcept' and 'relatedConcept'.
Concepts have a property 'indicator' which points to one or more term
resources. The value of each Term resource will be the actual text string.

As noted above, the Term
resources have an optional property called 'termUsage', which can be used
with those thesauri that have non-preferred terms linked to preferred terms
through the 'use'/'used for' relationships. The value of termUsage must be
either 'preferred' or 'nonPreferred'. A second Term property is
'lang', which can be used to indicate the language of the term; thus a single
concept can be 'indicated' by both preferred and non-preferred terms, and by
terms from different languages (there is likely to be one preferred term for
each language). The thesaurus schema therefore provides a mechanism for
storing multilingual thesauri. If an English term and a German term both
'indicate' the same concept resource, it is implied that the two terms are
either equivalent, or at least are treated as such for indexing purposes.

It may be considered
necessary to recognise relationships between terms of different languages
other than 'exactly equivalent', such as recognising that the equivalent
term is broader in meaning, or where a single term in one language can be
represented by two or more terms in another. In such a case, separate sets of
concepts could be used for the different languages, with a new set of
properties devised to indicate the different types of relationship between
them, rather than using the 'lang' property.

There are two further
optional properties that are permissible for Concept resources: 'scope' and
'conceptCode'.

· The value of the scope property is a resource called 'ScopeNote', which also has
a 'lang' property, and whose value is an optional scope note for the term. A
scope note is defined in ISO 2788 as "a note attached to a term to
indicate its meaning within an indexing language" (International
Organization for Standardisation, 1986).

· The property 'conceptCode' can be used for any code that is assigned to the
preferred terms in a systematic thesaurus. In ISO 2788, an 'address
code' is defined as a code which links terms in an alphabetical index to
their location in the systematic section. Such codes "should have obvious
filing values … may consist simply of running numbers … or may comprise a
system of hierarchically expressive notation" (International
Organization for Standardisation, 1986). Such a code will be unique for each
concept in an RDF version of the thesaurus and might perhaps be useful in
providing a language-neutral method for indexing documents. In other thesauri,
there may be non-unique codes, such as notations that associate the terms with
broad subject categories, and such codes could also be held as values of the
conceptCode attribute. Any unique code associated with the preferred terms in
a thesaurus could also be usefully incorporated into the URI of the Concept
resources, as this would be an aid in future management of the data (for
instance for updates to the database). However, the conceptCode property has
also been provided as a means of storing such information if required.

Appendix A: RDF/XML Thesaurus Schema

<rdf:RDF xml:lang="en"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">

  <rdfs:Class rdf:ID="Concept">
    <rdfs:comment>
      A unique concept defined within a thesaurus. Instances use the
      rdfs:isDefinedBy property with a vocabulary namespace as its value,
      to indicate the vocabulary to which the concept belongs.
    </rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Resource"/>
  </rdfs:Class>

  <rdfs:Class rdf:ID="Term">
    <rdfs:comment>
      Instances of this class represent the written forms of Concepts.
      The string is given by the rdf:value of Term.
    </rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Resource"/>
  </rdfs:Class>

  <rdfs:Class rdf:ID="ScopeNote">
    <rdfs:comment>
      The value of this optional resource is a scope note: a note attached
      to a term to indicate its meaning within an indexing language
    </rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Resource"/>
  </rdfs:Class>

  <rdfs:Class rdf:ID="TermUsageValue">
    <rdfs:comment>
      The value of the property: termUsage. It can take one of two values:
      'preferred' or 'nonPreferred'.
    </rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Resource"/>
  </rdfs:Class>

  <rdf:Property ID="broaderConcept">
    <rdfs:comment>
      This schema does not define a property 'narrowerConcept', but
      applications can assume the existence of a property narrowerConcept
      such that if: {broaderConcept,ConceptA,ConceptB}, then
      {narrowerConcept,ConceptB,ConceptA} is true.
    </rdfs:comment>
    <rdfs:domain rdf:resource="#Concept"/>
    <rdfs:range rdf:resource="#Concept"/>
  </rdf:Property>

  <rdf:Property ID="relatedConcept">
    <rdfs:comment>
      The relatedConcept property is symmetric, such that if:
      {relatedConcept,ConceptA,ConceptB}, then
      {relatedConcept,ConceptB,ConceptA} is true.
    </rdfs:comment>
    <rdfs:domain rdf:resource="#Concept"/>
    <rdfs:range rdf:resource="#Concept"/>
  </rdf:Property>

  <rdf:Property ID="indicator">
    <rdfs:comment>
      A mandatory property of a Concept whose value is the Term instance
      representing a written form of the Concept. A Concept may have as an
      indicator more than one Term. A Term may only be an indicator of one
      Concept.
    </rdfs:comment>
    <rdfs:domain rdf:resource="#Concept"/>
    <rdfs:range rdf:resource="#Term"/>
  </rdf:Property>

  <rdf:Property ID="conceptCode">
    <rdfs:comment>
      An optional property for any code assigned to the thesaurus concepts.
    </rdfs:comment>
    <rdfs:domain rdf:resource="#Concept"/>
    <rdfs:range rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal"/>
  </rdf:Property>

  <rdf:Property ID="scope">
    <rdfs:comment>
      This optional property has as its value an instance of the resource
      ScopeNote.
    </rdfs:comment>
    <rdfs:domain rdf:resource="#Concept"/>
    <rdfs:range rdf:resource="#ScopeNote"/>
  </rdf:Property>

  <rdf:Property ID="lang">
    <rdfs:comment>
      Optional property that can be used to give the language of a Term
      instance. The codes from "ISO 639:1988, Code for the representation
      of names of languages" should be used as the values for this property.
    </rdfs:comment>
    <rdfs:domain rdf:resource="#Term"/>
    <rdfs:range rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal"/>
  </rdf:Property>

  <rdf:Property ID="termUsage">
    <rdfs:comment>
      This optional property indicates whether the Term instance is the
      'preferred' or 'nonPreferred' textual expression of the Concept
      instance that is 'indicated' by the Term, for a given language.
    </rdfs:comment>
    <rdfs:domain rdf:resource="#Term"/>
    <rdfs:range rdf:resource="#TermUsageValue"/>
  </rdf:Property>

  <rdf:Description rdf:ID="preferred">
    <rdf:type rdf:resource="#TermUsageValue"/>
  </rdf:Description>

  <rdf:Description rdf:ID="nonPreferred">
    <rdf:type rdf:resource="#TermUsageValue"/>
  </rdf:Description>

</rdf:RDF>

Appendix B: Sample Thesaurus
Metadata Expressed Using the RDF/XML Thesaurus Schema

The example below shows the relationships between three concepts, whose term values are: 'Interpersonal Attraction', 'Interpersonal Relations', and 'Friends'. A graph representation of the RDF follows the XML representation (excluding the scopeNote property).

<web:RDF xml:lang="en"
    xmlns:thes="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#"
    xmlns:web="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">

  <web:Description about="http://sosig.ac.uk/hasset/terms/TID_3">
    <web:type resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#Term"/>
    <thes:lang>en</thes:lang>
    <web:value>Interpersonal Attraction</web:value>
    <thes:termUsage web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#preferred"/>
  </web:Description>

  <web:Description about="http://sosig.ac.uk/hasset/concepts/CID_6">
    <web:type resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#Concept"/>
    <rdfs:isDefinedBy web:resource="http://sosig.ac.uk/hasset/concepts/"/>
    <thes:indicator web:resource="http://sosig.ac.uk/hasset/terms/TID_3"/>
    <thes:conceptCode>768</thes:conceptCode>
    <thes:broaderConcept>
      <web:Description about="http://sosig.ac.uk/hasset/concepts/CID_8">
        <rdfs:isDefinedBy web:resource="http://sosig.ac.uk/hasset/concepts/"/>
        <thes:indicator web:resource="http://sosig.ac.uk/hasset/terms/TID_15"/>
        <thes:conceptCode>769</thes:conceptCode>
      </web:Description>
    </thes:broaderConcept>
    <thes:relatedConcept web:resource="http://sosig.ac.uk/hasset/concepts/CID_15"/>
  </web:Description>

  <web:Description about="http://sosig.ac.uk/hasset/concepts/CID_15">
    <web:type resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#Concept"/>
    <rdfs:isDefinedBy web:resource="http://sosig.ac.uk/hasset/concepts/"/>
    <thes:indicator web:resource="http://sosig.ac.uk/hasset/terms/TID_21"/>
    <thes:conceptCode>780</thes:conceptCode>
    <thes:scope web:resource="http://sosig.ac.uk/hasset/scopenotes/SN_12"/>
  </web:Description>

  <web:Description about="http://sosig.ac.uk/hasset/terms/TID_15">
    <web:type resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#Term"/>
    <thes:lang>en</thes:lang>
    <web:value>Interpersonal Relations</web:value>
    <thes:termUsage web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#preferred"/>
  </web:Description>

  <web:Description about="http://sosig.ac.uk/hasset/terms/TID_21">
    <web:type resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#Term"/>
    <thes:lang>en</thes:lang>
    <web:value>Friends</web:value>
    <thes:termUsage web:resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#preferred"/>
  </web:Description>

  <web:Description about="http://sosig.ac.uk/hasset/scopenotes/SN_12">
    <web:type resource="http://snowball.ilrt.bris.ac.uk/~pldab/rdf-dot/Thes/Thes.xrdf#ScopeNote"/>
    <thes:lang>en</thes:lang>
    <web:value>To be used only for platonic relationships</web:value>
  </web:Description>

</web:RDF>
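
The schema stores only broaderConcept, and only one direction of each relatedConcept statement, leaving applications to derive the rest, including which concepts sit at the top of a hierarchy. The Python sketch below shows one way an application might do this for the sample data above. It works on plain (subject, property, object) tuples rather than a real RDF parser; the function names, the tuple representation and the abbreviation of the URIs to their last path segment are all assumptions made for illustration.

```python
# Statements taken from the sample above, with URIs abbreviated to their
# last path segment (an assumption made to keep the sketch short).
TRIPLES = [
    ("CID_6", "broaderConcept", "CID_8"),
    ("CID_6", "relatedConcept", "CID_15"),
]
CONCEPTS = {"CID_6", "CID_8", "CID_15"}


def narrower_concepts(triples):
    """Derive the implied narrowerConcept statements by inverting broaderConcept."""
    return {
        (obj, "narrowerConcept", subj)
        for subj, prop, obj in triples
        if prop == "broaderConcept"
    }


def related_concepts(triples):
    """Return relatedConcept pairs in both directions (symmetric closure)."""
    stored = {(subj, obj) for subj, prop, obj in triples if prop == "relatedConcept"}
    return stored | {(obj, subj) for subj, obj in stored}


def top_concepts(concepts, triples):
    """Concepts with no broaderConcept property are treated as top concepts."""
    has_broader = {subj for subj, prop, _ in triples if prop == "broaderConcept"}
    return concepts - has_broader


print(narrower_concepts(TRIPLES))
print(sorted(related_concepts(TRIPLES)))
print(sorted(top_concepts(CONCEPTS, TRIPLES)))
```

Note that in this small sample CID_15 is reported as a top concept simply because no broaderConcept statement is recorded for it; in a complete thesaurus the same test would pick out the genuine top terms.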

References:

International Organization for Standardisation. 1986. ISO 2788: Guidelines for the establishment and development of monolingual thesauri, 2nd ed., Geneva: ISO.

Swick, Ralph et al. (Accessed June 2000). W3C Resource Description Framework. http://www.w3.org/RDF/

World Wide Web Consortium. (Accessed June 2000). W3C – The World Wide Web Consortium. http://www.w3.org/