About this document:

This is a pre-print of a paper presented at the TERENA Networking Conference '98 (TNC'98) in Dresden, Germany (URL: http://www.terena.nl/conf/tnc98/). It is also a pre-print of the article published in the Elsevier journal Computer Networks and ISDN Systems (Vol 30 Numbers 12-18), 30th Sept 1998; http://www.elsevier.nl/inca/publications/store/5/0/5/6/0/6/

Slides from the presentation are also available.

Subject gateways - fulfilling the DESIRE for knowledge

by Emma Worsfold (emma.worsfold@bristol.ac.uk)

Abstract

One of the greatest challenges for users in the electronic age is to learn how the Internet can be used to meet their information needs and to conceptualise how the different Internet search and retrieval tools can help them to do this. This article aims to describe how subject gateways can help different user communities to make effective use of the Internet and to clarify some of the key differences between subject gateways and search engines. The paper will reference work that has been done by the European Union's DESIRE Project [7] on developing guidelines and tools for the development of new subject gateways. It will highlight some of the gateways already freely available for use and will give some insight into the far-reaching developments that users can expect to see in the coming months and years.

1 Why do users need Internet search and retrieval tools?

It is well documented that the amount of information available over the Internet is increasing dramatically. More and more people are choosing to publish information on the Internet and digitisation programmes are gradually making more of the world’s printed information available in electronic form.
Our libraries, our schools, our businesses and our homes are all being touched by the digital age and some of the core information and communication resources have entered the Internet arena - newspapers, books, music, letters, encyclopaedias, journals, leaflets, public announcements, meetings, the minutes of meetings… Many human activities are also being translated into the virtual environment - teaching, learning, buying, selling, reading, writing, composing, listening, falling in love…

All the activity on the Internet is creating myriad resources which are freely available for anyone to access from their desktop. The problem for users, of course, is how to find exactly what they are looking for when they have access to information held on millions of computers all over the world, when they have no idea what information is held on which computer, and when their only window on the world is a two-dimensional computer screen.

2 Humans, robots and Internet search and retrieval

Imagine a user going up to a library enquiry desk and asking for every bit of published material ever written, both in this library and in all other libraries, that contained the words "Karl and Marx"! Theoretically this request would involve trawling every word of every text; it would retrieve thousands and thousands of resources and would leave the user with the job of looking at and evaluating each resource one by one until they found what they were after. In practice it is not feasible for a human to do this. However, on the Internet, as we know, it is common practice to send robots out to try to do exactly this.
The search engines attempt to index words in every single Internet resource, leaving the users to search these mammoth but unsophisticated indexes by plugging in often naïve search terms such as "Karl Marx".

In a library a naïve user asking at the enquiry desk "What have you got on Karl Marx?" will be prompted by the librarian to refine their search - "Are you looking for works by Karl Marx or commentary on his works, or biography? Do you want texts written in German, Russian or English?" etc. A more sophisticated user may avoid the queues at the enquiry desk by choosing one of two options:
Subject gateways have drawn upon traditional library practices to help bring order to the chaos of the Internet. The art of librarianship has been used for thousands of years to organise knowledge and the gateways have recognised that many of the methods and practices translate well to the Internet environment.

Subject gateways and search engines take fundamentally different approaches
to providing Internet search and retrieval services. The differences are summarised
in Table 1.
Table 1: some differences between search engines and subject gateways

The benefits of the subject gateway approach will now be discussed.

3 Subject gateways

Subject gateways are characterised by the fact that they are developed with specific user groups in mind. A user served by a subject gateway can enjoy the following benefits:
3.1 Access to a quality controlled collection of Internet resources

Subject gateways are selective - they only point to high quality Internet resources which can meet the needs of the user community. On the Internet this can be a considerable advantage. Almost anyone can publish anything on the Internet, and the enormous quantity of popular information available is testimony to this. Information need not be filtered by third-party publishers, editors or proof-readers and so may not be of the quality that users would expect from traditional, printed information resources. Quality controlled collections can save users time - they need not wade through thousands of irrelevant or poor quality resources and they need not question the validity of resources.

3.1.1 Selection criteria and collection management

The quality control is possible because every resource is evaluated for its quality against the gateway’s formal selection criteria by information and subject specialists. In a similar vein to libraries, most gateways will have:
3.2 The ability to search the catalogue

Just as a library has a catalogue of books, subject gateways have a catalogue of Internet resources. Users can enter their search terms and retrieve any catalogue records containing those terms. Unlike a conventional library catalogue, a subject gateway can use hyperlinks to connect users directly from the catalogue entries and descriptions to the resources themselves. Many gateways allow advanced searching techniques such as field searching, phrase searching, stemming and the use of Boolean operators. Others have additional specialist databases and primary resources.

3.2.2 Behind the scenes - cataloguing / the creation of metadata

Searching is possible because information professionals have created a catalogue record for each of the Internet resources in the collection. The record contains "bibliographic" information about the resource (metadata), including: title, description, keywords, URL etc.

Subject gateways do not, therefore, only help users to find information; they add value to the information. The resource description details the type of resource (e.g. electronic journal, discussion list etc.), the content, the source and information about any access restrictions. Users can use the descriptions to save themselves time when searching the Internet - they can decide whether the resource is likely to be of interest to them before connecting to it and waiting for it to download. The metadata created for subject gateways is particularly rich, because it is created by an objective third party (i.e. not the creator of the resource) and because it results from a semantic analysis of the resources by a subject expert (i.e. it is created by a human, not a machine). These records, or metadata, are the key element in a subject gateway. The quality of the records will determine a gateway's usefulness and its reputation.

3.3 The ability to browse the collection by subject

Users can also choose to browse the "virtual shelves" of subject gateways.
Browsing enables a user to view all the resources listed under a particular subject heading. Browsing can be useful for users who do not have any specific search terms in mind, or who simply wish to see what kinds of resources the Internet can offer in their subject field. It retains the serendipity which is one of the charms of browsing "real" library shelves.

3.3.1 Behind the scenes - classification

Behind the scenes, browsing is made possible by the use of classification schemes. Gateway staff assign a class mark to each resource, based on the subject area it relates to. These marks are used to arrange resources from the database into browsable lists arranged under subject headings.

4 A guided tour of SOSIG (The Social Science Information Gateway)

SOSIG [19] is funded by the Electronic Libraries Programme [10] and by the Economic and Social Research Council [9]. It is based in the Institute for Learning and Research Technology [13] at the University of Bristol, and has been used as a model for the creation of several UK based gateways in other subject areas.

A virtual tour of SOSIG involves a trip along the button bar at the
top of each screen, which acts as the key navigational aid to the service.
The SOSIG Homepage (Figure 1) can be found at http://www.sosig.ac.uk/
Figure 1: The SOSIG Home Page

4.1 Search SOSIG

The Search SOSIG button takes users to a search screen that enables them to search the SOSIG catalogue (Figure 2). Boolean operators may be used, and phrase searching and field searching are possible. Extended search options allow users to search by region, or by the type of resource. A thesaurus is available to help users with their searching. SOSIG has been working with the UK Data Archive to develop the Archive's social science thesaurus, HASSET, based on the UNESCO thesaurus, which will benefit all UK social science on-line services.

Figure 2: The extended search options on SOSIG

4.2 Browse SOSIG

The Browse SOSIG button takes users to a screen that enables them to browse resources by subject or by geographical region (Figure 3). Under each of the main subject headings are sub-sections which point to more resources.

Figure 3: Browse SOSIG form

4.3 “What's New”

The What's New button on SOSIG is the virtual equivalent of the ‘New Books Shelf’ in a library. Users can take a quick look at all the titles added to the collection in the previous week. The new titles appear every Monday, and often include resources that are new on the Internet. A quick skim of the ‘What’s New’ section of SOSIG each week can be a good way to maintain current awareness of social science resources available over the Internet.

4.4 “Submit a New Resource”

The “Submit a New Resource” button can be used as a virtual suggestions box by users. The button allows users to send in details of any Internet resources which they think should be added to the collection. They can do this by filling in a WWW form which is emailed directly to the SOSIG team.

4.5 Help - user education and support

The on-line Help Pages are accessible from every page. SOSIG staff also receive and answer enquiries via email.
However, SOSIG acknowledges that face-to-face training is often the most effective way of helping people to learn new skills, and dedicated training staff travel the UK giving hands-on SOSIG workshops to librarians and academics [20]. It is hoped that these intermediaries will gain the skills to pass on their knowledge to researchers, students and colleagues.

5 Gateways currently available

Details of some of the subject gateways currently available can be seen in Table 2.
Table 2: subject gateways currently available

6 Problems and solutions for the gateway approach

Statistics for some of the gateways listed above indicate that they are heavily used (SOSIG gets over 140,000 hits per month from users all over the world). Formal user evaluation studies have suggested that users like the approach. They appreciate the subject focus, the quality control and the sense of community that the gateways give.

The main criticism tends to be that users want the gateways to point to more resources. This highlights one of the main dilemmas for subject gateways: the human factor is both the strength and the weakness. The semantic judgements required to select quality resources and the cognitive processes necessary to classify and catalogue resources all take time. Subject gateways are labour intensive to build and maintain; however, the value that is added by the human touch cannot be matched by automated solutions. Researchers have been looking for ways of creating large-scale gateways quickly, without sacrificing the human factor. In the coming months and years, users are likely to see the benefits of this research, described below.

6.1 Distributed cataloguing - involving more librarians

Imagine being able to search a gateway that had been developed, not by a research project, but by a national team of subject experts.

Many of the gateways have now set up systems where librarians who are geographically dispersed can feed resources into a central database. They can select Internet resources from any networked PC, and catalogue resources into remote gateways via the WWW. Distributed cataloguing has been successfully piloted by many gateways - a DESIRE paper describes the different systems used [21]. For example, SOSIG has eleven Section Editors who are university subject librarians adding resources to the gateway from locations across the UK as part of their work.
This has improved the subject coverage of the gateway considerably, as these staff have a great deal of specialist subject knowledge and have day-to-day experience of the information needs of users. It does not make sense for every library to set up its own subject gateway when each gateway can be accessed from anywhere in the world. The duplication of effort would be a waste, and users would benefit more if the efforts of librarians could be used for the benefit of global users as opposed to purely local users. The potential is there for many more librarians to take the opportunity of working remotely for subject gateways and those interested are invited to contact the relevant gateways.

6.2 Distributed databases and cross-searching

Imagine another scenario where you could search a gateway that had been created, not by a national team of librarians, but by librarians all over the world, cataloguing resources from every country, written in every language. A strategy has been put forward for different countries to set up their own national gateways and to then create an international network of gateways that can be cross-searched by the end user [22].

Considerable work has already been done to make this strategy a reality [15]. Many gateways have been careful to adopt standard metadata formats to ensure that they would be compatible with other databases if and when cross-searching became possible, and cross-searching of multiple and distributed gateways is now technologically possible and likely to become available to users in the near future. Both the EU-funded DESIRE [7] and UK-funded ROADS [18] projects have created successful prototypes of cross-searching - the ROADS Web page provides a cross-searching demonstrator [17] involving a number of gateways and DESIRE has set up a pilot where SOSIG can be cross-searched with a database set up by the National Library of the Netherlands [6].
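
The cross-searching idea can be illustrated with a minimal Python sketch: one query is sent to several gateways whose records share a format, and the hits are merged by URL. The `Record` fields, the `Gateway` class, the gateway names and the example.org URLs are all assumptions made for illustration. The sketch runs in a single process and does not reproduce the ROADS or DESIRE software, nor the query routing approach described in [15].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A catalogue record in a shared, simplified format (hypothetical)."""
    title: str
    description: str
    keywords: tuple
    url: str


class Gateway:
    """A stand-in for one subject gateway's catalogue, held in memory."""

    def __init__(self, name, records):
        self.name = name
        self.records = list(records)

    def search(self, term):
        """Return records whose title, description or keywords contain the term."""
        needle = term.lower()
        return [
            r for r in self.records
            if needle in r.title.lower()
            or needle in r.description.lower()
            or any(needle in k.lower() for k in r.keywords)
        ]


def cross_search(gateways, term):
    """Send one query to every gateway and merge the hits, keyed by URL."""
    merged = {}
    for gateway in gateways:
        for record in gateway.search(term):
            # Keep the first record seen for a URL, but remember every source.
            entry = merged.setdefault(record.url, {"record": record, "sources": []})
            entry["sources"].append(gateway.name)
    return list(merged.values())


# Two hypothetical gateways that both describe one of the same resources.
gateway_a = Gateway("gateway-a", [
    Record("Marx archive", "Texts by Karl Marx", ("marxism",), "http://example.org/marx"),
    Record("Survey methods", "Guide to questionnaires", ("methodology",), "http://example.org/surveys"),
])
gateway_b = Gateway("gateway-b", [
    Record("Marx texts", "Works of Marx in German", ("philosophy",), "http://example.org/marx"),
    Record("Labour history", "Essays on Marxism and labour", ("history",), "http://example.org/labour"),
])

results = cross_search([gateway_a, gateway_b], "marx")
for hit in results:
    print(hit["record"].url, hit["sources"])
```

Merging on the URL only works because both catalogues describe resources with the same fields, which is the practical reason the gateways adopted standard metadata formats before cross-searching was available.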
With the technological solutions already in place, researchers are now concentrating on designing a user-interface for cross-searching. In the next year users should be able to start cross-searching subject gateways.

6.3 Automating parts of the cataloguing process

Users may see a significant increase in the number of resources available via the subject gateways if the use of metadata by information providers increases. It is possible for parts of the catalogue records to be filled automatically using information assigned to resources in the META tags in the HTML headers. Information providers are to be encouraged to embed metadata in formats such as Dublin Core [8] or RDF [2] in their Web documents, as this will help the documents to be entered more efficiently into both subject gateways and search engines, and thus improve the chances of their retrieval by the end user.

6.4 Harvesting quality collections

In addition to using harvesting techniques to collect metadata and automate parts of the cataloguing process as described above, the same harvesting techniques can be used to expand a subject gateway's coverage whilst not making substantial compromises on quality. The Combine Harvester, developed at the University of Lund, has been used to this effect [1]. For example, SOSIG has an experimental service offering harvested (i.e. gathered by robot) pages in addition to its human created catalogue [3]. By using the URLs from the catalogue as a starting point and restricting the number of "hops" which a robot can take, the number of resources found by a user can be increased tenfold without appreciably losing precision.

7 International gateway developments

The developments described above have injected considerable energy into the development of subject gateways.
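
As one illustration of those developments, the hop restriction described in section 6.4 might be sketched as a breadth-first walk that starts from catalogue URLs and stops following links once a hop limit is reached. This is an illustrative sketch only, not the Combine Harvester's algorithm: the `harvest` function, the `links` dictionary standing in for fetched pages, and all the URLs are hypothetical.

```python
from collections import deque


def harvest(seed_urls, links, max_hops):
    """Breadth-first walk from the seed URLs, following at most max_hops links.

    `links` maps each URL to the URLs it points to; a real robot would
    fetch and parse pages instead of reading a dictionary.
    Returns a dict of URL -> number of hops from the nearest seed.
    """
    hops = {url: 0 for url in seed_urls}
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if hops[url] == max_hops:
            continue  # hop budget spent: do not follow links from this page
        for target in links.get(url, []):
            if target not in hops:
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


# Hypothetical link graph; only "catalogue-a" and "catalogue-b" are catalogued.
links = {
    "catalogue-a": ["a/page1", "a/page2"],
    "a/page1": ["a/page3", "elsewhere"],
    "a/page3": ["far-away"],
    "catalogue-b": ["b/page1", "a/page1"],
}

found = harvest(["catalogue-a", "catalogue-b"], links, max_hops=1)
print(sorted(found))
```

Raising `max_hops` widens coverage but lets the robot drift further from the human-selected starting points, which is the trade-off the hop restriction is meant to control.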
Distributed cataloguing has attracted interest from the library and information profession and over time it may be seen as a normal part of a librarian’s job to select and catalogue Internet resources.

The cross-searching demonstrators, notably the developments with the ROADS software, have attracted considerable international interest. People from a number of countries, including Sweden, Iceland, America and Australia, have expressed interest in joining an international network of compatible gateways. Certainly the UK has invested large amounts of national funding in the subject gateways developed under the Electronic Libraries Programme [10]. This year the Finnish Ministry of Education committed to using the ROADS technologies to develop the Finnish Virtual Library Project - involving the development of 40 subject gateways by university librarians across the country [11]. The subject gateway model can also be applied outside the academic sector; indeed, it has been used as the basis for initiatives in the commercial and public library sectors.

Clearly there are important financial and political considerations, but there is an opportunity for international collaboration to lay the foundation for a true virtual library, built from distributed components and using the expertise of local specialists. It would benefit the academic community world wide and is a key ingredient in providing access to the information cornucopia which the Internet has promised but has so far failed to deliver.

8 Guidelines and tools for those setting up subject gateways

The European Union’s DESIRE Project has developed a number of guidelines, methods and tools for those interested in setting up subject gateways and creating an international network [5]. These are all freely available and those interested are encouraged to use them.

8.1 Quality selection

Guidelines have been written for those interested in developing quality selection criteria for subject gateways [12].
DESIRE states that "A high quality Internet resource is one that satisfies the information needs of the user". A framework for the development of service-specific criteria is given, which encourages the development of scope and collection management policies as well as selection criteria.

DESIRE has produced an online tutorial called "Internet Detective" [14] which aims to raise awareness of the issue of information quality on the Internet. The tutorial is interactive and employs a wide variety of learning techniques, including quizzes, worked examples, tutorials and exercises. It is freely available for use and readers are invited to try it!

8.2 Metadata

DESIRE has written a comprehensive review of metadata formats [4]. The key issue for gateways is that they use standard formats that make them compatible with other gateways and bibliographic databases. DESIRE recommended the use of the ROADS templates, which have been designed to be compatible with both Internet standards and library standards such as MARC.

8.3 Classification

DESIRE reviewed the different classification schemes currently being used to classify Internet resources [16]. The appropriateness of different classification schemes for use by subject gateways was evaluated. It was concluded that generic schemes did not always have the detail required for a subject-specific service and that in some cases subject-based classification schemes provided a more sensible way to generate the browsable lists.

9 What can users expect from gateways in the future?

Users can already take advantage of subject gateways, which between them describe tens of thousands of high quality Internet resources. In the future users can expect to see the existing gateways grow considerably in size as more librarians and information professionals contribute to them.
They can also expect new gateways to appear and to be able to cross-search different gateways simultaneously and seamlessly.

Also on the horizon, user profiles may be used to enable subject gateways to deliver a personalised information service. Users will be asked to enter their information preferences into a database, enabling the gateways to notify them of new resources as and when they appear in the catalogue.

Subject gateways will also need to work out how to fit in with a number of new developments on the World Wide Web. For example, Netscape's Navigator 5 browser incorporates an RDF-based facility called "smart browsing". This works in a similar fashion to the PICS (Platform for Internet Content Selection) facilities already available in browsers such as Internet Explorer 3. The basic model is that Web browsers are configured to consult, for every page viewed, some third party "bureau" that offers additional, descriptive information about a resource. A "smart browsing" provider would offer users extra information (related links, ratings, reviews etc.) to enrich their browsing experience. It is not yet clear how existing subject gateways will engage with this emerging market for resource descriptions. Individually, the gateways don't yet describe enough of the Web. Federated into a distributed Internet library, however, the gateways may help transform the browsing experience.

References

[1] Combine Harvester http://www.lub.lu.se/combine/
[2] D. Brickley, R.V. Guha, A. Layman, Resource Description Framework
(RDF) Schemas, W3C Working Draft 9 April 1998
[3] Combine Demonstrator
[4] L. Dempsey, R. Heery, M. Hamilton, D. Hiom, J. Knight, T. Koch, M.
Peereboom, A. Powell, A review of metadata: a survey of current resource
description formats, Work Package 3 of Telematics for Research project
DESIRE (RE 1004), March 1997
[5] DESIRE Cataloguing and Indexing
[6] DESIRE Cross-Searching Demonstrator
[7] DESIRE Project Home Page
[8] Dublin Core Metadata
[9] Economic and Social Research Council (ESRC)
[10] Electronic Libraries Programme (eLib)
[11] Finnish Virtual Library project
[12] P. Hofman, E. Worsfold, D. Hiom, M. Day, A. Oehler, Selection Criteria
for Quality Controlled Information Gateways, Work Package 3 of Telematics
for Research project DESIRE (RE 1004)
[13] Institute for Learning and Research Technology at the University
of Bristol
[14] Internet Detective
[15] J. Kirriemuir, D. Brickley, M. Hamilton, J. Knight, S. Welsh,
Cross-Searching Subject Gateways: The Query Routing and Forward Knowledge
Approach, D-Lib Magazine, January 1998
[16] T. Koch, M. Day, A. Brümmer, D. Hiom, M. Peereboom, A. Poulter
and E. Worsfold, The role of classification schemes in Internet resource
description and discovery, Work Package 3 of Telematics for Research project
DESIRE (RE 1004)
[17] ROADS Cross Searching Demonstrator
[18] ROADS Home Page
[19] Social Science Information Gateway (SOSIG) Home Page
[20] Social Science on the Internet (Training courses for Social
Scientists offered by the Institute for Learning and Research Technology,
University of Bristol)
[21] E. Worsfold, Distributed and Part-Automated Cataloguing, March
1998
[22] E. Worsfold, P. Hofman, D. Hiom, Developing multilingual subject
gateways, October 1997

Vitae

Emma Worsfold
SOSIG and DESIRE Research Officer
Tel: +44 (0)117 928 8443
Emma Worsfold has worked in electronic library research since 1995,
working on the NetLinkS project at the University of Sheffield and SOSIG
(The Social Science Information Gateway) and the European Union's DESIRE
project at the University of Bristol. She has a background in psychology
and is a chartered librarian.