DESIRE Information Gateways Handbook
Section 2 : Information Issues
Subject specialists and information managers

Target audience
 

Section 2 of this handbook is aimed at gateway staff responsible for information management - the subject specialists and information professionals who will consider the content and organisation of the information within the gateway.

It aims to cover the important decisions that need to be made when setting up a new gateway (such as choosing a metadata format, designing a user interface, writing a selection policy) but also covers issues that arise in the day-to-day running of an existing gateway (such as cataloguing, resource discovery and publicity and promotion).

Each chapter offers some background, practical tips and hints, key references, a glossary, case studies and examples. Watch out for the Cross Reference that will take you to related sections elsewhere in the handbook.

Contents
  Section 1 : Strategic Issues

Section 2 : Information Issues
  1. Quality selection
  2. Resource discovery
  3. Metadata formats
  4. Cataloguing
  5. Subject classification, browsing and searching
  6. Collection management
  7. Working with information providers
  8. Publicity and promotion
  9. User interface design
  10. Integration of robot and manual indexes
  11. Distributed cataloguing
  12. Multi-lingual issues
  13. Co-operation between gateways
Section 3 : Technical Issues

2.1. Quality selection: ensuring the quality of your collection

In this chapter...
 
  • why develop and publish a selection policy for your gateway?
  • creating a scope policy and selection criteria for your gateway
  • guidelines for selecting and evaluating Internet resources
  • skills and training required by gateway staff in selection and evaluation
  • changing your selection criteria over time
  • quality ratings/labelling/PICS and other Internet initiatives in this area

Introduction
 

Subject gateways are sometimes called the Internet equivalent of a library, and in terms of the selection process this is certainly true.

Gateways are characterised by the focus and quality of their collections. They aim to provide their users with a quality controlled environment in which to search for information on the Internet and they do this by building selective collections where every resource that the gateway points to has been carefully selected for its quality.

The selection process involves people making value judgements about Internet resources and selecting only those resources that satisfy certain quality criteria.

But what constitutes a 'high quality' Internet resource? Information gateways need to use a service-driven definition of quality, where resources are selected for their relevance to the user group as well as their inherent features.

Selecting resources for a gateway therefore requires a clear understanding of the information needs of the end-users, as well as of the pros and cons of the design features of Internet sites.

Information gateways consciously emphasise the importance of skilled human involvement in the assessment and 'quality control' of their selected Internet resources. Selection and evaluation of resources for a gateway is typically done by a librarian or subject specialist, reflecting the fact that selection is based on an evaluation of the semantic content of the resources.

A formal selection policy can support the development of a consistent and coherent collection of high quality Internet resources.


Why develop and publish a selection policy for your gateway?
 

Many subject guides on the Internet do not explicitly state their selection policies, but there are a number of advantages in developing a formal selection policy for a gateway and publishing it on your site:

  • it helps users to appreciate that the service is selective and quality controlled
  • it helps users to understand the level of quality of information they will find when using the service
  • it helps gateway staff to be consistent in their selection and to maintain the quality of the collection
  • it can be used to train new staff
  • it ensures consistency in collections that are developed by a distributed team

By publishing your selection policy on the gateway you can help your users to conceptualise the nature of the collection they are using. On the Web, users are very often faced with a search box or an index, and it is not always easy for them to understand exactly what they are searching. An explicit selection policy can help them to understand the nature of your gateway service. The Centre for Information Quality Management (CIQM) recommends that database providers offer a 'published specification' or 'user-level agreement' to 'lessen the gap between user expectations and the reality of searching' (Armstrong, 1997). A formal selection policy can help to meet this recommendation.

The integrity of a collection will depend on there being some consistency in the type and quality of resources that your staff decide to include in the collection. A formal selection policy can help to ensure that the selection is consistent and that the quality of the collection remains high.

A selection policy can ensure that the same member of staff makes consistent judgements about what they include in the collection. It can also ensure that different members of the staff team make consistent judgements and that they are all using the same selection criteria.

The selection policy can help new staff to understand quickly both the nature of the collection and the criteria they should use when selecting new resources to add to the gateway.

A formal policy can also help to ensure consistency of selection within a distributed team. For example, if a number of gateways are working collaboratively, an agreed selection policy can help to ensure that the combined collection has a consistent level of quality.


What is a selection policy?
 

In an information environment, a selection policy defines the criteria used for selecting resources to add to a collection. It will typically outline the scope of the collection and the criteria used when new resources are selected for the collection. The scope policy relates to the needs of the target user group, while the selection criteria relate to the inherent features of the Internet resources.

Defining the scope of the collection

Subject gateways do not aim to include every resource available on the Internet. The scope of a gateway defines the boundaries of the collection. The scope policy is therefore a broad statement of the parameters of the collection.

The scope policy of a service states what is and is not to be included in the catalogue. In the selection process, the scope of the service will affect the first decisions made about the quality of the resources. Those falling outside the scope will be rejected and the rest will have the quality criteria applied to them.

The scope criteria are the first filter through which the resources pass. They will tend to involve clear decisions; either a resource falls within the scope or it does not.

A scope statement will typically outline:

  • the subject areas covered by the gateway
  • the types of resources covered by the gateway

It may also outline:

  • language parameters (e.g. whether the gateway only includes resources in a certain language)
  • geographical parameters (e.g. whether the gateway only includes resources from a particular country)
  • other parameters of relevance to the user group served
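
As a minimal sketch, a scope statement of this kind can be written down as a small data structure and used as the first filter on candidate resources. The class names, field names and example values below are illustrative assumptions, not part of any DESIRE tool.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A candidate Internet resource (all field names are illustrative)."""
    url: str
    subjects: set
    resource_type: str
    language: str = "en"
    country: str = ""


@dataclass
class ScopePolicy:
    """Boundaries of the collection; an empty set means 'no restriction'."""
    subjects: set
    resource_types: set
    languages: set = field(default_factory=set)
    countries: set = field(default_factory=set)

    def in_scope(self, resource):
        """Return True only if the resource passes every scope test."""
        if not (resource.subjects & self.subjects):
            return False
        if resource.resource_type not in self.resource_types:
            return False
        if self.languages and resource.language not in self.languages:
            return False
        if self.countries and resource.country not in self.countries:
            return False
        return True


# Hypothetical policy for a social science gateway.
policy = ScopePolicy(
    subjects={"sociology", "economics"},
    resource_types={"web site", "mailing list"},
    languages={"en"},
)
candidates = [
    Resource("http://example.org/econ", {"economics"}, "web site"),
    Resource("http://example.org/chem", {"chemistry"}, "web site"),
]
# Only in-scope resources go forward to the quality selection criteria.
shortlist = [r for r in candidates if policy.in_scope(r)]
print([r.url for r in shortlist])  # ['http://example.org/econ']
```

Scope tests of this kind give a clear yes or no answer. The quality criteria, which call for human judgement, are applied only to the resources that remain.
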
E X A M P L E

Examples of scope policies


Defining the quality selection criteria

Subject gateways do not generally aim to point to every Internet resource that falls within their subject area and scope. They are characterised by their quality control, aiming to point only to the best resources available for their subject area and audience.

The selection criteria outline the qualities that a resource must have to be included in the collection.

E X A M P L E

Examples of quality selection criteria



Developing a selection policy for your gateway
 

How should a gateway develop its selection policy? Each gateway needs to develop its own unique set of selection criteria to take the information needs of the user group and the aims of the service into account.

The first steps are to define:

  1. your target user group
  2. the information needs of the user group
  3. the aims and objectives of the gateway (balancing what you'd like to cover with what you have the resources to cover)

Once these steps have been taken, it is a matter of defining a formal scope policy and a set of selection criteria.

The DESIRE project has developed some tools for creating a scope and selection policy. The guidelines are not prescriptive and are designed to help an institution or service develop its own tailor-made policies in the light of its aims and audience. A comprehensive list of criteria is given, from which criteria relevant to the individual service can be chosen. The list has been drawn from a 'state of the art review' of current practice, library and Web literature.

Creating a scope policy

Some possible criteria for creating your scope policy are given below. For each heading you will need to outline the parameters to be used in your gateway. Not all of these will be appropriate for your audience and you may need to add additional criteria.

INFORMATION COVERAGE

Subject Matter

  • what subject matter is appropriate for the target audience?
  • are there any subjects which will be censored? (e.g. for ethical reasons, such as resources produced by hate groups or resources about bomb-making/paedophilia etc.)
  • how important is the subject matter of linked sites?

Acceptable Types of Resource

  • what types of resource are appropriate for the target audience?
  • is the information scholarly rather than popular?
  • does the resource contain more than just a list of links?
  • is the site either proven to be or expected to be durable?
  • would a resource intended for use by an individual or local group be acceptable?
  • is it innovative - does it contain breakthrough design elements?

Acceptable Sources

  • which sources of information are acceptable/appropriate for the target audience?
  • are academic, government, commercial, trade/industry, non-profit private sources all acceptable?
  • are pages maintained by individual enthusiasts (e.g. students) acceptable?
  • is biased information acceptable, and are opinions and ideologies acceptable?

Acceptable Levels of Difficulty

  • what level of resource is appropriate for the target audience? (e.g. users may be school children or may be academics)

Advertising

  • are resources that contain advertising acceptable?
  • is there a limit to the amount of advertising that is acceptable?
  • are there any forms of advertising that will be censored?

ACCESS

Cost

  • how is charging going to affect selection - is the service only going to point to resources that are free to access?
  • are there any price limits in terms of the access charge?
  • what if resources are under copyright?

Technology

  • what technologies are appropriate for the target audience? (forms, ismaps, databases, CGI scripts, Java applications, frames, etc.)
  • what connectivity does your audience have and how will this affect selection?
  • what software do your users have and how will this affect selection? (e.g. will resources that work well in graphical browsers but not in line browsers be accepted?)
  • what hardware do your users have and how will this affect selection?

Registration

  • will the service accept resources where user-registration is necessary before the resource can be accessed?
  • is online registration acceptable?
  • if users must negotiate written contracts before access is possible, is this acceptable?

Special Needs

  • do your users have any special needs that will affect the resources selected? (e.g. large print or audio options for disabled users)

METADATA AND CATALOGUING ISSUES

Granularity

  • at what level will resources be selected/catalogued?
  • will resources be considered at the Web site/Usenet group level or the Web page/Usenet article level?

Resource description

  • what is the minimum amount of information needed to create a resource description in your catalogue, i.e. what basic information MUST a resource contain to be selected? (e.g. in a WWW document, contact details, last update details, etc.)
  • is there sufficient information to create a descriptive record?
  • will the service accept resources with/without specific metadata?

GEOGRAPHICAL ISSUES

Geographical Restraints

  • are any geographical restraints appropriate for your audience?
  • will the service cover information produced locally, from particular countries, particular continents or worldwide?

Language

  • in which languages are resources acceptable/appropriate to your target audience?

Creating quality selection criteria

Once you have defined the scope of your gateway, you will need to outline the level of quality that is acceptable within each individual resource.

A list of possible quality selection criteria is given below, from which criteria relevant to the individual service can be picked.

Content criteria: evaluating the information

  • validity
  • authority and reputation of source
  • accuracy
  • comprehensiveness
  • uniqueness
  • composition and organisation
  • currency, adequacy of maintenance

Form criteria: evaluating the medium

  • ease of navigation
  • provision of user support
  • use of recognised standards
  • appropriate use of technology
  • aesthetics

Process criteria: evaluating the system

  • information integrity (work of the information provider)
  • site integrity (work of the Webmaster/site manager)
  • system integrity (work of the systems administrator)

A fuller description of each of these criteria, with examples, can be found in the online tutorial 'Internet Detective':

Tips

Internet Detective

Internet Detective is an interactive, online tutorial which provides an introduction to the issues of information quality on the Internet and teaches the skills required to evaluate critically the quality of an Internet resource. There is no charge, it takes around two hours to complete and it has interactive quizzes and exercises to lighten the learning process.

Selection criteria for quality controlled information gateways

This is a lengthy, peer-reviewed report which describes the DESIRE research into the development of quality systems and selection criteria for subject gateways. This report will be of interest to people wishing to see the research and methodology that lay behind the development of the lists of criteria given above. The lists resulted from a 'state of the art' review of quality issues, both within subject gateways and in other sectors, notably the private sector and industry.


Guidelines for selecting and evaluating Internet resources
 

The staff responsible for selecting new resources to add to the gateway will need to be able to select resources that together create a consistent and coherent collection of high quality Internet resources.

What constitutes a 'high quality' Internet resource? The definition of quality used here has been drawn from the commercial sector, where quality is seen to be closely related to customer satisfaction and to developing systems of continuous improvement. In the context of a subject gateway, the quality of a resource will depend on the users of the service, and the nature of the service, as well as the internal features of the resource itself. We suggest that for information gateways 'a high quality Internet resource is one that meets the information needs of the user'.

This is a service-oriented definition, and so, when evaluating the quality of Internet resources, gateway staff must consider the user group that they are serving as much as the Internet resources they are evaluating.

SOSIG (The Social Science Information Gateway) has come up with five steps that describe the selection process for gateway staff:

E X A M P L E

SOSIG selection procedure: Five steps to quality control

Before you start - get to know the quality of SOSIG

  • read the SOSIG scope policy, which outlines the subjects and types of resources that are acceptable
  • become familiar with the SOSIG service, especially the coverage of the collection; browse the database to see the kinds of resources that are acceptable
  • become familiar with the SOSIG quality selection criteria outlined in these Web pages

Finding resources

You may find it easier to divide the selection process into two stages:

  1. Spend time finding resources on the Internet and bookmarking those with potential.
  2. Go back to the bookmark list later to spend time evaluating each resource in some detail.

Once you have found a resource to evaluate, there are five steps to quality control, which are summarised below.

1. Ensure that the resource falls within the scope of SOSIG

This is the most important filter through which all resources should pass - if it isn't relevant then reject it! You can use the scope policy for guidance. Most important of all is to ensure that the resource is social science related! You can look at the browsing pages to see which subject areas the service covers.

2. Search the SOSIG collection

To avoid duplication within the SOSIG collection, it is essential that you go to 'Search SOSIG' and check that the resource is not already in the database. Consider how the resource will add to the SOSIG collection (this will get easier the more you get to know SOSIG). The coverage and balance of the collection is important. Try to find resources for subject areas that are not well covered.

3. Evaluate the content of the information

Content criteria are based on the information the resources actually contain. Of the criteria relating to the resources themselves, the content criteria are the most important. Content criteria should take precedence over form criteria - SOSIG users are likely to care more about getting the information that they need than about the form it takes.

4. Evaluate the form of the information

Form criteria relate to the medium, design and presentation of the resource. Some evaluation of the form can be made by considering the ease of navigation, provision of user support, and design. Resources should rarely be rejected on design points alone, but there may be factors which should be mentioned in your description of the resource (e.g. if a resource comes in a form that some users will not be able to access).

5. Evaluate the processes set up to support the resource

Process criteria relate to the fact that Internet resources can be volatile and can lack integrity. Some evaluation of the processes set up to support a resource is necessary. These may involve personnel as well as computer systems. You need to evaluate the likelihood that a resource will be adequately maintained over time and that it will remain current and stable.

Quality resources can now be added to SOSIG via the WWW catalogue form
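
The five steps above can be read as a short decision procedure. The sketch below is one possible encoding of it and is not SOSIG's own software. The function name, the dictionary keys and the rule that a failed process check leads to rejection are all assumptions made for illustration. The judgements themselves still come from a human evaluator, supplied here through the `assess` callable.

```python
def evaluate(resource, collection_urls, in_scope, assess):
    """Walk one resource through the five steps; return (decision, notes).

    in_scope: callable implementing the scope policy (step 1).
    assess:   callable returning a dict of human judgements with the
              boolean keys 'content', 'form' and 'process' (steps 3-5).
    """
    notes = []
    # Step 1: scope is the first filter.
    if not in_scope(resource):
        return "reject", ["outside scope"]
    # Step 2: avoid duplicating a resource already in the collection.
    if resource["url"] in collection_urls:
        return "reject", ["already catalogued"]
    judgement = assess(resource)
    # Step 3: content criteria take precedence over form criteria.
    if not judgement["content"]:
        return "reject", ["content below standard"]
    # Step 4: form problems are noted in the description, not grounds
    # for rejection on their own.
    if not judgement["form"]:
        notes.append("mention access/design limitations in the description")
    # Step 5 (assumption): a resource unlikely to be maintained is rejected.
    if not judgement["process"]:
        return "reject", ["unlikely to remain current and stable"]
    return "accept", notes


existing = {"http://example.org/known"}
decision, notes = evaluate(
    {"url": "http://example.org/new", "subject": "sociology"},
    existing,
    in_scope=lambda r: r["subject"] == "sociology",
    assess=lambda r: {"content": True, "form": False, "process": True},
)
print(decision, notes)
```

Here the resource is accepted despite a weak form judgement, and the cataloguer is reminded to mention the limitation in the resource description.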


Skills and training required by gateway staff in selection and evaluation
 

The choices made by the staff who select resources for a gateway will determine the nature of the collection. Recruitment and training of staff will therefore be critical for your gateway.

Recruiting staff

Subject gateways typically employ librarians or subject specialists to select Internet resources to add to the gateways. This reflects an acceptance that to build a high quality collection you need:

  • a good understanding of the information needs of your target user group
  • to base selection on semantic judgements about the relevance and value of resources to your users
  • to have knowledge and expertise in the subject
  • to have knowledge and experience of information resources
  • skills in critical evaluation of information resources

Recruiting skilled and knowledgeable staff will help ensure the integrity of the gateway collection.

Training staff

Staff will need to be consistent in their selection criteria if the collection is to develop consistently. They will need to be familiar with the scope and selection criteria of your gateway, but will also need to develop skills for evaluating Internet resources. Training staff may involve:

  • 'editorial meetings' - where all the selection staff discuss the criteria to be used
  • creating a staff manual - giving staff paper or online copies of the selection policy
  • developing exercises and examples based on Web sites to evaluate
  • asking staff to complete the 'Internet Detective' online tutorial
  • monitoring the sites selected by new staff to check they comply with the selection policy
  • setting up an email list for all staff to discuss and debate any quality issues that arise

Changing your selection criteria over time
 

A selection policy may need to be updated, because the priorities for selection can change over time as a gateway collection matures.

Adapting scope policies

A new gateway may wish to focus on developing a core collection very quickly before broadening the parameters. The scope may be much narrower in the early stages of collection development. For example, a new gateway may set narrow parameters for things such as:

  • granularity (e.g. focus on Web sites as opposed to Web pages)
  • subjects covered (e.g. prioritise generic resources over resources for very rarely researched subjects)
  • geographic boundaries (e.g. focus on UK resources before adding those from elsewhere)
  • types of resource (e.g. focus on Web sites as opposed to mailing lists or newsgroups)

A more mature gateway on the other hand may broaden its scope once a core collection has been developed to include resources beyond the very narrow scope initially used. It may choose to extend its subject coverage, work at a finer level of granularity or include resources from different countries and of different types. These decisions should be reflected in the scope policy of the service.

Adapting selection criteria

The Internet offers uneven coverage of subjects, and this may affect the quality selection criteria used within different parts of a gateway collection.

For example, if a subject comes within the scope of the gateway but very few resources can be found about that subject, it may be that less stringent quality criteria should be used, to ensure that there is at least some subject coverage.

Conversely, if there are many resources available for a subject, then very stringent quality criteria may be used to ensure that the highest quality resources are selected in preference to others with the same subject coverage.
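
One way to picture this is as a quality threshold that moves with the number of resources found for a subject. The sketch below assumes resources are given a score out of 10 by an evaluator. The function name, the cut-off counts and the scores are illustrative assumptions, not recommended values.

```python
def quality_threshold(resources_found, base=6, scarce_below=5, plentiful_above=50):
    """Return the minimum quality score (out of 10) for a subject area.

    All numbers are illustrative assumptions, not recommended values.
    """
    if resources_found < scarce_below:
        return base - 2  # relax the bar so the subject has some coverage
    if resources_found > plentiful_above:
        return base + 2  # raise the bar so only the best are selected
    return base


for count in (2, 20, 200):
    print(count, quality_threshold(count))
```

A gateway using a rule like this would want to record it in its selection policy, so that users understand why the standard varies between subject areas.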

These issues relate to collection management, which is discussed in the Collection Management chapter of this handbook.


Quality ratings/labelling/PICS and other initiatives in this area
 

The Web and metadata communities have been exploring the potential for automated approaches to quality-related aspects of information management on the Internet. The main aim has been to create a system where the quality of an Internet resource can be described in a machine-readable form. If this were to be achieved a number of scenarios would become possible. For example:

  • search engines could retrieve or rank resources according to aspects of their quality
  • users could search for resources using particular quality requirements (e.g. only peer reviewed journals, or resources that work with version 3.1 of Netscape, or resources that have been approved by a librarian)
  • users could recommend and rate Internet resources in a standard format and share these ratings
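
To make the first two scenarios concrete, the sketch below filters and ranks a handful of records by machine-readable quality attributes. The record fields (`peer_reviewed`, `rating`) and the example URLs are hypothetical. No agreed vocabulary of this kind is assumed to exist.

```python
# Hypothetical records carrying machine-readable quality attributes.
records = [
    {"url": "http://example.org/journal", "peer_reviewed": True, "rating": 4},
    {"url": "http://example.org/homepage", "peer_reviewed": False, "rating": 5},
    {"url": "http://example.org/review", "peer_reviewed": True, "rating": 5},
]


def search(records, **required):
    """Keep records matching every required attribute, best rating first."""
    hits = [r for r in records
            if all(r.get(key) == value for key, value in required.items())]
    return sorted(hits, key=lambda r: r["rating"], reverse=True)


results = search(records, peer_reviewed=True)
print([r["url"] for r in results])
# ['http://example.org/review', 'http://example.org/journal']
```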

There have been two main challenges:

  1. Creating the technological infrastructure to support machine-readable quality ratings.
  2. Creating metadata vocabularies to describe various quality attributes of Internet resources.

PICS and RDF

PICS and RDF both aim to provide a technological infrastructure to support machine-readable quality ratings.

PICS stands for Platform for Internet Content Selection. It has been approved by the W3C (World Wide Web Consortium) as an agreed standard for associating labels (metadata) with Web sites or Web pages. Essentially, these labels refer to the information content of the sites, and therefore provide a means of recording information about aspects of their quality. PICS has most famously been used to support the development of services that aim to protect children from X-rated sites on the Internet.

RDF stands for Resource Description Framework and is a standard approved by the W3C. It has emerged as a successor to PICS, offering a broader infrastructure for assigning metadata labels to Internet sites and pages. RDF can be used with many different metadata vocabularies, and certainly there is potential for it to be used with a vocabulary that describes the quality of an Internet resource.

Metadata vocabularies for quality

The second challenge has been to create metadata vocabularies to describe various quality attributes of Internet resources. At the time of writing no vocabulary has emerged but work is under way, particularly within the medical community, to create metadata labels for quality that can be incorporated into Internet resource discovery services.

With the basic RDF framework in place, it is now possible for different communities to create their own quality vocabularies and apply them to their own services.
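
As a rough illustration of what such a community vocabulary might look like in use, the sketch below writes quality statements about a resource as RDF triples. It uses N-Triples, a simple line-based RDF serialisation, purely for brevity. The namespace `http://example.org/quality#` and the property names are invented for this example and do not refer to any existing vocabulary.

```python
QUALITY_NS = "http://example.org/quality#"  # hypothetical vocabulary namespace


def escape(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def quality_triples(resource_url, ratings):
    """Return one N-Triples line per quality statement about the resource."""
    lines = []
    for prop, value in sorted(ratings.items()):
        lines.append('<%s> <%s%s> "%s" .'
                     % (resource_url, QUALITY_NS, prop, escape(str(value))))
    return "\n".join(lines)


label = quality_triples(
    "http://example.org/journal",
    {"peerReviewed": "true", "reviewedBy": "subject librarian"},
)
print(label)
```

Each line states one property of the resource, so a search service that understood the vocabulary could read the statements and act on them.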

How does this work relate to Information gateways?

This work has the potential to offer gateways a number of interesting possibilities, for example:

  • Internet cataloguers may use quality ratings to help them find high quality resources to add to their gateway
  • gateways may create machine-readable quality labels
  • they may incorporate user ratings into their services

The missing link, as things stand, is the development of quality vocabularies. Gateways may see it as their role to create such vocabularies and to use RDF to create machine-readable metadata about the quality of Internet resources. At present we cannot offer an example of a gateway doing this, but some key sites where new developments will appear are listed below.

E X A M P L E

Examples of recent work with PICS and quality ratings


Glossary
 

DutchESS - Dutch Electronic Subject Service
EELS - Engineering Electronic Library Sweden
PICS - Platform for Internet Content Selection
RDF - Resource Description Framework
SOSIG - Social Science Information Gateway


References
 

DutchESS, http://www.konbib.nl/dutchess/

EELS, http://www.ub.lu.se/eel/

European Link Treasury, http://www.en.eun.org/news/european-link-treasury.html

Information Quality WWW Virtual Library, http://www.ciolek.com/WWWVL-InfoQuality.html

Internet Detective, http://www.sosig.ac.uk/desire/internet-detective.html

Länkskafferiet (Link Larder), http://lankskafferiet.skolverket.se/information/kvalitetskriterier.html

PICS Home Page, http://www.w3.org/PICS/

RDF Home Page, http://www.w3.org/RDF/

Scout Report, http://scout.cs.wisc.edu/index.html

SOSIG, http://www.sosig.ac.uk/

J. Alexander & M. A. Tate, Evaluating Web Resources,
http://www2.widener.edu/Wolfgram-Memorial-Library/webeval.htm

C. Armstrong, 'Metadata, PICS and Quality', Ariadne, Issue 9, 1997,
http://www.ariadne.ac.uk/issue9/pics/

N. Auer, Bibliography on Evaluating Internet Resources
http://www.lib.vt.edu/research/libinst/evalbiblio.html

D. Brickley, T. Gardner, R. Heery & D. Hiom, Recommendations on Implementation of Quality Ratings in an RDF Environment.
http://www.desire.org/html/research/deliverables/D3.2/

A. Cooke, Finding Quality on the Internet: a guide for librarians and information professionals,
(London: Library Association Publishing, 1999. ISBN: 1-85604-267-7).


Credits
 

Chapter author: Emma Place

With contributions from: Michael Day, Debra Hiom, Ann-Sofie Zettergren


2.2. Resource discovery

In this chapter...
 
  • the resource discovery process - ensuring new Internet resources are found to add to your gateway
  • systems for gateway managers - to support efficient resource discovery within your team
  • strategies for gateway staff - to continuously locate high quality resources on the Internet
  • case studies - resource discovery tips and hints from existing gateways
  • new and mature gateways - different resource discovery issues for different gateways

Introduction
 

Subject gateways should aim to describe the best resources that the Internet has to offer in their field and for their target audience. They need to:

  • point to the highest quality networked resources currently available
  • point to new networked resources as they appear

Finding high quality resources on the Internet can be a time-consuming job, which is of course exactly why gateways exist: to save the end-user some of the time and commitment required to discover and retrieve high quality information on the Internet.

Locating resources to add to your gateway will require one of the biggest investments of staff time and effort, and so it is important to find efficient and effective methods of working at this task:

  • gateway managers need to ensure that systems to support resource discovery are in place
  • individual gateway staff need to develop their own strategies for locating as many high quality resources as efficiently as possible

Resource discovery issues for gateway managers
 

Gateway managers will need to provide the systems and strategies to support efficient resource discovery within their team.

Resource discovery is labour-intensive and efficient strategies can help to maximise the number of resources added to the gateway. This section suggests some of the systems that managers can put in place to support efficient resource discovery within the team:

  1. Avoid duplicated effort.
  2. Find the right people for the job.
  3. Provide training in resource discovery.
  4. Set up support systems for resource discovery staff.
  5. Set up systems to encourage your user community to suggest resources.

1. Avoiding duplicated effort

Duplicated effort can be wasted effort. There are issues of duplication:

  • between gateways
  • within the team

Avoid duplication with other gateways

It is worth finding out whether other gateways already describe Internet resources in your field. If there are other gateways you have to ask yourself whether it really makes sense to spend time and effort cataloguing the same resources twice. If existing gateways are already describing resources relevant to your users you should consider:

  • collaboration with other gateways (to avoid cataloguing the same resources twice)
  • cross-searching your gateway with other gateways so that your users can search more than one simultaneously
  • sharing metadata records
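
Before deciding how to collaborate, it can help to measure how far two collections already overlap. The sketch below compares lists of URLs from two gateways after a simple normalisation. The function names and example URLs are assumptions, and the normalisation is deliberately crude: it ignores the scheme, port and query string and treats paths as case-sensitive.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to lower-cased host plus path without a trailing slash."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/") or "/"
    return host + path


def overlap(our_urls, their_urls):
    """Return the URLs in our collection that the other gateway also holds."""
    theirs = {normalise(u) for u in their_urls}
    return [u for u in our_urls if normalise(u) in theirs]


ours = ["http://www.example.org/data/", "http://example.net/stats"]
theirs = ["http://WWW.Example.org/data", "http://example.com/"]
print(overlap(ours, theirs))  # ['http://www.example.org/data/']
```

A large overlap suggests that sharing records or cross-searching would save more effort than cataloguing independently.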

Cross reference
Co-operation between gateways

Avoid duplication within your team

Time can be wasted if members of your team are all trawling the same sources. Consider developing a team strategy for resource discovery, for example by:

  • giving people different subject responsibilities - so they are each hunting for resources in a different discipline
  • giving people different monitoring responsibilities - so they are each monitoring different sources (email lists/URLs/current awareness services etc.)
E X A M P L E

Example of a team dividing resource discovery responsibilities

SOSIG has divided responsibilities among the team of core staff and section editors as follows:

Section Editors: each have responsibility for a particular SUBJECT area
Central staff: have responsibility for trawling generic sources and for monitoring suggestions of sites sent in by users

See: http://www.sosig.ac.uk/contact.html


2. Find the right people for the job

Financial and political considerations will determine whom you can take on to do the job of resource discovery, just as they do when recruiting staff for cataloguing.

Cross reference
Subject indexing and classification, Distributed cataloguing

Volunteers?

Pros: may be cheap and plentiful

Cons: may be inconsistent and unreliable in their contribution and it may be difficult to find volunteers with the subject expertise to select the high quality resources you want

Subject specialists?

Pros: may know of the best sources to use to discover relevant resources for your gateway and should be able to assess resources effectively, given their subject knowledge.

Cons: may be expensive, short of time, difficult to recruit and unable or unwilling to spend time cataloguing

Librarians/information professionals?

Pros: have training in selecting resources to meet the information needs of users and also may be able to catalogue resources in addition to selecting them, since they may have training in cataloguing/information retrieval issues.

Cons: may be expensive/difficult to recruit

R E M E M B E R
  • Internet skills can be taught more easily than subject expertise!
  • Librarians may be more willing and able to catalogue resources than to discover them

3. Provide training in resource discovery

The Internet is always growing and changing, so there are always new tips and hints to be learned in Internet resource discovery - training staff can improve skills and effectiveness. Training may include:

  • offering lists of sources for staff to use
  • offering demonstrations and hands-on work with different resource discovery tools
  • brainstorming ideas within the team to share resource discovery strategies

4. Set up support systems for resource discovery staff

The following are ideas for support systems for resource discovery staff:

  • create Web documents that list resource discovery strategies appropriate to your gateway
  • set up a mailing list for resource discovery staff so that the team can share knowledge of any useful new sources or techniques they find - and so they can talk about issues that arise
  • set up meetings for resource discovery staff to share stories of successful and unsuccessful strategies which they have found.
E X A M P L E

Example of a support system for gateway staff

  1. SOSIG has created a Web page for section editors, which lists possible resource strategies: 'Finding Internet resources for SOSIG: strategies and sources'
  2. A mailing list has been set up for section editors to share news of any new, effective strategies they discover.
  3. Twice a year the section editors come together and compare experiences of the most effective and the most ineffective (!) resource discovery strategies.

5. Set up systems to encourage your user community to suggest resources

Why not let the resources come to you? Encourage your users to send you details of any sites which they think should be added to the gateway. You will need:

  1. to publicise an email address or Web form for submissions
  2. to publicise your scope and selection criteria

Cross reference
Quality selection

Tips
  • Web forms are great because they encourage users to generate the appropriate metadata - and they may have good ideas about keywords and descriptions
  • make sure your selection criteria are freely available, to try to discourage inappropriate resources from being submitted and to make it clear that not all submissions will be accepted
  • a quick thank-you message to users is good PR and can encourage them to submit again. If you are getting a lot of submissions - create a standard courtesy reply
  • publicise the fact that you welcome submissions from your user community. If you run an email list associated with your gateway (see Publicity and promotion), you can send out occasional reminders to subscribers

E X A M P L E

Examples of Web forms for users to submit resources


Resource Discovery Strategies for Staff
 

Gateway staff do the 'leg work' for SOSIG users - joining the lists, monitoring the sites and doing the searches that many users do not have the time to do, filtering out items that are of poor quality or irrelevant to the users.

It's easy to waste time when surfing the Internet - gateway staff need to develop efficient and effective strategies for locating high quality Internet resources. Some strategies are suggested below.

Resource discovery tools and methods

  1. Browsing strategies
  2. Mailing lists and their archives
  3. Distribution lists and current awareness services
  4. Search tools
  5. Newsgroups and discussion forums
  6. URL-minders and Web agents
  7. Non-Internet sources

1. Browsing strategies

One of the richest sources of resources will be existing Web pages - especially authoritative ones in your field which list related or recommended resources. Trawling these sites is the equivalent of citation pearl-growing or snowballing, traditionally done by researchers looking for references - if they find one useful resource, they will follow the references from that resource to find others.

Trawling home pages of known experts

If you know of experts in your field, do a search to see if they have their own Web page. You may find that:

  1. They have published their work on the Web.
  2. They have collected a list of links (and, given their knowledge and expertise, they will be worth checking out!)

Bookmark any that look as if they may be developed over time, so that you can check them again in the future.

Trawling organisational home pages

Many organisations now have their own Web sites. These can be useful in two ways:

  1. They may include primary resources for you to catalogue.
  2. They may have lists of links selected by people with subject knowledge which you could trawl.

Consider which organisations are relevant to your audience and try to keep in touch with developments concerning them.

Tips

Take time to do a search for the most relevant organisational sites for you and organise them in a bookmark folder, so you can take a look at them periodically. Only bookmark the best - you won't have time to trawl too many.


If you are creating a gateway for an academic audience then it can pay to monitor university Web pages. Look for:

  • library Web sites - as many librarians are now building collections of Internet links
  • academic departments' Web sites - where lecturers and researchers may publish their work or may create lists of links
E X A M P L E

Examples of some starting points useful for academic gateways:


Trawling subject-based sites

Many sites have a section of 'links' which can be mined for new resources. The better the quality of the original site, the better the related links are likely to be:

  • find the most important sites in your field and look at all the links they recommend
  • look for 'What's New' or 'Latest News' features on trusted sites
  • bookmark these link pages or 'What's New' pages to check regularly, or consider putting the URLs into a Web Agent or URL-minder (see below) so that they can let you know when anything new is added
E X A M P L E

Examples of the types of pages that could be bookmarked or monitored by a minder/agent:


2. Mailing lists and their archives

Joining and monitoring email lists/checking mailing list archives

People often use email lists to announce new resources they have made available on the Internet.

You have two possible strategies here:

  1. Joining the lists and reading messages via your email
  2. Bookmarking the Web archives of the lists (if they have them) and making periodic checks on them
Tips

Don't join so many lists that your own email becomes unmanageable. If you can, filter your email so that messages from lists don't get mixed up with all your other mail. For very busy email lists it is probably more time-effective to make a regular scan of the archives. Set up a bookmark file for 'Archives to Check Regularly'


Subject-based lists

If you can find a list that is relevant to your subject area and audience, you have a rich source. In the early days it's worth doing a search for relevant lists and asking colleagues to recommend them.

E X A M P L E

Examples of sites which can help you to find mailing lists

  • Liszt - Directory of email groups and discussion lists
    A directory of email groups and discussion lists, including listserv, listproc, majordomo and Mailbase lists. Also offers a directory of newsgroups. The search facility makes this a quick way of finding lists on a particular subject.
  • Mailbase - The UK's major electronic mailing list service
  • The Directory of Scholarly and Professional E-Conferences - A directory designed to list the Internet communication groups and services likely to be of interest to academics and professionals.

Generic email lists that announce new Internet sites

A number of email lists exist to alert people to new Internet sites. Be warned - these lists can be prolific!

3. Distribution lists and current awareness services

Internet current awareness services come in different forms and are becoming more sophisticated. Free email subscription services will send you updates, bulletins and email publications on a regular basis. It may be worth subscribing to services that are run by key individuals or organisations in your subject area. Other services are emerging where you can create your own personal profile on the Web, which the service then uses to email you incoming information that is likely to interest you.

E X A M P L E

Examples of current awareness services


4. Search tools

Searching the Internet can be time-consuming, since many of the search tools retrieve huge numbers of hits which take a lot of time to work through. However, searching can be a good strategy in some cases:

  • targeted searching, i.e. looking for a specific resource
  • building up a specific section of your collection

In our experience, search engines can be a waste of time if broad search terms such as 'social psychology' are used. Highly focused searching based on known sources, however, can be fruitful. For example, if you have a list of well-respected journals or organisations in your field, you could search for them by name, to see whether they have a presence on the Internet. A number of hints for finding the leads for focused searching are recommended:

  • use other sources, e.g. directories, to find things to search for
  • use a subject-specific site to get lists of dates/organisations/names to search on
  • search for Internet equivalents of printed materials, e.g. scholarly journals or academic publishers
  • search for specific dates or people
  • search for important organisations to see if they are publishing anything of value on the Internet
  • use leads from your knowledge of the field

Search Engines

These are good for finding LOTS of information and for finding very precise pieces of information (so if you know exactly what you're after they can be very effective).

Tips

Get to know how to use one search engine very well, rather than lots of them very badly. Take time to read the Help pages for the search engine and learn how to use the Advanced Search options.

Be aware that search engines change over time and that different ones are more effective for searching for different types of information - do some research to find the best one for your needs.

Bookmark complex searches so that you can run them again periodically to see if anything new has appeared.

E X A M P L E

Examples of ways to find out about Internet search tools


5. Newsgroups and discussion forums

Internet discussion forums are a powerful and fun way to communicate with people around the world who are interested in the same things as you. Thanks to the Internet's rapid growth and the exploding popularity of the World Wide Web, people from all walks of life now participate on a regular basis.

E X A M P L E

Example of a source for Newsgroups

DejaNews offers access to tens of thousands of Usenet groups and discussion forums. It can help you to find those forums relevant to your user groups, but it may also be worth following a few yourself to see if any other Internet resources are talked about that would be appropriate for your gateway.

6. URL-minders and Web agents

Some free Web services exist that help you to monitor changes made to Internet resources or to inform you of new sites that might interest you. You register the URLs of the sites you wish to monitor or search queries you would like to have done and the service sends you an email whenever a change is made to these resources or the search yields new results.

E X A M P L E

Examples of URL-minders and Web agents


Remember that these are automated services and will not always yield high quality results.

Tips
  • Remember that the more URLs you register, the more email you will get - so don't set up more than you can cope with! If you can, set up email filters to separate these messages from the rest of your mail.
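
The kind of monitoring described in this section can be illustrated with a short sketch. This is not a description of how any particular URL-minder or Web agent works; it simply assumes that a change can be detected by comparing a hash of the page content with the hash stored at the previous check. All function names and the example.org URL are hypothetical.

```python
import hashlib
from urllib.request import urlopen


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest that changes whenever the page bytes change."""
    return hashlib.sha256(content).hexdigest()


def fetch(url: str) -> bytes:
    """Download a page (needs network access; not called in the demo below)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def changed_urls(known: dict, pages: dict) -> list:
    """Compare fresh page content against stored fingerprints.

    `known` maps URL -> fingerprint from the last check (updated in place).
    `pages` maps URL -> freshly fetched bytes.
    Returns the URLs that are new or whose content differs.
    """
    changed = []
    for url in sorted(pages):
        digest = fingerprint(pages[url])
        if known.get(url) != digest:
            changed.append(url)
            known[url] = digest
    return changed


# Demo with in-memory pages instead of real downloads.
URL = "http://example.org/whatsnew.html"
known = {}
first = changed_urls(known, {URL: b"<p>version 1</p>"})
second = changed_urls(known, {URL: b"<p>version 1</p>"})
third = changed_urls(known, {URL: b"<p>version 2</p>"})
```

The first and third checks report the page; the second does not. A byte-level comparison like this will also flag trivial changes such as a new date stamp or advertisement, which is one reason automated results need filtering by staff.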

7. Non-Internet sources

You don't have to use the Internet to learn about Internet sites. Consider using non-Internet sources:

  • talk to people - your users/experts in your field/Internet enthusiasts and get their recommended sites
  • look at the bookmarks of these people if they publish them on the Web - if not, then ask them to let you get access to them another way
  • scan printed publications e.g. specialist journals, newspapers, newsletters, magazines
  • watch out for URLs - which are increasingly appearing everywhere from billboards to TV to the side of cornflake packets!
R E M E M B E R

It's chaos out there so don't expect resource discovery to be without its problems:

  • expect information overload and develop systems to manage it effectively
  • let serendipity play a role
  • be open to adopting new strategies and changing your old ways - the Internet is always changing
  • be open minded - take the Alexander Fleming attitude - there may be millions of petri dishes containing nothing more than a load of jelly, but keep your wits about you - what looks like a mould may turn out to be penicillin!

Issues for new gateways
 

New gateways may have different priorities for resource discovery from mature gateways as they will be focussing on developing a core collection very quickly. New gateways may want to consider the following issues:

  • target efforts to make sure that you include the most important resources first
  • balance the collection to ensure you have at least a few resources for all the subject areas you cover
  • divide responsibilities among your team
  • don't duplicate other gateways
  • be absolutely clear about your scope and selection criteria before you start the resource discovery process

Issues for mature gateways
 

Mature gateways will have already developed a core collection and may have widened their scope. Staff will need to adjust their resource discovery strategies in line with this. Mature gateways may consider the following issues:

  • collection management - you need to ensure that all the different subject areas within your collection are growing at the same rate - target efforts at areas that are falling behind and require development.
  • ensure that all areas of the collection are comparable in quality
  • focus on strategies for finding new resources AS THEY APPEAR
  • build your community - to encourage more submissions from users and information providers

Cross reference
Quality selection; Changing your selection criteria over time


Glossary
 

DutchESS Dutch Electronic Subject Service
EEVL Edinburgh Engineering Virtual Library
EUNI List of European Universities, provided by Adminet in France
SOSIG Social Science Information Gateway
URL-minder a service based in California, USA, which enables you to track changes made to Web sites and URLs


References
 

College and University Home Pages (world-wide), http://www.rirr.cnuce.cnr.it/universities/univ.html

Dejanews, http://www.dejanews.com/

The Directory of Scholarly and Professional E-Conferences, http://www.n2h2.com/KOVACS/

DutchESS, http://www.konbib.nl/dutchess/

EEVL, http://www.eevl.ac.uk/

EUNI - List of European Universities, http://www.ensmp.fr/~scherer/euni/euni_list.html

The Informant, http://informant.dartmouth.edu/

Library and Related Sources, http://www.exeter.ac.uk/~ijtilsed/lib/wwwlibs.html

Liszt, http://www.liszt.com/

Mailbase, http://www.mailbase.ac.uk/

Mind-it, http://mindit.netmind.com/

NewJour: Recent Issues, http://gort.ucsd.edu/newjour/nj2/

Search Engine Corner, http://www.ariadne.ac.uk/issue19/search-engines/

Search Engine Watch, http://searchenginewatch.com/

Manchester Metropolitan University's Department of Information and Communications Search Tools, http://www.mmu.ac.uk/h-ss/dic/main/search.htm

The Social Science Research Grapevine, http://www.grapevine.bris.ac.uk/

SOSIG, http://www.sosig.ac.uk

What's New in WWW Social Sciences Online Newsletter, http://www.mmu.ac.uk/h-ss/dic/main/search.htm

'What's New' on the Web server of the European Union, http://europa.eu.int/geninfo/whatsnew.htm

A. S. McNab & I. R. Winship, How to find out about new resources on the Internet, The New Review of Information Networking (1995), 147-53.

Association of Public Data Users and International Association for Social Science Information Service and Technology (IASSIST), Strategies for Searching for Information on the Internet.
http://dpls.dacc.wisc.edu/www_searchers.html

TERENA & M. Isaacs, Internet Users' Guide to Network Resource Tools, Addison Wesley Longman: 1998

E. Worsfold, Finding Internet resources for SOSIG - strategies and sources, 1997
http://sosig.ac.uk/desire/esig.html


Credits
 

Chapter author: Emma Place

With contributions from: Lisa Gray (OMNI), Debra Hiom (SOSIG), Linda Kerr (EEVL), John Kirriemuir (OMNI), Roddy MacLeod (EEVL), Kate Sharp (Biz/ed)


-2.3. Metadata formats

In this chapter...
 
  • why create metadata records?
  • types of metadata attributes
  • standard metadata formats
  • choosing metadata attributes and formats for your gateway
  • format conversion and future proofing

Introduction
 

Information gateways are characterised by their creation of third-party metadata records - individual descriptions of Internet resources held in a database that have separate fields for different attributes of the resources, such as title, author, URL etc. These resource descriptions are used to:

  • help users learn more about the Internet resources (from a trusted third-party)
  • support information search and retrieval

Gateways adopt the approach where metadata is created by a third party, i.e. an independent subject specialist or information professional, rather than by the creator of the resource. This enables the quality control for which gateways are renowned - the resource descriptions all assume a standard format and are generated manually (at least in part), producing high quality metadata that benefits from semantic judgements about the nature and origin of the resources.

The metadata created by gateways is their greatest asset - adding value to the Internet resources by creating independent, standardised third-party descriptions.

The decision of which metadata format to use is an important one as it impacts on the searching capabilities of the gateway and the value of the descriptions to the end-users. The creation of metadata will be one of the most time-consuming tasks in running a gateway and so a balance between value and cost may be required in deciding on a format.

This chapter will introduce some of these issues and provide some background information that information gateway managers will need to consider when choosing a metadata format for their gateway.


Why create metadata records?
 

Information gateways are services that give access to networked resources in particular subject areas, linguistic domains, and so on. Many Internet portals simply comprise lists of hyperlinks on static Web pages, perhaps with annotations. However, this approach has distinct disadvantages:

  • the portal can be browsed, but with no database it cannot be searched effectively
  • maintaining the portal is time consuming as all edits and additions require manual changes to the HTML

Gateways take advantage of database technologies, which get over both these problems but require that a standard format be used for creating and storing the resource descriptions. Metadata formats are structured formats for Internet resource descriptions. For gateways, the metadata formats are the forms or templates that need to be filled in by the cataloguers to create a resource description.

The use of metadata by an information gateway has many benefits over the simple HTML list approach, for example:

  • the metadata has structure and so can form the basis of far more advanced search facilities within a gateway (e.g. fielded searching, such as searching by title or author)
  • the metadata can be converted to other formats or be otherwise persuaded to interoperate with different search and retrieve protocols
  • it is easier to maintain a database of resource descriptions than a large number of HTML files. Administrative metadata can also be used to record when resources need to be re-evaluated or removed from the database
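
The first of these benefits, fielded searching, can be illustrated with a small sketch. It assumes that each resource description is held as a set of named fields; the records, field names and function names below are invented for illustration and do not come from any particular gateway software.

```python
# Hypothetical records: titles, authors and descriptions are invented.
RECORDS = [
    {"title": "Economics Working Papers", "author": "A. Smith",
     "description": "Full-text papers on trade and growth."},
    {"title": "Social Policy Digest", "author": "B. Jones",
     "description": "News summaries, including economics coverage."},
]


def fielded_search(records, field, term):
    """Return records whose named field contains the term (case-insensitive)."""
    needle = term.lower()
    return [r for r in records if needle in r.get(field, "").lower()]


def full_text_search(records, term):
    """Return records where any field at all contains the term."""
    needle = term.lower()
    return [r for r in records
            if any(needle in value.lower() for value in r.values())]


by_title = fielded_search(RECORDS, "title", "economics")
anywhere = full_text_search(RECORDS, "economics")
```

Searching the title field returns only the first record, while searching every field returns both. A flat HTML list cannot make this distinction because it has no notion of which text is the title.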

Metadata attributes
 

Gateway staff will need to agree on the attributes of an Internet resource that they wish to describe. Metadata attributes can be grouped into various kinds according to their use within the gateway. They might include:

Descriptive

Descriptive metadata contain information which may be usefully returned from a search of the gateway. A user may be able to decide from this information whether it is worth spending time looking at the resource itself.

  • title
  • short title (e.g. an acronym of the full title)
  • alternative title (e.g. title of resource in another language)
  • subtitle
  • description
  • URI (or other location)
  • author
  • language
  • character set encoding
  • organisation - either creating or hosting the resource
  • medium (e.g. text/images/audio/video)
  • type of resource (using types appropriate to your gateway)
  • physical medium
  • copyright owner
  • availability (is payment or registration needed?)
  • software required for access (e.g. specific browsers, MIDI software)
  • quality rating
  • intended audience (e.g. undergraduate level)

Subject

Subject metadata can facilitate effective searching. They can also be used to organise the browsing structure of your gateway. A fuller discussion can be found in the chapter on subject indexing and classification.

Cross reference
Subject indexing and classification

  • keywords
  • classification code
  • classification system - must accompany classification code!
  • terms from thesauri
  • subject headings

Administrative

Administrative metadata are intended primarily to assist the gateway staff in maintaining the gateway. They are of less concern to users and may not be visible to them; however, they can be used, for example, to check that resource descriptions are still current.

  • resource maintainer
  • date of addition of resource to gateway
  • date record was last updated
  • date resource was last changed
  • review-by date
  • expiry date (e.g. of a conference announcement)
  • submitter of resource
  • cataloguer of resource
  • origin of record (if gateway has collaborators)
  • rights ownership
E X A M P L E

ROADS templates contain relatively simple administrative metadata attributes like the following:

To-Be-Reviewed-Date:
Record-Last-Verified-Email:
Record-Last-Verified-Date:
Comments:
Record-Last-Modified-Date:
Record-Last-Modified-Email:
Record-Created-Date:
Record-Created-Email:


Consideration of which particular administrative functions are required and an assessment of which particular administrative metadata elements are needed will be an important part of choosing (or adapting) a metadata format for use in a particular information gateway.
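
Administrative attributes written as 'Name: value' lines, like those in the ROADS example above, are straightforward to read by program. The sketch below is a minimal illustration, assuming one attribute per line; it is not the ROADS software's own parser and ignores any features of real templates beyond simple pairs. The sample values (dates and the example.org address) are invented.

```python
# Invented sample values; only the attribute names come from the example.
SAMPLE = """\
To-Be-Reviewed-Date: 2000-01-15
Record-Last-Verified-Email: editor@example.org
Record-Last-Verified-Date: 1999-07-01
Comments: Checked links: all working
Record-Created-Date: 1998-11-20
"""


def parse_template(text):
    """Turn 'Name: value' lines into a dictionary of attributes."""
    record = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        if not separator:
            continue  # skip blank lines and anything without a colon
        # Split on the first colon only, so values may contain colons.
        record[name.strip()] = value.strip()
    return record


record = parse_template(SAMPLE)
review_date = record["To-Be-Reviewed-Date"]
```

Once the attributes are in a dictionary, housekeeping tasks such as listing records by review date become simple lookups.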

Core metadata

The possible metadata fields listed above are by no means exhaustive, but including them all would require considerable effort both in initial cataloguing and in keeping records up to date. Not all of them might be appropriate to your gateway.

Attempts have been made to define standards for a 'core' of metadata which should be regarded as a bare minimum. One such standard is the Dublin Core.

E X A M P L E

Dublin Core currently involves 15 core elements:

  1. Title
  2. Author or Creator
  3. Subject and Keywords
  4. Description
  5. Publisher
  6. Other Contributor
  7. Date
  8. Resource Type
  9. Format
  10. Resource Identifier
  11. Source
  12. Language
  13. Relation
  14. Coverage
  15. Rights Management

http://purl.oclc.org/dc/about/element_set.htm
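
A record restricted to these fifteen elements can be checked and written out mechanically. The sketch below is one possible illustration: it renders a record as HTML meta tags named 'DC.element', a commonly used way of embedding Dublin Core in a Web page. The short element names (creator, subject, contributor, type, identifier, rights) are the usual identifiers for the labels listed above but should be treated as an assumption here and checked against the current element set documentation; the sample record is invented.

```python
from html import escape

# Short identifiers assumed for the fifteen elements listed above.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_meta_tags(record):
    """Render a Dublin Core record as HTML meta tags, one per line."""
    unknown = sorted(set(record) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    lines = []
    for element in DC_ELEMENTS:  # keep a stable element order
        if element in record:
            content = escape(record[element], quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Invented record for illustration.
tags = to_meta_tags({"language": "en", "title": 'Trade & "Growth"'})
```

Rejecting unknown names is a design choice made for this sketch; a gateway that extends the core with local fields would need a looser check.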


ROADS offers a number of metadata templates designed for different types of Internet resources. Each template contains attributes specific to the type of Internet resource. For example, the template for describing a mailarchive will have a different set of fields from the template for describing a Web document. ROADS also maintains a 'template registry' where the metadata fields used in the various kinds of ROADS templates are recorded. This ensures that ROADS services are potentially interoperable in this area. New fields can be nominated for addition to the registry.

E X A M P L E

ROADS offers metadata formats for the following types of Internet resource:

ROADS template-types:

COLLECTION - experimental
DATASET
DOCUMENT
DUBLINCORE
EVENT - experimental
IMAGE
MAILARCHIVE
PROJECT
SERVICE
SOFTWARE
SOUND
TRAINING MATERIALS
USENET
VIDEO

http://www.ukoln.ac.uk/metadata/roads/templates/


Choosing metadata attributes
 

You should think carefully about which metadata attributes your gateway is going to use, and their format, when you first set up the gateway. If you do not, you may find yourself constrained by the absence of useful metadata, or have to add a new metadata field or convert an existing field to a different format when you already have several thousand resources in your database. Moreover, decisions about metadata will in turn affect the design of your interface (especially the parts of it used for cataloguing and/or submitting new resources for consideration).

Cross reference
Cataloguing

Which metadata fields could be usefully searched on by your users?

You should consider your potential user community and also the nature of the resources which your gateway will cover. For example, if your gateway is intended to cover only geographically local resources in one language, a 'language' field will not be very informative unless your gateway is going to be cross-searched with others elsewhere.

And how are they going to search them?

This will affect not only what metadata fields you provide but also the cataloguing rules you adopt. For example, if you are ranking search results by the frequency of occurrence of the search term, you may wish to make descriptions similar in length; otherwise resources with long descriptions may be more likely to be returned high up the order.

Cross reference
Subject indexing and classification
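
The effect of description length on frequency-based ranking can be seen in a small worked example. The sketch below is illustrative only: the two records are invented, and the 'normalise' option (dividing the count by the length of the description) is an alternative remedy introduced here for comparison, not a technique recommended in this handbook, which suggests keeping descriptions similar in length.

```python
import re


def tokens(text):
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def rank(records, term, normalise=False):
    """Order records by how often the term occurs in the description."""
    term = term.lower()

    def score(record):
        words = tokens(record["description"])
        count = words.count(term)
        if normalise and words:
            return count / len(words)  # occurrences per word
        return count                   # raw number of occurrences

    return sorted(records, key=score, reverse=True)


# Invented records for illustration.
RECORDS = [
    {"title": "Short record", "description": "Official statistics portal"},
    {"title": "Long record",
     "description": "A large collection of statistics covering many countries "
                    "with links to other statistics sites"},
]

raw_order = [r["title"] for r in rank(RECORDS, "statistics")]
normalised_order = [r["title"] for r in rank(RECORDS, "statistics", normalise=True)]
```

With raw counts the long record comes first (two occurrences against one); with length normalisation the short record comes first (one word in three against two in fourteen).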

Which metadata fields will be displayed to the users of the gateway?

Will they need to be converted from the form in which they are stored and if so does an easy way of converting them exist?

Which metadata fields will be used for housekeeping by the gateway staff and how?

Metadata can supply information for partially automating this otherwise laborious aspect of gateway management. For example, you can have an automatic email sent to maintainers of resources occasionally to ask whether they have made any changes, or set a web-page tracking tool to monitor changes to resources.

Cross reference
Collection management
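
As an illustration of metadata-driven housekeeping, the sketch below picks out records whose review date has passed and prepares a reminder message for each maintainer. It is a minimal example under several assumptions: the field names 'maintainer' and 'review_by' are hypothetical, dates are assumed to be stored as YYYY-MM-DD, and the messages are only constructed, not sent. All addresses and records are invented.

```python
from datetime import date
from email.message import EmailMessage

# Invented records for illustration.
RECORDS = [
    {"title": "Example Data Archive",
     "maintainer": "maintainer@example.org", "review_by": "1999-06-01"},
    {"title": "Example Journal",
     "maintainer": "editor@example.org", "review_by": "2000-06-01"},
]


def due_for_review(records, today):
    """Return records whose review-by date is today or earlier."""
    return [r for r in records
            if date.fromisoformat(r["review_by"]) <= today]


def reminder(record, sender="gateway@example.org"):
    """Build (but do not send) a message asking the maintainer about changes."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = record["maintainer"]
    message["Subject"] = f"Has '{record['title']}' changed recently?"
    message.set_content(
        "Your resource is listed in our gateway. Please let us know "
        "if its content or address has changed."
    )
    return message


messages = [reminder(r) for r in due_for_review(RECORDS, date(1999, 12, 31))]
```

Only the first record is due on the chosen date, so one message is prepared. Actually sending mail would need a mail server and is left out of this sketch.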

Which if any are optional?

If you are collaborating (or thinking of it), which metadata fields will be shared with your collaborators? Are they likely to want extra information, such as language, which you would not otherwise include in your metadata? You will need to use the same schemes for e.g. classification or have a usable crosswalk to convert between schemes. You should also think about the issue of copyright.

Cross reference
Co-operation between gateways, Interoperability
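
A crosswalk is, at its simplest, a lookup table from one scheme's names to another's. The sketch below applies the idea to field names when preparing a record for a collaborator; the same pattern would apply to classification codes. Both the local field names and the target names are hypothetical, and a real crosswalk would have to be agreed with the partner gateway.

```python
# Hypothetical crosswalk: local field name -> partner's field name.
CROSSWALK = {
    "title": "Title",
    "summary": "Description",
    "url": "Identifier",
    "keywords": "Subject",
    "lang": "Language",
}


def convert(record, crosswalk):
    """Rename fields using the crosswalk; report fields with no equivalent."""
    converted, dropped = {}, []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            dropped.append(field)   # nothing to map to: flag for a human
        else:
            converted[target] = value
    return converted, dropped


# Invented record for illustration.
local_record = {
    "title": "Example Data Archive",
    "url": "http://example.org/archive/",
    "lang": "en",
    "cataloguer": "jb",
}
shared, not_shared = convert(local_record, CROSSWALK)
```

Here the 'cataloguer' field has no equivalent and is reported rather than silently discarded, which makes it easier to decide which fields should stay private and which need a new mapping.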

Are you going to display your metadata in the same format as that in which you store it?

If not, you will need a way of converting between formats.

Can any of the software you are using generate useful metadata?

For example, ROADS automatically records when a template was last updated. You may also wish to use software for creating metadata (see below). Harvesting software, if used, may also be able to harvest metadata.

Cross reference
Harvesting, indexing and automated metadata collection

Who will generate metadata fields (and which ones)?

Metadata may be supplied by:

  • information providers
  • gateway users
  • cataloguers for the gateway
  • subject editors for the gateway
  • core gateway staff
  • another gateway working in collaboration with you
  • automatic generation by software

How much cross-checking will there be? (Time will need to be allowed for this).

If you are allowing gateway users or information providers to submit resources, what information should they supply?

What information may they also supply optionally? How important is it that (for example) descriptions or keywords are consistent across the gateway? If this is important, can you supply cataloguing rules or other guidance to help information providers and others who are submitting resources? How much effort can be expended on editing their contributions, given that gateway users and information providers cannot be compelled to follow your cataloguing rules?

Cross reference
Working with information providers

How might you ensure that information such as dates is in a consistent format? Possible methods include:

  • pulldown menus on forms
  • authority files
  • cataloguing rules

Cross reference
Cataloguing
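
Where dates arrive in several styles (for example from user submissions), they can be converted to a single stored form on entry. The sketch below is a minimal illustration, assuming that YYYY-MM-DD is the chosen storage format and that submissions use one of three known styles. The list of accepted styles is hypothetical, and the day-first reading of '01/03/1999' is a convention that must be stated in your cataloguing rules, since the same string means something different in month-first usage.

```python
from datetime import datetime

# Accepted input styles, tried in order (an illustrative choice).
KNOWN_FORMATS = (
    "%Y-%m-%d",   # 1999-03-01
    "%d/%m/%Y",   # 01/03/1999, read as day/month/year
    "%d %B %Y",   # 1 March 1999 (English month names)
)


def normalise_date(text):
    """Return the date as YYYY-MM-DD, or raise ValueError if unrecognised."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next style
    raise ValueError(f"unrecognised date: {text!r}")


stored = [normalise_date(d) for d in ("1999-03-01", "01/03/1999", "1 March 1999")]
```

All three inputs are stored as the same string. Rejecting anything unrecognised, rather than guessing, passes the problem back to a cataloguer, which is usually the safer choice.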

In what language are your metadata records going to be kept?

If this is different from the language of some of your resources, are you going to make any provision for searching in that language (e.g. an 'alternative title' field)?

Cross reference
Multi-lingual issues


Standard metadata formats
 

Information gateway managers will need to make decisions about which metadata format (or formats) to use within their service at a very early stage of its development. At present, however, the existence of a large and varied range of metadata formats and initiatives complicates these decisions.

It is worth remembering also that the choice of metadata formats will often be influenced by other factors, both technological and social. For example, an information gateway that wishes to use the ROADS software toolkit with little modification will currently need to use the ROADS template format, or something very similar to it. Again, where gateway cross-searching or interoperability is seen to be important, there may be technical reasons why one format may have advantages over another.

The nature of metadata development means that at any one time there are likely to be a variety of formats that could be chosen as the basis of an information gateway. For example, a review of metadata formats undertaken under DESIRE I identified and described over twenty formats that were in use (or under development) in 1996 (Dempsey et al., 1997). In order to help analyse the different metadata formats described in the review, the DESIRE I study produced a typology of metadata based upon their underlying complexity.

[simple] ------------------------------ [complex]

Band One (full text indexes)

  • Proprietary formats

Band Two (simple structured generic formats)

  • Proprietary formats
  • Dublin Core
  • ROADS templates
  • LDIF

Band Three (more complex structure, domain specific)

  • FGDC
  • MARC

Band Three (part of a larger semantic framework)

  • TEI headers
  • EAD
  • CIMI

Figure 1. Typology of metadata formats (adapted from Dempsey and Heery, 1998).


Choosing a metadata format
 

Choosing a format from the variety of existing ones will depend upon various factors. In general, current information gateways tend to use relatively simple generic formats with some structure ('Band Two' formats such as ROADS templates or Dublin Core). These formats have the twin advantages of simplicity, which means that they are relatively easy to create and maintain, and the existence of some structure, which facilitates both interoperability and format conversion. However, in particular circumstances there may be good arguments for basing an information gateway on more complex formats ('Band Three' formats such as MARC or TEI headers) if this offers some competitive advantage to the gateway. For example, the USMARC format has been used for the cataloguing of Internet resources in the InterCat project and it would be possible to set up MARC-based information gateways. However, the use of these more complex formats may have implications for the level of expertise (technical and other) that would be required for cataloguing and may have other costs.

As noted above, the choice of a particular format may be dictated by technological or social factors. Particular gateway software, for instance, may dictate the use (or non-use) of particular formats: information gateways that run the ROADS software without much modification will need either to use one of the existing templates defined by the ROADS project or to create new (and similar) templates in the form of attribute-value pairs.

Example format 1: Dublin Core

The Dublin Core (DC) is the result of an international and interdisciplinary initiative to define a core set of metadata elements for electronic resources, primarily for resource discovery on the Internet. DC was initially conceived as a simple format that could be used for author-generated descriptions of Web resources. However, the format has also attracted the attention of resource description professionals from a variety of communities such as libraries, museums, archives and government agencies.

E X A M P L E

Example of a DC based gateway

EdNA (Education Network Australia):

EdNA - an information gateway for Australian education resources - uses a metadata standard that is based on the DC element set. The owners of documents are encouraged to embed metadata within their documents where it can be read by the EdNA resource harvester and transferred to the EdNA database.


The format has been developed by means of a series of invitational workshops, the first being held in Dublin, Ohio in March 1995. The workshop series and related work have resulted in the definition of fifteen core metadata elements as RFC 2413 (Weibel et al., 1998). These elements are intended to be repeatable and extensible in any application.

The initial focus of DC was the Web, so the initiative has concentrated on the production of draft guidance for the encoding of DC elements, first in HTML (Kunze, 1999) and more recently in XML/RDF (e.g. Miller, Miller and Brickley, 1999).

E X A M P L E

Example of DC metadata embedded in HTML

<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="Southampton Oceanography Centre (SOC)">
<meta name="DC.Creator" content="Bruce Dupee (b.dupee@soc.soton.ac.uk)">
<meta name="DC.Subject" content="oceanography, marine, technology, geology, seafloor, education, science, research, ships, vessels">
<meta name="DC.Description" content="An introduction to the services provided by the Southampton
Oceanography Centre - a joint venture between the University of Southampton and the Natural Environment
Research Council. Includes information on internal departments and divisions, and the National Oceanographic Library">
<meta name="DC.Publisher" content="NERC Computer Services">
<meta name="DC.Date" scheme="WTN8601" content="1999-06-08">
<meta name="DC.Type" content="Text">
<meta name="DC.Format" content="text/html">
<meta name="DC.Format" content="7985 bytes">
<meta name="DC.Identifier" content="http://www.soc.soton.ac.uk/">

Metadata created by DC-dot, a service that will retrieve a Web page and automatically generate Dublin Core metadata, either as HTML <META> tags or as RDF/XML, suitable for embedding in the page header.
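
To show how a harvester might read metadata embedded in this way, here is a minimal Python sketch that uses the standard library's html.parser module to collect the DC elements from a page. It is not the EdNA harvester or DC-dot; the class and function names are invented for the example, and it ignores the scheme attribute and the schema link.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = {}  # element name -> list of values (elements may repeat)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()
            # Collapse line breaks inside long attribute values.
            self.elements.setdefault(element, []).append(" ".join(content.split()))


def extract_dc(html_text):
    """Return a dict mapping each DC element found to its list of values."""
    parser = DCMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.elements


sample = """<html><head>
<meta name="DC.Title" content="Southampton Oceanography Centre (SOC)">
<meta name="DC.Description" content="An introduction to the services provided by the Southampton
Oceanography Centre">
<meta name="DC.Format" content="text/html">
<meta name="DC.Format" content="7985 bytes">
<meta name="DC.Date" scheme="WTN8601" content="1999-06-08">
</head><body></body></html>"""

dc = extract_dc(sample)
```

Values are stored in lists because an element such as DC.Format may appear more than once in the same page.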


Example format 2: ROADS templates

ROADS templates are a development of the IAFA templates originally developed for anonymous FTP archives (Deutsch et al., 1994). IAFA templates are a simple text-based metadata format consisting of predefined sets of attribute-value pairs. Templates exist for a number of different resource types, but the templates most commonly used in existing ROADS-based gateways are those designated SERVICE, DOCUMENT and MAILARCHIVE.

E X A M P L E

Example of part of a ROADS SERVICE template

Template-Type: SERVICE
Handle: 840738289-29226
Title: Southampton Oceanography Centre
URI-v1: http://www.soc.soton.ac.uk/
Admin-Email-v1: webmaster@mail.soc.soton.ac.uk
Publisher-Name-v1: University of Southampton
Publisher-Postal-v1: Southampton Oceanography Centre, University of Southampton, Waterfront Campus, European Way, Southampton SO14 3ZH, United Kingdom
Publisher-City-v1: Southampton
Publisher-Country-v1: UK
Publisher-Phone-v1: +44 (0)1703 596666
Description: An introduction to the services provided by the Southampton Oceanography Centre - a joint venture between the University of Southampton and the Natural Environment Research Council. Includes information on internal departments and divisions, and the National Oceanographic Library
Keywords: Southampton Oceanography Centre; Natural Environment Research Council; NERC;
Subject-Descriptor-v1: 551.46
Subject-Descriptor-Scheme-v1: DDC21
Record-Last-Modified-Date: Wed, 12 May 1999 18:24:49 +0000
Record-Last-Modified-Email: cataloguer@subject-gateway.ac.uk
Record-Created-Date: Wed, 12 May 1999 18:24:49 +0000
Record-Created-Email: cataloguer@subject-gateway.ac.uk
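
Because a template is simply a set of attribute-value pairs, it is straightforward to read with a short program. The Python sketch below is not part of the ROADS toolkit; it assumes one attribute per line and treats a line beginning with white space as a continuation of the previous value, and real ROADS tools may apply different rules.

```python
def parse_template(text):
    """Parse 'Attribute: value' lines into a dict, keeping attribute order.

    Assumption: a line that starts with white space continues the previous
    value. A repeated attribute name overwrites the earlier value.
    """
    record = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and last is not None:
            record[last] += " " + line.strip()
            continue
        # Split on the first colon only, so URLs and times in values survive.
        attribute, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"Not an attribute-value pair: {line!r}")
        last = attribute.strip()
        record[last] = value.strip()
    return record


template_text = """Template-Type: SERVICE
Handle: 840738289-29226
Title: Southampton Oceanography Centre
URI-v1: http://www.soc.soton.ac.uk/
Record-Created-Date: Wed, 12 May 1999 18:24:49 +0000
"""

template = parse_template(template_text)
```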


Format conversion
 

One of the advantages of using well-defined and structured metadata formats is that this allows conversion into other formats when necessary. This is useful in two main circumstances:

  1. When a gateway wants to change to using a different metadata format. For example, a gateway that currently uses a custom-built database management system with a Web interface might want to run the ROADS software to take advantage of cross-searching facilities. The gateway's existing records would therefore need to be converted into ROADS templates. These types of conversion will be required periodically as information gateway software and its associated metadata evolve.
  2. To aid interoperability.

Format conversion is facilitated by the creation of crosswalks (or mapping tables) between metadata formats. Crosswalks can be used as the basis for the production of a specific conversion program or for the production of search systems that would permit the interrogation of heterogeneous metadata formats. A number of metadata format crosswalks have been published. One of the earliest DC-based crosswalks mapped Dublin Core to USMARC (Caplan and Guenther, 1996) and other crosswalks exist for other formats including Text Encoding Initiative (TEI) headers, ROADS templates and a variety of MARC formats, including the Universal MARC format (UNIMARC). A collection of metadata mappings is maintained on the UKOLN Web site (Day, 1996).
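
In software terms a crosswalk can be treated as a lookup table that drives a generic conversion routine. The following Python sketch illustrates the idea with a handful of ROADS attributes mapped to Dublin Core elements. The mapping shown is illustrative only and is not one of the published crosswalks; the function name is likewise an assumption.

```python
# Illustrative mapping only: ROADS attribute -> Dublin Core element.
ROADS_TO_DC = {
    "Title": "title",
    "URI-v1": "identifier",
    "Description": "description",
    "Keywords": "subject",
    "Publisher-Name-v1": "publisher",
}


def apply_crosswalk(record, crosswalk):
    """Convert a record using a crosswalk.

    Returns (converted, unmapped): converted values keyed by target element,
    and the source attributes for which the crosswalk has no entry.
    """
    converted = {}
    unmapped = []
    for attribute, value in record.items():
        target = crosswalk.get(attribute)
        if target is None:
            unmapped.append(attribute)
        else:
            converted.setdefault(target, []).append(value)
    return converted, unmapped


roads_record = {
    "Title": "Southampton Oceanography Centre",
    "URI-v1": "http://www.soc.soton.ac.uk/",
    "Publisher-Name-v1": "University of Southampton",
    "Handle": "840738289-29226",
}

dc_record, not_converted = apply_crosswalk(roads_record, ROADS_TO_DC)
```

Reporting the unmapped attributes separately makes it easier to see what would be lost in a conversion before any records are changed.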

Cross reference
Interoperability

E X A M P L E

Examples of metadata conversion projects

Nordic Metadata Project

The Nordic Metadata Project produced a variety of tools designed to aid the wider utilisation of Dublin Core (Hakala et al., 1998). The toolkit included a utility called d2m, a Dublin Core to MARC converter that converts Dublin Core metadata embedded in HTML into a variety of Nordic MARC formats and USMARC.

BIBLINK project

The BIBLINK project developed a custom-built software system (the BIBLINK Workspace) which converts metadata produced by publishers into the UNIMARC format for use by participating national bibliographic agencies (Day, Heery and Powell, 1999). The UNIMARC records can in turn be converted into other formats (usually MARC-based) used by these national bibliographic agencies, who can then enhance them for inclusion in their national bibliography and (possibly) return the enhanced records to the publisher. The metadata conversion process in the BIBLINK Workspace uses metadata crosswalks produced for the project by UKOLN (e.g. Day, 1998).


Future proofing
 

Any choices concerning metadata will need to take into account possible future developments. The gateway may decide to expand by including new types of descriptions (possibly for new types of resource such as images or multimedia) or to include additional metadata (such as descriptions aimed at alternative audiences, rights metadata, digital preservation data). At the simplest level, updates and extensions to existing metadata element sets need to be accommodated. The gateway may want to ensure that:

  • metadata creation tools can be easily extended to deal with new elements and new formats
  • the system has sufficient flexibility to allow a variety of formats to be imported and exported

During its lifetime, the gateway may have to migrate to a different system that requires different metadata formats, whether new versions of existing formats or completely different ones. Re-structuring the metadata can be done more efficiently if the gateway follows some general guidelines for the content of metadata. Such guidelines might include recommendations that:

  • metadata formats and rules for content are agreed among collaborating gateways (this means that gateways can share the costs of converting their data)
  • gateways implement local usages by means of local processing rather than by incorporating them into the data (for example, adding punctuation and other presentational enhancements by software processing rather than by storing them as part of the data)
  • there are as few local variants to standard metadata formats as possible (for example, variant element names can be displayed using local processing rather than by storing non-standard element names)
  • gateways collaborate with one another so that migration can take advantage of economies of scale
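
The recommendation to handle local usages by local processing can be illustrated with a short Python sketch. The stored record keeps standard element names and unpunctuated values, while local labels and punctuation are applied only when the record is displayed. The labels, the record structure and the function name are assumptions made for the example.

```python
# Hypothetical local display labels for standard element names.
DISPLAY_LABELS = {"creator": "Author", "description": "Summary"}


def render(record, labels=DISPLAY_LABELS):
    """Format a stored record for display without altering the stored data."""
    lines = []
    for element, values in record.items():
        label = labels.get(element, element.capitalize())
        # Separators and the closing full stop are added here, not stored.
        lines.append(f"{label}: {'; '.join(values)}.")
    return "\n".join(lines)


stored = {
    "title": ["Southampton Oceanography Centre"],
    "creator": ["Bruce Dupee"],
}

display_text = render(stored)
```

If the gateway later migrates to another system, only the stored form has to be converted; the labels and punctuation can simply be redefined in the new display layer.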

Conclusions
 

Choosing a metadata format is one of the most important decisions that need to be made when setting up an information gateway. It is vital that the format is able to work with the software that forms the basis of the gateway service. The format should also contain all fields (including administrative metadata) that have been identified as appropriate for the service in question, or it should be extensible. It is possible that ongoing changes in technologies may require periodic conversion of the gateway database into new formats. This process will require the production of metadata crosswalks and/or format conversion programs.


References
 

BIBLINK, http://hosted.ukoln.ac.uk/biblink/

d2m, http://www.bibsys.no/meta/d2m/

DC-dot, http://www.ukoln.ac.uk/cgi-bin/dcdot.pl

Dublin Core, http://purl.oclc.org/dc

EdNA, http://www.edna.edu.au/EdNA/

InterCat, http://purl.org/net/intercat

ROADS, http://www.ilrt.bris.ac.uk/roads/

P. L. Caplan & R. S. Guenther, 'Metadata for Internet resources: the Dublin Core Metadata Element Set and its mapping to USMARC', Cataloging and Classification Quarterly 22 (3/4) (1996), 43-58.

M. Day, Interoperability between metadata formats (Bath: UKOLN, 1996).
http://www.ukoln.ac.uk/metadata/interoperability/

M. Day, Mapping BIBLINK Core (BC) to UNIMARC. BIBLINK project document (Bath: UKOLN, 10 September 1998).
http://hosted.ukoln.ac.uk/biblink/wp10/bc-unimarc.html

M. Day, R. Heery & A. Powell, 'National bibliographic records in the digital information environment: metadata, links and standards', Journal of Documentation 55 (1) (1999), 16-32.

L. Dempsey & R. Heery, 'Metadata: a current view of practice and issues', Journal of Documentation 54 (2) (1998), 145-172.

L. Dempsey, R. Heery, M. Hamilton, D. Hiom, J. Knight, T. Koch, M. Peereboom & A. Powell, A review of metadata: a survey of current resource description formats (DESIRE deliverable D3.2 (1), March 1997).
http://www.ukoln.ac.uk/metadata/desire/overview/

P. Deutsch, A. Emtage, M. Koster & M. Stumpf, Publishing information on the Internet with Anonymous FTP (Internet-Draft, September 1994).
http://info.webcrawler.com/mak/projects/iafa/iafa.txt

J. Hakala, P. Hansen, O. Husby, T. Koch & S. Thorborg, The Nordic Metadata Project: final report (Helsinki: Helsinki University Library, July 1998).
http://linnea.helsinki.fi/meta/nmfinal.htm

R. Heery, 'Review of metadata formats', Program 30 (4) (1996), 345-373.

R. Iannella & D. Campbell, The A-Core: metadata about content metadata (Internet-Draft, 21 June 1999).
http://metadata.net/admin/draft-iannella-admin-01.txt

J. Kunze, Encoding Dublin Core Metadata in HTML (Internet-Draft, 25 May 1999).
http://www.ietf.org/internet-drafts/draft-kunze-dchtml-01.txt

O. Lassila & R. Swick, eds., Resource Description Framework (RDF) model and syntax specification (W3C Working Draft, 1999).
http://www.w3.org/TR/WD-rdf-syntax/

Making of America project, The Making of America II testbed project white paper (Version 1.03, March 16 1998).
http://sunsite.berkeley.edu/MOA2/wp-v1_03.html

E. Miller, P. Miller & D. Brickley, eds., Guidance on expressing the Dublin Core within the Resource Description Framework (RDF) (Dublin Core Metadata Initiative, Draft Proposal, 1999).
http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD-dc-rdf/

S. Weibel, J. Kunze, C. Lagoze & M. Wolf, RFC 2413, Dublin Core metadata for resource discovery (Internet Engineering Task Force, Network Working Group, September 1998).
ftp://ftp.isi.edu/in-notes/rfc2413.txt

S. Weibel, 'The State of the Dublin Core Metadata Initiative', D-Lib Magazine 5 (4) (April 1999).
http://www.dlib.org/dlib/april99/04weibel.html

S. L. Weibel & C. Lagoze, 'An element set to support resource discovery: the state of the Dublin Core', International Journal on Digital Libraries 1 (2) (January 1997), 176-186.


Credits
 

Chapter author: Michael Day

With contributions from: Rachel Heery, Emma Place and Virginia Knight


-2.4. Cataloguing

In this chapter...
 
  • describing Internet resources: cataloguing and metadata approaches
  • metadata formats and content rules
  • types of information needed by an information gateway
  • developing cataloguing guidelines for a gateway
  • cataloguing interfaces and maintenance

Introduction
 

The role of cataloguing rules or guidelines is to specify how the content of a metadata format is entered. Once a metadata format has been chosen, consideration should then be given to how this metadata should be entered into the information gateway database and a set of cataloguing rules prepared.

One of the key roles of Internet subject gateways is the creation of descriptive metadata about networked resources which can be used as a basis for searching and browsing the gateway. These descriptions can also help gateway users to identify whether the resources are really what they need, potentially saving them a considerable amount of time browsing through the limited amounts of information available elsewhere on the Internet (Sha, 1995, p. 467). Therefore, one of the most important (and time-consuming) activities for a subject gateway will be the provision of these descriptions. This is the activity generally known as 'cataloguing' and is one of the key tasks of any information gateway.


Background
 

Cataloguing can be defined as the creation of surrogate records which can be used to facilitate the identification, location, access and use of resources (Levy, 1995). These descriptions are usually created in accordance with certain standards (cataloguing rules and metadata formats) and will often include additional features such as classification, subject analysis and authority control (Dillon and Jul, 1996, p. 198; Bryant, 1980). These tools and standards were originally developed for the cataloguing and indexing of traditional - mostly printed - collections. However, many of them have been revised to take account of resources based on newer technologies. Recent developments include:

1. ISBD(ER). In 1997, the IFLA Universal Bibliographic Control and International MARC Programme (UBCIM) published a revision of ISBD(CF) for 'Computer Files' for both online and off