Target audience
Section 2 of this handbook is aimed at gateway staff responsible for information management: the subject specialists and information professionals who will consider the content and organisation of the information within the gateway. It aims to cover the important decisions that need to be made when setting up a new gateway (such as choosing a metadata format, designing a user interface, writing a selection policy) but also covers issues that arise in the day-to-day running of an existing gateway (such as cataloguing, resource discovery, and publicity and promotion). Each chapter offers some background, practical tips and hints, key references, a glossary, case studies and examples. Watch out for the
Introduction
Subject gateways are sometimes called the Internet equivalent of a library, and in terms of the selection process this is certainly true. Gateways are characterised by the focus and quality of their collections. They aim to provide their users with a quality-controlled environment in which to search for information on the Internet, and they do this by building selective collections in which every resource that the gateway points to has been carefully selected for its quality. The selection process involves people making value judgements about Internet resources and selecting only those resources that satisfy certain quality criteria.
But what constitutes a 'high quality' Internet resource? Information gateways need to use a service-driven definition of quality, where resources are selected for their relevance to the user group as well as their inherent features. Selecting resources for a gateway therefore requires a clear understanding of the information needs of the end-users, as well as of the pros and cons of the design features of Internet sites.
Information gateways consciously emphasise the importance of skilled human involvement in the assessment and 'quality control' of their selected Internet resources. Selection and evaluation of resources for a gateway is typically done by a librarian or subject specialist, reflecting the fact that selection is based on an evaluation of the semantic content of the resources. A formal selection policy can support the development of a consistent and coherent collection of high quality Internet resources.
Why develop and publish a selection policy for your gateway?
Many subject guides on the Internet do not explicitly state their selection policies, but there are a number of advantages in developing a formal selection policy for a gateway and publishing it on your site:
By publishing your selection policy on the gateway you can help your users to conceptualise the nature of the collection they are using. On the Web, users are very often faced with a search box or an index, and it is not always easy for them to understand exactly what they are searching. An explicit selection policy can help them to understand the nature of your gateway service. The Centre for Information Quality Management (CIQM) recommends that database providers offer a 'published specification' or 'user-level agreement' to 'lessen the gap between user expectations and the reality of searching' (Armstrong, 1997). A formal selection policy can help to meet this recommendation.
The integrity of a collection will depend on there being some consistency in the type and quality of resources that your staff decide to include in the collection. A formal selection policy can help to ensure that the selection is consistent and that the quality of the collection remains high.
A selection policy can ensure that the same member of staff makes consistent judgements about what they include in the collection. It can also ensure that different members of the staff team make consistent judgements and that they are all using the same selection criteria.
The selection policy can help new staff to understand quickly both the nature of the collection and the criteria they should use when selecting new resources to add to the gateway.
A formal policy can also help to ensure consistency of selection within a distributed team. For example, if a number of gateways are working collaboratively, an agreed selection policy can help to ensure that the combined collection has a consistent level of quality.
What is a selection policy?
In an information environment, a selection policy defines the criteria used for selecting resources to add to a collection. It will typically outline the scope of the collection and the criteria used when new resources are selected for the collection. The scope policy relates to the needs of the target user group, while the selection criteria relate to the inherent features of the Internet resources.
Defining the scope of the collection
Subject gateways do not aim to include every resource available on the Internet. The scope of a gateway defines the boundaries of the collection. The scope policy is therefore a broad statement of the parameters of the collection: it states what is and is not to be included in the catalogue. In the selection process, the scope of the service will affect the first decisions made about the quality of the resources. Those falling outside the scope will be rejected and the rest will have the quality criteria applied to them. The scope criteria are the first filter through which the resources pass. They will tend to involve clear decisions; either a resource falls within the scope or it does not. A scope statement will typically outline:
It may also outline:
Defining the quality selection criteria
Subject gateways do not generally aim to point to every Internet resource that falls within their subject area and scope. They are characterised by their quality control, aiming to point only to the best resources available for their subject area and audience. The selection criteria outline the qualities that a resource must have to be included in the collection.
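The two-stage process described in this section (scope as a first yes/no filter, then quality criteria applied to whatever remains) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Resource` fields, the subjects and languages in `SCOPE`, and the numeric `quality_score` with its threshold are all hypothetical, and in practice the quality judgement is made by a person rather than read from a number.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    url: str
    subject: str
    language: str
    quality_score: int  # hypothetical 0-10 score recorded by a human evaluator


# Hypothetical scope policy: the subjects and languages the gateway covers.
SCOPE = {"subjects": {"sociology", "economics"}, "languages": {"en"}}
MIN_QUALITY = 7  # hypothetical quality threshold


def in_scope(resource: Resource) -> bool:
    """First filter: a clear yes/no decision against the scope policy."""
    return (resource.subject in SCOPE["subjects"]
            and resource.language in SCOPE["languages"])


def select(candidates: list[Resource]) -> list[Resource]:
    """Reject out-of-scope resources, then apply the quality criteria."""
    scoped = [r for r in candidates if in_scope(r)]
    return [r for r in scoped if r.quality_score >= MIN_QUALITY]
```

Keeping the scope test separate from the quality test mirrors the policy structure: the scope statement can change without touching the quality criteria, and vice versa.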
Developing a selection policy for your gateway
How should a gateway develop its selection policy? Each gateway needs to develop its own unique set of selection criteria to take the information needs of the user group and the aims of the service into account. The first steps are to define:
Once these steps have been taken, it is a matter of defining a formal scope policy and a set of selection criteria. The DESIRE project has created some tools for creating a scope and selection policy. The guidelines are not prescriptive and are designed to help an institution or service develop its own tailor-made policies in the light of its aims and audience. A comprehensive list of criteria is given, from which criteria relevant to the individual service can be chosen. The list has been drawn from a 'state of the art review' of current practice, library and Web literature.
Creating a scope policy
Some possible criteria for creating your scope policy are given below. For each heading you will need to outline the parameters to be used in your gateway. Not all of these will be appropriate for your audience, and you may need to add further criteria.
Creating quality selection criteria
Once you have defined the scope of your gateway, you will need to outline the level of quality that is acceptable within each individual resource. A list of possible quality selection criteria is given below, from which criteria relevant to the individual service can be picked.
Content criteria: evaluating the information
Form criteria: evaluating the medium
Process criteria: evaluating the system
A fuller description of each of these criteria, with examples, can be found in an online tutorial called 'Internet Detective':
Guidelines for selecting and evaluating Internet resources
The staff responsible for selecting new resources to add to the gateway will need to be able to select resources that together create a consistent and coherent collection of high quality Internet resources. What constitutes a 'high quality' Internet resource? The definition of quality used here has been drawn from the commercial sector, where quality is seen to be closely related to customer satisfaction and to developing systems of continuous improvement. In the context of a subject gateway, the quality of a resource will depend on the users of the service, and the nature of the service, as well as the internal features of the resource itself. We suggest that for information gateways 'a high quality Internet resource is one that meets the information needs of the user'. This is a service-oriented definition, and so, when evaluating the quality of Internet resources, gateway staff must consider the user group that they are serving as much as the Internet resources they are evaluating. SOSIG (The Social Science Information Gateway) has come up with five steps that describe the selection process for gateway staff:
Skills and training required by gateway staff in selection and evaluation
The choices made by the staff who select resources for a gateway will determine the nature of the collection. Recruitment and training of staff will therefore be critical for your gateway.
Recruiting staff
Subject gateways typically employ librarians or subject specialists to select Internet resources to add to the gateways. This reflects an acceptance that to build a high quality collection you need:
Recruiting skilled and knowledgeable staff will help ensure the integrity of the gateway collection.
Training staff
Staff will need to be consistent in their selection criteria if the collection is to develop consistently. They will need to be familiar with the scope and selection criteria of your gateway, but will also need to develop skills for evaluating Internet resources. Training staff may involve:
Changing your selection criteria over time
It may be necessary to update a selection policy, as the priorities for selection may change over time as a gateway collection matures.
Adapting scope policies
A new gateway may wish to focus on developing a core collection very quickly before broadening the parameters. The scope may be much narrower in the early stages of collection development. For example, a new gateway may set narrow parameters for things such as:
A more mature gateway, on the other hand, may broaden its scope once a core collection has been developed, to include resources beyond the very narrow scope initially used. It may choose to extend its subject coverage, work at a finer level of granularity or include resources from different countries and of different types. These decisions should be reflected in the scope policy of the service.
Adapting selection criteria
The Internet offers uneven coverage of subjects, and this may affect the quality selection criteria used within different parts of a gateway collection. For example, if a subject comes within the scope of the gateway but very few resources can be found about that subject, it may be that less stringent quality criteria should be used, to ensure that there is at least some subject coverage. Conversely, if there are many resources available for a subject, then very stringent quality criteria may be used to ensure that the highest quality resources are selected in preference to others with the same subject coverage. These issues relate to collection management, which is discussed in the Collection Management chapter of this handbook.
Quality ratings/labelling/PICS and other initiatives in this area
The Web and metadata communities have been exploring the potential for automated approaches to quality-related aspects of information management on the Internet. The main aim has been to create a system where the quality of an Internet resource can be described in a machine-readable form. If this were to be achieved a number of scenarios would become possible. For example:
There have been two main challenges:
PICS and RDF
PICS and RDF both aim to provide a technological infrastructure to support machine-readable quality ratings. PICS stands for Platform for Internet Content Selection. It has been approved by the W3C (World Wide Web Consortium) as an agreed standard for associating labels (metadata) with Web sites or Web pages. Essentially, these labels refer to the information content of the sites, and therefore provide a means of recording information about aspects of their quality. PICS has most famously been used to support the development of services that aim to protect children from X-rated sites on the Internet.
RDF stands for Resource Description Framework and is a standard approved by the W3C. It has emerged as a successor to PICS, offering a broader infrastructure for assigning metadata labels to Internet sites and pages. RDF can be used with many different metadata vocabularies, and certainly there is potential for it to be used with a vocabulary that describes the quality of an Internet resource.
Metadata vocabularies for quality
The second challenge has been to create metadata vocabularies to describe various quality attributes of Internet resources. At the time of writing no vocabulary has emerged, but work is under way, particularly within the medical community, to create metadata labels for quality that can be incorporated into Internet resource discovery services. With the basic RDF framework in place, it is now possible for different communities to create their own quality vocabularies and apply them to their own services.
How does this work relate to information gateways?
This work has the potential to offer gateways a number of interesting possibilities, for example:
The missing link, as things stand, is the development of quality vocabularies. Gateways may see it as their role to create such vocabularies and to use RDF to create machine-readable metadata about the quality of Internet resources. At present we cannot offer an example of a gateway doing this, but some key sites where new developments will appear are listed below.
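To make the idea of machine-readable quality labels more concrete, the following Python sketch shows how a service might filter resources once such labels exist. It is purely illustrative: the `http://example.org/quality#` namespace and the `rating` and `peerReviewed` properties are invented for this example (no agreed quality vocabulary exists), and the statements are held as plain (resource, property, value) triples, which is the data model RDF uses, rather than being parsed from real RDF.

```python
# Hypothetical vocabulary namespace; not a real or agreed quality vocabulary.
QUALITY_NS = "http://example.org/quality#"

# Statements as (resource, property, value) triples.
triples = [
    ("http://a.example/", QUALITY_NS + "peerReviewed", "true"),
    ("http://a.example/", QUALITY_NS + "rating", "4"),
    ("http://b.example/", QUALITY_NS + "rating", "2"),
]


def resources_rated_at_least(statements, minimum):
    """Return the resources whose hypothetical 'rating' label is >= minimum."""
    prop = QUALITY_NS + "rating"
    return sorted(
        subject
        for subject, predicate, value in statements
        if predicate == prop and int(value) >= minimum
    )
```

With the sample triples above, `resources_rated_at_least(triples, 3)` keeps only `http://a.example/`.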
Glossary
DutchESS: Dutch Electronic Subject Service
References
DutchESS, http://www.konbib.nl/dutchess/
EELS, http://www.ub.lu.se/eel/
European Link Treasury, http://www.en.eun.org/news/european-link-treasury.html
Information Quality WWW Virtual Library, http://www.ciolek.com/WWWVL-InfoQuality.html
Internet Detective, http://www.sosig.ac.uk/desire/internet-detective.html
Länkskafferiet (Link Larder), http://lankskafferiet.skolverket.se/information/kvalitetskriterier.html
PICS Home Page, http://www.w3.org/PICS/
RDF Home Page, http://www.w3.org/RDF/
Scout Report, http://scout.cs.wisc.edu/index.html
SOSIG, http://www.sosig.ac.uk/
J. Alexander & M. A. Tate, Evaluating Web Resources
C. Armstrong, 'Metadata, PICS and Quality', Ariadne, Issue 9, 1997
N. Auer, Bibliography on Evaluating Internet Resources
D. Brickley, T. Gardner, R. Heery & D. Hiom, Recommendations on Implementation of Quality Ratings in an RDF Environment
A. Cooke, Finding Quality on the Internet: a guide for librarians and information professionals
Credits
Chapter author: Emma Place
Introduction
Subject gateways should aim to describe the best resources that the Internet has to offer in their field and for their target audience. They need to:
Finding high quality resources on the Internet can be a time-consuming job, which is exactly why gateways exist: to save the end-user some of the time and commitment required to discover and retrieve high quality information on the Internet. Locating resources to add to your gateway will require one of the biggest investments of staff time and effort, so it is important to find efficient and effective methods of working at this task:
Resource discovery issues for gateway managers
Gateway managers will need to provide the systems and strategies to support efficient resource discovery within their team. Resource discovery is labour-intensive and efficient strategies can help to maximise the number of resources added to the gateway. This section suggests some of the systems that managers can put in place to support efficient resource discovery within the team:
1. Avoiding duplicated effort
Duplicated effort can be wasted effort. There are issues of duplication:
Avoid duplication with other gateways
It is worth finding out whether other gateways already describe Internet resources in your field. If there are, ask yourself whether it really makes sense to spend time and effort cataloguing the same resources twice. If existing gateways are already describing resources relevant to your users you should consider:
Avoid duplication within your team
Time can be wasted if members of your team are all trawling the same sources. Consider developing a team strategy for resource discovery, for example by:
2. Find the right people for the job
As with recruiting staff for cataloguing, financial and political considerations will determine whom you can take on to do the job of resource discovery.
Volunteers?
Pros: may be cheap and plentiful.
Cons: may be inconsistent and unreliable in their contribution, and it may be difficult to find volunteers with the subject expertise to select the high quality resources you want.
Subject specialists?
Pros: may know of the best sources to use to discover relevant resources for your gateway and should be able to assess resources effectively, given their subject knowledge.
Cons: may be expensive, short of time, difficult to recruit and unable or unwilling to spend time cataloguing.
Librarians/information professionals?
Pros: have training in selecting resources to meet the information needs of users and may also be able to catalogue resources in addition to selecting them, since they may have training in cataloguing/information retrieval issues.
Cons: may be expensive or difficult to recruit.
3. Provide training in resource discovery
The Internet is always growing and changing, so there are always new tips and hints to be learned in Internet resource discovery. Training staff can improve skills and effectiveness. Training may include:
4. Set up support systems for resource discovery staff
The following are ideas for support systems for resource discovery staff:
5. Set up systems to encourage your user community to suggest resources
Why not let the resources come to you? Encourage your users to send you details of any sites which they think should be added to the gateway. You will need:
Resource Discovery Strategies for Staff
Gateway staff do the 'leg work' for SOSIG users: joining the lists, monitoring the sites and doing the searches that many users do not have the time to do, and filtering out items that are of poor quality or irrelevant to the users. It is easy to waste time when surfing the Internet, so gateway staff need to develop efficient and effective strategies for locating high quality Internet resources. Some strategies are suggested below.
Resource discovery tools and methods
1. Browsing strategies
One of the richest sources of resources will be existing Web pages, especially authoritative ones in your field which list related or recommended resources. Trawling these sites is the equivalent of the citation pearl-growing or snowballing traditionally done by researchers looking for references: if they find one useful resource, they will follow the references from that resource to find others.
Trawling home pages of known experts
If you know of experts in your field, do a search to see if they have their own Web page. You may find that:
Bookmark any that look as if they may be developed over time, so that you can check them again in the future.
Trawling organisational home pages
Many organisations now have their own Web sites. These can be useful in two ways:
Consider which organisations are relevant to your audience and try to keep in touch with developments concerning them.
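The 'snowballing' approach to browsing (following the links on a trusted page to find further candidates) can be partly supported by a small script. The sketch below is an assumption-laden illustration, not a tool described in this handbook: it uses Python's standard `html.parser` and `urllib.parse` modules, the helper names `LinkCollector` and `candidate_links` are hypothetical, and the page is passed in as a string so that fetching is left out. It only lists links that are not already catalogued; judging their quality remains a human task.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URLs of all <a href="..."> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, href))


def candidate_links(html, base_url, already_catalogued):
    """Return links found on the page that are not yet in the gateway."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    seen = set(already_catalogued)
    return [url for url in collector.links if url not in seen]
```

Filtering against the list of already-catalogued URLs also helps with the duplication problem: staff are shown only the links that nobody has yet assessed.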
If you are creating a gateway for an academic audience then it can pay to monitor university Web pages. Look for:
Trawling subject-based sites
Many sites have a section of 'links' which can be mined for new resources. The better the quality of the original site, the better the related links are likely to be:
2. Mailing lists and their archives
Joining and monitoring email lists/checking mailing list archives
People often use email lists to announce new resources they have made available on the Internet. You have two possible strategies here:
Subject-based lists
If you can find a list that is relevant to your subject area and audience, you have a rich source. In the early days it is worth doing a search for relevant lists and asking colleagues to recommend them.
Generic email lists that announce new Internet sites
A number of email lists exist to alert people to new Internet sites. Be warned: these lists can be prolific!
3. Distribution lists and current awareness services
Internet current awareness services come in different forms and are becoming more sophisticated. Free email subscription services will send you updates, bulletins and email publications on a regular basis. It may be worth subscribing to services that are run by key individuals or organisations in your subject area. Other services are emerging where you can create your own personal profile on the Web, which the service then uses to email you incoming information that is likely to interest you.
4. Search tools
Searching the Internet can be time-consuming, since many of the search tools retrieve huge numbers of hits which take a lot of time to work through. However, searching can be a good strategy in some cases:
In our experience, search engines can be a waste of time if broad search terms such as 'social psychology' are used. Highly focused searching based on known sources, however, can be fruitful. For example, if you have a list of well-respected journals or organisations in your field, you could search for them by name, to see whether they have a presence on the Internet. A number of hints for finding the leads for focused searching are recommended:
Search Engines
These are good for finding lots of information and for finding very precise pieces of information (so if you know exactly what you are after they can be very effective).
Be aware that search engines change over time and that different ones are more effective for searching for different types of information - do some research to find the best one for your needs. Bookmark complex searches so that you can run them again periodically to see if anything new has appeared.
5. Newsgroups and discussion forums
Internet discussion forums are a powerful and fun way to communicate with people around the world who are interested in the same things as you. Thanks to the Internet's rapid growth and the exploding popularity of the World Wide Web, people from all walks of life now participate on a regular basis.
6. URL-minders and Web agents
Some free Web services exist that help you to monitor changes made to Internet resources or inform you of new sites that might interest you. You register the URLs of the sites you wish to monitor, or the search queries you would like to have run, and the service sends you an email whenever a change is made to these resources or a search yields new results.
Remember that these are automated services and will not always yield high quality results.
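The basic idea behind a URL-minder (notice when a monitored page changes) can be sketched as a comparison of content fingerprints. The following is a minimal illustration, not a description of how any of the services work: the function names are hypothetical, SHA-256 is simply one convenient choice of digest, and fetching the pages is deliberately left out so that the example stays self-contained.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest that changes whenever the page content changes."""
    return hashlib.sha256(content).hexdigest()


def changed_urls(previous: dict[str, str], current_pages: dict[str, bytes]) -> list[str]:
    """Compare stored fingerprints with freshly fetched pages.

    `previous` maps URL -> fingerprint recorded at the last check;
    `current_pages` maps URL -> page content fetched now.
    URLs with no stored fingerprint are reported as changed.
    """
    return sorted(
        url for url, content in current_pages.items()
        if previous.get(url) != fingerprint(content)
    )
```

A byte-level comparison like this will also flag trivial changes such as an updated date stamp, which is one reason the results of automated monitoring still need to be reviewed by staff.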
7. Non-Internet sources
You don't have to use the Internet to learn about Internet sites. Consider using non-Internet sources:
Issues for new gateways
New gateways may have different priorities for resource discovery from mature gateways as they will be focussing on developing a core collection very quickly. New gateways may want to consider the following issues:
Issues for mature gateways
Mature gateways will have already developed a core collection and may have widened their scope. Staff will need to adjust their resource discovery strategies in line with this. Mature gateways may consider the following issues:
Glossary
DutchESS: Dutch Electronic Subject Service
References
College and University Home Pages (world-wide), http://www.rirr.cnuce.cnr.it/universities/univ.html
Dejanews, http://www.dejanews.com/
The Directory of Scholarly and Professional E-Conferences, http://www.n2h2.com/KOVACS/
DutchESS, http://www.konbib.nl/dutchess/
EEVL, http://www.eevl.ac.uk/
EUNI - List of European Universities, http://www.ensmp.fr/~scherer/euni/euni_list.html
The Informant, http://informant.dartmouth.edu/
Library and Related Sources, http://www.exeter.ac.uk/~ijtilsed/lib/wwwlibs.html
Liszt, http://www.liszt.com/
Mailbase, http://www.mailbase.ac.uk/
Mind-it, http://mindit.netmind.com/
NewJour: Recent Issues, http://gort.ucsd.edu/newjour/nj2/
Search Engine Corner, http://www.ariadne.ac.uk/issue19/search-engines/
Search Engine Watch, http://searchenginewatch.com/
Manchester Metropolitan University's Department of Information and Communications Search Tools, http://www.mmu.ac.uk/h-ss/dic/main/search.htm
The Social Science Research Grapevine, http://www.grapevine.bris.ac.uk/
SOSIG, http://www.sosig.ac.uk
What's New in WWW Social Sciences Online Newsletter, http://www.mmu.ac.uk/h-ss/dic/main/search.htm
'What's New' on the Web server of the European Union, http://europa.eu.int/geninfo/whatsnew.htm
A. S. McNab & I. R. Winship, How to find out about new resources on the Internet, The New Review of Information Networking (1995), 147-53
Association of Public Data Users and International Association for Social Science Information Service and Technology (IASSIST), Strategies for Searching for Information on the Internet
TERENA & M. Isaacs, Internet Users' Guide to Network Resource Tools, Addison Wesley Longman, 1998
E. Worsfold, Finding Internet resources for SOSIG - strategies and sources, 1997
Credits
Chapter author: Emma Place
Introduction
Information gateways are characterised by their creation of third-party metadata records - individual descriptions of Internet resources held in a database that have separate fields for different attributes of the resources, such as title, author, URL etc. These resource descriptions are used to:
Gateways adopt the approach where metadata is created by a third party, i.e. an independent subject specialist or information professional, rather than by the creator of the resource. This enables the quality control for which gateways are renowned: the resource descriptions all assume a standard format and are generated manually (at least in part), to produce high quality metadata that benefits from semantic judgements about the nature and origin of the resources. The metadata created by gateways is their greatest asset, adding value to the Internet resources through independent, standardised third-party descriptions. The choice of metadata format is an important one, as it affects the searching capabilities of the gateway and the value of the descriptions to the end-users. The creation of metadata will be one of the most time-consuming tasks in running a gateway, so a balance between value and cost may be required in deciding on a format. This chapter introduces some of these issues and provides background information that information gateway managers will need to consider when choosing a metadata format for their gateway.
Why create metadata records?
Information gateways are services that give access to networked resources in particular subject areas, linguistic domains, and so on. Many Internet portals simply consist of static Web pages containing lists of hyperlinks, perhaps with annotations. However, this approach has distinct disadvantages:
Gateways take advantage of database technologies, which overcome both these problems but require that a standard format be used for creating and storing the resource descriptions. Metadata formats are structured formats for Internet resource descriptions. For gateways, the metadata formats are the forms or templates that need to be filled in by the cataloguers to create a resource description. The use of metadata by an information gateway has many benefits over the simple HTML list approach, for example:
Metadata attributes
Gateway staff will need to agree on the attributes of an Internet resource that they wish to describe. Metadata can be grouped into various kinds according to their use within the gateway. They might include:
Descriptive
Descriptive metadata contain information which may be usefully returned from a search of the gateway. A user may be able to decide from this information whether it is worth spending time looking at the resource itself.
Subject
Subject metadata can facilitate effective searching. They can also be used to organise the browsing structure of your gateway. A fuller discussion can be found in the
Administrative
Administrative metadata are intended primarily to assist the gateway staff in maintaining the gateway. They are of less concern to users and may not be visible to them; however, they can be used, for example, to check that resource descriptions are still current.
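As an illustration of how administrative metadata can support a currency check, the sketch below lists records that have not been checked for some time. It rests on assumptions that are not taken from any particular metadata format: each record is a plain dictionary with hypothetical `url` and `last_checked` fields, and the 180-day review interval is an arbitrary example value.

```python
from datetime import date, timedelta


def records_due_for_review(records, today, max_age_days=180):
    """Return the URLs of records whose last check is older than max_age_days.

    Each record is a dict with hypothetical 'url' and 'last_checked' (date) fields.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [r["url"] for r in records if r["last_checked"] < cutoff]
```

Passing `today` in as a parameter, rather than reading the clock inside the function, keeps the check easy to test and to re-run for any reporting date.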
Deciding which administrative functions are required, and which administrative metadata elements they need, will be an important part of choosing (or adapting) a metadata format for use in a particular information gateway.
Core metadata
The possible metadata fields listed above are by no means exhaustive, but including them all would require considerable effort both in initial cataloguing and in keeping records up to date. Not all of them may be appropriate to your gateway. Attempts have been made to define standards for a 'core' of metadata which should be regarded as a bare minimum. One such standard is the Dublin Core.
ROADS offers a number of metadata templates designed for different types of Internet resources. Each template contains attributes specific to the type of Internet resource. For example, the template for describing a mailarchive will have a different set of fields from the template for describing a Web document. ROADS also maintains a 'template registry' where the metadata fields used in the various kinds of ROADS templates are recorded. This ensures that ROADS services are potentially interoperable in this area. New fields can be nominated for addition to the registry.
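The idea of type-specific templates backed by a shared registry of field names can be sketched as follows. This is not the ROADS template format or its registry: the template types, the field names and the `unregistered_fields` helper are all hypothetical, chosen only to show how a registry lets a service detect fields that other services would not recognise.

```python
# Hypothetical registry: field names allowed for each template type.
# These names are illustrative and are not taken from the ROADS registry.
TEMPLATE_REGISTRY = {
    "document": {"title", "url", "description", "author"},
    "mailarchive": {"title", "url", "description", "list_address"},
}


def unregistered_fields(template_type, record):
    """Return the fields in `record` that the registry does not list for its type."""
    allowed = TEMPLATE_REGISTRY[template_type]
    return sorted(set(record) - allowed)
```

A record that passes this check uses only agreed field names for its type, which is the property that makes records from different services comparable.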
Choosing metadata attributes
You should think carefully about which metadata attributes your gateway is going to use, and their format, when you first set up the gateway. If you do not, you may find yourself constrained by the absence of useful metadata, or have to add a new metadata field or convert an existing field to a different format when you already have several thousand resources in your database. Moreover, decisions about metadata will in turn affect the design of your interface (especially the parts of it used for cataloguing and/or submitting new resources for consideration). Which metadata fields could be usefully searched on by your users? You should consider your potential user community and also the nature of the resources which your gateway will cover. For example, if your gateway is intended to cover only geographically local resources in one language, a 'language' field will not be very informative unless your gateway is going to be cross-searched with others elsewhere. And how are they going to search them? This will affect not only what metadata fields you provide but also the cataloguing rules you adopt. For example, if you are ranking searches by the frequency of occurrence of the search term, you may wish to make descriptions similar in length; otherwise resources with long descriptions may be more likely to be returned high up the order.
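The effect of description length on frequency-based ranking is easy to demonstrate. The sketch below compares a raw count of the search term with a count divided by the description length; both functions are hypothetical illustrations and do not describe how any particular gateway software ranks results. Words are split on whitespace only, so punctuation attached to a word would prevent a match.

```python
def term_count(description: str, term: str) -> int:
    """Count whole-word occurrences of `term`, ignoring case."""
    return description.lower().split().count(term.lower())


def normalised_score(description: str, term: str) -> float:
    """Occurrences per word, so that length alone does not raise the score."""
    words = description.split()
    return term_count(description, term) / len(words) if words else 0.0


short = "Poverty statistics archive"
long_text = ("A site about poverty with reports on poverty and links "
             "to other sites covering many related topics")
```

Here the longer description wins on the raw count simply because it has room to repeat the term, while the normalised score favours the short, focused description. Cataloguing rules that keep descriptions to a similar length are the editorial alternative to normalising in software.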
Which metadata fields will be displayed to the users of the gateway? Will they need to be converted from the form in which they are stored and if so does an easy way of converting them exist? Which metadata fields will be used for housekeeping by the gateway staff and how? Metadata can supply information for partially automating this otherwise laborious aspect of gateway management. For example, you can have an automatic email sent to maintainers of resources occasionally to ask whether they have made any changes, or set a web-page tracking tool to monitor changes to resources. Which if any are optional? If you are collaborating (or thinking of it), which metadata fields will be shared with your collaborators? Are they likely to want extra information, such as language, which you would not otherwise include in your metadata? You will need to use the same schemes for e.g. classification or have a usable crosswalk to convert between schemes. You should also think about the issue of copyright.
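As a sketch of how administrative metadata might support this kind of housekeeping, the following Python fragment picks out records that have not been checked for some time, so that their maintainers could then be contacted. The field names (`handle`, `admin_email`, `last_checked`) and the 180-day interval are assumptions made for the example, not part of any particular metadata format.

```python
from datetime import date, timedelta


def due_for_check(records, today, max_age_days=180):
    """Return the records whose last check is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r["last_checked"] < cutoff]


# Hypothetical records; a real gateway would read these from its database.
records = [
    {"handle": "res-001", "admin_email": "a@example.org",
     "last_checked": date(1999, 1, 10)},
    {"handle": "res-002", "admin_email": "b@example.org",
     "last_checked": date(1999, 9, 1)},
]

for record in due_for_check(records, today=date(1999, 10, 1)):
    # A real service might send an email here rather than print.
    print("Ask", record["admin_email"], "about", record["handle"])
```

This only works if the date of the last check is itself recorded as a metadata field in a consistent format.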
Are you going to display your metadata in the same format as that in which you store it? If not, you will need a way of converting between formats. Can any of the software you are using generate useful metadata? For example, ROADS automatically records when a template was last updated. You may also wish to use software for creating metadata (see below). Harvesting software, if used, may also be able to harvest metadata.
Who will generate metadata fields (and which ones)? Metadata may be supplied by:
How much cross-checking will there be? (Time will need to be allowed for this.) If you are allowing gateway users or information providers to submit resources, what information must they supply, and what may they supply optionally? How important is it that (for example) descriptions or keywords are consistent across the gateway? If this is important, can you supply cataloguing rules or other guidance to help information providers and others who are submitting resources? How much effort can be expended on editing their contributions, given that gateway users and information providers cannot be compelled to follow your cataloguing rules?
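The distinction between required and optional information in a submission can be made explicit in the submission form itself. The Python sketch below checks a submitted record against two lists of field names. The names and the split between required and optional fields are hypothetical; each gateway would set its own.

```python
# Hypothetical field lists; each gateway would choose its own.
REQUIRED_FIELDS = ("title", "url", "description")
OPTIONAL_FIELDS = ("keywords", "language")


def check_submission(submission: dict) -> list[str]:
    """Return a list of problems found in a submitted record."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat a missing field and a blank field in the same way.
        if not str(submission.get(field, "")).strip():
            problems.append(f"missing required field: {field}")
    for field in submission:
        if field not in REQUIRED_FIELDS and field not in OPTIONAL_FIELDS:
            problems.append(f"unknown field: {field}")
    return problems


submission = {"title": "An example resource", "url": "http://www.example.org/"}
print(check_submission(submission))
```

A check like this catches missing information at the point of submission, but it cannot judge whether a description follows the cataloguing rules. That still needs an editor.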
How might you ensure that information such as dates is in a consistent format? Possible methods include:
In what language are your metadata records going to be kept? If this is different from the language of some of your resources, are you going to make any provision for searching in that language (e.g. an 'alternative title' field)? |
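Returning to the question of consistent date formats raised above: one possible approach is to accept a small number of input formats and convert them all to a single stored form. The Python sketch below stores dates as YYYY-MM-DD. The list of accepted formats is an assumption for the example, numeric dates are read day-first, and month names are expected in English (the default locale).

```python
from datetime import datetime

# Input formats accepted by this sketch; numeric dates are read day-first.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%d %b %Y")


def normalise_date(text: str) -> str:
    """Convert a date in any accepted format to YYYY-MM-DD."""
    cleaned = text.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognised date: {text!r}")


for raw in ("10 September 1998", "25/05/1999", "21 Jun 1999", "1999-06-21"):
    print(raw, "->", normalise_date(raw))
```

Dates that cannot be recognised are rejected rather than guessed at, so that the cataloguer can correct them by hand.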
|
|
Standard metadata formats
|
|
Information gateway managers will need to make decisions about which metadata format (or formats) to use within their service at a very early stage of its development. At present, however, the existence of a large and varied range of metadata formats and initiatives complicates these decisions. It is worth remembering also that the choice of metadata formats will often be influenced by other factors, both technological and social. For example, an information gateway that wishes to use the ROADS software toolkit with little modification will currently need to use the ROADS template format, or something very similar to it. Again, where gateway cross-searching or interoperability is seen to be important, there may be technical reasons why one format may have advantages over another. The nature of metadata development means that at any one time there are likely to be a variety of formats that could be chosen as the basis of an information gateway. For example, a review of metadata formats undertaken under DESIRE I identified and described over twenty formats that were in use (or under development) in 1996 (Dempsey et al., 1997). In order to help analyse the different metadata formats described in the review, the DESIRE I study produced a typology of metadata based upon their underlying complexity.
Figure 1. Typology of metadata formats (adapted from Dempsey and Heery, 1998). |
|
Choosing a metadata format
|
|
Choosing a format from the variety of existing ones will depend upon various factors. In general, current information gateways tend to use relatively simple generic formats with some structure ('Band Two' formats such as ROADS templates or Dublin Core). These formats have the twin advantages of simplicity, which means that they are relatively easy to create and maintain, and the existence of some structure, which facilitates both interoperability and format conversion. However, in particular circumstances there may be good arguments for basing an information gateway on more complex formats ('Band Three' formats such as MARC or TEI headers) if this offers some competitive advantage to the gateway. For example, the USMARC format has been used for the cataloguing of Internet resources in the InterCat project and it would be possible to set up MARC-based information gateways. However, the use of these more complex formats may have implications for the level of expertise (technical and other) that would be required for cataloguing and may have other costs.

As noted before, the choice of a particular format may be dictated by technological or social factors. For example, particular gateway software may dictate the use (or non-use) of particular formats. Information gateways that, for example, are running the ROADS software without much modification will need either to use one of the existing templates defined by the ROADS project or to create new (and similar) templates in the form of attribute-value pairs.

Example format 1: Dublin Core

The Dublin Core (DC) is the result of an international and interdisciplinary initiative to define a core set of metadata elements for electronic resources, primarily for resource discovery on the Internet. DC was initially conceived as a simple format that could be used for author-generated descriptions of Web resources. However, the format has also attracted the attention of resource description professionals from a variety of communities such as libraries, museums, archives and government agencies.
The format has been developed by means of a series of invitational workshops, the first being held in Dublin, Ohio in March 1995. The workshop series and related work have resulted in the definition of fifteen core metadata elements as RFC 2413 (Weibel et al., 1998). These elements are intended to be repeatable and extensible in any application. The initial focus of DC was the Web, so the initiative has concentrated on the production of draft guidance for the encoding of DC elements, first in HTML (Kunze, 1999) and more recently in XML/RDF (e.g. Miller, Miller and Brickley, 1999).
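To give an idea of what DC embedded in HTML looks like, the Python sketch below turns a simple record into META tags whose names carry a 'DC.' prefix. It is a minimal illustration only. The function name and the sample record are invented, and the draft guidance (Kunze, 1999) should be consulted for the exact conventions, including capitalisation and the treatment of qualifiers.

```python
from html import escape

# The fifteen Dublin Core elements defined in RFC 2413.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dc_meta_tags(record: dict) -> str:
    """Render a record as HTML META tags, one tag per element value.

    Each element maps to a list of values, because DC elements are repeatable.
    """
    lines = []
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            content = escape(value, quote=True)  # make the value safe in an attribute
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Invented sample record.
record = {
    "title": ["Soil Survey Q & A"],
    "creator": ["A. Author", "B. Author"],
    "language": ["en"],
}
print(dc_meta_tags(record))
```

Because every element may be repeated, the two creators in the sample record simply produce two META tags with the same name.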
Example format 2: ROADS templates

ROADS templates are a development of the IAFA templates originally developed for anonymous FTP archives (Deutsch et al., 1994). IAFA templates are a simple text-based metadata format consisting of predefined sets of attribute-value pairs. Templates exist for a number of different resource types, but the templates most commonly used in existing ROADS-based gateways are those designated SERVICE, DOCUMENT and MAILARCHIVE.
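Because a template is essentially a list of attribute-value pairs, it is easy to process with simple tools. The Python sketch below reads a template-like record into a dictionary. The sample record and its attribute names are illustrative rather than copied from the ROADS template registry, and the treatment of indented lines as continuations of the previous value is an assumption made for the example.

```python
def parse_template(text: str) -> dict:
    """Parse 'Attribute: value' lines into a dictionary."""
    record = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and current is not None:
            # Indented line: treat it as a continuation of the previous value.
            record[current] += " " + line.strip()
            continue
        # Split on the first colon only, so URLs in values survive intact.
        name, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"not an attribute-value line: {line!r}")
        current = name.strip()
        record[current] = value.strip()
    return record


# Illustrative record; the attribute names are not taken from the registry.
SAMPLE = """
Template-Type: DOCUMENT
Handle: example-001
Title: An example Web document
URI: http://www.example.org/doc.html
Description: A short description that
  continues on a second line
"""

for attribute, value in parse_template(SAMPLE).items():
    print(attribute, "=", value)
```

The sketch keeps only the last value if an attribute is repeated. A real parser would need to decide how repeated attributes are to be handled.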
|
|
Format conversion
|
|
One of the advantages of using well-defined and structured metadata formats is that this allows conversion into other formats when necessary. This is useful in two main circumstances:
Format conversion is facilitated by the creation of crosswalks (or mapping tables) between metadata formats. Crosswalks can be used as the basis for the production of a specific conversion program or for the production of search systems that would permit the interrogation of heterogeneous metadata formats. A number of metadata format crosswalks have been published. One of the earliest DC-based crosswalks mapped Dublin Core to USMARC (Caplan and Guenther, 1996) and other crosswalks exist for other formats including Text Encoding Initiative (TEI) headers, ROADS templates and a variety of MARC formats, including the Universal MARC format (UNIMARC). A collection of metadata mappings is maintained on the UKOLN Web site (Day, 1996).
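In its simplest form a crosswalk is a table that pairs each field in the source format with a field in the target format. The Python sketch below applies such a table to one record. The mapping shown is invented for illustration and is not one of the published crosswalks; a real conversion would also have to deal with differences in content rules, not just in field names.

```python
# Illustrative mapping from template-style attributes to Dublin Core elements.
# This is NOT a published crosswalk.
CROSSWALK = {
    "Title": "title",
    "Description": "description",
    "Keywords": "subject",
    "URI": "identifier",
    "Language": "language",
}


def convert(record: dict, crosswalk: dict):
    """Convert a record using a crosswalk.

    Returns the converted record (each target field holds a list of values)
    and the names of any source fields that have no mapping.
    """
    converted = {}
    unmapped = []
    for name, value in record.items():
        target = crosswalk.get(name)
        if target is None:
            unmapped.append(name)  # report rather than silently drop
        else:
            converted.setdefault(target, []).append(value)
    return converted, unmapped


source = {
    "Title": "An example Web document",
    "URI": "http://www.example.org/doc.html",
    "Keywords": "metadata",
    "Handle": "example-001",
}
converted, unmapped = convert(source, CROSSWALK)
print(converted)
print("No mapping for:", unmapped)
```

Listing the unmapped fields makes it clear what information would be lost in conversion, which is worth knowing before a whole database is migrated.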
|
|
Future proofing
|
|
|
Any choices concerning metadata will need to take into account possible future developments. The gateway may decide to expand by including new types of descriptions (possibly for new types of resource such as images or multimedia) or to include additional metadata (such as descriptions aimed at alternative audiences, rights metadata, digital preservation data). At the simplest level, updates and extensions to existing metadata element sets need to be accommodated. The gateway may want to ensure that:
Within the lifetime of the gateway, it may have to migrate to a different system which will require different metadata formats, whether these are new versions of existing formats or completely different. Re-structuring the metadata can be done more efficiently if the gateway follows some general guidelines for the content of metadata. Such guidelines might include recommendations that:
|
|
|
Conclusions
|
|
|
Choosing a metadata format is one of the most important decisions that need to be made when setting up an information gateway. It is vital that the format is able to work with the software that forms the basis of the gateway service. It should also contain all the fields (including administrative metadata) that have been identified as appropriate for the service in question, or else be extensible. It is possible that ongoing changes in technologies may require periodic conversion of the gateway database into new formats. This process will require the production of metadata crosswalks and/or format conversion programs. |
|
| References
|
|
|
BIBLINK, http://hosted.ukoln.ac.uk/biblink/
d2m, http://www.bibsys.no/meta/d2m/
DC-dot, http://www.ukoln.ac.uk/cgi-bin/dcdot.pl
Dublin Core, http://purl.oclc.org/dc
EdNA, http://www.edna.edu.au/EdNA/
InterCat, http://purl.org/net/intercat
ROADS, http://www.ilrt.bris.ac.uk/roads/
P. L. Caplan & R. S. Guenther, 'Metadata for Internet resources: the Dublin Core Metadata Element Set and its mapping to USMARC', Cataloging and Classification Quarterly 22 (3/4) (1996), 43-58.
M. Day, Interoperability between metadata formats (Bath: UKOLN, 1996).
M. Day, Mapping BIBLINK Core (BC) to UNIMARC. BIBLINK project document (Bath: UKOLN, 10 September 1998).
M. Day, R. Heery & A. Powell, 'National bibliographic records in the digital information environment: metadata, links and standards', Journal of Documentation 55 (1) (1999), 16-32.
L. Dempsey & R. Heery, 'Metadata: a current view of practice and issues', Journal of Documentation 54 (2) (1998), 145-172.
L. Dempsey, R. Heery, M. Hamilton, D. Hiom, J. Knight, T. Koch, M. Peereboom & A. Powell, A review of metadata: a survey of current resource description formats (DESIRE deliverable D3.2 (1), March 1997).
P. Deutsch, A. Emtage, M. Koster & M. Stumpf, Publishing information on the Internet with Anonymous FTP (Internet-Draft, September 1994).
J. Hakala, P. Hansen, O. Husby, T. Koch & S. Thorborg, The Nordic Metadata Project: final report (Helsinki: Helsinki University Library, July 1998).
R. Heery, 'Review of metadata formats', Program 30 (4) (1996), 345-373.
R. Iannella & D. Campbell, The A-Core: metadata about content metadata (Internet-Draft, 21 June 1999).
J. Kunze, Encoding Dublin Core Metadata in HTML (Internet-Draft, 25 May 1999).
O. Lassila & R. Swick, eds., Resource Description Framework (RDF) model and syntax specification (W3C Working Draft, 1999).
Making of America project, The Making of America II testbed project white paper (Version 1.03, March 16 1998).
E. Miller, P. Miller & D. Brickley, eds., Guidance on expressing the Dublin Core within the Resource Description Framework (RDF) (Dublin Core Metadata Initiative, Draft Proposal, 1999).
S. Weibel, J. Kunze, C. Lagoze & M. Wolf, RFC 2413, Dublin Core metadata for resource discovery (Internet Engineering Task Force, Network Working Group, September 1998).
S. Weibel, 'The State of the Dublin Core Metadata Initiative', D-Lib Magazine 5 (4) (April 1999).
S. L. Weibel & C. Lagoze, 'An element set to support resource discovery: the state of the Dublin Core', International Journal on Digital Libraries 1 (2) (January 1997), 176-186. |
|
| Credits
|
|
|
Chapter author: Michael Day |
|
|
|
|
Introduction
|
|
|
The role of cataloguing rules or guidelines is to specify how the content of a metadata format is entered. Once a metadata format has been chosen, consideration should then be given to how this metadata should be entered into the information gateway database and a set of cataloguing rules prepared. One of the key roles of Internet subject gateways is the creation of descriptive metadata about networked resources which can be used as a basis for searching and browsing the gateway. These descriptions can also help gateway users to identify whether the resources are really what they need, potentially saving them a considerable amount of time browsing through the limited amounts of information available elsewhere on the Internet (Sha, 1995, p. 467). Therefore, one of the most important (and time-consuming) activities for a subject gateway will be the provision of these descriptions. This is the activity generally known as 'cataloguing' and is one of the key tasks of any information gateway. |
|
|
Background
|
|
|
Cataloguing can be defined as the creation of surrogate records which can be used to facilitate the identification, location, access and use of resources (Levy, 1995). These descriptions are usually created in accordance with certain standards (cataloguing rules and metadata formats) and will often include additional features such as classification, subject analysis and authority control (Dillon and Jul, 1996, p. 198, Bryant 1980). These tools and standards were originally developed for the cataloguing and indexing of traditional - mostly printed - collections. However, many of them have been revised to take account of resources based on newer technologies. Recent developments include: 1. ISBD(ER). In 1997, the IFLA Universal Bibliographic Control and International MARC Programme (UBCIM) published a revision of ISBD(CF) for 'Computer Files' for both online and off | |