Project Number:

RE 4004 (RE)

Project Title:

DESIRE II - Development of a European Service for Information on Research and Education II

Deliverable Type:

PU

Deliverable Number:

M3.3b

Contractual Date of Delivery:

November 1999

Actual Date of Delivery:

November 1999

Title of Deliverable:

Pre-release of DESIRE Integrated Toolkit

Workpackage(s) contributing to the Deliverable:

WP3

Nature of the Deliverable:

PR

Author:

Tracey Hooper

Contact Details:

Institute for Learning and Research Technology
University of Bristol
8-10 Berkeley Square
BRISTOL
BS8 1HH
Tel: +44 (0)117 928 7193
Fax: +44 (0)117 928 7112
Email: t.a.hooper@bristol.ac.uk

Other Authors:

Tim Dixon, TDC Networking Consultancy Limited

URL

http://www.desire.org/html/research/deliverables/D3.3/m3.3b.html


Abstract

This Milestone is an interim report describing a limited peer-reviewed pre-release of the toolkit, available from November 1999 as a precursor to the final DESIRE toolkit, to be available by April 2000. During the course of the DESIRE project, a number of different software components are being produced. In many cases, these build upon previous work (such as the ROADS toolkit), or on work undertaken in phase one of the project (e.g. Combine). Phase two of DESIRE includes a number of strands that share common software requirements. The purpose of this toolkit is to provide a supporting environment and framework to allow collective and ongoing development of this software, and to provide both an environment and framework for their maintenance beyond the lifetime of DESIRE-II.

Keywords

Software
Toolkit
ROADS
Metadata
Web Indexing


Distribution List:

DESIRE Project Team; European Commission

Issue:

V0.3

Reference:

Pre-release of DESIRE Integrated Toolkit

Total Number of Pages:

10


Document Control

| Issue Number | Issue Date | Reason for Change |
|---|---|---|
| 0.1 | 4/11/99 | (Initial Draft) |
| 0.2 | 9/11/99 | Early draft issued for comment |
| 0.3 | 18/11/99 | Second draft issued for comment |
| 1.0 | 19/11/99 | Incorporates comment on draft; version released for peer review |
| 1.1 | 9/12/99 | Incorporates comments from peer review |

Executive Summary

This Milestone is an interim release of the DESIRE toolkit, which is scheduled to be available in its complete form in April 2000.

During the course of the DESIRE project, a number of different software components are being produced. In many cases, these build upon previous work (such as the ROADS toolkit), or on work undertaken in phase one of the project (e.g. Combine). Phase two of DESIRE includes a number of strands that share common software requirements. The purpose of this toolkit is to provide a supporting environment and framework to allow collective and ongoing development of this software, and to provide both an environment and framework for their maintenance beyond the lifetime of DESIRE-II.

There are already a number of 'resource discovery', metadata and Web indexing themed software toolkits in existence. There are also projects such as the Advanced Search Framework (ASF) which draw together a number of smaller components to produce services and applications. ROADS and Combine are good examples of toolkits which themselves include a number of sub-components, while also usefully serving as components of larger applications.

Many of the developments in DESIRE phase two build upon these software distributions. Rather than risk duplication by creating a monolithic 'DESIRE software distribution' for the DESIRE toolkit, we instead adopt a decentralised model. ROADS and Combine have an existence beyond the scope of the DESIRE project. Through the creation of the DESIRE software toolkit, we aim to bring these diverse components together under a broad umbrella.

In order to provide a resource that will have a continued applicability beyond the lifetime of the project, the tools themselves will be used to construct an online system that will allow developers to contribute, select and check the compatibility of a wide variety of software tools, initially drawn from the DESIRE environment, but extensible to other relevant tools in the future.

Scope Statement

This is an interim document, covering the high-level design of the toolkit and introducing some of the tools developed in various Work Packages of the DESIRE II project.


Glossary

Centroid

A summary of the information contained in a whois++ index used to guide the routing of information queries to the likely source of the information being sought.

CIP

Common Indexing Protocol (RFC 2651, RFC 2652, RFC 2653)

IAFA

Internet Anonymous File Archive

LDAP

Lightweight Directory Access Protocol (RFC 1777)

RDF

Resource Description Framework

TIO

A Tagged Index Object for use in the Common Indexing Protocol (RFC 2654)

URL

Uniform Resource Locator (RFC 1738)

Whois++

A network information lookup service and associated protocol (RFC 1834, RFC 1835, RFC 1913, RFC 1914)

1 Toolkit Framework

Although, ultimately, the value of the work done in DESIRE II is to be found in the quality and usefulness of the software tools themselves, their worth can only be exploited if the developers of information systems can readily locate the tools they need to build their applications. Since the tools developed in the DESIRE projects cover a wide variety of different application areas, and are designed to complement tools available from other sources, the construction of a specific application may require the selection of several tools from different sources.

The DESIRE toolkit will ultimately use the DESIRE tools themselves to provide an information resource describing the available tools and the possible combinations in which they may be employed. Using metadata elements specifically selected to encapsulate the key features of the software tools, the “toolfinder” interface to the system will allow developers to find the set of tools that most closely matches their requirements.

1.1 Categorisation of Tools

To assist in the selection of tools, all components of the toolkit will be described and categorised in a standardised fashion which captures information about the nature and availability of the component and the possible ways in which it may be combined with other components. Each relevant characteristic of a component is named an Attribute. Each attribute may occur exactly once (indicated by 1 in the table below) or may occur zero or more times (indicated by * in the table below) or may occur 1 or more times (indicated by +). Each attribute may have a single value (such as the name of the component) or may consist of a list, each item of the list describing a different aspect of the attribute’s value. In the table below, lists are shown in the conventional manner – items grouped in parentheses and separated by commas.

The value of each attribute is interpreted according to the syntax shown below. Most attributes consist of text, but the values of others are constrained. Syntax constraints noted below include:

Category: A high-level classification of the nature of the software tool: (resource description, resource search, resource retrieval, metadata conversion, metadata management, metadata inference, protocol conversion, user interface)

URL: The value must be a URL according to RFC 1738

Platform: The value encodes the identification of an operating system or application environment and must be one of (AnyPerl, UnixPerl, Unix-Generic, Unix-Solaris, Unix-Linux, JavaScript, Win32, Win16, WinNT, MacOS)

Input: The value encodes the identification of a data representation or network protocol which the tool accepts as input and must be one of (whois++, CIP, LDAP, LDIF, Z39.50, HTTP, FTP, IAFA, RDF, ROADS, centroid, DVT Tree, DVT Database, RUDOLF-RDF, TIO)

Output: The value encodes the identification of a data representation or network protocol which the tool produces as output and must be one of (whois++, CIP, LDAP, LDIF, Z39.50, HTTP, FTP, IAFA, RDF, ROADS, centroid, HDB, DVT Tree, DVT Database, RUDOLF-RDF, TIO)

SupportType: The value encodes an indication of the level of support available for the tool and must be one of (Unsupported, Community, Developer, Commercial)

LicenceType: The value encodes an indication of the terms on which the software is made available and must be one of (Public Domain, Open Source, Educational, Commercial, Restricted Distribution)

| Attribute | Occurrence | Syntax | Explanation |
|---|---|---|---|
| Name | 1 | Text | The name of the component |
| Nature | + | Category | The category or categories of function performed by the tool |
| Function | 1 | Text | A description of the component’s main function |
| Application | 1 | Text | A description of the application (or applications) in which the component might be used |
| Example | * | (URL, Text) | Examples of online services that have been built with this tool – the URL at which the example can be found and a text description of the key features of the example |
| Origin | 1 | Text | A description of the individual or organisation providing the component |
| Licence | 1 | (LicenceType, Text) | The legal terms relating to use of the component |
| Support | + | SupportType or (SupportType, URL) | The type of support available and, optionally the URL of a Web page or mailing list from which support may be obtained |
| InstallGuide | * | URL | Location(s) at which information can be found relating to the installation of the tool |
| InstallKit | * | (Platform, URL) | For each operating system or other platform for which software is available online, a pair of values containing the platform type and the URL at which the software for that platform may be found |
| Documentation | * | URL | Location(s) at which general documentation can be found |
| Consumes | * | (Input, Text) | For each input type supported by the tool, a pair of values specifying the input type accepted and describing the use made of the input |
| Produces | * | (Output, Text) | For each type of output produced by the tool, a pair of values specifying the output type produced and describing the output |
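
As a rough illustration of how the occurrence rules and constrained syntaxes above might be checked mechanically, the following Python sketch validates a tool description held as a dictionary of attribute-value lists. The dictionary representation and the names OCCURRENCE and validate_description are assumptions made for this example; they are not part of the toolkit. Only the LicenceType and SupportType vocabularies are checked, to keep the sketch short.

```python
# Occurrence rule for each attribute, copied from the table above:
# "1" = exactly once, "+" = one or more times, "*" = zero or more times.
OCCURRENCE = {
    "Name": "1", "Nature": "+", "Function": "1", "Application": "1",
    "Example": "*", "Origin": "1", "Licence": "1", "Support": "+",
    "InstallGuide": "*", "InstallKit": "*", "Documentation": "*",
    "Consumes": "*", "Produces": "*",
}

# Two of the constrained vocabularies from the syntax list above.
SUPPORT_TYPES = {"Unsupported", "Community", "Developer", "Commercial"}
LICENCE_TYPES = {"Public Domain", "Open Source", "Educational",
                 "Commercial", "Restricted Distribution"}


def validate_description(description):
    """Return a list of problems found in a tool description.

    `description` maps each attribute name to a list of its values
    (a hypothetical representation chosen for this sketch).
    """
    problems = []
    for attribute in description:
        if attribute not in OCCURRENCE:
            problems.append(f"unknown attribute: {attribute}")
    for attribute, rule in OCCURRENCE.items():
        count = len(description.get(attribute, []))
        if rule == "1" and count != 1:
            problems.append(f"{attribute}: expected exactly 1 value, found {count}")
        elif rule == "+" and count == 0:
            problems.append(f"{attribute}: expected at least 1 value, found 0")
    # Licence values are (LicenceType, Text) pairs.
    for licence_type, _text in description.get("Licence", []):
        if licence_type not in LICENCE_TYPES:
            problems.append(f"Licence: unknown LicenceType {licence_type!r}")
    # Support values are either a SupportType or a (SupportType, URL) pair.
    for value in description.get("Support", []):
        support_type = value if isinstance(value, str) else value[0]
        if support_type not in SUPPORT_TYPES:
            problems.append(f"Support: unknown SupportType {support_type!r}")
    return problems


# Abbreviated description of one of the tools in this report.
combine = {
    "Name": ["Combine"],
    "Nature": ["resource retrieval"],
    "Function": ["A robot for harvesting Web resources"],
    "Application": ["Distributed, federated and regional Web indexes"],
    "Origin": ["Lund University Library NetLab"],
    "Licence": [("Open Source", "Copyright (c) 1996-1999 LUB NetLab")],
    "Support": [("Developer", "mailto:sigfrid.lundberg@lub.lu.se")],
    "Consumes": [("HTTP", "Retrieves documents via HTTP")],
    "Produces": [("HDB", "Harvest database")],
}
print(validate_description(combine))  # an empty list means no problems were found
```

A check of this kind could be run when a developer contributes a new description, so that incomplete records are rejected before they reach the toolfinder.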

2 Contents of Interim Toolkit

The preliminary toolkit consists of the tools listed in this section. Each has been categorised according to the description scheme set out above.
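
The Consumes and Produces attributes are what make it possible to check whether tools can be combined. A minimal sketch, assuming each description is reduced to two sets of format names: one tool can feed another when something it produces is something the other consumes. The TOOLS dictionary is transcribed from some of the records in this section (format names lower-cased for comparison); the dictionary layout and the function names are hypothetical.

```python
# Consumes/Produces values transcribed from the tool records in this section.
TOOLS = {
    "Combine": {"consumes": {"http"}, "produces": {"hdb"}},
    "Matcher": {"consumes": {"http", "hdb"}, "produces": {"rdf", "hdb"}},
    "LDAP crawler": {"consumes": {"ldap"}, "produces": {"ldif"}},
    "Generic Distributed Index Server": {
        "consumes": {"ldif", "tio", "centroid"},
        "produces": {"ldif", "ldap", "http", "tio"},
    },
    "ROADS": {
        "consumes": {"whois++", "iafa"},
        "produces": {"whois++", "centroid", "iafa"},
    },
}


def shared_formats(producer, consumer):
    """Formats that `producer` emits and `consumer` accepts."""
    return TOOLS[producer]["produces"] & TOOLS[consumer]["consumes"]


def downstream_tools(producer):
    """Names of the other tools that accept at least one output of `producer`."""
    return sorted(
        name for name in TOOLS
        if name != producer and shared_formats(producer, name)
    )


print(downstream_tools("Combine"))       # ['Matcher']
print(downstream_tools("LDAP crawler"))  # ['Generic Distributed Index Server']
print(shared_formats("ROADS", "Generic Distributed Index Server"))  # {'centroid'}
```

Matching on the format name alone is a necessary condition rather than a sufficient one: two tools that both mention HTTP, for example, may use it for quite different purposes, which is why each Consumes and Produces value also carries a text description.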

2.1 ROADS

Attribute

Value

Name

ROADS

Nature

resource description, resource search, resource retrieval, metadata conversion

Function

ROADS is a system which stores metadata describing information resources in the form of IAFA templates. Resource descriptions can be searched via locally-generated Web pages, or remotely using the whois++ protocol. Resource descriptions can be amalgamated and exchanged with other retrieval systems in the form of centroids or using the Common Indexing Protocol.

Application

ROADS is particularly suited to the construction of manually-maintained catalogues of information resources which require a consistent approach to the categorisation or rating. ROADS is commonly used to build Subject-Based Information Gateways.

Example

http://www.sosig.ac.uk/
The Social Science Information Gateway

Origin

Department of Computer Science at Loughborough University of Technology

Licence

Open Source
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Support

Developer
mailto:roads-liaison@bristol.ac.uk

InstallKit

UnixPerl
ftp://ftp.roads.lut.ac.uk/pub/ROADS/roads-v2.3.tar.Z

InstallGuide

http://www.roads.lut.ac.uk/v2/Manual/manual-1.html

Documentation

http://www.roads.lut.ac.uk/v2/Manual/

Consumes

whois++
Processes queries expressed using the whois++ protocol

IAFA
Can generate centroids from its database of IAFA templates.

Produces

whois++
Responds to queries using the whois++ protocol

centroid
Generates RFC 1913 centroids for both Harvest and Z39.50 servers

IAFA
Records resource descriptions in the form of IAFA templates. The distribution includes default templates for several types of common resources, including: Documents, FAQs, Images, Sounds, Video, Mail Archives, Information about Organisations, Information about Projects, Online Services, Software, Training Materials, News Groups.
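
The idea of a centroid can be illustrated briefly. In RFC 1913 a centroid lists, for each template type and attribute, the unique words that occur in a server's records, so that an index server can tell whether a query could possibly be answered there. The sketch below builds such a summary from records held as Python dictionaries. The record layout, the tokenisation rule and the function names are simplifying assumptions; the code reproduces neither the ROADS implementation nor the RFC 1913 transfer format.

```python
import re
from collections import defaultdict


def build_centroid(records):
    """Summarise records as {(template type, attribute): set of unique words}."""
    centroid = defaultdict(set)
    for record in records:
        template_type = record["Template-Type"]
        for attribute, value in record.items():
            if attribute == "Template-Type":
                continue
            # Assumed tokenisation: lower-case runs of letters and digits.
            for word in re.findall(r"[a-z0-9]+", value.lower()):
                centroid[(template_type, attribute)].add(word)
    return dict(centroid)


def might_match(centroid, template_type, attribute, word):
    """True if the summarised server could hold a record containing `word`."""
    return word.lower() in centroid.get((template_type, attribute), set())


# Invented example records in the spirit of IAFA templates.
records = [
    {"Template-Type": "DOCUMENT",
     "Title": "Social Science Information Gateway",
     "Keywords": "sociology, economics"},
    {"Template-Type": "DOCUMENT",
     "Title": "Economics working papers",
     "Keywords": "economics"},
]
centroid = build_centroid(records)
print(might_match(centroid, "DOCUMENT", "Title", "Gateway"))  # True
print(might_match(centroid, "DOCUMENT", "Title", "physics"))  # False
```

Because only unique words are kept, the summary is much smaller than the records themselves, but a positive answer means only that the server is worth querying.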


2.2 ROADS Dublin Core Metadata Repository (‘DC in a box’)

Attribute

Value

Name

ROADS Dublin Core Metadata Repository (‘DC in a box’)

Nature

resource description, resource search, resource retrieval, metadata conversion

Function

An extension to the ROADS software toolkit to allow the creation and storage of Dublin Core metadata.

Application

Dublin Core elements are mapped to ROADS IAFA attributes for storage to allow interoperability with other ROADS-based services and so that various ROADS scripts continue to work without modification.

The cataloguer and end user see only the mapped views of the metadata represented as a DC element set.

Included is a script to generate ‘on the fly’ Dublin Core in RDF representations of the stored records (wpp2qualdc.pl) and also scripts to inform the content providers of the metadata.

The content providers are encouraged to create a relationship between their resource and the corresponding metadata record using the HTML ‘link’ tag to call the wpp2qualdc.pl script.

Example

http://edward.ilrt.bris.ac.uk/dciab/

Origin

ILRT

Licence

Open Source
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Support

Developer
mailto:paul.hollands@bristol.ac.uk

InstallKit

UnixPerl
http://roads.opensource.ac.uk/dciab/dciab.tar.gz

InstallGuide

http://roads.opensource.ac.uk/dciab/

Documentation

http://www.roads.lut.ac.uk/v2/Manual/

Consumes

whois++
Processes queries expressed using the whois++ protocol

IAFA
Can generate centroids from its database of IAFA templates.

Produces

whois++
Responds to queries using the whois++ protocol

centroid
Generates RFC 1913 centroids for both Harvest and Z39.50 servers

IAFA
Records resource descriptions in the form of IAFA templates. The distribution includes default templates for several types of common resources, including: Documents, FAQs, Images, Sounds, Video, Mail Archives, Information about Organisations, Information about Projects, Online Services, Software, Training Materials, News Groups.

Dublin Core

RDF


2.3 Combine

Attribute

Value

Name

Combine

Nature

resource retrieval

Function

Combine is a robot for harvesting Web resources, designed to be distributable, parallel and flexible. It is distributable in the sense that different parts of Combine can run on separate computers. It is parallel in the sense that some parts of a Combine system can run as several instances to increase performance. Flexibility is achieved by building the system from small and relatively simple building blocks in a way that the user can modify.

Application

Combine can be used to produce distributed, federated and regional Web indexes.

Example

http://safari.hsv.se/index.html.en
A distributed search engine carrying metadata-labelled research information.

Origin

Lund University Library NetLab

Licence

Open Source
Copyright (c) 1996-1999 LUB NetLab

Support

Developer
mailto:sigfrid.lundberg@lub.lu.se

InstallKit

UnixPerl
http://www.lub.lu.se/combine/dist/combine-v1.1-src.tar.gz

InstallGuide

http://www.lub.lu.se/combine/docs/uguide.html

Documentation

http://www.lub.lu.se/combine/docs/uguide.html

Consumes

HTTP
Retrieves documents via HTTP

Produces

HDB
Harvest database (see documentation for details)


2.4 DESIRE Vocabulary Database

Attribute

Value

Name

DESIRE Vocabulary Database

Nature

Metadata management

Function

The vocabulary database tools allow the construction and manipulation of databases of specialist vocabularies which can then be used to facilitate browsing in user interfaces to search services.

Application

The vocabulary tools can be used to build systems for searching across a number of Internet data stores while traversing a vocabulary database.

Example

http://safari.hsv.se/browse/structure/root.html.en
Safari

Origin

Lund University Library NetLab

Licence

Open Source
Copyright (c) 1996-1999 LUB NetLab

Support

Developer
mailto:sigfrid.lundberg@lub.lu.se

InstallKit

UnixPerl
http://www.lub.lu.se/combine/dvt/lib/Vocabulary.pm.gz
http://www.lub.lu.se/combine/dvt/storeVocabulary.pl.gz
http://www.lub.lu.se/combine/dvt/printVocabulary.pl.gz

InstallGuide

http://www.lub.lu.se/combine/dvt/

Documentation

http://www.lub.lu.se/combine/dvt/

Consumes

DVT Tree
A tree-structured representation of a specialist vocabulary

DVT Database
An internal representation of a vocabulary tree

Produces

DVT Tree
A tree-structured representation of a specialist vocabulary

DVT Database
An internal representation of a vocabulary tree
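
As an informal illustration of what a tree-structured vocabulary offers a search interface, the following Python sketch builds a small tree, finds the path from the root to a term, and lists the narrower terms beneath a node. The node layout, the function names and the example terms are hypothetical; this is not the DVT Tree or DVT Database format, for which the documentation above should be consulted.

```python
from dataclasses import dataclass, field


@dataclass
class VocabularyNode:
    """One term in the vocabulary, with its narrower terms as children."""
    term: str
    children: list = field(default_factory=list)

    def add(self, term):
        """Attach a narrower term and return the new node."""
        child = VocabularyNode(term)
        self.children.append(child)
        return child


def path_to(node, term, trail=()):
    """Return the list of terms from `node` down to `term`, or None."""
    trail = trail + (node.term,)
    if node.term == term:
        return list(trail)
    for child in node.children:
        found = path_to(child, term, trail)
        if found:
            return found
    return None


def narrower_terms(node):
    """All terms below `node`, in depth-first order."""
    terms = []
    for child in node.children:
        terms.append(child.term)
        terms.extend(narrower_terms(child))
    return terms


# Invented example vocabulary.
root = VocabularyNode("Science")
physics = root.add("Physics")
physics.add("Optics")
root.add("Chemistry")

print(path_to(root, "Optics"))  # ['Science', 'Physics', 'Optics']
print(narrower_terms(root))     # ['Physics', 'Optics', 'Chemistry']
```

A browsing interface could use the path to show the user where a term sits in the vocabulary, and the narrower terms to widen a search from a broad heading to everything beneath it.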


2.5 DESIRE Vocabulary Browser

Attribute

Value

Name

DESIRE Vocabulary Browser

Nature

User interface

Function

The vocabulary browser tools allow interaction between a search interface and a vocabulary database constructed with the DESIRE vocabulary database tools.

Application

The vocabulary tools can be used to build systems for searching across a number of Internet data stores while traversing a vocabulary database.

Example

http://safari.hsv.se/browse/structure/root.html.en
Safari

Origin

Lund University Library NetLab

Licence

Open Source
Copyright (c) 1996-1999 LUB NetLab

Support

Developer
mailto:sigfrid.lundberg@lub.lu.se

InstallKit

UnixPerl
http://www.lub.lu.se/combine/dvt/showNode.tar.gz
UnixPerl
http://www.lub.lu.se/combine/dvt/alphabetize.pl.en.gz
JavaScript
http://www.lub.lu.se/combine/dvt/vocabulary.js

InstallGuide

http://www.lub.lu.se/combine/dvt/

Documentation

http://www.lub.lu.se/combine/dvt/

Consumes

DVT Database
An internal representation of a vocabulary tree

Produces



2.6 DESIRE Vocabulary Browser for Z39.50

Attribute

Value

Name

DESIRE Vocabulary Browser for Z39.50

Nature

User interface

Function

The vocabulary browser tools allow interaction between a Z39.50 search service and a vocabulary database constructed with the DESIRE vocabulary database tools.

Application

The vocabulary tools can be used to build systems for searching across a number of Z39.50 servers while traversing a vocabulary database.

Example

http://safari.hsv.se/browse/structure/root.html.en
Safari

Origin

Lund University Library NetLab

Licence

Open Source
Copyright (c) 1996-1999 LUB NetLab

Support

Developer
mailto:sigfrid.lundberg@lub.lu.se

InstallKit

UnixPerl
http://www.lub.lu.se/combine/dvt/browse_tools.tar.gz

InstallGuide

http://www.lub.lu.se/combine/dvt/

Documentation

http://www.lub.lu.se/combine/dvt/

Consumes

DVT Database
An internal representation of a vocabulary tree

Produces

Z39.50
Communicates with servers using the Z39.50 protocol (requires YAZ command line tool from Index Data, http://www.indexdata.dk/yaz/).


2.7 RDF Data Store

Attribute

Value

Name

RDF Data Store

Nature

Metadata management

Function

The RDF data model presents a flexible, expressive approach for representing structured data for the Web. This has both advantages and drawbacks. A key difference between RDF storage systems and the traditional relational approach is that with generalised RDF stores, it is necessary to anticipate the need to manage data drawing on metadata vocabularies that were unknown at the time the database was initialised. This tool provides a generalised data storage engine for RDF and defines an API which may be used to access it.

Application

Any application which needs to store or retrieve metadata in RDF format.

Example

http://www.grapevine.sosig.ac.uk/grapevine/recommender.htm
SOSIG Grapevine “Recommender” Demo

Origin

University of Bristol Institute for Learning and Research Technology

Licence

Open Source
Copyright (c) 1998-1999 University of Bristol

Support

Developer
mailto:daniel.brickley@bristol.ac.uk

InstallKit

UnixPerl
http://cvs.desire.org/cgi-bin/cvsweb.cgi/rudolf-perl/

InstallGuide


Documentation

http://rudolf.opensource.ac.uk/about/ratings/

Consumes

RUDOLF-RDF
A database representation of RDF metadata

Produces

RUDOLF-RDF
A database representation of RDF metadata
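
The point about vocabularies unknown at initialisation time can be made concrete. If every statement is stored as a (subject, predicate, object) triple, no table has to be designed for a particular vocabulary, and properties from a new vocabulary can be added later without changing the store. The Python sketch below is a minimal in-memory illustration under that assumption; the class and method names are hypothetical and are not the API defined by this tool, and the example.org rating property is invented.

```python
class TripleStore:
    """A minimal in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Record one statement; duplicates are stored once."""
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching statements, sorted; None acts as a wildcard."""
        return sorted(
            triple for triple in self._triples
            if (subject is None or triple[0] == subject)
            and (predicate is None or triple[1] == predicate)
            and (obj is None or triple[2] == obj)
        )


DC_TITLE = "http://purl.org/dc/elements/1.1/title"
RATING = "http://example.org/ratings#score"  # invented property for this sketch

store = TripleStore()
store.add("http://www.sosig.ac.uk/", DC_TITLE, "Social Science Information Gateway")
# A statement using a vocabulary the store knew nothing about beforehand.
store.add("http://www.sosig.ac.uk/", RATING, "4")

for statement in store.match(subject="http://www.sosig.ac.uk/"):
    print(statement)
```

A relational design with one column per metadata element would have needed a schema change to accept the second statement; here it is simply another row.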


2.8 Opinion Server

Attribute

Value

Name

Opinion Server

Nature

Metadata management

Function

A simple RDF rating and recommendation server built on top of a graph-oriented RDF API.

Application

Solicits, manages and stores ratings and recommendations using the RDF graph APIs. The Opinion Server application provides mechanisms for constructing simple RDF statements about Web resources and writing those statements into an RDF store.

Example

http://www.grapevine.sosig.ac.uk/grapevine/recommender.htm
SOSIG Grapevine “Recommender” Demo

Origin

University of Bristol Institute for Learning and Research Technology

Licence

Open Source
Copyright (c) 1998-1999 University of Bristol

Support

Developer
mailto:daniel.brickley@bristol.ac.uk

InstallKit

UnixPerl
http://cvs.desire.org/cgi-bin/cvsweb.cgi/desire-perl/

InstallGuide

To be provided

Documentation

http://rudolf.opensource.ac.uk/about/ratings/

Consumes

RUDOLF-RDF
A database representation of RDF metadata

Produces

RUDOLF-RDF
A database representation of RDF metadata


2.9 LDAP crawler

Attribute

Value

Name

LDAP crawler

Nature

Resource retrieval

Function

Gathering of LDAPv2/v3 objects

Application

The LDAP crawler is an LDAPv2/v3 directory robot that produces LDIF (LDAP Interchange Format) dumps of data in a specified Directory Information Tree (DIT). It can be used to feed centralised or distributed indexing services that, e.g., offer a single entry-point to a set of LDAP servers in an organisation or country.

Example

1. ldap://search.surfnet.nl/c=NL
LDAP address book for all of higher education and research in the Netherlands based on data gathered by the LDAP crawler. Around 150,000 entries. Works with Netscape, Microsoft Outlook and Eudora mailers.

2. http://search.surfnet.nl/naam/index.html
Web interface (in Dutch) for searching the address book for all of higher education and research in the Netherlands based on data gathered by the LDAP crawler, including PGP-keys.

Origin

SURFnet

Licence

Copyright (c) 1998, SURFnet bv, the Netherlands. All rights reserved. This program may currently only be distributed among DESIRE project participants.

Support

Developer:
mailto:Henny.Bekker@sec.nl

InstallKit

Unix Tar file. Currently available ‘as is’ for DESIRE project participants.

InstallGuide

To be provided

Documentation

To be provided

Consumes

LDAP
v2/v3 objects directly from LDAP servers.

Produces

LDIF
One file per crawled server
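
For readers unfamiliar with LDIF, the sketch below serialises directory entries, held as Python data structures, into LDIF text. It covers only the basic rules: one "attribute: value" line per value, a blank line between entries, and base64 encoding (marked by "::") for values that are not plain printable ASCII. Line folding and other details of the format are omitted, and the test for when to encode is deliberately conservative. This is not the crawler's code; the function names and the example entry are invented.

```python
import base64


def _needs_base64(value):
    """Conservative test for values that cannot be written as plain text."""
    if value == "":
        return False
    if value[0] in " :<" or value[-1] == " ":
        return True
    return any(ord(ch) < 32 or ord(ch) > 126 for ch in value)


def format_attribute(name, value):
    """Return one LDIF line for a single attribute value."""
    if _needs_base64(value):
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
        return f"{name}:: {encoded}"
    return f"{name}: {value}"


def to_ldif(entries):
    """Serialise a list of (dn, {attribute: [values]}) pairs as LDIF text."""
    blocks = []
    for dn, attributes in entries:
        lines = [format_attribute("dn", dn)]
        for name, values in attributes.items():
            for value in values:
                lines.append(format_attribute(name, value))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Invented example entry.
entries = [
    ("cn=Jan Jansen,o=Example University,c=NL",
     {"objectClass": ["person"], "cn": ["Jan Jansen"], "sn": ["Jansen"]}),
]
print(to_ldif(entries), end="")
```

Writing one such file per crawled server, as the crawler does, keeps the output of each server separate for whatever indexing step follows.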


2.10 DESIRE Generic Distributed Indexing Server

Attribute

Value

Name

DESIRE Generic Distributed Index Server

Nature

resource search, resource retrieval, metadata conversion, metadata management, protocol conversion

Function

Collects, indexes and makes available forward knowledge about resources, based on the IETF Common Indexing Protocol (CIP). The stored CIP forward knowledge objects can be searched by clients using one or more search protocols.

The software includes:


  • conversion scripts to convert LDIF and whois++ Centroids into Tagged Index Objects (TIOs) of the IETF Common Indexing Protocol (CIP);
  • HTTP server scripts for storing and retrieving TIOs over HTTP
  • HTTP server scripts and a back-end database for searching stored TIOs using LDAP URLs over HTTP (may include other URLs in the future)
  • LDAPv3 native protocol server to convert LDAPv2/v3 queries into LDAP URLs over HTTP
  • Security features for safe transport of forward knowledge

Application

The Generic Index Server provides a referral based distributed indexing service that can, e.g., offer a single entry-point to a set of LDAP servers in an organisation or country, or a set of resource discovery services. The architecture is explained in http://www.surfnet.nl/innovatie/surf-ace/search/ldap/d2_ldap_tio/.

Example

http://www.sec.nl/persons/henny/desire/ldap/d2demo.html

Demonstrator page on a Distributed LDAP-index service

Origin

SURFnet, The Netherlands; University of Tuebingen, Germany.

Licence

Copyright (c) 1999, SURFnet bv, the Netherlands. All rights reserved. The software is currently under development and will be made available to DESIRE project participants, initially.

Support

Developers:
mailto:Peter.Gietz@directory.dfn.de
mailto:Henny.Bekker@sec.nl
mailto:Peter.Valkenburg@surfnet.nl

InstallKit

To be provided

InstallGuide

To be provided

Documentation

To be provided

Consumes

LDIF

TIO
Exchanged using HTTP

Centroid

Produces

LDIF
Referral

LDAP
Referral

HTTP
Referral

TIO
Exchanged using HTTP
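
The referral idea can be sketched as follows. The index server holds, for each attribute, the tokens that each contributing server has reported (its forward knowledge), and answers a query not with the entries themselves but with the URLs of the servers worth asking. The Python below is a minimal illustration under that assumption; the data layout, the class and method names and the host names are invented, and the code does not implement CIP or the TIO format.

```python
from collections import defaultdict


class ReferralIndex:
    """Maps (attribute, token) pairs to the servers that reported them."""

    def __init__(self):
        self._servers = defaultdict(set)

    def load(self, server_url, forward_knowledge):
        """Record the tokens one server reports, given as {attribute: [tokens]}."""
        for attribute, tokens in forward_knowledge.items():
            for token in tokens:
                key = (attribute.lower(), token.lower())
                self._servers[key].add(server_url)

    def referrals(self, attribute, token):
        """Return the URLs of servers that may hold a matching entry."""
        key = (attribute.lower(), token.lower())
        return sorted(self._servers.get(key, set()))


index = ReferralIndex()
index.load("ldap://ldap.uni-a.example/o=Uni A,c=NL", {"sn": ["Jansen", "Bakker"]})
index.load("ldap://ldap.uni-b.example/o=Uni B,c=NL", {"sn": ["Jansen"]})

print(index.referrals("sn", "jansen"))  # both servers
print(index.referrals("sn", "Visser"))  # [] (no server reported this token)
```

The client then contacts only the referred servers, which is what allows a single entry point to front many directories without copying their contents.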


2.11 Matcher

Attribute

Value

Name

Matcher

Nature

Metadata inference

Function

The tool implements a subject classification process using a subject-specific thesaurus whose terms are intellectually mapped to categories or subject classes. The classification process is made up of several steps. First, the document to be classified is fetched. Text is extracted from this document, and all thesaurus terms are matched against it. Some heuristic processing rules are then applied to the results of the matching process. Finally, the outcome is formatted either for presentation or for storage in a database.

Application

Automatic subject classification of WWW-pages

Example

http://www.lub.lu.se/desire/demonstration.html

Origin

Lund University Library NetLab

Licence

Open Source
Copyright (c) Anders Ardö 1999

Support

Developer
mailto:and@dtv.dk

InstallKit

UnixPerl

InstallGuide


Documentation


Consumes

HTTP
Retrieves documents via HTTP

HDB
Retrieves documents from HDB

Produces

RDF
Representation of classification

HDB
HDB records enhanced with classification
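
The matching and heuristic steps of such a classification process can be sketched as follows. The Python below counts occurrences of thesaurus terms in a text, adds the counts to the subject class each term maps to, and keeps only the classes whose score reaches a threshold. The thesaurus, the class names, the scoring rule and the threshold are all invented for the example and should not be read as the rules Matcher applies; fetching and text extraction are left out, and the text is passed in directly.

```python
import re
from collections import Counter

# Hypothetical thesaurus: term -> subject class.
THESAURUS = {
    "heat exchanger": "Mechanical engineering",
    "turbine": "Mechanical engineering",
    "semiconductor": "Electrical engineering",
    "transistor": "Electrical engineering",
}


def classify(text, thesaurus=THESAURUS, min_score=2):
    """Return (subject class, score) pairs, best first, for scores >= min_score."""
    # Normalise to lower-case words separated by single spaces.
    normalised = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    scores = Counter()
    for term, subject_class in thesaurus.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, normalised))
        if hits:
            scores[subject_class] += hits
    # Assumed heuristic: discard classes supported by too few term matches.
    return [(cls, score) for cls, score in scores.most_common() if score >= min_score]


text = "The turbine drives a heat exchanger. Each turbine uses one transistor."
print(classify(text))  # [('Mechanical engineering', 3)]
```

The threshold plays the role of a heuristic rule: a single stray term, such as "transistor" here, is not treated as enough evidence to assign a class.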

