Project Number:

RE 4004 (RE)

Project Title:

DESIRE II - Development of a European Service for Information on Research and Education II

Deliverable Type:

(RP)

Deliverable Number:

(M3.3)

Contractual Date of Delivery:

July 1999

Actual Date of Delivery:

July 1999

Title of Deliverable:

Progress Report on DESIRE II Integrated Toolkit

Workpackage(s) contributing to the Deliverable:

WP3

Nature of the Deliverable:

RE

Author:

Daniel Brickley, Tracey Hooper

Contact Details:

Institute for Learning and Research Technology
University of Bristol
8-10 Berkeley Square
BRISTOL
BS8 1HH
Tel: +44 (0)117 928 7096
Fax: +44 (0)117 928 7112
Email: daniel.brickley@bristol.ac.uk

Other Authors:


URL

http://www.desire.org/html/research/deliverables/D3.3/m3.3a.html

Abstract

This Milestone is an interim report to the Commission to outline progress towards an integrated DESIRE toolkit. It has been agreed that a peer-reviewed pre-release of the toolkit will be made available by the end of November 1999 and that the final toolkit will be available by April 2000. During the course of the DESIRE project, a number of different software components are being produced. In many cases, these build upon previous work (such as the ROADS toolkit), or on work undertaken in phase one of the project (e.g. Combine). Phase two of DESIRE includes a number of strands that share common software requirements. The purpose of this toolkit is to provide a supporting environment and framework for the collective and ongoing development of this software, and for its maintenance beyond the lifetime of DESIRE-II.

Keywords

Software, Toolkit, ROADS, Metadata, Web Indexing


Distribution List:

Commission

DESIRE Project Partners

Issue:

Version 1.0

Reference:

M3_3v1_0.doc

Total Number of Pages:

Seven


Document Control

Issue Number    Issue Date    Reason for Change
V1.0            28 July 99    Original document
V1.1            3 Aug 99      Additional demonstrators and documents added, primarily D3.2 demonstrators




Executive Summary

This Milestone is an interim report to the Commission to outline progress towards an integrated DESIRE toolkit. It has been agreed that a peer-reviewed pre-release of the toolkit will be made available by the end of November 1999 and that the final toolkit will be available by April 2000.

During the course of the DESIRE project, a number of different software components are being produced. In many cases, these build upon previous work (such as the ROADS toolkit), or on work undertaken in phase one of the project (e.g. Combine). Phase two of DESIRE includes a number of strands that share common software requirements. The purpose of this toolkit is to provide a supporting environment and framework for the collective and ongoing development of this software, and for its maintenance beyond the lifetime of DESIRE-II.

There are already a number of software toolkits in existence that address 'resource discovery', metadata and Web indexing. There are also projects such as the Advanced Search Framework (ASF) which draw together a number of smaller components to produce services and applications. ROADS and Combine, for example, are toolkits which themselves include a number of sub-components, while also usefully serving as components of larger applications.

Many of the developments in DESIRE phase two build upon these software distributions. Rather than risk duplication by creating a monolithic 'DESIRE software distribution' for the DESIRE toolkit, we instead adopt a decentralised model. ROADS and Combine have an existence beyond the scope of the DESIRE project. Through the creation of the DESIRE software toolkit, we aim to bring these diverse components together under a broad umbrella.

A series of Web pages has been developed on the DESIRE Web site to give developers sufficient information to understand the workings of the various components of the toolkit, to participate in the development of the system and to build applications that draw upon independently created software tools. See:

http://www.desire.org/html/research/software/

Scope Statement

This is a progress report for the DESIRE integrated toolkit, which will be delivered in April 2000, with the peer-reviewed pre-release in November 1999. The primary aim of this report is to point to demonstrators and working papers rather than to describe the work itself.


1. Key Components of the Toolkit

These are software packages that have an ongoing connection with one or more DESIRE project partners, and which are being improved or adapted as part of the DESIRE workplan.

In addition to working on the packaging, presentation and collaborative development of existing tools, there is a substantial amount of new development work taking place within the DESIRE project. It is anticipated that we will produce packages for:

· RDF metadata storage, manipulation and query

· Automatic classification tools

· RDF representations of thesaurus and classification schemes

· LDAP Distributed indexing tools

Many of these tools will be integrated with the software distributions above; others will be packaged as stand-alone components. The following sections provide links to background working papers plus tools and demonstrators currently in development.

2. The Combine harvesting robot

The harvesting work within DESIRE II builds on the existing EWI software and other open-source efforts, adding improved support for metadata gathering and a wider range of resource types. Using a metadata-aware indexing and search system will make resource discovery and retrieval more precise. More information is available from:

http://www.lub.lu.se/combine/
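As an illustration of what metadata gathering during harvesting can involve, the following is a minimal sketch in Python that collects embedded metadata from the META elements of an HTML page. It is not taken from the Combine software; the class name, function name, sample page and Dublin Core style field names are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> elements in an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            # Keep every value: a page may repeat a field such as DC.Subject.
            self.metadata.setdefault(name.lower(), []).append(content)


def extract_embedded_metadata(html_text):
    """Return a dict mapping lower-cased field names to lists of values."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.metadata


# Hypothetical sample page, used only to exercise the sketch.
SAMPLE_PAGE = """<html><head>
<title>Example page</title>
<meta name="DC.Title" content="Example page">
<meta name="DC.Subject" content="Bridges">
<meta name="DC.Subject" content="Civil engineering">
</head><body>...</body></html>"""

if __name__ == "__main__":
    print(extract_embedded_metadata(SAMPLE_PAGE))
```

A harvester built along these lines could pass the collected fields to the indexer alongside the full text, so that searches can be restricted to particular metadata fields.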

3. The DESIRE Vocabulary Toolkit

The first version of the DESIRE Vocabulary Toolkit (DVT) was developed as part of the Safari project, where it supported users creating embedded metadata in pages to be harvested by the search service, which uses the Combine robot. The package is written in Perl and requires the Berkeley Database package. The toolkit can be used together with metadata creators (e.g. the Safari tool) and search forms, and is being generalized and improved within the DESIRE project:

http://www.lub.lu.se/combine/dvt/

4. Automatic Classification

In order to provide a subject-based browsing interface to the robot-generated index, some kind of automatic classification is needed. The goal is to structure the index using the same Ei (Engineering Information Inc.) classification that is used in the quality service EELS. This will allow cross-browsing between the two services.
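This report does not describe the classification method itself. Purely as an illustration of the general idea, the following minimal Python sketch assigns a document to classes by counting occurrences of terms associated with each class. The class labels, term lists and function name are hypothetical and are not taken from the Ei classification or from the DESIRE software.

```python
import re
from collections import Counter

# Hypothetical class labels and terms, invented for this example.
CLASS_TERMS = {
    "C1 Bridges": ["bridge", "suspension", "girder"],
    "C2 Hydraulics": ["pump", "valve", "hydraulic"],
}


def classify(text, class_terms=CLASS_TERMS, min_score=1):
    """Return (class label, score) pairs, highest score first.

    The score of a class is the number of times any of its terms
    occurs in the text; classes scoring below min_score are dropped.
    """
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {}
    for label, terms in class_terms.items():
        score = sum(words[term] for term in terms)
        if score >= min_score:
            scores[label] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = "The suspension bridge uses a steel girder; one pump drains the site."
    print(classify(sample))
```

Documents classified in this way could then be listed under the corresponding node of the classification scheme to support browsing.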

4.1 Automatic Classification Demonstrator

The Automatic Classification demonstration page is available from:

http://www.lub.lu.se/desire/demonstration.html

A poster presentation has been put together for the ECDL99 Conference in September and provides a useful overview of the auto-classification work.

Creation and automatic classification of a robot-generated subject index, Anders Ardö and Traugott Koch, NetLab. Summary of poster for ECDL99 Conference:

http://www.lub.lu.se/desire/poster.html

4.2 Automatic Classification Working Papers

Working papers in this area include:

The construction of a robot-generated subject index, Anders Ardö, Traugott Koch and Lars Noodén, NetLab, Lund Univ.

http://www.lub.lu.se/desire/DESIRE36a-WP1.html

Automatic Classification and Content Navigation Support for Web Services - DESIRE II Cooperates with OCLC, Traugott Koch, NetLab, Lund University, Diane Vizine-Goetz, Consulting Research Scientist, OCLC Office of Research

http://www.oclc.org/oclc/research/publications/review98/koch_vizine-goetz/automatic.htm

5. LDAP Distributed indexing tools

The project is developing a White Pages service based on an index of existing directory servers in Europe. The service model proposes a set of national servers, accessible via LDAPv3, each holding a copy of the entire index. The initial index service will permit the client to find personal entries held in the set of indexed databases. Each indexed record contains selected attributes from the source, together with a reference to an LDAPv3 server holding a complete copy of the entry. The client used to access the service should be able to search the index server and follow the references in the list of matching entries that it returns.
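The two-step lookup described above (search the index, then follow each reference to the server holding the full entry) can be sketched as follows. This is a minimal in-memory simulation in Python, not the DESIRE client: no LDAP connection is made, and the record fields, server names, LDAP URLs and function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRecord:
    """Selected attributes of a source entry plus a reference to its server."""
    common_name: str
    organisation: str
    referral: str  # hypothetical LDAP URL of the server holding the full entry


# Stand-in for the source directory servers, keyed by referral URL.
FULL_ENTRIES = {
    "ldap://dir-a.example.org/cn=Ann%20Example,o=Example%20University": {
        "cn": "Ann Example",
        "o": "Example University",
        "mail": "ann@example.org",
    },
    "ldap://dir-b.example.org/cn=Bob%20Sample,o=Sample%20Institute": {
        "cn": "Bob Sample",
        "o": "Sample Institute",
        "mail": "bob@example.org",
    },
}

# Stand-in for the index held by each national index server.
INDEX = [
    IndexRecord("Ann Example", "Example University",
                "ldap://dir-a.example.org/cn=Ann%20Example,o=Example%20University"),
    IndexRecord("Bob Sample", "Sample Institute",
                "ldap://dir-b.example.org/cn=Bob%20Sample,o=Sample%20Institute"),
]


def search_index(index, name_fragment):
    """Step 1: return index records whose name contains the fragment."""
    needle = name_fragment.lower()
    return [record for record in index if needle in record.common_name.lower()]


def follow_referral(referral, servers=FULL_ENTRIES):
    """Step 2: fetch the full entry. A real client would query the
    referenced LDAPv3 server; here it is a dictionary lookup."""
    return servers.get(referral)


def find_person(name_fragment):
    """Search the index, then follow each reference to the full entry."""
    entries = []
    for record in search_index(INDEX, name_fragment):
        entry = follow_referral(record.referral)
        if entry is not None:
            entries.append(entry)
    return entries


if __name__ == "__main__":
    print(find_person("ann"))
```

Because the index holds only selected attributes, the client needs the second step to obtain attributes such as the e-mail address in this example.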

5.1 Prototype Distributed Indexing Toolkit

A prototype of the distributed indexing toolkit is now available at:

http://surver.surfnet.nl:8000/

5.2 Distributed Indexing Toolkit Working Papers

A number of published and internal working papers, including a formal Internet Draft, have contributed to the development of the toolkit. These are available from the URLs given below:

Requirements and overview for an European LDAP indexing service (Internet Draft), Peter Gietz, Peter Valkenburg and Henny Bekker

http://www.ietf.org/internet-drafts/draft-gietz-ldapindex-00.txt

Outline of the internal architecture of the DESIRE II LDAP client, Damanjit Mahl, Brunel University:

http://www.brunel.ac.uk/x500/desire/client-architecture/docframe.htm

Outline of the user requirements for the LDAP directory service demonstrator client, Damanjit Mahl, Brunel University:

http://www.brunel.ac.uk/x500/desire/client-specification/docframe.htm

Generic distributed indexing architecture using CIP/TIO & HTTP, Henny Bekker, SURFnet Expertise Centrum:

http://www.sec.nl/persons/henny/desire/ldap/d2_ldap_tio.html

6. Prototype RDF rating services

DESIRE's Deliverable D3.2 demonstrates several prototype applications of the W3C Resource Description Framework (RDF) in the context of the DESIRE work on web indexing and quality-assured information gateways. This work builds upon, and implements in part, the analysis undertaken earlier as D3.1 - Quality Ratings in RDF.

The documentation and demonstrators for D3.2 consist of two parts, undertaken by ILRT and UKOLN respectively. UKOLN's work focuses on the integration of quality-oriented attributes into the ROADS search and retrieval environment, and presents a prototypical client-side interface to one such rating bureau. The ILRT demonstrators contrast the facilities offered by a generic RDF datastore with a more traditional relational database approach, showing how either approach can be integrated into more general RDF-based information services.
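To make the idea of a generic RDF datastore more concrete, the following minimal Python sketch holds rating statements as (subject, predicate, object) triples, answers pattern queries over them and ranks resources by rating. It is not taken from either demonstrator; the property names, resource URLs and function names are hypothetical.

```python
# Each statement is a (subject, predicate, object) triple.
# Property names and URLs are invented for this example.
TRIPLES = [
    ("http://example.org/page1", "rating:quality", 4),
    ("http://example.org/page1", "rating:ratedBy", "http://example.org/bureau"),
    ("http://example.org/page2", "rating:quality", 2),
    ("http://example.org/page2", "rating:ratedBy", "http://example.org/bureau"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        triple for triple in triples
        if (subject is None or triple[0] == subject)
        and (predicate is None or triple[1] == predicate)
        and (obj is None or triple[2] == obj)
    ]


def rank_by_quality(triples):
    """Return rated resources ordered from highest to lowest quality score."""
    rated = match(triples, predicate="rating:quality")
    ordered = sorted(rated, key=lambda triple: triple[2], reverse=True)
    return [subject for subject, _, _ in ordered]


if __name__ == "__main__":
    print(match(TRIPLES, subject="http://example.org/page2"))
    print(rank_by_quality(TRIPLES))
```

In a relational design the same information would typically sit in a table with fixed columns; the triple form allows new properties to be added without changing a schema, at the cost of more general queries.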

The demonstrators are maintained on separate sites and can be accessed from the URLs below.

6.1 ILRT demonstrator: rating and recommendations toolkit and prototype

http://rudolf.opensource.ac.uk/about/ratings/

6.2 UKOLN demonstrator: Quality-based ranking and client application

http://www.ukoln.ac.uk/metadata/desire/qualityratings/

7. Developer support

A number of publicly accessible CVS servers are being created to allow software developers to track and participate in the creation of these tools. We are currently working to provide CVS views of the ROADS and Combine packages, with more to follow.

7.1 ILRT CVS Repository

ILRT have a growing number of CVS repositories to aid collaborative development. These can be viewed from:

http://cvs.desire.org/cgi-bin/cvsweb.cgi/

8. Future Timescales

As has been demonstrated above, there is considerable development work taking place within the DESIRE project. As mentioned in the summary, rather than risk duplication by creating a monolithic 'DESIRE software distribution' for the DESIRE toolkit, we instead adopt a decentralised model. ROADS and Combine have an existence beyond the scope of the DESIRE project. Through the creation of the DESIRE software toolkit, we aim to bring these diverse components together under a broad umbrella.

We plan to release an initial version of the DESIRE toolkit at the end of November 1999, which will pull all the above strands together within the DESIRE Web site. A second version of the toolkit will be available in April 2000, though it is expected that the toolkit will have a continuing life after the end of the project.

