Project Number: | RE 4004 (RE) |
Project Title: | DESIRE II - Development of a European Service for Information on Research and Education II |
Deliverable Type: | PU |
Deliverable Number: | M3.3b |
Contractual Date of Delivery: | November 1999 |
Actual Date of Delivery: | November 1999 |
Title of Deliverable: | Pre-release of DESIRE Integrated Toolkit |
Workpackage(s) contributing to the Deliverable: | WP3 |
Nature of the Deliverable: | PR |
Author: | Tracey Hooper |
Contact Details: | |
Other Authors: | Tim Dixon, TDC Networking Consultancy Limited |
URL | http://www.desire.org/html/research/deliverables/D3.3/m3.3b.html |
Abstract | This Milestone is an interim report describing a limited peer-reviewed pre-release of the toolkit, available from November 1999 as a precursor to the final DESIRE toolkit, to be available by April 2000. During the course of the DESIRE project, a number of different software components are being produced. In many cases, these build upon previous work (such as the ROADS toolkit), or on work undertaken in phase one of the project (e.g. Combine). Phase two of DESIRE includes a number of strands that share common software requirements. The purpose of this toolkit is to provide a supporting environment and framework to allow collective and ongoing development of this software, and to provide both an environment and framework for its maintenance beyond the lifetime of DESIRE-II. |
Keywords | Software |
Distribution List: | DESIRE Project Team; European Commission |
Issue: | V0.3 |
Reference: | Pre-release of DESIRE Integrated Toolkit |
Total Number of Pages: | 10 |
Issue Number | Issue Date | Reason for Change |
0.1 | 4/11/99 | (Initial Draft) |
0.2 | 9/11/99 | Early draft issued for comment |
0.3 | 18/11/99 | Second draft issued for comment |
1.0 | 19/11/99 | Incorporates comment on draft; version released for peer review |
1.1 | 9/12/99 | Incorporates comments from peer review |
This Milestone is an interim release of the DESIRE toolkit scheduled to be available in its complete form in April 2000.
During the course of the DESIRE project, a number of different software components are being produced. In many cases, these build upon previous work (such as the ROADS toolkit), or on work undertaken in phase one of the project (e.g. Combine). Phase two of DESIRE includes a number of strands that share common software requirements. The purpose of this toolkit is to provide a supporting environment and framework to allow collective and ongoing development of this software, and to provide both an environment and framework for its maintenance beyond the lifetime of DESIRE-II.
There are already a number of software toolkits in existence for resource discovery, metadata and Web indexing. There are also projects such as the Advanced Search Framework (ASF) which draw together a number of smaller components to produce services and applications. ROADS and Combine are good examples of toolkits which themselves include a number of sub-components, while also usefully serving as components of larger applications.
Many of the developments in DESIRE phase two build upon these software distributions. Rather than risk duplication by creating a monolithic 'DESIRE software distribution' for the DESIRE toolkit, we instead adopt a decentralised model. ROADS and Combine have an existence beyond the scope of the DESIRE project. Through the creation of the DESIRE software toolkit, we aim to bring these diverse components together under a broad umbrella.
In order to provide a resource that will have a continued applicability beyond the lifetime of the project, the tools themselves will be used to construct an online system that will allow developers to contribute, select and check the compatibility of a wide variety of software tools, initially drawn from the DESIRE environment, but extensible to other relevant tools in the future.
This is an interim document, covering the high-level design of the toolkit and introducing some of the tools developed in various Work Packages of the DESIRE II project.
Centroid
A summary of the information contained in a whois++ index used to guide the routing of information queries to the likely source of the information being sought.
CIP
Common Indexing Protocol (RFC 2651, RFC 2652, RFC 2653)
IAFA
Internet Anonymous File Archive
LDAP
Lightweight Directory Access Protocol (RFC 1777)
RDF
Resource Description Framework
TIO
A Tagged Index Object for use in the Common Indexing Protocol (RFC 2654)
URL
Uniform Resource Locator (RFC 1738)
Whois++
A network information lookup service and associated protocol (RFC 1834, RFC 1835, RFC 1913, RFC 1914)
Although, ultimately, the value of the work done in DESIRE II is to be found in the quality and usefulness of the software tools themselves, their worth can only be exploited if the developers of information systems can readily locate the tools they need to build their applications. Since the tools developed in the DESIRE projects cover a wide variety of different application areas, and are designed to complement tools available from other sources, the construction of a specific application may require the selection of several tools from different sources.
The DESIRE toolkit will ultimately use the DESIRE tools themselves to provide an information resource describing the available tools and the possible combinations in which they may be employed. Using metadata elements specifically selected to encapsulate the key features of the software tools, the “toolfinder” interface to the system will allow developers to find the set of tools that most closely matches their requirements.
To assist in the selection of tools, all components of the toolkit will be described and categorised in a standardised fashion which captures information about the nature and availability of the component and the possible ways in which it may be combined with other components. Each relevant characteristic of a component is named an Attribute. Each attribute may occur exactly once (indicated by 1 in the table below), zero or more times (indicated by *), or one or more times (indicated by +). Each attribute may have a single value (such as the name of the component) or may consist of a list, each item of the list describing a different aspect of the attribute’s value. In the table below, lists are shown in the conventional manner: items grouped in parentheses and separated by commas.
The value of each attribute is interpreted according to the syntax shown below. Most attributes consist of text, but the values of others are constrained. Syntax constraints noted below include:
Category A high-level classification of the nature of the software tool: (resource description, resource search, resource retrieval, metadata conversion, metadata management, metadata inference, protocol conversion, user interface)
URL The value must be a URL according to RFC 1738
Platform The value encodes the identification of an operating system or application environment and must be one of (AnyPerl, UnixPerl, Unix-Generic, Unix-Solaris, Unix-Linux, JavaScript, Win32, Win16, WinNT, MacOS)
Input The value encodes the identification of a data representation or network protocol which the tool accepts as input and must be one of (whois++, CIP, LDAP, LDIF, Z39.50, HTTP, FTP, IAFA, RDF, ROADS, centroid, DVT Tree, DVT Database, RUDOLF-RDF, TIO)
Output The value encodes the identification of a data representation or network protocol which the tool produces as output and must be one of (whois++, CIP, LDAP, LDIF, Z39.50, HTTP, FTP, IAFA, RDF, ROADS, centroid, HDB, DVT Tree, DVT Database, RUDOLF-RDF, TIO)
SupportType The value encodes an indication of the level of support available for the tool and must be one of (Unsupported, Community, Developer, Commercial)
LicenceType The value encodes an indication of the terms on which the software is made available and must be one of (Public Domain, Open Source, Educational, Commercial, Restricted Distribution)
Attribute | Occurrence | Syntax | Explanation |
Name | 1 | Text | The name of the component |
Nature | + | Category | The category or categories of function performed by the tool |
Function | 1 | Text | A description of the component’s main function |
Application | 1 | Text | A description of the application (or applications) in which the component might be used |
Example | * | (URL, Text) | Examples of online services that have been built with this tool – the URL at which the example can be found and a text description of the key features of the example |
Origin | 1 | Text | A description of the individual or organisation providing the component |
Licence | 1 | (LicenceType, Text) | The legal terms relating to use of the component |
Support | + | SupportType or (SupportType, URL) | The type of support available and, optionally the URL of a Web page or mailing list from which support may be obtained |
InstallGuide | * | | Location(s) at which information can be found relating to the installation of the tool |
InstallKit | * | (Platform, URL) | For each operating system or other platform for which software is available online, a pair of values containing the platform type and the URL at which the software for that platform may be found |
Documentation | * | | Location(s) at which general documentation can be found |
Consumes | * | (Input, Text) | For each input type supported by the tool, a pair of values specifying the input type accepted and describing the use made of the input |
Produces | * | (Output, Text) | For each type of output produced by the tool, a pair of values specifying the output type produced and describing the output |
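The description scheme above can be read as a small record type with occurrence rules and controlled vocabularies. The following is a minimal sketch of how such a record might be represented and checked. The class name `ToolDescription`, the function `validate` and the field names are illustrative assumptions and are not part of the DESIRE toolkit; the vocabularies are copied from the syntax constraints above. Category names are compared case-insensitively, which is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlparse

# Controlled vocabularies copied from the syntax constraints above.
CATEGORIES = {
    "resource description", "resource search", "resource retrieval",
    "metadata conversion", "metadata management", "metadata inference",
    "protocol conversion", "user interface",
}
PLATFORMS = {
    "AnyPerl", "UnixPerl", "Unix-Generic", "Unix-Solaris", "Unix-Linux",
    "JavaScript", "Win32", "Win16", "WinNT", "MacOS",
}
INPUTS = {
    "whois++", "CIP", "LDAP", "LDIF", "Z39.50", "HTTP", "FTP", "IAFA", "RDF",
    "ROADS", "centroid", "DVT Tree", "DVT Database", "RUDOLF-RDF", "TIO",
}
OUTPUTS = INPUTS | {"HDB"}  # the Output list additionally allows HDB
SUPPORT_TYPES = {"Unsupported", "Community", "Developer", "Commercial"}
LICENCE_TYPES = {
    "Public Domain", "Open Source", "Educational", "Commercial",
    "Restricted Distribution",
}


@dataclass
class ToolDescription:
    """Hypothetical record type mirroring the attribute table."""
    # Occurrence 1
    name: str
    function: str
    application: str
    origin: str
    licence: tuple[str, str]                      # (LicenceType, Text)
    # Occurrence +
    nature: list[str]                             # Category values
    support: list[tuple[str, str | None]]         # (SupportType, optional URL)
    # Occurrence *
    examples: list[tuple[str, str]] = field(default_factory=list)      # (URL, Text)
    install_guides: list[str] = field(default_factory=list)
    install_kits: list[tuple[str, str]] = field(default_factory=list)  # (Platform, URL)
    documentation: list[str] = field(default_factory=list)
    consumes: list[tuple[str, str]] = field(default_factory=list)      # (Input, Text)
    produces: list[tuple[str, str]] = field(default_factory=list)      # (Output, Text)


def validate(tool: ToolDescription) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for attr in ("name", "function", "application", "origin"):
        if not getattr(tool, attr).strip():
            errors.append(f"{attr}: exactly one non-empty value is required")
    if tool.licence[0] not in LICENCE_TYPES:
        errors.append(f"licence: unknown LicenceType {tool.licence[0]!r}")
    if not tool.nature:
        errors.append("nature: at least one category is required")
    errors += [f"nature: unknown category {c!r}"
               for c in tool.nature if c.lower() not in CATEGORIES]
    if not tool.support:
        errors.append("support: at least one value is required")
    for kind, url in tool.support:
        if kind not in SUPPORT_TYPES:
            errors.append(f"support: unknown SupportType {kind!r}")
        if url is not None and not urlparse(url).scheme:
            errors.append(f"support: {url!r} is not a URL")
    for platform, url in tool.install_kits:
        if platform not in PLATFORMS:
            errors.append(f"install_kits: unknown platform {platform!r}")
        if not urlparse(url).scheme:
            errors.append(f"install_kits: {url!r} is not a URL")
    errors += [f"consumes: unknown input {i!r}"
               for i, _ in tool.consumes if i not in INPUTS]
    errors += [f"produces: unknown output {o!r}"
               for o, _ in tool.produces if o not in OUTPUTS]
    return errors


# Example record; the text values are abbreviated for illustration.
roads = ToolDescription(
    name="ROADS",
    function="Stores metadata describing information resources as IAFA templates.",
    application="Manually maintained catalogues of information resources.",
    origin="Department of Computer Science at Loughborough University of Technology",
    licence=("Open Source", ""),
    nature=["resource description", "resource search",
            "resource retrieval", "metadata conversion"],
    support=[("Developer", None)],
    consumes=[("whois++", ""), ("IAFA", "")],
    produces=[("whois++", ""), ("centroid", "")],
)
print(validate(roads))
```

Returning a list of problems rather than raising on the first one is a design choice made here so that a contributor could see every issue with a submitted description at once.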
The preliminary toolkit consists of the tools listed in this section. Each has been categorised according to the description scheme set out above.
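Because every entry in this section records what a tool consumes and what it produces, these two attributes can in principle be used to check which tools may be chained together. The sketch below is one possible way to do this, using a breadth-first search over data formats; it is not the design of the toolfinder itself, which this document does not specify. The table `TOOLS` and the function `find_chain` are hypothetical names, and the Consumes and Produces values are transcribed from the entries in this section. A matching format only indicates that two tools might be combined, not that the combination has been tested.

```python
from collections import deque

# tool name -> (formats consumed, formats produced), transcribed from the
# Consumes and Produces rows of the entries in this section.
TOOLS = {
    "ROADS": ({"whois++", "IAFA"}, {"whois++", "centroid"}),
    "Combine": ({"HTTP"}, {"HDB"}),
    "DESIRE Vocabulary Database": ({"DVT Tree", "DVT Database"},
                                   {"DVT Tree", "DVT Database"}),
    "DESIRE Vocabulary Browser": ({"DVT Database"}, set()),
    "DESIRE Vocabulary Browser for Z39.50": ({"DVT Database"}, {"Z39.50"}),
    "RDF Data Store": ({"RUDOLF-RDF"}, {"RUDOLF-RDF"}),
    "LDAP crawler": ({"LDAP"}, {"LDIF"}),
    "DESIRE Generic Distributed Index Server": ({"LDIF", "TIO"},
                                                {"LDIF", "LDAP"}),
    "Matcher": ({"HTTP", "HDB"}, {"RDF", "HDB"}),
}


def find_chain(source, target, tools=TOOLS):
    """Return a shortest list of tool names converting source to target.

    Returns [] if no tool is needed and None if no chain exists.
    """
    if source == target:
        return []
    queue = deque([(source, [])])   # (format reached, tools used so far)
    seen = {source}
    while queue:
        fmt, chain = queue.popleft()
        for name, (consumes, produces) in tools.items():
            if fmt not in consumes:
                continue
            for out in sorted(produces):
                if out == target:
                    return chain + [name]
                if out not in seen:
                    seen.add(out)
                    queue.append((out, chain + [name]))
    return None


print(find_chain("HTTP", "RDF"))
print(find_chain("DVT Tree", "Z39.50"))
print(find_chain("FTP", "RDF"))
```

Searching over formats rather than over tools keeps the search small, since the number of distinct formats in the description scheme is fixed while the number of registered tools may grow.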
Attribute | Value |
Name | ROADS |
Nature | resource description, resource search, resource retrieval, metadata conversion |
Function | ROADS is a system which stores metadata describing information resources in the form of IAFA templates. Resource descriptions can be searched via locally-generated Web pages, or remotely using the whois++ protocol. Resource descriptions can be amalgamated and exchanged with other retrieval systems in the form of centroids or using the Common Indexing Protocol. |
Application | ROADS is particularly suited to the construction of manually-maintained catalogues of information resources which require a consistent approach to categorisation or rating. ROADS is commonly used to build Subject-Based Information Gateways. |
Example | http://www.sosig.ac.uk/ |
Origin | Department of Computer Science at Loughborough University of Technology |
Licence | Open Source |
Support | Developer |
InstallKit | UnixPerl |
InstallGuide | |
Documentation | |
Consumes | whois++, IAFA
Produces | whois++, centroid
Attribute | Value |
Name | ROADS Dublin Core Metadata Repository (‘DC in a box’) |
Nature | resource description, resource search, resource retrieval, metadata conversion |
Function | An extension to the ROADS software toolkit to allow the creation and storage of Dublin Core metadata. |
Application | Dublin Core elements are mapped to ROADS IAFA attributes for storage to allow interoperability with other ROADS-based services and so that various ROADS scripts continue to work without modification. The cataloguer and end user see only the mapped views of the metadata represented as a DC element set. Included is a script to generate ‘on the fly’ Dublin Core in RDF representations of the stored records (wpp2qualdc.pl) and also scripts to inform the content providers of the metadata. The content providers are encouraged to create a relationship between their resource and the corresponding metadata record using the HTML ‘link’ tag to call the wpp2qualdc.pl script. |
Example | |
Origin | ILRT |
Licence | Open Source |
Support | Developer |
InstallKit | |
InstallGuide | |
Documentation | |
Consumes | whois++, IAFA
Produces | whois++, centroid
Attribute | Value |
Name | Combine |
Nature | resource retrieval |
Function | Combine is a robot for harvesting Web resources, designed to be distributable, parallel and flexible. It is distributable in the sense that different parts of Combine can run on separate computers. It is parallel in the sense that some parts of a Combine system can run as several instances to increase performance. Flexibility is achieved by building the system from small and relatively simple building blocks in a way that the user can modify. |
Application | Combine can be used to produce distributed, federated and regional Web indexes. |
Example | http://safari.hsv.se/index.html.en |
Origin | Lund University Library NetLab |
Licence | |
Support | Developer |
InstallKit | UnixPerl |
InstallGuide | |
Documentation | |
Consumes | HTTP |
Produces | HDB |
Attribute | Value |
Name | DESIRE Vocabulary Database |
Nature | Metadata management |
Function | The vocabulary database tools allow the construction and manipulation of databases of specialist vocabularies which can then be used to facilitate browsing in user interfaces to search services. |
Application | The vocabulary tools can be used to build systems for searching across a number of Internet data stores while traversing a vocabulary database. |
Example | |
Origin | Lund University Library NetLab |
Licence | |
Support | Developer |
InstallKit | UnixPerl |
InstallGuide | |
Documentation | |
Consumes | DVT Tree, DVT Database
Produces | DVT Tree, DVT Database
Attribute | Value |
Name | DESIRE Vocabulary Browser |
Nature | User interface |
Function | The vocabulary browser tools allow interaction between a search interface and a vocabulary database constructed with the DESIRE vocabulary database tools. |
Application | The vocabulary tools can be used to build systems for searching across a number of Internet data stores while traversing a vocabulary database. |
Example | |
Origin | Lund University Library NetLab |
Licence | |
Support | Developer |
InstallKit | UnixPerl |
InstallGuide | |
Documentation | |
Consumes | DVT Database |
Produces | |
Attribute | Value |
Name | DESIRE Vocabulary Browser for Z39.50 |
Nature | User interface |
Function | The vocabulary browser tools allow interaction between a Z39.50 search service and a vocabulary database constructed with the DESIRE vocabulary database tools. |
Application | The vocabulary tools can be used to build systems for searching across a number of Z39.50 servers while traversing a vocabulary database. |
Example | |
Origin | Lund University Library NetLab |
Licence | |
Support | Developer |
InstallKit | UnixPerl |
InstallGuide | |
Documentation | |
Consumes | DVT Database |
Produces | Z39.50 |
Attribute | Value |
Name | RDF Data Store |
Nature | Metadata management |
Function | The RDF data model presents a flexible, expressive approach for representing structured data for the Web. This has both advantages and drawbacks. A key difference between RDF storage systems and the traditional relational approach is that with generalised RDF stores, it is necessary to anticipate the need to manage data drawing on metadata vocabularies that were unknown at the time the database was initialised. This tool provides a generalised data storage engine for RDF and defines an API which may be used to access it. |
Application | Any application which needs to store or retrieve metadata in RDF format. |
Example | http://www.grapevine.sosig.ac.uk/grapevine/recommender.htm |
Origin | University of Bristol Institute for Learning and Research Technology |
Licence | |
Support | Developer |
InstallKit | UnixPerl |
InstallGuide | |
Documentation | |
Consumes | RUDOLF-RDF |
Produces | RUDOLF-RDF |
Attribute | Value |
Name | Opinion Server |
Nature | Metadata management |
Function | A simple RDF rating and recommendation server built on top of a graph-oriented RDF API. |
Application | Solicits, manages and stores ratings and recommendations using the RDF graph APIs. The Opinion Server application provides mechanisms for constructing simple RDF statements about Web resources and writing those statements into an RDF store. |
Example | http://www.grapevine.sosig.ac.uk/grapevine/recommender.htm |
Origin | University of Bristol Institute for Learning and Research Technology |
Licence | |
Support | Developer |
InstallKit | UnixPerl |
InstallGuide | To be provided |
Documentation | |
Consumes | RUDOLF-RDF |
Produces | RUDOLF-RDF |
Attribute | Value |
Name | LDAP crawler |
Nature | Resource retrieval |
Function | Gathering of LDAPv2/v3 objects |
Application | The LDAP crawler is an LDAPv2/v3 directory robot that produces LDIF (LDAP Interchange Format) dumps of data in a specified Directory Information Tree (DIT). It can be used to feed centralised or distributed indexing services that, e.g., offer a single entry-point to a set of LDAP servers in an organisation or country. |
Example | 1. ldap://search.surfnet.nl/c=NL 2. http://search.surfnet.nl/naam/index.html |
Origin | SURFnet |
Licence | Copyright (c) 1998, SURFnet bv, the Netherlands. All rights reserved. This program may currently only be distributed among DESIRE project participants. |
Support | Developer: |
InstallKit | Unix Tar file. Currently available ‘as is’ for DESIRE project participants. |
InstallGuide | To be provided |
Documentation | To be provided |
Consumes | LDAP |
Produces | LDIF |
Attribute | Value |
Name | DESIRE Generic Distributed Index Server |
Nature | resource search, resource retrieval, metadata conversion, metadata management, protocol conversion |
Function | Collects, indexes and makes available forward knowledge about resources, based on the IETF Common Indexing Protocol (CIP). The stored CIP forward knowledge objects are searchable for clients using (possibly various) search protocols. The software includes: |
Application | The Generic Index Server provides a referral-based distributed indexing service that can, e.g., offer a single entry-point to a set of LDAP servers in an organisation or country, or a set of resource discovery services. The architecture is explained in http://www.surfnet.nl/innovatie/surf-ace/search/ldap/d2_ldap_tio/. |
Example | http://www.sec.nl/persons/henny/desire/ldap/d2demo.html Demonstrator page on a Distributed LDAP-index service |
Origin | SURFnet, The Netherlands; University of Tuebingen, Germany. |
Licence | Copyright (c) 1999, SURFnet bv, the Netherlands. All rights reserved. The software is currently under development and will be made available to DESIRE project participants, initially. |
Support | Developers: |
InstallKit | To be provided |
InstallGuide | To be provided |
Documentation | To be provided |
Consumes | LDIF, TIO
Produces | LDIF, LDAP
Attribute | Value |
Name | Matcher |
Nature | Metadata inference |
Function | The tool implements a subject classification process using a subject-specific thesaurus whose terms are intellectually mapped to categories or subject classes. The classification process is made up of several steps. First, the document to be classified is fetched. Text is extracted from this document, and all thesaurus terms are matched against it. Some heuristic processing rules are then applied to the results of the matching process. Finally, the outcome is formatted either for presentation or for storing in a database. |
Application | Automatic subject classification of WWW-pages |
Example | |
Origin | Lund University Library NetLab |
Licence | Open Source |
Support | Developer |
InstallKit | UnixPerl |
InstallGuide | |
Documentation | |
Consumes | HTTP, HDB
Produces | RDF, HDB
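The classification steps described in the Matcher entry (extract text, match thesaurus terms, apply heuristics, format the outcome) can be illustrated with a short sketch. This is not the Matcher implementation: the thesaurus contents, the function names `extract_text` and `classify`, and the single heuristic used here (discard categories with fewer than `min_hits` term matches) are all assumptions made for illustration. The fetching step is omitted, so the document is passed in as an HTML string.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the character data of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(html):
    """Text extraction step: return the document's text content."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical thesaurus: term -> subject class. A real thesaurus would be
# subject-specific and far larger.
THESAURUS = {
    "metadata": "Information science",
    "thesaurus": "Information science",
    "protein": "Biochemistry",
}


def classify(html, thesaurus=THESAURUS, min_hits=2):
    """Return (subject class, score) pairs, best first."""
    text = extract_text(html).lower()
    scores = Counter()
    # Matching step: count whole-word occurrences of each thesaurus term.
    for term, subject in thesaurus.items():
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", text))
        if hits:
            scores[subject] += hits
    # Heuristic step (an assumption): drop weakly supported classes.
    return [(s, n) for s, n in scores.most_common() if n >= min_hits]


page = ("<html><body><h1>Metadata</h1>"
        "<p>A thesaurus maps metadata terms. One protein.</p></body></html>")
print(classify(page))
```

The final formatting step is left to the caller here; the list of pairs could equally be rendered for presentation or written to a database, as the entry describes.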