Preserv

       
Project Partners

Oxford University Library Services
ECS, University of Southampton
The National Archives
Project Advisors
The British Library
Funded By
JISC

PRESERV 2 is funded by JISC within its capital programme in response to the September 06 call (Circular 04/06), Repositories and Preservation strand

PRESERV was originally funded by JISC within the 4/04 programme Supporting Digital Preservation and Asset Management in Institutions, theme 3: Institutional repository infrastructure development

MORE INFORMATION?

EMAIL: Steve Hitchcock, Project Manager

TEL: +44 (0)23 8059 3256
FAX: +44 (0)23 8059 2865

PRESERV Project,
IAM (Intelligence, Agents, Multimedia) Group,
Department of Electronics & Computer Science,
University of Southampton,
Highfield,
Southampton
SO17 1BJ, UK

Mining with ORE

At the 2008 Open Repositories Conference (OR08), the Common Repository Interfaces Group (CRIG) launched a developers' challenge on repository interoperability. The challenge attracted 19 teams of developers, including a team of past and present Preserv members, who won the $5000 first prize. More information about the event can be found on the Preserv Noticeboard.

On this page we outline in more depth the outcome of this development, how it works and how it affects the wider community.

Introduction

As digital libraries and repositories gain momentum, more and more institutions are launching their own repositories after careful planning and policy making. One decision that has to be made is which software to use to run the repository, based on whether it fulfils the policy requirements. Each repository platform comes from a different background and performs operations slightly differently. Currently the main platforms are DSpace, EPrints and Fedora, alongside others that are custom-built for specific purposes.

A Specification for Interoperability

Although most will view interoperability as an important aspect of their repository, the level of support varies from one platform to the next. This is mostly because there was no good specification for Object Reuse and Exchange (ORE) ... until now (ish). The leaders of the Open Archives Initiative (OAI) have come up with an early specification for OAI-ORE. More information on the current state of implementation can be found on their website at http://www.openarchives.org/ore/

At the time of writing this specification had reached an Alpha 4 release.

OAI-ORE - Brief Overview

The OAI-ORE specification outlines a data model to identify and describe 'aggregations' of resources (URIs). ORE also specifies the type of encoding (XML) and the formatting of this encoding, which is based on the Atom Syndication Format.

It is the aggregation that is important in terms of objects such as those created and stored within a repository. If we were to view a publication as a single object, within a repository this object would also have a set or several sets of metadata linked to it. By creating an aggregation which lists both the publication file and the related metadata we create a complete representation of this digital object. Of course you could take this further by saying that the repository is an object and the aggregation consists of all those objects (themselves represented by OAI-ORE) contained within the repository.

To make clear what a particular aggregation represents, ORE also specifies that each aggregation is described by a Resource Map. The relationship between a Resource Map and an Aggregation can be described as follows:
  • An Aggregation is a Resource with a URI
  • A Resource Map is a Resource with a URI
  • Each Resource Map asserts (identifies) and describes one Aggregation
  • Each Aggregation MAY be asserted and described by multiple Resource Maps
  • Each Resource MUST have one serialisation (representation)
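
The relationships in the list above can be sketched as a small data model. This is only an illustration of the one-to-many relationship between an Aggregation and its Resource Maps, not code from the project; the class names, the `maps_for` helper and all example.org URIs are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Aggregation:
    """A resource with its own URI that groups other resources by URI."""
    uri: str
    aggregated: List[str] = field(default_factory=list)


@dataclass
class ResourceMap:
    """A resource with its own URI that describes exactly one Aggregation."""
    uri: str
    describes: Aggregation


def maps_for(aggregation, resource_maps):
    """Return every Resource Map that describes the given Aggregation."""
    return [rm for rm in resource_maps if rm.describes.uri == aggregation.uri]


# Hypothetical URIs, for illustration only: one publication file plus
# one metadata record, aggregated into a single digital object.
eprint = Aggregation(
    uri="http://repository.example.org/id/eprint/42#aggregation",
    aggregated=[
        "http://repository.example.org/42/1/paper.pdf",
        "http://repository.example.org/42/dc.xml",
    ],
)

# The same Aggregation may be described by more than one Resource Map,
# for example one per serialisation.
atom_map = ResourceMap("http://repository.example.org/rem/42.atom", eprint)
rdf_map = ResourceMap("http://repository.example.org/rem/42.rdf", eprint)
```

Each `ResourceMap` holds a single `describes` reference, which mirrors the rule that a Resource Map describes one Aggregation, while `maps_for` can return several maps for the same Aggregation.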

Mining with ORE

Mining with ORE is one of the first large-scale implementations of the early ORE specification. It was an experiment to see how effective ORE would be in enabling an entire repository's worth of data to be moved from one repository platform to another.

The two repository platforms chosen were EPrints and Fedora, because the project had local knowledge of both. There were two stages involved in this project: exporting an object in the form of a resource map, and then importing this into its new location.

Exporting

Both Fedora and EPrints apply similar programming methodologies where each is able to apply custom services and plug-ins over the core repository. With the specification for ORE clear, the export service/plug-in was a reasonably easy piece of code to implement on each platform.
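
To give a rough idea of what an export step does, the sketch below serialises one aggregation as a small Atom-style XML document. It is not the code of the actual EPrints or Fedora plug-ins, and the element layout is an assumption chosen for illustration rather than the exact Atom profile defined by the ORE specification. The function name and the example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def export_resource_map(aggregation_uri, title, resource_uris):
    """Serialise one aggregation as a small Atom-style XML document.

    Illustrative layout only: one feed for the aggregation and one entry
    per aggregated resource.
    """
    # Emit Atom as the default namespace so the output has no prefixes.
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = aggregation_uri
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = title
    for uri in resource_uris:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = uri
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", {"rel": "alternate", "href": uri})
    return ET.tostring(feed, encoding="unicode")


# Hypothetical object: a PDF plus its metadata record.
xml_text = export_resource_map(
    "http://repository.example.org/rem/42",
    "Example publication",
    [
        "http://repository.example.org/42/1/paper.pdf",
        "http://repository.example.org/42/dc.xml",
    ],
)
```

The point of the sketch is that an export only needs the object's URI and the URIs of its parts, which is why it fits naturally into a plug-in layered over the core repository.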

Importing, and Problems with the Export Specification

Exporting is all well and good but until you try to import something you have no way of knowing fully if any problems exist within either the repository software or the ORE specification itself. In the case of the import plug-in we discovered minor technical problems in both.

Our first problem was discovering the Resource Maps in the repository software. Very few repositories implement the 302 or 303 redirects described in the HTTP protocol, so the only version of an object you can discover at the object's URI is the human-readable form. To sidestep this problem, a hardcoded URL of a complete resource map for each archive was supplied.
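
For comparison, the redirect-based discovery that most repositories lacked could look something like the sketch below. It is a minimal illustration, not what the project implemented (the project used a hardcoded URL). The `fetch` argument is a hypothetical caller-supplied function that returns a status code and headers without following redirects itself, and `fake_fetch` stands in for a repository that does support 303 redirects.

```python
def find_resource_map(uri, fetch, max_hops=5):
    """Follow 302/303 redirects from an object URI towards a resource map.

    `fetch` takes a URI and returns (status_code, headers). It must not
    follow redirects on its own.
    """
    for _ in range(max_hops):
        status, headers = fetch(uri)
        if status in (302, 303) and "Location" in headers:
            uri = headers["Location"]
            continue
        if status == 200:
            return uri
        return None  # any other status: discovery failed
    return None  # too many redirects


def fake_fetch(uri):
    """Stand-in for a repository that redirects object URIs (hypothetical)."""
    responses = {
        "http://repository.example.org/id/eprint/42": (
            303,
            {"Location": "http://repository.example.org/rem/42.atom"},
        ),
        "http://repository.example.org/rem/42.atom": (
            200,
            {"Content-Type": "application/atom+xml"},
        ),
    }
    return responses.get(uri, (404, {}))


found = find_resource_map("http://repository.example.org/id/eprint/42", fake_fetch)
```

A real client would also send an Accept header asking for a machine-readable format, so that browsers requesting the same URI still receive the human-readable page.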

The second problem directly affects the representation of objects in ORE. The majority of repository platforms require a certain amount of metadata to exist before they can accept a file or deposit related to this metadata. Thus, before being able to import an aggregation of resources, we first need to import a set of metadata which defines the object and its resources. Both Fedora and EPrints are able to export OAI_DC, and this object was contained within the aggregation. However, without parsing through all the objects in the aggregation and attempting to validate each one, it is challenging to discover which part of the aggregation is the OAI_DC object. In the Mining with ORE solution we specified the requirement, where possible, for each object in an aggregation to contain a dc:conformsTo in its description. Thus for the OAI_DC object we can now simply look for the object with <dc:conformsTo>http://www.openarchives.org/OAI/2.0/oai_dc/</dc:conformsTo> in its description.
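
The lookup described above can be sketched as follows. The sample document is made up for the example: its Atom-style layout, the binding of the dc prefix to http://purl.org/dc/terms/, and the example.org URIs are all assumptions, and the real import plug-ins are not written this way. Only the dc:conformsTo value comes from the text.

```python
import xml.etree.ElementTree as ET

# Namespace bindings assumed for this sketch.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/terms/",
}
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical resource map: one PDF and one OAI_DC metadata record.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/terms/">
  <entry>
    <id>http://repository.example.org/42/1/paper.pdf</id>
  </entry>
  <entry>
    <id>http://repository.example.org/42/dc.xml</id>
    <dc:conformsTo>http://www.openarchives.org/OAI/2.0/oai_dc/</dc:conformsTo>
  </entry>
</feed>"""


def find_conforming(xml_text, standard=OAI_DC):
    """Return the id of the first entry whose dc:conformsTo matches `standard`."""
    root = ET.fromstring(xml_text)
    for entry in root.findall("atom:entry", NS):
        conforms = entry.findtext("dc:conformsTo", default="", namespaces=NS)
        if conforms.strip() == standard:
            return entry.findtext("atom:id", namespaces=NS)
    return None  # no object in the aggregation declares this standard


metadata_uri = find_conforming(SAMPLE)
```

With the metadata object identified this way, an importer can load it first and only then deposit the remaining resources, without having to validate every object in the aggregation.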

EPrints Plugins

The Export plugin is a typical export plugin and is easily built using the EPrints plugin framework.

The Import plugin for EPrints is detailed on the EPrints Wiki.
Both the import and export plugins are available via EPrints Files and are expected to be included in the next release of EPrints (3.2).

Screenshots

EPrints -> Fedora: For this demo the live publications archive from the OR08 conference (in EPrints) was used as the source archive. This was then exported into the Fedora repository software.
[Screenshots: the or08 archive in EPrints ---> the or08 archive in Fedora]
Fedora -> EPrints: For this demo the live publications archive from the Oxford Research Archive was used as the source archive. This was then exported into the EPrints repository software.
[Screenshots: the ora archive in Fedora ---> the ora archive in EPrints]
More screenshots can be found here.

Glossary

  • OAI-ORE - Open Archives Initiative - Object Reuse and Exchange specification
  • XML - eXtensible Markup Language - Open-ended language for marking up or tagging objects (Google it!).
  • URL - Uniform Resource Locator - Location of a piece of information on the web.
  • URI - Uniform Resource Identifier - An identifier for a resource. Unlike a URL it does not have to be resolvable, but best practice recommends that it is.
David Tarrant
Updated 5 June 2008

This page produced and maintained by the PRESERV Project. Contact us