Mining with ORE
At the 2008 Open Repositories Conference (OR08), the Common Repository Interfaces Group (CRIG) launched a developers' challenge on repository interoperability. The challenge attracted 19 teams of developers, including a team from Preserv (past and present) who won the $5000 first prize. More information about the event can be found on the Preserv Noticeboard.
On this page we outline in more depth the outcome of this development, how it works and how it affects the wider community.
Introduction
As the area of digital libraries and repositories gains momentum, more and more institutions are launching their own repositories after careful planning and policy making. One of the decisions that has to be made is which software to use to run the repository, based on whether it fulfils the policy requirements. Each piece of repository software comes from a different background and each performs operations slightly differently. Currently the main software packages are DSpace, EPrints and Fedora, alongside others that are custom-built for specific purposes.
A Specification for Interoperability
Although most will view interoperability as an important aspect of their repository, the level of support varies between packages. This is mostly because there was no good specification for the purposes of Object Reuse and Exchange (ORE) ... until now (ish). The leaders of the Open Archives Initiative (OAI) have come up with an early specification for OAI-ORE. More information on the current state of implementation can be found on their website at http://www.openarchives.org/ore/
At the time of writing, this specification had reached an Alpha 4 release.
OAI-ORE - Brief Overview
The OAI-ORE specification outlines a data model to identify and describe 'aggregations' of resources (URIs). ORE also specifies the type of encoding (XML) and the formatting of this encoding, which is based on the Atom Syndication Format.
It is the aggregation that is important in terms of objects such as those created and stored within a repository. If we were to view a publication as a single object, within a repository this object would also have a set or several sets of metadata linked to it. By creating an aggregation which lists both the publication file and the related metadata we create a complete representation of this digital object. Of course you could take this further by saying that the repository is an object and the aggregation consists of all those objects (themselves represented by OAI-ORE) contained within the repository.
To further understand what a particular aggregation represents, ORE also specifies that each aggregation is described by a resource map: the Resource Map describes the Aggregation, and the Aggregation in turn lists the resources it aggregates.
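As a rough illustration of this relationship, the sketch below models a Resource Map that describes a single Aggregation, which in turn lists the URIs it aggregates. The class names, field names and example.org URIs are hypothetical and chosen only for this example; they are not taken from the ORE specification or from either repository platform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Aggregation:
    """A set of resources (identified by URI) treated as one object."""
    uri: str
    aggregated_resources: List[str] = field(default_factory=list)


@dataclass
class ResourceMap:
    """A document that describes exactly one Aggregation."""
    uri: str
    describes: Aggregation


# Hypothetical URIs for a single repository record: the publication
# file plus one metadata record that belongs to it.
publication = Aggregation(
    uri="http://repository.example.org/1#aggregation",
    aggregated_resources=[
        "http://repository.example.org/1/paper.pdf",
        "http://repository.example.org/1/oai_dc.xml",
    ],
)
resource_map = ResourceMap(
    uri="http://repository.example.org/1/rem.atom",
    describes=publication,
)
```

The same shape nests naturally: an Aggregation representing a whole repository would simply list the URIs of the per-record aggregations.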
Mining with ORE
Mining with ORE is one of the first large-scale implementations of the early ORE specification. It was an experiment to see how effective ORE would be in enabling an entire repository's worth of data to be moved from one repository platform to another.
The two repository platforms chosen were EPrints and Fedora, because the project team had local knowledge of both. There were two stages involved in this project: exporting an object in the form of a resource map, and then importing this into its new location.
Exporting
Both Fedora and EPrints follow a similar design in which custom services and plug-ins can be layered over the core repository. With the specification for ORE clear, the export service/plug-in was a reasonably easy piece of code to implement on each platform.
Importing, and Problems with the Export Specification
Exporting is all well and good, but until you try to import something you have no way of knowing fully whether any problems exist within either the repository software or the ORE specification itself. In the case of the import plug-in we discovered minor technical problems in both.
Our first problem was that of discovering the Resource Maps in the repository software. Very few repositories implement the 302 or 303 redirection methods described in the HTTP protocol, so the only version of an object you can discover at the object's URI is the human-readable form. To sidestep this problem rather than solve it, a hardcoded URL of a complete resource map for each archive was supplied.
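To make the missing behaviour concrete, here is a minimal sketch of the kind of 303 redirection a repository could perform so that a machine client asking for Atom is sent to the resource map while a browser still gets the human-readable page. This is not how either EPrints or Fedora behaved in the project; the function name, the "/rem.atom" URL suffix and the simple substring check on the Accept header (which ignores q-values) are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Media type a client would request to get the Atom-encoded resource map.
ATOM = "application/atom+xml"


def respond(object_uri: str, accept: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a GET on an object's URI.

    Clients that accept Atom receive "303 See Other" pointing at a
    hypothetical resource map URL; all other clients receive 200 and
    are served the human-readable page at the object's own URI.
    """
    if ATOM in accept:
        # Hypothetical URL layout: resource map lives under the object URI.
        return 303, object_uri.rstrip("/") + "/rem.atom"
    return 200, None
```

With something like this in place, an importer could start from the object's ordinary URI instead of needing a hardcoded resource map URL.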
The second problem directly affects the representation of objects in ORE. The majority of repository platforms require a certain amount of metadata to exist before they will accept a file or deposit related to that metadata. Thus, before being able to import an aggregation of resources, we first need to import a set of metadata which defines the object and its resources. Both Fedora and EPrints are able to export OAI_DC and this object was contained within the aggregation, but without parsing through all the objects in the aggregation and attempting to validate each one, it is challenging to discover which part of the aggregation is the OAI_DC object. In the Mining with ORE solution we specified the requirement, where possible, for each object in an aggregation to contain a dc:conformsTo in its description. Thus for the OAI_DC object we can now simply look for the object with <dc:conformsTo>http://www.openarchives.org/OAI/2.0/oai_dc/</dc:conformsTo> in its description.
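The lookup described above can be sketched as follows. The sample document is a deliberately simplified, hypothetical Atom layout (one entry per aggregated resource, with dc:conformsTo as a direct child of the entry) and is not the exact Alpha 4 serialisation; binding the dc: prefix to the Dublin Core terms namespace is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/terms/",  # assumed binding for the dc: prefix
}
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical, simplified resource map with two aggregated resources.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/terms/">
  <entry>
    <id>http://repository.example.org/1/paper.pdf</id>
  </entry>
  <entry>
    <id>http://repository.example.org/1/oai_dc.xml</id>
    <dc:conformsTo>http://www.openarchives.org/OAI/2.0/oai_dc/</dc:conformsTo>
  </entry>
</feed>"""


def find_conforming(resource_map_xml: str, schema_uri: str) -> Optional[str]:
    """Return the id of the first entry declaring dc:conformsTo == schema_uri."""
    root = ET.fromstring(resource_map_xml)
    for entry in root.findall("atom:entry", NS):
        declared = entry.findtext("dc:conformsTo", default="", namespaces=NS)
        if declared.strip() == schema_uri:
            return entry.findtext("atom:id", namespaces=NS)
    return None  # no entry declares the requested schema
```

Calling find_conforming(SAMPLE, OAI_DC) picks out the metadata entry without fetching or validating any of the other aggregated resources, which is the point of the dc:conformsTo requirement.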
EPrints Plugins
The Export plugin is a typical example of an export plugin and was easy to build using the EPrints plugin framework.
The Import plugin for EPrints is detailed on the EPrints Wiki.
Both the import and export plugins are available via EPrints Files and are expected to be included in the next release of EPrints (3.2).
Screenshots
EPrints -> Fedora: For this demo the live publications archive from the OR08 conference (in EPrints) was used as the source archive. This was then exported into the Fedora repository software.
Updated 5 June 2008