Digital Repositories - Overview

The term “Digital Repository” is used to describe a wide range of systems which provide the infrastructure for the storage, preservation, management, discovery and delivery of all types of electronic content. Common types of digital repository include Institutional Repositories, used by organisations such as universities to store copies of scholarly works (either digitized or ‘born digital’), and Digital Archives, where there is a strong emphasis on digital preservation.

Digital repositories of learning content are used to store assets and the metadata describing them. They may also store assets and metadata at different levels of aggregation. Although repositories provide a link between assets and metadata, the two do not necessarily need to be stored in the same location.

There has recently been much effort to develop technologies that catalogue and facilitate the location of text, images, video and audio. Current efforts in the development of repository standards and software are broad and varied, with players coming from nearly every major sector. This diversity of interest has resulted in an impressive number of competing standards and supporting technologies.

The basic functionalities provided by a repository are described in the Reference Model for an Open Archival Information System (OAIS, 2002):
  • Data ingest: refers to the method of getting data into a repository and can include data push (publication or submission of data from a source into the repository) and data pull (harvesting or gathering of data initiated by a repository and acting on data sources).
  • Data management: refers to the actions which take place within the repository. Although not exposed directly to repository users, the results of these actions are often visible (e.g. a high resolution image may be transformed into a lower resolution format for web based dissemination).
  • Archival storage: refers to the technical infrastructure used for the storage and retrieval of assets and metadata. Where open standards are adopted for data ingest and for providing access (search/retrieval), this infrastructure should be transparent to users of the repository.
  • Providing access: refers to the way in which the repository exposes its contents to end users (e.g. search/ browse interfaces).
The IMS Digital Repositories Interoperability specification (only slightly more recent, being published in 2003) typically described repository functions as a number of two-stage actions, the first stage performed by the user and the second by the repository: search/expose, gather/expose, submit/store, request/deliver, and alert/expose.

Since the publication of these early reference works, the main developments have included:
  • the widespread adoption of OAI-PMH as a standard for exposing repositories for harvesting
  • the development of open source repository management tools such as DSpace and Fedora
  • the development of open source indexing tools such as Lucene and SOLR
  • the growth in the use of ‘mash-ups’ and the subsequent growth in interest from developers in the use of machine APIs for search and retrieval.

This section provides an overview of some key specifications and standards of particular interest to the eLearning community and provides links to news and events relating to digital repositories.
Important note
The CORDRA specification was originally developed within the ADL initiative. ADL does not currently continue this work and its website does not provide updated information on this specification. The last information on CORDRA made available online by ADL is included under the section "Current version" below. The information available in the LTSO was obtained during the period of elaboration of the original specification.
Overview
Advanced Distributed Learning Initiative
Over the past 10 years, the Advanced Distributed Learning (ADL) Initiative’s Sharable Content Object Reference Model (SCORM®) evolved to provide a modular, object-based design approach for digital objects that solved key interoperability and reusability issues across many learning systems in industry and government. SCORM has enjoyed widespread international adoption, has become a de facto standard in many learning communities, and is supported by U.S. Department of Defense (DoD) policy. While SCORM advances the state of the art in designing and creating interoperable and reusable objects, it does not address finding and reusing objects after they have been created.

In 2003, ADL began work to solve this problem. ADL launched an investigation into the difficulties and realities of object creation, storage, and management and uncovered the limitations and problems encountered by others in related fields, such as library science, computer and network systems design, and publishing. As ADL investigated these fields and formulated high-level requirements for the learning community, it quickly found that many problems had not been solved by others and that the problem space was much more complex than it first appeared.

To address these issues, ADL set out to:
  • Define high-level requirements, policies, and business rules for object repositories that would be practical to implement.
  • Identify and apply the most relevant technologies and specifications that could be used to define the architecture.
  • Define an architecture on which necessary services could be built.
  • Define an architecture that would be scalable.
CORDRA (Content Object Repository Discovery and Registration/Resolution Architecture) is an effort to define a framework for the federation of digital collections. The framework, also known as the “CORDRA specification”, is intended to be an open, standards-based model for designing and implementing information systems, including registries and repositories, for the purposes of discovery, sharing and reuse of information. The CORDRA specification describes how owners or managers of widely distributed information expressed in digital form may register the existence of that information and so enable others to find and use it.

CORDRA is designed to be an enabling model to bridge the worlds of learning content management and delivery, and content repositories and digital libraries. CORDRA aims to identify and specify (not develop) appropriate technologies and existing interoperability standards that can be combined into a reference model used to enable a learning content infrastructure.

The Corporation for National Research Initiatives (CNRI) and the Advanced Distributed Learning Initiative collaborated to design and implement a metadata registry, known as the ADL Registry, to showcase and provide a reference implementation of the CORDRA specification. Launched in December 2005, the ADL Registry is the first publicly available CORDRA implementation.
Aviation Industry CBT Committee
The Package Exchange Notification Services (PENS) specification/guidelines document, developed by the AICC (CMI010), describes a protocol to support a notification service that announces the location of content packages available for transport. The intent is to automate the notification, transfer and delivery confirmation of content packages between tools or systems that generate content and systems that manage, publish or deliver content.
Purpose and Scope
The purpose of this specification is to fill a gap that currently exists between the creation of content packages by “content authors” and the deployment of those content packages on LMSs by “LMS administrators” where learners may ultimately have access to them. Without a specification that addresses this gap, the concept of shared content is incomplete: LMSs do not have a means to obtain newly developed, revised or updated content.

The scope of the specification is constrained to the notification request, package transfer and related responses. Explicitly outside the scope of this specification are mechanisms for physical deployment of content packages, content management, version control, publication or revocation of content.
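To illustrate the general flow, the sketch below shows what a PENS-style "collect" notification could look like when sent as an HTTP POST from an authoring system to an LMS listener. The endpoint URL and the parameter names are illustrative assumptions only; the normative command and parameter set is defined in the AICC CMI010 document.

```python
# Hypothetical sketch of a PENS-style "collect" notification: an authoring
# system tells an LMS where a finished content package can be collected.
# The endpoint URL and parameter names are illustrative only; see the AICC
# CMI010 document for the normative command and parameter set.
import urllib.parse
import urllib.request

PENS_ENDPOINT = "https://lms.example.org/pens"  # hypothetical LMS listener

notification = {                                # illustrative parameter names
    "pens-version": "1.0.0",
    "command": "collect",
    "package-type": "scorm",
    "package-format": "zip",
    "package-id": "course-101-v2",
    "package-url": "https://authoring.example.org/packages/course-101-v2.zip",
}

data = urllib.parse.urlencode(notification).encode("utf-8")
with urllib.request.urlopen(PENS_ENDPOINT, data=data) as response:
    # The LMS acknowledges the notification; delivery confirmation follows
    # once it has collected the package from package-url.
    print(response.status, response.read().decode("utf-8"))
```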
AICC Package Exchange Notification Services overview
European Committee for Standardization
The Simple Publishing Interface (SPI) specification (CWA 16097:2010), partly sponsored by the CEN Workshop on Learning Technologies, defines a protocol for publishing learning objects and/or their metadata to digital repositories. This protocol facilitates transferring metadata and learning objects from the tools that produce materials to the applications that persistently manage learning objects and metadata, but it is also applicable to the publication of a wider range of digital objects. The CEN Workshop Agreement containing the specification, published in 2010, includes a binding to the Atom Publishing Protocol (APP) that is compatible with the SWORD profile widely used by institutional repositories.

The objective is to develop a practical approach towards interoperability between repositories for learning and the applications that consume or produce educational materials. Examples of repositories for learning are educational brokers, knowledge pools, institutional repositories, streaming video servers, etc. Applications that consume these educational materials include, for instance, query and indexation tools, authoring tools, presentation programs, content packagers, etc. The work concentrates on the development of the Simple Publishing Interface (SPI), an interface for storing educational materials in a repository.

The design of the SPI API is based on the design principles of the Simple Query Interface (SQI) specification. A simple set of commands has been defined that is extensible and flexible. By analogy with SQI, this protocol makes a distinction between semantic and syntactic interoperability:
  • Syntactic interoperability is the ability of applications to deal with the structure and format of data. For instance, a language such as XML Schema Definition (XSD) ensures the syntactic interoperability of XML documents, as it allows these documents to be parsed and validated.
  • Semantic interoperability refers to the ability of two parties to agree on the meaning of data or methods. When exchanging data, semantic interoperability is achieved when data is understood the same way by all the applications involved.
Objectives
This publishing protocol meets the following objectives:
  • SPI enables integrating publishing into authoring environments. This is beneficial for the authors’ workflow, as they do not need to manually upload their learning objects using external publishing applications.
  • SPI provides interoperability between applications that publish and applications that manage learning objects and metadata. In doing so, the effort of integrating publishing access into an authoring application can be reused with other learning object repositories, provided that they support SPI.
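As a rough sketch of the Atom Publishing Protocol binding mentioned above, the snippet below POSTs a minimal Atom entry to a repository collection. The collection URL and the entry content are hypothetical; real SPI or SWORD-compatible deployments define their own deposit URLs, packaging formats and authentication requirements.

```python
# Minimal sketch of a deposit over an Atom Publishing Protocol binding: an
# Atom entry is POSTed to a (hypothetical) repository collection URL.
import urllib.request

COLLECTION_URL = "https://repository.example.org/spi/collection"  # hypothetical

ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>An example learning object</title>
  <summary>Metadata published directly from an authoring tool.</summary>
</entry>
"""

request = urllib.request.Request(
    COLLECTION_URL,
    data=ENTRY.encode("utf-8"),
    headers={"Content-Type": "application/atom+xml;type=entry"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    # On success, AtomPub returns 201 Created and a Location header pointing
    # at the newly created resource.
    print(response.status, response.headers.get("Location"))
```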
Simple Publishing Interface specification
European Committee for Standardization
The Simple Query Interface (SQI) specification, supported by CEN WS-LT (CWA 15454:2005), presents an Application Programming Interface (API) for querying learning object repositories. Since one major design objective is to keep the specification simple and easy to implement, the interface is labelled Simple Query Interface (SQI).

In the context of SQI, learning object repositories are defined as collections of educational material, courses, and learning objects with associated descriptions (referred to as “metadata”). Examples of repositories for learning are educational brokers, knowledge pools, streaming video servers, etc.

The collaborative effort of combining highly heterogeneous repositories has led to the following requirements:
  • SQI is neutral in terms of results format and query languages. The repositories connecting via SQI can be of a highly heterogeneous nature; therefore, SQI makes no assumptions about the query language or results format.
  • SQI supports synchronous and asynchronous queries in order to allow application of the SQI specification in heterogeneous use cases.
  • SQI supports both a stateful and a stateless implementation.
  • SQI is based on a session management concept in order to separate authentication issues from query management.
The design of the API itself is based on following design principles:
  • Command-Query Separation Principle,
  • Simple Command Set and Extensibility.
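To make the session concept and command-query separation more concrete, the following client-side sketch is illustrative only: the SqiClient class and the target proxy are hypothetical, and the method names only loosely follow the command set described in CWA 15454.

```python
# Illustrative client-side sketch only: the SqiClient class and the `target`
# proxy are hypothetical, and the method names loosely follow the SQI command
# set (session creation, query configuration, synchronous query).
class SqiClient:
    def __init__(self, target):
        self.target = target      # e.g. a SOAP proxy bound to an SQI target
        self.session = None

    def connect(self):
        # Session management is kept separate from querying (and from
        # authentication), as required by the SQI design.
        self.session = self.target.createAnonymousSession()

    def configure(self, query_language, results_format, max_results=25):
        # SQI is neutral about query language and results format, so both
        # have to be agreed for the session before any query is sent.
        self.target.setQueryLanguage(self.session, query_language)
        self.target.setResultsFormat(self.session, results_format)
        self.target.setMaxQueryResults(self.session, max_results)

    def search(self, query):
        # Synchronous query: the call blocks and returns results directly;
        # an asynchronous variant would instead register a results listener.
        return self.target.synchronousQuery(self.session, query, 1)

# Hypothetical usage, assuming `target` is a proxy for an SQI endpoint:
#   client = SqiClient(target)
#   client.connect()
#   client.configure(query_language="cql", results_format="lom")
#   records = client.search('dc.title = "photosynthesis"')
```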
SQI forms a key part of the infrastructure used by the ASPECT project. ASPECT will also develop mappings from SQI to other search specifications including SRU.
Simple Query Interface specification
IMS Global Learning Consortium
The purpose of the Digital Repositories Interoperability (DRI) specification, developed by the IMS Global Learning Consortium, is to provide recommendations for the interoperation of the most common repository functions. These recommendations should be implementable across services to enable them to present a common interface.

On the broadest level, this specification defines digital repositories as being any collection of resources that are accessible via a network without prior knowledge of the structure of the collection. Repositories may hold actual assets or the meta-data that describe assets. The assets and their meta-data do not need to be held in the same repository.
IMS Digital Repositories Web Page
IMS Global Learning Consortium
The Learning Object Discovery & Exchange (LODE) activity in IMS GLC is tasked with facilitating the discovery and retrieval of learning content stored in repositories. The “exchange” requirement centers on the fact that, while learning content repositories already cater for their local users, there are no agreed profiles that address the needs of the learning domain, and no established practices for combining existing specifications into complete solutions. Individual organizations are creating their own solutions, with quite different technical strategies, policy apparatus, and metadata schemes, and an opportunity to establish broader interoperability is being missed. There is also no way of measuring or testing the compatibility and conformance of specific solutions.

The following are considered in scope for the activity:
  • Search protocol, search query, and search results (i.e., metadata)
  • Metadata harvesting
  • Application of identifiers
  • Collection and service description
The following areas are considered out of scope of this activity:
  • Authentication, authorization, access (unless it is part of a specific protocol)
  • Digital Rights Management
  • Identity management
  • Metadata application profiling
Interoperability will be demonstrated when a system (e.g., an LMS) end user is able to discover a compatible learning object (e.g., a common cartridge) hosted on a separate system (e.g., a learning object repository) using a LODE-compliant discovery service. Demonstrations will focus on federated discovery (through either federated search or harvest-driven centralized search), as these present the greater interoperability challenge. The federations should be based on the LODE search and LODE registry specifications. However, LODE does not require federation for compliance. “Federated” is used in a loose sense to refer to a group of distributed, independently managed and potentially heterogeneous repositories, whether or not any agreements, trust relationships etc. exist between them.
IMS LODE Web Site
This category includes a number of specifications and recommendations related to the digital repositories area which either are not targeted specifically at the learning technologies field or have been developed by institutions not specifically focused on defining standards.
OAI Protocol for Metadata Harvesting (OAI-PMH)
Open Archives Initiative
The OAI Protocol for Metadata Harvesting (OAI-PMH), supported by the Open Archives Initiative, defines a mechanism for harvesting records containing metadata from repositories. The OAI-PMH gives data providers a simple technical option for making their metadata available to services, based on the open standards HTTP (Hypertext Transfer Protocol) and XML (Extensible Markup Language). The metadata that is harvested may be in any format agreed by a community (or by any discrete set of data and service providers), although unqualified Dublin Core is specified to provide a basic level of interoperability. Thus, metadata from many sources can be gathered together in one database, and services can be provided based on this centrally harvested, or "aggregated", data. The link between this metadata and the related content is not defined by the OAI protocol. It is important to realise that OAI-PMH does not provide a search across this data; it simply makes it possible to bring the data together in one place. In order to provide services, the harvesting approach must be combined with other mechanisms.
OAI-PMH Project Web Site
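As an illustration of how lightweight harvesting can be, the sketch below issues ListRecords requests and follows resumption tokens until the repository has been fully harvested. The base URL is a hypothetical endpoint; the verb and parameters used are those defined by OAI-PMH, and the records are assumed to be unqualified Dublin Core (oai_dc).

```python
# Minimal OAI-PMH harvesting sketch. The BASE_URL is a hypothetical endpoint;
# the verb and parameters are those defined by the OAI-PMH specification.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"     # unqualified Dublin Core

def harvest(metadata_prefix="oai_dc"):
    """Yield Dublin Core titles, following resumption tokens page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = BASE_URL + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        for title in tree.iter(DC_NS + "title"):
            yield title.text
        # A resumptionToken with content means more records are available.
        token = tree.find(".//" + OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

for title in harvest():
    print(title)
```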
Z39.50 Protocol
"Z39.50" refers to the International Standard, ISO 23950: "Information Retrieval (Z39.50): Application Service Definition and Protocol Specification", and to ANSI/NISO Z39.50. The standard specifies a client/server-based protocol for searching and retrieving information from remote databases.

Z39.50 makes it easier to use large information databases by standardizing the procedures and features for searching and retrieving information. Specifically, Z39.50 supports information retrieval in a distributed, client and server environment where a computer operating as a client submits a search request (i.e., a query) to another computer acting as an information server. Software on the server performs a search on one or more databases and creates a result set of records that meet the criteria of the search request. The server returns records from the result set to the client for processing. The power of Z39.50 is that it separates the user interface on the client side from the information servers, search engines, and databases. Z39.50 provides a consistent view of information from a wide variety of sources, and it offers client implementors the capability to integrate information from a range of databases and servers.
Z39.50 Protocol
Search/Retrieve via URL (SRU)
SRU is a search and retrieval protocol that uses Internet and web facilities to carry the messages between user and target. It was defined by the U.S. Library of Congress on the basis of the Z39.50 protocol, a specification that is very widely implemented globally. Much of the functionality of SRU is derived from the older protocol; however, only the most useful features were brought over, and in a simplified form. At the time that SRU development began, similar search use of the URL was under investigation in several institutions, notably by staff at the Royal Library in the Netherlands. An international group of experts from the Z39.50 community collaborated on a draft of this new protocol for the Internet/web/XML environment. The SRU specifications were first published in 2002 and have been popular for use in new applications because of the ease of implementation.

SRU is very flexible. It is XML-based, and the most common implementation is SRU via URL, which uses HTTP GET for message transfer. Other versions, however, can be run over the web's SOAP protocol (SRU via SOAP), which supports more web service features, and over HTTP POST (SRU via POST), which avoids some length and character set restrictions that are currently present with HTTP GET. The records returned in response to a search may be in any well-defined XML format.
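For illustration, the sketch below issues an "SRU via URL" searchRetrieve request over HTTP GET. The endpoint is a hypothetical example; operation, version, query, recordSchema, maximumRecords and startRecord are the standard SRU request parameters, although the record schemas on offer depend on the server.

```python
# Sketch of an "SRU via URL" searchRetrieve request. The endpoint is a
# hypothetical example; the query parameters are the standard SRU ones.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SRU_BASE = "https://repository.example.org/sru"  # hypothetical endpoint

params = {
    "operation": "searchRetrieve",
    "version": "1.2",
    "query": 'dc.title = "photosynthesis"',  # a CQL query (see CQL below)
    "recordSchema": "dc",                    # depends on what the server offers
    "maximumRecords": "10",
    "startRecord": "1",
}

url = SRU_BASE + "?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

# Element names in the searchRetrieveResponse are namespace-qualified, so
# match on local names to stay neutral about the exact namespace URI.
for elem in tree.iter():
    if elem.tag.endswith("}numberOfRecords"):
        print("total hits:", elem.text)
    elif elem.tag.endswith("}recordData"):
        print(ET.tostring(elem, encoding="unicode"))
```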
Search/Retrieve via URL (SRU)
OpenSearch
The web is a big place, and search engines that crawl the surface of the web are picking up only a small fraction of the great content that is out there. Moreover, some of the richest and most interesting content can not even be crawled and indexed by one search engine or navigated by one relevancy algorithm alone. Different types of content require different types of search engines. The best search engine for a particular type of content is frequently the search engine written by the people that know the content the best.

OpenSearch is a collection of simple formats for the sharing of search results. The OpenSearch description document format can be used to describe a search engine so that it can be used by search client applications. The OpenSearch response elements can be used to extend existing syndication formats, such as RSS and Atom, with the extra metadata needed to return search results.
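As a small illustration, the snippet below embeds a hand-written OpenSearch description document and expands its URL template for a concrete query. The repository URL is hypothetical; the element names and the {searchTerms} template syntax follow the OpenSearch 1.1 format.

```python
# Sketch: reading a minimal OpenSearch description document and expanding its
# URL template for a query. The XML is a hand-written example, not taken from
# any real search engine.
import urllib.parse
import xml.etree.ElementTree as ET

DESCRIPTION = """\
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example repository search</ShortName>
  <Description>Searches a hypothetical learning object repository.</Description>
  <Url type="application/atom+xml"
       template="https://repository.example.org/search?q={searchTerms}&amp;page={startPage?}"/>
</OpenSearchDescription>
"""

NS = "{http://a9.com/-/spec/opensearch/1.1/}"
root = ET.fromstring(DESCRIPTION)
template = root.find(NS + "Url").attrib["template"]

# Substitute the template parameters to build a concrete search URL.
url = (template
       .replace("{searchTerms}", urllib.parse.quote("photosynthesis"))
       .replace("{startPage?}", "1"))
print(url)
```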

OpenSearch was created by A9.com, an Amazon.com company, and the OpenSearch format is now in use by hundreds of search engines and search applications around the Internet. The OpenSearch specification is made available according to the terms of a Creative Commons license so that everyone can participate.
OpenSearch Web Site
Contextual Query Language (CQL)
CQL, the Contextual Query Language, defined by the U.S. Library of Congress, is a formal language for representing queries to information retrieval systems such as web indexes, bibliographic catalogs and museum collection information. The design objective is that queries be human readable and writable, and that the language be intuitive while maintaining the expressiveness of more complex languages.

Traditionally, query languages have fallen into two camps: powerful, expressive languages that are not easily readable or writable by non-experts (e.g. SQL, PQF, and XQuery); or simple and intuitive languages that are not powerful enough to express complex concepts (e.g. CCL and Google). CQL tries to combine the simplicity and intuitiveness of expression needed for simple, everyday queries with the richness of more expressive languages to accommodate complex concepts when necessary.
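A few example queries help show the balance CQL strikes between simplicity and expressiveness. The index names below assume the Dublin Core context set; which indexes and relations a server actually supports is defined by that server.

```python
# Illustrative CQL query strings. Index names such as dc.title assume a
# Dublin Core context set; the server being searched defines which indexes
# and relations it actually supports.
queries = [
    'frog',                                     # single term, default index
    'dc.title = "digital repositories"',        # phrase on a named index
    'dc.creator any "smith jones"',             # match any of the listed terms
    'dc.title = frog and dc.date > 2005',       # boolean combination
    'dc.title = frog or (dc.subject = amphibian not dc.subject = toad)',
]
for q in queries:
    print(q)
```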
CQL Specification Web Site
ProLearn Query Language (PLQL)
The ProLearn Query Language (PLQL), developed by the PROLEARN "Network of Excellence", is a query language for repositories of learning objects. PLQL is primarily a query interchange format, used by source applications (or PLQL clients) for querying repositories (or PLQL servers). PLQL has been designed with the goal of effectively supporting search over LOM, DC and MPEG-7 metadata. However, PLQL does not assume or require these metadata standards.

PLQL is based on existing language paradigms (like the Contextual Query Language - CQL), and aims to minimize the need for introducing new concepts. Given that an XML binding is available for all relevant metadata standards for learning objects, it was decided to express exact search by using query paths on hierarchies, borrowing concepts from XPath. Thus, PLQL combines two of the most popular query paradigms, allowing its implementations to re-use existing technology from both fields: approximate search (using information retrieval engines such as Lucene) and exact search (using XML-based query engines).
PLQL Specification