Persistent ID (will always link to the latest version): <http://w3id.org/ldac/pilars>
To cite this document (pending publication), please use:
Sefton, P., et al. (2024). Protocols for Implementing Long-term Archival Repositories Services (PILARS). Retrieved from http://w3id.org/ldac/pilars
This is a working draft created by the contributors listed below.
We will be collecting feedback until the end of June 2024. Contribute on GitHub.
More information and background is available at RRKive.org.
Protocols for Implementing Long-term Archival Repositories Services (PILARS) by Sefton et al is licensed under CC BY 4.0
Peter Sefton p.sefton@uq.edu.au, University of Queensland, 0000-0002-3545-944X
Moises Sacal Bonequi m.sacalbonequi@uq.edu.au, University of Queensland 0000-0002-4438-2755
Alex Ip, alex.ip@aarnet.edu.au, AARNet 0000-0001-8937-8904
Michael Lynch, m.lynch@sydney.edu.au, University of Sydney 0000-0001-5152-5307
Amanda Lawrence, amanda.lawrence@rmit.edu.au, RMIT 0000-0003-2194-8178
Julia Colleen Miller, julia.miller@anu.edu.au, The Australian National University http://orcid.org/0000-0002-8827-3825
Sam Hames, s.hames@uq.edu.au, University of Queensland 0000-0002-1824-2361
Marissa Takahashi, marissa.takahashi@qut.edu.au, Queensland University of Technology 0000-0002-6695-7660
Salome Harris, salome.harris@ards.com.au, ARDS Aboriginal Corporation
River Tae Smith, river.smith@monash.edu, Monash University 0000-0002-2118-3147
Annie Cameron, anniec@wangkamaya.org.au, Wangka Maya PALC 0009-0007-5522-7121
Mark Raadgever, m.raadgever@uq.edu.au, University of Queensland
Nick Thieberger, thien@unimelb.edu.au, University of Melbourne 0000-0001-8797-1018
Ben Foley, b.foley@uq.edu.au, University of Queensland 0000-0003-0879-9251
Adam Bell, adam.bell@aarnet.edu.au, AARNet 0000-0003-2129-4776
Janet McDougall, janet.mcdougall@anu.edu.au, The Australian National University 0000-0002-2151-2190
Michael Haugh, michael.haugh@uq.edu.au, University of Queensland, 0000-0003-4870-0850
This document sets out protocols for the design and implementation of sustainable Archival Repository services to achieve “CAREful FAIRness”; i.e. to support the CARE (Carroll et al. (2020)) and FAIR (Wilkinson et al. (2016)) principles.
PILARS aims to guide the design and implementation of data storage services, referred to as Archival Repositories, for a range of purposes, including core use cases of:
supporting research that follows the FAIR (Wilkinson et al. (2016)) principles in any discipline, and
archiving cultural heritage.
These protocols are designed to work alongside the CARE principles (Carroll et al. (2020)), which operate at a governance level, and the Reference Model for an Open Archival Information System (OAIS) (“OAIS Reference Model (ISO 14721)” (n.d.)).
The high-level aims of the PILARS protocols are to:
Maximise autonomy for data custodians/stewards
Maximise return on investment in data and data infrastructure
Maximise long-term sustainability for data and for data systems and management
The technical goals that support these aims are:
Data is portable and not locked into a particular storage system.
Data can be stored and described in systems based on Open Specifications.
Services such as authorised access interfaces, catalogues and finding aids can be built and rebuilt from data in a storage system using Open Source Software solutions, services and tools.
This set of protocols is inspired by the continuing success of the technical approach taken over two decades by PARADISEC (the Pacific and Regional Archive for Digital Sources in Endangered Cultures) (Harris, Thieberger, and Barwick (2015)), which houses cultural heritage material from more than 1360 languages with standard metadata. Its data is stored in commodity services (initially files on disk, now objects in a cloud storage service), with metadata adjacent to the data. The protocols also draw on work with the Language Data Commons of Australia to generalise the PARADISEC approach to other disciplines.
These protocols are aimed at IT practitioners, archivists, librarians, researchers and infrastructure managers involved in long-term data management and are intended to be complementary to the existing practices and principles of those disciplines.
In a research context it is important to be able to support the FAIR principles (Wilkinson et al. (2016)), ensuring that data is well described by metadata, is identified with persistent identifiers, and that shared services with good governance are in place to store interoperable data, to make it findable and to provide appropriate access controls.
These protocols could form the basis for the design, evaluation or procurement of Archival Repository services. They also allow data custodians to begin organising data, using a range of tools, in a format ready for archiving and digital preservation, as long as they have access to some kind of commodity storage.
Each Storage Object is a directory (or storage object equivalent) containing the files including metadata and administrative files such as checksums that make up an Object.
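As an illustration of the administrative checksum files mentioned above, the following is a minimal sketch that records a SHA-512 digest for every file in a Storage Object directory. The manifest file name, its line format and the function names are assumptions made for this example; PILARS does not prescribe them.

```python
import hashlib
from pathlib import Path

# Hypothetical name for the administrative checksum file; not defined by PILARS.
MANIFEST_NAME = "manifest-sha512.txt"


def file_sha512(path: Path) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(object_dir: Path) -> dict[str, str]:
    """Checksum every file in a Storage Object directory and record the results.

    Returns a mapping of relative file path to digest and writes the same
    information to MANIFEST_NAME inside the directory.
    """
    checksums = {}
    for path in sorted(object_dir.rglob("*")):
        # Skip directories and the manifest itself so reruns are stable.
        if path.is_file() and path.name != MANIFEST_NAME:
            checksums[path.relative_to(object_dir).as_posix()] = file_sha512(path)
    lines = [f"{digest}  {name}" for name, digest in checksums.items()]
    (object_dir / MANIFEST_NAME).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return checksums
```

Keeping the manifest inside the directory means the fixity information travels with the data when the Storage Object is copied to another storage system.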
Storage Objects can be located by inspecting the contents of the storage hierarchy, that is, by listing the paths (1.1.2), for example by the presence of a file with a defined name in the hierarchy.
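Locating Storage Objects by the presence of a file with a defined name can be sketched as a walk over the storage hierarchy. The marker file name below is hypothetical, and the choice not to look for further objects inside an object that has already been found is an assumption of this example, not a PILARS requirement.

```python
import os
from pathlib import Path

# Hypothetical marker file name; a real store would document its own convention.
MARKER_NAME = "object-metadata.json"


def find_storage_objects(root: Path, marker_name: str = MARKER_NAME) -> list[Path]:
    """Return every directory under root that contains the marker file."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if marker_name in filenames:
            found.append(Path(dirpath))
            # Assumption: treat the object as a unit and do not descend into it.
            dirnames.clear()
    return sorted(found)
```

Because the listing relies only on paths and file names, the same approach works on a local file system or on any storage service that can enumerate its keys.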
Document and implement an ID resolution mapping system to map IDs to storage locations (FAIR-F1).
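One possible form for such a mapping is a deterministic function from an identifier to a storage path, so that the location of an object can be recomputed from its ID alone. The sketch below hashes the ID and uses leading slices of the digest as nested directory names. The function name, the default root, the segment width and the depth are all assumptions for illustration; a lookup table or a documented layout specification would serve equally well.

```python
import hashlib
from pathlib import PurePosixPath


def id_to_storage_path(
    object_id: str, root: str = "repository", width: int = 3, depth: int = 3
) -> PurePosixPath:
    """Map an object ID to a storage path derived from its SHA-256 digest.

    The first `depth` slices of `width` hex characters become nested
    directories; the full digest names the Storage Object directory.
    """
    digest = hashlib.sha256(object_id.encode("utf-8")).hexdigest()
    segments = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return PurePosixPath(root, *segments, digest)


# Hypothetical identifier used only to show the call.
example_path = id_to_storage_path("https://example.org/object/1")
```

A hashed layout keeps directory sizes balanced and avoids characters in IDs that are unsafe in paths, at the cost of paths that are not human-readable, which is one reason to store the mapping rule as documentation alongside the data.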
Store documentation about the conventions and standards used in a data store (such as those in 1.3) within the root of the storage service itself.
Data storage of well described data objects is considered separately from the current uses to which the data is put.
Data files use open or standard formats where possible, independent of particular software (FAIR-I).
Do not expose data without access controls (for example via a portal), and do not disseminate confidential license or other governance information. Licensing may change: licenses may be withdrawn and new licenses added over time. Note, however, that once data has been distributed under an Open Access license it cannot be withdrawn from those who have already downloaded it.
Documentation about licenses for deposit and archive-wide accession policies may also be stored with an object.
The following terms (used in capitalised form) are defined.
The term Archival Repository is used here to cover any system that is designed to keep data securely for a defined period of time (often forever) and to make it findable by and available to appropriate parties. The terms Repository and Archive have different nuances and are used in a variety of ways in different communities, but here we want to emphasise the commonalities and focus on advice that is relevant to the audience of these protocols.
A Data Custodian is an individual or organisation with the authority to make decisions regarding data under management. This decision-making process is assumed to take place with good governance, in line with the CARE principles.
The Digital Preservation Coalition page What is digital preservation? defines Digital Preservation as:
The series of managed activities necessary to ensure continued access to digital materials for as long as necessary, refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological and organisational change.
A computer file is an aggregation of data on a storage device, identified by a name.
(This definition comes from a discussion thread on Wikipedia (TalkComputerFile2024).)
A File Format is the organizational schema for a file; this may be formally defined in a specification or be ad hoc. File formats may be considered at various layers of specificity: for example, a text file may be plain text with a specific encoding such as UTF-8 and also be an XML file conforming to a particular schema.
Metadata is data that describes other data. Linked Data Metadata follows the principles set out by Tim Berners-Lee for Linked Data, so that all metadata and references to entities described are URIs (URLs) (Berners-Lee (n.d.)).
An Open Specification is a versioned, published, openly available description of a set of precise requirements (e.g., for a format, system or protocol) which may or may not be endorsed by a standards authority.
Open Source Software is freely distributable software according to the definition of the Open Source Initiative (OSI).
The term License is used here inclusively to refer to a document which captures the terms under which data in an Archival Repository may be shared, used, reused or deposited. This includes documents such as Data Sharing Agreements or other contracts, which may be negotiated at various times and which give certain parties licence to use data in defined ways.
The term Repository Collection is used here to reference the Collection class from the Portland Common Data Model [5], which was conceived as an interchange format for repositories and digital libraries. Its definition of a Collection includes this:
A Collection is a group of resources. Collections have descriptive metadata, access metadata, and may links (sic) to works and/or collections.
The term Repository Object is used here in line with the Portland Common Data Model definition [5], which refers to an abstract object:
An Object is an intellectual entity, sometimes called a “work”, “digital object”, etc. Objects have descriptive metadata, access metadata, may contain files and other Objects as member “components”. Each level of a work is therefore represented by an Object instance, and is capable of standing on its own, being linked to from Collections and other Objects.
A Standard is a Specification published by a recognized standards body such as the ISO or W3C. Standards are not always Open Access, so may have barriers to adoption.
A Storage Object is a discrete unit in a physical storage service. It may represent, for example, a Repository Object or a Repository Collection, which are abstract structural concepts. The concept is similar to an OCFL Object and to a Package in OAIS.
See the notes for more detail about implementing PILARS.