<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>oais &#171; WordPress.com Tag Feed</title>
	<link>http://en.wordpress.com/tag/oais/</link>
	<description>Feed of posts on WordPress.com tagged "oais"</description>
	<pubDate>Wed, 10 Feb 2010 11:04:18 +0000</pubDate>

	<generator>http://en.wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[Course: Management and Preservation of Electronic Records]]></title>
<link>http://odei.info/2009/10/19/curso-gestion-y-preservacion-de-la-documentacion-electronica/</link>
<pubDate>Mon, 19 Oct 2009 05:54:45 +0000</pubDate>
<dc:creator>Fernando Fernandez de Aranguiz</dc:creator>
<guid>http://odei.info/2009/10/19/curso-gestion-y-preservacion-de-la-documentacion-electronica/</guid>
<description><![CDATA[A few days ago I attended the course &#8220;Management and Preservation of Electronic Records]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><img class="alignright size-full wp-image-202" title="jordi_serra_serra" src="http://infoodei.wordpress.com/files/2009/10/jordi_serra_serra.jpg" alt="jordi_serra_serra" width="153" height="192" />A few days ago I attended the course &#8220;<a title="Gestión y preservación de la documentación electrónica" href="http://www.anabad.org/agenda/index.php?e=258" target="_blank">Management and Preservation of Electronic Records</a>&#8221;, taught by <a title="Jordi Serra Serra" href="http://bd.ub.es/pub/serra/" target="_blank">Jordi Serra Serra</a> at the &#8220;<a title="Archivo General de La Rioja " href="http://www.larioja.org/npRioja/default/defaultpage.jsp?idtab=463050" target="_blank">Archivo General de La Rioja</a>&#8221; and organized by <a title="Anabad" href="http://www.anabad.org/" target="_blank">Anabad</a>.</p>
<p>In upcoming posts I will comment on some interesting aspects of the course.</p>
<p>Jordi, very friendly and approachable, worked through the following syllabus with great clarity:</p>
<blockquote><p>1. Characteristics of electronic archival records<br />
1.1. What electronic records are: characteristics and significant properties<br />
1.2. The problems of electronic archival records: dissociation, virtuality, modifiability and obsolescence<br />
1.3. The evidential value of the electronic administrative record and its legal recognition<br />
1.4. The challenges of electronic records management: using, sharing and preserving<br />
2. Managing electronic archival records<br />
2.1. Requirements for records in the processing phase<br />
2.2. Technological solutions<br />
2.2.1. Security systems and electronic signatures<br />
2.2.2. Management systems and ECM solutions: document management, life-cycle management and electronic archiving<br />
2.3. Methodological solutions<br />
2.3.1. Procedural standards<br />
2.3.2. Technology certification standards<br />
2.4. Management procedures<br />
2.4.1. Identification<br />
2.4.2. Capture<br />
2.4.3. Registration<br />
2.4.4. Description<br />
2.4.5. Classification<br />
2.4.6. Appraisal, selection and disposition<br />
3. Preserving electronic archival records<br />
3.1. Principles of digital preservation<br />
3.1.1. From permanent preservation to continuous preservation<br />
3.1.2. From the archive as a repository to the archive as a commitment<br />
3.1.3. From possession of records to their availability<br />
3.2. The digital archive<br />
3.2.1. Digital archive models: OAIS<br />
3.2.2. The external environment of the digital archive<br />
3.2.3. The functional components of the digital archive<br />
3.2.4. Archival information objects: structure and life cycle<br />
3.3. Digital preservation policies and strategies: conservation, migration and emulation<br />
3.4. Technology for implementing the digital archive<br />
3.4.1. Storage technology<br />
3.4.2. Active preservation technology<br />
4. The profile and role of the archivist in the Information Society</p></blockquote>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Notes on the Open Archival Information System (OAIS)]]></title>
<link>http://gcmwalker.wordpress.com/2009/09/28/notes-on-the-open-archival-information-system-oais/</link>
<pubDate>Mon, 28 Sep 2009 16:49:12 +0000</pubDate>
<dc:creator>gcmwalker</dc:creator>
<guid>http://gcmwalker.wordpress.com/2009/09/28/notes-on-the-open-archival-information-system-oais/</guid>
<description><![CDATA[Back in 2002 Consultative Committee for Space Data Systems made a recommendation to the ISO for an O]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Back in 2002, the Consultative Committee for Space Data Systems made a recommendation to the ISO for an Open Archival Information System. The recommendation has found broad acceptance, and digital repository software packages like DSpace usually spell out their varying levels of compliance with it. Since we want our archive to have a future as a <em>federated</em> or <em>cooperating</em> (OAIS terms) archive, and since the terminology and concepts created in this document are widespread, I decided to take some notes on the recommendation as they relate to potential metadata elements we&#8217;ll employ.</p>
<p>The recommendation mostly concerns itself with the long term preservation of digital objects, although the framework incorporates metadata for physical objects as well. Broadly, OAIS defines an <strong>Information Object</strong> as a <strong>Data Object</strong> coupled with its <strong>Representation Information</strong>. The Representation Information allows a person to understand how the bits in the Data Object are to be interpreted. An example would be a TIFF file (Data Object) coupled with an ASCII document (Representation Information) detailing the headers, its compression method, etc., as in this <a href="http://www.digitalpreservation.gov/formats/fdd/fdd000022.shtml">TIFF description at Digital Preservation (The Library of Congress)</a>. Of course, one might also want Representation Information for the ASCII file, to explain how characters are interpreted in that format. OAIS terms this phenomenon recursive Representation Information, and one might eventually accrue a <strong>Representation Network</strong> of such digital objects. The recursion stops once the <strong>Knowledge Base</strong> of the <strong>Designated Community</strong> is sufficient to understand the top-most piece of Representation Information.</p>
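<p>As a rough illustration of that recursion, here is a minimal Python sketch. The function name, the file names and the one-link-per-object simplification are all assumptions made for this example; OAIS does not prescribe any such structure, and a real Representation Network can branch.</p>

```python
# Hypothetical sketch: the names and the example network are illustrative only.
def representation_chain(start, rep_info, knowledge_base):
    """Follow Representation Information links from `start` until reaching
    something the Designated Community's Knowledge Base already covers."""
    chain = [start]
    seen = {start}
    current = start
    while current not in knowledge_base:
        nxt = rep_info.get(current)
        if nxt is None:
            raise LookupError(f"no Representation Information recorded for {current!r}")
        if nxt in seen:
            raise ValueError(f"circular Representation Information at {nxt!r}")
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


# Illustrative network: a TIFF image explained by an ASCII document,
# which is in turn explained by a description of ASCII itself.
REP_INFO = {
    "scan.tiff": "tiff-format-description.txt",
    "tiff-format-description.txt": "ascii-encoding-description",
}
# Assumption: the Designated Community already understands ASCII.
KNOWLEDGE_BASE = {"ascii-encoding-description"}

chain = representation_chain("scan.tiff", REP_INFO, KNOWLEDGE_BASE)
print(chain)
```

<p>A missing link raises an error rather than passing silently, since a gap in the chain is exactly what one would want to find before the knowledge is lost.</p>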
<p>OAIS defines two types of Representation Information: <strong>Structure</strong> and <strong>Semantic</strong>. Structure Information describes the data format applied to the bit sequence to derive more meaningful values like characters, pixels, numbers, etc. Semantic Information describes the social meaning behind these higher values (for example that the text characters are English).</p>
<p>OAIS discourages using software that can access and use Data Objects as a replacement for comprehensive Representation Information. Although that would serve the end user well enough for a time, the software itself naturally poses its own obsolescence problem. Of course, the digital media we would like to preserve is mostly software itself. We may have datasets, images, scans, etc., but the majority of digital assets we hold are complete software packages. This includes operating systems, office suites, computer games, console games (on cartridges) and so on. Retrieving Representation Information for all these types of software will be a considerable and ongoing task, as most software will consist of multiple file types.</p>
<p><!--more-->On a side note, some research (or risk assessment) will need to be done as to what conditions will apply to preservation copies and fair use copies. The Digital Millennium Copyright Act can be exceptionally restrictive with this. This issue falls into a broader topic concerning what control we have (and whether it is sufficient control) over Content Information we receive for which we do not own the intellectual property rights. A Submission Agreement can iron out these details in the case of a donor, but materials coming from recycling are another issue to investigate.</p>
<p>Along with the Data Object and Representation Information, which together form the <strong>Content Information</strong>, OAIS specifies <strong>Preservation Description Information</strong> (PDI) to ensure the comprehensibility of the former. OAIS lists four types of PDI, each of which gives us something to consider with regard to all our holding types (software, hardware and documents):</p>
<ul>
<li>Provenance: custody information and processing history. Custody information is something we might want to really strive to extract from donors since it constitutes some of the history of personal and professional computing that we are trying to preserve. Other fields: revision history, license holder, registration, copyright.</li>
<li>Context: how the Content Information relates to information outside itself. This would be critical and quite formal if we were producing datasets, but as it stands this field seems open to a great deal of interpretation. Possible information could be the use and purpose of the software, chip, computer, etc., conditions of use (where it was used, by whom, etc.), and relations it may have with other materials we hold. Also manufacturing information, help files, user guides, the language of the package.</li>
<li>Reference: one or more unique identifiers. This should be contingent on our repository software and metadata set, and could include values like name, author/originator, version number or serial number.</li>
<li>Fixity: protection of the Content Information from undocumented alteration. Checksums, CRC, etc. It&#8217;s preferable to perform these operations for each file of a piece of software when we are not simply creating disk images of the media.</li>
</ul>
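<p>For the Fixity item above, a per-file checksum manifest could be sketched roughly as follows in Python. The function names and the choice of SHA-256 are assumptions for this example, not anything OAIS mandates; only the standard <code>hashlib</code> and <code>pathlib</code> modules are used.</p>

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=65536):
    """Hash one file in chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_manifest(root):
    """Map each file under `root` (by relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List files that were added, removed or altered since `manifest` was made."""
    current = fixity_manifest(root)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

<p>One would store the manifest alongside the package at ingest and rerun <code>changed_files</code> on a schedule; an empty list means no undocumented alteration was detected.</p>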
<p>Wrapping around all this is the <strong>Packaging Information</strong>, which explains, identifies and relates the Content Information and PDI. I believe this is where we would store critical information on media-dependent attributes for our software such as tape block sizes, CD-ROM volume information, floppy block sizes and various filesystem information. This is vital information that needs to be preserved at every step, since it constitutes the history of information storage, the conditions of the initial write and so on. Along with a disk image there should be concrete information on these aspects of the original media. In addition if we have provided another compression/packaging layer on top of this, that information would be located here. For example if we had compressed our Content Information and PDI into a tarball, the Packaging Information would explain the .tar format.</p>
<p>The Content Information and PDI constitute the <strong>Archival Information Package </strong>(AIP), which should contain all needed information to allow Long Term Preservation. OAIS also specifies <strong>Archival Information Collections</strong>, which we may use in the case of donor collections.</p>
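<p>To make the pieces described so far concrete, here is a hedged Python sketch of how they might nest. The class and field names are my own shorthand, and the example values (file names, identifier, donor) are invented; this is not a schema taken from the OAIS recommendation.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ContentInformation:
    data_object: str                  # e.g. path to a disk image
    representation_information: list  # documents explaining the format


@dataclass
class PreservationDescriptionInformation:
    provenance: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)
    reference: dict = field(default_factory=dict)
    fixity: dict = field(default_factory=dict)


@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    # Media-dependent details (block sizes, filesystem, tar wrapper, ...)
    packaging_information: dict = field(default_factory=dict)

    def missing_pdi(self):
        """Return the names of PDI categories that are still empty."""
        return [
            name
            for name in ("provenance", "context", "reference", "fixity")
            if not getattr(self.pdi, name)
        ]


# Invented example values, for illustration only.
aip = ArchivalInformationPackage(
    content=ContentInformation("floppy-0001.img", ["fat12-description.txt"]),
    pdi=PreservationDescriptionInformation(
        reference={"identifier": "sw-0001"},
        fixity={"algorithm": "sha256"},
    ),
    packaging_information={"media": "3.5in floppy"},
)
print(aip.missing_pdi())
```

<p>A check like <code>missing_pdi</code> is one simple way to flag packages whose Provenance or Context still needs to be gathered from a donor.</p>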
<p>OAIS also details four migration types we will want to be aware of:</p>
<ul>
<li>Refreshment: simply copying a media instance from its original media to a new, identical piece of media. This would be when a floppy is getting old, looking old, etc., or when we are transferring data to a new hard drive (as routine backup or not). None of the infrastructure that points to AIPs needs to change, and all bits are preserved.</li>
<li>Replication: a transfer to a new media type where the Packaging Information, Content Information and PDI are exactly preserved. The infrastructure may need to be adjusted to point to a new location.</li>
<li>Repackaging: There are bits changed in the Packaging Information as a result of a transfer because new files and directories were created (though they may share the same structure and names of their originals).</li>
<li>Transformation: Bits are changed in the Content Information or PDI; the information content is preserved (supposedly).</li>
</ul>
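<p>The four migration types above can be told apart by asking what changed. The following Python sketch encodes the distinctions exactly as summarized in this list (not the full wording of the recommendation); the function and parameter names are assumptions made for the example.</p>

```python
from enum import Enum


class Migration(Enum):
    REFRESHMENT = "refreshment"
    REPLICATION = "replication"
    REPACKAGING = "repackaging"
    TRANSFORMATION = "transformation"


def classify_migration(same_media_type, packaging_bits_changed,
                       content_or_pdi_bits_changed):
    """Classify a transfer using the distinctions drawn in the list above."""
    if content_or_pdi_bits_changed:
        # Bits in the Content Information or PDI were altered.
        return Migration.TRANSFORMATION
    if packaging_bits_changed:
        # Only the Packaging Information was altered.
        return Migration.REPACKAGING
    if same_media_type:
        # Bit-for-bit copy onto an identical kind of media.
        return Migration.REFRESHMENT
    # Bit-for-bit copy onto a different kind of media.
    return Migration.REPLICATION


# A disk image copied unchanged from an ageing floppy to a hard drive:
print(classify_migration(False, False, False))
```

<p>The order of the checks matters: a change to Content Information or PDI counts as a Transformation regardless of what else happened during the transfer.</p>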
<p>Providing that we use disk images, and do not attempt to move a software&#8217;s individual files and directory structure to new media &#8220;by hand&#8221;, we should be able to stay in the safe realms of Refreshment and Replication.</p>
<p>The recommendation covers a great deal more in submission protocols, dissemination, management, policy and materials control, but what&#8217;s covered here is, I feel, most relevant to keep in mind when designing the metadata schema for our holdings.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[OAIS Reference Model Part II: The Model]]></title>
<link>http://easydigitalpreservation.wordpress.com/2009/08/03/oais-reference-model-part-ii-the-model/</link>
<pubDate>Mon, 03 Aug 2009 21:28:40 +0000</pubDate>
<dc:creator>M. Amaral</dc:creator>
<guid>http://easydigitalpreservation.wordpress.com/2009/08/03/oais-reference-model-part-ii-the-model/</guid>
<description><![CDATA[Welcome to Part II of my OAIS Reference Model crash course!  By now you probably have noticed that I]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Welcome to Part II of my OAIS Reference Model <a href="http://easydigitalpreservation.wordpress.com/2009/07/29/oais-reference-model-part-i-background-and-influence/">crash course</a>!  By now you have probably noticed that I have refrained from including in this post any of the <a href="http://www.paradigm.ac.uk/images/information-package.gif">many</a> <a href="http://www.paradigm.ac.uk/images/preservation-description.gif">graphed</a> <a href="http://www.dlib.org/dlib/july04/beagrie/fig-3.gif">images</a> that are in the OAIS reference model document.  This is because before I had a basic understanding of the model, these images seemed supremely complicated and confusing&#8230;kind of like PowerPoint slides with too many words.  I hope that what I provide here is a substantial enough understanding of the OAIS model to make the images less frightening when you do eventually encounter them.</p>
<h4>Model Roles:</h4>
<p>To start, it is important to recognize the three types of people that will be affiliated with a repository within the OAIS framework: the <em>Producers</em> of the repository&#8217;s content, the <em>Managers</em> of the content and repository, and the <em>Consumers</em> who use the content stored in the repository.  Each phase of the preservation process affects these three roles: the ingest, the processing and storage, and the accessing of digital objects.</p>
<h4>The Model in Brief:</h4>
<p>The document for the OAIS reference model has several key areas of content:</p>
<ul>
<li><strong><span style="color:#666699;">Terminology</span></strong>: An awesome vocabulary and glossary for the operations and information structures of repositories is located in Section 1.</li>
</ul>
<ul>
<li> <strong><span style="color:#666699;">Mandatory responsibilities</span></strong>: A list of the things that a repository must do in order to be considered an OAIS-type repository comprises Section 3.  One particular action that this section calls for is identifying a designated producer/consumer community and ensuring that the information within the repository (metadata, etc.) is independently understandable (and accessible) by this community.  This means that &#8220;the community should be able to understand the information without needing the assistance of the experts who produced the information.&#8221;  Read <a href="http://alanake.wordpress.com/2008/01/22/oais-6-mandatory-oais-responsibilities/">this</a> for more detail about the other mandatory responsibilities.</li>
</ul>
<ul>
<li> A <strong><span style="color:#666699;">model for ingesting, storing, and providing access</span></strong> to stored items, including a very smart model for capturing each item&#8217;s metadata (Content Information) and preservation metadata (Preservation Description Information).  Together, this data makes up an item&#8217;s &#8220;information package,&#8221; which is bound together and identified by its &#8220;packaging information.&#8221;  It is intended to include information about an item&#8217;s context in order to fulfill one of an OAIS-type repository&#8217;s mandatory responsibilities.  This is all discussed in Section 2.</li>
</ul>
<ul>
<li> An <strong><span style="color:#666699;">outline for </span></strong><span style="color:#666699;"><strong>administrative management</strong></span> of the repository and the OAIS functions is presented in Section 4.  This discusses working with the creators of the digital objects and the objectives behind the day-to-day management of the repository. The administrative role also oversees the general planning and governance of the repository, and includes policy and preservation decisions.</li>
</ul>
<ul>
<li> <strong><span style="color:#666699;">Actual preservation methods</span></strong>: Preservation processes such as digital migration and emulation are examined in Section 5.  Preservation Planning is obviously a central part of any repository&#8217;s role.</li>
</ul>
<ul>
<li> <strong><span style="color:#666699;">Archive and repository interoperability</span></strong>: Concepts behind repository interoperability and federation are discussed and explained in Section 6.  Heavy cooperation between repositories to develop common local standards is needed to make this a possibility.</li>
</ul>
<p>By following the OAIS model and the mandatory responsibilities which it entails, a repository will gain recognition as an OAIS-type archive or repository.  It is beneficial for a repository to be recognized as such because it means that the well-documented archival standards of the OAIS model will have been applied to help ensure the effective long-term storage, retrieval, and preservation of digital documents.  Another benefit is that communication with similarly-purposed OAIS repositories will be easy and fluid.</p>
<h4>OAIS in Action:</h4>
<p><a href="http://www.dspace.org/">DSpace</a> and <a href="http://www.fedora-commons.org/">Fedora</a> are two repository software platforms that have included OAIS-compliance capabilities in their products.  This helps pave the road for any repository that is built using either of these open source systems to follow procedures from the OAIS model.</p>
<p>What I would love to find or collect is a list of actual digital archives and repositories that are following the OAIS model either by the book or in some variation.  If anyone has a suggestion, please post a comment!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[OAIS Reference Model Part I: Background and Influence]]></title>
<link>http://easydigitalpreservation.wordpress.com/2009/07/29/oais-reference-model-part-i-background-and-influence/</link>
<pubDate>Thu, 30 Jul 2009 02:00:29 +0000</pubDate>
<dc:creator>M. Amaral</dc:creator>
<guid>http://easydigitalpreservation.wordpress.com/2009/07/29/oais-reference-model-part-i-background-and-influence/</guid>
<description><![CDATA[The OAIS model is an international standard that has been adopted for guiding the long term preserva]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>The OAIS model is an international standard that has been adopted for guiding the long term preservation of digital data and documents.  In fact, the OAIS model is an <a href="http://easydigitalpreservation.wordpress.com/2009/07/29/iso-standards/">ISO</a> standard (ISO 14721:2003): it was developed by the Consultative Committee for Space Data Systems (<a href="http://public.ccsds.org/default.aspx" target="_blank">CCSDS</a>) in 2002, and was adopted as an ISO standard in 2003.  The document is freely available, despite the fact that most ISO documentation is usually sold as a service.  It&#8217;s a hefty 148 pages, available in PDF form <a href="http://public.ccsds.org/publications/archive/650x0b1.pdf" target="_blank">here</a>.</p>
<div id="attachment_78" class="wp-caption alignright" style="width: 180px"><a href="http://www.flickr.com/photos/olibac/2354102486/"><img class="size-medium wp-image-78" style="border:1px solid black;" title="oais" src="http://easydigitalpreservation.wordpress.com/files/2009/07/oais.jpg?w=295" alt="oais" width="170" height="173" /></a><p class="wp-caption-text">Photo by OliBac licensed under Creative Commons</p></div>
<p>The OAIS model is a standardized model describing a way that digital repositories intended for preservation purposes can be run.  Within this model, you will not find a standard for metadata.  It also does not endorse any particular repository platform, software, protocols or implementation procedure.  The OAIS model is simply a set of standardized guidelines intended to aid the people and systems behind a repository that has been designated with the responsibility of maintaining documents for archival purposes over a long period of time.</p>
<p>OAIS stands for Open Archival Information System, the word <em>open</em> referring to the open and public process under which this model was developed.  Participation in its initial development was encouraged by the CCSDS, and as an ISO standard, it will go under review every five years.</p>
<p>Because the OAIS model is a recognized standard, its users have formed a default sub-community within the digital preservation community.  But it has also been very beneficial to the digital preservation community at large and has helped promote progressive thinking and discussion.  Here are some key reasons why the OAIS model is so helpful to the digital preservation process and community:</p>
<ul>
<li> It has standardized the terminology associated with digital preservation</li>
</ul>
<ul>
<li> It has outlined the duties and services of a preservation repository</li>
</ul>
<ul>
<li> It has outlined a way that information should be attributed and managed within a repository</li>
</ul>
<ul>
<li> It has mobilized community discussions about repository standards and certification</li>
</ul>
<ul>
<li> It has included preservation metadata as an important part of the preservation process</li>
</ul>
<ul>
<li> It focuses on long-term preservation, but lets &#8220;long-term&#8221; be defined by the repository managers</li>
</ul>
<ul>
<li> OAIS-type archives are committed to a set of defined responsibilities</li>
</ul>
<p>As a final note, it is important to make it clear that the OAIS model is by no means a requirement for a digital repository; while it is a recognized way of running a repository, it is not the <em>only </em>way.  It may not fit for some repositories, depending on their intended size, resources, and designated communities.  But admittedly, when a repository chooses not to follow the OAIS recommendations, it cannot fall under the umbrella of the most widely-used and understood digital archive standard.</p>
<p>&#8212;&#8212;&#8212;&#8212;</p>
<p><span style="color:#333333;">Here are some resources that were incredibly useful for me while writing this post and the one to follow:</span></p>
<ul>
<li><span style="color:#333333;">I really benefited from reading <a href="http://everybodyslibraries.com/2008/10/13/what-repositories-do-the-oais-model/">this</a> post by John Mark Ockerbloom, the editor of the blog <a href="http://everybodyslibraries.com/" target="_blank"><em>Everybody&#8217;s Libraries</em></a>.  I almost considered forgoing my own entry and just directing readers directly to his!</span></li>
</ul>
<ul>
<li><span style="color:#333333;">And then I found <a href="http://alanake.wordpress.com/2008/01/15/oais-reference-model-general/" target="_blank">this</a> post and was blown away by how thorough it is.  It&#8217;s really well done and I&#8217;d encourage you to check it out.</span></li>
</ul>
<ul>
<li><span style="color:#333333;">Brian Lavoie of OCLC wrote this <a href="http://www.dpconline.org/docs/lavoie_OAIS.pdf">introductory guide</a> to the OAIS model.</span></li>
</ul>
<ul>
<li><span style="color:#333333;">These <a href="http://www.slideshare.net/j.allinson/oais-as-a-reference-model-for-repositories" target="_blank">slide show</a> <a href="http://www.slideshare.net/michaelday/introduction-to-oais-model" target="_blank">presentations</a> have excellent and concise information.</span></li>
</ul>
<ul>
<li><span style="color:#333333;"><a href="http://standards.jisc.ac.uk/catalogue/OAIS.phtml">This</a> page is a brief run-down of OAIS from the JISC Standards Catalogue.</span></li>
</ul>
<p><span style="color:#333333;"><a href="http://wp.me/pArq0-1H">Continue on to Part II</a><br />
</span></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[RODA: Portugal's new digital preservation repository]]></title>
<link>http://alanake.wordpress.com/2009/06/26/roda-portugals-new-digital-preservation-repository/</link>
<pubDate>Fri, 26 Jun 2009 08:10:06 +0000</pubDate>
<dc:creator>alanake</dc:creator>
<guid>http://alanake.wordpress.com/2009/06/26/roda-portugals-new-digital-preservation-repository/</guid>
<description><![CDATA[RODA RODA (Repository of Authentic Digital Objects) is a Portuguese initiative to preserve governmen]]></description>
<content:encoded><![CDATA[RODA RODA (Repository of Authentic Digital Objects) is a Portuguese initiative to preserve governmen]]></content:encoded>
</item>
<item>
<title><![CDATA[Draft of OAIS version 2]]></title>
<link>http://odei.info/2009/05/27/borrador-de-la-version-2-de-oais/</link>
<pubDate>Wed, 27 May 2009 22:00:45 +0000</pubDate>
<dc:creator>Fernando Fernandez de Aranguiz</dc:creator>
<guid>http://odei.info/2009/05/27/borrador-de-la-version-2-de-oais/</guid>
<description><![CDATA[I read in the article &#8220;Actualización del estándar OAIS&#8221; on the blog Archivistica.net that ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><img class="alignright size-full wp-image-55" title="OAIS-2-mini" src="http://infoodei.wordpress.com/files/2009/05/oais-2-mini.jpg" alt="OAIS-2-mini" width="263" height="172" />I read in the article &#8220;<a href="http://archivistica.blogspot.com/2009/05/actualizacion-del-estandar-oais.html" target="_blank">Actualización del estándar OAIS</a>&#8221; on the blog <a href="http://archivistica.blogspot.com/" target="_blank">Archivistica.net</a> that the <a href="http://cwe.ccsds.org/moims/docs/MOIMS-DAI/Draft%20Documents/OAIS-candidate-V2-markup.pdf" target="_blank">draft of OAIS version 2</a> is now available.</p>
<p>The article sets out the main differences from version 1: clarification of terms and concepts, with an emphasis on authenticity; the properties of the information; and the inclusion of access rights in the Preservation Description Information (PDI).</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[OAIS version for public examination]]></title>
<link>http://dccbi.wordpress.com/2009/05/14/oais-version-for-public-examination/</link>
<pubDate>Thu, 14 May 2009 12:24:00 +0000</pubDate>
<dc:creator>dccbi</dc:creator>
<guid>http://dccbi.wordpress.com/2009/05/14/oais-version-for-public-examination/</guid>
<description><![CDATA[Thanks to David Giaretta for the following information on the state of the revision to OAIS (I have ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Thanks to David Giaretta for the following information on the state of the revision to <a href="http://public.ccsds.org/publications/archive/650x0b1.pdf">OAIS</a> (I have commented <a href="http://digitalcuration.blogspot.com/2008/12/comments-on-oais-responses-to-our.html">earlier</a> on this process):<br />
<blockquote><span style="font-weight:bold;">OAIS version for public examination</span></p>
<p>Many comments and ideas for clarifications and improvements for OAIS were received as part of its 5 year review process.</p>
<p>These suggestions were reviewed and the proposed dispositions sent to their originators for further comment.  This draft version of OAIS contains these and many other improvements and is the candidate for submission to ISO for review. At this stage we are seeking primarily to identify errors rather than further ideas.</p>
<p>The PDF file is available at <a href="http://cwe.ccsds.org/moims/docs/MOIMS-DAI/Draft%20Documents/OAIS-candidate-V2-markup.pdf">http://cwe.ccsds.org/moims/docs/MOIMS-DAI/Draft%20Documents/OAIS-candidate-V2-markup.pdf</a></p>
<p>Please send corrections to <a href="mailto:oais-support@oais.info">oais-support@oais.info</a> by 15 June 2009</p>
<p>(NB there are some cross-reference errors which will be corrected in the final version)</p>
<p>Shortly after this date the corrected OAIS update will be sent to ISO and in due course this will be released for international review at which point further comments may be submitted.</p>
<p>John Garrett (chair)               David Giaretta (deputy-chair)<br />DAI-WG CCSDS</p></blockquote>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Presentation of PAC, the CINES long-term archiving platform / Olivier Rouchon]]></title>
<link>http://journeesao.wordpress.com/2009/05/04/archivage-perenne/</link>
<pubDate>Mon, 04 May 2009 20:36:22 +0000</pubDate>
<dc:creator>marlène</dc:creator>
<guid>http://journeesao.wordpress.com/2009/05/04/archivage-perenne/</guid>
<description><![CDATA[The archiving mission entrusted to CINES since 2004 has grown in importance since 2008 ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>The archiving mission entrusted to <a href="http://www.cines.fr/" target="_blank">CINES</a> since 2004 has grown in importance since 2008 (when the institution&#8217;s remit was refocused). The <a href="http://www.cines.fr/spip.php?rubrique152" target="_blank">PAC</a> service is part of this effort: it offers both a technical architecture and <span style="text-decoration:line-through;">psychological</span> project-management support to institutions that want to set up a long-term (30 years and more) preservation solution for their scientific, heritage and administrative data.</p>
<p>A reminder of the known risks associated with computer files: physical corruption of data, obsolescence of media and software&#8230; The aim of the service is to limit the impact of these risks, notably by complying with standards (<a href="http://www.figoblog.org/document1089.php" target="_blank">OAIS</a>) and by keeping the specifications of the formats in use alongside the archived documents (to guarantee that they can be converted again later if needed)&#8230;</p>
<p>PAC works with &#8220;depositing&#8221; services, which collect the documents and deposit them, and which most often take charge of developing access interfaces; it does not work directly with data producers (ABES is one of these intermediaries).</p>
<p>In practice, the service already handles archiving for a number of projects (Persée), and is a partner in others (theses within the STAR framework, HAL, the Canal-U videos, sound archives within the TGE-Adonis framework&#8230;). The possibility of direct deposit from ORI-OAI is under study.</p>
<p>Finally, for <strong>Olivier Rouchon,</strong> preservation will increasingly need to be clearly built into the document life cycle: the creation of an archivable version (one that meets the standards and formats used by PAC) must be planned for. This requires staff resources, so there is every reason to pool this kind of service.</p>
<p>Marlène Delhaye.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Repository preservation revisited]]></title>
<link>http://dccbi.wordpress.com/2009/03/09/repository-preservation-revisited/</link>
<pubDate>Mon, 09 Mar 2009 17:15:00 +0000</pubDate>
<dc:creator>dccbi</dc:creator>
<guid>http://dccbi.wordpress.com/2009/03/09/repository-preservation-revisited/</guid>
<description><![CDATA[Are institutional repositories set up and resourced to preserve their contents over the long term? P]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Are institutional repositories set up and resourced to preserve their contents over the long term? Potentially contradictory evidence has emerged from my various questions related to this topic.</p>
<p>You may remember that on the <a href="http://digitalcuration.blogspot.com/2009/02/repositories-and-preservation.html">Digital Curation Blog</a> and the <a href="http://www.jiscmail.ac.uk/archives/jisc-repositories.html">JISC-Repositories</a> JISCmail list on 23 February 2009, I referred to some feedback from two Ideas (<a href="http://jiscrepository.ideascale.com/akira/dtd/2276-784">here</a> and <a href="http://jiscrepository.ideascale.com/akira/dtd/2643-784">here</a>) on the JISC Ideascale site last year, and asked 3 further questions relating to repository managers’ views of the intentions of their repositories. Given a low rate of response to the original posting (which asked for votes on the original Ideascale site), I followed this up on the JISC-Repositories list (but through oversight, not on the blog), offering the same 3 questions in a Doodle poll. The results of the several different votes appear contradictory, although I hope we can glean something useful from them.</p>
<p>I should emphasise that this is definitely not methodologically sound research; in fact, there are methodological holes here large enough to drive a Mack truck through! Nevertheless, we may be able to glean something useful. To recap, here are the various questions I asked, with a brief  description of their audience, plus the outcomes:<br />
<blockquote>a) Audience, JISC-selected “expert” group of developers, repository managers and assorted luminaries. Second point is the same audience, a little later.
<ul>
<li>Idea: “The repository should be a full OAIS [CCSDS 2002] preservation system.” Result 3 votes in favour, 16 votes against, net -13 votes.</li>
<li>Idea: “Repository should aspire to make contents accessible and usable over the medium term.” Result: 13 votes in favour, 1 vote against, net +12 votes.</li>
</ul>
<p>b) Audience JISC-Repositories list and Digital Curation Blog readership. Three Ideas on Ideascale, with the results shown (note, respondents did not need to identify themselves):
<ul>
<li>My repository does not aim for accessibility and/or usability of its contents beyond the short term (say 3 years). Result 2 votes in favour, none against.</li>
<li>My repository aims for accessibility and/or usability of its contents for the medium term (say 4 to 10 years). Result 5 votes in favour, none against.</li>
<li>My repository aims for accessibility and/or usability of its contents for the long term (say greater than 10 years). Result 8 votes in favour, 1 vote against, net +7 votes. </li>
</ul>
<p>A further comment was left on the Digital Curation Blog, to the effect that since most repository managers were mainly seeing deposit of PDFs, they felt (perhaps naively) sufficiently confident to assume these would be usable for 10 years.</p>
<p>c) Audience JISC-Repositories list. Three exclusive options on a Doodle poll, exact wording as in (b), no option to vote against any option, with the results shown below (note, Doodle asks respondents to provide a name and most did, with affiliation, although there is no validation of the name supplied):
<ul>
<li>My repository does not aim for accessibility and/or usability of its contents beyond the short term (say 3 years). Result 1 vote in favour.</li>
<li>My repository aims for accessibility and/or usability of its contents for the medium term (say 4 to 10 years). Result 0 votes in favour.</li>
<li>My repository aims for accessibility and/or usability of its contents for the long term (say greater than 10 years). Result 22 votes in favour.</li>
</ul>
</blockquote>
<p>I guess the first thing is to notice the differences between the 3 sets of results. The first would imply that long term is definitely off the agenda, and medium term is reasonable. The second is a roughly 50-50 split between long term (8 votes) and the short/medium term combination (7 votes). The third is overwhelmingly in favour of long term (as defined).</p>
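<p>As a rough check on that comparison, here is a minimal Python sketch that tabulates the votes reported in (a), (b) and (c). The dictionary layout and the helper names <code>net_votes</code> and <code>long_term_share</code> are illustrative assumptions, not anything produced by Ideascale or Doodle, and the same methodological caveats apply to the numbers.</p>

```python
# Vote counts as reported in the post: (votes in favour, votes against).
# The grouping into dictionaries and the helper names are illustrative only.
POLLS = {
    "a": {
        "full OAIS": (3, 16),
        "medium term": (13, 1),
    },
    "b": {
        "short term": (2, 0),
        "medium term": (5, 0),
        "long term": (8, 1),
    },
    # Poll (c) offered no way to vote against, so "against" is always 0.
    "c": {
        "short term": (1, 0),
        "medium term": (0, 0),
        "long term": (22, 0),
    },
}


def net_votes(favour, against):
    """Net score: votes in favour minus votes against."""
    return favour - against


def long_term_share(poll):
    """Fraction of in-favour votes that went to the long-term option."""
    total = sum(favour for favour, _ in poll.values())
    return poll["long term"][0] / total


if __name__ == "__main__":
    for name, poll in POLLS.items():
        for option, (favour, against) in poll.items():
            print(name, option, net_votes(favour, against))
    print(round(long_term_share(POLLS["b"]), 2))  # 8 of 15 in-favour votes
    print(round(long_term_share(POLLS["c"]), 2))  # 22 of 23 in-favour votes
```

<p>Counting only in-favour votes is itself a choice; with such small and differently recruited groups, the shares are a way of lining the results up side by side, not evidence in their own right.</p>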
<p>By now you can also see at least some of the methodological problems, including differing audiences, differing anonymity, and differing wording (firstly in relation to the use of the term “OAIS”, and secondly in relation to the timescales attached to short, medium and long term). So, you can draw your own conclusions, including that none can be drawn from the available data!</p>
<p>Note, I would not draw any conclusions from the actual numerical votes on their own, but perhaps we can from the values within each group. However, ever hasty if not foolhardy, here are my own tentative interpretations:
<ul>
<li>First, even “experts” are alarmed at the potential implications of the term “OAIS”.</li>
<li>Second, repository managers don’t believe that keeping resources accessible and/or usable for 10 years (in the context of the types of material they currently manage in repositories) will give them major problems.</li>
<li>Third, repository managers don’t identify “accessibility and/or usability of its contents for the long term” as implying the mechanisms of an OAIS (this is perhaps rather a stretch given my second conclusion).</li>
</ul>
<p>So, where to next? I’m thinking of asking some further questions, again of the JISC-Repositories list and the audience of the Digital Curation Blog. However, this time I’m asking for feedback on the questions, before setting up the Doodle poll. My draft texts are<br />
<blockquote>
<ul>
<li>My repository is resourced and is intended to keep its contents accessible and usable for the long term, through potential technology and community changes, implying at least some of the requirements of an OAIS.</li>
<li>My repository is resourced and is intended to keep its contents accessible and usable unless there are significant changes in technology or community, ie it does not aim to be an OAIS.</li>
<li>Some other choice, please explain in free text…</li>
</ul>
</blockquote>
<p>Are those reasonable questions? Or perhaps, please help me improve them!</p>
<p>This post is made both to the Digital Curation Blog and to the JISC-repositories list&#8230;</p>
<p>OAIS: CCSDS. (2002). Reference Model for an Open Archival Information System (OAIS). Retrieved from <a href="http://public.ccsds.org/publications/archive/650x0b1.pdf">http://public.ccsds.org/publications/archive/650x0b1.pdf</a>.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Email discussion on the usefulness of file format specifications]]></title>
<link>http://dccbi.wordpress.com/2009/01/06/email-discussion-on-the-usefulness-of-file-format-specifications/</link>
<pubDate>Tue, 06 Jan 2009 21:33:00 +0000</pubDate>
<dc:creator>dccbi</dc:creator>
<guid>http://dccbi.wordpress.com/2009/01/06/email-discussion-on-the-usefulness-of-file-format-specifications/</guid>
<description><![CDATA[This is a summary of an email exchange on the DCC Associates email list over a few days in late Nove]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>This is a summary of an email exchange on the DCC <a href="http://www.dcc.ac.uk/associates/">Associates</a> email list over a few days in late November and early December. I thought it was revealing of attitudes to preservation formats and to representation information (in the form of both specifications and running code), so I’ve summarised it here. Email lists are great for promoting discussion, but threads tend to fracture off in various directions, so a summary can be useful. Quotes are reproduced with permission; my thanks to all those involved.</p>
<p>Steve Rankin from the <a href="http://www.dcc.ac.uk/">DCC</a> down in <a href="http://www.scitech.ac.uk/About/Find/RAL/Introduction.aspx">Rutherford Labs</a> noticed and drew the list’s attention to the Microsoft pages relating to their binary formats, made available under a so-called &#8220;Microsoft Open Specification Promise”.<br /><a href="http://www.microsoft.com/interop/docs/OfficeBinaryFormats.mspx"><br />http://www.microsoft.com/interop/docs/OfficeBinaryFormats.mspx</a> and <a href="http://www.microsoft.com/interop/osp/default.mspx">http://www.microsoft.com/interop/osp/default.mspx</a></p>
<p>Chris Puttick of <a href="http://thehumanjourney.net/">Oxford Archaeology</a> pointed out that the pages had been up for a while (since February 2008 at least). He made a couple of interesting points:<br />
<blockquote>“I have it on excellent authority that the specifications are useful but incomplete […]; secondly that as this is not the first time MS have published such information only to take it down again later [so] anyone interested in them should download them as soon as possible. I have on slightly less excellent authority that a ‘promise’ as encased in the [Open Specification Promise] is specifically something in US law and may not have any validity outside of the US.”</p></blockquote>
<p>Kevin Ashley from <a href="http://www.ulcc.ac.uk/digital-preservation/current-activities/ndad.html">ULCC/NDAD</a> agreed:<br />
<blockquote>“It&#8217;s my understanding &#8211; from those who have tried &#8211; that earlier specs that MS published failed exactly that test. It wasn&#8217;t possible to use them to write software that dealt with all syntactic and semantic variations.</p>
<p>“It&#8217;s a fairly fundamental test for network protocols that one can […] get two separate implementations to communicate with each other. The same is true of file formats, to my mind, and one can see the creating application and the reading application as equivalent to the two ends of a network connection, albeit not necessarily in real time.”</p></blockquote>
<p>David Rosenthal from Stanford and <a href="http://www.lockss.org/">LOCKSS</a> injected some engineering reality from direct experience into the discussion. He has already released a <a href="http://blog.dshr.org/2009/01/are-format-specifications-important-for.html">longer blog post</a> based on the discussion and his contribution; effectively he seemed to be aiming to demolish the argument for keeping specifications at all.<br />
<blockquote>“Speaking as someone who has helped implement PostScript from the specifications, I can assure you that published specifications are always incomplete.  There is no possibility of specifying formats as complex as CAD or Word so carefully that a clean-room implementation will be perfect.  Indeed, there are always minor incompatibilities (sometimes called enhancements, and sometimes called bugs) between different versions of the same code.  And there is no possibility of digital preservation efforts being able to afford the investment to do high-quality clean-room implementations of these complex formats.  Look at the investment represented by the Open Office suite, for example.</p>
<p>“On the other hand, note that Open Office and other open source office suites in practice do an excellent job of rendering MS formats, and their code thus represents a very high quality specification for these formats.  Code is the best representation for preservation metadata.”</p></blockquote>
<p>Colin Neilson from <a href="http://www.dcc.ac.uk/scarp/">DCC SCARP</a> wondered what the implications of incomplete specifications were for the concept of Representation Information in <a href="http://public.ccsds.org/publications/archive/650x0b1.pdf">OAIS</a> (RepInfo is often associated in examples with specifications).</p>
<p>He wrote:<br />
<blockquote>“I am interested in implications for areas (such as CAD software) where proprietary (secret sauce) formats are historically the norm. Is the legacy of digital working always preservable within an OAIS framework? […] Are there some limits in using an OAIS model if some &#8220;specifications&#8221; are inadequate or information is not available?”</p></blockquote>
<p>and in a later message<br />
<blockquote>“Do we need to have &#8220;access software&#8221; preserved (long term) if the other representation information is less complete in the case where standards for proprietary file formats (say like Microsoft word DOC format) are to a degree incomplete, less adequate or not available (perhaps more so in the case of older versions of file formats)?”</p></blockquote>
<p>Personally I think one of the advantages of Open Office is that it is not just Access Software, but Open Source Access Software. This should give it much greater longevity. But of course, such alternatives don’t exist in many areas, including many of the CAD formats Colin is concerned about.</p>
<p>Alan Morris from Morris and Ward asked the obvious question:<br />
<blockquote>“Who would even consider utilizing WORD as a preservation format?”</p></blockquote>
<p>… and got a surprising answer, from Peter Murray-Rust from the <a href="http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Main_Page">eponymous Cambridge research group</a>!<br />
<blockquote>“I would, and I argued this in my plenary lecture at <a href="http://or08.ecs.soton.ac.uk/">OpenRepositories08</a>. Not surprisingly it generated considerable discussion, from both sides.</p>
<p>“First the disclaimer. I receive research funding (though not personal funding) from Microsoft Research. Some of you may wish to stop reading now! But I don&#8217;t think it colours my judgment.</p>
<p>“My argument was not that Word2007 should be the only format, but that it should be used in conjunction with formats such as PDF. We have a considerable amount of work on [depositing] born-digital theses and we have recommended that theses should be captured in their original format (OOXML, ODT, LaTeX, etc.) as well as the PDF.</p>
<p>“I am a scientist (chemist) but generally interested in all forms of STM data (for example we collaborated in part of the KIM project mentioned a few emails ago). If you believe that preservation only applied to the holy &#8220;fulltext&#8221;, stop reading now. However I think many readers would agree that much of the essential information in STM work (experiments, data, protocols, code, etc.) is lost in the process of publication and reposition. Very frequently, however, the original born-digital work contains semantic information which can be retrieved. For example OOXML and ODT allow nearly 100% of chemical information (molecular structures) to be retrieved (in certain circumstances), whereas PDF allows 0% by default. (It is possible, though extremely difficult and extremely lossy, to turn PDF primitives back into chemistry)</p>
<p>“Note that we also work on Open Office documents and have a JISC-sponsored collaboration with Peter Sefton [of the <a href="http://www.usq.edu.au/adfi/default.htm">Australian Digital Futures Institute</a> of USQ in Australia] on his excellent <a href="http://ice.usq.edu.au/default.htm">ICE</a> system. We are exploring how easy it is to author chemistry directly into an ODT document and by implication into any compound semantic document (note that XML is the only practical way of holding semantics). […]”</p>
<p>“We&#8217;ve looked into using PDF for archiving chemistry and found that current usage makes this almost impossible. So we work with imperfect material.</p>
<p>“Note that Word2007 can emit OOXML that can be interpreted with Open Source tools. The conversion is not 100%, but whatever is? […]”</p>
<p>“I wonder whether all the detractors of OOXML have looked at it in detail. Yes, it is probably impossible to recreate all the minutiae of typesetting, but it preserves much of the embedded information that less semantic formats (PDF and even LaTeX) do not. If I have no commercial software and someone gives me a PDF of chemistry and someone else gives me OOXML I&#8217;d choose the OOXML. HTML is, in many cases, a better format than PDF.</p>
<p>“So my suggestion is simple. Use more than one document format. After all, do we really know what future generations want from the preservation process? It costs almost nothing as we are going to have to address compound documents and packaging anyway.”</p></blockquote>
<p>An anonymous contributor suggested that the appropriate course was to structure AIPs to contain both original source format and the preservation format. In the future, he asserted, better tools may exist to take the original source format and render a more completely accessible preservation format, particularly bearing in mind scientific notation.</p>
<p>Finally, <a href="http://www.cs.indiana.edu/%7Egeobrown/">Geoffrey Brown from Indiana</a> also argued in favour of keeping the original (and against NARA policy):<br />
<blockquote>“The Bush administration as well as various companies managed to embarrass themselves with inadvertently leaked information in the form of edit histories in Word documents. Migration will likely (who knows?) discard such information unless special care is taken in developing migration tools.</p>
<p>“I am uncomfortable with the assumption that we can abandon the original documents (as NARA seems to be doing by requiring(?) agencies to submit documents in PDF). The edit histories are part of the historical record; however, it&#8217;s safe to say that most patrons will be satisfied with the migrated document.</p>
<p>“Digital repositories have an obligation to figure out how to preserve access to documents in any format and not use format as a gatekeeper.”</p></blockquote>
<p>So… running code is better than specs as representation information, and Open Source running code is better than proprietary running code. And, even if you migrate on ingest, keep BOTH the rich format and a desiccated format (like PDF/A). It won’t cost you much and may win you some silent thanks from your eventual users!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Comments on OAIS responses to our comments on OAIS]]></title>
<link>http://dccbi.wordpress.com/2008/12/09/comments-on-oais-responses-to-our-comments-on-oais/</link>
<pubDate>Tue, 09 Dec 2008 23:01:00 +0000</pubDate>
<dc:creator>dccbi</dc:creator>
<guid>http://dccbi.wordpress.com/2008/12/09/comments-on-oais-responses-to-our-comments-on-oais/</guid>
<description><![CDATA[Yes, it sounds weird and it was, a bit. One of the workshops at the International Digital Curation C]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Yes, it sounds weird and it was, a bit. One of the workshops at the International Digital Curation Conference was to consider the proposed &#8220;dispositions&#8221; to the DCC/DPC comments on <a href="http://public.ccsds.org/publications/archive/650x0b1.pdf">OAIS</a>, made around two years ago! Sarah Higgins of the DCC and Frances Boyle of the DPC had an initial look and tried to work out which proposed dispositions we might have an issue with. A couple of the original participants made written comments in advance, but the rest of us (the original group or their successors, plus one slightly bemused visitor) had less than 3 hours to hammer through and see whether we could improve on the dispositions. The aim was to identify areas where we really felt the disposition was wrong, to the extent that it would seriously weaken the standard, AND we were able to provide comments of the form &#8220;delete xxxx, insert yyyy&#8221;. We did identify a few such, but were also pleased to discover that some areas where we thought there would be no change would in fact receive significant revision (although we haven&#8217;t yet seen the revision).</p>
<p>We understand that the process now is for the OAIS group (<a href="http://mailman.ccsds.org/mailman/listinfo/moims-dai">MOIMS-DAI</a>) to finalise their text, and then put it out for a second round of public comment early next year. After that, it will go through the <a href="http://www.ccsds.org/">CCSDS</a> review process (ie it gets voted on by Space Agencies interested in Mission Operations Information), before going on to ISO, where it gets voted on by National Bodies involved in its <a href="http://www.iso.org/iso/iso_technical_committee.html?commid=46612">Technical Committee</a>! So, don&#8217;t hold your breath!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Preservação Digital]]></title>
<link>http://preservacaodigital.wordpress.com/2008/11/24/preservacao-digital/</link>
<pubDate>Mon, 24 Nov 2008 15:52:45 +0000</pubDate>
<dc:creator>anaraqpereira</dc:creator>
<guid>http://preservacaodigital.wordpress.com/2008/11/24/preservacao-digital/</guid>
<description><![CDATA[Many topics have been discussed around digital preservation. Ever since the first]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Many topics have been discussed around digital preservation. Ever since the first records appeared, there has been a strong need to preserve knowledge for posterity. From the research we have been carrying out, we find that studies on this subject have been under way, especially with the i2010 programme, which envisages digitising all analogue material at both national and international level. Several metadata models have been proposed by international organisations, such as the <em>Open Archival Information System</em> (OAIS), which today serves as a reference model.</p>
<p>We will soon bring more news on this topic as a technological process.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Some interesting posts elsewhere]]></title>
<link>http://dccbi.wordpress.com/2008/11/05/some-interesting-posts-elsewhere/</link>
<pubDate>Wed, 05 Nov 2008 19:44:00 +0000</pubDate>
<dc:creator>dccbi</dc:creator>
<guid>http://dccbi.wordpress.com/2008/11/05/some-interesting-posts-elsewhere/</guid>
<description><![CDATA[I’m sorry for the gap in posting; I’ve been taking a couple of weeks of leave at the end of my trip ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I’m sorry for the gap in posting; I’ve been taking a couple of weeks of leave at the end of my trip to Australia. Since return I’ve been catching up on my blog reading, and there are some interesting posts around.</p>
<p>A couple of people (Robin Rice and Jim Downing in particular) have mentioned the post <a href="http://oxfordrepo.blogspot.com/2008/10/modelling-and-storing-phonetics.html"><span style="font-style:italic;">Modelling and storing a phonetics database inside a store</span></a>, from the Less Talk, More Code blog (Ben O&#8217;Steen). This is a practical report on the steps Ben took to put a database into a Fedora Commons-based repository. He details the analysis he went through, the mappings he made, the approaches to capturing representation information, to making the data citable at different levels of granularity, and an interesting approach that he calls “curation by addition”, which appears to be a way of curating the data incrementally, capturing provenance information of all the changes made. It’s a great report, and I look forward to more practical reports of this nature.</p>
<p>Quite a different post on peanutbutter (whose author might be Frank Gibson): <a href="http://peanutbutter.wordpress.com/2008/10/30/the-triumvirate-of-scientific-data/"><span style="font-style:italic;">The Triumvirate of Scientific Data</span></a> discusses ideas that he suggests relate to the significant properties of science data. His triumvirate comprises<br />
<blockquote>&#8220;content, syntax, and semantics, or more simply put -What do we want to say? How do we say it? What does it all mean?&#8221;</p></blockquote>
<p>Oddly, the discussion associated with this blog post is on <a href="http://friendfeed.com/e/d4fa7fd4-ebed-1010-a3fa-306cd148e57e/The-Triumvirate-of-Scientific-Data/">Friendfeed</a> rather than associated with the blog itself. Very interesting to see the discussion recorded like that, and in the process see at least one sceptic become more convinced!</p>
<p>To me, there seemed to be strong resonances between his argument and some of the OAIS concepts, particularly <a href="http://digitalcuration.blogspot.com/2007/07/representation-information-what-is-it.html">Representation Information</a>. However, content, syntax and semantics might be a more approachable set of labels than RepInfo!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[]]></title>
<link>http://losmasparados.wordpress.com/2008/10/28/112/</link>
<pubDate>Tue, 28 Oct 2008 01:40:43 +0000</pubDate>
<dc:creator>guacaroach</dc:creator>
<guid>http://losmasparados.wordpress.com/2008/10/28/112/</guid>
<description><![CDATA[In case you have no idea how to get to the concerts, or want to head back home right after the]]></description>
<content:encoded><![CDATA[In case you have no idea how to get to the concerts, or want to head back home right after the]]></content:encoded>
</item>
<item>
<title><![CDATA[iPres 2008 Preservation planning session]]></title>
<link>http://dccbi.wordpress.com/2008/09/29/ipres-2008-preservation-planning-session/</link>
<pubDate>Mon, 29 Sep 2008 15:42:00 +0000</pubDate>
<dc:creator>dccbi</dc:creator>
<guid>http://dccbi.wordpress.com/2008/09/29/ipres-2008-preservation-planning-session/</guid>
<description><![CDATA[Starting with Dirk von Suchodoletz from Freiburg, talking about emulation. I’ve always had a problem]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Starting with Dirk von Suchodoletz from Freiburg, talking about emulation. I’ve always had a problem with emulation; perhaps I’ve too long a memory of those early days of MS-DOS, when emulators were quite good at running well-behaved programs, but were rubbish at many common programs, which broke the rule-book and went straight into the interrupt vectors to get performance. OK, if you’re not that old, maybe emulation does work better these days, and is even getting trendy under the new name of virtualisation. My other problem is also a virtue of emulation: you will be presented with the object’s “original” interface, or look and feel. This sounds good, but in practice the world has moved on and most people don’t like old interfaces, even if historians may want them. I guess emulation can work well for objects which are “viewed”, in some sense; it’s not clear to me that one can easily interwork an emulated object with a current object. Dirk does point out that emulation does require running the original software, which itself may create licensing problems.</p>
<p>Gareth Knight talking about significant properties. We’ve discussed these before; they are conspicuously absent (is that an oxymoron???) from OAIS. An earlier speaker suggested the properties were less about the object than about the aims of the organisation and its community. But Gareth is pursuing the properties of the object, as part of the JISC-funded <a href="http://www.significantproperties.org.uk/">InSPECT</a> project (http://www.significantproperties.org.uk/). They have a model of significant properties, and are developing a data dictionary for SPs, which (after consultation) they expect to turn into an XML schema. Their model has an object with components with properties, and also agents that link to all of these. Each SP has an identifier, a title, a description and a function. It sounds a lot of work; not clear yet how much can be shared; perhaps most objects in one repository can share a few catalogued SPs. It seems unlikely that most repositories could share them, as repositories would have different views on what is significant.</p>
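<p>To make the shape of that model a little more concrete, here is a minimal Python sketch of an object with components, properties and agents, where each significant property carries an identifier, a title, a description and a function. Everything beyond those four field names (the class names, the field types, the <code>shared_properties</code> helper) is a hypothetical illustration, not the InSPECT data dictionary or its planned XML schema.</p>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """A person, organisation or tool linked to an object, component or property."""
    name: str


@dataclass
class SignificantProperty:
    # The four fields named in the talk; the types are assumptions.
    identifier: str
    title: str
    description: str
    function: str
    agents: List[Agent] = field(default_factory=list)


@dataclass
class Component:
    name: str
    properties: List[SignificantProperty] = field(default_factory=list)
    agents: List[Agent] = field(default_factory=list)


@dataclass
class InformationObject:
    name: str
    components: List[Component] = field(default_factory=list)
    agents: List[Agent] = field(default_factory=list)

    def property_ids(self) -> List[str]:
        """Identifiers of every significant property, in component order."""
        return [p.identifier for c in self.components for p in c.properties]


def shared_properties(a: InformationObject, b: InformationObject) -> List[str]:
    """Identifiers of properties catalogued once and used by both objects."""
    in_b = set(b.property_ids())
    return [pid for pid in a.property_ids() if pid in in_b]
```

<p>With two objects in one repository that both point at the same catalogued property, <code>shared_properties</code> returns that property&#8217;s identifier, which is the kind of within-repository sharing speculated about above.</p>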
<p>Alex Ball is talking about the <a href="http://www.ukoln.ac.uk/projects/grand-challenge/presentations/ball.patel2008aic.pdf">problems in curating engineering and CAD data</a>. In what appears to be a lose-lose strategy for all of us, engineering is an area with extremely long time requirements for preserving the data, but increasing problems in doing so given the multiple strangleholds that IPR has: on the data themselves, on the encodings and formats tied up in specific tightly controlled versions of high cost CAD software, coupled with “engineering as a service” approaches, which might encourage organisations to continue to tightly hold this IPR. An approach here is looking for light-weight formats (he didn’t say desiccated but I will) that data can be reduced to. They have a solution called LiMMA for this. Another approach is linking preservation planning approaches with Product Lifecycle Management. In this area they are developing a Registry/Repository of Representation Information for Engineering (RRoRIfE). Interesting comment that for marketing purposes the significant properties would include approximate geometry and no tolerances, but for manufacturing you would want exact geometry and detailed tolerances.</p>
<p>Finally for this session, Mark Guttenbrunner from TUV on evaluating approaches to preserving console video games. These systems started in the 1970s, and new generations and models are being introduced frequently in this very competitive area. It might sound trivial, but Lynne Brindley had earlier pointed out that one generation’s ephemera can be another generation’s important resource. In fact, there is already huge public interest in historical computer games. They used the PLANETS Preservation Planning approach to evaluate 3 strategies: one simple video preservation, and 2 emulation approaches. It was clear that IPR could become a real issue, as some games manufacturers are particularly aggressive in protecting their IPR against reverse engineering.</p>
<p>In the Q &#38; A session, I asked the panellists whether they thought the current revision process for OAIS should include the concept of significant properties, currently absent. A couple of panellists felt that it should, and one thought that the concept of representation information should be cleaned up first! Session Chair Kevin Ashley asked whether anyone present was involved in the revision of this critical standard, and no-one would admit to it; he pointed out how worrying this was.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[OAIS revision moving forward?]]></title>
<link>http://dccbi.wordpress.com/2008/09/08/oais-revision-moving-forward/</link>
<pubDate>Mon, 08 Sep 2008 19:31:00 +0000</pubDate>
<dc:creator>dccbi</dc:creator>
<guid>http://dccbi.wordpress.com/2008/09/08/oais-revision-moving-forward/</guid>
<description><![CDATA[Just over a year ago, in late August 2007 I wondered what was happening with the required review of ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Just over a year ago, in late August 2007 I wondered <a href="http://digitalcuration.blogspot.com/2007/08/oais-review-whats-happening.html">what was happening</a> with the required review of the <a href="http://public.ccsds.org/publications/archive/650x0b1.pdf">Open Archival Information System</a> standard, which was announced in June 2006, and for which comments closed in October 2006. Well, there is at last some movement. Just recently, the DCC and the Digital Preservation Coalition received notice of the &#8220;proposed dispositions, with rationale, to the suggestions which your organisation sent in response to the request for recommendations for updates to the OAIS Reference Model (ISO 14721)&#8221;. According to the email from John Garrett, Chair of the CCSDS Data Archiving and Ingest Working Group,<br />
<blockquote>&#8220;If you have feedback on these proposed dispositions please email them as soon as possible, and by 30th November 2008 at the latest, [...]</p>
<p>A revised draft of the full OAIS Reference Model is expected to be available on the Web in January 2009. There will then be a period for further comment before submission to ISO for full review.&#8221;</p></blockquote>
<p>Although a fair few of the dispositions are &#8220;No changes are planned&#8221;, a large number of changes are also proposed relating to the comments made by DCC/DPC. We have not yet had a chance to review them in detail, nor even to decide yet what the mechanism for this will be. But I am very much encouraged that progress is at last being made, and that more opportunities to interact with the development of this important standard will be available, even if it has not proved possible to find out the venue where the proposed changes have been discussed!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Public Record Office of Victoria Digital Archive (1)]]></title>
<link>http://80gb.wordpress.com/2008/09/08/public-record-office-of-victoria-digital-archive-1/</link>
<pubDate>Mon, 08 Sep 2008 12:02:04 +0000</pubDate>
<dc:creator>80gb</dc:creator>
<guid>http://80gb.wordpress.com/2008/09/08/public-record-office-of-victoria-digital-archive-1/</guid>
<description><![CDATA[Developing the Digital Archive at PROV Australian archival theory emphasises the continuum of record]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><strong>Developing the Digital Archive at PROV</strong></p>
<p>Australian archival theory emphasises the continuum of recordkeeping, with good records management seen as the core foundation of archival practice.  This continuity is well illustrated by the <a title="PROV Digital Archive" href="http://www.prov.vic.gov.au/vers/digitalarchive/" target="_blank">Digital Archive</a> project at the <a title="Public Record Office of Victoria" href="http://www.prov.vic.gov.au" target="_blank">Public Record Office of Victoria</a> (PROV).</p>
<p>Funding for the development of the Digital Archive was won on the back of previous successes in digital recordkeeping, beginning as far back as 1995 in the report &#8216;<a title="Keeping Electronic Records Forever" href="http://www.prov.vic.gov.au/vers/pdf/kerf.pdf" target="_blank">Keeping Electronic Records Forever</a>&#8217; and the resulting specification of a <a title="VERS" href="http://www.prov.vic.gov.au/vers/vers/" target="_blank">Victorian Electronic Records Strategy</a> (VERS).</p>
<p>With firm backing at a strategic level within the Victorian State government, the project appears to have flourished in spite of its complexity in terms of both the project structure and the technical implementation.  In addition to staff contracted specially for the project, external contractors and consultants, PROV also seconded around a fifth of its own staff to the Digital Archive over the life of the implementation project, in order to ensure good staff buy-in and familiarity with the Digital Archive workflows.  This contribution was included in annual performance appraisals for the year, and emphasised as important to personal development.  Since several of the contract staff were also retained at PROV after the end of the project, knowledge of the system has remained within PROV, and the operation and maintenance of the Digital Archive is now well integrated into the general organisational structure.</p>
<div id="attachment_59" class="wp-caption alignnone" style="width: 310px"><a href="http://80gb.wordpress.com/files/2008/09/prov2.jpg"><img class="size-medium wp-image-59" title="prov2" src="http://80gb.wordpress.com/files/2008/09/prov2.jpg?w=300" alt="Public Record Office of Victoria" width="300" height="225" /></a><p class="wp-caption-text">Public Record Office of Victoria</p></div>
<p>Although larger than the typical local Archive Service in the UK, with 67 FTE staff, PROV is not a very large institution and it was felt important to integrate the Digital Archive into existing systems.  Technical support for the ongoing operation of the Digital Archive consists of just three staff members.  It was noted that OAIS was found useful in establishing a common vocabulary at the very beginning of the project, but was too high level to be of use in the tender and design phases.  OAIS also views the Archival Information System as a standalone function rather than as something that fits into a pre-existing organisation and its automated systems (such as the PROV catalogue).  This integration &#8211; with the PROV archival control database and with existing workflow models for processing new accessions &#8211; proved to be one of the most challenging aspects of the whole project.  Many of the processes followed in the analogue world can be streamlined in the digital, although a practice of repeated trial runs before completing a transfer has been found to work better than PROV staff attempting to correct errors in the records submitted.</p>
<p>The objectives were to keep digital archives for the long term, defined as up to 100 years &#8211; the project found that this concrete period was far easier for the IT contractors to grasp than a vague aspiration of keeping records indefinitely.  The solution had to be cost effective over time, and to allow (online) access immediately &#8211; unlike the National Archives of Australia, the State of Victoria&#8217;s Public Records Act does not include a &#8216;30 Year [closure] Rule&#8217;.  Building from these considerations and the previous work on VERS, the decision was taken to use a single <a title="VERS long term format" href="http://www.prov.vic.gov.au/vers/standard/spec_03/default.htm" target="_blank">long-term format</a>, a so-called VEO or Victorian Electronic Object.  Essentially this is an XML wrapper containing the original digital object plus a normalised version (for example, PDF/A) and associated metadata to enable management of the digital record over time.  This VEO is used as both a Submission Information Package (SIP) <em>and </em>Archival Information Package (AIP) under the OAIS model, and as the basis for the Dissemination Information Package (DIP).  The whole package, which may contain anything from a single record to hundreds, each consisting of multiple encodings, is then locked and signed with a digital signature.  Future users will be able to verify that each record has remained unaltered since its transfer from the government Agency which deposited it at PROV.</p>
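<p>To make the wrap-and-seal idea concrete, here is a minimal Python sketch. It is not the VERS format: the element names (<code>Package</code>, <code>Content</code>, <code>Title</code>, <code>Encoding</code>, <code>Seal</code>) and the function names are invented for illustration, and a plain SHA-256 digest stands in for the real digital signature, which would need a proper key pair.</p>

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def _digest(content):
    # Hash the serialised Content element; any change inside it changes the hash.
    return hashlib.sha256(ET.tostring(content)).hexdigest()


def build_package(original, normalised, title):
    """Wrap both encodings plus a little metadata, then seal the content."""
    root = ET.Element("Package")
    content = ET.SubElement(root, "Content")
    ET.SubElement(content, "Title").text = title
    for kind, data in (("original", original), ("normalised", normalised)):
        encoding = ET.SubElement(content, "Encoding", {"kind": kind})
        # Binary payloads are base64-encoded so they can live inside XML.
        encoding.text = base64.b64encode(data).decode("ascii")
    # Assumption: a digest stands in for the digital signature.
    ET.SubElement(root, "Seal").text = _digest(content)
    return root


def is_unaltered(root):
    """Recompute the digest and compare it with the stored seal."""
    return _digest(root.find("Content")) == root.find("Seal").text
```

<p>Building a package with <code>build_package(b"raw bytes", b"normalised bytes", "Minute book")</code> and passing it to <code>is_unaltered</code> should return <code>True</code>; changing the title afterwards should make the check fail, which is the property future users would rely on.</p>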
<p><a title="PROV long term preservation formats" href="http://www.prov.vic.gov.au/vers/standard/spec_04/default.htm" target="_blank">Long-term preservation formats</a> are chosen on the basis of being open specifications and widely used (so that good translation tools are likely to remain available even after the format itself becomes obsolescent in the future).  The current specification lists plain text, PDF (with a preference for PDF/A), JPG, JPEG2000, TIFF and MPEG-4.  Interestingly, PROV have not yet begun to work with database formats, which appears to be one of the more immediate issues facing West Yorkshire Archive Service if a recent records management audit at Leeds City Council is anything to go by.</p>
<p>Bolstered by strong archival legislation, the PROV Digital Archive workflow pushes as much of the load back onto the depositing government Agencies as possible.  Agencies are required to create the VEOs (on transfer) &#8211; software suppliers can participate in a compliance programme run by PROV to certify which products meet the VERS specifications and can generate valid VEOs.  Currently five products are fully compliant, with several more close to completing the programme.</p>
<p>Unlike the National Archives of Australia, PROV do not maintain a &#8216;dark archive&#8217;.  Digital accessions are securely stored on Centera file systems, which cannot be accessed by standard protocols and are continually monitored to detect corruption and repair failures.  Off-site back-up copies are automatically updated.  Controlling the whole system is a customised version of Documentum&#8217;s Enterprise Content Management platform &#8211; this proved costly in the short term, and is likely to continue to be a problematic and expensive aspect of the Digital Archive until such time as a public domain engine might become available to run an automated workflow on ingest.</p>
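<p>The monitor-and-repair behaviour described above can be illustrated in a few lines of Python. This is a generic fixity check, assuming each stored object has a SHA-256 digest recorded at ingest; it says nothing about how Centera works internally, and the function and location names are hypothetical.</p>

```python
import hashlib


def sha256(data):
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(copies, expected_digest):
    """Check every copy against the recorded digest and overwrite bad ones.

    copies maps a location name to the bytes stored there.
    Returns the names of the locations that had to be repaired.
    """
    good = [name for name, data in copies.items()
            if sha256(data) == expected_digest]
    if not good:
        raise ValueError("no intact copy left to repair from")
    repaired = []
    for name in copies:
        if name not in good:
            # Replace the corrupted copy with the first intact one.
            copies[name] = copies[good[0]]
            repaired.append(name)
    return repaired
```

<p>The point of the design is that repair is only possible while at least one copy still matches the recorded digest, which is why the off-site copies matter as much as the monitoring.</p>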
<p>Access to preserved digital records for the public is via the PROV website at <a href="http://www.access.prov.vic.gov.au" target="_blank">http://www.access.prov.vic.gov.au</a>.  Different icons indicate the format of individual items &#8211; a sheet of paper for analogue archives, a camera for digitised material, a disk for born-digital.  Digital archives are delivered to the user in their long-term preservation format, although the full VEO can also be downloaded and the user can manually extract other formats from the VEO.  PROV decided not to require users to log in to obtain digital archives &#8211; this was felt to be an audit control needed in the reading room which was not necessary in the digital world.  The Digital Archive currently holds over 300,000 VEOs, many of them digitised images from PROV&#8217;s own paper-based collections.  Use of the PROV website has increased enormously.</p>
<p>Lessons learned from the project and subsequent operation of the Digital Archive included the importance of strong project management and stakeholder communications, and the need to have developers working on site rather than contracting the project overseas.  The bulk sometimes encountered in archival accessions has created bottlenecks at certain points along the Digital Archive workflow.  Customising off-the-shelf software is expensive.  Further training is required for Agency staff who prepare the records for transfer, and simplifications to the procedures may be required.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Claimed 200 year media life]]></title>
<link>http://dccbi.wordpress.com/2008/07/28/claimed-200-year-media-life/</link>
<pubDate>Mon, 28 Jul 2008 12:04:00 +0000</pubDate>
<dc:creator>dccbi</dc:creator>
<guid>http://dccbi.wordpress.com/2008/07/28/claimed-200-year-media-life/</guid>
<description><![CDATA[Graeme Pow spotted the announcement a few weeks back of Delkin&#8217;s Archival Blu-ray media: ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Graeme Pow <a href="http://forum.dcc.ac.uk/viewtopic.php?p=503#503">spotted</a> the announcement a few weeks back of Delkin&#8217;s Archival Blu-ray media:<br />
<blockquote>&#8220;New from Delkin Devices is <a href="http://www.delkin.com/products/archivalgold/archival-blue-ray-delkin.html">Archival Gold Blu-ray</a>, a recordable disc that <span style="font-style:italic;">guarantees to preserve data safely for over 200 years. In addition to unprecedented longevity standards, Delkin BD-R boast a market-leading read/write speed of 4x, enabling a 25GB burn to be completed in only 23 minutes</span> &#8212; and the disc is coated in ScratchArmour, a scratch-proof coating that claims to be 50 times better than other coatings. &#8220;</p></blockquote>
<p>Italicised part is a direct quote from Delkin&#8217;s site. Delkin also claim at the site linked above:<br />
<blockquote>&#8220;Delkin archival Blu-ray (BD-R) discs offer the longest guaranteed protection over time.&#8221;</p></blockquote>
<p>There is a fair amount of scepticism about such claims, for example see the cdfreaks forum <a href="http://club.cdfreaks.com/f33/delkin-archival-gold-dvd-r-any-experience-179523/">thread on archival gold DVD-Rs</a>&#8230;</p>
<p>So what does this mean?
<ul>
<li>Well firstly they clearly haven&#8217;t run the media for 200 years, so presumably this is the result of some accelerated aging tests extrapolated into &#8220;normal&#8221; use.</li>
<li>Secondly, if the disk did fail however many years into the future, who would your successors claim from, and what? I don&#8217;t have a copy of the &#8220;guarantee&#8221;, but my guess is you&#8217;d at best get replacement media cost, not the lost content value (a wag on cdfreaks suggested that the guarantee was only available to the original purchaser <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />).</li>
<li>Thirdly, if your successors still have the disk in 200 years, what are the odds of having hardware and software to read it? Close to zero I guess.</li>
<li>Fourthly, would you even know what&#8217;s on it? Oh, you wrote a label on the outside, did you? Or perhaps wrote in marker on the disk? Or&#8230;</li>
</ul>
<p>It might seem like a good thing to have disks that last for much longer than the devices. After all, that would remove one error source, and would mean you could use your devices to help in the transfer to newer media. However, assuming the media really were robust, this might lead to a temptation to ignore the transfer, until you (or your descendants or successors) suddenly find your last Blu-ray device has failed!</p>
<p>In practice, of course, relying on <span style="font-weight:bold;">any</span> such claims would be foolish. It&#8217;s not a bad idea to use good quality media, but I would want to choose it based on testing from an independent lab rather than manufacturers&#8217; claims. And then it should fit into a well-planned strategy of media management. I guess it would be wise to keep some archival media of the same type but non-critical  contents, and test them from time-to-time, watching for increased error rates.</p>
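<p>As a rough illustration of what &#8220;watching for increased error rates&#8221; on test media might mean in practice, here is a small Python sketch. The threshold, the window of three readings and the function name are all assumptions of mine, not anything taken from a standard or from a manufacturer.</p>

```python
def needs_migration(error_rates, threshold, window=3):
    """Decide whether periodic test readings suggest it is time to copy off.

    error_rates is a list of readings from successive tests, oldest first.
    Returns True if the latest reading has reached the threshold, or if
    the last `window` readings are strictly rising.
    """
    if not error_rates:
        return False
    if error_rates[-1] >= threshold:
        return True
    recent = error_rates[-window:]
    if len(recent) != window:
        # Not enough history yet to call it a trend.
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

<p>Flagging a steady rise as well as an outright breach is deliberate: the aim is to schedule the transfer while the media can still be read comfortably.</p>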
<p>Interestingly, I&#8217;ve just re-read (yet again) the <a href="http://public.ccsds.org/publications/archive/650x0b1.pdf">OAIS</a> section 5, which covers these issues. I can&#8217;t say it&#8217;s a lot of help for most people wanting advice. It refers to the whole issue as one of &#8220;Digital Migration&#8221; (not the customary use of the word migration in digital preservation circles these days, where it is mostly used in contrast to emulation), and lists 4 types:<br />
<blockquote>
<ul>
<li>Refreshment: A Digital Migration where a media instance, holding one or more AIPs or parts of AIPs, is replaced by a media instance of the same type by copying the bits on the medium used to hold AIPs and to manage and access the medium. As a result, the existing Archival Storage mapping infrastructure, without alteration, is able to continue to locate and access the AIP.</li>
<li>Replication: A Digital Migration where there is no change to the Packaging Information, the Content Information and the PDI. The bits used to convey these information objects are preserved in the transfer to the same or new media-type instance. Note that Refreshment is also a Replication, but Replication may require changes to the Archival Storage mapping infrastructure.</li>
<li>Repackaging: A Digital Migration where there is some change in the bits of the Packaging Information.</li>
<li>Transformation: A Digital Migration where there is some change in the Content Information or PDI bits while attempting to preserve the full information content.</li>
</ul>
</blockquote>
<p>So that&#8217;s clear, then. I think the first two are merely replacement by identical media. Later text makes clear that the third is the case we would be considering here, ie the replacement of one media type with another, while the fourth would represent what we would currently call migration, ie making some change to the information object in order to preserve its information content. However, despite quite a bit of discussion on scenarios of the various digital migration types, I could not see much in the way of good practice advice there.</p>
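<p>For what it is worth, the four quoted definitions can be boiled down to a small decision function. The Python sketch below follows only the wording of the quoted text; the parameter names are mine, and the order of precedence when more than one thing changes is an assumption rather than something the standard states in the passage above.</p>

```python
def classify_migration(content_or_pdi_changed, packaging_changed,
                       same_media_type, mapping_changed):
    """Name the OAIS Digital Migration type from what has changed."""
    if content_or_pdi_changed:
        # Some change in the Content Information or PDI bits.
        return "Transformation"
    if packaging_changed:
        # Some change in the bits of the Packaging Information.
        return "Repackaging"
    if same_media_type and not mapping_changed:
        # Bit copy to the same media type, mapping infrastructure untouched.
        return "Refreshment"
    # No change to the information bits; media type or mapping may differ.
    return "Replication"
```

<p>Since the quoted text says that Refreshment is also a Replication, the function returns the more specific name when both would apply.</p>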
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[JIF08 Technical Infrastructure session]]></title>
<link>http://dccbi.wordpress.com/2008/07/16/jif08-technical-infrastructure-session/</link>
<pubDate>Wed, 16 Jul 2008 16:55:00 +0000</pubDate>
<dc:creator>dccbi</dc:creator>
<guid>http://dccbi.wordpress.com/2008/07/16/jif08-technical-infrastructure-session/</guid>
<description><![CDATA[At the second day of the JISC Innovation Forum, I attended an interesting discussion in the data the]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>At the second day of the JISC Innovation Forum, I attended an interesting discussion in the data theme on technical infrastructures. This post derives from that discussion, but is neither a complete reflection of what was said, nor at all in the order of discussion; it reflects some bits I found interesting. My thanks to Matthew Dovey who chaired, and to all who contributed; I’ve identified none rather than only some! The session is also being blogged <a href="http://jif08.jiscinvolve.org/2008/07/16/session-3-technical-and-infrastructure-issues/">here…</a></p>
<p>Although early on we were talking about preservation, we came back to a discussion on immediate re-use and sharing; it seems to me that these uses are more powerful drivers than long term preservation. If we organise things right, long term preservation could be a cheap consequence of making things available for re-use and sharing. Motivations quoted here include data as evidence, the validation aspects of the scientific method. We were cautioned that validation from the data is hard; you will need the full data and analysis chain to be effective (more data, providing better provenance…).</p>
<p>Contextual metadata might include (parts of) the original proposal (disclosing purpose and motivation). Funders keep these proposals, as part of their records management systems, but they might not be easily accessible and are likely eventually to be deleted. An action for JISC might be to find ways of making parts of these proposals more appropriately available to support data.</p>
<p>There are issues of the amount of disclosure or discoverability that people want, or maybe should offer whether they want or not; we touched tangentially on open notebook science. Scary, but so is sharing data, for some! Extending re-usability through re-interpreting to integration may be big steps.</p>
<p>To what extent does standardisation help? Well, quite a lot, obviously, although science is slow and science is innovative, so many researchers will either have no appropriate standards to use, or not know about the standards, or have poor or no tools to implement/use the standards, or maybe find the standards less than adequate and overload them with their own private extensions or encodings. And to some extent re-use could be about the process as much as the data (someone spoke of preserving workflows, and someone of preserving protocols, although “process” may be much less formal than either…).</p>
<p>We had a recurring discussion about the extent this feeds back to being a methodological question. There was a question about the efficacy of research practice; how do we maximise positive outcomes? Should we aim to change science to achieve better curation? While in the large scale this sounds exceedingly unlikely, research being exceedingly resistant to external change, successes were reported. BADC have managed to get scientists to change their methods to collect better metadata, and maybe better data. Maybe some impact could be made via research training curricula? Another opportunity for JISC?</p>
<p>We talked about the selection of useful data; essential, but how can it be achieved? Some projects, such as the LHC, have this built into the design, as the volumes are orders of magnitude too large to deal with otherwise. While low-retention selection might be attractive, others were arguing elsewhere for retaining more of the data that are currently lost, to improve provenance (ie successive refinement stages).</p>
<p>The idea of institutional data audit was mentioned; there is a pilot data audit framework being developed now, and at least one institution in the room was participating in the pilot process. This might be a way of bringing issues to management attention, so that institutions can understand their own needs. Extending this more widely may be useful.</p>
<p>We talked about what a research library could do, or maybe has to start thinking of doing. On the small scale, success in digital thesis deposit is bringing problems in managing associated data files (often of widely varied types: audio, video, surveys, spreadsheets, databases, instrumentation outputs, etc). At the moment the answer is often to ZIP up the supplementary files and cross your fingers (as close to benign neglect as possible!). This is very close to the approach in use for years with paper theses, where supplementary materials were often in portable media in a pocket inside the back cover… close the volume and put it on the shelves. This might be acceptable with a trickle, but not a deluge… What would be a better approach in dealing with this growing problem (exacerbated by issues related to the well-known neologism problems of theses, ie that young researchers will likely not use applicable standards in quite the right way)? It was clear that dealing with this stuff is beyond the expertise of most librarians, and requires some sort of partnership with domain scientists. This whole area could represent an opportunity for JISC to help.</p>
<p>Since we were thinking of institutional repositories and larger scale subject repositories, we had a skirmish on the extent to which we need a full-blown OAIS, as it were, or a minimal, just-enough effort. It seemed in a sense the answer was: both! It does make sense to think carefully about OAIS, but also to make sure you do just enough, to ensure that high preservation costs in one area are not preventing collection in other areas (selection anyway being so fallible). A good infrastructure should help keep it cheap. There is a question on how you would know if you were paying too much in effort, or too little? Perhaps repository audit and certification, or perhaps risk assessment might help.</p>
<p>This brought us on to the issue of tool development. Most existing, large scale data centres use home-built, filestore-based systems, not very suitable for small-scale use. Existing repository software platforms are not well suited to data (square peg in round hole was quoted!). Funding developments to improve the fit for data was seen as a possible role for JISC. Adding some better OAIS capabilities at the same time might also be useful, as might linking to Virtual Research Environments such as Sakai, or to Current Research Information Systems (CRIS’s). Is the CERIF standard developed by <a href="http://www.eurocris.org/">EuroCRIS</a> helpful, or not?</p>
<p>Overall, it was a useful session, and if JISC takes up some of the opportunities for development suggested, it should prove doubly, trebly or even more useful!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[LIFE2 Project Conference, British Library, 26 June 2008]]></title>
<link>http://80gb.wordpress.com/2008/06/23/life2-project-conference-british-library-26-june-2008/</link>
<pubDate>Mon, 23 Jun 2008 18:30:30 +0000</pubDate>
<dc:creator>80gb</dc:creator>
<guid>http://80gb.wordpress.com/2008/06/23/life2-project-conference-british-library-26-june-2008/</guid>
<description><![CDATA[Having read through the original LIFE project documentation, I was looking forward to the project co]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Having read through the original <a href="http://www.life.ac.uk" target="_blank">LIFE project</a> documentation, I was looking forward to the project conference for the follow-on research, <a href="http://www.life.ac.uk/2/" target="_blank">LIFE2</a>.  It is all too common, unfortunately, to hear doom-laden rumours being peddled about the supposed high costs of digital preservation, often in contexts where this truism becomes a convenient excuse to avoid addressing the real challenges of digital curation and preservation.  Chris Rusbridge argued in an <a href="http://www.ariadne.ac.uk/issue46/rusbridge/">ARIADNE article</a> that digital preservation being expensive is simply a fallacy.  For the local government archives context at least, it seems to me that there is simply insufficient evidence to either support or discredit the assertion.  Would LIFE2 offer us an objective tool to assess the likely costs of  developing and running a digital preservation service for local government?</p>
<p><a href="http://www.life.ac.uk/2/">LIFE2</a> promised a revised lifecycle costing model, including mappings to relevant digital preservation standards such as <a href="http://public.ccsds.org/publications/archive/650x0b1.pdf">OAIS</a>, clearer element descriptions, and a new set of case studies, including an examination of non-born digital newspaper material.  This case study was designed to allow for the comparison of analogue and digital lifecycles and to begin a cost comparison.</p>
<p>Whilst the revised LIFE2 model is more closely aligned in terminology to OAIS, and the elements now appear in a more logical order, I admit I was disappointed that the model does not seem as transferable to the local government archives context as I had hoped.  As <a href="http://www.beagrie.com/" target="_blank">Neil Beagrie</a> pointed out in his presentation on the costs of curating research data, the model excludes infrastructure costs such as the start-up costs of building a digital repository or of maintaining a technology watch service.  This means that the tool could not be used in business cases comparing in-house curation of digital material with outsourcing &#8211; the major decision facing me at <a href="http://www.archives.wyjs.org.uk" target="_blank">West Yorkshire Archive Service</a>.  Some attempt to address this shortcoming is being made with the development of a Generic Preservation Model (GPM), which will be released at the end of the LIFE2 Project.</p>
<p>The newspaper case study also ran into difficulties around differing patterns of access and the problems of retrospective costing.  Although the case study continued using a per entity costing model to compare the costs of preserving a digitised newspaper collection with a year&#8217;s analogue curation costs of legal deposit newspapers, the results, although interesting, are not truly comparable.</p>
<p>I had hoped I might be able to use the tool to compare the not inconsiderable costs of building and fitting out an archives building for traditional materials conforming to BS5454, with the costs of developing automated tools and digital storage and management capacity for born-digital and hybrid collections.  There was discussion at the end of the day about how the LIFE project might progress in a next phase, which &#8211; promisingly &#8211; included the development of a predictive tool for costing, further case studies and scenario building, and a proposal that comparison studies are made between the costs of a shared preservation service versus an in-house digital repository.</p>
<p>There was also extensive discussion during the panel session about the need to demonstrate the <em>value </em>of digital preservation, particularly to funding bodies.  The LIFE tool offers a method for digital repositories to assess costs; different kinds of value assessments are required to convince funders.  The point was also made that the more significant properties of digital objects a repository attempts to preserve, the greater the cost &#8211; making me ponder on the potential for integrating the lifecycle costing models into preservation planning tools, such as <a href="www.ifs.tuwien.ac.at/dp/plato/" target="_blank">PLATO</a>.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Digital Preservation begins... in Italy?]]></title>
<link>http://80gb.wordpress.com/2008/06/10/digital-preservation-begins-in-italy/</link>
<pubDate>Tue, 10 Jun 2008 20:58:43 +0000</pubDate>
<dc:creator>80gb</dc:creator>
<guid>http://80gb.wordpress.com/2008/06/10/digital-preservation-begins-in-italy/</guid>
<description><![CDATA[At the DELOS Summer School on Preservation in Digital Libraries, which so far more than lives up to ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I am at the <a title="DELOS Summer School on Preservation in Digital Libraries" href="http://www.dpc.delos.info/ss08/" target="_blank">DELOS Summer School on Preservation in Digital Libraries</a>, which so far more than lives up to my expectation that it will provide an excellent overview of current and emerging digital preservation research and practices.</p>
<p>A comforting thought for local government Archive Services from the presentations on day 1:</p>
<ul>
<li>that digital preservation is as much about organisational and cultural challenges as technical ones. Something we can all start to address &#8211; now.</li>
</ul>
<p>And also a rather more concerning one:</p>
<ul>
<li>the emphasis in <a title="Open Archival Information System" href="http://public.ccsds.org/publications/archive/650x0b1.pdf" target="_blank">OAIS</a> on serving a <em>designated community, </em>something which is in any case hard to determine for traditional record offices with geographically defined collection policies. Priscilla Caplan also points out that the OAIS requirement that material preserved in an archival system should remain <em>understandable </em>to that community has not usually been part of the mandate of ordinary libraries or archives.</li>
</ul>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[The negative cost repository, and other archive services]]></title>
<link>http://dccbi.wordpress.com/2008/06/04/the-negative-cost-repository-and-other-archive-services/</link>
<pubDate>Wed, 04 Jun 2008 01:41:00 +0000</pubDate>
<dc:creator>dccbi</dc:creator>
<guid>http://dccbi.wordpress.com/2008/06/04/the-negative-cost-repository-and-other-archive-services/</guid>
<description><![CDATA[I&#8217;ve been at a meeting of research libraries here in Philadelphia these past two days; a topic]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I&#8217;ve been at a meeting of research libraries here in Philadelphia these past two days; a topic that came up a bit was the sorts of services that libraries might offer individuals and research groups in managing their research collections. I was reminded of my post last year about internal Edinburgh <a href="http://digitalcuration.blogspot.com/2007/08/archiving-service.html">proposals for an archive service</a>. Subsequent to that it struck me that there is quite a range of services that could be offered by some combination of Library and IT services; I mentioned some of these, and there seemed to be some resonance. There could well be more, but my list included:
<ul>
<li>a managed current storage system with &#8220;guaranteed&#8221; backup, possibly related to the unit or department rather than individual</li>
<li>a &#8220;bit bucket&#8221; archive for selected data files, to be kept in some sense as a record (perhaps representing some critical project phase) for extended periods, probably with mainly internal access (but possibly including access by external partners, ie &#8220;semi-internal&#8221;). Might conflate to&#8230;</li>
<li>a data repository, which I would see as containing all or most data in support of publication. This would need to be static (the data supports the publication and should represent it), but might need to have some kind of managed external access. This might extend to&#8230;</li>
<li>a full-blown digital preservation system, ie with some commitment to OAIS-type capabilities, keeping the data usable. As well as that we have the now customary (if not very full)&#8230;</li>
<li>publications repository, or perhaps this might grow to be&#8230;</li>
<li>a managed publications system providing support for joint development of papers and support for publication submission, and including retention &#38; exposure of drafts or final versions as appropriate.</li>
</ul>
<p>I really like the last idea, which I have seen various references to. Perhaps we could persuade people to deposit if the cost of deposit was <span style="font-style:italic;">LESS</span> than the cost of non-deposit. The negative-cost repository, I like that!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[]]></title>
<link>http://coffrefortelectronique.wordpress.com/2008/04/15/229/</link>
<pubDate>Tue, 15 Apr 2008 15:21:31 +0000</pubDate>
<dc:creator>Cecurity.com</dc:creator>
<guid>http://coffrefortelectronique.wordpress.com/2008/04/15/229/</guid>
<description><![CDATA[What are the similarities, differences and complementarities between the two reference texts]]></description>
<content:encoded><![CDATA[What are the similarities, differences and complementarities between the two reference texts]]></content:encoded>
</item>
<item>
<title><![CDATA[Representation information from the planets?]]></title>
<link>http://dccbi.wordpress.com/2008/04/14/representation-information-from-the-planets/</link>
<pubDate>Mon, 14 Apr 2008 19:18:00 +0000</pubDate>
<dc:creator>dccbi</dc:creator>
<guid>http://dccbi.wordpress.com/2008/04/14/representation-information-from-the-planets/</guid>
<description><![CDATA[Well, from the PLANETS project actually. A PLANETS report written by Adrian Brown of TNA on Represen]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Well, from the <a href="http://www.planets-project.eu/">PLANETS</a> project actually. A PLANETS report written by Adrian Brown of TNA on <a href="http://www.planets-project.eu/docs/reports/Planets_PC3-D7_RepInformationRegistries.pdf">Representation Information Registries</a>, drawn to our attention as part of the reading for the Significant Properties workshop, contains the best discussion on representation information I have seen yet (just in case, I checked the <a href="http://www.casparpreserves.eu/">CASPAR</a> web site, but couldn’t see anything better there). No doubt nearly all of the information is in the OAIS spec itself, but it’s often hard to read, with discussion of key concepts separated in different parts of the spec.</p>
<p>Just to recap, the OAIS formula is that a Data Object interpreted using its Representation Information yields an Information Object. Examples often cite specifications or standards, eg suggesting that the Repinfo (I’ll use the contraction instead of “representation information”) for a PDF Data Object might be (or include) the PDF specification.</p>
<p>Sometimes there is controversy about repinfo versus format information (often described by the repinfo enthusiasts as “merely structural repinfo”). So it’s nice to read a sensible comparison:<br />
<blockquote>&#8220;For the purposes of this paper, the definition of a format proposed by the <a href="http://hul.harvard.edu/gdfr/">Global Digital Format Registry</a> will be used:</p>
<p><span style="font-style:italic;">“A byte-wise serialization of an abstract information model”. </span></p>
<p>The GDFR format model extends this definition more rigorously, using the following conceptual entities:</p>
<p>• Information Model (<span style="font-style:italic;">IM</span>) – a class of exchangeable knowledge.<br />• Semantic Model (<span style="font-style:italic;">SM</span>) – a set of semantic information structures capable of realizing the meaning of the <span style="font-style:italic;">IM</span>.  <br />• Syntactic Model (<span style="font-style:italic;">CM</span>) – a set of syntactic data units capable of expressing the SM.<br />• Serialized Byte Stream (<span style="font-style:italic;">SB</span>) – a sequence of bytes capable of manifesting the CM.</p>
<p>This equates very closely with the OAIS model, as follows:</p>
<p>• Information Model (<span style="font-style:italic;">IM</span>) = OAIS Information Object <br />• Semantic Model (<span style="font-style:italic;">SM</span>) = OAIS Semantic representation information <br /> • Syntactic Model (<span style="font-style:italic;">CM</span>) = OAIS Syntactic representation information<br /> • Serialized Byte Stream (<span style="font-style:italic;">SB</span>) = OAIS Data Object&#8221;</p></blockquote>
<p>This does seem to place repinfo and format information (by this richer definition) in the same class.</p>
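<p>To make the OAIS formula concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class names, the <code>interpret</code> function and the two-field comma-separated record are all hypothetical, and real repinfo is far richer than an encoding name plus a list of field names. The encoding stands in for structure repinfo and the field names for semantic repinfo.</p>

```python
from dataclasses import dataclass


@dataclass
class RepresentationInfo:
    # Hypothetical, heavily simplified stand-ins for OAIS repinfo.
    encoding: str      # structure repinfo: how the bytes map to characters
    field_names: list  # semantic repinfo: what each value means


@dataclass
class InformationObject:
    fields: dict


def interpret(data_object, repinfo):
    """Data Object interpreted using its repinfo yields an Information Object."""
    text = data_object.decode(repinfo.encoding)
    values = text.strip().split(",")
    return InformationObject(dict(zip(repinfo.field_names, values)))


# The Data Object is just a byte sequence; on its own it carries no meaning.
data_object = "Edinburgh,2008".encode("utf-16")

repinfo = RepresentationInfo(encoding="utf-16", field_names=["place", "year"])
info = interpret(data_object, repinfo)
print(info.fields)
```

<p>Without the repinfo, the bytes are just bytes; with it, they become a record about a place and a year. The same split appears in the GDFR mapping quoted above: the byte stream is the Data Object, and the syntactic and semantic models are what turn it into information.</p>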
<p>Time for a short diversion here. I was quite taken by the report on significant properties of software, <a href="http://www.dpconline.org/docs/events/080407sigpropsMatthews.pdf">presented</a> at the workshop by Brian Matthews (not that it was perfect, just that it was a damn good effort at what seemed to me to be an impossible task!). He talked about specifications, source code and binaries as forms of software. Roughly the cost of instantiating goes down as you move across those 3 (in a current environment, at least).</p>
<ul>
<li>In preservation terms, if you only have a binary, you are pretty much limited to preserving the original technology or emulating it, but the result should perform “exactly” as the original.</li>
<li>If you have the source code, you will be able to (or have to) migrate, configure and re-build it. The result should perform pretty much like the original, with “small deviations”. (In practice, these deviations could be major, depending on what’s happened to libraries and other dependencies meanwhile.)</li>
<li>If you only have the spec, you have to re-write from scratch. This is clearly much slower and more expensive, and Brian suggests it will “perform only gross functionality”. I think in many cases it might be better than that, but in some cases much worse (eg some of the controversy about the Microsoft-based OOXML standard with MS internal dependencies).</li>
</ul>
<p>So on that basis, a spec as Repinfo is looking, well, not much help. In order for a Data Object to be “interpreted using” repinfo, the latter needs to be something that runs or performs; in Brian’s terms, a binary, or at least software that works. The OAIS definitions of repinfo refer to 3 sub-types: structure, semantic and “other”, and the last is not well defined. However, Adrian Brown’s report explains there is a special type of “other”:<br />
<blockquote>“…<span style="font-style:italic;">Access Software</span> provides a means to interpret a Data Object. The software therefore acts as a substitute for part of the representation information network – a PDF viewer embodies knowledge of the PDF specification, and may be used to directly access a data object in PDF format.”</p></blockquote>
<p>This seems to make sense; again, it’s in the OAIS spec, but hard to find. So Brown proposes that:<br />
<blockquote>“…representation information be explicitly defined as encompassing either information which describes how to interpret a data object (such as a format specification), or a component of a technical environment which supports interpretation of that object (such as a software tool or hardware platform).”</p></blockquote>
<p>Of course the software tool or hardware platform will itself have a shorter life than the descriptive information, so both may be required.</p>
<p>The bulk of the report, of course, is about representation information registries (including format registries by this definition), and is also well worth a read.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Summary]]></title>
<link>http://libraryproject08.wordpress.com/2008/03/20/resumos/</link>
<pubDate>Thu, 20 Mar 2008 15:52:27 +0000</pubDate>
<dc:creator>cmesquita</dc:creator>
<guid>http://libraryproject08.wordpress.com/2008/03/20/resumos/</guid>
<description><![CDATA[Today we began writing short summaries of the main ideas in the articles listed in the bibliogra]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p align="justify">Today we began writing short summaries of the main ideas in the articles listed in the bibliography recommended by our lecturers, as well as in other articles we found ourselves and thought interesting.<br />
So here is the summary of the following document:</p>
<p align="justify">RODRIGUES, Eloy [et al.] &#8211; RepositóriUM: criação e desenvolvimento do Repositório Institucional da Universidade do Minho. In Congresso Nacional de Bibliotecários, Arquivistas e Documentalistas, 8, Estoril, 2004. [online]. [Accessed 19 March 2008]. Available at: <a href="http://badinfo.apbad.pt/congresso8/com14.pdf">http://badinfo.apbad.pt/congresso8/com14.pdf</a></p>
<p align="justify">Summary: The article begins by placing institutional repositories within the open access landscape, describing them as digital collections for storing, preserving and making available the scientific and intellectual output of a given community, and noting some of their advantages.<br />
It then presents the RepositóriUM project, the institutional repository of the Universidade do Minho, and the platform chosen to implement it, DSpace, which is a set of tools for managing and disseminating digital content in accordance with the OAIS model.<br />
It describes how data is organised in DSpace, which is intended to mirror the structure of the organisation itself, the metadata used to describe documents, and the OAI-PMH protocol, which DSpace uses for interoperability.<br />
Next, it presents the plan for implementing RepositóriUM on DSpace, which comprised 4 main phases, along with some statistics on the documents deposited.<br />
Finally, the article sets out the challenges, problems and solutions encountered in setting up the project.</p>
</div>]]></content:encoded>
</item>

</channel>
</rss>
