tawnyamosier

I found this helpful little guide while trolling around the Library of Congress’ digital preservation page. While I spend all my time at work dealing with issues surrounding the preservation of the Library’s digital collections, like a lot of people, I don’t always do so well with my own files at home.

Here are a few tips from the Library of Congress that are aimed at helping you preserve and organize your personal digital materials such as important computer files, emails and digital photos.

This is an interesting post from the Library of Congress Digital Preservation site:

The International Internet Preservation Consortium recently released a web archives registry. The registry offers a single point of access to a comprehensive overview of member web archiving efforts and outputs. Twenty-one archives from around the world are currently included; updates will be added as additional archives are made accessible by IIPC members.

In addition to a detailed description of each web archive, the following information is included:

  • Collecting institution
  • Start date
  • Archive interface language(s)
  • Access methods (URL search, keyword search, full text search, thematic, etc.)
  • Harvesting methods (National domain, event, thematic, etc.)
  • Access restrictions

The registry was put in place by the IIPC Access Working Group, which focuses on initiatives, procedures and tools required to provide immediate and future access to archived web material. The registry will also provide a basis for IIPC to explore integrated access and search in the future.

Preserving the web is not a task of any single institution. It is a mission common to all IIPC members, and many practices and lessons are transferable. The launch of the archive registry showcases international collaboration for preserving web content for future generations.

The IIPC was chartered in 2003 with 12 participating institutions. Today, there are over thirty-five member organizations. More information about the IIPC can be found at http://netpreserve.org.

At the Marriott Library, we’ve recently begun looking into what it would take to archive websites that are important to the University. During some research into this area, I came across the proceedings of the 2009 International Web Archiving Workshop (IWAW).

An interesting project is taking place in France that may change the way web archiving is approached. At University P. and M. Curie in Paris, researchers are developing a web crawler that will not only detect changes to a website but also distinguish unimportant changes (changing ads on a page, etc.) from changes that matter to the page’s content. If successful, this might greatly improve the efficiency of web archiving, because digital archives would no longer be gumming up bandwidth and storage space with needless data.

This project is taking place in conjunction with the French National Audio-Visual Institute (INA). The institute would like to archive French television and radio station websites. The visual appearance of these pages, not just their content, is very important to the project.

According to the workshop proceedings, the project idea is to “use a visual page analysis to assign importance to web pages parts, according to their relative location. In other words, page versions are restructured according to their visual representation. Detecting changes on such restructured page versions gives relevant information for understanding the dynamics of the web sites. A web page can be partitioned into multiple segments or blocks and, often, the blocks in a page have a different importance. In fact, different regions inside a web page have different importance weights according to their location, area size, content, etc. Typically, the most important information is on the center of a page, advertisement is on the header or on the left side and copyright is on the footer. Once the page is segmented, then a relative importance must be assigned to each block…Comparing two pages based on their visual representation is semantically more informative than with their HTML representation.”

The main concept and hopeful contribution to the world of web archiving is summed up by the presenters as follows:

• A novel web archiving approach that combines three concepts: visual page analysis (or segmentation), visual change detection and importance of web page’s blocks.

• An extension of an existing visual segmentation model to describe the whole visual aspect of the web page.

• An adequate change detection algorithm that computes changes between visual layout structures of web pages with a reasonable complexity in time.

• A method to evaluate the importance of changes occurred between consecutive versions of documents.

• An implementation of our approach and some experiments to demonstrate its feasibility.
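To make the idea of importance-weighted change detection a little more concrete, here is a minimal sketch in Python. It is not the researchers’ algorithm. The `Block` fields, the `importance` heuristic (bigger blocks nearer the centre of the page count for more) and the `change_score` function are all assumptions made up for illustration, loosely following the description quoted above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """A hypothetical visual segment of a page (coordinates are 0..1 fractions)."""
    name: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float
    text: str


def importance(block: Block) -> float:
    """Toy heuristic: larger blocks closer to the page centre weigh more."""
    cx = block.x + block.width / 2
    cy = block.y + block.height / 2
    distance = ((cx - 0.5) ** 2 + (cy - 0.5) ** 2) ** 0.5
    max_distance = 0.5 ** 0.5  # distance from the centre to a corner
    centrality = max(1.0 - distance / max_distance, 0.0)
    return centrality * block.width * block.height


def change_score(old: List[Block], new: List[Block]) -> float:
    """Share of the new page's total importance held by changed or added blocks."""
    old_by_name = {b.name: b for b in old}
    total = sum(importance(b) for b in new)
    changed = sum(
        importance(b)
        for b in new
        if b.name not in old_by_name or old_by_name[b.name].text != b.text
    )
    return changed / total if total else 0.0


def page(ad_text: str, article_text: str) -> List[Block]:
    """Build a simple four-block layout: header ad, sidebar, article, footer."""
    return [
        Block("header_ad", 0.0, 0.0, 1.0, 0.1, ad_text),
        Block("sidebar", 0.0, 0.1, 0.2, 0.8, "links"),
        Block("article", 0.2, 0.1, 0.8, 0.8, article_text),
        Block("footer", 0.0, 0.9, 1.0, 0.1, "copyright"),
    ]


before = page("Buy shoes", "Election results")
ad_swapped = page("Buy hats", "Election results")
article_updated = page("Buy shoes", "Election results, updated")

print("ad swapped:     ", round(change_score(before, ad_swapped), 3))
print("article updated:", round(change_score(before, article_updated), 3))
```

With a score like this, a crawler could skip storing a new version when only a low-weight block (the rotating ad) has changed, and keep it when the central article block has changed. Where to set that threshold is a policy decision this sketch does not attempt to answer.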

It will be interesting to follow up with this project as it reaches its conclusion and see how its results will affect current web archiving players like Archive-it.org as well as fellow research endeavors like the Memento Project.

You can read about this project in much more technical detail at the IWAW website (unless it’s been taken down and hasn’t been properly archived).

http://iwaw.europarchive.org/

That’s right. You heard me. Lossless video compression is pretty cool. It actually has a coolness factor of about 4.3 out of 5. That’s a lot when you think about it. In fact, TV’s “The Fonz” only has a coolness factor of 4.1, if that puts it into perspective for you.

But why is lossless video compression something we should care about, regardless of how cool it is? As part of my position as Digital Preservation Archivist at the Marriott Library, I’m tasked with creating a sustainable digital preservation program for the library’s unique digital collections and as part of that process, I’m making sure we also conserve server space. Uncompressed audio/video files take up a great deal more space than uncompressed text and photo files and that’s where lossless compression comes in. Lossless video compression refers to the fact that

the output from the decompressor is bit-for-bit identical with the original input to the compressor. The decompressed video stream should be completely identical to the original. – Ian Gilmour, R. Justin Davila
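The "bit-for-bit identical" property is easy to check for yourself. The sketch below uses Python’s built-in `zlib` module, which is a general-purpose lossless compressor and not a video codec, so treat it only as a stand-in for the principle. The `original` bytes are made-up sample data.

```python
import hashlib
import zlib

# Made-up stand-in for raw video data: 100 identical 64-byte "frames".
original = bytes(range(64)) * 100

compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("bit-for-bit identical:", restored == original)

# Archives usually verify this with checksums rather than direct comparison.
same_checksum = (
    hashlib.sha256(restored).hexdigest() == hashlib.sha256(original).hexdigest()
)
print("checksums match:", same_checksum)
```

Comparing checksums before and after a round trip is the same test an archive would apply to a lossless video codec: if the digests differ, the compression was not lossless.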

So, unlike lossy compression, lossless compression saves space by storing redundant information only once, without losing image quality when the file is decompressed. If, for instance, a scene consists of The Fonz hanging out by his parked motorcycle, chatting up one of the waitresses at Al’s Diner, the compression scheme would be concerned with the objects that are changing (i.e. The Fonz, the waitress, the birds in the sky). The motorcycle and the diner aren’t changing at all, so there’s no need to store multiple copies of those images. When redundancy is removed only within a single image frame (a large patch of identical sky, for example), this is referred to as intra-frame compression, which is able to save large amounts of data when compared to the uncompressed file, which would store every pixel in the original image.

Inter-frame compression includes data from across the entire shot or scene. Entire sequences of frames with similarities can be encoded, with only the differences between frames being specified. This means that the information that does not change throughout the entire scene (rather than within one frame, as in the case of intra-frame compression) can be stored once, and the scene of The Fonz and the waitress can be compressed and later decompressed with no loss in image quality. Inter-frame compression is interesting because under certain conditions (typically when using lower bit-rates), encoding only the differences throughout a scene (known as temporal or inter-frame encoding) requires less data for a given picture quality than encoding every frame in full does.
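The idea of storing only the differences between frames can be sketched in a few lines of Python. This is a toy illustration, not how any real codec stores its data: frames are small grids of numbers, every frame is assumed to have the same dimensions, there is assumed to be at least one frame, and the `encode` and `decode` names are made up for the example.

```python
from typing import List, Tuple

Frame = List[List[int]]
Change = Tuple[int, int, int]  # (row, column, new pixel value)


def encode(frames: List[Frame]) -> Tuple[Frame, List[List[Change]]]:
    """Keep the first frame whole; for each later frame keep only changed pixels."""
    key_frame = [row[:] for row in frames[0]]
    deltas = []
    previous = frames[0]
    for frame in frames[1:]:
        changes = [
            (r, c, value)
            for r, row in enumerate(frame)
            for c, value in enumerate(row)
            if value != previous[r][c]
        ]
        deltas.append(changes)
        previous = frame
    return key_frame, deltas


def decode(key_frame: Frame, deltas: List[List[Change]]) -> List[Frame]:
    """Rebuild every frame by replaying the stored changes in order."""
    frames = [[row[:] for row in key_frame]]
    for changes in deltas:
        frame = [row[:] for row in frames[-1]]
        for r, c, value in changes:
            frame[r][c] = value
        frames.append(frame)
    return frames


def make_frame(column: int) -> Frame:
    """A 4x6 static background (0) with one moving pixel (9) in row 2."""
    frame = [[0] * 6 for _ in range(4)]
    frame[2][column] = 9
    return frame


frames = [make_frame(column) for column in range(3)]
key_frame, deltas = encode(frames)

print("pixels per full frame:", 4 * 6)
print("changes stored per later frame:", [len(d) for d in deltas])  # [2, 2]
print("round trip is lossless:", decode(key_frame, deltas) == frames)
```

Each later frame costs two stored changes instead of 24 pixels, because the background never moves, and replaying the changes reproduces the original frames exactly.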

Inter-frame encoding typically maps groups of pixels within macroblocks which stay the same from one frame to the next [i.e. fixed backgrounds] or which move in the same direction [e.g. moving objects or panned backgrounds]. Rather than encoding these image regions, their relative positions are tracked using motion vectors, which require much less coding.
– Ian Gilmour, R. Justin Davila
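Here is a rough sketch of how a motion vector might be found for one block of pixels. It uses an exhaustive search that minimises the sum of absolute differences (SAD), a common textbook approach. It is not taken from the paper quoted above, and real codecs use larger macroblocks and much faster search strategies. The function names, the 6x6 frames and the 2x2 block size are all assumptions for illustration, and the block being matched is assumed to lie fully inside the frame.

```python
from typing import List, Tuple

Frame = List[List[int]]


def sad(a: Frame, a_top: int, a_left: int,
        b: Frame, b_top: int, b_left: int, size: int) -> int:
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(a[a_top + r][a_left + c] - b[b_top + r][b_left + c])
        for r in range(size)
        for c in range(size)
    )


def find_motion_vector(previous: Frame, current: Frame, top: int, left: int,
                       size: int, search: int) -> Tuple[Tuple[int, int], int]:
    """Return ((dy, dx), cost): the offset into `previous` that best matches
    the block at (top, left) in `current`, and the SAD cost of that match."""
    height, width = len(previous), len(previous[0])
    best_cost = None
    best_vector = (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            src_top, src_left = top + dy, left + dx
            # Skip candidate blocks that fall outside the previous frame.
            if not (0 <= src_top <= height - size and 0 <= src_left <= width - size):
                continue
            cost = sad(current, top, left, previous, src_top, src_left, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dy, dx)
    return best_vector, best_cost


def frame_with_object(top: int, left: int) -> Frame:
    """A 6x6 blank background (0) with a 2x2 object (9) at the given position."""
    frame = [[0] * 6 for _ in range(6)]
    for r in range(2):
        for c in range(2):
            frame[top + r][left + c] = 9
    return frame


previous = frame_with_object(1, 1)
current = frame_with_object(1, 3)  # the object has moved two pixels to the right

vector, cost = find_motion_vector(previous, current, top=1, left=3, size=2, search=2)
print("motion vector (dy, dx):", vector)  # (0, -2)
print("matching cost:", cost)             # 0
```

A cost of 0 means the block can be described entirely by its motion vector, pointing back to where the same pixels sat in the previous frame, with no pixel data stored for it at all. When the cost is not 0, a lossless scheme would also have to store the leftover differences.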

For a much more detailed look at lossless video compression, take a look at the piece referred to above by Ian Gilmour of the Australian National Film and Sound Archive and R. Justin Davila of Media Matters, LLC: Lossless Video Compression for Archives: Motion JPEG2k and Other Options.

And stay tuned to this blog for more on various recommended lossless audio and video compression codecs.

© 2012 Marriott Library Blog