This is an interesting post from the Library of Congress Digital Preservation site:

The International Internet Preservation Consortium recently released a web archives registry. The registry offers a single point of access to a comprehensive overview of member web archiving efforts and outputs. Twenty-one archives from around the world are currently included; updates will be added as additional archives are made accessible by IIPC members.

In addition to a detailed description of each web archive, the following information is included:

  • Collecting institution
  • Start date
  • Archive interface language(s)
  • Access methods (URL search, keyword search, full text search, thematic, etc.)
  • Harvesting methods (National domain, event, thematic, etc.)
  • Access restrictions
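
For readers who think in records, the fields listed above can be pictured as a simple data structure. This is only a sketch: the class name, field names, helper function and both sample entries are made up for illustration and do not come from the IIPC registry itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegistryEntry:
    """One web archive as described in the registry (field names are illustrative)."""
    collecting_institution: str
    start_date: str  # year or date the archive began collecting
    interface_languages: List[str] = field(default_factory=list)
    access_methods: List[str] = field(default_factory=list)
    harvesting_methods: List[str] = field(default_factory=list)
    access_restrictions: str = ""
    description: str = ""


def archives_with_access_method(entries, method):
    """Return the entries that offer the given access method."""
    return [e for e in entries if method in e.access_methods]


# Made-up entries for illustration only; not real registry data.
entries = [
    RegistryEntry("Example National Library", "2004", ["en"],
                  ["URL search", "keyword search"], ["national domain"], "on-site only"),
    RegistryEntry("Example University Archive", "2008", ["en", "fr"],
                  ["URL search", "full text search"], ["thematic"], "public"),
]
full_text = archives_with_access_method(entries, "full text search")
```

Keeping the access and harvesting methods as lists makes it easy to filter archives by what they offer, which is the kind of overview the registry is meant to provide.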

The registry was put in place by the IIPC Access Working Group, which focuses on initiatives, procedures and tools required to provide immediate and future access to archived web material. The registry will also provide a basis for IIPC to explore integrated access and search in the future.

Preserving the web is not a task of any single institution. It is a mission common to all IIPC members, and many practices and lessons are transferable. The launch of the archive registry showcases international collaboration for preserving web content for future generations.

The IIPC was chartered in 2003 with 12 participating institutions. Today, there are over thirty-five member organizations. More information about the IIPC can be found at http://netpreserve.org.

At the Marriott Library, we’ve recently begun looking into what it would take to archive websites that are important to the University.  During some research into this area, I came across the proceedings of the 2009 International Web Archiving Workshop (IWAW).

An interesting project is taking place in France that may change the way web archiving is approached. At University P. and M. Curie in Paris, researchers are developing a web crawler that will not only detect changes to a website but also distinguish unimportant changes (rotating ads on a page, etc.) from changes that matter to the page’s content. If successful, this could greatly improve the efficiency of web archiving, because digital archives would no longer spend bandwidth and storage space on needless data.

This project is taking place in conjunction with the French National Audio-Visual Institute (INA). The institute would like to archive French television and radio station websites. The visual appearance of these pages, not just their content, is very important to the project.

According to the workshop proceedings, the project idea is to “use a visual page analysis to assign importance to web pages parts, according to their relative location. In other words, page versions are restructured according to their visual representation. Detecting changes on such restructured page versions gives relevant information for understanding the dynamics of the web sites. A web page can be partitioned into multiple segments or blocks and, often, the blocks in a page have a different importance. In fact, different regions inside a web page have different importance weights according to their location, area size, content, etc. Typically, the most important information is on the center of a page, advertisement is on the header or on the left side and copyright is on the footer. Once the page is segmented, then a relative importance must be assigned to each block…Comparing two pages based on their visual representation is semantically more informative than with their HTML representation.”
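
To make the idea of location-based importance concrete, here is a minimal sketch in Python. It is not the researchers’ model: the scoring rule (area share multiplied by closeness to the page center), the `Block` class and the sample page layout are all assumptions chosen only to mirror the quote’s observation that central content tends to matter more than headers, side columns and footers.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float
    height: float


def block_importance(blocks, page_width, page_height):
    """Score each block by area share and closeness to the page center.

    Scores are normalized so that they sum to 1. The rule is an assumption
    for illustration, not the weighting used by the project.
    """
    raw = {}
    for b in blocks:
        area_share = (b.width * b.height) / (page_width * page_height)
        # Offset of the block's center from the page center, scaled to 0..1 per axis.
        dx = abs((b.x + b.width / 2) - page_width / 2) / (page_width / 2)
        dy = abs((b.y + b.height / 2) - page_height / 2) / (page_height / 2)
        centrality = 1.0 - max(dx, dy)  # 1 at the center, 0 at the page edge
        raw[b.name] = area_share * centrality
    total = sum(raw.values())
    return {name: (score / total if total else 0.0) for name, score in raw.items()}


# Hypothetical layout of a 1000 x 1000 pixel page.
page = [
    Block("header_ad", 0, 0, 1000, 100),
    Block("left_menu", 0, 100, 200, 800),
    Block("main_story", 200, 100, 800, 800),
    Block("footer", 0, 900, 1000, 100),
]
weights = block_importance(page, page_width=1000, page_height=1000)
```

With this layout, the large central block ends up with most of the weight, while the header ad and the footer receive very little.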

The main concept and hoped-for contribution to the world of web archiving is summed up by the presenters as follows:

• A novel web archiving approach that combines three concepts: visual page analysis (or segmentation), visual change detection and importance of web page’s blocks.

• An extension of an existing visual segmentation model to describe the whole visual aspect of the web page.

• An adequate change detection algorithm that computes changes between visual layout structures of web pages with a reasonable complexity in time.

• A method to evaluate the importance of changes occurred between consecutive versions of documents.

• An implementation of our approach and some experiments to demonstrate its feasibility.
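
The step of evaluating how important a change is can be sketched roughly as follows. This is a simplification under stated assumptions: the presenters compare visual layout structures, whereas this sketch compares plain text per named block, and the function names, weights, sample page versions and the 0.5 threshold are all hypothetical.

```python
def change_importance(old_blocks, new_blocks, weights):
    """Sum the weights of blocks whose content differs between two versions.

    Blocks that were added or removed count as changed. Returns the score
    and the sorted names of the changed blocks.
    """
    changed = [
        name
        for name in set(old_blocks) | set(new_blocks)
        if old_blocks.get(name) != new_blocks.get(name)
    ]
    score = sum(weights.get(name, 0.0) for name in changed)
    return score, sorted(changed)


def should_archive(old_blocks, new_blocks, weights, threshold=0.5):
    """Decide whether a new version is worth storing (threshold is an assumption)."""
    score, _ = change_importance(old_blocks, new_blocks, weights)
    return score >= threshold


# Hypothetical importance weights and three versions of the same page.
weights = {"header_ad": 0.05, "main_story": 0.85, "footer": 0.10}
v1 = {"header_ad": "Buy shoes", "main_story": "Election results pending", "footer": "(c) 2009"}
v2 = {"header_ad": "Buy hats", "main_story": "Election results pending", "footer": "(c) 2009"}
v3 = {"header_ad": "Buy hats", "main_story": "Election results announced", "footer": "(c) 2009"}

ad_only = should_archive(v1, v2, weights)      # only the ad changed
story_changed = should_archive(v2, v3, weights)  # the main story changed
```

In this toy example a rotated ad alone does not trigger a new capture, while a change to the main story does, which is the bandwidth and storage saving the project is after.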

It will be interesting to follow up with this project as it reaches its conclusion and see how its results will affect current web archiving players like Archive-it.org as well as fellow research endeavors like the Memento Project.

You can read about this project in much more technical detail at the IWAW website (unless it’s been taken down and hasn’t been properly archived).

http://iwaw.europarchive.org/


The annual Governor’s Medal for Science and Technology awards ceremony will take place tomorrow, and one of the seven recipients is Randall J. Olson, MD, Director of the John A. Moran Eye Center. The awards program recognizes people and companies in Utah whose career achievements or distinguished service have benefited the state in the areas of science and technology (http://www.sltrib.com/news/ci_14111462). We have archived many of Dr. Olson’s research articles in USpace, which you can view here: http://tinyurl.com/y9uu55a

Below are just some of the new videos available on Marriott’s YouTube Channel. The originals are permanently housed in the Multimedia Archives of Special Collections.

© 2012 Marriott Library Blog