I pulled this from my "file" of goodies discussed on the list.  Perhaps
it will answer some questions.

Lynne F. Wohleber, Archivist
Episcopal Diocese of Pittsburgh
900 Oliver Building
535 Smithfield St.
Pittsburgh, PA  15222
412-325-0087 x138
FAX:  412-471-5591
[log in to unmask]

-----Original Message-----
From: Archives & Archivists [mailto:[log in to unmask]] On Behalf Of Richard Urban
Sent: Friday, October 29, 2004 5:28 PM
To: [log in to unmask]
Subject: Re: Archiving websites

Jessica,

At last week's DLF forum the folks at the Internet Archive announced that
they are working on an open-source web crawler.  Additional information
is available at: http://crawler.archive.org/

Here's the abstract from the program:

An introduction to Heritrix, an Open Source Archival-Quality Web Crawler.
Dan Avery and the Internet Archive Web Archive Technical Team

Heritrix is the Internet Archive's open-source, extensible, web-scale,
archival-quality web crawler project. The Internet Archive started
Heritrix development in the early part of 2003. The intention was to
develop a crawler for the specific purpose of archiving websites and to
support multiple different use cases including focused and broad
crawling. The software is open source to encourage collaboration and
joint development across institutions with similar needs. A pluggable,
extensible architecture facilitates customization and outside
contribution. Now, after over a year of development, the Internet
Archive and other institutions are using Heritrix to perform focused and
increasingly broad crawls. The crawler has been adopted by the IIPC
(International Internet Preservation Consortium) as the "official
crawler" supported by this group. It is also of particular interest to
universities trying to figure out how to do web archiving.

Cheers,

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Richard Urban
Operations Coordinator
Colorado Digitization Program

http://www.cdpheritage.org
[log in to unmask]
(303) 871-2006
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----Original Message-----
From: Archives & Archivists [mailto:[log in to unmask]] On Behalf Of Jessica Tanny
Sent: Friday, October 29, 2004 10:56 AM
To: [log in to unmask]
Subject: Archiving websites

In 2002 there was an interesting conversation on the archives listserv
regarding archiving websites. At that time someone mentioned using Adobe
Acrobat as a way to capture a website (Mount Holyoke has a "how to"
guide online:
http://www.mtholyoke.edu/lits/csit/documentation/archiving/archiving_websites.htm).

Recently there was an article in RAIN about the British Library
attempting to archive parts of the UK web
(http://www.dmeurope.com/default.asp?ArticleID=2115) which said that
they are planning on using HTTrack to capture the sites.

I was wondering if anyone has found a good (and simple) program that
allows your organization to capture a large website and still provide
access to it on a CD-R. I've experimented with HTTrack, Teleport Pro and
Adobe Acrobat with mixed results.

Thanks,

Jessica Tanny

_____________________________________________
Jessica Tanny, Processing Archivist
University Archives
421 Capen Hall
University at Buffalo, State University of New York
Buffalo, NY 14260
tel. 716.645.2916
[log in to unmask]

A posting from the Archives & Archivists LISTSERV List sponsored by the
Society of American Archivists, www.archivists.org.
For the terms of participation, please refer to
http://www.archivists.org/listservs/arch_listserv_terms.asp.

To subscribe or unsubscribe, send e-mail to [log in to unmask]
      In body of message:  SUB ARCHIVES firstname lastname
                    *or*:  UNSUB ARCHIVES
To post a message, send e-mail to [log in to unmask]

Or to do *anything* (and enjoy doing it!), use the web interface at
     http://listserv.muohio.edu/archives/archives.html

Problems?  Send e-mail to Robert F Schmidt <[log in to unmask]>

