Adobe Acrobat's ability to 'capture' a website in PDF has one noteworthy
flaw that you need to keep in mind.

Links on the captured website function the way they are __coded__ rather
than the way they behaved on the live site. I wish I could find a more
eloquent way to describe that. Perhaps this example will help.

A homepage (index1.html) has the text "Frequently Asked Questions"
hyperlinked to faq1.html using the coding <A HREF=faq/faq1.html>.
Because this is a relative URL, clicking the link in the PDF file moves
to the captured FAQ page, as expected. However, if the coding uses an
absolute URL, e.g., <A HREF=http://www.example.org/faq/faq1.html>,
clicking the link in the PDF file sends the user out to the Internet
instead of to another page in the same PDF file.

This behavior will deliver whatever content is at
http://www.example.org/faq/faq1.html at the time the PDF is used. That
could easily be different content if the live webpage has been updated.
The page might also simply no longer be available on the Internet,
presenting the researcher with a "404 File not found" error. Other
explanations for this latter behavior include a changed location for
the live file (e.g., http://www.example.org/faq.html) or a change in the
domain name (e.g., www.example.org becomes www.example.net).

Whether the file is different, no longer exists, or has been moved, the
point is that the link in the PDF-captured object directs you outside of
the object itself, when its original behavior was clearly to navigate
the user to another page on that same website.
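
A site can be checked for this problem before it is captured by scanning each page's HTML and separating relative links from absolute links that point back at the site itself. The Python sketch below is a hypothetical helper, not part of Acrobat; it assumes the page HTML is available as a string and that the site's host name is known (www.example.org, as in the example above). Links it reports as `absolute_same_site` are the ones that, if the behavior described above holds, would send a PDF reader out to the live web.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(html, site_host):
    """Split a page's links into relative, absolute-to-this-site, and external links.

    Hypothetical helper: site_host is compared exactly (after lowercasing), so
    www.example.org and example.org count as different hosts in this sketch.
    """
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    groups = {"relative": [], "absolute_same_site": [], "external": []}
    for href in collector.hrefs:
        parsed = urlparse(href)
        if not parsed.scheme and not parsed.netloc:
            # e.g. faq/faq1.html: in the example above, these stayed inside the capture
            groups["relative"].append(href)
        elif parsed.hostname == site_host.lower():
            # e.g. http://www.example.org/faq/faq1.html: points back at the live site
            groups["absolute_same_site"].append(href)
        else:
            # another host, or a non-web scheme such as mailto:
            groups["external"].append(href)
    return groups


if __name__ == "__main__":
    page = (
        "<A HREF=faq/faq1.html>Frequently Asked Questions</A>\n"
        "<A HREF=http://www.example.org/faq/faq1.html>Frequently Asked Questions</A>\n"
        '<a href="http://www.example.net/">Elsewhere</a>\n'
    )
    for group, links in classify_links(page, "www.example.org").items():
        print(group, links)
```

Comparing host names exactly keeps the sketch simple; a real crawl might also need to treat aliases such as example.org and www.example.org as the same site.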

Such is the nature of the web. In our rush to embrace it, we have made
it possible to accomplish the same things using many different
approaches. All are valid. The preservation choices are not as
forgiving.

I have other qualms. Contact me offline if you want to hear them.

My best to all of you,

Ricc

Riccardo Ferrante
Information Technology Archivist & Electronic Records Program Director
------------------------------------------------------------------------
Smithsonian Institution Archives - 900 Jefferson Dr. S.W. MRC 414 - Washington, DC 20013
------------------------------------------------------------------------
[Email] [log in to unmask] - [Phone] 202.357.1420 - [Fax] 202.357.2395

The Smithsonian Institution Archives is relocating to new offices.
Records management, reference services, and queries about the history
of the Smithsonian are unavailable until we reopen in early fall 2006.
To serve you better, please check our website, http://siarchives.si.edu,
for updates and specific information.

________________________________

From: Archives & Archivists [mailto:[log in to unmask]] On Behalf Of Rick Barry
Sent: Wednesday, July 26, 2006 4:17 PM
To: [log in to unmask]
Subject: Re: Capturing websites

In a message dated 7/26/2006 3:01:38 A.M. Eastern Standard Time,
[log in to unmask] (Jessica Tanny) writes:

	In 2002 there was an interesting conversation on the archives
	listserv regarding archiving websites. At that time someone
	mentioned using Adobe Acrobat as a way to capture a website
	(Mount Holyoke has a "how to" guide online:
	http://www.mtholyoke.edu/lits/csit/documentation/archiving/archiving_websites.htm).

I believe the current address is
http://www.mtholyoke.edu/lits/ris/documentation/archiving/archiving_websites.htm

I back up my own Website, www.mybestdocs.com, by regularly creating a
blank file named with the date, e.g., <bu-mbd060725>, in a backup
directory on my C: drive, and then publishing my live Website to that
directory/file using MS FrontPage, which is fine for my purposes. I keep
it there and also copy that file to a DVD in case (I should say for
when) my PC hard drive dies someday when I least expect it. At any time,
I can then open that file in FrontPage and republish it as a whole, or
examine or edit any page or sub-page within the Website. I don't do
that, because I use such backups as archival snapshots of my Website
reflecting major changes in content or design. But it would be possible
to do so, and anyone concerned about maintaining the integrity of
archival versions of a Website would want to consider that seriously.
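
The dated naming scheme described above is easy to automate. The Python sketch below is a hypothetical helper, not a FrontPage feature: it assumes the name is a prefix (bu-mbd) followed by the date as YYMMDD, inferred from the example bu-mbd060725, and it creates the target as an empty folder (the message says "directory/file"). Publishing the site into that folder would still be done with FrontPage or whatever tool you use.

```python
from datetime import date
from pathlib import Path


def backup_name(day, prefix="bu-mbd"):
    """Return a dated snapshot name: prefix followed by the date as YYMMDD (assumed format)."""
    return prefix + day.strftime("%y%m%d")


def make_backup_dir(base, day=None, prefix="bu-mbd"):
    """Create an empty, dated snapshot directory under base and return its path.

    exist_ok=False makes mkdir raise FileExistsError instead of reusing a
    folder, so an earlier snapshot with the same name is never overwritten.
    """
    if day is None:
        day = date.today()
    target = Path(base) / backup_name(day, prefix)
    target.mkdir(parents=True, exist_ok=False)
    return target


if __name__ == "__main__":
    print(backup_name(date(2006, 7, 25)))  # bu-mbd060725
```

Refusing to reuse an existing folder is a deliberate choice here: it keeps each snapshot as it was first written, which fits the concern about the integrity of archival versions.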

For institutional purposes, I would recommend an open source product
such as Heritrix (http://archive-crawler.sourceforge.net/), the crawler
used by the Internet Archive, rather than a proprietary software product
such as FrontPage or Acrobat. Alternatively, a good enterprise content
management (ECM) system that is DoD 5015.2 certified would also work and
provide at least the minimum recordkeeping functionality, if not more.

Regards,
Rick

Rick Barry
www.mybestdocs.com
Cofounder, Open Reader Consortium
www.openreader.org

A posting from the Archives & Archivists LISTSERV List sponsored by the
Society of American Archivists, www.archivists.org. For the terms of
participation, please refer to
http://www.archivists.org/listservs/arch_listserv_terms.asp. 

To subscribe or unsubscribe, send e-mail to [log in to unmask]
In body of message: SUB ARCHIVES firstname lastname *or*: UNSUB ARCHIVES
To post a message, send e-mail to [log in to unmask] 

Or to do *anything* (and enjoy doing it!), use the web interface at
http://listserv.muohio.edu/archives/archives.html 

Problems? Send e-mail to Robert F Schmidt <[log in to unmask]>
