John --

As former Systems Editor/Library Manager at The Tennessean in Nashville,
I have investigated this on and off over the last few years.

There are various approaches to this problem. Given the expense of the
equipment and software, plus the staff time needed to do the scanning
and OCR, I think most newspapers that have attempted such projects have
opted to outsource the work.

One option is Heritage Microfilm. http://www.heritagemicrofilm.com/ My
understanding is that they will basically undertake the scanning for
free in return for a 50/50 revenue split on sales of the content through
their affiliated website NewspaperArchive.com.

When I last investigated them, I believe they required you to switch
your microfilming contract to them before they would consider the
digitization angle. (This was a problem for Gannett newspapers, since
the company has a contract with Proquest.) However, I seem to recall
someone telling me that they were considering offering a service where
they would do the digitization even if the actual film was still being
handled by another company. That still depends on you being able to
ship them the microfilm masters and/or a good enough copy. For most
newer film, Proquest probably has the masters. For older film, who knows
-- at the Tennessean we never located the masters for our pre-1950s
film, and instead ordered copies from the masters held by the state
archives when we needed to replace a reel. If you digitize from an older
"use" copy of the microfilm, the resulting files are likely to be of far
lower quality due to the wear and deterioration of the film.

If you're interested in pursuing the Heritage option, you might want to
touch base with other papers that have done it. I'm not aware of any
Gannett properties that have taken the plunge, but I know the Syracuse
Post-Standard began a program to do this a few years ago. I don't know
who is running their library now, but Laura Soto-Barra (now in the news
library at NPR in DC) was director when the program started.

You would also need to look at the copyright issues involved. Heritage
claims that there are no problems with
this, but in light of the Tasini ruling and subsequent copyright
paranoia in the newspaper business, I doubt you'll want to take their
word for it. (You could start with papers before 1923, which should be
in the public domain no matter what, though.)

Olive Software (http://www.olivesoftware.com/) offers a software
solution that provides for scanning and OCR, and I believe they can also
offer the associated services. When I investigated this for my former
college newspaper a few years ago, this was the system being offered by
OCLC to libraries wishing to undertake newspaper digitization efforts.
Their system also allows you to import new PDFs into the same system to
provide a seamless archive.

(If you want to take a look at the original proposal I did for The Daily
Orange, there is a PowerPoint online at
http://www.dwiggins.net/dopresentation/ . As of yet, this hasn't gotten
off the ground. The page also contains a link to an example of a paper
that was digitized using the OCLC service.)

I believe Proquest also offers a digitization solution. When I last
looked, though, they were focusing on "national" papers like the New
York Times and The Washington Post, and weren't really marketing their
services to smaller newspapers. Perhaps this has changed in the interim
-- might be worth checking with them.

I know there are some smaller vendors doing this as well. If you're
serious about it, you might want to get someone to look around at the
next NEXPO (newspaper technology) conference to see what's out there.

My suspicion is that projects like this will be difficult to
cost-justify for many bottom-line-oriented newspapers. The cost to
digitize and store the vast quantity of content is huge, and the
projected revenue for a single newspaper is likely quite small. One
option would be a partnership along the lines of what Heritage is doing.
(Or, perhaps a newspaper group like Gannett could do it for all of their
papers.)

Another alternative might be to investigate some sort of public-private
partnership with a local library. This is an approach that always
appealed to me, although I was never able to pursue it in Nashville. We
actually had a 40-year index of the newspaper on index cards, and at one
point I priced out having them digitized. It was massively expensive,
but would have made an exceptionally valuable resource far more
accessible. I always wondered if perhaps we could have gotten grant
funding for the project if we agreed to donate the resulting database to
the Nashville Public Library. I never had time to pursue this, but I
think it's an interesting idea.

It's also important to remember that digitization is not the same thing
as preservation. Microfilm is still the gold standard for preserving the
informational content of newspapers, because it has a theoretical
lifespan of 500+ years and can basically be read with a magnifying glass.
Given the rapid change in the technology field, computer files are
likely to present far more preservation headaches in the future. While
digitization is an excellent way to provide enhanced access to
collections and to reduce wear-and-tear on traditional archival
materials, it should not be considered a substitute for microfilming or
retaining original documents. (The idea of ditching microfilm in favor
of PDFs gets floated every so often on the NewsLib listserv, and I
always cringe a bit, because while PDFs seem likely to be around for a
while, are we really that confident that we'll be able to preserve them
in a readable form for centuries?)

I would, however, argue for preserving PDFs as a supplement to the
microfilm, since black and white microfilm does not capture a true image
of modern color newspaper pages.

Sorry for this long, rambling response -- I hope you're able to find
some useful tidbits in it somewhere!

-- 

David Dwiggins
Dean’s Fellow for Technology Support
Graduate School of Library and Information Science
Simmons College
300 The Fenway, Boston, MA 02115-5898
617-521-2871




>>> "John Prime" 5/29/2006 12:42 AM >>>
> Greetings:
>
> I am a new subscriber, and while not technically an archivist, am
> probably the only person at my newspaper who regularly/frequently
> accesses the 130-plus years of newspapers we have on microfilm in our
> "morgue."
>
> Last week, I met with our publisher, editor and several managers to
> convince them of the need to digitally archive the newspapers we now
> have A) on microfilm and B) moldering in the basement. They recognize
> the need to preserve these but balked at my suggestion that we secure a
> digital microform scanner to begin this work. They wonder whether or
> not some other paper might have pioneered a way to tackle this task or
> whether it might be "jobbed out." We also would like to have some way to
> search through the final scanned text, though of course I realize this
> would be wholly dependent on the quality of the original printed paper.
> I can assure you this was sometimes of good quality, but often
> lamentable.
>
> Any suggestions as to what course to follow would be most welcome!
>
> John Andrew Prime
> Military Affairs Reporter
> The Times At Shreveport
> A Gannett Newspaper
> 222 Lake Street
> Shreveport, Louisiana
> 71130-0222
> (800) 462-6436 WATS VOX
> (318) 459-3233 VOX
> (318) 459-3301 FAX
> Co-author of "Barksdale Air Force Base," 2002, Arcadia Publishing, now
> in its third printing
>
> Web site: www.shreveporttimes.com 
>   

A posting from the Archives & Archivists LISTSERV List sponsored by the Society of American Archivists, www.archivists.org.
For the terms of participation, please refer to http://www.archivists.org/listservs/arch_listserv_terms.asp.
