Just when archivists are debating digital preservation issues and electronic records management... along comes Xerox with a new idea.

Let's discuss the idea of a new paper product in which print automatically fades away in 16 hours. Will this make it to market? Or, if it does, what are the implications? I guess this product does away with the need to suppress info and prevent accountability... or what happens if we have digital reprints but no original? HMMMM....

LINK: http://hardware.slashdot.org/hardware/06/09/07/2243222.shtml

Lanell

-----------

Quoting "Lucas, Arel" <[log in to unmask]>:

> If you do not have images, but only text from the original, you face the
> possibility that the rendered PDF text can be wrong, badly read by the OCR
> program, and/or not reflect changes made in handwriting to the originals.
> In many cases, optical-character-recognition rendering, done to make the
> PDF, cannot read or reflect what is in the original documents because your
> program may automatically decide what is text and what is graphics and
> improperly try to read graphics or "decide" that text is graphics and make a
> hodgepodge of your page. Trying to correct PDF or other text from
> essentially graphic documents is extremely time-consuming, frustrating, and
> sometimes impossible.
>
> We've been facing that here as we work to set up a digitization project.
> Published reports can often be simply rendered into PDF without incident,
> but mixed (graphics/text), handwritten, or hand-corrected documents can
> cause problems. One excellent example that we tried scanning first off was
> a photocopy of a carbon copy that had been hand-corrected as to one
> important mistake. The OCR-text version needed a large investment of time
> to correct due to the difficulty of reading the original, and only an image
> file could pick up the correction and make it legible. I thought it
> inauthentic to simply correct the stenographer's mistake in the PDF file.
> We do use PDFs as a text-file format. One advantage of an image format is
> that you always have an image from which to correct your text files should
> this not be possible on first pass, or should errors be discovered later.
>
> The PDF format can be used to create image-only files, but then you have no
> searchable text unless you make one image and one text file of the same
> document. (We've been unable to see and correct the OCR'd "invisible layer"
> of text in the Adobe "Searchable Image (Exact)," and we haven't been able to
> get an answer out of Adobe on how to do this.) So you would need to make two
> files of each document: one image and one text, unless there is nothing
> remarkable about the image (a published document), or no text worth
> searching in a mixed or graphic document. While you're saving two images,
> why not make one of them a TIFF?
>
> You can see in the archives of this list the discussions of the reasons for
> using the TIFF format for images. One excellent article to which I think I
> was referred from this list is at
> http://aic.stanford.edu/sg/emg/library/pdf/vitale/2006-01-vitale-digital_image_file_formats.pdf
> (For those not interested in the history of imaging
> systems, start with #5, page 30, "Image File Formats.") The arguments for
> TIFF include the lack of compression (in at least one variety) and its
> likely projection into the future as an imaging format. Note that there are
> different TIFF formats, and that you will need to be sure that the format
> you choose from your imaging program produces files that can be opened and
> read by other software. (We scanned upwards of 10 images before realizing
> that we were using an incompatible TIFF format.)
>
> So, yes, I agree that your best policy is to save a "master" TIFF file of
> each document image, and a PDF (if that's your chosen format) for searchable
> text.
> If you do only one or the other, even assuming a perfectly legible
> textual document, you have lost either the ability to search the text or the
> ability to correct the searchable text without consulting the original
> document.
>
> Arel Lucas, C.A.
> Archives/Special Collections Librarian
> Embry-Riddle Aeronautical University
> Prescott Campus
> -----Original Message-----
> From: Archives & Archivists [mailto:[log in to unmask]] On Behalf Of Rhue, Monika
> Sent: Friday, September 08, 2006 12:19 PM
> To: [log in to unmask]
> Subject: Digitization Question
> Importance: High
>
> We are working on a digitization project which involves scanning
> original documents from our archival collection. The web master wants to
> scan these letters, correspondence, etc. into PDF without creating a
> master file. From my research, we should scan all original documents in
> TIFF as the master files. Maybe the PDF can be the means by which people
> access the documents. It was stated that both the Florida Digital
> Archive and the Deep Blue repository at the University of Michigan find
> this compliance an acceptable archival format.
>
> http://www.fcla.edu/digitalArchive/pdfs/PDFGuideline.pdf#search=%22adobe%20acrobat%20professional%20A-1b%20standard%22
>
> http://deepblue.lib.umich.edu/about/deepbluepreservation.jsp
>
> However, I want to get the opinion of my colleagues.
>
> Thanks
>
> Nooma Monika Rhue, MLIS
> Archivist/Archival Services Librarian
> Inez Moore Parker Archives and Research Center
> Johnson C. Smith University
> 100 Beatties Ford Road
> Charlotte, NC 28216
> 704-371-6741
> Email: [log in to unmask] <mailto:[log in to unmask]>
>
> <http://archives.jcsu.edu/echo>
>
> A posting from the Archives & Archivists LISTSERV List sponsored by the
> Society of American Archivists, www.archivists.org.
> For the terms of participation, please refer to
> http://www.archivists.org/listservs/arch_listserv_terms.asp.
>
> To subscribe or unsubscribe, send e-mail to [log in to unmask]
> In body of message: SUB ARCHIVES firstname lastname
> *or*: UNSUB ARCHIVES
> To post a message, send e-mail to [log in to unmask]
>
> Or to do *anything* (and enjoy doing it!), use the web interface at
> http://listserv.muohio.edu/archives/archives.html
>
> Problems? Send e-mail to Robert F Schmidt <[log in to unmask]>

--
Lanell E. James
Master of Science in Information
School of Information
University of Michigan
[log in to unmask]