If you have only text from the original, and no images, you risk ending up with PDF text that is wrong, badly read by the OCR program, and/or does not reflect changes made in handwriting to the originals. In many cases the optical character recognition (OCR) done to make the PDF cannot read or reflect what is in the original documents: the program may decide automatically what is text and what is graphics, then try to read graphics as text or "decide" that text is graphics, and make a hodgepodge of your page. Trying to correct PDF or other text from essentially graphic documents is extremely time-consuming, frustrating, and sometimes impossible.

We've been facing that here as we work to set up a digitization project. Published reports can often be rendered into PDF without incident, but mixed (graphics/text), handwritten, or hand-corrected documents can cause problems. One excellent example, which we tried scanning early on, was a photocopy of a carbon copy on which one important mistake had been corrected by hand. The OCR text version needed a large investment of time to correct because the original was so hard to read, and only an image file could pick up the correction and make it legible. I thought it inauthentic to simply correct the stenographer's mistake in the PDF file.

We do use PDFs as a text-file format. One advantage of also keeping an image format is that you always have an image from which to correct your text files, should correction not be possible on first pass or should errors be discovered later. The PDF format can be used to create image-only files, but then you have no searchable text unless you make one image file and one text file of the same document. (We've been unable to see and correct the OCR'd "invisible layer" of text in Adobe's "Searchable Image (Exact)" format, and we haven't been able to get an answer out of Adobe on how to do this.)

So you would need to make two files of each document, one image and one text, unless there is nothing remarkable about the image (a published document) or no text worth searching in a mixed or graphic document. Since you're saving two files anyway, why not make the image a TIFF? You can see in the archives of this list the discussions of the reasons for using the TIFF format for images. One excellent article, to which I think I was referred from this list, is at
http://aic.stanford.edu/sg/emg/library/pdf/vitale/2006-01-vitale-digital_image_file_formats.pdf
(For those not interested in the history of imaging systems, start with #5, page 30, "Image File Formats.")

The arguments for TIFF include the lack of compression (in at least one variety) and its likely longevity as an imaging format. Note that there are different TIFF formats, and that you will need to be sure that the format you choose in your imaging program produces files that can be opened and read by other software. (We scanned upwards of 10 images before realizing that we were using an incompatible TIFF format.)

So, yes, I agree that your best policy is to save a "master" TIFF file of each document image, and a PDF (if that's your chosen format) for searchable text. If you do only one or the other, even assuming a perfectly legible textual document, you lose either the ability to search the text or the ability to correct the searchable text without consulting the original document.

Arel Lucas, C.A.
Archives/Special Collections Librarian
Embry-Riddle Aeronautical University
Prescott Campus

-----Original Message-----
From: Archives & Archivists [mailto:[log in to unmask]] On Behalf Of Rhue, Monika
Sent: Friday, September 08, 2006 12:19 PM
To: [log in to unmask]
Subject: Digitization Question
Importance: High

We are working on a digitization project which involves scanning original documents from our archival collection.
The webmaster wants to scan these letters, correspondence, etc. into PDF without creating a master file. From my research, we should scan all original documents in TIFF as the master files. Maybe the PDF can be the means by which people access the documents. It was stated that both the Florida Digital Archive and the Deep Blue repository at the University of Michigan find this compliance an acceptable archival format.

http://www.fcla.edu/digitalArchive/pdfs/PDFGuideline.pdf#search=%22adobe%20acrobat%20professional%20A-1b%20standard%22
http://deepblue.lib.umich.edu/about/deepbluepreservation.jsp

However, I want to get the opinion of my colleagues.

Thanks

Nooma Monika Rhue, MLIS
Archivist/Archival Services Librarian
Inez Moore Parker Archives and Research Center
Johnson C. Smith University
100 Beatties Ford Road
Charlotte, NC 28216
704-371-6741
Email: [log in to unmask]
http://archives.jcsu.edu/echo

A posting from the Archives & Archivists LISTSERV List sponsored by the Society of American Archivists, www.archivists.org.
For the terms of participation, please refer to http://www.archivists.org/listservs/arch_listserv_terms.asp.

To subscribe or unsubscribe, send e-mail to [log in to unmask]
In body of message: SUB ARCHIVES firstname lastname
*or*: UNSUB ARCHIVES
To post a message, send e-mail to [log in to unmask]

Or to do *anything* (and enjoy doing it!), use the web interface at
http://listserv.muohio.edu/archives/archives.html

Problems? Send e-mail to Robert F Schmidt <[log in to unmask]>