Documents of 20th-century Latin American and Latino Art

Documents of 20th-century Latin American and Latino Art digitization process

The ICAA and its research teams developed and adopted a number of digitization standards and best practices during the Recovery Phase of the Documents of 20th-century Latin American and Latino Art project. In many cases, the ICAA exceeded its own expectations of discovering documents that could be prepared for publication. As a result, the ICAA further refined and upgraded its digitization standards for recovery and its best practices for implementation. From 2004 to 2011, approximately ten thousand documents were digitized by the project’s research teams and ICAA staff members.

For the ICAA, document recovery has been an organic process, beginning with the discovery and retrieval of research materials and continuing as additional research materials have been uncovered. Moreover, since the onset of the Documents Project’s Digital Archive (hereafter “Digital Archive”), the impetus has been to capitalize on technological advances in order to grant access to documents that are little known or otherwise difficult to find, even within the countries or communities where they have been recovered.

Digitization, a process that leads to the creation of materials in a digital format, involves a number of logistical and technical processes. These were put to the test by the volume, diversity, origin, and cultural value of the materials uncovered specifically for the ICAA Documents Project. Digitizing historical materials has presented a unique set of challenges for the ICAA. In many instances, the ICAA’s imaging professionals were limited by the quality of the documents that the research teams had access to in the field. In certain cases, they were also limited by restrictions on, or modifications to, a document’s layout required by the copyright holder. The source images for rare documents were often in less-than-optimal condition. In certain extreme cases, the original documents had already been irrevocably lost and survive solely as copies or other facsimiles in personal archives or in library special collections that remain closed to the general public.

Digital Images

The process of digital conversion demands continuous and creative attention to ever-changing standards and methods for scanning and other digital-conversion systems. The ICAA created a set of scanning and quality standards in response to the unique collection of images retrieved.

When digital images arrive at the ICAA, they undergo a series of modifications before they are uploaded to the Digital Archive. For each document available on the Documents Project’s Digital Archive, master digital image files have been created using a flatbed scanner or a digital camera and stored in an uncompressed TIFF format. Each image file has been labeled with a unique number that corresponds to the equivalent document’s Record ID. Destined for archival storage, these original digital files are not altered, enhanced, or otherwise corrected, creating a record of a document’s authentic form and condition at the time that the scan was made.
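The pairing of archival masters with Record IDs can be illustrated with a minimal sketch. It assumes a hypothetical naming scheme in which the file stem is the Record ID (for example `100001.tif`) and a single image per record; neither the scheme nor the sample IDs come from the ICAA. The helper names are likewise invented for illustration.

```python
from pathlib import PurePath

# Assumed extensions for the two roles described in the text.
MASTER_SUFFIXES = {".tif", ".tiff"}   # uncompressed archival masters
ACCESS_SUFFIXES = {".jpg", ".jpeg"}   # compressed web copies


def group_by_record_id(filenames):
    """Group file names by Record ID, assuming the stem IS the Record ID."""
    groups = {}
    for name in filenames:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix in MASTER_SUFFIXES:
            role = "master"
        elif suffix in ACCESS_SUFFIXES:
            role = "access"
        else:
            continue  # ignore files that are neither masters nor access copies
        groups.setdefault(path.stem, {})[role] = name
    return groups


def missing_access_copies(groups):
    """Return Record IDs that have a master file but no access copy yet."""
    return sorted(
        record_id
        for record_id, files in groups.items()
        if "master" in files and "access" not in files
    )


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = ["100001.tif", "100001.jpg", "100002.TIFF", "notes.txt"]
    print(missing_access_copies(group_by_record_id(files)))
```

A check of this kind only reads file names; it never opens or modifies the masters, which matches the principle that archival files stay unaltered.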

Simultaneously, these archival images are converted to digital files that can be accessed by Internet users under the best possible conditions. Lower-resolution copies, or “access” files, are created for publication on the web and are enhanced by professional imaging software, including Photoshop, to ensure the readability of the recovered documents. These files are resized to 8 ½ x 11 inches (215.9 mm × 279.4 mm) for printing purposes and stored in a compressed JPEG format. In addition, digital images of especially important, large, oddly shaped, or difficult-to-read documents have undergone customized processes in order to improve their readability, to facilitate their enlargement on the desktop, and to enable users to study their fine details.
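The millimetre figures quoted above correspond to a page of 8.5 × 11 inches (US Letter), which the short calculation below confirms. The pixel dimensions depend on the output resolution, which the text does not state; the 150 dpi value used here is purely an assumption for illustration.

```python
MM_PER_INCH = 25.4  # exact by definition of the international inch


def inches_to_mm(inches: float) -> float:
    """Convert inches to millimetres, rounded to one decimal place."""
    return round(inches * MM_PER_INCH, 1)


def pixel_dimensions(width_in: float, height_in: float, dpi: int):
    """Pixel size of a page of the given physical size at a given resolution."""
    return (round(width_in * dpi), round(height_in * dpi))


# Physical page size of the access files.
page_mm = (inches_to_mm(8.5), inches_to_mm(11))
print(page_mm)  # (215.9, 279.4)

# Assumed resolution of 150 dpi (not stated in the source).
print(pixel_dimensions(8.5, 11, 150))  # (1275, 1650)
```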

Additionally, low-resolution images available on the ICAA Digital Archive have been embedded with a watermark that precludes users from appropriating the images for any purpose other than that associated with the research and educational initiatives for which the ICAA’s website has been developed. These web images provide immediate access to good-quality reference copies of the recovered documents.

Format

Format refers to the physical qualities of a digitized document’s original medium and to the set of processes those qualities call for; these constraints affect the choice of digitization equipment and the associated quality standards.

The ICAA digitizes many types of materials, including printed materials (monographs and periodicals), manuscripts (letters, journals, notes, postcards, drawings, etc.), posters and prints, photographs, and so forth. These materials have been digitized from their original form, from facsimiles, or from microform (microfilm and microfiche). The ICAA systematically checks the quality of such digitized materials before they are placed online. After being scanned, enhanced, and reviewed, each document is converted to a PDF and uploaded to the website for user access.

Conversion to text format

The ICAA has subjected the uploaded PDF files to an OCR (Optical Character Recognition) process. OCR is used to optimize the image files so that users may search and find key words, terms, characters, and numbers in an image. Conversion is done automatically by professional software. The quality of the OCR is highly dependent upon the quality and clarity of the primary document and the different parameters at play during the digitization process. Special conditions that affect OCR include fonts that are either very small or very large and/or that have oddly spaced characters. Handwritten texts and materials using accents and other diacritical marks are also difficult to process accurately because of their unique characters.
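To illustrate why accents and other diacritical marks complicate searching OCR output, the sketch below shows one common mitigation: folding both the query and the recognized text to an accent-free, case-insensitive form before matching. This is an illustrative technique only, not a description of the ICAA’s software; the function names and sample strings are hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks (accents, tildes) and normalize case."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def contains_term(ocr_text: str, term: str) -> bool:
    """True if the term occurs in the OCR text, ignoring accents and case."""
    return fold(term) in fold(ocr_text)


if __name__ == "__main__":
    # The OCR output may keep or drop the accent; either way the term is found.
    print(contains_term("Exposición de arte moderno", "exposicion"))
    print(contains_term("Exposicion de arte moderno", "Exposición"))
```

The trade-off is a loss of precision: folding also makes distinct words such as “año” and “ano” indistinguishable, so an approach like this improves recall at the cost of some false matches.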

Resources

For more information about some of the standards and best practices that the ICAA’s imaging staff and research teams have followed and implemented over the years, please contact Maria B. McGreger: mmgreger@mfah.org
