
Humanist Discussion Group



Humanist Archives: Feb. 16, 2019, 6:12 a.m. Humanist 32.462 - pubs: research agenda for historical and multilingual OCR

                  Humanist Discussion Group, Vol. 32, No. 462.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org




        Date: 2019-02-15 15:12:21+00:00
        From: Brett Bobley 
        Subject: A Research Agenda for Historical and Multilingual Optical Character Recognition

Dear Colleagues,

I write about a new report that may be of interest to scholars, librarians,
computer and information scientists, and funders:
"A Research Agenda for Historical and Multilingual Optical Character
Recognition"
https://ocr.northeastern.edu/report/

The report, funded by The Andrew W. Mellon Foundation and authored by David
Smith and Ryan Cordell of Northeastern University, outlines a set of nine
recommendations to improve historical and multilingual OCR.

The idea for this report came about several years ago when staff in the Office
of Digital Humanities (ODH) at the NEH noticed that a large number of ODH-funded
projects working with textual materials were stymied or slowed by poor-quality
OCR, particularly of historical texts or those in non-Latin scripts. This
observation led to discussions with grantees and with staff at both the Mellon
Foundation and the Library of Congress. Because Mellon staff were already
exploring ways to improve the OCR of digitized texts in Arabic and other
connected scripts, and LC was seeking greater accuracy in the OCR of its large
digitized collection of historical newspapers, we all agreed that a report was
needed assessing the state of the art in OCR and identifying key research tasks
that might help advance the quality of OCR for a variety of textual materials.

The authors of the report interviewed scholars, librarians, and scientists in
the US and abroad as well as people in industry. The report includes nine key
recommendations that they'd like to see funders and the field address. I think
addressing these recommendations, over time, could have a positive impact on
the field.

My team at the NEH is certainly keen to receive proposals in this area. Please
see our recent blog post for more information:
https://www.neh.gov/blog/neh-invites-proposals-respond-historical-and-multilingual-ocr-report

Brett Bobley
Director, Office of Digital Humanities
National Endowment for the Humanities
Washington, DC





Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
