9.666 OCR

Humanist (mccarty@phoenix.Princeton.EDU)
Fri, 29 Mar 1996 19:31:09 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 666.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
Information at http://www.princeton.edu/~mccarty/humanist/

[1] From: Nick Finke <Nick.Finke@law.uc.edu> (39)
Subject: Re: 9.651 the state of OCR?

For detailed information, and leads to yet more, I would check out the
Web page of the Information Science Research Institute at the
University of Nevada - Las Vegas <http://www.isri.unlv.edu>.

The ISRI Annual Reports for the last three years are available from this
site. The Annual Report contains, inter alia, their survey of OCR engines.
You can learn more about OCR than you ever wanted to, but there is a lot
of very helpful info as well. The major limitation of the survey is that
each vendor is allowed to submit only one engine. In many cases that
engine is a beta version, and it is often a top-of-the-line product that
runs on UNIX. This might be less than helpful for someone who is scanning
with a desktop machine, but there is usually a related product (perhaps
slightly less powerful or lacking some high-end features) that runs on a
desktop OS.

Hope this helps,

Nick Finke

>Last year at about this time I made a comparative survey of OCR software and
>found, in brief, the following:
>
>-- OmniPage Pro was clearly the best;
>-- Cuneiform was quite good but not trainable;
>-- TextBridge, despite all the noise, was not worth the trouble
> for the kinds of texts we usually handle.
>
>Would any Humanist care to comment on the state of the OCR art now? Is the
>above assessment, though exceedingly crude, still essentially correct?
>
>I would also appreciate any sources of detailed information. If you think I
>have it all wrong, or am seriously incorrect about any of the above, please
>set the record straight.
>
>Thanks very much.
>
>WM

*******************************************************************
* Nicholas D. Finke                         Phone:(513)556-0103   *
* Center for Electronic Text in the Law     Fax:  (513)556-6265   *
* University of Cincinnati College of Law   Email:                *
* P.O. Box 210142                           nick.finke@law.uc.edu *
* Cincinnati, OH 45221-0142                                       *
*******************************************************************