Humanist Discussion Group, Vol. 16, No. 343.
Centre for Computing in the Humanities, King's College London
Submit to: firstname.lastname@example.org
 From: Brian Whatcott <betwys@DIRECTVInternet.com> (26)
Subject: Re: 16.338 best OCR software?
 From: Willard McCarty <email@example.com> (36)
Subject: OCR software testing & research; hardware
Date: Sat, 23 Nov 2002 09:03:57 +0000
From: Brian Whatcott <betwys@DIRECTVInternet.com>
Subject: Re: 16.338 best OCR software?
At 01:05 AM 11/22/02, you wrote:
> from: Christian Koch <firstname.lastname@example.org>
> date: Fri, 15 Nov 2002 16:17:40
> I'm wondering if some of you have a clear favorite among optical
> character recognition programs (OCRs)? ...
> Have any of you found that it makes a significant difference as
> to the quality of the scanner (flatbed) that you use? ...
> Many thanks!
> Christian Koch
> Oberlin College
> Oberlin, Ohio
I can only offer a straw poll entry based on casual use.
In this category, TextBridge Classic, a Xerox product,
appears to do well enough. It makes the usual proportion of
mistakes, which are intuitive - so that a lower-case d might
be misread as c and l. I don't find that correcting this proportion
- under 0.1% - is onerous. I have used two scanners - both
cheap, and as is the custom, cheap is getting more capable
as time passes. The USB port which the current Canon
scanner uses is a blessed relief from unwieldy printer ports
formerly used for input. I use Win98.
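
For anyone wanting to put a number on such a proportion, here is a minimal
sketch, assuming a hand-corrected copy of the page is available to compare
against. It counts the single-character edits (Levenshtein distance) needed
to turn the OCR output into the corrected text and divides by the length of
the corrected text. The function names are illustrative only and are not
taken from TextBridge or any other OCR product.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(truth: str, ocr: str) -> float:
    """Edits per character of the corrected (ground-truth) text."""
    if not truth:
        raise ValueError("ground-truth text must not be empty")
    return edit_distance(truth, ocr) / len(truth)
```

On this measure a d that comes out as c and l counts as two errors (one
substitution, one insertion), so "dog" read as "clog" is at distance 2.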
Altus OK Eureka!
Date: Sat, 23 Nov 2002 09:06:54 +0000
From: Willard McCarty <email@example.com>
Subject: OCR hardware
In response to Chris Koch's query about OCR software, allow me to suggest
beginning with the Information Science Research Institute, University of
Nevada (US), which has for years been concerned with evaluation of systems.
See <http://www.isri.unlv.edu/>, esp. under Publications / Journal Articles
and Technical Reports. Note also the Center of Excellence for Document
Recognition and Analysis, State University of New York (Buffalo). Then
there are the extensive activities of the Centre for Pattern Recognition
and Machine Intelligence at Concordia University (Canada),
<http://www.cenparmi.concordia.ca/>, esp. under publications. Many of these
publications are about the sort of leading-edge research not of much use to
someone wanting to scan documents accurately NOW.
Allow me also to report on a wonderful gadget that does more than well
enough under circumstances of note-taking. This is the C-Pen, www.cpen.com,
a handheld, pen-like device for scanning lines of printed text and
producing e-text. I purchased the C-Pen 600C some time ago and have been
using it for capturing passages from books I am reading. One's arm and hand
guiding a rigid device over the curved surface of a book-page cannot easily
achieve smooth accuracy of movement, but the character-recognition
capability of the device is usually good enough to compensate for minor
variations and the deep valley of the book's gutter. The pre-installed
software compensates for hyphens at the ends of lines, i.e. deletes them
and brings the severed parts of the affected words together.
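
The hyphen handling described above can be sketched roughly as follows. This
is only an illustration of the general idea, not the C-Pen's actual software,
whose internals are not documented here; the function name and the
lower-case heuristic are assumptions. Each scanned line is appended to the
running text, and a trailing hyphen is dropped when the next line begins with
a lower-case letter.

```python
def join_scanned_lines(lines):
    """Join line-at-a-time scans into running text.

    Assumed heuristic: a hyphen at the end of a line is a line-break
    hyphen if the next line starts with a lower-case letter; otherwise
    the hyphen is kept (e.g. before a capitalised word).
    """
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty scans
        if not out:
            out = line
        elif out.endswith("-"):
            if line[0].islower():
                out = out[:-1] + line  # drop the hyphen, rejoin the word
            else:
                out = out + line       # keep the hyphen, no space
        else:
            out = out + " " + line
    return out
```

A rule this simple will also swallow the hyphen of a genuine compound that
happens to break at the end of a line (for example "pen-" / "like"), so a
check against a word list would be needed for more careful work.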
I have not attempted to measure accuracy, but then I don't care as much
about that as I had thought I might. If I were attempting to digitize
significant amounts of text for subsequent analysis or publishing, I would
never, never use the C-Pen -- for one thing, line-at-a-time scanning is
simply too slow. But in combination with a handheld computer, such as a
Palm (into which one can beam the newly scanned text), it is a marvel.
Note-taking is a very individual thing -- I suppose that's true almost by
definition. In my practice I want sometimes to paraphrase, sometimes to
capture exactly. The C-Pen allows for much less troublesome capture. Of
course, as a result, nowadays I end up with many more and much longer exact
transcriptions. It will take some time before I can observe the effects on
my research, which is bound to be affected somehow. But at this early stage
I am certainly pleased enough not to be regretting the expense.
I have tried using the C-Pen with Palm in public places, e.g. on trains,
but find that its use attracts too much (admittedly polite, this being
London) attention and the amount of space one has makes the fiddling too
fiddly. But settled in a chair, at home or in the library, one has all one
Dr Willard McCarty | Senior Lecturer | Centre for Computing in the
Humanities | King's College London | Strand | London WC2R 2LS || +44 (0)20
7848-2784 fax: -2980 || firstname.lastname@example.org
This archive was generated by hypermail 2b30 : Sat Nov 23 2002 - 04:26:13 EST