16.343 OCR software and hardware

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Sat Nov 23 2002 - 04:22:12 EST


                   Humanist Discussion Group, Vol. 16, No. 343.
           Centre for Computing in the Humanities, King's College London
                         Submit to: humanist@princeton.edu

       [1] From: Brian Whatcott <betwys@DIRECTVInternet.com> (26)
             Subject: Re: 16.338 best OCR software?

       [2] From: Willard McCarty <willard.mccarty@kcl.ac.uk> (36)
             Subject: OCR software testing & research; hardware

             Date: Sat, 23 Nov 2002 09:03:57 +0000
             From: Brian Whatcott <betwys@DIRECTVInternet.com>
             Subject: Re: 16.338 best OCR software?

    At 01:05 AM 11/22/02, you wrote:

    > From: Christian Koch <christian.koch@oberlin.edu>
    > Date: Fri, 15 Nov 2002 16:17:40
    > I'm wondering if some of you have a clear favorite among optical
    > character recognition programs (OCRs)? ...
    > Have any of you found that it makes a significant difference as
    > to the quality of the scanner (flatbed) that you use? ...
    > Many thanks!
    > Christian Koch
    > Oberlin College
    > Oberlin, Ohio

    I can only offer a straw-poll entry based on casual use.
    In this category, TextBridge Classic, a Xerox product,
    appears to do well enough. It makes the usual proportion of
    mistakes, which are intuitive: a lower-case d, for example,
    might be misread as c and l. I don't find that correcting this
    proportion (under 0.1%) is onerous. I have used two scanners,
    both cheap, and as is the custom, cheap is getting more capable
    as time passes. The USB port which the current Canon
    scanner uses is a blessed relief from the unwieldy printer ports
    formerly used for input. I use Win98.
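
    A figure such as "under 0.1%" is usually read as a character error
    rate: the number of single-character edits needed to turn the OCR
    output into the correct text, divided by the length of the correct
    text. The following is a minimal sketch of that calculation, not a
    description of how TextBridge or the poster measured anything; the
    function names are hypothetical and the metric (Levenshtein distance)
    is an assumption.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # add a hypothesis character
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edits per reference character (assumed definition of 'error rate')."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

    Under this definition the d-to-"cl" confusion mentioned above costs two
    edits: edit_distance("made", "macle") is 2, one substitution plus one
    insertion. At a rate below 0.1%, a page of 2,000 characters would need
    fewer than two such corrections.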

    Brian Whatcott
        Altus OK Eureka!

             Date: Sat, 23 Nov 2002 09:06:54 +0000
             From: Willard McCarty <willard.mccarty@kcl.ac.uk>
             Subject: OCR hardware

    In response to Chris Koch's query about OCR software, allow me to suggest
    beginning with the Information Science Research Institute, University of
    Nevada, Las Vegas (US), which has for years been concerned with the
    evaluation of systems. See <http://www.isri.unlv.edu/>, esp. under
    Publications / Journal Articles and Technical Reports. Note also the
    Center of Excellence for Document Recognition and Analysis, State
    University of New York (Buffalo). Then there are the extensive activities
    of the Centre for Pattern Recognition and Machine Intelligence at
    Concordia University (Canada), <http://www.cenparmi.concordia.ca/>, esp.
    under publications. Many of these publications are about the sort of
    leading-edge research that is not of much use to someone wanting to scan
    documents accurately NOW.

    Allow me also to report on a wonderful gadget that does more than well
    enough under circumstances of note-taking. This is the C-Pen, www.cpen.com,
    a handheld, pen-like device for scanning lines of printed text and
    producing e-text. I purchased the C-Pen 600C some time ago and have been
    using it for capturing passages from books I am reading. One's arm and hand
    guiding a rigid device over the curved surface of a book-page cannot easily
    achieve smooth accuracy of movement, but the character-recognition
    capability of the device is usually good enough to compensate for minor
    variations and the deep valley of the book's gutter. The pre-installed
    software compensates for hyphens at the ends of lines, i.e. deletes them
    and brings the severed parts of the affected words together.
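
    The hyphen handling just described can be sketched in a few lines. This
    is only an illustration of the general technique, not the C-Pen's actual
    software, whose internals are not documented here; the function name is
    hypothetical, and the sketch assumes that every hyphen at the end of a
    scanned line marks a word split by the line break.

```python
def join_hyphenated_lines(lines):
    """Merge scanned lines into one string, rejoining words split by a hyphen.

    Assumption: a hyphen at the very end of a line is always a line-break
    hyphen, so it is deleted and the two fragments are joined.
    """
    pieces = []
    carry = ""  # word fragment left over from a line that ended with "-"
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines, keeping any pending fragment
        text = carry + text
        carry = ""
        if text.endswith("-"):
            # Hold back the last word (minus its hyphen) for the next line.
            head, _, tail = text.rpartition(" ")
            carry = tail[:-1]
            text = head
        if text:
            pieces.append(text)
    if carry:
        # The final line ended with a hyphen and nothing follows: restore it.
        pieces.append(carry + "-")
    return " ".join(pieces)
```

    For example, the two scanned lines "the character-recog-" and
    "nition capability" come out as "the character-recognition capability".
    The stated assumption has a cost: a genuine compound that happens to
    break at its own hyphen (say "pen-" followed by "like") would be merged
    into "penlike", so a dictionary check would be needed to do better.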

    I have not attempted to measure accuracy, but then I don't care as much
    about that as I had thought I might. If I were attempting to digitize
    significant amounts of text for subsequent analysis or publishing, I would
    never, never use the C-Pen -- for one thing, line-at-a-time scanning is
    simply too slow. But in combination with a handheld computer, such as a
    Palm (into which one can beam the newly scanned text), it is a marvel.

    Note-taking is a very individual thing -- I suppose that's true almost by
    definition. In my practice I want sometimes to paraphrase, sometimes to
    capture exactly. The C-Pen allows for much less troublesome capture. Of
    course, as a result, nowadays I end up with many more and much longer exact
    transcriptions. It will take some time before I can observe the effects on
    my research, which is bound to be affected somehow. But at this early stage
    I am certainly pleased enough not to be regretting the expense.

    I have tried using the C-Pen with Palm in public places, e.g. on trains,
    but find that its use attracts too much (admittedly polite, this being
    London) attention and the amount of space one has makes the fiddling too
    fiddly. But settled in a chair, at home or in the library, one has all one


    Dr Willard McCarty | Senior Lecturer | Centre for Computing in the
    Humanities | King's College London | Strand | London WC2R 2LS || +44 (0)20
    7848-2784 fax: -2980 || willard.mccarty@kcl.ac.uk

    This archive was generated by hypermail 2b30 : Sat Nov 23 2002 - 04:26:13 EST