16.264 OCRing

From: Humanist Discussion Group (by way of Willard McCarty (w.mccarty@btinternet.com))
Date: Mon Oct 14 2002 - 03:19:00 EDT


                   Humanist Discussion Group, Vol. 16, No. 264.
           Centre for Computing in the Humanities, King's College London
                         Submit to: humanist@princeton.edu

             Date: Mon, 14 Oct 2002 08:15:32 +0100
             From: "Arianna Ciula" <ciula@media.unisi.it>
             Subject: Re: 16.262 OCRing the OLD

    I am a PhD student in book sciences at the University of Siena. Before my
    degree in Communication Sciences I worked a little in the OCR field. The
    experiments we did at the ILC (Institute for Computational Linguistics) of
    the CNR in Pisa with Professor Andrea Bozzi focused on the automatic
    recognition of old printed books (from around 1600), with the help of a
    linguistic module for Latin. The new software was called LAperLA (Lettore
    automatico per libri antichi, "automatic reader for old books").
    The system is trainable, so perhaps it could be adapted to your research.


    Arianna Ciula
    Dipartimento Documentazione e Tradizioni Culturali
    Università degli Studi di Siena - Italy
    ----- Original Message -----
    From: "Humanist Discussion Group (by way of Willard McCarty
    <willard.mccarty@kcl.ac.uk>)" <willard@lists.village.virginia.edu>
    To: <humanist@Princeton.EDU>
    Sent: Thursday, October 10, 2002 11:28 AM

    > Humanist Discussion Group, Vol. 16, No. 262.
    > Centre for Computing in the Humanities, King's College London
    > www.kcl.ac.uk/humanities/cch/humanist/
    > Submit to: humanist@princeton.edu
    > Date: Thu, 10 Oct 2002 10:06:20 +0100
    > From: Willard McCarty <w.mccarty@btinternet.com>
    > Subject: OCRing the OLD
    > This is in response to Humanist 16.259. Putting aside the legality of
    > digitizing the Oxford Latin Dictionary (currently very much in print, and
    > not cheap), some experiments I did years ago with pages of the Dictionary
    > suggest that a great deal of proofreading and correction would be
    > required. There are three columns per page, which, when I experimented
    > with scanning, had to be defined manually; the software may have improved
    > in that regard since then. A spelling dictionary, if one could be located,
    > would not help with the numerous abbreviations. The type used for
    > quotations and references is very small. Overall it looks to me like a
    > manual-entry job.
    > Unfortunately the OLD was, as I recall being told, the last book, or at
    > least the last major reference book, for which OUP used metal plates. So
    > there are no tapes or other digital storage media to be accessed. My
    > contact at the Press said it would welcome a digitized version for
    > research purposes, as would thousands of scholars, no doubt. Perhaps
    > someone here knows whether the Press has begun or is contemplating a
    > digitization project with the OLD?
    > There is, of course, the Lewis & Short, provided online by Perseus,
    > www.perseus.tufts.edu.
    > Yours,
    > WM
    > Dr Willard McCarty | Senior Lecturer | Centre for Computing in the
    > Humanities | King's College London | Strand | London WC2R 2LS || +44 (0)20
    > 7848-2784 fax: -2980 || willard.mccarty@kcl.ac.uk |
    > w.mccarty@btinternet.com | www.kcl.ac.uk/humanities/cch/wlm/
