4.0080 SIL Spelling Checking; OCR Errors (63)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Sat, 19 May 90 20:00:26 EDT

Humanist Discussion Group, Vol. 4, No. 0080. Saturday, 19 May 1990.


(1) Date: Fri, 18 May 90 08:55:36 MDT (52 lines)
From: koontz@alpha.bldr.nist.gov (John E. Koontz)
Subject: Re: 4.0071 Queries

(2) Date: Thu, 17 May 90 21:03:56 CST (11 lines)
From: Peter Shillingsburg <SHILL@MSSTATE.BITNET>
Subject: Re: 4.0050 OCR Scanning Errors

(1) --------------------------------------------------------------------
Date: Fri, 18 May 90 08:55:36 MDT
From: koontz@alpha.bldr.nist.gov (John E. Koontz)
Subject: Re: 4.0071 Queries (53)

The SIL spelling checking package in which Douglas de Lacey was
interested is distributed as a booklet with an accompanying software
disk. The booklet is:

Black, Andy; Kuhl, Fred; Kuhl, Kathy; Weber, David. 1987. Document
preparation aids for non-major languages. Occasional Publications
in Academic Publishing 7. 44 pp.

It can be ordered from:

International Academic Bookstore
Summer Institute of Linguistics
7500 W. Camp Wisdom Road
Dallas, Texas 75236

The price with disk is something like $4.00 or $5.00, with handling. I
suppose that the handling fee would be more for overseas mailing. The
Bookstore will quote you the price, if you write to them. Let me
emphasize that you need to specify 5.25 vs. 3.5 inch disk format when
ordering.

I have gone through the documentation, but have not had occasion to use
the tools, which consist of a series of compiled C programs. They are
designed for spelling checking in non-major languages (e.g., minor and
not so minor indigenous languages of South America and Indonesia, in the
authors' case). The programs allow various combinations of checking words
by linguistic canonical form (e.g., CVCCV, (CCVC)*, etc.) or by
membership in a list of explicitly permitted exceptions. Interactive
changes or batch editing are supported.
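
To make the canonical-form idea concrete, here is a minimal sketch in
Python. It is not the SIL code or its interface: the vowel inventory,
the pattern list, and the names cv_shape and make_checker are all
assumptions made up for illustration. Each word is reduced to a string
of C's and V's, and the word passes if that shape matches one of the
permitted patterns or if the word is in the exception list.

```python
import re

# Assumed vowel inventory, for illustration only; a real setup would
# take this from the orthography of the language being checked.
VOWELS = set("aiu")


def cv_shape(word, vowels=VOWELS):
    """Reduce a word to its consonant/vowel skeleton, e.g. 'wasi' -> 'CVCV'."""
    return "".join("V" if ch in vowels else "C" for ch in word.lower())


def make_checker(patterns, exceptions=()):
    """Build a predicate that accepts a word if it is a listed exception
    or if its C/V shape fully matches one of the permitted patterns.

    The patterns are ordinary regular expressions over the letters C and V.
    """
    compiled = [re.compile(p) for p in patterns]
    allowed = {w.lower() for w in exceptions}

    def is_acceptable(word):
        if word.lower() in allowed:
            return True
        shape = cv_shape(word)
        return any(rx.fullmatch(shape) for rx in compiled)

    return is_acceptable


# Hypothetical pattern set: one fixed shape plus repeated CV(C) syllables.
check = make_checker(patterns=["CVCCV", "(CVC?)+"], exceptions=["ok"])

for w in ["wasi", "runtu", "strk", "ok"]:
    print(w, cv_shape(w), check(w))
```

This sketch treats every letter as one segment, so a digraph such as
"ch" counts as two consonants; a practical version would tokenize
digraphs first.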

The approach of canonical forms is not particularly effective with
English, due to our spelling system, but can be useful with languages
like Quechua, when spelled in a phonemic notation. The problems with
checking the spelling of a Quechua document are, of course, that
Quechua spelling lists for standard spelling checkers are hard to come
by, and that Quechua is highly agglutinative (in something like the
Turkish style).

One nice feature of the tools is that they can be set to work only with
particular fields in a file. So, if your interlinear file has \qch at
the head of Quechua fields and \eng at the head of English fields, the
checkers can be made to confine their attentions to the forms in the \qch
fields. The field format employed is SIL's Standard Format.
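
The field restriction can be sketched as follows, again in Python and
again as an illustration rather than the SIL implementation. The
function name words_in_fields is hypothetical, and the sketch assumes
that a field starts on a line beginning with a backslash marker and
that unmarked lines continue the preceding field.

```python
def words_in_fields(lines, marker):
    """Yield whitespace-separated words from fields with the given marker.

    A line starting with a backslash opens a new field; any other line
    is treated as a continuation of the field currently open.
    """
    current = None
    for line in lines:
        line = line.rstrip()
        if line.startswith("\\"):
            parts = line.split(None, 1)
            current = parts[0][1:]          # marker name without the backslash
            text = parts[1] if len(parts) > 1 else ""
        else:
            text = line
        if current == marker:
            yield from text.split()


# Made-up interlinear sample for illustration.
sample = [
    "\\qch wasi runtu",
    "\\eng house egg",
    "\\qch allqu",
    "  mikhun",
    "\\eng dog eats",
]

print(list(words_in_fields(sample, "qch")))
```

Only the words from the \qch fields would then be handed to the
checker; the \eng fields are skipped entirely. Punctuation is left
attached to the words here and would need stripping in real use.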

The authors suggest that users might want to modify the general
framework of the code to produce new tools. The source code is
available from JAARS, an SIL subsidiary, though I have not ordered it.

(2) --------------------------------------------------------------------
Date: Thu, 17 May 90 21:03:56 CST
From: Peter Shillingsburg <SHILL@MSSTATE.BITNET>
Subject: Re: 4.0050 OCR Scanning Errors (197)

This is a reply to John Koontz on "correcting" OCR texts:
You appear to think that a correctly scanned text is one that has all
the words spelled correctly. Do you recognize that a correctly scanned
text is one that has all the same errors as the text scanned from?
I tried the fatal shortcut of a spell checker on texts scanned for use
in collations for a scholarly edition. The result was dishonest--no,
that is a moral term; it was just foolish.