4.0101 Errors and CD-ROMs

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Tue, 22 May 90 17:44:17 EDT

Humanist Discussion Group, Vol. 4, No. 0101. Tuesday, 22 May 1990.

Date: Tue, 22 May 90 10:32:18 EDT
From: Ken Steele <KSTEELE@vm.epas.utoronto.ca>
Subject: Errors & CD-ROMs


Douglas de Lacey inquires about the general reliability of commercially
available CD-ROM databases, such as the British Library General
Catalogue. I don't yet have the luxury of a CD-ROM drive, but I
would like to approach the matter in a somewhat oblique way which will,
I hope, stimulate further discussion.

Last year I ordered the ETC WordCruncher electronic Shakespeare (based
on the Riverside edition by G. Blakemore Evans). Naturally (naively?)
I expected a reliable research tool, but I was alarmed to discover a
high frequency of errors and inaccuracies -- also reported by others who
have used the software. WordCruncher invites corrections by mail, but
does not announce or identify upgrade releases, nor does its proprietary
encryption permit the user to (legally) correct his or her own copy, so
one is, in effect, permanently stuck with the errors.

Likewise, in my work with the old-spelling transcriptions of the quarto
and folio texts of Shakespeare (available through the Oxford Text
Archive), I have been alarmed to discover errors even in those texts
which T.H. Howard-Hill proofread repeatedly before preparing his
Shakespeare concordances (and also in those texts I myself have
proofread letter-by-letter more than once!). Similar errors were
reported by Stanley Wells and Gary Taylor, who used these files in
preparation of the text for the Oxford edition of the Complete Works
(which many argue also includes errors, both intentional and
unintentional).

If such errors can proliferate in commercial and academic texts of about
10-15 megabytes, it is not surprising that a 500-megabyte CD-ROM, at the
same error density, should contain some fifty times as many. Even a
rate of one error per megabyte becomes considerable as technology
develops and databases grow.
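To make the arithmetic explicit (a back-of-the-envelope figure of my
own, assuming only that errors occur at a uniform density however large
the collection grows):

    \[
    E = \rho S, \qquad \rho = 1\ \text{error/MB}, \quad S = 500\ \text{MB}
    \;\Longrightarrow\; E = 500\ \text{errors},
    \]

against a mere ten or fifteen in a text the size of an electronic
Shakespeare.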

It seems to me that accuracy has long been the rallying cry of those who
distrust computing technology -- whether for searching a card catalogue,
preparing a collated edition, or simply looking up a reference. I used
to counter vehemently that the ease of correction in electronic texts
made perfection (eventually) attainable, but I have become less vehement
of late.

Reading a recently published collection of short stories (in paperback),
I was struck by the number of typographical errors even there. In
reading the book, of course, I consciously and unconsciously made
allowances for the errors, as would any human reader. Perhaps
electronic texts prepared from paper texts are inevitably going to
contain higher error rates (because most people seem to find
proof-reading on-screen so difficult?), but even conventional printed
publications are error-prone (and not only in Renaissance England).

Perhaps we should concentrate our efforts on producing software which
can accommodate human imperfection in all its permutations, rather than
attempting to produce error-free information resources and so remake
man in the image of the machine? Error becomes particularly undeniable
as projects grow more ambitious: scanning the entire Library of
Congress, providing access to newspapers on-line, or even filling a
CD-ROM with literary texts.
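One direction such software might take is approximate rather than exact
matching, so that a search still retrieves a passage whose transcription
contains a typo. The sketch below is purely my own illustration of that
idea: the function names, the edit-distance approach, and the error
threshold are hypothetical, and describe no existing retrieval package
(certainly not WordCruncher).

    # A minimal sketch of error-tolerant searching: score candidate lines
    # by edit (Levenshtein) distance instead of demanding an exact match,
    # so a query still finds passages whose transcription contains errors.
    # Names, threshold, and approach are illustrative assumptions, not a
    # description of WordCruncher or any existing software.

    def edit_distance(a: str, b: str) -> int:
        """Classic dynamic-programming Levenshtein distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                  # deletion
                                curr[j - 1] + 1,              # insertion
                                prev[j - 1] + (ca != cb)))    # substitution
            prev = curr
        return prev[-1]

    def fuzzy_find(query, lines, max_errors=2):
        """Return lines containing a query-sized window within
        max_errors edits of the query."""
        q = query.lower()
        hits = []
        for line in lines:
            text = line.lower()
            # Slide a query-sized window across the line; keep the best score.
            best = min(edit_distance(q, text[i:i + len(q)])
                       for i in range(max(1, len(text) - len(q) + 1)))
            if best <= max_errors:
                hits.append(line)
        return hits

    if __name__ == "__main__":
        corpus = ["Now is the winter of our discontent",
                  "Now is the wintre of our disconten",   # garbled transcription
                  "Once more unto the breach, dear friends"]
        # Finds both the clean and the garbled line despite three errors.
        print(fuzzy_find("winter of our discontent", corpus, max_errors=3))

The quadratic window scan would of course be far too slow for a
500-megabyte disc, and a real system would filter through an index
first; the point is only that the software, rather than the reader,
can be made to absorb the imperfection.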

Can anyone offer more theoretical insights on error rates or human
fallibility? Has any careful research been published on the subject?
(My apologies for the length of this note).

Ken Steele
University of Toronto