5.0436 Etext Integrity (1/31)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Sun, 10 Nov 1991 20:47:00 EST

Humanist Discussion Group, Vol. 5, No. 0436. Sunday, 10 Nov 1991.

Date: Sat, 09 Nov 91 16:50:39 -0800
From: abosse@reed.edu
Subject: Etext file corruption

Does anyone know of something akin to 'checksum' file structure checks for
text files? I am thinking of a formula that would take account of
various characteristics of a text file's structure, such as the number of
words, the number of lines, and so on, to generate a unique numerical
'signature.' This value could then be 'checked' every time the file is
loaded to ensure the data's 'integrity.'
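
To make the idea concrete, here is a minimal sketch in Python of what such
a 'signature' might look like. Everything in it is an assumption made for
illustration: the names TextSignature, text_signature and is_intact are
hypothetical, and the SHA-256 digest is a swapped-in technique, added
because line and word counts alone cannot tell apart two texts that differ
by a single substituted letter.

```python
import hashlib
from typing import NamedTuple


class TextSignature(NamedTuple):
    """Hypothetical signature: two structural counts plus a content digest."""
    lines: int
    words: int
    digest: str  # hex SHA-256 of the raw bytes


def text_signature(data: bytes) -> TextSignature:
    """Compute the line count, word count and SHA-256 digest of a text."""
    # Decode leniently so that damaged bytes still yield a signature.
    text = data.decode("utf-8", errors="replace")
    return TextSignature(
        lines=len(text.splitlines()),
        words=len(text.split()),
        digest=hashlib.sha256(data).hexdigest(),
    )


def is_intact(data: bytes, expected: TextSignature) -> bool:
    """Return True if the data still matches the recorded signature."""
    return text_signature(data) == expected


original = (
    b"Whan that Aprille with his shoures soote\n"
    b"The droghte of March hath perced to the roote\n"
)
recorded = text_signature(original)

# A one-letter change keeps both counts the same but alters the digest.
altered = original.replace(b"March", b"Marsh")

print(recorded.lines, recorded.words)
print(is_intact(original, recorded))
print(is_intact(altered, recorded))
```

In this sketch the altered text has the same number of lines and words as
the original, so only the digest exposes the change. The recorded signature
would also have to be kept somewhere the file's editors cannot reach, since
anyone able to rewrite both the text and its signature could make the check
pass.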

The reason I ask is that a growing number of etexts are being made
available on read AND write media, e.g. from ftp sites based on hard
disks rather than on read-only CD-ROMs. I imagine that errors in
text-based files, introduced for whatever reason (transcription errors,
bad sectors, fragmentation, (de)compression, etc.), will be more
difficult to spot than those in, say, program files, where the damaged
application would simply not run. Etexts, which are often valuable
precisely because they offer different versions of the same text for
study (e.g. different translations of the Bible), do not have this kind
of built-in 'self-check.' Again, I am
thinking of a future in which various versions of, say, the Canterbury
Tales will be floating around on the net, none of them quite accurate,
brought into a state very similar to that bemoaned by William Caxton in
the Prologue to his own 'corrected and complete' edition.

None of this even addresses the possibility of someone 'maliciously'
changing the ending of their favourite book... I can imagine that the
thrill of hacking Homer would be far more enticing to certain
individuals than adjusting a phone bill. Talk about re-writing the
Canon...

Arno Bosse
Reed College, Portland, OR 97202
abosse@reed.edu