3.421 digitized pictures and scanning (107)

Willard McCarty (MCCARTY@VM.EPAS.UTORONTO.CA)
Fri, 1 Sep 89 20:12:18 EDT


Humanist Discussion Group, Vol. 3, No. 421. Friday, 1 Sep 1989.


(1) Date: Thu, 31 Aug 89 08:51:09 EDT (18 lines)
From: David Megginson <MEGGIN@vm.epas.utoronto.ca>
Subject: Re: 3.413 digitized images, cont. (72)

(2) Date: Thu, 31 Aug 89 15:16:00 EDT (26 lines)
From: <JHUBBARD@SMITH.BITNET>
Subject: digitized images

(3) Date: Fri, 1 Sep 89 09:02:11 EDT (10 lines)
From: amsler@flash.bellcore.com (Robert A Amsler)
Subject: Re: OCLC Text Markup documentation

(4) Date: 31 Aug 89 12:57:52 bst (23 lines)
From: D.Mealand@EDINBURGH.AC.UK
Subject: OCR costs

(1) --------------------------------------------------------------------
Date: Thu, 31 Aug 89 08:51:09 EDT
From: David Megginson <MEGGIN@vm.epas.utoronto.ca>
Subject: Re: 3.413 digitized images, cont. (72)

While Geoff Rockwell's proposal of small (under 600x600) images might be
useful for some fields, it would be completely useless for paleography
or editing. The resolution of a microfilm is bad enough that it is
nearly impossible to tell stains apart from text, and I'd guess that the
resolution of a microfilm is something like 150 dots per inch (I may be
way off). Any image of the printed page, to be useful, _must_ be many
times larger than the screen. I'd suggest that displaying one square
inch of a page filling the entire screen would be a good minimum for
any serious work. Again, we run into the problem of storage, since
a page at this size would occupy from 1 to 4 million bytes of memory. In
a few years this will not be a problem, but perhaps that time has not
yet come.

David Megginson <MEGGIN@vm.epas.utoronto.ca>
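
[Editorial sketch, not part of the original message: a rough check on the
1 to 4 million byte estimate above. It assumes a letter-size page
(8.5 x 11 inches), one bit per pixel, and no compression; none of these
assumptions is stated in the message, and the function name is made up for
illustration.]

```python
def bilevel_page_bytes(dpi, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes of a 1-bit-per-pixel page scan.

    Assumption: letter-size page, black-and-white only, no compression.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px // 8  # 8 pixels per byte


for dpi in (150, 300, 600):
    print(f"{dpi} dpi: {bilevel_page_bytes(dpi):,} bytes")
```

Under these assumptions, a range of 1 to 4 million bytes would correspond
to scans of roughly 300 to 600 dpi; a grey-scale scan would need several
times more.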

(2) --------------------------------------------------------------------
Date: Thu, 31 Aug 89 15:16:00 EDT
From: <JHUBBARD@SMITH.BITNET>
Subject: digitized images

I also can think of a number of good reasons for keeping the original image
around, together with an encoded version of the text. In looking over the
years for software to do just this thing, I have come up with a few places
to look (though it must be said that Geoffrey's idea about storing video
images is very appropriate for certain needs). Recently a company called
Kofax Image Products (3 Jenner Street, Irvine, CA 92718) came up with a
product for manipulating images that claims a 15 to 1 compression ratio,
meaning, I believe, that a 400 DPI scan of an 8.5 by 11 inch page would come
down to approx. 100k. They have hardware/software or software-only products
that run under Windows. The software-only product is a mere $100. They
had (or rather one of the Kofax people wrote) an article in a recent BYTE
magazine (the issue escapes me, sometime this summer). The Kofax product
has also been put together in an "image database" application by
a firm known as ImageTech (1-800-451-7566) for approx. $900.00. This solution
allows one to put together an "image/document management system" similar to
the ones the Japanese have been marketing for several years. (3M also sells
the Toshiba system in this country under the name Docutron 2000). The
difference is, as far as I can tell, simply cost. The Japanese systems tend
to cost approx. $100,000.00, whereas you could put together your own system
for approx. $15,000, including fast 386, optical storage, 400 dpi scanner, etc.
I hope to have the ImageTech product to play with soon, and will report any
results that I come up with.

Jamie Hubbard (JHUBBARD@SMITH)
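
[Editorial sketch, not part of the original message: the arithmetic behind
the "400 DPI page at 15 to 1" figure. It assumes the scan is bilevel (one
bit per pixel), which the message does not say, and takes the claimed
ratio at face value; the function name is hypothetical.]

```python
def compressed_scan_bytes(dpi, ratio, width_in=8.5, height_in=11.0,
                          bits_per_pixel=1):
    """Return (raw_bytes, compressed_bytes) for a scanned page.

    Assumption: the compression ratio applies uniformly to the raw bitmap.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    raw_bytes = pixels * bits_per_pixel / 8
    return raw_bytes, raw_bytes / ratio


raw, packed = compressed_scan_bytes(dpi=400, ratio=15)
print(f"raw: {raw:,.0f} bytes, compressed: {packed:,.0f} bytes")
```

On these assumptions the raw scan is a little under 1.9 million bytes and
the compressed page about 125,000 bytes, which is in line with the
rough 100k quoted.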

(3) --------------------------------------------------------------------
Date: Fri, 1 Sep 89 09:02:11 EDT
From: amsler@flash.bellcore.com (Robert A Amsler)
Subject: Re: OCLC Text Markup documentation

These are the Association of American Publishers' guidelines.
OCLC has taken over the continued dissemination and user group from AAP.
That is to say, the guidelines are the same ones humanists may have
heard of previously since they were issued a few years ago at the
end of the AAP's Electronic Manuscript Project.

(4) --------------------------------------------------------------------
Date: 31 Aug 89 12:57:52 bst
From: D.Mealand@EDINBURGH.AC.UK
Subject: OCR costs

Have just seen OmniPage running on a Mac II with a Mac scanner. Had not
realised OmniPage was software, not hardware. Has anyone comparative costs
of Textscan/Textpert, Truescan, Kurzweil and OmniPage, listing these
under (a) software, (b) scanning hardware, (c) total of a+b? Only this
way can a fair estimate of cost be made. Also, does anyone know if
any of these can operate with a good handheld scanner?

OmniPage, set up as described above, made a few errors it knew about
and more that it didn't. Its performance on italic font improved
when the darker contrast option was chosen. We didn't seem to
find a means of correcting it while going along, though it was displaying
bits of text from time to time. (Documentation didn't arrive with it.)

On clear print or typescript it didn't do badly, but on complex bibliography,
e.g. columns from Religion Index One, the error rate seemed unacceptable.
The book index pages of NT Abstracts went quite well, but the letter f
printed with a flourish caused no end of trouble.

David M.