4.0309 Machine-Readable Buddhist Texts (1/126)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Mon, 23 Jul 90 18:42:43 EDT

Humanist Discussion Group, Vol. 4, No. 0309. Monday, 23 Jul 1990.

Date: Sun, 22 Jul 90 11:22 CDT
From: Jamie Hubbard <JHUBBARD@WISCMACC>
Subject: Machine-Readable Buddhist Texts

CONNECTIONS

by Jamie Hubbard (jhubbard@smith.bitnet)
(from the AAR Buddhism Section Newsletter, 6/90)

The ad hoc group of scholars interested in the use of computers in
Buddhist Studies (known as WCCABS, the Working Committee on Computer
Applications in Buddhist Studies, formed under the American Institute
of Buddhist Studies at Columbia University in 1988) held its second
meeting at the Hsi Lai Temple last November. We updated each other on
the various projects in our field, with special attention to the
Buddhist Canon input projects underway in several parts of the world.
I briefly reported on those projects at the AAR/Buddhism Section
Business Meeting, and will simply repeat that information here, adding
several new developments.

TIBETAN

The Asian Classics Input Project

One of the more exciting moments this year was the recent receipt of a
number of disks from the Asian Classics Input Project containing "the
ten most often requested titles from the Kangyur and Tengyur
collections" including the _Abhisamayalamkara_, _Madhyamakavatara_,
_Abhidharmakosa_, _Pramanavarttika_, _Vinayasutra_, _Mulaprajna_,
_Uttaratantra_, Catalog to the Kangyur (Derge edition), Catalog to the
Tengyur (Derge edition), Catalog to the Kangyur (Lhasa edition), and the
United States Library of Congress Tibetan-language Holdings. Under the
Project Director, Michael Roach, these texts were input at Sera Mey
Tibetan Monastic University and are distributed in standard ASCII format.
The "enlightened" policy of distribution ("so long as funding allows the
data created should be offered to the international community without
charge, for the betterment of mankind") should serve as a light to all
of us as we enter the age of information processing in the field of
Buddhist Studies. For more information contact The Asian Classics Input
Project, Washington Area Office, 11911 Marmary Road, Gaithersburg,
Maryland, USA, 20878-1839, phone (301)948-5569.

PALI

The Mahidol Edition of the Pali Canon

Many of you know that the entire Thai version of the Pali canon was
input under the supervision of Dr. Supachai Tangwongsan at Mahidol
University, and is available today with software for either Thai or Roman
character display, search, and printing of any portion of the canon.
This database comprises some 25 million characters, and together with
indices occupies the better part of an 80 megabyte drive.
Unfortunately, the high cost and unwieldy distribution of the database
prevented it from becoming widely available. Fortunately, Lew
Lancaster, just back from a visit to Mahidol University, reports that
they are interested in making the data available on a CD-ROM, a project
that should be relatively easy to accomplish. This summer ought to see
the beginning of the pre-mastering stage for this important contribution.

The Pali Text Society Edition

Two years ago the American Institute of Buddhist Studies secured
permission from the Pali Text Society to input and publish electronic
versions of their editions of the Pali texts (including commentaries and
translations), easily the standard editions in use today. A proposal for
funding this project is still pending with the NEH, but in the meantime
input has begun at the Dhammakaya Foundation in Bangkok, and several
disks have already been received. Lew Lancaster checked their work and
reports a high accuracy rate, attained through a double-input of the text
followed by semi-automatic and manual verification. It is estimated that
the sutta portion will be completed by July, and so work on the
pre-mastering of this data set should also be well underway by the time of
our conference in November.
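
For readers curious how a double-input check can work in principle, here
is a minimal sketch in Python. It is not the Dhammakaya Foundation's
procedure, whose details I do not have; the function name and the sample
lines are my own assumptions for illustration. The same text is keyed
twice by different typists, the two files are compared automatically,
and every disagreement is set aside for a human proofreader.

```python
import difflib


def find_disagreements(first_keying, second_keying):
    """Compare two independent keyings of the same text, line by line.

    Returns a list of (line_number, lines_from_first, lines_from_second)
    for every stretch where the two keyings differ. line_number is
    1-based and refers to the first keying.
    """
    lines_a = first_keying.splitlines()
    lines_b = second_keying.splitlines()
    matcher = difflib.SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)

    disagreements = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            # A typist slipped here (or dropped/added a line); a human
            # must consult the printed edition to decide which is right.
            disagreements.append((a1 + 1, lines_a[a1:a2], lines_b[b1:b2]))
    return disagreements


# Hypothetical sample: the second typist doubled a letter on line 2.
typist_one = "evam me sutam\nekam samayam bhagava\nsavatthiyam viharati"
typist_two = "evam me sutam\nekam samayam bhagavaa\nsavatthiyam viharati"

for line_number, first, second in find_disagreements(typist_one, typist_two):
    print(line_number, first, second)
```

The appeal of the approach is that two typists rarely make the same
mistake in the same place, so lines on which both keyings agree can be
accepted with some confidence and manual effort goes only to the rest.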

CHINESE

The input of the Chinese canon is perhaps the most daunting of all canon
input projects. Several initiatives are underway, including the project
sponsored by Fo Kuang Shan to input the Ch'i sha edition that I
reported on at our Anaheim meeting. Lew (he was a busy traveler this
spring) also made contact with the Korean Lay Buddhist Association,
which has pledged its support for the input of the Koryo canon. I have
also heard that Professor Ejima, at Tokyo University, has made plans to
input the Taisho canon, though I have not heard more on that subject.

The Buddhist Text Archive

With all of the financial, technical, scholarly, and other difficulties
of these large text archive projects it is sometimes easy to lose sight
of what is rapidly developing into the single largest database of
machine-readable texts in our field: all of the research
(commentaries), editions, and translations input by individual scholars
throughout the world, either first entered into a computer of one sort
or another or later printed from a computerized typesetter. The number
of English translations and studies produced in the last decade alone
would be a substantial database to have available for instant access,
not to mention the Japanese, French, German, and other materials.
Although the preservation and use of these materials involves
significant problems, the fact remains that machine-readable texts and
studies are being created every day in great numbers, and something
needs to be done to record this information and begin the process of
making it available. The Buddhist Text Archive, sponsored by the
American Institute of Buddhist Studies, originally announced in this
Newsletter (Issue #10) and endorsed by WCCABS at their last meeting,
hopes to provide this kind of information-sharing. The Buddhist Text
Archive, like other such archives (Oxford, Rutgers-Princeton, the
Sanskrit Text Archive, etc.), seeks to collect and disseminate
information regarding machine-readable texts, in our case, of interest
to the field of Buddhist Studies. Initially the bare-bones information
about the text (title, creator, original edition, availability,
contents, format, etc.) would be cataloged. It is important to note
that the simple existence of a machine-readable version of a text does
not mean it is available; hence, your manuscript (your encoded text
file) that was just published commercially could still be listed in the
archive, even though no distribution was foreseen. The actual
collection and distribution of these texts is another step, albeit a
distant prospect at this stage.
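
To make the idea of a bare-bones catalog entry concrete, here is a rough
sketch in Python. The record layout is only my assumption, built from
the fields named above (title, creator, original edition, availability,
contents, format); the Archive has not fixed any such format, and the
second sample entry is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """One machine-readable text known to the archive (hypothetical layout)."""
    title: str
    creator: str
    original_edition: str
    availability: str   # a text may be listed even if it is not distributed
    contents: str
    file_format: str


def find_by_title(entries, fragment):
    """Return the entries whose title contains the fragment, ignoring case."""
    fragment = fragment.lower()
    return [entry for entry in entries if fragment in entry.title.lower()]


entries = [
    CatalogEntry(
        title="Abhisamayalamkara",
        creator="Asian Classics Input Project",
        original_edition="not recorded",      # placeholder, not a real value
        availability="distributed without charge",
        contents="Tibetan text",
        file_format="ASCII",
    ),
    CatalogEntry(
        title="Sample Translation",           # invented entry
        creator="A. Scholar",                 # invented name
        original_edition="commercially published",
        availability="listed only, no distribution foreseen",
        contents="English translation",
        file_format="word-processor file",
    ),
]

for entry in find_by_title(entries, "abhisamaya"):
    print(entry.title, "-", entry.availability)
```

Keeping availability as its own field is what lets a commercially
published manuscript be recorded without implying that anyone can
obtain a copy.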

It is intended that this information will be readily accessible via
IndraNet, though at this writing IndraNet is still in a state of
transition (Bruce Burrill, who donated the original equipment and a
great deal of time as the original sysop, turned the system over to the
American Institute of Buddhist Studies last autumn). I will keep you
posted when IndraNet comes back on-line; in the meanwhile, just think
about a few CD-ROMs, with all of the Chinese, Tibetan, Pali, and
Sanskrit texts at your fingertips, as well as full text versions of all
of the modern research published within the last few decades . . . It
is hard not to become slightly giddy at the prospect, but it does look
as though we are finally getting closer to the time when such is not
just the stuff of dreams (or envy of our colleagues in Classics or
Western religious studies) but everyday reality.