Sanskrit coding, cont. (112)

Mon, 24 Apr 89 21:22:53 EDT

Humanist Mailing List, Vol. 2, No. 883. Monday, 24 Apr 1989.

Date: Mon, 24 Apr 89 16:51 EDT
Subject: more Sanskrit

[The following is from an earlier message that got lost somewhere in
the network. I have added some comments at the end, and removed
the actual coding scheme (which is available if anybody should
have any reason whatsoever for wanting it. . .)]

Regarding your ASCII scheme for Sanskrit/Pali (which I assume you
put out to get just this kind of feedback):

1) There are no doubt numerous schemes already devised, and
the problem of choosing which one, before any project, as
you say, "sets the standard" by sheer magnitude, is a matter
of investigation (whether the project actually sets any standards or
if such is desirable being another matter entirely). For example, what
about the Indian ASCII standard, and what are the folks at
the Library of Congress doing, etc.? These other
schemes should be "discovered" and evaluated.

2) Even though no words in Sanskrit/Pali begin with a
particular letter, it should be in the coding scheme in
both lower and upper-case. The very first thing that
happened to me after creating a set of laser fonts w/
Sanskrit diacritics, leaving out such "unneeded capitals,"
was that I had to print the title page of a text, with the
title all in caps!! So I needed 'em. Ditto with "l with dot
under and macron over," which, while rare, does occur in
dictionaries, grammars, and the like, so better to have it
from the start, in caps as well. I am _not_ a Sanskritist
or Pali scholar, so . . . -->

3) Having said that, I offer my coding scheme, which
OMITS the capital-long-vocalic l, the capital-long-vocalic
r, and even the capital vocalic l!! Oh well, I have yet to
print a dictionary with obscure characters! The only
advantage I can claim in the whole wide world for this
scheme is that I have a corresponding set of laser fonts,
Times Roman in 8 to 12 pt, with 10 and 12 pt italics and
bold, and 14, 18, and 24 pt bold. To make up for my sins of
omission, I have also added the necessary Japanese vowels.
I would like, however, to have the German, French, etc.
that your scheme retains. By the way, *where* are they
retained? Since the upper ASCII is anybody's ball game
(e.g., Hewlett-Packard ain't the same as IBM). . . If anybody is
interested, I would be happy to pass along my coding scheme, though
I would prefer to redo the fonts after some agreement has been reached.
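
The point about capitals in 2) and 3) can be checked mechanically. What
follows is only a rough sketch in Python: the inventory SCHEME is a
made-up example (it is not my scheme or anybody else's), the function
name is invented, and Unicode characters stand in for whatever code
points a real scheme would assign.

```python
import unicodedata

# Hypothetical inventory of accented letters (illustrative only).
# Three capitals are left out on purpose: vocalic L, long vocalic L,
# and long vocalic R.
SCHEME = set(unicodedata.normalize(
    "NFC", "āĀīĪūŪṛṚṝḷḹṃṂḥḤṅṄñÑṭṬḍḌṇṆśŚṣṢ"))


def missing_case_partners(chars):
    """Return the single-character upper/lower partners absent from chars."""
    missing = set()
    for ch in chars:
        for partner in (ch.upper(), ch.lower()):
            # Skip case mappings that expand to more than one character.
            if len(partner) == 1 and partner not in chars:
                missing.add(partner)
    return sorted(missing)


if __name__ == "__main__":
    for ch in missing_case_partners(SCHEME):
        print("missing:", ch, unicodedata.name(ch))
```

Run against a candidate scheme before the fonts are cut, a check like
this would flag the "unneeded capitals" before the first all-caps title
page does.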

4) Regarding TeX, it seems to me that it is not a good answer, unless
somebody needs all of the typesetting power for which it was devised.
It will be much simpler to use upper ASCII, which needs to be done
for the display anyways. This, plus effective and shared printer drivers,
screen fonts, etc. will facilitate the easy exchange of files and building
up of text archives. Dominik, if I remember correctly, translates his text
before and after editing/printing so that he can use TeX for printing
but edit with the characters on screen. Not only is this tedious, but it
tends to the proliferation of different "recensions." If you don't
translate back from TeX for screen editing, you have to look at
Mah\=ay\=ana and other strange things, which will surely make
verification more problematic. While we all realize that it is not
difficult at all to translate from one scheme to another, the *vast*
majority of users/scholars can't handle it. It seems to me better to give
them tools (mainly screen fonts, printer fonts, and application-specific
printer drivers, if needed, and input macros) that will let them continue
to work in whatever environment already suits them.
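
To make concrete what translating between schemes involves, here is a
minimal sketch in Python. It assumes the plain TeX accent commands
(\= for macron, \. for dot above, \~ for tilde, \' for acute, \d{} for
dot below) and uses Unicode as a stand-in for the on-screen character
set; the function names are invented for illustration, and stacked
accents are deliberately left unhandled.

```python
import re
import unicodedata

# Plain TeX accent command -> Unicode combining mark.
ACCENTS = {
    "=": "\u0304",  # macron
    ".": "\u0307",  # dot above
    "~": "\u0303",  # tilde
    "'": "\u0301",  # acute
    "d": "\u0323",  # dot below
}
REVERSE = {mark: cmd for cmd, mark in ACCENTS.items()}

# Matches \=a and \={a}; the dot-below command is matched as \d{t} only.
SYMBOL = re.compile(r"\\([=.~'])(?:\{([A-Za-z])\}|([A-Za-z]))")
DOT_BELOW = re.compile(r"\\d\{([A-Za-z])\}")


def compose(letter, mark):
    """Combine a base letter and one combining mark into one character."""
    return unicodedata.normalize("NFC", letter + mark)


def tex_to_unicode(text):
    """Replace single TeX accent commands with accented characters."""
    text = SYMBOL.sub(
        lambda m: compose(m.group(2) or m.group(3), ACCENTS[m.group(1)]),
        text)
    return DOT_BELOW.sub(lambda m: compose(m.group(1), ACCENTS["d"]), text)


def unicode_to_tex(text):
    """Inverse direction; characters with stacked accents pass through."""
    out = []
    for ch in text:
        parts = unicodedata.normalize("NFD", ch)
        if len(parts) == 2 and parts[1] in REVERSE:
            cmd = REVERSE[parts[1]]
            if cmd == "d":
                out.append("\\d{" + parts[0] + "}")
            else:
                out.append("\\" + cmd + parts[0])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(tex_to_unicode(r"Mah\=ay\=ana"))
    print(unicode_to_tex(tex_to_unicode(r"Mah\=ay\=ana")))
```

The translation itself is short. The trouble is that every user who
edits in one form and prints in the other has to run it faithfully in
both directions, every time.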

5) Regarding ligatures and the like, this particular scheme is for a text
project consisting of romanized text (with accents). Many of the texts
to be included in the future are also from romanized editions. The
ligatures of the originals are already history (that is, lost). I realize
that this is not any sort of an answer, but the point is simply to get
the texts into the machine.

6) Is the IBM "code page" scheme simply different "standardized" sets
of 256 characters? If so, then Dominik's idea for an Indological code
page is exactly where we started-- an upper-ASCII coding scheme for
roman characters with the accents needed for Sanskrit, Pali, ???, plugged
into appropriate spots.
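
If that reading of "code page" is right, the whole thing amounts to a
lookup table. A minimal sketch in Python, with the loud caveat that the
slot numbers below are invented for illustration and correspond to no
existing code page (IBM's, Dominik's, or mine); Unicode characters
stand in for the glyphs a screen or printer font would supply.

```python
# Hypothetical "Indological code page": bytes 0-127 are ordinary ASCII,
# and a few upper-half slots carry accented roman letters.
# These slot assignments are made up for this example.
UPPER_HALF = {
    0x80: "\u0101",  # a with macron
    0x81: "\u0100",  # A with macron
    0x82: "\u1e6d",  # t with dot below
    0x83: "\u1e6c",  # T with dot below
    0x84: "\u1e43",  # m with dot below
    0x85: "\u1e42",  # M with dot below
}
ENCODE = {ch: byte for byte, ch in UPPER_HALF.items()}


def decode(data):
    """Turn bytes in the hypothetical code page into text."""
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))
        elif byte in UPPER_HALF:
            chars.append(UPPER_HALF[byte])
        else:
            raise ValueError("unassigned slot: 0x%02X" % byte)
    return "".join(chars)


def encode(text):
    """Turn text into bytes in the hypothetical code page."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))
        elif ch in ENCODE:
            out.append(ENCODE[ch])
        else:
            raise ValueError("no slot for character: %r" % ch)
    return bytes(out)


if __name__ == "__main__":
    data = encode("Mah\u0101y\u0101na")
    print(data)
    print(decode(data) == "Mah\u0101y\u0101na")
```

Two sites can exchange files without translation only if they agree on
the table, which is the whole argument for settling the slots before
anybody cuts more fonts.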

7) I agree that SGML should be investigated and, if possible, adhered to.

8) By the way, the Pali project that Mathieu speaks of is "a result of an
initiative of some weight, with serious money and institutional backing."
Well, maybe the money isn't as serious as it might be, but it is
starting. It is a group effort, bringing together most of the
Buddhological world in Europe, America, China, Taiwan, and,
hopefully, mainland China and Korea, and aiming ultimately at
Sanskrit, Tibetan, Chinese, and Western language materials as well as
Pali. I know it sounds vast, but what the heck! We are working with the
various other initiatives, though some are harder to get in touch with.
What does anybody know of the Kern Institute?

9) We have now identified a number of different coding schemes used by
individuals. What about the ISO standard mentioned? And other large groups,
especially in South Asia, among the librarians, etc.?