Sanskrit coding, cont. (153)

Fri, 14 Apr 89 22:44:24 EDT

Humanist Mailing List, Vol. 2, No. 838. Friday, 14 Apr 1989.

(1) Date: Fri, 14 Apr 89 01:26:50 EDT (52 lines)
From: Mathieu Boisvert <>
Subject: Discussion on Sanskrit Diacritics

(2) Date: Fri, 14 Apr 89 02:14:09 EDT (62 lines)
From: Mathieu Boisvert <>
Subject: Another addition to Sanskrit Diacritics

(3) Date: Fri, 14 Apr 89 03:52:44 EDT (13 lines)
From: "Patrick W. Conner" <U47C2@WVNVM.bitnet>
Subject: Sanskrit coding (201)

(1) --------------------------------------------------------------------
Date: Fri, 14 Apr 89 01:26:50 EDT
From: Mathieu Boisvert <>
Subject: Discussion on Sanskrit Diacritics

Here is a memo that I just received from Bart van
Nooten, and I thought it might be beneficial to the current discussion
on ASCII codes for Sanskrit (if he has not yet sent you a copy of
this file). Here it is:

Sanskrit alphabetization.

I suppose many of us realize that standardization of the
alphabet would lead to more interchangeability of programs.
Changing a text is rather trivial, if it is a one-to-one
substitution. You can do 25000 changes in 12 seconds, as I found
out. But changing filters and alphabetization programs is more time
consuming and one hesitates to do it, if there is no other standard.
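
A minimal sketch of the one-to-one substitution mentioned above, written in
Python. The code pairs are hypothetical and are not taken from any actual
Sanskrit coding scheme; the function names are likewise illustrative.

```python
# Hypothetical pairs: (byte value in scheme A, byte value in scheme B).
# These numbers are placeholders, not codes from any real scheme.
PAIRS = [(0x80, 0xE0), (0x81, 0xE1), (0x82, 0xE2)]


def build_table(pairs):
    """Build a 256-entry translation table from (old, new) byte pairs."""
    src = bytes(old for old, _ in pairs)
    dst = bytes(new for _, new in pairs)
    return bytes.maketrans(src, dst)


def recode(text: bytes, table: bytes) -> bytes:
    """Replace every mapped byte in one pass; unmapped bytes are unchanged."""
    return text.translate(table)


TABLE = build_table(PAIRS)
sample = bytes([0x80, ord("t"), 0x81, 0x82])
converted = recode(sample, TABLE)
```

A single table lookup per character is why recoding a text is quick.
Filters and alphabetization programs are harder to change because their
logic depends on the numeric order of the codes, not only on their identity.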

I just got the most informative letter to you from Dominik
Wujastyk. His arguments about the organization of the Upper ASCII
set are well taken. However, I would argue:

1) the printer should not really enter into the matter. Many
Toshibas, in fact, have two downloadable fonts, nos. 4 and 5. In
one of these I keep all my diacritics and their relationship to the
screenfont can be arbitrary.

2) I admit that the graphic characters lead to strange patterns
on the screen, but I have learned to live with them. In a program
where graphics are important, you can usually unload the Sanskrit
screen font.

3) I agree that French and German characters should be retained.
I do not agree about the need for monetary units.

4) I am not familiar with IBM's code page swapping system. But
on the other hand, they have left us such a messy Upper ASCII set
(devised, according to rumors, in a bar by people who were told to
come up with something in a hurry) that we can feel free to introduce
changes as we see fit.

5) I fully support a plan to come up with a standardized system
during the next World Sanskrit Conference. Their announcement already
calls for a section on computers. Perhaps that could be decided

(2) --------------------------------------------------------------66----
Date: Fri, 14 Apr 89 02:14:09 EDT
From: Mathieu Boisvert <>
Subject: Another addition to Sanskrit Diacritics

I would like to thank Dominik and all of you who sent
comments regarding this discussion: it has proved extremely
helpful to me. In reply to Dominik's, I would like to add a few
clarifying comments that may give more information on the

First of all, I don't think I made myself clear: I do not want to
establish a standard coding scheme for those who use Sanskrit
diacritics. I am very much aware that all of us use different
codes and that we are quite attached to them. By introducing this
discussion on HUMANIST, I wanted to get feedback from
Sanskritists about the "ideal" scheme (if such a thing exists) for
encoding diacritics. By no means do I want to impose this scheme,
but it is to be used for entering an immensely large text: the
whole Buddhist Canon, both in Pali and Sanskrit (that is more than
100 volumes!!). I want feedback in order to avoid mistakes that
a neophyte would be inclined to make. Yet, if such a scheme
becomes standard, it will be extremely useful to all of us
(although this is not my aim).

Moreover, Dominik alluded to the copyright problem with the Pali
Text Society. I am in the process of receiving formal
authorization from the P.T.S. In fact, K.R. Norman (President of
the P.T.S.) recently wrote to me saying that we "may proceed on the
assumption that permission will be forthcoming". If it so happens
that it does not come through, then we will have to alter the

The KDEM system will also be used. I got a sample of Pali text
scanned by the Oxford Computing Centre and the accuracy is more
than 99%, which is extremely good. Of course, we will be
conscientious and make either two scannings of the text, or compare
our scanning of the P.T.S. with the Thai data-bank of the Pali
Canon. The first alternative seems much simpler, since the Thai
data-bank has been done from the Thai Edition of the Pali Canon,
which differs considerably from the P.T.S., hence rendering comparison of
the two difficult. [Here, it must be understood that the Thai
Edition is not inferior to the P.T.S.; it has used different
manuscripts which, according to Dr. A.K. Warder, are more
"authentic" than those used by the P.T.S. Please, there is no
need to spread this bracketed comment up to the P.T.S. door...]
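
The double-scanning check described above could be sketched as follows, in
Python. This is only an illustration under stated assumptions: the two
sample strings are made up, the function name is hypothetical, and the
comparison is done word by word with the standard difflib module.

```python
import difflib


def disagreements(scan_a: str, scan_b: str):
    """Return (word index in scan A, words in A, words in B) for every
    place where the two scans of the same page differ."""
    a, b = scan_a.split(), scan_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    found = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            found.append((i1, a[i1:i2], b[j1:j2]))
    return found


# Made-up sample: the second scan misreads "m" as "rn" in one word.
first = "evam me sutam ekam samayam bhagava"
second = "evam me sutarn ekam samayam bhagava"
diffs = disagreements(first, second)
```

Only the places where the scans disagree need to be checked against the
printed page. An error that both scans make identically would not be caught.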

Also, I want to reassure those concerned with the success of my
doctoral program that I have stopped entering text manually. The
three volumes that have been entered were commentaries (not entered
by the Thais) and many friends helped me with this task. I wanted
to have such an amount of text to try certain programs (in
SNOBOL4+) that I had written and that needed to be tested on larger
portions of text.

Again, I want to express my gratitude to Dominik and others, and
I still welcome any comment at my personal E-Mail address. Yet,
these would need to arrive before April 25th since I must leave
Toronto: I will be teaching a Pali course at the University of
Massachusetts, Amherst.

(3) --------------------------------------------------------------21----
Date: Fri, 14 Apr 89 03:52:44 EDT
From: "Patrick W. Conner" <U47C2@WVNVM.bitnet>
Subject: Sanskrit coding (201)

I've studied Sanskrit long enough to know that you cannot really handle
Devanagari in ASCII unless you ignore the ligatures. Why don't you use a
Mac with a Devanagari font (yes, Virginia, there are Devanagari fonts--check
any MacUser) and write a routine on Espen Aarsleth's Paradigma to convert
the file into an unreadable, but portable ASCII with the ligature equivalents
made up of the phonemic components in ASCII representation linked with a plus
sign or some such thing? Meanwhile, I've lost my Perry Primer. Is the thing
still in print?
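
A rough sketch of the plus-sign idea in Python, not in Paradigma. The ASCII
spellings of the sounds and the function names are assumptions made for
illustration only.

```python
def encode_syllable(consonants, vowel):
    """Join the components of a conjunct with '+', then append the vowel."""
    return "+".join(consonants) + vowel


def encode_word(syllables):
    """Encode a word given as a list of (consonant components, vowel)."""
    return "".join(encode_syllable(c, v) for c, v in syllables)


# Illustrative breakdown of one word into ligature components.
# The spellings "sh", "t", "r" and so on are placeholders, not a standard.
word = [(["k", "sh"], "a"), (["t", "r"], "i"), (["y"], "a")]
encoded = encode_word(word)
```

Here each ligature is written as its components joined by a plus sign, so
the file stays in plain ASCII while still recording where conjuncts occur.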

Pat Conner