11.0360 accented characters ISOd

Humanist Discussion Group (humanist@kcl.ac.uk)
Wed, 22 Oct 1997 23:46:41 +0100 (BST)

Humanist Discussion Group, Vol. 11, No. 360.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>

[The following is a collection of the replies I received to the query
about the success of an ISO-8859-1 encoding through Humanist and the
various mail systems that process it. Somehow the table of contents seems
to have been devoured in the process of digesting these messages, but I
cannot assign a cause -- and will not automatically blame pesky accented
characters.... It would seem in any case that we are far from anything
like a universal option for sending and receiving e-mail with properly
represented accented characters. It would also seem unjustified to assume
that anyone who cares e.g. about French will be able easily to configure
his or her system to process the ISO encoding properly. I recommend we all
switch to Latin (without the macron). --WM]

---[1]----------------------------------------------------------------

Date: Wed, 22 Oct 1997 11:39:05 +0100 (MET)
From: "C. M. Sperberg-McQueen" <cmsmcq@hd.uib.no>
Subject: Re: 11.0356 accented characters and ISO-8859-1

At 08:05 AM 10/22/97 +0100, you wrote:
>Simply I need to know what happens when you receive a message through
>Humanist encoded in the ISO-8859-1 standard. I'm told that those who can
>read the (natural) language of the message and care to do so will already
>have their e-mail programs properly set, and that anyone who does not have
>his or her program configured for ISO-8859-1 can safely be ignored, since he
>or she clearly won't care about the contents. Is that true?

I don't think it's true, though I'm not sure you can draw the
obvious conclusion from it.

Mail gateways are, for better or worse, not required to accept
mail in coded character sets other than ASCII; other
coded character sets (such as your 8859-1 message) must be
encoded using characters in the ASCII range. The most common
method of doing this is Mime encoding, which I believe I remember
from experience is what your mail system does with 8859 in any
case. (For technical reasons, I am not in a position to tell
from where I'm sitting at the moment.) There are, of course,
other encodings.
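
As a rough illustration of the point above, the following Python sketch
re-expresses 8-bit ISO 8859-1 text using only ASCII characters, via the
quoted-printable transfer encoding that Mime defines. The sample phrase is
invented for illustration, and the use of Python's standard quopri module
is an assumption of this sketch; it does not describe what any mail system
mentioned in this digest actually runs.

```python
import quopri

# Illustrative text containing characters outside the ASCII range.
text = "les études françaises"

# Step 1: represent the text as 8-bit ISO 8859-1 bytes.
latin1_bytes = text.encode("iso-8859-1")

# Step 2: re-encode those bytes so that only ASCII characters remain.
# Each non-ASCII byte becomes "=" followed by two hexadecimal digits.
qp_bytes = quopri.encodestring(latin1_bytes)

# Step 3: a receiver that understands the encoding can reverse both steps.
restored = quopri.decodestring(qp_bytes).decode("iso-8859-1")

print(qp_bytes.decode("ascii"))
print(restored)
```

The encoded form stays legible to a reader whose mail system does not
decode it, which is why a "determined user" can still decipher it by eye.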

Mime gets the information across, but mail gateways are, for
better or worse, not required to be Mime-compliant, so that
mail in ISO 8859 encoded with Mime arrives at my IBM mainframe
looking somewhat the worse for wear. One nice thing about Mime,
however, is that the determined user can decipher it even if
the mail system doesn't, so I have a little filter on the IBM
system to de-mime mail and turn it into the IBM equivalent of
ISO 8859-1.

So I'm a case in point: I do read French, and I do care about
proper spelling, so I strongly prefer that accents be included.
(Native speakers often do without, but it's easier for them than
for non-native speakers.) But I don't have a Mime-compliant mail
system and have no influence over the choice of the mail system
in any case. I would be astonished to discover things are
substantially different in the average Unix installation. Those
of us who read mail on their PCs may have more say in how it
is presented, if they are willing to figure out how to have that
say.

On the other hand, I can handle such mail, and the frequent use of
Mime encoding in mail is a good argument to use to persuade system
administrators to install better mail systems, and better terminal
display systems that can handle eight-bit characters. (Here, the
IBM system is miles ahead of every Unix system I've ever used;
IBM system administrators know that character encoding is a problem
to be solved; Unix system administrators all seem to be in denial.)

So no, not *everyone* in the intended audience is actually likely
to have their mail systems configured right for ISO 8859 part 1.

But you should use ISO 8859-1 nevertheless, as part of the
computing humanist's eternal struggle against the forces of
ignorance, monolingualism, sloth, and 7-bit character sets.
NEVER SURRENDER. WE SHALL OVERCOME. NO COMPROMISE WITH THE
FORCES OF DARKNESS.

-Michael Sperberg-McQueen
(currently away from the mainframe and using a mail system
which displayed the accented characters just fine)

--[2]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 09:10:44 +0100
From: Carl Vogel <Carl.Vogel@cs.tcd.ie>
Subject: ISO

Good morning.

Just a quick reaction to your note yesterday:

"... anyone who does not have
his or her program configured for ISO-8859-1 can safely be ignored, since he
or she clearly won't care about the contents. Is that true?"

No -- particularly for users of unix systems without system administration
permissions. They can be stuck for a long while before centralized
services like mail readers are updated, and mail with non-ascii characters
can be rejected or corrupted.

Take care,
Carl

------------------------------------------------------------------------
Dr. Carl Vogel, O'Reilly Institute, Department of Computer Science
Lecturer in Computational Linguistics
Trinity College, University of Dublin telephone: 353 1 608 1538
Ireland facsimile: 353 1 677 2204
------------------------------------------------------------------------

--[3]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 11:35:56 +0100
From: "by way of Willard McCarty <Willard.McCarty@kcl.ac.uk>"
Subject: Strange characters

Dear Willard,

Not everyone has a mail program that can be instructed to handle
the incoming mail properly. Our complete Faculty of Arts has to make do
with PC-mail, which cannot handle any of the schemes used for
accented characters. Your example gave
e acute as Greek theta (ASCII 233)
e grave as Greek Phi (ASCII 232)
c cedille as Greek gamma (ASCII 231)
o ^ as something looking like high s (ASCII 244), and
a grave as Greek alpha (ASCII 224).
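
The substitutions listed above are consistent with ISO 8859-1 bytes being
shown in the original IBM PC character set (code page 437). That PC-mail
used this code page is an assumption of the sketch below, not something
the message states. The helper name as_displayed is hypothetical. Note
that in ISO 8859-1 the byte 224 stands for a with a grave accent.

```python
# Byte values reported in the message above.
reported_codes = [233, 232, 231, 244, 224]


def as_displayed(code):
    """Return (intended, displayed) for one byte value.

    intended:  the character the sender meant (ISO 8859-1)
    displayed: the character shown if the byte is read as code page 437
               (assumed code page of the receiving PC)
    """
    raw = bytes([code])
    return raw.decode("iso-8859-1"), raw.decode("cp437")


for code in reported_codes:
    intended, displayed = as_displayed(code)
    print(code, intended, displayed)
```

No byte is lost in this scenario; the same numbers are simply looked up in
a different table, so the damage is reversible on the receiving side.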

I do often get email where higher ASCII characters are replaced by
= plus a Hex code, and most frustrating also email where all higher
ASCII characters are simply stripped off.

I wish you all the best with your moving. My brother bought a house
in England two years ago, and his experiences sounded quite unbelievable
to a Dutch audience. So I have some idea of what you are going through.
Moving house is bad enough as it is. My dear mother used to say
that even if someone had murdered his whole family, he still didn't
deserve to move.

Thanks for all you are doing for us,

Andrea de Leeuw van Weenen
Department of Comparative Linguistics
Postbus 9515
2300 RA Leiden
The Netherlands
Phone: 071-5272507 (work)

--[4]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 11:39:44 +0100
From: Remi Jolivet <Remi.Jolivet@ling.unil.ch>
Subject: accented characters and ISO-8859-1

Hello Willard,

The accents in the first text announcing the LIL98 virtual colloquium did
not reach me...
Yet my program (EudoraPro on Macintosh) is correctly configured, as are
the transit machines (POP server) of the University of Lausanne, since I
regularly receive e-mail in French that is correctly accented and
"cedilla'd" (ç).
Good luck finding the faulty link in the chain that connects us!
For what it is worth, here is a copy of the expanded header of the message
received. I note that the Content-Type specifies "charset=US-ASCII"...

With my thanks for all the work offered to Humanists, and good luck with
the move.

--[5]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 11:40:59 +0100
From: Carl Vogel <Carl.Vogel@cs.tcd.ie>
Subject: Re: ISO

Hello again,

perhaps then we happen to have the ISO features installed here. i looked
into the two messages you sent; this is a bit from the first:

"Depuis sa conception en automne 1996, le colloque "LIL: L'informatiqu=
e
dans les =E9tudies fran=E7aises" se donne comme objectif principal l'=E9c=
hange
de th=E9ories, de m=E9thodes et d'id=E9es sur l'informatique dans ses div="

And this is from the second:

"Depuis sa conception en automne 1996, le colloque "LIL: L'informatique
dans les etudies francaises" se donne comme objectif principal l'echange
de theories, de methodes et d'idees sur l'informatique dans ses divers"

Now, the =E7 (etc.) in the former is an improvement over when any mail
that had an accent was simply rejected here, but the second is significantly
easier on the eyes. (I am of course presuming that by return mail, the
ascii encodings of nonascii characters will reach you as ascii characters,
and not decoded into lovely diacritics -- else I guess you'll find this
mail rather strange.)
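
For readers who receive the first form, the "=E9" codes can be turned back
into accented letters mechanically. The Python sketch below decodes an
abbreviated version of the excerpt quoted above (the abbreviation and the
variable names are this sketch's own). It assumes the bytes were ISO
8859-1 before encoding, which the codes shown are consistent with.

```python
import quopri

# Abbreviated from the excerpt quoted above. A line ending in "=" is a
# "soft" line break inserted by the encoder; the decoder removes it and
# rejoins the word that was split ("l'=E9c=" + "hange").
sample = (
    b"dans les =E9tudies fran=E7aises\n"
    b"l'=E9c=\n"
    b"hange de th=E9ories\n"
)

# Undo the quoted-printable layer, then interpret the bytes as ISO 8859-1.
decoded = quopri.decodestring(sample).decode("iso-8859-1")
print(decoded)
```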

take good care,

Carl

--[6]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 11:40:19 +0100
From: Jan-Gunnar Tingsell <jgt@hum.gu.se>
Subject: Re: 11.0356 accented characters and ISO-8859-1

Willard,
We will welcome the implementation of ISO-8859-1. Those of us with more
characters in our alphabets than the usual English a-z have for a long
time used the ISO standard. In the discussion list OLDNORSENET, which
we are hosting, we use ISO-8859-1. From the beginning there were
a lot of protests from the North American society, but I think most people
have accepted it now. Most new programs also use ISO as the default
character set. The advantages to us who use non-English alphabets
are great.

/Jan-Gunnar

--
Jan-Gunnar Tingsell			<tingsell@hum.gu.se>
Humanistiska fakultetens dataservice	tel:	+46 (0)31 773 4553
Göteborgs universitet			fax:	+46 (0)31 773 4455
URL=http://www.hum.gu.se

--[7]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 14:02:51 +0100
From: John Bradley <john.bradley@kcl.ac.uk>
Subject: Re: 11.0356 accented characters and ISO-8859-1

Willard: For the record, Simeon here at King's read the message and displayed the accents without any problem. ... john

----------------------
John Bradley
john.bradley@kcl.ac.uk

--[8]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 14:01:49 +0100
From: Hans van der Laan <H.R.van_der_Laan@ThuisNet.LeidenUniv.NL>
Subject: (Fwd) 11.0356 accented characters and ISO-8859-1

Dear Willard,

Being a Dutchman, not French, I didn't bother to set my e-mail program at all. I use it right from the box. But yes, I care about the contents (I read French a little), and yes, I did receive the first version of your test, and yes, it was completely as it should be: all diacritical signs were perfect. I am using Pegasus Mail and I think it is the best e-mail program.

Sincerely,

Hans

| Hans van der Laan - Computerraadsman
| W.F.Hermanszijde 3 2353 TL Leiderdorp
| Tel. (071) 541 64 31 / 589 69 49
| vdlaan@pobox.LeidenUniv.nl

--[9]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 14:02:05 +0100
From: Luis Villar <Luis.Villar@Dartmouth.EDU>
Subject: Accented

Willard,

The second message 11.0358(2) was readable. The first one, as usual, is readable, but requires a bit of imagination.

luis

--[10]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 14:02:23 +0100
From: "Charles L. Creegan" <ccreegan@ncwc.edu>
Subject: Re: 11.0356 accented characters and ISO-8859-1

On the theory that positive results might also be useful...Using Eudora Light 3.0 under win 3.11, I had no trouble seeing accented characters correctly in the first message.

I have not knowingly done any setup to ensure this...it just worked.

--
Charles L. Creegan    N.C. Wesleyan College    ccreegan@ncwc.edu
http://www.ncwc.edu:80/~ccreegan

--[11]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 14:01:22 +0100
From: "[iso-8859-1] René Audet" <rene.audet@creliq.ulaval.ca>
Subject: accents

Hi,

It's quite strange... As a French Canadian, I'm used to reading accents in email, but Sinclair's appear as squares... I use Eudora Pro 3.1 on Mac, which encodes accents in the MIME standard, I think (is it really a type of encoding?). Maybe part of the problem is the software used by Stefan.

Thank you for trying to preserve this particularity of the French language.

René Audet (do you read my accents normally? there are some in my signature)

____________________________________________
René Audet <aaa093@agora.ulaval.ca>
Etudiant, maîtrise en littérature québécoise
CRELIQ
Université Laval
Québec

--[12]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 15:34:31 +0100
From: "Christopher G. Fox" <foxchris@lion.shu.edu>
Subject: Accented characters

Dear Willard,

.... I wanted to let you know that your first version of the conference announcement did not come through, because Seton Hall is using Lotus Notes as its mail program, and Lotus does not use ISO standards for character encoding. Yes, I have made the open standards argument a million times, but larger forces prevail.

Best wishes,

Chris Fox

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Christopher G. Fox, Ph.D.

Research and Information Design Projects Leader
Center for Academic Technology
Adjunct Faculty (French), Modern Languages
Seton Hall University

voice: 973-275-2753 fax: 973-761-9758 data: foxchris@shu.edu

--[13]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 15:33:07 +0100
From: "Dr. Joel Goldfield" <joel@funrsc.fairfield.edu>
Subject: Accented characters

Dear Willard,

Thanks for checking on this. I now have my NCSA Telnet program's Session/Translation setting on ISO 8859-1, but it doesn't do a damn bit of good. I still see the usual =E9 business instead of e-acute (ASCII 130), etc., on my Mac PowerBook. Do you have some diagnostic explanation of what's going on?

Thanks, Joel

--[14]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 17:42:26 +0100
From: Robert Kraft <kraft@ccat.sas.upenn.edu>
Subject: 11.0357 (1) Colloque LIL98 (fwd)

In the incoming message with my machine configuration I get all these highlighted number codes for the "unusual" characters, from an English, standard ASCII perspective. But I notice that when I forward the message back to you (as below), it all comes out correctly, with accents, etc., and not highlighted codes! (But note the Subject line.) So even though I do care, I apparently haven't cared enough to determine how to fix it. I have received numerous messages with such features (apostrophes coded, umlauts, etc.), and haven't had time to check things out. Now I'm better informed, thanks to your test! (and presumably can get something done about it).

Bob

-- 
Robert A. Kraft, Religious Studies, University of Pennsylvania
kraft@ccat.sas.upenn.edu
http://ccat.sas.upenn.edu/rs/rak/kraft.html

--[15]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 17:44:12 +0100
From: Michael Kessler <mkessler@ceres.sfsu.edu>
Subject: Reading text

I had no problem reading the accented text. I am using Pegasus for Windows, and the mail is delivered to a Novell server. I do not know what would happen if I were to retrieve the message with a POP mailer from a PINE account.

********************************************
Michael Kessler                 voice (415) 338-1662
College of Humanities           mailto:MKessler@ceres.sfsu.edu
San Francisco State University  FAX (415) 338-7030
1600 Holloway Ave.
San Francisco, CA 94132
********************************************

--[16]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 17:43:54 +0100
From: "Espen S. Ore" <Espen.Ore@hd.uib.no>
Subject: iso-test

Willard,

>Date: Wed, 22 Oct 1997 08:05:49 +0100 (BST)
>Reply-To: humanist@kcl.ac.uk
>>From: Humanist Discussion Group <humanist@kcl.ac.uk>
>To: Humanist Discussion Group <humanist@lists.Princeton.EDU>
>>MIME-Version: 1.0
>Content-Type: TEXT/PLAIN; charset=US-ASCII
>Content-Transfer-Encoding: 8BIT
>X-To: Humanist Discussion Group <humanist@lists.princeton.edu>
>X-Listprocessor-Version: 8.1 -- ListProcessor(tm) by CREN

It might have worked if you used another "charset" encoding (see above). It did not work on my computer, probably because it believes that 8-bit characters in something called US-ASCII are extended DOS-charset rather than ISO 8859/1 (so you have to teach the mailer to tell that it is sending ISO).
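
The mismatch described above, 8-bit bytes sent under a header that
declares US-ASCII, can be reproduced in a small Python sketch. The three
header lines are modelled on those quoted above, while the body text and
the helper name decode_body are invented for illustration. The standard
email package is used here only as a convenient parser.

```python
from email import message_from_bytes

# A minimal message: headers modelled on those quoted above, followed by
# an illustrative body containing raw 8-bit ISO 8859-1 bytes.
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: TEXT/PLAIN; charset=US-ASCII\r\n"
    b"Content-Transfer-Encoding: 8BIT\r\n"
    b"\r\n"
    b"les \xe9tudes fran\xe7aises\r\n"
)

msg = message_from_bytes(raw)
declared = msg.get_content_charset()   # charset named in the header
body = msg.get_payload(decode=True)    # body as raw bytes


def decode_body(data, charset):
    """Return the decoded text, or None if the bytes are invalid in charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


print(declared)
print(decode_body(body, declared))      # the declared charset cannot hold 8-bit bytes
print(decode_body(body, "iso-8859-1"))  # the charset the sender actually used
```

A receiving program that trusts the header has no reliable way to know
which 8-bit table was meant, so each one falls back on its own guess.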

espen

--[17]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 23:08:46 +0100
From: "N. Heer" <heer@u.washington.edu>
Subject: ISO 8859-1

Willard,

Very few IBM compatible computers sold in the USA include the ISO 8859-1 code page. Some of the new multilingual browsers do because, I think, ISO 8859-1 is one of the standards adopted by the WWW. The fault is clearly Microsoft's because they have long refused to incorporate ISO standards in their software. The only way I was able to read your message was by loading into memory a different screen font in the 8859-1 code page. Some of us have been fighting for a number of years now to get Microsoft to use ISO 8859-6 for its Arabic software instead of using its own codepage for Arabic (MS 1256). This whole problem of code pages will of course be solved when everyone, and especially Microsoft, starts using Unicode (ISO 10646).

Nicholas

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Nicholas L. Heer, Professor Emeritus
Department of Near Eastern Languages and Civilization
University of Washington, Box 353120, Seattle, WA 98195-3120, USA
E-Mail: heer@u.washington.edu Telephone: 206-325-0852
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

--[18]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 23:10:24 +0100
From: Gary Shawver <gshawver@chass.utoronto.ca>
Subject: Re: 11.0357 (1) Colloque LIL98

Dear Willard,

I'm afraid I'm among those for whom the first message is full of strange (and I don't mean foreign) characters. Is this a system or application thing?

gary

--[19]------------------------------------------------------------------
Date: Wed, 22 Oct 1997 23:10:47 +0100
From: Glenda Warren Carl <carlg@southwestern.edu>
Subject: Colloque Lil98

Dear Professor McCarty,

I am one of the benighted few (?) who cannot read messages like the first one you sent. That is, such messages are legible except for accented characters. These appear usually as a square and occasionally as another character (an inverted question mark, for example). I know what the word is supposed to be and so can mentally supply the necessary characters, but it is a nuisance. There may be settings I can change that would make these characters appear as they should, but I don't know what those settings are.

I hope this is useful information for you. Please feel free to respond privately or on-list if I can tell you more.

Glenda Carl

*******************************
Glenda Warren Carl
Southwestern University
Georgetown, Texas 78627-0770
(512) 863-1590
FAX (512) 863-1846
carlg@southwestern.edu
http://www.southwestern.edu/~carlg/French_Web/maison.html

-------------------------------------------------------------------------
Humanist Discussion Group
Information at <http://www.kcl.ac.uk/humanities/cch/humanist/>
               <http://www.princeton.edu/~mccarty/humanist/>
=========================================================================