11.0526 gleanings from West and East

Humanist Discussion Group (humanist@kcl.ac.uk)
Mon, 19 Jan 1998 18:36:09 +0000 (GMT)

Humanist Discussion Group, Vol. 11, No. 526.
Centre for Computing in the Humanities, King's College London

[1] From: Willard McCarty <Willard.McCarty@kcl.ac.uk> (86)
Subject: gleanings

[2] From: Steve McCarty <steve_mc@ws0.kagawa-jc.ac.jp> (48)
Subject: Computing in Japanese

Date: Fri, 16 Jan 1998 20:17:32 +0000
From: Willard McCarty <Willard.McCarty@kcl.ac.uk>
Subject: gleanings

Dear Colleagues:

Perturbations in my schedule due to turbulence at Christmas -- most pleasant
turbulence, I assure you -- have made it difficult to take all the little
steps that result in my bringing in such sheaves as lie below:
remembering what day it is, getting to the newsagent's in time to get a
Guardian on Thursday, having the coins at the right moment, and so on.
Yesterday I managed, however, so here I am with the usual offerings from the
public arena.

(1) Karlin Lillington, "Mouth to mouse", on the development of practical
speech-recognition equipment that will allow us to talk to our machines.
Bill Gates wants it to happen, so I guess it will.... The bit that
particularly got my attention concerns the relationship between progress in
the area and linguistics:

"Surprisingly, speech technologies don't use linguistic models at all.
Initially, researchers thought linguistics coupled with artificial
intelligence were the way forward.

"But 'linguistics models don't work very well,' says [Steve] Young [director
of the Speech, Vision and Robotics Research Group at Cambridge], because
they tend to offer strict rules which humans don't follow in real
conversation. 'So instead we analyse huge volumes of text, count the
occurrences of given words and then design statistical models.'

"The linguistics experts may yet have their day. Researchers now think they
need the subtleties of linguistic models to achieve true 'natural language',
where a computer would not just recognise words but understand what you're
trying to say. 'Take the sentence 'Mr Wright will write a letter right now'.
A natural language model would help predict which of those words I used and
where,' says [Kevin] Schofield [senior programming manager for Microsoft's
speech technology group]."
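As a rough illustration of the counting approach Young describes, applied to Schofield's example sentence, here is a toy sketch. The corpus, the names `pick_spelling` and `fill_gaps`, and the use of simple word-pair (bigram) counts are all assumptions for illustration; real recognisers combine far larger statistical models with acoustic ones.

```python
from collections import Counter

# Hypothetical toy corpus standing in for the "huge volumes of text".
CORPUS = (
    "mr wright is here . i will write soon . "
    "please write a letter right now . turn right now ."
).split()

# Count adjacent word pairs (bigrams) in the corpus.
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))

# Words that sound alike: the recogniser hears the sound, not the spelling.
HOMOPHONES = ("wright", "write", "right")


def pick_spelling(previous_word: str) -> str:
    """Choose the homophone seen most often after previous_word."""
    return max(HOMOPHONES, key=lambda w: BIGRAMS[(previous_word, w)])


def fill_gaps(words: list[str]) -> list[str]:
    """Replace each '?' (an ambiguous sound) with the likeliest homophone."""
    out: list[str] = []
    for w in words:
        if w == "?":
            out.append(pick_spelling(out[-1] if out else "."))
        else:
            out.append(w)
    return out


if __name__ == "__main__":
    heard = ["mr", "?", "will", "?", "a", "letter", "?", "now"]
    print(" ".join(fill_gaps(heard)))  # prints: mr wright will write a letter right now
```

No rule of grammar is consulted: each spelling is chosen only because it followed the preceding word most often in the sample text.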

I wonder, ignorantly, to what extent the spread of probabilistic modeling
will put subtle, or not so subtle, pressure on us all to say the predictable
things. "'People can say anything they want because what we've learned is
that they only ask about 15 things,' says director of AT&T's Speech
Laboratory Larry Rabiner."

Do I ask only about 15 different questions? Is this the world we want to
live in? OK, yes, I realise that these folks are addressing questions that
people repeatedly ask in commercial and similar transactions, and I too
would go around the bend having to respond to them, and would wish a
computer in my place, but still I wonder if the end result is not a
dumbing-down of playful wit (which one does get frequently among people in
those situations, at least in London) into efficient exchange of commands.

(2) Emma Keelan, "A big white lie", on the hidden special effects behind the
movie The Winter Guest, set in a small coastal town in northern Scotland.
Crucial to the film is the bleak landscape, showing a sea completely frozen
over. But the frozen sea and the overall bleakness were not originally there;
they were added using a system called Domino, which performs "digital
compositing". See Quantel's page on the movie, at
<http://www.quantel.com/news/5jan98.htm>. First it was still photos showing
things that never happened (my favourite remains the Diesel clothing
company's remake of the Yalta Conference, for which see
<http://ilex.cc.kcl.ac.uk/year1/yalta.jpg>) and, one hears, fashion models
whose ideal bodies have been digitally remade, and now moving landscapes one
can never visit.

(3) Jack Schofield, "Netwatch", which mentions the RTMARK site, for the
organisation that "has been funding the sabotage of mass-produced items for
nearly five years." They're behind the Barbie Liberation Organisation's GI
Joe/Barbie projects and the famous SimCopter hack. See

(4) Keith Devlin, "You win sum", about the call by the Director of the U.S.
National Security Agency on the mathematics community to join with the NSA
and other defense agencies in preparing for what he calls World War Four,
which (you've guessed it) will be played out entirely in cyberspace.
Mathematicians, it seems, have a crucial role to play in maintaining the
integrity of the online technologies on which the world is coming
increasingly to depend. I recall that when I was a lad in a special summer
programme in mathematics at Berkeley, California, a number theoretician told
us with great pride that his subject was the most perfect because no one
would ever find a use for it. He's no doubt dead now, but if I were to run into him, I'd
advise him to take up literary criticism, perhaps -- or is our skill with
interpretation of complex discourse just what MI5 and the CIA are looking for?

And now for the If That Were Not Enough for You Department:

(5) Danny Penman, "Armageddon by accident", tells the tale of an incident on
25 January 1995 when a US scientific probe was mistaken by the Russians for
a multiple-warhead missile rising off the coast of Norway and bound for the
heart of Russia. "[Boris] Yeltsin had 150 seconds left on the clock when the
decision [to launch a massive counter-attack against North America and Europe]
was postponed...." Remember all those nuclear weapons? It seems they are
still on alert, and it also seems that our communications and computing
systems are not up to the task. "All it takes is the failure of a 5 cent
chip or a problem like the millennium bug and we could be looking at an
accidental nuclear war."

Comments, as always, are welcome. If you can find your voice.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dr. Willard McCarty, Senior Lecturer, King's College London
voice: +44 (0)171 873 2784 fax: +44 (0)171 873 5801
e-mail: Willard.McCarty@kcl.ac.uk

Date: Mon, 19 Jan 1998 11:09:10 +0900
From: Steve McCarty <steve_mc@ws0.kagawa-jc.ac.jp>
Subject: Computing in Japanese


This installment aims to provide a brief overview of computing in
Japanese, and readers can experiment with accessing some of the
Chinese characters used in Japan.
The way East Asian languages are encoded and decoded may also
offer some insight into how the Internet works.

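Chinese characters, some of which exceed 20 strokes, number in the
thousands, far more than the 256 values a single byte can hold, so
each one requires two bytes, twice as many as a plain ASCII
character. Therefore, from the viewpoint of Japanese word
processing, where English is a subset of Japanese, ASCII letters
are "half-width": each takes half the space of a Chinese character.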

Web servers in Japan run on ASCII-based systems such as UNIX, and
need no Japanese-language software. But since each Chinese
character needs twice the bytes of an ASCII character, Japanese
pages cannot be uploaded to a Web server, or later downloaded,
"as text." From the ASCII viewpoint, East Asian text must be
uploaded as something like raw data, in the case of the Fetch FTP
program. Viewed as ASCII, the source file looks like gibberish
except for the HTML tags, yet Japanese-language software reads it,
that is, decodes it as it was encoded.
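A sketch of why the upload mode matters. It assumes, as a hypothetical worst case, a text-mode channel that is only 7-bit clean and clears the high bit of every byte; real FTP clients such as Fetch may apply other conversions in text mode (line endings, character-set tables). The functions `text_mode_transfer` and `raw_transfer` are hypothetical names, and Shift_JIS is again an assumed encoding.

```python
# Sketch: compare a lossy "text mode" channel with a byte-for-byte "raw" one.
# Assumption: the text channel is 7-bit only and clears each byte's high bit.


def text_mode_transfer(data: bytes) -> bytes:
    """Hypothetical 7-bit channel: keeps only the low seven bits of each byte."""
    return bytes(b & 0x7F for b in data)


def raw_transfer(data: bytes) -> bytes:
    """Raw/binary channel: bytes pass through unchanged."""
    return bytes(data)


if __name__ == "__main__":
    page = "<P>漢字</P>".encode("shift_jis")
    # Viewed as single-byte text, the tags survive but the Japanese is gibberish.
    print(page.decode("latin-1"))
    # After a raw transfer, Japanese-aware software decodes it as it was encoded.
    print(raw_transfer(page).decode("shift_jis"))
    # After the 7-bit transfer, the double-byte characters are corrupted.
    print(text_mode_transfer(page).decode("shift_jis", errors="replace"))
```

The high bit is what marks the first byte of each Shift_JIS double-byte character, so a channel that discards it cannot deliver the characters intact.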

Similarly, Japanese text sent to U.S.-based mailing lists can
be read by recipients with Japanese software. Anomalies do occur,
however, as data passes from Japanese Internet gateways through
English-language ones and back. There has been some gibberish
at the edges of these gleanings, apparently because cyberspace,
like the rest of nature, abhors a vacuum.
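Japanese e-mail is conventionally sent in ISO-2022-JP, a 7-bit encoding that switches character sets with escape sequences so that it can pass through gateways built for ASCII. The sketch below shows that encoding and, as a purely hypothetical failure, what a reader sees if a gateway drops the escape bytes; it illustrates one possible source of gibberish, not a diagnosis of the anomalies described above. The function names are hypothetical.

```python
# Sketch: ISO-2022-JP keeps every byte in the 7-bit ASCII range.
ESC = b"\x1b"


def encode_for_mail(text: str) -> bytes:
    """Encode text as ISO-2022-JP, the customary charset for Japanese e-mail."""
    return text.encode("iso2022_jp")


def drop_escapes(data: bytes) -> bytes:
    """Hypothetical faulty gateway that silently deletes ESC bytes."""
    return data.replace(ESC, b"")


if __name__ == "__main__":
    mail = encode_for_mail("漢字")
    print(all(b < 0x80 for b in mail))  # every byte is 7-bit ASCII
    print(mail.decode("iso2022_jp"))  # a Japanese-aware reader restores the text
    print(drop_escapes(mail).decode("ascii"))  # without escapes: ASCII gibberish
```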

Let us try a quick experiment where you can see some Chinese
characters if you open your browser or view this message at URL:

Japanese text is converted into inline .GIF images at an
innovative Website called Shodouka, which in Japanese means
conversion into calligraphy. You might select and copy the exact
URL for this message from the location window and then paste it
into the launchpad at <http://www.shodouka.com>. Then the file
will be reloaded with the Chinese characters written below
converted into images closely resembling a Japanese font.

This is my institution and position from my signature file for
messages in Japanese. It takes only eight Chinese characters,
so does it appear as something like 16 characters of gibberish?
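The arithmetic behind "16 characters of gibberish" can be sketched as follows, assuming Shift_JIS and a viewer that treats each byte as one Latin-1 character. The eight-character sample is a hypothetical placeholder, since the original signature text did not survive in this copy.

```python
# Sketch: eight double-byte characters become sixteen single-byte "characters".
# Assumptions: Shift_JIS encoding; the viewer reads each byte as one Latin-1 character.

SAMPLE = "漢字" * 4  # hypothetical placeholder for an eight-character signature


def as_single_byte_view(text: str, encoding: str = "shift_jis") -> str:
    """Show how a byte-per-character viewer would render the encoded text."""
    return text.encode(encoding).decode("latin-1")


if __name__ == "__main__":
    garbled = as_single_byte_view(SAMPLE)
    print(len(SAMPLE), "characters ->", len(garbled), "characters of gibberish")
```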

There is also a shortcut I use on Web pages: you just click on
the highlighted text, and its embedded URL sends the page to the
Shodouka Launchpad. For example, you could see Japanese,
romanized and English versions of haiku poems, all previously
published, with several for each season. Now I am going to
write the HTML, including a "mailto" around my signature, so
let us see what happens as it goes onto the Web via hypernews:

c/haiku.html">Bilingual Haiku Scroll</A>

_Bon voyage_,
<A HREF="mailto:steve_mc@ws0.kagawa-jc.ac.jp">Steve McCarty</A>

Humanist Discussion Group
Information at <http://www.kcl.ac.uk/humanities/cch/humanist/>