4.0545 TEI Workshop Trip Report (1/129)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Mon, 1 Oct 90 21:49:53 EDT

Humanist Discussion Group, Vol. 4, No. 0545. Monday, 1 Oct 1990.

Date: Mon, 1 Oct 90 14:09:59 EDT
From: elli%ikaros@husc6.BITNET (Elli Mylonas)
Subject: TEI trip report

Trip report on TEI workshop, Sep. 22-24, University of Illinois at Chicago

Last weekend, I attended a TEI-sponsored workshop
in Chicago in order to learn about, discuss, criticize and
generally get up to my elbows in the TEI draft
guidelines for encoding texts. It was a great weekend,
spent mainly talking about tagging texts, SGML and
other such arcana. We also had the opportunity to walk
by several of Chicago's museums, and look at them
longingly, but Michael Sperberg-McQueen and Lou
Burnard (co-editors of the TEI Guidelines) kept us too busy for
random cultural expeditions. They did feed us several
pleasant meals of different ethnic provenance, however.
The participants included both SGML neophytes and
SGML experts, but all were experts on the texts
they had to tag. We also had a few computer scientists
to keep us honest.

The workshop combined presentations on
various features of the TEI draft guidelines with
presentations by participants on the kinds of texts
they work with. These texts were also used as test
cases for applying the guidelines. About 24 people
attended; most are on TEI subcommittees and the
steering committee, but there were several from
affiliated projects who are entering texts as diverse
as Nietzsche, Peirce, women writers in English,
Classical Greek literature and linguistically analyzed
materials.

On Saturday morning, Lou and Michael reviewed the
basic features of TEI.1, the DTD that is described in the
guidelines. In the afternoon, we tried to collectively tag
a passage from _Tristram Shandy_ using these
guidelines. It was interesting to note that problems
arose not so much from any ambiguity in TEI.1 as from our
inability to decide exactly what it was we were tagging
and what features we considered salient. We finally had
to decide by fiat that our persona was that of a
lexicographer, represented by Antonio Zampolli, and
our text was a paperback edition, represented by Lou.
When this was straightened out, we were able to argue
productively about features to be tagged, and how to tag
them.

The next morning, several of the participants
presented the sample texts they had brought. We then
all split into groups and tried to tag these texts so
that they would both meet the needs of the person
creating them and conform to the TEI specifications.
Each group then summarized its results for the whole
workshop. The texts we discussed were an 18th c.
American letter presented by David Chesnutt, a passage
from Nietzsche's unpublished notebooks brought by
Malcolm Brown, some parallel Greek and English from
Herodotus presented by Sebastian Heath, and a few
pages from a 17th c. edition of the trials of two Quaker
women from Elaine Brennan of the Women Writers Project.

Each of these texts has different problems, and most
projects tagging them can devote only limited
resources to the tagging. The consensus was that it
is possible to create minimally TEI-conformant texts
with modest effort. Some features and details of the
guidelines still need to be ironed out, and the
co-editors took note of these as they arose. The
hardest problems arise for projects that need to
record the presentation of their texts as well as the
content.

Sunday afternoon was spent learning about some of
the more arcane aspects of the TEI guidelines and
discussing software tools. Topics covered were tags for
linguistic analysis, reference and hypertext links,
parallel texts, and extending the guidelines. Three
categories of software tools were described:
SGML-intelligent tools (editors, validators and
translators), SGML-aware tools (the same kinds of tools,
but not fully SGML-conformant), and SGML-ignorant tools
that can be made to work with SGML documents. There are a number of
high-end, true SGML tools available on numerous
platforms. Many of them are built around the Software
Exoterica SGML parser. There was also some
discussion of the general-purpose pattern-matching and
translation tools that are often used to tag SGML
documents. Examples of these are lex and sed under
Unix, and Qued/M on the Mac. We were able to play
with Software Exoterica's CheckMark and SoftQuad's
Author/Editor. The former is a very elegant SGML
validator, and the latter is an SGML editor and validator
with some formatting options.

Monday morning most of the workshop participants,
"hard-core TEI maniacs" though they may be, looked
rather tired. It may have been the dinner at the French
Bistro and the 30-minute evening walk back through
downtown Chicago that did it. However, we went on to
discuss some more difficult texts that had been brought:
a document in an Eskimo language with several parallel
linguistic transcriptions, from Gary Simons of SIL; a
syntactically parsed newspaper article, from Beatrice
Santorini of UPenn; and a play of Aeschylus, which was
my contribution.

The workshop ended in the early afternoon with a
self-evaluation and a discussion of the TEI's future
plans. Everyone liked the workshop; the hands-on
tagging of other people's texts and the general
discussion that followed got the highest marks. We all
thought that more general workshops that start with
SGML basics would be useful for other projects or
individuals that have texts to tag. We also thought it
very important for the TEI to work on identifying and
describing software tools. One of the biggest problems
we had all faced was the lack of SGML-conformant (and
therefore TEI-conformant) software that would make the
tagging process easier. More discussion of such tools
should take place on TEI-L. It would also be useful to
have TEI workshops at professional meetings such as
the MLA.

(Another TEI workshop will be held in Oxford for
European TEI maniacs. If the Chicago one is any
indication, it should be both informative and fun. Highly
recommended!!)

Elli Mylonas, representative of the Perseus Project, and
member of Subcommittee 2 (text representation).