9.402 encoding & the TEI

Humanist (mccarty@phoenix.Princeton.EDU)
Wed, 20 Dec 1995 21:51:03 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 402.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
http://www.princeton.edu/~mccarty/humanist/

[1] From: Patrick Durusau <pdurusau@emory.edu> (101)
Subject: Re: 9.395 encoding & TEI

I almost hesitate to respond to Ian Lancashire's last posting on
encoding & TEI as it appears that his remarks are not advancing the
discussion of use of the TEI Guidelines in text encoding. Furthermore,
there are others more qualified to detail the reliance of the TEI group
on previous encoding efforts. However, there are humanists who are not
yet active in text encoding and it would be unfortunate if misleading
and inaccurate comments about TEI were to cloud their consideration of the TEI
Guidelines for text encoding.

1. SGML as interpretation?

Lancashire writes:

>
> If any user's SGML tag-set and DTD are interpretative, then isn't
> the TEI tagset also interpretative? Does it define -- as widely
> claimed -- "a standard form for the interchange of textual material"
> (TEI P3, p. 11)? It does define a form. Is it a standard one?

Despite the repeated confusion of SGML and the TEI Guidelines in his
comments, readers should note the following distinction:

SGML is an extensible metalanguage for the creation of markup systems.
If there is any doubt that SGML itself imposes no interpretation, consider
the following statements from Clause 1 of the SGML standard, ISO 8879.

Clause 1 states that the standard:

_a) Specifies an abstract syntax known as the Standard Generalized Markup
Language (SGML). The language expresses the DESCRIPTION of a document's
structure and other attributes, as well as other information that makes
the markup interpretable._ (emphasis added)

The note to Clause 1 of ISO 8879 states:

_NOTE -- This International Standard does not:

a) Identify or specify "standard" document types, document architectures,
or text structures._

One could use the SGML standard to create an encoding for any document
structure that is encountered. Yes, that encoding scheme would be
interpretive, but that is a separate issue from the use of SGML.

The TEI Guidelines are an SGML-based set of encoding guidelines. All
encoding guidelines are interpretative, but that is hardly a discovery for
anyone who has completed a freshman *English* (substitute the first year
language of your choice) course. As noted in the TEI Guidelines:

_It is important to remember that every document type definition is an
interpretation of the text. There is no single DTD which encompasses any
kind of absolute truth about a text, although it may be convenient to
privilege some DTDs above others for particular types of analysis._ (TEI
Guidelines, vol.1, p. 19)

For text interchange, the TEI Guidelines have the potential to avoid the
conflict of parochial or ad hoc encoding schemes. Such encodings may
comfort their creators, but they do a disservice to the wider scholarly
community by consuming scarce financial resources for the creation of
encoded texts of limited circulation.

2. Non-discussion of prior encoding efforts.

Lancashire writes:

> To me they are. Instead of first discussing what a humanities encoding
> scheme should provide -- for example, by looking at 20 years of
> real encoding practice (by groups like TLG, TLF, CETEDOC, the OCP community,
> etc., and I include WordCruncher and TACT users in the "etc.") -- TEI
> seized on SGML as the final solution. TEI P3 does not discuss
> the previous history of textual encoding either in the humanities or
> in computational linguistics. Why not? TEI P3 does not discuss why
> SGML was chosen as the format.

I think a history of textual encoding in the humanities and computational
linguistics would be an excellent resource for scholars. The Guidelines,
however, exist to suggest ways of encoding texts. Discussing and
genuflecting to every prior encoding effort would greatly increase their
size without being relevant to the task at hand. I am sure the suggestions
in the Guidelines reflect the experience of the numerous humanists who
participated in most if not all of the significant prior encoding efforts.

> Was there serious discussion of why
> humanities encoding schemes really developed the way they had over 20
> years and -- especially -- why they hadn't evolved into something like SGML?

Unknown. Why didn't physicists prior to Einstein develop the theory of
relativity? Why didn't Einstein discuss the preceding twenty years of
physics before stating his own theory?

3. SGML Limitations?

> What troubles me most about SGML the syntax is that it is
> interpretative itself. Its rigid assumption that only one structure
> can be recognized (by SGML browsers, editors, etc.) at a time, and
> that no more than two structures can be encoded in any document, jars
> with what the humanities sees in texts. Most texts we study were plainly
> not written by authors who understood SGML or even who agreed with the
> SGML community that texts could have any dominant structure, let alone
> a hierarchical one.
>

For the notion of SGML syntax as interpretation, see section 1, above.
The claim that SGML can encode only two structures in any document is
simply false. In the original SGML standard, the CONCUR feature was
unfortunately left optional, but that does not limit one's ability to
encode multiple structures for a single text.

In my first post I called for examples of text structures that could not
be encoded using the TEI Guidelines. Several weeks later, I am still
waiting for an example of such a text structure. Rather than continue a
debate in the abstract, Lancashire should post a reference to, say, a
portion of some manuscript of his choosing and supply photocopies to
anyone who wishes to encode said material using the TEI Guidelines. One
or two manuscript pages should be long enough for a fair test without
being an undue burden on those wishing to participate.

It is by using the TEI Guidelines that we will identify areas of
difficulty or where extensions need to be made. They are guidelines, after
all and not set in stone, a fact which seems to escape some critics.

(I will be offline from 12-23-95 until 12-29-95, but I will catch up on
all the postings to the Humanist at that time.)

Patrick Durusau
Information Technology
Scholars Press
pdurusau@emory.edu