9.362 encoding &c.

Humanist (mccarty@phoenix.Princeton.EDU)
Wed, 6 Dec 1995 19:29:54 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 362.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)

[1] From: Robin Cover <robin@utafll.uta.edu> (54)
Subject: encoding and interpretation

[2] From: "James R. Adair" <jadair@emoryu1.cc.emory.edu> (15)
Subject: a bit of nitpicking re quotes

[3] From: Martin Mueller <martinmueller@nwu.edu> (32)
Subject: encoding and interpretation

[4] From: Lou Burnard <lou@vax.ox.ac.uk> (146)
Subject: RE: 9.358 encoding, accents, interpretation

[5] From: Tzvee Zahavy <zahavy@andromeda.rutgers.edu> (7)
Subject: graphics shot of text

Date: Wed, 6 Dec 95 08:16:05 CST
From: Robin Cover <robin@utafll.uta.edu>
Subject: encoding and interpretation

I am compelled to register my opinion that SGML is not interpretive,
or more precisely, that it's potentially confusing and meaningless
to assert that "SGML is interpretive." This against Ian Lancashire's
summarizing statement "It's good to have some consensus that SGML is
interpretive" (June 5).

In the strictest sense, and in a profoundly important sense, SGML is just
a formal language for defining the syntax of markup languages. In this
fundamental sense, SGML is defined in ISO 8879 (not including the annexes)
so that even Charles Goldfarb's commentary on SGML is not determinative:
neither the intents of the designers nor subsequent applications of
SGML are determinative in establishing whether "SGML is interpretive"
or not. The ISO standard defines SGML.

Markup languages, in the above sense, may be prescriptive (e.g.,
constraining the structure of a memo I intend to write using
the markup tags) or descriptive (analytical). But SGML markup
languages may also serve as formalisms used ONLY by software
applications that need to communicate data to each other. In that
context, "prescriptive" refers only to the behavior of programs
that generate and parse SGML-tagged text. SGML languages (as in the
case of Panorama and EBT's DynaText style sheet language) may also
be used SOLELY as a means of representing how text is to be
rendered on a screen, or printed on paper, or how hypertext links are
to behave. In short, SGML may be used to model and represent many
things other than "text" in settings where the notion of (human)
interpretation is meaningless.

It is meaningful to assert that an encoder applying TEI tags to a
literary work necessarily makes critical judgments about the text
in order to encode the text. In that sense, it may be allowable to
assert that "TEI tags are interpretive." The primary impulse of
the TEI DTD is to "mark up" existing texts through the introduction of
SGML tags. But TEI is not SGML.

The distinction between "SGML" and "TEI" (where TEI is an SGML
application) needs to be understood. I would urge Ian and others to
cease using the terms interchangeably in the context of a discussion
where precise thinking is called for. Why is the distinction important?
Because as Michael Sperberg-McQueen and others have tried to point out
-- as yet apparently without success -- effecting any sort of textual
representation, using explicit or implicit structuring, through markup
or "no markup" is "interpretive" in the sense that bothers Ian Lancashire.

The most common applications of SGML in the commercial world involve
systems that simply constrain the creation of textual information in
what users conceive as "documents." I doubt that people in these
production environments think of SGML (if they are aware of its
presence and importance in their software systems) as "interpretive."

OK. Carry on.


Robin Cover Email: robin@utafll.uta.edu ("uta-ef-el-el")
6634 Sarah Drive
Dallas, TX 75236 USA In case of link failure, use:
Tel: (1 214) 296-1783 (h) robin@acadcomp.sil.org
Tel: (1 214) 709-3346 (w)
FAX: (1 214) 709-3380 SGML Page: http://www.sil.org/sgml/sgml.html

Date: Tue, 5 Dec 1995 23:33:59 -0500 (EST)
From: "James R. Adair" <jadair@emoryu1.cc.emory.edu>
Subject: a bit of nitpicking re quotes

I realize that the world will not come to an end over this matter, but
after reading several recent posts containing various combinations of
quotation and accent marks, I would like to remind people that the symbol
` is not, and never has been, an opening single quote; it is a grave
accent. Words or phrases inside single quotes should look like 'xxx yyy,'
not `xxx yyy.' Neither should the acute accent be used as a closing single
quote. The practice of using the grave accent incorrectly seems to stem
from old DOS fonts, in which the grave accent does look more or less like
an opening single quote on the monitor. If you want to avoid using
straight quotes on both sides of a word or phrase, curly (smart) quotes
should be used, but never accents.
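
[Editorial sketch, not part of the original message.] The correction described above could be automated roughly as follows. This is only a heuristic: the helper name `fix_accent_quotes` is hypothetical, and it assumes that a grave accent opening a quoted span is closed by a straight apostrophe or an acute accent (U+00B4) on the same line.

```python
import re

# Assumption: a span opened by ` and closed by ' or an acute accent (U+00B4)
# on the same line is a misused "accent quote" of the kind described above.
_PAIR = re.compile(r"`([^`'\u00b4\n]*)['\u00b4]")

def fix_accent_quotes(text: str, curly: bool = False) -> str:
    """Replace `xxx' (or `xxx followed by an acute accent) with real quotes.

    With curly=False the result uses straight quotes on both sides;
    with curly=True it uses the curly single quotes U+2018 and U+2019.
    """
    open_q, close_q = ("\u2018", "\u2019") if curly else ("'", "'")
    return _PAIR.sub(lambda m: f"{open_q}{m.group(1)}{close_q}", text)

print(fix_accent_quotes("should look like `xxx yyy.' in running text"))
print(fix_accent_quotes("should look like `xxx yyy.' in running text", curly=True))
```

A quoted span that itself contains an apostrophe (as in `don't') ends at that apostrophe under this pattern, so text of that kind would still need checking by hand, as would any genuine grave accents typed as separate characters.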

Jimmy Adair
Manager of Information Technology Services, Scholars Press
Managing Editor of TELA, the Scholars Press World Wide Web Site
---------------> http://scholar.cc.emory.edu <-----------------

Date: Tue, 5 Dec 1995 23:51:33 -0600
From: Martin Mueller <martinmueller@nwu.edu>
Subject: encoding and interpretation

The interesting discussion between Ian Lancashire and Michael Sperberg-McQueen
about what SGML can or cannot do may rest on unresolvable aesthetic
or "theological" disagreements. I have heard it said that what you can
translate is not poetry and what you cannot translate is not philosophy.
Some deep belief in 'transcription without loss' underlies SGML/TEI. Texts
are, in Nelson Goodman's terms, infinitely "allographic," and Michael
Sperberg-McQueen is an "allographer," who would unflinchingly accept a
spelling of his name as a string of entity references if it had to come to

On the other hand when Ian Lancashire argues for the irreducible elements
of presentation, I think he is at some level arguing for the "autographic"
aspects of the typesetter's art. The question about a TEI description of
George Herbert's Easter Wings or 20th century concrete poetry is not just
whether it can be done, but at a deep level whether it should be done. If
it were done perfectly, wouldn't it be a "fake"?

Twenty years ago I used to joke that virtually all quarrels of literary
theory could be found in the disputes of the church fathers about the
human and divine nature of Christ, if only one had the patience to comb
through them. Now the same questions reemerge as disputes about
technological standards. In a recent issue of _Shakespeare
Quarterly_ (44, 1993), Peter Stallybrass and Margreta de Grazia make a
flamboyant argument for the 'materiality' of the texts of _King Lear_ and
see their return to textual materiality as something to be looked "at"
rather than "through" as a decisive advance over the sacramental idealism
of earlier critics. In their peroration they ask us to contemplate the 'ink
in the page' and not to forget the typesetter's urine that soaked "the
leather casing of the balls that inked the press" (281-2). A strangely
sacramental incarnation indeed. Could it be rendered in SGML/TEI?

The question may be frivolous, but it points to a deep tension. Alphabets
and derived codes are allographs that never quite free themselves from an
autographic lure. Whether SGML/TEI should be blessed or cursed for
resisting the temptations of autography is a matter that should perhaps be
left to theologians.

Date: Wed, 06 Dec 1995 13:17:21 +0000
From: Lou Burnard <lou@vax.ox.ac.uk>
Subject: RE: 9.358 encoding, accents, interpretation

Just one thing on this old chestnut about French accents which no-one seems to
have remarked on yet:

- if you use SGML entities, the text is automagically displayed right by
all HTML browsers

- if you use semi-mystical arbitrary sequences that nearly work most of
the time, like that cited by my esteemed colleague Veronis, the text is
not displayed right by any HTML browser

I do however share his view that in the long run, proper support of real
ISO character sets is the only answer. This will happen in spite of and
without reference to any bleating in the humanities research community
-- because people need to shift boxes in the francophone world just as
much as elsewhere.

I also think that a simple point is being missed here: there is a
distinction between what you type and what you read. Whether you use the
ingenious recode program from Montreal to convert between the two or
whether you embed the conversion function in some other text processing
tool such as an SGML-aware editor, it still has to be done.
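
[Editorial sketch, not part of the original message.] The distinction between what you type and what you read can be illustrated with a small round trip. The example assumes Python's standard `html` module as a stand-in for the conversion step; it is not the recode program mentioned above, and the function names `to_display` and `to_entities` are hypothetical.

```python
import html
from html.entities import codepoint2name

def to_display(typed: str) -> str:
    """Convert entity references (what you type) into characters (what you read)."""
    return html.unescape(typed)

def to_entities(displayed: str) -> str:
    """Convert non-ASCII characters back into named entity references.

    Characters with no named entity in the standard table are left unchanged.
    """
    out = []
    for ch in displayed:
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name:
            out.append(f"&{name};")
        else:
            out.append(ch)
    return "".join(out)

typed = "&Eacute;t&eacute; &agrave; No&euml;l"
shown = to_display(typed)
print(shown)                        # the accented text a reader sees
print(to_entities(shown) == typed)  # the round trip restores what was typed
```

Whichever tool performs it, the conversion in both directions is the step that "still has to be done"; the sketch only makes that step explicit.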


p.s. Some waspish responses to Professor Lancashire's recent note, in
anticipation of a more thoroughgoing reaction from Chicago:

> No traditional humanities journal, I'm informed, has reviewed TEI P3.
> Awareness of SGML in the humanities is weak.

It's good to learn that the sole touchstone of excellence is being
reviewed in a "traditional humanities journal", (whatever that means).
It makes life so much easier, doesn't it, to know that all we need look
at is just what everyone else is looking at?

>It's good to have some consensus that SGML is interpretative. That's a start.
>If so, do we not have to say that no editor can adopt anyone
>else's tagset as markup without thoroughly understanding the meaning
>and implications of the tags and of the syntax of SGML?

Yes. Editors should not use tags without understanding their meaning.
Next question please.

> Does this
>concession also not mean that Fortier was right in saying
>that tagging must be the responsibility of the individual researcher?

Yes, if he meant that researchers must choose which tags to use, just as
they must choose which words to use in communicating their perceptions.
No, if he meant that researchers can arbitrarily use "foo" tags to mark
occurrences of things they believe to be "grommit"s. No, if he means
that they can use "foo" tags to mark things everyone else calls

>If so, no one can accept TEI or SGML without questioning their assumptions.

No-one should accept anything without question, but the basic principles
of both TEI and SGML have been on the table for discussion for so many
years now that one might be excused for thinking that some of them were
no longer in question.

>While serving on TEI committees, I objected strenuously -- to no
>effect -- to the failure of TEI to address presentational (typographic) and
>analytic bibliographic encoding problems. Here I use Sperberg-McQueen's

My recollection is that your comments were taken very seriously, and
indeed caused some substantial revisions in the finally published text
of P3.

> My objections to TEI were not welcomed. It was as if W. W. Greg,
>Fredson Bowers, and Thomas Tanselle had never existed, as if they had
>never discovered anything about the elements and structures of texts.

This is a vile smear. I hope you will retract it. The last TEI working
group meeting I attended actually spent most of an afternoon working as
a practical example on a particularly nasty piece of editorial
reconstruction by Bowers, and the results are in the published minutes
of the meeting.

>Despite what the three critiques say, the TEI DTD does *not* handle these

I think, if you read my last note to the very end, you'll find a
statement to the effect that work in this particular area of scholarly
endeavour had indeed to be curtailed because of lack of funding. So we
are in vehement agreement that (a) the TEI dtd doesn't currently do
everything analytic bibliographers want (b) this is a bad thing. What we
seem to disagree about is whether this is an irredeemable situation.

>I have difficulty understanding how even SGML can be
>*assumed* to handle either typographic encoding or the structures of
>analytic bibliography when no one has published formally or informally
>a successful DTD that does so.

Well, skepticism is certainly a good thing, but I think that SGML has
demonstrated its abilities to handle so many such widely differing
things that I think it's a pretty fair bet that it can. Is there an
alternative, equally powerful, equally widely supported?

>The HI and SPACE elements, and the REND
>attribute, and as well the MILESTONE and FORMEWORK elements, do
>not satisfy the needs of declarative markup for manuscripts or books,
>despite the valuable work of Peter Robinson and the editors.

The purpose of the paragraph you quote is indeed to make that point,
though it should be noted that for many purposes (including indeed
Robinson's) they are adequate to the needs. Of course, you remain at
liberty to opine that the collation of all surviving mss of the
Canterbury Tales is not a scholarly activity.

>Now, am I also saying that you all should *not* use SGML, as Patrick
>Durusau implies? Certainly not. I say this. The SGML community has
>not demonstrated that it can handle the most basic textual structures
>of use in the humanities, not after all the TEI discussions and
>debates that raged from 1987 to 1994.

Another vile smear. Unless by "most basic" you mean "used by analytic
bibliographers" (who, when I last looked, though important, were not the
overwhelming majority of humanities users).

>For example, how does one create a hierarchy of book, gathering, inner
>and outer forms, and page? Hint: the problem is with the forms. Or,

This is the "multiple hierarchies" chestnut, discussed in chapter 31 of
P3, and several other places. I don't mean to imply that it's not a
problem, just that solutions to it have been discussed in great detail
without seemingly getting into certain heads, and also without said
heads coming up with any practical alternative solutions.

>Please let us all know if someone has done these things. TEI P3, at
>1289 pages, is longer than Goldfarb's Handbook at 664 pages, but I
>have not found any reference to forms or acrostics.

Sorry. No reference to strip cartoons or cheese box labels or price
tickets or advertising slogans either. Yet strangely enough, I know of
people who have envisaged ways of encoding all of these, at least to
their own satisfaction, using the TEI scheme!

>I did not say that TEI or SGML could not encode italics as italics but
>rather that TEI and SGML were uninterested in what the TEI community
>began to call "rendition" in those days.

So uninterested indeed, that we made the REND attribute a global one
which could be specified for every single element in the scheme. So
uninterested indeed that for the majority of "interpretative" phrase
level elements we explicitly provide a "non-interpretive" alternative
("hi" vs. "foreign", "seg" vs "w", etc). So uninterested that we point
the reader to other ongoing standardization work in this area (DSSSL).

>Yet there is a sizable difference between asserting that a given ink-blot
>is a b instead of a p, and including -- in TEI's core tagset -- tags
>defining the structure of a poem without producing tags for a
>rhythmical unit or for a metrical foot! It is certainly NOT obvious
>that the fundamental unit of verse is the line. It is this kind of
>bland interpretation in TEI that can arouse some skepticism.

This is a mildly interesting remark. It seems to be reproaching the TEI
for not privileging the views of those people concerned with detailed
metrical analysis over those who are content to treat verse simply as a
set of lines. There are of course detailed tags for detailed metrical
analysis, but they are not in the TEI core, precisely because more
people treat verse as lines than treat verse as collections of feet.
"Core" does not imply "better", just "more numerous". To have
misunderstood this reveals a rather fundamental misconception about the
modular organization of the TEI scheme which I find, frankly, rather
alarming in someone of prof Lancashire's eminence.

>If my fellow users of SGML want to develop a tagset for physical
>description of books and manuscripts and for analytical bibliography,
>they should contact Murray McGillivray at the Department of English at
>the University of Calgary. He took the initiative of organizing a
>successful physical and online conference on encoding medieval
>manuscripts. As far as I know, we haven't restricted admission to
>those who have experimented with fire in Toronto.

I wasn't aware that prof Lancashire actually used SGML in earnest, but
I'm glad to hear he does! I also hope that the Calgary group to which he
refers will provide useful input to the process of extending the TEI
coverage to include the kinds of materials to which he refers -- and
will try to resist re-inventing too many wheels.

in haste


Date: Tue, 5 Dec 1995 18:55:36 -0500
From: Tzvee Zahavy <zahavy@andromeda.rutgers.edu>
Subject: graphics shot of text

Actually the best way to preserve the graphic representation of text is via
Adobe Acrobat or Wordperfect Envoy.


Dr. Tzvee Zahavy
Internet email: zahavy@andromeda.rutgers.edu