
Humanist Discussion Group



Humanist Archives: Feb. 1, 2019, 7:47 a.m. Humanist 32.410 - the McGann-Renear debate

                  Humanist Discussion Group, Vol. 32, No. 410.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org


    [1]    From: Dr. Herbert Wender 
           Subject: Re: [Humanist] 32.402: the McGann-Renear debate (16)

    [2]    From: James Cummings 
           Subject: Re: [Humanist] 32.402: the McGann-Renear debate (57)

    [3]    From: William Pascoe 
           Subject: Re: [Humanist] 32.402: the McGann-Renear debate (171)

    [4]    From: Peter Robinson 
           Subject: Re: [Humanist] 32.402: the McGann-Renear debate (44)

    [5]    From: Desmond  Schmidt 
           Subject: Re: [Humanist] 32.402: the McGann-Renear debate (23)

    [6]    From: Gabriel Egan 
           Subject: Re: [Humanist] 32.402: the McGann-Renear debate (59)


--[1]------------------------------------------------------------------------
        Date: 2019-01-31 17:36:29+00:00
        From: Dr. Herbert Wender 
        Subject: Re: [Humanist] 32.402: the McGann-Renear debate

Patrick and Alexander,

it was my failure not to name Sperberg-McQueen as author of the cited snippet,
but yours not to verify the source. To be clear: I share neither the polemics
nor the laudationes in the cited mail.

Desmond,

you speak of "the text itself" (with some 'fundamental features') on the one
side and on the other of "a model" of this text (a hierarchical one in your
example). Can you point me to an introduction to differentiate between 'language
of observation' and 'language of theory' in markup contexts?

Greetings, Herbert



--[2]------------------------------------------------------------------------
        Date: 2019-01-31 16:33:58+00:00
        From: James Cummings 
        Subject: Re: [Humanist] 32.402: the McGann-Renear debate

I'm quite enjoying the discussion with regard to text, modelling, and digital
representations of it. But there seem to be a number of misrepresentations of
XML that seem to come up again and again. Some people may be accidentally
conflating the XML language with popular vocabularies expressed in it. XML in
itself does not imply inline embedded markup, nor does it imply any particular
tagset that models semantic or renditional information specifically. XML can be
stand-off, can relate many files together, and in nesting and URI-based
attributes can form graph models of relations if its creator desires. The only
hierarchy required by XML is that any additional elements if they exist are
nested inside the outer root element. There is, indeed, no requirement for
multiple elements: a single self-closing or 'empty' root element can still be a
well-formed XML document. For the kinds of models being discussed, though, I
don't think that having a single element with an enormous number of attributes
would be a helpful way of attempting to model our understanding of a text. I
only point this out because people keep saying that XML enforces a
hierarchical view (it doesn't; it could be many very flat files) and must be
embedded markup (when one's markup could all be out-of-line). These are
strawman arguments that we should all avoid so that we can get back to the
real issues.
Arguments about formats are *boring* when we all know that we can transfer
between formats and many people look at the data through an intermediate, often
lossy, view. Some formats are better than others at expressing particular forms
of models, but discussing the data modelling is more interesting than the
particular format it is modelled in.
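
The two XML points above (an empty root element is well-formed, and markup
can be stand-off) can be sketched as follows. This is a minimal illustration
only, assuming Python's standard xml.etree.ElementTree; the element and
attribute names (annotations, span, start, end) are hypothetical and belong to
no particular vocabulary.

```python
import xml.etree.ElementTree as ET

# A single empty root element is already a well-formed XML document.
root = ET.fromstring("<text/>")
print(root.tag, len(root))  # the root has no child elements

# Stand-off markup: the text lives in one place, untouched...
plain_text = "To be, or not to be"

# ...and the annotations live elsewhere, pointing at character offsets.
# These element and attribute names are invented for illustration.
standoff = ET.fromstring(
    "<annotations>"
    '<span type="phrase" start="0" end="5"/>'
    '<span type="phrase" start="7" end="19"/>'
    "</annotations>"
)


def resolve(text, annotations):
    """Return (type, covered substring) for each stand-off span."""
    return [
        (s.get("type"), text[int(s.get("start")):int(s.get("end"))])
        for s in annotations.iter("span")
    ]


print(resolve(plain_text, standoff))
```

The annotation file here is flat (a root with sibling spans), and the text
itself carries no inline tags at all.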

Modelling our fragmentary and limited understanding of texts is what most
encoding projects using XML are attempting to do (in my experience). Desmond
suggests that it is only worth having renditional information because "You can't
markup all the desirable semantic aspects of a text without spending $10,000 per
page and then no one could read it. So even then it is worthless." And instead
suggests marking up renditional features and extracting them with concept mining
or AI. This may be all well and good for the kind of work he wants to do, but it
is not for everyone. None of the projects I know are marking up _all_ the
possible semantic aspects of a text, but only those which are germane to their
research questions. (Indeed, marking up all the renditional aspects of a text,
if you really mean _all_, is just as cumbersome as marking up all the semantic
aspects.) By doing such markup they are linking concepts, based on their subject
knowledge and beyond the reach of concept mining or most AI so far, to
particular locations in the text, and then pulling them out en masse for other
forms of analysis because humans find it hard to hold all of this in their
minds at one time. That
a highly marked up text is unreadable is another strawman based on a very
limited conception of 'reading' -- whether expressed as XML, RDF, JSON, LMNL, or
anything else, marked up texts are most often 'read' through transformation to
other formats, extraction of the encoded data across a corpus, or queried as
structures in themselves.


I'll try to go back to lurking now,

James


--

Dr James Cummings, James.Cummings@newcastle.ac.uk
Senior Lecturer in Late-Medieval Literature and Digital Humanities
School of English, Newcastle University


--[3]------------------------------------------------------------------------
        Date: 2019-01-31 15:17:18+00:00
        From: William Pascoe 
        Subject: Re: [Humanist] 32.402: the McGann-Renear debate

I tried to resist but, on the XML/markup/meaning/hierarchies debate....

This point is maybe the trivial one:
"...what McGann was saying was that every technology is a
(particular) theory of text and realises a model of text and as such
privileges certain views on text and impedes others..."

We say this sort of thing about a lot of things, and sometimes that it
applies to everything - that nothing avoids complicity in some kind of
privileging etc. This is an important starting point for a lot of critique to
tease out how so and whether it's wrong, but not so valuable for XML. The
privileging is mostly a problem if it isn't noticed (assumptions and biases
etc), is abused for nefarious purposes or if it becomes totalitarian (this is
the only way permitted, such that only certain views may ever be privileged and
others are always impeded). Taking XML as an example, of course it privileges
some things and impedes others - that's why we only use XML for some things,
those things it's useful for, and not others. Its main practical limitation for
its intended purpose is the difficulty of overlapping tags. If one thing starts
in the middle of another thing but ends after that other thing's closing tag,
you have a problem. So maybe use some other markup technique, or you just find
a way to fudge it using XML attributes or something (when you reach a
structure's limitations you fudge it, common in IT). Renear seems well aware of
this because he cites his own article on it
(http://www.stg.brown.edu/resources/stg/monographs/ohco.html) - so I'm not sure
why he's so committed to hierarchy, maybe he's just limiting it to something
like 'hierarchy within Domain Specific Language space' (to use the jargon of
ontology in the DSL of IT, not philosophy, LOL). It's not like it has
contributed to some sort of political oppression.
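
The overlap problem described above, and one common way of fudging it, can be
sketched like this. It is an illustration under stated assumptions: it uses
Python's standard xml.etree.ElementTree, and the element names (line, sentence,
anchor) and the id values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_string):
    """True if the string parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


# Overlap: <sentence> starts inside <line> but ends after </line>.
overlapping = "<p><line>one <sentence>two</line> three</sentence></p>"

# One workaround: mark the sentence boundaries with empty 'milestone'
# elements, so that only the <line> is expressed as a container.
milestones = (
    '<p><line>one <anchor id="s1-start"/>two</line>'
    ' three<anchor id="s1-end"/></p>'
)

print(is_well_formed(overlapping))  # the parser rejects the overlap
print(is_well_formed(milestones))   # the milestone version parses
```

The cost of the workaround is that the sentence is no longer an element that
tools can select directly; it has to be reassembled from its two markers.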

To get back to the point, we only use XML to tag or render the specific things
we're interested in, and there are as many hierarchies as questions we might ask
about a text (parts of speech? speaking characters? prose or poetry? gender?
those parts I feel were scary or funny? or that in the manuscript the 's' was
rendered with a specific ligature characteristic of anonymous scribe K457? or
'rhetorical structures that overlap and infect the syntax and semantics'
[McGann] {maybe McGann is a little naive about the uses of XML in these early
days?}). XML's purpose *is to privilege* the things we're interested in, knowing
full well our markup choices are based on our assumptions, so we can see it or
process it, use it, better - so there's nothing underhanded or deceptive about
XML's privileging and impeding. That's its explicit purpose. XML privileges the
hierarchies we're interested in - duh!

Maybe "even character encoding conveys textual meaning by renditional features"
is the bit not so trivial, though it looks it at face value. When I type fiction
I like to use a 'plain text' editor, with a fixed width serif typeface, like an
old typewriter or like I'm coding, and not a word processor. This is just
because it 'feels' more raw, and 'not yet rendered' even though I know perfectly
well it is rendered in this typeface chosen specifically because this rendering
carries that meaning/aesthetic of 'unrendered', and it could easily be any other
chosen typeface, it has to be one and can't be none. What's less trivial
about it, though, is the old debate about Platonic forms. This is the bit that
takes us to the OP question: 'Is there another way of viewing "abstract"?'

Does there exist an abstract 'a' that is not instantiated in some rendering?

When we encounter the character 'a' it is always an embodied instance of it (a
rendering), in some typeface or handwriting, or whatever. But what is it that
all these 'a's have in common that enables me to recognise them as being
instances of 'a'? Socrates says the form of 'a', that exists immaterially, and
prior to the particular instance of 'a' (since it's only by comparison of this
instance with an already existing idea of 'a' that I could decide that this
instance is an 'a'). Aristotle says there's no need to make up eternal
immaterial realms - the only real substance is the instances of 'a'
among which we notice similarities. If I think 'a' do I always 'see' in my
imagination some rendering, or is 'a' thinkable without a font? Does it always
simply denote the sound 'a', and only that?

One thing worth pointing out is that when we do wish to pragmatically abstract a
concept 'a' from all the possible instances of it, not tied to any typeface, we
are constrained to utilise some arbitrary representation of it - a label that
has nothing to do with the letter 'a' itself, an encoding that *refers* to it
without *being* it - perhaps because it has no 'essence', alternatively because
that 'essence' is abstract and not instantiable, so it's essentially abstract
(and the only way to remove all the contingencies that might make us want to
change a label that should not be changed, as we find with UIDs in databases,
is to strip all semantic value from the label by using a purely arbitrary
identifier {and don't conflate 'arbitrary' with 'abstract'}). E.g. Unicode:
U+0061; HTML entity: &#97;; binary: 01100001; etc.
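
The labels listed above can be checked against one another in a few lines.
This is a small sketch using only Python built-ins and the standard html
module; the variable names are illustrative.

```python
import html

letter = "a"
code_point = ord(letter)  # the integer the label refers to

unicode_label = f"U+{code_point:04X}"     # Unicode code point notation
html_label = f"&#{code_point};"           # HTML numeric character reference
binary_label = format(code_point, "08b")  # eight-bit binary
utf8_bytes = letter.encode("utf-8")       # a single byte for this letter

print(unicode_label, html_label, binary_label, utf8_bytes)

# Each label *refers* back to the same character without resembling it.
assert html.unescape(html_label) == letter
assert chr(int(binary_label, 2)) == letter
assert utf8_bytes.decode("utf-8") == letter
```

None of the four labels looks like any rendering of the letter; they agree
only in pointing at the same number.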

Which leads, of course, 2000 years or so later to Derrida's points about
différance. The problem of 'a' and reference applies to all that has
meaning, and what can we think of that doesn't have meaning, or doesn't exist as
what it is because of its meaning? This is a millennia-old problem that we
can't resolve here, but it does inform why semiotics and literary studies have
become so central to philosophy.

Ultimately the root of much argument over this might just be arbitrary and
definitional (the root of so many arguments, as I first realised at 14,
listening to a heated argument between my father and his girlfriend over whether
hair conditioner was good for your hair, or just put a coating on it - I learned
a lot about ontology, epistemology and human nature from that one).

If one person defines a 'text' as x, y and z, then we can just accept that's the
phenomenon they're talking about, leave it at that, and carry on with asking
whether what they say is true of what they speak of. If Renear simply says,
"Look, we are just limiting our problem domain here to things which are
intentional, real, linguistic, hierarchical and abstract because that's going to
be useful, and for now it covers most of what we can do with an ASCII
character set and that's what we want to use because we have limited space and
we only are considering Anglophone needs right now because we're American. Let's
just say that's what we mean by 'text' now, because it's 'common sense' and
expedient." then fine, whatever.

But if he is saying something broader than this, about 'that which can be
read/written/meaningful' - then we can ask whether those claims about
readable/writable/meaningful things are true. Are all
meaningful/readable/writable things intentional, real, linguistic, hierarchical
and abstract?

It's an easy *NO* on every point in ways that are by now well known to anyone
who's done a bit of semiotics:

- real: they have properties independent of our interests in them and our
theories about them.
Yes, in so far as they are materially instantiated, but texts qua texts have no
meaning but what is read by a reader - things must be read as texts or they are
not 'texts'. Otherwise they are just ink and paper, just positive and negative
charges on magnetic surface etc. (there are some nuances about perceiving texts
that we don't know how to read but none the less understand they are some kind
of writing which must mean something)

- abstract: the objects which constitute texts are abstract, not material,
objects.

This is a far more problematic philosophical claim. See above.

- intentional: texts are, necessarily, the product of mental acts
Well, some twigs fell on the ground in the shape of the letter A, and then there
were two crossed over like an X, etc. Also, the reading I read might not be the
reading the writer intended. Oh no wait, I think he means 'intentional' in the
philosophical sense, not the common meaning of 'what the author intended'. I.e. a
text is the object of cognition. So ok. Maybe this is a yes (see 'real').

- hierarchical: the structure of texts is fundamentally hierarchical
Yes, you can see them that way but there are any number of hierarchies, people
will implement them differently to each other, they overlap, and they may have
internal structures that overlap, and sometimes the structure is more rhizomatic
or based on coincidental property collections (keyword tagging), etc. *Is there
a tree in the text if there's no-one there to walk it?* So a text *may* be
hierarchical, depending on the reading, not 'fundamentally'.

- linguistic: texts are linguistic objects; renditional features are not parts
of texts, and therefore not proper locations for textual meaning.
This is just the result of phonocentric bias. You can read a lot from whether
someone scrawled a message in haste or not. You only need consider hieroglyphs,
Mixtec, Aztec, emoticons and Mandarin characters, and how you can 'read' Dutch
genre painting, or pictorial murals. Even the land can be 'texts'. Or dance -
notice how bees' dance moves encode linear journey narratives. Or at the limits
of zoosemiotics [Uexkull] - What is the meaning of the sun to a sunflower? Is
the turning of a sunflower's head its reading of the sun?




Dr Bill Pascoe
eResearch Consultant
Digital Humanities Lab
hri.newcastle.edu.au
Centre for 21st Century Humanities

T: 0435 374 677
E: bill.pascoe@newcastle.edu.au

The University of Newcastle (UON)
University Drive
Callaghan NSW 2308
Australia



--[4]------------------------------------------------------------------------
        Date: 2019-01-31 11:32:15+00:00
        From: Peter Robinson 
        Subject: Re: [Humanist] 32.402: the McGann-Renear debate

My contribution to this debate is as follows:

1. Contrary to Renear: I argue that every text has a double aspect. It is BOTH
an act of communication, precisely as Renear describes it, with the properties
of intentionality, communication and structure as he defines it. It is ALSO the
material inscription of the text: the document (which might be an audio
recording, or similar) in which the text is inscribed.

2. Both aspects of the text may be represented as an ordered hierarchy of
content objects (a “tree”). We may represent the communicative act in the
familiar terms of a work divided into (say) acts, themselves divided into
scenes, into lines, etc. We may represent the document as a book divided into
pages divided into columns divided into line blocks, etc.

3. The two trees (the communicative act tree, and the document tree) are
entirely distinct.

4. The usual formulation of the two trees, as “overlapping hierarchies”, is
inadequate. This description implies that the leaves of the two trees are in the
same order in both, so that one only need overlay the two trees on a single
stream of characters. In fact, the two trees are distinct to the point that the
shared content objects in each tree may be in radically different orders (and
commonly are, in authorial manuscripts, and indeed in printed books where page
layout disposes the text across columns, marginalia, running headers and
footers, etc).

5. Accordingly: a better model for the two trees is that of a single set of
leaves, shared by the two trees.

6. While every text has these two trees, many others may also be applied to any
text.
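
Points 3 to 5 above can be sketched as two trees over one shared set of
leaves. This is a toy illustration in Python, not the Textual Communities
implementation; the leaf identifiers, labels and sample content are all
invented for the example.

```python
# One shared set of leaves (text segments), keyed by arbitrary identifiers.
leaves = {
    "w1": "First line of the speech",
    "w2": "Second line of the speech",
    "w3": "Marginal addition, to be read after the first line",
}

# Each tree is a (label, children) pair; a child is a subtree or a leaf id.
# Tree 1: the communicative act, with leaves in reading order.
act_tree = ("play", [("act 1", [("scene 1", [("speech 1", ["w1", "w3", "w2"])])])])

# Tree 2: the document, with leaves in their physical order on the page.
doc_tree = ("book", [("page 1", [("main column", ["w1", "w2"]), ("margin", ["w3"])])])


def leaf_order(node):
    """Depth-first list of the leaf ids found under a node."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    order = []
    for child in children:
        order.extend(leaf_order(child))
    return order


print(leaf_order(act_tree))  # reading order
print(leaf_order(doc_tree))  # page order

# Same leaves, different orders: overlaying both trees on a single
# character stream would not be enough to represent this.
assert set(leaf_order(act_tree)) == set(leaf_order(doc_tree)) == set(leaves)
assert leaf_order(act_tree) != leaf_order(doc_tree)
```

Because the leaves are stored once and only referenced by the trees, further
trees (point 6) can be added without touching the text itself.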

I’ve outlined some of this in the opening sections of
http://www.digitalhumanities.org/dhq/vol/11/2/000293/000293.html. These
principles have been enacted in the Textual Communities architecture
(www.textualcommunities.org and www.textualcommunitiessandbox.org; see also
https://wiki.usask.ca/display/TC/Textual+Communities). The fullest account of
this argument, with an outline of how it was implemented, is given in my paper
at ADHO 2018,
https://wiki.usask.ca/display/TC/Creating+and+Implementing+an+Ontology+of+Documents+and+Texts

Peter



--[5]------------------------------------------------------------------------
        Date: 2019-01-31 10:54:23+00:00
        From: Desmond  Schmidt 
        Subject: Re: [Humanist] 32.402: the McGann-Renear debate

On 31/1/19, Patrick Sahle wrote:

>Gabriel, we are all grateful for the work of Allen Renear and others who
>made it clear that for many textual genres hierarchy is an essential
>structural feature. We will never fall back behind that.

OK, so now you are weakening it. First hierarchies were "fundamental",
now they are "essential for many textual genres" (and not others).
Look, if they were truly essential I simply couldn't disagree with you.
But I do.

But please take a moment to consider that even in texts that can be
analysed in a deep hierarchical way, such as plays, hierarchies only
exist in practice because of the tools that require them. Let's say
you format a play by Shakespeare in Microsoft Word. Headings of acts,
scenes and speeches are just part of a linear text. But it all looks
fine and dandy. So where is the "essential" need for hierarchies now?
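
The word-processor case above can be made concrete with a small sketch. It
assumes a play stored as a flat run of styled paragraphs; the style names, the
LEVELS table and the sample content are hypothetical. The nest function shows
that a containment tree can be derived from such a linear text by a tool;
whether that tree is then a property of the text or of the tool is exactly the
point in dispute, and the code does not settle it.

```python
# Flat, word-processor-like view: a linear run of (style, text) paragraphs.
flat = [
    ("act", "Act 1"),
    ("scene", "Scene 1"),
    ("speaker", "FIRST SPEAKER"),
    ("line", "A line of dialogue."),
    ("line", "Another line."),
    ("speaker", "SECOND SPEAKER"),
    ("line", "A reply."),
    ("scene", "Scene 2"),
    ("speaker", "FIRST SPEAKER"),
    ("line", "A further line."),
]

# Hypothetical ranking of styles, outermost first.
LEVELS = {"act": 0, "scene": 1, "speaker": 2, "line": 3}


def nest(paragraphs):
    """Derive a containment tree from the style levels alone."""
    root = {"label": "play", "children": []}
    stack = [(-1, root)]  # (level, node) pairs for the currently open units
    for style, text in paragraphs:
        level = LEVELS[style]
        # Close every open unit at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        node = {"label": text, "children": []}
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root


tree = nest(flat)
act = tree["children"][0]
print([scene["label"] for scene in act["children"]])
```

The flat list renders perfectly well on its own; the tree only appears once
something walks the list with a ranking of styles in hand.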

Desmond Schmidt
eResearch
Queensland University of Technology



--[6]------------------------------------------------------------------------
        Date: 2019-01-31 10:00:16+00:00
        From: Gabriel Egan 
        Subject: Re: [Humanist] 32.402: the McGann-Renear debate

Dear HUMANISTs

Desmond Schmidt wrote:

 > It was I who wrote that bit not Patrick [Sahle].

Oops, sorry about that. Desmond went on:

 > Last time I looked at a copy of Shakespeare there
 > were no angle-bracketed tags, just black marks on
 > the pages.

The "copy of Shakespeare" that exists as black marks
on the pages is no more the real thing than the copy
that has angle-bracketed tags in it. Indeed, I happen
to have first read Shakespeare in its angle-bracketed
tagged version, reading on a computer screen the 1989
Oxford Complete Works edition distributed as COCOA-
tagged ASCII files on floppy disks. You seem to be
saying that I wasn't reading a copy of Shakespeare--
is that really your position?

 > It is quite possible to model Shakespeare without any
 > reference to hierarchies, even though there are acts,
 > scenes, speeches and so forth.
 >
 > So the hierarchies are only in your head when you
 > analyse a text. It might be a familiar way of modelling
 > them but it's just that - a model, not a fundamental
 > property of the text itself.

All the dialogue lines occur inside speeches, all the speeches
occur inside scenes, and all the scenes occur inside acts,
and there are exactly five acts. Can we agree that this itself
is a hierarchy? If not, could you explain why not as it would
mean I'm having trouble following your understanding of
the word 'hierarchy'. This hierarchy cannot be said to be
merely a product of my analysis of the writing since
the writers themselves discussed this aspect of play
writing in their surviving letters about their plays.

Specifically, they would describe a play as not ready
because one of the acts had not been completed. Also,
in their manuscript alterations they would add inked
lines to particular phrases that had appeared to come
loose from particular speeches in order to make clear
which speeches they came from. Surely then the hierarchy
(the need for each dialogue line to belong inside a
speech) was in their heads, the heads of the creators
of the writings, not merely in my head as the consumer
of the writings.

Regards

Gabriel Egan








Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)

This site is maintained under a service level agreement by King's Digital Lab.