Humanist Discussion Group


Humanist Archives: March 13, 2019, 7 a.m. Humanist 32.542 - the illusion of 'progress' and transfer of knowledge

                  Humanist Discussion Group, Vol. 32, No. 542.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org


    [1]    From: Elisa Beshero-Bondar 
           Subject: Re: [Humanist] 32.537: the illusion of 'progress' and transfer of knowledge (294)

    [2]    From: Jim Rovira 
           Subject: Re: [Humanist] 32.537: the illusion of 'progress' and transfer of knowledge (79)


--[1]------------------------------------------------------------------------
        Date: 2019-03-12 21:47:00+00:00
        From: Elisa Beshero-Bondar 
        Subject: Re: [Humanist] 32.537: the illusion of 'progress' and transfer of knowledge

Desmond,

Having no need for hierarchies suggests an easy flatness, but even you
aren't living without hierarchy when you assemble stand-off layers to
communicate the actions of a writing process. As soon as you need to
communicate an interpretation of a document, you have to do so in terms of
hierarchies that show and tell the difference between metadata and data,
between framework and text, between obscure mark and semantic sequence.

You can't even be without hierarchy in reading my post and responding to it
in sentences with subjects and verbs and modifiers. Humans communicate in
grammars that involve hierarchy and so do the models and systems we build.
You can run from hierarchy but sooner or later you will have to use it to
interact with someone in a shared frame of reference. Yes, of course, any
*single* hierarchy poses a reductive view, and yes, we do exist in a world
of multiplicity and overlap, but that does not invalidate or nullify the
ubiquitous presence of hierarchies being troubled, disturbed, and vexed by
overlap. Hierarchy is what we think with, how we build, what we choose to
push aside in deciding to reduce and simplify one complicated thing into
multiple layers. You don't rid yourself of complexity or hierarchy by
making layers, because eventually, you need to assemble those layers in
order for them to be read and interpreted together.

We can even describe the vexing troubles caused by overlap using hierarchy
and twisting around it. When you create layers, when I output a standoff
architecture in TEI, neither of these exists without a structure that forms
them into an alternative hierarchy. That is why I write so strongly that we
should stop pretending that we have "no use for" hierarchy and start
thinking in terms of multiple hierarchies because that's what we do and how
we work when we build our projects.

Your quote from Claus Huitfeldt on Wittgenstein's manuscripts is an
expression of despair with which I sympathize but find short-sighted. First
of all, we should recognize the difference between the manuscripts as
physical objects and the manuscripts as modeled document objects, and as
soon as we do that, we become aware, as intelligent and observant people
curious about manuscripts, that we decide on structures for organizing them
to read them and find ways to convert them into something meaningful to us
and the communities that have an interest in Wittgenstein. We can give up
the effort in despair, of course, and I'm afraid that's what happens to
many a nascent very complicated edition project, but we can also do what we
can by expressing what we observe. If what we observe is based on
describing relationships between pen marks, erasures, and locations on the
page, we have hard work ahead. But we have equally hard work ahead if we
are going to prioritize semantic sequencing--how do we determine that?
Hierarchical frames of reference are necessary for any engagement that
determines meaning from marks on surfaces, and we may never agree that our
hierarchies are adequate, but the least we can do is recognize that they
are multiple and that they are going to intersect and overlap. And we can
study those areas of intersection and overlap and find them worthy of
continued discussion.

I can well understand how encoding the Harpur manuscript led to such
frustration - a frustration well known to manuscript scholars. But I am
sorry that you felt compelled to try to "beat consistency out of the
original TEI." If I follow you in what you say here and elsewhere of the
Harpur project, I imagine that the trouble developed as the humans involved
in the original encoding failed to apply their markup consistently or
failed to find a common ground for expressing manuscript activity in nested
markup. Yes, I'll grant you this is a problem and there can be such a thing
as inconsistent, unexpressive, or erring code to the extent that it does
not express meaningfully. But that doesn't mean that we ought to be
finished with XML. We were fortunate not to have such trouble in designing
an algorithm for collating previous digital editions of *Frankenstein*
because we were able to read and follow and bridge the work of the text
scholars who came before us. Our work isn't so unusual--it's similar to
what happens when people review previous scholarship in producing a new
scholarly edition in print or digital form. We search for ways to connect
with previous scholarship and incorporate it into new structures.

Our Frankenstein Variorum is quite a bit different from your work, since we
are working with the intricate and well-constructed Shelley-Godwin Archive
edition. I did not have to build or rebuild that edition of the MS
notebooks to collate it with the other extant digital representations of
Frankenstein's print editions. Because I had the benefit of clearly
signaled markup, I was able to design a collation algorithm that helped me
to identify alignment chunks and incorporate the very editorial markup of
the earlier digital editions (ranging from web 1.0 to TEI) into sequenced,
aligned, comparable units for collation. It has been fun to work on that
part of the project and illuminating: Hierarchy is not limited to
expressing a singular "essence" of a document, and when we work with it in
recognition of multiple co-existing and intersecting parts we are able to
design bridges and connecting spans instead of isolated silo towers. The
difference is in the thinking we apply to our data models, not the presence
or absence of hierarchy.

I did not presume to know better than the scholars who designed the
Shelley-Godwin archive, and I don't feel compelled to say its approach to
page-by-page surface and zone markup is at all flawed. Rather we were
fortunate to work with a very clear and regularly parsable model of the
text - a TEI encoding of the notebooks that signaled not only the
page-by-page layout but also semantic sequence of inserted passages. I had
the benefit of clear signal posts in the S-GA markup connecting marginal
insertions and deletions to the most likely semantic positions in the
"main" text. Those sequences were not always singular and sometimes they
are fragmentary--the course of manuscript reading never runs smoothly and
yet it runs, and I am grateful first to Charlie Robinson and other scholars
in the 1990s long before this was ever encoded in TEI for working out the
reading sequence - I'm happy to carry on that work and build on it. I am
sorry when people seem to think that markup is something too complicated to
deal with, and so far do I dissent from this view that I've made it part of
my work to teach people what interesting things they can build with markup.

Thanks to the semantically intricate, regularly XPath-able encoding of the
S-GA, we were able to write code to parse the notebooks and collate them
with other editions that were not organized by page, surface, and zone. We
have made decisions about what constitutes a singular alignment vs multiple
alignments--sometimes passages of the notebooks constitute in our collation
two or three fragmentary witnesses, not just one, and I know you and others
on this list are familiar with the complexities of this. I don't pretend
that working manuscript encoding into collation was easy work, but I did
not consider myself having to "beat" anything violently out of the encoding we
were working with, and I'm grateful to the coders who came before me to
make our variorum work possible. I'm the person on the Frankenstein project
tasked with that hard work and I face it as a manuscript scholar--my
algorithm is a work flow for decision making that must make decisions about
hierarchical relationships: what constitutes main text vs a second or third
copy? Where do I need to insert a signal of alignment to ensure my
collation software doesn't make a false positive? To me this is deeply
interesting work requiring hierarchical markup for which I have much use.

The S-GA's consistent markup permitted us to program with XSLT a
resequencing of the manuscript notebook encoding according to its semantic
flow. The very markup of insertions and deletions is worked into our
collation process because, as many on this list understand,
machine-assisted collation involves treating documents (whether they
include markup or pseudo-markup or not) *as if they were* plain text. We
opted to include deletion and insertion data in our collation to make the
little-known deleted passages of the manuscript visible in comparison with
the print editions that are better known. The work was complex but driven
by an age-old text-scholarly question of illuminating that which was once
hidden.
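
To make the idea of collating marked-up text *as if it were* plain text
concrete, here is a minimal sketch in Python. It is not the Frankenstein
Variorum's code (which, as described above, relies on XSLT and collation
software); the two sample strings, the tokenize and collate helpers, and
the choice of the standard library's difflib are all assumptions made for
illustration only.

```python
import re
from difflib import SequenceMatcher

# Hypothetical witnesses (invented sample text, not real transcriptions):
# a manuscript reading with inline <del>/<add> markup, and a print reading.
manuscript = "the <del>wretch</del> <add>creature</add> opened his eyes"
print_text = "the creature opened his dull eyes"

def tokenize(text):
    """Split a string into tokens; each tag counts as one token, like a word."""
    return re.findall(r"<[^>]+>|[^\s<]+", text)

def collate(a, b):
    """Align two token lists and return only the spans where they differ."""
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [(op, a[i1:i2], b[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"]

ms_tokens = tokenize(manuscript)
pr_tokens = tokenize(print_text)
differences = collate(ms_tokens, pr_tokens)

for op, ms_span, pr_span in differences:
    print(op, ms_span, pr_span)
```

Because the tags travel through the alignment as ordinary tokens, the
deleted word stays visible in the comparison instead of being silently
dropped before collation.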

In order to collate documents with different hierarchies, we had to
"flatten" every hierarchical model--to treat it as if it were a running
stream of plain text. We differ fundamentally here if you believe that the
running stream of text to be wound and woven into collation is the essence
of an edition. I don't believe it to be--I believe it to be an illusion
generated to ease processing, and that the edition we build needs to raise
hierarchies again. What we produce on the other side of collation is a
hierarchical expression of a collation woven into TEI XML critical
apparatus, a hierarchy that expresses alignment and boundaries of bundled
passages according to reading witnesses. That is far more meaningful and
readable in hierarchical form than it could be in flattened text.
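
As a rough illustration of "flattening" and "raising", the following
Python sketch turns a small XML fragment into a flat stream of start,
text, and end events and then rebuilds the nesting from that stream. It
is only a sketch under stated assumptions: it uses Python's standard
xml.etree.ElementTree rather than the XSLT mentioned above, the flatten
and raise_tree functions are hypothetical names, the sample fragment is
invented, and attributes are ignored for brevity.

```python
import xml.etree.ElementTree as ET

def flatten(elem):
    """Return a flat list of ('start', tag), ('text', s), ('end', tag) events."""
    events = [("start", elem.tag)]
    if elem.text:
        events.append(("text", elem.text))
    for child in elem:
        events.extend(flatten(child))
        if child.tail:
            events.append(("text", child.tail))
    events.append(("end", elem.tag))
    return events

def raise_tree(events):
    """Rebuild nested elements from a flat event list, using a stack."""
    root = None
    stack = []
    for kind, value in events:
        if kind == "start":
            node = ET.Element(value)
            if stack:
                stack[-1].append(node)
            else:
                root = node
            stack.append(node)
        elif kind == "end":
            stack.pop()
        else:
            # Text before any child belongs to .text; after a child, to its .tail.
            parent = stack[-1]
            if len(parent):
                parent[-1].tail = (parent[-1].tail or "") + value
            else:
                parent.text = (parent.text or "") + value
    return root

# Invented sample fragment (the deletion is hypothetical).
source = "<p>It was on a <del>dark</del> <add>dreary</add> night</p>"
flat = flatten(ET.fromstring(source))
plain = "".join(value for kind, value in flat if kind == "text")
rebuilt = ET.tostring(raise_tree(flat), encoding="unicode")

print(plain)
print(rebuilt)
```

The flat stream is convenient for alignment, but the plain string it
yields mixes deleted and added words; raising restores the structure that
tells a reader which is which.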

As long as human beings are reading and writing and editing, there will
never be such a thing as perfectly flat "plain text" - that is the illusion
of which I speak strongly here. Hierarchies are what we make when we read
and prioritize and decide to emphasize one thing and ignore, mask, or
displace another. We can't avoid it because that's how we work and build.

Desmond, I am not a programmer like you but a not-so-simple English
Literature professor working at a regional teaching campus. I am a trained
textual scholar who has taken a deep interest in what I can build with the
XML family of languages because they give me capacity to express
relationships I find unavailable in the syntax of conventional research or
old-fashioned print editions. I teach these things to my undergraduates
(http://dh.newtfire.org) who build and model cultural artifacts using XML,
and then apply research questions to investigate what they can learn from
their modeling. I appreciate the expressive capacity in the TEI's shared
vocabulary and guidelines because they speak to me as a member of a
scholarly community. I don't think any of us requires that every edition be
encoded in exactly the same way, and indeed, I think every digital edition
project requires its editors to find distinct ways to express its
distinctive complexities. The application of a shared vocabulary would be
intellectually deprived if every edition were encoded according to the
convenience of a programmer who wants a one-size-fits-all interface. I
would like people to think about how to write code in TEI that communicates
with other markup rather than allowing myself to sit quietly while they are
told that XML is impossible to work with and that hierarchies are just not
useful.

My own thinking about hierarchies has been stimulated by the process of
flattening and raising XML for collation, to intersect multiple document
models. Related to that, this list might enjoy our Zen of Flattening and
Raising presentation here:
https://slides.com/elisabeshero-bondar/zenraising#/ and paper here:
https://www.balisage.net/Proceedings/vol21/html/Birnbaum01/BalisageVol21-Birnbaum01.html,
with many thanks to David Birnbaum and Michael Sperberg-McQueen.

I hope you find my work interesting and not unappealing, and really not so
far different from your own. Perhaps we can be properly introduced someday
in a context where we're not trying to prove one another wrong, but can
find ways to bridge our differences and respect our data modeling. When we
do that, though, I am sure we'll find structured hierarchies to be a
meaningful way to communicate.

Best,
Elisa
--
Elisa Beshero-Bondar, PhD
Director: Center for the Digital Text
https://www.greensburg.pitt.edu/digital-humanities/center-digital-text
Associate Professor of English
University of Pittsburgh at Greensburg
Humanities Division
150 Finoli Drive
Greensburg, PA  15601  USA
E-mail: ebb8@pitt.edu



On Tue, Mar 12, 2019 at 4:17 AM Humanist wrote:

>                   Humanist Discussion Group, Vol. 32, No. 537.
>             Department of Digital Humanities, King's College London
>                    Hosted by King's Digital Lab
>                        www.dhhumanist.org
>                 Submit to: humanist@dhhumanist.org
>
>
>
>
>         Date: 2019-03-11 21:45:47+00:00
>         From: Desmond Schmidt 
>         Subject: Re: [Humanist] 32.533: the illusion of 'progress' and
> transfer of knowledge
>
> Elisa,
>
> I don't understand why you react so strongly to the slight comment by
> Joris that he had no need of hierarchies. In this I concur with him.
> On this topic I'd like to quote Claus Huitfeldt, who in 1995 reacted
> to the then recent adoption of SGML:
>
> "I am not convinced that Wittgenstein’s manuscripts are basically
> hierarchical structures. Potentially, for all I know, any feature may
> overlap with any other feature. Besides, I do not even know what the
> hierarchies should consist of, or whether the identification of such
> hierarchies would be particularly illuminating."
>
> It is interesting to note that all the arguments we are having in this
> thread and preceding ones are about these same issues: that no one can
> definitively decide what the hierarchies are or even mount a
> convincing case that they communicate useful information. The
> hierarchies are mostly a requirement of the underlying markup system.
> They are created in a self-justifying way to enable syntax checking of
> themselves. So when our document parses correctly we feel chuffed that
> the text is now properly encoded when in fact only the tags are.
>
> On reading your article written with Raffaele I am reminded of the
> many attempts I made to beat some consistency out of the original TEI
> encodings of our Harpur manuscripts. Three times I thought I "had it"
> in developing ways to determine the surrounding context of a
> particular insertion or deletion or variant. But the result always
> left me dissatisfied however much I tweaked my program. And I am an
> experienced software engineer who knew the material well. In the end
> reliability could only be achieved by combining an automated method
> with human intervention in difficult cases.
>
> The reason for this failure is that the information we need to enable
> cross-document collation simply isn't present in the transcription.
> I'll repeat what I said earlier: Encoding primarily for graphical
> features (deletion, insertion) or physical layout (zones, x,y
> coordinates) means that we can't hope to extract what amounts to
> temporal information about how the text stood locally at any one time.
> And we need that if we are going to collate.
>
> To illustrate this point I refer to No. 2 of my "tough cases" observed
> in making our edition of Harpur.
> (http://charles-harpur.org/tough-cases). Here you will see repeated
> revision of various parts of a stanza extending to two pages. The
> author doesn't even bother to delete earlier versions consistently.
> Trying to work out from spatial data here as to what replaces what is
> a decidedly difficult problem that only humans can ever hope to
> resolve. Or example 6 d) which contains revisions on a recto of a
> printed text with corrections on the verso. Where do these changes on
> the verso actually go? And which words are repeated (not crossed out)
> or superseded by the ones on the recto?
>
> What you appear to be claiming in this article is that you have
> developed an automated way to collate any texts conforming to TEI by
> removing the markup to a standoff representation. I would like to
> point out that the transcribed base texts of corrected manuscripts in
> many cases are jumbled-up nonsense. For that reason collation
> algorithms cannot reliably compare texts with embedded variants. I
> challenge you to show me an algorithm that can do it in a
> mathematically proven way.
>
> Maybe it is OK for humanists who are used to fuzzy things. But if we
> are truly going to interoperate as you say - and in this I agree that
> it is a highly desirable property for our digital encodings - we need
> a reliable way to express variation between documents and between
> editorial projects, and XML cannot meet that requirement, however much
> we might want it to. The reason is not that the XML format cannot
> express it but that humans cannot write it consistently enough to
> allow that to happen.
>
> Desmond Schmidt
> eResearch
> Queensland University of Technology
>
>


--[2]------------------------------------------------------------------------
        Date: 2019-03-12 14:14:36+00:00
        From: Jim Rovira 
        Subject: Re: [Humanist] 32.537: the illusion of 'progress' and transfer of knowledge

Just speaking about hierarchies in text in general, Adorno attempted to
write his thesis on Kierkegaard without any topic sentences. He attempted
to weigh every sentence equally. I was reading about 100 pages a day at the
time and it took me two weeks to get through his 127-page text, and I'm
still pretty sure I picked out a topic sentence every ten pages or so.
Hierarchies are taught to grade school students when they're being taught
to write -- topic sentence, supporting detail or evidence, conclusion or
transition -- and before then, children are taught to read looking for
topic sentences. Even just sentence structure -- subject, verb, object --
is a hierarchy. But, I'm talking English. German is, or at least can be,
written in little semantic Russian dolls, but remembering that also reminds
me of the Austrian grad student who told me that German-speaking students
prefer to read Kant in English translation before attempting him in German.
Linear or hierarchical models are the simplest to process, so people tend
to write using them. It takes a lot of work and deliberation to write a
text absent any hierarchies, and when they are written, they are
notoriously difficult to read. Saying they don't exist is to claim that
there are no organizational principles in any text, ever. That's certainly
possible with some texts, and more possible with shorter texts than with
longer ones, but I think we should view that as an exception rather than a
rule. Finnegans Wake appears to be written in a kind of circle, so where
does that really begin or end? But do we really want to treat every text
like it was Finnegans Wake?

Saying there are organizational principles other than hierarchies will, I
think, lead to debates over nomenclature rather than actual structure. We
can talk center and periphery if you prefer circular or globular models to
linear ones, but that still weights some features of the text over others.
Might create more possibilities, though.

Jim R



--
Dr. James Rovira 
Bright Futures Educational Consulting


   - *Writing for College and Beyond* (A first year writing textbook. Lulu
   Press, forthcoming. .pdf files available for preview if you're interested
   in considering this text for your classroom. It is fully customizable for
   departmental orders.)
   - *Reading as Democracy in Crisis: Interpretation, Theory, History*
   (Lexington Books, in production)
   - *Rock and Romanticism: Post-Punk, Goth, and Metal as Dark Romanticisms*
   (Palgrave Macmillan, May 2018)
   - *Rock and Romanticism: Blake, Wordsworth, and Rock from Dylan to U2*
   (Lexington Books, February 2018)
   - *Assembling the Marvel Cinematic Universe: Essays on the Social,
   Cultural, and Geopolitical Domains*, Chapter 8 (McFarland Books, 2018)
   - *Kierkegaard, Literature, and the Arts*, Chapter 12 (Northwestern UP,
   2018)
   - *Blake and Kierkegaard: Creation and Anxiety* (Continuum, 2010)

Active CFPs

   - *Women in Rock/ Women in Romanticism*, edited anthology
   - *David Bowie and Romanticism*, edited anthology



_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php


Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)