Humanist Discussion Group

Humanist Archives: March 4, 2019, 7:58 a.m. Humanist 32.505 - standoff markup & the illusion of 'plain text'

                  Humanist Discussion Group, Vol. 32, No. 505.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org


    [1]    From: Desmond Schmidt
           Subject: Re: [Humanist] 32.499: standoff markup & the illusion of 'plain text' (94)

    [2]    From: Wendell Piez 
           Subject: Re: [Humanist] 32.499: standoff markup & the illusion of 'plain text' (32)

    [3]    From: Iian Neill 
           Subject: Re: [Humanist] 32.499: standoff markup & the illusion of 'plain text' (27)


--[1]------------------------------------------------------------------------
        Date: 2019-03-03 21:00:58+00:00
        From: Desmond Schmidt
        Subject: Re: [Humanist] 32.499: standoff markup & the illusion of 'plain text'

Patrick,

you touch upon an important point: that it has been the goal of
XML-based editions for the past 15 years or so to get ever closer to
recording the spatial relationships between pieces of text on a page.
And bound up with this goal is the idea that a perfect capture of such
information would unlock multiple ways to investigate the text which
would then be a kind of blending of markup, annotation and "plain"
text much as you describe.

As you probably have already guessed, I don't share this idea. I have
encountered in practice some serious problems with this approach to
making a digital edition.

The first question is why? Why should we seek to make a rough
approximation of a manuscript page when that page can be precisely
photographed (not without loss of information, of course)? The encoded
approximation will still be vastly inferior to the page-facsimile
image, which already captures the spatial relationships between
fragments of text.

A text encoded for spatial information can't even be used as a reading
text. This is something that Muñoz and Viglianti (2015) pointed out
recently. To produce a reading text we actually need another encoding
that conflicts with the spatial perspective and requires a re-ordering
of textual fragments.

Also you can't compare two versions of a text that have been encoded
in this way. Comparison tools can only compare linear transcriptions
of one version or layer of a document but not text mixed up with
arbitrary alternatives and information about where text-blocks go on
the page. This severely limits what we can do with our edition.
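
To illustrate the point about linear transcriptions, here is a minimal
sketch, assuming Python's standard difflib module as a stand-in for a
collation tool. The two sample readings and the function name
word_variants are invented for illustration. The comparison works only
because each version is a flat sequence of words with no positional
markup or embedded alternatives mixed in.

```python
import difflib

# Two hypothetical linear transcriptions of the same passage,
# one per version, already reduced to a flat sequence of words.
version_a = "the quick brown fox jumps over the lazy dog".split()
version_b = "the quick red fox leaps over the lazy dog".split()


def word_variants(a, b):
    """Return (operation, reading in a, reading in b) for each difference."""
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


variants = word_variants(version_a, version_b)
for tag, reading_a, reading_b in variants:
    print(f"{tag}: {reading_a!r} -> {reading_b!r}")
```

If the word lists also contained coordinates or alternative readings,
the matcher would treat them as ordinary tokens and report them as
textual differences.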

Such an encoding is also very hard to edit. All that complex markup to record the
position of textual fragments prevents ordinary human editors who are
not technical experts from participating in the edition. They can't
share their transcriptions with their peers because no one can agree
on how particular features should be recorded, or they misunderstand
the complex record of features made by someone else. Damage will
result and collaboration will fail. You of all people should
appreciate this because you wrote about it in 2006.

I'm a strong believer in divide and conquer, and in the KISS
principle. If we are going to make digital editions that are
affordable and easy for everyone to do, and if we are going to
collaborate in making them, we need a simple interface that anyone can
use. And for a text to be fit for many purposes, we actually need a
simple, not a complex, textual representation at its core.

-------------------------------
Trevor Muñoz and Raffaele Viglianti (2015) Texts and Documents: New
Challenges for TEI Interchange and Lessons from the Shelley-Godwin
Archive. JTEI 8. http://jtei.revues.org/1270

Patrick Durusau (2006) Why and How to Document Your Markup Choices. In
L. Burnard, K. O'Brien O'Keeffe and J. Unsworth (eds) Electronic Textual
Editing, pp. 299-309.
-----------------------

Desmond Schmidt
eResearch
Queensland University of Technology

>
> But "plain text" in an electronic system is an illusion. Why not abandon
> the distinction between text, markup and annotations, capturing all of
> them in a database, upon which queries then search and/or render a
> particular "view" of a "text" for your viewing?
>
> If you desire XML, for further processing, that is one rendering of a
> text, as is rendering in SVG, for example, such that readers can choose
> dynamic renditions of variant versions, with or without a base version
> being displayed.
>
> Or any annotation of a text, as well as annotations of annotations.
>
> If, as you say, we should stop clinging to the file metaphor for
> annotations, let's free ourselves of it with regard to texts.
>
> Granting that for many purposes I would prefer a rendering that mimics a
> hand written mss, but that is only one possibility out of many.
>
> Gaps, spaces, margins, etc., can all have unique records in a database,
> or even records based on unique x - y coordinates on a physical witness.
>
> With that change, we can speak of renderings of texts, even renderings
> that we claim match physical witnesses. Some renderings carry
> annotations, some don't.
>
> Hope you are having a great weekend!
>
> Patrick
>



--[2]------------------------------------------------------------------------
        Date: 2019-03-03 18:12:22+00:00
        From: Wendell Piez 
        Subject: Re: [Humanist] 32.499: standoff markup & the illusion of 'plain text'

Dear Willard,

Goodness, now we are recommending text bases instantiated as a graph
model: both of these sound a lot like Luminescent's internal model, or
for that matter the experimental system CMSMcQ mentioned way back in
the dawn of this thread:

Haentjens Dekker, Ronald, and David J. Birnbaum. “It's more than just
overlap: Text As Graph.” Presented at Balisage: The Markup Conference
2017, Washington, DC, August 1 - 4, 2017. In Proceedings of Balisage:
The Markup Conference 2017. Balisage Series on Markup Technologies,
vol. 19 (2017). https://doi.org/10.4242/BalisageVol19.Dekker01.

Or for old timers: https://github.com/wendellpiez/Luminescent  (I'm
not dead yet, I think I'll go for a walk!)

Yet I fail to see why any of this once and future promising work
invalidates XML in any way, whether XML is viewed as some sort of
arguable abstraction, or a practical technology that none of us (even
experts) can see in its entirety.

Regards, Wendell




--
Wendell Piez | wendellpiez.com | wendell -at- nist -dot- gov
pellucidliterature.org | github.com/wendellpiez |
gitlab.coko.foundation/wendell  - pausepress.org



--[3]------------------------------------------------------------------------
        Date: 2019-03-03 11:59:55+00:00
        From: Iian Neill 
        Subject: Re: [Humanist] 32.499: standoff markup & the illusion of 'plain text'

Hi Patrick,

I'm not sure I entirely understand what you mean when you suggest
abandoning the distinction between text, markup and annotations; at least,
I can't visualise what data structure this would be represented by, nor how
it would be edited by the user (e.g. adding and removing characters and
annotations). I find the concept intriguing; I just don't follow the
technical realisation of it.

In some ways, though, 'Codex' may fulfill some of the other requirements
you suggest (although I may have misunderstood you). For example, a
'standoff property text' (SPT) in 'Codex' is represented by a (:Text) node
for the raw text and a cluster of (:StandoffProperty) nodes for the
annotations. These (:StandoffProperty) nodes can also be linked to other
nodes in the graph (e.g., people, places, assertions). Further, an SPT can
be thought of as analogous to the text content of an XML element, which
means that (:Text) nodes can be (and are) related to other (:Text) nodes as
required. For example, the default text type in the system is 'page body',
which can be linked to other texts fulfilling the function of margin note,
footnote, end note and even intertext (like hypertext). And because the
editor supports zero-width annotations (between characters) you can inject
references to other texts at any point without disturbing the 'text flow'.
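
The structure described above might be sketched roughly as follows. This
is an in-memory Python illustration only, not the actual 'Codex' schema
or its graph queries: the class and field names, the half-open
[start, end) offsets, and the sample sentence are all assumptions made
for the example. A zero-width annotation is modelled here as a property
whose start equals its end.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class StandoffProperty:
    """An annotation kept apart from the text it describes."""
    type: str
    start: int          # offset of the first character covered
    end: int            # offset just past the last character (assumed half-open)
    target: Any = None  # optional link to another node: person, place, Text...


@dataclass
class Text:
    """Raw text plus the cluster of standoff properties that point into it."""
    content: str
    kind: str = "page body"
    properties: List[StandoffProperty] = field(default_factory=list)

    def annotate(self, type, start, end, target=None):
        if not 0 <= start <= end <= len(self.content):
            raise ValueError("annotation falls outside the text")
        prop = StandoffProperty(type, start, end, target)
        self.properties.append(prop)
        return prop

    def covered(self, prop):
        """Return the characters a property spans ('' if zero-width)."""
        return self.content[prop.start:prop.end]


body = Text("The letter was sent to Rome.")
place = body.annotate("place", 23, 27, target="Rome (place node)")

# A margin note is a Text node of its own, linked from a zero-width
# property sitting between "letter" and " was".
note = Text("Probably written in haste.", kind="margin note")
ref = body.annotate("text-reference", 10, 10, target=note)

print(body.covered(place))
print(repr(body.covered(ref)))
print(body.content)
```

Neither annotation changes body.content, which is the sense in which a
reference can be injected without disturbing the text flow.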

Best regards,
Iian




_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php
Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
This site is maintained under a service level agreement by King's Digital Lab.