Humanist Discussion Group

Humanist Archives: March 6, 2019, 6:41 a.m. Humanist 32.516 - standoff markup & the illusion of 'plain text'

                  Humanist Discussion Group, Vol. 32, No. 516.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org




        Date: 2019-03-05 17:21:33+00:00
        From: raffaeleviglianti@gmail.com
        Subject: Re: [Humanist] 32.505: standoff markup & the illusion of 'plain text'

Dear Desmond,

I'm joining this conversation late and perhaps a bit unprepared not having
followed it through completely; however, I would like to address some of
the points in your latest message from the perspective of the
Shelley-Godwin Archive (S-GA - the project discussed in the article by
Muñoz and me that you cited) and some of my latest thinking on the subject.

You ask why we should seek to make a rough approximation of a manuscript
page through encoding. I think the answer is the same reason why printed
facsimile editions with transcriptions and images side-by-side are created
(such as the Garland Shelley facsimile editions). At the very least, they
provide a map to the facsimile page and a readable transcription by
scholars who know these documents deeply. Encoding and the digital medium
can be used to produce digital versions of these publications with new
affordances, which is why this is done. The goal is not to create a rough
approximation of the manuscript page, but to formalize scholarship around
the page and the manuscript and provide a useful tool to others. I think
S-GA, in its current form, mostly matches this ideal.

In our paper, Muñoz and I discuss how it is not easy to use the approach to
encoding that I just described to display a reading text, but this is quite
different from saying that it cannot be used as such at all. In fact, we
provide an automatically generated reading text on the site that, while
imperfect, is still useful. To return to my comparison with printed
facsimile editions: you wouldn't use those as a reading text either,
because the goals are different. Text encoding reflects your editorial
goals.

This doesn't prevent anyone from pursuing more than one goal for a given
textual resource, and I agree that standoff is a viable solution for this.
The TEI Guidelines document a number of standoff techniques, yet they are
often used only at a small scale because they can be challenging to apply
and validate. But standoff markup on plain text must encounter the same
issues, with the further impediment of not being able to rely on markup for
simple references to identifiers. In short, further hierarchies can be
layered on top of (or adjacent to) an existing encoding through the use of
pointers (I discuss some of this in a forthcoming JTEI article).

To give an example, Elisa Beshero-Bondar and I (et al.) have been working
around this concept in the creation of a variorum edition for Frankenstein
that incorporates S-GA TEI data without modification through the use of
pointers. We have a central 'spine' (or stand-off collation) that uses
pointers to other TEI-encoded documents to identify variants. See
Beshero-Bondar and Viglianti 2018 for an overview.
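
As a rough illustration of the general idea only (this is not the
Frankenstein Variorum's actual data model or code), a standoff 'spine' can
be pictured as a list of pointers into independently encoded documents,
each resolved by identifier rather than by character offset. The file
names, identifiers, sample text and the helpers `resolve` and `reading`
below are all hypothetical, and the sketch assumes Python's standard
library XML parser.

```python
import xml.etree.ElementTree as ET

# Expanded name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Two hypothetical witnesses, each keeping its own encoding untouched:
# a document-oriented manuscript transcription and a text-oriented print.
WITNESSES = {
    "ms.xml": (
        '<text><line xml:id="l1">It was on a '
        "<del>dismal</del><add>dreary</add> night</line></text>"
    ),
    "print.xml": (
        '<text><p><seg xml:id="s1">It was on a dreary night</seg></p></text>'
    ),
}

# Hypothetical standoff spine: it holds no text, only pointers of the
# form "document#identifier" grouped into units to be compared.
SPINE = [
    {"unit": "u1", "pointers": ["ms.xml#l1", "print.xml#s1"]},
]


def resolve(pointer):
    """Return the element that a 'document#identifier' pointer targets."""
    doc, _, fragment = pointer.partition("#")
    root = ET.fromstring(WITNESSES[doc])
    for element in root.iter():
        if element.get(XML_ID) == fragment:
            return element
    raise KeyError(pointer)


def reading(element):
    """Flatten an element to a reading text, skipping deleted material."""
    parts = [element.text or ""]
    for child in element:
        if child.tag != "del":
            parts.append(reading(child))
        parts.append(child.tail or "")
    return "".join(parts)


for unit in SPINE:
    for pointer in unit["pointers"]:
        print(unit["unit"], pointer, repr(reading(resolve(pointer))))
```

The pointers rely on the markup already present in each witness (the
identifiers), so neither source needs to be modified or reduced to a
'plain text' before being addressed.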

The point I want to make is that I don't think that standoff requires a
'plain text' to be targeted by a number of annotations and hierarchies. In
fact, there is no such thing as a plain text: even non-XML encoded text
contains markup, often imposed by convention. Computer representation of
textual information was developed around the idea of text-as-string and we
must not mistake this representation for what text really is.

Best,
Raff Viglianti

Beshero-Bondar, Elisa E., and Raffaele Viglianti. "Stand-off Bridges in the
Frankenstein Variorum Project: Interchange and Interoperability within TEI
Markup Ecosystems." Presented at Balisage: The Markup Conference 2018,
Washington, DC, July 31 - August 3, 2018. In Proceedings of Balisage: The
Markup Conference 2018. Balisage Series on Markup Technologies, vol. 21
(2018). https://doi.org/10.4242/BalisageVol21.Beshero-Bondar01.

--
Raffaele Viglianti, PhD
Research Programmer
Maryland Institute for Technology in the Humanities
University of Maryland

On Mon, Mar 4, 2019 at 2:58 AM Humanist wrote:

>                   Humanist Discussion Group, Vol. 32, No. 505.
>             Department of Digital Humanities, King's College London
>                    Hosted by King's Digital Lab
>                        www.dhhumanist.org
>                 Submit to: humanist@dhhumanist.org
>
>
>         Date: 2019-03-03 21:00:58+00:00
>         From: Desmond Schmidt 
>         Subject: Re: [Humanist] 32.499: standoff markup & the illusion of
> 'plain text'
>
> Patrick,
>
> you touch upon an important point: that it has been the goal of
> XML-based editions for the past 15 years or so to get ever closer to
> recording the spatial relationships between pieces of text on a page.
> And bound up with this goal is the idea that a perfect capture of such
> information would unlock multiple ways to investigate the text which
> would then be a kind of blending of markup, annotation and "plain"
> text much as you describe.
>
> As you probably have already guessed, I don't share this idea. I have
> encountered in practice some serious problems with this approach to
> making a digital edition.
>
> The first question is why? Why should we seek to make a rough
> approximation of a manuscript page that can be precisely photographed
> (not without loss of information of course) but still vastly inferior
> to the page-facsimile image that already captures the spatial
> relationships between fragments of text?
>
> A text encoded for spatial information can't even be used as a reading
> text. This is something that Munoz and Viglianti (2015) pointed out
> recently. To produce a reading text we actually need another encoding
> that conflicts with the spatial perspective and requires a re-ordering
> of textual fragments.
>
> Also you can't compare two versions of a text that have been encoded
> in this way. Comparison tools can only compare linear transcriptions
> of one version or layer of a document but not text mixed up with
> arbitrary alternatives and information about where text-blocks go on
> the page. This severely limits what we can do with our edition.
>
> It makes it very hard to edit. All that complex markup to record the
> position of textual fragments prevents ordinary human editors who are
> not technical experts from participating in the edition. They can't
> share their transcriptions with their peers because no one can agree
> on how particular features should be recorded, or they misunderstand
> the complex record of features made by someone else. Damage will
> result and collaboration will fail. You of all people should
> appreciate this because you wrote about it in 2006.
>
> I'm a strong believer in divide and conquer, and in the KISS
> principle. If we are going to make digital editions that are
> affordable and easy for everyone to do, and if we are going to
> collaborate in making them, we need a simple interface that anyone can
> use. And for a text to be fit for many purposes, we actually need a
> simple not a complex textual representation at its core.
>
> -------------------------------
> Trevor Muñoz and Raffaele Viglianti (2015) Texts and Documents: New
> Challenges for TEI Interchange and Lessons from the Shelley-Godwin
> Archive JTEI 8 http://jtei.revues.org/1270
>
> Patrick Durusau (2006) Why and How to Document Your Markup Choices in
> L. Burnard, K. O'Brien O'Keeffe and J. Unsworth (eds) Electronic Textual
> Editing, pp. 299-309.
> -----------------------
>
> Desmond Schmidt
> eResearch
> Queensland University of Technology



Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
This site is maintained under a service level agreement by King's Digital Lab.