Humanist Discussion Group

Humanist Archives: March 8, 2019, 6:10 a.m. Humanist 32.523 - standoff markup & the illusion of 'plain text'

                  Humanist Discussion Group, Vol. 32, No. 523.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org


    [1]    From: Desmond Schmidt
           Subject: Re: [Humanist] 32.520: standoff markup & the illusion of 'plain text' (45)

    [2]    From: Desmond Schmidt
           Subject: Re: [Humanist] 32.516: standoff markup & the illusion of 'plain text' (125)


--[1]------------------------------------------------------------------------
        Date: 2019-03-07 20:33:43+00:00
        From: Desmond Schmidt
        Subject: Re: [Humanist] 32.520: standoff markup & the illusion of 'plain text'

Willard, if I may be allowed a second response...

On 3/7/19, Herbert Wender wrote:
> Perhaps I don't understand what's going on there, you are the expert. But I
> think it could be of interest what a non-expert takes out of the comparison
> between text and encoding.

This reminds me of something Hugh Cayless said earlier, that
transcriptions are inherently complex because texts are complex, and
that we need to become technically expert in order to edit at all. But
a digital edition requires the work of many people, not just the
technical designer. I have felt painfully at the centre of the two
main editions I have worked on, and with the latest one I have tried
my best to move to the periphery and let the real editors get on with
the job. If technical experts are so clever, why can't they design
easy-to-use interfaces that allow ordinary people to participate?

On 3/7/19, Herbert Wender wrote:
> How do you get from this physically oriented encoding (which splits a
> coherent deletion because there are two pen strokes) to the logical encoding
> to give, e.g., a Zeller-like matrix representation of textual development?

And on 3/7/19, Jan Christoph Meister wrote:
> There's a fundamental conceptual barrier between an
> analogue/spatial model of text as a two dimensional physical continuum
> that extends across pages, but is at the same time an n-dimensional
> historic and semantic phenomenon,

I wholeheartedly concur. Prioritising the spatial dimension precludes
any adequate representation of the temporal dimension of a text, which
in my view is far more important. The evolution of the text in the
author's mind is what we expect to get out of reading any manuscript
and not simply where there was free space on the page to write
corrections.

Desmond Schmidt
eResearch
Queensland University of Technology


--
Dr Desmond Schmidt
Mobile: 0481915868 Work: +61-7-31384036



--[2]------------------------------------------------------------------------
        Date: 2019-03-07 19:46:15+00:00
        From: Desmond Schmidt
        Subject: Re: [Humanist] 32.516: standoff markup & the illusion of 'plain text'

Dear Raff,

thanks for replying and apologies for my late response. I have always
regarded the SGA as one of the leading DSEs exploring what can be done
in the digital medium. And while I agree that we need to place the
facsimile next to the transcription, I was trying to say that I didn't
think that recording the spatial layout of text on a manuscript page
should be the primary goal of any transcription. I think that those
using XML for this purpose have over the years gradually made the
diplomatic view more and more prominent because XML encourages this
style of encoding, not because it answers any real user needs or even
reflects what we did in print.

You concede that the reading text you derive from the SGA primary
encoding is imperfect. I think that is natural because authors do not
reliably cross out text they replace. They miss punctuation, they
repeat context to indicate where an insertion should go, etc. Also,
prioritising the spatial layout puts bits of text out of their
temporal order in the cases of inserted lines or words added to the
start of lines, etc. So it IS kind of impossible, as I said.

It is good that you are looking at ways of implementing standoff
techniques. But I think XML makes this very difficult. We have got rid
of the XML and simply edit the text as simplified HTML directly in the
browser. Then when it is saved the markup is converted into standoff
properties, with no hierarchies at all, giving us a clean text for
comparison and searching. I'll read your paper and won't comment on it
now.
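
The save step described above can be illustrated with a minimal sketch,
assuming the editor produces simple, well-nested HTML. It is not the code
used in the edition: the names Property, StandoffExtractor and to_standoff
are hypothetical, and the (name, offset, length) shape of a property is an
assumption made for illustration only.

```python
from html.parser import HTMLParser
from typing import List, NamedTuple, Tuple


class Property(NamedTuple):
    """One standoff property: a named range over the plain text (hypothetical shape)."""
    name: str    # tag name taken from the HTML, e.g. "del", "ins", "p"
    offset: int  # start position in the plain text
    length: int  # number of characters covered


class StandoffExtractor(HTMLParser):
    """Strips tags from simple HTML and records each element as a flat range."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.text_parts: List[str] = []
        self.position = 0                          # current offset in the plain text
        self.open_tags: List[Tuple[str, int]] = [] # (tag name, offset where it opened)
        self.properties: List[Property] = []

    def handle_starttag(self, tag, attrs):
        # Remember where the element starts; attributes are ignored in this sketch.
        self.open_tags.append((tag, self.position))

    def handle_endtag(self, tag):
        # Close the most recently opened element with this name.
        for i in range(len(self.open_tags) - 1, -1, -1):
            name, start = self.open_tags[i]
            if name == tag:
                del self.open_tags[i]
                self.properties.append(Property(name, start, self.position - start))
                break

    def handle_data(self, data):
        self.text_parts.append(data)
        self.position += len(data)


def to_standoff(markup: str) -> Tuple[str, List[Property]]:
    """Return the tag-free text and a flat, non-hierarchical list of properties."""
    parser = StandoffExtractor()
    parser.feed(markup)
    parser.close()
    # Sort by position only; no parent/child relationship is stored.
    props = sorted(parser.properties, key=lambda p: (p.offset, -p.length))
    return "".join(parser.text_parts), props


if __name__ == "__main__":
    text, props = to_standoff("<p>The <del>quick</del> <ins>slow</ins> fox</p>")
    print(text)
    for prop in props:
        print(prop, repr(text[prop.offset:prop.offset + prop.length]))
```

Because every property is just a range over the same string, ranges may
overlap freely, and the string itself can be passed to comparison or search
tools without any markup in the way. Elements that are never closed (such as
a bare br) are simply dropped by this sketch.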

Desmond Schmidt
eResearch
Queensland University of Technology

On 3/6/19, Humanist wrote:
>         Date: 2019-03-05 17:21:33+00:00
>         From: Raffaele Viglianti 
>         Subject: Re: [Humanist] 32.505: standoff markup & the illusion of
> 'plain text'
>
> Dear Desmond,
>
> I'm joining this conversation late and perhaps a bit unprepared not having
> followed it through completely; however, I would like to address some of
> the points in your latest message from the perspective of the
> Shelley-Godwin Archive (S-GA - the project discussed in the article by
> Muñoz and me that you cited) and some of my latest thinking on the subject.
>
> You ask why should we seek to make a rough approximation of a manuscript
> page through encoding. I think the answer is the same reason why printed
> facsimile editions with transcriptions and images side-by-side are created
> (such as the Garland Shelley facsimile editions). At the very least, they
> provide a map to the facsimile page and a readable transcription by
> scholars who know these documents deeply. Encoding and the digital medium
> can be used to produce digital versions of these publications with new
> affordances, so that's why this is done. The goal is not to create a rough
> approximation of the manuscript page, but to formalize scholarship around
> the page and the manuscript and provide a useful tool to others. I think
> S-GA, in its current form, mostly matches this ideal.
>
> In our paper, Muñoz and I discuss how it is not easy to use the
> approach to encoding that I just described to display a reading text, but
> this is quite different from saying that it cannot be used as such at all.
> In fact we provide an automatically generated reading text on the site
> that, while imperfect, is still useful. To return to my comparison with
> printed facsimile editions: you wouldn't use those as a reading text
> either: the goals are different. Text encoding reflects your editorial
> goals.
>
> This doesn't prevent anyone from wanting more than one goal for a given
> textual resource and I agree that standoff is a viable solution for this.
> The TEI Guidelines document a number of standoff techniques, yet they are
> often used at a small scale because they can be challenging to apply and
> validate. But standoff markup on plain text must encounter the same issues,
> with the further impediment of not being able to rely on markup for simple
> references to identifiers. In short, further hierarchies can be layered
> on top of (or adjacently to) an existing encoding through the use of
> pointers (I discuss some of this in a forthcoming JTEI article).
>
> To give an example, Elisa Beshero-Bondar and I (et al) have been working
> around this concept in the creation of a variorum edition for Frankenstein
> that incorporates S-GA TEI data without modification through the use of
> pointers. We have a central 'spine' (or stand-off collation) that uses
> pointers to other TEI-encoded documents to identify variants. See
> Beshero-Bondar and Viglianti 2018 for an overview.
>
> The point I want to make is that I don't think that standoff requires a
> 'plain text' to be targeted by a number of annotations and hierarchies. In
> fact, there is no such thing as a plain text: even non-XML encoded text
> contains markup, often imposed by convention. Computer representation of
> textual information was developed around the idea of text-as-string and we
> must not mistake this representation for what text really is.
>
> Best,
> Raff Viglianti
>
> Beshero-Bondar, Elisa E., and Raffaele Viglianti. "Stand-off Bridges in the
> Frankenstein Variorum Project: Interchange and Interoperability within TEI
> Markup Ecosystems." Presented at Balisage: The Markup Conference 2018,
> Washington, DC, July 31 - August 3, 2018. In Proceedings of Balisage: The
> Markup Conference 2018. Balisage Series on Markup Technologies, vol. 21
> (2018). https://doi.org/10.4242/BalisageVol21.Beshero-Bondar01.
>
> --
> Raffaele Viglianti, PhD
> Research Programmer
> Maryland Institute for Technology in the Humanities
> University of Maryland
>







Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
This site is maintained under a service level agreement by King's Digital Lab.