Humanist Discussion Group, Vol. 32, No. 520.
Department of Digital Humanities, King's College London
Hosted by King's Digital Lab
www.dhhumanist.org
Submit to: humanist@dhhumanist.org


    [1]    From: Dr. Herbert Wender
           Subject: Re: [Humanist] 32.516: standoff markup & the illusion of 'plain text' (71)

    [2]    From: Jan Christoph Meister
           Subject: Re: [Humanist] 32.505: standoff markup & the illusion of 'plain text' (137)


--[1]------------------------------------------------------------------------
        Date: 2019-03-06 22:01:52+00:00
        From: Dr. Herbert Wender
        Subject: Re: [Humanist] 32.516: standoff markup & the illusion of 'plain text'

[NB In the following, angle-brackets have been replaced by brace-brackets to
circumvent a current problem in Humanist's software. Apologies to all. --WM]

Raffaele,

it's one of my hobbies to look at the source code of digitally distributed
editions from the perspective of what information may be derived from them
even when such extractions are not foreseen by the distributors. In the past
I have worked with an edition of Robert Musil's mss. incorporated in
WordCruncher, with the SGML-conformant encoding of the Weimarer Ausgabe of
Goethe's works (Chadwyck-Healey, with the famous question at the end of each
session: "Wollen Sie wirklich Goethes Werke beenden?" -- "Do you really want
to quit Goethe's works?"), and with the facsimile edition of Kafka's
"Process" distributed as PDF files by Stroemfeld, facsimile and transcript
side by side, as in one of the views of comparable XML/TEI manuscript
editions. No wonder, then, that your posting motivated me to look at a page
of Notebook A in the "Frankenstein" archive.

To be precise: it was sufficient to look at folio "1r" to think that it would
often be better to say nothing about modifications and simply to show them
(as, e.g., the Kafka editors do in their transcriptions) than to tell only
half of the truth. Perhaps I don't understand what is going on there; you are
the expert. But it may be of interest what a non-expert takes away from the
comparison between text and encoding. (To recall: I feel I am only a guest
here among professionals, because I am visually impaired and therefore cannot
read handwritten texts without many mistakes; for that reason I cannot refer
to the facsimile page.)

If MOD means 'modification', the first one appears after "often" (see the
snippet below) and embraces two deletions and one addition in sequence.
Perhaps somewhere else in your representation system there are linearized
representations of the two states: the status quo ante, "Those events ... are
often caused by slight or trivial occurences.", and the final state, "Those
events ... often derive their origin from a trivial occurrence." I would
think that a somewhat 'logical' description would hypothesize the
modification as a substitution, from "are ... caused by" to "derive their
origin from", and since another hand is recorded in the ADD element, I would
suppose that the same hand carried out the deletions, wouldn't you? Why does
the deletion of "are" stand outside the MOD, if not as a consequence of the
monohierarchical model? Why is the revising hand not attributed to the MOD
element? And how do you get from this physically oriented encoding (which
splits a coherent deletion because there are two pen strokes) to a logical
encoding that could yield, e.g., a Zeller-like matrix representation of
textual development?

All the best, Herbert

[snip]
{line}Those events which materially influence our fu{/line}
{line}ture destinies {del rend="strikethrough"}are{/del} often {mod}
{del rend="strikethrough"}caused{/del}
{del rend="strikethrough"}by slight or{/del}
{add hand="#pbs" place="superlinear"}derive thier origin from a{/add}
{/mod} tri{/line}
{line}vial occurence{del rend="strikethrough"}s{/del}. ... {/line}
[/snip]
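[NB The two linearized states Herbert asks about can, in principle, be derived
mechanically from an encoding of this shape: keep the DEL contents and drop the
ADD contents for the status quo ante, and the reverse for the final state. The
Python fragment below is only an illustrative sketch, not the archive's own
tooling; it assumes the braces above simply stand in for angle brackets, and
only the element and attribute names are taken from the quoted snippet.]

import xml.etree.ElementTree as ET

# The quoted snippet, with brace-brackets restored to angle brackets and
# wrapped in a root element so that it parses as well-formed XML.
SNIPPET = """<lines>
<line>Those events which materially influence our fu</line>
<line>ture destinies <del rend="strikethrough">are</del> often <mod>
<del rend="strikethrough">caused</del>
<del rend="strikethrough">by slight or</del>
<add hand="#pbs" place="superlinear">derive thier origin from a</add>
</mod> tri</line>
<line>vial occurence<del rend="strikethrough">s</del>. ...</line>
</lines>"""

def linearize(elem, keep):
    """Flatten an element to text, keeping one revision layer ('del' or 'add')."""
    drop = "add" if keep == "del" else "del"
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != drop:
            parts.append(linearize(child, keep))
        parts.append(child.tail or "")   # text following a dropped element is kept
    return "".join(parts)

root = ET.fromstring(SNIPPET)
# The lines break mid-word ("fu|ture", "tri|vial"), so they are joined without
# spaces and the whitespace is normalized afterwards.
ante  = " ".join("".join(linearize(line, keep="del") for line in root).split())
final = " ".join("".join(linearize(line, keep="add") for line in root).split())
print("status quo ante:", ante)
print("final state:    ", final)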
--[2]------------------------------------------------------------------------
        Date: 2019-03-06 10:43:23+00:00
        From: Jan Christoph Meister
        Subject: Re: [Humanist] 32.505: standoff markup & the illusion of 'plain text'

I must say I feel just as undead as Wendell -- in CATMA we have been combining
"raw text" and external stand-off markup in a (graph) database (originally
relational, now Neo4J) since 2012. It's a web application, and the user need
not read code, or XML, or be a DB expert to annotate and analyze their texts
and corpora, either individually (1 text/corpus, 1 user) or collaboratively
(n texts, n users).

And there's more to this approach than merely being able to handle nested,
overlapping or discontinuous structures (which imho is really a problem of the
past, as is the Renear-McGann debate, which, as far as I'm concerned, Buzzetti
2002, "Digital Representation and the Text Model", New Literary History,
Vol. 33, No. 1, had already pretty much superseded). The conceptual gain of
abstracting from the text = file model in this way, I believe, is that

* markup can be n-dimensional: any 'source' string or fragment thereof can be
  attributed n properties (p) by n annotators (a) during n mark-up sessions (s)

* there is no inherent restriction on the choice of property _values_: p2
  assigned by a77 in s45 can contradict, duplicate, expand etc. p1 assigned by
  u14 in s46. In other words, you can of course enforce inter-annotator
  agreement by stipulating procedures and conventions, but our markup concept
  per se is not only oblivious to this, it is expressly based on the principle
  that every annotation is a unique instance in its own semantic and
  functional right

* source text and markup are no longer conceptualized as two distinct types of
  entities, i.e. "text" and "meta-text = annotation", but are modeled as a
  discursive continuum in which the roles of 'source' and 'annotation' are
  functionally defined and can change at any time. This continuum can then

* be queried at any level of complexity and in any combination (e.g., "Show me
  all instances of discontinuous strings across the works of Dante where more
  than 3 out of 5 annotators assigned conflicting property values within the
  same property category AND where the string was not automatically POS-tagged
  as a verb or auxiliary AND where the median z-score for the preceding 2
  nouns was greater than 0.00034 AND where at least one annotator expressed a
  positive sentiment in a free-text commentary = meta-annotation" [don't ask
  me for the use case, though...])

And that's still not all. The true beauty is versioning: in CATMA 6.0 we use
the Web Annotation Data Model's JSON-LD format in a Git/Gitlab environment.
This means that every user generates annotations in their own Git repository
(and versions thereof), and Gitlab then manages the data exchange (fetch,
merge, push) between users. Query operations are executed against an
in-memory graph representation of the data.

All the above is just my brief summary of our developer Marco's much more
detailed comment on the technical aspects.
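[NB For readers unfamiliar with the format Chris mentions: the following is a
rough, hypothetical illustration of what one stand-off annotation over a
character range of a "raw text" might look like in the Web Annotation Data
Model's JSON-LD serialization. It is not CATMA's actual output; only the W3C
vocabulary terms (Annotation, TextualBody, TextPositionSelector, start, end)
come from the standard, and all identifiers, offsets and values are invented.]

import json

# A single stand-off annotation: the "raw text" itself is untouched; the
# annotation only points into it by character offsets.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "urn:example:session45/annotation-0001",         # hypothetical identifier
    "type": "Annotation",
    "creator": "urn:example:annotator-a77",                # hypothetical annotator id
    "body": {
        "type": "TextualBody",
        "purpose": "tagging",
        "value": "narrative_event"                         # hypothetical property value
    },
    "target": {
        "source": "urn:example:corpus/dante/inferno.txt",  # hypothetical source text
        "selector": {
            "type": "TextPositionSelector",                # character offsets into the raw text
            "start": 10834,                                # invented offsets
            "end": 10861
        }
    }
}

# Because each annotation is a small, self-contained JSON-LD document, it can
# be committed to an individual user's Git repository, versioned, merged and
# queried independently of the text it refers to.
print(json.dumps(annotation, indent=2))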
These apart, the crux of the matter seems to be encapsulated in Desmond's
observation:

    Why should we seek to make a rough approximation of a manuscript page
    that can be precisely photographed (not without loss of information, of
    course), but that is still vastly inferior to the page-facsimile image,
    which already captures the spatial relationships between fragments of
    text?

Agreed. There's a fundamental conceptual barrier between an analogue/spatial
model of text as a two-dimensional physical continuum that extends across
pages, but is at the same time an n-dimensional historical and semantic
phenomenon, and the discrete/digital representation of text as a computable
character string. The two offer different epistemological advantages, so it's
really a philosophical choice, not a technological one.

Chris

------------------------
Dr. Jan Christoph Meister
Professor of Digital Humanities
- Focus: German Literature and Text Analysis -
Universität Hamburg, Institut für Germanistik
Überseering 35 / Raum 08064
22297 Hamburg
+49 40 42838 2972
+49 172 40865 41
http://jcmeister.de
http://catma.de

On 04.03.2019 at 09:58, Humanist wrote:
> Humanist Discussion Group, Vol. 32, No. 505.
> Department of Digital Humanities, King's College London
> Hosted by King's Digital Lab
> www.dhhumanist.org
> Submit to: humanist@dhhumanist.org
>
>
>         Date: 2019-03-03 18:12:22+00:00
>         From: Wendell Piez
>         Subject: Re: [Humanist] 32.499: standoff markup & the illusion of 'plain text'
>
> Dear Willard,
>
> Goodness, now we are recommending text bases instantiated as a graph
> model: both of these sound a lot like Luminescent's internal model, or
> for that matter the experimental system CMSMcQ mentioned way back in
> the dawn of this thread:
>
> Haentjens Dekker, Ronald, and David J. Birnbaum. "It's more than just
> overlap: Text As Graph." Presented at Balisage: The Markup Conference
> 2017, Washington, DC, August 1-4, 2017. In Proceedings of Balisage:
> The Markup Conference 2017. Balisage Series on Markup Technologies,
> vol. 19 (2017). https://doi.org/10.4242/BalisageVol19.Dekker01.
>
> Or, for old timers: https://github.com/wendellpiez/Luminescent (I'm
> not dead yet, I think I'll go for a walk!)
>
> Yet I fail to see why any of this once and future promising work
> invalidates XML in any way, whether XML is viewed as some sort of
> arguable abstraction, or a practical technology that none of us (even
> experts) can see in its entirety?
>
> Regards, Wendell
>
> --
> Wendell Piez | wendellpiez.com | wendell -at- nist -dot- gov
> pellucidliterature.org | github.com/wendellpiez |
> gitlab.coko.foundation/wendell - pausepress.org

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php