Humanist Discussion Group, Vol. 32, No. 463.
Department of Digital Humanities, King's College London
Hosted by King's Digital Lab
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

[1] From: C. M. Sperberg-McQueen
    Subject: XML and web services (36)

[2] From: philomousos@gmail.com
    Subject: Re: [Humanist] 32.452: the McGann-Renear debate (120)

--[1]------------------------------------------------------------------------
Date: 2019-02-16 06:08:49+00:00
From: C. M. Sperberg-McQueen
Subject: XML and web services

Desmond Schmidt says that when he said

    XML was invented by IBM and Microsoft, through the organ of the
    W3C, to serve the needs of web services. Document processing was
    very much a sideline.

what he meant was that "the overwhelming use of XML in its heyday was
for Web Services, not as a document format".

I wish him better luck, in future, in the difficult task of
constructing sentences that say what he means instead of saying very
different things.

In the meantime, the claims he made about the creation of XML (as
opposed to those he may or may not have meant to make) remain false.

- XML was not invented by IBM and Microsoft.

- XML was not invented to serve the needs of web services (even if
  some of those who created it thought it would be useful for web
  services and said so in public, in what proved a fairly successful
  attempt to interest other people in supporting XML).

- Document processing was not a sideline in the creation of the XML
  specification but the main focus of those who made the spec.

DS has every right to believe, and to argue in public, that the
suitability of a technology for document representation for digital
humanities work depends on its popularity with web services
implementors; I don't see the connection myself, but then I've never
been much interested in fashion.

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com

--[2]------------------------------------------------------------------------
Date: 2019-02-15 14:36:46+00:00
From: philomousos@gmail.com
Subject: Re: [Humanist] 32.452: the McGann-Renear debate

Re Desmond's post in 32.452:

> It is quite possible for a TEI encoding of holograph manuscripts to be
> so complex that it is practically, although not literally, impossible
> to edit. That is, it is just as likely to be damaged as improved by
> any attempt to edit it. If it is shared by a group of editors this
> level of complexity is reached much sooner. The problem then becomes:
> how do you communicate your understanding of the "howling wind-storm"
> of tags that results to your colleagues so they may share your
> interpretation of the textual phenomena being described?

I'd say two things about this. The first is that this class of
manuscripts became easier to encode with TEI after the addition of the
and associated elements to the Guidelines. I don't know the relative
chronologies, and it's a bit hard to investigate right now, with the
TEI Vault being unavailable due to a server outage at ADHO. Git tells
me that the genetic encoding elements were added in late 2011, but I'm
not sure precisely when they were first released.

The second is that the workflow you describe sounds complex indeed, and
it might not be practical to do it with TEI. That doesn't invalidate
TEI, it just means it might not work well for your circumstances. That
also doesn't rule out that it might be possible to adjust it to work
for your circumstances.

> Here is a moderately difficult example. A succession of hired
> transcribers simply refused to encode this for us. I wonder how
> hierarchies help us here?
>
> http://charles-harpur.org/corpix/english/harpur/A87-2/00000131a.jpg

Undoubtedly that's a tough one.
I feel quite certain it can be done in TEI, but of course that's
dependent on time, funding, and access to local expertise. I can't and
won't fault you for making different decisions than I might have.

> Breaking it down into separate layers as we have done is close to the
> method Michael describes, and renders the editorial task perfectly
> manageable.
>
> http://charles-harpur.org/View/Twinview/?docid=english/harpur/poems/h509&version1=/h509b/layer-final

I have a couple of concerns about this. Firstly, I'd strongly recommend
using a visual indicator of change that goes beyond color, as readers
with impaired color perception will have trouble with it. Secondly, I
worry that this method might give a false impression of what's going
on. If we look at line 2, comparing "layer 1" and "layer 2", we see
something like:

layer 1: Keeps munching still the corn of the tall crop
layer 2: Will cease not to devour the tall-eared corn,

which is what running a `diff` operation on the two lines might give
you. But this is not at all what has happened in the text, as the image
shows us. First, the whole line was canceled, and then the line of
layer 2 was written above it.

I also wonder about the "layer" concept. Do we know that each layer
represents a single editorial stage? That "Even as the mighty son of
Telamon" was canceled at the same time as the original line 2? Of
course, I'm wholly ignorant here and this might be a perfectly
representative model of the poet's editing process. It's exceedingly
hard (dare I say impossible) to generalize across temporal, genre, and
disciplinary differences.

-----

Michael,

(Apology for my accidental Petrification accepted. I thought it was
pretty funny :-)

> (4) So far I agree with Peter and simply wish to refine my argument. But
> there is one point where we disagree. I don't think it is true that
> representations are equivalent if they can be transformed into each other
> without loss of information.
> Perhaps I misunderstand Peter's point, but
> this seems to overlook information entropy. It must be obvious that some
> representations are more efficient than others, and encode the same
> information in fewer bits. Otherwise it would be "paragraph" not "p",
> and "line-group", not "lg." But there is also the more important point in
> practice, that some representations are more laborious to make than others.
> In many common cases of textual editing, XML is both more laborious than
> the alternatives, and it would not surprise me if it also required many
> more bits for the same information (though I could well be wrong there). In
> other cases, like preparing a well-structured reading text to be rendered
> on a variety of devices in different ways, it is surely the ideal
> technology. There is also the simple matter of elegance, which matters
> because it goes to the interpretability of a representation by a human.

Information entropy is the right way to think about this, but I'd have
thought that the entropy here is zero by definition. If we assume a
finite set of document representations A and transformations t1 and t2,
then if for each instance of A, t1(A[n]) -> B[n] and t2(B[n]) -> A[n],
A[n] and B[n] are equivalent (if we exclude the case where the
transformations contain most or all of the information of the result;
no cheating). That's not to say that form A may not be easier to work
with than form B, or potentially more expressive (so that I might be
able to make an A[n] that couldn't be roundtripped). But this gives us
something we can write tests for, which is good enough for me.

Efficiency is a different question, as is labor. Your mileage will vary
depending on the human and computing resources you have available, and,
as I think we've demonstrated here, arguing about formats from our own
perspectives is as futile as the interminable arguments people have
about programming languages.
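The round-trip criterion described above, t2(t1(A[n])) giving back A[n]
for every sample, lends itself to an automated test. What follows is
only a minimal sketch in Python under stated assumptions: the two
representations (form A as labelled segments, form B as plain text plus
standoff ranges), the labels, and the function names `t1`, `t2` and
`round_trips` are all hypothetical stand-ins chosen for illustration,
not anything taken from TEI or from the projects discussed in this
thread.

```python
# Hypothetical representations, for illustration only:
#   form A: a list of (label, text) segments
#   form B: a plain string plus standoff ranges (label, start, end)


def t1(a):
    """A -> B: concatenate the segment text and record each label as a range."""
    text = ""
    ranges = []
    for label, segment in a:
        start = len(text)
        text += segment
        ranges.append((label, start, len(text)))
    return text, ranges


def t2(b):
    """B -> A: cut the string back into labelled segments."""
    text, ranges = b
    return [(label, text[start:end]) for label, start, end in ranges]


def round_trips(samples):
    """Return True if t2(t1(a)) == a for every sample a."""
    return all(t2(t1(a)) == a for a in samples)


# Made-up samples; the labels carry no claim about any real manuscript.
samples = [
    [("first", "Keeps munching still"), ("second", "Will cease not to devour")],
    [("plain", "the tall-eared corn,")],
    [],
]

assert round_trips(samples)
```

Neither function carries any document content of its own, which is the
"no cheating" condition. A test like this only shows equivalence for
the samples supplied; an A[n] outside that set may still fail to
round-trip.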
I can say that Python* sucks, but that doesn't mean _you_ shouldn't use
it.

All the best,
Hugh

* (I'm actually very fond of Python, lest I start another fight.
Don't @ me.)

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php
Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
This site is maintained under a service level agreement by King's Digital Lab.