Humanist Discussion Group, Vol. 34, No. 110.
Department of Digital Humanities, King's College London
Hosted by King's Digital Lab
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

[1] From: philomousos@gmail.com
    Subject: Re: [Humanist] 34.108: notation, software and mathematics (30)

[2] From: William Pascoe
    Subject: Re: [Humanist] 34.106: how to represent act, scene, line, speech, and page structure? (18)

[3] From: Desmond Schmidt
    Subject: Re: [Humanist] 34.106: how to represent act, scene, line, speech, and page structure? (42)

--[1]------------------------------------------------------------------------
Date: 2020-06-14 12:37:37+00:00
From: philomousos@gmail.com
Subject: Re: [Humanist] 34.108: notation, software and mathematics

I fear it may be necessary to posit a corollary to Godwin’s Law for Humanist: as the length of a discussion thread on Humanist increases, the probability of arguments about OHCO/the evils of XML/the splendor of older or newer alternatives (unusable alas, at present, but one day...) approaches 1.

The difficulties involved in disentangling the purity of theory from the messiness of implementation are real, and perhaps the latter is best avoided on a list such as this. Unfortunately for me, that is precisely where I prefer to spend my time, as it is where things actually get done.

Indeed, if I need fast search of a (let’s say for the sake of argument) TEI XML corpus, I will build an index for it and use that. Does that make the structure of the source data somehow bad? I think there are some on the list who would answer yes, but I don’t believe in a single, perfect data structure, so I regard that as an implementation detail.

Relational databases did indeed come to dominate the market, because they met a lot of needs quite well (and still do). More recently, perhaps readers noticed the whole “NoSQL” movement away from them. They seem to have survived, however, probably because they never stopped being useful.
It remains to be seen whether XML and related technologies will make it through the JSON era. I haven’t seen a suitable replacement yet.

All the best,
Hugh

P.S. XQuery may be regarded as a superset of XPath. It is a full programming language in its own right, and incidentally, does a beautiful job of processing JSON!

--[2]------------------------------------------------------------------------
Date: 2020-06-14 09:50:38+00:00
From: William Pascoe
Subject: Re: [Humanist] 34.106: how to represent act, scene, line, speech, and page structure?

Before a JSON vs XML war erupts that may eclipse the hierarchical vs non-hierarchical debate, please bear in mind some things may be better for some things than others, and in different circumstances.

For example, I think the suggestion that JSON wins against XML was probably in relation to marking up metadata, or transport of data. The task of describing and annotating plays is more about markup of text. Metadata and markup are two different problems.

Personally, I'd favor JSON for handling metadata on the web, and wouldn't even attempt using JSON to mark up a play.

I propose a better question than, "Does JSON beat XML?" might be, "What sort of things would JSON or XML be better for?"

Kind regards,
Bill Pascoe

--[3]------------------------------------------------------------------------
Date: 2020-06-14 07:50:52+00:00
From: Desmond Schmidt
Subject: Re: [Humanist] 34.106: how to represent act, scene, line, speech, and page structure?

Hi Michael,

I don't think I said that. I'm sorry if I did imply it. No. JSON is an awful format for transcriptions of plays or any other literary documents. What I meant to say was that JSON is a simple format that is growing in popularity for web applications. I suspect Peter probably meant something similar.
What intrigued me about what he said was that COCOA, the format used originally by OCP, which only uses milestone tags for everything, might actually be a better format than XML for transcriptions. An example would be something like [speaker Hamlet]. When reading a document linearly from start to finish the value of "speaker" would be valid until it was overridden by a tag of the same type, such as [speaker Ophelia]. Since the structure of the digital document would be completely flat, the overlap problem would not exist, and queries like "what are all the speeches on page X" would become possible.

I make this suggestion for two reasons. First, the eventual obsolescence of textual markup technologies that were, like SGML/XML, brought into digital humanities from the outside, might be overcome by devising our own simple format (like COCOA). If we could provide a translator into the current formatting technology, say COCOA to HTML, then our transcriptions of original documents like plays would be independent of changes in technology. Second, I don't see how any features of transcribed documents could not be represented in this way. Maybe you can. I'd be happy to explain how I would represent them in COCOA, and you could explain how you would represent them in XML, and then perhaps we could see which textual model delivers more bangs per buck for a given level of complexity.

Desmond

--
Dr Desmond Schmidt
Mobile: 0480147690

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php
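The milestone scheme described in message [3] can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the square-bracket syntax [name value] is taken from the message's own example rather than from the actual COCOA specification, and the sample text, the "page" tag, and the function names segments and speeches_on_page are hypothetical. Each tag sets a value that stays in force until a tag of the same name overrides it, so a speech that runs across a page break needs no special handling.

```python
import re

# Assumed milestone syntax, modelled on the message's example: [name value].
# This is NOT a parser for real COCOA files.
TAG = re.compile(r"\[(\w+)\s+([^\]]+)\]")


def segments(text):
    """Yield (state, chunk) pairs while reading the text linearly.

    state maps each tag name to the most recent value seen before chunk.
    """
    state = {}
    pos = 0
    for match in TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            yield dict(state), chunk
        # A tag overrides any earlier tag of the same name.
        state[match.group(1)] = match.group(2).strip()
        pos = match.end()
    chunk = text[pos:].strip()
    if chunk:
        yield dict(state), chunk


def speeches_on_page(text, page):
    """Return (speaker, chunk) pairs for all text falling on the given page."""
    return [
        (state.get("speaker"), chunk)
        for state, chunk in segments(text)
        if state.get("page") == page
    ]


# Hypothetical sample: Hamlet's speech runs across the page break.
SAMPLE = (
    "[page 1][speaker Hamlet]To be, or not to be, "
    "[page 2]that is the question. "
    "[speaker Ophelia]Good my lord, how does your honour?"
)

for speaker, chunk in speeches_on_page(SAMPLE, "2"):
    print(speaker, "|", chunk)
```

Because the document is flat, the page break inside Hamlet's speech simply changes the "page" value mid-stream; the query for page 2 returns the tail of his speech followed by Ophelia's line.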
Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
This site is maintained under a service level agreement by King's Digital Lab.