Humanist Discussion Group, Vol. 34, No. 121.
Department of Digital Humanities, King's College London
Hosted by King's Digital Lab
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

    [1] From: C. M. Sperberg-McQueen
        Subject: on JSON and documents and generalizations and concrete examples (26)

    [2] From: Desmond Schmidt
        Subject: Re: [Humanist] 34.117: annotating notation (67)

--[1]------------------------------------------------------------------------
        Date: 2020-06-17 14:34:14+00:00
        From: C. M. Sperberg-McQueen
        Subject: on JSON and documents and generalizations and concrete examples

I thank Peter Robinson for his informative description in Humanist 34.117 of
his proposal to model texts as collections of leaves shared among multiple
trees. (Although it must be said that, from a botanical viewpoint, that is one
spectacularly bewildering metaphor.)

Like PR and others, I would gladly avoid an XML vs JSON flame war. But my
request to see a JSON representation of a real document with multiple
structures was seriously meant, and I am disappointed that PR has shown us no
concrete examples of the JSON used by his Textual Communities system to read
or write such documents. Concrete examples would allow his general statements
about efficiency and transparency to be usefully understood and discussed;
without examples, they are about as informative as MongoDB's advertising copy.

I hope that Humanist has not become too dignified and high-minded to be
dirtied with practicalities and technical details.

---
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com

--[2]------------------------------------------------------------------------
        Date: 2020-06-17 09:00:00+00:00
        From: Desmond Schmidt
        Subject: Re: [Humanist] 34.117: annotating notation

Peter,

I think it might help people with this long posting if I ventured to provide
an executive summary between the two lines below. Tell me if I'm wrong in any
essential details.
----------

TC consists of three collections of things:

1. text fragments
2. document tree nodes
3. act-of-communication tree nodes

The nodes in the trees correspond roughly to the elements of an XML tree. The
things in the text collection are just fragments of Unicode text. Each tree
has internal nodes and leaf nodes which point to the text fragments. Not all
the text fragments need be in each tree. Further trees representing other
hierarchies may also be added.

With this arrangement you can easily find out where in a quire/page/column or
line a piece of text resides, and also in which part of the
act-of-communication tree it belongs: say, to a particular speech/scene/act in
a play. The key is the collection of text fragments, which in my mind form a
kind of spine to which the two trees attach themselves.

----------

The argument about XML vs. JSON here seems quite irrelevant to me: the user
will be entirely unaware that MongoDB stores the 'things' as JSON (actually
BSON) documents internally: one document per node. You could equally store the
same structure in an XML database like MarkLogic, eXist-db or BaseX with no
discernible difference in performance.

One concern with this design is that there is no distinction in the text
collection between fragments potentially belonging to different versions.
Their identity is provided solely by the two trees. You argue that you could
represent a heavily revised text in this system, but I wonder if you have
actually tried. How would you get a text revised, say, nine times into this
structure in any practical way, even if you could theoretically represent it?
There is a good short example in my 'Tough cases' No. 2
(http://charles-harpur.org/tough-cases/).

Another concern is the load this puts on the MongoDB database. You need one
BSON document per node. If you stored a few thousand documents in this system
you would end up with several million nodes.
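[Editorial aside: the "spine plus two trees" model summarized above can be
sketched in a few lines of Python. This is a hypothetical illustration of the
general idea, not TC's actual schema; all names (fragments, document_tree,
communication_tree, path_to) are invented for the example.]

```python
# A shared "spine" of text fragments, plus two trees whose leaf nodes
# point into that spine by fragment ID. Each dict below would be one
# document (one node) in a database such as MongoDB.
fragments = {
    "f1": "To be, ",
    "f2": "or not to be",
}

# Document hierarchy: physical layout (page/line).
document_tree = {
    "label": "page 1",
    "children": [
        {"label": "line 1", "fragment": "f1"},
        {"label": "line 2", "fragment": "f2"},
    ],
}

# Act-of-communication hierarchy: dramatic structure (act/speech).
communication_tree = {
    "label": "Act 3",
    "children": [
        {"label": "Scene 1, speech", "children": [
            {"label": "segment", "fragment": "f1"},
            {"label": "segment", "f" "ragment"[1:]: "f2"}.copy() if False else
            {"label": "segment", "fragment": "f2"},
        ]},
    ],
}

def path_to(node, frag_id, path=()):
    """Return the chain of labels from the root down to the leaf that
    points at frag_id, or None if the fragment is absent from this tree."""
    here = path + (node["label"],)
    if node.get("fragment") == frag_id:
        return here
    for child in node.get("children", []):
        found = path_to(child, frag_id, here)
        if found:
            return found
    return None

# The same fragment is located independently in both hierarchies:
print(path_to(document_tree, "f2"))       # ('page 1', 'line 2')
print(path_to(communication_tree, "f2"))  # ('Act 3', 'Scene 1, speech', 'segment')
```

Because a fragment carries no version information of its own, nothing in this
sketch stops a third tree from claiming the same fragments — which is exactly
the versioning concern raised below.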
Not impossible to handle, of course, but each document is connected to the
others only by its ID. If your program failed at some point or contained an
error, you could easily end up with thousands of orphaned nodes floating
around inside the database, which you would have to purge periodically,
assuming you could do so safely.

Another worry is that you are storing documents in a fragmented and unreadable
state. How do you export the data for archiving in a coherent form?

As an example of potential data loss when using a database of this kind: we
imported some images of Harpur's poems in rare newspapers and then lost or
deleted the originals. Later, when we changed our data design, we had to get
them out again and discovered that images with spaces in their file names
could not be extracted due to a bug in MongoDB. So we had to remake them all.
:-(

Desmond

--
Dr Desmond Schmidt
Mobile: 0480147690

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php
Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)