Humanist Discussion Group, Vol. 34, No. 113.
Department of Digital Humanities, King's College London
Hosted by King's Digital Lab
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

    [1] From: philomousos@gmail.com
        Subject: Re: [Humanist] 34.110: annotating notation (46)

    [2] From: John Keating
        Subject: Re: [EXTERNAL] [Humanist] 34.110: annotating notation (73)

--[1]------------------------------------------------------------------------
        Date: 2020-06-15 15:51:09+00:00
        From: philomousos@gmail.com
        Subject: Re: [Humanist] 34.110: annotating notation

I can think of few things more tiresome than reprising the more-or-less annual arguments over the goodness or evil of particular file formats. But maybe there's productive discussion to be had around the ideas of simplicity and correctness.

Desmond Schmidt and (iirc) Peter Robinson both expressed a desire for simpler formats, by which I think they mean in part "schema-less" formats. A schema gives you things like (broad) consistency checking and some datatype checking, and it can be used as an aid to the user encoding a document by giving them suggestions. I would argue that schemas allow you to sustain a higher level of complexity than you would be able to otherwise. But are these benefits worth the additional upfront cost?

We might think of some examples, drawn from different domains. HTML and Markdown are both relatively simple, and to the extent that they have consistency checking, it's at the processing level. Can a browser render your HTML in a readable way? Can your Markdown be turned into HTML? If so, then all is well. Is that enough, though?

RDF can represent basically any sort of information (though I'd note it's atrocious for documents). Leaving aside for the moment recent attempts to develop constraint languages for RDF, the main check for consistency comes again when you try to process it.
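The kind of checking a schema buys you, datatype and consistency checks that hold independently of whether any processor can render the document, can be sketched in a few lines of Python. The miniature "schema" format below is invented purely for illustration; it is not any real schema language:

```python
# A toy schema: field name -> expected Python type.
# (Invented for illustration; real schema languages such as RELAX NG
# or JSON Schema are far richer than this.)
schema = {"title": str, "year": int}

def validate(doc, schema):
    """Return a list of type errors; an empty list means the doc conforms."""
    errors = []
    for field, expected in schema.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors

# A renderer would happily display this record, but the schema catches
# that "year" is a string rather than an integer.
print(validate({"title": "Humanist", "year": "1987"}, schema))
```

The point of the sketch is that the second check fails even though nothing stops a processor from displaying the record; that is the extra guarantee a schema offers over processing-level checking.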
For a language like TEI, on the other hand, you're supposed to use a schema, and your documents are subject to a sort of strict type checking. If they fail validation, they're not proper TEI. And that checking is independent of the question of whether you can process them or not. Arguably, validation makes it more likely a document will be processable, but it's not a guarantee.

I have often wondered whether the TEI community takes this too far. Michael Sperberg-McQueen will remember arguments around the propriety (or "cleanliness") of modifications to the TEI. I now think we don't have quite the correct take on that question, but I'm still thinking about it. Certainly, members of the TEI community have often shown great reluctance to customize (that is, to modify the schema themselves) in ways that are more expansive, rather than more restrictive, than the "baseline" schema.

Relational databases also have schemas, and are *very* strict about adherence to them. The "NoSQL" movement was, in some respects, a rebellion against this strictness.

Is the desire for simplicity an invitation to a short trip up a cul-de-sac? Is the desire for rules and constraints too much to ask, especially of a newcomer? Is there some happy medium? Maybe we could have a more productive discussion along those lines... My own, slightly cynical, take after many years of experience is that everything is a tradeoff. You pay now, or you pay later.

All the best,
Hugh

--[2]------------------------------------------------------------------------
        Date: 2020-06-15 09:20:24+00:00
        From: John Keating
        Subject: Re: [EXTERNAL] [Humanist] 34.110: annotating notation

I’m delighted to see budding discussions here on textual representation using JSON. Representation of anything using markup is always open to debate, and it is no different when using JSON. But it is worth having those debates.
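As a small, concrete illustration of the representation choices at stake in such debates, here is a Python sketch of moving a hypothetical record from XML into JSON. Since JSON has no direct equivalent of XML attributes, the "@"-prefix convention used here is one arbitrary choice among several; the record itself is invented:

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical record; the element attributes (id, type) have no
# direct JSON counterpart, so we map them to "@"-prefixed keys and
# put element text under a "#text" key.
xml_record = '<person id="p1"><name type="given">Brian</name></person>'

def element_to_dict(elem):
    """Convert an XML Element to a dict, one convention among many."""
    d = {"@" + k: v for k, v in elem.attrib.items()}
    children = list(elem)
    if children:
        for child in children:
            d[child.tag] = element_to_dict(child)
    elif elem.text:
        d["#text"] = elem.text
    return d

record = element_to_dict(ET.fromstring(xml_record))
print(json.dumps(record))
```

Note that even this tiny example forces decisions (how to mark attributes, what to do with text content, how to handle repeated child elements, which this sketch ignores), and a different converter making different decisions will produce different JSON from the same XML.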
I have been using JSON for DH “database/archival” projects (using NoSQL database management systems) for some time now and have limited experience that I will share here. For example, we recently modeled Early Modern prosopographical data using both XML and JSON; the latter for use as documents in MongoDB and as graphs for Neo4j. We have been examining speed and storage/retrieval capabilities for JSON/XML encodings in scenarios where the record numbers are in the hundreds of thousands. I have not yet fully decided if I prefer JSON over XML, but I think a few years of practical work with JSON for DH projects will give some insight.

I’m listing some observations here that may be of interest to this community. I’ll focus on JSON mostly; there are so many XML experts in this community that I know I do not need to make explicit comparisons.

- JSON works well with REST APIs, and this is crucial for providing CRUD access to your encoding storage. Converting between JSON objects and programming objects/data structures (eg. in JavaScript, Java, Python, etc.) is straightforward and fast. Deserializing/serializing in JavaScript is fully automated.

- JSON models, when implemented as BSON for MongoDB, say, support both normalised and de-normalised forms, and are very flexible. Graph representation (modeling) using JSON is straightforward when using Neo4j and similar tools; here all entities are normalised.

- JSON encoding can be strictly managed using JSON Schema, a vocabulary that allows you to annotate and validate JSON documents (https://json-schema.org). There was an updated specification in 2019-09. Of course, you can use schema-free JSON if you wish.

- JSON Schema validation rules are easily implemented when using JSON for archival purposes, for example, using the mongoose package with NodeJS and MongoDB.

- JSON tools for searching, and more importantly updating data stores, are extremely flexible and powerful, eg. Mongoose and MongoDB.
Cypher searching is a powerful model utilised by Neo4j.

- Converting back and forth between JSON and XML is straightforward, but you need to be aware that JSON structures don’t have a feature similar to XML attributes. You have to make a decision on how you are going to handle the translation if that is what you wish to do (migration), and the process is not easily reversible without knowing the rules. You can play with conversions of your documents here (http://www.utilities-online.info/xmltojson/#.Xuc3-i2ZN24) and here (https://codebeautify.org/xmltojson).

- There are issues with namespaces in JSON; you need to manage this in some way. Converting XML (with namespaces) to JSON has been an issue for years. There are some great articles on this online here (http://niem.github.io/reference/concepts/namespace/json/) and here (https://stackoverflow.com/questions/34071614/xml-to-json-with-namespaces).

- JSON supports only UTF-8 encoding, has no display capabilities, and has an object type. These three factors are usually key for XML encoders, since XML does have display capabilities, is typeless, and supports various encoding formats. Okay, I made an explicit comparison here!

I do like working with JSON, and I suspect that this is because I work with data and the web-enabled apps that process those data. I hope to have more insight after a few more projects.

--
Dr. John G. Keating, Department of Computer Science, Maynooth University,
Maynooth, Co. Kildare, Ireland.
(E) john.keating@mu.ie (T) +353 1 708 3854

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php
Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
This site is maintained under a service level agreement by King's Digital Lab.