
Humanist Discussion Group



Humanist Archives: June 16, 2020, 9:07 a.m. Humanist 34.113 - annotating notation

                  Humanist Discussion Group, Vol. 34, No. 113.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org


    [1]    From: philomousos@gmail.com
           Subject: Re: [Humanist] 34.110: annotating notation (46)

    [2]    From: John Keating 
           Subject: Re: [EXTERNAL] [Humanist] 34.110: annotating notation (73)


--[1]------------------------------------------------------------------------
        Date: 2020-06-15 15:51:09+00:00
        From: philomousos@gmail.com
        Subject: Re: [Humanist] 34.110: annotating notation

I can think of few things more tiresome than reprising the more-or-less
annual arguments over the goodness or evil of particular file formats. But
maybe there's productive discussion to be had around the ideas of
simplicity and correctness. Desmond Schmidt and (iirc) Peter Robinson both
expressed a desire for simpler formats, by which I think they mean in part
"schema-less" formats. A schema gives you things like (broad) consistency
checking, some datatype checking, and it can be used as an aid to the user
encoding a document by giving them suggestions. I would argue that schemas
allow you to sustain a higher level of complexity than you would be able to
otherwise. But are these benefits worth the additional upfront cost?

We might think of some examples, drawn from different domains. HTML and
Markdown are both relatively simple, and to the extent that they have
consistency checking, it's at the processing level. Can a browser render
your HTML in a readable way? Can your Markdown be turned into HTML? If so,
then all is well. Is that enough though? RDF can represent basically any
sort of information (though I'd note it's atrocious for documents). Leaving
aside for the moment recent attempts to develop constraint languages for
RDF, the main check for consistency comes again when you try to process it.

For a language like TEI, on the other hand, you're supposed to use a
schema, and your documents are subject to a sort of strict type checking.
If they fail validation, they're not proper TEI. And that checking is
independent of the question of whether you can process them or not.
Arguably, validation makes it more likely a document will be processable,
but it's not a guarantee. I have often wondered whether the TEI community
takes this too far. Michael Sperberg-McQueen will remember arguments around
the propriety (or "cleanliness") of modifications to the TEI. I now think
we don't have quite the correct take on that question, but I'm still
thinking about it. Certainly, members of the TEI community have often shown
great reluctance to customize (that is, to modify the schema themselves) in
ways that are more expansive, rather than more restrictive, than the
"baseline" schema. Relational databases also have schemas, and are *very*
strict about adherence to them. The "NoSQL" movement was, in some respects,
a rebellion against this strictness.

Is the desire for simplicity an invitation to a short trip up a cul-de-sac?
Is the desire for rules and constraints too much to ask, especially of a
newcomer? Is there some happy medium? Maybe we could have a more productive
discussion along those lines... My own, slightly cynical, take after many
years of experience is that everything is a tradeoff. You pay now, or you
pay later.

All the best,
Hugh


--[2]------------------------------------------------------------------------
        Date: 2020-06-15 09:20:24+00:00
        From: John Keating 
        Subject: Re: [EXTERNAL] [Humanist] 34.110: annotating notation

I’m delighted to see budding discussions here on textual representation using
JSON. Representation of anything using markup is always open to debate, and it is
no different when using JSON. But those debates are worth having. I have been
using JSON for DH “database/archival” projects (using NoSQL database management
systems) for some time now and have some limited experience that I will share here.
For example, recently we modeled Early Modern prosopographical data using both
XML and JSON; the latter for use as documents in MongoDB and as graphs for
Neo4j. We have been examining speed and storage/retrieval capabilities for
JSON/XML encodings in scenarios where the record numbers are in the hundreds of
thousands.

I have not yet fully decided whether I prefer JSON over XML, but I think a few
years of practical work with JSON on DH projects will give some insight. I’m
listing some observations here that may be of interest to this community. I’ll
focus mostly on JSON; there are so many XML experts in this community that I
know I do not need to make explicit comparisons.

- JSON works well with REST APIs, and this is crucial for providing CRUD access
to your encoding storage. Converting between JSON objects and programming
objects/data structures (e.g. in JavaScript, Java, Python, etc.) is
straightforward and fast. Deserializing/serializing in JavaScript is fully
automated (a small sketch follows this list).

- JSON models, when implemented as BSON in MongoDB, say, support both
normalised and de-normalised forms and are very flexible (illustrated after
this list). Graph representation (modeling) using JSON is straightforward when
using Neo4j and similar tools; there, all entities are normalised.

- JSON encoding can be strictly managed using JSON Schema, a vocabulary that
allows you to annotate and validate JSON documents (https://json-schema.org).
There was an updated specification, draft 2019-09. A small validation sketch
follows this list. Of course you can use schema-free JSON if you wish.

- JSON schema validation rules are easily implemented when using JSON for
archival purposes, for example with the mongoose package under NodeJS and
MongoDB (sketched after this list).

- JSON tools for searching, and more importantly for updating data stores, are
extremely flexible and powerful, e.g. Mongoose and MongoDB. Cypher is a
powerful query model utilised by Neo4j (a query sketch follows this list).

- JSON conversion back and forth to XML is straightforward, but you need to be
aware that JSON structures have no feature equivalent to XML attributes. You
have to decide how you are going to handle that in the translation, if
migration is what you wish to do (one common convention is sketched after this
list). And the process is not easily reversible without knowing the rules. You
can play with conversions of your documents here
(http://www.utilities-online.info/xmltojson/#.Xuc3-i2ZN24) and here
(https://codebeautify.org/xmltojson).

- There are issues with namespaces in JSON; you need to manage these in some
way. Converting XML with namespaces to JSON has been an issue for years. There
are some great articles on this online, here
(http://niem.github.io/reference/concepts/namespace/json/) and here
(https://stackoverflow.com/questions/34071614/xml-to-json-with-namespaces).

- JSON supports only UTF-8 encoding, has no display capabilities, and has an
object type; these three factors are usually key for XML encoders, since XML
does have display capabilities, is typeless, and supports various encoding
formats. Okay, I made an explicit comparison here!
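
A small sketch of the first point, on CRUD access over REST. Express is assumed
here purely for illustration (the observation above does not name a framework),
and the routes and field names are invented:

    // Hypothetical Express endpoints serving JSON records (names invented)
    const express = require('express');
    const app = express();
    app.use(express.json());            // incoming JSON bodies become JS objects automatically

    const records = {};                 // stand-in for a real data store

    // Create: the deserialised body is stored as-is
    app.post('/records/:id', (req, res) => {
      records[req.params.id] = req.body;
      res.status(201).json(req.body);   // serialising back out is equally automatic
    });

    // Read: the stored object is serialised back to JSON
    app.get('/records/:id', (req, res) => {
      const record = records[req.params.id];
      record ? res.json(record) : res.sendStatus(404);
    });

    app.listen(3000);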
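
On the second point, the same (invented) prosopographical record in
de-normalised and normalised form, as plain JavaScript objects of the kind
MongoDB stores as BSON:

    // De-normalised (embedded): one self-contained document
    const embedded = {
      name: 'A. Correspondent',
      letters: [
        { date: '1625-03-01', recipient: 'B. Recipient' },
        { date: '1626-07-12', recipient: 'C. Recipient' }
      ]
    };

    // Normalised (referencing): letters live in their own collection
    const person = { _id: 'person-001', name: 'A. Correspondent' };
    const letter = {
      _id: 'letter-042',
      author: 'person-001',             // reference by identifier, as a graph would
      date: '1625-03-01',
      recipient: 'B. Recipient'
    };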
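
For the JSON Schema point, a minimal validation sketch. The Ajv package is one
assumption among several possible validators, and the schema itself is invented:

    const Ajv = require('ajv');
    const ajv = new Ajv();

    const schema = {
      type: 'object',
      properties: {
        name: { type: 'string' },
        born: { type: 'integer', minimum: 1500, maximum: 1800 }
      },
      required: ['name']
    };

    const validate = ajv.compile(schema);
    const ok = validate({ name: 'A. Person', born: 1601 });   // true
    if (!ok) console.log(validate.errors);                    // details of any failures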
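
For the mongoose point, a sketch of validation rules attached to a schema; the
model and field names are hypothetical:

    const mongoose = require('mongoose');

    const personSchema = new mongoose.Schema({
      name: { type: String, required: true },                // reject documents without a name
      born: { type: Number, min: 1500, max: 1800 },          // simple range constraint
      role: { type: String, enum: ['author', 'recipient'] }  // controlled vocabulary
    });

    const Person = mongoose.model('Person', personSchema);
    // Person.create({ born: 1900 }) would now fail with a ValidationError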
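
For the Cypher point, a hedged sketch of a query issued from Node.js with the
official neo4j-driver package; the labels, relationship type, and credentials
are all invented:

    const neo4j = require('neo4j-driver');
    const driver = neo4j.driver('bolt://localhost:7687',
                                neo4j.auth.basic('neo4j', 'password'));

    // Find everyone a given person wrote to
    async function correspondentsOf(name) {
      const session = driver.session();
      try {
        const result = await session.run(
          'MATCH (p:Person {name: $name})-[:WROTE_TO]->(q:Person) RETURN q.name AS name',
          { name }
        );
        return result.records.map(r => r.get('name'));
      } finally {
        await session.close();
      }
    }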
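
And for the attribute problem in XML-to-JSON conversion, one common convention
(not the only one, and not reversible unless you record that this is the rule
you used):

    // <persName type="primary">A. Person</persName>  might be carried over as:
    const persName = {
      '@type': 'primary',     // attribute, marked by a conventional prefix
      '#text': 'A. Person'    // element text content
    };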

I do like working with JSON, and I suspect that this is because I work with data
and the web-enabled apps that process those data. I hope to have more insight
after a few more projects.

--
Dr. John G. Keating,
Department of Computer Science,
Maynooth University, Maynooth, Co. Kildare, Ireland.
(E) john.keating@mu.ie (T) +353 1 708 3854





