14.0374 markup, encoding, content-modelling

From: by way of Willard McCarty (willard@lists.village.Virginia.EDU)
Date: 10/18/00


                   Humanist Discussion Group, Vol. 14, No. 374.
           Centre for Computing in the Humanities, King's College London
             Date: Wed, 18 Oct 2000 09:42:41 +0100
             From: Wendell Piez <wapiez@mulberrytech.com>
             Subject: Re: 14.0368 markup, encoding, content-modelling, primitives
    At 10:30 AM 10/17/00 +0100, Francois wrote:
     >Would a third term help?
     >         Content modeling
     >...     Encoding
     >...     Markup
     >Encoding would cover working out the relationships between the various
     >elements, attributes, entities of a markup scheme.
    Mm, I was with us up to this point, and in particular I agreed with the
    "intuitions" of Fotis and Thierry (since the lexicographers haven't tackled
    this to my knowledge, I agree intuition and usage are what we have to go
    on). But I think Francois steps a bit too far from currently-recognized
    semantics into a distinction that may be useful, but isn't at all common. I
    also think the containment relation is backward.
    Remember that a text can be "encoded" without being marked up. In fact,
    any electronic ("machine readable") text must be, ipso facto, encoded.
    Standard text encodings include US-ASCII, EUC-JP, ISO 8859 in its variants,
    etc. etc., including, now, Unicode (ISO/IEC 10646). These all provide
    mappings from written characters into bit-sequences of known lengths,
    enabling a digital processor to handle them internally. More broadly,
    however, Morse code is an encoding in this sense. At its loosest, I'd
    suppose a code to be a representation of one type of information in another
    form, either to facilitate or to obfuscate its transmission.
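
    To make the point concrete, here is a minimal Python sketch of the same
    short string under three standard encodings, plus a toy Morse table. The
    sample word, the choice of codecs, and the names encode_all, MORSE and
    to_morse are illustrative assumptions, not anything from the thread.

```python
# The same four characters mapped to bytes under different encodings.
text = "caf\u00e9"  # "cafe" with an acute accent on the final e

def encode_all(s, codecs=("latin-1", "utf-8", "utf-16-be")):
    """Return {codec name: encoded bytes} for each codec (illustrative helper)."""
    return {name: s.encode(name) for name in codecs}

encoded = encode_all(text)
for name, raw in encoded.items():
    # bytes.hex(sep) needs Python 3.8 or later
    print(f"{name:10} {len(raw)} bytes  {raw.hex(' ')}")

# A toy Morse table (only the letters needed here): an encoding in the
# looser sense of one representation standing in for another.
MORSE = {"S": "...", "O": "---"}

def to_morse(s):
    """Encode a string using the toy table; letters are separated by spaces."""
    return " ".join(MORSE[c] for c in s.upper())

print(to_morse("sos"))
```

    None of these byte sequences carries any markup; each is simply the
    text, encoded.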
    Markup is an addition of code to code: a layering of an encoding practice
    following a different protocol, over and above an initial layer. All markup
    is encoding, but not all encoding is markup. The super-added protocol must
    include a way to make the distinction between which encoded sequences are
    "data" and which, "markup."
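
    A rough Python sketch of that distinction, assuming the familiar
    angle-bracket convention: anything between "<" and ">" counts as markup
    and everything else as data. This is not a real SGML or XML parser, and
    the names TAG and split_markup and the sample string are made up for
    illustration.

```python
import re

# Assumed convention: a run from "<" to the next ">" is markup.
TAG = re.compile(r"<[^<>]*>")

def split_markup(encoded_text):
    """Split a string into (kind, text) pairs, kind being 'markup' or 'data'."""
    parts = []
    pos = 0
    for m in TAG.finditer(encoded_text):
        if m.start() > pos:
            # Characters between two tags are data.
            parts.append(("data", encoded_text[pos:m.start()]))
        parts.append(("markup", m.group()))
        pos = m.end()
    if pos < len(encoded_text):
        # Trailing characters after the last tag are data too.
        parts.append(("data", encoded_text[pos:]))
    return parts

sample = "<chapter><title>Loomings</title>Call me Ishmael.</chapter>"
for kind, chunk in split_markup(sample):
    print(f"{kind:7} {chunk}")
```

    Both kinds of sequence are encoded in the same character set; only the
    delimiter convention tells the processor which is which.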
    Think of your favorite plain-text transcription of a literary work. Much of
    the difficulty that comes from processing even a good, clean, well-edited
    plain text, arises from the fact that so much information in it (say, the
    boundaries between chapters) is not explicit in its code. We might call the
    creative use of white space, all caps for CHAPTER headings etc., a kind of
    "passive" or "implicit" markup -- but since it's not explicit it's
    relatively difficult to program a machine to handle it.
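
    To illustrate, a minimal Python sketch of what handling such implicit
    markup involves. The heading convention (an all-caps line beginning
    with CHAPTER and a number or roman numeral), the names HEADING and
    find_chapter_starts, and the sample text are all assumptions; a
    different transcription would need different guesses.

```python
import re

# Guessed convention: "CHAPTER", then a numeral, then an optional title.
HEADING = re.compile(r"^CHAPTER\s+[0-9IVXLC]+\.?.*$")

def find_chapter_starts(plain_text):
    """Return the indices of lines that look like chapter headings."""
    starts = []
    for i, line in enumerate(plain_text.splitlines()):
        stripped = line.strip()
        # Require the whole line to be upper case, to skip ordinary prose.
        if HEADING.match(stripped) and stripped == stripped.upper():
            starts.append(i)
    return starts

sample = """CHAPTER I. LOOMINGS

Call me Ishmael.

Chapter and verse were cited.

CHAPTER II. THE CARPET-BAG

I stuffed a shirt or two into my bag.
"""

print(find_chapter_starts(sample))
```

    The heuristic works on this sample, but it fails as soon as a
    transcriber writes "Chapter 2" or centres the heading differently,
    which is exactly the fragility of markup that is not explicit.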
    Wendell Piez                            mailto:wapiez@mulberrytech.com
    Mulberry Technologies, Inc.                http://www.mulberrytech.com
    17 West Jefferson Street                    Direct Phone: 301/315-9635
    Suite 207                                          Phone: 301/315-9631
    Rockville, MD  20850                                 Fax: 301/315-8285
        Mulberry Technologies: A Consultancy Specializing in SGML and XML
