15.193 e-publishing: XML, TEI

From: by way of Willard McCarty (willard@lists.village.Virginia.EDU)
Date: Wed Aug 22 2001 - 04:34:44 EDT


                   Humanist Discussion Group, Vol. 15, No. 193.
           Centre for Computing in the Humanities, King's College London
                   <http://www.princeton.edu/~mccarty/humanist/>
                  <http://www.kcl.ac.uk/humanities/cch/humanist/>

             Date: Wed, 22 Aug 2001 09:26:49 +0100
             From: Wendell Piez <wapiez@mulberrytech.com>
             Subject: Re: 15.183 Leyton's book? publishers and XML?

    Hi Martin,

    You ask,

    At 03:29 AM 8/17/01, you wrote:
    >I'm trying to get an idea of the extent to which properly-digitized
    >documents -- XML documents, really, using DTDs based on TEI standards --
    >are acceptable to academic publishers.

    I can't say anything about academic publishers, but I can say that XML is
    being increasingly used (as SGML has actually long been used, at least in
    some places) by larger publishers for their own editorial and
    editorial-to-production processes. Generally they are not at the point
    where they expect, or even have made provision for, authoring directly in
    markup. (The exceptions are reference books and other kinds of publication
    where authoring is very much subordinate to an editorial process. But these
    folks are *not* accepting arbitrary markup: they'll mandate the DTD
    themselves.)

    In the high-tech publishing market, at such publishers as O'Reilly for
    example (I can name names where I don't actually have specific knowledge
    covered by an NDA :-), support for authors who want to use markup is
    definitely on the rise, but it is a slow process.

    Note that publishers are wary of exposing the technologies of their
    internal processes to outsiders, since this means competitors can get a
    look. Thus, for example, mandating a DTD (even a public DTD such as DocBook
    or TEI) might be seen as "exposing" a little more of their business
    processes than they like. Oddly, a semantically opaque format such as Word
    is actually a feature to them from this point of view. They expect to
    change the encoding in any case. (While this may be less true in the
    academic publishing sector, there is also less money there for the
    necessary engineering -- both technical and social -- to support markup
    from authoring through editorial stages.)

    If a publisher does use XML internally, chances are it's not TEI, which is
    not sufficiently constraining to be worth a whole lot to them. If their
    markup is anything like TEI, it'll be a highly constrained subset, probably
    not validating to P3 but to their own derived version. There are good
    reasons for this. If I were an editor or production manager for a press, I
    would be skeptical of any author who wanted me to process TEI -- since I
    know how much engineering that requires. I would say "hey, markup, great!",
    but then would want to see the most constrained TEI subset to which their
    files conform. Given that "TEI" might almost as well mean "kitchen sink" in
    this context, doing the necessary analysis to understand their TEI (out of
    the universe of possible TEI), then writing post-DTD validators, stylesheets
    etc. to process it into something useful to me, would almost certainly be
    more expensive than stripping their tagging and starting fresh. (Especially
    given who I'd have to pay to do these respective jobs. If the volume were
    high enough, it could be worth it, since economies of scale could kick in.
    But for one book?)
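
    [Editorial sketch: the first step of the analysis described above, finding
    out how far a submitted file strays from a constrained subset, might look
    roughly like the following in Python. The ALLOWED set, the function name
    and the sample document are all hypothetical, invented for illustration;
    a real house subset would be defined by the publisher's own DTD, and real
    TEI files may need SGML-aware tooling rather than an XML parser.]

```python
import xml.etree.ElementTree as ET

# Hypothetical house subset: the only element names this imaginary press accepts.
ALLOWED = {"TEI.2", "text", "body", "div", "head", "p", "hi", "note"}


def elements_outside_subset(xml_text, allowed=ALLOWED):
    """Return the sorted names of elements used in xml_text but not in allowed."""
    root = ET.fromstring(xml_text)
    used = {el.tag for el in root.iter()}  # every element name in the document
    return sorted(used - allowed)


# Invented sample: mostly within the subset, plus a correction and some verse.
sample = """<TEI.2><text><body>
<div><head>Chapter 1</head>
<p>Some <hi>emphasised</hi> prose with a <sic>typo</sic>.</p>
<lg><l>A line of verse</l></lg>
</div></body></text></TEI.2>"""

print(elements_outside_subset(sample))  # prints ['l', 'lg', 'sic']
```

    [A report like this only lists element names. It says nothing about
    attributes or about how elements are nested, which is where most of the
    remaining engineering cost would presumably lie.]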

    I'd feel better (I'd be celebrating!) if the author said "it's TEI, but
    tell me what markup to target and I'll write the stylesheets myself" --
    which some authors are now able to do. But then they're not giving me TEI,
    are they?
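
    [Editorial sketch: an author "targeting" a publisher's markup could, in
    the simplest case, amount to renaming elements according to a table. XSLT
    would be the usual tool for such a stylesheet; the Python below only
    illustrates the idea. The target names in TARGET_NAMES and the function
    name are assumptions, not any real publisher's vocabulary.]

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from TEI-style element names to an invented target vocabulary.
TARGET_NAMES = {"div": "section", "head": "title", "p": "para", "hi": "emph"}


def retarget(xml_text, mapping=TARGET_NAMES):
    """Rename every element found in mapping; leave all other elements unchanged."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.tag = mapping.get(el.tag, el.tag)
    return ET.tostring(root, encoding="unicode")


print(retarget("<div><head>Intro</head><p>Some <hi>text</hi>.</p></div>"))
# prints <section><title>Intro</title><para>Some <emph>text</emph>.</para></section>
```

    [A one-to-one renaming is the easy case. Anything that needs
    restructuring, or that has no counterpart in the target markup, still
    calls for a decision by a person.]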

    Markup pays for itself very quickly as it scales up. But TEI, in itself, is
    not sufficiently constrained to scale very well. (DocBook is somewhat
    better, and as I said I can see some niche publishers like O'Reilly working
    towards DocBook support.) TEI is excellent for supporting a wide range of
    scholarly research purposes. But there is a direct tradeoff between the
    breadth of this range, and the requirements of a production line.

    >Are there many publishers yet who
    >would accept (for example) the text of a book for publication in XML
    >format? How many are still insisting on camera-ready copy, MSWord
    >documents, PDFs etc? How many academic publishers are doing e-publishing,
    >and what document formats are they using?

    My guess is that you'll find things all over the map. Academic publishers
    continue to experiment with e-publishing, but it will almost always be in
    "bespoke" formats (i.e. custom-engineered markup systems), among them
    varieties of XML (including Open eBook) and even HTML.

    >All insights and relevant experiences much appreciated -- please name names
    >if you can. I'll be happy to summarize responses to the list.

    Can't name names 'cause of those NDAs ... but as to academic publishers
    specifically, I don't speak from firsthand knowledge (haven't worked with
    any), but rather from my assessment of the current state of the
    technologies in the context of editorial and production work.

    I hope the perspective sheds some light, in any case.

    Regards,
    Wendell

    ======================================================================
    Wendell Piez mailto:wapiez@mulberrytech.com
    Mulberry Technologies, Inc. http://www.mulberrytech.com
    17 West Jefferson Street Direct Phone: 301/315-9635
    Suite 207 Phone: 301/315-9631
    Rockville, MD 20850 Fax: 301/315-8285
    ----------------------------------------------------------------------
        Mulberry Technologies: A Consultancy Specializing in SGML and XML
    ======================================================================


