16.179 new approach to overlapping hierarchies

From: Humanist Discussion Group (by way of Willard McCarty <w.mccarty@btinternet.com>)
Date: Sat Aug 24 2002 - 11:24:53 EDT

                   Humanist Discussion Group, Vol. 16, No. 179.
           Centre for Computing in the Humanities, King's College London

             Date: Sat, 24 Aug 2002 08:17:41 -0700
             From: Patrick Durusau <pdurusau@emory.edu>
             Subject: Just-In-Time-Trees (JITTs)


    I thought Humanist readers might be interested in the latest line of attack
    Matthew O'Donnell and I have taken on the problem of overlapping
    hierarchies in texts. The presentation that was made at the Extreme Markup
    conference (Montreal, 2002) is now available on the SBL website,
    http://www.sbl-site2.org/Overlap/ (follow the link to Just-In-Time-Trees).

    We propose that the declaration of the document root and the markup to be
    recognized should be moved from the syntax layer and made a part of the
    processing of a text. That change in the model for handling markup removes
    the various problems with overlapping markup that have been the subject of
    numerous proposals but few widespread implementations since the rise of
    SGML. Our latest proposal differs from all prior ones in that it allows the
    use of standard XML software for the processing of texts, while allowing
    extensive experimentation with markup languages for the encoding of texts.

    Our argument for markup recognition is grounded in the text of ISO 8879
    (the CONCUR feature) and extends that concept to XML by the use of filters
    to declare the document root and markup to be recognized.
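
    As a rough illustration of the filtering idea only (a toy sketch, not the
    implementation described in the presentation), the Python below keeps the
    tags for a declared root and a chosen set of element names, drops every
    other tag, and hands the result to a standard XML parser. The element names
    (text, line, s), the function names filter_markup and parse_view, and the
    use of a regular expression as the filter are all assumptions made for the
    example.

```python
import re
import xml.etree.ElementTree as ET

# Matches a start tag, end tag, or empty-element tag and captures the element name.
TAG = re.compile(r"</?([A-Za-z_][\w.-]*)[^<>]*>")


def filter_markup(source, root, recognized):
    """Keep tags for `root` and the names in `recognized`; drop all other tags.

    Hypothetical helper: character data is left untouched, so unrecognized
    markup simply disappears from the view handed to the parser.
    """
    keep = set(recognized) | {root}

    def choose(match):
        return match.group(0) if match.group(1) in keep else ""

    return TAG.sub(choose, source)


def parse_view(source, root, recognized):
    """Filter `source`, then parse it with an ordinary XML parser."""
    tree = ET.fromstring(filter_markup(source, root, recognized))
    if tree.tag != root:
        raise ValueError("declared root %r is not the outermost element" % root)
    return tree


# Two hierarchies in one text: the first sentence crosses a line boundary,
# so the text as a whole is not well-formed XML.
SAMPLE = (
    "<text>"
    "<line><s>The rain fell all </line>"
    "<line>night.</s> <s>Nobody slept.</s></line>"
    "</text>"
)

if __name__ == "__main__":
    lines = parse_view(SAMPLE, "text", {"line"})
    sentences = parse_view(SAMPLE, "text", {"s"})
    print([e.text for e in lines.findall("line")])
    print([e.text for e in sentences.findall("s")])
```

    Each call yields a well-formed tree for one hierarchy, although the same
    text parsed with both sets of tags at once would be rejected by the parser.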

    The only resource available at this particular moment is the presentation
    from the Extreme Markup conference but a more formal paper should appear at
    that location by late September along with sample code for experimenting
    with the technique.

    The oddest question voiced in response to our proposal is how serious a
    problem overlap is for humanities texts. I consider it odd since
    any number of humanities projects, including the TEI Guidelines, make
    repeated references to the need to record overlapping hierarchies in texts.
    There are also the questions raised by authors such as Jerome
    McGann, http://jefferson.village.virginia.edu/~jjm2f/jj2000aweb.html,
    about the use of markup for representation of texts. Still, my sense of the
    problem's importance rests more on personal experience than on a systematic
    analysis of texts of interest to humanists. As part of our research, I
    would like to develop (or learn about) more convincing arguments for
    overlapping hierarchies in texts.

    Suggestions of prior studies, measures of overlap and its importance, and
    similar resources would be greatly appreciated. One possible candidate for
    constructing a measure of overlap is the family of minimum tree-to-tree
    editing distance algorithms, but I am sure there are others.
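
    For readers unfamiliar with tree-to-tree editing distance, a minimal
    sketch of the unit-cost version for ordered, labelled trees follows. It
    uses the textbook recursion on the rightmost roots of two forests, with
    memoization, rather than any optimized published algorithm. The tuple
    representation of trees and the names size, forest_distance and
    tree_distance are assumptions made for the example, and how such a
    distance would best be turned into a measure of overlap is left open.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Minimum number of node insertions, deletions and relabelings
    needed to turn forest f1 into forest f2 (each operation costs 1)."""
    if not f1:
        return size(f2)  # insert every node of f2
    if not f2:
        return size(f1)  # delete every node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children move up in its place.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Match the two rightmost roots, relabeling if the labels differ.
    match = (
        forest_distance(kids1, kids2)
        + forest_distance(f1[:-1], f2[:-1])
        + (label1 != label2)
    )
    return min(delete, insert, match)


def tree_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))


if __name__ == "__main__":
    # Element-name skeletons of two views of one text (hypothetical names).
    line_view = ("text", (("line", ()), ("line", ())))
    sentence_view = ("text", (("s", ()), ("s", ())))
    print(tree_distance(line_view, sentence_view))
```

    Because the memoized recursion is not tuned for speed, it is suitable only
    for small trees.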




    Patrick Durusau
    Director of Research and Development
    Society of Biblical Literature

