19.689 WordHoard: close reading & analysis of tagged texts

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Mon, 3 Apr 2006 07:24:51 +0100

               Humanist Discussion Group, Vol. 19, No. 689.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                        www.princeton.edu/humanist/
                     Submit to: humanist_at_princeton.edu

         Date: Mon, 03 Apr 2006 07:18:21 +0100
         From: Martin Mueller <martinmueller_at_northwestern.edu>
         Subject: Northwestern University announces WordHoard

Academic Technologies and the Library at Northwestern University are
happy to announce the release of WordHoard at
<http://wordhoard.northwestern.edu>.

Named after an Old English phrase for the verbal treasure unlocked by
a wise speaker, WordHoard is an application for the close reading and
scholarly analysis of deeply tagged literary texts. It applies to
highly canonical literary texts the insights and techniques of corpus
linguistics, that is to say, the empirical and computer-assisted
study of large bodies of written texts or transcribed speech. In the
WordHoard environment, such texts are tagged by morphological,
lexical, prosodic, and narratological criteria. They are mediated
through a digital page or user interface that lets scholarly but
non-technical users explore the greatly increased query potential of
textual data kept in such a form.

The development of WordHoard has been supported by a generous grant
from the Andrew W. Mellon Foundation. The current release includes
the remains of Early Greek epic in Greek and translation, all of
Chaucer and Shakespeare, and Spenser's Faerie Queene. The texts have
been tagged by morphosyntactic, lexical, prosodic, and narratological
criteria. The English texts have been tagged according to a common
scheme that enables users to compare Chaucer with Spenser or
Shakespeare from a variety of perspectives.

WordHoard may be seen as a textbase with an unusually flexible set of
concordance features. Much attention has been paid to a user
interface that allows for the side-by-side display of arbitrarily chosen
passages in the same field of vision. Concordance searches may
quickly be grouped and regrouped by various criteria, including
speaker gender or prosodic status in the case of Shakespeare.
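
As a rough illustration of grouping and regrouping, the sketch below sorts a
handful of concordance hits first by speaker gender and then by prosodic
status. It is not WordHoard's code: the hit records, the field names, and the
group_hits helper are all hypothetical, and the sample data is made up for
the example.

```python
from collections import defaultdict

# Hypothetical concordance hits, invented for illustration:
# (keyword, work, speaker gender, prosodic status).
hits = [
    ("love", "Romeo and Juliet", "female", "verse"),
    ("love", "Romeo and Juliet", "male", "verse"),
    ("love", "Much Ado About Nothing", "female", "prose"),
    ("love", "Much Ado About Nothing", "male", "prose"),
    ("love", "Hamlet", "male", "verse"),
]

# Position of each (assumed) tagging criterion inside a hit record.
FIELDS = {"word": 0, "work": 1, "gender": 2, "prosody": 3}


def group_hits(hits, criterion):
    """Group concordance hits by one tagged criterion."""
    index = FIELDS[criterion]
    groups = defaultdict(list)
    for hit in hits:
        groups[hit[index]].append(hit)
    return dict(groups)


# The same hits, regrouped by two different criteria.
by_gender = group_hits(hits, "gender")
by_prosody = group_hits(hits, "prosody")
```

Because the hits themselves never change, regrouping is only a matter of
choosing a different field as the key.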

Every word occurrence in the texts is a link that can be activated to
display in a GetInfo window all the information the text may be said
to know about all forms of the word in that location. This is very
useful for texts that have much orthographic or morphological
variety, such as Spenser or Chaucer, not to speak of Homer or Hesiod:
for any given word in the text the reader is a second away from a
table that shows all the spellings of all the forms of that word
sorted by frequency, thus giving an immediate overview of actual usage.
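
The kind of table described above can be sketched in a few lines, assuming
each token in the text carries both its printed spelling and a dictionary
headword. The token list, the headwords, and the spelling_table function are
hypothetical and only illustrate the idea; they do not reflect WordHoard's
actual data model.

```python
from collections import Counter, defaultdict

# Hypothetical tagged tokens, invented for illustration:
# (spelling as printed, dictionary headword).
tokens = [
    ("knyght", "knight"),
    ("knight", "knight"),
    ("knyght", "knight"),
    ("knightes", "knight"),
    ("lady", "lady"),
    ("ladye", "lady"),
]


def spelling_table(tokens):
    """Map each headword to its spellings, most frequent first."""
    table = defaultdict(Counter)
    for spelling, headword in tokens:
        table[headword][spelling] += 1
    # most_common() sorts the spellings by descending count.
    return {headword: counts.most_common() for headword, counts in table.items()}


table = spelling_table(tokens)
for spelling, count in table["knight"]:
    print(f"{spelling:10} {count}")
```

The frequency ordering is what gives the reader an overview of actual usage
at a glance.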

WordHoard includes a statistical engine that supports a variety of
procedures common in Natural Language Processing. For example, users
can look for words that are disproportionately common or rare in
Shakespeare's comedies when compared with the tragedies or all of
Shakespeare. The current release includes precompiled work sets for
analysis. In later releases, users will be able to configure sets for
their own purposes.
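
One measure often used in corpus linguistics for this kind of comparison is
the log-likelihood ratio. The sketch below uses its common two-term form and
assumes we already know how often a word occurs in an analysis set (say, the
comedies) and in a reference set (say, the tragedies), along with the total
word count of each. This announcement does not say which statistics WordHoard
computes, so the function names and the sample counts are illustrative
assumptions only.

```python
import math


def log_likelihood(count_a, total_a, count_b, total_b):
    """Two-term log-likelihood score for one word in corpus A vs corpus B.

    A score near zero means the word is about equally common in both
    corpora; larger scores mean a more marked difference.
    """
    combined_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score


def direction(count_a, total_a, count_b, total_b):
    """Return '+' if the word is relatively more common in A, else '-'."""
    return "+" if count_a / total_a > count_b / total_b else "-"


# Hypothetical counts: 30 occurrences per 1,000 words in set A,
# 10 occurrences per 1,000 words in set B.
score = log_likelihood(30, 1000, 10, 1000)
sign = direction(30, 1000, 10, 1000)
```

Sorting a whole vocabulary by this score, with the sign attached, yields
lists of words that are disproportionately common or rare in the analysis
set.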

WordHoard also includes an annotation module. In the current release,
this module supports the display of the Iliad scholia as true textual
marginalia. Later releases will support user annotation not only of
particular locations in a text but of words wherever they occur. A
prototype of WordHoard with user generated annotation is in operation
at Northwestern, but it will require additional security features
before it can be released.

WordHoard is a Java Web Start/Swing application. It requires a
broadband connection and will not work over a modem. Many operations
in WordHoard involve extensive shuttling between the client and the
server. WordHoard will therefore generally be quite a bit faster in
on-campus environments, where information moves at the same speed in
both directions, than in off-campus environments where download
speeds are five or more times as fast as upload speeds. General
network traffic and the complexity of queries or size of result sets
are also important variables. We will be very interested in getting
feedback from users about how the application works in different
environments. WordHoard has a Send Error Report command in its File menu.
This was designed to point out errors in the tagging, but it can be
used just as effectively for general comments. You may also send
email to martinmueller_at_northwestern.edu.
Received on Mon Apr 03 2006 - 02:48:30 EDT