11.0049 tagging Dracula

Humanist Discussion Group (humanist@kcl.ac.uk)
Tue, 20 May 1997 21:20:39 +0100 (BST)

Humanist Discussion Group, Vol. 11, No. 49.
Centre for Computing in the Humanities, King's College London

Date: Mon, 19 May 1997 18:28:11 MST
From: Gloria McMillan <gmcmillan@east.pima.edu>
Subject: Paper on TEi-SGML _Dracula_ (need comments)...)

> PLAYING DRACULA TAG: the Adventures of the
> Two-Housewife _Dracula_ TEI-tagging Team
> by Gloria McMillan
> adjunct Writing Instructor, Pima Community College
> Tucson, Arizona
> One scorching day in Tucson, Arizona, my friend Sherri Greenberg and
> I decided to tag _Dracula_. We knew that putting TEI-SGML tags on every
> speech of every character in that novel would be more than a fortnight's
> work, but we were determined. We had been discussing the prospects and our
> shared opinion was that every minor poet from "mainstream" literature would
> be given precedence before anyone would get around to tagging the classics
> of horror and science fiction. For us two, fans of the classics of these
> genres, this was a dim prospect. I wondered if we would "crack" under
> the strain of going speech-by-speech and character-by-character through
> _Dracula_.
> "Are you sure you want to do this?" I asked Sherri, thinking that she
> might wish to be let off lightly.
> Sherri smiled and nodded cheerfully, _Dracula_ being a special favorite
> of hers. I recalled that she was and is one of the most patient and
> methodical of quilters. Quilters are a special kind of person. They're
> tough. She had plenty of chances to show just how tough she had to be to
> complete this unheard-of project, as we shall see, but, before that, I
> should like to give some overview of our method for this project.
> Sherri took on much of the manual locating of speeches of characters in
> _Dracula_'s hardcopy version. She had little interest in getting involved
> with the actual placing of computer code into the electronic text. That
> was to be my job. Sherri often made her contributions by mail, sending me
> several sheets of page notations which I would tag into TEI SGML and then
> cross off the lists. One complication was that our editions were not
> entirely in synchronous pagination. She was very fond of her book, so I
> just used the word search to locate her sentences.
> Our first breakthrough came when I attended one of Lou Burnard's
> workshops on TEI text-encoding on July 25th, 1995, in Santa Barbara. Lou
> helped us to place a proper DTD on the front of our document at that point.
> While I greatly admired the Author-Editor they used at the workshop, I knew
> that we would be completing our _Dracula_-tagging project with no way to
> check the accuracy of our tagging. Everywhere I looked online for
> shareware (or even commercial) parsers, they required far more memory than
> that which I had on my 386 IBM clone PC. I was working on two old PCs: an
> XT and a 386. We were about two-thirds into our project when my husband
> told me that his lab no longer needed the 386 and I could have it, but it
> too seemed to lack the amount of memory needed for an Author-Editor-style
> parsing environment. So, we were flying blind, as the cliche goes.
> I created a set of Word Perfect macros for many of the SGML tags that
> I would be using. These macros worked excellently in all instances but
> one truly unfortunate one (see below). The word processor that I was
> using was Word Perfect 5.0. The age of this processing
> program was no real handicap, because we were working with items in a text
> and one program could do as well as another in tagging such items. The
> only thing that would have made a marked increase in our performance would
> have been to have a functional parser. As of this month (and too late for
> our initial tagging on _Dracula_), I am using an Icon language parser
> program that can find anything between 'start' and 'end' tags. However, at
> the time of doing this tagging I didn't have the parser. I should say that
> even this Icon parser is a 'kluge job'; I explained to an advanced
> computing major at the University of Arizona what it was I was trying to do
> and he made me the Icon parser for ten dollars. It works very well and has
> succeeded in pulling all the tagged speeches of a given character from the
> whole novel. My 386 PC ran out of memory when the Icon program was run
> over the whole novel, so I just programmed a batch file to
> run it over each chapter and then to delete each chapter output file, after
> these had been placed into a master output file. This was my first attempt
> at what was, for me, an elaborate batch operation. It did the job and
> allowed the Icon parser to create a file for all speeches by a given
> character throughout the novel in one step.
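> The extraction step described above can be sketched in a few lines. What
> follows is a minimal illustration in Python, not the Icon program itself;
> the function names and the regular expression are assumptions, and the
> sketch takes for granted that every speech is enclosed in explicit
> <q who=...> and </q> tags, as in the tagged examples in this paper.
> Processing one chapter at a time and appending the results to a master
> list mirrors the batch-file approach.

```python
import re

# Matches <q who=HARK> ... </q>; the attribute value may be bare or quoted.
# DOTALL lets a speech run over several lines.
Q_PATTERN = re.compile(
    r"<q\s+who=['\"]?(\w+)['\"]?\s*>(.*?)</q>",
    re.IGNORECASE | re.DOTALL,
)

def speeches_by(who, chapter_text):
    """Return every speech tagged with the given speaker code, in order."""
    return [
        " ".join(body.split())  # collapse line breaks and runs of spaces
        for speaker, body in Q_PATTERN.findall(chapter_text)
        if speaker.upper() == who.upper()
    ]

def collect(who, chapters):
    """Process one chapter at a time, appending to a single master list."""
    master = []
    for text in chapters:
        master.extend(speeches_by(who, text))
    return master

# The tagged exchange from chapter one, as given in this paper.
chapter_one = """<DIV2 TYPE='entry' n=00000503>
...text...
<q who=WOM> The Herr Englishman? </q>
<q who=HARK> Yes. </q>
<q who=HARK> Jonathan Harker. </q>
...text...
</DIV2>"""

harker = collect("HARK", [chapter_one])
print(harker)  # ['Yes.', 'Jonathan Harker.']
```

> A regular expression is enough here only because <q> elements are not
> nested inside one another in our tagging; nested quotations would need a
> real parser.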
> I entered the computer tags as carefully as I could, but had no way at
> the time to check for accuracy. With no monitoring, I dropped a few
> end-tags and inadvertently placed /DIV2 end tags at the beginnings of some
> of the diary entries. After much consternation, I located the source of
> this problem in the fact that I had created some Word Perfect macros, but
> hadn't had enough available two-stroke macros to do everything, so not all
> tags had a unique macro. My solution, reusing the /DIV2 end-tag macro for
> start tags, proved disastrous. I was supposed to remember to delete the "/"
> from in front of the DIV2 start tags, and I forgot, so I put a /DIV2 end
> tag at the beginning of the DIV2
> journals, diaries, and memos all through the _Dracula_ text! Alan Morrison
> of the Oxford Text Archive found these flaws in the way the DIV2
> designations on diary and journal entries were handled. But he pronounced
> these relatively minor cleaning-up operations and nothing which should
> cause our nearly two years of tagging to be invalidated.
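> Errors of this kind (an end tag with nothing open, or a start tag that is
> never closed) are the sort of thing even a very small checker can report.
> The following is a rough Python sketch under stated assumptions, not a
> validating SGML parser: it ignores the DTD, assumes that every DIV1, DIV2
> and q element is closed explicitly, and the function name check_balance is
> invented for illustration.

```python
import re

# Captures an optional "/" and the element name of each tag.
TAG = re.compile(r"<(/?)(\w+)[^>]*>")

def check_balance(text, names=("DIV1", "DIV2", "Q")):
    """Report end tags with no open element and start tags never closed."""
    problems = []
    stack = []  # (element name, line number) of currently open elements
    for lineno, line in enumerate(text.splitlines(), start=1):
        for closing, name in TAG.findall(line):
            name = name.upper()
            if name not in names:
                continue
            if not closing:
                stack.append((name, lineno))
            elif stack and stack[-1][0] == name:
                stack.pop()
            else:
                problems.append(
                    f"line {lineno}: </{name}> has no matching start tag"
                )
    for name, lineno in stack:
        problems.append(f"line {lineno}: <{name}> is never closed")
    return problems

# A diary entry whose start tag was typed with a stray "/".
sample = """<DIV1>
</DIV2 TYPE='entry' n=00000503>
...text...
<q who=HARK> Yes. </q>
</DIV2>
</DIV1>"""

problems = check_balance(sample)
for p in problems:
    print(p)
```

> On the sample, both the mistyped start tag on line 2 and the now-orphaned
> end tag on line 5 are reported, which points straight at the entry that
> needs repair.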
> Little had we known when we picked _Dracula_ to tag that its narrative
> structure resembled a maze of clippings, journals, diaries, and memos.
> None of this jumps out at the casual reader. However, life is different for
> the unwary TEI tagger. I shall try to provide some brief illustrations of
> how we picked a certain solution over another choice of tagging.
> In the hardcopy 1897 Penguin edition of _Dracula_ that I used for my
> manual searches, the earliest instance of a "nested structure" comes in
> chapter one. The chapter divisions of this novel are tagged as DIV1.
> Journal entries are smaller elements and are classed as DIV2. The first
> Harker journal narrative in chapter one is coded:
> <DIV2 TYPE='entry' n=00000503> (start tag)
> ...text...
> </DIV2> (end tag)
> But we found that within this DIV2 narrative, there are speeches by
> other characters. The earliest speech occurs on page 10, where a peasant
> woman says to Harker (within the Harker journal), "'The Herr Englishman?'"
> And Harker answers, "'Yes,' I said. 'Jonathan Harker.'"
> In terms of tagging the speeches of Jonathan Harker, we had to decide
> whether his every word in a given journal counted as his "speech" or
> whether only the direct quoted speeches that he made to other characters
> counted as his "speech". Counting Harker's every word in his journal as
> speeches to-be-tagged ran into the logical problem of what was to be done
> with the directly-quoted speeches of other characters in his same journal.
> These weren't Harker's words. These speeches by other characters not only
> "belonged" to the total speeches of those other characters, but they could
> distort the speech patterns that we would collect and analyze as Jonathan's
> style of speaking. The only solution that made sense to Sherri and me was
> to count only the direct quotations of Harker as Harker's and to count the
> directly-quoted speeches of other characters as belonging to those other
> characters even when the speeches of other characters occurred within a
> journal entry of Harker's. So, we tagged the exchange between Jonathan and
> the old peasant woman this way,
> <DIV2 TYPE='entry' n=00000503> (start tag)
> ...text...
> <q who=WOM> The Herr Englishman? </q>
> <q who=HARK> Yes. </q>
> <q who=HARK> Jonathan Harker. </q>
> ...text...
> </DIV2> (end tag)
> Note that there is no indication in these DIV2 tags of which narrator
> wrote the journal or diary entry printed above. Unless there is a
> further way to refine these tags, this is a flaw in the TEI system, because
> useful analyses could be made by comparing the narrative style of
> characters with their directly quoted speeches. Mikhail Bakhtin mentions
> the 'heteroglossia' that may exist between the speech habits of different
> characters in a novel. Few novels have quite so many narrators as has
> _Dracula_, so scholars might have a Bakhtinian field day tracing first how
> the characters' actual speeches compare with one another and then how the
> directly-quoted speech of a character compares with that character's style
> of narrating. With this in mind, I found a way on the VAX VMS to do
> wildcard searches to pull up the narratives of individual narrators, but I
> never found a way to distinguish between individual narrators, using TEI
> DIV2 tags. My knowledge of these tags is not exhaustive, however, and
> perhaps there is a way to add 'person' to the designation of an 'entry'.
> At this point in its evolution, the TEI edition of _Dracula_ has only DIV2
> tagged narratives, but there is no distinction as to what person's entry
> the tagged section of text is.
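> To make the missing distinction concrete, here is a small Python sketch of
> what could be done if each entry did name its narrator. The NARR attribute
> is purely hypothetical, invented for this illustration; no claim is made
> that the TEI scheme defines it or that our edition carries it. The
> narration lines in the sample are placeholders, not text from the novel.

```python
import re

# An entry carrying the hypothetical NARR attribute, with its contents.
ENTRY = re.compile(
    r"<DIV2\b[^>]*\bNARR=['\"]?(\w+)['\"]?[^>]*>(.*?)</DIV2>",
    re.IGNORECASE | re.DOTALL,
)
# Any directly quoted speech, whoever the speaker is.
QUOTE = re.compile(r"<q\b[^>]*>.*?</q>", re.IGNORECASE | re.DOTALL)

def narration_by(narrator, text):
    """Return each entry by the narrator with quoted speeches removed."""
    entries = []
    for who, body in ENTRY.findall(text):
        if who.upper() == narrator.upper():
            entries.append(" ".join(QUOTE.sub(" ", body).split()))
    return entries

sample = """<DIV2 TYPE='entry' NARR=HARK n=00000503>
Narration before.
<q who=WOM> The Herr Englishman? </q>
<q who=HARK> Yes. </q>
Narration after.
</DIV2>"""

harker_narration = narration_by("HARK", sample)
print(harker_narration)  # ['Narration before. Narration after.']
```

> With the narrator recorded on the entry, a character's narrating style
> and that same character's directly quoted speeches could be pulled out
> separately and set side by side.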
> We marked pagination, paragraphing, chapters, narrative divisions such
> as diaries and journals, and speeches by individual characters. Other
> features that we might have marked include: quotations, foreign language
> speeches, and allusions. There are instances of characters in the novel
> saying things in Hungarian, Romanian, or German, for instance. Amongst the
> allusions we might have added are a few to commercial advertisements of the
> day. When Count Dracula says, "The blood is the life, Mr. Renfield," the
> words sound to our modern ears as though the Count is making some allusion
> to Communion. But this sentence was a commercial slogan for a blood tonic
> and posters bearing these words were visible all over London in 1897. So,
> this would qualify as a "layered" allusion. It does hearken to the Biblical
> Deuteronomy 12:23, which states, "Only be sure that thou eatest not the
> blood; for the blood is the life; and thou mayest not eat the life with the
> flesh." Yet the other allusion is also present, so this would be an
> interesting allusion to try to tag for both senses of its meaning.
> Scholars have used such character-tagged speeches before, notably John
> Burrows in his landmark study, _Computation into Criticism_, which deals
> with word frequency counts in the complete novels of
> Jane Austen. Burrows states that scholars "would be severely handicapped
> by the dearth of available data for other writers than Jane
> Austen..." (Burrows 130). Sherri and I have tried to fill this gap with a
> lively complement to the somewhat austere Miss Austen. With our text, as
> with Austen, the speeches of individual characters may be analyzed and
> statistically compared by pulling up their tags.
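> As a rough indication of what such a comparison might involve, the Python
> sketch below computes how often chosen words occur per thousand words of
> a character's speeches. It is a simplified illustration, not a
> reconstruction of Burrows's procedure; the function name, the tokenizing
> rule and the choice of words to count are all assumptions.

```python
from collections import Counter
import re

def relative_frequencies(speeches, words):
    """Rate per 1000 tokens of each chosen word in a list of speeches."""
    tokens = re.findall(r"[a-z']+", " ".join(speeches).lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {w: 1000 * counts[w] / total for w in words}

# Speeches taken from the tagged exchange in chapter one of this paper.
woman = ["The Herr Englishman?"]
harker = ["Yes.", "Jonathan Harker."]

common_words = ["the", "yes"]
woman_rates = relative_frequencies(woman, common_words)
harker_rates = relative_frequencies(harker, common_words)

for word in common_words:
    print(word, round(woman_rates[word], 1), round(harker_rates[word], 1))
```

> Three words of speech prove nothing, of course; the rates become
> meaningful only once all the speeches of a character across the whole
> novel are fed in.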
> Our method may have resembled madness to our families and others about
> us, though. Sherri, my stalwart passage-marking collaborator, reported
> that she had a feeling that the other women were watching her at her son
> Samuel's dance class. All about were women reading _Good Housekeeping_ and
> writing letters, comparing how much money they made at the casino and so
> on. Little girls twirled by in pink tutus, hair pulled up tight into
> ponytails, giggling and taking practice figures on the polished wooden
> floor.
> Sherri sat there on the "mothers' bench" week after week, frenetically
> marking lines in her big black combined _Frankenstein_ and _Dracula_.
> Because the book was so big and strikingly black, she felt that the others
> noticed when the time of casual reading had long since passed, and yet,
> there she was, still hunched over her book, frantically marking those
> passages. Sherri began to cover the title of her book with her notebook
> when she came into the dance class. She confessed to me that she "wished
> it could have been one of those little paperback editions". She had
> enjoyed _The Addams Family_, but never realized that she would be in it
> someday.
> Her son Samuel also noticed her fascination with the book _Dracula_.
> Since he was prone to nightmares, Sherri only told the 6-year-old Sam that
> she was working on her "computer project".
> I did most of my tagging at home, not from fear of humiliation -- at
> least I don't think it was that -- but because I didn't seem to be able to
> concentrate well out in public.
> Actually, in light of all that it took to get _Dracula_ tagged, we might
> have learned our lesson. But some time ago Sherri said that she was
> thinking what a shame it was that there was no TEI-SGML character-tagged
> Sherlock Holmes available. My mind reeled at the very thought of tagging a
> whole series, but she hasn't brought the idea up lately. Perhaps the
> Two-Housewife _Dracula_-tagging team is destined to go the way of the
> Beatles.

VIRTUAL CLASSROOM: Diversity University MOO
TELNET> 8888
login as: co guest
Type: @go #2673