From: CBS%UK.AC.RUTHERFORD.MAIL::CA.UTORONTO.UTCS.VM::POSTMSTR 14-JAN-1989 09:53:36.32 To: archive CC: Subj: Via: UK.AC.RUTHERFORD.MAIL; Sat, 14 Jan 89 9:50 GMT Received: from UKACRL by UK.AC.RL.IB (Mailer X1.25) with BSMTP id 9331; Sat, 14 Jan 89 09:50:04 GM Received: from vm.utcs.utoronto.ca by UKACRL.BITNET (Mailer X1.25) with BSMTP id 1923; Sat, 14 Jan 89 09:49:54 G Received: by UTORONTO (Mailer X1.25) id 0407; Fri, 13 Jan 89 14:46:36 EST Date: Fri, 13 Jan 89 14:46:07 EST From: "Steve Younker (Postmaster)" To: archive@UK.AC.OXFORD.VAX ========================================================================= Date: 1 December 1987, 00:21:22 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: Sterling Bjorndahl - Claremont Grad. School Subject: range of discussion on HUMANIST I appreciate the concern about HUMANIST's self-editorial policy. There is a spirit missing from HUMANIST's discussions that has been present in other discussions I have been a member of. On the other hand, I can remain a member of HUMANIST in good conscience because it does not take much time away from my other duties, which are considered by others here to be of a higher priority. Two specific examples: I was a member of the info-c discussion on ARPA (linked with the sister discussion group on usenet). This was a free-flowing discussion with frequent cries from subscribers asking submitters to control themselves. There was a great deal of redundancy and even inanity mixed with a few nuggets of valid and even brilliant discussion. Although I enjoyed it immensely on the whole, I had to quit because I couldn't afford that many hours of extra reading per week. After a while, the returns just weren't great enough to put up with the noise. On the other end of the editorial spectrum was the arpanet RISKS digest. A "digest" means that the moderator is also an editor. All submissions are sent to him, and he exercises editorial judgement on everything submitted. Once or twice a week, as volume dictates, the collected and edited submissions are mailed in one package to all subscribers, with a refreshing dash of humour added. The kind of give-and-take conversations that have been referred to can still happen in this environment, because the moderator is essentially benign unless serious redundancies and/or inanities occur. (I believe that the moderator was getting full credit for his work in this, and it was probably a part of his job description.) Nevertheless, even so edited, the volume became more than I could deal with effectively (despite the fascinating subject matter, by the way: risks to the public from computers and automated systems). So although I find HUMANIST occasionally on the "dead" side, I have no trouble maintaining my subscription since it does not demand too much of my time. Discussions happen in private, and if I want to get in on them I can contact the initiator. I admit, I wouldn't mind seeing a bit more activity in HUMANIST on occasion, and I think people with issues of broad interest (such as the recent discussion on the OED) should feel free to bring these issues forward. But if HUMANIST has to err, I would rather it err on the dead side, lest I be forced to resign. Let my vote be so registered.
Sterling Bjorndahl Institute for Antiquity and Christianity Claremont Graduate School Claremont, California ========================================================================= Date: 1 December 1987, 09:25:37 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: Jim Cerny Subject: The Dirty Dozen ... Plus??? This is just to add to the warning passed on by Stuart Hunter about Lehigh's direct experience with a virus in some publicly obtained copies of COMMAND.COM. There are apparently a number of other programs that have had work done to their genes to turn them into malignant viruses. They have come to be called "The Dirty Dozen," though there are more than a dozen. These have been described in a number of computer center newsletters in the last year or so. The most recent description I've seen was "Beware The Dirty Dozen: Software That Destroys," CAUSE/EFFECT, v. 10, n. 6, November 1987, pp. 44-45 (which is reprinted from the "Technical Update" publication at the Univ. Cincinnati Computing Center, September 1, 1987). Jim Cerny University Computing, Univ. N.H. J_CERNY@UNHH ========================================================================= Date: 1 December 1987, 09:34:18 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: Dr Abigail Ann Young Subject: Discussions [34 ll, counting this one] Well, pace Sterling Bjorndahl, I don't find HUMANIST on the dead side and I don't want to! But I think I know exactly what he's talking about. Recently it seems that queries or opinions appear and then die in electronic silence. In fact, there seems in general to be less discussion now than there was a few months ago. I don't know to what to attribute this. It could reflect a need on the part of those of us who teach or provide services to students to prepare for and then deal with the demands of a new academic session. It could be that no-one has very much to say at the moment. But I have wondered recently whether we were all feeling a reluctance to say much brought on by our worthy moderator's urgings towards self-editing (with the consequent responsibility of editing and posting a resulting conversation, if any) and our new awareness of the cost factor for the Antipodes at least. I certainly find the current "full" discussion on the details about the electronic OED interesting and a nice change, even though I had already found out a lot of it at the Waterloo conference, and I wish that I'd kept my query about the Rutgers database general now too. So I am glad that Willard has passed on what others have had to say, and I think perhaps we should try out for a bit making all discussion general. We could make use of a subject line to indicate the topic of a posting, and whether it were part of an on-going discussion, thus enabling those who need to clear their readers quickly to ignore discussions which were not of interest to them. Abigail Young Research Associate, Records of Early English Drama University of Toronto young at utorepas ========================================================================= Date: 1 December 1987, 11:14:50 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: William J. McCarthy Subject: discussions (15 lines) I would like to express my approval of the contents of the recent message about discussions on HUMANIST.
Although I have no interest in scanning the turgid "flames" of the digitally deranged, it seems much more than unlikely that HUMANISTs will inundate one another with drivel; and, I am content to attempt to follow the threads of the discussions on my own. Certainly it >is< easy enough to dispatch into oblivion (I have set up a macro to just that purpose) any piece of mail in which one has no interest. As it now stands, HUMANIST seems a touch too formal. ========================================================================= Date: 1 December 1987, 14:24:09 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: "Michael Sperberg-McQueen" Subject: CD ROMs, micro- and mainframe computing with large corpora A late contribution to the discussion provoked by Abigail Young about CDs as a medium of data distribution. [60 lines or so.] I think Dr. Young hit the nail on the head with the question "Are there people out there waiting with bated breath for the new OED on CD ROM?" Because certainly if we're not excited about the OED as a group, then we're not as a group going to be very excited about anything. Yes, I AM waiting with bated breath for an electronic OED, but I was far more excited to learn it would be available on tape than I was to hear about the CD ROM version. I like and use my PC, and I hope someday to be able to work with massive textual corpora on it, but at least for the moment I think magnetic tape is a far better medium for distribution. For one thing, I don't have a CD ROM drive, and I don't know anyone who does, except for Bob Kraft and a classicist here who has a Ibycus micro on loan but does her Greek word processing on our mainframe. Tape drives, on the other hand, will be available at any school in the country. For another, tape drives allow me to change the data -- add to it, enhance it, reduce its size -- and make another copy. CD ROM doesn't. For that reason alone, I'll wait for WORM before buying a new drive for my PC. And finally, mainframes seem to me by and large better at dealing with large quantities of data. That is changing, to be sure. But I can edit the Nibelungenlied in storage on the mainframe, and extract every occurrence of the name 'Sivrit' in a couple of seconds. My PC with its 640 Kbytes can only hold a fourth or so of the Nibelungenlied in RAM at a time. To be sure, a micro-Ibycus could also find all the occurrences of 'Sivrit' in a few seconds -- if the Nibelungenlied were on a CD ROM. But it's not, and there aren't enough Germanic philologists in the country to make it economically feasible to make one. Nor do I WANT a frozen, unalterable text of the Nibelungenlied. I want to be able to index it, to add parsing information or scansions to the file so I can search on them, and so on. Not to mention the need to correct typos in the transcription and add manuscript variants. For all this, we need erasable media, not CD ROMs. Magnetic tapes do have the drawback, for some users, that they are typically readable only on mainframes. (There are PC-based 9-track tape drives, but they aren't real common.) And many humanists don't like working on mainframes. Even for those users, however, the local academic computing center should be, and almost always is, in a position to read the tape and help the user download the data to a microcomputer. No, it's not always easy. And no, it's not always fast. A megabyte an hour or so. But the chances are good the academic computer center knows how to do it, and does it regularly. 
All the ones I've ever known as a user or staff member do. There may be centers that do NOT provide this kind of service, although I have never seen one and never heard of one. But if they exist, those centers should be DRIVEN to provide support for humanities computing, support for microcomputing, and support for data exchange between mainframes and micros. If they are not providing these services, they are not doing their job. Given the kind of support computer centers ought to be providing for humanist users, and given the kind of flexible text humanistic work seems to need, I think CD ROMs look much less promising as a means of data distribution than WORM disks and magnetic tape, and in some cases floppy disks. All of which is just one user's opinion. -Michael Sperberg-McQueen University of Illinois at Chicago (U18189 at UICVM) ========================================================================= Date: 1 December 1987, 14:26:50 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: Marshall Gilliland Subject: Subject line comments (25 lines) Oh, my, the HUMANIST subject lines may get long. Now, in addition to the honest-to-goodness subject, and to the number of lines in the message, Abigail Young suggests "we could make use of a subject line to indicate the topic of a posting, and whether it were part of an on-going discussion, thus enabling those who need to clear their readers quickly to ignore discussions which were not of interest to them." Maybe we serious, dull writers can use such an augmented subject line as a place to pun? But woe is me for, alas, my mailer does not accept long subject lines. Can it be that some people will have to read the beginning of the message to learn what we want to ignore? Will we be like this lady: Lizzi Borden took an axe And plunged it deep into the VAX Don't you envy people who Do all the things you want to do? (Thanks to Jerry Whitnell in California for the ditty.) Maybe we'll relax a bit as our marking gets frantic and we hear the carols of the season. Marshall Gilliland U of Saskatchewan ========================================================================= Date: 1 December 1987, 15:55:42 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: "James H. Coombs" Subject: Concordance for Mac Does anyone know of concordance programs for the Mac? Thanks. --Jim ========================================================================= Date: 1 December 1987, 15:58:31 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: "Michael Sperberg-McQueen" Subject: Text encoding guidelines -- progress report (225 lines) A followup on the current status of the ACH effort to formulate guidelines for text encoding practices. ****************************************************************** * NOTE: The following encoding conventions have been used to * * represent French accents throughout this message: * * * * To Represent Accents -- Pour la representation des accents * * / acute accent - accent aigu * * ` grave accent - accent grave * * * * The accent codes are typed Les codes pour les accents se * * AFTER the letter, and are trouvent APRES la lettre qu'ils * * used with both upper and modifient, et s'utilisent avec * * lower case letters. les majuscules aussi bien que * * les minuscules. 
* ******************************************************************

On November 12 and 13, 1987, 31 representatives of professional societies, universities, and text archives met to consider the possibility of developing a set of guidelines for the encoding of texts for literary, linguistic, and historical research. The meeting was called by the Association for Computers and the Humanities and funded by the National Endowment for the Humanities. The list of participants is appended to this document. The participants heartily endorsed the idea of developing encoding guidelines. In order to guide such development, they agreed on the following principles:

The Preparation of Text Encoding Guidelines
Re/daction des directives pour le codage des textes
Poughkeepsie, New York, 13 November 1987

1. The guidelines are intended to provide a standard format for data interchange in humanities research.
1. Le but des directives est de cre/er un format standard pour l'e/change des donne/es utilise/es pour la recherche dans les humanite/s.

2. The guidelines are also intended to suggest principles for the encoding of texts in the same format.
2. Les directives sugge/reront e/galement des principes pour l'enregistrement des textes destine/s a` utiliser ce format.

3. The guidelines should a. define a recommended syntax for the format, b. define a metalanguage for the description of text-encoding schemes, c. describe the new format and representative existing schemes both in that metalanguage and in prose.
3. Les directives devraient a. de/finir une syntaxe recommande/e pour exprimer le format, b. de/finir un me/ta-langage de/crivant les syste`mes de codage des textes, c. de/crire par le moyen de ce me/talangage, aussi bien qu'en prose, le nouveau syste`me de codage aussi bien qu'un choix repre/sentatif de syste`mes de/ja` en vigueur.

4. The guidelines should propose sets of coding conventions suited for various applications.
4. Les directives devraient proposer des syste`mes de codage utilisables pour un large e/ventail d'applications.

5. The guidelines should include a minimal set of conventions for encoding new texts in the format.
5. Sera incluse dans les directives l'e/nonciation d'un syste`me de codage minimum, pour guider l'enregistrement de nouveaux textes conforme/ment au format propose/.

6. The guidelines are to be drafted by committees on: a. text documentation, b. text representation, c. text interpretation and analysis, d. metalanguage definition and description of existing and proposed schemes; co-ordinated by a steering committee of representatives of the principal sponsoring organizations.
6. Le travail d'e/laboration des directives sera confie/ a` quatre comite/s centre/s sur les sujets suivants: a. la documentation des textes, b. la repre/sentation des textes, c. l'analyse et l'interpre/tation des textes, d. la de/finition du me/talangage et son utilisation pour de/crire le nouveau syste`me aussi bien que ceux qui existent de/ja`. Ce travail sera coordonne/ par un comite/ d'organisation ou` sie`geront des repre/sentants des principales associations qui soutiennent cet effort.

7. Compatibility with existing standards will be maintained as far as possible.
7. Dans la mesure du possible, le nouveau syste`me sera compatible avec les syste`mes de codage existants.

8. A number of large text archives have agreed in principle to support the guidelines in their function as an interchange format. We encourage funding agencies to support development of tools to facilitate this interchange.
8. Des repre/sentants de plusieurs grandes archives de textes en forme lisible par machine acceptent en principe d'utiliser les directives en tant que description des formats pour l'e/change de leurs donne/es. Nous encourageons les organismes qui fournissent des fonds pour la recherche de soutenir le de/veloppement de ce qui est ne/cessaire pour faciliter cela.

9. Conversion of existing machine-readable texts to the new format involves the translation of their conventions into the syntax of the new format. No requirements will be made for the addition of information not already coded in the texts.
9. En convertissant des textes lisibles par machine de/ja` existants, on remplacera automatiquement leur codage actuel par ce qui est ne/cessaire pour les rendre conformes au format nouveau. Nul n'exigera l'ajout d'informations qui ne sont pas de/ja` repre/sente/es dans ces textes. (trad. P. A. Fortier)

****************** The further organization and drafting of the guidelines will be supervised by a steering committee selected by the three sponsoring organizations: ACH (the Association for Computers and the Humanities), ACL (the Association for Computational Linguistics), and ALLC (the Association for Literary and Linguistic Computing). Drafts of the guidelines will be submitted for comment to an editorial committee with representatives of all participating organizations (in addition to the sponsors, thus far: the Modern Language Association, the Association for Computing Machinery Special Interest Group for Information Retrieval, and the Association of American Publishers; the following groups have indicated interest informally but have not yet formally pledged participation, in most cases pending a formal vote: the Linguistic Society of America, the Association for Documentary Editing, and the American Philological Association). The American Anthropological Association, plus several organizations within Europe, are now being asked to consider participation. The interchange format defined by the guidelines is expected to be compatible with the Standard Generalized Markup Language defined by ISO 8879, if that proves compatible with the needs of research. The needs of specialized research interests will be addressed wherever it proves possible to find interested groups or individuals to do the necessary work and achieve the necessary consensus. Formation of specific working groups will be announced later; in the meantime, those interested in working on specific problems are invited to contact either Dr. C. M. Sperberg-McQueen, Computer Center, University of Illinois at Chicago (M/C 135), P.O. Box 6998, Chicago IL 60680 (on Bitnet: U18189 at UICVM), or Prof. Nancy Ide, Dept. of Computer Science, Vassar College, Poughkeepsie NY 12601 (on Bitnet: IDE at VASSAR). - N.I., C.M.S-McQ ------------------------------------------------------------------------------ List of Participants NOTE: Association names are given following the names of their representatives at this meeting. Helen Aguera, National Endowment for the Humanities Robert A. Amsler, Bell Communications Research David T. Barnard, Department of Computing and Information Science, Queen's University, Ontario Lou Burnard, Oxford Text Archive Roy Byrd, IBM Research Nicoletta Calzolari, Istituto di linguistica computazionale, Pisa David Chestnutt (Assoc.
for Documentary Editing, American Historical Assoc.), Department of History, University of South Carolina Yaacov Choueka (Academy of the Hebrew Language), Department of Mathematics and Computer Science, Bar-Ilan University Jacques Dendien, Institut National de la Langue Francaise Paul A. Fortier, Department of Romance Languages, University of Manitoba Thomas Hickey, OCLC Online Computer Library Center Susan Hockey (Association for Literary and Linguistic Computing), Oxford University Computing Service Nancy M. Ide (Association for Computers and the Humanities), Department of Computer Science, Vassar College Stig Johansson, International Computer Archive of Modern English, University of Oslo Randall Jones (Modern Language Association), Humanities Research Computing Center, Brigham Young University Robert Kraft, Center for the Computer Analysis of Texts, University of Pennsylvania Ian Lancashire, Center for Computing in the Humanities, University of Toronto D. Terence Langendoen (Linguistic Society of America), Graduate Center, City University of New York Charles (Jack) Meyers, National Endowment for the Humanities Junichi Nakamura, Department of Electrical Engineering, Kyoto University Wilhelm Ott, Universitaet Tuebingen Eugenio Picchi, Istituto di linguistica computazionale, Pisa Carol Risher (American Association of Publishers), American Association of Publishers, Inc. Jane Rosenberg, National Endowment for the Humanities Jean Schumacher, Centre de traitement e/lectronique de textes, Universite/ catholique de Louvain a` Louvain-la-neuve J. Penny Small (American Philological Association), U.S. Center for the Lexicon Iconographicum Mythologiae Classicae, Rutgers University C.M. Sperberg-McQueen, Computer Center, University of Illinois at Chicago Paul Tombeur, Centre de traitement e/lectronique de textes, Universite/ catholique de Louvain a` Louvain-la-neuve, Belgium Frank Tompa, New Oxford English Dictionary Project, University of Waterloo Donald E. Walker (Association for Computational Linguistics), Bell Communications Research Antonio Zampolli, Istituto di linguistica computazionale, Pisa, Italy [end of message] ========================================================================= Date: 1 December 1987, 16:22:58 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: Dr Abigail Ann Young Re: CD-ROMs & other media; on-going [body of message 26 ll inclusive] (Was that too long, Marshall?) Does anyone have any information on WORM drives? A non-HUMANIST colleague told me he had heard about them at an IBM-sponsored conference and that they were the best thing since sliced bread, basically. I've also heard that a disk for an IBM WORM drive would be capable of being written to only once, which would certainly make such a disk only slightly more useful than a CD-ROM, and considerably less useful than a magnetic tape. I am always suspicious of new devices which will revolutionize my life and save me time, trouble, etc. I think it is because I tended to believe the Popular Science/Mechanics picture of the future when I was a child. But a WORM drive & disk capable of multiple disk writes as well as reads sounds very, very appealing. Abigail Ann Young Research Associate, Records of Early English Drama University of Toronto young at utorepas ========================================================================= Date: 2 December 1987, 00:17:29 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: "James H. 
Coombs" Subject: Sonar; Mac; concordance vs. retrieval (54 lines) I asked about concordance programs for the Mac. Someone sent me the review of Sonar and a couple of others have mentioned it. The review does not say anything about concording texts with Sonar, however. I have never used one of these retrieval programs. I have used WatCon and have written a concordance program for the IBM PC (for multiple versions of the same text). IS Sonar appropriate for generating concordances? concordances that will be printed and distributed? Does it properly handle lines of poetry, for instance? and give columns of lines with locations? I assume that WordCruncher from BYU can do such, since it is a descendent of a concording program (unless there is an equivocation on "concord" here, and please let us all know if there is). I am in the process of designing a retrieval engine and browser for the American Heritage Dictionary. When I think of retrieval programs, I think of inverted indices, hash tables, and the like. "Use this information to go find X and then let's Y it." That, to me, is a typical retrieval action, and the access is typically random. Concording, however, at least in the traditional sense, is sequential and exhaustive. One COULD use a retrieval application to concord a text, but it would be very inefficient and would probably require additional programming anyway. One would have to have a means to call the retrieval engine iteratively for every word in the text as well as the means to format and write the results someplace. Are WordCruncher and Sonar dual applications? In order to index, one has to perform much of the same processing as is required for concording (process sequentially and exhaustively, split words out of lines, stop words, lemmatize?, cross reference (See also xxx)?). Well, some of the routines are the same anyway, at least to the extent that the developer of one type of application would have a start on developing the other. It begins to sound like integrated systems a la Symphony vs. 1-2-3. Does the system that offers both really do both jobs well? Or, first I guess, are there systems that offer both? --Jim Dr. James H. Coombs Software Engineer, Research Institute for Research in Information and Scholarship (IRIS) Brown University jazbo@brownvm.bitnet ========================================================================= Date: 2 December 1987, 12:12:08 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: Bob Kraft Subject: CD-ROM & WORM [88 lines] The recent observations by Abigail Young and Michael Sperberg-McQueen on CD-ROM and WORM technologies call for some comment from the "pro" (and experienced) side. I hope to keep them brief, just to pinpoint some of the issues. Michael's comments seemed to me to miss many crucial points, and did not reflect the attitudes or situation of numerous people with whom I am in regular contact. 1. The difference between CD-ROM and WORM for this discussion is negligible, as Abigail suspected. Right now, WORM drives are more expensive and less tested publicly, but cheaper to produce a single disk. But once you have that single WORM disk, which currently costs about $65, there is no price advantage to making multiple copies (50 copies would cost $3250). With the CD-ROM, it might cost $3000 to master but each additional copy would cost very little (perhaps $ 7 each for 100). Thus it would be much cheaper to make 100 copies of a CD-ROM than 100 copies of a WORM disk at present. 
And the CD-ROM holds more than twice as much as the WORM disks with which we are working. So WORM is fine for limited production or in-house purposes, CD-ROM is better for larger distribution, etc. Neither can be changed once they are mastered, although WORM can be mastered in stages, while CD-ROM is a once for all mastering process. 2. Are people anxiously waiting for data distributed on CD-ROM? In my experience, YES. We have many advance orders for the CCAT CD-ROM, and more inquiries. Ted Brunner can report on the TLG experience. What sorts of people are asking? Obviously, IBYCUS SC owners (about 130 machines) who are set up to use CD-ROM as part of the package; Librarians, who need massive amounts of data in a bibliographically controlled context (static is good, in this setting!); the mass of individual scholars/students who are not in a tape-oriented environment such as Michael describes (his experience is not at all typical, even at the ideal level, of the majority of people with whom I am in contact -- people in small colleges, seminaries, or operating individually, with no access to a real mainframe or effective consultation). 3. What is attractive to these inquirers? Several fairly obvious things. (1) Amount of material available -- e.g. all of Greek literature through the 6th century on the TLG disk! (2) Price of the material (on tape, the TLG data cost over $4000; on CD-ROM, it is about 10% of that) (3) Convenience of storage, access, etc. -- I would rather download from a CD-ROM than from a tape drive, any day. It is the old roll vs codex issue once again (microfilm vs microfiche, etc.). (4) Quality control -- what is on the CD-ROM may have errors, but at least they can be identified and controlled (and corrected in a later release); I don't have to wonder whether my dynamic file has become corrupted (as happens more than I want to admit). (5) Speed of access to large bodies of data -- even if the programs are not yet in place and it will take 20 times as long to search a large CD-ROM file on the IBM than on IBYCUS, it is at least possible to do the search (or to search multiple files, in various configurations), which is extremely difficult in any other manner short of a dedicated mini. I am rambling and apologize. Much more needs to be said, but I need to finish preparing ID tables for the CCAT CD-ROM if it is to be mastered by the end of the year! Perhaps it would not be feasible economically to put the Nibelungenlied on its own CD-ROM, but to have it as a small part of a CD-ROM with all sorts of other texts is what we are talking about! That is not only feasible, but it seems to me highly desirable, IBYCUS or not. And I can still download what I want to edit, or manipulate, etc. I lose none of that capability. But I gain by having the original fixed at hand for comparison, etc. Libraries will rapidly be CD-ROM centered, and that is as it ought to be. Hopefully computer centers will not be bypassed by this exciting and useful development! Bob Kraft ========================================================================= Date: 2 December 1987, 14:41:59 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: Jim Cerny Subject: Summary of responses on KJV Bible for Macintosh (incl. [152 lines] Thanks to everyone who responded to my recent inquiry about the availability of the King James Version of the Bible for the Apple Macintosh. I've tried to acknowledge or quote from all the responses (as of 01-Dec) in the summary that follows. 
====================================================================== John J. Hughes (XB.J24@STANFORD) had the most definitive answer, reflecting no doubt the research for his book "Bits Bytes and Bible Studies". Robin C. Cover (ZRCC1001@SMUVM1) referenced this book and Marshall Gilliland (GILLILAND@SASK) and Tim Seid (ST401742@BROWNVM) mentioned sources that Hughes lists. Hughes wrote: ---------------------------------------------------------------------- There are several companies that sell King James Versions of the Bible for Macintoshes. Here are their names, addresses, and so forth. The first program is reviewed in detail in chapter 3 of BITS, BYTES, & BIBLICAL STUDIES (Zondervan, 1987). THE WORD Processor Bible Research Systems 2013 Wells Branch Parkway, Suite 304 Austin, TX 78728 (512) 251-7541 $199.95 Requires 512K; includes menu-driven concording program CP/M version available for Kaypros. MacBible Encycloware 715 Washington St. Ayden, NC 28513 (919) 746-3589 $169 128K; text files that may be read by MacWrite and Microsoft Word. MacScripture Medina Software P.O. Box 1917 Longwood, FL 32750-1917 (305) 281-1557 $119.95 128K; text files designed to be used with MacWrite. ======================================================================= Marshall Gilliland (GILLILAND@SASK) pointed to a very unexpected source, i.e., one of the DECUS (DEC Users Society) tapes. We are an active VAX/VMS site and we did indeed have the tape. It is on VAX System SIG Symposium tape VAX86D (from the Fall 86 DECUS meeting in San Francisco). In uncompressed form the files take about 9000 VAX disk blocks (roughly 5 MB). It is all in upper case. Presumably could be downloaded to a PC, but don't think I will attempt that! Gilliland wrote, in part: ----------------------------------------------------------------------- If you have VAX equipment there and get DECUS tapes then ask one of your systems people for the copy of the ascii text of the KJ Bible that was on a DECUS tape not too long ago (I think in 1987). Marshall Gilliland English Dept. U. of Saskatchewan ======================================================================= Tim Seid (ST401742@BROWNVM) pointed me to CCAT (Center for Computer Analysis of Texts) and Bob Kraft (KRAFT@PENNDRLN) from CCAT also responded. Bob Kraft also sent me several files about CCAT and its services and I've tacked CCAT's info-file at the end of this summary ... "old hands" may be aware of CCAT's electronic newsletter, ONLINE NOTES, but it was new to me and their info-file tells how to subscribe. Bob Kraft wrote: ----------------------------------------------------------------------- I have not seen my MAC person (Jay Treat) since your inquiry about the KJV arrived, but I am reasonably sure that it is already available from CCAT for the MAC, or will be very soon. We have been distributing the KJV and RSV (along with the Greek and Hebrew texts of the Bible) to IBM types for over a year now, and all these materials will be on our soon to be released CD-ROM. Most of it has been ported to the MAC as well. I will send you an order form and other information separately. Bob Kraft ======================================================================= Ronald de Sousa (DESOUS@UTORONTO) mentioned the possibility of using DIALOG services. 
de Sousa wrote: ----------------------------------------------------------------------- You'll probably get some satisfactory answers, but in the meantime I wonder whether you know that the cheap after-hours service of DIALOG Info Services, called "Knowledge Index", has the King James full text on line, and can be searched using the search options of that service. I seem to recall that for $200 you'd get about 8 hours of search time -- quite enough for a limited project. Of course, the same is available on DIALOG itself, with somewhat more sophisticated options. ======================================================================= Roger Hare (R.J.HARE@EDINBURGH.AC.UK) responded from JANET that Catspaw Inc. has the King James Bible. They specialize in supporting PC-based implementations of SNOBOL and related products, as I recall. Roger Hare wrote: ----------------------------------------------------------------------- Catspaw do a version of the King James Bible for 50 dollars. My catalogue doesn't say what machine it's for, but if you have access to a mainframe perhaps you could get it onto your Macintosh via file transfers? Their address is: Catspaw Inc. PO Box 1123 Salida, Colorado 81201 USA. Roger Hare. ======================================================================= Finally, Chuck Bush (ECHUCK@BYUADMIN) mentioned that they have the King James Bible at the Humanities Research Center at Brigham Young University and I presume he could supply more details. Chuck Bush wrote: ----------------------------------------------------------------------- At BYU we do have the text of the King James Bible in machine readable form. The original data is on a mainframe, but we have downloaded it to PC disks etc. for those who have ordered it in other forms. I have a copy of it on a Macintosh Bernoulli cartridge from which it would be relatively easy to copy it to some other Macintosh medium--even floppies. However, this is just the TEXT. There isn't any software to access it conveniently. Sonar is the only text retrieval software I know of for the Macintosh and I don't think it would be very satisfactory. For one thing, it couldn't give you chapter and verse references. Chuck Bush Humanities Research Center Brigham Young University ======================================================================= Interested HUMANISTs should also consult the guide to external services of the Center for Computer Analysis of Texts (CCAT), Univ. of Pennsylvania, available from Jack Abercrombie (JACKA@PENNDRLS.BITNET) ========================================================================= Date: 2 December 1987, 20:29:00 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Vox populi (46 lines) Dear Colleagues: My thanks to the several people who offered their views on the conversational style of HUMANIST. The majority of speakers have clearly voiced a preference for a somewhat more open manner of conversational exchange than has been the rule so far. For what it's worth, I welcome this change without reservation, since HUMANIST is by design ruled chiefly by its members rather than by its editor. Until an absolutely foolproof method of screening out junk mail is found, I will continue to have all submissions to HUMANIST sent first to me and will forward the ones of human origin to the membership. This means very little work for a very large improvement in the quality of the environment.
One of the interesting (but, I guess, not surprising) characteristics of HUMANIST is the number of members who never say anything -- yet continue to put up with the large volume of mail. I imply no criticism whatsoever, for there are many noble and practical reasons for remaining silent. Nevertheless, I suspect that some members may occasionally have something to say but wonder if what they have to say is worthy. In general the advice I follow is, say it and see what happens. One possibility for the diffident is to send in a contribution with a note attached asking my advice, for whatever it's worth. Please let me know if anything about HUMANIST bothers you or otherwise seems to need improvement. The ListServ software (written and maintained on a voluntary basis by a remarkable person who lives in Paris) we cannot fundamentally alter. It has certain characteristics that some may consider flaws but that seem to me merely features to be exploited in the best possible way. Locally HUMANIST is supported by my Centre and by the good will of our Computing Services, i.e., by two busy people. There's not much that can be done given these resources, but some changes can be made without much effort -- like the screening of junk mail. In short, lead on! Yours, W.M. _________________________________________________________________________ Dr. Willard McCarty / Centre for Computing in the Humanities University of Toronto / 14th floor, Robarts Library / 130 St. George St. Toronto, Canada M5S 1A5 / (416) 978-4238 / mccarty@utorepas.bitnet ========================================================================= Date: 2 December 1987, 22:53:10 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: Sebastian Rahtz Here's one for the eager punters; a colleague of mine wants to study the New Kingdom El-Amarna literature (Egypt, mid 14th C BC). Anybody care to say if someone has already typed in such stuff onto the computer? apologies if it's obvious... sebastian rahtz computer science university southampton uk ========================================================================= Date: 2 December 1987, 23:23:34 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributor: "Michael Sperberg-McQueen" Subject: CD ROMs, mainframes Many thanks to Bob Kraft for his cogent remarks about CD ROMs. I seem to have given a rather scrooge-like impression in my most recent posting about CD ROMs and PCs, which does not reflect my positive opinion of PCs. Yes, CD ROMs are ideal for certain kinds of data distribution, especially for (a) stable data and (b) large numbers of recipients. For humanistic research applications with those characteristics, they are also obviously good ideas. WORM disks, or better yet erasable mass storage devices, would make many of the same advantages available for non-static data and small numbers of recipients. But neither description fits all research fields. I am less convinced that institutional support for faculty use of mainframes and microcomputers is untypical in North America. This is an empirical question, and I would like to put it up for discussion: what is the situation at the sites represented on HUMANIST with regard to: (a) support for humanities computing formally provided by the institution via centralized or specialized facilities, (b) faculty-student computing on mainframes or minis, (c) institutional support for microcomputing, (d) institutional support for mainframe-micro data transfer.
It is possible that Bob Kraft is right and my experience is untypical. But it seems also possible that Penn and CCAT get so much business from people without mainframe access because those who do have local computer centers get their help locally. It would be useful, I think, for all of us if we could get some idea of the facts in this area. The ACH Special Interest Group for Humanities Computing Resources (the sponsor of HUMANIST) did plan once to distribute a questionnaire to gather this information but the final questionnaire design seems to have been delayed, so let's caucus informally now. Michael Sperberg-McQueen, University of Illinois at Chicago ========================================================================= Date: 2 December 1987, 23:34:33 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Electronic OED -- for the blind? Contributor: Norman Zacour I have a blind, computerized friend, a professor of English and a professional writer, who got very excited when I passed on to him the recent messages from HUMANISTS about plans for making the OED available in electronic form. He had visions - no joke intended - of consulting it through his speech synthesizer on his PC. His enthusiasm was dampened by the planned use of colour to display certain types of information. Does anyone happen to know if the OED has any plans for handicapped users? I suppose that there are still architects who design monumental buildings without ramps for wheelchairs, but perhaps... ========================================================================= Date: 3 December 1987, 09:55:14 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion Comments: Contributor: Robin C. Cover From: MCCARTY@UTOREPAS Subject: Al-Amarna Correspondence (in MRT format) [96 lines] Sebastian Rahtz asked whether the El-Amarna letters exist in digitized format somewhere. I doubt whether many HUMANISTS are interested in west-semitized Akkadian texts, but this query (and its answer) provides an opportunity to tell a sad and familiar tale...and perhaps an opportunity for someone to come forward with better news than I have to tell... The good news (for our assyriologist friend in the UK) is that Knudtzon's edition of the El-Amarna letters is in machine-readable format. I have used the massive printed "concordances" (two tomes, each about 7 inches thick). These printouts originated at UCLA, so the best bet is to contact Giorgio Buccellati at the department of Near Eastern Studies, who might make tapes or diskettes available. UCLA has a growing corpus of MRT material for the ancient Near East, and in time it will be available publically as part of Buccellati's hypermedia project for Mesopotamia (Computer-Aided Analysis of Mesopotamian Materials); some materials are currently available from Undena, and Buccellati passed out sample diskettes of digitized Eblaite texts at AOS. The sad tale I mentioned earlier is as follows: Du Cerf (a Paris publisher) recently released a superb volume in its series Litteratures anciennes du Proche-Orient on the El-Amarna letters. Its author/translator is William Moran of Harvard University, recognized as a (probably THE) leading Amarna scholar, who has been putting together this polished volume over the past 30-odd years. 
His translations are based upon extensive museum collations of the tablets, together with restorations that can be made only by someone so familiar with the "idioms" of international diplomacy (in the 14th century B.C.E.) as Professor Moran is. So, the MRT edition we *REALLY* want is Moran's, not the 1915 edition of Knudtzon. But you won't find it published on diskette with this Du Cerf volume (which does not even have transliterated original text). According to the publishers, it would not be cost-effective to publish the original text on paper, and as for an MRT edition of the text....well... Shortsightedness like this has to stop, but who is responsible for "stopping it?" A single individual (as in this case, Moran) probably can do very little to force publishers to change their ways. But how about collective bargaining....we publish such scholarly materials ONLY with publishers that are sensitive about the future of scholarship, and about the precious treasure we have in ancient literature. This means placing premium value on original texts in machine-readable form -- only thus are they truly useful and accessible to modern scholarship -- and making these texts available in the public domain. I suspect that this problem is more acute for orientalists than for classicists and other humanities-literary subspecialty areas; we have special orthographies and printing problems which are expensive and demanding. But my suggestion is that we must encourage and demand higher standards of cooperation from publishers such that valuable (priceless!) human efforts are not lost on a Macintosh diskette after it passes from the departmental secretary or word-processing pool to the publisher. Does anyone else share this point of view? Am I too idealistic? While I am in a lament mode, I might as well refer to another problem that needs attention: the problem of coding standards. There are several efforts underway internationally to "encode" ancient Near Eastern texts in transliteration (Toronto - RIM; UCLA; Rome; Helsinki; etc.), but to my knowledge there are no agreed-upon standards. In the case of purely alphabetic scripts, the problem is frustrating but not fatal, since we can use consistent-changes programs to standardize the data for archiving. In the case of syllabic (logographic; hieroglyphic) scripts -- Akkadian, Sumerian, Hittite, Elamite, Egyptian -- the plethora of transliteration schemes is more problematic. No-one sends this kind of data with an SGML prologue, so the best we can hope for is that the encoding is consistent and that we can unravel the format codes. If anyone knows about efforts to introduce standards for transliteration and format-coding, would you kindly let me know? I understand that the committee for encoding standards (Nancy Ide; Michael Sperberg-McQueen) recently funded by NEH will not initially address the needs of orientalists. If there are other orientalists "out there" on the HUMANIST reader list -- should we organize ourselves? Apologies to all if this is arcane, recondite or just downright boring. I'd like to know if anyone out there shares some of my frustrations, or sees solutions. Professor Robin C.
Cover 3909 Swiss Avenue Dallas, TX 75204 (214) 296-1783 ========================================================================= Date: 3 December 1987, 09:58:21 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion Comments: Contributor: Brendan O'Flaherty From: MCCARTY@UTOREPAS Subject: E-mail to Australia Can anyone tell me if e-mail to the Antipodes (i.e. Australia) has a charge? and if so who pays---the sender if outside Australia or the Recipient? Thanks in advance. ========================================================================= Date: 3 December 1987, 13:36:52 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: The Thesaurus Linguae Graecae (TLG) on CD-ROM The following has been contributed by Theodore Brunner, Director of the TLG Project, from a memo circulated to all TLG customers. Anyone wishing to arrange for a license agreement should contact Professor Brunner, Thesaurus Linguae Graecae, University of California at Irvine, Irvine, CA 92717 U.S.A., telephone: (714) 856-7031, e-mail: TLG@UCIVMSA.bitnet. The license per CD-ROM, including a copy of the printed TLG Canon, is not expensive: initial registration fee (plus first year fee) is $200 to institutions and $120 to individuals; annual fee $100 to institutions, $60 to individuals; optional one-time payment for 5 years $500 to institutions, $300 to individuals. (All prices are in US $.) _________________________________________________________________ TLG CD-ROM CUSTOMERS: We have been receiving numerous questions related to TLG CD ROM dissemination plans and policies; here is miscellaneous information on these subjects: 1. To date, the TLG has produced two CD ROMs, disk "A" and disk "B". Disk "A" contains approximately 27 million words of TLG text, as well as an electronic version of the TLG Canon. Disk "B" contains the same 27 million words of text, the TLG Canon, and an Index to the TLG texts on the CD ROM. Disk "A" also contains miscellaneous non-TLG materials, including some Latin, Coptic, and Hebrew texts, some epigraphical materials, as well as portions of the Duke Data Bank of Documentary Papyri. The non-TLG materials were included on TLG CD ROM "A" for one reason only: this disk was produced (as was CD ROM "B") primarily for experimental purposes, i.e., to aid in the development of software resources designed to enhance utilization of the (relatively new) CD ROM data storage medium. Neither disk "A" nor disk "B" reflects the High Sierra format standard (established after both of these CD ROMs were produced). 2. In short order, the TLG will release a new CD ROM, disk "C". This disk will contain approximately 41.5 million words of TLG text, an index to this text material, and the TLG Canon. Individuals and institutions already holding license to "A" or "B" disks are entitled to receive "C" disks free of charge. This (as provided for in the license agreement governing use of TLG ROMs) will be on an exchange basis, i.e., disks previously issued by the TLG must be returned to the TLG prior to the issuance of a "C" disk. TLG LICENSEES SHOULD NOT RETURN THEIR "A" OR "B" DISKS UNTIL DISK "C" IS OFFICIALLY RELEASED. [Notice will appear on HUMANIST when disk "C" is ready.] 3. Questions have been raised about the absence of non-TLG material on the "C" disk. The TLG controls and licenses only its own materials, and license agreements previously executed pertain to the TLG materials on the disks only.
Current TLG CD ROM licensees may, of course, continue to use their ("A" or "B") disks throughout the course of their license period; they will not be issued "C" disks, however, until they have returned their earlier CD ROM versions to the TLG. It is the case, however, that the Packard Humanities Institute (PHI) will be releasing its own CD ROM in the very near future; this disk will contain Latin, Coptic, Hebrew, and epigraphical materials, as well as a significant portion of the Duke papyrological data bank. It can be assumed that individuals and institutions desirous of these materials can make arrangements with PHI to gain access to them on a PHI disk. Further information on this subject can be obtained by contacting John Gleason, Packard Humanities Institute, P.O. Box 1330, Los Altos, CA 94022 U.S.A. 4. We have received numerous requests for technical documentation related to the forthcoming TLG CD ROM "C". The internal organization of the text files and of the I.D. table files will be identical to the organization of these files on TLG CD ROM "A". The file directory and author table will be reorganized to reflect the High Sierra standard. More detailed documentation is currently being prepared and should be ready for distribution in the near future. Theodore F. Brunner, Director November 8, 1987 _________________________________________________________________ ========================================================================= Date: 3 December 1987, 15:00:56 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion Comments: Contributor: "Michael Sperberg-McQueen" From: MCCARTY@UTOREPAS Subject: Enlightening the publishers, encoding Semitic (65 lines) Three cheers for Robin Cover's idea of group pressure to bring publishers to their senses regarding the preservation and distribution of machine-readable materials. Some publishers, to their credit, are already alert to the issues involved--or so say people who should know. But there are still an awful lot of them out there who behave the way Renaissance printers did with Carolingian manuscripts: mark it up, print it, and throw it out. Anything we can do to preserve the fruits of scholarly labors, we should do. It would also be useful to have a better developed system of text archives in North America -- either a network of regional or discipline-based archives, or one central archive that would take anything (the way Oxford does). The latter would be appealing because fewer texts might fall through cracks in the system, but specialized collections would remain important because they can do more intensive work on their holdings, the way Penn's CCAT does. A central North American text archive, acting in concert with the European archives, might also be in a position to help exert the kind of group pressure on publishers that Robin Cover suggests. Making the publishers' texts usable, by documenting as far as possible the usual systems of typesetting codes found in the publishing industry, is one goal of the ACH/ACL/ALLC initiative for text-encoding guidelines. (That goal is not wholly explicit in the final document I posted here a couple of days ago, but it was discussed at length during the planning meeting at Vassar and clearly is important to a lot of people.) The consensus of the planners at Vassar was also that transliteration practices, and conventions for the encoding of character sets, should at least be documented as far as possible in the guidelines.
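One small illustration of what documenting a convention buys: the ASCII accent notation used in the encoding-guidelines report earlier in this digest (a letter followed by "/" for an acute accent, by "`" for a grave) is exactly the kind of scheme that, once written down, can be converted mechanically. A minimal sketch in Python follows; the mapping table covers only the letters that occur in the French passages above and is illustrative, not a proposed standard.

# Expand the digest's ASCII accent notation (letter followed by "/" for acute,
# "`" for grave) into accented characters.  The mapping is illustrative only.
ACCENTS = {
    "/": {"e": "é", "E": "É"},                                          # acute
    "`": {"a": "à", "A": "À", "e": "è", "E": "È", "u": "ù", "U": "Ù"},  # grave
}

def expand_accents(text: str) -> str:
    """Replace each 'letter + accent mark' pair with a single accented character."""
    out = []
    for ch in text:
        if ch in ACCENTS and out and out[-1] in ACCENTS[ch]:
            out[-1] = ACCENTS[ch][out[-1]]
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    print(expand_accents("Re/daction des directives pour le codage des textes de/ja` existants"))
    # prints: Rédaction des directives pour le codage des textes déjà existants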
Many participants were leery of making specific recommendations for the representation of specific characters, since local hardware features and requirements can vary so widely. Nevertheless, the experts present agreed that it would not be insuperably difficult to provide adequate documentation for the encoding of scripts which, like Semitic scripts, provide special challenges to most commonly available hardware. That means that the guidelines can and should contain full information on practices for encoding texts of interest to Orientalists--if the Orientalists will document their existing practices. If they can also agree on common recommendations for future work, that consensus can and should also be documented. The same goes for any and all other specialized interests. These guidelines will belong to the humanities computing community as a whole, and I hope the community will work together to make them as complete and useful as we can. Again, I reiterate the invitation: anyone interested in helping formulate the guidelines, either in general or with respect to some specific question (e.g. the encoding of Akkadian, or the encoding of numismatic materials, or the encoding of manuscript variants, or the prosodic transcription of oral texts, or the encoding of hypertext materials, or ...), should please contact Nancy Ide or myself. This invitation will be periodically renewed, as details for the formal arrangement of the drafting committees are set, but if you let us know now, we will have a better idea of how much interest there is, and what kinds of special problems are on people's minds. Michael Sperberg-McQueen, University of Illinois at Chicago P.S. The opinions here expressed are as always mine, not necessarily those of my employer, or the ACH, or the guidelines steering committee. ========================================================================= Date: 3 December 1987, 19:16:39 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion Comments: Contributor: David Nash From: MCCARTY@UTOREPAS Subject: E-mail to Australia (24 lines) E-mail involving ACSNet (Australia, through the international gateways, or even domestically between sites I think) has a charge for the Australian end (whether sender or receiver). It was something like 10c/message plus 2c/line about a year ago. Apparently many institutions do not (yet?) pass on the charge to individual users. The official position could presumably be got from postmaster@munnari.oz, i.e. David Nash Center for Cognitive Science 20B-225 MIT Cambridge MA 02139 ========================================================================= Date: 3 December 1987, 19:20:14 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion Comments: Contributor: Laine Ruus From: MCCARTY@UTOREPAS Subject: Archives (75 lines) In response to Prof. Cover's impassioned plea, I can only say that it IS possible, with some concerted effort to force publishers to change their ways. The American Sociological Association has recently, as of Sept 1, 1987, in fact, begun to require of all periodicals published under their aegis, that any computer readable files, (both data and software) BE CITED in the bibliography. There is an effort under way now to convince other academic publishers to follow suit. There are a number of reasons for citation of computer files: (a) computer-readable files are intellectual property in their own right, quite as much as publications in other media, eg on paper, film, audio-tape, canvas, etc. 
(This has been recognized by that most conservative institution, the American Library Association, since the late 1970s.) The authors (properly called 'principal investigators'), producers, publishers, editors, and translators, of computer-readable files deserve for their labours the same acknowledgement and recognition as do the authors, composers, etc. of intellectual property in more traditional media. (b) the citation of source materials in the bibliographies of publications acknowledges the source materials used in the research process, thus enabling one's peers to follow the same line of reasoning, using the same source materials, to (hopefully) come to the same conclusions, thus corroborating our initial reasoning - ie the peer review process. (c) once computer-readable files are cited in bibliographies, they will get picked up in the citation indices, and thus eventually come to the attention of tenure committees. Thus individual 'authors' of these things will in time receive their due academic brownie-points. But citing computer-readable files is not enough. There must also be a mechanism for preserving them for posterity and making them available to others for secondary analysis. Researchers are reluctant to make 'their' files available to others for fear that they will not receive their due acknowledgement (the polite reason). Mandatory citation of computer files in publications should help reduce this fear. Many researchers are not aware that there in fact exists a network of local data archives/data libraries in academic institutions throughout the United States and Canada, as well as a well developed system of national data archives in Europe, most recently in Hungary, Israel, and the USSR. Granted, these data archives primarily concentrate on 'social science' data files, primarily because that is the field from which the initial impetus for their creation came. However, this orientation is not cast in stone. And most of these data archives/libraries could, with appropriate overtures, be convinced that there are other user communities that also need their services. The social scientists just happen to have been among the earliest and most vociferous. The point being that there is already an institutional framework, staffed by knowledgeable and experienced people who with very little effort could provide the network of text archives that humanists seem to want - all they need is a little prodding.
------------------------------------------------------------ Laine Ruus, University of British Columbia Data Library userDLDB@ubcmtsg.bitnet ========================================================================= Date: 3 December 1987, 19:22:30 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion Comments: Contributor: STEPHEN@VAX.OXFORD.AC.UK From: MCCARTY@UTOREPAS Subject: E-mail to Australia There is a relay at ULCC (UK) called EAN which links with ACSnet - the fact that you do not register before submitting suggests it is 'free': you may be able to learn further from mailing an enquiry to laision@uk.ac.ean-relay EAN can also link you to other European sites - maybe to addresses 'missing' from EARN stephen@uk.ac.oxford.vax ========================================================================= Date: 4 December 1987, 13:02:56 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributed by Bob Kraft Subject: CD-ROMs Just to supplement Ted Brunner's information on the TLG CD-ROM, regarding the non-TLG materials such as were included on TLG disk "A" -- the present plan is for the Packard Humanities Institute (PHI) jointly with the Center for Computer Analysis of Texts (CCAT) at Penn to produce an "experimental" CD-ROM at the heart of which will be various Latin texts (being prepared at PHI), Greek Papyri (Duke) and Inscriptions (Cornell, Princeton Institute for Advanced Study), and a variety of biblical and related materials in various languages (Hebrew, Greek, Latin, Coptic, Syriac, Aramaic, Armenian) as well as sample files from various other sources and projects (e.g. Dante Commentary project, Milton Latin project, Kierkegaard in Danish, Arabic poetry, some word lists, etc.). I call this disk a "Sampler," and it is scheduled to be ready for distribution by the end of this month (December). Again, the aim is to give scholars, software developers, etc., a body of consistently formatted (more or less!) materials on which to work in various directions and at little cost. There will be a notice on HUMANIST when the PHI/CCAT joint CD-ROM "Sampler" is ready for distribution! Bob Kraft for CCAT ========================================================================= Date: 4 December 1987, 13:10:19 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributed by "James H. Coombs" Subject: Enlightening the publishers, encoding Semitic (65 lines) Michael Sperberg-McQueen has suggested that we need a text archive in North America. Is that a generally felt need? What could a text archive here offer that Oxford does not offer? Certainly, shipping would be faster and cheaper, but is there something more substantial? Or are there real hardships now? Or, could our needs be addressed by some adjustments in the services that Oxford provides---such that we might better discuss our needs with Oxford instead of duplicating their efforts? If we DO need an archive in North America, who should institute and manage it? What is the proper sort of organization? And what's in it for them? Will it be a costly burden? Or are we willing to pay for materials in order to support such a facility? Would it be commercial or non-profit? --Jim Dr. James H.
Coombs Software Engineer, Research Institute for Research in Information and Scholarship (IRIS) Brown University jazbo@brownvm.bitnet ========================================================================= Date: 4 December 1987, 13:17:33 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributed by James H. Coombs Subject: ACH text markup Some thoughts on guidelines for text markup, in response to Michael Sperberg-McQueen's note. 1) Markup must be descriptive. 2) Delimiters should be '<' and '>' in conformance with the default of the new SGML standard. 3) Markup/tag attributes should be allowed, and attribute names should be descriptive. 4) There should be no attempt at establishing a "closed" tag set. The current AAP SGML application allows for definition of new tags, but it does not support such definition in a practical way. The consequence is that people will use "list items," for example, when they should be using "line of poetry." Within these guidelines, it can only be healthy to provide a list of tags that people should choose from when tagging certain entities. The point of this is that we cannot predict what textual elements will be of significance for what researchers. We have to allow for the discovery of textual elements that no one has categorized previously. At the same time, there is no point in having 30 different tags for "line of poetry." The guidelines should make clear that DESCRIPTION is paramount and that the use of particular tags is secondary. 5) In so far as possible, there should be requirements for minimal tagging. It would be a mistake to fail to tag "verse paragraphs" and "book" in *Paradise Lost*, for example, and any version that does not provide such tags must be considered inadequate and, ultimately, rejected. 6) There can be no limit placed on "maximal" tagging. If a researcher needs every word tagged, we must allow for this. It is a trivial matter to ignore or strip out such tagging. Researchers with such needs cannot, at least for now, reasonably expect that others will provide such exhaustive tagging. Putting (5) and (6) together, we have a principle of base-level tagging with as much additional information as the original researchers care to provide. Where there are common needs that may not be shared by the original researcher, it may still be appropriate to require that those common needs be met. For example, the original researcher may not need to know about verse paragraphs, but we should still require that they be appropriately tagged. 7) Referential markup should be used in place of "special" characters, such as accented characters. If a particular configuration supports an acute accent, for example, in hardware, the researcher may take advantage of those facilities. When checking the document into an archive or passing it on to others, however, the acute accent must be translated to "&aacute;" (or whatever the SGML standard specifies---don't have my copy at hand). This is off the top of my head, but enough for now. I have other ideas on this stuff, but they can come out if discussion ensues. I am interested in the project, but I don't have the time or money to travel to meetings right now. I also get the feeling from the preliminary document that you posted that people are re-inventing SGML. We already have, in SGML, a metalanguage for generating descriptive markup languages.
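[Editorial note: the short Python sketch below illustrates the kind of descriptive, base-level markup and referential markup for special characters that Coombs describes in points 1-7 above. The tag names (poem, lp) and the entity table are invented for illustration only; they are not part of his message or of any proposed guidelines. -- ed.]

# Sketch: descriptive markup of verse, with referential (entity) markup
# substituted for special characters before a text is archived or exchanged.
# Tag and entity names are hypothetical examples.

ENTITIES = {
    "\u00e1": "&aacute;",   # a with acute accent
    "\u00e9": "&eacute;",   # e with acute accent
}

def refer(text):
    # Replace special characters with referential (entity) markup.
    for char, entity in ENTITIES.items():
        text = text.replace(char, entity)
    return text

def mark_poem(lines):
    # Wrap each line of poetry in a descriptive tag ("line of poetry",
    # not "list item"), numbering the lines with an attribute.
    tagged = ["<poem>"]
    for n, line in enumerate(lines, start=1):
        tagged.append('  <lp n="%d">%s</lp>' % (n, refer(line)))
    tagged.append("</poem>")
    return "\n".join(tagged)

if __name__ == "__main__":
    sample = ["Of Man's first disobedience, and the fruit",
              "Of that forbidden tree, whose mortal taste"]
    print(mark_poem(sample))

[Running the sketch prints the two sample lines wrapped in descriptive tags, with any accented characters rendered as entity references. -- ed.]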
I don't think that we need Document Type Definitions right now, but even they might turn out to be useful once SGML is established and SGML-support tools become widespread. I haven't provided any defense of descriptive markup or SGML here. We discuss the advantages of these systems in "Markup Systems and the Future of Scholarly Text Processing," *Communications of the ACM*, November 1987---written with Allen H. Renear and Steven J. DeRose. Interested in any and all comments! --Jim Dr. James H. Coombs Software Engineer, Research Institute for Research in Information and Scholarship (IRIS) Brown University jazbo@brownvm.bitnet ========================================================================= Date: 4 December 1987, 16:03:15 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: "International Educational Computing" POSSIBLE COURSE 3551 - SUMMER, 1988 - R. G. RAGSDALE The European Conference on Computers in Education is being held in Lausanne, Switzerland, July 24-29, 1988. When the World Conference on Computers in Education was held there in 1981, a substantial number of OISE students attended, some as a portion of a course offered by Bob McLean. I propose to offer a course, 3551 - International Educational Computing: An Interaction of Values and Technology, which would take place in Switzerland, around the dates of the conference. Permission to offer the course formally depends on several factors, including the number of students likely to attend. Plans are incomplete at this time, but a projection of the plans indicates the following format, assuming that all necessary arrangements for housing, classroom, etc., can be made. The course participants will meet together July 18-22 to study previous research and theory on values and technology, methods for evaluating the effects of technology, and case studies in business and education of technology-value conflicts. The daily schedule will have more formal sessions (lecture, seminar) in the mornings and less formal sessions (group discussions) in the early evening, with afternoons free for individual study or other activities (scheduled class time for each day will be four hours, probably two and a half in the morning, one and a half in the evening). During this week, participants will select and prepare for the issue(s) they plan to study during the conference. At the conference, each participant will focus on one or more topics, such as a particular age range, subject matter area, or type of computer application. They will collect material from the formal sessions, but also from informal interviews with others attending the conference, both presenters and those who are only attending. August 1 is a Swiss national holiday (which all course participants should enjoy), so the remaining sessions will take place August 2-5, following the same schedule as the first week. During this time the results of the previous week's activities will be presented and group feedback will be obtained. Final papers will be due in mid-September. Preliminary arrangements for accommodation and classroom space have been made at Aiglon College, an English boarding school in Chesieres, Switzerland, about one hour from Lausanne by train and bus. Room rates include the "taxe de sejour" which gives access to the recreational facilities of Villars, such as the swimming pool, ice skating, etc.
Estimated Cost: Based on 1987 prices, the airfare to Geneva is $927, room and board is 860SF (Swiss Francs) for 20 days, and the conference registration is 280SF (higher after January 31). At current exchange rates, these items total almost $2,000. A better estimate would include ground transportation, other likely expenses (chocolate, etc.), and possible price increases. It seems extremely unlikely that necessary expenses would exceed $2,500. Anyone who is interested in participating in this course should indicate this to me in writing (including, if possible, your "estimate of certainty"). ========================================================================= Date: 6 December 1987, 11:02:37 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributed by Nancy Ide Subject: TEXT MARK-UP (73 lines) I recently responded to Jim Coombs' remarks concerning the principles developed at Poughkeepsie as a basis for the development of a standard for encoding machine-readable texts. He suggested that we make our discussion "public," in the spirit of recent remarks on HUMANIST, and so I will briefly describe what has been said and put forth my reply. I indicated to Jim that much of what he says is very much in the spirit of the discussions at Poughkeepsie among the 31 participants. This should be made clearer in the minutes of the meeting, which Lou Burnard has drawn up and which will be available from him or me in a few days. Especially, we intend to make the standard extensible to accommodate the unforeseen needs of individual projects. I also indicated that the standard will *recommend* a minimum set of tags for texts, which is stated in the principles under number 5, I believe. We had a lively discussion on this topic (actually, all of the discussions were very lively!) at the Poughkeepsie meeting, with some disagreement about specifying a minimum. This is why *recommend* is in emphasis. The feeling at the meeting was that we can *require* nothing, but we can do our best to "guide the perplexed" and provide some idea of what it makes sense to encode regardless of how the text is originally intended to be used. I should point out here that among participants in the Poughkeepsie meeting, there were two clear perspectives on the whole issue of encoding texts: one saw most encoding as a future endeavor, and the other was focused on texts already encoded. One's opinion concerning whether most texts have been encoded already or have yet to be encoded obviously affects opinion on the importance of specifying a minimum set of tags for encoded texts. Jim responded to me suggesting that we could refuse to accept texts that had been encoded without the "minimum" tags we might expect. He made all of the excellent arguments for insisting that certain tags be included *anytime* a text is encoded. But the problem here is that I am not sure who the "we" who is to do this refusing actually is. If someone does not provide the minimum tags but has encoded the collected works of some obscure author I am interested in, will I refuse to accept the text? If I am an archive, should I refuse to take the text--that is, is it better to have an inadequately tagged text or none at all? Admittedly, in some cases it may be better to start from scratch and re-enter a text, if the existing version is pitifully done. But most of the time it will be easier to go in and mark whatever I need to mark in the existing version than to re-enter the text entirely.
Similarly, we cannot expect archives to ensure that their texts contain a minimum tag set. This was a point of considerable concern to the keepers of archives present at the meeting, and led to the final agreement that only the tags that are present (whatever they may be) in a text that is distributed by an archive will conform to the standard. This requirement in itself will necessitate the writing of programs to perform translation to the new scheme, another topic addressed at some length and for which there seems to be support. However, note that the principles indicate that texts now contained in the archive need not be converted retrospectively. Naturally, although this is not required, we hope that it will occur in many cases. So, the guidelines that will be developed will recommend a minimum set of tags---especially, for those things that are easily encoded when the source text is at hand and which are also obviously of use in most types of analysis. However, it does not appear to me that it is reasonable to require such tagging. We can only hope that the recommendation is enough to inspire most researchers to provide the minimum set of tags when they encode new texts. Nancy M. Ide ide@vassar.bitnet ========================================================================= Date: 6 December 1987, 11:10:43 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributed by Nancy Ide Subject: more on mark-up (34 lines) In my earlier message I neglected to summarize my reply to Jim Coombs concerning SGML. We have every expectation that the standard we devise will be an application of SGML, but until we know fully our needs it is not prudent to commit ourselves to SGML. We know, for instance, that while it is possible to define multiple parallel hierarchies in SGML it is not entirely straightforward, and such parallel hierarchies are likely to be used extensively in encoding machine-readable texts intended for literary, linguistic, and historical analysis. We hope that in any event the standard will be compatible with SGML, which, as Jim points out, is bound to become widely accepted and used. Also, Jim had some concern about our defining a meta-language, since SGML (the abstract syntax) is in fact a meta-language for describing a mark-up scheme. The concrete syntax of SGML is one mark-up scheme described by this abstract syntax. However, our goal is to provide a meta-language in which *all* existing mark-up schemes can be described (which may prove to be impossible), and it seems to us that the abstract syntax of SGML is inadequate for this task. The abstract syntax of SGML was not intended for this purpose, it should be noted. Nancy M. Ide ide@vassar.bitnet ========================================================================= Date: 6 December 1987, 11:15:37 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributed by C. S. Hunter Subject: Use of electronic communications (29 lines) Willard notes the high percentage of "silent participants" on HUMANIST. My experience with computer conferencing systems makes his note not at all surprising. At the University of Guelph we have had our CoSy conferencing system available free of charge to all faculty for some years now. Only about 40 % of the faculty actually took us up on the offer of a free account on the system. Of that 40 %, only 25 % (or less) actively use the system more than once a week. The ratio of active to passive participants on the system is something like 1 : 9.
The same is roughly true on the student system, where only about 10 % of the registered users are actual active participants. We are now studying the phenomenon to determine what factors contribute to the individual use or non-use of computer-mediated communication among academics. C. Stuart Hunter, University of Guelph cshunter@uoguelph ========================================================================= Date: 6 December 1987, 11:41:45 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: E-mail to Australia Contributed by Emmanuel Tov IN REPLY TO THE QUESTION OF BRENDAN O'FLAHERTY (3 DEC) I CAN TELL YOU THAT MAIL FROM SYDNEY (MACQUARIE UNIV.) TO ISRAEL AND EUROPE AND THE U.S. IS FREE AS WELL AS REVERSE MAIL. EMANUEL TOV ========================================================================= Date: 6 December 1987, 16:58:56 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributed by "James H. Coombs" Subject: Text encoding [In reply to Nancy Ide's points about SGML and related matters. The inset paragraphs quote from her messages. -- ed.] We have every expectation that the standard we devise will be an application of SGML, but until we know fully our needs it is not prudent to commit ourselves to SGML. A minor philosophical point, I guess: I don't think that we CAN know our needs fully. We need standards that accommodate needs that cannot be predicted today. The practical consequence of this observation, which I'm sure Nancy would agree with, is that one should seek a "productive" system instead of a system that satisfies everything on a list, and one should not spend a lot of time developing the list. We know, for instance, that while it is possible to define multiple parallel hierarchies in SGML it is not entirely straightforward, and such parallel hierarchies are likely to be used extensively in encoding machine-readable texts intended for literary, linguistic, and historical analysis. What are "multiple parallel hierarchies"? I can guess, but I want to be sure that I understand the problem. In most documents, we have, for example, pragmatic and syntactic hierarchies. One has no difficulty marking up documents for both at the same time (although one does not normally mark up the latter descriptively). Pragmatically, we have things like [CHAPTER [SECTION [PARAGRAPH] [PARAGRAPH]]]. Syntactically, we might have [S [NP] [VP [NP]]]. So far as I know, there are no difficulties in marking up both types of hierarchies. One could argue that we really have a single hierarchy here, but, conceptually at least, we have two different domains: pragmatics and syntax. Well, this distinction is bound to be controversial, to say the least! This is probably the wrong list for a discussion about syntax vs. pragmatics, etc. I can try other examples, but I'm still guessing. And I'm still wondering what the difficulty is in encoding them under SGML. However, our goal is to provide a meta-language in which *all* existing mark-up schemes can be described (which may prove to be impossible), and it seems to us that the abstract syntax of SGML is inadequate for this task. What is the practical value of a metalanguage that generates all markup languages? I would think that it would be so abstract as to be of no value. I suspect that this is part of the goal of salvaging work that has been inadequately coded. I believe that we will be better off if we worry less about the past and plan more for the future.
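[Editorial note: a minimal Python sketch of the "parallel hierarchies" example above, not part of the original exchange. It shows one passage carrying both a pragmatic and a syntactic set of tags, and how a reader can keep one hierarchy and discard the other; the tag names are hypothetical. -- ed.]

# Sketch: one text marked up concurrently for a pragmatic hierarchy
# (paragraph) and a syntactic hierarchy (s, np, vp).  Either set of
# tags can be stripped without disturbing the other.

import re

SAMPLE = ('<paragraph><s><np>The moderator</np> '
          '<vp>edits <np>the digest</np></vp>.</s></paragraph>')

def strip_tags(text, keep):
    # Remove every tag whose name is not in 'keep'; character data is untouched.
    def repl(match):
        name = match.group(1).lstrip("/")
        return match.group(0) if name in keep else ""
    return re.sub(r"<(/?[a-z]+)[^>]*>", repl, text)

if __name__ == "__main__":
    print(strip_tags(SAMPLE, {"paragraph"}))       # pragmatic view only
    print(strip_tags(SAMPLE, {"s", "np", "vp"}))   # syntactic view only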
I suppose that it's true that publishers have typesetting tapes in their basements, and that we could use those tapes. I think that we have to accept that those tapes are of little value until someone converts the coding to descriptive markup. I have the typesetting tape for the American Heritage Dictionary (sorry, can't distribute it); no one wasted time trying to figure out how to use that tape as it is now. I know of several projects that are based on that tape, and all required conversions. Ideally, the tape would have been converted once and for all (and it apparently has been now). Whether it's a dictionary or a literary text, we can expect that inadequate coding will cause considerable work for anyone attempting to use the database. A metalanguage that includes procedural markup as well as descriptive markup will not help in such a case, because one still has to map procedural markup onto descriptive markup in order to be able to work with meaningful entities (definition, paragraph, etc.). Since procedural markup tends to be performed somewhat arbitrarily and does not normally provide a one-to-one relationship between entity and markup, there is no metalanguage that will help a researcher perform the necessary conversions. What we really need is a sensible and dynamic standard. I don't think that anyone would argue that that standard should be anything other than descriptively based. Since we are going to have to convert texts to descriptive markup in order to use them anyway, why not just develop the standard and convert as necessary? Trying to save the past is just going to retard development. I haven't mentioned SGML so far. Is there a problem with SGML? I have heard complaints, and we addressed them in our article. No one expects individual scholars to master the full syntax and to generate Document Type Definitions (DTD). What we want is accurate and consistent descriptive markup. In our experience at Brown, people have no difficulties mastering the principles of descriptive markup. We can leave the development of DTDs to experts. --Jim Dr. James H. Coombs Software Engineer, Research Institute for Research in Information and Scholarship (IRIS) Brown University jazbo@brownvm.bitnet ========================================================================= Date: 6 December 1987, 17:12:16 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributed by "James H. Coombs" Subject: Markup: on requirements My thanks to Nancy Ide for moving the discussion out to HUMANIST. Things have fallen a little out of sequence, but the ideas are more important than the sequence anyway. I have also heard from Michael Sperberg-McQueen, and I hope that he will post his very informative note as well. If this discussion becomes aggravating for the majority of HUMANISTs and there is enough interest, then perhaps we can form a separate mailing list. So, here is my (unedited) reply to the issue of requirements. While we may not be able to require that people conform to a standard fully, we can refuse to accept inadequate texts. There is an atmosphere of poverty now such that we are anxious to have whatever we can get our hands on. At the extreme, even now most of us would reject a text that is all in upper case and contains errors---it turns out to be easier to do it oneself. If we consider what things will be like or could be like in a few years though, I think it's appropriate to say that there are certain minimal standards (or one must comply with a standard).
First, we don't accept just anything for other scholarly documents. Second, we will have more alternatives for sources. Third, we want high quality sources so that people won't have to keep reworking or entirely redoing. If I can't count on a text from a particular archive to meet my needs, what is my motivation for bothering with that archive; and what is the motivation for the archive's existence? I certainly would not want to see it supported by public funds. I don't think that this places an inordinate burden on individual researchers. For the most part, I'm sure that it's considerably less burdensome than ensuring that one's bibliography, for example, accords with the MLA style sheet (and what bibliography unambiguously does?). --Jim I should elaborate briefly. First, I have/had a tape of Milton's *Paradise Lost*; it was so bad that I would prefer to start from scratch. Second, I think that we have a right to expect archives to set and maintain certain standards. Perhaps they don't want to accept that responsibility right now. If not, then I think that we should be planning to develop and support a good archive. Does such an archive need several programmers for text validation and maintenance? Then they should have the support to hire them. Let's centralize the expense as much as possible. Currently, we have no idea who is entering what and how they are doing it. Even if we could get people to go to the archive, the current approach means that many people are going to have to massage texts into useful formats, and every project will have to ensure that the text is accurate. It's as if we all had to revise our copies of *Paradise Lost* and then go proof read them before we could use them. Finally, I have texts that I have entered, marked up, and proof read, but I'm reluctant to check them into an archive that is inconsistent at best. Whatever professional credit I might get for the contribution---well, let's say that the effort is somewhat discredited by the state of the archive. It's like publishing a book with XYZ press instead of ABC. I would be happy to send it off to someone who provides full services and validates text, and I would be happy to make any necessary corrections. To reverse the roles, I am reluctant to acquire a text from an archive that makes no guarantees. After all, in the process of keyboarding a text, I get to read it, and the time goes quickly. It's the proofreading that is burdensome, and I still have to proofread. (Or do I get to say that I used X's text, and X is going to accept the responsibility for errors.) --Jim Dr. James H. Coombs Software Engineer, Research Institute for Research in Information and Scholarship (IRIS) Brown University jazbo@brownvm.bitnet ========================================================================= Date: 6 December 1987, 17:24:17 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: ACL Applied Natural Language Conference (833 lines) The following is republished from IRLIST, the Information Retrieval List. -- ed.] -------------------------------------------------------------------------- The printed version of the following program and registration information will be mailed to ACL members early in December. Others are encouraged to use the attached form or write for a booklet to the following address: Dr. D.E. Walker (ACL), 445 South Street - MRE 2A379, Morristown, NJ 07960, USA, or to walker@flash.bellcore.com, specifying "ACL Applied" on the subject line. 
ASSOCIATION FOR COMPUTATIONAL LINGUISTICS SECOND CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING 9 - 12 February 1988 Austin Marriott at the Capitol, Austin, Texas, USA Tutorials: Joe C. Thompson Conference Center, University of Texas at Austin ADVANCE PROGRAM Features: Six introductory and advanced tutorials Three days of papers on the state-of-the-art Distinguished luncheon speakers A panel of industry leaders Exhibits and demonstrations REGISTRATION : 7:30am - 3:00pm, Tuesday, 9 February, Joe C. Thompson Conference Center, University of Texas at Austin, 26th and Red River. 7:00pm - 9:00PM, Tuesday, 9 February 8:00am - 5:00pm, Wednesday, 10 February 8:00am - 5:00pm, Thursday, 11 February 8:00am - 12:00n, Friday, 12 February Austin Marriott at the Capitol, 701 East 11th Street EXHIBITS : 10:00am - 6:00pm, Wednesday, 10 February 10:00am - 6:00pm, Thursday, 11 February 9:00am - 12:00n, Friday, 12 February Austin Marriott at the Capitol TUTORIALS: TUESDAY, FEBRUARY 9, 1988 Joe C. Thompson Conference Center, University of Texas at Austin, 26th and Red River. 8:30 12:30 INTRODUCTION TO NATURAL LANGUAGE PROCESSING James Allen, University of Rochester 8:30 12:30 MACHINE-READABLE DICTIONARIES: A COMPUTATIONAL LINGUISTICS PERSPECTIVE Bran Boguraev, Cambridge University, and Beth Levin, Northwestern University 8:30 12:30 SPOKEN LANGUAGE SYSTEMS: PAST, PRESENT, AND FUTURE Salim Roucos, BBN Laboratories, Inc. 1:30 5:30 THE TECHNOLOGY OF NATURAL LANGUAGE INTERFACES Carole Hafner, Northeastern University 1:30 5:30 THE ROLE OF LOGIC IN REPRESENTING MEANING AND KNOWLEDGE Bob Moore, SRI International 1:30 5:30 MACHINE TRANSLATION Sergei Nirenburg, Carnegie Mellon University RECEPTION: 7:00pm - 9:00pm, Tuesday, 9 February Austin Marriott at the Capitol, 701 East 11th Street GENERAL SESSIONS WEDNESDAY, FEBRUARY 10, 1988 9:00 9:15 OPENING REMARKS AND ANNOUNCEMENTS Norman Sondheimer, General Chair (USC/Information Sciences Institute) Bruce Ballard, Program Chair (AT&T Bell Laboratories) Jonathan Slocum, Local Arrangements Chair (MCC) Donald E. Walker, ACL Secretary-Treasurer (Bell Communications Research) SESSION 1: SYSTEMS 9:15 9:40 The Multimedia Articulation of Answers in a Natural Language Query System Susan E. Brennan (Hewlett Packard) 9:40 10:05 A News Story Categorization System Philip J. Hayes, Laura E. Knecht and Monica J. Cellio (Carnegie Group) 10:05 10:30 An Architecture for Anaphora Resolution Elaine Rich and Susann Luper-Foy (MCC) SESSION 2: GENERATION 11:00 11:25 The SEMSYN Generation System: Ingredients, Applications, Prospects Dietmar Roesner (Universitaet Stuttgart) 11:25 11:50 Two Simple Prediction Algorithms to Facilitate Text Production Lois Boggess (Mississippi State University) 11:50 12:15 From Water to Wine: Generating Natural Language Text from Today's Applications Programs David D. McDonald (Brattle Research Corporation) and Marie M. Meteer (Bolt, Beranek and Newman) 12:15 2:00 LUNCHEON Guest Speaker: Grant Dove Chairman and CEO of MCC. Prior to joining MCC in July l987, Mr. Dove had been with Texas Instruments for 28 years, having served as Executive Vice President since l982. 
SESSION 3: SYNTAX AND SEMANTICS 2:00 2:25 Improved Portability and Parsing Through Interactive Acquisition of Semantic Information Francois-Michel Lang and Lynettte Hirschman (Unisys) 2:25 2:50 Handling Scope Ambiguities in English Sven Hurum (University of Alberta) 2:50 3:15 Responding to Semantically Ill-Formed Input Ralph Grishman and Ping Peng (New York University) and Evaluation of a Parallel Chart Parser Ralph Grishman and Mahesh Chitrao (New York University) SESSION 4: MORPHOLOGY AND THE LEXICON 3:45 4:10 Triphone Analysis: A Combined Method for the Correction of Orthographical and Typographical Errors Koenraad DeSmedt (University of Nijmegen) and Brigette van Berkel (TNO Institute of Applied Computer Science) 4:10 4:35 Creating and Querying Hierarchical Lexical Databases Mary S. Neff, Roy J. Byrd, and Omneya A. Rizk (IBM Watson Research Center) 4:35 5:00 Cn yur cmputr raed ths? Linda G. Means (General Motors) 5:00 5:25 Building a Large Thesaurus for Information Retrieval Edward A. Fox, J. Terry Nutter (Virginia Tech), Thomas Ahlswede, Martha Evens (Illinois Institute of Technology), and Judith Markowitz (Navistar International) 6:30 **** RECEPTION Microelectronics and Computer Technology Corporation (MCC) THURSDAY, FEBRUARY 11, 1988 SESSION 5: SYSTEMS 8:30 8:55 Application-Specific Issues in NLI Development for a Diagnostic Expert System Karen L. Ryan, Rebecca Root and Duane Olawsky (Honeywell) 8:55 9:20 The MULTIVOC Text-to-Speech System Olivier Emorine and Pierre Martin (Cap Sogeti Innovation) 9:20 9:45 Structure from Anarchy: Meta Level Representation of Expert System Predicates for Natural Language Interfaces Galina Datskovsky Moerdler (Columbia University) SESSION 6: TEXT PROCESSING 10:15 10:40 Integrating Top-Down and Bottom-Up Strategies in a Text Processing System Lisa F. Rau and Paul S. Jacobs (General Electric) 10:40 11:05 A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text Kenneth W. Church (AT&T Bell Laboratories) 11:05 11:30 A Tool for Investigating the Synonymy Relation in a Sense Disambiguated Thesaurus Martin S. Chodorow, Yael Ravin (IBM Watson Research Center) and Howard E. Sachar (IBM Data Systems Division) 11:30 11:55 Dictionary Text Entries as a Source of Knowledge for Syntactic and Other Disambiguations Karen Jensen and Jean-Louis Binot (IBM Watson Research Center) 12:00 1:45 LUNCHEON Guest Speaker: Donald E. Walker Manager of Artificial Intelligence and Information Science Research at Bell Communications Research, and Secretary-Treasurer of ACL and IJCAII.. SESSION 7: MACHINE TRANSLATION 1:45 2:10 EUROTRA: Practical Experience with a Multilingual Machine Translation System under Development Giovanni B. Varile and Peter Lau (Commission of the European Communities) 2:10 2:35 Valency and MT: Recent Developments in the METAL System Rudi Gebruers (Katholieke Universiteit Leuven) 3:00 5:00 PANEL: Natural Language Interfaces: Present and Future Moderator: Norman Sondheimer (USC/Information Sciences Institute) Panelists: Robert J. Bobrow (BBN Laboratories), Developer of RUS Jerrold Ginsparg (Natural Language Inc.), Developer of DataTalker Larry Harris (Artificial Intelligence Corporation), Developer of Intellect Gary G. 
Hendrix (Symantec), Developer of Q&A Steve Klein (Singular Solutions Engineering) Co-Developer of Lotus HOW 5:00 6:00 RECEPTION Austin Marriott at the Capitol FRIDAY, FEBRUARY 12, 1988 SESSION 8: SYSTEMS 8:30 8:55 Automatically Generating Natural Language Reports in an Office Environment Jugal Kalita and Sunil Shende (University of Pennsylvania) 8:55 9:20 Luke: An Experiment in the Early Integration of Natural Language Processing David A. Wroblewski and Elaine A. Rich (MCC) 9:20 9:45 The Experience of Developing a Large-Scale Natural Language Text Processing System: CRITIQUE Stephen D. Richardson and Lisa C. Braden-Harder (IBM Watson Research Center) SESSION 9: MORPHOLOGY AND THE LEXICON 10:15 10:40 Computational Techniques for Improved Name Search Beatrice T. Oshika (Sparta), Bruce Evans (TRW), Janet Tom (Systems Development Corporation), and Filip Machi (UC Berkeley) 10:40 11:05 The TICC: Parsing Interesting Text David Allport (University of Sussex) 11:05 11:30 Finding Clauses in Unrestricted Text by Stochastic and Finitary Methods Eva Ejerhed (University of Umea) 11:30 11:55 Morphological Processing in the Nabu System Jonathan Slocum (MCC) SESSION 10: SYNTAX AND SEMANTICS 1:30 1:55 Localizing Expression of Ambiguity John Bear and Jerry R. Hobbs (SRI International) 1:55 2:20 Combinatorial Disambiguation Paula S. Newman (IBM Los Angeles Scientific Center) 2:20 2:45 Canonical Representation in NLP System Design: A Critical Evaluation Kent Wittenburg and Jim Barnett (MCC) REGISTRATION INFORMATION AND DIRECTIONS PREREGISTRATION MUST BE RECEIVED BY 25 JANUARY; after that date, please wait to register at the Conference itself. Complete the attached ``Application for Registration'' and send it with a check payable to Association for Computational Linguistics or ACL to Donald E. Walker (ACL), Bell Communications Research, 445 South Street MRE 2A379, Morristown, NJ 07960, USA; (201) 829-4312; walker@flash.bellcore.com; ucbvax!bellcore!walker. If a registration is cancelled before 25 January, the registration fee, less $15 for administrative costs, will be returned. Full conference registrants will also receive lunch on the 10th and 11th. Registration includes one copy of the Proceedings, available at the Conference. Copies of the Proceedings at $20 for members ($30 for nonmembers) may be ordered on the registration form or by mail prepaid from Walker. TUTORIALS : Attendance is limited. Preregistration is encouraged to ensure a place and the availability of syllabus materials. RECEPTIONS : The Microelectronics and Computer Technology Corporation (MCC) will host a reception for the conference at its site on Wednesday evening. To aid in planning we ask that you complete the RSVP on the registration form. In addition there will be receptions at the conference hotel on Tuesday evening and Thursday afternoon. EXHIBITS AND DEMONSTRATIONS : Facilities for exhibits and system demonstrations will be available. Persons wishing to arrange an exhibit or present a demonstration should contact Kent Wittenburg, MCC, 3500 W. Balcones Center Drive, Austin, TX 78759; (512)338-3626; wittenburg@mcc.com as soon as possible. HOTEL RESERVATIONS : Reservations at the Austin Marriott at the Capitol MUST be made using the Hotel Reservation Form included with this flyer. Reservations subject to guest room availability for reservations received after 25 January 1988. 
Please mail to: Austin Marriott at the Capitol Attn: Reservation Office 701 East 11th Street Austin, Texas 78701 (512) 478-1111 AIR TRANSPORTATION : American Airlines offers conferees a special 35% off full coach fare, 30% off full Y fares for passengers originating in Canada, or 5% off any published roundtrip airfare applicable to and from Austin. Call toll free 1-800-433-1790 and give the conference's STAR number S81816. If you normally use the service of a travel agent, please have them make your reservations through this number. DIRECTIONS : There is one public exit from Robert Mueller Airport in Austin; at the traffic light, turn right (onto Manor Rd.) and drive to Airport Blvd. (approx. 1/4 - 1/2 mile). Turn right on Airport Blvd., and drive to highway I-35 (approx. 1-2 miles). Turn left (south) onto I-35, heading toward town. Get off at the 11th-12th St. (Capitol) exit, and drive an extra block on the access road, to 11th St. The Marriott is on the SW corner of that intersection (across 11th St., on the right). A parking garage is attached. The Marriott at the Capitol operates a free shuttle to and from the airport. Cab fare would be approx. $6. The Joe C. Thompson Conference Center parking lot is on the SW corner of Red River and 26th Street; the entrance is on Red River, and a guard will point out the center (adjacent, to the west). Directions to JCT from Marriott parking garage: Turn right (S) on I-35 frontage road, turn right (W) on 10th St., turn right (N) on Red River, and drive [almost] to 26th. APPLICATION FOR REGISTRATION Association for Computational Linguistics, Second Conference on Applied Natural Language Processing, 9 - 12 February 1988, Austin, Texas NAME _________________________________________________________________ Last First Middle AFFILIATION (Short form for badge ID) ___________________________________________________________ ADDRESS _______________________________________________________________ _______________________________________________________________________ _______________________________________________________________________ _______________________________________________________________________ TELEPHONE ____________________________________________________________ COMPUTER NET ADDRESS _________________________________________________ REGISTRATION INFORMATION (circle fee) NOTE: Only those whose dues are paid for 1988 can register as members. ACL NON- FULL-TIME MEMBER* MEMBER* STUDENT* by 25 JANUARY $170 $205 $85 at the Conference $220 $255 $110 *Member and Non-Member fees include Wednesday and Thursday luncheons; Students can purchase luncheon tickets at a reduced rate. LUNCHEON TICKETS FOR STUDENTS: $10 each; Wednesday _____; Thursday ________; amount enclosed $ ______ LUNCHEON TICKETS FOR GUESTS: $15 each; Wednesday _____; Thursday ________; amount enclosed $ ______ SPECIAL MEALS: VEGETARIAN ______ KOSHER ______ EXTRA PROCEEDINGS: $20 members; $30 non-members; amount enclosed $ ______ TUTORIAL INFORMATION (circle fee and check at most two tutorials) FEE PER TUTORIAL ACL NON- FULL-TIME MEMBER MEMBER* STUDENT by 25 January $75 $110 $50 at the Conference $100 $135 $65 *Non-member tutorial fee includes ACL membership for 1988; do not pay non-member fee for BOTH registration and tutorials. 
Morning Tutorials: select ONE: INTRODUCTION: Allen LEXICONS: Boguraev & SPEECH: Roucos Levin Afternoon Tutorials: select ONE: INTERFACES: Hafner LOGIC: Moore TRANSLATION: Nirenburg TOTAL PAYMENT MUST BE INCLUDED : $ ____________ (Registration, Luncheons, Extra Proceedings, Tutorials) Make checks payable to ASSOCIATION FOR COMPUTATIONAL LINGUISTICS or ACL. Credit cards cannot be honored. RSVP for MCC Reception: Please check if you plan to attend the MCC reception on Wednesday evening, February 10th. _________ Send Application for Registration WITH PAYMENT before 25 January to the address below; AFTER 25 January, wait to register at Conference: Donald E. Walker (ACL) Bell Communications Research 445 South Street, MRE 2A379 Morristown, NJ 07960, USA (201)829-4312 walker@flash.bellcore.com ucbvax!bellcore!walker APPLICATION FOR HOTEL REGISTRATION Reservations subject to guest room availability for reservations received after 25 January 1988. In the event of unanticipated demand, rooms will be assigned on a first-come, first-served basis. Please send in your reservation request as early as possible. NAME _________________________________________________________________ Last First Middle AFFILIATION ___________________________________________________________ ADDRESS _______________________________________________________________ _______________________________________________________________________ _______________________________________________________________________ _______________________________________________________________________ TELEPHONE ____________________________________________________________ Room Requirements Single $64 ________ Double $74 ________ Date and time of arrival _________________________________________ Date and time of departure _______________________________________ Complete if arrival after 6PM __________________________________________________________________ Credit Card Name Number Expiration Date Send Application for Hotel Reservation to: Austin Marriott at the Capitol Attn: Reservation Office 701 East 11th Street Austin, Texas 78701 (512) 478-1111 ASSOCIATION FOR COMPUTATIONAL LINGUISTICS SECOND CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING TUTORIALS 9 February 1988 Joe C. Thompson Conference Center, University of Texas at Austin Morning 8:30 A.M. - 12:30 P.M. 8:30 12:30 INTRODUCTION TO NATURAL LANGUAGE PROCESSING James Allen, University of Rochester ABSTRACT This tutorial will cover the basic concepts underlying the construction of natural language processing systems. These include basic parsing techniques, semantic interpretation and the representation of sentence meaning, as well as knowledge representation and techniques for understanding natural language in context. In particular, the topics to be addressed in detail will include augmented transition networks (ATNs), augmented context-free grammars, the representation of lexical meaning, especially looking at case-grammar based representations, and the interpretation of pronouns and ellipsis. In addition, there will be an overview of knowledge representation, including semantic networks, frame-based systems, and logic, and the use of general world knowledge in language understanding, including scripts and plans. Given the large range of issues and techniques, an emphasis will be placed on those aspects relevant to existing practical natural language systems, such as interfaces to database systems. 
The remaining issues will be more quickly surveyed to give the attendee an idea of what techniques will become important in the next generation of natural language systems. The lecture notes will include an extensive bibliography of work in each area. INTENDED AUDIENCE This tutorial is aimed at people who are interested in learning the fundamental techniques and ideas relevant to natural language processing. It will be useful to managers who want an overview of the field, to programmers starting research and development in the natural language area, and to researchers in related disciplines such as linguistics who want a survey of the computational approaches to language. BIOGRAPHICAL SKETCH Dr. James Allen is an Associate Professor and Chairman of the Computer Science Department at the University of Rochester. He is editor of the journal Computational Linguistics and author of the book Natural Language Understanding, published in 1987. In 1984, he received a five-year Presidential Young Investigator award for his research in Artificial Intelligence. 8:30 12:30 MACHINE-READABLE DICTIONARIES: A COMPUTATIONAL LINGUISTICS PERSPECTIVE Branimir Boguraev, Cambridge University, and Beth Levin, Northwestern University ABSTRACT The lexical information contained explicitly and implicitly in machine-readable dictionaries (MRDs) can support a wide range of activities in computational linguistics, both of theoretical interest and of practical importance. This tutorial falls into two parts. The first part will focus on some characteristics of raw lexical data in electronic sources, which make MRDs particularly relevant to natural language processing applications. The second part will discuss how theoretical linguistic research into the lexicon can enhance the contribution of MRDs to applied computational linguistics. The first half will discuss issues concerning the placement of rich lexical resources on-line; raise questions related to the suitability, and ultimately the utility, of MRDs for automatic natural language processing; outline a methodology aimed at extracting maximally usable subsets of the dictionary with minimal introduction of errors; and present ways in which specific use can be made of the lexical data for the construction of practical language processing systems with substantial coverage. The second half of the tutorial will review current theoretical linguistic research on the lexicon, emphasizing proposals concerning the nature of lexical representation and lexical organization. This overview will provide the context for an examination of how the results of this research can be brought to bear on the problem of extracting syntactic and semantic information encoded in dictionary entries, but not overtly signaled to the dictionary user. INTENDED AUDIENCE This tutorial presupposes some familiarity with work in both computational and theoretical linguistics. It is aimed at researchers in natural language processing and theoretical linguists who want to take advantage of the resources available in MRDs for both applied and theoretical purposes. The issues of providing substantial lexical coverage and system transportability are addressed, thus making this tutorial of particular relevance to those concerned with the automatic acquisition, on a large scale and in a flexible format, of phonological, syntactic, and semantic information for nlp systems. BIOGRAPHICAL SKETCHES Dr. 
Branimir Boguraev is an SERC (UK Science & Engineering Research Council) Advanced Research Fellow at the University of Cambridge. He has been with the Computer Laboratory since 1975, and completed a doctoral thesis in natural language processing there in 1979. Recently he has been involved in the development of computational tools for natural language processing, funded by grants awarded by the UK Alvey Programme in Information Technology. Dr. Beth Levin is an Assistant Professor in the Department of Linguistics, Northwestern University, Evanston, IL. She was a System Development Foundation Research Fellow at the MIT Center for Cognitive Science from 1983-1987 where she assumed major responsibility for directing the MIT Lexicon Project. She received her Ph.D. in Electrical Engineering and Computer Science from MIT in June 1983. 8:30 12:30 SPOKEN LANGUAGE SYSTEMS: PAST, PRESENT, AND FUTURE Salim Roucos, BBN Laboratories, Inc. ABSTRACT: This tutorial will present the issues in developing spoken language systems for natural speech communication between a person and a machine. In particular, the performance of complex tasks using large vocabularies and unrestricted sentence structures will be examined. The first Advanced Research Projects Agency (ARPA) Speech Understanding Research project during the seventies will be reviewed, and then the current state-of-the-art in continuous speech recognition and natural language processing will be described. Finally, the types of spoken language systems' capabilities expected to be developed during the next two to three years will be presented. The technical issues that will be covered include acoustic-phonetic modeling, syntax, semantics, plan recognition and discourse, and the issues for integrating these knowledge sources for speech understanding. In addition, computational requirements for real-time understanding, and performance evaluation methodology will be described. Some of the human factors of speech understanding in the context of performing interactive tasks using an integrated interface will also be discussed. INTENDED AUDIENCE: This tutorial is aimed at technical managers, product developers, and technical staff interested in learning about spoken language systems and their potential applications. No expertise in either speech or natural language will be assumed in introducing the technical details in the tutorial. BIOGRAPHICAL SKETCH: Dr. Salim Roucos has worked for seven years at BBN Laboratories in speech processing such as continuous speech recognition, speaker recognition, and speech compression. More recently, he has been the principal investigator on integrating speech recognition and natural language understanding for developing a spoken language system. His areas of interest are statistical pattern recognition and language modeling. Dr. Roucos is chairman of the Digital Signal Processing committee of the IEEE ASSP society. Afternoon 1:30 P.M. - 5:30 P.M. 1:30 5:30 THE TECHNOLOGY OF NATURAL LANGUAGE INTERFACES Carole D. Hafner, Northeastern University ABSTRACT This tutorial will describe the development of natural language processing from a research topic into a commercial technology. This will include a description of some key research projects of the 1970's and early 1980's which developed methods for building natural language query interfaces, initially restricted to just one database, and later made "transportable" to many different applications. 
The further development of this technology into commercial software products will be discussed and illustrated by a survey of several current products, including both micro-computer NL systems and those offered on higher-performance machines. The qualities a user should look for in a NL interface will be considered, both in terms of linguistic capabilities and general ease of use. Finally, some of the remaining "hard problems" that current technology has not yet solved in a satisfactory way will be discussed. INTENDED AUDIENCE This tutorial is aimed at people who are not well acquainted with natural language interfaces and who would like to learn about 1) the capabilities of current systems, and 2) the technology that underlies these capabilities. BIOGRAPHICAL SKETCH Dr. Carole D. Hafner is Associate Professor of Computer Science at Northeastern University. After receiving her Ph.D. in Computer and Communication Sciences from the University of Michigan, she spent several years as a Staff Scientist at General Motors Research Laboratories working on the development of a natural language interface to databases. 1:30 5:30 THE ROLE OF LOGIC IN REPRESENTING MEANING AND KNOWLEDGE Robert C. Moore, SRI International ABSTRACT This tutorial will survey the use of logic to represent the meaning of utterances and the extra-linguistic knowledge needed to produce and interpret utterances in natural-language processing systems. Problems to be discussed in meaning representation include quantification, propositional attitudes, comparatives, mass terms and plurals, tense and aspect, and event sentences and adverbials. Logic-based methods (unification) for systematic specification of the correspondence between syntax and semantics in natural language processing systems will also be touched on. In the discussion of the representation of extra-linguistic knowledge, special attention will be devoted to the role played by knowledge of speakers' and hearers' mental states (particularly their knowledge and beliefs) in the generation and interpretation of utterances and logical formalisms for representing and reasoning about knowledge of those states. INTENDED AUDIENCE This tutorial is aimed at implementors of natural-language processing systems and others interested in logical approaches to the problems of meaning representation and knowledge representation in such systems. BIOGRAPHICAL SKETCH Dr. Robert C. Moore is a staff scientist in the Artificial Intelligence Center of SRI International. Since joining SRI in 1977, Dr. Moore has carried out research on natural-language processing, knowledge representation, automatic deduction, and nonmonotonic reasoning. In 1986-87 he was the first director of SRI's Computer Science Research Centre in Cambridge, England. Dr. Moore received his PhD from MIT in 1979. 1:30 5:30 MACHINE TRANSLATION Sergei Nirenburg, Carnegie Mellon University ABSTRACT The central problems faced by a Machine Translation (MT) research project are 1) the design and implementation of automatic natural language analyzers and generators that manipulate morphological, syntactic, semantic and pragmatic knowledge; and 2) the design, acquisition and maintenance of dictionaries and grammars. Since a short-term goal (or even medium term goal) of building a system that performs fully automated machine translation of unconstrained text is not feasible, an MT project must carefully constrain its objectives. This tutorial will describe the knowledge and processing requirements for an MT system. 
It will present and analyze the set of design choices for MT projects including distinguishing features such as long-term/short-term, academic/commercial, fully/partially automated, direct/transfer/interlingua, pre-/post-/interactive editing. The knowledge acquisition needs of an MT system, with an emphasis on interactive knowledge acquisition tools that facilitate the task of compiling the various dictionaries for an MT system, will be discussed. In addition, expectations, possibilities and prospects for immediate application of machine translation technology will be considered. Finally, a brief survey of MT research and development work around the world will be presented.

INTENDED AUDIENCE
This tutorial is aimed at a general audience that could include both students looking for an application area and testbed for their ideas in natural language processing and people contemplating starting an MT or machine-aided translation project.

BIOGRAPHICAL SKETCH
Dr. Sergei Nirenburg, Research Scientist at the Center for Machine Translation at Carnegie-Mellon University, holds an M.Sc. in Computational Linguistics from Kharkov State University, USSR, and a Ph.D. in Linguistics from the Hebrew University of Jerusalem, Israel. He has published in the fields of parsing, generation, machine translation, knowledge representation and acquisition, and planning. Dr. Nirenburg is Editor of the journal Computers and Translation.

SECOND CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING
Conference Committee

General Chair: Norman Sondheimer, USC/Information Sciences Institute
Secretary-Treasurer: Donald E. Walker, Bell Communications Research
Program Committee: Bruce Ballard (Chair), AT&T Bell Laboratories; Madeleine Bates, BBN Laboratories; Tim Finin, Unisys; Ralph Grishman, New York University; Carole Hafner, Northeastern University; George Heidorn, IBM Corporation; Paul Martin, SRI International; Graeme Ritchie, University of Edinburgh; Harry Tennant, Texas Instruments
Tutorials: Martha Palmer, Unisys
Local Arrangements: Jonathan Slocum, MCC (Chair); Elaine Rich, MCC
Exhibits and Demonstrations: Kent Wittenburg, MCC
Publicity: Jeffrey Hill and Brenda Nashawaty, Artificial Intelligence Corporation
------------------------------
=========================================================================
Date: 6 December 1987, 18:22:13 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributed by Robert Amsler Subject: Reply to James H. Coombs `ACH Text markup' message (109 lines)
(I'll make this reply public from the start since Nancy Ide already had to double-back and make hers public afterwards. It may, however, become a suitable topic for a more extended private discussion between those with an interest in text encoding standards.) As Nancy already noted, SGML is the most likely model which will be used for the Humanities Text Standard; however, there was considerable concern at the meeting by the French delegation about the workshop endorsing SGML as the official standard to be emulated. In view of that, it was deemed essential to avoid specifically saying this in favor of the broader statement that we'd attempt to be compatible with applicable existing standards where possible. Specifically, this also includes character transliteration standards--which are a considerable part of a humanities text standard's encoding problems. (I can hardly wait for ISO to adopt an official standard for encoding Egyptian hieroglyphics in ASCII!)
I would also however like to make a strong statement that from a computational perspective there is no need for any one format to be the only one used. What is needed is that any format must be fully documented and an information-preserving transformation of the contents of any approved standard format. This was captured in the statement that the standard would be an `interchange' format. This does beg the issue of how the transformation takes place, i.e. a program needs to be written or capable of being run on the `other' format and on hardware available to the recipient of the data, but it is important to note that an SGML-like format may appear as very formidable to users who believe they will have to type in all the special codes manually--whereas a `keyboarding' format may be just as faithful in representing the information without undue burden to the typist. I'm sure you will agree to this since your excellent CACM article notes that one of the most overlooked forms of markup is the use of traditional English punctuation and spacing conventions. Returning to your message's points, your 4th point seems to me to be exceptionally good and something that we did not explicitly get to in the Poughkeepsie meeting, i.e., ``4) There should be no attempt at establishing a "closed" tag set. The current AAP SGML application allows for definition of new tags, but it does not support such definition in a practical way. The consequence is that people will use "list items," for example, when they should be using "line of poetry." Within these guidelines, it can only be healthy to provide a list of tags that people should choose from when tagging certain entities. The point of this is that we cannot predict what textual elements will be of significance for what researchers. We have to allow for the discovery of textual elements that no one has categorized previously. At the same time, there is no point in having 30 different tags for "line of poetry." The guidelines should make clear that DESCRIPTION is paramount and that the use of particular tags is secondary.'' I think the means by which this latter goal, of not having 30 equivalent tags for the same text element, is to be handled will be an important role of the text encoding standards subcommittees. What seem to me to be needed here are the database concept of a `data dictionary' to provide definitions for all the `tags' and the information-science concept of a tangled hierarchical thesaurus of tags (terms) including the 4 major categories of `broader tag' (BT), `narrower tag' (NT), `related tag' (RT) and `use instead' (XT ?) type of pointers. Thus the standards subcommittees should begin work on a thesaurus of tags which defines each tag's intended domain of text entities, its relationship to other more general and more specific tags as well as related tags and tags which should be used instead of a given tag. This means, for example, that in tagging a text feature, one could use a generic tag such as `paragraph' or a more specific tag such as `summation paragraph' and that an author would have a guidebook of established possible tags that would tell them the options and what qualifications a text object had to have in order to qualify for the use of such a tag. I do think it is important to allow for arbitrarily deep extensions of the tagging, but any standard will have failed if every author has to resort to inventing their own tags to encode text.
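A minimal sketch of what one entry in such a thesaurus of tags might look like, assuming a simple Python representation; the tag names, definitions, relations, and lookup function below are invented for illustration and belong to no actual or proposed standard:

    # Hypothetical thesaurus-of-tags entries, modelled on the `broader tag',
    # `narrower tag', `related tag' and `use instead' pointers described above.
    thesaurus = {
        "paragraph": {
            "definition":  "A block of prose set off as a unit by the author.",
            "broader":     [],                        # BT
            "narrower":    ["summation paragraph"],   # NT
            "related":     ["sentence"],              # RT
            "use_instead": None,
        },
        "summation paragraph": {
            "definition":  "A paragraph that sums up the argument of a section.",
            "broader":     ["paragraph"],
            "narrower":    [],
            "related":     [],
            "use_instead": None,
        },
        "para": {
            "definition":  "Discouraged abbreviation of `paragraph'.",
            "broader":     [],
            "narrower":    [],
            "related":     [],
            "use_instead": "paragraph",               # the `use instead' (XT) pointer
        },
    }

    def preferred_tag(name):
        """Follow `use instead' pointers until a preferred tag is reached."""
        entry = thesaurus.get(name)
        while entry and entry["use_instead"]:
            name = entry["use_instead"]
            entry = thesaurus.get(name)
        return name

    print(preferred_tag("para"))   # -> paragraph

An encoder could then choose either the generic or the more specific tag, and a later user could walk the `broader' pointers to collapse specific tags back onto generic ones when a coarser analysis is wanted.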
Note, this is still independent of the issue of `required minimum tags' in that the dictionary and thesaurus of tags only tell the user how a tag should be used and what alternatives exist to its use--they do not say that a tag must (or must not) be used (except in the case of the `use instead' pointers that attempt to avoid tags being used ambiguously). My model of what such a Thesaurus should look like is the ERIC Thesaurus of Descriptors. Robert A. Amsler Bellcore 435 South St., Morristown, NJ 07960 (201) 829-4278 ========================================================================= Date: 7 December 1987, 09:40:52 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Contributed by Sterling Bjorndahl Subject: The CD-ROM debate: erasable optical disks Speaking personally, I am not going to run out and buy a CD-ROM drive for my home computer until we see what the next generation of laser technology is going to look like in terms of cost and performance. The latest I have heard on the topic is on page 12 of the December _Byte_: "Matsushita, the large Japanese parent company of Panasonic, ... will deliver a prototype of an erasable optical disk drive next year, probably in the third quarter, a company spokesperson said. It will probably be competing with products from Sony, Philips, and Kodak. Matsushita has invested heavily in the phase-change scheme, so that's probably the technology that will be incorporated in the drive it brings to market. In phase-change technology, molecules of tellurium suboxide change from an amorphous noncrystalline state to a crystalline state and back again, depending on the type of laser beam applied. But the company is also studying other approaches, including magneto-optical (which Sony is using) and dye-polymer technologies. One hurdle all the pioneers of erasable optical drives will have to leap is the slowness of the units, caused partially by the size of the optical disk head, which is much bigger than a head in a typical magnetic drive." CD-ROM is available now, with texts that I want to use, so I am glad that I have access to a system that can use that technology (Ibycus). And I think Bob Kraft has listed some excellent reasons for using CD-ROM technology where appropriate. However, I don't want to spend my own money on something that will limit my flexibility in the future. Thus my caution, until I can determine whether the new technology will be practical for me. I remember reading an article on erasable optical drives in a popular magazine within recent months. I thought it was Scientific American, but I can't locate the article among the issues in my magazine rack. Does anyone else know of it? I believe it was on magneto-optical technology, and I remember it mentioning data densities on the order of current CD-ROM technology rather than the current WORM technology (which seems to be worse than half as dense as CD-ROM). Sterling Bjorndahl Institute for Antiquity and Christianity Claremont Graduate School Claremont, California ========================================================================= Date: 7 December 1987, 13:23:51 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Research opportunity (58 lines) Contributed by E S Atwell Dear fellow "Computational Humanities" researcher, Do you know of any young graduates interested in corpus-based computational research on the English language?
I have an opportunity for an aspiring researcher to come to Leeds for a 'taster', to work on a large collaborative project. I would be very grateful if you could forward the following details to any potential candidates you know of. Thank you for your help, Eric Steven Atwell Centre for Computer Analysis of Language and Speech AI Division, School of Computer Studies phone: +44 532 431751 ext 6307/6119 Leeds University, Leeds LS2 9JT U.K. UUCP: ...!seismo!mcvax!ai.leeds.ac.uk!eric EARN/BITNET/ARPA: eric%leeds.ai@ac.uk -------------------------------------------------------------------------- Vacancy: RESEARCH ASSOCIATE to develop a NATURAL LANGUAGE PARSER COMMUNAL: COnvivial Man-Machine Understanding through NAtural Language Artificial Intelligence Group, School of Computer Studies, Leeds University COMMUNAL is a large collaborative research project aiming to develop a robust Natural Language interface to Expert Systems, allowing access through natural English dialogue. This will require software to analyse and parse the user's input; convert it into the internal Knowledge Representation formalism; infer an appropriate response or solution; and generate output in ordinary English. At Leeds University, we will develop a powerful parser, based on a Systemic- Functional model of English grammar. The other partners in the project are: UWIST (project coordinators), the Ministry of Defence, ICL, and Longman. The appointee will be principally involved in designing, building, testing and documenting the parser software, using POPLOG prolog on a Sun Workstation. She/he will be expected to liaise with and learn from other researchers in the Centre for Computer Analysis of Language and Speech (CCALAS) and related research groups at Leeds and elsewhere; there will be opportunities for travel, to coordinate research with other partners, and to present results at international conferences. The post is for a fixed term of 18 months in the first instance, although the project may continue to a Second Phase. Starting salary is to be 8185 p.a., with an expected 7% increase in March 1988 and a further increment later. We require an appointment as soon as possible; please contact Eric Atwell via JANET (eric@uk.ac.leeds.ai) or EARN/BITNET (eric%leeds.ai@ac.uk), or by phone on (+44 532) 431751 ext.6119 or 6307 for further details of the post and how to apply; I can also give some idea of cost of living, housing etc for applicants outside the UK. ========================================================================= Date: 8 December 1987, 09:34:02 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion Comments: From: amsler@flash.bellcore.com (Robert Amsler) From: MCCARTY@UTOREPAS Subject: Coombs' ``Markup: On Requirements'' message While I have great sympathy for the goals expressed by James H. Coombs in this message, I have no optimism about the methods suggested to achieve those goals. The issue here is one of money and the existing source of such funding would be the same source of funding which currently supports research in the humanities. If we propose that a computer archive in the humanities should have all these desirable properties, then unless a new source of funding is provided, it would have to take funds away from other types of humanities research. The alternative would be to create a self-funded archive which would have to derive funding from the sale of copies of its machine-readable data. 
This seems possible, perhaps funded by a surcharge something like that of the current copyright clearance center to whom most libraries send payments when they make photocopies of magazine and journal articles. However, such a center would have to also be prepared to legally sue users of copyrighted data who did not pay for their copies. I have no trouble with this since as Howard Webber recently said, if we interfere with the flow of funding back to the creators of intellectual property, we will eventually cut off the funds to develop such works. At present most texts in the humanities in machine-readable form are either the result of funded research or `donations' of humanists' time. This creates a poor man's archive. The real owners of the bulk of the humanities texts not available are the publishers, who routinely destroy the machine-readable works they print because of a variety of excuses similar to those of monks burning manuscript pages to light their candles. We need to form an archive in which major humanities publishers would be eager to deposit their machine-readable tapes--for the purpose of generating additional revenue from their computational use. I do not think attempting to Prussianize either the volunteer humanities data enterers or the existing marginally-funded archives would be a very good idea. Robert A. Amsler Bellcore Morristown, NJ 07960 ========================================================================= Date: 8 December 1987, 09:38:34 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion Comments: From: amsler@flash.bellcore.com (Robert Amsler) From: MCCARTY@UTOREPAS Subject: Text Encoding, a reply to James H. Coombs comments [This is a reply to some of James H. Coombs comments on Nancy Ide's message] Coombs writes: ``A minor philosophical point, I guess: I don't think that we CAN know our needs fully. We need standards that accommodate needs that cannot be predicted today. The practical consequence of this observation, which I'm sure Nancy would agree with, is that one should seek a "productive" system instead of a system that satisfies everything on a list, and one should not spend a lot of time developing the list.'' Once upon a time I was doing a survey of the keywords and descriptors used to characterize articles in the Communications of the ACM. The keywords were author-supplied terms that described their article's content; the descriptors were selected by the authors from a pre-specified set of similar content descriptors supplied by the ACM. What I discovered was that as I collected more and more instances of keywords created by the authors, there was no closure whatsoever. The set of terms just kept expanding and there were large numbers of keywords which only one author used, and then only for one article. This is how I see the problem of the selection of tags for text entities in documents. That is, if the system is completely open and `productive' there will be little commonality between authors' selections--whereas if the authors are offered a wide range of approved tags to select from, then they will manage to find tags which meet their needs. ------- ``What are "multiple parallel hierarchies"? I can guess, but I want to be sure that I understand the problem. In most documents, we have, for example, pragmatic and syntactic hierarchies.'' The term was used at the Vassar meeting by David Barnard, I believe.
The statement was in reference to the difficulty of software developers in providing software capable of interpreting a document written in the full SGML standard, and as far as I'm aware there is still no full-SGML-capable software available. I assumed that he was referring to the potential division of a work into OVERLAPPING tagged segments, i.e. it would be possible to have a work with tags which ended inside the span of other still running open tags, e.g., <line> <sentence> ... <foreign-word> ... <\line> <line> ... <\foreign-word> ... <\sentence> <\line> The problem here is that some entity would be broken into two parts if any entity were extracted. ``What is the practical value of a metalanguage that generates all markup languages? I would think that it would be so abstract as to be of no value.'' Who said `generates'? What we were discussing was a meta-language which `parses' all markup languages--a sort of least upper bound markup language. The thought was that we needed to accommodate all reasonable existing texts with markup information already in them. We weren't intending to require existing texts with carefully worked out markup schemes to be redone in a scheme which would offer nothing new to their markings other than a different way of noting the same information. However, your next point is well-taken... ``I suspect that this is part of the goal of salvaging work that has been inadequately coded.'' Actually we were thinking of salvaging work that had been ADEQUATELY coded before a standard was available. Rather than requiring every such work to be recoded in a new format, it was hoped that the new format could accept the existing works as is. Whether that is possible or not, as Nancy stated, is an open question since we haven't yet collected the documentation for existing collections of text and their formats. ``I believe that we will be better off if we worry less about the past and plan more for the future. I suppose that it's true that publishers have typesetting tapes in their basements, and that we could use those tapes.'' Actually, no they don't. They ordinarily don't get the tapes from the printers and if they did, would only get the last version on tape before the final manual cut-and-paste corrections. Publishers thus routinely ignore and discard this phototypesetting data as a useless intermediate step and save the `valuable' printing plates instead. One reason is there is no common format in which to save the data for reuse. Each printer has their own variant of the hardware/software. However, regardless of that, the next point is very true. ``I think that we have to accept that those tapes are of little value until someone converts the coding to descriptive markup. .... Whether it's a dictionary or a literary text, we can expect that inadequate coding will cause considerable work for anyone attempting to use the database. A metalanguage that includes procedural markup as well as descriptive markup will not help in such a case, because one still has to map procedural markup onto descriptive markup in order to be able to work with meaningful entities (definition, paragraph, etc.). Since procedural markup tends to be performed somewhat arbitrarily and does not normally provide a one-to-one relationship between entity and markup, there is no metalanguage that will help a researcher perform the necessary conversions.'' You are mixing two things here. First, while it is true one cannot go from a typesetting tape to a descriptive markup in one step, it doesn't necessarily follow that the procedural markup is useless.
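To make that point concrete in miniature, anticipating the dictionary case taken up next, here is a rough sketch in Python; the font codes, the sample entry, and the rule table are invented for the illustration and are not drawn from any real phototypesetting format:

    # Fields as they might be recovered from a phototypesetting tape:
    # (font, text) pairs in physical order.  All values are invented.
    fields = [("bold", "abandon"),
              ("italic", "v.t."),
              ("roman", "to give up completely")]

    # Positional rules: the first bold field at the start of an entry is the
    # headword, the italic field that follows it is the part of speech, etc.
    order         = ["headword", "part-of-speech", "definition-text"]
    expected_font = {"headword": "bold",
                     "part-of-speech": "italic",
                     "definition-text": "roman"}

    entry = {}
    for tag, (font, text) in zip(order, fields):
        if font == expected_font[tag]:      # font and position agree
            entry[tag] = text               # assign the descriptive category

    print(entry)
    # {'headword': 'abandon', 'part-of-speech': 'v.t.',
    #  'definition-text': 'to give up completely'}

The procedural codes mean nothing by themselves, but taken in context they determine the descriptive categories almost mechanically, which is the sense in which such tapes are far from useless.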
A case in point is dictionaries. There IS NO descriptive markup standard for dictionary entries (I'm working on developing one with a number of other computational lexicologists, but none exists right now), yet the phototypesetting tapes of dictionaries are very useful in creating a descriptive markup of their contents. Headwords are typeset in boldface, possibly outdented, certainly starting new lines; parts of speech are in italic, pronunciations are in special fonts for their phonetic characters, usually enclosed in (,)'s or similar delimiters. Etymologies prefer [,]'s. Labels are in italics, sense numbers in boldface, definition texts in Roman type, with examples sometimes offset in <,>'s and sometimes in italics. All of these are positionally context-sensitive within the dictionary entry. Their descriptive nature can usually be unambiguously determined from the positional and font information on a phototypesetting tape. It would be a genuine aid to the people who today decode such phototypesetting tapes if they were in only ONE procedural markup language. At present they are in innumerably many different markup languages. ``What we really need is a sensible and dynamic standard. I don't think that anyone would argue that that standard should be anything other than descriptively based. Since we are going to have to convert texts to descriptive markup in order to use them anyway, why not just develop the standard and convert as necessary. Trying to save the past is just going to retard development.'' The reason is that the conversion is going to have to be done fairly often UNTIL a standard for both procedural and descriptive markup is available. We have no future without the publishers' adopting a descriptive markup eventually, but until they do, we have no sensible future in hand-entry of published books when some electronic typesetting format is available. Keyboarding the OED, for instance, took several MILLION dollars! If the typesetting data had been available in machine readable form, it would probably have reduced the effort by a factor of ten. Again... Robert A. Amsler Bellcore Morristown, NJ 07960 ========================================================================= Date: 8 December 1987, 09:46:13 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Encoding schemes, text archive (reply to Coombs) Contributed by "Michael Sperberg-McQueen" James Coombs has suggested I post my reply to his comments on text encoding. This is it; I have also appended a note on the phrase 'multiple incompatible hierarchies' which seems to be unclear. ---------- Many thanks for your note about text encoding and the ACH/ACL/ALLC initiative in particular. I agree with you in every detail, as far as I can see, and will just try to clarify a couple of things quickly, which can be discussed at length later. 1 in preparing for the Vassar meeting, the ACH committee on text encoding had come up with a plan similar to the base + extensions that you suggest. The basic idea of user extensions was universally accepted, but the word 'require' was complete anathema to a number of people at the Vassar meeting. These were (a) those in charge of large existing text archives, who wanted to make very sure the guidelines would not turn into something their funding agencies would eventually require (no single quotes here!)
them to conform to; and (b) some people worried about the possibility that funding agencies and their reviewers might use the 'requirements' of any guidelines to refuse funding to anyone who deviates from the required minimal tagging, even for adequate scholarly reasons. It was agreed to 'recommend' certain minimal tags (the verse paragraphs of Milton would be a good example) for newly encoded texts, but consensus could not be reached on any more than that. This was a disappointment to me, but appears on reflection to affect not the structure of the guidelines but only the choice of words to describe it. In any case, there will be a fairly extensive pre-defined tag set, I expect, but not a closed one. 2 SGML should have been mentioned explicitly in the closing document, but at the last minute some delegates objected that such details were too low-level to deserve mention in such a statement of principles. The objection was presented as being stylistic, but may have been partly substantive. In any case, the planning group at Vassar were unwilling to commit themselves to SGML without reservation, because it was not clear how well SGML proper could handle the multiple incompatible hierarchies necessary for a lot of textual research, and some objected to what they said was SGML's verbosity. The SGML supporters did succeed in persuading the group that SGML should be used, unless experience showed it simply could not. (We know full well experience will show no such thing.) Whether we try to formulate formal document type definitions or not remains to be seen, but given the unregulated habits of the texts we study, cleanly defined hierarchies of the sort DTDs are designed for won't be very easy or do anyone much good. (The OED people said that they use SGML syntax but have never bothered with a DTD and never missed one. The variety of entries in the dictionary, they said, is such that a type definition couldn't be written in advance anyway, and written after the fact would just be an inventory of the various forms of entries they had empirically found.) In any case, we are hoping not to re-invent SGML. In fact, some people were very interested in attempting to use SGML for the metalanguage required to describe existing encoding schemes, but I am uncertain whether SGML itself will be useful in defining the syntax and semantics of procedural markup or of old card-oriented encodings with author / play / act / scene / line references encoded in columns 73-80. But perhaps when I finally get my hands on a copy of the standard itself, I'll find out it can do all of that too. ------- [ end of extract from original note ] ----- HUMANISTs will be very interested in the article Coombs et al. have just published in the Communications of the ACM, and I encourage anyone interested in encoding texts or in using encoded texts to read it. Further clarifications and suggestions: 3 'multiple parallel hierarchies' (Ide) and 'multiple incompatible hierarchies' (above) seem not to be immediately clear to all. We mean: if you mark a text with BOOK PART CHAPTER PARAGRAPH SENTENCE TOKEN you have one hierarchy, but marking the same text with VOLUME PHYSICAL-PAGE PHYSICAL-LINE TOKEN gives a different and incompatible hierarchy, as does PART CANTO STANZA LINE FOOT SYLLABLE. Physical lines cannot fit at all into the text-pragmatics + syntax hierarchy, and neither can the metrical units. But physical layout and metrical units are crucial for analytic bibliographers and metrists. 
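A toy sketch in Python of why such units cannot live in a single tree; the tokens and span offsets are invented solely for the illustration:

    # Eight tokens standing in for a short stretch of verse.
    tokens = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7"]

    # Two hierarchies over the SAME tokens, as (name, start, end) spans with
    # the end offset exclusive.  The sentence crosses the line break.
    syntactic = [("sentence", 2, 6)]
    physical  = [("line-1", 0, 4), ("line-2", 4, 8)]

    for name, start, end in syntactic + physical:
        print(name, "covers", tokens[start:end])

    def nests(a, b):
        """True if one span lies entirely inside the other."""
        (_, s1, e1), (_, s2, e2) = a, b
        return (s2 <= s1 and e1 <= e2) or (s1 <= s2 and e2 <= e1)

    for s in syntactic:
        for p in physical:
            if s[1] < p[2] and p[1] < s[2] and not nests(s, p):
                print(s, "overlaps", p, "-- no single tree can contain both")

Whichever hierarchy is chosen as the document's tree, elements of the other will sometimes begin inside one of its elements and end inside another, which is exactly the situation the optional portions of the standard, mentioned next, were meant to cover.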
SGML formally allows definitions of such messy multiple hierarchies only in its optional portions -- so some SGML applications won't be able to handle such parallel hierarchies 'correctly'. No one seems to know whether this will matter in practice. Michael Sperberg-McQueen, Univ. of Illinois at Chicago ========================================================================= Date: 8 December 1987, 09:58:27 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Bibliography or bibliographer needed (40 lines) Contributed by "Rosanne G. Potter" I am editing a book on Literary Computing and Literary Criticism containing essays by Richard Bailey, Don Ross, Jr., John Smith, Paul Fortier, C. Nancy Ide, Ruth Sabol, myself and others. I am looking for someone who already has, in a fairly advanced state, a bibliography on this subject, or who is an experienced bibliographer and can put one together in the months of Jan and Feb (at the latest). Anyone who meets either of these descriptions or can suggest the name of someone who could fulfill this need, please let me know. The book is completely written; a copy could be sent immediately to anyone seriously interested in the project. The current situation is that Art Evans at U of Penn Press wants to publish the book and is, in fact, planning to get it into the Fall List, but we are both waiting on two readers' reports--the UPENN board is not as enthusiastic about the possibility of a collocation between LIT CRIT and Computing as either Art or I wish--so they must be convinced by the readers' reports. Whether Penn publishes it or not, I have little doubt that I will be able to find a suitable publisher--though not likely one who will publish it as quickly--and that a bibliography will be required by some reader or board soon. (I'm surprised it hasn't happened yet.) Rosanne G. Potter Department of English Iowa State University Ross Hall 203 (515) 294-2180 (Main Office) (515) 294-4617 (My office) (515) 232-4473 (Home) BITNET: GG.BIB@ISUMVS or S1.RGP@ISUMVS ========================================================================= Date: 8 December 1987, 10:06:53 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Text archives, centers (92 lines) Contributed by "Michael Sperberg-McQueen" Jim Coombs asks whether a North American text archive would get us anything we can't already get from Oxford, and if so how it should be funded and organized. I think it can, and this is why: 1 First, note: We *don't* need a North American text archive just as an archive or text repository. In this area, Oxford's work can hardly be faulted. They take everything, they work hard to document everything, they distribute as freely as their donors allow them to. 2 A North American center, though, could and should be set up to be funded by a number of universities. No one university in North America is likely to fund the kind of public service Oxford performs, let alone anything more. But ongoing funding from many schools could make it possible for a center to do some things that the Oxford Archive just does not have the funding or staff to do. 3 A North American center does not (thank heaven) have to compete with Oxford; it would make far better sense to work in cooperation. Oxford (in the person of Lou Burnard), it pleases me to say, agrees. 4 A center should provide a locus for cooperation in all the areas where universities now must pay large amounts of money to re-invent the wheel.
That is:
   a  creating new machine-readable texts (preferably according to some rational plan, as well as on demand)
   b  documenting existing machine-readable texts both for users and for collecting libraries -- that is, the center should provide a basis for cooperative library cataloguing of machine-readable texts and the distribution of the catalogue records to the library community
   c  upgrading existing machine-readable texts, converting them to a standard format and checking (or spot-checking) their validity
   d  distributing all these texts to scholars
   e  training of users and of computer advisors/consultants, via summer seminars, short-term grants to individuals to work in residence on their projects (in funding-agency terms, acting as a re-granting agency, I think this is called)
   f  (possibly) assisting software development, either by helping establish and encourage cooperation among university-based developers or by performing development work of its own. (Frankly, I'm a little unsure how useful or feasible this is, but it's a point one often hears, so I mention it.)
This is not an exhaustive list. It reflects what I know happens at Oxford, Toronto, Penn, BYU and such places. Also what ICPSR (the Inter-University Consortium for Political and Social Research) does now. 5 Funding -- it seems to me the universities should pay for this center, just as they do for ICPSR. We don't want just a consortium of humanities-computing centers, because many universities don't choose to support humanities computing that way. We want services to include enough library services that at least some university or college libraries will want to join. We want data distribution to be important enough that local data archives will want their schools to join. We want enough emphasis on humanities research that humanities departments will lobby for membership, enough benefits to computing consultants/support staff that computer centers will be in favor too. Who pays the membership fee will obviously depend on the internal politics of the institution, but the membership should be by institution. (Obviously it also must be possible to support the needs of independent scholars. But arrangements to that end must not allow schools to reap the benefits of having a center while evading the costs of supporting it.) The motive to join must, I think, be partly altruistic, partly financial. By joining such a consortium a school can help support humanities research in general, and get an awful lot of data free. Not joining, then, must mean the data costs money. It is not hard to figure that a consortium membership can be far cheaper than acquiring a scanner, paying maintenance on it, and running it within the school. Joining must also be preferable to buying the data from the consortium. 6 A center like this could support text-encoding standards in ways I think Oxford would find difficult. Without some deep changes in its funding, the Oxford Archive can't hope to convert its holdings to a new standard. A new center could make that part of its raison d'etre. 7 Obviously, there is no need to limit the membership of a consortium such as I have just described to North America. But I think that's where the need is, research in Europe being organized on different lines.
8 This concept of a consortium-supported general-purpose archive and center contrasts sharply with that of cooperation among humanities computing centers and with that of a set of regional or discipline-based centers, which have been propounded in recent years from some quarters. I hope those who prefer those plans to this will be persuaded to describe to us how they would prefer to see things organized. That would be extremely useful to us all. Michael Sperberg-McQueen, University of Illinois at Chicago ========================================================================= Date: 8 December 1987, 10:11:52 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: In reply to Robert Amsler on ACH Text Markup Contributed by "James H. Coombs" Robert says: I would also however like to make a strong statement that from a computational perspective there is no need for any one format to be the only one used. What is needed is that any format must be fully documented and an information-preserving transformation of the contents of any approved standard format. This was captured in the statement that the standard would be an `interchange' format. First, I am confused by the word "format." I would like to see something more specific, such as "markup language." Perhaps ACH does intend something broader than markup language though. I need a definition to know that I know what is being referred to. Does "format" here include things like the location of markup? Microsoft Word, I am told, stores markup at the end of the file. Well, that seems to me on further thought to be a markup language with something like a postfix syntax. While it's true that we can process files with a variety of markup languages, we need more than full documentation. We have full documentation of the procedural markup language for Waterloo SCRIPT, for example, but that markup language does not provide us with the information that we need; instead of telling researchers that an entity is a "verse paragraph", it tells Waterloo SCRIPT what procedures to perform at a particular point in the text stream. Perhaps "information preserving" is intended to capture this notion somewhat. Well, the information must be encoded in the text in the first instance before we can preserve it, and we want researchers to encode the information when they enter the text. An "interchange format"? Again, I'm a little confused, or I'm not convinced that primary needs are being addressed. We have standards for document interchange that preserve nothing but formatting information, e.g., font changes. If the primary instance of a text encodes nothing more than formatting information, then we will have information preservation, but the information that we preserve will not be the information that we really need. We will know how to print a text, but we won't know (computationally) what the individual entities are. I imagine that Robert and others agree with everything that I have said. What I am asking for is more precision and, above all, a commitment to descriptive markup. In addition, I am asking for some restrictions on the markup languages. Specifically, markup should be contiguous with the elements that are being marked up; markup should appear in the text stream. Yes, we can process files that store all of the (electronic) markup at the beginning or at the end, or part here and part there. But why should we invite this complexity?
Can we reasonably expect every scholar to have access to programmers who can convert many formats into one? Or can we reasonably expect every scholar to have a separate concordance program, a separate retrieval program, etc., for every possible markup language? or for every markup language that is used in one of the texts that the scholar needs? While it may be convenient in the short term for someone to sit down and type in Microsoft Word, our acceptance of Word documents would be very expensive to many people for many years. What is the value of a standard that allows this? it is important to note that an SGML-like format may appear as very formidable to users who believe they will have to type in all the special codes manually--whereas a `keyboarding' format may be just as faithful in representing the information without undue burden to the typist. I'm sure you will agree to this since your excellent CACM article notes that one of the most overlooked forms of markup is the use of traditional English punctuation and spacing conventions. If SGML appears formidable to people, let's educate them, and let's develop software that minimizes the effort. Currently popular software seems to minimize markup effort, but it fails to record sufficient markup. Unaware of the deficiencies of their software, people say that they want more fonts, for example, and that they are not interested in descriptive markup. We need to make it clear to Microsoft, Dragonfly, and others that we need descriptive markup. I think that we want to stay away from the word "keyboarding." One of the complaints has been that people do not want to "keyboard" markup. So, if we use "keyboarding" for the act of performing scribal markup (punctuational and presentational) but not for the act of typing descriptive (and referential) markup, then we invite confusion. Clearly Robert is referring to the use of punctuation, and he must also be referring to the use of presentational markup (e.g., skipping space between paragraphs). Both of these forms of markup have deficiencies that descriptive (and referential) markup do not have. Above all, they are ambiguous; in addition, they are often much harder to parse. First, ambiguity. Periods are used to end sentences and they are used to indicate that a string of characters is an abbreviation (Mr.). Perhaps even worse, the same character is used to indicate that a word is possessive and to indicate the end of an imbedded quotation, e.g.: a) She told him, "Do not say 'dogs' house' anymore." How much time do we want to waste on developing algorithms to parse this markup? Why not use (b) instead? b) She told him, <q>Do not say <q>dogs' house</q> anymore.</q> Software can easily display (a) when it has recorded (b), but it cannot easily generate (b) when it has recorded (a). Of course it is easier for most of us to enter (a) than it is to enter (b); it is always easier to do half the job. Once we start accepting this responsibility, we will start convincing software developers to support our needs, and entering (b) will not require much more than entering (a) does now. Why would anyone want to record (b)? Well, they might want to print the text with open and close quotation marks. They might want to study all of the quotations, or all of the imbedded quotations. They might want to study the use of possessives. And so on. Similar problems occur with presentational markup.
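Before turning to presentational markup, a minimal sketch in Python of the asymmetry just claimed for examples (a) and (b); the <q> element shown in (b) is only a stand-in chosen for the illustration, not a tag prescribed by any standard:

    import re

    # Descriptively recorded form: quotations are tagged, and the only
    # apostrophes left in the character stream are possessives.
    recorded = "She told him, <q>Do not say <q>dogs' house</q> anymore.</q>"

    def display(text):
        """Render nested <q> elements with alternating double and single quotes."""
        out, depth = [], 0
        for piece in re.split(r"(</?q>)", text):
            if piece == "<q>":
                out.append('"' if depth == 0 else "'")
                depth += 1
            elif piece == "</q>":
                depth -= 1
                out.append('"' if depth == 0 else "'")
            else:
                out.append(piece)
        return "".join(out)

    print(display(recorded))
    # She told him, "Do not say 'dogs' house' anymore."

Going the other way, from the punctuated form back to the tagged form, requires deciding for every apostrophe whether it closes a quotation or marks a possessive, which is just the parsing problem described above.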
Yes, if we have a one-to-one mapping between presentational markup and text element, then presentational markup records all of the information that descriptive markup does. We don't really need tags for each line of poetry in *Paradise Lost*, for example. We need know only that each line of poetry is terminated with '\n', for example. There is no conflict with SGML here, however, since SGML supports this method of marking up texts. In fact, in such a case we don't really have presentational markup at all, we have descriptive markup; the markup serves not to enhance the presentation but to identify that a stream of text is a line of poetry. You also load things a little with the phrase "undue burden," Robert. In part, I am arguing that there is a "due burden" that scholars must accept if we are to get anywhere in this whole project of using computers to assist our scholarship directly. Part of that "due burden" is the proper encoding of texts. In addition, I think that you over emphasize the costs of entering descriptive markup. You do so partially by implicating that presentational markup is easier to select and perform. We argue in our article that presentational markup is considerably harder to select, and that there is no pretheoretical motivation for believing that either form of markup is easier to perform than the other. In addition, you seem to classify markup as presentational whenever it does not consist of tags. Under our functional definition of descriptive markup, at least, the markup that you are talking about is actually descriptive markup. In any case, the sort of markup that you are talking about is provided for under the SGML standard. I thank you for your kind words on our article. Our next article will help clarify the distinctions that we make and how we are making them. For the present, it seemed more important to make people aware of the advantages of descriptive markup. I hope that my response does not seem overly microscopic. I find again and again that conceptual confusion leads to unnecessary practical problems. In order for scholars to decide what form of markup to use, they must know clearly what the competing forms of markup are and what each form has to offer. Finally, your discussion of thesauri and tag sets is interesting. I'm not sure that I have anything to add to it. Need to think about it more. Cheers! --Jim Dr. James H. Coombs Software Engineer, Research Institute for Research in Information and Scholarship (IRIS) Brown University jazbo@brownvm.bitnet ========================================================================= Date: 8 December 1987, 10:18:46 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: archival politics (50 lines) Contributed by Lou Burnard [This message was delayed as a result of finger trouble on my part - I sent it to the wrong node - LB] " If I can't count on a text from a particular archive to meet my needs, what is my motivation for bothering with that archive; and what is the motivation for the archive's existence? I certainly would not want to see it supported by public funds." (JAZBO on Friday) This is fighting talk! The only defence I can offer is that a community gets the Archive it deserves. If you guys don't have the sense to agree on a common language, why should the humble archivist be expected to do it? On the contrary, I could argue that I had a responsibility to preserve accurately the current state of affairs as a dire warning to future generations. 
I expect librarians would like to insist that all publishers produced books to the same dimensions too (makes the shelving so much easier dontcha know). I expect there were even once some librarians who did so insist. But I doubt whether they won many friends. There is a self-evident crying need "to set and maintain standards". But it has to come from the community of users. Once a standard has been defined, it is possible for an archive to indicate whether or to what degree a text is conformant to it, and that is certainly something every user has a right to expect of an archive. Once a standard exists it is also reasonable to expect an archive to seek ways of converting and enhancing nonconformant texts. But I don't think a general purpose deposit archive has any right to decide what is or isn't acceptable until such standards have been defined. After all, most of the texts we have WERE useful to someone, at least once. Finally, may I with the greatest deference point out that an archive is emphatically not the same as a publisher. Publishers have to please their public or they go under. An archive is a mirror of its users. If all that its users wish to share is rubbish, reserving the best quality stuff for themselves, then the archive will be full of rubbish. It's up to you. Lou Burnard Oxford Text Archive ========================================================================= Date: 8 December 1987, 10:33:48 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Archives Contributed by dartvax!psc90!jdg@ihnp4 (Dr. Joel Goldfield) Regarding Jim Coombs questions concerning Michael Sperberg-McQueen's queries and comments, having a text archive at Oxford but not in North America as well seems adequate if a pledge is made by Oxford to supply these texts at a reasonable price (to be determined) and reasonably quickly. The only negative aspect I can think of at the moment if these conditions are met is that it would certainly be costly to download them via transatlantic (satellite) communication. I would hope that telephone/ modem linkage to receive this information would be cost-effective and that we wouldn't be limited to sending CD-ROM's or magnetic tape by mail. --Joel D. Goldfield Plymouth State College (NH, USA) ========================================================================= Date: 8 December 1987, 10:38:11 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Analysis of papyrological mss Contributed by Jack Abercrombie In the interest of improving a program for papyrologists, the Center for Computer Analysis of Texts is willing to make available to colleagues a preliminary version of a program for mss analysis. Manuscripts first must be digitized and stored in a TIFF format, a common file structure used in desktop publishing. The program allows one to enhance the digitized image on an EGA screen. If you have serious interest in assisting in the development work, we would be willing to send you the source code. You would have to have access to a digitizer. WRITE TO: JACKA @ PENNDRLS. John R. 
Abercrombie Assistant Dean for Computing, Director of the Center for Computer Analysis of Texts (University of Pennsylvania) ========================================================================= Date: 8 December 1987, 10:44:17 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Coombs' ``Markup: On Requirements'' Contributed by Richard Giordano I really can't see what all the fuss is about. If people are serious about creating both national and international standards for data "markups", a data archive, and such related issues, I don't see why we don't work in close collaboration with at least these three organizations: The American Library Association; the Library of Congress; and the Research Libraries Group. They have the resources, know-how, and institutional connections to develop such standards, communication formats, and the like--and they have a track record in this regard that extends back over twenty years. Someone mentioned somewhere here that ALA was "a conservative bastion". I have no idea what he means by this. Traditionally, ALA and LC have both taken the lead in the scholarly world in providing machine-readable information. The technical problems that LC has addressed have been fundamental to data processing. Rich ========================================================================= Date: 8 December 1987, 10:47:21 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Sending messages to HUMANIST, an editorial plea (30 lines) Dear Colleagues: The new arrangement, whereby I intercept all messages to HUMANIST, seems to have worked well so far, but about one thing some confusion has arisen. Messages apparently intended for distribution sometimes are sent to me directly, that is, to mccarty@utorepas.bitnet, rather than to HUMANIST, i.e., humanist@utoronto.bitnet. My life would occasionally be made simpler if you would all adopt the convention of sending messages for distribution only to humanist@utoronto, even if you want my opinion on whether or not they should be distributed. (In that case, put a note to that effect in the message; I can easily delete the note.) If you want to write to me *as editor* of HUMANIST, then please send your message to mccarty@utorepas. Finally, beware of the distinction between UTORONTO, where HUMANIST lives, and UTOREPAS, where I electronically reside. Thanks very much for making HUMANIST a lively place. Yours, W.M. _________________________________________________________________________ Dr. Willard McCarty / Centre for Computing in the Humanities University of Toronto / 14th floor, Robarts Library / 130 St. George St. Toronto, Canada M5S 1A5 / (416) 978-4238 / mccarty@utorepas.bitnet ========================================================================= Date: 8 December 1987, 12:29:31 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: A North-American text archive and service centre (80 lines) Contributed by Ian Lancashire Michael Sperberg-McQueen argues for a national North American text archive and service centre, supported by a consortium of colleges and universities. He contrasts this to a consortium of humanities computing centres, which (because it involves fewer institutions) can be perceived as serving only a small percentage of faculty and students. He also challenges someone to dispute this. I'm suspicious of any proposal to centralize computing needs about one data-processing shop. 
Competition is the essence of being American, isn't it? The more heads at work on a problem, the better chance of finding an answer, or ideally something completely new that we didn't expect in the first place. Most of us have just won the fight for personal computing equipment and software: resources for which we're beholden to no-one because they are in the marketplace, available for a price that's affordable even to students (or should I say even to faculty). Hasn't centralized computing lost the war in most universities? Do we want to perpetuate it on a national scale? The more people creating text archives, the better, because what we need are specialized collections from the scholarly editors who have previously worked only with paper books. Will the research projects set up to edit works by individual authors trust a central archive to do their work for them? Surely not. Look at the same argument for centralized software provision on a national scale. You can find clearinghouses of MS-DOS programs at North Carolina and at Wisconsin, and competitors emerge monthly from the woodwork. Our colleagues cannot agree to accept only one place for a software depository and distribution centre. They long ago rejected centralized software development because business proved it could produce far better work than any academic could. I'd rather buy my car from a car dealership that's in business for the money than from the government or from my engineering colleagues who occasionally build faster, more efficient cars for academic reasons. Few people in this field will argue with the idea of cooperation or consortia. The question Michael poses is, should the consortium be a collection of workers or a collection of customers? Probably a consortium of humanities computing centres and facilities would be a good beginning to persuading our colleagues (wherever they are, inputting whoever) that a circle has more strength than a scattergram. We could at least help make the market for machine-readable texts profitable enough that companies now selling them (in the States, Electronic Text Corp. comes to mind) do well enough to subsidize (modestly, from royalties) further reliable machine-readable editing. [-30-] ========================================================================= Date: 8 December 1987, 16:52:09 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Text encoding (63 lines) Contributed by "Michael Sperberg-McQueen" Four quick observations on text encoding provoked by the recent barrage of postings: 1 Jim Coombs is right to praise the better information content of descriptive tagging, but still we should not require descriptive markup for *all* texts. Confronted with a printed book or a manuscript, there will be cases where we don't *know* whether something is a 'chapter' or a 'section'--what we know objectively might be that there is a page break followed by centered 14-point Baskerville saying XXXX, followed by 28 points of white space, followed by text. Everything more is interpretation. If we do have an interpretation, I'm in favor of encoding it in descriptive markup. But sometimes we won't and won't want to. The Carmina Burana manuscript is a classic example of this: it has been rebound and the gatherings re-arranged at least once, and different parts of the manuscript (and different hands) may well reflect multiple attempts to impose some (mutually incompatible) structure(s) on the collection.
It would be sound practice to separate an editorial judgment on the intended structure(s) of the manuscript from a codicological description of the information that leads to that editorial judgment. The First Folio of Shakespeare, similarly, must be encoded with detailed typographic information if it is to be used for textual criticism, since the position of a word in the line, on the page, within the gathering, and within the volume, are all relevant to judging the authority of the word and its spelling. 2 Yes, the coming flood of machine-readable texts will overwhelm the material we now have in the machine, but still we must make our peace with (a) other (existing) markup schemes and (b) specifically presentational and procedural markup schemes. They will continue in use at least for a while and we must provide migration paths into the new scheme if we can. And markup restricted to font, etc. may be a useful first step in analysing any complex text, as dictionary work by Amsler and by Raimund Drewek at Zurich seems to show. 3 No, people should not have to have one concordance program for every encoding scheme in the world. (That is the current situation, though.) But many people do have large software systems built around specific formats. There is no need to cut them off, if we can develop one scheme capable of representing texts in those special formats without information loss. Given N different encoding schemes, such a universal scheme would reduce the translation problem from magnitude N * N to magnitude 2 * N. That, I believe, is a good reason to work for an "interchange format," and a good reason to accept in the interchange format whatever level of information is in the source. (Specific recommendations for minimal markup content shouldn't prevent this.) Eventually, we can always hope software developers will see that they might as well work directly with the interchange format rather than engaging in preliminary translation. But first we have to survive in the existing world dominated by existing schemes and non-schemes. 4 The library community is interested in machine-readable cataloguing data, and some of it also interested in collecting and cataloguing machine-readable data. But are they also interested in creating it? If so, then yes we should surely cooperate with them. But the only useful basis for any cooperation is for each group to be clear on its own point of view. And that is what all this fuss is about. Michael Sperberg-McQueen, University of Illinois at Chicago ========================================================================= Date: 8 December 1987, 16:57:06 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: more archival politics (this one shd run and run) Contributed by Lou Burnard Disagreeing with Ian is not to be undertaken lightly. Nevertheless... "I'm suspicious of any proposal to centralize computing needs about one data-processing shop." An archive is not a data processing shop. "Hasn't centralized computing lost the war in most universities? Well, actually, no, it hasn't - not at those where it's been recognised that there's room for both private and public resources anyway. Some of us didnt even know there was a war going on... "The more people creating text archives, the better" Maybe we need a definition here. The more people creating *text resources* the better, of course. But the more centres competing to archive and secure those resources? I'm not so sure! How many libraries does your university need? 
Ours has far too many - and when it started thinking about the problems of integrating their various catalogues, it soon became apparent that no one library could impose its will on the others. So, guess what, a consortium emerged. A centralized quasi-official embodiment of the university's collective desire to bang the librarians' heads together until they started squeaking in tune. I'm all in favour of competition and the American Way (I want to see New York again too). But an archive has responsibilities which distinguish it very sharply from data producers or consumers. Recently, an organisation called the Knowledge Warehouse came into being here in the UK. It was funded by a consortium of UK publishers as a private company and also got a grant from the British Library. The idea was to set up some sort of archival service for publishers typesetting tapes etc. The scheme looked good on paper and had a lot of money behind it. But it doesnt seem to have been successful. The consensus amongst those I've talked to was that too few publishers wanted to play ball with an organisation which they at least perceived as a competitor. The moral I draw from this is that just as with books, there is a place for bookshops and private collections and state-owned and maintained great libraries, so is there a place for electronic text corpora and private collections of texts as well as for great archives. But it's important to distinguish them, because their roles and priorities are quite different. I wouldnt put a bookseller in charge of a library - nor would I expect a librarian to make much money in publishing. Lou ========================================================================= Date: 8 December 1987, 19:01:37 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Text Encoding Contributed by "James H. Coombs" In reply to Robert Amsler's of 8 December 1987, 09:38:34 EST On closed vs. open tag sets, Robert concludes: This is how I see the problem of the selection of tags for text entities in documents. That is, if the system is completely open and `productive' there will be little commonality between author's selections--whereas if the authors are offered a wide-range of approved tags to select from, then they will manage to find tags which meet their needs. I would agree with Robert if I could bring myself to believe that we can develop a tag set that will genuinely meet our needs. The AAP tag set, for example, does not provide a "poetry quotation" tag; and we can expect scholars to realize that they can use a "list type" for poetry quotations in order to meet the immediate needs of 1) tagging an entity and 2) getting the entity formatted in a particular way. To some extent, we also have to say that this approach would meet many of the needs of descriptively marking up a text (as long as the chosen list type is used only for poetry quotations---with internal consistency). Some of the advantages of descriptive markup are lost in such an approach, however. Above all, the choice of the tag is not intuitive; both the original researcher and anyone using the text later will have to perform extra work to determine what "list type 2" is used for. I don't want to go on about this too long here, so let me just appeal to people's intuitions by saying that the tag "poetry quotation" has many advantages over the tag "list type 2". 
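A purely illustrative sketch of that last point (the tag spellings, the passage, and the key are invented for this example and are not the AAP's actual names): the same passage tagged both ways shows where the extra work lands.

    # A program can be told to treat either form as a poetry quotation,
    # but only the named tag carries that documentation with it.
    generic     = '<list type="2">Of Man\'s first disobedience...</list>'
    descriptive = '<quote.poetry>Of Man\'s first disobedience...</quote.poetry>'

    # With the generic form, every later user of the text must be handed
    # (and must remember to consult) an external key like this one:
    key = {'list type 2': 'poetry quotation'}

    print(generic)
    print(descriptive)
    print(key['list type 2'])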
(None of these advantages of the named tag are computational, however; once a programmer determines that he/she should do X to "list type 2", the two tags have equal value.) If we discount the sort of advantages that I am referring to (discussed in our article---I'm not hedging), then we can solve the problem quite easily: let's just have one tag with serialization. In addition, we can close the set by specifying upper and lower bounds. So, all elements will be tagged "<En>...</En>", where 'n' is some integer in the range of 0 to 4096 (or (2**32)-1?). I doubt that anyone will need to tag more than (2**32)-1 entity types in a single document. Then, people just need to provide us with appropriate documentation, e.g.: I used the following tags: E0 for paragraphs, E1 for poetry quotations, . . . , E2347 for passages that allude to Genesis. Of course, people will immediately say things like, "Let's all use <E0> for paragraphs." The many motivations that would cause such a response are the same motivations that cause us to provide <p> for paragraphs in the first instance. Because we cannot predict all entities that people need to mark up, we tend to throw our hands up in the air and say one of two things: 1) Let's just fake it from here on out and provide several list types. 2) We need to keep the tag set open. 3) [another approach that I am missing??] AAP has chosen approach (1) and then said go look at the standard if you really need something else. (The information for developing AAP-compliant documents with user-defined tags is not provided in the authors' documentation.) The deficiencies of approach (1) should be immediately clear to us when someone like the AAP ignores something so basic (to humanists) as poetry quotations. Moreover, if I am really analyzing a document, I will quickly run out of AAP list types. (And I don't think that I could twist things quite so far as to use a list type for my allusions anyway.) Who is capable of providing a closed tag set that addresses these problems? Yes, the serialized <En> approach addresses them to some extent, but then what have we gained over "list type 2"? Ok, so perhaps we remember to provide a tag for allusions. But what tags will we provide for post-structuralist critics? For the next major critical theory? I agree that we should provide "a wide range of approved tags to select from," but I think it even more important to ensure that documents are marked up descriptively. (I recognize that I am close to equivocating in my use of "description." I am not fully satisfied with the functional definition that we offer in our article. Renear and I are working on this, and it gets complicated quickly. Basically, however, I want to say that "poetry quotation" is descriptive in some way that "list type 2" is not.) And, a posting just arrived from Michael Sperberg-McQueen, who argues that descriptive markup is not always appropriate. I suspect that Michael is saying that we sometimes need to describe the manuscript instead of the abstract text; in which case, we still want descriptive markup (i.e., we don't want Waterloo SCRIPT font instructions; we want something that says that X is/was in F font). In any case, I can hedge and conclude: insofar as a text is susceptible to description, it should be marked up descriptively and, further, that tag sets should be 1) open and 2) descriptive in this more intuitive sense of descriptive that favors "poetry quotation" over "list type 2". --Jim Dr. James H. Coombs Software Engineer, Research Institute for Research in Information and Scholarship (IRIS) Brown University jazbo@brownvm.bitnet ========================================================================= Date: 8 December 1987, 19:06:03 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: North American Archive(s) Issue Contributed by amsler@flash.bellcore.com (Robert Amsler) How about a Humanities Archive Network (HumAN)? I think we have an opportunity to do something considerably greater than the Oxford Archive and in fact an obligation to do this because of the state of networking available in the USA. What I'd propose is a collection of sites across the country ALL offering to host the archive or provide access to its data via their computing facilities. We should be thinking of downloading of information electronically as the PRIMARY means of distribution of archive data, with only rare recourse to writing the information out onto magnetic media as a dissemination method.
The model I have in mind is based upon that used for the ARPANET's Network Information Center (NIC), which maintains a list of software and personnel at all the sites it serves. One can access this database via connecting to it from anywhere on the network, and determine where the data you want is located, and set about its retrieval by either anonymous remote login and file-transfer-protocol (FTP) downloading of the data; or finding out who to contact as the holding institution's network liaison. So... data would be distributed around the country as suited the individual institutional member's computing facilities. Some institutions might opt to have copies of everything; others to themselves store nothing, but instead to keep texts they created on equipment elsewhere. Each member institution would have a designated liaison who maintained contact with the central information resource center which itself kept a complete database of what was available where, both in terms of data and computing facilities (not unlike a list of libraries, their holdings and research facilities) and also of researchers and their interests and how to reach them electronically. This part of the Humanities Archive Network would require funding, as well as the creation of the HumAN itself--though this is becoming easier and easier as more and more research communities take to setting up their own networks. I would think the NEH ought to find such a proposal well justified in terms of the potential multiplier effect it would have upon the entire field of (computational) research in scholarship. Robert A. Amsler Bellcore Morristown, NJ 07960 ========================================================================= Date: 8 December 1987, 19:12:29 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: The Humanities Computing Yearbook (53 lines) Contributed by Willard McCarty (in this case as YEARBOOK@UTOREPAS) Dear Colleagues: As some of you will know, I am gathering information about interesting and worthy software for a new serial, the Humanities Computing Yearbook, to be published by Oxford U.P. The announcement for the Yearbook follows. Please send your recommendations to me, c/o yearbook@utorepas.bitnet. Thanks very much for your help. -------------------------------------------------------------------------- The Humanities Computing Yearbook On behalf of Oxford University Press, the publishers, the Centre for Computing in the Humanities is pleased to announce a new periodical, The Humanities Computing Yearbook. Ian Lancashire and Willard McCarty are the co-editors. An editorial board is in process of being set up. The first volume, scheduled for publication in the summer of 1988, aims to give a comprehensive guide to publications, software, and specialized hardware organized by subject or area of application. Research and instructional work in many fields will be covered: ancient and modern languages and literatures, linguistics, history, philosophy, fine art, archaeology, and areas of computational linguistics affecting text-based disciplines in the humanities. The more notable software packages will be described in some detail. We welcome your suggestions of what we should consider. We are especially interested in discovering innovative software that may not be widely known, including working prototypes of systems in development. 
Electronic correspondence should be sent to YEARBOOK@UTOREPAS.BITNET, conventional mail to the Editors, The Humanities Computing Yearbook, Centre for Computing in the Humanities, Univ. of Toronto, 14th floor, Robarts Library, 130 St. George Street, Toronto, Canada M5S 1A5. Our telephone number is (416) 978-4238. Please feel free to distribute this notice. Ian Lancashire Willard McCarty ========================================================================= Date: 8 December 1987, 22:00:06 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Text Encoding: salvaging texts (addendum) Contributed by "James H. Coombs" Oops. I should have added that I am not saying that people should throw away everything that does not accord with the standard. I am saying that the standard should not try to accommodate inadequate texts. I like (my interpretation of) what Lou Burnard says about them (implicitly?): they are "rubbish." Well, ok, so we may be better off recycling many of them instead of just throwing them out, but let's say that they are in the recycle bin and not that they are in the approved bin. --Jim Dr. James H. Coombs Software Engineer, Research Institute for Research in Information and Scholarship (IRIS) Brown University jazbo@brownvm.bitnet ========================================================================= Date: 9 December 1987, 09:03:00 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Text Encoding: salvaging texts Contributed by "James H. Coombs" Robert Amsler corrects my perhaps overly vigorous condemnation of texts that have been marked up procedurally. I have no intention of entering the American Heritage Dictionary from scratch or even of working with a version that has all markup stripped away. Such markup can help one considerably in the process of deriving a descriptively marked up version. (Just to clarify, I AM working with the AHD.) I don't feel the same way about *Paradise Lost*, however. Perhaps I am being overly vigorous again, but I would rather enter that relatively tiny (compared to the AHD) and simple document myself than spend the same time negotiating for a tape, getting it loaded on the mainframe, learning the markup system, writing the programs to convert it to descriptive markup, etc. So, first point, dictionaries are unusually large and complicated. Poems, even long poems, imbue one with the poetic experience even when the task is as mindless as keyboarding (but they better be good poems too!). We have a continuum, and we have all of that old philosophical stuff about points at which one would just prefer to enter and proofread than to negotiate, acquire, interpret, program, etc. Second and final point, my concern was with the value of a metalanguage. Correct me if I am wrong, but the fact that I would rather convert the AHD than enter and proofread it has nothing to do with our ability to develop a metalanguage that will generate(JHC)/parse(RA) both the procedural and the descriptive markup. Perhaps one CAN develop a context-sensitive grammar that will enable one to uniquely identify every element type in the AHD. I don't know anyone who believes that they can develop that grammar more quickly than they can perform partial conversion automatically and then finish up by hand. If it's that difficult to generate the context-sensitive grammar, won't it be much more difficult to generate a metalanguage?
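A minimal sketch of that "partial conversion automatically, then finish up by hand" route, assuming invented procedural codes (@IT@/@RM@) and an invented descriptive tag rather than the AHD's actual markup:

    import re

    # Hypothetical procedural input: font-switch codes around a headword.
    procedural = '@IT@abandon@RM@ v. To give up completely.'

    # Partial conversion: whatever the pattern recognizes becomes a
    # descriptive tag; anything it fails to recognize is left in place
    # and flagged for hand editing afterwards.
    converted, hits = re.subn(r'@IT@(.*?)@RM@', r'<headword>\1</headword>', procedural)

    print(converted)
    if '@' in converted:
        print('unconverted markup remains -- finish by hand')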
Now, if a single grammar will work for many dictionaries (and we actually have the need to convert many dictionaries), then it may be justified to develop the grammar. Is this what you are working on, Robert? My goal was (and remains) to discourage what seems to me to be a quixotic pursuit: the development of a metalanguage that will generate(JHC)/parse(RA) all forms of markup for all documents. The fact that one may be better off with procedural markup than without it in some/many cases does not address my claim that such a metalanguage is impossible or even my weaker claim that even if it is possible, it's not worth the effort (again, what's the gain?). --Jim Dr. James H. Coombs Software Engineer, Research Institute for Research in Information and Scholarship (IRIS) Brown University jazbo@brownvm.bitnet ========================================================================= Date: 9 December 1987, 09:04:34 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: archives, mark-up and money (75 lines) Contributed by Phillipa Mathieson Interesting how HUMANIST discussions on standards for text encoding, making publishers aware of the need for electronic texts, assessing copyrights for such texts, establishing text archives, and programs for text searching and retrieval all seem to come together. It almost sounds as if we were all aiming for the same thing: texts on-line in machine-readable format with the software to manipulate them, available to all who want them. The main question seems to be "whose money are we going to use to achieve this?" Ian Lancashire's analogy of buying a car from commercial dealers, and the acceptance by other HUMANISTS from time to time of "intellectual copyright," and of the restrictive mechanisms needed to ensure financial returns for the owners of that copyright, alarm me. I see no reason why academic grant money intended for humanist research should not be spent on laying down guidelines for the encoding of texts and the software to read them, and for distributing the results. And I agree with Lou Burnard that "the community gets the archive it deserves." If we aren't enough of a community, or interested enough in the disinterested rewards of scholarship, to share our work with others without demanding additional financial rewards (in most cases, over and above those already granted us by salaried positions in academic institutions or as staff members of publicly-funded educational projects), we don't deserve either the positions or the use of on-line materials. I recently discussed with a Toronto software firm my need to use a database program (Empress32 from Rhodnius) on a second computer with slightly different architecture from the first. We had already bought a licence to use the program on one, and we wished to use the same program for the same project on another. Their attitude was that a firm which expands and buys a new machine must pay for a second licence for the new machine. I think they saw this as a kind of tax on the profits of the firm which their software was assumed to have contributed to. When I said we had no profits, the salesman kindly tried to explain to me that they had to protect their copyright in the program by charging individual licences for individual machines: "If you wrote an article and someone else used it as the basis for his own work, without acknowledgement, and made a great success of it, you'd sue the balls off him."
This kind of commercial attitude has no place in humanist scholarship, and putting the development of archives and their software on a commercial basis will simply cheapen (in the sense of "lowering the quality"--certainly not in the financial sense) and restrict humanist activities. It is good to have an organized group establishing guidelines for text mark-up and doing so in an open forum. It would be bad to have a commercially-based central text archive system which discouraged individual scholars from making available their work by maintaining an arcane set of instructions for mark-up, which only "they" really knew how to insert so that the standard software programs could use it. Michael Sperberg-McQueen's reservations about the software-development function of a central archive system are a good sign: setting up a central archive system seems to me likely to lead to the development of software for the specifications of that archive, and if you add the commercial competition angle, we'll all end up paying through the nose for the software *and* the texts, and running round nervously trying to comply with restrictive copyright requirements for texts long since free from their original publication copyright restrictions. At which point, it will again become easier to type it in yourself, and the idea of a community of scholars sharing their work will bite the dust yet again. ========================================================================= Date: 9 December 1987, 09:14:37 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: What new information in texts of the Oxford Archive? (33 lines) Contributed by Lou Burnard The Text Archive gets a fair amount of criticism for not providing more information about the texts in the catalogue ('fair' meaning both "a modest quantity" and "justifiable"). As I am now embarking on a major overhaul consequent on a local change of mainframe, I'd like to start trying a bit harder to rectify this situation. Humanists and others who have a view can help by making some suggestions about what information they think ought (minimally) to be provided in the catalogue. I should stress that I dont have the resources to do a proper cataloguing job - not yet anyway. But some things that could be added to the current shortlist are 1. more bibliographic info (e.g. date of first publication/composition, genre etc) 2. some sort of code for level/type of markup 3. some sort of code indicating completeness, accuracy, level of verification 4. (probably not in the catalogue, but generated for each text) text profile i.e. everything that a program I havent written yet can deduce automatically about the text - size in records, tags used, character usage profile etc. Comments? Preferences? Concrete suggestions? Lou ========================================================================= Date: 9 December 1987, 12:46:41 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Archives and encoding (51 lines) Contributed by Richard Giordano From what I've been reading, the four issues seem to be - if there is to be a machine-readable archive, where should it be? - who will pay for it? - what constitutes a coding standard? Michael Sperberg-McQueen also includes the questions, who is going to do the conversion, as well as the coding? The library community certainly will not get involved in conversion efforts.
But you can be sure that the institutional structures already exist within the library community to both establish and maintain a data archive of machine readable text. They're in the business of collecting and making available information to users, and I think the best of them do a great job at it. Anyway, you can be certain that sooner or later--and probably sooner--the American Library Association is going to take up the issue. And when it does, the first thing that will come up is the establishment of a standard interchange format--much the same way that cataloging and other data is exchanged throughout the world in a standard MARC format. As for the Libraries point of view: nothing more and nothing less than to (1) preserve information; (2) index and describe the information so that users can easily get to the source information, as well as having an idea of what the information is about; (3) making the information available to users. There might be more to it than this, but I think this pretty-much covers it. It seems so obvious to me that the institutional structure exists, as well as the expertise, to establish a national archive of machine-readable texts, as well as assistance in generating a standard communications format. Libraries can also be of use in helping to establish practices by which text itself is indexed (since the indexing and retrieval of information for untrained users is at the heart of every librarian's professional education). Libraries, however, are not in the position to convert the sources into machine-readable form. Richard Giordano ========================================================================= Date: 9 December 1987, 14:41:51 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Salvaging Texts (20 lines) Contributed by Mark Olsen Funny that James Coombs should mention *Paradise Lost* since I am currently going through the process of pulling off of a tape and formatting it for my purposes. I think that he seriously over- estimates the effort required to use existing text data and under- estimates the effort required to scan and correct even a simple text. The materials stored at Oxford, Packard and ARTFL in any condition can be corrected, coded and formatted much faster than starting from hardcopy. Mark ========================================================================= Date: 9 December 1987, 19:05:35 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: A national archive for the U.S. (97 lines) Contributed by Jack Abercrombie We have been following with much interest the discussion on establishing a national archive center similar in some respects to the Oxford Archive. Many of you are not aware that some four years ago the staff of the Center for Computer Analysis of Texts (CCAT) submitted a working proposal to the National Endowment for the Humanities advocating the establishment of a US national center for textual studies (including archive). From the comments then received as the proposal was circulated by NEH as well as other comments received at the Grenell conference (1985) where our draft proposal was discussed by fifteen representatives from national and international centers, the following conclusions seemed accurate. First, a national center at that time did not have a "snowball's prayer in hell" of coming into being given the general lack of collaboration and cooperation amongst US institutions and their faculties on this very issue. 
Second, regional centers, an idea originally proposed to us by the late Art Hanson from Princeton University (1982), would be a better approach in that a regional/discipline specific center can concentrate on a few tasks and do them well with limited resources. With this in mind, we established the Center for Computer Analysis of Texts (1984). CCAT has focused on three specific areas both for internal and external users: building an accessible, unified archive for biblical and ancient texts, providing scanning services to colleagues just above cost, and assisting colleagues through consultation, information dissemination and software development. We realized then that sister institutions would start similar centers that might, but probably wouldn't seriously, overlap our own goals. In the spring of 1986 at the University of Toronto, we again proposed to the representatives of existing and potential centers (in the US, Canada and England) that we should share information on our archival holdings, and that we should coordinate more fully our efforts to add texts to our archives as well as in software development. Of the six centers represented at that meeting, there seemed to be general agreement that it was a good idea to try to federate our efforts to avoid duplication and to cut costs in supporting international, accessible archives. Again we proposed to seek funding to make this a reality as well as to solve some other minor problems within the proposed consortium. Our hastily prepared proposal to begin implementing these ideas was submitted to NEH and severely criticized by some reviewers. We accepted the reality here, and have proceeded to work with other equally concerned institutions to make them aware of our archival holdings and to keep them informed on the projects taking place at CCAT (e.g. CD-ROM Project). This chronicle of frustration and also hope, we think, is instructive, because it points out that the ideal (that is, a national archive or even a federated system of archives) may not be realistic given the number and nature of the relevant participants. The reality, regional and discipline specialized centers, continues to grow in many positive ways. Unfortunately from our perspective, we would like to see more coordination than is possible as long as we work within the blinders of discipline, university, nation, etc. At the very least, centers should be sharing, as some already do, information on their archival holdings and additions to their archives whether by acquisition or data entry. (NOTE: To obtain information on CCAT's accessible archive request information sheet from CCAT, Box 36 College Hall, Philadelphia, PA 19104.) Centers should also foster new ways for cooperation and collaboration. Towards this end, the Center for Computer Analysis of Texts in coordination with the Computer Assisted Research Group (CARG) of the Society of Biblical Literature has begun an ambitious project to prepare an archival list of biblical and other material deemed relevant to CARG members. A first step will be to build an archival list along the lines of the information submitted by CCAT to the Rutgers Inventory Project. The second step will be acquiring copies of the texts not in CCAT's archives and placing that information in the same, consistent format (that is, the present format or a future format as is being discussed) of all the other material in CCAT's accessible archive. Prepared by John R.
Abercrombie (Assistant Dean for Computing and Director of the Center for Computer Analysis of Texts) with cooperation from Robert Kraft (Coordinator of External Affairs CCAT and Director of CARG) ========================================================================= Date: 9 December 1987, 19:29:44 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Correction (re: Yaacov Choueka's affiliation) Contributed by "Michael Sperberg-McQueen" In the posting about the Vassar conference for planning the basic structure of the ACH/ACL/ALLC text encoding guidelines, Yaacov Choueka's affiliation was wrongly given. It should read: Institute for Information Retrieval and Computational Linguistics, and Department of Mathematics and Computer Science, Bar=Ilan University I apologize for the error. ========================================================================= Date: 9 December 1987, 19:30:58 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Warning about another Christmas virus Contributed by "Michael Sperberg-McQueen" We've already had several score, by now probably a few hundred copies of this turning up here; it may reach you next. If you are at a CMS site and receive a program called CHRISTMA EXEC, please (a) warn your postmaster and (b) discard the exec (or keep a copy for the postmaster to look at, but DO NOT RUN IT). This exec paints a Christmas tree on your screen and then sends itself to everyone named in either your NAMES or NETLOG files. The result is potentially serious stress on Bitnet and on your local spool system, and possibly a few system crashes here and there as the number of reader files soars and exceeds the maximum. The Christmas tree isn't all that pretty, and the joke is pretty mean. A word to the wise. Your postmaster will thank you. Michael Sperberg-McQueen, UIC ========================================================================= Date: 9 December 1987, 19:35:31 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: archives-coding-texts Contributed by Bob Kraft I really must finish up the task of insuring consistent ID coding for the dozens of texts on the forthcoming CCAT-PHI CD-ROM, or I would plunge in at length on the current Humanist discussions. Meanwhile, I will take a minute to UNDERSCORE the comments of Mark Olsen. It is hard for me to conceive of a situation in which it would be more efficient to rekey or to scan anew a text already extant in some electronically readable form. I also have *Paradise Lost* in the CCAT Archive, and formatted it into TLG Beta Code ID form last week, checking the results against a library edition. It probably took me about an hour, including making sure that every line began in upper case and that the "paragraph" type breaks in the poetry were indicated. This text will be on the CD-ROM and is available on IBM diskette to anyone who would like it for $25 (CCAT minimum charge) and who agrees (by signing the CCAT Users Contract) to use it non-commercially and responsibly. One person's "rubbish" is another's treasure. Some of the happiest hours of my weekend life have been spent in junkyards. Incidentally, CCAT texts come with a "convert" program to permit the user to change the file so that explicit book/line locators are inserted at the left margin of each line. 
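A minimal sketch of what such a convert-style utility might look like; the locator format and the sample lines are invented for the example and are not CCAT's actual output:

    # Turn implicit line numbering (a line's position in the file) into
    # explicit book/line locators written at the left margin.
    def add_locators(lines, book=1):
        out = []
        for number, text in enumerate(lines, start=1):
            out.append(f'{book}.{number:04d}  {text}')
        return out

    sample = ["Of Man's first disobedience, and the fruit",
              'Of that forbidden tree whose mortal taste']
    for line in add_locators(sample):
        print(line)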
This type of software development permits us to be consistent and frugal about coding the IDs without inconveniencing the user who might otherwise be mystified by the implicit nature of the ID system. To leave that task in the hands of others made no sense to us. We will handle "SGML-type" markup requests similarly, for existing textual materials. If people want concrete information about the issues raised in the current HUMANIST discussions, just ask. Few of the issues are hypothetical, at least to those of us already engaged in archiving, (re)coding, formatting, and distributing -- not to mention searching for funding and other types of support! Bob Kraft ========================================================================= Date: 9 December 1987, 21:31:31 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Rage for chaos, or, in praise of polymorphic encoding (50 lines) Contributed by Sebastian Rahtz I have just been reprieved from the gallows! I had approx 110 HUMANIST messages from the last couple of weeks in my mailbox which I hadnt really read, and I had been planning to print out the whole lot and read it at home tonight. Due to a combination of unfortunate circumstances (daily backup, me reading my mail etc), I have now lost the whole damned lot! I feel so relieved! Can I put in my trivial pennyworth, tho? Let's face it, I dislike SGML so much because it's UGLY. But however important it all is, could those who care about text archives gather in a corner away from HUMANIST for a while? I was under the impression that there was a conference about it recently, so is there a need for the same people to discuss it in public..... it all reminds me of archaeology. Some years ago field archaeologists in Britain used to bicker at every opportunity about standardisation of recording methods, and all the same arguments were trotted out every time. No-one ever agreed, various people said they would set up global answers, and even now there remain a multiplicity of schemes. Why did it all fail? Because the problem was really that people didnt know why they were collecting the data in the first place.... I for one no longer believe in absolute recording; I believe that each excavation record, or each encoded text, is a reflection of its creator, not the real world. But I apologize for dipping my toe in the text-encoding water; I vote for chaos, though, when the chips are down. Why? Because I used to be an archaeologist, and therefore I am interested in historical processes not in fossilisation. In the same way that I would have anyone who wanted walk all over Stonehenge because 20th C destruction of monuments is itself archaeology, so I wouldn't shed many tears if Lou Burnard's archive went up in flames (sorry, Lou), because the variety of texts lost is itself interesting (would we compare the loss of Lou's tapes to the destruction of Alexandria?). People who try and impose 'standards' on the world are basically misguided--variety is the spice of life. Sorry, is there some NEED to analyse all texts in the world NOW that I am not aware of? And there was I thinking scholarship was only a joke..... sebastian rahtz, computer science, southampton, UK ========================================================================= Date: 9 December 1987, 21:50:33 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: More Markup etc.
(62 lines) Contributed by amsler@flash.bellcore.com (Robert Amsler) I must admit I do have many reservations about the feasibility of coming up with a universal metalanguage for all markup schemes. I think this is first an empirical problem though, not a theoretical one. We need to know what markup systems are in use and how much text is/could be available in these systems. That will determine how much effort should be made to accommodate their markup system in any future standard. The exact trade-off point between developing a parser to read text marked up in an inadequate markup language and then adding `useful' markup vs. starting over and typing the text in with the `right' markup is a hard one to specify. Dictionaries are on the `use any machine-readable copy' side by a massive amount (i.e. probably the data entry effort is ten or a hundred times the effort of the `figure out how to use what they have' effort). However there is still another issue here, the likelihood that anyone else will want to mark up the text in a manner that you would find completely satisfying. There strikes me as being a large range of variations in descriptive markup, from noting simple text units to noting full interpretive tagging of historical and symbolic meaning `believed' to be associated with certain parts of a text. The inference I get from James Coombs's side is that there is somehow an easily understood common agreement as to what should be marked in a text. I am not certain I agree with that when one leaves the domain of markup which recreates the visible form of the original document and enters the interpretive tagging area. In fact, I would define `inadequate' markup as markup from which one cannot recreate the original form of the document--regardless of whether it is descriptive or procedural markup. What I've been concerned about is that one cannot tag a text with all the descriptive markup that everyone might want to be there. Could anyone imagine a historic text being published with ALL the commentary upon its meaning being interspersed in the text? We'd have to have tags with authors' names on them and maybe even dates. I think perhaps what is needed is a means of integrating interpretive tags with a rather sparsely marked up version of a document. That is, having a tag set which is stored independently of the text to which it refers and which can readily be sorted into the linear sequence of the document as desired. In fact I even imagine a futuristic world in which a scholar can distribute ONLY their tag set for a well-known work, such that the recipients can study it with their copy of the original text on a variety of software and hardware systems. Some might simply elect to have the `annotated' text printed out on paper for study--others to have it loaded into a hypertext system for interactive reading on-line. Robert Amsler Bellcore Morristown, NJ ========================================================================= Date: 9 December 1987, 23:06:31 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Heisenbergian mark-up (46 lines) Contributed by Willard McCarty Here's a brief and probably one-sided observation about textual mark-up, offered by someone interested chiefly in the themes and images of literary texts rather than in their syntactic structures or physical features.
Regardless of the medium, when I mark up a text for interpretation I am doing something like reading it, that is, taking it in, attaching to its words things I know, discover, or think about, and preserving all that along with the original text. I want to mark-up my own text because (a) marking-up in my sense is primarily an intellectual, not a mechanical activity, and (b) it is utterly dependent on some hypothetical construct I have or am developing. (Building this construct may owe a debt to things that can be counted, hence "objectively" tagged, but the construct cannot be verified by relating it to countable things.) At the same time I must always keep a clear distinction between the words as the author or editor has given them, and if I'm doing this electronically with proper software, I have the liberty of erasing easily the remnants of interpretation I no longer respect. Note that I am not making a distinction here between an "objective" text and "subjective" commentary; that distinction misses the point of literary criticism altogether. So, I don't want anybody's scheme for marking up (in my sense), and I don't expect my marked up text to be of interest to anybody either. Nevertheless, if I'm successful, the final result (an essay or book) will say something valuable to others. Can it be said that there are aspects of textual mark-up that do not have to take interpretation into account at all? Sebastian Rahtz has suggested that there aren't. Willard McCarty, Univ. of Toronto (mccarty@utorepas.bitnet) ========================================================================= Date: 10 December 1987, 09:15:25 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Copyright-free Texts Wanted (130 lines) Contributed by amsler@flash.bellcore.com (Robert Amsler) One project I and some others at Bellcore are interested in is an effort to integrate a dictionary containing citations to texts with the texts themselves. The OED is the dictionary we have in mind, though I am also working with the Century Dictionary (not yet in machine-readable form) and other dictionaries such as the Collins English Dictionary, the Merriam-Webster Seventh Collegiate Dictionary and the Oxford Advanced Learner's Dictionary. `Integrate' here means to provide access to the complete textual work from the dictionary definitions and vice versa, to provide access to the definitions from within the textual work. This is being envisioned as a form of hypertext access. The primary requirement is to obtain textual works which are cited in these dictionaries--which basically means most classical works in English. Appended to this message is a list of the most frequently cited authors and works from the OED (compiled by actually searching the OED database thanks to Frank Tompa's help).
29140 citations - Shakespeare
   1311 citations - Hamlet (1600-1)
   1034 citations - Love's Labour's Lost (1594-5)
    906 citations - 2 Henry IV (1590-91)
    877 citations - Merchant of Venice (1596-7)
    874 citations - King Lear (1605-6)
    868 citations - The Tempest (1611-12)
    865 citations - Romeo and Juliet (1594-5)
    862 citations - 1 Henry IV (1597-8)
    846 citations - Macbeth (1605-6)
    841 citations - Henry V (1598-9)
    834 citations - Othello (1604-5)
    821 citations - Merry Wives of Windsor (1599-1600)
    801 citations - Midsummer Night's Dream (1595-6)
    794 citations - King John (1596-7)
    779 citations - Richard III (1592-3)
    778 citations - Troilus and Cressida (1601-2)
    775 citations - As You Like It (1599-1600)
    705 citations - Measure for Measure (1604-5)
15499 citations - Scott, Sir Walter
    890 citations - The Heart of Midlothian (1817) [Novel]
    880 citations - The Fair Maid of Perth (1828) [Novel]
    694 citations - Guy Mannering (1815) [Novel]
    644 citations - The Antiquary (1816) [Novel]
    616 citations - Kenilworth (1821) [Novel]
    599 citations - Lady of the Lake (1810) [Poem]
    592 citations - Waverley (1814) [Novel]
    543 citations - Rob Roy (1817) [Novel]
    532 citations - Old Mortality (1816) [Novel]
    490 citations - Marmion (1808) [Poem]
    474 citations - The Monastery (1820) [Novel]
    428 citations - Ivanhoe (1820) [Novel]
    405 citations - Quentin Durward (1823) [Novel]
    344 citations - Lord of the Isles (1815) [Novel]
    328 citations - Woodstock (1826) [Novel]
11967 citations - Milton, John
   4945 citations - Paradise Lost
    648 citations - Samson Agonistes (1671) [Poem]
    640 citations - Paradise Regained (1671)
    625 citations - Comus (1634) [Poem] (A Maske presented at Ludlow Castle 1634: on Michaelmasse night etc.)
11000 citations - Chaucer
   1238 citations - Troylus (Troilus ? and Criseyde) (1382?) [8200 line poem]
    986 citations - (Translation of Boeth(ius)'s ``Consolation of Philosophy'') (1380?) [Prose]
    877 citations - The Legend of Good Women (1382)
    663 citations - Prologue (to The Legend of Good Women)
    549 citations - The Knight's Tale
    506 citations - The House of Fame
10759 citations - Wyclif
   1166 citations - Selected Works
   1072 citations - Works
    713 citations - Sermons
    474 citations - Genesis
    420 citations - Isa
    413 citations - Matt
    315 citations - Ecclus
    306 citations - Ps
    278 citations - Luke
    265 citations - Prov
 9554 citations - Caxton
   1282 citations - The Golden Legend (1483)
    718 citations - The Foure Sonnes of Aymon (1489?)
    668 citations - The boke yf (= of) Eneydos (1490)
    639 citations - The chronicles of englond (1480)
    610 citations - The historie of Jason (1477)
    457 citations - Geoffroi de la Tour l'Andri (the knyght of the toure) (1483)
    399 citations - The historye of reynard the foxe (1481)
    399 citations - The book of fayttes of armes and of chyualrye (1483)
 8745 citations - Dryden
11041 citations - Cursor Mundi
 5385 citations -
========================================================================= From: MCCARTY@UTOREPAS Subject: Two confusions about text-markup standards (30 lines) From "Michael Sperberg-McQueen" Sebastian Rahtz says: People who try and impose 'standards' on the world are basically misguided--variety is the spice of life. This may be true, but I should point out that no one on HUMANIST is trying to impose standards or anything else on anyone. As if anyone could! And if anyone must rely on the differences between TLG and SGML methods for encoding chapter headings to give spice to intellectual life, humanities computing is in even deeper trouble than I thought. Willard McCarty says: So, I don't want anybody's scheme for marking up [...]
and I don't expect my marked up text to be of interest to anybody either. [...] At the same time I must always keep a clear distinction between the words as the author or editor has given them, and if I'm doing this electronically with proper software, I have the liberty of erasing easily the remnants of interpretation I no longer respect. Would a conventional set of markup rules restrict one's freedom more than the conventional alphabets and syntax we already use? But the crucial point is that the "proper software" you describe cannot do its work without *some* encoding scheme. We have the choice of all of us developing software independently, so as to ensure that we use different schemes and make certain that once you have marked up your text with your software you cannot concord it with my software, and vice versa, or we can try to find a framework that allows sharing and flexibility. It is not standardization but chaos that produces rigidity and destroys freedom. -Michael Sperberg-McQueen University of Illinois at Chicago ========================================================================= Date: 10 December 1987, 12:23:15 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: "electronic paralanguage" (100 lines) From Dr Abigail Ann Young 1-416-585-4504 YOUNG at UTOREPAS This notice appeared in IRList, and I found it sufficiently interesting to pass along to HUMANIST. I apologise in advance to those who will be getting it twice! ********************************************** Date: Sun, 6-DEC-1987 12:38 EST Janet F. Asteroff Subject: Computer-mediated Communication . . . I recently completed my dissertation on paralanguage in electronic mail, the abstract of which is appended to this posting. I found, among the 16 people I studied, many forms of "extra expression" in the form of "paralanguage." Ultimately, I documented enough differences between writing on the computer and writing through other media to identify it as "electronic paralanguage" with its own formal definition. Many people believe that face-to-face communication is the richest form of communication because of the variety of signals and channels, and as well the potential for channel redundancy. I have no problem with this assumption. I do, however, take issue with comparing other forms of communication to face-to-face and then judging any other medium as "information poor." Some scholars of computer-mediated communication carry this negative frame of reference over to their own work. While the computer may not provide as many channels as face-to-face communication, and the channel itself may be somewhat more limited, there is considerable research to indicate that computer users have done some interesting things to convey their meaning and message. Since I am not a fan of clogging up boards with long messages, anyone interested in my work can contact me directly at Asteroff@cutcv1.bitnet and I will be happy to send you more material. The dissertation is also available through University Microfilms: Janet F. Asteroff, "Paralanguage in Electronic Mail: A Case Study." Teachers College, Columbia University, May, 1987. /Janet (Asteroff@cutcv1.bitnet) ABSTRACT PARALANGUAGE IN ELECTRONIC MAIL: A CASE STUDY Janet F. Asteroff This study explores the use of paralanguage in electronic mail communication. 
It examines the use of paralanguage according to the electronic mail and computing experience and technical expertise of 16 library science graduate students who fall into two groups by rank of experience, novice and advanced. These respondents used electronic mail in a non-elective and task-related situation to communicate with their instructor. This case study is based on a multi-level qualitative content analysis of the electronic mail exchanged between the respondents and the instructor, and the attitudes and experiences of the respondents about their use of electronic mail and computers. This research interprets the roles and functions of paralanguage in computer-mediated communication and explores the phenomenon as an indicator of certain kinds of expression. Paralanguage is a component of spoken, written, and electronic communication. It gives to what is being communicated a character over and above that which is necessary to convey meaning in the linguistic or grammatical sense. Paralanguage in electronic mail is positioned between spoken and written paralanguage in its visual and interpretive structures. Electronic paralanguage, a term developed to describe paralanguage in computer-mediated communication, is defined as: features of written language which are used outside of formal grammar and syntax, and other features related to but not part of written language, which through varieties of visual and interpretive contrast provide additional, enhanced, redundant or new meanings to the message. Electronic paralanguage is revealed to be a component of communication which in some situations showed substantial differences by the rank of the respondent, as well as differences in individual behaviors. Novice respondents used more paralanguage in more types of messages than did advanced respondents. Electronic paralanguage also provides a robust picture of the character of communication. The use of exclamation points by novice respondents in task-related messages showed that electronic paralanguage can in certain cases be a general measure of stress and experience, and as well is a precise indicator of different kinds of positive and negative psychological stress. ------------------------------ ========================================================================= Date: 10 December 1987, 16:25:16 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: More on ACH Standards for Markup (63 lines) From Nancy Ide > In response to Jim Coombs' comments and questions: > (1) My messages may have been sent out in the wrong order, but I meant to make it clear that we fully agree with Jim's assertion that we cannot know all of our needs yet, and for that reason the standard will be extensible--and we hope to make user-defined extensions far easier to deal with than AAP does. But as Bob Amsler pointed out, a standard which specifies so little that most researchers end up inventing their own tags anyway is not of much use. Bob also mentioned that our subcommittees are going to have to work closely together to avoid redundancy and to make note of places where alternate descriptions, related to different applications, describe what may be physically the same thing--something Jim was concerned about. I think Bob's idea of a "data dictionary" is excellent and I hope we can implement it in the course of development. (2) By "multiple parallel hierarchies" I mean something along the lines of what Jim outlined--except that his examples all use nicely nested entities.
The problems arise when we have overlapping entities, for example, "<N>the <T>dog</N> ran</T>", where "<N>" and "</N>" mark the beginning and end of a noun phrase, respectively, and "<T>" and "</T>" mark the beginning and end of another entity--say, a thematic unit of some kind. The context-free syntax of SGML cannot handle this, and so special mechanisms are required to enable multiple tag sets in which overlapping entities may be specified. As I mentioned, these exist in SGML but are not entirely straightforward, from my understanding. (3) The ultimate goal of an attempt to develop some formal description of existing schemes is to facilitate the development of translation programs to translate old formats into the new one. I sympathize with Jim's feeling that we shouldn't spend so much time converting the past, but I also understand, after spending 48 hours with the keepers of the European and Middle-eastern archives which house millions of words of machine-readable text, that it is not possible to mount this effort without considering what to do about texts that already exist in machine-readable form. I should also point out that at the end of the two-day meeting in Poughkeepsie we had a discussion about establishing a North American archive, but by that time many participants had left and those who remained had little energy left to address the issue vigorously. However I understood those who spoke to say that they didn't feel the need to establish such an archive, and that in any case the Oxford model (where no guarantees of quality are made) is sufficient. I personally tend to agree with Jim that an archive--North American or better yet, international--should be established in which texts are "guaranteed," and which more importantly serves as a central clearinghouse. Oxford does this as well as it can now but without considerably more funding cannot more vigorously pursue the acquisition and creation of machine-readable texts nor ensure that they are both correct and tagged in a standard form. Nancy Ide ide@vassar ========================================================================= Date: 10 December 1987, 23:11:09 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: A new HUMANIST GUIDE (268 lines) Dear Colleagues: A somewhat revised version of the guide to HUMANIST follows. As always your comments are welcome, to mccarty@utorepas.bitnet. Yours, W.M. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ A Guide to HUMANIST +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ C O N T E N T S I. Nature and Aims II. How to use HUMANIST A. Sending and receiving messages B. Conventions and Etiquette C. Distributing files D. ListServ's commands and facilities E. Suggestions and Complaints ================================================================= I. Nature and aims ================================================================= Welcome to HUMANIST, a Bitnet/NetNorth/EARN discussion group for people who support computing in the humanities. Those who teach, review software, answer questions, give advice, program, write documentation, or otherwise support research and teaching in this area are included. Although HUMANIST is intended to help these people exchange all kinds of information, it is primarily meant for discussion rather than publication or advertisement.
HUMANIST is an activity of the Special Interest Group for Humanities Computing Resources, which is in turn an affiliate of both the Association for Computers and the Humanities (ACH) and the Association for Literary and Linguistic Computing (ALLC). Although participants in HUMANIST are not required to be members of either organization, membership in them is highly recommended. HUMANIST currently has more than 170 members in 10 countries around the world. In general, HUMANISTs are encouraged to ask questions and offer answers, to begin and contribute to discussions, to suggest problems for research, and so forth. One of the specific motivations for establishing HUMANIST was to allow people involved in this area to form a common idea of the nature of their work, its requirements, and its standards. Institutional recognition is not infrequently inadequate, at least partly because computing in the humanities is an emerging and highly cross-disciplinary field. Its support is significantly different from the support of other kinds of computing, with which it may be confused. It does not fit easily into the established categories of academia and is not well understood by those from whom recognition is sought. Apart from the general discussion, HUMANIST encourages the formation of a professional identity by maintaining an informal biographical directory of its members. This directory is automatically sent to new members when they join. Supplements are issued whenever warranted by the number of new entries. Members are responsible for keeping their entries updated. Those from any discipline in or related to the humanities are welcome, provided that they fit the broad guidelines described above. Please tell anyone who might be interested to send a message to me, giving his or her name, address, telephone number, and a short biographical description of what he or she does to support computing in the humanities. This description should cover academic background and research interests, both in computing and otherwise; the nature of the job this person holds; and, if relevant, its place in the university. Please direct applications for membership in HUMANIST to MCCARTY@UTOREPAS.BITNET, not to HUMANIST itself. ================================================================= II. How to Use HUMANIST ================================================================= A. Sending and receiving messages ----------------------------------------------------------------- Although HUMANIST is managed by software designed for Bitnet/NetNorth/EARN, members can be on any comparable network with access to Bitnet, for example, Janet or Arpanet. Users on these networks suffer only slight restrictions, which will be mentioned below. Submissions to HUMANIST are made by sending electronic mail as if to a person with the user-id HUMANIST and the node-name UTORONTO. All valid submissions are sent to every member, without exception. The editor of HUMANIST screens submissions only to prevent the inadvertent distribution of junk mail, which would otherwise be a serious problem for such a highly complex web of individuals using a wide variety of computing systems linked together by several different electronic networks. The editor will usually pass valid mail on to the membership within a few hours of submission. The volume of mail on HUMANIST varies with the state of the membership and the nature of the dominant topic, if any. Recent experience shows that as many as half a dozen or more messages a day may be processed. 
For this reason members are advised to pay regular, indeed frequent attention to their e-mail or serious overload may occur. A member planning on being away from regular contact should advise the editor and ask to be temporarily removed from active membership. The editor should be reminded when active membership is to be resumed. The editor also asks that members be careful to specify reliable addresses. In some cases the advice of local experts may help. Any member who changes his or her userid or nodename should first give ample warning to the editor and should verify the new address. If you know your system is going to be turned off or otherwise adjusted in a major way, find out when it will be out of service and inform the editor. Missed mail can be retrieved, but undelivered e-mail will litter the editor's mailbox. [Please note that in the following description, commands will be given in the form acceptable on an IBM VM/CMS system. If your system is different, you will have to make the appropriate translation.] ----------------------------------------------------------------- B. Conventions and Etiquette ----------------------------------------------------------------- Conversations or asides restricted to a few people can develop from the unrestricted discussions on HUMANIST by members communicating directly with each other. This may be a good idea for very specific replies to general queries, so that members are not burdened with messages of interest only to the person who asked the question and, perhaps, a few others. Members have, however, shown a distinct preference for unrestricted discussions on nearly every topic, so it is better to err on the side of openness. If you do send a reply to someone's question, please restate the question very briefly so that the context of your answer will be clear. [Note that the REPLY function of some electronic mailers will automatically direct a user's response to the editor, from whom all submissions technically originate, not to the original sender or to HUMANIST. Thus REPLY should be avoided in many cases.] Use your judgment about what the whole group should receive. We could easily overwhelm each other and so defeat the purpose of HUMANIST. Strong methods are available for controlling a discussion group, but the lively, free-ranging discussions made possible by judicious self-control seem preferable. Controversy itself is welcome, but what others would regard as tiresome junk- mail is not. Courtesy is still a treasured virtue. Make it an invariable practice to help the recipients of your messages scan them by including a SUBJECT line in your message. Be aware, however, that some people will read no more than the SUBJECT line, so you should take care that it is accurate and comprehensive as well as brief. If you can, note the length of your message in the subject line. The resulting line should look something like this: Subject: Textual archives and encoding (45 lines) Use your judgment about the length of your messages. If you find yourself writing an essay or have a substantial amount of information to offer, it might be better to follow one of the two methods outlined in the next section. All contributions should also specify the member's name as well as e-mail address. This is particularly important for members whose user-ids bear no relation to their names. ----------------------------------------------------------------- C. 
Distributing files ----------------------------------------------------------------- HUMANIST offers us an excellent means of distributing written material of many kinds, e.g., reviews of software or hardware. (Work is now underway to provide this service for reviews.) Although conventional journals remain the means of professional recognition, they are often too slow to keep up with changes in computing. With some care, HUMANIST could provide a supplementary venue of immediate benefit to our colleagues. There are two possible methods of distributing such material. More specialized reports should probably be reduced to abstracts and posted in this form to HUMANISTs at large, then sent by the originators directly to those who request them. The more generally interesting material in bulk can be sent in an ordinary message to all HUMANISTs, but this could easily overburden the network so is not generally recommended. We are currently working on a means of centralized storage for relatively large files, such that they could be fetched by HUMANISTs at will, but this means is not yet fully operational. At present the only files we are able to keep centrally are the monthly logbooks of conversations on HUMANIST. See the next section for details. ----------------------------------------------------------------- D. ListServ's Commands and Facilities ----------------------------------------------------------------- As just mentioned, ListServ maintains monthly logbooks of discussions. Thus new members have the opportunity of reading contributions made prior to joining the group. To see a list of these logbooks, send the following command: TELL LISTSERV AT UTORONTO SENDME HUMANIST FILELIST Note that in systems or networks that do not allow interactive commands to be given to a Bitnet ListServ (I will call such systems "non-interactive"), the same thing can be accomplished by sending a message to HUMANIST with the command as the first and only line, which should read as follows: GET HUMANIST FILELIST The logbooks are named HUMANIST LOGyymm, where "yy" represents the last two digits of the year and "mm" the number of the month. The log for July 1987 would, for example, be named HUMANIST LOG8707, and to get this log on a system that supports interactive commands to HUMANIST you would issue the following: TELL LISTSERV AT UTORONTO GET HUMANIST LOG8707 On a non-interactive system, you would send HUMANIST a message with the following line: GET HUMANIST LOG8707 Note that on a non-interactive system as many of these one-line commands as you wish can be put in a message to HUMANIST. ListServ accepts several other commands, for example to retrieve a list of the current members or to set various options. These are described in a document named LISTSERV MEMO. This and other documentation will normally be available to you from your nearest ListServ node and is best fetched from there, since in that way the network is least burdened. You should consult with your local experts to discover the nearest ListServ; they will also be able to help you with whatever problems in the use of ListServ you may encounter. Once you have found the nearest node XXXXXX, type the following: TELL LISTSERV AT XXXXXX INFO ? or, on a non-interactive system: INFO ? The various documents available to you will then be listed. ----------------------------------------------------------------- E.
Suggestions and Complaints ----------------------------------------------------------------- Suggestions about the running of HUMANIST or its possible relation to other means of communication are very welcome. So are complaints, particularly if they are constructive. Experience has shown that an electronic discussion group can be either beneficial or burdensome to its members. Much depends on what the group as a whole wants and does not want. Please make your views known, to everyone or to me directly, as appropriate. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Willard McCarty, 8 December 1987 Editor of HUMANIST, Centre for Computing in the Humanities, University of Toronto (MCCARTY@UTOREPAS.BITNET) ========================================================================= Date: 10 December 1987, 23:16:27 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Reply to Rahtz's pro-chaos message (30 lines) From Robert Amsler Actually, `THE' standard for encoding text already exists, so I'm afraid it doesn't really matter whether we like it or not. I speak of SGML itself, ISO standard 8879, which was approved Sept., 1986. The AAP (American Association of Publishers) also has completed work on an application of the SGML standard, the so-called AAP standard--which itself will soon be adopted--whether humanities computing professionals care or not. The good news for chaos fans is that so far very few publishers have made much of an effort to convert to the AAP standard or to pledge to make their data available in that standard. Some agencies of the US government are making noises about accepting electronic texts in the standard (such as NSF, NLM, etc.) and some software merchants (SoftQuad) have marketed programs which use the standard for typesetting, editing, etc. So... what remains? The AAP standard (and SGML itself) is based on the concept of `document types' having their own appropriate set of `tags'. The document types which have been created are only the most generic sort for magazine articles and books--though they contain specs for tables and math. equations. The humanities community has expressed no preferences so far, such as developing its own document types for things such as plays, poetry, etc. The stage is set for humanists to have a voice in the future of publishers' formats. ========================================================================= Date: 11 December 1987, 09:00:16 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: The flavour of HUMANIST (22 lines) [The following was sent me by Sebastian Rahtz. It is a quotation from the Archaeological Information Exchange and applies well to the kind of thing HUMANIST is, or at least what I think it is. It is quoted with gratitude but without permission. --ed.] "An archaeological information exchange network should avoid programmatic constraints, thereby maintaining the sense of immediacy, the ebb and flow of discourse and activity which represent the situational flux of daily life, while at the same time providing explicit points of reference in order to prevent total chaos."
[Brian Molyneaux] ========================================================================= Date: 11 December 1987, 10:22:59 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Text Archives/Markup/TeX From dow@husc6.BITNET (Dominik Wujastyk) Just a point of information, relating to the current debate about markup systems and text archives. It was mentioned by Michael Sperberg-McQueen that the First Folio has to be encoded right down to the typographical level in order to be of maximum use. This reminded me of an on-line database of mathematical abstracts offered by the American Mathematical Society, called MathSci (if I remember correctly). The whole (large) database is encoded in TeX, and a micro implementation of TeX is sold/given to all subscribers to the system. You dial up, search the database, and download whatever you want, or can afford. Then you run your entries through TeX, which is sitting there quietly on your hard disk, and presto, you have a typeset version of your mathematical texts. You can view it on your screen using a DVI previewer, or print it out to paper on anything from a 9-pin matrix printer to a phototypesetter. The important thing in this is the different levels of encoding being represented. The TeX markup specifies the main structural elements of the datum, but the macro package that is located with the TeX implementation (AMSTeX) controls the interpretation of the tags in the database, right down to the positioning of individual characters on the output medium. Just a thought. Dominik Wujastyk dow@harvunxw.bitnet ========================================================================= Date: 12 December 1987, 10:57:48 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Author's query on scholars and telecommunications (25 lines) From Terry Erdt For a book forthcoming from Paradigm Press, entitled The Electronic Scholar's Resource Guide, I am putting together a piece on telecommunications, which will include bulletin board systems, libraries with catalogs capable of dial-up connections, Humanet on Scholarsnet, BRS and Dialog, some forums on CompuServe, Bitnet's IRList Digest, as well as, of course, Humanist. I would appreciate any suggestions for broadening the scope of coverage as well as any information about specific resources. Terrence Erdt Erdt@vuvaxcom.Bitnet Technical Review Editor Computers and the Humanities Grad. Dept. of Library Science Villanova University Villanova, PA 19085 ========================================================================= Date: 12 December 1987, 14:59:03 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: No text or archive; joy in 29 lines From Marshall Gilliland * * * * * ** ** ** ** * * * * * / \ / v \ Seasons *-- --* Happy \ o @ & / / \ / | \ Greetings *-- O --* New / * \ / * \ / % * * * \ and *-- --* Year / * * * * \ / * Saskatoon * *\ / | * | * | * | * \ / O & $ @ \ *-------------------------* | | | | | _&_ % | | Q U | / | |_____| _\@/_ |___| | # | |#SASK| | # == Marshall Gilliland _________________________________________|__#__|_____________________ ========================================================================= Date: 13 December 1987, 13:41:53 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: text encoding: a Thesaurus Linguae Graecae user's perspective. 
Contributed by Brad Inwood (124 lines) Text markup and coding have turned out to be THE issues for HUMANISTs to get productively excited about. It is, after all, the most natural focus for the rather loose set of interests shared by those who think of themselves as humanities types. Some observations: The debate about what constitutes an adequate set of tags in a machine-readable text is obviously a reflex of the interests (and discipline) of the researcher. It would be astounding if we could agree about acceptable minimal markup. My own view is that text archives need only hold texts which preserve streams of words; minimal and transparent editorial markup to signal emendations, restorations, etc.; flags for verse and metre; and precise and unambiguous indications of the normal reference format (Bekker pages, columns and lines for Aristotle; line numbers for Greek tragedies; book and line for Homer; or whatever is standard). Where no standard reference style descends to the level of single lines (as within chapters of modern novels), line-level reference should be imposed by the file itself (e.g. chapter 3, line 769: 769th line of that chapter in the electronic file) rather than imported from the text one happens to be inputting from -- even if it is what the researcher regards as the best text. If archives are to create standard reference forms where none exist for printed editions (as in this case) they should do so in a manner appropriate to their medium, not the printed medium. Researchers who require more markup should bloody well add it themselves and not burden archives with worries about anything more elaborate. My own work with machine-readable texts is limited to various materials in the Thesaurus Linguae Graecae text base; my own notions of minimally acceptable coding and entry format stem from this experience. Without pausing too much for the rationale in each case, I would extract the following lessons: 1. preservation of information about page breaks and line ends in the original text is not worth the effort. 2. it is particularly bad if one preserves that information at the cost of retaining soft hyphens in the text which are of no semantic significance but a mere product of the typesetter's art. contrary to what everyone says, it is not trivial to strip them out in a global move; more important, software must be made to do fancy tricks not to take line ends alone as separators; to make it ignore hyphens at line end is even harder. yes, I know it can be done, but why must we bother? removing hyphens from such text is the single most difficult and time-consuming job I have had, and the one with the highest risk of introducing errors into an already proof-read text. 3. the TLG preserves page layout information in a fanatical way: e.g. it will tell you that a given line is to be indented by so many tabs, but not that it is a verse quotation. translating the tabs to spaces is easy enough, but why not just have a tag to mark verse and the particular metre? 4. markup for standard reference style is there in the TLG, but inconsistently implemented from author to author. in the Platonic corpus, for example, Stephanus pages and columns are declared at the head of each dialogue and subsequently incremented by a flag; a special programme is needed to work out that the 79th [x] after a declared [x21] actually means Stephanus page 100. Line numbers are usually suppressed (although they are part of the standard reference format for Plato), but occasionally lines are indicated explicitly.
No guidance is given as to why this is so, when the much more important page and column references are so badly handled. 5. never let anyone tell you that a decent proofreading job can be done by someone who does not know the language well or is not reading with attention to the sense. the TLG was keyboarded, not scanned, and yet broken characters in the printed edition used have turned up as the wrong character in the final corrected files (Burnet's Oxford Text of Plato is still running on the original plates and there are a lot of broken characters which no literate reader could mistake; but no one caught the rho with the missing tail: looks just like an omicron and would even have scanned as one; the best visual confirmation based on comparison with the printed source would not reveal an error). I fear that scholars are really going to have to have one round of proofreading everything, so I am pleased that some HUMANISTs feel that keyboarding Milton can be fun. 6. standard coding delimiters are needed which can be regarded as non-separators by software (I guess that means that the opening and closing delimiters must be characters NEVER used as delimiters, parentheses or punctuation marks). Otherwise the coding used to mark a conjectural supplement will break the word when the text stream is analyzed by software. 7. if you really want the kind of information which would make an electronic text useful for serious editorial work (full apparatus, notation of font changes, change of hands, etc.) then it seems to me you need something more than an electronic text. you probably need a hyper-text system of some sort or a fantastically complex data-base-cum-text, with custom software. I start from the assumption that most users of electronic texts want a clean, accurate electronic copy, well-referenced, so that they can mark it up for their own analytical purposes or search for and analyze the words in it. it is a job of an entirely different order to prepare a data base which can be used to help edit a text. the TLG's omission of textual apparatus is much lamented, and reasonably so; but in this case I think they got it right. better to get the text out and make it usable. if they had waited to settle on how to handle the apparatus in the electronic text, (a) we would still be waiting for the TLG, and (b) they would have had, in effect, to re-edit all of ancient Greek literature, not just enter and correct. the editorial talent for that job simply does not exist. the via media of just typing the apparatus which happens to be in the text you choose to keyboard or scan is a perfect example of falling between two stools: not enough to make electronic analysis possible, too much for absolute ease of use and speed of production. this is all pretty bitty, but that is how users' experience tends to come out, I guess. maybe the peon's perspective will be of some use when the theoretical issues threaten to get out of hand -- or proportion. Brad Inwood ========================================================================= Date: 14 December 1987, 23:49:56 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Gaudeamus igitur in about 200 lines Dear Colleagues: I've never before sent you a list of everyone who belongs to HUMANIST, thinking that you could get this information yourself from ListServ at any time.
This time of year, however, motivates me to do so, I guess in celebration of an unusual, international (though monolingual), once noisy, sometimes argumentative, and to me always interesting fellowship. So, to all of us listed below -- some directly, some hidden in redistribution lists -- I wish a very happy Hanukkah and very merry Christmas. I think the Buddha's birthday is also celebrated about this time of year, and doubtless I have missed other holidays, for which forgive me. Yours, W.M. -------------------------------------------------------------------------- * * HUMANIST Discussion list - created 07 MAY 87 * CJOHNSON@ARIZRVAX Christopher Johnson OWEN@ARIZRVAX David Owen ATMKO@ASUACAD Mark Olsen ATPMB@ASUACAD Pier Baldini CNNMJ@BCVMS M. J. Connolly CHOUEKA@BIMACS Yaacov Choueka ALLEN@BROWNVM Allen H. Renear JAZBO@BROWNVM James H. Coombs ST401742@BROWNVM Timothy Seid HAMESSE@BUCLLN11 Jacqueline Hamesse THOMDOC@BUCLLN11 CETEDOC Belgium WORDS@BUCLLN11 Robert Hogenraad JONES@BYUADMIN Randall Jones ECHUCK@BYUHRC Chuck Bush H_JOHANSSON%USE.UIO.UNINETT@CERNVAX Stig Johansson BJORNDAS@CLARGRAD Sterling Bjorndahl YOUNGC@CLARGRAD Charles M. Young spqr@cm.soton.ac.uk Sebastian Rahtz FKOCH%OCVAXA@CMCCVB Christian Koch PRUSSELL%OCVAXA@CMCCVB Roberta Russell mffgkts@cms.umrcc.ac.uk Tony Smith nash@cogito.mit.edu David Nash epkelly@csvax1.tcd.hea.irl Elizabeth Dowse MCCARTHY@CUA William J. McCarthy JMBHC@CUNYVM Joanne M. Badagliacco RSTHC@CUNYVM Robert S. Tannenbaum TERGC@CUNYVM Terence Langendoen MTKUS@CUVMA Mark Kennedy RCLUS@CUVMA Robert C. Lehman SLZUS@CUVMA Sue Zayac cul.henry@cu20b.columbia.edu Chuck Henry cul.lowry@cu20b.columbia.edu Anita Lowry cul.woo@cu20b.columbia.edu Janice Woo humanist@edinburgh.ac.uk Humanist Group r.j.hare@edinburgh.ac.uk Roger Hare cameron@exeter.ac.uk Keith Cameron amsler@flash.bellcore.com Robert Amsler GAUTHIER@FRTOU71 Robert Gauthier DOW@HARVUNXW Dominik Wujastyk GALIARD@HGRRUG5 Harry Gaylord HUET@HUJIPRMB Emanuel Tov ST_JOSEPH@HVRFORD David Carpenter ayi017@ibm.soton.ac.uk Brendan O'Flaherty cpi047@ibm.soton.ac.uk Simon Lane fri001@ibm.soton.ac.uk Sean O'Cathasaigh ayi004@ibm.southampton.ac.uk Brian Molyneaux CONSERVA@IFIIDG Lelio Camilleri GG.BIB@ISUMVS Rosanne Potter S1.CAC@ISUMVS Carol Chapelle sano@jpl-vlsi.arpa Haj Sano nick@lccr.sfu.cdn Nick Cercone bol@liuida.uucp Birgitta Olander RPY383@MAINE Colin Martindale psc90!jdg@mnetor.uucp Joel Goldfield humanist@mts.durham.ac.uk Humanists' Group CHADANT@MUN Tony Chadwick DGRAHAM@MUN David Graham H156004@NJECNVM Kenneth Tompkins FAFEO@NOBERGEN Espen Ore FAFKH@NOBERGEN Knut Hofland collins@nss.cs.ucl.ac.uk Beryl T. Atkins g.dixon@pa.cn.umist.ac.uk Gordon Dixon KRAFT@PENNDRLN Robert Kraft JACKA@PENNDRLS Jack Abercrombie jld1@phx.cam.ac.uk John L. Dawson PKOSSUTH@POMONA Karen Kossuth sdpage@prg.oxford.ac.uk Stephen Page T3B@PSUVM Tom Benson BALESTRI@PUCC Diane P. Balestri RICH@PUCC Richard Giordano TOBYPAFF@PUCC Toby Paff d.mitchell@qmc.ac.uk David Mitchell BARNARD@QUCDN David T. Barnard LESSARDG@QUCDN Greg Lessard LOGANG@QUCDN George Logan ORVIKT@QUCDN Tone Orvik WIEBEM@QUCDN M. G. Wiebe weinshan%cps.msu.edu@relay.cs.net Donald Weinshank GILLILAND@SASK Marshall Gilliland JULIEN@SASK Jacques Julien FRIEDMAN_E@SITVXA Edward A. Friedman JHUBBARD@SMITH Jamie Hubbard ZRCC1001@SMUVM1 Robin C. Cover GX.MBB@STANFORD Malcolm Brown XB.J24@STANFORD John J. 
Hughes ACDRLK@SUVM Ron Kalinoski DECARTWR@SUVM Dana Cartwright bs83@sysa.salford.ac.uk Max Wood A79@TAUNIVM David Sitman lb0q@te.cc.cmu.edu Leslie Burkholder ECSGHB@TUCC George Brett DUCALL@TUCCVM Frank L. Borchardt DYBBUK@TUCCVM Jeffrey Gillette SREIMER@UALTAVM Stephen R. Reimer TBUTLER@UALTAVM Terry Butler USERDLDB@UBCMTSG Laine Ruus EGC4BFD@UCLAMVS Kelly Stack IBQ1JVR@UCLAMVS John Richardson IMD7VAW@UCLAMVS Vicky Walsh IZZY590@UCLAVM George Bing U18189@UICVM Michael Sperberg-McQueen qghu21@ujvax.ulster.ac.uk Noel Wilson BAUMGARTEN@UMBC Joseph Baumgarten J_CERNY@UNHH Jim Cerny CLAS056@UNLCDC3 John Turner FELD@UOFMCC Michael Feld CSHUNTER@UOGUELPH Stuart Hunter AMPHORAS@UTOREPAS Philippa Matheson ANDREWO@UTOREPAS Andrew Oliver ANNE@UTOREPAS Anne Lancashire BRAINERD@UTOREPAS Barron Brainerd ERSATZ@UTOREPAS Harold Chimpden Earwicker IAN@UTOREPAS Ian Lancashire INWOOD@UTOREPAS Brad Inwood MCCARTY@UTOREPAS Willard McCarty ROBERTS@UTOREPAS Robert Sinkewicz STAIRS@UTOREPAS Mike Stairs WINDER@UTOREPAS Bill Winder YOUNG@UTOREPAS Abigail Young ZACOUR@UTOREPAS Norman Zacour humanist@utorgpu.utoronto Humanist Redistribution List S_RICHMOND@UTOROISE S. Richmond BRADLEY@UTORONTO John Bradley DESOUS@UTORONTO Ronald de Sousa ESWENSON@UTORONTO Eva V. Swenson LIDIO@UTORONTO Lidio Presutti PARROTT@UTORONTO Martha Parrott PAULIE2@UTORONTO Test Account 42104_263@uwovax.uwo.cdn Glyn Holmes 42152_443@uwovax.uwo.cdn Richard Shroyer IDE@VASSAR Nancy Ide a_boddington@vax.acs.open.ac.uk Andy Boddington aeb_bevan@vax.acs.open.ac.uk Edis Bevan may@vax.leicester.ac.uk May Katzen catherine@vax.oxford.ac.uk Catherine Griffin dbpaul@vax.oxford.ac.uk Paul Salotti john@vax.oxford.ac.uk John Cooper logan@vax.oxford.ac.uk Grace Logan lou@vax.oxford.ac.uk Lou Burnard stephen@vax.oxford.ac.uk Stephen Miller susan@vax.oxford.ac.uk Susan Hockey v002@vaxa.bangor.ac.uk Thomas N. Corns udaa270@vaxa.cc.kcl.ac.uk Susan Kruse wwsrs@vaxa.stir.ac.uk Keith Whitelam ej1@vaxa.york.ac.uk Edward James gw2@vaxa.york.ac.uk Geoffrey Wall jrw2@vaxa.york.ac.uk John Wolffe chaa006@vaxb.rhbnc.ac.uk Philip Taylor srrj1@vaxb.york.ac.uk Sarah Rees Jones cstim@violet.berkeley.edu Tim Maher f.e.candlin@vme.glasgow.ac.uk Francis Candlin CHURCHDM@VUCTRVAX Dan M. Church ERDT@VUVAXCOM Terry Erdt fwtompa@watdaisy.uucp Frank Tompa DDROB@WATDCS Don D. Roberts VANEVRA@WATDCS James W. Van Evra WALTER@WATDCS Walter McCutchan drraymond@watmum.waterloo Darrell Raymond makkuni.pa@xerox.com Ranjit Makkuni xeroxhumanists~.x@xerox.com Humanists at Xerox ELI@YALEVM Doug Hawthorne YAEL@YKTVMH2 Yael Ravin DANIEL@YORKVM1 Daniel Bloom YFAN0001@YORKVM1 Gerald L. Gold YFPL0004@YORKVM1 Shu-Yan Mok YFPL0018@YORKVM1 Paul Kashiyama CS100006@YUSOL Peter Roosen-Runge GL250012@YUVENUS Jim Benson * * Total number of users subscribed to the list: 168 ========================================================================= Date: 15 December 1987, 23:40:40 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: An idea about biographies; Supplement 5 (440 lines) Dear Colleagues: At some point in the near future, if anyone would care for such a thing, I have it in mind to do a proper job on the biographies. Apart from the editing and formatting, this would involve collecting a revised biographical statement from each of you, if you'd care to supply one. These might be written or rewritten according to a suggested list of things to be mentioned -- to make them *slightly* less chaotic without taking the play out. 
The revised collection would be for circulation only on HUMANIST. What do you think? Please let me know if the idea strikes you as worthy of effort. What do you think should be on the list of things to be mentioned? Meanwhile, the next supplement follows. Yours, W.M. -------------------------------------------------------------------------- Autobiographies of HUMANISTs Fifth Supplement Following are 23 additional entries to the collection of autobiographical statements by members of the HUMANIST discussion group. Further additions, corrections, and updates are welcome, to mccarty@utorepas.bitnet. W.M. 16 December 1987 ========================================================================= *Atwell, Eric Steven Centre for Computer Analysis of Language and Speech, AI Division, School of Computer Studies, Leeds University, Leeds LS2 9JT; +44 532 431751 ext 6 I am in a Computer Studies School, but specialise in linguistic and literary computing, and applications in Religious Education in schools. I would particularly like to liaise with other researchers working in similar areas. ========================================================================= *Benson, Tom {akgua,allegra,ihnp4,cbosgd}!psuvax1!psuvm.bitnet!t3b (UUCP) t3b%psuvm.bitnet@wiscvm.arpa (ARPA) Department of Speech Communication, The Pennsylvania State University 227 Sparks Building, University Park, PA 16802; 814-238-5277 I am a Professor of Speech Communication at Penn State University, currently serving as editor of THE QUARTERLY JOURNAL OF SPEECH. In addition, I edit the electronic journal CRTNET (Communication Research and Theory Network). ========================================================================= *CETEDOC (CENTRE DE TRAITEMENT ELECTRONIQUE DES DOCUMENTS) CETEDOC, LLN, BELGIUM THE CETEDOC (CENTRE DE TRAITEMENT ELECTRONIQUE DES DOCUMENTS) IS AN INSTITUTION OF THE CATHOLIC UNIVERSITY OF LOUVAIN AT LOUVAIN-LA-NEUVE, BELGIUM. ITS DIRECTOR IS PROF. PAUL TOMBEUR. ========================================================================= *Chadwick, Tony Department of French & Spanish, Memorial University of Newfoundland St. John's, A1B 3X9; (709)737-8572 At the moment I have two interests in computing: one is the use of computers in composition classes for second language learners, the socond in computerized bibliographies. I have an M.A. in French from McMaster and have been teaching at Memorial University since 1967. Outside computers, my research interests lie in Twentieth Century French Literature. ========================================================================= *Coombs, James H. Institute for Research in Information and Scholarship, Brown University Box 1946, Providence, RI 02912 I have a Ph.D. in English (Wordsworth and Milton: Prophet-Poets) and an M.A. in Linguistics, both from Brown University. I have been Mellon Postdoctoral Fellow in English and am about to become Software Engineer, Research, Institute for Research in Information and Scholarship (IRIS). I have co-edited an edition of letters (A Pre-Raphaelite Friendship, UMI Research Press) and have written on allusion and implicature (Poetics, 1985; Brown Working Papers in Linguistics). Any day now, the November Communications of the ACM will appear with an article on "Markup Systems and the Future of Scholarly Text Processing," written with Allen H. Renear and Steven J. DeRose. 
I developed the English Disk on the Brown University mainframe, which provides various utilities for humanists, primarily for word processing and for staying sane in CMS. I wrote a Bibliography Management System for Scholars (BMSS; 1985) and then an Information Management System for Scholars (IMSS; 1986). Both are in PL/I and may best be considered "aberrant prototypes," used a little more than necessary for research but never commercialized. I am currently working on a system with similar functionality for the IBM PC. Last year, I developed a "comparative concordance" for the multiple editions of Wordsworth's Prelude. I am delayed in that by the lack of the final volume of Cornell's fine editions. A preliminary paper will appear in the working papers of Brown's Computing in the Humanities User's Group (CHUG); a full article will be submitted in January, probably to CHUM. I learned computational linguistics from Prof. Henry Kucera, Nick DeRose, and Andy Mackie. Richard Ristow taught me software engineering management or, more accurately, teaches me more every time I talk to him. I worked on the spelling corrector, tuning algorithms. I worked on the design of the grammar corrector, designed the rule structures, and developed the rules with Dr. Carol Singley. Then I started with Dr. Phil Shinn's Binary Parser and developed a language independent N-ary Parser (NAP). NAP reads phrase structure rules as well as streams of tagged words (see DeRose's article in Computational Linguistics for information on the disambiguation) and generates a parse tree, suitable for generalized pattern matching. Finally, at IRIS, I will be developing online dictionary access from our hypermedia system: Intermedia (affix stripping, unflection, definition, parsing, etc.). In addition, we are working on a unified system for accessing multiple databases, including CD-ROM as well as remote computers. ========================================================================= *Dawson, John L. University of Cambridge, Literary and Linguistic Computing Centre Sidgwick Avenue, Cambridge CB3 9DA England; (0223) 335029 I have been in charge of the Literary and Linguistic Computing Centre of Cambridge University since 1974, and now hold the post of Assistant Director of Research there. The LLCC acts as a service bureau for all types of humanities computing, including data preparation, and extends to the areas of non-scientific computing done by members of science and social science faculties. Much of our work remains in the provision of concordances to various texts in a huge range of languages, either prepared by our staff, by the user, or by some external body (e.g. TLG, Toronto Corpus of Old English, etc.) Some statistical analysis is undertaken, as required by the users. Recently, we have begun preparing master pages for publication using a LaserWriter, and several books have been printed by this means. My background is that of a mathematics graduate with a Diploma in Computer Science (both from Cambridge). I am an Honorary Member of ALLC, having been its Secretary for six years, and a member of the Association for History and Computing. My present research (though I don't have much time to do it) lies in the comparison of novels with their translations in other languages. At the moment I am working on Stendhal's "Le Rouge et le Noir" in French and English, and on Jane Austen's "Northanger Abbey" in English and French. 
I have contributed several papers at ALLC and ACH conferences, and published in the ALLC Journal (now Literary & Linguistic Computing) and in CHum. ========================================================================= *Giordano, Richard I am a new humanities specialist at Princeton University Computer Center (Computing and Information Technology). I come to Princeton from Columbia University where I was a Systems Analyst in the Libraries for about six years. I am just finishing my PhD dissertation in American history at Columbia as well. ========================================================================= *Johnson, Christopher Language Research Center, Room 345 Modern Languages, University of Arizona Tucson, Az 85702; (602) 621-1615 I am currently the Director of the Language Research Center at the University of Arizona. Masters in Educational Media, University of Arizona; Ph.D. in Secondary Education (Minor in Instructional Technology), UA. I have worked in the area of computer-based instruction since 1976. I gained most of my experience on the PLATO system here at the University and as a consultant to Control Data Corp. Two years ago I moved to the Faculty of Humanities to create the Language Research Center, a support facility for our graduate students, staff, and faculty. My personal research interests are in the areas of individual learning styles, critical thinking skills, middle-level education, and testing as they apply to computer-based education. The research interests of my faculty range from text analysis to word processing to research into the use of the computer as an instructional tool. ========================================================================= *Johansson, Stig Dept of English, Univ of Oslo, P.O. Box 1003, Blindern, N-0315 Oslo 3, Norway. Tel: 456932 (Oslo). Professor of English Language, Univ of Oslo. Relevant research interest: computers in English language research. Coordinating secretary of the International Computer Archive of Modern English (ICAME) and editor of the ICAME Journal. Member of the ALLC. ========================================================================= *Kalinoski, Ron Academic Computing Services, 215 Machinery Hall, Syracuse University Syracuse, New York 13244; 315/423-3998 I am Associate Director for Research Computing at Syracuse University and am interested in sponsoring a seminar series next spring focusing on computing issues in the humanities. I hope that this will lead to hiring a full-time staff person to provide user support services for humanities computing. ======================================================================== *Langendoen, D. Terence Linguistics Program, CUNY Graduate Center, 33 West 42nd Street, New York, NY 10036-8099 USA; 212-790-4574 (soon to change) I am a theoretical linguist, interested in parsing and in computational linguistics generally. I have also worked on the problem of making sophisticated text-editing tools available for the teaching of writing. I am currently Secretary-Treasurer of the Linguistic Society of America, and will continue to serve until the end of calendar year 1988. I have also agreed to serve on two working committees on the ACH/ALLC/ACL project on standards for text encoding, as a result of the conference held at Vassar in mid-November 1987. ========================================================================= *Molyneaux, Brian Department of Archaeology, University of Southampton, England.
I am at present conducting postgraduate research in art and ideology and its relation to material culture. I am also a Field Associate at the Royal Ontario Museum, Department of New World Archaeology, specialising in rock art research. I obtained a BA (Hons) in English Literature, a BA (Hon) in Anthropology, and an MA in Art and Archaeology at Trent University, Peterborough, Ontario. My research interest in computing in the Humanities includes the analysis of texts and art works within the context of social relations. ========================================================================= *Olofsson, Ake I am at the Department of Psychology, University of Umea, in the north of Sweden. Part of my work at the department is helping people to learn how to use our computer (VAX and the Swedish university Decnet) and International mail (Bitnet). We are four system-managers at the department and have about 40 ordinary users, running word-processing, statistics and Mail programs. ======================================================================== *ORVIK, TONE POST OFFICE BOX 1822, KINGSTON, ON K7L 5J6; 613 - 389 - 6092 WORKING ON BIBLE RESEARCH WITH AFFILIATION TO QUEEN'S UNIVERSITY'S DEPT. OF RELIGIOUS STUDIES; CREATING CONCORDANCE OF SYMBOLOGY. HAVE WORKED AS A RESEARCHER, TEACHER, AND WRITER, IN EUROPE AND CANADA; ESPECIALLY ON VARIOUS ASPECTS OF BIBLE AND COMPARATIVE RELIGION. INTERESTED IN CONTACT WITH NETWORK USERS WITH SAME/SIMILAR INTEREST OF RESEARCH. ========================================================================= *Potter, Rosanne G. Department of English, Iowa State University, Ross Hall 203, (515) 294-2180 (Main Office); (515) 294-4617 (My office) I am a literary critic; I use the mainframe computer for the analysis of literary texts. I have also designed a major formatting bibliographic package, BIBOUT, in wide use at Iowa State University, also installed at Princeton and Harvard. I do not program, rather I work with very high level programming specialists, statisticians, and systems analysts here to design the applications that I want for my literary critical purposes. I am editing a book on Literary Computing and Literary Criticism containing essays by Richard Bailey, Don Ross, Jr., John Smith, Paul Fortier, C. Nancy Ide, Ruth Sabol, myself and others. I've been on the board of ACH, have been invited to serve on the CHum editorial board. ========================================================================= *Renear, Allen H. My original academic discipline is philosophy (logic, epistemology, history), and though I try to keep that up (and expect my Ph.D. this coming June) I've spent much of the last 7 years in academic computing, particularly humanities support. I am currently on the Computer Center staff here at Brown as a specialist in text processing, typesetting and humanities computing. I've had quite a bit of practical experience designing, managing, and consulting on large scholarly publication projects and my major research interests are similarly in the general theory of text representation and strategies for text based computing. I am a strong advocate of the importance of SGML for all computing that involves text; my views on this are presented in the Coombs, Renear, DeRose article on Markup Systems in the November 1987 *Communications of the ACM*. Other topics of interest to me are structure oriented editing, hypertext, manuscript criticism, and specialized tools for analytic philosophers. 
My research in philosophy is mostly in epistemic logic (similar to what AI folks call "knowledge representation"); it has some surprising connections with emerging theories of text structure. I am a contact person for Brown's very active Computing in the Humanities User's Group (CHUG). ========================================================================= *Richardson, John Associate Professor, University of California (Los Angeles), GSLIS; (213) 825-4352 One of my interests is analytical bibliography, the description of printed books. At present I am intrigued with the idea that we can describe various component parts of books, notably title pages, paper, and typefaces, but the major psycho-physical element, ink, is not described. Obviously this problem involves humanistic work but also a fair degree of sophistication with ink technology. I would be interested in talking with or corresponding with anyone on this topic... ========================================================================= *Taylor, Philip Royal Holloway & Bedford New College; University of London; U.K; (+44) 0784 34455 Ext: 3172 Although not primarily concerned with the humanities (I am principal systems programmer at RHBNC), I am frequently involved in humanities projects, particularly in the areas of type-setting (TeX), multi-lingual text processing, and natural language analysis, among others. ========================================================================= *Whitelam, Keith W. Dept. of Religious Studies, University of Stirling, Stirling FK9 4LA Scotland; Tel. 0786 3171 ext. 2491 I have been lecturer in Religious Studies at Stirling since 1978 with prime responsibility for Hebrew Bible/Old Testament. My research interests are mainly aimed at exploring new approaches to the study of early Israelite/Palestinian history in an interdisciplinary context, i.e. drawing upon social history, anthropology, archaeology, historical demography, etc. I have been constructing a database of Palestinian archaeological sites, using software written by the Computing Science department, in order to analyse settlement patterns, site hierarchies, demography, etc. The department of Environmental Science has recently purchased Laser Scan and offered me access to the facilities. This will enable me to display settlement patterns, sites, etc. in map form for analysis and comparison. I am particularly interested in corresponding/discussing with others working on similar problems, particularly in Near Eastern archaeology. I have also been involved in exploring the possibilities of setting up campus-wide text processing laser printing facilities. It looks as though we shall be able to offer a LaTeX service in the New Year. We are also planning to offer a WYSIWYG service, such as Ventura on IBM or a combination with Macs for the production of academic papers. Again I have a particular interest in the use of foreign fonts, e.g. Hebrew, Akkadian, Ugaritic, Greek, etc. My teaching and research on the Hebrew Bible leads to a concern with developing computer-aided text analysis, although I have had little time to explore this area. We have OCP available on our mainframe VAX but my use of this has been very limited. I see this as an important area of future development in teaching and research along with Hebrew teaching. ========================================================================= *Wilson, Noel Head of Academic Services, University of Ulster, Shore Road Newtownabbey, Co. Antrim, N. Ireland BT37 0QB; (0232)365131 Ext.
2449 My post has overall responsibility for the central academic computing service, offered by the Computer Centre, to the University academic community. Within this brief, my Section is responsible for the acquisition/development and documentation of CAL and proprietary software. We currently provide a program library in support of courses and research which contains approx. 400 programs; of these approx. 80 are in-house developments, 50 proprietary systems and the remainder obtained from a variety of sources incl. program libraries (eg CONDUIT - Univ. of Iowa). We have only very recently addressed computing within the Faculty of Humanities; academic staff in the Faculty have used computers in a research capacity and are now turning towards the various u'grad. courses. Presently we hold a grant of 79,000 pounds from the United Kingdom Computer Board for Universities and Research Councils, for the development of CAL software in support of Linguistics and Lexicostatistics. Within this project we are attempting to develop courseware to support grammar teaching in French, German, Spanish and Irish (details of existing materials appropriate to u'grad. teaching would be most welcome!). We also are investigating the creation of software to support an analysis of text (comparative studies) - in this area we are looking at frequency counts assoc. with words/expressions/words within registers etc. - again help would be appreciated. I am happy to provide further details on any of the above points and wish to keep informed of useful Humanities-related CAL work elsewhere. We currently use the Acorn BBC micro. but are also moving in the direction of PC clones. ========================================================================= *Wood, Max Computing Officer, 403 Maxwell Building, The University of Salford The Crescent, Salford, G.M.C. ENGLAND; 061-736-5843 Extension 7399 We are involved in a project to introduce the use of computing in teaching here in the Business and Management Department of Salford University and I am keen to extend links to other Business schools both here in the U.K. and indeed in the U.S.A. Obviously therefore I would like to join your forum so as to possibly exchange ideas news etc. My background is essentially in computing and I mainly supervise the computing resources available to our Department, and have formulated much of the teaching systems we currently use. ========================================================================= *Wujastyk, Dominik I am a Sanskritist with some knowledge of computing. Once upon a time (1977-78) I learned Snobol4 from Susan Hockey at Oxford, where I did undergraduate and later doctoral Sanskrit. More recently, I have been using TeX on my PC AT (actually a Compaq III), and in the middle of this summer I published a book _Studies on Indian Medical History_, which was done in TeX and printed out on an HP LJ II, and sent to the publisher as camera ready. It all went very well. I have received the MS DOS Icon implementation from Griswold at Arizona, but have not spent time on it. I am trying to teach myself at the moment, just to learn enough to knock out ocassional routines to convert files from wordprocessor formats to TeX, and that sort of thing. (Probably reinventing the wheel.) At the present time I am editing a Sanskrit text on medieval alchemy, and doing all the formatting of the edition in LaTeX. 
Before I ever started Sanskrit, I did a degree in Physics at Imperial College in London, but that is so long ago that I don't like to think about it! ========================================================================= *Young, Charles M. Dept. of Philosophy, The Claremont Graduate School I am a member of the American Philosophical Association's committee on Computer Use in Philosophy. One of my pet projects is to find some way of making the Thesaurus Linguae Graecae database (all of classical Greek through the 7th century C.E.) more readily available to working scholars. ========================================================================= *END* ========================================================================= Date: 16 December 1987, 15:24:57 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: National text archive (45 lines) From C. Faulhaber (U.C. Berkeley, ked@coral.berkeley.edu) via Tim Maher 1) Text Archives. What is needed is some sort of alliance between the computing types and the professional librarians. It seems to me that there is a much better chance of getting a national text archive if it can be integrated into an ongoing concern. I list three candidates, in decreasing order of feasibility: a) RLG: Through their PRIMA project they are actively interested in providing access to new information resources. b) The organization at the U. of Michigan which already maintains data bases for use in the social sciences. c) OCLC: They have relatively less experience than RLG in providing services for research institutions but are aggressively expanding their range. 2) Citation dictionaries: John Nitti (Medieval Spanish Seminary, 1120 Van Hise Hall, U. of Wisconsin, Madison 53720) has been working on just such a dictionary (Dictionary of the Old Spanish Language) since ca. 1970, although the original plan was to draw the citations from texts transcribed specifically for that purpose and publish in standard format on OED lines. With optical disk technology, the possibility now exists to combine DOSL and the texts themselves. In fact, we are contemplating the possibility of combining these 2 elements with my own Bibliography of Old Spanish Texts serving as a data base front end in order to search through texts on the basis of, e.g., date, author, subject. Prof. Charles Faulhaber Dept. of Spanish and Portuguese Univ. of California, Berkeley. ked@coral.berkeley.edu ========================================================================= Date: 17 December 1987, 15:53:31 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Info (30 lines) From Mark Olsen A student here is doing a project on the discourse of John Woolman and is looking for computer readable versions of texts by other 18th century American Quakers for comparisons. I would appreciate any info concerning the availability of these texts before scanning them in. A second, stranger request has come through. I have a faculty member who is studying a 19th century manuscript. Parts of it were crossed out and she is wondering if there is the possibility of using computer enhancement of the images to improve readability. She has tried blowing up the images, but has not gotten much. Any ideas? I must admit that I know nothing about image processing except what I read about concerning the space shots. Maybe I should try JPL (snicker).
Thanks in advance, Mark Olsen I don't know how many lines of text this has, but it doesn't conform to any known mark-up standard. ========================================================================= Date: 17 December 1987, 19:51:24 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Christmas gift for HUMANISTs (50 lines) From Sebastian Rahtz The following Christmas gift for HUMANISTs is prompted by a description Lou Burnard sent me of the Vassar 'text encoding standards' meeting, and by the subsequent HUMANIST discussion (no I dont have permission to 'publish' this) Incidentally, a recent contribution to HUMANIST implied that text-encoding standards were a central issue to all HUMANISTs. May I stand up for the archaeologists, musicians, art-historians, linguists and philosophers amongst us to say that there is more to humanities computing than text! equality for all. Sebastian Rahtz (spqr@uk.ac.soton.cm) A cold coming we had of it, just the worst time of the year for a journey, and such a long journey: the ways deep and the weather sharp, a hard time we had of it. at the end we preferred to travel all night, sleeping in snatches, with the voices singing in our ears, saying that this was all folly. but there was no information, and so we continued and arrived at evening, not a moment too soon finding the place; it was (you may say) satisfactory. all this was a long time ago, I remember, and I would do it again, but set down this set down this: were we led all that way for birth or death? there was a birth, certainly, we had evidence and no doubt. I had seen birth and death, but had thought they were different; this birth was hard and bitter agony for us, like death, our death. we returned to our places, these kingdoms, but no longer at ease here, in the old dispensation, with an alien people clutching their gods. I should be glad of another death. ========================================================================= Date: 21 December 1987, 15:01:37 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Test message This is a test. Please neither do nor conclude anything because of its appearance. ========================================================================= Date: 21 December 1987, 19:29:37 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Whereabouts of R.G. Ragsdale From Christian Koch On December 4 an announcement was sent out over HUMANIST regarding a proposed course to be offered in connection with the European Conference on Computers in Education to be held next summer in Lausanne, Switzerland. The proposal was by R.G. Ragsdale and the course in question was International Educational Computing: An Interaction of Values and Technology. Anyone interested in further information was to contact R.G. Ragsdale. Unfortunately there was no address given, neither e-mail nor regular mail. I am wondering if anyone knows the whereabouts of R.G. Ragsdale. Am also wondering if anyone knows a contact person for the European Conference on Computers in Education. Thanks! 
Christian Koch Oberlin College Bitnet: fkoch%ocvaxa@cmccvb ========================================================================= Date: 22 December 1987, 10:50:41 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: ICEBOL (106 lines) From David Sitman ICEBOL3 April 21-22, 1988 Dakota State College Madison, SD 57042 ICEBOL3, the International Conference on Symbolic and Logical Computing, is designed for teachers, scholars, and programmers who want to meet to exchange ideas about non-numeric computing. In addition to a focus on SNOBOL, SPITBOL, and Icon, ICEBOL3 will feature introductory and technical presentations on other dangerously powerful computer languages such as Prolog and LISP, as well as on applications of BASIC, Pascal, and FORTRAN for processing strings of characters. Topics of discussion will include artificial intelligence, expert systems, desk-top publishing, and a wide range of analyses of texts in English and other natural languages. Parallel tracks of concurrent sessions are planned: some for experienced computer users and others for interested novices. Both mainframe and microcomputer applications will be discussed. ICEBOL's coffee breaks, social hours, lunches, and banquet will provide a series of opportunities for participants to meet and informally exchange information. Sessions will be scheduled for "birds of a feather" to discuss common interests (for example, BASIC users group, implementations of SNOBOL, computer generated poetry). Call For Papers Abstracts (minimum of 250 words) or full texts of papers to be read at ICEBOL3 are invited on any application of non-numeric programming. Planned sessions include the following: artificial intelligence expert systems natural language processing analysis of literary texts (including bibliography, concordance, and index preparation) linguistic and lexical analysis (including parsing and machine translation) preparation of text for electronic publishing computer assisted instruction grammar and style checkers music analysis. Papers must be in English and should not exceed twenty minutes reading time. Abstracts and papers should be received by January 15, 1988. Notification of acceptance will follow promptly. Papers will be published in ICEBOL3 Proceedings. Presentations at previous ICEBOL conferences were made by Susan Hockey (Oxford), Ralph Griswold (Arizona), James Gimpel (Lehigh), Mark Emmer (Catspaw, Inc.), Robert Dewar (New York University), and many others. Copies of ICEBOL 86 Proceedings are available. ICEBOL3 is sponsored by The Division of Liberal Arts and The Business and Education Institute of DAKOTA STATE COLLEGE Madison, South Dakota For Further Information All correspondence including abstracts and papers as well as requests for registration materials should be sent to: Eric Johnson ICEBOL Director 114 Beadle Hall Dakota State College Madison, SD 57042 U.S.A. (605) 256-5270 Inquiries, abstracts, and correspondence may also be sent via electronic mail to: ERIC @ SDNET (BITNET) ------- End of Forwarded Message ========================================================================= Date: 23 December 1987, 21:46:36 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: The reason for a silent HUMANIST from 15 to 21 December Dear Colleagues: You may have assumed that the silence of HUMANIST from about 15 to 21 December was due to a mass unplugging of terminals and departing for seasonal festivals, but this is not entirely so. 
A new version of ListServ (the software that runs HUMANIST), just installed about then, ran amok, wrote about 1,000,000 lines in the system log here, and so provoked our postmaster into disconnecting it -- and with it HUMANIST. The messages that seemed to be sent out during that period went into limbo, where apparently they still sit. These may suddenly appear in your readers, perhaps even in duplicate or triplicate, or they may not show up at all. Against the latter possibility, I am sending you copies of the limbo'd messages in two batches, with my best wishes for your good health and prosperity in the new year. Yours, W.M. _________________________________________________________________________ Dr. Willard McCarty / Centre for Computing in the Humanities University of Toronto / 14th floor, Robarts Library / 130 St. George St. Toronto, Canada M5S 1A5 / (416) 978-4238 / mccarty@utorepas.bitnet ========================================================================= Date: 23 December 1987, 22:01:38 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Batch 1 of limbo'd messages (448 lines) ========================================================================= Date: 15 December 1987, 23:40:40 EST From: MCCARTY at UTOREPAS To: HUMANIST at UTORONTO Subject: An idea about biographies; Supplement 5 (440 lines) Dear Colleagues: At some point in the near future, if anyone would care for such a thing, I have it in mind to do a proper job on the biographies. Apart from the editing and formatting, this would involve collecting a revised biographical statement from each of you, if you'd care to supply one. These might be written or rewritten according to a suggested list of things to be mentioned -- to make them *slightly* less chaotic without taking the play out. The revised collection would be for circulation only on HUMANIST. What do you think? Please let me know if the idea strikes you as worthy of effort. What do you think should be on the list of things to be mentioned? Meanwhile, the next supplement follows. Yours, W.M. -------------------------------------------------------------------------- Autobiographies of HUMANISTs Fifth Supplement Following are 23 additional entries to the collection of autobiographical statements by members of the HUMANIST discussion group. Further additions, corrections, and updates are welcome, to mccarty@utorepas.bitnet. W.M. 16 December 1987 ========================================================================= *Atwell, Eric Steven Centre for Computer Analysis of Language and Speech, AI Division, School of Computer Studies, Leeds University, Leeds LS2 9JT; +44 532 431751 ext 6 I am in a Computer Studies School, but specialise in linguistic and literary computing, and applications in Religious Education in schools. I would particularly like to liaise with other researchers working in similar areas. ========================================================================= *Benson, Tom {akgua,allegra,ihnp4,cbosgd}!psuvax1!psuvm.bitnet!t3b (UUCP) t3b%psuvm.bitnet@wiscvm.arpa (ARPA) Department of Speech Communication, The Pennsylvania State University 227 Sparks Building, University Park, PA 16802; 814-238-5277 I am a Professor of Speech Communication at Penn State University, currently serving as editor of THE QUARTERLY JOURNAL OF SPEECH. In addition, I edit the electronic journal CRTNET (Communication Research and Theory Network). 
========================================================================= *CETEDOC (CENTRE DE TRAITEMENT ELECTRONIQUE DES DOCUMENTS) CETEDOC, LLN, BELGIUM THE CETEDOC (CENTRE DE TRAITEMENT ELECTRONIQUE DES DOCUMENTS) IS AN INSTITUTION OF THE CATHOLIC UNIVERSITY OF LOUVAIN AT LOUVAIN-LA-NEUVE, BELGIUM. ITS DIRECTOR IS PROF. PAUL TOMBEUR. ========================================================================= *Chadwick, Tony Department of French & Spanish, Memorial University of Newfoundland St. John's, A1B 3X9; (709)737-8572 At the moment I have two interests in computing: one is the use of computers in composition classes for second language learners, the second in computerized bibliographies. I have an M.A. in French from McMaster and have been teaching at Memorial University since 1967. Outside computers, my research interests lie in Twentieth Century French Literature. ========================================================================= *Coombs, James H. Institute for Research in Information and Scholarship, Brown University Box 1946, Providence, RI 02912 I have a Ph.D. in English (Wordsworth and Milton: Prophet-Poets) and an M.A. in Linguistics, both from Brown University. I have been Mellon Postdoctoral Fellow in English and am about to become Software Engineer, Research, Institute for Research in Information and Scholarship (IRIS). I have co-edited an edition of letters (A Pre-Raphaelite Friendship, UMI Research Press) and have written on allusion and implicature (Poetics, 1985; Brown Working Papers in Linguistics). Any day now, the November Communications of the ACM will appear with an article on "Markup Systems and the Future of Scholarly Text Processing," written with Allen H. Renear and Steven J. DeRose. I developed the English Disk on the Brown University mainframe, which provides various utilities for humanists, primarily for word processing and for staying sane in CMS. I wrote a Bibliography Management System for Scholars (BMSS; 1985) and then an Information Management System for Scholars (IMSS; 1986). Both are in PL/I and may best be considered "aberrant prototypes," used a little more than necessary for research but never commercialized. I am currently working on a system with similar functionality for the IBM PC. Last year, I developed a "comparative concordance" for the multiple editions of Wordsworth's Prelude. I am delayed in that by the lack of the final volume of Cornell's fine editions. A preliminary paper will appear in the working papers of Brown's Computing in the Humanities User's Group (CHUG); a full article will be submitted in January, probably to CHUM. I learned computational linguistics from Prof. Henry Kucera, Nick DeRose, and Andy Mackie. Richard Ristow taught me software engineering management or, more accurately, teaches me more every time I talk to him. I worked on the spelling corrector, tuning algorithms. I worked on the design of the grammar corrector, designed the rule structures, and developed the rules with Dr. Carol Singley. Then I started with Dr. Phil Shinn's Binary Parser and developed a language-independent N-ary Parser (NAP). NAP reads phrase structure rules as well as streams of tagged words (see DeRose's article in Computational Linguistics for information on the disambiguation) and generates a parse tree, suitable for generalized pattern matching. Finally, at IRIS, I will be developing online dictionary access from our hypermedia system: Intermedia (affix stripping, unflection, definition, parsing, etc.).
In addition, we are working on a unified system for accessing multiple databases, including CD-ROM as well as remote computers. ========================================================================= *Dawson, John L. University of Cambridge, Literary and Linguistic Computing Centre Sidgwick Avenue, Cambridge CB3 9DA England; (0223) 335029 I have been in charge of the Literary and Linguistic Computing Centre of Cambridge University since 1974, and now hold the post of Assistant Director of Research there. The LLCC acts as a service bureau for all types of humanities computing, including data preparation, and extends to the areas of non-scientific computing done by members of science and social science faculties. Much of our work remains in the provision of concordances to various texts in a huge range of languages, either prepared by our staff, by the user, or by some external body (e.g. TLG, Toronto Corpus of Old English, etc.). Some statistical analysis is undertaken, as required by the users. Recently, we have begun preparing master pages for publication using a LaserWriter, and several books have been printed by this means. My background is that of a mathematics graduate with a Diploma in Computer Science (both from Cambridge). I am an Honorary Member of ALLC, having been its Secretary for six years, and a member of the Association for History and Computing. My present research (though I don't have much time to do it) lies in the comparison of novels with their translations in other languages. At the moment I am working on Stendhal's "Le Rouge et le Noir" in French and English, and on Jane Austen's "Northanger Abbey" in English and French. I have contributed several papers at ALLC and ACH conferences, and published in the ALLC Journal (now Literary & Linguistic Computing) and in CHum. ========================================================================= *Giordano, Richard I am a new humanities specialist at Princeton University Computer Center (Computing and Information Technology). I come to Princeton from Columbia University where I was a Systems Analyst in the Libraries for about six years. I am just finishing my PhD dissertation in American history at Columbia as well. ========================================================================= *Johnson, Christopher Language Research Center, Room 345 Modern Languages, University of Arizona Tucson, AZ 85702; (602) 621-1615 I am currently the Director of the Language Research Center at the University of Arizona. Masters in Educational Media, University of Arizona; Ph.D. in Secondary Education (Minor in Instructional Technology), UA. I have worked in the area of computer-based instruction since 1976. I gained most of my experience on the PLATO system here at the University and as a consultant to Control Data Corp. Two years ago I moved to the Faculty of Humanities to create the Language Research Center, a support facility for our graduate students, staff, and faculty. My personal research interests are in the areas of individual learning styles, critical thinking skills, middle-level education, and testing as they apply to computer-based education. The research interests of my faculty range from text analysis to word processing to research into the use of the computer as an instructional tool. ========================================================================= *Johansson, Stig Dept of English, Univ of Oslo, P.O. Box 1003, Blindern, N-0315 Oslo 3, Norway. Tel: 456932 (Oslo). Professor of English Language, Univ of Oslo.
Relevant research interest: computers in English language research. Coordinating secretary of the International Computer Archive of Modern English (ICAME) and editor of the ICAME Journal. Member of the ALLC. ========================================================================= *Kalinoski, Ron Academic Computing Services, 215 Machinery Hall, Syracuse University Syracuse, New York 13244; 315/423-3998 I am Associate Director for Research Computing at Syracuse University and am interested in sponsoring a seminar series next spring focusing on computing issues in the humanities. I hope that this will lead to hiring a full-time staff person to provide user support services for humanities computing. ======================================================================== *Langendoen, D. Terence Linguistics Program, CUNY Graduate Center, 33 West 42nd Street, New York, NY 10036-8099 USA; 212-790-4574 (soon to change) I am a theoretical linguist, interested in parsing and in computational linguistics generally. I have also worked on the problem of making sophisticated text-editing tools available for the teaching of writing. I am currently Secretary-Treasurer of the Linguistic Society of America, and will continue to serve until the end of calendar year 1988. I have also agreed to serve on two working committees on the ACH/ALLC/ACL project on standards for text encoding, as a result of the conference held at Vassar in mid-November 1987. ========================================================================= *Molyneaux, Brian Department of Archaeology, University of Southampton, England. I am at present conducting postgraduate research in art and ideology and its relation to material culture. I am also a Field Associate at the Royal Ontario Museum, Department of New World Archaeology, specialising in rock art research. I obtained a BA (Hons) in English Literature, a BA (Hon) in Anthropology, and an MA in Art and Archaeology at Trent University, Peterborough, Ontario. My research interest in computing in the Humanities includes the analysis of texts and art works within the context of social relations. ========================================================================= *Olofsson, Ake I am at the Department of Psychology, University of Umea, in the north of Sweden. Part of my work at the department is helping people to learn how to use our computer (VAX and the Swedish university Decnet) and International mail (Bitnet). We are four system-managers at the department and have about 40 ordinary users, running word-processing, statistics and Mail programs. ======================================================================== *ORVIK, TONE POST OFFICE BOX 1822, KINGSTON, ON K7L 5J6; 613 - 389 - 6092 WORKING ON BIBLE RESEARCH WITH AFFILIATION TO QUEEN'S UNIVERSITY'S DEPT. OF RELIGIOUS STUDIES; CREATING CONCORDANCE OF SYMBOLOGY. HAVE WORKED AS A RESEARCHER, TEACHER, AND WRITER, IN EUROPE AND CANADA; ESPECIALLY ON VARIOUS ASPECTS OF BIBLE AND COMPARATIVE RELIGION. INTERESTED IN CONTACT WITH NETWORK USERS WITH SAME/SIMILAR INTEREST OF RESEARCH. ========================================================================= *Potter, Rosanne G. Department of English, Iowa State University, Ross Hall 203, (515) 294-2180 (Main Office); (515) 294-4617 (My office) I am a literary critic; I use the mainframe computer for the analysis of literary texts. I have also designed a major formatting bibliographic package, BIBOUT, in wide use at Iowa State University, also installed at Princeton and Harvard. 
I do not program; rather, I work with very high level programming specialists, statisticians, and systems analysts here to design the applications that I want for my literary critical purposes. I am editing a book on Literary Computing and Literary Criticism containing essays by Richard Bailey, Don Ross, Jr., John Smith, Paul Fortier, C. Nancy Ide, Ruth Sabol, myself and others. I've been on the board of ACH, have been invited to serve on the CHum editorial board. ========================================================================= *Renear, Allen H. My original academic discipline is philosophy (logic, epistemology, history), and though I try to keep that up (and expect my Ph.D. this coming June) I've spent much of the last 7 years in academic computing, particularly humanities support. I am currently on the Computer Center staff here at Brown as a specialist in text processing, typesetting and humanities computing. I've had quite a bit of practical experience designing, managing, and consulting on large scholarly publication projects and my major research interests are similarly in the general theory of text representation and strategies for text-based computing. I am a strong advocate of the importance of SGML for all computing that involves text; my views on this are presented in the Coombs, Renear, DeRose article on Markup Systems in the November 1987 *Communications of the ACM*. Other topics of interest to me are structure-oriented editing, hypertext, manuscript criticism, and specialized tools for analytic philosophers. My research in philosophy is mostly in epistemic logic (similar to what AI folks call "knowledge representation"); it has some surprising connections with emerging theories of text structure. I am a contact person for Brown's very active Computing in the Humanities User's Group (CHUG). ========================================================================= *Richardson, John Associate Professor, University of California (Los Angeles), GSLIS; (213) 825-4352 One of my interests is analytical bibliography, the description of printed books. At present I am intrigued with the idea that we can describe various component parts of books, notably title pages, paper, and typefaces, but the major psycho-physical element, ink, is not described. Obviously this problem involves humanistic work but also a fair degree of sophistication with ink technology. I would be interested in talking with or corresponding with anyone on this topic... ========================================================================= *Taylor, Philip Royal Holloway & Bedford New College; University of London; U.K.; (+44) 0784 34455 Ext: 3172 Although not primarily concerned with the humanities (I am principal systems programmer at RHBNC), I am frequently involved in humanities projects, particularly in the areas of type-setting (TeX), multi-lingual text processing, and natural language analysis, among others. ========================================================================= *Whitelam, Keith W. Dept. of Religious Studies, University of Stirling, Stirling FK9 4LA Scotland; Tel. 0786 3171 ext. 2491 I have been lecturer in Religious Studies at Stirling since 1978 with prime responsibility for Hebrew Bible/Old Testament. My research interests are mainly aimed at exploring new approaches to the study of early Israelite/Palestinian history in an interdisciplinary context, i.e. drawing upon social history, anthropology, archaeology, historical demography, etc.
I have been constructing a database of Palestinian archaeological sites, using software written by the Computing Science department, in order to analyse settlement patterns, site hierarchies, demography, etc. The department of Environmental Science has recently purchased Laser Scan and offered me access to the facilities. This will enable me to display settlement patterns, sites, etc. in map form for analysis and comparison. I am particularly interested in corresponding/discussing with others working on similar problems, particularly in Near Eastern archaeology. I have also been involved in exploring the possibilities of setting up campus-wide text processing laser printing facilities. It looks as though we shall be able to offer a LaTeX service in the New Year. We are also planning to offer a WYSIWYG service, such as Ventura on IBM or a combination with Macs for the production of academic papers. Again I have a particular interest in the use of foreign fonts, e.g. Hebrew, Akkadian, Ugaritic, Greek, etc. My teaching and research on the Hebrew Bible lead to a concern with developing computer-aided text analysis, although I have had little time to explore this area. We have OCP available on our mainframe VAX but my use of this has been very limited. I see this as an important area of future development in teaching and research along with Hebrew teaching. ========================================================================= *Wilson, Noel Head of Academic Services, University of Ulster, Shore Road Newtownabbey, Co. Antrim, N. Ireland BT37 0QB; (0232)365131 Ext. 2449 My post has overall responsibility for the central academic computing service, offered by the Computer Centre, to the University academic community. Within this brief, my Section is responsible for the acquisition/development and documentation of CAL and proprietary software. We currently provide a program library in support of courses and research which contains approx. 400 programs; of these approx. 80 are in-house developments, 50 proprietary systems and the remainder obtained from a variety of sources incl. program libraries (eg CONDUIT - Univ. of Iowa). We have only very recently addressed computing within the Faculty of Humanities; academic staff in the Faculty have used computers in a research capacity and are now turning towards the various u'grad. courses. Presently we hold a grant of 79,000 pounds from the United Kingdom Computer Board for Universities and Research Councils, for the development of CAL software in support of Linguistics and Lexicostatistics. Within this project we are attempting to develop courseware to support grammar teaching in French, German, Spanish and Irish (details of existing materials appropriate to u'grad. teaching would be most welcome!). We also are investigating the creation of software to support an analysis of text (comparative studies) - in this area we are looking at frequency counts assoc. with words/expressions/words within registers etc. - again help would be appreciated. I am happy to provide further details on any of the above points and wish to keep informed of useful Humanities-related CAL work elsewhere. We currently use the Acorn BBC micro. but are also moving in the direction of PC clones. ========================================================================= *Wood, Max Computing Officer, 403 Maxwell Building, The University of Salford The Crescent, Salford, G.M.C.
ENGLAND; 061-736-5843 Extension 7399 We are involved in a project to introduce the use of computing in teaching here in the Business and Management Department of Salford University and I am keen to extend links to other Business schools both here in the U.K. and indeed in the U.S.A. Obviously, therefore, I would like to join your forum so as to possibly exchange ideas, news, etc. My background is essentially in computing and I mainly supervise the computing resources available to our Department, and have formulated many of the teaching systems we currently use. ========================================================================= *Wujastyk, Dominik I am a Sanskritist with some knowledge of computing. Once upon a time (1977-78) I learned Snobol4 from Susan Hockey at Oxford, where I did undergraduate and later doctoral Sanskrit. More recently, I have been using TeX on my PC AT (actually a Compaq III), and in the middle of this summer I published a book _Studies on Indian Medical History_, which was done in TeX and printed out on an HP LJ II, and sent to the publisher as camera ready. It all went very well. I have received the MS DOS Icon implementation from Griswold at Arizona, but have not spent time on it. I am trying to teach myself at the moment, just to learn enough to knock out occasional routines to convert files from wordprocessor formats to TeX, and that sort of thing. (Probably reinventing the wheel.) At the present time I am editing a Sanskrit text on medieval alchemy, and doing all the formatting of the edition in LaTeX. Before I ever started Sanskrit, I did a degree in Physics at Imperial College in London, but that is so long ago that I don't like to think about it! ========================================================================= *Young, Charles M. Dept. of Philosophy, The Claremont Graduate School I am a member of the American Philosophical Association's committee on Computer Use in Philosophy. One of my pet projects is to find some way of making the Thesaurus Linguae Graecae database (all of classical Greek through the 7th century C.E.) more readily available to working scholars. ========================================================================= *END* ========================================================================= ========================================================================= Date: 23 December 1987, 22:02:58 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Batch 2 of limbo'd messages (142 lines) ========================================================================= Date: 16 December 1987, 15:24:57 EST From: MCCARTY at UTOREPAS To: HUMANIST at UTORONTO Subject: National text archive (45 lines) From C. Faulhaber (U.C. Berkeley, ked@coral.berkeley.edu) via Tim Maher 1) Text Archives. What is needed is some sort of alliance between the computing types and the professional librarians. It seems to me that there is a much better chance of getting a national text archive if it can be integrated into an ongoing concern. I list three candidates, in decreasing order of feasibility: a) RLG: Through their PRIMA project they are actively interested in providing access to new information resources. b) The organization at the U. of Michigan which already maintains data bases for use in the social sciences. c) OCLC: They have relatively less experience than RLG in providing services for research institutions but are aggressively expanding their range.
2) Citation dictionaries: John Nitti (Medieval Spanish Seminary, 1120 Van Hise Hall, U. of Wisconsin, Madison 53720) has been working on just such a dictionary (Dictionary of the Old Spanish Language) since ca. 1970, although the original plan was to draw the citations from texts transcribed specifically for that purpose and publish in standard format on OED lines. With optical disk technology, the possibility now exists to combine DOSL and the texts themselves. In fact, we are contemplating the possibility of combining these 2 elements with my own Bibliography of Old Spanish Texts serving as a data base front end in order to search through texts on the basis of, e.g., date, author, subject. Prof. Charles Faulhaber Dept. of Spanish and Portuguese Univ. of California, Berkeley. ked@coral.berkeley.edu ========================================================================= Date: 17 December 1987, 15:53:31 EST From: MCCARTY at UTOREPAS To: HUMANIST at UTORONTO Subject: Info (30 lines) From Mark Olsen A student here is doing a project on the discourse of John Woolman and is looking for computer readable versions of texts by other 18th century American Quakers for comparisons. I would appreciate any info concerning the availability of these texts before scanning them in. A second, stranger request has come through. I have a faculty member who is studying a 19th century manuscript. Parts of it were crossed out and she is wondering if there is the possibility of using computer enhancement of the images to improve readability. She has tried blowing-up the images, but has not gotten much. Any ideas? I must admit that I know nothing about image processing except what I read about concerning the space shots. Maybe I should try JPL (snicker). Thanks in advance, Mark Olsen I don't know how many lines of text this has, but it doesn't conform to any known mark-up standard. ========================================================================= Date: 17 December 1987, 19:51:24 EST From: MCCARTY at UTOREPAS To: HUMANIST at UTORONTO Subject: Christmas gift for HUMANISTs (50 lines) From Sebastian Rahtz The following Christmas gift for HUMANISTs is prompted by a description Lou Burnard sent me of the Vassar 'text encoding standards' meeting, and by the subsequent HUMANIST discussion (no, I don't have permission to 'publish' this). Incidentally, a recent contribution to HUMANIST implied that text-encoding standards were a central issue to all HUMANISTs. May I stand up for the archaeologists, musicians, art-historians, linguists and philosophers amongst us to say that there is more to humanities computing than text! equality for all. Sebastian Rahtz (spqr@uk.ac.soton.cm) A cold coming we had of it, just the worst time of the year for a journey, and such a long journey: the ways deep and the weather sharp, a hard time we had of it. at the end we preferred to travel all night, sleeping in snatches, with the voices singing in our ears, saying that this was all folly. but there was no information, and so we continued and arrived at evening, not a moment too soon finding the place; it was (you may say) satisfactory. all this was a long time ago, I remember, and I would do it again, but set down this set down this: were we led all that way for birth or death? there was a birth, certainly, we had evidence and no doubt. I had seen birth and death, but had thought they were different; this birth was hard and bitter agony for us, like death, our death.
we returned to our places, these kingdoms, but no longer at ease here, in the old dispensation, with an alien people clutching their gods. I should be glad of another death. ========================================================================= Date: 18 December 1987, 14:08:49 EST From: MCCARTY at UTOREPAS To: HUMANIST at UTORONTO Subject: Offline 16 (20 lines) From Bob Kraft My bimonthly OFFLINE column for Religious Studies News has just been sent off to the printer for the January or February issue of RSNews. It consists of a report on the computer aspects of the recent annual meetings of the Society of Biblical Literature, American Academy of Religion, and American Schools for Oriental Research, held jointly in Boston on 5-8 December 1987. If any HUMANISTS would like a pre-publication electronic copy of OFFLINE 16, I am willing to send it upon request. Happy Holidays! Bob Kraft ========================================================================= ========================================================================= Date: 23 December 1987, 22:31:33 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Hypermedia bibliography Anyone wishing a copy of a recent bibliography of items on hypermedia, compiled at IRIS (Brown Univ.), should send a note to me requesting it. The bibliography, which recently appeared on IRLIST, comes in three parts, each approximately 500 lines long. W.M. ========================================================================= Date: 29 December 1987, 13:53:52 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: Library of Congress: markup and MRTs? From James H. Coombs In a note posted on 8 Dec 1987, Richard Giordano states: Traditionally, ALA [American Library Association] and LC [Library of Congress] have both taken the lead in the scholarly world in providing machine-readable information. The technical problems that LC has addressed have been fundamental to data processing. Could you provide more information, e.g., citations of articles? I know that LC is considering SGML, but they seem to be much more of a follower than a leader in this effort at least. I also believe that the LC is more interested in microfilm than in electronic media for the preservation of materials printed on paper that is not acid free. I was somewhat distressed when I first read this (wish I knew where I read it too), but apparently microfilm lasts longer than computer tape and requires less maintenance. (Still might be the wrong decision.) So, I've missed out on what the LC is doing for Machine Readable Texts [MRTs] and the like. Any information appreciated. Thanks. --Jim P.S. Well, the same for ALA and RLG [Research Libraries Group]. What are they doing? Dr. James H. Coombs Software Engineer, Research Institute for Research in Information and Scholarship (IRIS) Brown University jazbo@brownvm.bitnet ========================================================================= Date: 29 December 1987, 13:56:42 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: music-encoding standards? (50 lines) From James H. Coombs I'm glad to see Humanist up again! In a posting of 17 December, Sebastian Rahtz says: Incidentally, a recent contribution to HUMANIST implied that text-encoding standards were a central issue to all HUMANISTs.
May I stand up for the archaeologists, musicians, art-historians, linguists and philosophers amongst us to say that there is more to humanities computing than text! equality for all. Just so, Sebastian! ANSI X3V1.8M/87-17---Journal of Technical Developments discusses the application of SGML to music (Work Group, Music Processing Standards). According to an article in TAG (The SGML Newsletter), the goal is to describe music not only for documentation and hard copy preparation but also to be included in technical documentation and played in a real time rendition simultaneously while viewing a particular part of the document. Dr. Goldfarb referred to the inclusion of music in a technical document, and therefore to the concept of time, as "technical documentation in four dimensions." (vol. 1, no. 3, page 10) --Jim Dr. James H. Coombs Software Engineer, Research Institute for Research in Information and Scholarship (IRIS) Brown University jazbo@brownvm.bitnet ========================================================================= Date: 30 December 1987, 00:46:40 EST Reply-To: MCCARTY@UTOREPAS Sender: HUMANIST Discussion From: MCCARTY@UTOREPAS Subject: The "interesting problem" of 30 November From Prof. Yaacov Choueka I just saw your email msg about the problem of identifying the language of given titles. A few months ago, when on a sabbatical at Bellcore, I was engaged in a small research project with David Copp about finding "minimal sets of words which would cover every "innocent" line (=80 char.) of English", i.e. s.t. every line of standard English (not specially constructed to give a counter-example...) would contain at least one word from the list. We had some quite interesting developments, and we experimented with several lists of about 60-100 words, and tens of thousands of lines from the New York Times, finding that a list of < 100 words can give a criterion for English which would be reliable with a very high percentage. We were looking for some applications, and yours is an excellent one! The research has not yet been written or presented at a conference (time, time!), but either I or David can give you more details if you are interested. In fact if you already have hundreds or thousands of titles for which you already know the language, it might be fun to run our programs on this set. Best regards. (David Copp can be reached at copp@bellcore.flash) Yaacov Choueka.
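A minimal sketch, in Python, of the word-list criterion Choueka describes above. This is an illustration only, not Choueka and Copp's program: the twenty function words below are invented stand-ins for their tuned lists of 60-100 words, and the sample titles are hypothetical. The point is simply that nearly every line of ordinary English contains at least one very common function word, so a short list can serve as a quick screen for whether a title is in English.

# Illustrative word-list test: a line is judged "English" if it contains
# at least one very common English function word. The real lists were
# tuned against tens of thousands of newspaper lines; this one is a stand-in.
COMMON_WORDS = {
    "the", "of", "and", "to", "a", "in", "is", "it", "that", "for",
    "on", "with", "as", "by", "at", "from", "or", "an", "be", "this",
}

def looks_english(line):
    """Return True if the line contains at least one word from COMMON_WORDS."""
    words = (w.strip('.,;:!?"()') for w in line.lower().split())
    return any(w in COMMON_WORDS for w in words)

# Example: screening titles whose language is unknown (titles are invented).
titles = [
    "A History of the English Language",    # matches "a", "of", "the"
    "Geschichte der deutschen Literatur",   # contains no word from the list
]
for t in titles:
    print(t, "->", looks_english(t))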