12.0371 new on WWW: text-analysis

Humanist Discussion Group (humanist@kcl.ac.uk)
Tue, 26 Jan 1999 22:32:06 +0000 (BST)

Humanist Discussion Group, Vol. 12, No. 371.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>

[1] From: John Dawson <jld1@cam.ac.uk> (71)
Subject: Elta Working Paper JLD-1

[2] From: Willard McCarty <willard.mccarty@kcl.ac.uk> (12)
Subject: method in text-analysis

--[1]------------------------------------------------------------------
Date: Tue, 26 Jan 1999 22:26:37 +0000
From: John Dawson <jld1@cam.ac.uk>
Subject: Elta Working Paper JLD-1

ELTA Software Initiative

[The Elta Software Initiative is a collaborative effort to encourage and
support the development of software tools for the analysis, retrieval
and manipulation of electronic texts. Our focus (at least initially) is
on tools to support the needs of the humanities computing community, but
we hope our results are useful for anyone interested in computer
processing of texts marked up with SGML and XML.
See web sites http://www.kcl.ac.uk/humanities/cch/elta (Europe)
and http://www.cse.fau.edu/~tom/elta (USA)]
------------------------------------------------------------------------------
Working Paper JLD-1: Introduction
------------------------------------------------------------------------------
This paper is also available at
http://www.cus.cam.ac.uk/~jld1/elta/jld-1.html

As there is no defined structure for discussing the requirements for new
text-handling software packages, I thought I would begin by working
through the OCP (Oxford Concordance Program) commands. Many of those
commands are, of course, specific to producing and printing (on a
line-printer) various types and layouts of concordance and index, but
they serve to bring up important points. In a way, this is like
returning to first principles: certainly if the OCP project were
starting now, we would do things differently, but in that case the
important question is "Why?"

I was consulted right from the beginning of the original OCP project in
1977; much of my advice was heeded, some was not. Several aspects of OCP
which make it difficult (in some cases impossible) to use for particular
purposes are clearly the result of time and programming language
constraints (OCP was originally coded, for very good reasons, in
Fortran), rather than of conscious decisions. These I shall address in
detail here.

For those unfamiliar with the general structure of OCP commands, an
outline follows, with a note of the Working Paper in which I shall
discuss each section.

Paper JLD-2
  Input
    Comments
    References
    Select
    Text

Paper JLD-3
  Words
    Alphabet
    Compress
    Diacritics
    Ignore
    Padding
    Punctuation

Paper JLD-4
  Action
    Contexts sorted by
    Do
    Headwords
    Keep frequency
    Keys sorted by
    Include
    Maximum context
    Pick
    Prefixes
    References
    Sample
    Suffixes

Paper JLD-5
  Format
    Context
    Headwords
    Layout
    Print
    References
    Titles
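
For readers who have never used a concordance program, the kind of
processing that the Words and Action headings in the outline above
govern can be sketched roughly as follows. This is an illustration
only, written in Python, and it is not OCP syntax or OCP behaviour. The
function name `concordance` and its parameters `max_context`,
`punctuation` and `padding` are hypothetical, borrowed loosely from the
outline headings, and the meanings given to them here (punctuation
separates words, padding characters are dropped from the sort key but
kept in the displayed word) are assumptions made for the sketch.

```python
import re
from collections import defaultdict


def concordance(text, max_context=30, punctuation=".,;:!?\"()", padding="'-"):
    """Return a keyword-in-context index: {key: [(left, word, right), ...]}.

    Assumed behaviour, for illustration only (not OCP's):
      - punctuation characters separate words;
      - padding characters are dropped from the sort key but kept in
        the word as displayed;
      - keys are lower-cased and returned in alphabetical order.
    """
    # Replace each punctuation character with one space, so character
    # offsets in `cleaned` still line up with offsets in `text`.
    table = str.maketrans({ch: " " for ch in punctuation})
    cleaned = text.translate(table)

    entries = defaultdict(list)
    for match in re.finditer(r"\S+", cleaned):
        word = match.group()
        key = "".join(ch for ch in word.lower() if ch not in padding)
        if not key:
            continue  # the token consisted of padding characters only
        start, end = match.start(), match.end()
        left = text[max(0, start - max_context):start]
        right = text[end:end + max_context]
        entries[key].append((left, word, right))

    # Sort the keys; contexts stay in order of occurrence in the text.
    return dict(sorted(entries.items()))


if __name__ == "__main__":
    sample = "The cat sat. The dog's cat ran."
    for key, contexts in concordance(sample, max_context=12).items():
        for left, word, right in contexts:
            print(f"{key:<6} {left:>12}[{word}]{right}")
```

Even a toy like this has to decide what counts as a letter, what
separates words, how keys are sorted and how much context is kept.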
-------------------------------------
Last modified 26 January 1999 and posted to Humanist and Elta
by John Dawson, University of Cambridge JLD1@cam.ac.uk

--[2]------------------------------------------------------------------
Date: Tue, 26 Jan 1999 22:29:24 +0000
From: Willard McCarty <willard.mccarty@kcl.ac.uk>
Subject: method in text-analysis

I have prepared a brief guide to methodology in elementary text-analysis for
1st-year undergraduates here at King's College London. It is currently in
draft form, at <http://ilex.cc.kcl.ac.uk/year1/method.html>. Comments would
be appreciated.

Yours,
WM

----------
Dr. Willard McCarty
Senior Lecturer, Centre for Computing in the Humanities
King's College London / Strand / London WC2R 2LS
+44 (0)171 873 2784 voice; 873 5081 fax
http://www.kcl.ac.uk/humanities/cch/wlm/
maui gratia
