4.0137 Concording and Text Analysis in Unix (2/59)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Tue, 29 May 90 18:10:09 EDT

Humanist Discussion Group, Vol. 4, No. 0137. Tuesday, 29 May 1990.


(1) Date: Fri, 25 May 90 22:34:52 CDT (31 lines)
From: janus@ux.acs.umn.edu
Subject: UNIX concording programs

(2) Date: Sun, 27 May 90 01:33:37 -0700 (28 lines)
From: edwards@cogsci.berkeley.edu (Jane Edwards)
Subject: Re: 4.0132 Notes and Queries

(1) --------------------------------------------------------------------
Date: Fri, 25 May 90 22:34:52 CDT
From: janus@ux.acs.umn.edu
Subject: UNIX concording programs

In answer to Knut Hofland's request for cheap concording programs on
UNIX, I recommend HUM-- A CONCORDANCE AND TEXT ANALYSIS PROGRAM. It is
available from ftp host uunet.uu.net where you will find three files in
the directory comp.sources.unix/volume10/hum .

This package is a set of C programs, distributed as source code, that
will, among other things, produce kwic and kwal (for poetry)
concordances, cross-references, frequency counts of words and
characters, and word-length histograms. It also finds sentences that
match specified patterns (a grep that gives humanistically relevant
results, not just single lines of text). A number of other programs
come in the HUM package as well. The three compressed files that you
can download also contain manual pages for the programs.
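
For readers who have not seen a kwic listing, here is a rough sketch of
the idea in Python. It is not HUM's code (HUM is written in C) and does
not try to reproduce its output format; the function name kwic and the
width parameter are invented for this illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per occurrence of keyword (matched case-insensitively).

    Hypothetical helper, not part of HUM. Each line holds up to `width`
    characters of left context (right-aligned), the keyword as it appears
    in the text, and up to `width` characters of right context.
    """
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}}{m.group(0)}{right}")
    return lines

if __name__ == "__main__":
    sample = "The cat saw the dog. Then the dog saw the cat."
    for line in kwic(sample, "the", width=15):
        print(line)
```

Because the left context is padded to a fixed width, the keywords line
up in a single column, which is what makes such a listing easy to scan.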

The beauty (or drawback -- depending on your viewpoint) is that these
programs fit into the general UNIX scheme: each program is a module that
you manipulate the way you want. Producing a finished, printed
concordance therefore means massaging the output with a number of sorts
and a formatting program.
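
To make the "sorts plus a formatter" idea concrete, here is a small
Python sketch of two separate steps chained together. It only
illustrates the modular approach and is not how HUM itself is invoked;
the record layout (left context, keyword, right context) and the
function names sort_records and format_records are assumptions made for
this example.

```python
def sort_records(records):
    """Sort (left, keyword, right) tuples by keyword, then by what follows it.

    Hypothetical step: comparison ignores case so that "Sea" and "sea"
    end up next to each other.
    """
    return sorted(records, key=lambda r: (r[1].lower(), r[2].lower()))

def format_records(records, width=25):
    """Lay records out in columns so the keywords line up.

    Hypothetical step: left context is trimmed to its last `width`
    characters and right-aligned; right context is cut at `width`.
    """
    return [
        f"{left[-width:]:>{width}}  {key}  {right[:width]}"
        for left, key, right in records
    ]

if __name__ == "__main__":
    # Invented sample records standing in for the output of a concording step.
    raw = [
        ("walked down to the", "shore", "at dusk"),
        ("looked out over the", "sea", "for hours"),
        ("The", "sea", "was calm"),
    ]
    for line in format_records(sort_records(raw)):
        print(line)
```

Each step takes a list and returns a list, so steps can be reordered or
swapped out, much as separate programs are in a UNIX pipeline.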

HUM was written by William Tuthill at U of California Berkeley, and the
version I have used is 3.7, from the early 80's. It will be reviewed in
CHum in the future.

--Louis Janus
U of Minnesota
Scandinavian Dept

(2) --------------------------------------------------------------35----
Date: Sun, 27 May 90 01:33:37 -0700
From: edwards@cogsci.berkeley.edu (Jane Edwards)
Subject: Re: 4.0132 Notes and Queries (4/43)

Regarding UNIX concordance programs, we use one which was written by
William Tuthill (now at Sun Microsystems in Mountain View), called
the HUM package (for computer analysis of texts in the Humanities).

It includes the following programs:

1. freq - generates a frequency distribution and total counts of types
   and tokens, with options that include mapping upper to lower case,
   redefining the punctuation set, and listing the frequency-type pairs
   in either alphabetical or numerical order.
2. kwal - key word in line concordance
3. kwic - key word in context concordance
4. wheel - rolls through the text a word cluster at a time
5. wdlen - counts the length of the longest line in the text
6. sfind - retrieves a record surrounding a pattern (word or tag),
   with options for specifying the record either by explicit delimiters
   or (by default) up to a standard end-of-sentence punctuation mark
   (period, question mark, or exclamation mark).
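
As a rough illustration of what items 1 and 6 describe, here is a short
Python sketch. It is not the HUM source and makes no claim to match its
options or output; the function names simply borrow the program names,
and the fold_case parameter and the tokenizing rule are assumptions
made for this example.

```python
import re
from collections import Counter

def freq(text, fold_case=True):
    """Return (counts, number of types, number of tokens) for a text.

    Hypothetical analogue of a type/token frequency count: a token is a
    run of letters, optionally with one internal apostrophe.
    """
    if fold_case:
        text = text.lower()  # map upper case to lower case
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text)
    counts = Counter(tokens)
    return counts, len(counts), len(tokens)

def sfind(text, pattern):
    """Return every sentence that contains a match for pattern.

    Hypothetical analogue of sentence retrieval: a sentence ends at a
    period, question mark, or exclamation mark; line breaks are ignored.
    """
    flat = " ".join(text.split())
    sentences = re.findall(r"[^.?!]+[.?!]+|[^.?!]+$", flat)
    return [s.strip() for s in sentences if re.search(pattern, s)]

if __name__ == "__main__":
    sample = "The cat sat. Did the dog\nsee the cat? No!"
    counts, n_types, n_tokens = freq(sample)
    print(n_types, "types,", n_tokens, "tokens")
    print(sorted(counts.items()))   # alphabetical listing
    print(counts.most_common())     # numerical listing, most frequent first
    print(sfind(sample, r"dog"))
```

The point of sfind-style retrieval is visible in the last call: the
match is returned with its whole sentence, even though that sentence
spans two lines of the input.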

His email address: tut@sun.com

-Jane Edwards