4.1288 TACT -- Text Analysis (2/194)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Sat, 27 Apr 91 13:49:45 EDT

Humanist Discussion Group, Vol. 4, No. 1288. Saturday, 27 Apr 1991.

(1) Date: Fri, 26 Apr 1991 10:26:18 -0400 (154 lines)
From: mccarty@epas.utoronto.ca (Willard McCarty)
Subject: What is TACT?

(2) Date: Fri, 26 Apr 1991 10:27:27 -0400 (40 lines)
From: mccarty@epas.utoronto.ca (Willard McCarty)
Subject: TACT-L

(1) --------------------------------------------------------------------
Date: Fri, 26 Apr 1991 10:26:18 -0400
From: mccarty@epas.utoronto.ca (Willard McCarty)
Subject: What is TACT?
an MS-DOS shareware program
for interactive textual analysis
(ver. 1.2, released June 1990)

TACT is an interactive full-text retrieval system for MS-DOS with
a number of analytical tools. Like others of its kind, TACT
retrieves segments of text according to specified word forms. In
addition, it can find words or character-strings that match
criteria the user specifies. TACT generates simple graphs to show
the distribution of forms throughout an entire text, or within
various structural divisions determined by the user. TACT also
allows retrieval by metatextual `categories'.

Use of TACT begins with a stable ASCII text. Although this text
need not be marked-up for TACT, in most cases intelligent markup
is crucial to effective analysis.

For markup, the researcher uses a word processor to insert simple
codes according to the properties that he or she wishes to query.
In a play, for example, acts, scenes, and speeches are obvious
things to mark; in a novel, chapters; in a narrative poem, books
and stanzas; in a lexicon, subdivisions of the entry; and so
forth. The researcher may also, however, want to mark specific
entities, such as proper names (of people and places), names of
plants and animals, or episodes. In addition, through markup a
number of hypothetical structures can be simultaneously indicated
alongside those denoted by the author or editor, e.g. an
alternative division of a poem into thematic units.
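To make the markup idea concrete, here is a minimal sketch of how simple tags can attach structural references to every word, including a hypothetical thematic division alongside the editorial ones. The tag syntax `<name value>` (e.g. `<act 1>`, `<speaker Hamlet>`) and the function `tag_words` are assumptions for illustration, not TACT's documented markup format or code.

```python
import re

# Hypothetical tag syntax <name value>, e.g. <act 1> or <speaker Hamlet>.
# This is an assumption for illustration, not TACT's documented markup format.
TAG = re.compile(r"<(\w+)\s+([^>]*)>")
WORD = re.compile(r"[A-Za-z']+")


def tag_words(text):
    """Pair every word with a snapshot of the latest value of each tag seen
    so far, so that each word 'knows' its act, scene, speaker, theme, etc."""
    refs = {}
    result = []
    pos = 0
    for m in TAG.finditer(text):
        # Words before this tag fall under the references already in force.
        for w in WORD.findall(text[pos:m.start()]):
            result.append((w, dict(refs)))
        refs[m.group(1)] = m.group(2).strip()
        pos = m.end()
    # Words after the last tag.
    for w in WORD.findall(text[pos:]):
        result.append((w, dict(refs)))
    return result


sample = ("<act 1><scene 2><speaker Hamlet><theme grief>"
          "O that this too too solid flesh would melt")
for word, refs in tag_words(sample)[:2]:
    print(word, refs)
```

Because a tag only updates one reference, several overlapping structures (act, speaker, theme) can be tracked at once, which is what lets hypothetical divisions sit alongside the author's or editor's own.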

Once the text is marked up, a TACT program known as MAKBAS
converts it into a database for efficient retrieval. MAKBAS
allows the user to define the collation sequence of the alphabet,
special characters, and the characteristics of the tags used for markup.
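A user-defined collation sequence matters because plain character-code order puts accented or special letters after "z". The sketch below shows one way such a sequence can drive sorting; the Old English alphabet, and the choice to ignore characters outside it, are hypothetical and not taken from MAKBAS itself.

```python
# Hypothetical user-defined alphabet for Old English: 'æ' after 'a',
# 'ð' after 'd', and 'þ' after 't'. Case is folded before comparison.
ALPHABET = "aæbcdðefghijklmnopqrstþuvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def collation_key(word):
    """Map a word to a list of ranks in the user's alphabet.
    Characters outside the alphabet (hyphens, apostrophes) are ignored,
    an illustrative choice rather than MAKBAS's documented behaviour."""
    return [RANK[ch] for ch in word.lower() if ch in RANK]


def alphabetize(words):
    return sorted(words, key=collation_key)


words = ["þing", "tid", "uton", "æsc", "bat", "apa"]
print(sorted(words))        # plain code-point order puts æ and þ last
print(alphabetize(words))   # order defined by ALPHABET
```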

Working with the database, TACT can present a complete list of
words from which a subset for retrieval may be selected, one word
at a time. Through what is called `regular expression'
capability, the user may also specify a selection rule according
to a pattern of characters, including "wildcards" (for example,
all words beginning with the letter "a" and ending with "ed" or
"ing"). Rules may also contain operators that find juxtaposed
words; specific words within a user-definable span of an
expression, or all words within such a span; or all words that
resemble the chosen expression to varying degrees. Such rules may be kept in
one or more ASCII files external to the program, from which
specific rules may be selected; thus, for example, the user can
construct a lexicon of words and expressions.
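The two kinds of rule described above (a character pattern, and a word found within a span of another) can be sketched as follows. This uses Python's `re` regular-expression syntax as a stand-in; TACT's own rule syntax is different, and the function names `select_words` and `within_span` are hypothetical.

```python
import re


def select_words(vocabulary, pattern):
    """Words from the vocabulary that match the whole pattern."""
    rx = re.compile(pattern)
    return sorted(w for w in vocabulary if rx.fullmatch(w))


def within_span(tokens, first, second, span):
    """Positions of words in `first` that have a word from `second`
    no more than `span` words away on either side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok in first:
            lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
            if any(tokens[j] in second for j in range(lo, hi) if j != i):
                hits.append(i)
    return hits


tokens = "she asked and he answered that nothing was wanting".split()
# Words beginning with "a" and ending with "ed" or "ing":
print(select_words(set(tokens), r"a\w*(ed|ing)"))          # ['answered', 'asked']
# "asked" with "answered" no more than 3 words away:
print(within_span(tokens, {"asked"}, {"answered"}, 3))     # [1]
```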

Once a set of words has been selected by whatever means, it can
be saved within TACT as a "category". Categories can in turn be
combined to form other categories. Thus, for example, all words
and expressions the user regards as indicating "love" can be
saved as the category LOVE, and in addition be combined with a
similar category MAD to produce the category MADLOVE. Category
names can be included within rules as easily as words, so that,
for example, a user could ask to see all passages in which
LOVE-words occur within 2 lines of MAD-words. To take a slightly
different example, a user could ask to be shown all paragraphs in
which the category LOVE and the word "death" occur.
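Treating categories as sets of word forms makes the examples above straightforward: combining categories is a set union, and the two queries become simple searches over lines and paragraphs. The category members and the helper names below are hypothetical; this is a sketch of the idea, not TACT's internal representation.

```python
import re

# Categories as sets of word forms; the members are hypothetical.
LOVE = {"love", "loves", "loved", "heart", "desire"}
MAD = {"mad", "madness", "frenzy"}
MADLOVE = LOVE | MAD  # a new category combining two existing ones


def words_in(text):
    return set(re.findall(r"[a-z']+", text.lower()))


def lines_near(lines, cat_a, cat_b, span):
    """0-based numbers of lines holding a cat_a word with a cat_b word
    on the same line or up to `span` lines before or after it."""
    sets = [words_in(line) for line in lines]
    hits = []
    for i, ws in enumerate(sets):
        if ws & cat_a:
            lo, hi = max(0, i - span), min(len(sets), i + span + 1)
            if any(sets[j] & cat_b for j in range(lo, hi)):
                hits.append(i)
    return hits


def paragraphs_with(paragraphs, category, word):
    """Indices of paragraphs containing both a category word and `word`."""
    return [i for i, p in enumerate(paragraphs)
            if words_in(p) & category and word in words_in(p)]


lines = ["my heart is full", "of nothing", "the night was long", "and madness came"]
print(lines_near(lines, LOVE, MAD, 2))   # LOVE-words within 2 lines of MAD-words
print(paragraphs_with(["Love and death walk together.", "Death alone."], LOVE, "death"))
```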

In the creation of a category from a rule, the user can examine all
selected locations in the text and choose which to include or exclude.
This ability to choose by context is often essential. The word "heat",
for example, might be part of what is meant by "love" in some contexts but
not in others, or only the noun might be relevant, not the verb.

Various displays are available. Text can be displayed as KWIC
(keyword-in-context) segments; as simple distribution graphs,
showing how the occurrence of a set of locations is distributed
through the text, or among various structural divisions; or as an
index showing only a list of locations where the event occurred,
with a one-line context. A new display, added in version 1.2, can
show all collocates of the selected positions in the text, ordered
by Z-score.
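The displays above rest on simple computations: a KWIC line is a window of words around each hit, a distribution is a count of hits per structural division, and a collocate's Z-score compares how often it appears near the selected words with how often chance would predict. The sketch below uses one widely cited collocation Z-score (Berry-Rogghe's); TACT's exact formula may differ, and all function names are hypothetical.

```python
import math
from collections import Counter


def kwic(tokens, positions, width=3):
    """Keyword-in-context lines, with each keyword bracketed and aligned."""
    out = []
    for i in positions:
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        out.append(f"{left:>30} [{tokens[i]}] {right}")
    return out


def distribution(divisions, positions):
    """Hits per structural division; divisions[i] names the division of word i."""
    return Counter(divisions[i] for i in positions)


def collocate_z(observed, node_freq, coll_freq, total, span):
    """Berry-Rogghe-style z-score for one collocate (assumed formula).
    observed: times the collocate falls within `span` words of the node
    node_freq, coll_freq: text frequencies of node and collocate
    total: text length in words."""
    p = coll_freq / (total - node_freq)    # chance a non-node word is the collocate
    expected = p * node_freq * 2 * span    # expected hits across all node windows
    return (observed - expected) / math.sqrt(expected * (1 - p))


tokens = "the rest is silence the rest is not".split()
for line in kwic(tokens, [1, 5], width=2):
    print(line)
print(round(collocate_z(10, 50, 100, 10050, 5), 2))   # 2.25
```

A positive Z-score means the collocate occurs near the selected words more often than expected; sorting collocates by it puts the most strongly associated words first.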

Displays in TACT are linked so that, for example, the user can go directly
from a position in a distribution graph to the text it represents.

TACT is multilingual. To display foreign languages it supports the
extended ASCII character set of the IBM PC, and with tools that extend
the set of characters that can be displayed, its capabilities can be
extended to many other languages, such as Greek and Old English. (Hebrew, Arabic, and
languages such as Chinese are beyond its present design, however.) It
supports multilingual analysis as well by allowing for proper
alphabetization, convenient keyboard entry, and printing on devices that
require special "escape codes" to produce non-ASCII characters -- even if
these sequences are different from those that would be used to
enter the character from the keyboard, or display it on screen.

In addition to MAKBAS and TACT, the TACT system includes a program to
construct databases from very large texts (MERGEBAS) and another to
search a database and find all phrases that occur more than a specified
number of times (COLLGEN).
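The task COLLGEN performs, finding every phrase that occurs more than a given number of times, can be sketched as counting word n-grams. This is an illustration of the task only; the posting does not describe COLLGEN's algorithm, and the phrase-length limits and function name here are assumptions.

```python
from collections import Counter


def repeated_phrases(tokens, min_len=2, max_len=4, threshold=1):
    """All word sequences of min_len..max_len words that occur more than
    `threshold` times, most frequent first (ties in alphabetical order)."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return sorted(((" ".join(p), c) for p, c in counts.items() if c > threshold),
                  key=lambda pc: (-pc[1], pc[0]))


print(repeated_phrases("to be or not to be that is".split()))   # [('to be', 2)]
```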

Developed by: John Bradley and Lidio Presutti
University of Toronto Computing Services (UTCS),
Room 201, 4 Bancroft Avenue,
Toronto, Ontario, M5S 1A1
Canada; fax: (416) 978-7159;

John Bradley
voice: (416) 978-3995; e-mail: bradley@vm.utcs.utoronto.ca

Lidio Presutti
voice: (416) 978-5130; e-mail: lidio@vm.utcs.utoronto.ca

Distributed by:
TACT Distribution
Centre for Computing in the Humanities,
Robarts Library, Room 14297A,
University of Toronto,
Toronto, Ontario
Canada, M5S 1A5

The developers recognize the generous support of the Centre for
Computing in the Humanities, University of Toronto, and IBM
Canada through its former partnership with the university. The
developers are also indebted to John B. Smith's ARRAS program, by
which TACT has in part been inspired.

Hardware: requires a standard MS-DOS platform with 640K RAM, a fixed
disk, and DOS 2.1 or above.

Cost: The CCH charges a distribution fee of $30 CDN for a copy of
the program and a printed, bound copy of the documentation. (GST
should be added for Canadian sales.) TACT is shareware. You are
welcome to distribute copies of TACT, subject to its license,
which basically permits distribution as long as it is not
distributed for profit.

Documentation: online help and a preprinted tutorial; Support:
the developers are glad to answer questions about the usage or design
of TACT and to receive suggestions for its improvement. Queries can be
sent directly to the developers or to TACT-L@vm.utcs.utoronto.ca, the
discussion group for users of TACT.

25 April 1991

(2) --------------------------------------------------------------53----
Date: Fri, 26 Apr 1991 10:27:27 -0400
From: mccarty@epas.utoronto.ca (Willard McCarty)
Subject: TACT-L
Announcing TACT-L for users of TACT

The Centre for Computing in the Humanities, University of
Toronto, is pleased to announce the creation of TACT-L, an
electronic discussion group for users of the interactive text-
retrieval program TACT. (A brief description of TACT is being
published elsewhere on Humanist for those unfamiliar with the
software.) The purpose of TACT-L is to allow easier
communication amongst the users and with the development team at
Toronto, and to allow for convenient distribution of common materials.

Technically speaking, TACT-L is (like Humanist or NotaBene) a
ListServ list; its address is TACT-L@vm.utcs.utoronto.ca. It has
initially been set up to run in an `unmoderated' mode. This
means that any mail sent to TACT-L will be automatically
circulated to everyone on the list. Since it is exclusively for
users of the program, we anticipate that the list will circulate
only mail pertaining to the software and its applications.

Like other ListServ lists, TACT-L has a file server where information can
be stored. From time to time members of the development team will post
items on the server, but they also invite submissions, such as sample
rule-files, marked-up texts, and the like.

If you are interested in joining, please reply to the
undersigned. (Note that my address has changed.) Your comments
about the program and about your applications of it will be welcome.

Willard McCarty mccarty@epas.utoronto.ca