3.37 concording, cont. (109)

Tue, 16 May 89 21:04:22 EDT

Humanist Discussion Group, Vol. 3, No. 37. Tuesday, 16 May 1989.

(1) Date: Monday, 15 May 1989 1821-EST (18 lines)
Subject: Concording...

(2) Date: Tue, 16 May 89 16:19:30 EDT (10 lines)
From: db <BOYARIN@TAUNIVM.bitnet>
Subject: Re: 3.29 concording, cont. (29)

(3) Date: Tue, 16 May 89 12:50 EDT (56 lines)
From: J. K. McDonald <MCDOJK@QUCDN>
Subject: Your HUMANIST item 3.29 concording

(1) --------------------------------------------------------------------
Date: Monday, 15 May 1989 1821-EST
Subject: Concording...

--Just to add a thought to W. McCarty's addition to J. Abercrombie's
message on new forms of the concording process in a computing

McCarty points to the need for a morphological component in the
concording function, but notes, "even then, such a tool is exceedingly
clumsy when you want to find ideas, themes and structures, but have only
got or can specify words." This made me think of how nice it would be
to have a process for the parsing of the semantic morphology of texts
(fuzzy memories of Hjelmslev's Glossematics)--or (if we're now making
wish-lists), a process for registering non-linear signifying aspects of
text-material, à la Saussure's anagrammes.

T. Harpold
(2) --------------------------------------------------------------18----
Date: Tue, 16 May 89 16:19:30 EDT
From: db <BOYARIN@TAUNIVM.bitnet>
Subject: Re: 3.29 concording, cont. (29)

I think that the model of the text retrieval system at Bar-Ilan is
very relevant. It has both the kind of morphological analysis that
you speak of and Boolean searches plus within searches etc. The
result is a key-word-in-context search. Everything else sounds like
hypertext to me, i.e., one person creates the links and then others can
follow them.
Boyarin.
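For readers who have not used such a system, a rough modern sketch of the combination Boyarin describes (morphological matching, a Boolean "within" search, and key-word-in-context output) may help. This is not the Bar-Ilan system. The naive tokenizer, the function names `kwic` and `within`, and the small `lemma_of` form-to-lemma table (standing in for a real morphological analyzer) are all hypothetical illustrations.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer (assumption)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, lemma, lemma_of, width=3):
    """Key-word-in-context lines for every token whose lemma matches.

    `lemma_of` is a hypothetical form -> lemma table standing in for a real
    morphological analyzer; forms not in the table count as their own lemma.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if lemma_of.get(tok, tok) == lemma:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


def within(tokens, lemma_a, lemma_b, distance, lemma_of):
    """Boolean 'within' search: token positions of lemma_a that have an
    occurrence of lemma_b no more than `distance` tokens away."""
    lemmas = [lemma_of.get(t, t) for t in tokens]
    pos_b = [j for j, lem in enumerate(lemmas) if lem == lemma_b]
    return [i for i, lem in enumerate(lemmas)
            if lem == lemma_a and any(abs(i - j) <= distance for j in pos_b)]


if __name__ == "__main__":
    lemma_of = {"gave": "give", "gives": "give", "given": "give"}
    text = "The boy gave her the ticket. She gives it back; it was given freely."
    toks = tokenize(text)
    for line in kwic(toks, "give", lemma_of):
        print(line)
    print(within(toks, "give", "ticket", 3, lemma_of))
```

Searching on the lemma rather than the surface form is what lets "gave", "gives" and "given" all appear in one concordance listing. The `within` filter then narrows that listing to hits that occur near a second word.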
(3) --------------------------------------------------------------60----
Date: Tue, 16 May 89 12:50 EDT
From: J. K. McDonald <MCDOJK@QUCDN>
Subject: Your HUMANIST item 3.29 concording

Since 1980 at Queen's University we have been developing VINCI, a
CALL system for Italian which gets HAL to generate certain syntactic
structures at random, pre- and post-edited, but quick, flexible, etc.
The machine finds appropriate morphological forms from the lemma
provided (dictionary form) and avoids semantic bloopers because of
the various metonymic fields we have built into the datafiles. Most
recently I have sketched out a system of 'trailing filters' to capture
about 70 semantic relationships (including your antonyms, hyperonyms,
diminutives, register variants, archaisms, etc.) and gave a paper on
the scheme last month to the American Association for Italian Studies [...]
older man; literary scholarship has been expected of me, not this kind
of thing, whatever it is. So I have retired three years early, to get
on with it (Queen's has good pensions). I deplore the litnik-langnik
dichotomy (at Berkeley we had philology--Yakov Malkiel--along with our
Romance literatures for the Ph.D.). We are porting VINCI from the old
mainframe Italian version (in APL) to the PC (in C), and my young
colleague Greg Lessard of the French Department at Queen's will be giving
a paper on our system in French at your DYNAMIC TEXT. (I'm working on
Spanish and Italian.)

I have always been confounded by talk of cleverly marking IN ADVANCE
the lemmata of certain themes in concordance systems: how does one ever
know in advance what the material will encourage one to winkle out?
Surely we want a system that can take an OCR text and check everything
in it for everything, as often as we ask it to. (E.g., the use
in English of the conditional tense instead of the past subjunctive in
contrary-to-fact conditions might emerge during our readings as a
significant stylistic feature of a given author's 'rifacimento' of an
earlier work.) Is it because English is so poor in morphological
tags that we haven't made the effort to get off our lemmata? Is it
because we respect absolute logic that we overlook language-specific
semantic relationships? Isn't the author's style an idiolect of a
superior sort that we want to capture?

If the VINCI code (or any other CALL system) can randomly fetch out
"Il ragazzo ti ha dato il biglietto?/(Sì, me lo ha dato.)" or any
extension of plausible subjects, verbs, direct objects, etc., and check
students' answers one-on-one, why cannot the code be made to recognize
'ragazzaccio', 'glieli', 'daranno', 'le mele' for what they are and
report back what it sees in an OCR text? Am I asking the computer
to do something it cannot do, or something the computational linguists
have not heretofore been asked to make it do, and therefore say it
cannot do? (My perennial suspicion is that people like to be asked to
do what they know they can do, and tend to go glassy-eyed or even
hostile when asked to do something for which they do not see the
purpose.) Can you cast around the much larger Toronto pond and find
out whether I am asking the impossible? Can VINCI not be of service
in the field of stylistic analysis?

Jim McDonald MCDOJK@QUCDN.BITNET (613) 372-2071
"An Darach" RR1 Hartington, Ontario K0H 1W0