5.0084 PC-KIMMO (1/208)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Tue, 21 May 91 15:32:15 EDT

Humanist Discussion Group, Vol. 5, No. 0084. Tuesday, 21 May 1991.


Date: Mon, 20 May 91 8:28:56 CDT (204 lines)
From: txsil!evan@txsil@utafll.uta.edu (Evan Antworth)
Subject: PC-KIMMO News


PC-KIMMO News

May 20, 1991

This announcement describes recent developments related to PC-KIMMO (an
implementation for personal computers of Kimmo Koskenniemi's two-level model
of word production and recognition).

(1) PC-KIMMO version 1.0.5 update

(2) KGEN - a rule compiler (table generator) for PC-KIMMO

(3) KTEXT - a text-processing application using the PC-KIMMO parser

(4) recent articles related to PC-KIMMO

The software described below is made freely available to the academic
community for non-commercial use and redistribution. We invite your feedback
on these programs. Please note that the software is packaged in compressed
archives: Zip files for MS-DOS and Stuffit files for Macintosh. In addition,
if you obtain the files by e-mail, they will arrive in encoded form:
uu-encoding for MS-DOS and Binhex format for Macintosh. Utility programs for
handling archives and encoded files are available from computer bulletin
boards or from your university computing center. (Hint for MS-DOS users: when
you unzip a file, use the -d option to preserve the subdirectories.) Finally,
the files may not yet be available at some of the sites listed below. If so,
wait a few days and try again.
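If no uudecode utility is at hand, Python's standard `binascii` module (a modern convenience that the announcement does not mention) can decode uu-encoded mail line by line. This is a minimal sketch, assuming the message holds one well-formed file between a `begin <mode> <name>` line and an `end` line. It does not handle Binhex.

```python
import binascii
from typing import Tuple


def uudecode(text: str) -> Tuple[str, bytes]:
    """Return (filename, data) for the first uu-encoded file in `text`."""
    lines = text.splitlines()
    # Skip mail headers and anything else before the "begin <mode> <name>" line.
    start = next(i for i, ln in enumerate(lines) if ln.startswith("begin "))
    name = lines[start].split(None, 2)[2]
    data = bytearray()
    for ln in lines[start + 1:]:
        if ln.strip() == "end":
            break
        if ln:  # skip blank lines, which carry no data
            data += binascii.a2b_uu(ln)
    return name, bytes(data)
```

To use it, read the saved message (for example with `encoding="latin-1"` so stray header bytes do not cause errors), pass the text to `uudecode`, and write the returned bytes to the returned filename in binary mode.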


(1) PC-KIMMO 1.0.5 update

PC-KIMMO version 1.0.5 has been available since the end of February. It fixes
a problem with loading very large lexicons (more than 100 sublexicons). Thanks
to Elizabeth Hinkelman and her colleagues for finding this bug. This version
also fixes a couple of things that caused crashes on the Macintosh. There are no
functional changes in version 1.0.5. If you want to upgrade to version 1.0.5,
you can obtain it as follows:

1. Obtain it via anonymous FTP from the following sources. (I am advised
that it is best to use the symbolic names rather than the numeric addresses.
Also, the directory structure is subject to change.)

MS-DOS version:
msdos.archive.umich.edu [141.211.165.34]
msdos/linguistics/pckim105.zip

Macintosh version:
mac.archive.umich.edu [141.211.165.34]
mac/etc/linguistics/pckim105.sit

2. Request it from us via e-mail. Be *sure* to specify which version you want
(DOS, Mac, UNIX).

3. Send a diskette and a self-addressed, stamped diskette mailer to the
address below. Be *sure* to specify which version you want (DOS, Mac, UNIX)
and the disk format.


(2) KGEN

KGEN, a rule compiler for PC-KIMMO, is now available for beta testing. KGEN
was written by Nathan Miles of Ohio State University. All rights and
responsibilities pertaining to the program presently belong to Nathan Miles
(not to the Summer Institute of Linguistics). He can be reached by e-mail at
miles@cis.ohio-state.edu. Nathan has done a great job at developing this
program and he deserves our thanks.

KGEN takes a two-level rule like this:

y:i => @:C___+:0

and translates it into a finite state table like this:

     @  y  +  @
     C  i  0  @
  1: 2  0  1  1
  2: 2  3  2  1
  3. 0  0  1  0
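To show how such a table is used, here is a minimal Python sketch (not code from PC-KIMMO or KGEN) that reads a table in the layout above and runs it over aligned lexical and surface strings, with 0 standing for the null character. State 1 is the start state; a state number followed by a colon is final, one followed by a period (state 3 here) is nonfinal, and a transition to 0 means failure. The consonant set in `SUBSETS`, the helper names, and the policy of choosing the most specific matching column are assumptions made for illustration.

```python
TABLE = """
     @  y  +  @
     C  i  0  @
  1: 2  0  1  1
  2: 2  3  2  1
  3. 0  0  1  0
"""

# Hypothetical definition of subset C (consonants); y is left out for simplicity.
SUBSETS = {"C": set("bcdfghjklmnpqrstvwxz")}


def parse_table(text):
    """Return (columns, transitions, finals) for a rule table."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    columns = list(zip(rows[0], rows[1]))  # (lexical, surface) pair per column
    transitions, finals = {}, set()
    for row in rows[2:]:
        label = row[0]
        state = int(label[:-1])
        if label.endswith(":"):  # ':' marks a final state, '.' a nonfinal one
            finals.add(state)
        transitions[state] = [int(t) for t in row[1:]]
    return columns, transitions, finals


def symbol_score(pattern, char):
    """How specifically a table symbol matches a character (None = no match)."""
    if pattern == char:
        return 2
    if pattern in SUBSETS and char in SUBSETS[pattern]:
        return 1
    if pattern == "@":
        return 0
    return None


def pick_column(columns, lex, surf):
    """Index of the most specific column matching lex:surf, or None."""
    best, best_score = None, -1
    for i, (lp, sp) in enumerate(columns):
        a, b = symbol_score(lp, lex), symbol_score(sp, surf)
        if a is not None and b is not None and a + b > best_score:
            best, best_score = i, a + b
    return best


def accepts(text, lexical, surface):
    """True if the aligned lexical/surface strings satisfy the rule table."""
    if len(lexical) != len(surface):
        raise ValueError("lexical and surface strings must be aligned")
    columns, transitions, finals = parse_table(text)
    state = 1  # start state
    for lex, surf in zip(lexical, surface):
        col = pick_column(columns, lex, surf)
        if col is None:
            return False
        state = transitions[state][col]
        if state == 0:  # the pair is forbidden in this state
            return False
    return state in finals


print(accepts(TABLE, "happy+ly", "happi0ly"))  # True
print(accepts(TABLE, "happy", "happi"))        # False: ends in nonfinal state 3
print(accepts(TABLE, "ay+", "ai0"))            # False: y:i not after a consonant
```

Because a `=>` rule only restricts where y:i may occur, `happy` with an unchanged y also passes; the table rejects a string only when y:i appears outside the C___+:0 context.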

KGEN accepts as input a file of two-level rules and produces as output a file
of state tables that is identical in format to PC-KIMMO's rules file. Anything
that KGEN does not correctly handle can be easily fixed by hand in its output
file. Everyone who uses PC-KIMMO (or who doesn't use it because they don't
want to write tables by hand) is welcome to try out KGEN. But what we really
need are some beta testers who can compare KGEN's output to tables they have
written by hand. Let us know if you are willing to beta test KGEN for us.

Presently KGEN runs only under MS-DOS and UNIX, but we hope to get it compiled
for the Macintosh soon (any Think C experts out there?). You can obtain KGEN
as follows.

1. The MS-DOS version of KGEN is available via anonymous FTP from SIMTEL20:

wsmr-simtel20.army.mil [192.88.110.20]
pd1:<msdos.linguistics>kgen02.zip

SIMTEL20 can also be accessed using LISTSERV commands from BITNET via
LISTSERV@NDSUVM1 and LISTSERV@RPIECS, and in Europe via EARN TRICKLE servers
(for example, FRMOP11 in France). You can also obtain files from SIMTEL20 by
e-mail. Send the following line as the only message to listserv@vm1.nodak.edu
(the "1" is the digit one); this may not work outside the U.S.:

/PDGET MAIL PD1:<MSDOS.LINGUISTICS>KGEN02.ZIP UUENCODE

The MS-DOS version of KGEN is also available by anonymous FTP from:

msdos.archive.umich.edu [141.211.165.34] (symbolic name recommended)
msdos/linguistics/kgen02.zip

2. The UNIX version (consisting of the source files which you must compile
on your own machine) is available by anonymous FTP from the machine TUT:

cis.ohio-state.edu [128.146.8.60]
pub/kgen/kgen03.tar.Z

3. Request KGEN from us via e-mail. Be *sure* to specify which version you
want (DOS, UNIX).

4. If all else fails, send a diskette and a self-addressed, stamped diskette
mailer to the address below. Be *sure* to specify which version you want (DOS,
UNIX) and the disk format.


(3) KTEXT

KTEXT is a new text-processing application that uses the PC-KIMMO parser. It
accepts as input a text in orthographic form, tokenizes it into words, strips
off and saves punctuation, capitalization, white space, and formatting codes,
parses each word, and outputs the result to a quasi-database file with a
record for each word. Its output data structures are suitable for further
processing by other programs, such as a text interlinearizer, a syntactic
parser, or a machine translation system.
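The word-by-word pipeline described above can be sketched roughly as follows. This is not KTEXT's code or record format: `WordRecord` and its fields, the regular-expression tokenizer, and the stand-in `fake_parse` (which takes the place of a real call to the PC-KIMMO parser) are all hypothetical, and formatting codes are not handled.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class WordRecord:
    """One record per word (hypothetical fields, not KTEXT's format)."""
    before: str    # white space and punctuation preceding the word
    word: str      # lower-cased form sent to the parser
    caps: str      # "none", "initial", or "all"
    analysis: str  # whatever the parser returned


def capitalization(word: str) -> str:
    if len(word) > 1 and word.isupper():
        return "all"
    return "initial" if word[0].isupper() else "none"


def process(text: str, parse: Callable[[str], str]) -> Tuple[List[WordRecord], str]:
    """Tokenize, strip and save non-word material, and parse each word."""
    records: List[WordRecord] = []
    end = 0
    for m in re.finditer(r"(\W*)(\w+)", text):
        word = m.group(2)
        records.append(WordRecord(m.group(1), word.lower(),
                                  capitalization(word), parse(word.lower())))
        end = m.end()
    return records, text[end:]  # punctuation after the last word


def restore(records: List[WordRecord], trailing: str) -> str:
    """Rebuild the text from the saved pieces."""
    parts = []
    for r in records:
        w = {"all": r.word.upper(), "initial": r.word.capitalize()}.get(r.caps, r.word)
        parts.append(r.before + w)
    return "".join(parts) + trailing


def fake_parse(word: str) -> str:
    # Hypothetical stand-in for a call to the PC-KIMMO recognizer.
    return "<" + word + ">"


records, trailing = process("The cats sat. NASA flew!", fake_parse)
for r in records:
    print(r)
print(restore(records, trailing))  # The cats sat. NASA flew!
```

Saving the text between words plus a capitalization flag lets `restore` rebuild the input, though under this simple scheme a mixed-case word such as McDonald would come back as Mcdonald.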

KTEXT is a beta test release that is distributed and supported by the Summer
Institute of Linguistics. It is available for MS-DOS, Macintosh, and UNIX. You
can obtain it as follows.

1. The MS-DOS version of KTEXT is available from SIMTEL20 (see above for how
to access SIMTEL20 by FTP or e-mail):

pd1:<msdos.linguistics>ktext093.zip

It is also available via anonymous FTP from:

msdos.archive.umich.edu [141.211.165.34] (symbolic name recommended)
msdos/linguistics/ktext093.zip

2. The Macintosh version of KTEXT is available via anonymous FTP from:

mac.archive.umich.edu [141.211.165.34] (symbolic name recommended)
mac/etc/linguistics/ktext094.sit

It is also available via anonymous FTP from:

sumex-aim.stanford.edu [36.44.0.6]
/info-mac/app/ktext094.hqx

You can also obtain files from SUMEX-AIM by e-mail. Send the following line as
the only message to listserv@ricevm1.rice.edu (the "1" is the digit one); this
may not work outside the U.S.:

$MACARCH GET /info-mac/app/ktext094.hqx

3. Request KTEXT from us via e-mail. Be *sure* to specify which version you
want (DOS, Mac, UNIX).

4. If all else fails, send a diskette and a self-addressed, stamped diskette
mailer to the address below. Be *sure* to specify which version you want (DOS,
Mac, UNIX) and the disk format.

5. To obtain the UNIX sources, please contact us at the address below.


(4) Recent articles related to PC-KIMMO

Antworth, Evan L. 1991. Introduction to two-level phonology. Notes on
Linguistics, 53:4-18. Dallas, TX: Summer Institute of Linguistics.

Antworth, Evan L. 1991. Glossing text with the PC-KIMMO morphological parser.
(Manuscript submitted for publication.)

Simons, Gary F. 1991. A two-level processor for morphological analysis. Notes
on Linguistics, 53:19-27. Dallas, TX: Summer Institute of Linguistics.

Vanni, Michelle. 1990. Abstract of "PC-KIMMO: a two-level processor for
morphological analysis." Georgetown Journal of Languages & Linguistics
1.4:498-500.


Special requests for any of the software or articles described above and/or
requests for more information should be sent to:

Evan Antworth
Academic Computing Department
Summer Institute of Linguistics
7500 W. Camp Wisdom Road
Dallas, TX 75236
U.S.A.

Internet: evan@txsil.sil.org <-------- new address as of May 1991
UUCP: ...!uunet!convex!txsil!evan
phone: 214/709-2418
fax: 214/709-3387