11.0062 European language resources

Humanist Discussion Group (humanist@kcl.ac.uk)
Fri, 23 May 1997 10:59:20 +0100 (BST)

Humanist Discussion Group, Vol. 11, No. 62.
Centre for Computing in the Humanities, King's College London

Date: Thu, 22 May 1997 17:28:36 -0400 (EDT)
From: Khalid Choukri <elra@calvanet.calvacom.fr>
Subject: ELRA New Language Resources

[ We apologise for the duplicate posting of this announcement ]



The new release of the ELRA catalogue (vol2N1) has grown and currently
consists of:

1) Spoken resources: 37 databases in several languages (recordings from
microphone, telephone, continuous speech, isolated words, phonetic
dictionaries, etc.).

2) Written resources:
* 14 monolingual and multilingual corpora
* 28 monolingual lexica
* Around 60 multilingual lexica
* A linguistic software platform and a grammar development platform

3) Terminological resources: over 360 databases covering a wide range of
domains and several languages (Catalan, Danish, English, French, German,
Italian, Latin, Polish, Portuguese, Spanish, Turkish).

Since our last announcement on this list, ELRA has negotiated new resources,
which are now available. These are:


ELRA-S0035 Phonolex (BAS/DFKI):

PHONOLEX consists of a simple list of word forms (666,237 inflected words)
with a set of features, e.g. orthography (German umlauts in LaTeX format,
capitalised nouns, old German spelling rules), linguistic information (nouns,
verbs, etc.), pronunciation and a list of empirical pronunciations.

Language: German
Format: ASCII
Mark-up: extended SAM-PA (PhonDat-Verbmobil)


ELRA-S0036 Speri-Data AG Basic dictionaries (colloquial=

These dictionaries contain a daily-life vocabulary. They include phonetic
transcriptions with related phoneme lists. The following languages are
available:

Language        Entries
Danish            8,000
Dutch            12,000
English (UK)      8,000
Finnish          10,000
French           19,000
German           13,000
Italian          23,000
Norwegian         8,000
Portuguese        9,000
Spanish          13,000
Swedish          10,000


ELRA-S0037 Speri-Data AG Technical dictionaries:

All dictionaries contain phonetic transcriptions, with related phoneme
lists. The following dictionaries are available (the label basic dictionary
refers to the above ELRA-S0036):

Domain          Language            Entries
Banking         French               10,200
Banking         German               10,200
Banking         Italian              10,200
Banking         Spanish              10,200
Radiology       German               42,000 (including basic dictionary)
Radiology       English              16,000
Medical         German              130,000 (including basic dictionary)
Jurisprudence   German               31,000
Jurisprudence   German               55,000 (including basic dictionary)
Insurance       German & English     37,000

A peculiarity of medical dictionaries in German-speaking countries has to be
taken into consideration: instead of the original Latin technical term,
doctors in Germany, Austria and Switzerland may use either the Latin word in
a German spelling or a German technical term (see examples below). Medical
dictionaries therefore have to contain three different terms.

Technical term    Technical term         Technical term
in Latin          in German spelling     in German

Appendicitis      Appendizitis           Blinddarmentzündung
Eccema            Eczema                 Ekzem
Diarrhoe          Diarrhö or Diarrhöe    Durchfall, Durchfluss
Carbunculus       Karbunkel              Geschwür
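
The three-term requirement can be pictured as one record per concept, indexed
under each of its spellings. The following is only an illustrative sketch in
Python and says nothing about the actual format of the Speri-Data
dictionaries: the MedicalTerm class, its field names and the build_index
helper are hypothetical, and the entries are the four examples from the table
above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MedicalTerm:
    """One concept with its three kinds of term (hypothetical layout)."""
    latin: Tuple[str, ...]            # original Latin technical term(s)
    german_spelling: Tuple[str, ...]  # Latin word in German spelling
    german: Tuple[str, ...]           # German technical term(s)

    def variants(self) -> Tuple[str, ...]:
        """All spellings under which this concept may be encountered."""
        return self.latin + self.german_spelling + self.german


# The four examples from the announcement, copied as given there.
TERMS: List[MedicalTerm] = [
    MedicalTerm(("Appendicitis",), ("Appendizitis",), ("Blinddarmentzündung",)),
    MedicalTerm(("Eccema",), ("Eczema",), ("Ekzem",)),
    MedicalTerm(("Diarrhoe",), ("Diarrhö", "Diarrhöe"), ("Durchfall", "Durchfluss")),
    MedicalTerm(("Carbunculus",), ("Karbunkel",), ("Geschwür",)),
]


def build_index(terms: List[MedicalTerm]) -> Dict[str, MedicalTerm]:
    """Map every variant, lower-cased, to the record it belongs to."""
    index: Dict[str, MedicalTerm] = {}
    for term in terms:
        for variant in term.variants():
            index[variant.lower()] = term
    return index


INDEX = build_index(TERMS)
print(INDEX["durchfall"].latin)
```

Looking up any of the three forms returns the same record, which is the
property the paragraph above asks of a medical dictionary.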


ELRA-S0038 Siemens VoiceMail (American English)

VoiceMail consists of 17.5 hours of read speech, divided into 9.5 hours of
transliterated speech and 8 hours of non-transliterated speech, recorded
over the digital telephone network (ISDN) from 921 speakers from the USA.
It contains orthographic transliterations for about 25,000 utterances (of
34,912 utterances in total).

Language: American English
Standard in use: headerless, one separate transliteration file comprising
all utterances of all speakers
Sampling rate: 8 kHz
Speakers: 377 males and 544 females
Size: 17.5 hours
Medium: 2 CD-ROMs


ELRA-L0021 Dictionary of French verbs - CORA:

This dictionary contains 25,610 verbs with usage domains, level of language
(familiar, popular, literary, Quebec and Swiss terms, etc.), conjugation,
auxiliary, verbal adjectives in -able, -ant or -, encoded syntactical
constructions (subject, direct & indirect object, adverb), sample phrases,
synonyms, operators enabling semantic-syntactic classification, encoding of
derived forms in -age, -ment, -tion, -oir, -ure, deverbal nouns, base words
from which verbs can be derived, a scale of usage ranging from 1 to 6, like
those used by commercial dictionaries (basic vocabulary, extended,
specialised, etc.).
Codes enable automatic production of conjugation forms, derived nouns and
adjectives and, if necessary, the production of potential forms.


ELRA-L0022 Dictionary of words - CORA:

This dictionary is composed of 126,844 words, with usage domains,
grammatical category, gender, number, uncountable, collective, adjectival,
nominal, verbal, adverbial derived forms according to the type of words.


ELRA-L0023 Dictionary of affixes - CORA:

4,286 suffixes and prefixes, plus information on their verbal, nominal or
adjectival bases or on the verbal base of Greco-Latin items. This
dictionary does not include the suffixes contained in the dictionary of
French verbs (ELRA-L0021) and words (ELRA-L0022) such as -age, -ment, -if,=


ELRA-L0024 Dictionary of verb phrases - CORA:

Dictionary of 3,480 entries based on the model of the dictionary of French
verbs (ELRA-L0021).


ELRA-L0025 Dictionary of invariable forms and phrases - CORA:

Dictionary of 4,783 entries based on the model of the dictionary of words
(ELRA-L0022).


ELRA-L0026 Dictionary of exclamatory stereotyped phrases - CORA:

Dictionary of 1,901 entries based on the model of the dictionary of
invariable forms and phrases (ELRA-L0025).


ELRA-L0027 Dictionary of French local authorities - CORA:

38,965 entries in lower case with accents, checked against the Michelin
guide, without localities. A link can be made to the dictionary of words
(ELRA-L0022), which contains inhabitants' names and the corresponding town
names.


ELRA-L0028 Dictionary of noun phrases and plural-only words -=

2,138 compound names and 1,397 entries of plural-only words.

For further information, please contact:

87, Avenue d'Italie
FR-75013 PARIS
Tel : +33 01 45 86 53 00
Fax : +33 01 45 86 44 88
E-mail : info-elra@calva.net
WWW: http://www.icp.grenet.fr/ELRA/home.html
