18.137 review on phonostatistics and language typology

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Mon, 16 Aug 2004 07:54:40 +0100

               Humanist Discussion Group, Vol. 18, No. 137.
       Centre for Computing in the Humanities, King's College London
                     Submit to: humanist_at_princeton.edu

         Date: Mon, 16 Aug 2004 07:42:33 +0100
         From: "Yuri Tambovtsev" <yutamb_at_mail.cis.ru>
         Subject: a book on phonostatistics and language typology

Dear Humanist List colleagues, may I ask you to be so kind as to send this
review to your university colleagues, your library, or friends who
might be interested in this book? Or to some journal for publication, if you
know a suitable one? I look forward to hearing from you soon at
yutamb_at_hotmail.com. I remain yours most hopefully,
Yuri Tambovtsev, Novosibirsk, Russia

["Typology of functioning of phonemes in a sound chain of Indo-
European, Palaeo-Asiatic, Ural-Altaic and other world languages:
compactness of subgroups, groups, families and other language taxons" -
Novosibirsk: Sibirskij Nezavisimyj Institut, 2003 - 143 pages.]
   [Novosibirsk, 630123, Ul. Severnaya 23/1. Sibirskij Nezavisimyj Institut]

Reviewed by Senior Teacher of Novosibirsk School #
Ludmila Alekseevna SHIPULINA

     The book under review is an addition to Tambovtsev's theories, methods
and data published earlier (Tambovtsev, 1994-a; 1994-b; 2001-a; 2001-b;
2001-c). I think that linguistics needs new data to support or to reject the
classical theories. More often than not, linguists argue about this or that
linguistic theory (e.g. Uralic or Altaic language unities) without any new
data at hand. This new book by Yuri Tambovtsev provides such new data.
Speaking about applications of statistical methods in linguistics, one must
agree with Chris Butler that very often only statistical techniques are
relevant for some linguistic research because it is difficult otherwise to
understand the language phenomenon. It is especially important in any
type of linguistic study involving differences in people's linguistic
behaviour or in the patterns of language itself (Wray et al., 1998: 255).
Tambovtsev adds much data on phonological statistics of world languages.
He is one of the very few linguists who have applied phonology to stylistics
and typology (Teshitelova, 1992: 157 - 181). In this book, as in his previous
books, Yuri Tambovtsev considers the typology of order and chaos in the
distribution of consonant phonemes in the sound chains of world languages.
In fact, Tambovtsev concentrates on variability in the sound chains of world
languages. In the monograph under review he adds much to the essential parts
of his theories and methods, especially on the phonostatistical universals
of Finno-Ugric, Turkic, Indo-European and other world languages. The author
examines the homogeneity of texts in
various languages from the point of view of the occurrence of phonemic
groups in their sound speech chains with the help of phonological
statistics. Tambovtsev also investigates the rules of a sound chain division,
as well as frequency of occurrence of certain phonemic groups of
consonants in the phonetic systems of various world languages. Many new
languages are investigated by his method, in comparison to his previous
books (Tambovtsev, 1994-a; 1994-b; 2001-a; 2001-b; 2001-c).
     In fact, Yuri Tambovtsev has computed phonostatistical data on the
occurrence of labial, front (i.e. forelingual), palatal (mediolingual), back
(velar, pharyngeal and glottal), sonorant, occlusive, fricative
and voiced consonants in speech in a great number of languages. These
make up 8 phonological features. The articulation system of these
languages is also discussed in brief. There is also a short review of the
ethnic history (ethnogenesis) of the nations speaking these languages. The
author thinks it of great importance to analyse the contacts between these
languages during the history of their ethnic development.
   As far as I can judge, Tambovtsev's first article in the field of
phonological statistics was published in 1976. So he has been working on the
problems mentioned above for a long time, i.e. for some 30 years.
Unfortunately, I cannot mention all of Tambovtsev's publications, since he is
the author of 8 monographs and about 250 articles on language typology,
phonostatistics and phonetics. His study involves the sound pictures of 156
world languages. In the book under review, Tambovtsev's conclusions are based
on data on the frequency of occurrence of phonemes in the
languages of the following families and groups:
1. Indo-European language family (the language groups: Indo-Aryan (8
languages), Iranian (4 languages), Celtic (1 language), Italic (1 language),
Romanic (5 languages), Germanic (7 languages), Baltic (2 languages),
Slavonic (8 languages), genetically isolated Indo-European languages (5
languages), artificial languages (1 language)).
2. Ural-Altaic language community, which includes the Uralic and Altaic
language communities:
A. Uralic language community: Finno-Ugric language family, Ugric
subgroup of Finno-Ugric language family (5 languages), Permic
subgroup of Finno-Ugric language family (2 languages), Volgaic
subgroup of Finno-Ugric language family (5 languages), Balto-Finnic
subgroup of Finno-Ugric language family (9 languages), Samoyedic
language family (3 languages).
B. Altaic language community: Turkic language family (22 languages),
Mongol language family (3 languages).
3. Tungus-Manchurian language family (6 languages).
4. Yenisseyic language family (1 language).
5. Caucasian language family (2 languages).
6. Palaeo-Asiatic language family (8 languages).
7. Sino-Tibetan language family (2 languages).
8. Afro-Asiatic language family (3 languages).
9. Bantu language family (2 languages).
10. Austro-Asiatic language family (2 languages).
11. Austronesian language family (5 languages).
12. Australian language family (6 languages).
13. The language community of American Indians (20 languages).
     As a linguist I often feel I must use statistical methods in my studies of
English, German and other languages. However, it is hard for a linguist
to understand how to use them correctly and, at the same time, in the
simplest way. The author of the book teaches us how to do it, using
the following methods of statistical calculation as examples: standard
quadratic deviation, variation coefficient, level of significance, confidence
interval, Student's t-criterion, the Kolmogorov-Smirnov criterion, the
chi-square criterion, and Euclidean distance. He also shows how to measure
the statistical reliability of linguistic results. Very often a linguist who
is a layman in linguistic statistics may draw wrong linguistic conclusions
because his results are not statistically reliable.
The book by Yuri Tambovtsev not only focuses on the mathematical
statistical methods which he has employed in his linguistic
research, but also discusses the important problems of the classification of
world languages. The author touches on the reliability of
mathematical statistical methods in linguistics. The aim of his research is
to compare various languages within a single family as well as languages
belonging to different families and groups. To this end, Tambovtsev has
computed mean values of the frequency rates of various phonemes and
phonemic groups in speech. These mean values provide a reliable basis for
comparing different languages. There are several mathematical
methods for estimating the variation of the major statistical values.
Tambovtsev aims to estimate regularities in the usage of particular phonemes
or phonemic groups in particular languages. He has chosen several
methods of variability estimation and described techniques for their
application to phonetic studies.
In this respect, the issue of sample size is important. In fact, the
greater the sample, the more reliable the results. One of the most important
problems is the size of the portions (units) into which the
text is divided. The portion should be neither too small nor too big.
Tambovtsev correctly takes the generally accepted sample portion in
phonological research, which is 1000 phonemes, and separates all his texts in
the languages under discussion into units comprising 1000 phonemes. In
statistics, the most reliable results are obtained on large samples. Thus,
Tambovtsev argues that the minimum necessary sample should include not
less than 30 thousand phonemes.
The author has applied the mean quadratic deviation in his research, among
other methods of estimating statistical variation. The mean quadratic
deviation is also used in computing other evaluating indices. Quadratic
deviation indices computed for two different texts can be compared only if
the sample sizes of the texts are equal. In cases when the sample sizes are
different, other mathematical functions should be used. Tambovtsev correctly
chooses the estimation of the confidence interval, the "chi-square"
criterion, the coefficient of variation, etc.
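As a rough illustration of the mean quadratic (standard) deviation discussed
above, the following Python sketch computes it for counts of labial
consonants in successive 1000-phoneme units. The counts are invented for
illustration and are not taken from the book; the function name is likewise
hypothetical.

```python
import math

def mean_and_std(counts):
    """Return the mean and the sample standard deviation of per-unit counts."""
    n = len(counts)
    mean = sum(counts) / n
    # The sample variance divides by n - 1 rather than n.
    variance = sum((x - mean) ** 2 for x in counts) / (n - 1)
    return mean, math.sqrt(variance)

# Hypothetical numbers of labial consonants in six units of 1000 phonemes each.
labials_per_unit = [98, 105, 101, 94, 110, 92]
mean, std = mean_and_std(labials_per_unit)
print(f"mean = {mean:.2f}, standard deviation = {std:.2f}")
```

Because every unit has the same length (1000 phonemes), deviations computed in
this way for two texts stay comparable, which is the condition stated above.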
In my opinion, it is important to provide the reader with exact
examples of how to calculate the mean quadratic deviation or standard
deviation, because a layman in phonostatistics, like myself, may do it in the
wrong way. Yuri Tambovtsev provides us with data on the occurrence
of the labial consonants in Old English texts: "Beowulf, Ohthere's and
Wulfstan's Story, the Description of Britain, Julius Caesar", etc. He
compares the use of labials in Old English to the analogous use in modern
The variation coefficient represents another important tool in comparative
linguistic research. It helps to compare incommensurable values. As
stated above, the mean quadratic deviation characterises the degree of
deviation of the frequency rate of a particular phoneme from the mean
value. However, the mean quadratic deviation values do not take into
account the fact that the number of labial phonemes is greater than that of
the mid-lingual (palatal) phonemes. Consequently, the absolute mean
index of labial sounds is considerably greater than that of the palatal ones.
On the other hand, front-lingual phonemes are usually more frequent than
labial ones. This heterogeneity of features calls for an additional method of
comparison, i.e. the variation index called the "coefficient of variation".
     Unlike the mean quadratic deviation, the coefficient of variation allows
comparison of the frequency rates of phonemes and phonemic groups
which have produced different mean values. Using the coefficient of
variation makes the measure of variability comparable. It can
be used in linguistics in the way recommended by Fred Fallik and
Bruce Brown for the behavioural sciences (Fallik et al., 1983: 111 - 112). The
coefficient of variation is used as an indicator of the variation/stability of
particular linguistic elements in a sample. The minimum necessary size of
such samples should be not less than 30 units. The larger the value of the
variation coefficient, the higher the variability of a particular
phonological feature (phonemic frequency in this case).
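A minimal sketch of the coefficient of variation, assuming the usual
definition V = standard deviation / mean x 100%. The two series below are
hypothetical counts per 1000-phoneme unit, chosen so that they share the same
absolute spread but have different means; a real study would use at least 30
units, as noted above.

```python
import statistics

def coefficient_of_variation(values):
    """V = sample standard deviation / mean, expressed in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical counts; both series have the same standard deviation.
labial = [98, 105, 101, 94, 110, 92]   # mean 100
palatal = [18, 25, 21, 14, 30, 12]     # mean 20

v_labial = coefficient_of_variation(labial)
v_palatal = coefficient_of_variation(palatal)
print(f"V(labial) = {v_labial:.2f}%, V(palatal) = {v_palatal:.2f}%")
```

The standard deviations are identical, yet the rarer group comes out as far
more variable once the spread is related to its mean, which is why the
coefficient is preferred for comparing groups of unequal frequency.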
     Another important statistical notion is the significance level. In his
research Yuri Tambovtsev has chosen a significance level of 0.05,
or 5%. To my mind, Tambovtsev chose it correctly, since this level of
significance is used by the majority of researchers in linguistics
and phonology. This significance level (i.e. 5%) tells us that we
have 95% confidence in our linguistic results. This significance level, I
believe, is important in any linguistic research, but especially important for
comparisons carried out on small samples, i.e. on samples of less than 30
thousand phonemes.
     Confidence interval evaluation is closely related to other statistical
procedures, like the estimation of the minimum necessary sample at a fixed
significance level. Tambovtsev proposes always fixing it at 5%, so that a
layman in statistics need not rack his brains over the other possible levels.
Actually, the matter is so specifically mathematical that a linguist need not
try to understand its mathematical foundation. I'm sure that if a linguist
learns how to operate with all the necessary statistical criteria correctly,
then using only one level of significance (e.g. 5%) is quite all right. A
higher level of significance usually requires larger samples, and thus much
more labour than necessary.
   In certain cases, I guess, one is advised to use the values of the
confidence interval. Confidence interval evaluation is more reliable for
phonological research since it provides us with greater precision. The
general rule is: the narrower the confidence interval, the higher the
homogeneity of the parameter under discussion, i.e. the frequency
of a particular phonemic class or phoneme in speech. Usually, a text
allows us to obtain narrower confidence intervals than a collection of
phrases and words.
     In his book, the author correctly relates three important parameters:
the sample size, the confidence interval and the fixed significance level.
Available data have shown that the greater the sample size, the narrower the
confidence interval at the fixed significance level in all languages of the
world, irrespective of their genetic affiliation
or grammatical type.
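The relation between sample size and confidence interval can be sketched as
follows, assuming the common normal approximation for the 95% interval of a
mean (half-width = 1.96 x s / sqrt(n)). Whether the book uses exactly this
formula is not stated in the review; the standard deviation below is a
hypothetical value.

```python
import math

def ci_half_width(std, n_units, z=1.96):
    """Half-width of an approximate 95% confidence interval for a mean."""
    return z * std / math.sqrt(n_units)

std = 6.8  # hypothetical standard deviation per 1000-phoneme unit
for n_units in (30, 60, 120):
    print(n_units, "units:", round(ci_half_width(std, n_units), 2))
```

Under this approximation, quadrupling the number of units halves the width of
the interval at the same 5% significance level.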
Tambovtsev has also paid attention to the reliability of the statistical
results obtained in the course of his phonological research. He has obtained
indices representing the statistical error resulting from the fact that each
sample represents only some portion of the general language aggregate.
Such indices are called representation errors. The value of the
representation error depends mostly on the sample size and on the variation
rate of a particular parameter. It is noteworthy that texts in different
languages produce similar representation errors, which do not depend on
their morphological structures. This fact suggests a certain universal in
the functioning of consonant phonemic groups in genetically different
languages. However, I think that Tambovtsev has applied the strictest way of
estimating the representation error. On the one hand this is bad, since it
requires larger samples for a fixed error (e.g. an error of 5% or less), but,
on the other hand, it means that one can be surer of one's linguistic results.
Yuri Tambovtsev rightly mentions that many linguists who use statistics
do not know that the t-test or "Student's" criterion was proposed by
William Gosset, and not by some scholar called Student. "Student" was the
pseudonym that William Gosset assumed. Student's
criterion is employed when it is necessary to compare two mean
values found for two different texts. The reliability of the difference
between two mean values depends on the variability of the parameters involved
and on the sizes of the samples for which these values have been computed.
Student's criterion can be applied to variables that follow a normal
distribution. Within a sample of not less than 30 units, the distribution is
considered normal. In the course of the research, Student's criterion has
been calculated for two samples of equal size of 31 thousand phonemes.
On the one hand, a scientific text was compared with fiction, and on the
other hand, two scientific texts were compared. The value for the former is
nearly four times greater than for the latter. This convinces us that the
criterion can indeed be applied to the stylistic analysis of texts.
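A sketch of Student's criterion for two samples of equal size, assuming the
textbook formula t = (m1 - m2) / sqrt((s1^2 + s2^2) / n). The counts are
hypothetical and far fewer than the 31 units per text mentioned above; the
critical value 2.228 (two-sided, 5% level, 10 degrees of freedom) comes from
standard t tables, not from the book.

```python
import math
import statistics

def t_statistic(sample_a, sample_b):
    """Student's t for two samples of equal size (textbook formula)."""
    n = len(sample_a)
    if n != len(sample_b):
        raise ValueError("this sketch assumes samples of equal size")
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    pooled = (statistics.variance(sample_a) + statistics.variance(sample_b)) / n
    return mean_diff / math.sqrt(pooled)

# Hypothetical labial counts per 1000-phoneme unit in two texts.
scientific = [98, 105, 101, 94, 110, 92]
fiction = [108, 115, 111, 104, 120, 102]

t = t_statistic(scientific, fiction)
T_CRITICAL = 2.228  # two-sided, 5% level, df = 2 * 6 - 2 = 10
print(f"t = {t:.2f}, significant: {abs(t) > T_CRITICAL}")
```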
The statistical criterion called the Kolmogorov-Smirnov test provides
researchers with a mathematical method of analysis which does not depend
on the restrictions usually applied to statistical analyses. These are the
following restrictions:
1) Statistical analyses are carried out with independent random variables;
2) Aggregates of random variables should demonstrate close mean
and dispersion values;
3) Aggregates should follow the law of normal distribution.
The Kolmogorov-Smirnov criterion belongs to the so-called "robust"
non-parametric methods, which are not sensitive to deviations from the
standard conditions. Low values of the Kolmogorov-Smirnov (K-S) criterion mean
that the fluctuation of the analysed linguistic parameters is minor, that is,
not linguistically significant. Tambovtsev argues that the low value of the
K-S criterion in his research supports his hypothesis of a normal distribution
of the established eight groups of consonants within the speech sound chains.
Representation of any language with the help of eight groups of
consonants has served as a basis for his phono-statistical research.
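For readers who want to see the K-S idea in concrete form, here is a minimal
sketch of the one-sample Kolmogorov-Smirnov statistic against a normal
distribution with given parameters. It is a generic textbook version, not
necessarily the variant used in the book; the mean of 100 and standard
deviation of 7 are hypothetical, and the threshold 1.36 / sqrt(n) is only the
usual large-sample approximation for the 5% level.

```python
import math
from statistics import NormalDist

def ks_statistic(sample, dist):
    """Largest gap between the empirical CDF of the sample and dist.cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # Check the gap just after and just before each step of the ECDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical reference distribution for counts per 1000-phoneme unit.
reference = NormalDist(mu=100, sigma=7)
# An idealised sample of 30 units placed at evenly spaced quantiles.
sample = [reference.inv_cdf((i + 0.5) / 30) for i in range(30)]

d = ks_statistic(sample, reference)
threshold = 1.36 / math.sqrt(len(sample))  # approximate 5% critical value
print(f"D = {d:.4f}, below threshold: {d < threshold}")
```

A low D, as here, means the sample gives no reason to reject the normal
distribution. If the mean and deviation are estimated from the same sample,
a corrected table (the Lilliefors variant) would be needed instead.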
Tambovtsev has also employed the "chi-square" criterion in his
investigations. With the aid of this criterion, he estimates differences
between the empirical and expected values. If the difference is
insignificant, it can be the result of random deviation. Otherwise, it
reflects significant differences between the actual (empirical) and expected
(theoretical) values of the frequencies of occurrence of phonemic groups in
speech. L. Bolshev and N. Smirnov (Bolshev et al., 1983: 166 - 171) have
compiled the list of maximum values that still reflect insignificant
fluctuations of variables under the "chi-square" technique, which
Tambovtsev provides on page 33. It is quite handy because linguists
usually do not have books on statistics at hand. Christopher Butler
recommends the chi-square test to measure the independence and
association of linguistic units in various sorts of linguistic material
(Butler, 1985: 118 - 126). Tambovtsev shows how to use it on the material of
the occurrence of labial consonants in British and American prose (Agatha
Christie, John Braine, W. S. Maugham, Jack London, F. Scott Fitzgerald,
Ernest Hemingway, etc.). The chi-square values show that labials are
distributed rather homogeneously. Tambovtsev draws the reader's attention
to the need to calculate the degrees of freedom correctly (p.30). He also
examines how similar the distribution of labial, front, palatal, and velar
consonants is in Kalmyk (a Mongolian language) and Japanese (a genetically
isolated language). It is not similar by this statistical criterion (p.31).
However, the same criterion shows close similarity between the distribution
of the 5 consonantal groups in Turkish and Uzbek (p.32). The T coefficient is
less than 1 in 5 parameters, i.e. front, palatal, velar, sonorant and occlusive.
Tambovtsev explains the T coefficient as the ratio of the obtained value of
chi-square to the theoretical value which can be found in the chi-square
tables. If the T coefficient is less than 1, the statistical results are
similar (p.31 - 33). It also shows great similarity between some other Turkic,
Finno-Ugric, Samoyedic, Tungus-Manchurian, Slavonic, Germanic, Iranian and
other Indo-European languages inside their taxons.
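The chi-square criterion and the T coefficient described above can be
sketched as follows. The observed counts are hypothetical, the expected value
is taken to be the mean count (equal-sized samples), and the degrees of
freedom are assumed to be the number of samples minus one; the book may count
them differently. The critical values are the standard 5% values from
chi-square tables.

```python
# Standard chi-square critical values at the 5% level, keyed by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all samples."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def t_coefficient(observed, expected):
    """Ratio of the obtained chi-square to the tabulated critical value."""
    df = len(observed) - 1  # assumption: number of samples minus one
    return chi_square(observed, expected) / CHI2_CRITICAL_05[df]

# Hypothetical labial counts in five text samples of 1000 phonemes each.
observed = [98, 105, 101, 94, 102]
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square(observed, expected)
T = t_coefficient(observed, expected)
print(f"chi-square = {chi2:.2f}, T = {T:.3f}, homogeneous: {T < 1}")
```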
Chapter 2 is dedicated to the issues of genetic and typological
classifications of the languages of the world. The author does not go into
details and debates concerning the inclusion of certain languages in
particular genetic groups and families, or the identification of a particular
language as a separate language or a dialect. The major aim of the author
is to provide a technique which would allow linguists to check the
validity of including a particular language in a certain language
group or family. Before analysing the compactness of subgroups, groups,
families and other language taxons, Tambovtsev warns the reader that the
problem of the division of world languages into families has not been
completely solved. For instance, it is quite necessary to discuss
whether the Turkic languages constitute a family themselves or a branch of
some other family, called the Altaic family. Actually, the Turkic languages
are considered to form a family by some linguists (e.g. Baskakov, 1966 and
other Russian linguists). However, some other linguists, especially those in
the West, consider the Turkic languages to be a group within the Altaic
family, spoken in Asia Minor, Middle Asia and southern Asia (Crystal, 1992:
397; Katzner, 1986: 3). The other two branches of the Altaic family are
Tungus-Manchurian and Mongolian. To my mind, it is more logical to consider
the Turkic languages a family, rather than a subgroup within the Altaic family.
The Altaic languages should be called a super family, Sprachbund, language
community or unity, since the true genetic relationship of the Turkic,
Tungus-Manchurian and Mongolian languages has not been proved. Otherwise, if
one treats such unproven unities as families, all languages on the Earth may
be called one family with lots of groups and branches. On the other hand, it
is not productive to form a separate language family consisting of one
language. For instance, in the 1960s Ket was considered an isolated language
of the Paleo-Asiatic family (Krejnovich, 1968: 453). However, now it is
considered to form the so-called Yeniseyan family, though this consists of
only one language with its dialects and subdialects. Summing up the modern
point of view, David Crystal remarks that Yeniseyan is a family of languages
generally placed within the Paleosiberian grouping, now represented by only
one language, Ket, or Yenisey-Ostyak (Crystal, 1992: 424). I don't think it is
wise to multiply language families like that. Other linguists (e.g. Ago Kunnap,
Angela Marcantonio, etc.) question the very existence of the Uralic
language family (Marcantonio, 2002).
Among other language families, Tambovtsev describes the Finno-Ugric
family. He argues that this language family includes two major groups:
the Baltic-Finnic and Ugric groups.
The author considers the theories of those linguists who identify the
following four groups in the Finno-Ugric family:
1) The Baltic-Finnic group including Estonian, Finnish, Karelian,
Vepsian, Izhorian, Vodian, Livonian, and Saami possessing some specific
2) The Volga group including Erzia-Mordovian, Moksha-Mordovian,
Mountain Mari, and Lawn or Meadow East Mari;
3) The Permic group comprising Udmurtian, Komi-Zyrian, and Komi-
4) The Ugric group comprising Hungarian, Mansi, and Khanty.
Together with the Samoyedic language family, comprising the Nenets,
Selkup, Nganasan, and Enets languages, the Finno-Ugric family is said to
form the Uralic language unit.
Tambovtsev argues that up to the present no fore-language of this unit has
been established. The languages of the Uralic unit do not form a compact
unity from the point of view of the dispersion and frequency of phonemic
groups. Using the coefficients he has obtained in his studies, the author has
shown that the consonant indices
and the compactness (dispersion) coefficients suggest a more compact
unity for the Samoyedic language family (the mean V = 18.29%; T = 0.16)
than for the Finno-Ugric one (the mean V = 24.14%; T = 0.47). The Uralic
language unity has a greater dispersion (the mean V = 28.31%; T = 0.57).
This fact has been interpreted as supporting the idea that the languages of
the Samoyedic and Finno-Ugric families are more closely related to one another
within each family than between the families. Thus, the idea of the Uralic
taxon as a language family should be either rejected or considered with
caution (p.125).
The Turkic language group includes Azeri, Baraba-Tatar, Bashkir,
Gagauz, Karaim, Dolgan, Kazakh, Kamasin, Karakalpak, Karachai-
Balkarian, Kyrgyz, Crimea-Tatar, Kumyk, Nogai, Tatar, Tofalar, Tuvin,
Turkish, Turkmenian, Uzbek, Shor, and Yakut. The author argues that a
Turkic fore-language can be regarded as a real common ancestor of all the
Turkic languages. He points out that the Turkic fore-language (Ursprache)
is closer to any of the present Turkic languages than
these languages are to one another now. However, he did not
include Ancient Turkic in his studies because of the uncertainty about
its pronunciation.
The Mongolian language family includes only three languages: Buriat,
Kalmyk, and Mongolian. It is the minimum possible group for statistical
The Tungus-Manchurian language group includes 10 languages:
Manchurian, Nanai, Negidal, Oroch, Orok, Solon, Udege, Ulchi, Evenk
(Tungus), and Even.
The inclusion of the Turkic, Mongolian and Tungus-Manchurian language
families in one language unity is a debatable topic in linguistics
today.
The Indo-European language family seems to be the most thoroughly
investigated. Major linguistic methods of investigations and comparative
linguistic analysis were elaborated during the long history of studies of
European languages. However, currently the major question concerning
the existence of a single Indo-European fore-language has not been
It is noteworthy that many linguistic debates have been carried out
in terms of "similarity" and "linguistic distance". Yet the terms themselves
have not been clearly defined.
Tambovtsev thinks that, at the present state of understanding, modern
languages represent products either of divergence or of the reverse process,
i.e. convergence. In historical perspective, both processes have had their
impact on the development of languages. Tambovtsev agrees with those
researchers who think that the origin of all Indo-European languages from a
single fore-language is a fiction, while their co-existence and convergence
in their development, resulting in the appearance of certain common features,
is a scientific fact. The noted uniformity of the Indo-European languages can
be explained as a secondary, later phenomenon, while the differentiating
features represent the original and early characteristics of each language of
this family.
However, since no classifications other than the genealogical one have been
elaborated, Tambovtsev accepts the following classification of the
Indo-European family: the Indian, the Iranian, the Baltic, the Slavonic
(including Eastern, Western, and Southern Slavonic sub-groups),
Germanic, Romanic, and Celtic language groups.
Following Illich-Svitych, Tambovtsev believes that the Nostratic language
unity can serve as a good model for linguistic investigations of various
sorts, but he does not think these languages should be considered a
language unity; moreover, this rather arbitrary construct is not recognised
by all linguists. The Nostratic language unity includes the following
language families: Indo-European, Finno-Ugrian, Samoyedic, Turkic,
Mongolian, Tungus-Manchurian, Kartvelian, and Semito-Hamitic.
Tambovtsev proposes a concept of compactness for linguistic studies. He
defines compactness as the degree to which the languages within
language sub-groups, groups, families, etc. are closely related. In other
words, he attempts to measure the distance between languages within the
analysed taxons or clusters. The distances are measured on the basis of the
frequency rates of particular linguistic (phonological) characteristics.
The author uses the concepts of image recognition and regards a language
family as a unit with a more or less compact structure. In the branch of
applied mathematics called pattern recognition, images of various
sorts are recognised. One can consider a language to be an image of this sort.
Therefore, one can use the methods of pattern recognition to develop
various types of classifications based on exact values of some coefficients
(Zagorujko, 1999: 195 - 201). The generated index of compactness can be
regarded as an indicator of an opposing process of diffusion. Values of the
frequency rate of a particular parameter should not deviate considerably
from the mean value established for a given language family or group. If
the values of deviation are considerably greater than the established mean
value, the given language does not belong to the language family under
discussion. If the majority of languages produce deviation indices higher
than the mean value, we should state that the languages under study do not
form a language group but rather a set of separate languages.
Tambovtsev has put forward the hypothesis that the typological similarity of
languages can be tested by statistical methods which generate the set
of indices described above. The hypothesis holds that when a language is
included in a particular language group, the indices of this new
formation will show either a higher or a lower compactness. A closely related
language would increase the compactness indices, and vice versa.
The author illustrates this presupposition by a series of examples. Thus, he
analyses frequency rates of labial consonants in the Turkic languages
compared to Mongolian. The frequency of labial consonants in Mongolian
is 7.52%. In the Turkic languages the relevant figures vary from 5.98% to
12.80%. The total fluctuation index is 6.82, the difference between the
neighboring languages is 0.49. The Altai language has produced the lowest
index of labial consonant frequency, while the Karakalpakian has shown
the highest index. The Turkic languages can be classified in the following
way by the labial consonant frequency indices: Karakalpakian - 12.80%;
Turkish - 10.41%; Uigur - 9.83%; Azerbaijanian - 9.66%; Uzbekian -
9.42%; Kumandinian - 9.22%; Baraba-Tatarian - 9.04%; Turkmenian -
8.50%; Kirgizian - 8.43%; Kazan-Tatarian - 8.03%; Kazakhian - 7.99%;
Khakassian - 7.82%; Yakutian - 6.10%, and Altaian - 5.98%. The place of
the Mongolian language (7.52%) is between Khakassian and Yakutian
suggesting the distribution of labial consonants is more similar in these
three languages compared to other languages of the Turkic group.
The Mongolian group has produced the following indices: Mongolian
(7.52%), Buriatian (7.67%), and Kalmykian (6.65%). These distribution
indices fall within the same range as above - from 5.98% to 12.80%, while
the total fluctuation and the difference between the neighboring languages
are lower (1.02 and 0.34 respectively).
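The fluctuation index and the mean difference between neighboring languages
can be recomputed from the percentages listed above. In this sketch the
fluctuation index is taken to be the maximum minus the minimum. Computed from
the percentages as listed, the Turkic range comes to 6.82, and dividing the
range by the number of languages reproduces the quoted mean differences (0.49
and 0.34); that divisor is inferred from the figures and is not stated in the
review. The function names are hypothetical.

```python
# Labial-consonant frequencies (%) in the order listed in the review.
turkic_labial = [12.80, 10.41, 9.83, 9.66, 9.42, 9.22, 9.04,
                 8.50, 8.43, 8.03, 7.99, 7.82, 6.10, 5.98]
mongolic_labial = [7.52, 7.67, 6.65]  # Mongolian, Buriatian, Kalmykian

def fluctuation(values):
    """Total fluctuation index: highest frequency minus lowest."""
    return max(values) - min(values)

def mean_difference(values):
    """Range divided by the number of languages (inferred divisor)."""
    return fluctuation(values) / len(values)

for name, group in (("Turkic", turkic_labial), ("Mongolian", mongolic_labial)):
    print(name, round(fluctuation(group), 2), round(mean_difference(group), 2))

# Adding Mongolian (7.52%) to the Turkic group leaves the range unchanged,
# so the mean difference can only go down.
with_mongolian = turkic_labial + [7.52]
print("Turkic + Mongolian", round(mean_difference(with_mongolian), 2))
```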
The Uralian language unity yields labial frequency indices in the range
of 7.71% - 13.72%; the difference between the neighboring languages is
0.30. The indices of the language group combining the Mongolian and
Tungus-Manchu languages are from 7.52% to 12.46%, with a mean difference
between the neighboring values of 0.70. Consequently, we may infer
considerable differences in the sound chains of the Mongolian and the
Tungus-Manchurian languages.
On the contrary, introducing the Mansi language, which belongs to the
Finno-Ugrian language family and on which the Turkic and Mongolian
languages did not exert considerable influence, into the Turkic
languages increases the diffusion index of this group. Consequently, the
Mansi language, unlike Mongolian, does not belong to the Turkic language
Analysis of the frequency rates of the front (i.e. forelingual) consonants may
serve as another example of the compactness of the Turkic and Mongolian
languages. Front-lingual consonants are the most frequent sounds in
the Turkic languages as well as in many other languages of the world. The
frequency of front-lingual sounds in the Turkic languages varies
from 32.35% to 40.24%. The overall fluctuation index is 7.89; the
difference between the neighboring languages (the mean difference) is
0.564. In Mongolian, the frequency of front-lingual sounds is
36.57% of the total number of sounds. The mean difference for the
compound group of Turkic languages and Mongolian becomes lower
(0.526). The relevant figures found for the Uralic languages are: frequency
range 24.79% - 36.78%; fluctuation index 11.99; mean
difference 0.6. Apparently, the Turkic language group is more compact
than the Uralic.
The Mongolian and Tungus-Manchu language families have yielded
similar indices in the range of 17.31% to 36.57%; the fluctuation index is
19.26; the mean difference is 2.75. The Paleo-Asian group of languages
represents a still less compact group, their frequency rates varying from
20.02% to 36.74%; the fluctuation index is 16.64; the mean difference is
The author provides frequency indices for many languages and language
groups. To show the general tendency in the distribution of speech
sounds, he proposes using general coefficients of variation obtained
by adding the indices generated for each group of phonemes. He also uses
the T coefficient, which is derived from the chi-square index, as
a reference index. The resulting general coefficients of variation (V)
allow him to form the following sequence. The Ugric language group
demonstrates the highest diffusion (V = 221.27%, T = 3.77). The
Baltic-Finnish languages yield V = 185.90%, T = 2.79. The group of Volga
languages is the most compact, with V = 143.19%, T = 1.02.
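
The general coefficient of variation described above can be sketched as
follows. The review does not give the formula, so this Python sketch
assumes the textbook definition V = 100 * standard deviation / mean for
each phoneme group and simply sums the per-group values. The use of the
sample standard deviation, the function names and the toy frequencies are
all illustrative assumptions, not the author's data. The T coefficient is
left out because the review does not say how it is derived from
chi-square.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """V for one phoneme group across the languages of a taxon, in percent."""
    return 100.0 * stdev(values) / mean(values)


def general_coefficient(group_frequencies):
    """Sum of the per-group coefficients of variation (assumed definition)."""
    return sum(coefficient_of_variation(v) for v in group_frequencies.values())


# HYPOTHETICAL frequencies (% of all sounds) for three languages in one
# taxon; these numbers are invented for illustration only.
toy_taxon = {
    "front": [34.0, 36.0, 38.0],
    "occlusive": [20.0, 25.0, 30.0],
}

print(round(general_coefficient(toy_taxon), 2))  # 25.56
```

A lower sum means the languages of the taxon resemble one another more
closely across the phoneme groups, which is how compactness is read in
the sequence above.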
Another interesting method of comparative analysis involves introducing
isolated Asian languages into various language families in order to
establish possible relationships. Thus, introducing the Ket language
into the Finno-Ugric family (V = 193.13%, T = 3.77) results in
higher diffusion (V = 198.04%, T = 3.94). The same procedure with
Yukaghir yields V = 199.17%; with Korean, V = 199.24%, T = 3.88; with
Japanese, V = 200.51%, T = 3.91; with Nivkhi, V = 206.48%. In
contrast, Chinese shows closer similarity to the Finno-Ugric
languages: V = 190.01%, T = 3.65.
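
The comparison above amounts to checking how far V moves from the
baseline when one language is added. The short Python sketch below ranks
the candidates by that change, using only the V values quoted in the
review; the variable and function names are illustrative, and reading a
negative change as "more compact" is an interpretation of the review's
wording rather than a stated rule.

```python
baseline_v = 193.13  # Finno-Ugric family alone, as reported

# V (%) after adding each language, as reported in the review.
v_with_candidate = {
    "Ket": 198.04,
    "Yukaghir": 199.17,
    "Korean": 199.24,
    "Japanese": 200.51,
    "Nivkhi": 206.48,
    "Chinese": 190.01,
}


def diffusion_change(baseline, with_candidate):
    """Change in V per candidate; negative means the enlarged group is tighter."""
    return {name: round(v - baseline, 2) for name, v in with_candidate.items()}


changes = diffusion_change(baseline_v, v_with_candidate)
for name, delta in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{name:9s} {delta:+.2f}")
```

Only Chinese produces a negative change (-3.12); Nivkhi produces the
largest increase (+13.35).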
As a result of his investigations, Tambovtsev has come to the following
conclusions:
1) Front (forelingual) and occlusive consonants are the most evenly
distributed within language families.
2) Voiced consonants represent the most variable feature; some
languages have no category of "voiced" consonants.
3) The Mongolian language family is the most compact by the total
sum of the values of the coefficient of variation based on seven major
groups of phonemes (without voiced consonants) and the coefficient T.
The sequence with respect to the total sum of the coefficient of
variation has been established as follows: the Mongolic, the Samoyedic,
the Turkic, the Tungus-Manchu, and the Finno-Ugric language families. The
Paleo-Asiatic language family has yielded the highest diffusion (i.e. the
lowest compactness) indices and consequently can be regarded not as a
language family but as a loose language unity or community.
4) In general, a language sub-group is more compact than a group, and a
group is more compact than a language family. The least compact, that is
the loosest, is the language super-unity comprising all the languages of
the world.
5) Combining two language groups or two families into one
unit results in higher diffusion than in the original taxons.
   All I can say is that the book by Yuri Tambovtsev is a solid and profound
investigation into the comparative analysis of the languages of the world.
The author provides many tables of indices and coefficients generated
by various techniques for a great number of languages. Analysis of
these data gives linguists a method of linguistic investigation based on
numerical procedures. The book contains an extensive list of
references. It is recommended to students who are interested in
phonology, linguistic statistics and the typology of world languages. I
think that at the moment many linguists are dealing with minor linguistic
problems in a single language. Linguistics lacks books that deal with
the modern classification of world languages. Tambovtsev's book may
provide new material for such language classifications.
     Being a linguist by education, I was naturally wary of discussing
the methods without consulting specialists in mathematical
statistics. I must thank Prof. Dr. Arkadiy Shemiakin, Prof. Dr. Vadim
Efimov, Prof. Dr. Leonid Frumin and Prof. Dr. Valeriy Yudin for their
consultations and generous advice.

Bolshev et al., 1983 - Bolshev, Login Nikolaevich and Nikolai Vasilyevich
Smirnov. Tables of Mathematical Statistics. - Moskva: Nauka, 1983. - 416
pages. (in Russian).
Butler, 1985 - Butler, Christopher. Statistics in Linguistics. - Oxford:
Basil Blackwell, 1985. - 214 pages.
Fallik et al., 1983 - Fallik, Fred and Bruce Brown. Statistics for Behavioral
Sciences. - Homewood, Illinois: The Dorsey Press, 1983. - 538 pages.
Marcantonio, 2002 - Marcantonio, Angela. The Uralic Language Family:
Myths and Statistics. - Oxford: Blackwell Publishers, 2002. - 335 pages.
Tambovtsev, 1994-a - Tambovtsev, Yuri. Dinamika funktsionirovanija
fonem v zvukovyh tsepochkah jazykov razlichnogo stroja. [Dynamics of
functioning of phonemes in the languages of different structure]. -
Novosibirsk: Novosibirsk University Press, 1994-a. - 133 pages.
Tambovtsev, 1994-b - Tambovtsev, Yuri. Tipologija uporjadochennosti
zvukovyh tsepej v jazyke. [Typology of Orderliness of Sound Chains in
Language]. - Novosibirsk: Novosibirsk University Press, 1994-b. - 199
pages.
Tambovtsev, 2001-a - Tambovtsev, Yuri. Kompendium osnovnyh
statisticheskih harakteristik funktsionirovanija soglasnyh fonem v
zvukovoj tsepochke anglijskogo, nemetskogo, frantsuzkogo i drugih
indoevropejskih jazykov. [A compendium of the major statistical
characteristics within the paradigm of consonant phonemes functioning in
the sound chains of the English, German, French, and other Indo-European
languages.] - Novosibirsk: Novosibirsk Classical Institute,
2001. - 129 pages.
Tambovtsev, 2001-c - Tambovtsev, Yuri. Nekotorye teoreticheskie
polozhenia tipologii uporiadochennosti fonem v zvukovoi tzepochke
yazyka i kompendium statisticheskikh kharakteristik osnovnykh grupp
soglasnykh fonem. [Theoretical concepts of typology of the order of
phonemes in language sound chains and a compendium of statistical
characteristics of the main groups of consonant phonemes]. -
Novosibirsk: Novosibirsk Classical Institute, 2001. - 130 pages.
Tambovtsev, 2003 - Tambovtsev, Yuri. Lingvisticheskaja taksonomija:
kompaktnost' jazykovyh podgrupp, grupp i semej. [Linguistic taxonomy:
compactness of language subgroups, groups and families]. - In: Baltistika,
Volume 37, # 1, (Vilnius), 2003, p. 131 - 161.
Teshitelova, 1992 - Teshitelova, Marie. Quantitative Linguistics. -
Amsterdam/Philadelphia: John Benjamins Publishing Company, 1992. -
253 pages.
Wray et al., 1998 - Wray, Alison; Trott, Kate and Aileen Bloomer with
Shirley Reay and Chris Butler. Projects in Linguistics: A Practical Guide
to Researching Language. - London and New York: Arnold, 1998. - 303
pages.
Zagorujko, 1991 - Zagorujko, Nikolaj Grigorjevich. Applied Methods of
Data and Knowledge Analysis [in Russian]. - Novosibirsk: Institute of
Mathematics of the Siberian Branch of the Russian Academy, 1999. - 268
pages.
Reviewed by Ludmila Alekseevna Shipulina
Received on Mon Aug 16 2004 - 03:59:18 EDT
