From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)

Date: Fri, 1 Jun 2007 06:58:39 +0100

Humanist Discussion Group, Vol. 21, No. 66.

Centre for Computing in the Humanities, King's College London

www.kcl.ac.uk/schools/humanities/cch/research/publications/humanist.html

www.princeton.edu/humanist/

Submit to: humanist_at_princeton.edu

[1] From: Carl Vogel <vogel_at_cs.tcd.ie> (48)

Subject: Re: 21.065 how to use Chi-square correctly

[2] From: Michael Hart <hart_at_pglaf.org> (4)

Subject: Re: 21.065 how to use Chi-square correctly

[3] From: Ryan Deschamps <Ryan.Deschamps_at_Dal.Ca> (36)

Subject: Re: 21.065 how to use Chi-square correctly

[4] From: "Juliana Tambovtseva" <jultamb_at_yandex.ru> (8)

Subject: Classification Accuracy Within Author Discrimination

--[1]------------------------------------------------------------------

Date: Fri, 01 Jun 2007 06:53:31 +0100

From: Carl Vogel <vogel_at_cs.tcd.ie>

Subject: Re: 21.065 how to use Chi-square correctly

> At 09.35 30/05/2007, Norman Gray (Humanist Discussion Group) wrote:
> >I don't believe anything has changed in the way that chi-squared is
> >defined or used. Nor has much changed (unfortunately) in the way and
> >extent that it is abused. [...]
>
> Am I the only one interested to know more
> (possibly with examples) about how the chi-square test is abused?
> If you Norman could take us by hand and write
> down a small description about how the test is
> used correctly and how to avoid the abuse...
> With thanks in advance
> maurizio
>
> Maurizio Lana - researcher
> Dipartimento di Studi Umanistici - Universita del Piemonte Orientale a
> Vercelli
> via Manzoni 8, I-13100 Vercelli
> +39 347 7370925

Howdy,

Of course, without knowing the full details of the article, one can
only speculate on what the reviewer had in mind. In digest form, the
paper in question checked the relative distribution of one linguistic
feature between two sources. It is possible that the reviewer was
objecting to the articulation of the null hypothesis (that any
differences in the distributions are random) or of the alternative
hypothesis (that the two texts could not have been drawn from the same
population). One or the other of those might have been given a
different interpretation. Further, the fact that the null hypothesis
was rejected might have been used to argue for a hypothesis far more
general than the one the test actually addressed.
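A comparison of one feature's relative frequency in two sources is usually set up as a 2x2 contingency table (rows: the two sources; columns: occurrences of the feature vs. all other tokens). The following is a minimal Python sketch under stated assumptions: the counts are hypothetical, no continuity correction is applied, and the function name `chi_square_2x2` is illustrative, not taken from the paper under discussion.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 contingency table.

    `table` is [[a, b], [c, d]]: rows are the two sources, columns are
    (occurrences of the feature, all other tokens). No continuity
    correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the feature's proportion were the same in both sources.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom; for df = 1 the upper-tail
    # chi-square probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: feature occurrences vs. other tokens in two texts.
counts = [[30, 970], [55, 945]]
stat, p = chi_square_2x2(counts)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

Note what a small p-value licenses here: only the rejection of "the feature's proportion is the same in both sources". Any broader conclusion (about authorship, genre, and so on) needs its own argument.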

There's a very interesting article by Adam Kilgarriff about the use of
Chi-square testing in natural language research:

@Article{Kilgarriff05,
  author  = {Kilgarriff, Adam},
  title   = {Language is never, ever, ever random},
  journal = {Corpus Linguistics and Linguistic Theory},
  year    = {2005},
  volume  = {1},
  number  = {2},
  pages   = {263--275}
}
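One arithmetic consequence, hinted at by the article's title, is easy to see in numbers: if every cell of a table is multiplied by k (same proportions, k times as much text), the Pearson statistic is also multiplied by k. With a large enough corpus, even a tiny difference will therefore exceed the conventional 3.841 threshold (df = 1, alpha = 0.05). The sketch below uses hypothetical counts and illustrates only that arithmetic; it is not taken from the article.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: the feature occurs at a 3.0% vs. 3.3% rate in two texts.
base = [[30, 970], [33, 967]]
for k in (1, 10, 100):
    # Multiply every cell by k: same proportions, k times as much text.
    scaled = [[k * x for x in row] for row in base]
    print(k, round(pearson_chi_square(scaled), 2))
```

The statistic comes out at roughly 0.15, 1.5 and 15, and only the last crosses 3.841, although the difference in rates is identical in all three cases.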

All the best,

Carl

--[2]------------------------------------------------------------------

Date: Fri, 01 Jun 2007 06:54:06 +0100

From: Michael Hart <hart_at_pglaf.org>

Subject: Re: 21.065 how to use Chi-square correctly

I am interested as well.

Michael S. Hart

Founder of Project Gutenberg

Who Minored In Statistics

Some 33 Years Ago. . . .

--[3]------------------------------------------------------------------

Date: Fri, 01 Jun 2007 06:54:53 +0100

From: Ryan Deschamps <Ryan.Deschamps_at_Dal.Ca>

Subject: Re: 21.065 how to use Chi-square correctly

I can think of only three cases where Chi-squared could be said to be abused:

1. Using Chi-squared in cases where it isn't appropriate (non-exclusive
   categories, too few observations, time-influenced data, etc.)
2. Using Chi-squared without first carrying out appropriate observational
   and descriptive analysis.
3. Using Chi-squared when a parametric test (t-test, ANOVA) would do just
   as well.
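The "too few observations" case in point 1 can be checked before running the test. A common rule of thumb is that every expected cell count (row total x column total / N) should be at least 5; when it is not, alternatives such as Fisher's exact test for a 2x2 table are usually suggested. A minimal sketch, assuming that threshold and hypothetical counts; the helper names are illustrative only.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def small_expected_cells(table, threshold=5.0):
    """Return (row, col) positions whose expected count falls below `threshold`."""
    return [
        (i, j)
        for i, row in enumerate(expected_counts(table))
        for j, e in enumerate(row)
        if e < threshold
    ]

# Hypothetical counts of a rare feature in two short texts.
rare = [[2, 198], [6, 294]]
print(small_expected_cells(rare))
```

Here both cells in the "feature" column have expected counts below 5 (3.2 and 4.8), so the chi-square approximation is doubtful for this table.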

The first two just involve proper scientific procedure and analysis, and
are not unique to Chi-squared. Of course, just popping data into SPSS and
running a test is not a good way to go about any statistical analysis.

The last one is a bit of statistical snobbery, but it is valid enough to
warrant a mention. If the parametric test can be run, you should run it
instead of the Chi-squared (despite any preference for Greek letters).
(See this tutorial for an explanation:
http://www.georgetown.edu/faculty/ballc/webtools/web_chi_tut.html ).
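A parametric test needs a different data shape: several measurements per group (for example, a feature's rate in each chapter of each text) rather than a single pooled table of counts. Under that assumption, a two-sample t-test is one such alternative. The following is a minimal standard-library sketch of Welch's unequal-variance version with hypothetical data. It stops at the t statistic and degrees of freedom, so the p-value still has to come from a t table or from a library routine such as SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample (n - 1) variance; dividing by n
    # gives the squared standard error of each mean.
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical per-chapter rates (per 1,000 words) of a feature in two texts.
text_a = [3.1, 2.8, 3.4, 2.9, 3.2]
text_b = [3.6, 3.9, 3.3, 3.8, 3.7]
t, df = welch_t(text_a, text_b)
print(f"t = {t:.3f}, df = {df:.1f}")
```

Welch's version is used here because it does not assume equal variances in the two groups; the classic pooled-variance t-test is the special case that does.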

The whole thing about the "new" way to run a Chi-squared confounds me,
though I'd need the context of the quote. Mathematical distributions are
not very useful if they change over time. If a more useful or flexible
distribution were discovered, I am sure it would be given a new name to
mark that fact. Frankly, I can only see this happening for specific kinds
of Chi-square tests, and the basic Chi-square test would still be relevant
because it is so easy to use.
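For reference, Pearson's statistic for an r x c contingency table is conventionally written as below, with O_ij the observed count in row i and column j, R_i and C_j the row and column totals, and N the grand total. Under the null hypothesis it is compared against a chi-square distribution with the stated degrees of freedom.

```latex
X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\mathrm{df} = (r - 1)(c - 1)
```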

Sometimes economists will give a series of procedures a name and call it
a "test" (for instance, the Granger test uses least-squares regression and
ANOVA on time-series data), but now I'm really grasping.

Sometimes another test is required to make sure the data are appropriate
for a particular statistical test. But Chi-squared is so basic that I
cannot really think of a case where that is necessary. (Still grasping.)

Hope this is helpful.

For transparency's sake, I am a dilettante in statistical matters (i.e.,
geeky enough to write about it on a wiki). Given my not-so-many years of
statistical memory, I welcome further comments that straighten out
anything I've said.

Ryan. . .

Ryan Deschamps

MLIS/MPA Expected 2005

--[4]------------------------------------------------------------------

Date: Fri, 01 Jun 2007 06:55:30 +0100

From: "Juliana Tambovtseva" <jultamb_at_yandex.ru>

Subject: Classification Accuracy Within Author Discrimination

Dear Humanist colleagues, what are your impressions of the article
published in Literary and Linguistic Computing, June 2007, Vol. 22,
No. 2: "Employing Thematic Variables for Enhancing Classification
Accuracy Within Author Discrimination Experiments" by George
Tambouratzis and Marina Vassiliou, pages 207-224? The idea of compiling
a simple, understandable handbook on how to use Chi-square would be
much appreciated, especially with plenty of concrete examples. Looking
forward to hearing from you at yutamb_at_mail.ru.

Received on Fri Jun 01 2007 - 02:06:42 EDT
