
Humanist Discussion Group



Humanist Archives: Jan. 1, 2019, 10:28 a.m. Humanist 32.296 - toward a theory of the corpus

                  Humanist Discussion Group, Vol. 32, No. 296.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org




        Date: 2018-12-31 21:00:21+00:00
        From: Bill Benzon 
        Subject: Toward a Theory of the Corpus 

Getting this out was like pulling my own teeth, but I'm glad it's out, even
if I'm going to have to rework the whole thing. Comments appreciated.

Happy New Year to all,

Bill B

* * * * * *

Toward a Theory of the Corpus

Abstract: Recent corpus techniques ask literary analysts to bracket the
interpretation of meaning so that we may trace the motions of mind. These
techniques allow us to think of the mind as being, in some aspect, a high-
dimensional space of verbal meanings. Texts then become paths through such a
space. The overarching argument is that by thinking of texts as just ordered
collections of physical symbols that are meaningless in themselves we can
examine those collections in ways that allow us to recover the motions of mind
as it constructs meanings for itself. When we examine a corpus over historical
time we can see the evolution of mind. The corpus thus becomes an arena in which
we investigate the movements of mind at various scales.

Contents

Meaning, text, and mind: Notes toward a theory of the corpus
PART 1: MAPPING A NEW ONTOLOGY OF THE TEXT
1. Can you learn anything worthwhile about a text if you treat it, not as a
TEXT, but as a string of marks on pages?
2. Computational linguistics & NLP: What's in a corpus? -- MT vs. topic
analysis
3. Why computational critics need to know about constitutive computational
semantics
PART 2: VIRTUAL READING: PATHS THROUGH THE MIND AND THE MIND OVER HISTORICAL
TIME
4. Augustine's Path, A note on virtual reading
5. Mapping the pathways of the mind
6. Inferring the direction of the historical process underlying a corpus

Meaning, text, and mind: Notes toward a theory of the corpus

The set of observations I've collected in this working paper has two sources;
it was spun out to scratch two conceptual itches. One is my long-standing
interest in literary form. The other is the opposition or tension between
meaning and, well, computation that has been dogging computational criticism
for, I don't know, a decade. Even computational critics who otherwise refuse
to take that opposition as a criticism nonetheless tend to treat their
mathematical models as scaffolding to support, or as gadgets for detecting, what
really interests them. In the end I spent more time scratching that second itch
than the first.

Computational critics have an opportunity to map the human mind that is
qualitatively different from what interpretive critics accomplish by uncovering
meanings 'hidden' in literary texts. But to avail themselves of this
opportunity computational critics must understand the broad disciplinary
framework in which 'meaning' is opposed to 'distant reading'. It is not
simply that these are two different phenomena, or that 'distant reading' is
not intended to replace or supplant the explication of 'meaning', but that
yoking them together in that opposition makes no more sense than opposing
'salt' to 'NaCl'.

The first three sections -- Part 1: Mapping a new ontology of the text -- deal
with that kind of conceptual difference. The last three sections -- Part 2:
Virtual reading: Paths through the mind and the mind over historical time --
are about those new conceptual possibilities. I've provided some introductory
material for both parts that is intended to help stitch these various arguments
together. The overarching argument is that by thinking of texts as just ordered
collections of physical symbols that are meaningless in themselves we can
examine those collections in ways that allow us to recover the motions of mind
as it constructs meanings for itself. We bracket the interpretation of meaning
so that we may trace the motions of mind.

* * * * *

Part 1: Mapping a new ontology of the text -- My overall objective here is to
outline a way of thinking about language and texts that is centered on form and
mechanism (linguistics) rather than meaning (literary criticism).

1. Can you learn anything worthwhile about a text if you treat it, not as a
TEXT, but as a string of marks on pages? -- Conventional literary criticism
talks a lot about the text, but has no coherent conception of it. That is
because it is focused on meaning and meaning doesn't exist in the marks on
pages, the physical text. Corpus techniques, topic modeling for example, have
nothing but those marks and yet manage to reconstitute something that looks like
meaning (but really isn't, not quite). How is that possible? Moreover, by
focusing on certain kinds of patterns in those marks, we can uncover formal
structure in texts, structure that is otherwise invisible to conventional
criticism, which also talks a lot about form without offering a coherent account
of it.
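
One way to see how bare marks can yield something that looks like meaning is
to count which tokens occur near which. The following is only a minimal
sketch under stated assumptions, not the method of any study mentioned here:
the function names, the window size, and the toy text are all hypothetical.

```python
from collections import Counter, defaultdict
import math
import re


def cooccurrence_vectors(text, window=2):
    """Map each token to counts of the tokens within `window` positions of it."""
    # Treat the text as nothing but a string of marks: lowercase letter runs.
    tokens = re.findall(r"[a-z']+", text.lower())
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy text, invented purely for illustration.
toy_text = (
    "the king rules the land . the queen rules the land . "
    "the dog chases the ball . the cat chases the ball ."
)
vectors = cooccurrence_vectors(toy_text)
royal = cosine(vectors["king"], vectors["queen"])
mixed = cosine(vectors["king"], vectors["dog"])
print(royal > mixed)
```

Nothing in the code knows what a king or a dog is; whatever similarity it
reports comes entirely from the positions of the marks relative to one another.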

2. Computational linguistics & NLP: What's in a corpus? -- MT vs. topic
analysis -- Corpora play very different roles in topic modeling and in machine
translation. In topic modeling a corpus is the object of investigation while in
machine translation a corpus is used to build a tool which then, in turn, does
the translation. In MT the corpus allows us to create what Martin Kay calls an
'ignorance model'. We would really like to be able to use a robust account of
natural language semantics in MT; alas, we don't have such a model
(ignorance), so we use corpus techniques to construct a very crude approximation
of semantics.

3. Why computational critics need to know about constitutive computational
semantics -- Simple: you need to know the lay of the land. That can be
expressed in four contrasts: 1) close reading vs. distant reading, 2) meaning
vs. semantics, 3) statistical semantics vs. computational semantics, and 4)
corpus as tool vs. corpus as object. More often than not, corpus as tool is a
substitute for constitutive computational semantics.

Part 2: Virtual reading: Paths through the mind and the mind over historical
time -- Assuming that we can think of the mind as, in some aspect, a high-
dimensional network of verbal meanings, we can use statistical techniques to
reveal the paths different texts trace through the mind and, beyond that, follow
the mind as it evolves over historical time.

4. Augustine's Path, A note on virtual reading -- If we think of the mind as
a high-dimensional space that can be approximated by statistical techniques,
including those used in analyzing texts, then we can see Andrew Piper's statistical
analysis of conversion texts, chiefly Augustine's Confessions, as an analysis
of mental structure. The statistical structure uncovered in the location of the
13 books of the Confessions can thus be reinterpreted as a pathway in the mind,
of Augustine, but also of his readers. What are these different mental regions
that are traversed in just this way?

5. Mapping the pathways of the mind -- Michael Gavin uses vector semantics to
examine a passage from Paradise Lost. After arguing that a word-space model is,
after all, a model of the mind, I suggest that vector semantics could be used to
map paths through the mind. I illustrate this conjecture by drawing a path for
the Milton passage by picking words that had been brought to my attention by
Gavin's analysis. There's no reason why such a path couldn't be traced
computationally.
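
As a rough illustration of what tracing such a path computationally might
involve, here is a minimal sketch. It assumes a word-space model is already
available as a mapping from words to coordinates; the two-dimensional
`TOY_SPACE`, its values, and the function names are hypothetical and are not
drawn from Gavin's analysis or from any trained model.

```python
import math

# Hypothetical 2-D word space; a real model would have hundreds of dimensions.
TOY_SPACE = {
    "heaven": (0.9, 0.1),
    "light": (0.8, 0.3),
    "darkness": (0.1, 0.8),
    "hell": (0.0, 1.0),
}


def trace_path(tokens, space):
    """Return the (word, point) pairs visited, in textual order.

    Tokens with no entry in the space are skipped.
    """
    return [(word, space[word]) for word in tokens if word in space]


def step_lengths(path):
    """Euclidean distance between each consecutive pair of points on the path."""
    points = [point for _, point in path]
    return [math.dist(p, q) for p, q in zip(points, points[1:])]


# Hypothetical token sequence standing in for a passage.
passage = ["heaven", "of", "light", "hell"]
path = trace_path(passage, TOY_SPACE)
steps = step_lengths(path)

for (word, point), step in zip(path[1:], steps):
    print(f"-> {word} at {point}, step length {step:.3f}")
```

Short steps mark stretches where the text stays within one region of the space
and long steps mark jumps between regions; which words to include, and how to
weight them, remain open design choices.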

6. Inferring the direction of the historical process underlying a corpus --
Matthew Jockers' final study in Macroanalysis (2013) attempted to investigate
influence in a corpus of 3300 19th century novels. I argue that what he in fact
discovered is that the socio-cultural process that created those novels is
inherently directional. Without intending to do so, Jockers had in effect
operationalized the 19th century idealist notion of Spirit and provided a way of
thinking about 'an autonomous aesthetic realm' (in a phrase from Edward
Said).


https://www.academia.edu/38066424/Toward_a_Theory_of_the_Corpus

Bill Benzon
bbenzon@mindspring.com

917-717-9841

http://new-savanna.blogspot.com/ 
http://www.facebook.com/bill.benzon 
http://www.flickr.com/photos/stc4blues/

https://independent.academia.edu/BillBenzon

http://www.bergenarches.com/#image1 




Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
