6.0425 More on "Humanities Computing: Merely a Hobby?" (1/482)

Sun, 3 Jan 1993 18:55:56 EST

Humanist Discussion Group, Vol. 6, No. 0425. Sunday, 3 Jan 1993.

Date: 23 Dec 1992 15:47:06 +0000 (GMT)
From: Arjan Loeffen C&L/RUU <Arjan.Loeffen@let.ruu.nl>
Subject: Re: Humanities computing: merely a hobby?

Long note on the discussion: "Humanities computing: merely a hobby?"

I received a collection of positions in the general discussion headed
"Humanities computing: merely a hobby?". I have put down my
interpretation of the discussion in the form of statements (see section
1, below).

To compare these statements to a more formal investigation on the
dilemma(s) described, I would like to report on a rather 'quick'
evaluation I did about a year ago on the use of electronic texts in
humanities research in the Netherlands (see section 2, below). It might
interest the reader, as I feel these findings may put some local
problems in a different light.

Finally, I will try to specify what I think the profession of a
'computer humanist' should be (section 3, below).

I will indicate the dilemmas noted in section 1 in the following
sections 2 and 3 by placing their numbers between brackets.

Section 1: The HUMANIST Discussion

I see the discussion (which, to my knowledge, started with a letter from
Stephen Clausing, but has been going on for some time already) as
revolving around the following statements, not in chronological order:


1- There Are No Jobs In Humanities Computing

Stephen Clausing states: "if humanities computing is to be more than a
gentleman's sport, somebody has got to start creating jobs for this


2- We Are Not The Right Person At The Right

Donald Spaeth states: "Although there *are* occasionally posts for
humanities computing folk, they tend to be in computer centres rather
than subject specialisms."

3- We Can't Find The Right Person For The Job

Eric Johnson states: "When we need to hire faculty, it is *very*
difficult to find people with a sound subject background and anything
more than superficial knowledge of computing."


4- We Don't Show What We're Worth

Mark Olsen states: "there has been a consistent failure among the
practitioners of humanities computing to rock the boat; to produce
results of sufficient interest, rigor and appeal to attract a following
among scholars who *do not* make extensive use of computers."

5- We Are Hybrids

Donald Spaeth states: "Appointment committees wishing to introduce
computing methods into their departmental teaching cannot expect to do
it with topical specialists who have done a bit of word-processing and
databases / statistics / concordancing in their PhD thesis. They need
specialists in humanities / literary / historical computing."


6- We Don't Have The Tools

Herbert Stahlke states: "The problem is that much humanities research
requires the study of records that have not been digitized or the use of
natural language, the analysis of which is also beyond the reach of
current applications."; Oliver Berghof states: "who among HUMANISTs
could name programs which have allowed them to improve their research?
More specifically, who has been using programs which do not collect
data, but help to analyze them?"

7- We Don't Get Through To Our Colleagues

Herbert Stahlke states: "A much more serious problem, however, is the
lack of recognition given to colleagues who make a serious effort to use
computing to improve their teaching and research."; Mark Olsen states:
"scholars in our home disciplines (literature, history, etc.) seem to be
able to safely ignore the considerable literature generated by humanities
computing research over the years. [...] When we publish results that
our non-computer using colleagues read, all the rest will follow."

8- It's Not Local To Computing; It's A General Problem

John Lavagnino states: "it's part of a general decline in the value
placed on traditional scholarship"


9- Humanities Computing Is Scientifically Irrelevant

Willard McCarty states: "[Our scientific results should not be]
definitive proof that Shakespeare did or did not write Shakespeare's
works, nor in literary studies anything remotely like what is applauded
in the more properly quantitative fields Olsen listed. What, then? Can
we say, here, now? Can we articulate the intellectual nature of
humanities computing or point to studies that do?"

10- Humanities Computing Does Not Generate Relevant Results

Mark Olsen states: "A recent paper in L&LC noted that "the numbers have
been crunched for about twenty years now" but it remains "difficult to
see the point of the exercise." So, how patient should we be? Another
twenty years before laying down the cards to see what we're holding?"


11- We Don't Have A Theory

Mark Olsen states: "Indeed, it is my firm belief that the technology
allows us to rethink the notion of "textuality" and the relationship of
text to context (discursive, social, and political). And provide solid,
verifiable results based on new theoretical models, allowing us to test
and (hopefully) improve critical theory. Humanities computing should be
in the lead of rethinking textuality precisely because the technology
allows us to treat text as a radically different object of research."

12- We Have Lost Our Dream And Have Become

Paul Falzer states: "The scientist qua scientist, like the
businessperson and the bureaucrat, have abandoned the dream [of personal
computing, a dream of individuality, independence, and flexibility] and
rushed headlong into a brave new world of interconnected MIPS and bits.
I think that the humanist can do better."

(So, who dares to put on a happy face?)

Section 2: The Report On Text Directed Computer
Based Research In The Netherlands.

The report (interviews held Dec 1990 - May 1991, report published
January 1992 -- in Dutch) is based on talks with 24 scholars that more or
less represent humanities computing in the Netherlands. The report ends
with two chapters that have been written on the basis of
between-the-lines remarks during the interviews.

The central statement is that text directed computer research does not
really come out as a serious field of scientific research. These are
the findings, of course according to my own interpretation.

- We Don't Cooperate (4, 5, 7, 10, 11)

Cooperation is essential to the emancipation of text directed
computer research. Such cooperation should exist between universities
and companies, as well as between universities themselves.

- Universities And Companies Don't Cooperate

"What we have developed so far *cannot* be interesting to companies"
(Hans Voorbij). We do not get a feeling for one another's wishes and
possibilities. We do not adhere to the format in which companies (in
general) wish our findings and products to be exchanged. The immediate
effect of this dilemma is that universities and companies duplicate each
other's work. An example is the PENELOPE parser, an in-house syntactical
parser, part of a text critiquing system alias style checker, developed
by IBM. This also -- maybe even more painfully -- applies to universities
themselves.
As an example, three Dutch universities have actually developed three
syntactical parsers at the same time.

- Not Clearly Related Disciplines Don't Cooperate

We do not connect with faculties that do have electronic texts or will
benefit from them. Examples are the faculties of law, medicine and
social sciences, as well as libraries.

Note: In a report written in 1989 by the Dutch Bureau for Libraries and
Information Services (NBBI) it is stated that one of the principal
problems in information services is to *retrieve* information, rather
than to process it. The committee defines some focus points in solving
the problems libraries have in getting the information in store to the
people. Remarkably, they do not conclude that attention should be drawn
to the retrieval engine itself, nor to adaptations to structured or even
superficially enriched documents. All conclusions focus on external
aspects such as book-exchange between libraries, long distance
connections, in-line reservations and applications, bibliographies on
CD-ROM, expansion of controlled vocabularies, and document delivery. I
have not found a word about querying mechanisms, document type support,
automatic abstracting, and other fields of improvement that may call for
humanities expertise.

Even the knowledge exchange between computer sciences and humanities is
limited. It really is an exception if such a contact exists. In
Nijmegen, a special curriculum has been defined integrating both
faculties (sciences), mainly (but not only) supporting computer supported
linguistic research. This, however, took a *very long time* as people
from both sciences had to get used to each other -- imagine people of
computer science and literature getting together.

- Clearly Related Disciplines Don't Cooperate

Even within obviously related disciplines contacts are sparse.
Linguists do not realize that, nor how, their material may be
interesting to text research; the same holds for results of text
research, to be incorporated in linguistic research. It is however
clear that such mutual interests exist. E.g., synchronic linguistic
theories based on old manuscripts will benefit from critical studies of
such sources. In Amsterdam, such linguistic variability research is
conducted under the supervision of one person who actually knows
something of both fields.

Part of the general problem of not cooperating also lies in too little
documentation of the software products developed, no attention to
software portability / compatibility, and of course no financing of time
spent in getting to know current research projects (which really takes
some time).

- We Do Not Educate Adequately

We hardly integrate education in text directed computer research. We may
distinguish two types of researcher: the humanities researcher getting
acquainted with the computer, and the computer researcher planted in a
humanities environment. Both are forced to go into the host discipline
more than superficially (see also section 3, below). Because this
usually is a very ad hoc thing, time and money are wasted -- every
single person not only has to walk the same stretch again, but will
probably fall into the same pitfalls. And, what is more, they never
get on the same (intellectual) level as that of the host (the guest you
never get used to -- an intruder). "You might easily overlook some
essential things. It will always be somewhat amateurish. What we
need is people who have had an integrated education, and therefore are
no amateurs." (Jan Aarts).

Happily, some educational projects have emerged recently: the Nijmegen
project mentioned, a humanities computing specialization in Utrecht,
Groningen and Amsterdam. The specific field of text & computer has not
emerged yet. For instance, document information systems are only the
topic of a single course in Tilburg, Delft and Utrecht, while it is my
firm belief that modelling and formalizing the process of retrieving
information from textual sources should pre-eminently be located in
humanities

And even then: does the *software* available support scientific
education well enough? What exists above, may come down. But if
there's not much intelligent software to be found in research, how much
will be available in education? And if the software is fixed on
specific 'jobs' (as it generally is), what kind of creativity may be
expected from the student? (Don't tell me OCP does a fine job. Let's be
honest, isn't OCP really a relic, a reminder of Days Gone By? -- A.L.)

Note: as will be the case in other countries, university policies may
inhibit even such a thing as exchanging students to get a feel for some
other type of research, e.g. to let a student do some courses (do a
'specialization') on computer use in his/her own field of study. One
need only mention the fact that faculties and sections are financed on
the basis of the number of students that enroll in the individual
courses. (A.L.)

- There's No Tradition (5, 7, 11, 12)

A second factor is the lack of a computer tradition in several
disciplines. Conventional methods persist: they work, why change?

- The Computer Is A New Thing

In the HUMANIST discussion, Paul R. Falzer writes:

I am reminded of a colleague, a senior faculty member who had adamantly
refused even to work at an electric typewriter prior to the arrival of a
dusty old 8088 a few weeks ago. He asked me what he could do with it
and as I began telling him a few stories his eyes got as big as saucers.
I brought him a simple shareware text editor, installed it, and gave the
.exe file the name of his youngest daughter. To everybody's amazement,
he's using the thing and has gotten excited about learning how to use it
better. The last I heard, he was looking into the price of scanners.

This nicely describes what in practice hardly ever occurs. People
still have something like computer fear: it's a polluting or even
threatening object, to be banned from the field. Even if people get
over this idea, they usually use the computer as a typewriter (as the
story goes). "It's hard for people to realize you can do something
systematic with the thing" (Andrea de Leeuw van Weenen). Large parts
of the (large) financial resources for the computerization of the
humanities in the Netherlands (in 1989) have been spent on PCs, cables
and the sockets they rest on -- to be used forever for WordPerfect.
Hardly something to get the show on the road. Or to get emancipated. For
instance, the money was *not* used to educate people, or even to
instruct them.

- MRT Is A New Thing

Even if the computer is recognized as a relevant tool, new problems
arise. People tend to start building corpora (note that the research
aimed at describing electronic text research, i.e. research on machine
readable text or MRT, in particular) without much knowledge of text
enrichment strategies, OCR, markup and the like. Moreover, people start
doing everything by hand. Even if people accept the time consuming
process of manual input, other people do not think this is of any
scientific relevance. Not every scholar is given the time to build
machine readable sources.

Another thing is that it is assumed that the use of conventional
systems (DBMS, statistical packages, word processors) may give more
instantaneous results. This does not hold for far reaching, well
modelled text research on MRT corpora. Partly because the tools are not
available, or are not powerful enough. Many people start working with
several packages at the same time, one for this job, another for that,
etcetera. People use the *wrong* packages. And if the computer beeps,
they leave it as a friend who wasn't a real friend anyway.

Third, people tend to first start building MRTs and *after* that to
formulate questions. Often they have to re-evaluate the whole text:
remove CRLFs, insert tags, replace hyphens by codes, etc. Again, no
experience means twice the time.

- Paying For It All Is A New Thing

The lack of 'tradition' or insufficient knowledge of computing in
general puts a brake on requests for investments by the authorities in
hardware, software, and persons. This especially holds for modern
language research. Humanities computing funds are in practice passed
on to linguistic research, as that community is more outspoken about
its computational needs. Ben Salemans (Nijmegen) and Eep Talstra
(Amsterdam) point out that it is very hard to get philological research
financed by the main funder of science in the Netherlands, NWO (Dutch
Scientific Research, a governmental institute). This is partly due to
the lack of experience with such projects. Until recently, NWO was not
even willing to finance software development at all. It will be clear
that this is not a stimulating situation.

Minimal funding for humanities computing has some immediate effects:

- Small research environment (people, time).
- No time to get oriented (and, therefore: duplication).
- The tendency to specialize, because there's
no time to be creative (and possibly fail),
to elaborate on a specific thing some more
(which could result in new types of
research), etc.
- Research projects tend to be 'bare', i.e. no
time for documentation, dissemination of
software and the like.
- One-sided and conventional hard- and
software purchase (the secretary knows WP,
so WP it is), and no people to go along with
it. Printers tend to sit there in the back
of a deserted room, dusty, rusty -- when the
ribbon has worn out.
- No modernization of existing environment
(waiting two years to get a mail

That was about it.

In the same report, I have pointed out some general objectives (fully
under my own responsibility), which I'd like to repeat here:

- Get Together

It seems clear that cooperation on several levels is necessary for
sensible text directed research. This may avoid duplication, early
failures, use of too expensive or ineffective methods, etc. Such
cooperation should be broad, not only between immediate colleagues.
See above.

- Get Wise

Students have to get acquainted with computers and software at an early
stage, preferably regarding their use within their own field of study.
For those interested, a specialized curriculum should be offered that
aims at the convergence of different fields of knowledge. For instance,
Language and Computing, Text and Computing, Art History and Computing,
Law and Computing, whatever. For text itself, this curriculum should
give attention to computational, linguistic, textual and informational
problems, models and implementations. As stated above, such a
curriculum should include available general scientific tools (exactly
this has brought me to my current research on a textbase management
system for humanities computing -- the lack of a general working
environment inhibits the most important strategies mentioned, both for
scholars and students).

- Get Realistic

Two points on getting realistic about humanities computing, that should
be heard not only by colleagues, but also by planners, foundations, and
institutions that ultimately have the money.

- People Need Time

To accept something like a start-up time for text entry, in which
special attention is drawn to text encoding, software tools, and
automation in general, will in the long run not only earn back the time
investment, but also improve the texts and related results. This
start-up time also includes shopping time: what's out there, can I use
it, can I get it.

- Software Development Can Be Scientific

The development of software and corpora is a valid scientific aim, has
every kind of scientific ring to it, and should be seen as scientific
work. One should not see it as a by-product of a more conventional
scientific quest.

Section 3: What Humanities Computing Should Be

After this very long summary, I will try to point out what in my current
opinion should be the goal of humanities computing. I believe that -- to
go back to the initial question: "How do I get a job" -- this may clear
up some misunderstandings, and define the profession. Because, to get
a job, you need a *profession*.

I could put my opinion in one sentence: Make Clear Where You Stand. I
am convinced that if we stick to a clear field of expertise people will
know when to look for us, and when not; what to expect and what not.
And --probably most important-- people will know what their own role is
in cooperating with us.

We are (roughly) in between two sciences: humanities and computer
science. (5, 7, 9, 10, 11)

- We are not -professionally- humanists. We don't have the time (and
maybe not even the inclination) to get into serious research: we know
what goes in and what comes out. We even know the paths to get there.
But we do not know the ins and outs, the individual decisions involved.
We do not have the creativity to select specific paths when several
present themselves. We are not able to conduct humanities research on
the level expected by its conventional community.

- We are not computer scientists. We know how to use computers, we are
able to describe, test, evaluate and use software and hardware that is
relevant to our field. We may even be able to program the computer to
create some nice software. But we are not into building applications
as-is. We do not have the time to do so. We have not been trained to
do so. We are not into processors, memory management, mathematical
proofs, communication protocols, whatever defines the actual technical
stuff used to solve non-technical problems or fulfill non-technical

It is our profession to link the two: technical products and
non-technical models, use, research, people. To specify what
considerations apply for implementations that are to be used by specific
users, here: humanists. To show the impact of technical development on
traditional and non-traditional research, such as textual criticism and
document retrieval respectively; the way relational systems may be used
in historical research; the way corpora should be encoded, and how such
enrichment should be queried to fulfill humanist needs; the way musical
patterns may be represented by encoding schemes. Things like that.

We know something of both areas. That's, in my opinion, not only our
strength, but should also be our research focus: building the bridge.
Get the two together. There are enough traditional scientists who are
willing to cooperate. It's the fear that the line between fields of
expertise will be blurred -- that automation will take over, that
traditional research will be overridden by bits and bytes -- that keeps
these scientists from our rooms, our publications, our software. We
should make clear that computers add something to the field, and do not
replace or destroy it. It's our publications that do not describe the
problems and possible solutions such that they may be read, understood,
and incorporated by humanities researchers. They do not eliminate the
misunderstandings. They rather endorse them.

Personally, I have chosen to focus on describing a text directed
research environment. I will describe as exactly as possible what a
scientist who professionally uses textual sources in humanities research
(with a focus on literature) will expect from a computer program. What
his/her sources are. What kind of tools he/she uses. How these tools
may be emulated by software modules. What the relation may be between
these modules. What kind of abstract system lies behind all these tools
and sources. In fact, I focus on the model behind textual analysis. To
this end, I must have knowledge both of humanist research where textual
sources are involved, and of models defined by information /
computer science. In this particular case, I will try to link the object
oriented model to everyday needs in humanities research.

Thanks for reading this (long) statement. Hope it helps in some way.

Arjan Loeffen

3rd year PhD Student
Historical / cultural information science
University of Utrecht, The Netherlands