21.125 ideal readers for a database

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Sun, 24 Jun 2007 10:32:55 +0100

               Humanist Discussion Group, Vol. 21, No. 125.
       Centre for Computing in the Humanities, King's College London
                     Submit to: humanist_at_princeton.edu

   [1] From: Desmond Schmidt <schmidt_at_itee.uq.edu.au> (94)
         Subject: Re: 21.110 ideal readers for a database?

   [2] From: "John G. Keating" <john.keating_at_nuim.ie> (120)
         Subject: Re: 21.120 ideal readers for a database

   [3] From: Willard McCarty <willard.mccarty_at_kcl.ac.uk> (119)
         Subject: Re: 21.120 ideal readers for a database

         Date: Sun, 24 Jun 2007 10:07:40 +0100
         From: Desmond Schmidt <schmidt_at_itee.uq.edu.au>
         Subject: Re: 21.110 ideal readers for a database?

Hi Neven,

The questions of who are the users and what do they want are the first
ones that the human-computer interaction designers say we must ask, so
I think these are the right questions. However, we should also realise
that the "ideal user" changes with time, as do the things that they
think they want. Often, it is only when one has built an interface that
one can be in a position to criticise it. I think that the online
corpora you are referring to, in spite of their many useful features,
are probably too inflexible in their design. It would be nice if you
could easily do the things you suggest - perhaps you are thinking of
interfaces on other websites not related to Greek and Latin corpora
that already do this. But there is one good reason why it is not
already so for us too, and that is money. How do you find the funding
to do these kinds of things? The humanities has a fraction of the
financial support of other disciplines, so things change only slowly. I
think the only solution to this can be co-operation in the design of
software for humanities projects. There is enough money if we can all
agree on how to spend it. The free software paradigm might emend this
situation, though it hasn't yet.

However, you make another valid point, when you complain that it is
difficult "to consult different readings". When we directly transfer
the printed structures of "edition" and "apparatus criticus" into the
digital medium we are like the early printers who made books in the
image of manuscripts. It took them 100 years to realise their mistake.
We still think of the text as one version, whereas in fact it is many.
This fact has to be built into the archive itself and made the
fundamental structure of the text. Otherwise we will forever be
scratching around trying to compare a text here in one format with a
text there in another, or building closed systems that do what we want
but don't interact with others. Markup can't record variation very well
but markup is what we are seemingly stuck with, and that is why I think
it is difficult to build the flexibility you crave.

On 20/06/2007, at 3:47 PM, Humanist Discussion Group (by way of Willard
McCarty <willard.mccarty_at_kcl.ac.uk>) wrote:

> Humanist Discussion Group, Vol. 21, No. 110.
> Centre for Computing in the Humanities, King's College London
> www.kcl.ac.uk/schools/humanities/cch/research/publications/humanist.html
> www.princeton.edu/humanist/
> Submit to: humanist_at_princeton.edu
> Date: Wed, 20 Jun 2007 06:43:20 +0100
> From: Neven Jovanovic <neven.jovanovic_at_ffzg.hr>
> >
>Dear all,
>after some experience with databases --- corpora, to be more precise
>of ancient Greek and Latin (and Neolatin) texts, a question occurs. What
>kind of an "ideal reader" (or "ideal user") did the designers of those
>databases / corpora have in mind? For whom are those databases
>What is very easy to do with them --- and what is quite uncomfortably
>difficult to achieve?
>Greek and Latin corpora that I know make it very easy to find
>of words, and (therefore) verbal similarities between texts. On the other
>hand, it is difficult to create one's own subcorpora there; it is not
>quite simple to search writers just from one period (i. e.
>synchronically), or from just one genre. It requires, also, special
>to follow ideas, not words. It is difficult to annotate a text, to
>interesting places (you have to go outside the database for this). It is
>practically impossible to add other texts to databases. It is difficult
>to consult different readings (apparatus criticus) of a text.
>There is another kind of corpus --- the Perseus --- which makes it very
>easy to study the text as a student, providing access to the
>translations, lexica (kind of "school" corpus). But this corpus is
>difficult to search (or research) as a corpus, as a collection of
>and genres and periods are accessible with difficulty here as well.
>What are your experiences with databases / corpora in your fields of
>expertise? For whom do these databases / corpora seem to be written /
>designed? What do they enable you to do --- and what do they, excuse the
>pun, disable?
>Neven Jovanovic
>Zagreb, Croatia
Desmond Schmidt
School of Information Technology and Electrical Engineering,
University of Queensland
Brisbane, Australia

         Date: Sun, 24 Jun 2007 10:09:10 +0100
         From: "John G. Keating" <john.keating_at_nuim.ie>
         Subject: Re: 21.120 ideal readers for a database

Dear Neven,

It has been a busy week and I never managed to get to send the second
half of my reply to your previous message; I was going to say
something about databases.

Traditionally, the user interaction part of (database) software was
the only route to accessing the data. This was typically stored
in some proprietary organisational structure and encoding mechanism,
presumably to protect the intellectual property of the developers,
and to minimise copying and distribution. Nowadays, we want all of
the data to be accessible via the Internet, so client-server models
abound. The problem with the latter approach is that the interaction
mechanisms provided by the client software are not as extensive
as those provided by custom-developed user interfaces -- we end up
with a "same old, same old" approach to layout, interaction, query
models, presentation, etc. The HTTP protocol and web browsers, which
have become the new operating system, are responsible I believe, for
fundamentally limiting the way people think about accessing,
visualising and utilising data extracted from databases. Another
problem is protection of intellectual property, i.e. the encoding
used by the development team is usually "hidden" from the client
software (and the user). You referred to this in your recent post
(the writings of Patrick Rourke).

Usually, I find that there is no grand plan to data architecture; of
course, there need not be! If data are available in one format then
it is usually straightforward to write transformation programs to get
them into another if one has access. This (access issue) is terribly
important, because certain data manipulation and querying mechanisms
are sometimes only achievable if data have a particular structure.
Answering research questions via a series of database queries may rely
on first transforming the data representation in some way.
Mathematicians do this all the time; for example, it may not be
possible to analytically integrate a mathematical function in
real-space, but if it is represented in complex space the integration
becomes doable (analytically). Computer graphics programmers
represent 3D points in a 4D world to help with transformations. I
always encourage my humanities colleagues to think of data
organisation as being flexible; it can and should change as often as
the researcher requires.
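
The graphics example can be made concrete. What follows is a minimal
sketch in Python, not taken from any particular graphics library, and
the function names are hypothetical. A 3D point is given a fourth
coordinate fixed at 1; a translation, which no 3x3 matrix can express,
then becomes an ordinary matrix multiplication.

```python
# Sketch: representing a 3D point in 4D (homogeneous coordinates) so that
# translation becomes a matrix multiplication. All names are illustrative.

def to_homogeneous(point):
    """Append a fourth coordinate w = 1 to a 3D point."""
    x, y, z = point
    return [x, y, z, 1.0]

def from_homogeneous(vec):
    """Divide by w and drop it to recover the 3D point."""
    x, y, z, w = vec
    return (x / w, y / w, z / w)

def translation_matrix(dx, dy, dz):
    """4x4 matrix that shifts a point by (dx, dy, dz)."""
    return [
        [1.0, 0.0, 0.0, dx],
        [0.0, 1.0, 0.0, dy],
        [0.0, 0.0, 1.0, dz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply(matrix, vec):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def translate(point, dx, dy, dz):
    """Move a 3D point by going through the 4D representation and back."""
    moved = apply(translation_matrix(dx, dy, dz), to_homogeneous(point))
    return from_homogeneous(moved)
```

The data are the same point throughout; only the representation changes,
and with it the set of operations that are easy to carry out.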

Doing something like this requires collaboration and agreement
between research groups -- which is probably harder than developing
new software! A co-ordinated approach to data sharing, and
data-encoding standards sharing will go a long way towards developing
database systems that help researchers do more, and do it better.

I really like your analogies with literature -- thank you; I'll share
this with my students -- my personal analogy is formal dancing, where
there are two key participants, the researcher and the software engineer.
The two work together to produce something transient, yet beautiful.
Sometimes one partner takes the lead, or has focus, and then the
other leads. The success and beauty of the dance depends on
co-operation, fluidity of movement, and competencies of the partners.
Finding the right partners is crucial, I expect, for dancing and
software development projects.

This can work, however; let me give you an example (as you asked what
other people are doing). As part of the Irish in Europe project
(http://www.irishineurope.ie/) partners are working towards building
a virtual research platform for humanities researchers. I attended
several colloquia where researchers presented data on their
specialised research; however, I was concerned that no one partner
could compare or merge his or her (prosopographical) data with
another data set in any formal statistical, numerical, or
organisational manner. This seemed the most natural thing for me to
do (I trained as a Physicist) so I asked my History colleagues to
work on an XML schema so that a publicly accessible database could be
built. It has taken years, but we have a schema that most people are
happy with; if we need to change it then we can do that with ease.
Researchers are now offering substantial data sets for inclusion in
the project and we have obtained funding to build an online virtual
collaborative-working environment for contributors, researchers,
students and the general public.

For the moment all of the conversion to XML from other formats will
be performed by myself and my students. Once we have finished the
initial phase of XML encoding we will be in a position to make global
data architectural changes with ease. Having all data in a similar
format, and having an openly accessible schema, means that other
developers can build software tools knowing that if the tools work
with one data set, they will work for all data sets. We also believe
that a concentrated investment in developing open data architecture
standards now is the best way to enable future generations of
researchers and developers to do more, and do it better.
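
The claim that a tool written for one data set will work for all of
them can be illustrated with a small sketch in Python. The element
names (persons, person, name, born) and the sample records are invented
for illustration; they are not the project's actual schema or data.

```python
# Sketch: one tool, many data sets, provided they share a schema.
# Element names and records are hypothetical placeholders.
import xml.etree.ElementTree as ET

DATASET_A = """
<persons>
  <person><name>A. Example</name><born>1701</born></person>
  <person><name>B. Example</name><born>1688</born></person>
</persons>
"""

DATASET_B = """
<persons>
  <person><name>C. Example</name><born>1695</born></person>
</persons>
"""

def birth_years(xml_text):
    """Return the birth years recorded in one data set."""
    root = ET.fromstring(xml_text)
    return [int(p.findtext("born")) for p in root.findall("person")]

def merged_birth_years(*datasets):
    """Merge any number of data sets that follow the same schema."""
    years = []
    for xml_text in datasets:
        years.extend(birth_years(xml_text))
    return sorted(years)
```

Because both inputs follow the same structure, `merged_birth_years`
needs no per-contributor special cases; a data set in a different
layout would first have to be converted.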

We (historians and software engineers) have also been playing with
the idea of incorporating "what can we do" features into the schema,
in addition to the "what is it like" features. One of my MSc students
built a nice piece of software which examined the XML encoded data,
and informed the researchers what type of statistical queries would
be possible using this field.
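
A rough idea of how such a "what can we do" check might look is
sketched below in Python. This is not the student's software: the
field names, the sample records and the lists of suggested queries are
all assumptions made for the sake of the example.

```python
# Sketch: inspect the values of one field and suggest which statistical
# queries make sense for it. Everything here is illustrative.
import xml.etree.ElementTree as ET

SAMPLE = """
<persons>
  <person><born>1701</born><rank>captain</rank></person>
  <person><born>1688</born><rank>ensign</rank></person>
  <person><born>1695</born><rank>captain</rank></person>
</persons>
"""

def field_values(xml_text, field):
    """Collect the non-empty text values of every element named `field`."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(field)
            if el.text and el.text.strip()]

def is_number(text):
    """True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def possible_queries(xml_text, field):
    """Suggest queries based on whether the field is numeric or textual."""
    values = field_values(xml_text, field)
    if not values:
        return []
    if all(is_number(v) for v in values):
        return ["minimum", "maximum", "mean", "histogram"]
    return ["frequency count", "most common value"]
```

Here `possible_queries(SAMPLE, "born")` suggests numeric summaries,
while `possible_queries(SAMPLE, "rank")` suggests counts by category.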

My personal goal (as a software developer working with humanities
researchers) is to develop databases containing lots of
metainformation that will allow computer programs to write
analysis-based computer programs based on researchers' requirements. My ideal
future is where humanities researchers will be presented with a data
set, and accompanying programs that help specify what is required of
the data (queries, visualisation, analyses). These inputs are then
used by the programs to write and develop other computer programs
that are tailored to the researchers' requirements. If the
requirements change, then new programs can be re-developed. I believe
that many computer programs (especially the front-ends to database
systems) are also just data, and can be specified and produced
automatically. I believe this work is necessary because industrial
influences will most certainly ensure that the software we use
"today" won't work "tomorrow". So we need to concentrate some effort
on formal specification of what we expect of data and how we work
with data, in a general sense, not just with front-end software
interfaces to databases. Then, when the software changes, we can
still "do" what we want to "do" regardless of software change. There
is a lot of work to be done in automated construction of ontologies
in the field of software engineering, however!
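
The idea of programs that write other programs from a statement of
requirements can be shown in miniature. The Python sketch below is only
a toy under stated assumptions: the keys of the specification ("field",
"op", "value"), the generated function name and the sample records are
hypothetical, and a real system would need far richer metainformation.

```python
# Sketch: a requirement expressed as plain data is turned into the source
# of a small analysis program, which is then compiled and run.
# Spec keys, names and records are invented for illustration.

ALLOWED_OPS = {"==", "!=", "<", "<=", ">", ">="}

def generate_source(spec):
    """Write Python source for a function that filters and counts records."""
    return (
        "def analysis(records):\n"
        f"    selected = [r for r in records "
        f"if r[{spec['field']!r}] {spec['op']} {spec['value']!r}]\n"
        "    return len(selected)\n"
    )

def build(spec):
    """Compile the generated source and return the resulting function."""
    if spec["op"] not in ALLOWED_OPS:
        raise ValueError("unsupported operator")
    namespace = {}
    exec(generate_source(spec), namespace)
    return namespace["analysis"]

RECORDS = [
    {"name": "A. Example", "born": 1701},
    {"name": "B. Example", "born": 1688},
    {"name": "C. Example", "born": 1695},
]

# The requirement is data; the program is produced from it.
born_before_1700 = build({"field": "born", "op": "<", "value": 1700})
```

If the requirement changes, only the specification is edited and the
program is generated again, for example
`build({"field": "born", "op": ">=", "value": 1700})`.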

So, in response to your question on what is *not* being done -- we
are not preparing for the future; software change and obsolescence
will be detrimental to our progress. Deal with this issue now;
challenge the computer scientists; and work towards developing open
data architecture standards.

Best wishes, John.

Dr. John G. Keating
Associate Director
An Foras Feasa: The Institute for Research in Irish Historical and
Cultural Traditions
National University of Ireland, Maynooth
Maynooth, Co. Kildare, IRELAND

Email: john.keating_at_nuim.ie
Tel: +353 1 708 3854
FAX: +353 1 708 3848

         Date: Sun, 24 Jun 2007 10:26:59 +0100
         From: Willard McCarty <willard.mccarty_at_kcl.ac.uk>
         Subject: Re: 21.120 ideal readers for a database

In Humanist 21.120 Neven develops an implication in John Keating's response,

>that designing a database interface is more alike to staging a play,
>or making a movie, or
>performing an opera --- than to writing a novel.

>...if I want to produce a database of Latin texts, I will go and hire
>somebody to program an interface according to my needs (as John so clearly
>described). Here --- as with a play, a movie, an opera --- the result of
>performing together may be unexpected; the motives and interpretations may
>clash, or go their own separate ways.

Here Neven puts into a nutshell essential dynamics of a collaborative
research project. Part of that which is unexpected, as my colleague
Harold Short says, is the product of a bilateral curiosity. Under the
best of circumstances, exercising this curiosity changes the
understandings with which each side entered into the collaboration.
If for a moment we leave aside the implications of there being a
"prime mover", someone clearly in charge, we get to Neven's question:

>Will the author --- the client --- first sit down and
>meditate on what does he want the database to be used for? Or will he
>simply go to the programmers and say, "well, here are all those classical
>Latin texts, I want to make them searchable" (remember Willard's fable)?

Experience suggests that the author/client is most likely to arrive
with his or her research question in mind but little else, or perhaps
with just a notion that somehow computing may provide some help for a
left-over bit of research that would be nice to tidy away, if
possible. (I suppose that if one did a proper study of the topic, one
would discover a whole range of initial states, from the utterly
vague to the tightly focused, but that the majority of author/clients
would be clear about their research, vague about what computing might
do.) Given curiosity sufficiently robust, what one would hope for is
that a problem emerges in the interaction between the two sides that
provokes new or better research questions *on both sides*.

>Here we come to what Patrick Rourke has written about the TLG. Some
>things were not imaginable at the time the Thesaurus linguae Graecae was
>first designed and produced; immense practical difficulties (coding the
>alphabet, getting texts into the computer) had to be solved --- and were
>solved; questions of copyright were at the horizon (who owns the rights to
>apparatus criticus). It looks like people behind the TLG asked
>themselves, "What is the most time- and energy-consuming task in classical
>philology that the computers can be used for?" And the reply was,

In the Bad Old Days when emergent humanities computing practitioners
were stuck at help-desks, we were told that the question to ask was,
"What do you want to do?" The idea was that the researcher, on the
other side of an institutionally unbridgeable abyss, would then state
a problem for which the technical practitioner would provide a solution
by writing suitable code or, somewhat later, by pulling the right
application off the shelf. Surely this actually happened a fair bit,
but in those Bad Old Days I certainly witnessed the interchange that
we now know to be an achievable -- and frequently achieved -- goal.
So I may begin with the question to Neven, what do you want to be
able to do? But we all know it has to go further than that, much further.

>But what can we do now, what is to be done today? At least two roads seem
>possible. We can look at the discipline --- in my case, the Classics ---
>and ask, what do we do today? And how can we enable people to do more of
>it, and better? Or, we can look at the corpus, and ask: what is *not*
>being done in the discipline today? How can we enable people to do

Indeed -- more of the same faster and better, or something new? How
does one arrive at that "something new"?

>This is why I am interested in what *other* disciplines are doing with
>their corpora and databases.

I hope numerous responses to this interest are made here. Indeed,
facilitating such cross-disciplinary talk is one of the major roles
of humanities computing across the digital humanities and one of its
best rationales. But meanwhile, perhaps as much or more productive,
is for people like me to ask people like Neven, where is your
research running into trouble? Where are the frustrations? What seems
wrong or frustratingly limiting with the software you can get your
hands on? What do the anomalies tell you? The result I expect from
such an interchange is that we are much better able to adumbrate a
data-model or kind of procedure that no existing software or design
for software currently satisfies. Perhaps also -- this is my
particular interest in such interchanges -- we come to a better
understanding of the kind of language and the styles of reasoning we
need to be able to ask the questions we need to ask. Mostly, I think,
we're like Papageno with his padlocked lips, able only to make some
noises -- in tune, if we try very hard.

As we edge away from modelling something we know intellectually how
to model, and administratively how to justify, toward the unknown,
play enters the picture, and moving out into the unknown
conjecturally, with no justification, becomes increasingly important.
If Neven took up some gizmo X, something important might happen. How
do we reduce the cost of that (primarily the time required to do it),
so that he can afford to play? If he were to notice some curious
analogy between what's happening in a text and, say, evolutionary
biology, how can we make it an easy matter to try that analogy out?
How can we maximize the speculative power and flexibility of computing?

Finally to Neven's authorial "prime mover". What about the
consequences of being in the role of the one whose job is to wait to
be moved, and then, when the push comes, having no choice as to
direction? This is even more basic a problem than who gets credit for
publications from a collaborative project.

Consider the builder whom you hire, say to extend your kitchen; let
this person be an imaginative and highly skilled person. In early
days, he or she will be learning constantly, whatever the assignment.
But what are the satisfactions once this builder has learned what is
to be learned from more or less random jobs? What happens to such a
person when he or she gets particularly interested in kitchens, say?
And then what happens when he or she has exhausted the imaginative
possibilities of kitchens?

Consider the cabinetmaker, who learns through apprenticeship, then on
completion of a masterwork, becomes a master of the trade. Great
satisfactions there, I would suppose, no matter what sort of cabinet
is called for.

Consider the architect, who starts as a dogsbody in someone's firm,
then with considerable luck and talent progresses to his or her own
firm. Few will make the grade to the kind of architect who takes on
only the jobs that suit his or her interests. The majority, I
suppose, will spend their careers drawing the designs of others.

Enough, I think, to ask the further question: what kind of social
role are we talking about here? What role will best ensure that when
Neven asks his questions about research in classics there's someone
with the right qualities of mind to engage in a worthy contest of
imaginations with him?


Dr Willard McCarty | Reader in Humanities Computing | Centre for
Computing in the Humanities | King's College London |
http://staff.cch.kcl.ac.uk/~wmccarty/. Et sic in infinitum (Fludd
1617, p. 26).
Received on Sun Jun 24 2007 - 05:45:36 EDT
