19.464 relational database and TEI

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Thu, 1 Dec 2005 07:18:46 +0000

               Humanist Discussion Group, Vol. 19, No. 464.
       Centre for Computing in the Humanities, King's College London
                     Submit to: humanist_at_princeton.edu

   [1] From: "Da Rold, Dr. O." <odr1_at_leicester.ac.uk> (59)
         Subject: RE: 19.458 relational database and TEI

   [2] From: Wendell Piez <wapiez_at_mulberrytech.com> (99)
         Subject: Re: 19.458 relational database and TEI

         Date: Thu, 01 Dec 2005 07:04:33 +0000
         From: "Da Rold, Dr. O." <odr1_at_leicester.ac.uk>
         Subject: RE: 19.458 relational database and TEI

Dear All,

I have been reading your replies with much interest and even more
trepidation. Although I've been working with computing and humanities
for some years now, I still feel very much like a novice, thus many
thanks for all the stimulating views.

I should have perhaps given some background to the project. We are
funded by the AHRC for five years. Since the outset of the project we
decided that we should follow the keywords: longevity,
accessibility and flexibility, which we are eager to implement at all
stages, including the computing side of the data output.

We are determined to publish on our website all our work in progress as
early as June 2006. We do not want to withhold the data until the end of
the project in 2010. These factors generated important considerations
once we had to decide how and what we wanted to do. All AHRC-funded
projects have a technical appendix which gives an indication of
the technical methodology for the electronic output. Our technical
appendix initially proposed the use of TEI and XML for encoding and then

James has summarised the situation exactly:

'I think Orietta's original question stems from the project's dependence
upon the local IT support who understands RDBMS/SQL-based solutions and
has no interest in XML, much less querying it. '

Our project does not have an allocated technical support developer. I am
the research fellow on the project, i.e. I
carry out data analysis, data management, data input, admin. etc. etc. I
am certainly not worried about experimenting with technologies, but I am
concerned that computing experimentation could put the output of our
project in jeopardy. Therefore, in the meantime we need to rely on the
help that our computer center is generously giving us.

Moreover, the aim of the project is to catalogue manuscripts. Our
research questions are demanding and ask for detailed descriptions, but
at the same time I am perfectly aware of the standard set first by
MASTER and then by the P5 TEI guidelines, which cannot be ignored.

Ideally, we'd like the best from both systems: flexibility of querying
to be delivered on the web and standards. (See the paper by Bradley,
the reference for which was posted on the list last week. Thanks Neven,
for attracting my attention to this.)

The data at the end of the project, in five years, will have to be
deposited with the AHDS and they seem to be quite flexible, but I can
understand that from their point of view storing data in XML is the best
option and some of the opinions expressed here give good reasons for

So for the time being, I have started to develop tables and queries in
Access. I followed the principles initially expressed by R.P. Bourret,
'XML and Databases', which Patrick had also clearly elucidated in his

'make sure that whoever designs the database is knowledgeable of TEI
bibliography standards, as the database structure will need to be
compatible with the TEI markup element model'.

We now have 40 tables in our database, some with more than 50 fields in
them (description of single letters for each scribal hand). This should
allow us to get started on the collection of the data. In the meantime I
hope I'll be able to get more training in XQuery and XPath etc., so that
in five years' time we may be able to have a different, if not a more
sophisticated catalogue.
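
As an illustration only, the idea of keeping relational tables "compatible
with the TEI markup element model" might be sketched as follows. This is a
minimal sketch under stated assumptions, not the project's schema: the table
names (manuscript, hand), their columns, and all sample values are invented
for the example; SQLite stands in for Access; and the TEI namespace is
omitted for brevity. The element names (msDesc, msIdentifier, settlement,
repository, idno, physDesc, handDesc, handNote) are taken from the TEI
manuscript description module.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db():
    """Create a tiny in-memory database with hypothetical tables and invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE manuscript ("
        "id INTEGER PRIMARY KEY, settlement TEXT, repository TEXT, idno TEXT)"
    )
    conn.execute(
        "CREATE TABLE hand ("
        "id INTEGER PRIMARY KEY, ms_id INTEGER REFERENCES manuscript(id), "
        "script TEXT, note TEXT)"
    )
    # Invented sample values, for illustration only.
    conn.executemany(
        "INSERT INTO manuscript VALUES (?, ?, ?, ?)",
        [(1, "Exampleton", "Example Library", "MS 1"),
         (2, "Sampleford", "Sample Archive", "MS 2")],
    )
    conn.executemany(
        "INSERT INTO hand (ms_id, script, note) VALUES (?, ?, ?)",
        [(1, "anglicana", "main text hand"),
         (1, "textura", "rubrics"),
         (2, "textura", "single hand throughout")],
    )
    return conn


def export_tei(conn):
    """Map each manuscript row, with its hand rows, onto one msDesc element."""
    root = ET.Element("listBibl")
    rows = conn.execute(
        "SELECT id, settlement, repository, idno FROM manuscript ORDER BY id"
    ).fetchall()
    for ms_id, settlement, repository, idno in rows:
        ms = ET.SubElement(root, "msDesc")
        ident = ET.SubElement(ms, "msIdentifier")
        ET.SubElement(ident, "settlement").text = settlement
        ET.SubElement(ident, "repository").text = repository
        ET.SubElement(ident, "idno").text = idno
        hands = conn.execute(
            "SELECT script, note FROM hand WHERE ms_id = ? ORDER BY id", (ms_id,)
        ).fetchall()
        if hands:
            hand_desc = ET.SubElement(ET.SubElement(ms, "physDesc"), "handDesc")
            for script, note in hands:
                ET.SubElement(hand_desc, "handNote", {"script": script}).text = note
    return root


def shelfmarks_with_script(root, script):
    """Return the idno of every msDesc that has a handNote in the given script."""
    path = f"physDesc/handDesc/handNote[@script='{script}']"
    return [ms.findtext("msIdentifier/idno")
            for ms in root.iter("msDesc") if ms.find(path) is not None]


if __name__ == "__main__":
    tei = export_tei(build_db())
    print(ET.tostring(tei, encoding="unicode"))
    print(shelfmarks_with_script(tei, "anglicana"))
```

The same question ("which manuscripts have an anglicana hand?") could
equally be asked of the tables with an SQL join; the point of the sketch is
only that a schema shaped after the TEI elements makes the later export a
mechanical step. The path expressions used here are the limited XPath subset
supported by Python's standard library, not full XPath or XQuery.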

Thank you all for the very interesting insights.


         Date: Thu, 01 Dec 2005 07:05:24 +0000
         From: Wendell Piez <wapiez_at_mulberrytech.com>
         Subject: Re: 19.458 relational database and TEI

Willard, and HUMANISTs,

At 02:09 AM 11/30/2005, Mark Olsen wrote:
>It could be that David's assertion that Slashdot readers are notoriously
>anti-XML -- and even anti-humanities computing -- is correct and that they
>should be disregarded as such. I have, however, been seeing a lot of
>heated discussion about XML and database theory. Have a peek at Fabian
>Pascal's voluminous rants as one example:
> http://www.dbdebunk.com/index.html
>I suspect that there is some substance to the complaints on both theoretical
>and practical grounds.

Yes, there are voluminous rants to be found, and many of them are
worth the provocation.

There's another perspective on this, however. Coming back from XML
2005, the big annual conference for the XML industry, I note that one
of the strongest overarching themes of the event was coming from the
big vendors -- IBM, Oracle, Microsoft. All of them are keen on
bridging the gap between traditional databases and XML processing.
(Which they clearly see as a huge new thing, as the effects of the
first two or three waves of XML adoption are felt within their
developer communities. But that's no longer news.) While their
visions are disparate in detail (well of course each has a somewhat
different model as to how it is to be done ... a model appropriate to
the platform-specific toolkit each is trying to peddle), they sing in
unison about one thing: the differences between XML and RDBMS are
being reconciled as new modeling, mapping and design methodologies
come into play, "best of breed" technologies emerge and success
stories are written and told. XML vs RDBMS is less and less an
either/or proposition.

Nor are the big developers pushing this out of utopian
high-mindedness. They're doing it because they don't want to wake up
in three years and find someone else eating their lunch, whether
another BigCo, or some upstart they haven't heard of yet. There's a
*large* market in the kind of "solution" we are talking about: a
database of MS descriptions could easily be a database of market
research reports, or drug trials with medical histories, or
environmental impact statements, which are now collated by hand, at
great expense to taxpayers, in proprietary word processing formats.

Naturally it'll take some time before the second wave gets to share
in these advances. You may not yet be able to query your RDBMS in
XQuery, or store your XML in it "natively" and get RDBMS-style
locking, validation, performance, just yet, without springing for the
pricey version of the software, or without pushing out ahead of where
the average IT department wants to go, or without running the risk of
vendor lock-in. Though "XML export" may now make even that a thing of
the past.... :->

But it's happening, so a heated discussion about XML and database
theory (to which Mark refers) is happening with it. It's heated for
the same reason such discussions usually are: there are lots of blind
men crowding up to the elephant, and even more issuing opinions based
on the distant smell of peanuts. Given all this, I'd be extremely
wary of any of their prognostications. Performance doesn't compare?
Wait six months. Functionalities missing? Wait six months. Yet some
of the pundits are undoubtedly right, and some things may *never* be
straightforward, elegant or easy. It's just impossible for the lay
listener to know which of the many pundits is the correct one. Does
it matter whether your mixed content is parsed at runtime, or
compiled into a mini-DOM when the data is loaded? You be the judge!
All of us have opinions: that's what makes us pundits.

As to what Joris says about XSLT:
>* A W3C recommendation by itself is not proof for the technical
>soundness of a proposed solution. Rather it states that it's one of
>a number of possible solutions to a particular problem, being the
>preferred choice of a certain group of people. That doesn't imply
>that the proposed solution is a good solution, a solid one or even a
>nice one. XSLT 1.0 is a W3C recommendation. However, any engineer
>will tell you that XSLT mixes the characteristics of a templating
>language with those of a procedural language. This has caused XSLT
>to be a limping hybrid that's messy in nature and induces messy code
>(which is hard to sustain). Unfortunately the XSLT 2.0 candidate
>makes things worse. So yes, a candidate recommendation will
>propagate certain solutions, but that might not necessarily be a good thing.

Reading this, you'd almost think XSLT, a "limping hybrid", doesn't
work. Yet there's another way to look at the exact same situation.
"Any engineer" will also tell you that a Cuisinart food processor
mixes the characteristics of a knife and a bowl, and is poorly
applied to the proper job of either. Does that make it useless, "not
necessarily a good thing"? Show me the computer language (maybe the
natural language too) whose only instances are graceful, elegant and
clean, and I'll show you a language that never got out of the lab.
(Like the garlic-chopper I saw not long ago which made perfectly
sized and shaped cubes of garlic, each one exactly 1mm across. It was
on the "must go" discount shelf, but even $1.00 was too much to pay.)
XSLT can be graceful, elegant and clean too when used appropriately.
If it's used inappropriately, or by coders who don't know how to
write it well, whose fault is that? If somebody can actually make a
living fixing other people's bad XSLT code, to make it more
maintainable or more easily extended to the next set of problems,
that's an indication that XSLT works, not that it doesn't.


Wendell Piez mailto:wapiez_at_mulberrytech.com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
    Mulberry Technologies: A Consultancy Specializing in SGML and XML
Received on Thu Dec 01 2005 - 02:34:01 EST
