19.436 relational database and TEI

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Thu, 24 Nov 2005 07:34:19 +0000

               Humanist Discussion Group, Vol. 19, No. 436.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                        www.princeton.edu/humanist/
                     Submit to: humanist_at_princeton.edu

         Date: Thu, 24 Nov 2005 07:18:08 +0000
         From: "Patrick T. Rourke" <ptrourke_at_methymna.com>
         Subject: Relational database and TEI?

It's hard to be sure what your requirements are without knowing for
sure how ambitious your goals are. I haven't read your website very
carefully, but scanning the informational pages I wasn't certain
whether you wanted to catalogue bibliographical data about the MSS or
actually wanted to store complete texts.

Access allows export to delimited text, XML, Excel, Word, and some
Paradox and dBase formats. The main reasons to use Access are that
only a few users will be accessing the data at one time; that you are
dealing with relatively small datasets (thousands or tens of
thousands of records, fewer than 255 fields per record, and
relatively small fields - i.e., I would not put the Pearl ms in a
single field in Access); that you are planning on writing your
application in ASP.NET; and that you want something that can be
easily migrated to Microsoft's SQL Server software.

I haven't really worked with Filemaker much, but I imagine that for
scalability it is similar to Access.

That said, you don't want to think of a database in the same way that
you think of e.g. a word processing application. In selecting a
database package, any export format other than text, XML, and SQL is
redundant. You really do not want to be exporting data directly from
a database as PDF; rather, you want to pull the data from the
database via SQL and format it programmatically.
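
To make "pull the data via SQL and format it programmatically" a
little more concrete, here is a minimal sketch in Python. It uses an
in-memory SQLite database purely as a stand-in for whatever SQL
server you end up with; the table name (manuscripts), its columns,
and the sample rows are all assumptions invented for illustration.

```python
import sqlite3


def format_entries(conn):
    """Return one formatted line per manuscript, ordered by shelfmark."""
    rows = conn.execute(
        "SELECT shelfmark, title, date_text "
        "FROM manuscripts ORDER BY shelfmark"
    )
    # The presentation lives here, in code, not in the database package.
    return [
        f"{shelfmark}: {title} ({date_text})"
        for shelfmark, title, date_text in rows
    ]


def demo():
    """Build a throwaway database with hypothetical rows and format it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE manuscripts ("
        "shelfmark TEXT PRIMARY KEY, title TEXT, date_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO manuscripts VALUES (?, ?, ?)",
        [
            ("MS B", "Second example", "s. xv"),
            ("MS A", "First example", "s. xiv"),
        ],
    )
    return format_entries(conn)


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The same query result could just as well be fed to an HTML template
or a PDF library; the database only has to speak SQL.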

Also, if you expect that more than 50 scholars will be using your
database daily, most likely you will not want web pages to be driven
by either Access or Filemaker. Your computing center is right to
suggest an SQL server (whether Oracle, MS SQL Server, PostgreSQL,
MySQL, or another full RDBMS).

It sounds as though your computing center is suggesting Access so
that they do not have to write a customized data entry system for you
to populate the SQL database directly, and so that you will not have
write access to their SQL server. Rather, they want you to use Access
as a data entry user interface, and they plan on importing the Access
database into the SQL server themselves. The combination of Visual
Studio with ASP.NET, Windows Server 2003, and Microsoft SQL Server is
very popular with commercial programmers (mainly because they can
develop quickly with the system; fortunately, if used
conscientiously, the ASP pages will be readable with any browser on
any platform), and it sounds to me as though this is what your
academic computing center wants to use. That said, do not let them
simply push the Microsoft paradigm without asking them to consider
alternatives - e.g., a PHP-Apache-MySQL solution - and to explain why
those are not preferable. But do accept "we have no experience with
that platform" as a reasonable (if not solely sufficient) argument
against alternatives.

If you are only collecting bibliographical data, you should simply
store that as text information in the database, and format it as TEI
on export - make sure that whoever designs the database is
knowledgeable about TEI bibliography standards, as the database
structure will need to be compatible with the TEI markup element model.
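
As a rough illustration of "store as text, format as TEI on export",
the sketch below turns one database row into a TEI <bibl> element.
The row is represented as a plain Python dict, and the choice of
columns (author, title, date) is an assumption made for the example;
a real catalogue would need a fuller mapping agreed with whoever
knows the TEI bibliography model.

```python
import xml.etree.ElementTree as ET


def row_to_bibl(row):
    """Serialize one bibliographic row (a dict) as a TEI <bibl> string.

    Only the hypothetical columns author, title and date are mapped,
    each to the TEI element of the same name. Empty or missing values
    are skipped rather than written out as empty elements.
    """
    bibl = ET.Element("bibl")
    for tag in ("author", "title", "date"):
        value = row.get(tag)
        if value:
            ET.SubElement(bibl, tag).text = value
    # encoding="unicode" yields a str with no XML declaration;
    # special characters such as & are escaped by the serializer.
    return ET.tostring(bibl, encoding="unicode")


if __name__ == "__main__":
    print(row_to_bibl({"author": "A", "title": "T & U", "date": "s. xiv"}))
```

The point is that the database columns stay plain text, and the
markup is applied in one place at export time.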

If you are going to be storing actual document text and presentation
information (including alternate readings, marginalia, etc.), rather
than merely standard bibliographical data, you do want to use XML in
the actual text fields. Make sure that your computing center
programmers are aware that you need to be able to input TEI document
fragments, in UTF-8, and how long each document fragment will be (the
whole question of how to partition a document across a database table
is very tough), and how many there will be. They may not have
experience dealing with large, non-hierarchical, richly structured
documents (most programmers rarely deal with individual text
fragments longer than a blog posting), so it might help to give them
a copy of a good textual critical edition with a comprehensive
apparatus criticus and commentary, to give them some idea of the
scale of the undertaking and the scale of the texts involved. Also
show them an example of richly TEI encoded text.
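
One possible way to hold TEI fragments in a table, sketched in Python
with SQLite standing in for the real server. The table layout (a
document id, a sequence number, and a text column) is only an
assumption, as are the function names; it also assumes each fragment
is a single well-formed element, which sidesteps rather than solves
the partitioning problem mentioned above.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_fragments(conn, doc_id, fragments):
    """Store fragments in order, rejecting any that is not well-formed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fragments ("
        "doc_id TEXT, seq INTEGER, xml TEXT, "
        "PRIMARY KEY (doc_id, seq))"
    )
    # Check everything first so a bad fragment stores nothing at all.
    for fragment in fragments:
        ET.fromstring(fragment)  # raises ET.ParseError if ill-formed
    conn.executemany(
        "INSERT INTO fragments VALUES (?, ?, ?)",
        [(doc_id, seq, frag) for seq, frag in enumerate(fragments, start=1)],
    )
    conn.commit()


def load_document(conn, doc_id):
    """Reassemble a document by concatenating its fragments in order."""
    rows = conn.execute(
        "SELECT xml FROM fragments WHERE doc_id = ? ORDER BY seq",
        (doc_id,),
    )
    return "".join(xml for (xml,) in rows)


def utf8_size(fragment):
    """Length of a fragment in bytes once encoded as UTF-8."""
    return len(fragment.encode("utf-8"))
```

Running utf8_size over a sample of real fragments is a cheap way to
give the programmers the field sizes they will ask about.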

If you are also planning on storing images of individual MS leaves,
you really have no choice: you will have to use a more robust RDBMS
than Access (or, I suspect, Filemaker), even for data entry.

Patrick Rourke

On Nov 23, 2005, at 1:50 AM, Humanist Discussion Group (by way of
Willard McCarty <willard.mccarty_at_kcl.ac.uk>) wrote:

>We are almost 8 months into our project (link below) and after
>numerous discussions with colleagues, trials and advice from our
>computing centre here in Leicester, we are going to catalogue our
>manuscripts creating a data-capture database. Our computer centre
>suggested managing our data from Microsoft Access on my desk machine,
>but then publish it for Web access on the university SQL server. I am
>some how hesitant to use Access as it does not seem to have any
>flexibility in exporting the data, a part from SQL and XML. FileMaker
>Pro 8 seems to have more flexibility at this hand exporting in PDF,
>SQL, XML and excel.
>Does anybody in humanist similar experience with either/ or both
>software?
Received on Thu Nov 24 2005 - 02:48:01 EST
