18.474 indexing local machines

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Sat, 8 Jan 2005 10:25:52 +0000

               Humanist Discussion Group, Vol. 18, No. 474.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                        www.princeton.edu/humanist/
                     Submit to: humanist_at_princeton.edu

   [1] From: Jan Christoph Meister <jan-c-meister_at_uni-hamburg.de> (45)
         Subject: Re: 18.463 indexing local machines

   [2] From: Amanda French <Amanda_French_at_ncsu.edu> (65)
         Subject: Re: indexing local machines

--[1]------------------------------------------------------------------
         Date: Sat, 08 Jan 2005 10:16:29 +0000
         From: Jan Christoph Meister <jan-c-meister_at_uni-hamburg.de>
         Subject: Re: 18.463 indexing local machines

07:51
08.01.2005

A most interesting topic indeed! However, let me pose a trivial
question: how good is an automatic indexing tool when it comes to
searching for NEW information, or for information that might be
available (local or distributed), but classified, contextualized and,
most importantly, VERBALIZED in an unanticipated manner?

It seems that clever algorithms plus enough brute force computing
power have made obsolete the deeply nested systematic index structures
of yore. Having a rough idea of what you're looking for is good
enough to make you choose the Google over the Yahoo-directory route.
However, the snag is that you can only find what you are able to
represent in a string that matches or approximates something captured
in the index database. That's exactly why libraries have systematic
catalogues: we
can do deductive searches, starting with some generic top-level
concept and then drill down to find the new and unexpected (rather
than simply re-find stuff we knew we had somewhere, but just couldn't
trace). To put it in a philosophical nutshell: if we decide to go the
unstructured route and subscribe to what I'd like to call the 'Google
paradigm' of knowledge representation, then aren't we locking ourselves
into the static configuration of knowledge as we have it here and now,
expressed in the strings indexed by the machine? In fact, is an
unsystematic index 're-presenting' knowledge in the first place? Is it
not just simply defining its relational coordinates in the database,
without any semantic surplus value generated that would allow us to
retool and reconfigure our knowledge elements?
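
A minimal sketch of this contrast, in Python, assuming a hypothetical
three-document collection. The titles, the catalogue classes and the
helper functions build_keyword_index, keyword_search and build_catalogue
are all invented for illustration and are not taken from any of the
tools discussed. It suggests how a keyword index only returns titles
that literally contain the query string, whereas a systematic catalogue
can be browsed from a top-level class downwards:

```python
from collections import defaultdict

# Hypothetical collection: title -> (top-level class, subclass).
DOCS = {
    "Narrative voice in the novel": ("Literature", "Narratology"),
    "Story generator algorithms": ("Computing", "Text generation"),
    "Plot grammars and fabula": ("Literature", "Narratology"),
}


def build_keyword_index(docs):
    """Inverted index: lowercase word -> set of titles containing it."""
    index = defaultdict(set)
    for title in docs:
        for word in title.lower().split():
            index[word].add(title)
    return dict(index)


def keyword_search(index, query):
    """Return the titles that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def build_catalogue(docs):
    """Systematic catalogue: class -> subclass -> sorted list of titles."""
    tree = defaultdict(lambda: defaultdict(list))
    for title, (top, sub) in docs.items():
        tree[top][sub].append(title)
    return {
        top: {sub: sorted(titles) for sub, titles in subs.items()}
        for top, subs in tree.items()
    }


index = build_keyword_index(DOCS)
catalogue = build_catalogue(DOCS)

# No title contains the word "narratology", so the string search is empty.
print(keyword_search(index, "narratology"))

# Drilling down from the generic class still reaches both titles.
print(catalogue["Literature"]["Narratology"])
```

In this toy case the string search fails because the classifying term
never appears in the indexed text, while the drill-down succeeds because
someone assigned each item to a class beforehand.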

Caveat: I haven't tried out any of the tools mentioned, so I might be
completely off the track here ...

Chris

*******************************

Jan Christoph Meister
Forschergruppe Narratologie
Universität Hamburg

Skype: jcmeister
Mail: jan-c-meister_at_uni-hamburg.de
Office: +49 - 40 - 42838 4994
Cell: +49 - 0172 40 865 41

My site: www.jcmeister.de

ACP: Computer Philology Working Group at Hamburg University
            www.c-phil.uni-hamburg.de

NarrNet: the Information hub for Narratologists
            www.narratology.net

SGA: Story Generator Algorithms
            www.rrz.uni-hamburg.de/story-generators

--[2]------------------------------------------------------------------
         Date: Sat, 08 Jan 2005 10:17:39 +0000
         From: Amanda French <Amanda_French_at_ncsu.edu>
         Subject: Re: indexing local machines

I'm interested in the discussion about indexing local machines. On the one
hand, it's a rather mundane issue that has to do with maximizing one's own
quotidian productivity. On the other hand, it speaks to the larger
long-term issue of record preservation and access. Will the scholar writing
a biography of Willard McCarty and/or a history of humanities computing a
hundred years from now be able to find and use all the important secondary
material Willard has produced that exists only digitally?

Traditional archivists, as we know, have evolved a system for dealing with
the fact that everyone organizes their records differently. That system
could be summed up as follows: Leave the records as they are as much as
possible. Describe exhaustively what's there and how it's organized. The
chief problem with this archival system--which produces amazingly
impressive results--is that that exhaustive description is, well,
exhausting. There's simply not enough time and labor available to do it
properly, and as a result there are scads of unprocessed and therefore
unusable material in archives ("hidden collections" is the euphemism).

But there are two reasons why archivists have chosen to do it that way
rather than by disassembling everyone's records and forcing them into a
single rational scheme, which would almost certainly be quicker. The first
reason is that it would be difficult for everyone to agree on which single
scheme to use. Records could be organized by medium, genre, topic or
what-have-you, and there are just too many instances of items that don't
fit comfortably anywhere. Willard's "bit bin" approach solves that; if
everything is simply in a big barrel titled "Willard McCarty's Stuff,"
that's simple and comprehensive, and tools that do the equivalent of
keyword searching will be sufficient to aid a future researcher. (Although
as things stand we still have to differentiate between media. Mostly you
can't keyword search the _content_ of an image, audio, or video file; you
can only search its filename or metadata. At least as far as I know. Though
see http://speechbot.research.compaq.com/ for a description of a
speech-recognition search engine in development at Hewlett-Packard.)

The second reason archivists choose to maintain someone's organizational
system, however, is that the way people choose to organize their records is
itself a carrier of meaning--a record in itself. For myself I find it hard
to imagine _not_ filing emails and documents on my local machine in as
rational a system as I can manage to devise, because the very act of
developing (and re-developing and re-developing) that system helps me
understand what it is that I'm doing. Speaking from the mundane personal
level, I'll reveal that I try to make sure that my browser bookmarks, my
documents, and e-mail directories all have the same file structure; if I'm
teaching a class called "Humanities Computing 101," for instance, I'd have
a folder with that title in my bookmarks, e-mail, and hard-drive document
folder. I do think that this aids me in my work, because there comes a
point when that class is over and all those folders need to be retired into
my archives. Without a structured filing system (not an overly byzantine
one), I think I would feel hard-pressed to discern the order in my very
life and work! The issues are the same with the Web; you can find things
quickly but you're aware that there's an enormous underlying chaos behind
it that is sometimes exhilarating and sometimes threatening.

So to sum up, I'm interested to learn about this x1 program, and I might
well experiment with some unstructured structures on my hard drive. But I'd
also be interested to learn what basic structures you've retained,
Willard--surely there are some?

Amanda French
(former lurker)

--
Amanda L. French, Ph.D.
CLIR Post-Doctoral Fellow in Scholarly Information Resources
Digital Library Initiatives
NCSU Libraries, Box 7111
Raleigh, NC 27695-7111
Tel: 919-513-0211
Fax: 919-515-3031
Mob: 720-530-7515
http://www4.ncsu.edu/~alfrench
Received on Sat Jan 08 2005 - 05:33:51 EST
