7.0615 Neural Networks and Info Retrieval (1/371)

Elaine Brennan (EDITORS@BROWNVM.BITNET)
Tue, 12 Apr 1994 23:59:15 EDT

Humanist Discussion Group, Vol. 7, No. 0615. Tuesday, 12 Apr 1994.

Date: 12 Apr 1994 09:47:09 -0400 (EDT)
From: ide@cs.vassar.edu (Nancy M. Ide)
Subject: Workshops on Neural Networks and Information Retrieval in Amsterdam


Date: 03 Apr 94 15:21:09 EDT
From: Johannes C. Scholtes <100322.250@CompuServe.COM>
Subject: Workshops on Neural Networks and Information Retrieval in Amsterdam

Preliminary Program

Neural Networks and Information Retrieval in a Libraries Context

Amsterdam, The Netherlands
Friday June 24, 1994 and Friday September 16, 1994


M.S.C. Information Retrieval Technologies BV, based in Amsterdam, the
Netherlands, is currently undertaking a study on Neural Networks and
Information Retrieval in a Libraries Context, in collaboration with the
Department of Computational Linguistics of the University of Amsterdam and
the Department of Information Technology and Information Science at
Amsterdam Polytechnic. This study is funded by the European Commission as a
complementary measure under the Libraries Programme.

In this study, the general application of artificial neural network (ANN)
technology to information retrieval (IR) problems is investigated in a
libraries context. Typical applications of this technology are advanced
interface design, current awareness, selective dissemination of
information (SDI), fuzzy search and concept formation.

In order to discuss and disseminate the results obtained through this
study, two one-day workshops will be organized by M.S.C. Information
Retrieval Technologies BV, the first one after compilation of the State of
the Art Report and the second one after completion of the prototyping and
experimentation phase.

During both workshops, there will be ample room for discussion of how to
commercialise such applications of ANN in a libraries context.

Both workshops are open to participants from other organizations,
commercial and academic, that are interested in various applications of
ANNs in existing library systems.

Who should attend:

The workshops will be of interest to:

- Computer Companies
- Information Management and Supply Companies
- Government Agencies
- Libraries
- Universities and Polytechnics

that are interested in:

- Neural Networks
- Information Retrieval
- Library Science
- Natural Language Processing
- Advanced Computer Science
- Data Compression

For applications such as:

- Current Awareness
- Selective Dissemination of Information (SDI)
- Information Filtering
- Automatic Content-Based Information Distribution
- Categorization
- Advanced Interface Design
- Fuzzy Retrieval (of information recognized by Optical Character
  Recognition and Speech Recognition)
- Retrieval Generalization
- Thesaurus Generation
- Information Compression
- Jukebox Staging



General Information

Costs per participant for both days:

Commercial companies                           Dfl. 950,-
Universities and non-profit institutions (*)   Dfl. 500,-
Students (*)                                   Dfl. 150,-

(*) A letter from the university or non-profit institution must be shown
at registration.

These costs include:

- Workshop Proceedings
- State of the Art Report on Neural Networks in Information Retrieval,
  as compiled by MSC
- Achievements Report on Neural Networks in Information Retrieval,
  as compiled by MSC
- Coffee and tea throughout the day
- Lunch
- Dinner
- Future mailings on progress
- Limited availability of travel grants for students (please apply)

All other expenses, such as travel, hotels and short stays, are not
included in the fee.

Payment

The following payment methods are accepted:

1. Credit Cards
2. Prepayment by bank
3. Personal cheques

More information:

M.S.C. Information Retrieval Technologies BV
Dr Johannes C. Scholtes
Dufaystraat 1
1075 GR AMSTERDAM
the Netherlands

Telephone: +31 20 679 4273
Fax: +31 20 6710 793
Internet: 100322.250@compuserve.com or scholtes@msc.mhs.compuserve.com
Compuserve: MHS: SCHOLTES@MSC or 100322,250

Background & Introduction

Recent research on artificial neural networks (ANN) in pattern recognition
and pattern classification applications has provided successful
alternatives to traditional techniques. Products for optical character
recognition (OCR), speech recognition, handwritten character recognition
and prediction of non-linear time series are good examples of the
commercialization of these ANN techniques.

So far, the European Commission has funded more than 40 projects of
different sizes under the ESPRIT and other programmes which involve
research on or the application of ANN technology.

The task of Information Retrieval (IR), that is, the matching of a large
number of documents against a query, can also be seen as a pattern
recognition or pattern classification task. There have therefore been
several attempts to apply ANN to IR in order to increase the quality of
the retrieval process.

Despite the theoretical and practical evidence that ANNs are good tools for
pattern recognition tasks, it is still an open question whether they are
appropriate tools within the specific domain of Bibliographic Information
Retrieval. Apart from a few minor studies, it seems that no real attempt
has yet been made to integrate an ANN as a main component of a
bibliographic information retrieval system or an on-line public access
catalogue (OPAC). It is therefore not clear whether and how ANN techniques
can be combined with more "classical" methods, for instance rule-based or
statistical approaches. Likewise, it is not clear to what extent existing
OPACs could benefit from ANN technology.

Objectives

The objectives of this study are:
- to ascertain the state of the art of the application of Artificial
  Neural Net (ANN) technology to Information Retrieval (IR), with
  particular emphasis on bibliographic information in a libraries context;
- to assess the (potential) quality of ANN-based approaches to IR in this
  particular domain of interest, in comparison with traditional practices.
  Here, "quality" must be understood in terms of both (measurable)
  efficiency and practical benefits;
- to stimulate interest in the practical application of ANN technology to
  bibliographic information retrieval in a libraries context.

Information Retrieval

It can be argued that Information Retrieval (IR) is the ultimate
combination of Natural Language Processing (NLP) and Artificial
Intelligence (AI). On the one hand, there is an enormous amount of
natural-language data that needs to be processed and understood to return
the proper information to the user. On the other hand, one needs to
understand what the user intends with his or her query, given the context
of the other queries and some kind of user model.

Most current IR systems still use techniques that were developed over
thirty years ago and that implement nothing more than a global surface
analysis of the textual (layout) properties. No deep structure whatsoever
is incorporated in the decision whether or not to retrieve a text.

There is one major dilemma in IR research. The data collections are so
large that any method other than a global surface analysis would fail.
However, such a global analysis could never implement a contextually
sensitive method to restrict the number of candidates returned by the
retrieval system.

Information retrieval can also be a very frustrating area of research.
Whenever one invents a new model, it is difficult to show that it works
better (qualitatively and quantitatively) than any previous model. The
addition of new dependencies often results in much too slow a system.
Systems such as Salton's SMART have existed for over 30 years without any
serious competition.

The field of information retrieval would be greatly indebted to a method
that could incorporate more context without slowing down. Since computers
are only capable of processing numbers within reasonable time limits, such
a method should be based on vectors of numbers rather than on symbol
manipulation. This is exactly where the challenge lies: on the one hand,
to maintain speed, and on the other, to incorporate more context.
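As a minimal sketch of what a method "based on vectors of numbers" can look like, here is the classic vector space model: the query and each document become term-frequency vectors, and documents are ranked by cosine similarity. This is the textbook technique associated with SMART-style systems, not a method proposed by this study; the whitespace tokenisation, the function names and the toy documents are assumptions for illustration only:

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector (assumed: lower-case, split on whitespace)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank(query, documents):
    """Return (score, doc_id) pairs, best match first."""
    q = term_vector(query)
    scores = [(cosine(q, term_vector(text)), doc_id)
              for doc_id, text in documents.items()]
    return sorted(scores, reverse=True)

# Hypothetical toy collection.
docs = {
    "d1": "neural networks for information retrieval",
    "d2": "library catalogue records and subject headings",
    "d3": "retrieval of library information",
}
for score, doc_id in rank("library information retrieval", docs):
    print(f"{doc_id}: {score:.3f}")
```

Only arithmetic on term counts is involved, which is what keeps this kind of matching fast; the price is that word order and context are ignored entirely, which is precisely the limitation the paragraph above describes.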

Artificial Neural Networks

The connectionist approach offers a massively parallel, highly distributed
and highly interconnected solution for the integration of various kinds of
knowledge, with preservation of generality. It might be that connectionism
or neural networks (despite all the currently unsolved questions concerning
learning, stability, recursion, firing rules, network architecture, etc.)
will contribute to research in natural-language processing and information
retrieval.

Distributed data representation may solve many of the unsolved problems in
IR by introducing a powerful and efficient knowledge integration and
generalization tool. However, distributed data representation and
self-organization trigger new problems that should be solved in an elegant
manner.

Current Problems in Information Retrieval

The main objectives of current IR research can be characterised as the
search for systems that exhibit adaptive behaviour, interactive behaviour
and transparency. More specifically, such systems should implement
properties for:
- understanding incomplete queries or making incomplete matches;
- understanding vague user intentions;
- generalising over queries as well as over query results;
- adapting to the needs of an evolving user (model);
- allowing dynamic relevance feedback;
- helping the user browse intelligently through the data; and
- adding (language) context sensitivity.

Different Approaches in Information Retrieval and Neural Networks

Two main directions of neural-network-related research in information
retrieval can be observed.

First, there are relatively static databases that are searched with a
dynamic query (free-text search, also known as document retrieval systems).

Second, there are more dynamic databases that need to be filtered against
a relatively static query (the filtering problem, also known as current
awareness or Selective Dissemination of Information, SDI). In the first
case, the data can be preprocessed because they are static. In the second
case, the amounts of data are so large that there is no time whatsoever
for a preprocessing phase: each incoming document must be matched directly,
in a single context-sensitive "hit-and-go" pass.
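The filtering case can be illustrated with a minimal sketch, assuming an interest profile made of plain keywords with equal weights and a fixed score threshold; the names `make_profile` and `filter_stream` and all parameters are hypothetical, not part of the study. Each document is scored once, as it arrives, and no index over the stream is ever built:

```python
from collections import Counter

def make_profile(interest_terms, weight=1.0):
    """Static interest profile (assumed): each term of interest gets the same weight."""
    return {t.lower(): weight for t in interest_terms}

def filter_stream(profile, stream, threshold):
    """Score each incoming (doc_id, text) pair once, as it arrives, and
    yield those whose profile score reaches the threshold. The stream is
    never indexed or stored; only the small profile is kept in memory."""
    for doc_id, text in stream:
        counts = Counter(text.lower().split())
        # A profile term contributes its weight if it occurs in the document.
        score = sum(w for term, w in profile.items() if counts[term] > 0)
        if score >= threshold:
            yield doc_id, score

profile = make_profile(["neural", "retrieval", "library"])
incoming = [
    ("n1", "A neural model for document retrieval"),
    ("n2", "Weather report for Amsterdam"),
    ("n3", "Library automation survey"),
]
for doc_id, score in filter_stream(profile, incoming, threshold=2.0):
    print(doc_id, score)  # prints: n1 2.0
```

Because `filter_stream` is a generator, it works equally well on an unbounded feed; the only state it holds is the profile itself.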

Early neural models fit well with the paradigms currently used in
information retrieval: index terms can be replaced by processing units,
hyperlinks by connections between units, and network training resembles
the index normalisation process. However, these models do not fit well
with the general notion of neural networks.
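The mapping described above (index terms as units, links as connections) can be sketched as one round of spreading activation over a toy term-document network. This is a minimal illustration under stated assumptions: the index, its weights and the function name `spread` are hypothetical, and the training/normalisation step is left out:

```python
from collections import defaultdict

# Hypothetical toy index: each document unit is connected to the term
# units that index it, with a connection weight.
links = {
    "d1": {"neural": 1.0, "network": 1.0, "retrieval": 0.5},
    "d2": {"library": 1.0, "catalogue": 1.0},
    "d3": {"retrieval": 1.0, "library": 0.5},
}

def spread(query_terms, links):
    """One round of spreading activation: clamp the query term units to 1.0,
    pass activation along term->document connections, then pass the
    document activations back along the same connections to term units."""
    doc_act = defaultdict(float)
    for doc, terms in links.items():
        for term, w in terms.items():
            if term in query_terms:
                doc_act[doc] += w  # query term unit activation is 1.0
    term_act = defaultdict(float)
    for doc, a in doc_act.items():
        for term, w in links[doc].items():
            term_act[term] += a * w
    return dict(doc_act), dict(term_act)

docs, terms = spread({"retrieval"}, links)
print(docs)   # document activations: d1 0.5, d3 1.0
print(terms)  # "library" is activated although it was not in the query
```

The backward pass is what gives such networks their associative flavour: terms that co-occur with the query in retrieved documents become active and could be offered to the user as related terms.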

In addition, it is difficult to imagine what to teach a neural information
retrieval system if it is trained with a supervised learning algorithm. The
address space will almost always be too limited for the large amounts of
data to be processed. Training on structured (query, retrieved document
numbers) pairs does not seem plausible either, given the restricted memory
of (current) neural network technology. Nevertheless, most of the neural IR
models found in the literature are based on these principles.

So-called clustering networks are also problematic. Because of the large
amounts of data in free-text databases, clustering is very expensive and
is therefore considered of little use in changing information retrieval
environments.

More interesting are the unsupervised, associative-memory types of model,
which can be used to implement a specific pattern-matching task. This type
of neural network can be particularly useful in a filtering application:
the memory demands of the network only need to scale with the size of the
query (or interest profile), not with the size of the entire database. It
is in this area that neural networks are expected to be most useful and
relevant for information retrieval.
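To make the memory argument concrete, here is a minimal sketch using a Hopfield-style autoassociative memory, a standard associative-memory model swapped in for illustration rather than the model used in the study. Interest profiles are stored as +1/-1 patterns over a small, hypothetical profile vocabulary; the weight matrix is n x n in the vocabulary size and does not grow with the number of documents that pass through. An incoming document that partly matches a stored profile is completed to that profile. All names, terms and profiles are assumptions:

```python
# Hypothetical profile vocabulary: the memory is sized by these terms only,
# not by the number of documents in the collection or stream.
VOCAB = ["neural", "network", "retrieval", "library",
         "catalogue", "speech", "ocr", "compression"]

def to_bipolar(terms):
    """Encode a set of terms as a +1/-1 pattern over VOCAB."""
    return [1 if t in terms else -1 for t in VOCAB]

def to_terms(pattern):
    """Decode a +1/-1 pattern back into the set of active terms."""
    return {t for t, v in zip(VOCAB, pattern) if v == 1}

def train(patterns):
    """Hebbian (outer-product) learning with a zero diagonal."""
    n = len(VOCAB)
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / n
    return W

def recall(W, probe, max_sweeps=10):
    """Asynchronous Hopfield updates until no unit changes state."""
    s = list(probe)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(W[i][j] * s[j] for j in range(len(s)))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s

# Two stored interest profiles (hypothetical).
profiles = [to_bipolar({"neural", "network", "retrieval"}),
            to_bipolar({"library", "catalogue", "retrieval"})]
W = train(profiles)

# An incoming document that only partly matches the first profile.
doc = to_bipolar({"neural", "retrieval", "speech"})
print(sorted(to_terms(recall(W, doc))))  # prints ['network', 'neural', 'retrieval']
```

The two stored profiles here are chosen to be orthogonal, which keeps recall clean; with many overlapping profiles, a memory of this kind is known to produce spurious mixtures, so capacity would need to be checked before relying on it.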

Topics such as fuzzy retrieval, current awareness, SDI, concept formation
and advanced interface design are especially within the scope of the
project. However, input from the workshops is very important for the final
choice of research direction.

Program

Day 1: June 24, 1994

9.15-9.30 Welcome and Introduction
Dr Ir Johannes C. Scholtes, President of MSC Information Retrieval
Technologies B.V.

9.30-11.00 Tutorial on Neural Networks (Back Propagation, Kohonen Feature
Maps)
Dr Ir Johan Henseler, Forensic Laboratories, Head of Section Computer
Criminality

11.00-11.15 Break

11.15-12.30 Information Retrieval Applications in Libraries
Dr E. Sieverts, Professor at Amsterdam Polytechnic, Library Program

12.30-13.30 Lunch

13.30-15.00 Presentation of Findings & State of the Art Report

15.00-15.15 Break

15.15-16.00 Directions for (Commercial) Applications
Dr Ir Johannes C. Scholtes

16.00-17.00 Panel Discussion

17.00-18.00 Reception

19.00-... Dinner and evening program


Day 2: September 16, 1994


9.15-9.30 Welcome and Introduction
Dr Ir Johannes C. Scholtes, President of MSC Information Retrieval
Technologies B.V.

9.30-11.00 Achievements
Dr Ir Johannes C. Scholtes, President of MSC Information Retrieval
Technologies B.V., and Dr E. Sieverts, Professor at Amsterdam Polytechnic,
Library Program

11.00-12.30 Hands-on demonstrations

12.30-13.30 Lunch

13.30-15.00 Problem Issues
Dr E. Sieverts, Professor at Amsterdam Polytechnic, Library Program

15.00-15.15 Break

15.15-16.00 Commercial Implications
Dr Ir Johannes C. Scholtes, President of MSC Information Retrieval
Technologies B.V.

16.00-17.00 Panel Discussion

17.00-18.00 Reception

19.00-... Dinner and evening program


During the day, demos of the prototypes will be available to workshop
participants. Each demo will be guided by a specialist who demonstrates
the software.