16.397 Workshop on Shallow Processing of Large Corpora

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Fri Jan 03 2003 - 03:53:23 EST


               Humanist Discussion Group, Vol. 16, No. 397.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                     Submit to: humanist@princeton.edu

         Date: Fri, 03 Jan 2003 08:47:18 +0000
         From: "Kiril Simov" <kivs@bultreebank.org>
         Subject: Shallow Processing of Large Corpora - Second Call for Papers

Second Call for Papers

        Workshop on Shallow Processing of Large Corpora
           http://www.bultreebank.org/SProLaC.html
                        (SProLaC 2003)
                   CORPUS LINGUISTICS 2003
          Lancaster University (UK), 27 March, 2003

The workshop will take place on the 27th of March 2003 at the
CORPUS LINGUISTICS 2003 Conference at Lancaster University (UK).
http://www.comp.lancs.ac.uk/ucrel/cl2003/

Workshop motivation and aims:

Corpora have developed in two main directions:

- large corpora of at least 100 million tokens, and

- small corpora of up to 1 million tokens.

The data in the former are only morpho-syntactically annotated,
while the data in the latter are assigned more detailed syntactic
and semantic information. Needless to say, both types of
language corpora are valuable. However, the question arises
whether it is possible to build a really large corpus that is
fully processed linguistically. Since this is a hard task that
involves metadata problems (theories, availability of appropriate
tools, etc.), we put the stress on shallow parsing of unrestricted
data. In our view, creating such a resource automatically is a
task of great importance. It would serve as a
template for linguistic research, consistency checking and
validation, large-scale applications in Information Retrieval
and Information Extraction, testing of machine learning
algorithms and many other uses. This task involves related
subtasks, such as an adequate combination of diverse
shallow processing techniques in a sound and robust
processor, and smoothing shallow parsing approaches
for stages of deeper linguistic analysis.

The workshop aims to be a forum for researchers to
present their work in the area of Computational Corpus
Linguistics and Language Engineering and to discuss
the problems in design, management, linguistic interpretation
and exploration of unrestricted data from both perspectives.

We envisage a one-day workshop and 10-12 presentations.

Topics of interest:

      - design principles for shallow-parsed large corpora;
      - text segmentation and preprocessing;
      - definition of the connection between the levels
        of processing;
      - chunk and partial parsing of large amounts of texts;
      - machine learning methods with large coverage;
      - software systems for management and accessibility
        to shallow-parsed large corpora;
      - applications of shallow-parsed large corpora.

There will be a general discussion at the end of the workshop.

Important dates:

Deadline for workshop abstract submission: 10th January 2003
Notification of acceptance: 3rd February 2003
Final version of paper for workshop proceedings: 3rd March 2003

Submissions:

Papers should describe existing research connected to
the topics of the workshop. Each presentation at the
workshop will be allotted 25 minutes (20 minutes for
the presentation and 5 minutes for questions and discussion).
Each submission should show: title; author(s); affiliation(s);
and contact author's e-mail address, postal address,
telephone and fax numbers. Abstracts (maximum 500 words,
plain-text format) should be sent to:

Kiril Simov
Email: kivs@bultreebank.org

The final version of the accepted papers should follow
the format for the main conference and should be no more
than 10 pages long. Instructions for formatting can be
found on the main conference page.

There will be workshop proceedings.

Registration:

The registration will be managed by the local organisers
of the main conference.

Programme committee:

Michael Barlow, USA
Tomaz Erjavec, Slovenia
Silvia Hansen, Germany
Atanas Kiryakov, Bulgaria
Sandra Kuebler, Germany
Ghassan Mourad, France
Joakim Nivre, Sweden
Kemal Oflazer, Turkey
Karel Oliva, Austria
Petya Osenova, Bulgaria (co-chair)
Vladimir Petkevic, Czech Republic
Adam Przepiórkowski, Poland
Geoffrey Sampson, UK
Kiril Simov, Bulgaria (co-chair)
Milena Slavcheva, Bulgaria
Marko Tadic, Croatia
Dan Tufis, Romania
Tylman Ule, Germany
Tamas Varadi, Hungary
Nikolaj Vazov, Bulgaria
Andreas Wagner, Germany

Organizing committee:

Kiril Simov
BulTreeBank Project
Linguistic Modelling Laboratory, CLPP,
Bulgarian Academy of Sciences
Acad. G.Bonchev St. 25A
1113 Sofia, Bulgaria
Tel: (+359 2) 979 2825
Fax: (+359 2) 70 72 73
Email: kivs@bultreebank.org
http://www.BulTreeBank.org

Petya Osenova
BulTreeBank Project
Linguistic Modelling Laboratory, CLPP,
Bulgarian Academy of Sciences
Acad. G.Bonchev St. 25A
1113 Sofia, Bulgaria
Tel: (+359 2) 979 2825
Fax: (+359 2) 70 72 73
Email: petya@bultreebank.org
http://www.BulTreeBank.org
