16.397 Workshop on Shallow Processing of Large Corpora

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Fri Jan 03 2003 - 03:53:23 EST

                   Humanist Discussion Group, Vol. 16, No. 397.
           Centre for Computing in the Humanities, King's College London
                         Submit to: humanist@princeton.edu

             Date: Fri, 03 Jan 2003 08:47:18 +0000
             From: "Kiril Simov" <kivs@bultreebank.org>
             Subject: Shallow Processing of Large Corpora - Second Call for Papers

                              Second Call for Papers

            Workshop on Shallow Processing of Large Corpora
                            (SProLaC 2003)
                       CORPUS LINGUISTICS 2003
              Lancaster University (UK), 27 March, 2003

    The workshop will take place on the 27th of March 2003 at the
    CORPUS LINGUISTICS 2003 Conference at Lancaster University (UK).

    Workshop motivation and aims:

    Corpora have developed in two main directions:

          - large corpora of at least 100 million tokens, and

          - small corpora of up to 1 million tokens.

    The data in the former is annotated only morpho-syntactically,
    while the data in the latter is assigned more detailed syntactic
    and/or semantic information. Both types of language corpora are
    valuable. However, the question arises whether it is possible
    to build a really large corpus that is fully processed
    linguistically. Since this is a hard task that raises metadata
    problems (choice of theories, availability of appropriate
    tools, etc.), we put the stress on shallow parsing of
    unrestricted data. In our view, the automatic creation of such
    a resource is a task of great importance. It would serve as a
    template for linguistic research, consistency checking and
    validation, large-scale applications in Information Retrieval
    and Information Extraction, testing of machine learning
    algorithms, and much more. This task is related to other
    subtasks, such as an adequate combination of diverse shallow
    processing techniques in a sound and robust processor, and
    smoothing the transition from shallow parsing approaches to
    the stages of deeper linguistic analysis.

    The workshop aims to be a forum for researchers to present
    their work in Computational Corpus Linguistics and Language
    Engineering and to discuss problems in the design, management,
    linguistic interpretation and exploration of unrestricted data
    from both perspectives.

    We envisage a one-day workshop with 10-12 presentations.

    Topics of interest:

          - design principles for shallow-parsed large corpora;
          - text segmentation and preprocessing;
          - definition of the connections between the levels
            of processing;
          - chunking and partial parsing of large amounts of text;
          - wide-coverage machine learning methods;
          - software systems for managing and accessing
            shallow-parsed large corpora;
          - applications of shallow-parsed large corpora.

    There will be a general discussion at the end of the workshop.

    Important dates:

    Deadline for workshop abstract submission: 10th January 2003
    Notification of acceptance: 3rd February 2003
    Final version of paper for workshop proceedings: 3rd March 2003


    Papers should describe existing research connected to
    the topics of the workshop. Each presentation at the
    workshop will be 25 minutes long (20 minutes for the
    talk and 5 minutes for questions and discussion).
    Each submission should include: title; author(s);
    affiliation(s); and the contact author's e-mail address,
    postal address, telephone and fax numbers. Abstracts
    (maximum 500 words, plain-text format) should be sent to:

    Kiril Simov
    Email: kivs@bultreebank.org

    The final versions of accepted papers should follow
    the format of the main conference and should be no more
    than 10 pages long. Formatting instructions can be
    found on the main conference web page.

    There will be workshop proceedings.


    Registration will be managed by the local organisers
    of the main conference.

    Programme committee:

    Michael Barlow, USA
    Tomaz Erjavec, Slovenia
    Silvia Hansen, Germany
    Atanas Kiryakov, Bulgaria
    Sandra Kuebler, Germany
    Ghassan Mourad, France
    Joakim Nivre, Sweden
    Kemal Oflazer, Turkey
    Karel Oliva, Austria
    Petya Osenova, Bulgaria (co-chair)
    Vladimir Petkevic, Czech Republic
    Adam Przepiórkowski, Poland
    Geoffrey Sampson, UK
    Kiril Simov, Bulgaria (co-chair)
    Milena Slavcheva, Bulgaria
    Marko Tadic, Croatia
    Dan Tufis, Romania
    Tylman Ule, Germany
    Tamas Varadi, Hungary
    Nikolaj Vazov, Bulgaria
    Andreas Wagner, Germany

    Organizing committee:

    Kiril Simov
    BulTreeBank Project
    Linguistic Modelling Laboratory, CLPP,
    Bulgarian Academy of Sciences
    Acad. G. Bonchev St. 25A
    1113 Sofia, Bulgaria
    Tel: (+359 2) 979 2825
    Fax: (+359 2) 70 72 73
    Email: kivs@bultreebank.org

    Petya Osenova
    BulTreeBank Project
    Linguistic Modelling Laboratory, CLPP,
    Bulgarian Academy of Sciences
    Acad. G. Bonchev St. 25A
    1113 Sofia, Bulgaria
    Tel: (+359 2) 979 2825
    Fax: (+359 2) 70 72 73
    Email: petya@bultreebank.org