13.0523 Large Corpora and Annotation Standards

From: Humanist Discussion Group (willard@lists.village.virginia.edu)
Date: Sat Apr 01 2000 - 20:06:01 CUT

                  Humanist Discussion Group, Vol. 13, No. 523.
          Centre for Computing in the Humanities, King's College London

            Date: Thu, 30 Mar 2000 17:27:47 -0500
            From: "Nancy M. Ide" <ide@cs.vassar.edu>
            Subject: Large Corpora and Annotation Standards

                        Large Corpora and Annotation Standards


                       Held in conjunction with ANLP/NAACL'00
                                Seattle, Washington
                                 4 May 2000 1-6pm

               This meeting is intended to bring together researchers and
               developers from a variety of domains in text, speech,
               video, etc., to look broadly at the technical issues that
               bear on the development of software systems and standards
               for the annotation and exploitation of linguistic
               resources. The goal is to lay the groundwork for the
               definition of a data and system architecture to support
               corpus annotation and exploitation that can be widely
               adopted within the community.

               Among the issues to be addressed are:

                   - layered data architectures
                   - system architectures for distributed databases
                   - support for plurality of annotation schemes
                   - impact and use of XML/XSL
                   - support for multimedia, including speech and video
                   - tools for creation, annotation, query and access of
                       corpora
                   - mechanisms for linkage of annotation and primary
                       data
                   - applicability of semi-structured data models, search
                       and query systems, etc.
                   - evaluation/validation of systems and annotations

               The motivation for this meeting is the American National
               Corpus (ANC) effort, which should begin corpus creation
               within the year. We anticipate that the ANC will provide a
               significant resource for natural language processing, and
               we therefore seek to identify state-of-the-art methods for
               its creation, annotation, and exploitation. Also, as a
               national and freely available resource, the data and system
               architecture of the ANC is likely to become a de facto
               standard. We therefore hope to draw together leading
               researchers and developers to establish a basis for the
               design of a system to support the creation and use of the
               ANC.

                                          Provisional Program

                      Overview of the American National Corpus Effort
                         Nancy Ide and Catherine Macleod

                      Searching Linguistically Annotated Corpora
                         Chris Brew

                      Considerations for Large Corpus Annotation:
                      Intercoder Reliability
                         Rebecca Bruce and Janyce Wiebe

                      The XML Framework and Its Implications for Large
                      Corpus Access
                         Nancy Ide

                      The ATLAS System
                         John Henderson

                      Annotation Standards and Their Impact on Large
                      Corpus Development
                         Nicoletta Calzolari

                      A Framework for Multi-level Linguistic Annotation
                         Patrice Lopez and Laurent Romary

                      Discussion: Requirements for the ANC

               A related workshop will be held at the LREC conference on
               May 29-30, 2000. See http://www.cs.vassar.edu/~ide/anc/lrec.html.


               Nancy Ide
               Professor and Chair
               Department of Computer Science
               Vassar College
               Poughkeepsie, NY 12604-0520 USA
               Tel: +1 914 437-5988 Fax: +1 914 437-7498

                           Humanist Discussion Group
           Information at <http://www.kcl.ac.uk/humanities/cch/humanist/>
