14.0699 Metadata Engine project

From: by way of Willard McCarty (willard@lists.village.Virginia.EDU)
Date: Wed Feb 28 2001 - 05:13:57 EST


                   Humanist Discussion Group, Vol. 14, No. 699.
           Centre for Computing in the Humanities, King's College London

             Date: Wed, 28 Feb 2001 10:09:48 +0000
             From: NINCH-ANNOUNCE <david@ninch.org>
             Subject: European Metadata Engine Project

    News on Networking Cultural Heritage Resources
    from across the Community
    February 27, 2001

                            The Metadata Engine Project (METAe)

                               METAe Newsletter Now Available

    A promising European project, The METADATA ENGINE, is described below,
    along with the first issue of the project's newsletter. Essentially, the
    project is developing software that will automatically generate metadata
    during the digitization of printed material, with the aim of making
    "large scale digitisation of printed material, such as books and journals,
    more reliable in terms of digital preservation, more cost-effective in
    terms of automation, and more user-oriented in terms of future applications."

    David Green

    >Date: Mon, 26 Feb 2001 10:06:02 +0000
    >From: Simon Tanner <S.G.Tanner@HERTS.AC.UK>
    *** Apologies for cross-postings ***

    The Metadata Engine Project (METAe) - Newsletter now available.

    The first issue of the METAe Newsletter is now available from:

    (for an introduction to METAe see the base of this email)

    In this first issue we introduce our project and report on progress to
    date. Our next issue, due out in April 2001, will have even more detail.
    The METAe homepage has further information, and the METAe team welcome
    contact at any time: http://meta-e.uibk.ac.at/. The METAe Project is
    funded under the European Union IST Programme.

    In this issue, Günter Mühlberger, from the Project Co-ordination team at
    the University of Innsbruck, explains the genesis of the idea that led to the
    Metadata Engine Project. Also, the influence of the METAe project is
    already being felt on the international scene and Alexander Eggar explains
    why METAe have been invited to attend the next MOA2 DTD meeting in New York.

    We also introduce the 14 partners that make up the Metadata Engine project.
    In future issues two partners per issue will showcase their expertise and
    involvement in METAe. This will be a good opportunity to find out more
    about the backgrounds of our various partners.

    We will endeavour to keep you up to date with the METAe project's progress
    and to give details of forthcoming events that METAe organises or at which
    it will present. The newsletter may also include reports on
    meetings attended by METAe partners - as this issue does, with an article
    by Gerd Prasthofer on the SCHEMAS-workshop held in Bonn during November 2000.

    We hope you will find this newsletter useful and informative. Any feedback
    can be directed to Simon Tanner, Editor of the METAe Newsletter at

    Best regards,
             Simon Tanner
             Senior Digitisation Consultant (HEDS)
             Higher Education Digitisation Service
             Web: http://heds.herts.ac.uk

    Some further information about METAe:

    "Metadata" are playing a significant role in "digital preservation":
    Firstly, they are, in conjunction with emerging standards (such as XML,
    EAD, Dublin Core or RDF), among the most promising ways to keep digital
    material "alive" over the years and decades. Secondly, metadata are needed
    for all kinds of resource discovery, i.e. using and accessing digital
    collections in a user-friendly way. The METADATA ENGINE project picks up
    these considerations and will develop software modules in order to automate
    metadata capturing by introducing layout and document analysis as a key
    technology for digitisation software. METAe will dramatically enhance the
    quality of creating and maintaining digital collections of printed material
    such as books and journals.

    The METAe project will address the need for an automated generation of
    metadata during the conversion of printed documents and thus be able to
    make large scale digitisation of printed material, such as books and
    journals, more reliable in terms of digital preservation, more
    cost-effective in terms of automation, and more user-oriented in terms of
    future applications.
    In order to achieve these aims the METADATA ENGINE project will
    (1) introduce layout and document analysis to be employed as a key
    technology in future digitisation software,
    (2) develop capturing and conversion tools for the automated recording and
    generation of administrative and descriptive metadata,
    (3) develop an omnifont OCR-engine specialising in processing old European
    typefaces of the 19th century,
    (4) strictly obey emerging standards in the fields of digital preservation
    and resource description, such as XML, EAD, TEI, or ISO 12083,
    (5) develop an XML search engine capable of retrieving the tagged full text
    and the images.

    Description of work
    The METAe project will develop a software package which extensively
    automates and improves the generation of metadata by applying new
    technologies for character, layout and document recognition, and converts
    the captured information into XML documents. These XML files will serve as
    a basis for a variety of applications, such as new XML search engines,
    navigation tools, electronic books, audio books, or the automated
    production of HTML, XHTML, PDF or PS files.
    The METAe package consists of:
    (1) an input module for scanning printed material and importing existing
    bibliographic metadata,
    (2) an omnifont character recognition module (OCR-engine) specialising in
    typefaces of the 19th century,
    (3) a document analysis module capable of classifying pages according to
    their physical and logical structure (items such as title pages, table of
    contents pages, etc., will be recognised automatically),
    (4) a page layout analysis module capable of analysing and segmenting page
    elements such as page numbers, headings, captions, footnotes, pictures,
    highlighted phrases, or graphical separators,
    (5) a knowledge base providing a controlled vocabulary and rules for the
    recognition process (for example, the table of contents is, in most cases,
    called "contents"),
    (6) a conversion module assembling an XML document containing all
    recognised metadata, and
    (7) an export module for the XML-enriched document and the scanned image.
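
    To illustrate how a knowledge base, a document analysis step and a
    conversion step might fit together, here is a minimal sketch in Python.
    It is not the METAe software: the vocabulary entries, the function names
    (classify_page, build_document) and the XML element names are all
    assumptions made for this example, and the "OCR output" is a hand-typed
    list of lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical knowledge base: page type -> headings that signal it.
# A real controlled vocabulary would be far larger and multilingual.
KNOWLEDGE_BASE = {
    "table_of_contents": {"contents", "table of contents"},
    "index": {"index"},
}


def classify_page(lines):
    """Classify a page by matching its first non-empty line against the vocabulary."""
    for line in lines:
        heading = line.strip().lower()
        if not heading:
            continue
        for page_type, headings in KNOWLEDGE_BASE.items():
            if heading in headings:
                return page_type
        break  # only the first non-empty line is treated as the heading
    return "text"


def build_document(pages):
    """Assemble an XML tree with one <page> element per recognised page."""
    root = ET.Element("document")
    for number, lines in enumerate(pages, start=1):
        page = ET.SubElement(root, "page", n=str(number), type=classify_page(lines))
        for line in lines:
            if line.strip():
                ET.SubElement(page, "line").text = line.strip()
    return root


# Hand-typed stand-in for OCR output: one list of lines per scanned page.
pages = [
    ["CONTENTS", "Chapter I ... 1", "Chapter II ... 27"],
    ["", "It was a dark night.", "The rain fell."],
]
doc = build_document(pages)
print(ET.tostring(doc, encoding="unicode"))
```

    The rule used here (exact match on the first non-empty line) is
    deliberately naive; it only shows where vocabulary and rules from a
    knowledge base would plug into the recognition process.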
    The XML documents will be generated according to emerging standards for
    digital preservation and the electronic interchange of information such as
    RDF, DC, EAD, TEI, or ISO 12083.
    In order to introduce a wide public to the new features of accessing and
    browsing images and XML-marked full texts, a METAe search engine and web
    application will be developed as well.
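
    As a rough illustration of what searching XML-marked full text can
    offer over plain text, the following Python sketch restricts a search
    to one kind of element, such as footnotes. The element names, the
    sample document and the search function are assumptions for this
    example only and do not describe the METAe search engine.

```python
import xml.etree.ElementTree as ET

# Invented sample of tagged full text; element names are hypothetical.
SAMPLE = """<document>
  <page n="1">
    <heading>Chapter I</heading>
    <paragraph>The old bridge crossed the river.</paragraph>
    <footnote>See the bridge survey.</footnote>
  </page>
  <page n="2">
    <paragraph>No bridge served the valley.</paragraph>
  </page>
</document>"""


def search(root, term, element=None):
    """Return (page number, element tag, text) for each element containing term.

    If element is given, only elements with that tag are searched.
    """
    hits = []
    for page in root.iter("page"):
        for child in page:
            if element is not None and child.tag != element:
                continue
            text = child.text or ""
            if term.lower() in text.lower():
                hits.append((page.get("n"), child.tag, text))
    return hits


root = ET.fromstring(SAMPLE)
print(search(root, "bridge"))                      # all matching elements
print(search(root, "bridge", element="footnote"))  # footnotes only
```

    Because each hit carries its page number, a web application could use
    the same result to display the matching scanned image next to the text.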
    Simon Tanner
    Senior Digitisation Consultant (HEDS)
    Higher Education Digitisation Service
    University of Hertfordshire
    Phone: +44 (0) 1707 286078
    Fax: +44 (0) 1707 286079
    Web: http://heds.herts.ac.uk
    METAe Project: http://meta-e.uibk.ac.at/

    Sun Microsystems, Inc. has published the second edition of its
    popular "Digital Library Toolkit", a valuable resource for anyone
    planning a digital collection. To download a free copy, go to:


    NINCH-Announce is an announcement listserv, produced by the National
    Initiative for a Networked Cultural Heritage (NINCH). The subjects of
    announcements are not the projects of NINCH, unless otherwise noted;
    neither does NINCH necessarily endorse the subjects of announcements. We
    attempt to credit all re-distributed news and announcements and appreciate
    reciprocal credit.

    For questions, comments or requests to un-subscribe, contact the editor:
    See and search back issues of NINCH-ANNOUNCE at


    This archive was generated by hypermail 2b30 : Wed Feb 28 2001 - 05:20:25 EST