Working Group 1: Corpus Annotation
- Leader: Carlos Ramisch (France)
- Vice-leader: Kaja Dobrovoljc (Slovenia)
Workplan
Annotated corpora constitute the Action's major operational tools for NLP-applied universality. Therefore, WG1 will be dedicated to the following activities:
- Studies and community discussions in language typology and language universals at the level of morphology, syntax and semantics, with special attention paid to idiosyncrasy at all these levels;
- Unification and enhancement of cross-lingual annotation guidelines for morpho-syntax and MWEs:
  - defining the division of labour between morpho-syntactic and semantic annotation,
  - addressing hard or weakly covered syntactic phenomena (syntactically irregular structures, relative clauses, coordination, pronoun inclusivity, etc.),
  - covering new MWE categories (nominal, adjectival and functional MWEs),
  - paving the way for unified annotation guidelines for idiosyncratic constructions;
- Coordinating the development and maintenance of centralized software for universality-based corpus construction (the software development itself will be funded at the national level):
  - online spaces for community discussion and for editing annotation guidelines,
  - tools for automatic pre-annotation, annotation transfer and manual annotation of corpora,
  - tools for corpus merging, validation, curation, statistics, conversion and release;
- Defining file formats for corpora annotated according to the unified guidelines;
- Construction of annotated corpora:
  - adapting the existing corpora to the enhanced guidelines,
  - creating new annotated corpora following the enhanced guidelines.
Documents
- WG1 Meeting 1 brainstorming topics and slides - 16-17 March 2023, Paris-Saclay University, France (co-located with the UniDive 1st general meeting)
wg1/wg1.txt · Last modified: 2023/03/22 11:59 by agata.savary