Working Group 1: Corpus Annotation


Annotated corpora are the Action's main operational tool for bringing universality to NLP applications. WG1 will therefore be dedicated to the following activities:

  1. Studies of, and community discussions on, language typology and language universals at the levels of morphology, syntax and semantics, with special attention to idiosyncrasy at each of these levels;
  2. Unification and enhancement of cross-lingual annotation guidelines for morpho-syntax and multiword expressions (MWEs):
    • defining the division of labour between morpho-syntactic and semantic annotation,
    • addressing hard or weakly covered syntactic phenomena (syntactically irregular structures, relative clauses, coordination, pronoun inclusivity, etc.),
    • covering new MWE categories (nominal, adjectival and functional MWEs),
    • paving the way for unified annotation guidelines for idiosyncratic constructions;
  3. Coordination of the development and maintenance of centralized software for universality-based corpus construction:
    • online spaces for community discussion and collaborative editing of annotation guidelines,
    • tools for automatic pre-annotation, annotation transfer and manual annotation of corpora,
    • tools for corpus merging, validation, curation, statistics, conversion and release. The software development itself will be funded at the national level;
  4. Definition of file formats for corpora annotated according to the unified guidelines;
  5. Construction of annotated corpora:
    • adapting existing corpora to the enhanced guidelines,
    • creating new annotated corpora following the enhanced guidelines.


