====== PARSEME/UniDive annotation campaign on multiword expressions ======

  * **Event title**: PARSEME/UniDive annotation campaign (UniDive WG1 task 1.2)
  * **Dates**: September 2023 -- September 2025
  * **Co-leaders**: 
    * [[https://people.auth.gr/pgiouli/?lang=en|Voula Giouli]], Aristotle University of Thessaloniki, Greece
    * [[https://www.ilsp.gr/en/members/markantonatou-stella-2/|Stella Markantonatou]], Language and Speech Processing/ATHENA RC, Athens, Greece
    * Takuya Nakamura, Université Paris-Saclay, France
    * [[https://pageperso.lis-lab.fr/carlos.ramisch/|Carlos Ramisch]], Aix-Marseille Université, France
    * [[https://perso.lisn.upsaclay.fr/savary/|Agata Savary]], Université Paris-Saclay, France
    * [[https://www2.lingfil.uu.se/cl/sara/|Sara Stymne]], Uppsala University, Sweden

|[[https://www.cost.eu/|{{ :cost_logo_rgb_lowresolution-cropped.jpg?100 |}}]]|{{ :en-funded_by_the_eu-pos.png?200 |}}|[[other-events:logo-tessaloniki.png|{{:wg1:wg1:task1.2:logo-tessaloniki.png?100|}}]]|[[https://www.athenarc.gr/|{{:wg1:wg1:task1.2:logo-athena.png?100|}}]]|[[https://www.universite-paris-saclay.fr/|{{:other-events:logo-univ-saclay.png?100|}}]]|[[https://www.univ-amu.fr/|{{:other-events:logo-amu.png?100|}}]]|[[https://www.uu.se/|{{:wg1:wg1:task1.2:logo-uppsala.png?40|}}]]|

The [[https://unidive.lisn.upsaclay.fr/|UniDive]] COST action (task 1.2) and the [[https://gitlab.com/parseme/corpora/-/wikis|PARSEME]] community are carrying out a **multilingual corpus annotation campaign** dedicated to multiword expressions (MWEs).

Three past PARSEME annotation campaigns were dedicated exclusively to __verbal__ MWEs (VMWEs) and resulted in 4 editions of the [[https://gitlab.com/parseme/corpora/-/wikis/home|PARSEME corpus]], which jointly covers **26 languages**. Three [[https://gitlab.com/parseme/corpora/-/wikis/home#shared-tasks|PARSEME shared tasks]] on the automatic identification of VMWEs have been organized on the basis of this corpus and have set the state of the art in this task.

The current annotation campaign covers MWEs of **all syntactic types** (including nominal, adjectival, adverbial and functional MWEs). It follows the spirit of **universality**: the [[https://parsemefr.lis-lab.fr/parseme-st-guidelines/2.0|annotation guidelines]] are unified across all participating languages whenever possible, while still leaving room for truly language-specific phenomena. This approach is expected to promote meaningful cross-language comparisons. The resulting corpus will be used in a [[other-events:parseme-st|PARSEME/UniDive shared task]] on identifying and understanding MWEs, submitted as a proposal for [[https://semeval.github.io/SemEval2026/cft|SemEval 2026]].

===== Teams =====
Each language should be annotated by a team of **native annotators** (except when this is not possible, e.g. for extinct languages such as Ancient Greek or Egyptian). A language team should consist of **at least 2 annotators** (including the Language Leader), so that inter-annotator agreement can be estimated. It is possible to start annotating alone and to recruit more annotators at a later stage (by May 2025 at the latest). See the [[https://gitlab.com/parseme/corpora/-/wikis/home#language-teams|language teams]] from past and present annotation campaigns.

Each language team should have at least one **Language Leader**. See the [[wg1:wg1:task1.2:call-for-language-leaders|call for Language Leaders]].

 
===== Annotation work =====
For the [[https://gitlab.com/parseme/corpora/-/wikis/home#languages|languages already present]] in the PARSEME corpus, the agenda is to:
  * Re-annotate the existing corpus with MWEs other than verbal ones. Annotating only part of the existing corpus is an option. In this case, each selected text should be exhaustively annotated for all syntactic types of MWEs, and we recommend a **minimum of 2000 annotated MWEs**. A lower number of annotations is acceptable, but system results may then not be representative.
  * Add some **new texts** annotated from scratch (to counterbalance the contamination of language models by previously published data).
For [[https://gitlab.com/parseme/corpora/-/wikis/home#upcoming-languages|new languages]], corpora will be annotated for all syntactic types at once.

Conversions from other MWE annotation schemes are acceptable, provided they are curated to fit the PARSEME guidelines.

===== Timeline =====
  * <del>**[task leaders: 14 February]** [[wg1:wg1:task1.2:call-for-language-leaders|Call for Language Leaders]]</del>
  * <del>**[language leaders: 27 February]** Expression of interest from Language Leaders</del>
  * <del>**[task leaders: late-February]** Creating FLAT accounts</del>
  * **[language leaders: mid-March]** 
    * Reading the [[https://parsemefr.lis-lab.fr/parseme-st-guidelines/2.0/|annotation guidelines 2.0]]
    * Reading the [[https://gitlab.com/parseme/corpora/-/wikis/PARSEME-Language-Leader-Guide|Language Leader's guide]]
    * Filling in MWE examples in the guidelines
    * Recruiting annotators
    * Selecting corpora
  * <del>**[all: 28 March]** Pilot annotation, submitting [[https://gitlab.com/parseme/sharedtask-guidelines/-/issues|issues]]</del>
  * <del>**[shared task leaders: 31 March]** SemEval 2026 shared task proposal</del>
  * <del>**[SemEval: 19 May]** Notification from SemEval about the selected shared tasks (proposal rejected)</del>
  * **[language teams: April -- 1 September]** Annotation for subtask 1 (PARSEME corpus)
    * Annotating the PARSEME corpus with all syntactic types of MWEs
    * Double-annotating a sample for inter-annotator agreement estimation
    * Consistency checks
  * **[task leaders: 15 September]** Preparing the data for subtask 2
  * **[language teams: 1 October]** MWE paraphrasing for subtask 2
  * **[task leaders: 30 October]** Consolidating and splitting the corpora for both subtasks
  * **[task leaders: autumn]** Shared task proposal

===== Documents and tools =====
  * PARSEME/UniDive annotation campaign [[https://docs.google.com/document/d/1u_ycAUIB8Fw7kYI3M_Xkj5_ZWftlXcIpSa42pHGaBdc/edit?usp=sharing|master document]]
  * [[https://gitlab.com/parseme/corpora/-/wikis/|PARSEME corpus wiki]]
  * Annotation guidelines
    * [[https://parsemefr.lis-lab.fr/parseme-st-guidelines/2.0/|PARSEME annotation guidelines 2.0]] 
    * [[https://docs.google.com/document/d/1meuelqTYyTeIEW3ezqNydEZTYhXhh_8jKcv9r93y1mU/edit?usp=sharing|what’s new in version 2.0]]
    * [[https://gitlab.com/parseme/sharedtask-guidelines/-/issues|GitLab issues]] on the guidelines
  * [[https://gitlab.com/parseme/corpora/-/wikis/PARSEME-Language-Leader-Guide|Language Leader's guide]] 
  * [[https://flat.lisn.upsaclay.fr|FLAT]] annotation platform and [[https://docs.google.com/document/d/1gAQ1yC0xR-nkJVbVNMgtN6gCRrizjfoqH-z6SQ_pDSk/edit?usp=sharing|FLAT User's Guide]]
  * Minutes from [[https://docs.google.com/document/d/1jvOGO2Q_pJpm1rB0B6sAprKzEh2n95Jc-VTktaAW_j8/edit?usp=sharing|task 1.2 co-leaders’ meetings]]
  * Minutes from [[https://docs.google.com/document/d/1r-OcsGUOMFZFewTigj9arGhDQ4ePzbnfYYSOhFxMQ_A/edit?usp=sharing|Language Leaders' meetings]]

===== Language Leaders' meetings =====
Language Leaders meet regularly online during the annotation campaign, according to the following schedule:
  * Tuesday 8 April 6 p.m. CEST
  * Friday 18 April 9 a.m. CEST
  * Friday 2 May 9 a.m. CEST
  * Tuesday 6 May 6 p.m. CEST
  * Friday 16 May 9 a.m. CEST
  * Friday 30 May 9 a.m. CEST
  * Tuesday 3 June 6 p.m. CEST
  * Friday 13 June 9 a.m. CEST
  * Friday 27 June <del>6</del> p.m. CEST
  * <del>Tuesday 1 July 6 p.m. CEST</del>
  * Friday 25 July 9 a.m. CEST

We use the recurring [[https://cnrs.zoom.us/j/92794488497?pwd=Smtmdm4rTCs1S3hFdjZsUk1rZlU1dz09|Zoom link]].