Previous revision: wg1:wg1:task1.2:call-for-language-leaders [2025/02/14 18:29] agata.savary
Current revision: wg1:wg1:task1.2:call-for-language-leaders [2025/02/18 08:04] (current) agata.savary
====== Call for expressions of interest in PARSEME/UniDive annotation campaign on multiword expressions ======
  
The [[https://unidive.lisn.upsaclay.fr/|UniDive]] COST action (task 1.2) and the [[https://gitlab.com/parseme/corpora/-/wikis|PARSEME]] community are happy to announce the upcoming **multilingual corpus annotation campaign** dedicated to multiword expressions (MWEs). We call for expressions of interest from current or future Language Leaders who wish to propose a language team. If you are interested, please fill in the [[https://docs.google.com/forms/d/e/1FAIpQLSejtAdVITdasl8LLfydI8erLB6qO3k9XAmaR6bDZJlw5SPecQ/formResponse|EoI form]], preferably before **27 February 2025**.
  
Three past PARSEME annotation campaigns were dedicated exclusively to __verbal__ MWEs (VMWEs) and resulted in 4 editions of the [[https://gitlab.com/parseme/corpora/-/wikis/home|PARSEME corpus]], which jointly covers **26 languages**. Three [[https://gitlab.com/parseme/corpora/-/wikis/home#shared-tasks|PARSEME shared tasks]] on automatic identification of VMWEs have been organized on the basis of this corpus and set the state of the art in the task.
  
The current annotation campaign will cover MWEs of **all syntactic types** (including nominal, adjectival, adverbial and functional MWEs). It follows the spirit of **universality**: the [[https://parsemefr.lis-lab.fr/parseme-st-guidelines/2.0|annotation guidelines]] are unified across all participating languages whenever possible, while still leaving room for truly language-specific phenomena. This approach is expected to promote meaningful cross-language comparisons. The resulting corpus will be used in a **PARSEME/UniDive shared task** on identifying and understanding MWEs, to be proposed for [[https://semeval.github.io/SemEval2026/cft|SemEval 2026]].
  
For the languages already present in the PARSEME corpus, the agenda is to:
  * Re-annotate the existing corpus with MWEs other than verbal. Annotating only part of the existing corpus is an option. In this case we recommend a minimum of 3500 annotated MWEs (so that each selected text is exhaustively annotated for all syntactic types of MWEs). A lower number of annotations is acceptable, but system results are then not expected to be representative.
  * Add some new texts annotated from scratch (to counterbalance language model contamination from previously published data)
For new languages, corpora will be annotated for all syntactic types at once. Conversions from other MWE annotation schemes are acceptable, provided they are curated to fit the PARSEME guidelines.
  
A language team should consist of **at least 2 annotators** (including the Language Leader), for the sake of inter-annotator agreement estimation. It is possible to start annotating alone and recruit more annotators at a later stage (May at the latest).
  
Centralized [[https://gitlab.com/parseme/corpora/-/wikis/PARSEME-Language-Leader-Guide|documentation and tools]] (including the online FLAT annotation platform) are available.
  
We propose the following timeline:
  * **[language leaders: 27 February]** Expression of interest from Language Leaders
  * **[task leaders: late February]** Creating FLAT accounts
  * **[language leaders: mid-March]** Reading guidelines, reading the Language Leader guide, filling in MWE examples, recruiting annotators, selecting corpora
  * **[all: March]** Pilot annotation
  * **[shared task leaders: 31 March]** SemEval shared task proposal
  * **[language teams: April-May]** Annotation (including a double-annotated sample for inter-annotator agreement estimation)
  * **[SemEval: 19 May]** Notification about the selected shared task
  * **[language leaders: June]** Consistency checks and inter-annotator agreement estimation
  * **[shared task leaders: 15 July]** Sample data ready
  * **[task leaders: July-August]** Consolidating and splitting the corpora
  * **[WG3 shared task leaders: 1 September]** Training data for SemEval
More details about the role of the Language Leader can be found in the PARSEME [[https://gitlab.com/parseme/corpora/-/wikis/PARSEME-Language-Leader-Guide|Language Leader guide]].
  
Feel free to contact us with any questions you might have.
  
UniDive task 1.2 co-leaders: Voula Giouli, Stella Markantonatou, Carlos Ramisch, Agata Savary, Sara Stymne
  
wg1/wg1/task1.2/call-for-language-leaders.1739554186.txt.gz · Last modified: by agata.savary