wg1:wg1:task1.2:call-for-language-leaders

Differences

This shows you the differences between two versions of the page.

wg1:wg1:task1.2:call-for-language-leaders [2025/02/14 18:22] – created agata.savary
wg1:wg1:task1.2:call-for-language-leaders [2025/02/18 08:04] (current) agata.savary
Line 1: Line 1:
-===== Call for expressions of interest in PARSEME/UniDive annotation campaign on multiword expressions =====
+====== Call for expressions of interest in PARSEME/UniDive annotation campaign on multiword expressions =====
 + 
 +The [[https://unidive.lisn.upsaclay.fr/|UniDive]] COST action (task 1.2) and the [[https://gitlab.com/parseme/corpora/-/wikis|PARSEME]] community are happy to announce the upcoming **multilingual corpus annotation campaign** dedicated to multiword expressions (MWEs). We call for expressions of interest from current or future Language Leaders who wish to propose a language team. If you are interested, please fill in the [[https://docs.google.com/forms/d/e/1FAIpQLSejtAdVITdasl8LLfydI8erLB6qO3k9XAmaR6bDZJlw5SPecQ/formResponse|EoI form]], preferably by **27 February 2025**. 
 + 
 +Three past PARSEME annotation campaigns were dedicated exclusively to __verbal__ MWEs (VMWEs) and resulted in 4 editions of the [[https://gitlab.com/parseme/corpora/-/wikis/home|PARSEME corpus]], which jointly covers **26 languages**. Three [[https://gitlab.com/parseme/corpora/-/wikis/home#shared-tasks|PARSEME shared tasks]] on automatic identification of VMWEs have been organized on the basis of this corpus and have set the state of the art for this task. 
 + 
 +The current annotation campaign will cover MWEs of **all syntactic types** (including nominal, adjectival, adverbial and functional MWEs). It follows the spirit of **universality**: the [[https://parsemefr.lis-lab.fr/parseme-st-guidelines/2.0|annotation guidelines]] are unified across all participating languages whenever possible, while still leaving room for truly language-specific phenomena. This approach is expected to promote meaningful cross-language comparisons. The resulting corpus will be used in a **PARSEME/UniDive shared task** on identifying and understanding MWEs, to be proposed for [[https://semeval.github.io/SemEval2026/cft|SemEval 2026]].  
 + 
 +For the languages already present in the PARSEME corpus, the agenda is to: 
 +  * Re-annotate the existing corpus with MWEs other than verbal. Annotating only part of the existing corpus is an option. In this case we recommend a minimum of 3500 annotated MWEs, with each selected text annotated exhaustively for all syntactic types of MWEs. A lower number of annotations is acceptable, but system results obtained on such data are not expected to be representative.  
 +  * Add some new texts annotated from scratch (to counterbalance language model contamination from previously published data)  
 +For new languages, corpora will be annotated for all syntactic types at once. 
 +Conversions from other MWE annotation schemes are acceptable, provided they are curated to fit the PARSEME guidelines. 
 + 
 +A language team should consist of **at least 2 annotators** (including the Language Leader), for the sake of inter-annotator agreement estimation. It is possible to start annotating alone and recruit more annotators at a later stage (May at the latest). 
 + 
 +Centralized [[https://gitlab.com/parseme/corpora/-/wikis/PARSEME-Language-Leader-Guide|documentation and tools]] (including the online FLAT annotation platform) are available. 
 + 
 +We propose the following timeline:  
 +  * **[language leaders: 27 February]** Expression of interest from Language Leaders 
 +  * **[task leaders: late February]** Creating FLAT accounts 
 +  * **[language leaders: mid-March]** Reading guidelines, reading the Language Leader guide, filling in MWE examples, recruiting annotators, selecting corpora 
 +  * **[all: March]** Pilot annotation 
 +  * **[shared task leaders: 31 March]** SemEval shared task proposal 
 +  * **[language teams: April-May]** Annotation (including a double-annotated sample for inter-annotator agreement estimation) 
 +  * **[SemEval: 19 May]** Notification about the selected shared task 
 +  * **[language leaders: June]** Consistency checks and inter-annotator agreement estimation 
 +  * **[shared task leaders: 15 July]** Sample data ready 
 +  * **[task leaders: July-August]** Consolidating and splitting the corpora 
 +  * **[WG3 shared task leaders: 1 September]** Training data for SemEval 
 +More details about the role of the Language Leader can be found in the PARSEME [[https://gitlab.com/parseme/corpora/-/wikis/PARSEME-Language-Leader-Guide|Language Leader guide]]. 
 + 
 +Feel free to contact us with any questions you might have. 
 + 
 +UniDive task 1.2 co-leaders: Voula Giouli, Stella Markantonatou, Carlos Ramisch, Agata Savary, Sara Stymne 