====== Minutes from the eleventh WG 13 meeting (online 2025-03-12 12:00 CET) ======

Agenda

Subtask reports:
  * Task 3.2: Shared task on morphosyntactic parsing (Omer Goldman, Leonie Weissweiler, Reut Tsarfaty)
    * [[https://groups.google.com/g/msp-sharedtask-2025-participants|Google group]]
    * [[https://github.com/UniDive-MSP/MSP-shared-task|Training data and future evaluation code]]
    * [[https://unidive.lisn.upsaclay.fr/doku.php?id=other-events:msp|UniDive webpage]]
  * Task 3.4: Evaluation campaign PARSEME 2.0 (Manon Scholivet, Agata Savary)
  * Task 3.5: Evaluation campaign AdMIRe (Thomas Pickard, Aline Villavicencio)

General discussion

Next meeting: May 6, 12:00 CEST (online)

====== List of Participants ======

  * Gülşen Eryiğit (chair)
  * Joakim Nivre (co-chair)
  * Roberto Antonio Díaz Hernández
  * Ali Basirat
  * Csilla Horváth
  * Manon Scholivet
  * Rob van der Goot
  * Agata Savary
  * Ranka Stanković
  * Thomas Pickard
  * Aline Villavicencio
  * Tanja Samardzic
  * Dawit J
  * Alina Wróblewska
  * Luka Terčon
  * Olha Kanishcheva
  * Dan Zeman
  * Takuya Nakamura
  * Federica Gamba
  * Carlos Ramisch
  * Flavio Massimiliano Cecchini
  * Gosse Bouma
  * Rusudan Makhachashvili
  * Voula Giouli
  * Ebru Çavuşoğlu
  * Omer Goldman
  * Reut Tsarfaty
  * Chaya Liebeskind
  * Faruk Mardan
  * Adriana Pagano
  * Ilan Kernerman
  * Kutay Acar
  * Ludmila Malahov
  * Teresa Lynn
  * Lucía Amorós-Poveda

====== PARSEME shared task (Manon, Agata) ======

  * Subtask 1 (PARSEME 2.0)
    * quite established framework
    * novelty: non-verbal MWEs, diversity measures
  * Subtask 2 (MWE generation)
    * given a context from which an MWE has been removed, restore that MWE
    * problem: how to evaluate the systems
    * [ALINE] Consider taking into account the level of difficulty of the items? For example, some items will be more ambiguous and harder to resolve.
    * [JOAKIM] It is unclear which capability of the models we would be testing.
    * [TOM] Very difficult to evaluate, even manually.
  * Subtask 3 (MWE comprehension/disambiguation)
    * given a sentence and the span of a potentially idiomatic expression, classify it as idiomatic, literal or coincidental (a hedged evaluation sketch is appended at the end of these minutes)
    * [GULSEN] There are existing datasets for this task. Maybe the third category complicates things.
    * [JOAKIM]
    * [TOM] The same setup as SemEval 2022 (EN, PT, Galician). There are artefact issues (the models do not really pay attention to the context).
  * Subtask 4 (paraphrasing)
    * given a sentence, rephrase it so that it contains no MWEs
    * [AGATA] The input should be raw text, without a span. Objective: text simplification.
    * [JOAKIM] The most natural task among 2, 3 and 4; close to what people actually do with LLMs.
      * Can we avoid doing manual evaluation? (LLM as judge; see the second sketch appended at the end of these minutes)
    * [TOM] His favourite subtask.
    * [ALINE] Her group uses human questionnaires for this problem. There is a synonym dataset. A related task: collecting sentences containing synonyms of MWEs.
    * [ALINE] Sometimes the simplest way to express a meaning is with an MWE.
  * Questions:
    * Which subtasks should be chosen?
    * How should they be evaluated?

====== AdMIRe extension ======

  * Tom’s [[https://docs.google.com/presentation/d/1PLeZfHiZeU7NY8BS6AmnEunnsPsk_MOwucOSzYusBD8/edit?usp=sharing|slides]]
  * [[https://semeval2025-task1.github.io/|Task website]]
  * [[https://docs.google.com/document/d/1Suor8arKN5Npg9I4LEqpCma6p_k9vo3ZilioPltXtdA/edit?tab=t.0#heading=h.109xvas7yti|Data curation guidelines & notes]]
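
====== Appendix: hedged evaluation sketches ======

The sketch below illustrates how Subtask 3 could be scored as a three-way classification task (idiomatic / literal / coincidental) with macro-averaged F1. The label set is the one mentioned in the discussion above; the choice of macro-F1, the function names and the toy data are illustrative assumptions, not a decided PARSEME 2.0 metric.

<code python>
# Minimal sketch: macro-F1 scoring for a three-way MWE disambiguation task.
# The label set follows the minutes; the metric itself is an assumption.

LABELS = ("idiomatic", "literal", "coincidental")

def macro_f1(gold, pred):
    """Macro-averaged F1 over the three assumed labels."""
    assert len(gold) == len(pred)
    f1_scores = []
    for label in LABELS:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(LABELS)

if __name__ == "__main__":
    # Toy example: occurrences of "spill the beans" labelled by usage.
    gold = ["idiomatic", "literal", "coincidental", "idiomatic"]
    pred = ["idiomatic", "literal", "idiomatic", "idiomatic"]
    print(f"macro-F1 = {macro_f1(gold, pred):.3f}")
</code>

Macro averaging weights the three classes equally, which matters if the coincidental class is kept but remains rare; a micro-averaged or per-class report would be an equally plausible choice.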
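
The discussion of Subtask 4 raised the possibility of avoiding manual evaluation by using an LLM as judge. The second sketch shows one way such a judge could be prompted and its verdict parsed. The rubric, the JSON answer format and the placeholder ''call_llm()'' function are assumptions for illustration only; a real setup would plug in an actual model API and a vetted prompt.

<code python>
# Hedged sketch: an "LLM as judge" protocol for MWE-free paraphrasing.
# call_llm() is a stand-in for a real model call and just returns a canned verdict.

import json

JUDGE_PROMPT = """You are evaluating a paraphrase.
Original sentence: {original}
Paraphrase: {paraphrase}

Answer in JSON with two boolean fields:
  "meaning_preserved" - does the paraphrase keep the meaning of the original?
  "mwe_free" - is the paraphrase free of multiword expressions (idioms,
  light-verb constructions, etc.)?
"""

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a fixed answer for the demo."""
    return '{"meaning_preserved": true, "mwe_free": true}'

def judge(original: str, paraphrase: str) -> dict:
    """Ask the judge model for a verdict and parse its JSON answer."""
    raw = call_llm(JUDGE_PROMPT.format(original=original, paraphrase=paraphrase))
    return json.loads(raw)

if __name__ == "__main__":
    verdict = judge(
        "She finally spilled the beans about the merger.",
        "She finally revealed the secret about the merger.",
    )
    # A system score could then be the fraction of items judged both
    # meaning-preserving and MWE-free.
    print(verdict)
</code>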