Minutes from the eleventh WG3 meeting (online, 2025-03-12, 12:00 CET)

Agenda

Subtask reports:

  • Task 3.2: Shared task on morphosyntactic parsing (Omer Goldman, Leonie Weissweiler, Reut Tsarfaty)
      • Google group
      • Training data and future evaluation code
      • UniDive webpage
  • Task 3.4: Evaluation campaign PARSEME 2.0 (Manon Scholivet, Agata Savary)
  • Task 3.5: Evaluation campaign AdMIRe (Thomas Pickard, Aline Villavicencio)

General discussion

Next meeting: May 6, 12:00 CEST (online)

List of Participants

  • Gülşen Eryiğit (chair)
  • Joakim Nivre (co-chair)
  • Roberto Antonio Díaz Hernández
  • Ali Basirat
  • Csilla Horváth
  • Manon Scholivet
  • Rob van der Goot
  • Agata Savary
  • Ranka Stanković
  • Thomas Pickard
  • Aline Villavicencio
  • Tanja Samardzic
  • Dawit J
  • Alina Wróblewska
  • Luka Terčon
  • Olha Kanishcheva
  • Dan Zeman
  • Takuya Nakamura
  • Federica Gamba
  • Carlos Ramisch
  • Flavio Massimiliano Cecchini
  • Gosse Bouma
  • Rusudan Makhachashvili
  • Voula Giouli
  • Ebru Çavuşoğlu
  • Omer Goldman
  • Reut Tsarfaty
  • Chaya Liebeskind
  • Faruk Mardan
  • Adriana Pagano
  • Ilan Kernerman
  • Kutay Acar
  • Ludmila Malahov
  • Teresa Lynn
  • Lucía Amorós-Poveda

PARSEME shared task (Manon, Agata)

  • subtask 1 (PARSEME 2.0)
      • quite established framework
      • novelty: non-verbal MWEs, diversity measures
  • subtask 2 (MWE generation)
      • Given a context from which an MWE has been removed, restore this MWE (see the first sketch after this list)
      • Problem: how to evaluate the systems
      • [ALINE] Consider taking into account the level of difficulty of the items? For example, some items will be more ambiguous and more difficult to determine.
      • [JOAKIM] It is unclear which capability of the models we are testing.
      • [TOM] Very difficult to evaluate, even manually.
  • subtask 3 (MWE comprehension/disambiguation)
      • Given a sentence and the span of a potential idiomatic expression, classify it as idiomatic, literal or coincidental (see the second sketch after this list)
      • [GULSEN] There are some datasets for this task. Maybe the third category complicates things.
      • [JOAKIM]
      • [TOM] The same as SemEval 2022 (EN, PT, Galician). There are artefact issues (the models don’t really pay attention to the context).
  • subtask 4 (paraphrasing)
      • Given a sentence, rephrase it so that it contains no MWEs
      • [AGATA] The input should be raw text, without a span. Objective: simplification of a text.
      • [JOAKIM] The most natural task among (2, 3 and 4). Close to what people do with LLMs.
      • Can we avoid manual evaluation? (LLM as judge; see the third sketch after this list)
      • [TOM] His favorite subtask.
      • [ALINE] They work with questionnaires for humans on this problem. There is a synonym dataset. Another task: collect sentences with synonyms of MWEs.
      • [ALINE] Sometimes the simplest way to express a meaning is with an MWE.
  • Questions:
      • Which subtasks to choose?
      • How to evaluate them?
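
First sketch: a minimal illustration of what a subtask 2 (MWE generation) instance and a naive scoring check could look like, assuming a hypothetical format in which the MWE is replaced by a placeholder token. The field names, the <MWE> placeholder and the exact-match criterion are illustrative assumptions, not an agreed PARSEME 2.0 format; as noted in the discussion above, real evaluation is an open problem.

    # Illustrative sketch only: field names, the <MWE> placeholder and the
    # exact-match check are assumptions, not an agreed data format.
    instance = {
        "id": "ex-001",
        "context": "After losing three investors in a row, the founder decided to <MWE>.",
        "reference_mwe": "throw in the towel",
    }

    def exact_match(prediction: str, reference: str) -> bool:
        """Naive scoring: several different MWEs (or literal phrasings)
        could fit the same context, which is why evaluation is hard."""
        return prediction.strip().lower() == reference.strip().lower()

    print(exact_match("Throw in the towel", instance["reference_mwe"]))  # True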
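
Second sketch: a minimal illustration of subtask 3 (MWE comprehension/disambiguation) instances with the three labels discussed above. The field names and the character-offset convention for the span are assumptions made for illustration.

    # Illustrative sketch only: field names and the character-offset
    # convention for the target span are assumptions.
    LABELS = {"idiomatic", "literal", "coincidental"}

    def make_instance(sentence: str, span_text: str, label: str) -> dict:
        start = sentence.index(span_text)  # character offsets of the potential MWE
        return {"sentence": sentence, "span": (start, start + len(span_text)), "label": label}

    examples = [
        make_instance("He kicked the bucket last year.",
                      "kicked the bucket", "idiomatic"),  # figurative reading
        make_instance("She kicked the bucket off the porch by accident.",
                      "kicked the bucket", "literal"),    # compositional reading
        # "coincidental" would cover cases where the component words merely
        # co-occur without forming the expression at all.
    ]
    assert all(ex["label"] in LABELS for ex in examples)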
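
Third sketch: a minimal illustration of the "LLM as judge" idea raised for subtask 4, i.e. asking a judge model whether a system rewrite preserves the meaning and contains no MWE. The prompt wording and the two yes/no criteria are assumptions, and no particular judge model or API is implied, so the actual model call is left out.

    # Illustrative sketch only: the prompt wording and the two yes/no
    # criteria are assumptions; no particular judge model or API is implied.
    def build_judge_prompt(original: str, paraphrase: str) -> str:
        return (
            "You are judging a rewrite that should preserve the meaning of the "
            "original sentence while containing no multiword expressions (MWEs).\n"
            f"Original: {original}\n"
            f"Rewrite: {paraphrase}\n"
            "Answer with two fields: meaning_preserved (yes/no), mwe_free (yes/no)."
        )

    print(build_judge_prompt(
        "The committee finally threw in the towel on the proposal.",
        "The committee finally abandoned the proposal.",
    ))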

AdMIRe extension
