wg3:wg3_meeting_2023-03-17
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
wg3:wg3_meeting_2023-03-17 [2023/03/21 16:24] – created gulsen.eryigit | wg3:wg3_meeting_2023-03-17 [2023/09/20 10:28] (current) – [WG3 1st Meeting Minutes -- 2023-03-17] joakim.nivre | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ==== COST Action CA21167: " | + | ===== WG3 1st Meeting |
- | WG3 Meeting 2023-03-17 | + | |
- | + | ||
+ | ==== Session 1 ==== | ||
+ | 10.45–11.00 Introduction to WG3 ({{ : | ||
+ | |||
+ | 11.00–11.30 Brainstorming on ideas and expectations | ||
+ | |||
+ | == Discussion questions: == | ||
+ | |||
+ | * What is most important for you in multilingual and cross-lingual NLP? | ||
+ | * What activities do you think we should prioritize? | ||
+ | * How can we work together to make progress towards our goals? | ||
+ | |||
+ | == Points raised: == | ||
+ | |||
+ | * Large language models are most important | ||
+ | * Articulating linguistic theories underlying tools | ||
+ | * Defining idiosyncrasy and diversity | ||
+ | * The user perspective is important | ||
+ | * Supporting low-resource languages through cross-lingual technology | ||
+ | * Supporting low-resource languages through annotation tools | ||
+ | * Supporting low-resource languages through data collection | ||
+ | * Supporting low-resource languages with semantics | ||
+ | * Tools for all languages – start with morphology | ||
+ | * Low-resource language is not a homogeneous concept | ||
+ | * Building resources for specific languages (Serbian) | ||
+ | * Linking corpus resources between languages | ||
+ | * Standardized tools applicable to different languages | ||
+ | * Evaluation of tools – coordinate with other WGs | ||
+ | * Tracking evaluation status for different types of tools | ||
+ | * Improved benchmarking and experimental design | ||
+ | * Organize shared tasks | ||
+ | |||
+ | | ||
+ | 11.30–12.00 Initial discussion on documentation of tools | ||
+ | |||
+ | |||
+ | == Discussion questions: == | ||
+ | |||
+ | * Which types of tools do we want to include? | ||
+ | * Where do we want to keep the documentation? | ||
+ | * How do we create this documentation/ | ||
+ | |||
+ | == Points raised: == | ||
+ | |||
+ | * A huge multidimensional matrix | ||
+ | * A shared repository | ||
+ | * Tools shared between typologically similar languages | ||
+ | * Consider end users | ||
+ | * Too many languages have nothing – document what is missing rather than what exists | ||
+ | * Connect to CLARIN | ||
+ | * Flagship project on MWE | ||
+ | * Include all tools or be selective? | ||
+ | * What about commercial tools? | ||
+ | * What about tools without documentation? | ||
+ | |||
+ | == WG tasks emerging from the discussion: == | ||
+ | |||
+ | * Define multidimensional taxonomy of tools for documentation | ||
+ | * Define infrastructure and procedure for creating documentation | ||
+ | |||
+ | |||
+ | ==== Session 2 ==== | ||
+ | |||
+ | 13.30–13.35 Recap of Session 1 (for new participants) | ||
+ | |||
+ | 13.35–14.20 Initial discussion on evaluation campaigns | ||
+ | |||
+ | == Background on goals and previous shared tasks == | ||
+ | ({{ : | ||
+ | |||
+ | == Brainstorming – define a novel shared task/ | ||
+ | |||
+ | * How is the task defined? | ||
+ | * What are the evaluation metrics? | ||
+ | * What kind of data is needed? | ||
+ | * Which languages should be included? | ||
+ | |||
+ | == Ideas: == | ||
+ | |||
+ | * Task = provide resources for shared tasks (eval metrics, test sets) | ||
+ | * Instead of a shared task, build a dynamic leaderboard for LMs | ||
+ | * Compare “traditional methods” to LMs on UD and MWE data | ||
+ | * UD parsing with only surprise test languages, minimize training data | ||
+ | * NLP tasks on top of UD data using linguistically defined embeddings | ||
+ | * Distinguish similar languages or dialects (for example, using MWEs) | ||
+ | * Objective: make every language appear at the center of the world | ||
+ | * Collect idiom data using LLMs, evaluate on gold data | ||
+ | |||
+ | |||
+ | 14.20–14.30 Next steps | ||
+ | |||
+ | * Next WG3 meeting in Istanbul, September 8, 2023 | ||
+ | * We will focus on documentation of tools | ||
+ | * Two tasks in preparation for the meeting: | ||
+ | - A taxonomy of multi- and cross-lingual language technology | ||
+ | - An infrastructure for multi- and cross-lingual language technology | ||
+ | Volunteers for these tasks are encouraged to contact WG leaders by email | ||
+ | |||
+ | |||
+ | 14.30–14.45 Presentation of the European Language Equality project ({{ : | ||
wg3/wg3_meeting_2023-03-17.txt · Last modified: 2023/09/20 10:28 by joakim.nivre