
wg3:wg3_meeting_2023-03-17

wg3:wg3_meeting_2023-03-17 [2023/03/21 16:24] – created gulsen.eryigit
wg3:wg3_meeting_2023-03-17 [2023/03/21 16:54] gulsen.eryigit
Line 1: Line 1:
-==== COST Action CA21167: "Universality, diversity and idiosyncrasy in language technology"
-WG3 Meeting 2023-03-17 Minutes ====
-
+====WG3 1st Meeting Minutes - 2023-03-17 =====
  
  
 +==== Session 1 ====
 +
 +10.45–11.00 Introduction to WG3 (slides)
 +
 +11.00–11.30 Brainstorming on ideas and expectations
 +
 +== Discussion questions: ==
 +
 +  * What is most important for you in multilingual and cross-lingual NLP?
 +  * What activities do you think we should prioritize?
 +  * How can we work together to make progress towards our goals?
 +
 +== Points raised: ==
 +
 +  * Large language models are most important
 +  * Articulating linguistic theories underlying tools
 +  * Defining idiosyncrasy and diversity
 +  * The user perspective is important
 +  * Supporting low-resource languages through cross-lingual technology
 +  * Supporting low-resource languages through annotation tools
 +  * Supporting low-resource languages through data collection
 +  * Supporting low-resource languages with semantics
 +  * Tools for all languages – start with morphology
 +  * Low-resource language is not a homogeneous concept
 +  * Building resources for specific languages (Serbian)
 +  * Linking corpus resources between languages
 +  * Standardized tools applicable to different languages
 +  * Evaluation of tools – coordinate with other WGs
 +  * Tracking evaluation status for different types of tools
 +  * Improved benchmarking and experimental design
 +  * Organize shared tasks
 +
 +  
 +11.30–12.00 Initial discussion on documentation of tools
 +
 +
 +== Discussion questions: ==
 +
 +  * Which types of tools do we want to include?
 +  * Where do we want to keep the documentation?
 +  * How do we create this documentation/inventory?
 +
 +== Points raised: ==
 +
 +  * A huge multidimensional matrix
 +  * A shared repository
 +  * Tools shared between typologically similar languages
 +  * Consider end users
 +  * Too many languages have nothing – document what is missing rather than what exists
 +  * Connect to CLARIN
 +  * Flagship project on MWE 
 +  * Include all tools or be selective? 
 +  * What about commercial tools? 
 +  * What about tools without documentation?
 +
 +== WG tasks emerging from the discussion: ==
 +
 +  * Define multidimensional taxonomy of tools for documentation
 +  * Define infrastructure and procedure for creating documentation 
 +
 +
 +==== Session 2 ====
 +
 +13.30–13.35 Recap of Session 1 (for new participants)
 +
 +13.35–14.20 Initial discussion on evaluation campaigns 
 +
 +== Background on goals and previous shared tasks (slides) ==
 +
 +== Brainstorming – define a novel shared task/evaluation campaign: ==
 +
 +  * How is the task defined?
 +  * What are the evaluation metrics?
 +  * What kind of data is needed?
 +  * Which languages should be included?
 +
 +== Ideas: ==
 +
 +  * Task = provide resources for shared tasks (eval metrics, test sets)
 +  * Instead of a shared task, build a dynamic leaderboard for LMs
 +  * Compare “traditional methods” to LMs on UD and MWE data
 +  * UD parsing with only surprise test languages, minimize training data
 +  * NLP tasks on top of UD data using linguistically defined embeddings
 +  * Distinguish similar languages or dialects (for example, using MWEs)
 +  * Objective: make every language appear at the center of the world
 +  * Collect idiom data using LLMs, evaluate on gold data
 +
 +
 +14.20–14.30 Next steps
 +
 +  * Next WG3 meeting in Istanbul, September 8, 2023
 +  * We will focus on documentation of tools
 +  * Two tasks in preparation for the meeting:
 +    * A taxonomy of multi- and cross-lingual language technology
 +    * An infrastructure for multi- and cross-lingual language technology
 +  * Volunteers for these tasks are encouraged to contact WG leaders by email
 +
 +
 +14.30–14.45 Presentation of the European Language Equality project (slides)
  
  
  
wg3/wg3_meeting_2023-03-17.txt · Last modified: 2023/09/20 10:28 by joakim.nivre