Differences

This shows you the differences between two versions of the page.

wg3:wg3_meeting_2023-03-17 [2023/03/21 16:43] gulsen.eryigit
wg3:wg3_meeting_2023-03-17 [2023/09/20 10:28] (current) – [WG3 1st Meeting Minutes -- 2023-03-17] joakim.nivre
Line 1: Line 1:
-===== WG3 1st Meeting Minutes 2023-03-17 =====
+===== WG3 1st Meeting Minutes – 2023-03-17 =====
  
  
-=== Session 1 ===
+==== Session 1 ====
  
-  * 10.45–11.00 Introduction to WG3 (slides)
+10.45–11.00 Introduction to WG3 ({{ :wg3:WG3_1stMeeting_Slides.pdf |slides}})
  
-  * 11.00–11.30 Brainstorming on ideas and expectations
+11.00–11.30 Brainstorming on ideas and expectations
 + 
 +== Discussion questions: == 
 + 
 +  * What is most important for you in multilingual and cross-lingual NLP? 
 +  * What activities do you think we should prioritize? 
 +  * How can we work together to make progress towards our goals? 
 + 
 +== Points raised: == 
 + 
 +  * Large language models are most important 
 +  * Articulating linguistic theories underlying tools 
 +  * Defining idiosyncrasy and diversity 
 +  * The user perspective is important 
 +  * Supporting low-resource languages through cross-lingual technology 
 +  * Supporting low-resource languages through annotation tools 
 +  * Supporting low-resource languages through data collection 
 +  * Supporting low-resource languages with semantics 
 +  * Tools for all languages – start with morphology 
 +  * Low-resource language is not a homogeneous concept 
 +  * Building resources for specific languages (Serbian) 
 +  * Linking corpus resources between languages 
 +  * Standardized tools applicable to different languages 
 +  * Evaluation of tools – coordinate with other WGs 
 +  * Tracking evaluation status for different types of tools 
 +  * Improved benchmarking and experimental design 
 +  * Organize shared tasks
  
      
 +11.30–12.00 Initial discussion on documentation of tools
 +
 +
 +== Discussion questions: ==
 +
 +  * Which types of tools do we want to include?
 +  * Where do we want to keep the documentation?
 +  * How do we create this documentation/inventory?
 +
 +== Points raised: ==
 +
 +  * A huge multidimensional matrix
 +  * A shared repository
 +  * Tools shared between typologically similar languages
 +  * Consider end users
 +  * Too many languages have nothing – document what is missing rather than what exists
 +  * Connect to CLARIN
 +  * Flagship project on MWE 
 +  * Include all tools or be selective? 
 +  * What about commercial tools? 
 +  * What about tools without documentation?
 +
 +== WG tasks emerging from the discussion: ==
 +
 +  * Define multidimensional taxonomy of tools for documentation
 +  * Define infrastructure and procedure for creating documentation 
 +
 +
 +==== Session 2 ====
 +
 +13.30–13.35 Recap of Session 1 (for new participants)
 +
 +13.35–14.20 Initial discussion on evaluation campaigns 
 +
 +== Background on goals and previous shared tasks == 
 +({{ :wg3:WG3_1stMeeting_Slides.pdf |slides}})
 +
 +== Brainstorming – define a novel shared task/evaluation campaign: ==
 +
 +  * How is the task defined?
 +  * What are the evaluation metrics?
 +  * What kind of data is needed?
 +  * Which languages should be included?
 +
 +== Ideas: ==
 +
 +  * Task = provide resources for shared tasks (eval metrics, test sets)
 +  * Instead of a shared task, build a dynamic leaderboard for LMs
 +  * Compare “traditional methods” to LMs on UD and MWE data
 +  * UD parsing with only surprise test languages, minimize training data
 +  * NLP tasks on top of UD data using linguistically defined embeddings
 +  * Distinguish similar languages or dialects (for example, using MWEs)
 +  * Objective: make every language appear at the center of the world
 +  * Collect idiom data using LLMs, evaluate on gold data
 +
  
 +14.20–14.30 Next steps
  
 +  * Next WG3 meeting in Istanbul, September 8, 2023
 +  * We will focus on documentation of tools
 +  * Two tasks in preparation for the meeting:
 +  - A taxonomy of multi- and cross-lingual language technology
 +  - An infrastructure for multi- and cross-lingual language technology
 +Volunteers for these tasks are encouraged to contact WG leaders by email
  
  
 +14.30–14.45 Presentation of the European Language Equality project ({{ :wg3:WG3_1stMeeting_Slides_ELE.pdf |slides}})
  
  
wg3/wg3_meeting_2023-03-17.1679413433.txt.gz · Last modified: 2023/03/21 16:43 by gulsen.eryigit