===== WG3 1st Meeting Minutes – 2023-03-17 ===== ==== Session 1 ==== 10.45–11.00 Introduction to WG3 ({{ :wg3:WG3_1stMeeting_Slides.pdf |slides}}) 11.00–11.30 Brainstorming on ideas and expectations == Discussion questions: == * What is most important for you in multilingual and cross-lingual NLP? * What activities do you think we should prioritize? * How can we work together to make progress towards our goals? == Points raised: == * Large language models are most important * Articulating linguistic theories underlying tools * Defining idiosyncrasy and diversity * The user perspective is important * Supporting low-resource languages through cross-lingual technology * Supporting low-resource languages through annotation tools * Supporting low-resource languages through data collection * Supporting low-resource languages with semantics * Tools for all languages – start with morphology * Low-resource language is not a homogeneous concept * Building resources for specific languages (Serbian) * Linking corpus resources between languages * Standardized tools applicable to different languages * Evaluation of tools – coordinate with other WGs * Tracking evaluation status for different types of tools * Improved benchmarking and experimental design * Organize shared tasks 11.30–12.00 Initial discussion on documentation of tools == Discussion questions: == * Which types of tools do we want to include? * Where do we want to keep the documentation? * How do we create this documentation/inventory? == Points raised: == * A huge multidimensional matrix * A shared repository * Tools shared between typologically similar languages * Consider end users * Too many languages have nothing – document what is missing rather than what exists * Connect to CLARIN * Flagship project on MWE * Include all tools or be selective? * What about commercial tools? * What about tools without documentation? == WG tasks emerging from the discussion: == * Define multidimensional taxonomy of tools for documentation * Define infrastructure and procedure for creating documentation ==== Session 2 ==== 13.30–13.35 Recap of Session 1 (for new participants) 13.35–14.20 Initial discussion on evaluation campaigns == Background on goals and previous shared tasks == ({{ :wg3:WG3_1stMeeting_Slides.pdf |slides}}) == Brainstorming – define a novel shared task/evaluation campaign: == * How is the task defined? * What are the evaluation metrics? * What kind of data is needed? * Which languages should be included? == Ideas: == * Task = provide resources for shared tasks (eval metrics, test sets) * Instead of a shared task, build a dynamic leaderboard for LMs * Compare “traditional methods” to LMs on UD and MWE data * UD parsing with only surprise test languages, minimize training data * NLP tasks on top of UD data using linguistically defined embeddings * Distinguish similar languages or dialects (for example, using MWEs) * Objective: make every language appear at the center of the world * Collect idiom data using LLMs, evaluate on gold data 14.20–14.30 Next steps * Next WG3 meeting in Istanbul, September 8, 2023 * We will focus on documentation of tools * Two tasks in preparation for the meeting: - A taxonomy of multi- and cross-lingual language technology - An infrastructure for multi- and cross-lingual language technology Volunteers for these tasks are encouraged to contact WG leaders by email 14.30–14.45 Presentation of the European Language Equality project ({{ :wg3:WG3_1stMeeting_Slides_ELE.pdf |slides}})