wg1:wg1

Differences

This shows you the differences between two versions of the page.

wg1:wg1 [2024/04/10 14:25] – [WG1 Tasks] bruno.guillaume
wg1:wg1 [2024/06/11 14:20] – [Upcoming meetings] added Task 1.3 meeting atul.kumar.ojha
Line 30: Line 30:
  
 ==== Upcoming meetings ==== ==== Upcoming meetings ====
 +  * WG Task 1.3 Meeting (online) - 26 June 2024, 11:00 **CEST**
   * WG Meeting 8 (online) - 11 April 2024, 09:00 **CEST**   * WG Meeting 8 (online) - 11 April 2024, 09:00 **CEST**
   * WG Meeting 9 (online) - 11 June 2024, 13:30 **CEST**   * WG Meeting 9 (online) - 11 June 2024, 13:30 **CEST**
Line 47: Line 48:
 ==== WG1 Tasks ==== ==== WG1 Tasks ====
   * **Task 1.1: Linguistic typology and multilingual corpus annotation**   * **Task 1.1: Linguistic typology and multilingual corpus annotation**
 +    * __Leaders / Contacts__: André Coneglian, A. Seza Doğruöz
 +    * __Objectives__: Discuss how Linguistic Typology, Corpus Annotation and Universal Dependencies can be brought into mutual relevance. More specifically, discuss how Linguistic Typology and Corpus Annotation can be explored to provide insights into the development of treebanks for new languages not in the UD database.
 +    * __Work plan__:
 +       - Determine ways in which linguistic typology can help in the trade-off between universality and language-specific phenomena in corpus annotation: a systematic overview of problematic (or difficult) phenomena for annotation (e.g., noun incorporation), in order to evaluate what solutions have been proposed and whether unification is possible.
 +       - Take into account less-resourced languages in corpus annotation so as to create new annotated corpora
 +       - More broadly, assess how annotated treebanks (particularly UD treebanks) can figure in typological research
 +    * __How can I contribute:__ <TBA>
 +    * __Documents / Links__:
       * [[https://docs.google.com/document/d/1fNRToU-LR7MQAQl3CzkDHqxxPXDBy5sWZAByxiXVyQg/edit?usp=sharing|Minutes]] from the task meetings       * [[https://docs.google.com/document/d/1fNRToU-LR7MQAQl3CzkDHqxxPXDBy5sWZAByxiXVyQg/edit?usp=sharing|Minutes]] from the task meetings
       * [[https://docs.google.com/document/d/1QbO0bTfWXSIIuD5M-W_nmy-62m6XGHkVjvV7ta2aHag/edit|Agenda]] and [[https://docs.google.com/presentation/d/1ygvOkl3MymPtEB-Wt6OA66Di5pvZBaAPrWZHnAel1T8/edit#slide=id.g2b7b733a998_0_6|report]] from the Naples 2024 meeting       * [[https://docs.google.com/document/d/1QbO0bTfWXSIIuD5M-W_nmy-62m6XGHkVjvV7ta2aHag/edit|Agenda]] and [[https://docs.google.com/presentation/d/1ygvOkl3MymPtEB-Wt6OA66Di5pvZBaAPrWZHnAel1T8/edit#slide=id.g2b7b733a998_0_6|report]] from the Naples 2024 meeting
Line 60: Line 69:
      
   * **Task 1.3: Extensions and updates to morphosyntactic annotation guidelines**   * **Task 1.3: Extensions and updates to morphosyntactic annotation guidelines**
 +    * __Leaders / Contacts:__ Atul Kr. Ojha, Daniel Zeman
 +    * __Objectives:__ The general objective of the task is to improve the linguistic part of annotation activities, focusing in particular on the Universal Dependencies treebanks (because MWE annotation guidelines are the focus of Task 1.2). In order to not duplicate efforts, most of the work is done directly in the UD infrastructure.
 +      * Subtask **A:** Issues in the [[https://github.com/UniversalDependencies/docs/issues|UD Issue Tracker]]. Select issues of interest, discuss them from the perspective of the languages we know (either during a meeting or directly in the issue tracker, where non-members of UniDive can also take part), and propose a solution. If UniDivers have issues that are not yet present in the issue tracker, they should create a new issue there.
 +      * Subtask **B:** Construction-oriented guidelines. The UD website is relatively good as a reference manual, with separate pages for individual part-of-speech tags, morphological features and relations in individual languages. It is less good at providing the big picture, with a holistic solution for individual constructions and strategies, although a growing number of documentation pages attempt to close this gap. Since 2018, there has also been an [[https://universaldependencies.org/workgroups/newdoc/index.html|incubator for construction-oriented documentation]]. We can help this documentation grow, again with a typologically varied perspective, given the joint language expertise present in UniDive.
 +    * __How can I contribute:__
 +      * Join the ongoing discussions on GitHub (UD issue tracker, see the link above).
 +      * If you can write part of the construction-oriented documentation, get in touch with the task leaders.
 +    * __Documents / Links:__
 +      * [[https://docs.google.com/spreadsheets/d/1ANohImm94mhug_Sf0n7StIKQVipW2_63n93Vhl-dP4w/edit#gid=2095968102|Expression of interest]]
       * [[https://docs.google.com/document/d/1Z6MkRiOWWud5Yj5DIY2KH-pEV5VCZwWhqs4IVuMqovc/edit?usp=sharing|Minutes]] from the task meetings       * [[https://docs.google.com/document/d/1Z6MkRiOWWud5Yj5DIY2KH-pEV5VCZwWhqs4IVuMqovc/edit?usp=sharing|Minutes]] from the task meetings
       * [[https://docs.google.com/document/d/1V2844LA8VU76T6vojQ4LEVxYgB_sI4AZ1_QF1WVQIkE/edit#heading=h.jepvhma8ziah|Agenda]] and [[https://docs.google.com/presentation/d/1ygvOkl3MymPtEB-Wt6OA66Di5pvZBaAPrWZHnAel1T8/edit#slide=id.g2b7b733a998_0_19|report]] from the Naples 2024 meeting       * [[https://docs.google.com/document/d/1V2844LA8VU76T6vojQ4LEVxYgB_sI4AZ1_QF1WVQIkE/edit#heading=h.jepvhma8ziah|Agenda]] and [[https://docs.google.com/presentation/d/1ygvOkl3MymPtEB-Wt6OA66Di5pvZBaAPrWZHnAel1T8/edit#slide=id.g2b7b733a998_0_19|report]] from the Naples 2024 meeting
  
   * **Task 1.4: Sharing tools, formats, and infrastructure**   * **Task 1.4: Sharing tools, formats, and infrastructure**
-    * __Leaders / Contacts__: Frantisek Forgac, Bruno Guillaume +    * __Leaders / Contacts__: František Forgáč, Bruno Guillaume
     * __Objectives__: The general objective of the task is to improve the technical part of annotation activities, focusing on tools, file formats and storage infrastructures. We are currently focusing on two more specific objectives:     * __Objectives__: The general objective of the task is to improve the technical part of annotation activities, focusing on tools, file formats and storage infrastructures. We are currently focusing on two more specific objectives:
        * Subtask **A**: Provide an overview of existing software and/or tools that support manual linguistic annotation        * Subtask **A**: Provide an overview of existing software and/or tools that support manual linguistic annotation
        * Subtask **B**: Evaluate the pros and cons of tabular formats (such as CoNLL-U) currently used in the UD and Parseme projects        * Subtask **B**: Evaluate the pros and cons of tabular formats (such as CoNLL-U) currently used in the UD and Parseme projects
     * __Workplan__:      * __Workplan__: 
-       * Subtask **A**: The specific objective is to create a comparison table of available manual annotation tools, with a focus on UD and Parseme interests (i.e. morpho-syntactic and multiword expression annotations). The next steps are: +       * Subtask **A**: The specific objective is to create a comparison table of available manual annotation tools for morpho-syntactic and multiword expression annotations. A survey will be proposed in the upcoming weeks to collect feedback and to produce the final version of the table. 
-          * Consolidate the set of features to be used in the comparison (the rows of the table) +       * Subtask **B**: Conduct a detailed analysis of the advantages and disadvantages of the tabular annotation formats, specifically CoNLL-U, as utilized in the Universal Dependencies (UD) and PARSEME projects. A first draft of an evolution of the formats currently used will be proposed for discussion and testing.
-          * Create a survey to collect information about each annotation tool +
-          * Analyse the results of the survey and produce the final version of the table. +
-       * Subtask **B**: Conduct a detailed analysis of the advantages and disadvantages of the tabular annotation formats, specifically CoNLL-U, as utilized in the Universal Dependencies (UD) and PARSEME projects. The next steps are: +
-          * Develop a Schema/Definition for Structured Data Format: Consider framing this as part of a shared task in the future +
-          * Refine Data Encoding Standards: Currently, UD prescribes both WHAT to encode (the content) and HOW to encode it (the format). Ideally, these aspects should be decoupled: +
-             * The format should dictate HOW to encode data, providing the structural means. +
-             * Guidelines like UD or others should specify WHAT can be encoded, focusing on content restrictions. This separation would enhance the format's flexibility and adaptability to new types of annotations, while the guidelines ensure relevance of data. +
-          * Generate Initial Working Examples +
-             * Convert existing datasets to test the new format. +
-             * Evaluate and compare these results with those of CoNLL-U and possibly enhanced formats such as CoNLL-U Plus.+
     * __How can I contribute?__      * __How can I contribute?__ 
       * Join the ongoing discussions on GitHub (links above)       * Join the ongoing discussions on GitHub (links above)
Line 89: Line 97:
       * GitHub discussions about [[https://github.com/UniDive/WG1/discussions/1|the comparison table]] and about [[https://github.com/UniDive/WG1/discussions/2|file formats]]       * GitHub discussions about [[https://github.com/UniDive/WG1/discussions/1|the comparison table]] and about [[https://github.com/UniDive/WG1/discussions/2|file formats]]
       * Document used in the Task 1.4 session at the WG1 meeting in Naples (February 2024): [[https://docs.google.com/presentation/d/1mCdRAEb7KDgvJEd_QXwzgJHv2Jc3KGOnInFEERQmSUc/edit#slide=id.g2b694e49d96_0_0|Slides]] and [[https://docs.google.com/document/d/1H0-C2bqSD5EzoISxUYnfE-5ZLMhuE8XOa-FrfZANDfk/edit#heading=h.pmv33xdtvdy1|Agenda]]       * Document used in the Task 1.4 session at the WG1 meeting in Naples (February 2024): [[https://docs.google.com/presentation/d/1mCdRAEb7KDgvJEd_QXwzgJHv2Jc3KGOnInFEERQmSUc/edit#slide=id.g2b694e49d96_0_0|Slides]] and [[https://docs.google.com/document/d/1H0-C2bqSD5EzoISxUYnfE-5ZLMhuE8XOa-FrfZANDfk/edit#heading=h.pmv33xdtvdy1|Agenda]]
 +
 +  * **Task 1.5: Annotation of Spoken data**
 +    * __Leaders / Contacts__: Kaja Dobrovoljc, Sylvain Kahane
 +    * __Objectives__: TBA
 ==== Training ==== ==== Training ====
   * [[https://unidive.lisn.upsaclay.fr/doku.php?id=other-events:webinar-1#outcomes|UniDive webinar]] for newcomers to Universal Dependencies, PARSEME and/or Grew-match   * [[https://unidive.lisn.upsaclay.fr/doku.php?id=other-events:webinar-1#outcomes|UniDive webinar]] for newcomers to Universal Dependencies, PARSEME and/or Grew-match
wg1/wg1.txt · Last modified: 2024/06/11 15:38 by dan.zeman