Differences

This shows you the differences between two versions of the page.

wg1:wg1 [2024/04/08 15:05] – [Upcoming meetings] bruno.guillaume
wg1:wg1 [2024/04/10 14:25] (current) – [WG1 Tasks] bruno.guillaume
Line 55:
     * __Workplan__: After performing pilot annotation based on the [[https://docs.google.com/document/d/1bvjSwHpj8I2zJXmftCpx19u3BNWdKtdeg21f4YVHhWw/edit#heading=h.gt53hu7d9q5p|draft guidelines for nominal MWEs]] during WG1 Day in Naples, we are currently working on transforming the annotator feedback into GitLab issue discussions. We plan to have a consolidated version of the guidelines for all MWEs (verbal, nominal, modifier, functional) ready by the end of 2024, so as to conduct a large-scale annotation campaign in early 2025. The results will be used for organizing a shared task on automatic MWE identification.
     * __How can I contribute:__ <TBA>
-    * __Documents__
+    * __Documents / Links__
         * [[https://docs.google.com/document/d/1jvOGO2Q_pJpm1rB0B6sAprKzEh2n95Jc-VTktaAW_j8/edit?usp=sharing|Minutes]] from the Task 1.2 meetings
         * White paper proposition of the [[https://nejlt.ep.liu.se/article/view/4453|roadmap for UD/PARSEME unification]]
Line 64:

   * **Task 1.4: Sharing tools, formats, and infrastructure**
-      * [[https://docs.google.com/document/d/1H0-C2bqSD5EzoISxUYnfE-5ZLMhuE8XOa-FrfZANDfk/edit#heading=h.pmv33xdtvdy1|Agenda]] and [[https://docs.google.com/presentation/d/1ygvOkl3MymPtEB-Wt6OA66Di5pvZBaAPrWZHnAel1T8/edit#slide=id.g2b7c5723f86_3_102|report]] from the Naples 2024 meeting
 +    * __Leaders / Contacts__: Frantisek Forgac, Bruno Guillaume
 +    * __Objectives__: The general objective of the task is to improve the technical side of annotation activities, focusing on tools, file formats and storage infrastructures. We are currently focusing on two more specific objectives:
 +       * Subtask **A**: Provide an overview of existing software and/or tools that support manual linguistic annotation 
 +       * Subtask **B**: Evaluate the pros and cons of tabular formats (such as CoNLL-U) currently used in the UD and PARSEME projects
 +    * __Workplan__:  
 +       * Subtask **A**: The specific objective is to create a comparison table of available manual annotation tools, with a focus on UD and PARSEME interests (i.e. morpho-syntactic and multiword expression annotations). The next steps are:
 +          * Consolidate the set of features to be used in the comparison (the rows of the table)
 +          * Create a survey to collect information about each annotation tool 
 +          * Analyse the results of the survey and produce the final version of the table. 
 +       * Subtask **B**: Conduct a detailed analysis of the advantages and disadvantages of tabular annotation formats, specifically CoNLL-U, as used in the Universal Dependencies (UD) and PARSEME projects. The next steps are:
 +          * Develop a Schema/Definition for Structured Data Format: Consider framing this as part of a shared task in the future 
 +          * Refine Data Encoding Standards: Currently, UD prescribes both WHAT to encode (the content) and HOW to encode it (the format). Ideally, these aspects should be decoupled: 
 +             * The format should dictate HOW to encode data, providing the structural means. 
 +             * Guidelines such as UD should specify WHAT can be encoded, focusing on content restrictions. This separation would make the format more flexible and adaptable to new types of annotations, while the guidelines ensure the relevance of the data.
 +          * Generate Initial Working Examples 
 +             * Convert existing datasets to test the new format. 
 +             * Evaluate and compare these results with those of CoNLL-U and possibly enhanced formats such as CoNLL-U Plus. 
 +    * __How can I contribute?__  
 +      * Join the ongoing discussions on GitHub (links under Documents below)
 +      * Stay tuned for the call to complete the survey 
 +      * Join the task co-leaders team 
 +    * __Documents__ 
 +      * [[https://docs.google.com/spreadsheets/d/1FZo6sSdIkxXCm9p9FcV8PzVKCeXot6apnnA-zXYhRwk/edit#gid=0|Comparison table]] (WIP) 
 +      * GitHub discussions about [[https://github.com/UniDive/WG1/discussions/1|the comparison table]] and about [[https://github.com/UniDive/WG1/discussions/2|file formats]] 
 +      * Document used in the Task 1.4 session at the WG1 meeting in Naples (February 2024): [[https://docs.google.com/presentation/d/1mCdRAEb7KDgvJEd_QXwzgJHv2Jc3KGOnInFEERQmSUc/edit#slide=id.g2b694e49d96_0_0|Slides]] and [[https://docs.google.com/document/d/1H0-C2bqSD5EzoISxUYnfE-5ZLMhuE8XOa-FrfZANDfk/edit#heading=h.pmv33xdtvdy1|Agenda]]
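The decoupling proposed for Subtask B (the format says HOW to encode, the guidelines say WHAT may be encoded) can be illustrated with a minimal Python sketch. It is not part of the WG1 workplan: the function names `parse_token_line` and `check_upos` are hypothetical, and the sketch only assumes the ten standard CoNLL-U columns and the 17 UD universal POS tags.

```python
# Format level (HOW): the ten tab-separated CoNLL-U columns, in order.
CONLLU_FIELDS = ("ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC")

# Guideline level (WHAT): the universal POS tags allowed by UD.
UD_UPOS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
           "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}


def parse_token_line(line):
    """Format level: split one token line into named columns.

    Hypothetical helper; it knows nothing about which values are legal.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(
            f"expected {len(CONLLU_FIELDS)} columns, got {len(values)}")
    return dict(zip(CONLLU_FIELDS, values))


def check_upos(token, allowed=UD_UPOS):
    """Guideline level: report UPOS values outside the allowed set.

    Hypothetical helper; "_" (unspecified) is always accepted. Another
    project can pass its own `allowed` set without touching the parser.
    """
    upos = token["UPOS"]
    if upos != "_" and upos not in allowed:
        return [f"token {token['ID']}: unknown UPOS {upos!r}"]
    return []


if __name__ == "__main__":
    token = parse_token_line(
        "1\tdogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_")
    print(token["FORM"], check_upos(token))
```

Because the parser never inspects column values, a different set of guidelines can supply its own checks (or its own `allowed` set) while reusing the same format code.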
 ==== Training ====
   * [[https://unidive.lisn.upsaclay.fr/doku.php?id=other-events:webinar-1#outcomes|UniDive webinar]] for newcomers to Universal Dependencies, PARSEME and/or Grew-match
wg1/wg1.txt · Last modified: 2024/04/10 14:25 by bruno.guillaume