wg1:wg1
Differences
This shows you the differences between two versions of the page.
wg1:wg1 [2024/04/10 14:25] – [WG1 Tasks] bruno.guillaume | wg1:wg1 [2024/06/11 14:20] – [Upcoming meetings] added Task 1.3 meeting atul.kumar.ojha | ||
---|---|---|---|
Line 30: | Line 30: | ||
==== Upcoming meetings ==== | ==== Upcoming meetings ==== | ||
+ | * WG Task 1.3 Meeting (online) - 26 June 2024, 11:00 **CEST** | ||
* WG Meeting 8 (online) - 11 April 2024, 09:00 **CEST** | * WG Meeting 8 (online) - 11 April 2024, 09:00 **CEST** | ||
* WG Meeting 9 (online) - 11 June 2024, 13:30 **CEST** | * WG Meeting 9 (online) - 11 June 2024, 13:30 **CEST** | ||
Line 47: | Line 48: | ||
==== WG1 Tasks ==== | ==== WG1 Tasks ==== | ||
* **Task 1.1: Linguistic typology and multilingual corpus annotation** | * **Task 1.1: Linguistic typology and multilingual corpus annotation** | ||
+ | * __Leaders / Contacts__: André Coneglian, A. Seza Doğruöz | ||
+ | * __Objectives__: | ||
+ | * __Work plan__: | ||
+ | - Determine ways in which linguistic typology can help in the trade-off between universality and language specific phenomena in corpus annotation (Systematic overview of problematic (or difficult) phenomena for annotation (e.g., noun incorporation, | ||
+ | - Take into account less-resourced languages in corpus annotation so as to create new annotated corpora | ||
+ | - More broadly, assess how annotated treebanks (particularly UD treebanks) can figure in typological research | ||
+ | * __How can I contribute: | ||
+ | * __Documents / Links__: | ||
* [[https:// | * [[https:// | ||
* [[https:// | * [[https:// | ||
Line 60: | Line 69: | ||
| | ||
* **Task 1.3: Extensions and updates to morphosyntactic annotation guidelines** | * **Task 1.3: Extensions and updates to morphosyntactic annotation guidelines** | ||
+ | * __Leaders / Contacts:__ Atul Kr. Ojha, Daniel Zeman | ||
+ | * __Objectives: | ||
+ | * Subtask **A:** Issues in the [[https:// | ||
+ | * Subtask **B:** Construction-oriented guidelines. The UD website is relatively good as a reference manual, with separate pages for individual part-of-speech tags, morphological features and relations in individual languages. It is less good at providing the big picture with a holistic solution for individual constructions and strategies, although there is a growing number of documentation pages that attempt to close this gap. Since 2018, there has also been an [[https:// | ||
+ | * __How can I contribute: | ||
+ | * Join the ongoing discussions on GitHub (UD issue tracker, see the link above). | ||
+ | * If you can write part of the construction-oriented documentation, | ||
+ | * __Documents / Links:__ | ||
+ | * [[https:// | ||
* [[https:// | * [[https:// | ||
* [[https:// | * [[https:// | ||
* **Task 1.4: Sharing tools, formats, and infrastructure** | * **Task 1.4: Sharing tools, formats, and infrastructure** | ||
- | * __Leaders / Contacts__: | + | * __Leaders / Contacts__: |
* __Objectives__: | * __Objectives__: | ||
* Subtask **A**: Provide an overview of existing software and/or tools that support manual linguistic annotation | * Subtask **A**: Provide an overview of existing software and/or tools that support manual linguistic annotation | ||
* Subtask **B**: Evaluate the pros and cons of tabular formats (such as CoNLL-U) currently used in the UD and Parseme projects | * Subtask **B**: Evaluate the pros and cons of tabular formats (such as CoNLL-U) currently used in the UD and Parseme projects | ||
* __Workplan__: | * __Workplan__: | ||
- | * Subtask **A**: The specific objective is to create a comparison table of available manual annotation tools, with a focus on UD and Parseme interests (i.e. morpho-syntactic and multiword expression annotations). The next steps are: | + | * Subtask **A**: The specific objective is to create a comparison table of available manual annotation tools for morpho-syntactic and multiword expression annotations. | ||
- | * Consolidate the set of features to be used in the comparison (the rows of the tables) | + | * Subtask **B**: Conduct a detailed analysis of the advantages and disadvantages of the tabular annotation formats, specifically CoNLL-U, as utilized in the Universal Dependencies (UD) and PARSEME projects. |
- | * Create a survey | + | |
- | * Analyse the results of the survey and produce the final version of the table. | + | |
- | * Subtask **B**: Conduct a detailed analysis of the advantages and disadvantages of the tabular annotation formats, specifically CoNLL-U, as utilized in the Universal Dependencies (UD) and PARSEME projects. | + | |
- | * Develop a Schema/ | + | |
- | * Refine Data Encoding Standards: Currently, UD prescribes both WHAT to encode (the content) and HOW to encode it (the format). Ideally, these aspects should | + | |
- | * The format should dictate HOW to encode data, providing the structural means. | + | |
- | * Guidelines like UD or others should specify WHAT can be encoded, focusing on content restrictions. This separation would enhance the format' | + | |
- | * Generate Initial Working Examples | + | |
- | * Convert existing datasets to test the new format. | + | |
- | * Evaluate and compare these results with those of CoNLL-U and possibly enhanced formats such as CoNLL-U Plus. | + | |
* __How can I contribute? | * __How can I contribute? | ||
* Join the ongoing discussions on GitHub (links above) | * Join the ongoing discussions on GitHub (links above) | ||
Line 89: | Line 97: | ||
* GitHub discussions about [[https:// | * GitHub discussions about [[https:// | ||
* Document used in the Task 1.4 session at the WG1 meeting in Naples (February 2024): [[https:// | * Document used in the Task 1.4 session at the WG1 meeting in Naples (February 2024): [[https:// | ||
+ | |||
+ | * **Task 1.5: Annotation of Spoken data** | ||
+ | * __Leaders / Contacts__: Kaja Dobrovoljc, Sylvain Kahane | ||
+ | * __Objectives__: | ||
==== Training ==== | ==== Training ==== | ||
* [[https:// | * [[https:// |
wg1/wg1.txt · Last modified: 2024/06/11 15:38 by dan.zeman