wg1:wg1
Differences
This shows you the differences between two versions of the page.
wg1:wg1 [2024/04/08 15:05] – [Upcoming meetings] bruno.guillaume | wg1:wg1 [2024/05/07 10:31] – [WG1 Tasks] bruno.guillaume | ||
---|---|---|---|
Line 47: | Line 47: | ||
==== WG1 Tasks ==== | ==== WG1 Tasks ==== | ||
* **Task 1.1: Linguistic typology and multilingual corpus annotation** | * **Task 1.1: Linguistic typology and multilingual corpus annotation** | ||
+ | * __Leaders / Contacts__: André Coneglian, A. Seza Doğruöz | ||
+ | * __Objectives__: | ||
+ | * __Work plan__: | ||
+ | - Determine ways in which linguistic typology can help in the trade-off between universality and language specific phenomena in corpus annotation (Systematic overview of problematic (or difficult) phenomena for annotation (e.g., noun incorporation, | ||
+ | - Take into account less-resourced languages in corpus annotation so as to create new annotated corpora | ||
+ | - More broadly, assess how annotated treebanks (particularly UD treebanks) can figure in typological research | ||
+ | * __How can I contribute: | ||
+ | * __Documents / Links__: | ||
* [[https:// | * [[https:// | ||
* [[https:// | * [[https:// | ||
Line 55: | Line 63: | ||
* __Workplan__: | * __Workplan__: | ||
* __How can I contribute: | * __How can I contribute: | ||
- | * __Documents__ | + | * __Documents / Links__ |
* [[https:// | * [[https:// | ||
* White paper proposition of the [[https:// | * White paper proposition of the [[https:// | ||
Line 64: | Line 72: | ||
* **Task 1.4: Sharing tools, formats, and infrastructure** | * **Task 1.4: Sharing tools, formats, and infrastructure** | ||
- | * [[https:// | + | * __Leaders / Contacts__: Frantisek Forgac, Bruno Guillaume |
+ | * __Objectives__: | ||
+ | | ||
+ | * Subtask **B**: Evaluate the pros and cons of tabular formats (such as CoNLL-U) currently used in the UD and PARSEME projects | ||
+ | * __Workplan__: | ||
+ | * Subtask **A**: The specific objective is to create a comparison table of available manual annotation tools, with a focus on UD and PARSEME interests (i.e. morpho-syntactic and multiword expression annotations). The next steps are: | ||
+ | * Consolidate the set of features to be used in the comparison (the rows of the table) | ||
+ | * Create a survey to collect information about each annotation tool | ||
+ | * Analyse the results of the survey and produce the final version of the table. | ||
+ | * Subtask **B**: Conduct a detailed analysis of the advantages and disadvantages of the tabular annotation formats, specifically CoNLL-U, as utilized in the Universal Dependencies (UD) and PARSEME projects. The next steps are: | ||
+ | * Develop a Schema/ | ||
+ | * Refine Data Encoding Standards: Currently, UD prescribes both WHAT to encode (the content) and HOW to encode it (the format). Ideally, these aspects should be decoupled: | ||
+ | * The format should dictate HOW to encode data, providing the structural means. | ||
+ | * Guidelines like UD or others should specify WHAT can be encoded, focusing on content restrictions. This separation would enhance the format' | ||
+ | * Generate Initial Working Examples | ||
+ | * Convert existing datasets to test the new format. | ||
+ | * Evaluate and compare these results with those of CoNLL-U and possibly enhanced formats such as CoNLL-U Plus. | ||
+ | * __How can I contribute? | ||
+ | * Join the ongoing discussions on GitHub (links above) | ||
+ | * Stay tuned for the call to complete the survey | ||
+ | * Join the task co-leaders team | ||
+ | * __Documents__ | ||
+ | | ||
+ | * GitHub discussions about [[https:// | ||
+ | * Document used in the Task 1.4 session at the WG1 meeting in Naples (February 2024): | ||
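The decoupling proposed in the Subtask **B** work plan above (the format says HOW to encode, the guidelines say WHAT may be encoded) can be illustrated with a minimal Python sketch. It is not part of the task's deliverables, and the function names `parse_conllu_sentence` and `check_ud_upos` and the `SAMPLE` sentence are hypothetical, chosen only for illustration. The parser knows nothing beyond the ten tab-separated CoNLL-U columns; the UD-specific restriction (the 17 universal POS tags) lives in a separate checker that could be swapped for another guideline's rules.

```python
from typing import Dict, List

# Format layer (HOW): the ten tab-separated columns of a CoNLL-U token line.
CONLLU_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                  "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

# Content layer (WHAT): the 17 universal POS tags defined by UD.
UD_UPOS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
           "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}

Token = Dict[str, str]


def parse_conllu_sentence(block: str) -> List[Token]:
    """Hypothetical format-only parser: checks structure, not content."""
    tokens: List[Token] = []
    for line in block.splitlines():
        # Skip blank lines and sentence-level comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLLU_COLUMNS):
            raise ValueError(f"expected {len(CONLLU_COLUMNS)} columns, "
                             f"got {len(fields)}: {line!r}")
        tokens.append(dict(zip(CONLLU_COLUMNS, fields)))
    return tokens


def check_ud_upos(tokens: List[Token]) -> List[str]:
    """Hypothetical guideline check: UPOS must be a UD tag or '_'."""
    errors: List[str] = []
    for tok in tokens:
        # '_' marks an unspecified value (e.g. on multiword token ranges).
        if tok["UPOS"] != "_" and tok["UPOS"] not in UD_UPOS:
            errors.append(f"token {tok['ID']}: unknown UPOS {tok['UPOS']!r}")
    return errors


# Illustrative two-token sentence (made up for this sketch).
SAMPLE = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    sentence = parse_conllu_sentence(SAMPLE)
    print(check_ud_upos(sentence))
```

With this split, a project such as PARSEME could in principle reuse the same parser and plug in its own content checks, which is the kind of reuse the proposed separation aims at.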
==== Training ==== | ==== Training ==== | ||
* [[https:// | * [[https:// |
wg1/wg1.txt · Last modified: 2024/05/07 10:41 by bruno.guillaume