wg1:wg1
Differences
This shows you the differences between two versions of the page.
wg1:wg1 [2024/04/08 15:05] – [Upcoming meetings] bruno.guillaume | wg1:wg1 [2024/04/10 14:25] (current) – [WG1 Tasks] bruno.guillaume | ||
---|---|---|---|
Line 55: | Line 55: | ||
* __Workplan__: | * __Workplan__: | ||
* __How can I contribute: | * __How can I contribute: | ||
- | * __Documents__ | + | * __Documents / Links__ |
* [[https:// | * [[https:// | ||
* White paper proposition of the [[https:// | * White paper proposition of the [[https:// | ||
Line 64: | Line 64: | ||
* **Task 1.4: Sharing tools, formats, and infrastructure** | * **Task 1.4: Sharing tools, formats, and infrastructure** | ||
- | * [[https:// | + | * __Leaders / Contacts__: Frantisek Forgac, Bruno Guillaume |
+ | * __Objectives__: | ||
+ | | ||
+ | * Subtask **B**: Evaluate the pros and cons of tabular formats (such as CoNLL-U) currently used in the UD and PARSEME projects | ||
+ | * __Workplan__: | ||
+ | * Subtask **A**: The specific objective is to create a comparison table of available manual annotation tools, with a focus on UD and PARSEME interests (i.e. morpho-syntactic and multiword expression annotations). The next steps are: | ||
+ | * Consolidate the set of features to be used in the comparison (the rows of the table) | ||
+ | * Create a survey to collect information about each annotation tool | ||
+ | * Analyse the results of the survey and produce the final version of the table. | ||
+ | * Subtask **B**: Conduct a detailed analysis of the advantages and disadvantages of tabular annotation formats, specifically CoNLL-U, as used in the Universal Dependencies (UD) and PARSEME projects. The next steps are: | ||
+ | * Develop a Schema/ | ||
+ | * Refine Data Encoding Standards: Currently, UD prescribes both WHAT to encode (the content) and HOW to encode it (the format). Ideally, these aspects should be decoupled: | ||
+ | * The format should dictate HOW to encode data, providing the structural means. | ||
+ | * Guidelines like UD or others should specify WHAT can be encoded, focusing on content restrictions. This separation would enhance the format' | ||
+ | * Generate Initial Working Examples | ||
+ | * Convert existing datasets to test the new format. | ||
+ | * Evaluate and compare these results with those of CoNLL-U and possibly enhanced formats such as CoNLL-U Plus. | ||
+ | * __How can I contribute? | ||
+ | * Join the ongoing discussions on GitHub (links above) | ||
+ | * Stay tuned for the call to complete the survey | ||
+ | * Join the task co-leaders team | ||
+ | * __Documents__ | ||
+ | | ||
+ | * GitHub discussions about [[https:// | ||
+ | * Document used in the Task 1.4 session at the WG1 meeting in Naples (February 2024): | ||
==== Training ==== | ==== Training ==== | ||
* [[https:// | * [[https:// |
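The decoupling proposed in Subtask B (the format dictates HOW to encode, the guidelines specify WHAT can be encoded) can be illustrated with a minimal Python sketch. This is only an illustration and not part of the WG1 workplan: the function names `parse_token_line` and `check_upos` are hypothetical, and the sketch assumes the ten standard CoNLL-U columns and the UD universal POS tag list.

```python
# Format layer (HOW): the ten tab-separated CoNLL-U columns.
COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
           "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

# Guideline layer (WHAT): the UD universal POS tags, kept separate
# from the parser so another project could plug in its own tag set.
UD_UPOS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def parse_token_line(line):
    """Split one token line into named columns; performs no content checks."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(
            f"expected {len(COLUMNS)} tab-separated fields, got {len(fields)}"
        )
    return dict(zip(COLUMNS, fields))


def check_upos(token, allowed=UD_UPOS):
    """Return a list of content errors for the UPOS column ('_' means unspecified)."""
    tag = token["UPOS"]
    if tag != "_" and tag not in allowed:
        return [f"unknown UPOS tag {tag!r}"]
    return []


# Hypothetical example line, written here for illustration only.
example = "1\tdogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_"
token = parse_token_line(example)
ud_errors = check_upos(token)                                # checked against UD tags
other_errors = check_upos(token, frozenset({"N", "V"}))      # checked against another tag set
```

In this sketch the same parsed token is accepted under the UD tag set and rejected under a different, made-up tag set, without any change to the parsing code. That is the kind of separation between structure and content restrictions that the workplan describes.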
wg1/wg1.txt · Last modified: 2024/04/10 14:25 by bruno.guillaume