meetings:other-events:1st_unidive_training_school:courses
Differences
meetings:other-events:1st_unidive_training_school:courses [2024/05/30 15:22] – [Corpus annotation infrastructure] agata.savary | meetings:other-events:1st_unidive_training_school:courses [2024/06/22 10:12] (current) – agata.savary | ||
---|---|---|---|
Line 26: | Line 26: | ||
* **Pre-requisites**: | * **Pre-requisites**: | ||
* being concerned by syntactic annotation | * being concerned by syntactic annotation | ||
- | * ideally, having some data you want to annotate | + | * preparing |
* **Preparatory work** (offered in a parallel course by Bruno Guillaume): | * **Preparatory work** (offered in a parallel course by Bruno Guillaume): | ||
Line 32: | Line 32: | ||
* comparing UD and SUD annotation | * comparing UD and SUD annotation | ||
- | * **Further | + | * **Recommended |
* Lucien Tesnière (2015), [[https:// | * Lucien Tesnière (2015), [[https:// | ||
* Igor Mel’čuk (1988), Dependency Syntax: Theory and Practice. SUNY Press. | * Igor Mel’čuk (1988), Dependency Syntax: Theory and Practice. SUNY Press. | ||
Line 65: | Line 65: | ||
* **Pre-requisites** | * **Pre-requisites** | ||
* theoretical linguistics knowledge (parts of speech, inflection, syntactic structures) | * theoretical linguistics knowledge (parts of speech, inflection, syntactic structures) | ||
- | * Timothy Baldwin and Su Nam Kim. 2010. Multiword expressions. In Nitin Indurkhya and Fred J. Damerau, editors, Handbook of Natural Language Processing, 2nd edition, pages 267–292. CRC Press, Taylor and Francis Group, Boca Raton, FL, USA. | + | * prepare a parallel corpus or a monolingual one - see the [[https:// |
- | * **Preparatory work**: To be done by the trainees before the training school: | + | * **Recommended readings** |
- | * prepare a parallel corpus or a monolingual one; it would preferably contain a new language, a new dialect, or a new genre; by “new” we mean “not already covered in the PARSEME 1.3 corpus”. | + | * Timothy Baldwin and Su Nam Kim. 2010. Multiword expressions. In Nitin Indurkhya and Fred J. Damerau, editors, Handbook of Natural Language Processing, 2nd edition, pages 267–292. CRC Press, Taylor and Francis Group, Boca Raton, FL, USA. |
=====Corpus annotation infrastructure===== | =====Corpus annotation infrastructure===== | ||
Line 84: | Line 84: | ||
* **Contents** | * **Contents** | ||
- | | ||
* Session 1 (by Daniel Zeman & Agata Savary & Bruno Guillaume): **Git infrastructure** | * Session 1 (by Daniel Zeman & Agata Savary & Bruno Guillaume): **Git infrastructure** | ||
* Git for beginners | * Git for beginners | ||
Line 90: | Line 89: | ||
* PARSEME GitLab repositories | * PARSEME GitLab repositories | ||
* GitHub synchronisation in Grew" | * GitHub synchronisation in Grew" | ||
- | | ||
* Session 2 (by Bruno Guillaume jointly with Sylvain' | * Session 2 (by Bruno Guillaume jointly with Sylvain' | ||
* Corpus queries with Grew-match | * Corpus queries with Grew-match | ||
* UD vs. SUD | * UD vs. SUD | ||
* Corpus annotation with Arborator Grew | * Corpus annotation with Arborator Grew | ||
- | |||
* Session 3 (by Daniel Zeman & Agata Savary): **Corpus format validation** | * Session 3 (by Daniel Zeman & Agata Savary): **Corpus format validation** | ||
* File formats (CoNLL-U, CUPT) | * File formats (CoNLL-U, CUPT) | ||
Line 101: | Line 98: | ||
* PARSEME validator | * PARSEME validator | ||
* UD/PARSEME consistency | * UD/PARSEME consistency | ||
- | |||
* Session 4 (by Bruno Guillaume): **Advanced treebank querying and annotation** | * Session 4 (by Bruno Guillaume): **Advanced treebank querying and annotation** | ||
* querying PARSEME data | * querying PARSEME data | ||
* corpus pre-annotation | * corpus pre-annotation | ||
- | | ||
* Session 5 (by Daniel Zeman & Bruno Guillaume): **Corpus quality** | * Session 5 (by Daniel Zeman & Bruno Guillaume): **Corpus quality** | ||
* error mining and correcting with Grew-match | * error mining and correcting with Grew-match | ||
* fixing errors in text editors | * fixing errors in text editors | ||
- | |||
* Session 6 (by Daniel Zeman & Agata Savary): **Documentation and discussion on Git** | * Session 6 (by Daniel Zeman & Agata Savary): **Documentation and discussion on Git** | ||
* Documenting a corpus in README | * Documenting a corpus in README | ||
* UD GitHub issues | * UD GitHub issues | ||
* PARSEME GitLab issues | * PARSEME GitLab issues | ||
+ | |||
+ | * **Recommended readings** | ||
+ | * [[https:// | ||
+ | * Grew-match [[https:// | ||
+ | * [[https:// | ||
+ | * PARSEME corpus [[https:// | ||
+ | |||
meetings/other-events/1st_unidive_training_school/courses.1717075325.txt.gz · Last modified: by agata.savary