meetings:other-events:1st_unidive_training_school:courses
Differences
This shows you the differences between two versions of the page.
meetings:other-events:1st_unidive_training_school:courses [2024/04/17 14:19] – agata.savary | meetings:other-events:1st_unidive_training_school:courses [2024/05/30 15:22] (current) – [Corpus annotation infrastructure] agata.savary | ||
---|---|---|---|
Line 18: | Line 18: | ||
* **Exercises**: | * **Exercises**: | ||
- | * understanding the SUD (and UD) annotation scheme by exploring some treebanks with Grew-match (SUD_English, | + | * understanding the [[https:// |
* example of a SUD annotation from scratch, based on participants' data that are glossed and translated into English | * example of a SUD annotation from scratch, based on participants' data that are glossed and translated into English | ||
* creation of a project on ArboratorGrew | * creation of a project on ArboratorGrew | ||
Line 28: | Line 28: | ||
* ideally, having some data you want to annotate (please get in touch before the summer school to prepare the data) | * ideally, having some data you want to annotate (please get in touch before the summer school to prepare the data) | ||
- | * **Preparatory work**: | + | * **Preparatory work** |
- | * looking at treebanks on Grew-Match | + | * looking at treebanks on Grew-Match |
- | * comparing UD and SUD annotation | + | * comparing UD and SUD annotation |
- | * reading Gerdes et al. 2018 | + | |
- | * reading a book or a tutorial on dependency syntax: Mel’cuk 1988, Tesnière 2015, Osborne 2019, Kahane 2013 | + | |
- | =====Annotation multiword expressions for newcomers===== | + | * **Further readings**: |
+ | * Lucien Tesnière (2015), [[https:// | ||
+ | * Igor Mel’čuk (1988), Dependency Syntax: Theory and Practice. SUNY Press. | ||
+ | * Timothy Osborne (2019), A Dependency Grammar of English. Benjamins. | ||
+ | * Sylvain Kahane, 2003, [[https:// | ||
+ | * De Marneffe, M. C., Manning, C. D., Nivre, J., & Zeman, D. (2021). [[https:// | ||
+ | * Gerdes K., Guillaume B., Kahane S., Perrier G. (2018) [[https:// | ||
+ | * Gerdes K., Guillaume B., Kahane S, Perrier G. (2021) [[https:// | ||
+ | |||
+ | =====Annotation | ||
* **Trainers** | * **Trainers** | ||
* [[https:// | * [[https:// | ||
- | * [[https:// | + | * [[https:// |
* **Objectives**: | * **Objectives**: | ||
Line 62: | Line 69: | ||
* **Preparatory work**: To be done by the trainees before the training school: | * **Preparatory work**: To be done by the trainees before the training school: | ||
* prepare a parallel corpus or a monolingual one; it would preferably contain a new language, a new dialect, or a new genre; by “new” we mean “not already covered in the PARSEME 1.3 corpus”. | * prepare a parallel corpus or a monolingual one; it would preferably contain a new language, a new dialect, or a new genre; by “new” we mean “not already covered in the PARSEME 1.3 corpus”. | ||
+ | |||
+ | =====Corpus annotation infrastructure===== | ||
+ | |||
+ | * **Trainers** | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | |||
+ | * **Objectives**: | ||
+ | * Understand and efficiently use the technical infrastructure supporting UD and PARSEME corpus annotation and querying | ||
+ | |||
+ | * **Form of instruction** | ||
+ | * mostly practical exercises in corpus querying and processing | ||
+ | |||
+ | * **Contents** | ||
+ | * Session 1 (by Daniel Zeman, Agata Savary & Bruno Guillaume): **Git infrastructure** | ||
+ | * Git for beginners | ||
+ | * UD GitHub repositories | ||
+ | * PARSEME GitLab repositories | ||
+ | * GitHub synchronisation in Grew" |
+ | * Session 2 (by Bruno Guillaume jointly with Sylvain' | ||
+ | * Corpus queries with Grew-match | ||
+ | * UD vs. SUD | ||
+ | * Corpus annotation with ArboratorGrew | ||
+ | * Session 3 (by Daniel Zeman & Agata Savary): **Corpus format validation** | ||
+ | * File formats (CoNLL-U, CUPT) | ||
+ | * CoNLL-U validator | ||
+ | * PARSEME validator | ||
+ | * UD/PARSEME consistency | ||
+ | * Session 4 (by Bruno Guillaume): **Advanced treebank querying and annotation** | ||
+ | * querying PARSEME data | ||
+ | * corpus pre-annotation | ||
+ | * Session 5 (by Daniel Zeman & Bruno Guillaume): **Corpus quality** | ||
+ | * error mining and correcting with Grew-match | ||
+ | * fixing errors in text editors | ||
+ | * Session 6 (by Daniel Zeman & Agata Savary): **Documentation and discussion on Git** | ||
+ | * Documenting a corpus in a README file | ||
+ | * UD GitHub issues | ||
+ | * PARSEME GitLab issues | ||
+ |
meetings/other-events/1st_unidive_training_school/courses.1713356373.txt.gz · Last modified: 2024/04/17 14:19 by agata.savary