Differences
This shows you the differences between two versions of the page.
meetings:other-events:1st_unidive_training_school:courses [2024/04/29 12:13] – agata.savary | meetings:other-events:1st_unidive_training_school:courses [2024/05/30 15:22] (current) – [Corpus annotation infrastructure] agata.savary | ||
---|---|---|---|
Line 18: | Line 18: | ||
* **Exercises**: | * **Exercises**: | ||
- | * understanding the SUD (and UD) annotation scheme by exploring some treebanks with Grew-match (SUD_English, | + | * understanding the [[https:// |
 * an example of SUD annotation from scratch, based on participants' data glossed and translated into English | * an example of SUD annotation from scratch, based on participants' data glossed and translated into English | ||
* creation of a project on ArboratorGrew | * creation of a project on ArboratorGrew | ||
Line 28: | Line 28: | ||
 * ideally, having some data you want to annotate (please get in touch before the summer school so that the data can be prepared) | * ideally, having some data you want to annotate (please get in touch before the summer school so that the data can be prepared) | ||
- | * **Preparatory work**: | + | * **Preparatory work** |
- | * looking at treebanks on Grew-Match | + | * looking at treebanks on Grew-Match |
- | * comparing UD and SUD annotation | + | * comparing UD and SUD annotation |
- | * reading Gerdes et al. 2018 | + | |
- | * reading a book or a tutorial on dependency syntax: | + | * **Further readings**: |
+ | * Lucien Tesnière (2015), [[https:// | ||
+ | * Igor Mel’cuk | ||
+ | * Timothy | ||
+ | * Sylvain | ||
+ | * De Marneffe, M. C., Manning, C. D., Nivre, J., & Zeman, D. (2021). [[https:// | ||
+ | * Gerdes K., Guillaume B., Kahane S., Perrier G. (2018) [[https:// | ||
+ | * Gerdes K., Guillaume B., Kahane S, Perrier G. (2021) [[https:// | ||
=====Annotation of multiword expressions for newcomers===== | =====Annotation of multiword expressions for newcomers===== | ||
Line 76: | Line 83: | ||
* mostly practical exercises in corpus querying and processing | * mostly practical exercises in corpus querying and processing | ||
- | * **Contents | + | * **Contents** |
- | * Session 1 (by Bruno Guillaume), joined with Sylvain' | + | * Session 1 (by Daniel Zeman & Agata Savary & Bruno Guillaume): **Git infrastructure** |
- | * Storage formats of data: CoNLL-U, CUPT | + | * Git for beginners |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | * Advanced usages of Grew-match | + | |
- | * On PARSEME data | + | |
- | * Usage of clustering / tables for corpus maintenance, | + | |
- | * Advanced usage of ArboratorGrew | + | |
- | * usage of rewriting rules for corpus pre-annotation / maintenance | + | |
- | * usage of Parser for pre-annotation | + | |
- | * usage of Github synchronisation | + | |
- | * Session 4 (by Agata Savary) | + | |
- | * Git for beginners: | + | |
- | * a repository, a clone, a commit | + | |
- | * Git operations: clone, pull, add, commit, push | + | |
- | * branches | + | |
- | *Gitlab vs. Github | + | |
- | * PARSEME Git infrastructure | + | |
- | * PARSEME project on Git and its repositories | + | |
- | * Managing language repositories | + | |
- | * PARSEME utilities | + | |
- | * PARSEME/UD consistency | + | |
- | * Sessions 5-6 (by Daniel Zeman) | + | |
* UD GitHub repositories | * UD GitHub repositories | ||
- | | + | |
- | * How to upload: Use git diff before committing | + | * Github synchronisation in Grew |
- | * TortoiseGit | + | * Session 2 (by Bruno Guillaume jointly with Sylvain' |
- | * Prescribed structure of the dev branch | + | * Corpus queries with Grew-Match |
- | * Do not pull history from the master branch | + | * UD vs. SUD |
- | * The docs repository, language-specific documentation | + | * Corpus annotation with Arborator Grew |
- | * Working with personal UD repositories | + | * Session 3 (by Daniel Zeman & Agata Savary): **Corpus format validation** |
- | * Validator | + | * File formats (CoNLL-U, CUPT) |
- | * On-line report after uploading data | + | * CoNLL-U validator |
- | * How to run locally | + | * PARSEME validator |
- | | + | * UD/PARSEME consistency |
- | * Demonstrate some common errors, validation levels | + | * Session 4 (by Bruno Guillaume): **Advanced treebank querying |
- | * How to register language-specific features, relation subtypes, auxiliaries | + | * querying PARSEME data |
- | * How to fix documentation errors | + | * corpus pre-annotation |
- | * Fixing the errors | + | * Session 5 (by Daniel Zeman & Bruno Guillaume): **Corpus quality** |
- | * Annotation tool (cf. Grew) | + | * error mining and correcting with Grew-match |
- | * Text editor | + | * fixing errors in text editors |
- | * Udapi | + | * Session 6 (by Daniel Zeman & Agata Savary): **Documentation and discussion on Git** |
- | * UD Github issues: asking for linguistic help in docs, reporting bugs in treebank-specific repos | + | * Documenting a corpus in README |
- | * Referring to particular commits, files and lines in the repo. | + | * UD Github issues |
+ | * PARSEME Gitlab issues | ||
meetings/other-events/1st_unidive_training_school/courses.1714385617.txt.gz · Last modified: 2024/04/29 12:13 by agata.savary