meetings:other-events:1st_unidive_training_school:courses

Differences

This shows you the differences between two versions of the page.

meetings:other-events:1st_unidive_training_school:courses [2024/04/17 14:19] agata.savary
meetings:other-events:1st_unidive_training_school:courses [2024/05/30 15:22] (current) – [Corpus annotation infrastructure] agata.savary
Line 18: Line 18:
  
   * **Exercises**:    * **Exercises**: 
-    * understanding the SUD (and UD) annotation scheme by exploring some treebanks with Grew-match (SUD_English, converted from UD; SUD_Naija, a native SUD treebank of a pidgincreole of English; mSUD_Beja, a native morph-based SUD treebank glossed in English) (joint session with Bruno Guillaume?)+    * understanding the [[https://surfacesyntacticud.github.io/|SUD]] (and [[https://universaldependencies.org/guidelines.html|UD]]) annotation scheme by exploring some treebanks with [[https://match.grew.fr/|Grew-match]] (SUD_English, converted from UD; [[https://universal.grew.fr/?corpus=SUD_Naija-NSC@2.13|SUD_Naija]], a native SUD treebank of a pidgincreole of English; [[https://universal.grew.fr/?corpus=mSUD_Beja-NSC@2.13|mSUD_Beja]], a native morph-based SUD treebank glossed in English) (joint session with Bruno Guillaume?)
     * example of a SUD annotation from scratch based on data from the participants which are glossed and translated in English     * example of a SUD annotation from scratch based on data from the participants which are glossed and translated in English
       * creation of a project on ArboratorGrew       * creation of a project on ArboratorGrew
Line 28: Line 28:
     * ideally, having some data you want to annotate (please get in touch before the summer school for the preparation of the data)     * ideally, having some data you want to annotate (please get in touch before the summer school for the preparation of the data)
  
-  * **Preparatory work**: +  * **Preparatory work** (offered in a parallel course by Bruno Guillaume)
-    * looking at treebanks on Grew-Match +    * looking at treebanks on Grew-Match   
-    * comparing UD and SUD annotation (possible with Grew-Match) +    * comparing UD and SUD annotation
-    * reading Gerdes et al. 2018 +
-    * reading a book or a tutorial on dependency syntax: Mel’cuk 1988,  Tesnière 2015, Osborne 2019, Kahane 2013+
  
-=====Annotation multiword expressions for newcomers=====+  * **Further readings**: 
 +    * Lucien Tesnière (2015), [[https://benjamins.com/catalog/z.185|Elements of structural syntax]], Benjamins. ebook in open access. 
 +    * Igor Mel’čuk (1988), Dependency syntax: theory and practice. SUNY Press. 
 +    * Timothy Osborne (2019), A Dependency Grammar of English. Benjamins. 
 +    * Sylvain Kahane (2003), [[https://kahane.fr/wp-content/uploads/2017/01/mtt-handbook2003.pdf|The Meaning-Text Theory]], in Dependency and Valency, Handbooks of Linguistics and Communication Sciences, 25:1-2, Berlin/NY: De Gruyter, 32 p. 
 +    * De Marneffe, M. C., Manning, C. D., Nivre, J., & Zeman, D. (2021). [[https://aclanthology.org/2021.cl-2.11/|Universal dependencies]]. Computational linguistics, 47(2), 255-308. 
 +    * Gerdes K., Guillaume B., Kahane S., Perrier G. (2018) [[https://aclanthology.org/W18-6008/|SUD or Surface-Syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD]], Proceedings of the Universal Dependencies Workshop (UDW), EMNLP. 
 +    * Gerdes K., Guillaume B., Kahane S, Perrier G. (2021) [[https://aclanthology.org/2021.depling-1.4.pdf|Starting a new treebank? Go SUD!]], Proceedings of 6th international conference on Dependency Linguistics (DepLing), SyntaxFest, ACL. 
 + 
 +=====Annotation of multiword expressions for newcomers=====
  
   * **Trainers**   * **Trainers**
     * [[https://www.racai.ro/en/about-us/racai-staff/verginica-barbu-mititelu/|Verginica Mititelu]] (Romanian Academy, Bucharest, Romania)     * [[https://www.racai.ro/en/about-us/racai-staff/verginica-barbu-mititelu/|Verginica Mititelu]] (Romanian Academy, Bucharest, Romania)
-    * [[https://www.ilsp.gr/en/members/giouli-voula-2/|Voula Giouli]] (ATHENA Research Centre, Athens and Aricstotle University of Tessaloniki, Greece)+    * [[https://www.ilsp.gr/en/members/giouli-voula-2/|Voula Giouli]] (ATHENA Research Centre, Athens and Aristotle University of Thessaloniki, Greece)
  
   * **Objectives**: Upon completion of the course, the trainees will be able to   * **Objectives**: Upon completion of the course, the trainees will be able to
Line 62: Line 69:
   * **Preparatory work**: To be done by the trainees before the training school:    * **Preparatory work**: To be done by the trainees before the training school: 
     * prepare a parallel corpus or a monolingual one; it would preferably contain a new language, a new dialect, or a new genre; by “new” we mean “not already covered in the PARSEME 1.3 corpus”.     * prepare a parallel corpus or a monolingual one; it would preferably contain a new language, a new dialect, or a new genre; by “new” we mean “not already covered in the PARSEME 1.3 corpus”.
 +
 +=====Corpus annotation infrastructure=====
 +
 +  * **Trainers**
 +    * [[https://members.loria.fr/BGuillaume/|Bruno Guillaume]] (INRIA, LORIA, France)
 +    * [[https://ufal.mff.cuni.cz/daniel-zeman|Daniel Zeman]] (Charles University, Czech Republic)
 +    * [[https://perso.limsi.fr/savary/|Agata Savary]] (Université Paris-Saclay, CNRS, LISN, France)
 +
 +  * **Objectives**: 
 +    * Understand and efficiently use the technical infrastructure supporting UD and PARSEME corpus annotation and query
 +
 +  * **Form of instruction**
 +    * mostly practical exercises in corpus querying and processing
 +
 +  * **Contents**
 +    * Session 1 (by Daniel Zeman & Agata Savary & Bruno Guillaume): **Git infrastructure**
 +      * Git for beginners
 +      * UD GitHub repositories
 +      * PARSEME GitLab repositories 
 +      * GitHub synchronisation in Grew
 +    * Session 2 (by Bruno Guillaume jointly with Sylvain's course on dependency syntax): **Basics of treebank querying and annotation**
 +      * Corpus queries with Grew-match 
 +      * UD vs. SUD 
 +      * Corpus annotation with ArboratorGrew
 +    * Session 3 (by Daniel Zeman & Agata Savary): **Corpus format validation**
 +      * File formats (CoNLL-U, CUPT)
 +      * CoNLL-U validator
 +      * PARSEME validator
 +      * UD/PARSEME consistency
 +    * Session 4 (by Bruno Guillaume): **Advanced treebank querying and annotation**
 +      * querying PARSEME data
 +      * corpus pre-annotation
 +    * Session 5 (by Daniel Zeman & Bruno Guillaume): **Corpus quality**
 +      * error mining and correcting with Grew-match
 +      * fixing errors in text editors
 +    * Session 6 (by Daniel Zeman & Agata Savary): **Documentation and discussion on Git**
 +      * Documenting a corpus in README
 +      * UD GitHub issues 
 +      * PARSEME GitLab issues
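
To give a feel for the file formats covered in Session 3, here is a minimal Python sketch that reads a CoNLL-U or CUPT snippet and runs two toy checks. It is not the CoNLL-U validator or the PARSEME validator, and it covers only a small fraction of what those tools verify. The sample sentence, its annotation, and the helper names (`parse_sentences`, `head_problems`, `mwe_groups`) are made up for illustration. The sketch assumes the usual 10 tab-separated CoNLL-U columns and, for CUPT, a `# global.columns` header that adds a `PARSEME:MWE` column.

```python
# Default CoNLL-U column names (10 tab-separated fields per token line).
CONLLU_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                  "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def parse_sentences(text, columns=None):
    """Yield each sentence as a list of {column name: value} dicts."""
    columns = list(columns or CONLLU_COLUMNS)
    sentence = []
    for line in text.splitlines():
        if line.startswith("# global.columns ="):
            # CUPT files declare their columns in the first line.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment / metadata lines are skipped here
        elif not line.strip():
            if sentence:  # a blank line closes the current sentence
                yield sentence
                sentence = []
        else:
            fields = line.split("\t")
            if len(fields) != len(columns):
                raise ValueError(
                    f"expected {len(columns)} columns, got {len(fields)}: {line!r}")
            sentence.append(dict(zip(columns, fields)))
    if sentence:
        yield sentence


def head_problems(sentence):
    """Toy check: every HEAD is 0, '_' or an existing ID; exactly one root."""
    ids = {tok["ID"] for tok in sentence}
    problems = []
    for tok in sentence:
        if tok["HEAD"] not in ids and tok["HEAD"] not in ("0", "_"):
            problems.append(f"token {tok['ID']}: unknown head {tok['HEAD']}")
    roots = [tok for tok in sentence if tok["HEAD"] == "0"]
    if len(roots) != 1:
        problems.append(f"expected 1 root, found {len(roots)}")
    return problems


def mwe_groups(sentence):
    """Group token forms by MWE number read from the PARSEME:MWE column."""
    groups = {}
    for tok in sentence:
        tag = tok.get("PARSEME:MWE", "*")
        if tag in ("*", "_"):  # not part of an MWE / not annotated
            continue
        for part in tag.split(";"):  # a token may belong to several MWEs
            mwe_id = part.split(":", 1)[0]  # "1:VPC.full" -> "1"
            groups.setdefault(mwe_id, []).append(tok["FORM"])
    return groups


# Hypothetical one-sentence CUPT sample, built with explicit tabs.
SAMPLE = "\n".join([
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE",
    "# text = She gave up",
    "\t".join(["1", "She", "she", "PRON", "_", "_", "2", "nsubj", "_", "_", "*"]),
    "\t".join(["2", "gave", "give", "VERB", "_", "_", "0", "root", "_", "_", "1:VPC.full"]),
    "\t".join(["3", "up", "up", "ADP", "_", "_", "2", "compound:prt", "_", "_", "1"]),
    "",
])

if __name__ == "__main__":
    for sent in parse_sentences(SAMPLE):
        print(head_problems(sent), mwe_groups(sent))
```

Plain CoNLL-U files (without the `# global.columns` line) go through the same `parse_sentences` function with the default 10 columns; `mwe_groups` then simply returns an empty dictionary.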
 +
meetings/other-events/1st_unidive_training_school/courses.1713356373.txt.gz · Last modified: 2024/04/17 14:19 by agata.savary