meetings:other-events:1st_unidive_training_school:courses

Differences

This shows you the differences between two versions of the page.

meetings:other-events:1st_unidive_training_school:courses [2024/04/17 14:39] agata.savarymeetings:other-events:1st_unidive_training_school:courses [2024/04/29 13:47] (current) – [Dependency syntax, Surface-Syntactic UD, and UD] agata.savary
Line 6: Line 6:
   * **Trainers**   * **Trainers**
     * [[https://kahane.fr/|Sylvain Kahane]] (Université Paris Nanterre and Institut Universitaire de France)     * [[https://kahane.fr/|Sylvain Kahane]] (Université Paris Nanterre and Institut Universitaire de France)
-    * Francis M. Tyers (Indiana University Bloomington, USA) - to confirm 
  
   * **Objectives**:    * **Objectives**: 
Line 19: Line 18:
  
   * **Exercises**:    * **Exercises**: 
-    * understanding the SUD (and UD) annotation scheme by exploring some treebanks with Grew-match (SUD_English, converted from UD; SUD_Naija, a native SUD treebank of a pidgincreole of English; mSUD_Beja, a native morph-based SUD treebank glossed in English) (joint session with Bruno Guillaume?)+    * understanding the [[https://surfacesyntacticud.github.io/|SUD]] (and [[https://universaldependencies.org/guidelines.html|UD]]) annotation scheme by exploring some treebanks with [[https://match.grew.fr/|Grew-match]] (SUD_English, converted from UD; [[https://universal.grew.fr/?corpus=SUD_Naija-NSC@2.13|SUD_Naija]], a native SUD treebank of a pidgincreole of English; [[https://universal.grew.fr/?corpus=mSUD_Beja-NSC@2.13|mSUD_Beja]], a native morph-based SUD treebank glossed in English) (joint session with Bruno Guillaume?)
     * example of a SUD annotation from scratch, based on participants' data glossed and translated into English     * example of a SUD annotation from scratch, based on participants' data glossed and translated into English
       * creation of a project on ArboratorGrew       * creation of a project on ArboratorGrew
Line 29: Line 28:
     * ideally, having some data you want to annotate (please get in touch before the summer school to prepare the data)     * ideally, having some data you want to annotate (please get in touch before the summer school to prepare the data)
  
-  * **Preparatory work**: +  * **Preparatory work** (offered in a parallel course by Bruno Guillaume)
-    * looking at treebanks on Grew-Match (this will be offered in a parallel course by Bruno Guillaume)  +    * looking at treebanks on Grew-Match   
-    * comparing UD and SUD annotation (possible with Grew-Match) +    * comparing UD and SUD annotation 
-    * reading Gerdes et al. 2018 + 
-    * reading a book or a tutorial on dependency syntax: Mel’cuk 1988,  Tesnière 2015, Osborne 2019, Kahane 2013+  * **Further readings**: 
 +    * Lucien Tesnière (2015), [[https://benjamins.com/catalog/z.185|Elements of structural syntax]], Benjamins. E-book available in open access.
 +    * Igor Mel’čuk (1988), Dependency Syntax: Theory and Practice. SUNY Press.
 +    * Timothy Osborne (2019), A Dependency Grammar of English. Benjamins.
 +    * Sylvain Kahane (2003), [[https://kahane.fr/wp-content/uploads/2017/01/mtt-handbook2003.pdf|The Meaning-Text Theory]], in Dependency and Valency, Handbooks of Linguistics and Communication Sciences, 25: 1-2, Berlin/NY: De Gruyter, 32 p.
 +    * De Marneffe, M. C., Manning, C. D., Nivre, J., & Zeman, D. (2021). [[https://aclanthology.org/2021.cl-2.11/|Universal dependencies]]. Computational linguistics, 47(2), 255-308. 
 +    * Gerdes K., Guillaume B., Kahane S., Perrier G. (2018) [[https://aclanthology.org/W18-6008/|SUD or Surface-Syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD]], Proceedings of the Universal Dependencies Workshop (UDW), EMNLP. 
 +    * Gerdes K., Guillaume B., Kahane S., Perrier G. (2021) [[https://aclanthology.org/2021.depling-1.4.pdf|Starting a new treebank? Go SUD!]], Proceedings of the 6th International Conference on Dependency Linguistics (DepLing), SyntaxFest, ACL.
  
 =====Annotation of multiword expressions for newcomers===== =====Annotation of multiword expressions for newcomers=====
Line 39: Line 45:
   * **Trainers**   * **Trainers**
     * [[https://www.racai.ro/en/about-us/racai-staff/verginica-barbu-mititelu/|Verginica Mititelu]] (Romanian Academy, Bucharest, Romania)     * [[https://www.racai.ro/en/about-us/racai-staff/verginica-barbu-mititelu/|Verginica Mititelu]] (Romanian Academy, Bucharest, Romania)
-    * [[https://www.ilsp.gr/en/members/giouli-voula-2/|Voula Giouli]] (ATHENA Research Centre, Athens and Aricstotle University of Tessaloniki, Greece)+    * [[https://www.ilsp.gr/en/members/giouli-voula-2/|Voula Giouli]] (ATHENA Research Centre, Athens and Aristotle University of Thessaloniki, Greece)
  
   * **Objectives**: Upon completion of the course, the trainees will be able to   * **Objectives**: Upon completion of the course, the trainees will be able to
Line 63: Line 69:
   * **Preparatory work**: To be done by the trainees before the training school:    * **Preparatory work**: To be done by the trainees before the training school: 
     * prepare a parallel corpus or a monolingual one; it would preferably contain a new language, a new dialect, or a new genre; by “new” we mean “not already covered in the PARSEME 1.3 corpus”.     * prepare a parallel corpus or a monolingual one; it would preferably contain a new language, a new dialect, or a new genre; by “new” we mean “not already covered in the PARSEME 1.3 corpus”.
 +
 +=====Corpus annotation infrastructure=====
 +
 +  * **Trainers**
 +    * [[https://members.loria.fr/BGuillaume/|Bruno Guillaume]] (INRIA, LORIA, France)
 +    * [[https://ufal.mff.cuni.cz/daniel-zeman|Daniel Zeman]] (Charles University, Czech Republic)
 +    * [[https://perso.limsi.fr/savary/|Agata Savary]] (Université Paris-Saclay, CNRS, LISN, France)
 +
 +  * **Objectives**: 
 +    * Understand and efficiently use the technical infrastructure supporting UD and PARSEME corpus annotation and querying
 +
 +  * **Form of instruction**
 +    * mostly practical exercises in corpus querying and processing
 +
 +  * **Contents (not necessarily in chronological order)**
 +    * Session 1 (by Bruno Guillaume), held jointly with Sylvain Kahane's course on dependency syntax
 +      * Data storage formats: CoNLL-U, CUPT
 +      * Basic usage of Grew-match on morpho-syntactic treebanks
 +      * Hands-on: observe the main differences between UD and SUD
 +      * ArboratorGrew basic usage: user roles, graphical editing, CoNLL-U editing, metadata
 +    * Sessions 2-3 (by Bruno Guillaume)
 +      * Advanced usage of Grew-match
 +        * On PARSEME data
 +        * Usage of clustering / tables for corpus maintenance, error mining and checking annotation consistency
 +      * Advanced usage of ArboratorGrew
 +        * usage of rewriting rules for corpus pre-annotation / maintenance
 +        * usage of a parser for pre-annotation
 +        * usage of GitHub synchronisation
 +    * Session 4 (by Agata Savary)
 +      * Git for beginners: 
 +        * a repository, a clone, a commit
 +        * Git operations: clone, pull, add, commit, push
 +        * branches
 +        * GitLab vs. GitHub
 +      * PARSEME Git infrastructure
 +        * PARSEME project on Git and its repositories
 +        * Managing language repositories
 +        * PARSEME utilities
 +        * PARSEME/UD consistency
 +    * Sessions 5-6 (by Daniel Zeman)
 +      * UD GitHub repositories
 +        * Branches, push access, pull requests
 +        * How to upload: Use git diff before committing and pushing
 +        * TortoiseGit
 +      * Prescribed structure of the dev branch
 +        * Do not pull history from the master branch
 +        * The docs repository, language-specific documentation
 +        * Working with personal UD repositories
 +      * Validator
 +        * On-line report after uploading data
 +        * How to run locally (there are two scripts!)
 +        * How to locate and fix the error
 +          * Demonstrate some common errors, validation levels
 +        * How to register language-specific features, relation subtypes, auxiliaries
 +        * How to fix documentation errors (demonstrate)
 +      * Fixing the errors
 +        * Annotation tool (cf. Grew)
 +        * Text editor (do not use Word!)
 +        * Udapi
 +      * UD GitHub issues: asking for linguistic help in docs, reporting bugs in treebank-specific repos
 +        * Referring to particular commits, files and lines in the repo.
 +
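The CoNLL-U storage format listed under Session 1 can be illustrated with a minimal Python sketch. It assumes the standard ten tab-separated CoNLL-U columns, comment lines starting with ''#'', and a blank line after each sentence. The function name ''read_conllu'' and the two-token sample are made up for illustration and do not come from the course materials. CUPT files carry an additional PARSEME:MWE column, which this sketch simply ignores.

```python
from typing import Dict, List

# The ten standard CoNLL-U column names, in order.
CONLLU_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def read_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts.

    Hypothetical helper for illustration only; it performs no validation.
    """
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are skipped here.
            continue
        columns = line.split("\t")
        # zip() stops at ten fields, so an extra column (as in CUPT) is dropped.
        current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


# Made-up two-token sentence, used only to exercise the function.
SAMPLE = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

for token in read_conllu(SAMPLE)[0]:
    print(token["ID"], token["FORM"], token["HEAD"], token["DEPREL"])
```

Multiword-token lines (IDs such as ''1-2'') and empty nodes (IDs such as ''1.1'') are returned as ordinary rows; a real pipeline would more likely rely on an established tool such as Udapi, which the syllabus lists under Sessions 5-6.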
meetings/other-events/1st_unidive_training_school/courses.1713357583.txt.gz · Last modified: 2024/04/17 14:39 by agata.savary