meetings:other-events:1st_unidive_training_school:courses

Differences

This shows you the differences between two versions of the page.

meetings:other-events:1st_unidive_training_school:courses [2024/04/17 14:37] agata.savary
meetings:other-events:1st_unidive_training_school:courses [2024/04/29 13:47] – [Dependency syntax, Surface-Syntactic UD, and UD] agata.savary
Line 18: Line 18:
  
   * **Exercises**:
-    * understanding the SUD (and UD) annotation scheme by exploring some treebanks with Grew-match (SUD_English, converted from UD; SUD_Naija, a native SUD treebank of a pidgincreole of English; mSUD_Beja, a native morph-based SUD treebank glossed in English) (joint session with Bruno Guillaume?)
+    * understanding the [[https://surfacesyntacticud.github.io/|SUD]] (and [[https://universaldependencies.org/guidelines.html|UD]]) annotation scheme by exploring some treebanks with [[https://match.grew.fr/|Grew-match]] (SUD_English, converted from UD; [[https://universal.grew.fr/?corpus=SUD_Naija-NSC@2.13|SUD_Naija]], a native SUD treebank of a pidgincreole of English; [[https://universal.grew.fr/?corpus=mSUD_Beja-NSC@2.13|mSUD_Beja]], a native morph-based SUD treebank glossed in English) (joint session with Bruno Guillaume?)
     * example of a SUD annotation from scratch based on data from the participants which are glossed and translated in English
       * creation of a project on ArboratorGrew
Line 28: Line 28:
     * ideally, having some data you want to annotate (please get in touch before the summer school to prepare the data)
 
-  * **Preparatory work**:
-    * looking at treebanks on Grew-Match (this will be offered in a parallel course by Bruno Guillaume)
-    * comparing UD and SUD annotation (possible with Grew-Match)
-    * reading Gerdes et al. 2018
-    * reading a book or a tutorial on dependency syntax: Mel’cuk 1988, Tesnière 2015, Osborne 2019, Kahane 2013
+  * **Preparatory work** (offered in a parallel course by Bruno Guillaume)
+    * looking at treebanks on Grew-Match
+    * comparing UD and SUD annotation
+
+  * **Further readings**:
+    * Lucien Tesnière (2015), [[https://benjamins.com/catalog/z.185|Elements of structural syntax]], Benjamins. Ebook in open access.
+    * Igor Mel’čuk (1988), Dependency syntax: theory and practice. SUNY Press.
+    * Timothy Osborne (2019), A Dependency Grammar of English. Benjamins.
+    * Sylvain Kahane (2003), [[https://kahane.fr/wp-content/uploads/2017/01/mtt-handbook2003.pdf|The Meaning-Text Theory]], in Dependency and Valency, Handbooks of Linguistics and Communication Sciences, 25: 1-2, Berlin/NY: De Gruyter, 32 p.
+    * De Marneffe, M. C., Manning, C. D., Nivre, J., & Zeman, D. (2021). [[https://aclanthology.org/2021.cl-2.11/|Universal dependencies]]. Computational linguistics, 47(2), 255-308.
+    * Gerdes K., Guillaume B., Kahane S., Perrier G. (2018) [[https://aclanthology.org/W18-6008/|SUD or Surface-Syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD]], Proceedings of the Universal Dependencies Workshop (UDW), EMNLP.
+    * Gerdes K., Guillaume B., Kahane S., Perrier G. (2021) [[https://aclanthology.org/2021.depling-1.4.pdf|Starting a new treebank? Go SUD!]], Proceedings of the 6th International Conference on Dependency Linguistics (DepLing), SyntaxFest, ACL.
 
 =====Annotation of multiword expressions for newcomers=====
Line 38: Line 45:
   * **Trainers**
     * [[https://www.racai.ro/en/about-us/racai-staff/verginica-barbu-mititelu/|Verginica Mititelu]] (Romanian Academy, Bucharest, Romania)
-    * [[https://www.ilsp.gr/en/members/giouli-voula-2/|Voula Giouli]] (ATHENA Research Centre, Athens and Aricstotle University of Thessaloniki, Greece)
+    * [[https://www.ilsp.gr/en/members/giouli-voula-2/|Voula Giouli]] (ATHENA Research Centre, Athens and Aristotle University of Thessaloniki, Greece)
 
   * **Objectives**: Upon completion of the course, the trainees will be able to
Line 62: Line 69:
   * **Preparatory work**: To be done by the trainees before the training school:
     * prepare a parallel corpus or a monolingual one; it would preferably contain a new language, a new dialect, or a new genre; by “new” we mean “not already covered in the PARSEME 1.3 corpus”.
 +
 +=====Corpus annotation infrastructure=====
 +
 +  * **Trainers**
 +    * [[https://members.loria.fr/BGuillaume/|Bruno Guillaume]] (INRIA, LORIA, France)
 +    * [[https://ufal.mff.cuni.cz/daniel-zeman|Daniel Zeman]] (Charles University, Czech Republic)
 +    * [[https://perso.limsi.fr/savary/|Agata Savary]] (Université Paris-Saclay, CNRS, LISN, France)
 +
 +  * **Objectives**: 
 +    * Understand and efficiently use the technical infrastructure supporting UD and PARSEME corpus annotation and query
 +
 +  * **Form of instruction**
 +    * mostly practical exercises in corpus querying and processing
 +
 +  * **Contents (not necessarily in chronological order)**
 +    * Session 1 (by Bruno Guillaume), held jointly with Sylvain's course on dependency syntax
 +      * Storage formats of data: CoNLL-U, CUPT
 +      * Basic usage of Grew-match on morpho-syntactic treebanks
 +      * Hands-on: observe the main differences between UD and SUD
 +      * ArboratorGrew basic usage: user roles, graphical editing, CoNLL-U editing, metadata
 +    * Sessions 2-3 (by Bruno Guillaume)
 +      * Advanced usage of Grew-match
 +        * On PARSEME data
 +        * Usage of clustering / tables for corpus maintenance, error mining and checking annotation consistency
 +      * Advanced usage of ArboratorGrew
 +        * usage of rewriting rules for corpus pre-annotation / maintenance
 +        * usage of a parser for pre-annotation
 +        * usage of GitHub synchronisation
 +    * Session 4 (by Agata Savary)
 +      * Git for beginners: 
 +        * a repository, a clone, a commit
 +        * Git operations: clone, pull, add, commit, push
 +        * branches
 +        * GitLab vs. GitHub
 +      * PARSEME Git infrastructure
 +        * PARSEME project on Git and its repositories
 +        * Managing language repositories
 +        * PARSEME utilities
 +        * PARSEME/UD consistency
 +    * Sessions 5-6 (by Daniel Zeman)
 +      * UD GitHub repositories
 +        * Branches, push access, pull requests
 +        * How to upload: Use git diff before committing and pushing
 +        * TortoiseGit
 +      * Prescribed structure of the dev branch
 +        * Do not pull history from the master branch
 +        * The docs repository, language-specific documentation
 +        * Working with personal UD repositories
 +      * Validator
 +        * On-line report after uploading data
 +        * How to run locally (there are two scripts!)
 +        * How to locate and fix the error
 +          * Demonstrate some common errors, validation levels
 +        * How to register language-specific features, relation subtypes, auxiliaries
 +        * How to fix documentation errors (demonstrate)
 +      * Fixing the errors
 +        * Annotation tool (cf. Grew)
 +        * Text editor (do not use Word!)
 +        * Udapi
 +      * UD GitHub issues: asking for linguistic help in docs, reporting bugs in treebank-specific repos
 +        * Referring to particular commits, files and lines in the repo.
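
As a rough illustration of the storage formats listed for Session 1, the sketch below reads a CUPT fragment in Python. It assumes the usual layout: the ten tab-separated CoNLL-U columns followed by an eleventh PARSEME:MWE column, in which `*` marks a token outside any multiword expression, the first token of an expression carries `id:category`, and its other tokens carry the bare id. The sample sentence, its annotation, and the helper names `parse_sentences` and `mwe_tokens` are hypothetical; they are not taken from any treebank or from the PARSEME utilities.

```python
# Minimal sketch of reading CoNLL-U / CUPT data (illustrative, not an official tool).

# The ten standard CoNLL-U columns; CUPT is assumed to append one MWE column.
CONLLU_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]
CUPT_FIELDS = CONLLU_FIELDS + ["PARSEME:MWE"]


def parse_sentences(text, fields=CONLLU_FIELDS):
    """Split CoNLL-U-style text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Comment lines (sentence metadata) are skipped in this sketch.
            continue
        else:
            values = line.split("\t")
            if len(values) != len(fields):
                raise ValueError(
                    f"expected {len(fields)} columns, got {len(values)}: {line!r}"
                )
            current.append(dict(zip(fields, values)))
    if current:
        sentences.append(current)
    return sentences


def mwe_tokens(sentence):
    """Group token forms by the MWE identifier found in the PARSEME:MWE column."""
    groups = {}
    for token in sentence:
        code = token["PARSEME:MWE"]
        if code in ("*", "_"):
            continue  # not part of an MWE, or not annotated
        # A token may belong to several MWEs, separated by ';'.
        for part in code.split(";"):
            mwe_id = part.split(":")[0]
            groups.setdefault(mwe_id, []).append(token["FORM"])
    return groups


# Hand-made sample, for illustration only.
SAMPLE_ROWS = [
    ["1", "He", "he", "PRON", "_", "_", "2", "nsubj", "_", "_", "*"],
    ["2", "took", "take", "VERB", "_", "_", "0", "root", "_", "_", "1:LVC.full"],
    ["3", "a", "a", "DET", "_", "_", "4", "det", "_", "_", "*"],
    ["4", "walk", "walk", "NOUN", "_", "_", "2", "obj", "_", "_", "1"],
]
SAMPLE = ("# text = He took a walk\n"
          + "\n".join("\t".join(row) for row in SAMPLE_ROWS)
          + "\n\n")

if __name__ == "__main__":
    for sentence in parse_sentences(SAMPLE, CUPT_FIELDS):
        print(mwe_tokens(sentence))
```

Passing `CONLLU_FIELDS` instead of `CUPT_FIELDS` makes the same reader accept plain ten-column CoNLL-U input. For real work, the validators and utilities covered in the sessions remain the reference.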
 +
meetings/other-events/1st_unidive_training_school/courses.txt · Last modified: 2024/05/30 15:22 by agata.savary