meetings:other-events:1st_unidive_training_school:courses

Differences

This shows you the differences between two versions of the page.

meetings:other-events:1st_unidive_training_school:courses [2024/04/29 12:39] – [Dependency syntax, Surface-Syntactic UD, and UD] agata.savary
meetings:other-events:1st_unidive_training_school:courses [2024/06/22 10:12] (current) agata.savary
Line 26: Line 26:
   * **Pre-requisites**:   * **Pre-requisites**:
     * being concerned by syntactic annotation     * being concerned by syntactic annotation
-    * ideally, having some data you want to annotate (please take contact before the summer school for the preparation of the data)+    * preparing data to annotate - see [[https://github.com/UniDive/2024-UniDive-Chisinau-training-school#pre-requisites-for-the-courses|instructions]]
  
-  * **Preparatory work**: +  * **Preparatory work** (offered in a parallel course by Bruno Guillaume)
-    * looking at treebanks on Grew-Match (this will be offered in a parallel course by Bruno Guillaume)  +    * looking at treebanks on Grew-Match   
-    * comparing UD and SUD annotation (possible with Grew-Match) +    * comparing UD and SUD annotation
-    * reading (Gerdes et al. 2018) +
-    * reading a book or a tutorial on dependency syntax: (Mel’cuk 1988),  (Tesnière 2015), (Osborne 2019), (Kahane 2013)+
  
-  * **Bibliography**:+  * **Recommended readings**:
     * Lucien Tesnière (2015), [[https://benjamins.com/catalog/z.185|Elements of structural syntax]], Benjamins. ebook in open access.     * Lucien Tesnière (2015), [[https://benjamins.com/catalog/z.185|Elements of structural syntax]], Benjamins. ebook in open access.
     * Igor Mel’cuk (1988), Dependency syntax: theory and practice. SUNY press.     * Igor Mel’cuk (1988), Dependency syntax: theory and practice. SUNY press.
Line 67: Line 65:
   * **Pre-requisites**   * **Pre-requisites**
     * theoretical linguistics knowledge (parts of speech, inflection, syntactic structures)      * theoretical linguistics knowledge (parts of speech, inflection, syntactic structures) 
-    * Timothy Baldwin and Su Nam Kim. 2010. Multiword expressions. In Nitin Indurkhya and Fred J. Damerau, editors, Handbook of Natural Language Processing, 2 edition, pages 267–292. CRC Press, Taylor and Francis Group, Boca Raton, FL, USA.+    * prepare a parallel corpus or a monolingual one - see the [[https://github.com/UniDive/2024-UniDive-Chisinau-training-school#pre-requisites-for-the-courses|instructions]]
  
-  * **Preparatory work**: To be done by the trainees before the training school:  +  * **Recommended readings** 
-    * prepare a parallel corpus or a monolingual one; it would preferably contain a new language, a new dialect, or a new genre; by “new” we mean “not already covered in the PARSEME 1.3 corpus”.+    * Timothy Baldwin and Su Nam Kim. 2010. Multiword expressions. In Nitin Indurkhya and Fred J. Damerau, editors, Handbook of Natural Language Processing, 2 edition, pages 267–292. CRC Press, Taylor and Francis Group, Boca Raton, FL, USA.
  
 =====Corpus annotation infrastructure===== =====Corpus annotation infrastructure=====
Line 85: Line 83:
     * mostly practical exercises in corpus querying and processing     * mostly practical exercises in corpus querying and processing
  
-  * **Contents (not necessarily in chronological order)** +  * **Contents** 
-    * Session 1 (by Bruno Guillaume), joined with Sylvain's course in dependency syntax +    * Session 1 (by Daniel Zeman & Agata Savary & Bruno Guillaume): **Git infrastructure** 
-      * Storage formats of data: CoNLL-U, CUPT +      * Git for beginners
-      * Basic usage of Grew-match of morpho-syntactic treebanks +
-        * Hands-on: observe main difference between UD and SUD +
-      * ArboratorGrew basic usage: users roles, graphical edition, conllu edition, metadata +
-    * Sessions 2-3 (by Bruno Guillaume) +
-      * Advanced usages of Grew-match +
-        * On PARSEME data +
-        * Usage of clustering / tables for corpus maintenance, error mining and checking annotation consistency +
-      * Advanced usage of ArboratorGrew +
-        * usage of rewriting rules for corpus pre-annotation / maintenance +
-        * usage of Parser for pre-annotation +
-        * usage of Github synchronisation +
-    * Session 4 (by Agata Savary) +
-      * Git for beginners:  +
-        * a repository, a clone, a commit +
-        * Git operations: clone, pull, add, commit, push +
-        * branches +
-        * Gitlab vs. Github +
-      * PARSEME Git infrastructure +
-        * PARSEME project on Git and its repositories +
-        * Managing language repositories +
-        * PARSEME utilities +
-        * PARSEME/UD consistency +
-    * Sessions 5-6 (by Daniel Zeman)+
       * UD GitHub repositories       * UD GitHub repositories
-        * Branches, push access, pull requests +      * PARSEME Gitlab repositories
-        * How to upload. Use git diff before committing and pushing +      * Github synchronisation in Grew
-        * TortoiseGit +    * Session 2 (by Bruno Guillaume, jointly with Sylvain's course on dependency syntax): **Basics of treebank querying and annotation**
-      * Prescribed structure of the dev branch +      * Corpus queries with Grew-Match
-        * Do not pull history from the master branch +      * UD vs. SUD
-        * The docs repository, language-specific documentation +      * Corpus annotation with Arborator Grew
-        * Working with personal UD repositories +    * Session 3 (by Daniel Zeman & Agata Savary): **Corpus format validation**
-      * Validator +      * File formats (CoNLL-U, CUPT)
-        * On-line report after uploading data +      * CoNLL-U validator
-        * How to run locally (there are two scripts!) +      * PARSEME validator
-        * How to locate and fix the error +      * UD/PARSEME consistency
-          * Demonstrate some common errors, validation levels +    * Session 4 (by Bruno Guillaume): **Advanced treebank querying and annotation**
-        * How to register language-specific features, relation subtypes, auxiliaries +      * querying PARSEME data
-        * How to fix documentation errors (demonstrate) +      * corpus pre-annotation
-      * Fixing the errors +    * Session 5 (by Daniel Zeman & Bruno Guillaume): **Corpus quality**
-        * Annotation tool (cf. Grew) +      * error mining and correcting with Grew-match
-        * Text editor (do not use Word!) +      * fixing errors in text editors
-        * Udapi +    * Session 6 (by Daniel Zeman & Agata Savary): **Documentation and discussion on Git**
-      * UD Github issues: asking for linguistic help in docs, reporting bugs in treebank-specific repos +      * Documenting a corpus in README
-        * Referring to particular commits, files and lines in the repo.+      * UD Github issues
 +      * PARSEME Gitlab issues 
 + 
 +  * **Recommended readings** 
 +    * [[https://git-scm.com/book/en/v2|Git manual]] 
 +    * Grew-match [[https://universal.grew.fr/?tutorial=yes|tutorial]] and [[https://grew.fr/grew_match/help/|manual]] 
 +    * [[https://universaldependencies.org/tools.html|Tools]] for Universal Dependencies 
 +    * PARSEME corpus [[https://gitlab.com/parseme/corpora/-/wikis/home|wiki]] 
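
Session 3 in the revised outline lists the CoNLL-U and CUPT file formats. As a rough illustration of what those files look like to a program, here is a minimal sketch of a CoNLL-U reader. It is not the official UD validator or any PARSEME utility. The function name `read_conllu` and the sample sentence are hypothetical, chosen only for this example. It assumes the standard ten tab-separated CoNLL-U columns, `#` comment lines, and blank lines between sentences; CUPT files carry an additional PARSEME:MWE column, which this sketch does not handle.

```python
# Minimal CoNLL-U reader: an illustrative sketch, not the official UD tooling.
# The ten column names follow the CoNLL-U format description.
FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def read_conllu(text):
    """Split CoNLL-U text into sentences (hypothetical helper).

    Each sentence is a dict with its comment lines and a list of
    token dicts keyed by the ten column names.
    """
    sentences = []
    comments, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append({"comments": comments, "tokens": tokens})
            comments, tokens = [], []
        elif line.startswith("#"):
            comments.append(line)
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(
                    f"expected {len(FIELDS)} columns, got {len(cols)}: {line!r}"
                )
            tokens.append(dict(zip(FIELDS, cols)))
    # Accept a final sentence that is not followed by a blank line.
    if tokens:
        sentences.append({"comments": comments, "tokens": tokens})
    return sentences


# Invented two-token sample, for illustration only.
SAMPLE = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sentence in read_conllu(SAMPLE):
        for token in sentence["tokens"]:
            print(token["ID"], token["FORM"], token["HEAD"], token["DEPREL"])
```

A reader like this only checks the column count; the validators named in the outline perform far more thorough checks and should be preferred for real data.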
  
meetings/other-events/1st_unidive_training_school/courses.1714387167.txt.gz · Last modified: by agata.savary