meetings:other-events:1st_unidive_training_school:courses

Differences

This shows you the differences between two versions of the page.

meetings:other-events:1st_unidive_training_school:courses [2024/05/30 15:22] – [Corpus annotation infrastructure] agata.savary
meetings:other-events:1st_unidive_training_school:courses [2024/06/22 10:12] (current) agata.savary
Line 26: Line 26:
   * **Pre-requisites**:   * **Pre-requisites**:
     * being concerned by syntactic annotation     * being concerned by syntactic annotation
-    * ideally, having some data you want to annotate (please take contact before the summer school for the preparation of the data)+    * preparing data to annotate - see [[https://github.com/UniDive/2024-UniDive-Chisinau-training-school#pre-requisites-for-the-courses|instructions]]
  
   * **Preparatory work** (offered in a parallel course by Bruno Guillaume):   * **Preparatory work** (offered in a parallel course by Bruno Guillaume):
Line 32: Line 32:
     * comparing UD and SUD annotation     * comparing UD and SUD annotation
  
-  * **Further readings**:+  * **Recommended readings**:
     * Lucien Tesnière (2015), [[https://benjamins.com/catalog/z.185|Elements of structural syntax]], Benjamins. ebook in open access.     * Lucien Tesnière (2015), [[https://benjamins.com/catalog/z.185|Elements of structural syntax]], Benjamins. ebook in open access.
     * Igor Mel’čuk (1988), Dependency syntax: theory and practice. SUNY Press.     * Igor Mel’čuk (1988), Dependency syntax: theory and practice. SUNY Press.
Line 65: Line 65:
   * **Pre-requisites**   * **Pre-requisites**
     * theoretical linguistics knowledge (parts of speech, inflection, syntactic structures)      * theoretical linguistics knowledge (parts of speech, inflection, syntactic structures) 
-    * Timothy Baldwin and Su Nam Kim. 2010. Multiword expressions. In Nitin Indurkhya and Fred J. Damerau, editors, Handbook of Natural Language Processing, 2 edition, pages 267–292. CRC Press, Taylor and Francis Group, Boca Raton, FL, USA.+    * prepare a parallel corpus or a monolingual one - see the [[https://github.com/UniDive/2024-UniDive-Chisinau-training-school#pre-requisites-for-the-courses|instructions]]
  
-  * **Preparatory work**: To be done by the trainees before the training school:  +  * **Recommended readings** 
-    * prepare a parallel corpus or a monolingual one; it would preferably contain a new language, a new dialect, or a new genre; by “new” we mean “not already covered in the PARSEME 1.3 corpus”.+    * Timothy Baldwin and Su Nam Kim. 2010. Multiword expressions. In Nitin Indurkhya and Fred J. Damerau, editors, Handbook of Natural Language Processing, 2 edition, pages 267–292. CRC Press, Taylor and Francis Group, Boca Raton, FL, USA.
  
 =====Corpus annotation infrastructure===== =====Corpus annotation infrastructure=====
Line 84: Line 84:
  
   * **Contents**   * **Contents**
-   
     * Session 1 (by Daniel Zeman & Agata Savary & Bruno Guillaume): **Git infrastructure**     * Session 1 (by Daniel Zeman & Agata Savary & Bruno Guillaume): **Git infrastructure**
       * Git for beginners       * Git for beginners
Line 90: Line 89:
       * PARSEME Gitlab repositories        * PARSEME Gitlab repositories 
       * Github synchronisation in Grew       * Github synchronisation in Grew
-   
     * Session 2 (by Bruno Guillaume jointly with Sylvain's course on dependency syntax): **Basics of treebank querying and annotation**     * Session 2 (by Bruno Guillaume jointly with Sylvain's course on dependency syntax): **Basics of treebank querying and annotation**
       * Corpus queries with Grew-Match        * Corpus queries with Grew-Match 
       * UD vs. SUD        * UD vs. SUD 
       * Corpus annotation with Arborator Grew       * Corpus annotation with Arborator Grew
- 
     * Session 3 (by Daniel Zeman & Agata Savary): **Corpus format validation**     * Session 3 (by Daniel Zeman & Agata Savary): **Corpus format validation**
       * File formats (CoNLL-U, CUPT)       * File formats (CoNLL-U, CUPT)
Line 101: Line 98:
       * PARSEME validator       * PARSEME validator
       * UD/PARSEME consistency       * UD/PARSEME consistency
- 
     * Session 4 (by Bruno Guillaume): **Advanced treebank querying and annotation**     * Session 4 (by Bruno Guillaume): **Advanced treebank querying and annotation**
       * querying PARSEME data       * querying PARSEME data
       * corpus pre-annotation       * corpus pre-annotation
-       
     * Session 5 (by Daniel Zeman & Bruno Guillaume): **Corpus quality**     * Session 5 (by Daniel Zeman & Bruno Guillaume): **Corpus quality**
       * error mining and correcting with Grew-match       * error mining and correcting with Grew-match
       * fixing errors in text editors       * fixing errors in text editors
- 
     * Session 6 (by Daniel Zeman & Agata Savary): **Documentation and discussion on Git**     * Session 6 (by Daniel Zeman & Agata Savary): **Documentation and discussion on Git**
       * Documenting a corpus in README       * Documenting a corpus in README
       * UD Github issues        * UD Github issues 
       * PARSEME Gitlab issues       * PARSEME Gitlab issues
 +
 +  * **Recommended readings**
 +    * [[https://git-scm.com/book/en/v2|Git manual]]
 +    * Grew-match [[https://universal.grew.fr/?tutorial=yes|tutorial]] and [[https://grew.fr/grew_match/help/|manual]]
 +    * [[https://universaldependencies.org/tools.html|Tools]] for Universal Dependencies
 +    * PARSEME corpus [[https://gitlab.com/parseme/corpora/-/wikis/home|wiki]]
 +
  
meetings/other-events/1st_unidive_training_school/courses.1717075325.txt.gz · Last modified: by agata.savary