Book section, Year: 2021

Annotation guidelines of UD and SUD treebanks for spoken corpora

Sylvain Kahane, Bernard Caron, Emmett Strickland, Kim Gerdes

Abstract

This paper presents practical and theoretical guidelines for the development of treebanks for spoken languages in the UD and SUD annotation schemes. We discuss text-sound alignment, segmentation into "sentences", use of "punctuation", paradigmatic lists, disfluencies, and paratactic constructions. This proposal is based on the development of (Surface-Syntactic) Universal Dependencies treebanks for spoken French, Naija, and Beja.
Main file
Kahane_Annotation_Guidelines_2021.pdf (974.84 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03839772, version 1 (04-11-2022)

Licence

Attribution - CC BY 4.0

Identifiers

  • HAL Id: hal-03839772, version 1

Cite

Sylvain Kahane, Bernard Caron, Emmett Strickland, Kim Gerdes. Annotation guidelines of UD and SUD treebanks for spoken corpora: a proposal. Daniel Dakota, Kilian Evang, Sandra Kübler. Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021), Association for Computational Linguistics, pp. 35-47, 2021. ⟨hal-03839772⟩