Conference paper, Year: 2022

Pull your treebank up by its own bootstraps

Abstract

We analyze how well recent neural syntactic parsers perform at bootstrapping a treebank, i.e. iteratively training a parser and parsing new text so as to improve the speed and quality of human syntactic annotation. Through an extensive, heuristically guided search over the large grid of options (parser, embedding, configuration, epochs, batch size, size of training set, annotation scheme, language, evaluation method…), we determine the best-performing parser configurations: UDify and Trankit share first place, and which of the two comes out ahead depends on the size of the training set. We also show how these results are integrated into the annotation tool ArboratorGrew, and we propose some preliminary measures for predicting parse quality for a new language.
Main file: 504.pdf (2.47 MB)
Origin: publisher files allowed on an open archive

Dates and versions

hal-03846834, version 1 (14-11-2022)

Identifiers

  • HAL Id: hal-03846834, version 1

Cite

Ziqian Peng, Kim Gerdes, Kirian Guiller. Pull your treebank up by its own bootstraps. Journées Jointes des Groupements de Recherche Linguistique Informatique, Formelle et de Terrain (LIFT) et Traitement Automatique des Langues (TAL), Nov 2022, Marseille, France. pp. 139-153. ⟨hal-03846834⟩