Feature #3190

Import, CoNLL-U corpus

Added by Matthieu Decorde almost 2 years ago. Updated about 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: Import
Target version: TXM Profiterole 2.0
Start date: 12/07/2021
Due date:
Spent time: -
% Done: 60%

Description

Add a new import module: conllu.

The import creates a CQP corpus from the UD words and their properties:

+ it creates CQP structures from the CoNLL-U comment lines

+ it is based on the XTZ import (text order, metadata, XSL stylesheets, ...)

+ it manages word contractions

+ it creates the pre-computed head-* and deps-* CoNLL-U properties

+ it also creates a TIGERSearch representation
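As a rough illustration of the reading side of such a module, here is a minimal sketch in Python (the language is chosen only for brevity). It assumes that "# key = value" comment lines become attributes of a sentence structure, that multiword-token lines such as "3-4 au" are the contractions to manage, and that each syntactic word becomes one CQP token written in CWB vertical format. The names parse_conllu, to_vertical and Sentence, and the choice of exported properties, are hypothetical and do not describe the actual TXM implementation.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import quoteattr

# The ten CoNLL-U columns, in the order defined by the UD format.
COLUMNS = ["id", "form", "lemma", "upos", "xpos",
           "feats", "head", "deprel", "deps", "misc"]


@dataclass
class Sentence:
    attributes: dict = field(default_factory=dict)    # from "# key = value" comments
    words: list = field(default_factory=list)         # one dict per syntactic word
    contractions: list = field(default_factory=list)  # (first_id, last_id, surface form)


def parse_conllu(text):
    """Read CoNLL-U text; sentences are separated by blank lines."""
    sentences, current = [], Sentence()
    for line in text.splitlines():
        if not line.strip():
            if current.words:            # a blank line closes the sentence
                sentences.append(current)
            current = Sentence()
        elif line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep:                      # keep "key = value" comments only
                current.attributes[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            if len(cols) != len(COLUMNS):
                raise ValueError(f"expected 10 columns: {line!r}")
            if "-" in cols[0]:           # multiword token, e.g. "3-4 au"
                first, last = cols[0].split("-")
                current.contractions.append((int(first), int(last), cols[1]))
            elif "." not in cols[0]:     # skip empty nodes such as "5.1"
                current.words.append(dict(zip(COLUMNS, cols)))
    if current.words:
        sentences.append(current)
    return sentences


def to_vertical(sentence, props=("form", "lemma", "upos", "deprel")):
    """Emit one <s> structure in CWB vertical format (one token per line)."""
    attrs = "".join(f" {k.replace(' ', '_')}={quoteattr(v)}"
                    for k, v in sentence.attributes.items())
    rows = ["\t".join(w[p] for p in props) for w in sentence.words]
    return "\n".join([f"<s{attrs}>", *rows, "</s>"])


# Sample sentence; "|" stands for the TAB separator to keep the listing readable.
SAMPLE = """\
# sent_id = 1
# text = Il va au marché.
1|Il|il|PRON|_|_|2|nsubj|_|_
2|va|aller|VERB|_|_|0|root|_|_
3-4|au|_|_|_|_|_|_|_|_
3|à|à|ADP|_|_|5|case|_|_
4|le|le|DET|_|_|5|det|_|_
5|marché|marché|NOUN|_|_|2|obl|_|_
6|.|.|PUNCT|_|_|2|punct|_|_
""".replace("|", "\t")
```

With this sample, the contraction "au" is recorded as (3, 4, "au") while the CQP tokens are the syntactic words "à" and "le"; indexing the surface form instead would be the other possible way to handle contractions.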

See https://groupes.renater.fr/wiki/txm-info/public/chantier_profiterole/chantier_finalisation_extension_syntactic_annotation and https://groupes.renater.fr/wiki/txm-info/public/import/conllu


Related issues

related to Bug #3296: Import, CoNLL-U corpus, '--' sentence comment word breaks... New 10/06/2022

History

#1 Updated by Matthieu Decorde almost 2 years ago

  • % Done changed from 0 to 50

The head and deps projections are not done yet.
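A minimal sketch of what such a projection could look like, assuming that the head-* properties copy a property of each word's governor onto the word (head-form, head-lemma, ...) and that the deps-* properties list the same property for its direct dependents. The function name, the property list, the "|" separator and the "_" placeholder for missing values are illustrative assumptions, not TXM's actual conventions.

```python
def project_head_and_deps(words, props=("form", "lemma", "upos", "deprel")):
    """Add hypothetical head-<p> and deps-<p> properties to each word.

    words: dicts with at least "id", "head" and every property in props,
    as read from CoNLL-U (ids are strings, head "0" marks the root).
    """
    by_id = {w["id"]: w for w in words}
    children = {w["id"]: [] for w in words}
    for w in words:
        if w["head"] in children:
            children[w["head"]].append(w)   # dependents kept in sentence order

    # Compute all values first so that new keys never feed back into the lookup.
    projected = []
    for w in words:
        head = by_id.get(w["head"])         # None for the root ("0") or an unknown head
        extra = {}
        for p in props:
            extra[f"head-{p}"] = head[p] if head else "_"
            deps = [d[p] for d in children[w["id"]]]
            extra[f"deps-{p}"] = "|".join(deps) if deps else "_"
        projected.append(extra)
    for w, extra in zip(words, projected):
        w.update(extra)
    return words


WORDS = [
    {"id": "1", "form": "Il", "lemma": "il", "upos": "PRON", "deprel": "nsubj", "head": "2"},
    {"id": "2", "form": "va", "lemma": "aller", "upos": "VERB", "deprel": "root", "head": "0"},
    {"id": "3", "form": "marché", "lemma": "marché", "upos": "NOUN", "deprel": "obl", "head": "2"},
]
```

Pre-computing these values at import time lets a plain CQP query read a word's governor or dependents as ordinary word properties, without walking the dependency tree at query time.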

#2 Updated by Matthieu Decorde almost 2 years ago

  • Description updated

#3 Updated by Matthieu Decorde almost 2 years ago

  • % Done changed from 50 to 80

#4 Updated by Matthieu Decorde about 1 year ago

  • % Done changed from 80 to 60

Split the CoNLL-U files before all the other processing steps.
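Splitting the input up front could look like the following minimal sketch. It assumes, for illustration only, that a CoNLL-U file is cut at its "# newdoc" comment lines (the UD convention for document boundaries) and that each part then goes through the rest of the import as a separate text; the entry does not say where TXM actually splits. The function name split_conllu and the fallback doc1, doc2, ... identifiers are hypothetical.

```python
import re

# "# newdoc" or "# newdoc id = <identifier>", the UD document-boundary comment.
NEWDOC = re.compile(r"^#\s*newdoc(?:\s+id\s*=\s*(.*\S))?\s*$")


def split_conllu(text):
    """Return (doc_id, conllu_text) pairs, one per "# newdoc" section."""
    docs, doc_id, lines = [], None, []

    def flush():
        # Keep the chunk only if it holds more than blank lines, and end it
        # with the blank line CoNLL-U requires after the last sentence.
        if any(ln.strip() for ln in lines):
            body = "\n".join(lines).strip("\n") + "\n\n"
            docs.append((doc_id or f"doc{len(docs) + 1}", body))

    for line in text.splitlines():
        m = NEWDOC.match(line)
        if m:
            flush()                          # close the previous document
            doc_id, lines = m.group(1), [line]
        else:
            lines.append(line)
    flush()
    return docs


# "|" stands for the TAB separator to keep the listing readable.
SAMPLE = """\
# newdoc id = d1
# sent_id = 1
1|A|a|X|_|_|0|root|_|_

# newdoc id = d2
# sent_id = 2
1|B|b|X|_|_|0|root|_|_
""".replace("|", "\t")
```

Each returned part is itself a valid CoNLL-U text, so the later steps (reading, head and deps projection, CQP and TIGERSearch export) can run on one document at a time.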
