Bug #2053
RCP: 0.7.8, XTZ import module, default edition build fails on some corpora
Statut: | New | Début: | 02/03/2017 | |
---|---|---|---|---|
Priorité: | Normal | Echéance: | ||
Assigné à: | - | % réalisé: | 0% |
|
Catégorie: | Import | Temps passé: | - | |
Version cible: | TXM 0.X.X |
Description
To reproduce the bug, use
smb://ensldfs/services/Laboratoires/labo_ana_corpus/Projets/Textométrie/SpUV/SourcesChrétiennes/src/HOMELIE
(confidential)
AL: Default edition fails with the following message:
.Exception in thread "Thread-485" java.lang.NullPointerException
The same corpus works fine if reimporting the XML-TXM with the TXM import module.
SLH: Import doesn't finish in TXM 0.7.8 2017/03/07 09:40. Here are the console messages:
Sauvegarde des paramètres d'importation... Checking corpus name validity with '[A-Z][-A-Z0-9]+': HOMELIE Retriving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < toolbox_script.date Retrieving file_path from org.txm.toolbox plugin. Exécution du script/home/sheiden/TXM/scripts/import/xtzLoader.groovy -- Split-Merge XSL Step with /home/sheiden/Corpus/src/HOMELIE/xsl/1-split-merge -- Front XSL Step with /home/sheiden/Corpus/src/HOMELIE/xsl/2-front -- Check XML files for well-formedness. .. -- Tokenizing 1 files No tokenization do to. -- Building XML-TXM (1 files) . Building TT source files (1) from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE . Applying la.par TreeTagger model on dir: /home/sheiden/TXM/corpora/HOMELIE/treetagger (1 files) Skipping ANNOTATE: '/home/sheiden/Software/TreeTagger/lib/la.par' TreeTagger language model file not found. Error while importing corpus during 'annotate' step, reason=TreeTagger annotation failed. Importation terminée : 197 msec (197 ms) L'import n'a pas abouti.Other bugs:
- some log messages must be corrected:
- "Retriving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < "
->
"Retrieving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < ???" - "-- Tokenizing 1 files"
->
"-- Tokenizing the file" - "-- Building XML-TXM (1 files)"
->
"-- Building the XML-TXM file" - "Building TT source files (1) from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE"
->
"Building TT source file from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE" - "Applying la.par TreeTagger model on dir: /home/sheiden/TXM/corpora/HOMELIE/treetagger (1 files)"
->
"Using the la.par TreeTagger model on the /home/sheiden/TXM/corpora/HOMELIE/treetagger directory file" / "Using the la.par TreeTagger model on the /home/sheiden/TXM/corpora/HOMELIE/treetagger directory files" - "Skipping ANNOTATE: '/home/sheiden/Software/TreeTagger/lib/la.par' TreeTagger language model file not found."
->
"Skipping the ANNOTATE step: /home/sheiden/Software/TreeTagger/lib/la.par TreeTagger language model file not found." - "Error while importing corpus during 'annotate' step, reason=TreeTagger annotation failed."
->
"Error while importing the corpus during the ANNOTATE step, reason=TreeTagger annotation failed." - "Importation terminée : 197 msec (197 ms)"
->
"Importation terminée : temps total 197 ms." - When the "L'import n'a pas abouti." situation occurs, the "Importation terminée : 197 msec (197 ms)" and the "L'import n'a pas abouti." messages should be merged into the following unique message : "L'import n'a pas abouti : temps total 197 ms."
- "Retriving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < "
- when the "No tokenization do to." case occurs and only one file is processed, the import script should raise an import error.
Historique
#1 Mis à jour par Serge Heiden il y a plus de 8 ans
- Description mis à jour (diff)
- Catégorie mis à Import
#2 Mis à jour par Sebastien Jacquot il y a plus de 7 ans
- Version cible changé de TXM 0.8.0a (split/restructuration) à TXM 0.8.0
#3 Mis à jour par Matthieu Decorde il y a plus de 6 ans
- Version cible changé de TXM 0.8.0 à TXM 0.X.X