Bug #2053

RCP: 0.7.8, XTZ import module, default edition build fails on some corpora

Added by Alexey Lavrentev almost 3 years ago. Updated 9 months ago.

Status: New
Start date: 03/02/2017
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Import
Spent time: -
Target version: TXM X.X

Description

To reproduce the bug, use the corpus at:

smb://ensldfs/services/Laboratoires/labo_ana_corpus/Projets/Textométrie/SpUV/SourcesChrétiennes/src/HOMELIE
(confidential)

AL: The default edition build fails with the following message:

.Exception in thread "Thread-485" java.lang.NullPointerException

The same corpus imports fine when its XML-TXM files are reimported with the TXM import module.

SLH: The import does not complete in TXM 0.7.8 (2017/03/07 09:40). Here are the console messages:

Sauvegarde des paramètres d'importation...
Checking corpus name validity with '[A-Z][-A-Z0-9]+': HOMELIE
Retriving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < toolbox_script.date
Retrieving file_path from org.txm.toolbox plugin.
Exécution du script/home/sheiden/TXM/scripts/import/xtzLoader.groovy
-- Split-Merge XSL Step with /home/sheiden/Corpus/src/HOMELIE/xsl/1-split-merge
-- Front XSL Step with /home/sheiden/Corpus/src/HOMELIE/xsl/2-front
-- Check XML files for well-formedness.
 ..
-- Tokenizing 1 files
No tokenization do to.
-- Building XML-TXM (1 files)
 .
Building TT source files (1) from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE
 .
Applying la.par TreeTagger model on dir: /home/sheiden/TXM/corpora/HOMELIE/treetagger (1 files)
Skipping ANNOTATE: '/home/sheiden/Software/TreeTagger/lib/la.par' TreeTagger language model file not found.
Error while importing corpus during 'annotate' step, reason=TreeTagger annotation failed.

Importation terminée : 197 msec (197 ms)
L'import n'a pas abouti.

Other bugs:
  • some log messages must be corrected:
    • "Retriving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < " -> "Retrieving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < ???"
    • "-- Tokenizing 1 files" -> "-- Tokenizing the file"
    • "-- Building XML-TXM (1 files)" -> "-- Building the XML-TXM file"
    • "Building TT source files (1) from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE" -> "Building TT source file from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE"
    • "Applying la.par TreeTagger model on dir: /home/sheiden/TXM/corpora/HOMELIE/treetagger (1 files)" -> "Using the la.par TreeTagger model on the /home/sheiden/TXM/corpora/HOMELIE/treetagger directory file" (one file) / "Using the la.par TreeTagger model on the /home/sheiden/TXM/corpora/HOMELIE/treetagger directory files" (several files)
    • "Skipping ANNOTATE: '/home/sheiden/Software/TreeTagger/lib/la.par' TreeTagger language model file not found." -> "Skipping the ANNOTATE step: /home/sheiden/Software/TreeTagger/lib/la.par TreeTagger language model file not found."
    • "Error while importing corpus during 'annotate' step, reason=TreeTagger annotation failed." -> "Error while importing the corpus during the ANNOTATE step, reason=TreeTagger annotation failed."
    • "Importation terminée : 197 msec (197 ms)" -> "Importation terminée : temps total 197 ms." ("Import finished: total time 197 ms.")
    • When the "L'import n'a pas abouti." ("The import failed.") situation occurs, the "Importation terminée : 197 msec (197 ms)" and "L'import n'a pas abouti." messages should be merged into a single message: "L'import n'a pas abouti : temps total 197 ms." ("The import failed: total time 197 ms.")
  • when the "No tokenization do to." case occurs and only one file is processed, the import script should raise an import error.
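The requested message changes and the single-file tokenization check can be sketched as a small Java helper. This is a minimal sketch, not TXM's actual code: the class `ImportMessages`, its methods, the `ImportException` type, its message text, and the plural form of the tokenizing message are hypothetical. Only the singular/plural strings and the merged end-of-import message come from the list above.

```java
/**
 * Hypothetical helper (not part of TXM) sketching the log message
 * changes requested in this issue. All names are illustrative.
 */
final class ImportMessages {

    /** Hypothetical checked exception standing in for TXM's import error. */
    static final class ImportException extends Exception {
        ImportException(String message) {
            super(message);
        }
    }

    private ImportMessages() {}

    /** Singular form from the issue; the plural form is an assumption. */
    static String tokenizing(int fileCount) {
        return fileCount == 1
                ? "-- Tokenizing the file"
                : "-- Tokenizing " + fileCount + " files";
    }

    /** Singular and plural forms as proposed in the issue. */
    static String usingTreeTaggerModel(String model, String dir, int fileCount) {
        String target = fileCount == 1 ? " directory file" : " directory files";
        return "Using the " + model + " TreeTagger model on the " + dir + target;
    }

    /** One end-of-import line; on failure it replaces the two separate messages. */
    static String importEnd(boolean succeeded, long elapsedMs) {
        String status = succeeded ? "Importation terminée" : "L'import n'a pas abouti";
        return status + " : temps total " + elapsedMs + " ms.";
    }

    /** Raises an import error when nothing was tokenized and only one file was processed. */
    static void checkTokenization(boolean tokenized, int fileCount) throws ImportException {
        if (!tokenized && fileCount == 1) {
            // Hypothetical error text; the issue does not specify one.
            throw new ImportException("Nothing to tokenize in the only input file.");
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenizing(1));
        System.out.println(importEnd(false, 197));
        try {
            checkTokenization(false, 1);
        } catch (ImportException e) {
            System.out.println("Import error: " + e.getMessage());
        }
    }
}
```

Keeping the singular/plural choice and the success/failure choice in one place means the console only ever prints one end-of-import line, which is what the merged "L'import n'a pas abouti : temps total 197 ms." message asks for.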

History

#1 Updated by Serge Heiden over 2 years ago

  • Description updated (diff)
  • Category set to Import

#2 Updated by Sebastien Jacquot over 1 year ago

  • Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0

#3 Updated by Matthieu Decorde 9 months ago

  • Target version changed from TXM 0.8.0 to TXM X.X
