Bug #2053

RCP: 0.7.8, XTZ import module, default edition build fails on some corpora

Added by Alexey Lavrentev over 8 years ago. Updated over 6 years ago.

Status: New Start date: 02/03/2017
Priority: Normal Due date:
Assignee: - % Done: 0%

Category: Import Spent time: -
Target version: TXM 0.X.X

Description

To reproduce the bug, use

smb://ensldfs/services/Laboratoires/labo_ana_corpus/Projets/Textométrie/SpUV/SourcesChrétiennes/src/HOMELIE
(confidential)

AL: Default edition fails with the following message:

.Exception in thread "Thread-485" java.lang.NullPointerException

The same corpus works fine when the XML-TXM is reimported with the TXM import module.

SLH: Import doesn't finish in TXM 0.7.8 2017/03/07 09:40. Here are the console messages:

Sauvegarde des paramètres d'importation...
Checking corpus name validity with '[A-Z][-A-Z0-9]+': HOMELIE
Retriving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < toolbox_script.date
Retrieving file_path from org.txm.toolbox plugin.
Exécution du script/home/sheiden/TXM/scripts/import/xtzLoader.groovy
-- Split-Merge XSL Step with /home/sheiden/Corpus/src/HOMELIE/xsl/1-split-merge
-- Front XSL Step with /home/sheiden/Corpus/src/HOMELIE/xsl/2-front
-- Check XML files for well-formedness.
 ..
-- Tokenizing 1 files
No tokenization do to.
-- Building XML-TXM (1 files)
 .
Building TT source files (1) from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE
 .
Applying la.par TreeTagger model on dir: /home/sheiden/TXM/corpora/HOMELIE/treetagger (1 files)
Skipping ANNOTATE: '/home/sheiden/Software/TreeTagger/lib/la.par' TreeTagger language model file not found.
Error while importing corpus during 'annotate' step, reason=TreeTagger annotation failed.

Importation terminée : 197 msec (197 ms)
L'import n'a pas abouti.
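
In the log above, the import only stops at the ANNOTATE step because the la.par model file is missing. A minimal sketch of a pre-flight check that could report this before any other step runs is given below. It is written in Python for illustration only and is not the xtzLoader.groovy code; the function name `find_missing_model` and the `<lang>.par` naming rule are assumptions inferred from the paths shown in the log.

```python
from pathlib import Path
from typing import Optional


def find_missing_model(treetagger_lib_dir: str, lang: str) -> Optional[Path]:
    """Return the expected model path if it is missing, else None.

    Hypothetical helper: assumes the model file is named '<lang>.par'
    and sits directly in the TreeTagger lib directory, as in the log.
    """
    model = Path(treetagger_lib_dir) / f"{lang}.par"
    return None if model.is_file() else model


if __name__ == "__main__":
    # Path taken from the console messages in this report.
    missing = find_missing_model("/home/sheiden/Software/TreeTagger/lib", "la")
    if missing is not None:
        print(f"TreeTagger language model file not found: {missing}")
```

Running such a check at the start of the import would let the user fix the TreeTagger settings before the split-merge, tokenization and XML-TXM steps are executed.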

Other bugs:
  • some log messages must be corrected:
    • "Retriving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < " -> "Retrieving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < ???"
    • "-- Tokenizing 1 files" -> "-- Tokenizing the file"
    • "-- Building XML-TXM (1 files)" -> "-- Building the XML-TXM file"
    • "Building TT source files (1) from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE" -> "Building TT source file from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE"
    • "Applying la.par TreeTagger model on dir: /home/sheiden/TXM/corpora/HOMELIE/treetagger (1 files)" -> "Using the la.par TreeTagger model on the /home/sheiden/TXM/corpora/HOMELIE/treetagger directory file" / "Using the la.par TreeTagger model on the /home/sheiden/TXM/corpora/HOMELIE/treetagger directory files"
    • "Skipping ANNOTATE: '/home/sheiden/Software/TreeTagger/lib/la.par' TreeTagger language model file not found." -> "Skipping the ANNOTATE step: /home/sheiden/Software/TreeTagger/lib/la.par TreeTagger language model file not found."
    • "Error while importing corpus during 'annotate' step, reason=TreeTagger annotation failed." -> "Error while importing the corpus during the ANNOTATE step, reason=TreeTagger annotation failed."
    • "Importation terminée : 197 msec (197 ms)" -> "Importation terminée : temps total 197 ms."
    • When the "L'import n'a pas abouti." situation occurs, the "Importation terminée : 197 msec (197 ms)" and "L'import n'a pas abouti." messages should be merged into a single message: "L'import n'a pas abouti : temps total 197 ms."
  • when the "No tokenization do to." case occurs and only one file is processed, the import script should raise an import error.
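
The message corrections and the single-file check requested above can be sketched as follows. This is a Python illustration under stated assumptions, not the actual import script. All function names and the `CorpusImportError` type are hypothetical. The plural wordings simply reuse the current messages, since the report only specifies the singular forms, and the text of the raised error is also an assumption.

```python
class CorpusImportError(Exception):
    """Hypothetical error type for an import that cannot proceed."""


def tokenizing_message(file_count: int) -> str:
    # Singular wording proposed in the report; plural keeps the current text.
    if file_count == 1:
        return "-- Tokenizing the file"
    return f"-- Tokenizing {file_count} files"


def building_message(file_count: int) -> str:
    # Singular wording proposed in the report; plural keeps the current text.
    if file_count == 1:
        return "-- Building the XML-TXM file"
    return f"-- Building XML-TXM ({file_count} files)"


def final_message(succeeded: bool, elapsed_ms: int) -> str:
    # One closing message instead of "Importation terminée" followed by
    # "L'import n'a pas abouti." when the import fails.
    if succeeded:
        return f"Importation terminée : temps total {elapsed_ms} ms."
    return f"L'import n'a pas abouti : temps total {elapsed_ms} ms."


def check_tokenization(file_count: int, tokenized_count: int) -> None:
    # The report asks for an import error when nothing is tokenized
    # and only one file is processed. The error text is an assumption.
    if file_count == 1 and tokenized_count == 0:
        raise CorpusImportError("No tokenization to do for the only source file.")


if __name__ == "__main__":
    print(tokenizing_message(1))
    print(building_message(1))
    print(final_message(False, 197))
```

Keeping the wording in small functions like these makes the singular and plural cases easy to review in one place.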

History

#1 Updated by Serge Heiden over 8 years ago

  • Description updated
  • Category set to Import

#2 Updated by Sebastien Jacquot over 7 years ago

  • Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0

#3 Updated by Matthieu Decorde over 6 years ago

  • Target version changed from TXM 0.8.0 to TXM 0.X.X
