Bug #2053

Updated by Serge Heiden over 3 years ago

To reproduce the bug, use

%smb://ensldfs/services/Laboratoires/labo_ana_corpus/Projets/Textométrie/SpUV/SourcesChrétiennes/src/HOMELIE%
(confidential)

AL: Default edition fails with the following message:
<pre> <code>
.Exception in thread "Thread-485" java.lang.NullPointerException
</code> </pre>

The same corpus works fine when the XML-TXM is reimported with the TXM import module.

SLH: Import doesn't finish in TXM 0.7.8 2017/03/07 09:40. Here are the console messages:
<pre>
Sauvegarde des paramètres d'importation...
Checking corpus name validity with '[A-Z][-A-Z0-9]+': HOMELIE
Retriving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < toolbox_script.date
Retrieving file_path from org.txm.toolbox plugin.
Exécution du script/home/sheiden/TXM/scripts/import/xtzLoader.groovy
-- Split-Merge XSL Step with /home/sheiden/Corpus/src/HOMELIE/xsl/1-split-merge
-- Front XSL Step with /home/sheiden/Corpus/src/HOMELIE/xsl/2-front
-- Check XML files for well-formedness.
..
-- Tokenizing 1 files
No tokenization do to.
-- Building XML-TXM (1 files)
.
Building TT source files (1) from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE
.
Applying la.par TreeTagger model on dir: /home/sheiden/TXM/corpora/HOMELIE/treetagger (1 files)
Skipping ANNOTATE: '/home/sheiden/Software/TreeTagger/lib/la.par' TreeTagger language model file not found.
Error while importing corpus during 'annotate' step, reason=TreeTagger annotation failed.

Importation terminée : 197 msec (197 ms)
L'import n'a pas abouti.
</pre>
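
The log above shows the import reaching the 'annotate' step before discovering that the @la.par@ model is missing. A minimal sketch of an earlier check is given here, assuming the models directory and the language code are known before the import starts. The class and method names (@TreeTaggerModelCheck@, @modelPath@, @requireModel@) are hypothetical and are not taken from the TXM sources; @IllegalStateException@ is only a stand-in for whatever import error type TXM actually uses.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of TXM: checks that a TreeTagger model
// file exists before the annotate step is started.
final class TreeTaggerModelCheck {

    // Builds the expected model path, e.g. <modelsDir>/la.par for lang "la".
    static Path modelPath(String modelsDir, String lang) {
        return Paths.get(modelsDir, lang + ".par");
    }

    // Throws if the model file is missing (stand-in exception type).
    static void requireModel(String modelsDir, String lang) {
        Path model = modelPath(modelsDir, lang);
        if (!Files.isRegularFile(model)) {
            throw new IllegalStateException(
                "TreeTagger language model file not found: " + model);
        }
    }

    public static void main(String[] args) {
        // Directory taken from the log above; it will usually not exist
        // on the machine running this sketch.
        try {
            requireModel("/home/sheiden/Software/TreeTagger/lib", "la");
            System.out.println("Model found.");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether TXM should stop the import or only skip the annotation in this case is a design decision this sketch does not settle.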

Other bugs:
* some log messages must be corrected:
** "Retriving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < " @->@ "Retrieving /home/sheiden/TXM/scripts/import/xtzLoader.groovy from Toolbox plugin if script.date < ???"
** "-- Tokenizing 1 files" @->@ "-- Tokenizing the file"
** "-- Building XML-TXM (1 files)" @->@ "-- Building the XML-TXM file"
** "Building TT source files (1) from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE" @->@ "Building TT source file from directory /home/sheiden/TXM/corpora/HOMELIE/txm/HOMELIE"
** "Applying la.par TreeTagger model on dir: /home/sheiden/TXM/corpora/HOMELIE/treetagger (1 files)" @->@ "Using the la.par TreeTagger model on the /home/sheiden/TXM/corpora/HOMELIE/treetagger directory file" / "Using the la.par TreeTagger model on the /home/sheiden/TXM/corpora/HOMELIE/treetagger directory files"
** "Skipping ANNOTATE: '/home/sheiden/Software/TreeTagger/lib/la.par' TreeTagger language model file not found." @->@ "Skipping the ANNOTATE step: /home/sheiden/Software/TreeTagger/lib/la.par TreeTagger language model file not found."
** "Error while importing corpus during 'annotate' step, reason=TreeTagger annotation failed." @->@ "Error while importing the corpus during the ANNOTATE step, reason=TreeTagger annotation failed."
** "Importation terminée : 197 msec (197 ms)" @->@ "Importation terminée : temps total 197 ms."
** When the "L'import n'a pas abouti." situation occurs, the "Importation terminée : 197 msec (197 ms)" and "L'import n'a pas abouti." messages should be merged into the following single message: "L'import n'a pas abouti : temps total 197 ms."
* when the "No tokenization do to." case occurs and only one file is processed, the import script should raise an import error.
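
The requests above can be sketched as follows. This is only an illustration under stated assumptions, not the TXM implementation: the class @ImportMessages@ and its methods are hypothetical, the plural wordings are simply the ones already present in the log, and @IllegalStateException@ stands in for the real import error type. The French messages are kept verbatim from this report.

```java
// Hypothetical sketch, not TXM code: count-aware log messages, a single
// final status message, and an error when a one-file import is not tokenized.
final class ImportMessages {

    // Singular wording proposed in this report; plural wording kept from the log.
    static String tokenizing(int fileCount) {
        return fileCount == 1
            ? "-- Tokenizing the file"
            : "-- Tokenizing " + fileCount + " files";
    }

    static String buildingXmlTxm(int fileCount) {
        return fileCount == 1
            ? "-- Building the XML-TXM file"
            : "-- Building XML-TXM (" + fileCount + " files)";
    }

    // One message for both outcomes, so that "Importation terminee" is never
    // printed for a failed import. \u00e9 is the letter e with acute accent.
    static String finalStatus(boolean success, long elapsedMs) {
        return success
            ? "Importation termin\u00e9e : temps total " + elapsedMs + " ms."
            : "L'import n'a pas abouti : temps total " + elapsedMs + " ms.";
    }

    // Raises an error when nothing was tokenized and only one file is processed.
    static void checkTokenization(boolean tokenized, int fileCount) {
        if (!tokenized && fileCount == 1) {
            throw new IllegalStateException(
                "No tokenization done for the only file of the import.");
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenizing(1));
        System.out.println(buildingXmlTxm(1));
        System.out.println(finalStatus(false, 197));
    }
}
```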
