Bug #1444
RCP: 0.7.7 Ubuntu1404_64, XML-TMX import module broken
Status: | New | Start date: | 08/25/2015
---|---|---|---
Priority: | Immediate | Due date: |
Assignee: | - | % Done: | 80%
Category: | Import | Spent time: | -
Target version: | TXM 0.7.8 | |
Description
Import of the UNOSAMPLE demo corpus (source in ///SpUV/uno-sample) no longer completes successfully. The console shows the following messages:
Sauvegarde des paramètres d'importation...
Tokenizer parametrized with whitespaces=[\p{Z}\p{C}]+
Tokenizer parametrized with regPunct=[\p{Ps}\p{Pe}\p{Pi}\p{Pf}\p{Po}\p{S}]
Tokenizer parametrized with punct_strong=[.!?]|\.\.|\.\.\.|…|\|
Tokenizer parametrized with regElision=['‘’]
Execution du script : /home/sheiden/TXM/scripts/import/tmxLoader.groovy
-- IMPORTER - Reading source files
skip file : /home/sheiden/Corpus/src/uno-sample/import.xml
initialize writers for : /home/sheiden/Corpus/src/uno-sample/uncorpora_20090831-sample-b.tmx
build Writer : 0 en
build Writer : 1 ar
build Writer : 2 zh
build Writer : 3 fr
build Writer : 4 ru
build Writer : 5 es
add header : [creationtool:ORESAligner, creationtoolversion:1.0, datatype:plaintext, segtype:paragraph, adminlang:en-us, srclang:EN, o-tmf:ORES]
initialize writers for : /home/sheiden/Corpus/src/uno-sample/uncorpora_20090831-sample-a.tmx
build Writer : 0 en
build Writer : 1 ar
build Writer : 2 zh
build Writer : 3 fr
build Writer : 4 ru
build Writer : 5 es
add header : [creationtool:ORESAligner, creationtoolversion:1.0, datatype:plaintext, segtype:paragraph, adminlang:en-us, srclang:EN, o-tmf:ORES]
Tokenizing 12 files ............
Building xml-tei-txm (12 files) ............
-- ANNOTATE - Running NLP tools
TT with fr /home/sheiden/TXM/corpora/UNOSAMPLE/txm/uncorpora_20090831-sample-a_3.xml+/home/sheiden/TXM/corpora/UNOSAMPLE/annotations/uncorpora_20090831-sample-a_3.xml-STDOFF.xml > /home/sheiden/TXM/corpora/UNOSAMPLE/ptreetagger/uncorpora_20090831-sample-a_3.xml-src.tt > /home/sheiden/TXM/corpora/UNOSAMPLE/treetagger/uncorpora_20090831-sample-a_3.xml-out.tt
TT with en /home/sheiden/TXM/corpora/UNOSAMPLE/txm/uncorpora_20090831-sample-a_0.xml+/home/sheiden/TXM/corpora/UNOSAMPLE/annotations/uncorpora_20090831-sample-a_0.xml-STDOFF.xml > /home/sheiden/TXM/corpora/UNOSAMPLE/ptreetagger/uncorpora_20090831-sample-a_0.xml-src.tt > /home/sheiden/TXM/corpora/UNOSAMPLE/treetagger/uncorpora_20090831-sample-a_0.xml-out.tt
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/ar.par. Continue import process
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/es.par. Continue import process
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/ru.par. Continue import process
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/ar.par. Continue import process
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/ru.par. Continue import process
TT with fr /home/sheiden/TXM/corpora/UNOSAMPLE/txm/uncorpora_20090831-sample-b_3.xml+/home/sheiden/TXM/corpora/UNOSAMPLE/annotations/uncorpora_20090831-sample-b_3.xml-STDOFF.xml > /home/sheiden/TXM/corpora/UNOSAMPLE/ptreetagger/uncorpora_20090831-sample-b_3.xml-src.tt > /home/sheiden/TXM/corpora/UNOSAMPLE/treetagger/uncorpora_20090831-sample-b_3.xml-out.tt
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/zh.par. Continue import process
TT with en /home/sheiden/TXM/corpora/UNOSAMPLE/txm/uncorpora_20090831-sample-b_0.xml+/home/sheiden/TXM/corpora/UNOSAMPLE/annotations/uncorpora_20090831-sample-b_0.xml-STDOFF.xml > /home/sheiden/TXM/corpora/UNOSAMPLE/ptreetagger/uncorpora_20090831-sample-b_0.xml-src.tt > /home/sheiden/TXM/corpora/UNOSAMPLE/treetagger/uncorpora_20090831-sample-b_0.xml-out.tt
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/zh.par. Continue import process
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/es.par. Continue import process
langs : [uncorpora_20090831-sample-b_0.xml:en, uncorpora_20090831-sample-b_1.xml:ar, uncorpora_20090831-sample-b_2.xml:zh, uncorpora_20090831-sample-b_3.xml:fr, uncorpora_20090831-sample-b_4.xml:ru, uncorpora_20090831-sample-b_5.xml:es, uncorpora_20090831-sample-a_0.xml:en, uncorpora_20090831-sample-a_1.xml:ar, uncorpora_20090831-sample-a_2.xml:zh, uncorpora_20090831-sample-a_3.xml:fr, uncorpora_20090831-sample-a_4.xml:ru, uncorpora_20090831-sample-a_5.xml:es]
texts : [0:[uncorpora_20090831-sample-b_0.xml, uncorpora_20090831-sample-a_0.xml], 1:[uncorpora_20090831-sample-b_1.xml, uncorpora_20090831-sample-a_1.xml], 2:[uncorpora_20090831-sample-b_2.xml, uncorpora_20090831-sample-a_2.xml], 3:[uncorpora_20090831-sample-b_3.xml, uncorpora_20090831-sample-a_3.xml], 4:[uncorpora_20090831-sample-b_4.xml, uncorpora_20090831-sample-a_4.xml], 5:[uncorpora_20090831-sample-b_5.xml, uncorpora_20090831-sample-a_5.xml]]
-- COMPILING - Building Search Engine indexes
Using corpus ID: [0:en0, 1:ar1, 2:zh2, 3:fr3, 4:ru4, 5:es5] ............
P-attributes: [id, ref]
S-attributes: [hi:0+type, seg:0+id, sub:0+type, text:0+id+base+project, tu:0+tuid+committee+session+vote+lead, txmcorpus:0+id+lang]
Usage error: invalid filename 'UNOSAMPLE_zh2' for registry entry. Filename must not contain uppercase letters, '.' or '~'.
Error: The registry file was not created: /home/sheiden/TXM/corpora/UNOSAMPLE/registry/UNOSAMPLE_zh2.
See https://groupes.renater.fr/wiki/txm-users/public/faq
Compiler failed
Importation terminée : 10 sec (10023 ms)
L'import n'a pas abouti.
Moteur de recherche lancé.
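The "Usage error" message states the naming rule that 'UNOSAMPLE_zh2' violates: no uppercase letters, '.' or '~'. The following Python sketch restates that rule as a check and shows one possible normalization (lowercasing). Both function names are hypothetical and the code is derived only from the wording of the error message. It is not the actual CWB or TXM implementation, and the real rule may be stricter than the message suggests.

```python
def is_valid_registry_name(name: str) -> bool:
    """Check the rule quoted in the console error message:
    no uppercase letters, no '.' and no '~' (assumed to be the complete rule)."""
    if not name:
        return False
    return not any(ch.isupper() or ch in ".~" for ch in name)


def to_registry_name(corpus_id: str) -> str:
    """Hypothetical normalization: lowercase the ID, then re-check the rule."""
    candidate = corpus_id.lower()
    if not is_valid_registry_name(candidate):
        raise ValueError(f"cannot derive a registry file name from {corpus_id!r}")
    return candidate


# The name rejected in the log fails the check; its lowercase form passes.
print(is_valid_registry_name("UNOSAMPLE_zh2"))
print(to_registry_name("UNOSAMPLE_zh2"))
```

Lowercasing alone does not remove '.' or '~', so the sketch raises an error in that case instead of silently producing another invalid name.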
Validation test
Run the import with the UNO sample corpus: smb://ensldfs.ens-lyon.fr/services/Laboratoires/labo_ana_corpus/Projets/Textométrie/SpUV/uno-sample
- The import should complete without errors
- The concordance of "la" :UNOSAMPLE_EN0 "the" with corpus UNOSAMPLE_FR3 should return XX results
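The first validation criterion could be checked automatically by scanning the captured console output for the failure messages that appear in the log quoted in the description. The Python sketch below assumes the console text is available as a string. The marker list is taken from that log and may not cover every failure mode, and the function name is hypothetical.

```python
# Failure messages copied from the console log in this report (not exhaustive).
FAILURE_MARKERS = (
    "Usage error:",
    "Error:",
    "Compiler failed",
    "L'import n'a pas abouti.",
)


def find_failure_markers(console_log: str) -> list:
    """Return the known failure markers that occur anywhere in the log text."""
    return [marker for marker in FAILURE_MARKERS if marker in console_log]


sample = (
    "Compiler failed Importation terminée : 10 sec (10023 ms) "
    "L'import n'a pas abouti. Moteur de recherche lancé."
)
print(find_failure_markers(sample))
```

Substring matching is used instead of line-by-line matching so that the check still works when the log has been pasted as a single run of text. An empty result means only that none of the listed markers were found, not that the import is known to be correct.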
History
#1 Updated by Serge Heiden over 5 years ago
- Priority changed from Normal to Immediate
#2 Updated by Matthieu Decorde over 5 years ago
- % Done changed from 0 to 80
#3 Updated by Matthieu Decorde almost 5 years ago
- Description updated (diff)
#4 Updated by Matthieu Decorde almost 5 years ago
- % Done changed from 80 to 60
Ubuntu 64-bit with the latest CWB binaries: cwb-align failed with a segmentation fault error
#5 Updated by Matthieu Decorde over 4 years ago
- % Done changed from 60 to 80