Bug #1444

RCP: 0.7.7 Ubuntu1404_64, XML-TMX import module broken

Added by Serge Heiden about 10 years ago. Updated over a year ago.

Status: Closed
Start date: 25/08/2015
Priority: Immediate
Due date:
Assignee: -
% Done: 100%
Category: Import
Spent time: -
Target version: TXM 0.7.8

Description

Import of the UNOSAMPLE demo corpus (source in ///SpUV/uno-sample) no longer completes successfully. The console shows the following messages:

Sauvegarde des paramètres d'importation...
 Tokenizer parametrized with whitespaces=[\p{Z}\p{C}]+
 Tokenizer parametrized with regPunct=[\p{Ps}\p{Pe}\p{Pi}\p{Pf}\p{Po}\p{S}]
 Tokenizer parametrized with punct_strong=[.!?]|\.\.|\.\.\.|…|\|
 Tokenizer parametrized with regElision=['‘’]
Execution du script : /home/sheiden/TXM/scripts/import/tmxLoader.groovy
-- IMPORTER - Reading source files
skip file : /home/sheiden/Corpus/src/uno-sample/import.xml
initialize writers for : /home/sheiden/Corpus/src/uno-sample/uncorpora_20090831-sample-b.tmx
build Writer : 0 en
build Writer : 1 ar
build Writer : 2 zh
build Writer : 3 fr
build Writer : 4 ru
build Writer : 5 es
add header : [creationtool:ORESAligner, creationtoolversion:1.0, datatype:plaintext, segtype:paragraph, adminlang:en-us, srclang:EN, o-tmf:ORES]
initialize writers for : /home/sheiden/Corpus/src/uno-sample/uncorpora_20090831-sample-a.tmx
build Writer : 0 en
build Writer : 1 ar
build Writer : 2 zh
build Writer : 3 fr
build Writer : 4 ru
build Writer : 5 es
add header : [creationtool:ORESAligner, creationtoolversion:1.0, datatype:plaintext, segtype:paragraph, adminlang:en-us, srclang:EN, o-tmf:ORES]
Tokenizing 12 files
............
Building xml-tei-txm (12 files)
............
-- ANNOTATE - Running NLP tools
TT with fr /home/sheiden/TXM/corpora/UNOSAMPLE/txm/uncorpora_20090831-sample-a_3.xml+/home/sheiden/TXM/corpora/UNOSAMPLE/annotations/uncorpora_20090831-sample-a_3.xml-STDOFF.xml > /home/sheiden/TXM/corpora/UNOSAMPLE/ptreetagger/uncorpora_20090831-sample-a_3.xml-src.tt > /home/sheiden/TXM/corpora/UNOSAMPLE/treetagger/uncorpora_20090831-sample-a_3.xml-out.tt
TT with en /home/sheiden/TXM/corpora/UNOSAMPLE/txm/uncorpora_20090831-sample-a_0.xml+/home/sheiden/TXM/corpora/UNOSAMPLE/annotations/uncorpora_20090831-sample-a_0.xml-STDOFF.xml > /home/sheiden/TXM/corpora/UNOSAMPLE/ptreetagger/uncorpora_20090831-sample-a_0.xml-src.tt > /home/sheiden/TXM/corpora/UNOSAMPLE/treetagger/uncorpora_20090831-sample-a_0.xml-out.tt
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/ar.par. Continue import process 
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/es.par. Continue import process 
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/ru.par. Continue import process 
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/ar.par. Continue import process 
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/ru.par. Continue import process 
TT with fr /home/sheiden/TXM/corpora/UNOSAMPLE/txm/uncorpora_20090831-sample-b_3.xml+/home/sheiden/TXM/corpora/UNOSAMPLE/annotations/uncorpora_20090831-sample-b_3.xml-STDOFF.xml > /home/sheiden/TXM/corpora/UNOSAMPLE/ptreetagger/uncorpora_20090831-sample-b_3.xml-src.tt > /home/sheiden/TXM/corpora/UNOSAMPLE/treetagger/uncorpora_20090831-sample-b_3.xml-out.tt
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/zh.par. Continue import process 
TT with en /home/sheiden/TXM/corpora/UNOSAMPLE/txm/uncorpora_20090831-sample-b_0.xml+/home/sheiden/TXM/corpora/UNOSAMPLE/annotations/uncorpora_20090831-sample-b_0.xml-STDOFF.xml > /home/sheiden/TXM/corpora/UNOSAMPLE/ptreetagger/uncorpora_20090831-sample-b_0.xml-src.tt > /home/sheiden/TXM/corpora/UNOSAMPLE/treetagger/uncorpora_20090831-sample-b_0.xml-out.tt
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/zh.par. Continue import process 
No Modelfile available for lang /home/sheiden/Software/TreeTagger/lib/es.par. Continue import process 
langs : [uncorpora_20090831-sample-b_0.xml:en, uncorpora_20090831-sample-b_1.xml:ar, uncorpora_20090831-sample-b_2.xml:zh, uncorpora_20090831-sample-b_3.xml:fr, uncorpora_20090831-sample-b_4.xml:ru, uncorpora_20090831-sample-b_5.xml:es, uncorpora_20090831-sample-a_0.xml:en, uncorpora_20090831-sample-a_1.xml:ar, uncorpora_20090831-sample-a_2.xml:zh, uncorpora_20090831-sample-a_3.xml:fr, uncorpora_20090831-sample-a_4.xml:ru, uncorpora_20090831-sample-a_5.xml:es]
texts : [0:[uncorpora_20090831-sample-b_0.xml, uncorpora_20090831-sample-a_0.xml], 1:[uncorpora_20090831-sample-b_1.xml, uncorpora_20090831-sample-a_1.xml], 2:[uncorpora_20090831-sample-b_2.xml, uncorpora_20090831-sample-a_2.xml], 3:[uncorpora_20090831-sample-b_3.xml, uncorpora_20090831-sample-a_3.xml], 4:[uncorpora_20090831-sample-b_4.xml, uncorpora_20090831-sample-a_4.xml], 5:[uncorpora_20090831-sample-b_5.xml, uncorpora_20090831-sample-a_5.xml]]
-- COMPILING - Building Search Engine indexes
Using corpus ID: [0:en0, 1:ar1, 2:zh2, 3:fr3, 4:ru4, 5:es5]
............
P-attributes: [id, ref]
S-attributes: [hi:0+type, seg:0+id, sub:0+type, text:0+id+base+project, tu:0+tuid+committee+session+vote+lead, txmcorpus:0+id+lang]
Usage error: invalid filename 'UNOSAMPLE_zh2' for registry entry.
Filename must not contain uppercase letters, '.' or '~'.
Error: The registry file was not created: /home/sheiden/TXM/corpora/UNOSAMPLE/registry/UNOSAMPLE_zh2. See https://groupes.renater.fr/wiki/txm-users/public/faq
Compiler failed

Importation terminée : 10 sec (10023 ms)
L'import n'a pas abouti.
Moteur de recherche lancé.
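
The compile step in the log above fails because the registry filename 'UNOSAMPLE_zh2' contains uppercase letters. The following is a minimal Python sketch of that naming rule, taken only from the wording of the error message (no uppercase letters, '.' or '~'). The function names are hypothetical, this is not TXM or CWB code, and the ticket does not say how the problem was actually fixed; lowercasing the corpus ID is just one possible approach.

```python
def is_valid_registry_name(name: str) -> bool:
    """Check a name against the rule quoted in the error message:
    no uppercase letters, no '.' and no '~'."""
    if not name:
        return False
    return not any(ch.isupper() or ch in ".~" for ch in name)


def to_registry_name(corpus_id: str) -> str:
    """Hypothetical normalization (an assumption, not the actual TXM fix):
    lowercase the corpus ID, then reject it if '.' or '~' remain."""
    candidate = corpus_id.lower()
    if not is_valid_registry_name(candidate):
        raise ValueError(f"cannot derive a registry filename from {corpus_id!r}")
    return candidate


if __name__ == "__main__":
    # 'UNOSAMPLE_zh2' is the name rejected in the log above.
    for corpus_id in ("UNOSAMPLE_zh2", "unosample_zh2"):
        print(corpus_id, is_valid_registry_name(corpus_id), to_registry_name(corpus_id))
```

With this check, the name from the log is rejected as written and accepted once lowercased, which matches the constraint reported by the compiler.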

Validation test

Run the import with the UNO sample corpus: smb://ensldfs.ens-lyon.fr/services/Laboratoires/labo_ana_corpus/Projets/Textométrie/SpUV/uno-sample
  • The import should complete
  • The concordance of "la" :ONUSAMPLE_EN0 "the" on corpus UNOSAMPLE_FR3 should return XX results

History

#1 Updated by Serge Heiden about 10 years ago

  • Priority changed from Normal to Immediate

#2 Updated by Matthieu Decorde about 10 years ago

  • % Done changed from 0 to 80

#3 Updated by Matthieu Decorde over 9 years ago

  • Description updated

#4 Updated by Matthieu Decorde over 9 years ago

  • % Done changed from 80 to 60

Ubuntu 64-bit with the latest CWB binaries: cwb-align failed with a segmentation fault.

#5 Updated by Matthieu Decorde over 9 years ago

  • % Done changed from 60 to 80

#6 Updated by Sebastien Jacquot over a year ago

  • Status changed from New to Closed

#7 Updated by Sebastien Jacquot over a year ago

  • % Done changed from 80 to 100
