Bug #2093: RCP: 0.7.8, TXT+CSV import module, <s> tags added by the sentencer during import are interpreted as literals by CQP - Plateforme TXM - Forge du Centre Blaise Pascal

Added by Alexey Lavrentev over 8 years ago. Updated almost 2 years ago.

Status: Closed
Start date: 30/03/2017
Priority: Normal
Due date:
Assignee:
% Done: 100%
Category: Import
Spent time:
Target version: TXM 0.8.0

Description

To reproduce the bug:

Use the source files from:

smb://ensldfs.ens-lyon.fr/services/Laboratoires/labo_ana_corpus/Projets/Textométrie/SpUV/ENC/sources/CORNEILLEMOLIERETXT

(confidential)

After import, search for [word="\</s\>"] in the corpus.

Solution

The s-attributes are built using XMLTXM2WTC.getSattributes(), which is run only on the last processed TXT file.

In the case of the "CORNEILLEMOLIERETER" corpus, the last text was an empty "import_.xml" file, so the "s" s-attribute was not added: the file contains no words and therefore no sentences. The <s> tags written for the other texts were consequently read by CQP as literal word tokens.

A first solution is to skip empty TXT files.

A second solution is to run XMLTXM2WTC.getSattributes() on all TXT files rather than only the last one.
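
The two solutions above can be sketched as follows. This is only an illustrative Python sketch, not the TXM code: get_sattributes() is a hypothetical stand-in for XMLTXM2WTC.getSattributes(), the word element is assumed to be named "w", and the sample documents are invented for the example.

```python
import xml.etree.ElementTree as ET

WORD_TAG = "w"  # assumption: words are encoded as <w> elements


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.split("}")[-1]


def get_sattributes(xml_text):
    """Hypothetical stand-in for XMLTXM2WTC.getSattributes():
    map each structure name to the attribute names seen on it."""
    found = {}
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name == WORD_TAG:
            continue  # words are positional data, not structures
        found.setdefault(name, set()).update(elem.attrib)
    return found


def has_words(xml_text):
    """True if the document contains at least one word element."""
    return any(local_name(e.tag) == WORD_TAG
               for e in ET.fromstring(xml_text).iter())


def collect_sattributes(documents):
    """Skip empty documents (first solution) and merge the
    structures found in all remaining ones (second solution)."""
    merged = {}
    for xml_text in documents:
        if not has_words(xml_text):
            continue
        for name, attrs in get_sattributes(xml_text).items():
            merged.setdefault(name, set()).update(attrs)
    return merged


# Invented sample data: one normal text and one empty text processed last.
full = '<text id="t1"><s n="1"><w>Bonjour</w></s></text>'
empty = '<text id="import_"></text>'

last_only = get_sattributes(empty)            # behaviour described in the report
merged = collect_sattributes([full, empty])   # behaviour after the fix
```

With the sample data, last_only has no "s" entry, which corresponds to the reported symptom, while merged declares both "text" and "s".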

History

#1 Updated by Matthieu Decorde over 8 years ago

Description updated (diff)
% Done changed from 0 to 80

#2 Updated by Alexey Lavrentev over 8 years ago

Description updated (diff)

#3 Updated by Sebastien Jacquot over 7 years ago

Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0

#4 Updated by Sebastien Jacquot almost 2 years ago

% Done changed from 80 to 100

#5 Updated by Sebastien Jacquot almost 2 years ago

Status changed from New to Closed
