Bug #2093

RCP: 0.7.8, TXT+CSV import module: <s> tags added by the sentencer during import are interpreted as literals by CQP

Added by Alexey Lavrentev about 2 years ago. Updated 11 months ago.

Status: New
Start date: 03/30/2017
Priority: Normal
Due date: -
Assignee: -
% Done: 80%
Category: Import
Spent time: -
Target version: TXM 0.8.0

Description

To reproduce the bug:
  1. Use the source files from (confidential):
    smb://ensldfs.ens-lyon.fr/services/Laboratoires/labo_ana_corpus/Projets/Textométrie/SpUV/ENC/sources/CORNEILLEMOLIERETXT
  2. After import, search for [word="\</s\>"] in the corpus. The query returns matches, showing that the sentence tags were indexed as words.

Solution

The s-attributes are built by XMLTXM2WTC.getSattributes(), which is run only on the last processed TXT file.

In the "CORNEILLEMOLIERETER" corpus, the last text was an empty "import_.xml" file. Because it contains no words, it contains no sentences, so the s s-attribute was not declared and the <s> tags were interpreted as literals.

A first solution is to skip empty TXT files.
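A minimal Python sketch of this first solution, assuming that whitespace-only files count as empty. `files_to_process` is a hypothetical helper, not part of TXM: it filters the input list before conversion so that the last processed file actually yields words and therefore sentences.

```python
from pathlib import Path


def files_to_process(paths):
    """Return only the TXT files that contain at least one non-whitespace character.

    Hypothetical helper: dropping empty inputs keeps an empty file from being
    the one whose s-attributes are used for the whole corpus.
    """
    kept = []
    for p in paths:
        text = Path(p).read_text(encoding="utf-8")
        # Assumption: a whitespace-only file produces no words, hence no <s>.
        if text.strip():
            kept.append(p)
    return kept
```

This only works around the bug: the s-attributes still come from a single file, so any element absent from that file would still be missing from the declaration.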

A second, more robust solution is to run XMLTXM2WTC.getSattributes() on all TXT files.
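The idea of collecting s-attributes from every text instead of only the last one can be sketched in Python as follows. `sattributes_of` is a hypothetical stand-in for XMLTXM2WTC.getSattributes(), and the input is simplified, namespace-free XML rather than real XML-TXM; word tokens are assumed to be `<w>` elements.

```python
import xml.etree.ElementTree as ET


def sattributes_of(xml_text):
    """Hypothetical stand-in for XMLTXM2WTC.getSattributes().

    Returns the set of element names found in one text, excluding word
    tokens (<w>), which are positional data rather than structures.
    """
    root = ET.fromstring(xml_text)
    return {el.tag for el in root.iter() if el.tag != "w"}


def corpus_sattributes(xml_texts):
    """Union the s-attributes of every text in the corpus.

    An empty text contributes nothing, but it can no longer remove an
    element such as <s> that the other texts use.
    """
    declared = set()
    for text in xml_texts:
        declared |= sattributes_of(text)
    return declared
```

With `texts = ["<text><s><w>Le</w><w>Cid</w></s></text>", "<text/>"]`, `sattributes_of(texts[-1])` returns only `{"text"}`, reproducing the missing `s`, whereas `corpus_sattributes(texts)` returns `{"text", "s"}`.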

History

#1 Updated by Matthieu Decorde about 2 years ago

  • Description updated
  • % Done changed from 0 to 80

#2 Updated by Alexey Lavrentev about 2 years ago

  • Description updated

#3 Updated by Sebastien Jacquot 11 months ago

  • Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0
