Bug #2258

RCP: 0.7.8, XMLW and XTZ import modules, line breaks trimmed causing tokenization errors

Added by Alexey Lavrentev about 4 years ago. Updated 5 months ago.

Status: New
Start date: 10/09/2017
Priority: Urgent
Due date:
Assignee: -
% Done:
Category: Import
Spent time: -
Target version: TXM 0.8.2


In text nodes, new lines are trimmed, and hence words on adjacent lines are merged unless there is white space before the new line.
To reproduce the bug, import the following test file and check that the merged form "ouoperacion" appears in the lexicon:

Tout art et toute doctrine et semblablement tout fait ou
operacion et eleccion appetent et desirent aucun bien. Pour
ce parloient bien les anciens en disant ainsi: " Bien est ce
que toutes choses desirent. " Et semble que il est difference
de fins; car les unes fins sont les operacions, les autres sont
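
To make the symptom concrete, here is a minimal Python sketch of what the import appears to do. It is not TXM code: `trim_line_breaks` and `tokenize` are hypothetical stand-ins, and the sketch assumes the line breaks are simply deleted before tokenization.

```python
import re

# First three lines of the test file, with no trailing space before each break.
SAMPLE = (
    "Tout art et toute doctrine et semblablement tout fait ou\n"
    "operacion et eleccion appetent et desirent aucun bien. Pour\n"
    "ce parloient bien les anciens en disant ainsi: \" Bien est ce\n"
)


def trim_line_breaks(text: str) -> str:
    """Hypothetical stand-in for the suspected behaviour: delete every line break."""
    return text.replace("\r\n", "").replace("\n", "")


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (assumption): each run of word characters is one token."""
    return re.findall(r"\w+", text)


merged = tokenize(trim_line_breaks(SAMPLE))
expected = tokenize(SAMPLE)

# Tokens that exist only because two lines were glued together.
spurious = sorted(set(merged) - set(expected))
print(spurious)
```

In this sketch, the tokens that appear only after trimming are the ones formed from the last word of one line and the first word of the next.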

It looks like the trimming happens before the file is sent to XSL filters, so it is impossible to use XSL to fix the problem.


Possible solutions:

  1. Replace each new line with a space (ideally only when it is not already preceded or followed by other white space)
  2. Trim the new lines only after the XSLT filters have been applied
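
Option 1 could be sketched as follows, again in Python and only as an illustration. The name `normalize_line_breaks` is hypothetical and nothing here is taken from the TXM code base. The sketch assumes that a line break (or a run of line breaks) with no other white space around it should become a single space, and that a break already adjacent to spaces or tabs can simply be dropped.

```python
import re

# A run of white space that contains at least one line break.
# [^\S\r\n] matches white space other than CR and LF (spaces, tabs, ...).
_BREAK_RUN = re.compile(r"[^\S\r\n]*(?:\r?\n[^\S\r\n]*)+")


def _replace_run(match: re.Match) -> str:
    # Keep any spaces or tabs that were already there; drop the line breaks.
    kept = re.sub(r"[\r\n]", "", match.group(0))
    # If the run consisted only of line breaks, insert one space instead.
    return kept if kept else " "


def normalize_line_breaks(text: str) -> str:
    """Remove line breaks without merging the words on either side of them."""
    return _BREAK_RUN.sub(_replace_run, text)


print(normalize_line_breaks("tout fait ou\noperacion et eleccion"))
```

With this approach a break at the very end of a text node becomes a trailing space, which a later trimming step (option 2) could still remove.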


#1 Updated by Alexey Lavrentev almost 4 years ago

  • Description updated (diff)

#2 Updated by Alexey Lavrentev almost 4 years ago

The bug seems to be fixed (TXM Update the progress status?

#3 Updated by Alexey Lavrentev over 3 years ago

The test file works fine, but the problem persists when trying to catch line breaks in XTZ XSL filters.

#4 Updated by Sebastien Jacquot over 3 years ago

  • Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0

#5 Updated by Matthieu Decorde over 2 years ago

  • Target version changed from TXM 0.8.0 to TXM 0.8.2

#6 Updated by Matthieu Decorde about 1 year ago

  • Category set to Import

#7 Updated by Matthieu Decorde 5 months ago

  • % Done changed from 0 to 80
