Bug #2258
RCP: 0.7.8, XMLW and XTZ import modules, line breaks trimmed causing tokenization errors
Status: | New | Start date: | 10/09/2017
---|---|---|---
Priority: | Urgent | Due date: |
Assignee: | - | % Done: | 80%
Category: | Import | Spent time: | -
Target version: | TXM 0.8.2 | |
Description
In text nodes, new lines are trimmed, and hence words on different lines are merged unless there is white space before the new line.
To reproduce the bug, import the following test file and check that "ouoperacion" appears in the lexicon:
<text> Tout art et toute doctrine et semblablement tout fait ou operacion et eleccion appetent et desirent aucun bien. Pour ce parloient bien les anciens en disant ainsi: " Bien est ce que toutes choses desirent. " Et semble que il est difference de fins; car les unes fins sont les operacions, les autres sont </text>
It looks like the trimming happens before the file is sent to XSL filters, so it is impossible to use XSL to fix the problem.
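The merging described above can be illustrated with a minimal Python sketch. It is not TXM's code: `drop_line_breaks` is a hypothetical stand-in for the trimming step, `tokenize` is a deliberately naive word splitter, and the position of the line break in the sample is assumed for illustration.

```python
import re


def drop_line_breaks(text: str) -> str:
    """Hypothetical stand-in for the faulty step: remove every line
    break without putting anything in its place."""
    return "".join(text.splitlines())


def tokenize(text: str) -> list[str]:
    """Naive tokenizer for the illustration: runs of word characters."""
    return re.findall(r"\w+", text)


# Assumed layout: the line ends right after "ou", with no trailing space.
sample = "tout fait ou\noperacion et eleccion"

tokens = tokenize(drop_line_breaks(sample))
print(tokens)  # ['tout', 'fait', 'ouoperacion', 'et', 'eleccion']

# With a space before the line break the words stay separate.
sample_with_space = "tout fait ou \noperacion et eleccion"
print(tokenize(drop_line_breaks(sample_with_space)))
```

The second call shows why the bug only appears when no white space precedes the new line.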
Solution
- Replace the new line with a space (ideally unless preceded or followed by another white space)
- Trim the new lines after the XSLT filters have been applied
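The first proposal could be sketched as follows in Python. This is only an illustration of the intended rule, not the TXM import code: `normalize_line_breaks` is a hypothetical name, and treating a run of several line breaks as a single break is an assumption. The second proposal concerns the order of steps in the import pipeline and is not covered here.

```python
import re

# A maximal run of white space that contains at least one line break.
_BREAK_RUN = re.compile(r"\s*\n\s*")


def _replace_run(match: re.Match) -> str:
    # Keep any white space that already surrounds the line break(s).
    kept = re.sub(r"[\r\n]", "", match.group(0))
    # If nothing else separated the two lines, insert a single space.
    return kept if kept else " "


def normalize_line_breaks(text: str) -> str:
    """Remove line breaks, inserting a space only where the break was
    the sole separator between two pieces of text (assumed rule)."""
    return _BREAK_RUN.sub(_replace_run, text)


print(normalize_line_breaks("fait ou\noperacion"))   # fait ou operacion
print(normalize_line_breaks("fait ou \noperacion"))  # fait ou operacion
```

In both cases the words remain separate tokens, and no extra space is added when one is already present before or after the break.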
History
#1 Updated by Alexey Lavrentev almost 6 years ago
- Description updated (diff)
#2 Updated by Alexey Lavrentev almost 6 years ago
The bug seems to be fixed (TXM 0.7.8.201712011718). Should the progress status be updated?
#3 Updated by Alexey Lavrentev over 5 years ago
The test file works fine, but the problem persists when trying to catch line breaks in XTZ XSL filters.
#4 Updated by Sebastien Jacquot over 5 years ago
- Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0
#5 Updated by Matthieu Decorde over 4 years ago
- Target version changed from TXM 0.8.0 to TXM 0.8.2
#6 Updated by Matthieu Decorde about 3 years ago
- Category set to Import
#7 Updated by Matthieu Decorde over 2 years ago
- % Done changed from 0 to 80