Task #3242

Import, XTZ, manage nested w elements

Added by Matthieu Decorde about 1 year ago. Updated about 1 year ago.

Status: New
Start date: 04/21/2022
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Import
Spent time: -
Target version: TXM 0.8.4

Description

Decide what to do with nested w elements

<text>
<w><w>un</w> <w>mot</w> <w>spécial</w></w>
</text>

Currently, the compiler and pager steps create a phantom word after the inner words.

E.g., in the example above, the indexes and HTML pages contain the following words: "un", "mot", "spécial" and "".
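
To illustrate how such a phantom word can appear, here is a minimal Python sketch. It is not the TXM compiler or pager code. It assumes a hypothetical streaming tokenizer (`NaiveWordHandler`, a name invented for this example) that opens a text buffer on every `<w>` start tag and emits one word on every `</w>` end tag. Under that assumption, the closing tag of the outer `<w>` emits an empty word.

```python
import xml.sax

# The sample from the description.
SAMPLE = "<text>\n<w><w>un</w> <w>mot</w> <w>spécial</w></w>\n</text>"


class NaiveWordHandler(xml.sax.ContentHandler):
    """Hypothetical tokenizer: one word per </w>, with no nesting awareness."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.buffer = None  # None means "not currently inside a <w>"

    def startElement(self, name, attrs):
        if name == "w":
            # Every <w> start tag resets the buffer, including nested ones.
            self.buffer = ""

    def characters(self, content):
        if self.buffer is not None:
            self.buffer += content

    def endElement(self, name):
        if name == "w":
            # Every </w> emits a word, even if the buffer was already consumed.
            self.words.append((self.buffer or "").strip())
            self.buffer = None


def naive_words(xml_text):
    handler = NaiveWordHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.words


print(naive_words(SAMPLE))  # ['un', 'mot', 'spécial', '']
```

The final empty string comes from the outer `</w>`: its buffer was consumed by the inner words, so nothing is left to emit.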


Related issues

Related to Bug #3233: Import, TreeTagger, fails with nested w elements (New, 03/04/2022)

History

#1 Updated by Serge Heiden about 1 year ago

The TXM XML importers don't define nested <w> elements (neither does the XML TEI-TXM format), so nested <w> elements must be rejected or ignored.
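
The two options mentioned here (reject or ignore) could be sketched as follows in Python. This is only an illustration, not TXM code: the function names and the `policy` parameter are hypothetical, and "ignore" is read here as "skip any `<w>` that wraps other `<w>` elements and keep only the innermost words", which is one possible interpretation.

```python
import xml.etree.ElementTree as ET

# The sample from the description.
SAMPLE = "<text>\n<w><w>un</w> <w>mot</w> <w>spécial</w></w>\n</text>"


def has_nested_w(root):
    """Return True if any <w> contains another <w> at any depth."""
    return any(w.find(".//w") is not None for w in root.iter("w"))


def innermost_words(root):
    """Return the text of every <w> that contains no other <w>."""
    return [
        "".join(w.itertext()).strip()
        for w in root.iter("w")
        if w.find(".//w") is None
    ]


def load_words(xml_text, policy="ignore"):
    """Hypothetical entry point: policy is 'ignore' or 'reject'."""
    root = ET.fromstring(xml_text)
    if policy == "reject" and has_nested_w(root):
        raise ValueError("nested <w> elements are not supported")
    return innermost_words(root)


print(load_words(SAMPLE))  # ['un', 'mot', 'spécial']
```

With `policy="reject"`, the same sample raises `ValueError` instead of returning words, which would surface the problem at import time rather than in the indexes.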
