Bug #2220
TBX: 0.7.8, XTZ import, "out-of text-to-edit" elements tokenised and indexed if an OTTO element contains sub-elements
Status: | New | Start date: | 06/16/2017 |
---|---|---|---|
Priority: | High | Due date: | |
Assignee: | - | % Done: | 0% |
Category: | Import | Spent time: | - |
Target version: | TXM X.X | | |
Description
If one declares two out-of-text-to-edit elements that may be nested in the document, tokenisation and indexing resume after the nested element inside the out-of-text-to-edit ancestor.
This happens with identical elements (note // note), with different elements (teiHeader // note), and whenever an OTTO element contains any other element (head // sic or head // hi).
The probable cause is that tokenization resumes at any end tag encountered inside an OTTO element, instead of only at the end tag that closes the outermost OTTO element.
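The suspected cause can be illustrated with a minimal sketch. This is not the TXM importer code; it is a small Python SAX example written under assumptions: the class names `FlagHandler` and `DepthHandler`, the `OTTO` set, and the sample documents are all hypothetical. `FlagHandler` mimics the suspected fault (a boolean flag cleared by any end tag), while `DepthHandler` counts open elements so that tokenization resumes only once the outermost OTTO element is closed.

```python
import xml.sax

# Hypothetical OTTO declaration, for illustration only.
OTTO = {"head", "note", "teiHeader"}


class FlagHandler(xml.sax.ContentHandler):
    """Suspected faulty logic: skipping stops at ANY end tag."""

    def __init__(self):
        super().__init__()
        self.skipping = False
        self.tokens = []

    def startElement(self, name, attrs):
        if name in OTTO:
            self.skipping = True

    def endElement(self, name):
        # Cleared even when the closed element is only a child of the OTTO element.
        self.skipping = False

    def characters(self, content):
        if not self.skipping:
            self.tokens.extend(content.split())


class DepthHandler(xml.sax.ContentHandler):
    """Possible fix: count open elements since the outermost OTTO start tag."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # 0 means we are outside every OTTO element
        self.tokens = []

    def startElement(self, name, attrs):
        if self.depth > 0 or name in OTTO:
            self.depth += 1

    def endElement(self, name):
        if self.depth > 0:
            self.depth -= 1

    def characters(self, content):
        if self.depth == 0:
            self.tokens.extend(content.split())


def tokenize(xml_text, handler_cls):
    """Parse xml_text and return the whitespace-separated tokens that were kept."""
    handler = handler_cls()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.tokens


# Made-up sample documents covering the head // sic and note // note cases.
HEAD_SIC = "<text><head>Title <sic>eror</sic> rest</head><p>body text</p></text>"
NOTE_NOTE = "<text><p>a <note>b <note>c</note> d</note> e</p></text>"

if __name__ == "__main__":
    print(tokenize(HEAD_SIC, FlagHandler))
    print(tokenize(HEAD_SIC, DepthHandler))
```

With the flag-based handler, the text following the nested element leaks into the token list; with the depth counter, only text outside the OTTO elements is kept. A stack of element names would work equally well if the importer needs to know which OTTO element it is in.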
History
#1 Updated by Alexey Lavrentev over 5 years ago
- Subject changed from TBX: 0.7.8, XTZ import, "out-of text-to-edit" elements tokenised and indexed if nested to TBX: 0.7.8, XTZ import, "out-of text-to-edit" elements tokenised and indexed if nested (or if one OTTO element contains another)
- Description updated (diff)
#2 Updated by Alexey Lavrentev almost 5 years ago
- Subject changed from TBX: 0.7.8, XTZ import, "out-of text-to-edit" elements tokenised and indexed if nested (or if one OTTO element contains another) to TBX: 0.7.8, XTZ import, "out-of text-to-edit" elements tokenised and indexed if an OTTO element contains sub-elements
- Description updated (diff)
- Priority changed from Normal to High
The bug persists in TXM 0.7.9
To reproduce the bug, take the CHARTES_HAIN13 corpus sources from sharedocs/[...]/Cactus/Projets/Textométrie/Corpus/src and import them using the XTZ import module.
The content of <head>, although declared as OTTO, is tokenized after the nested <sic> (w/@id="w_chartes_hain13_1").
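A quick way to check an import result for this symptom is to list every `w` element that still sits inside an OTTO element. The sketch below assumes that the tokenized output is an XML file in which the OTTO elements are preserved and tokens are wrapped in `w` elements carrying an `id` attribute; the function name `words_inside_otto` and the sample string are hypothetical, and which output file to inspect is not specified here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def words_inside_otto(xml_text, otto_names):
    """Return the id of every w element that has an OTTO ancestor."""
    found = []

    def walk(elem, inside):
        name = local_name(elem.tag)
        if inside and name == "w":
            found.append(elem.get("id"))
        # Everything below an OTTO element counts as "inside".
        inside = inside or name in otto_names
        for child in elem:
            walk(child, inside)

    walk(ET.fromstring(xml_text), False)
    return found


# Made-up output fragment showing the symptom: a token after <sic> inside <head>.
SAMPLE = (
    '<text><head>Title <sic>eror</sic> <w id="w_1">rest</w></head>'
    '<p><w id="w_2">body</w></p></text>'
)

if __name__ == "__main__":
    print(words_inside_otto(SAMPLE, {"head"}))
```

An empty result for the declared OTTO elements would suggest that their content was left untokenized, as intended.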
#3 Updated by Sebastien Jacquot over 4 years ago
- Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0
#4 Updated by Sebastien Jacquot over 4 years ago
- Category set to Import
#5 Updated by Alexey Lavrentev over 4 years ago
- Description updated (diff)
#6 Updated by Matthieu Decorde almost 4 years ago
- Target version changed from TXM 0.8.0 to TXM X.X