Bug #2098
TBX: 0.7.8, XTZ import, <num> and <w> tags indexed even if they are located in an element declared in the 'out-of text-to-edit' plan
Status: | New | Start date: | 10/04/2016
---|---|---|---
Priority: | High | Due date: |
Assignee: | - | % Done: | 0%
Category: | Import | Spent time: | -
Target version: | TXM X.X | |
Description
To reproduce the bug, take strasbBfm.xml from the BFM repository, import it via XTZ with teiHeader declared as out-of-text-to-edit, and search for [word="[0-9]+"].
- <num> element should not be transformed into <w>
- no element placed inside "out-of-text-to-edit" should be indexed
Currently, to implement the "out-of-text-to-edit" plan, the compiler and pager steps rely on the words (w elements) identified by the Tokenizer. So if an element declared in the "out-of-text-to-edit" plan already contains word tags (<w> or <num>), these are indexed by the search engine.
Solution
The pager and compiler steps must use the "out-of-text-to-edit" plan import parameter instead of relying on the Tokenizer result.
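The idea behind this solution can be illustrated with a minimal sketch. It is written in Python for brevity and is not TXM code: the function name `indexable_words`, the `out_of_text` parameter and the sample document are all hypothetical. The sketch assumes that the set of out-of-text-to-edit element names is available as an import parameter, and that any subtree rooted at such an element is skipped entirely, whether or not it already contains <w> tags.

```python
import xml.etree.ElementTree as ET


def indexable_words(root, out_of_text):
    """Return the text of every <w> element that is NOT inside an
    element whose tag is listed in out_of_text (hypothetical parameter
    standing in for the "out-of-text-to-edit" import setting)."""
    words = []

    def walk(elem):
        if elem.tag in out_of_text:
            # Skip the whole subtree: pre-tagged words inside it
            # must not reach the index.
            return
        if elem.tag == "w":
            words.append("".join(elem.itertext()).strip())
            return
        for child in elem:
            walk(child)

    walk(root)
    return words


# Hypothetical sample: the header already contains a <w> element.
SAMPLE = """<TEI>
  <teiHeader><date><w>2016</w></date></teiHeader>
  <text><p><w>page</w> <w>12</w></p></text>
</TEI>"""

root = ET.fromstring(SAMPLE)
print(indexable_words(root, {"teiHeader"}))  # ['page', '12']
print(indexable_words(root, set()))          # ['2016', 'page', '12']
```

With `teiHeader` declared, the query [word="[0-9]+"] would only match "12"; with no declaration, the pre-tagged "2016" from the header is indexed as well, which is the behavior reported in this ticket.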
Related issues
History
#1 Updated by Alexey Lavrentev over 6 years ago
- File deleted (cleve-edition.png)
#2 Updated by Matthieu Decorde over 6 years ago
- Subject changed from TBX: 0.7.8, XTZ import, <num> and <w> tags indexed in 'out-of text-to-edit' plan to TBX: 0.7.8, XTZ import, <num> and <w> tags indexed even if they are declared in the 'out-of text-to-edit' plan
- Description updated (diff)
- Priority changed from Normal to High
#3 Updated by Alexey Lavrentev about 6 years ago
- Subject changed from TBX: 0.7.8, XTZ import, <num> and <w> tags indexed even if they are declared in the 'out-of text-to-edit' plan to TBX: 0.7.8, XTZ import, <num>, <w> and <author> tags indexed even if they are located in an element declared in the 'out-of text-to-edit' plan
Similar behavior is caused by the <note> element. The text nodes following </note> are tokenized and indexed even if they are inside an element declared as out-of-text-to-edit. See the related ticket.
#4 Updated by Alexey Lavrentev about 6 years ago
- Subject changed from TBX: 0.7.8, XTZ import, <num>, <w> and <author> tags indexed even if they are located in an element declared in the 'out-of text-to-edit' plan to TBX: 0.7.8, XTZ import, <num> and <w> tags indexed even if they are located in an element declared in the 'out-of text-to-edit' plan
#5 Updated by Sebastien Jacquot over 5 years ago
- Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0
#6 Updated by Matthieu Decorde over 4 years ago
- Target version changed from TXM 0.8.0 to TXM X.X