Bug #2098
Updated by Matthieu Decorde over 3 years ago
If an "out-of-text-to-edit" plan contains word tags (<w> or <num>), these are indexed by the search engine.
To reproduce the bug, take strasbBfm.xml from the BFM repository, import it via XTZ with teiHeader listed in out-of-text-to-edit, and search for [word="[0-9]+"].
# <num> elements should not be transformed into <w>
# no element placed inside an "out-of-text-to-edit" element should be indexed
Currently, to implement the "out-of-text-to-edit" plan, the compiler and pager steps rely on the words (w elements) identified by the Tokenizer. So if an "out-of-text-to-edit" plan already contains word tags (<w> or <num>), these are indexed by the search engine.
h3. Solution
The pager and compiler steps must use the "out-of-text-to-edit" plan import parameter instead of relying on the Tokenizer result.
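
A minimal sketch of the idea, not the actual TXM code: the function name @indexable_words@, the @out_of_text_to_edit@ argument and the sample document are hypothetical, and the sample omits the TEI namespace for brevity. It shows a step deciding what to index from the import parameter itself, so that any <w> or <num> already present inside an excluded element is skipped regardless of what the Tokenizer produced.

```python
import xml.etree.ElementTree as ET

# Assumption: only <w> counts as a word; <num> is deliberately not promoted to a word.
WORD_TAGS = {"w"}


def indexable_words(root, out_of_text_to_edit):
    """Return the text of <w> elements that are not inside an excluded element.

    out_of_text_to_edit is a set of element names taken from the import
    parameter (hypothetical representation).
    """
    words = []

    def walk(element):
        if element.tag in out_of_text_to_edit:
            return  # skip the whole subtree, whatever tags it contains
        if element.tag in WORD_TAGS:
            words.append("".join(element.itertext()).strip())
            return
        for child in element:
            walk(child)

    walk(root)
    return words


# Hypothetical sample: the header already contains word-like tags.
SAMPLE = """<TEI>
  <teiHeader><date><num>1842</num></date><title><w>Strasbourg</w></title></teiHeader>
  <text><p><w>en</w> <w>l'an</w> <w>842</w></p></text>
</TEI>"""

root = ET.fromstring(SAMPLE)
print(indexable_words(root, {"teiHeader"}))  # header words are left out
print(indexable_words(root, set()))          # without the parameter, <w> in the header leaks in
```

With teiHeader excluded, only the three words of the text body are returned; with an empty parameter, the pre-existing <w> in the header is returned as well, which is the behaviour reported in this ticket.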