Bug #2882
Updated by Alexey Lavrentev about 5 years ago
See also #2358.
The words from teiHeader are tokenized and processed in the TEI-TXM XML files and are counted in the corpus size, but they are all indexed as "__UNDEF__".
To reproduce the bug, take any TEI-XML document with a teiHeader and import it with the "XML TEI Zero + CSV" import module, using default settings.
Make a concordance of "__UNDEF__".
Solutions:
* Pre-fill the "out-of-text to edit" field with teiHeader
* If the field is intentionally left blank by the user, index the words from the header properly. This implies using <TEI> instead of <text> for identifying text limits.
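
To gauge how many words are affected in a given document, the header and body word counts can be compared outside TXM. The sketch below is only an illustration: it uses a plain whitespace split rather than TXM's tokenizer, and the sample document and the function names (`count_words`, `word_counts`) are hypothetical. It shows that delimiting the text at <TEI> instead of <text> brings the header words inside the text limits.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace, in ElementTree's "{uri}" notation.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical minimal document: 3 words in the header, 4 in the body.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample title</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>Hello corpus world again</p></body>
  </text>
</TEI>"""


def count_words(element):
    """Count whitespace-separated words under an element (0 if missing).

    Assumption: a naive split, not the tokenization TXM actually applies.
    """
    if element is None:
        return 0
    return len("".join(element.itertext()).split())


def word_counts(xml_string):
    """Return word counts for teiHeader, text and the whole TEI element."""
    root = ET.fromstring(xml_string)
    return {
        "teiHeader": count_words(root.find(TEI_NS + "teiHeader")),
        "text": count_words(root.find(TEI_NS + "text")),
        "TEI": count_words(root),
    }


counts = word_counts(SAMPLE)
print(counts)
```

With text limits at <text>, the words counted under "teiHeader" fall outside any text, which matches the positions reported as "__UNDEF__"; with limits at <TEI>, they are covered.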