Bug #2882

RCP: 0.8.1, XTZ Import, teiHeader tokenized but not correctly indexed if it is not listed in "out-of-text" or "out-of-text to edit" elements

Added by Alexey Lavrentev 9 months ago. Updated 8 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Import
Target version: TXM 0.8.2
Start date: 08/18/2020
Due date:
% Done:
Spent time: -


See also #2358.

The words from teiHeader are tokenized and processed in the TEI-TXM XML files and are counted in the corpus size, but they are all indexed as "__UNDEF__".

To reproduce the bug, take any TEI-XML document with a teiHeader and import it with the "XML TEI Zero + CSV" import module, using default settings.
Then make a concordance of "__UNDEF__".

Possible fixes:

  • Pre-fill the "out-of-text to edit" field with teiHeader.
  • If the field is intentionally left blank by the user, index the words from the header properly. This implies using <TEI> instead of <text> to identify text limits.
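The effect of choosing <text> rather than <TEI> as the text limit can be illustrated with a minimal Python sketch. This is not TXM code: the sample document, the `count_words` helper and the use of plain `<w>` elements for tokens are assumptions made for illustration only. It counts the tokens that fall inside each candidate limit element; the difference is the set of header words that are tokenized but belong to no text.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical tokenized document: 2 words in the header, 3 in the text.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader><fileDesc><titleStmt>
    <title><w>Sample</w> <w>title</w></title>
  </titleStmt></fileDesc></teiHeader>
  <text><body><p><w>Hello</w> <w>world</w> <w>again</w></p></body></text>
</TEI>"""


def count_words(root, limit_tag):
    """Count <w> elements located inside elements named limit_tag.

    Element.iter() also yields the root itself when it matches,
    so limit_tag may be the document element.
    """
    limit = f"{{{TEI_NS}}}{limit_tag}"
    word = f"{{{TEI_NS}}}w"
    return sum(1 for el in root.iter(limit) for _ in el.iter(word))


root = ET.fromstring(SAMPLE)
total_tokens = count_words(root, "TEI")    # every tokenized word
in_text = count_words(root, "text")        # words inside the <text> limit
orphans = total_tokens - in_text           # header words outside any text

print(total_tokens, in_text, orphans)
```

With <TEI> as the limit, every token is attached to a text; with <text>, the header tokens are left over, which matches the words reported above as counted in the corpus size but indexed as "__UNDEF__".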


#1 Updated by Alexey Lavrentev 8 months ago

  • Description updated (diff)
