Bug #2882
RCP: 0.8.1, XTZ Import, teiHeader tokenized but not correctly indexed if it is not listed in "out-of-text" or "out-of-text to edit" elements
Status: | New | Start date: | 18/08/2020 |
---|---|---|---|
Priority: | Normal | Due date: | |
Assignee: | - | % Done: | 0% |
Category: | Import | Spent time: | - |
Target version: | TXM 0.8.4 | | |
Description
See also #2358.
The words from teiHeader are tokenized and processed in the TEI-TXM XML files and are counted in the corpus size, but they are all indexed as "__UNDEF__".
To reproduce the bug, take any TEI-XML document with a teiHeader and import it with the "XML TEI Zero + CSV" import module using default settings.
Then make a concordance of "__UNDEF__".
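A minimal test document for the reproduction steps above could be generated as follows. This is only a sketch: the file name, the header content and the `write_sample` helper are assumptions made for illustration, and any TEI-XML file with a teiHeader should serve equally well.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal TEI document: a teiHeader containing words,
# followed by a <text> containing words.
SAMPLE_TEI = """<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Header title words</title></titleStmt>
      <publicationStmt><p>Unpublished test file</p></publicationStmt>
      <sourceDesc><p>Written for bug 2882</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>Body words only</p></body>
  </text>
</TEI>
"""


def write_sample(path):
    """Write the sample document to `path` and return the path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_TEI)
    return path


if __name__ == "__main__":
    sample_path = write_sample("sample.xml")
    # Sanity check: the file parses and really contains a teiHeader.
    root = ET.parse(sample_path).getroot()
    assert root.find("{http://www.tei-c.org/ns/1.0}teiHeader") is not None
```

The resulting `sample.xml` can then be placed in the source folder used for the import.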
- Pre-fill the "out-of-text to edit" field with teiHeader.
- If the field is intentionally left blank by the user, index the words from the header properly. This implies using <TEI> instead of <text> to identify text limits.
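The effect of the text boundary mentioned in the second point can be illustrated with a small sketch. It is not TXM's import code: whitespace splitting stands in for tokenization, and the sample document and the `words_in` helper are assumptions made for illustration. It only shows that header words lie outside `<text>` but inside `<TEI>`.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample: two words in the header, three in the text body.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Header title</title></titleStmt></fileDesc></teiHeader>
  <text><body><p>Body words only</p></body></text>
</TEI>"""


def words_in(root, boundary):
    """Return the whitespace-separated words inside elements named `boundary`."""
    words = []
    # root.iter() also yields root itself, so boundary="TEI" matches the document element.
    for element in root.iter(TEI + boundary):
        words.extend(" ".join(element.itertext()).split())
    return words


root = ET.fromstring(DOC)
inside_text = words_in(root, "text")  # words covered when <text> is the limit
inside_tei = words_in(root, "TEI")    # words covered when <TEI> is the limit

print(len(inside_text), len(inside_tei))  # prints "3 5"
```

With `<text>` as the limit, the two header words fall outside every text unit; with `<TEI>` as the limit, all five words are covered.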
History
#1 Updated by Alexey Lavrentev about 5 years ago
- Description updated
#2 Updated by Matthieu Decorde over 4 years ago
- Target version changed from TXM 0.8.2 to TXM 0.8.4