Bug #2882: RCP: 0.8.1, XTZ Import, teiHeader tokenized but not correctly indexed if it is not listed in "out-of-text" or "out-of-text to edit" elements - Plateforme TXM - Forge du Centre Blaise Pascal

Added by Alexey Lavrentev about 5 years ago. Updated over 4 years ago.

Status: New

Start date: 18/08/2020

Priority: Normal

Due date:

Assignee: -

% Done: 0%

Category: Import

Spent time: -

Target version:

Description

See also #2358.

The words from teiHeader are tokenized and processed in the TEI-TXM XML files and are counted in the corpus size, but they are all indexed as "__UNDEF__".

To reproduce the bug, take any TEI-XML document with a teiHeader and import it with the "XML TEI Zero + CSV" import module using the default settings.
Then build a concordance of "__UNDEF__": the words from the header appear in the results.
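
For anyone who needs a test file, the following is a minimal sketch of a TEI document with a teiHeader, wrapped in Python so it can be written to disk. The title, the sentences, and the `write_sample` helper are hypothetical and only serve as an illustration; any TEI-XML document with a header should trigger the same behaviour.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical minimal document: the header and body content are made up.
SAMPLE_TEI = f"""<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample header title</title></titleStmt>
      <publicationStmt><p>Unpublished test file</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>Hello world from the body.</p></body>
  </text>
</TEI>
"""


def write_sample(path):
    """Write the sample document to `path` (illustrative helper) and return the path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_TEI)
    return path
```

Calling `write_sample("sample.xml")` and importing the containing folder with the default settings should be enough to check whether the header words ("Sample", "header", "title", and so on) show up under "__UNDEF__".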

Solutions:

Pre-fill the "out-of-text to edit" field with teiHeader.
If the field is intentionally left blank by the user, index the words from the header properly. This implies using <TEI> instead of <text> to identify the text limits.
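
The effect of the choice of text-limit element can be illustrated with a small sketch. This is not TXM's import code: `words_within` is a hypothetical helper, whitespace splitting stands in for real tokenization, and the sample document is made up. It only shows that header words fall outside <text> but inside <TEI>.

```python
import xml.etree.ElementTree as ET

# Hypothetical compact sample: two header words, three body words.
DOC = (
    b'<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    b"<teiHeader><fileDesc><titleStmt><title>Header title</title>"
    b"</titleStmt></fileDesc></teiHeader>"
    b"<text><body><p>Body words here</p></body></text>"
    b"</TEI>"
)


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def words_within(xml_source, limit_element):
    """Return whitespace-separated words under the first element named `limit_element`."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():  # iter() also visits the root element itself
        if local_name(elem.tag) == limit_element:
            return " ".join(elem.itertext()).split()
    return []


in_text = words_within(DOC, "text")  # body words only
in_tei = words_within(DOC, "TEI")    # header words plus body words
outside_text = [w for w in in_tei if w not in in_text]
```

With <text> as the limit, "Header" and "title" are tokenized but belong to no text unit, which matches the symptom described above. With <TEI> as the limit, they fall inside the same unit as the body words.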

History

#1 Updated by Alexey Lavrentev about 5 years ago

Description updated

#2 Updated by Matthieu Decorde over 4 years ago

Target version changed from TXM 0.8.2 to TXM 0.8.4
