Bug #2364: TBX: 0.7.9, build word IDs if not present in w tags for back-to-text when not tokenizing - Plateforme TXM - Forge du Centre Blaise Pascal

Bug #2364

TBX: 0.7.9, build word IDs if not present in w tags for back-to-text when not tokenizing

Added by Serge Heiden more than 7 years ago. Updated more than 4 years ago.

Status: New

Start date: 10/04/2018

Priority: Urgent

Due date:

Assignee:

% Done:

Category: Import

Spent time:

Target version: TXM 0.8.4

Description

Currently, when the 'Tokenization' import option is unchecked, no word ID management is performed. As a result, functionalities that depend on word IDs (back-to-text, URS unit highlighting, etc.) don't work with the default text editions. This is a problem: word properties can be imported for different reasons, but importing them should not break back-to-text. The w@id attribute has a special status.
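To make the failure condition concrete, here is a minimal Python sketch that counts the w elements of a source that carry no id attribute. It is only an illustration: the function names and the sample XML are hypothetical, it is not TXM code, and it assumes the ID is stored in a plain `id` attribute as in w@id above.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_words_without_id(xml_text):
    """Return (number of w elements, number of w elements lacking an id)."""
    root = ET.fromstring(xml_text)
    words = [el for el in root.iter()
             if isinstance(el.tag, str) and local_name(el.tag) == "w"]
    missing = [el for el in words if not el.get("id")]
    return len(words), len(missing)


# Hypothetical pre-tokenized source: the second word has a property but no id.
SAMPLE = '<text><s><w id="w1">Hello</w><w pos="NN">world</w></s></text>'
total, missing = count_words_without_id(SAMPLE)
```

Any source for which the second number is not zero would hit the problem described above when imported without tokenization.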

Discussion

Decide on a w ID management policy (the decision can take the form of a new import parameter or a new TXM behavior):

a0) foreign IDs (coming from the sources) must be compatible with TXM w ID related functionalities (all IDs present, right pattern, etc.); otherwise the import must abort
a1) foreign IDs can be mixed with TXM built w IDs, especially to support back-to-text: add IDs to the w elements that don't have one, and all w ID related functionalities, like back-to-text, must be able to use those IDs
or a2) don't mix foreign IDs with TXM built IDs
- a2.1) force w IDs to TXM built IDs
- a2.2.1) rename foreign IDs to 'txm:host-id' or 'txm-host-id', etc. and build TXM w IDs in the 'id' attribute
- a2.2.2) build TXM w IDs under an identifier specific to the corpus, and use that identifier instead of 'id' in all w ID related functionalities, like back-to-text
- a2.2.3) use the 'txmid' word property name (and later 'txm:id') to force and use TXM private IDs even when foreign IDs are present and even when not tokenizing
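
For comparison, policy a1 could look roughly like the Python sketch below: foreign IDs are kept, and built IDs are added only where none is present, skipping any value a foreign ID already uses. The function name and the `w_<text>_<n>` ID pattern are assumptions made for illustration, not the actual TXM implementation.

```python
import xml.etree.ElementTree as ET


def fill_missing_word_ids(root, text_id):
    """Policy a1 sketch: keep foreign ids, add built ids only where absent."""
    words = [el for el in root.iter()
             if isinstance(el.tag, str) and el.tag.rsplit("}", 1)[-1] == "w"]
    used = {el.get("id") for el in words if el.get("id")}
    counter = 0
    added = 0
    for el in words:
        if el.get("id"):
            continue  # foreign id present: leave it untouched
        # Advance the counter until the candidate does not collide
        # with an id that is already in use.
        while True:
            counter += 1
            candidate = f"w_{text_id}_{counter}"  # assumed pattern
            if candidate not in used:
                break
        el.set("id", candidate)
        used.add(candidate)
        added += 1
    return added


# The first word carries a foreign id that happens to match the built pattern.
root = ET.fromstring('<text><w id="w_t1_1">a</w><w>b</w><w>c</w></text>')
added = fill_missing_word_ids(root, "t1")
ids = [w.get("id") for w in root.iter("w")]
```

The collision check is the main cost of a1: built and foreign IDs share one attribute, so every ID related functionality has to cope with both kinds.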

Solution

Whether or not tokenization is enabled, apply the a2.2.3 policy on import (and on load if possible) and in all ID related functionalities.
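
A minimal Python sketch of what a2.2.3 amounts to, under stated assumptions: every w element receives a private ID in a `txmid` attribute, the foreign `id` is never read or modified, and lookups go through the private ID only. The function name, the index structure and the `w_<text>_<n>` pattern are hypothetical; only the `txmid` property name comes from the policy above.

```python
import xml.etree.ElementTree as ET

TXM_ID_ATTR = "txmid"  # property name taken from policy a2.2.3


def assign_private_ids(root, text_id):
    """Set txmid on every w element and return a txmid -> element index."""
    index = {}
    n = 0
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.rsplit("}", 1)[-1] == "w":
            n += 1
            private_id = f"w_{text_id}_{n}"  # assumed pattern
            el.set(TXM_ID_ATTR, private_id)  # overwrites any stale txmid
            index[private_id] = el
    return index


# One word with a foreign id, one without, one with a stale txmid.
root = ET.fromstring(
    '<text><w id="tok-17">a</w><w>b</w><w txmid="stale">c</w></text>'
)
index = assign_private_ids(root, "t1")
```

Because the private IDs live in their own attribute, no collision check against foreign IDs is needed, and the behavior is the same with and without tokenization.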

Related issues

History

#1 Updated by Serge Heiden more than 7 years ago

Subject changed from "TBX: 0.7.9, build word IDs if not present in w tags when not tokenizing" to "TBX: 0.7.9, build word IDs if not present in w tags for back-to-text when not tokenizing"

#2 Updated by Serge Heiden more than 7 years ago

Description updated
Priority changed from Normal to Urgent

#3 Updated by Serge Heiden more than 7 years ago

Category changed from Edition to Import

#4 Updated by Sebastien Jacquot more than 7 years ago

Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0

#5 Updated by Matthieu Decorde more than 6 years ago

Target version changed from TXM 0.8.0 to TXM 0.8.2

#6 Updated by Matthieu Decorde more than 4 years ago

Target version changed from TXM 0.8.2 to TXM 0.8.4

#7 Updated by Matthieu Decorde more than 4 years ago

Description updated

#8 Updated by Matthieu Decorde more than 4 years ago

Description updated


	related to Feature #1636: RCP: X.X, word tag and skip tokenization import parameters	Closed	08/01/2016
	related to Bug #2160: RCP: X.X, words not highlighted in editions	Closed	19/04/2017
