Bug #2364
Updated by Matthieu Decorde over 4 years ago
Currently, when the 'Tokenization' import option is unchecked, no word ID management is performed. As a result, the functionalities that rely on word IDs (back-to-text, URS unit highlighting, etc.) do not work with default text editions. This is a problem because word properties can be imported for different reasons, and doing so should not break back-to-text. The w@id attribute has a special status.
h3. Solution 1
1) Decide on a w ID management policy (the decision can be exposed as a new import parameter or become a new TXM behavior):
** a0) foreign IDs (coming from the sources) must be compatible with the TXM functionalities that rely on w IDs (all IDs present, right pattern, etc.); otherwise the import must abort
** a1) foreign IDs can be mixed with TXM-built w IDs, especially to support back-to-text: add IDs to the w elements that don't have one, and all w ID related functionalities, like back-to-text, must be able to use those IDs
** or a2) don't mix foreign IDs with TXM-built IDs
*** a2.1) force w IDs to TXM-built IDs
*** a2.2.1) rename foreign IDs to 'txm:host-id', 'txm-host-id', etc., and build TXM w IDs with the 'id' attribute
*** a2.2.2) build TXM w IDs with an identifier specific to the corpus, and use that identifier instead of 'id' in all w ID related functionalities, like back-to-text
*** a2.2.3) use the 'txmid' word property name (and later 'txm:id') to force and use TXM private IDs even when foreign IDs are present and even when not tokenizing
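To make the difference between policies a0 and a1 concrete, here is a minimal Python sketch. It is not TXM code: the function names, the @w_<text>_<n>@ ID pattern and the assumption that w elements carry no XML namespace are all hypothetical, chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed ID pattern, for illustration only: "w_<text>_<n>".
W_ID_PATTERN = re.compile(r"^w_[A-Za-z0-9-]+_\d+$")


def check_foreign_ids(root):
    """Policy a0: list the problems that would make the import abort."""
    problems = []
    seen = set()
    for i, w in enumerate(root.iter("w"), start=1):
        wid = w.get("id")
        if wid is None:
            problems.append(f"word {i}: missing id")
        elif not W_ID_PATTERN.match(wid):
            problems.append(f"word {i}: bad id pattern {wid!r}")
        elif wid in seen:
            problems.append(f"word {i}: duplicate id {wid!r}")
        else:
            seen.add(wid)
    return problems


def fill_missing_ids(root, text_name):
    """Policy a1: keep foreign IDs, build IDs only for words without one."""
    used = {w.get("id") for w in root.iter("w") if w.get("id") is not None}
    counter = 0
    for w in root.iter("w"):
        if w.get("id") is None:
            # Skip candidates that collide with an existing foreign ID.
            counter += 1
            candidate = f"w_{text_name}_{counter}"
            while candidate in used:
                counter += 1
                candidate = f"w_{text_name}_{counter}"
            w.set("id", candidate)
            used.add(candidate)
    return root
```

Under a0 a non-empty problem list would stop the import. Under a1 the same sources are accepted, but the resulting IDs are heterogeneous, so every functionality that reads w IDs has to tolerate arbitrary foreign values.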
h3. Solution 2
2) Whether tokenizing or not, apply the a2.2.3 policy on import (and on load if possible) and in the ID related functionalities.
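The a2.2.3 policy could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function names and the @w_<text>_<n>@ ID format are hypothetical, w elements are assumed to carry no XML namespace, and only the 'txmid' attribute name comes from this ticket.

```python
import xml.etree.ElementTree as ET


def assign_private_ids(root, text_name, attr="txmid"):
    """Give every w element a TXM-built ID in a private attribute.

    Any foreign 'id' attribute is left untouched.
    """
    for n, w in enumerate(root.iter("w"), start=1):
        w.set(attr, f"w_{text_name}_{n}")
    return root


def find_word(root, private_id, attr="txmid"):
    """Look a word up by its private ID, as back-to-text would need to."""
    for w in root.iter("w"):
        if w.get(attr) == private_id:
            return w
    return None
```

Because the private IDs live in their own attribute, the same code path applies whether or not the sources were tokenized by TXM, and foreign IDs never need to be validated or renamed.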