Bug #2364
TBX: 0.7.9, build word IDs if not present in w tags for back-to-text when not tokenizing
| Status: | New | Start date: | 04/10/2018 |
|---|---|---|---|
| Priority: | Urgent | Due date: | |
| Assignee: | - | % Done: | 0% |
| Category: | Import | Spent time: | - |
| Target version: | TXM 0.8.4 | | |
Description
Currently, when the 'Tokenization' import option is unchecked, no word ID management is done. As a result, back-to-text, URS Unit highlighting and other functionalities that rely on word IDs don't work with default text editions. This is a problem because word properties can be imported for different reasons, but the back-to-text functionality should not be broken as a consequence. The w@id attribute has a special status.
Discussion
Decide a w ID management policy (the decision can be a new import parameter or a new TXM behavior):
- a0) foreign IDs (coming from the sources) must be compatible with TXM w ID related functionalities (all IDs present, right pattern, etc.), otherwise the import must abort
- a1) foreign IDs can be mixed with TXM-built w IDs, especially to support back-to-text: add IDs to the w elements that don't have one, and all w ID related functionalities, like back-to-text, must be able to use those IDs
- or a2) don't mix foreign IDs with TXM-built IDs
  - a2.1) force w IDs to TXM-built IDs
  - a2.2.1) rename foreign IDs to 'txm:host-id' or 'txm-host-id', etc. and build TXM w IDs with the 'id' attribute
  - a2.2.2) build TXM w IDs with an identifier specific to the corpus, and use that identifier instead of 'id' in all w ID related functionalities, like back-to-text
  - a2.2.3) use the 'txmid' word property name (and later 'txm:id') to force and use TXM private IDs even when foreign IDs are present and even when not tokenizing
Solution
Whether tokenizing or not, apply the a2.2.3 policy on import (and on load if possible) and in all w ID related functionalities.
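To make the a2.2.3 policy concrete, here is a minimal sketch in Python of what it could look like on a source document: every w element receives a private 'txmid' attribute, and any foreign 'id' is left untouched. This is only an illustration, not TXM's import code. The function name `add_txm_ids`, the `text_id` parameter and the `w_<text>_<n>` ID format are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def add_txm_ids(xml_text, text_id, attr="txmid"):
    """Give every <w> element a private ID, leaving any foreign 'id' untouched.

    The ID format used here (w_<text_id>_<n>) is a hypothetical choice
    for this sketch, not a format taken from the ticket.
    """
    root = ET.fromstring(xml_text)
    n = 0
    for elem in root.iter():
        tag = elem.tag
        # Match <w> with or without an XML namespace (e.g. TEI).
        if isinstance(tag, str) and (tag == "w" or tag.endswith("}w")):
            n += 1
            elem.set(attr, f"w_{text_id}_{n}")
    return root


# One word carries a foreign ID, the other has none.
sample = '<text><w id="foreign-7">Hello</w> <w>world</w></text>'
root = add_txm_ids(sample, "t1")
ids = [(w.get("id"), w.get("txmid")) for w in root.iter("w")]
```

With this approach, ID related functionalities such as back-to-text would look up 'txmid' only, so they no longer depend on whether the sources provide IDs or on what those IDs look like.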
History
#1 Updated by Serge Heiden over 5 years ago
- Subject changed from TBX: 0.7.9, build word IDs if not present in w tags when not tokenizing to TBX: 0.7.9, build word IDs if not present in w tags for back-to-text when not tokenizing
#2 Updated by Serge Heiden over 5 years ago
- Description updated (diff)
- Priority changed from Normal to Urgent
#3 Updated by Serge Heiden over 5 years ago
- Category changed from Edition to Import
#4 Updated by Sebastien Jacquot about 5 years ago
- Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0
#5 Updated by Matthieu Decorde over 4 years ago
- Target version changed from TXM 0.8.0 to TXM 0.8.2
#6 Updated by Matthieu Decorde over 2 years ago
- Target version changed from TXM 0.8.2 to TXM 0.8.4
#7 Updated by Matthieu Decorde over 2 years ago
- Description updated (diff)
#8 Updated by Matthieu Decorde over 2 years ago
- Description updated (diff)