Feature #3004
Import, Tokenizer, re-tokenize words option
| Status: | New | Start date: | 01/22/2021 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 80% |
| Category: | Import | Spent time: | - |
| Target version: | TXM 0.8.2 - 13NOV 1.0 | | |
Description
If enabled, the Tokenizer can re-tokenize words that are already wrapped in a <w> element.
Enabled for the following import modules:
- XTZ import
- XML/w import
- Transcriber
Parameter labels in the import UI:
- English: re-tokenize pre-encoded words @flyover(Performs word segmentation within word encoding tags.)
- French: re-segmenter lexicalement les mots pré-encodés @flyover(Réalise une segmentation en mots au sein des balises d'encodage de mots.)
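To illustrate what the option is meant to do, here is a minimal Python sketch of re-tokenizing inside existing <w> elements. It is not the TXM Tokenizer: the function name `retokenize_words`, the `retokenize` flag and the `TOKEN_RE` pattern are all hypothetical, and the real segmentation rules are certainly richer. It only shows the idea that a pre-encoded word such as `<w>don't</w>` can be split into several <w> elements when the option is on, and left untouched when it is off.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical token pattern (an assumption, not TXM's rules):
# a run of word characters, or a single punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def retokenize_words(xml_text, retokenize=True):
    """Parse xml_text and, if retokenize is true, split each <w> element
    whose text contains several tokens into one <w> per token."""
    root = ET.fromstring(xml_text)
    if not retokenize:
        return root  # option disabled: pre-encoded words are kept as they are

    # Snapshot the elements first so newly created <w> are not revisited.
    for parent in list(root.iter()):
        new_children = []
        for child in list(parent):
            # Only handle simple <w> elements that contain text and no sub-elements.
            if child.tag == "w" and len(child) == 0:
                tokens = TOKEN_RE.findall(child.text or "")
                if len(tokens) > 1:
                    for i, token in enumerate(tokens):
                        # Attributes are copied to every piece; a real
                        # implementation would have to keep ids unique.
                        w = ET.Element("w", dict(child.attrib))
                        w.text = token
                        # Only the last piece keeps the text that followed the word.
                        w.tail = child.tail if i == len(tokens) - 1 else None
                        new_children.append(w)
                    continue
            new_children.append(child)
        parent[:] = new_children
    return root


if __name__ == "__main__":
    sample = "<s><w>don't</w> <w>stop</w></s>"
    print(ET.tostring(retokenize_words(sample), encoding="unicode"))
    print(ET.tostring(retokenize_words(sample, retokenize=False), encoding="unicode"))
```

With the option off, the sample keeps its two original <w> elements; with it on, the first word is split on the apostrophe while the second is unchanged.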
Associated revisions
add the re-tokenize import parameter refs #3004
History
#1 Updated by Matthieu Decorde over 2 years ago
- Description updated (diff)
#2 Updated by Matthieu Decorde over 2 years ago
- Description updated (diff)
#3 Updated by Matthieu Decorde over 2 years ago
- Description updated (diff)
- % Done changed from 0 to 30
The UI is ready.
#4 Updated by Matthieu Decorde over 2 years ago
- % Done changed from 30 to 80
Tested with the XTZ, XML/w and Transcriber import modules.