Feature #1636
RCP: X.X, word tag and skip tokenization import parameters
| Status: | New | Start date: | 01/08/2016 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 80% |
| Category: | Import | Spent time: | - |
| Target version: | TXM 0.7.8 | | |
Description
See specifications at https://groupes.renater.fr/wiki/txm-info/public/import_xtz#modify_the_import_form.
Add new import parameters:
- word tag: specify the XML element that encodes words
- don't tokenize: if selected, no tokenization is done (no w element is created)
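A minimal sketch of what the two parameters could mean for a single XML text. It is not TXM's import code: the function names `apply_word_parameters` and `words` are hypothetical, and a plain whitespace split stands in for the real tokenizer.

```python
import xml.etree.ElementTree as ET


def apply_word_parameters(xml_text: str, word_tag: str = "w", tokenize: bool = True) -> ET.Element:
    """Hypothetical model of the "word tag" and "don't tokenize" parameters."""
    root = ET.fromstring(xml_text)
    if not tokenize:
        # "Don't tokenize": the text is left as is and no new element is
        # created; the words are whatever <word_tag> elements already exist.
        return root
    # Tokenize: rebuild the text as one <word_tag> element per token.
    # Whitespace splitting is a stand-in for the real tokenizer.
    tokenized = ET.Element(root.tag, root.attrib)
    for token in "".join(root.itertext()).split():
        ET.SubElement(tokenized, word_tag).text = token
    return tokenized


def words(root: ET.Element, word_tag: str = "w") -> list[str]:
    """Return the word forms, i.e. the text of every <word_tag> element."""
    return ["".join(el.itertext()) for el in root.iter(word_tag)]
```

For a pre-tokenized source such as `<text><tok>Le</tok> <tok>chat</tok></text>`, calling it with `word_tag="tok", tokenize=False` keeps the two existing `tok` elements as the words, while `words(apply_word_parameters("<text>Le chat dort</text>"))` gives `['Le', 'chat', 'dort']`.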
Solution
Available only in the XTZ+CSV import.
Change the "Lexical Segmentation" section of the form as follows (French labels first, then English):
Unités lexicales
- Balise de mots : w
- Segmenter [o]/n
- Caractères séparateurs
- Espaces
- Ponctuations
- Caractères d'élision
- Caractères de fin de phrase
- Caractères séparateurs
Lexical units
- Word tag: w
- Tokenize [o]/n
- Separator characters
- Spaces
- Punctuations
- Elision characters
- End of sentence characters
- Separator characters
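The form fields above could be modelled as one parameter object plus a character-class tokenizer. This is a hypothetical sketch: the class `LexicalSegmentation`, the function `segment` and every default character set are illustrative assumptions, not TXM's defaults. It also assumes the first "Separator characters" entry is made of its two sub-items, spaces and punctuations, and that end-of-sentence characters close a sentence.

```python
from dataclasses import dataclass


@dataclass
class LexicalSegmentation:
    """Hypothetical mirror of the "Lexical Segmentation" form fields.

    All default character sets are illustrative, not TXM's defaults.
    """
    word_tag: str = "w"
    tokenize: bool = True
    spaces: str = " \t\n"
    punctuations: str = ",;:()\"«»"
    elisions: str = "'’"
    end_of_sentence: str = ".!?"


def segment(text: str, p: LexicalSegmentation) -> list[list[str]]:
    """Split text into sentences of tokens using the character classes in p."""
    if not p.tokenize:
        # Assumption: with "Tokenize" unchecked the character settings are
        # not used, so segmentation produces nothing.
        return []
    sentences: list[list[str]] = []
    sentence: list[str] = []
    word: list[str] = []

    def end_word() -> None:
        if word:
            sentence.append("".join(word))
            word.clear()

    for ch in text:
        if ch in p.spaces:
            end_word()  # spaces separate tokens and are dropped
        elif ch in p.elisions:
            word.append(ch)  # the elided form keeps its apostrophe: "L'"
            end_word()
        elif ch in p.punctuations or ch in p.end_of_sentence:
            end_word()
            sentence.append(ch)  # punctuation becomes a token of its own
            if ch in p.end_of_sentence:
                sentences.append(sentence.copy())
                sentence.clear()
        else:
            word.append(ch)
    end_word()
    if sentence:  # trailing text without final punctuation
        sentences.append(sentence)
    return sentences
```

With the defaults, `segment("L'homme dort. Il rêve!", LexicalSegmentation())` returns `[["L'", "homme", "dort", "."], ["Il", "rêve", "!"]]`. Keeping each character class as a plain string lets each text field of the form map directly onto one attribute.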
History
#1 Updated by Matthieu Decorde over 7 years ago
- % Done changed from 0 to 80
#2 Updated by Alexey Lavrentev over 7 years ago
- Description updated