Feature #1636

Updated by Alexey Lavrentev almost 4 years ago

See specifications at https://groupes.renater.fr/wiki/txm-info/public/import_xtz#modify_the_import_form.

Add new import parameters:
* word tag: specify the XML element that encodes words
* don't tokenize: if selected, no tokenization is performed (no W element is created)
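
A minimal sketch of how the two parameters could interact, not the actual TXM import code. The function name @collect_words@ and its arguments are hypothetical, and the handling of untagged text when tokenization is off (it is simply skipped) is an assumption made for illustration:

```python
import xml.etree.ElementTree as ET


def collect_words(xml_text, word_tag="w", tokenize=True):
    """Return the word forms found in xml_text.

    word_tag: name of the XML element that already encodes words.
    tokenize: if False, text outside word_tag elements is left alone
              (assumption: no new word units are created for it).
    """
    root = ET.fromstring(xml_text)
    words = []

    def visit(elem):
        if elem.tag == word_tag:
            # Pre-encoded word: keep it as a single unit, never re-split it.
            words.append("".join(elem.itertext()).strip())
            return
        if tokenize and elem.text:
            # Naive whitespace split stands in for the real tokenizer.
            words.extend(elem.text.split())
        for child in elem:
            visit(child)
            if tokenize and child.tail:
                words.extend(child.tail.split())

    visit(root)
    return words


sample = "<p>Hello <w>world</w> again</p>"
print(collect_words(sample))                   # words from <w> plus tokenized text
print(collect_words(sample, tokenize=False))   # only the pre-encoded <w> words
```

With tokenization disabled, only the elements named by the word tag contribute words, which mirrors the "no W element created" behaviour described above.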

h3. Solution

Available only in the XTZ+CSV import.

Change the "Lexical Segmentation" section of the import form as follows:

Unités lexicales Segmentation lexicale
* Balise de mots : w
* Segmenter Tokenisation [o]/n
** Caractères séparateurs
*** Espaces
*** Ponctuations
** Caractères d'élision
** Caractères de fin de phrase

Lexical Units Segmentation
* Word tag: w
* Tokenize Tokenization [y]/n
** Separator characters
*** Spaces
*** Punctuation
** Elision characters
** End of sentence characters
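
To illustrate what the separator, elision and end-of-sentence settings listed above could control, here is a rough sketch of a character-based tokenizer. It is not the TXM tokenizer: the function name @tokenize@, its parameter names and the default character sets are all assumptions chosen for the example.

```python
def tokenize(text, spaces=" \t\n", punctuation=".,;:!?",
             elision="'\u2019", sentence_end=".!?"):
    """Split text into sentences, each a list of tokens.

    spaces:       separator characters that are dropped.
    punctuation:  separator characters kept as tokens of their own.
    elision:      characters that end a token but stay attached to it.
    sentence_end: punctuation characters that close the current sentence.
    """
    sentences = []
    current = []   # tokens of the sentence being built
    token = ""     # characters of the token being built

    for ch in text:
        if ch in spaces:
            if token:
                current.append(token)
            token = ""
        elif ch in elision:
            # The elision mark closes the token and remains part of it.
            current.append(token + ch)
            token = ""
        elif ch in punctuation:
            if token:
                current.append(token)
            token = ""
            current.append(ch)
            if ch in sentence_end:
                sentences.append(current)
                current = []
        else:
            token += ch

    # Flush whatever is left after the last character.
    if token:
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


for sentence in tokenize("L'ami dort. Il rêve !"):
    print(sentence)
```

The elision set is language-dependent: with the defaults above, an English form such as "don't" would be cut after the apostrophe, so that set would have to be emptied or adjusted for such a corpus.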