Feature #217

Updated by Matthieu Decorde over 3 years ago

The SimpleTokenizerXML does not tokenize clitics.
At first: do as the Weblex and TreeTagger tokenizers do.
Better: use language-specific rules to tokenize clitics.

h3. Solution 1

Use the TreeTagger clitic rules or the Unitex tokenizer rules for the fr, en and it languages.
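
A minimal sketch of what language-specific clitic rules could look like, written in Python for brevity rather than in the tokenizer's own language. The rule tables below are hypothetical and deliberately incomplete: they only imitate the general shape of proclitic/enclitic patterns and are not copied from TreeTagger or Unitex. The names @CLITIC_RULES@, @split_clitics@ and @tokenize@ are illustrative assumptions, and punctuation handling is left out.

```python
import re

# Hypothetical, incomplete rule tables (assumption, not the real TreeTagger
# or Unitex rules). Proclitics attach to the start of a word, enclitics to
# its end. None means "no rule of that kind for this language".
CLITIC_RULES = {
    "fr": {
        "proclitic": r"(?:[dcjlmnst]|qu|jusqu|lorsqu)'",
        "enclitic": r"-t-(?:elles?|ils?|on)"
                    r"|-(?:ce|elles?|ils?|je|la|les?|lui|moi|nous|on|toi|tu|vous|en|y)",
    },
    "en": {
        "proclitic": None,
        "enclitic": r"n't|'(?:s|re|ve|d|m|ll)",
    },
    "it": {
        "proclitic": r"(?:[dn]ell|d?all|sull|quest|un|l|d)'",
        "enclitic": None,
    },
}


def split_clitics(word, lang):
    """Split one whitespace-delimited word into clitic and host tokens."""
    rules = CLITIC_RULES.get(lang)
    if rules is None:
        return [word]  # no rules for this language: leave the word as is
    pro, enc = rules["proclitic"], rules["enclitic"]
    prefix, suffix = [], []
    # Peel proclitics off the front, one at a time, never emptying the host.
    while pro:
        m = re.match(pro, word, flags=re.IGNORECASE)
        if not m or m.end() == len(word):
            break
        prefix.append(m.group(0))
        word = word[m.end():]
    # Peel enclitics off the end, one at a time, never emptying the host.
    while enc:
        m = re.search(r"(?:" + enc + r")$", word, flags=re.IGNORECASE)
        if not m or m.start() == 0:
            break
        suffix.insert(0, m.group(0))
        word = word[:m.start()]
    return prefix + [word] + suffix


def tokenize(text, lang):
    """Whitespace tokenization followed by clitic splitting."""
    tokens = []
    for word in text.split():
        tokens.extend(split_clitics(word, lang))
    return tokens
```

With these sample rules, @tokenize("they don't know", "en")@ separates @n't@ from @do@, and @split_clitics("a-t-il", "fr")@ separates @-t-il@ from @a@. Keeping the rules in a per-language table means a language without an entry simply falls back to the current behaviour.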

h3. Solution 2

Use another tokenizer, to be chosen among the existing solutions: https://groupes.renater.fr/wiki/txm-info/public/specs_import_annotation_lexicale_auto#solution