Feature #217
Updated by Matthieu Decorde over 8 years ago
The SimpleTokenizerXML does not tokenize clitics.
At first: do as the Weblex and TreeTagger tokenizers do.
Better: use language-specific rules to tokenize clitics.
h3. Solution 1
Use TreeTagger clitic Unitex tokenizer rules for the fr, en and it languages.
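
A minimal sketch of what per-language clitic rules could look like, written in Python for illustration only. The rule tables @PROCLITICS@ and @ENCLITICS@ and the functions @split_clitics@ and @tokenize@ are hypothetical names; the patterns are a small hand-written sample, not the actual TreeTagger or Unitex rules, and punctuation handling is left out.

```python
import re

# Hypothetical, deliberately small rule sets per language code.
# They only illustrate the idea; they are NOT the TreeTagger or Unitex rules.
PROCLITICS = {
    "fr": re.compile(r"^(?:[cdjlmnst]|qu|jusqu|lorsqu|puisqu)['’]", re.IGNORECASE),
    "it": re.compile(r"^(?:l|d|un|all|dell|nell|sull|quest)['’]", re.IGNORECASE),
}
ENCLITICS = {
    "fr": re.compile(r"-(?:t-)?(?:je|tu|ils?|elles?|on|nous|vous|ce)$", re.IGNORECASE),
    "en": re.compile(r"(?:n['’]t|['’](?:s|re|ve|d|m|ll))$", re.IGNORECASE),
}


def split_clitics(word, lang):
    """Split one whitespace-delimited word into clitic and host tokens."""
    head, tail = [], []
    pro = PROCLITICS.get(lang)
    enc = ENCLITICS.get(lang)
    # Peel clitics attached to the front of the word (e.g. "l'" in "l'arbre").
    while pro:
        m = pro.match(word)
        if not m or m.end() == len(word):
            break
        head.append(m.group(0))
        word = word[m.end():]
    # Peel clitics attached to the end of the word (e.g. "-il" in "vient-il").
    while enc:
        m = enc.search(word)
        if not m or m.start() == 0:
            break
        tail.insert(0, m.group(0))
        word = word[:m.start()]
    return head + [word] + tail


def tokenize(text, lang):
    """Whitespace tokenization followed by language-specific clitic splitting."""
    tokens = []
    for word in text.split():
        tokens.extend(split_clitics(word, lang))
    return tokens
```

With these sample rules, @tokenize("qu'il vient-il", "fr")@ separates @qu'@ from @il@ and @vient@ from @-il@, while a language with no rule table is left as plain whitespace tokens. Keeping the rules in per-language tables means adding a language does not touch the splitting logic.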
h3. Solution 2
Use another tokenizer, to be chosen among the existing solutions: https://groupes.renater.fr/wiki/txm-info/public/specs_import_annotation_lexicale_auto#solution