Feature #217
Updated by Serge Heiden more than 8 years ago
The SimpleTokenizerXML does not use language-specific rules to tokenize clitics.
h3. Solution 1
Use the TreeTagger clitic tokenizer rules for the fr, en and it languages, as defined in the "Gestion de la langue" section of https://groupes.renater.fr/wiki/txm-info/public/composant_de_tokenisation#solution_1_simpletokenizerxml
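To make the idea concrete, here is a minimal sketch of language-specific clitic splitting in Python. It is not the SimpleTokenizerXML code and does not reproduce the TreeTagger rules exactly: the @CLITICS@ table, the @split_clitics@ function and the pattern lists are hypothetical, simplified approximations chosen for illustration. The real rules should be taken from the wiki section referenced above.

```python
import re

# Hypothetical, simplified clitic patterns per language. These only
# approximate the kind of rules a TreeTagger-style tokenizer applies;
# they are assumptions, not the actual TreeTagger or TXM rule sets.
CLITICS = {
    "fr": {
        "proclitic": r"(?:[dcjlmnst]'|qu'|jusqu'|lorsqu')",
        "enclitic": r"(?:-t-elles?|-t-ils?|-t-on|-ce|-elles?|-ils?|-je|-la"
                    r"|-les?|-leur|-lui|-moi|-nous|-on|-toi|-tu|-vous|-en|-y)",
    },
    "en": {
        "proclitic": None,
        "enclitic": r"(?:n't|'(?:s|re|ve|d|m|ll))",
    },
    "it": {
        "proclitic": r"(?:[dn]ell'|all'|sull'|quest'|un'|[ld]')",
        "enclitic": None,
    },
}


def split_clitics(word, lang):
    """Split leading and trailing clitics off a single word.

    Words in a language without rules are returned unchanged.
    """
    rules = CLITICS.get(lang)
    if rules is None:
        return [word]
    leading, trailing = [], []
    pro, enc = rules["proclitic"], rules["enclitic"]
    # Peel proclitics off the front; at least one character must remain.
    while pro:
        m = re.match(rf"({pro})(.+)$", word, re.IGNORECASE)
        if not m:
            break
        leading.append(m.group(1))
        word = m.group(2)
    # Peel enclitics off the end; the lazy prefix keeps the longest suffix.
    while enc:
        m = re.match(rf"(.+?)({enc})$", word, re.IGNORECASE)
        if not m:
            break
        trailing.insert(0, m.group(2))
        word = m.group(1)
    return leading + [word] + trailing


def tokenize(text, lang):
    """Whitespace-split the text, then apply clitic splitting to each word."""
    tokens = []
    for word in text.split():
        tokens.extend(split_clitics(word, lang))
    return tokens
```

For example, @tokenize("qu'il donne-t-il", "fr")@ yields the tokens @qu'@, @il@, @donne@ and @-t-il@, while the same string tokenized with a language that has no rules is only split on whitespace. Keeping the rules in a per-language table means a new language can be supported by adding an entry rather than changing the splitting logic.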
h3. Solution 2
If TreeTagger lemmatization is not used, use another tokenizer, to be chosen among the existing solutions listed at https://groupes.renater.fr/wiki/txm-info/public/specs_import_annotation_lexicale_auto#solution