Bug #1251: RCP: 0.7.6, fix tokenizer parameters fields in import form editor - Plateforme TXM - Forge du Centre Blaise Pascal

Updated by Matthieu Decorde over 10 years ago

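* the default value of the word space separators field, '[\p{Z}|\p{C}]+', is incorrect: inside a character class '|' is a literal pipe, not an alternation, so '|' itself is treated as a separator -> fix it to '[\p{Z}\p{C}]+'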
** MD: *OK* fixed in 0.7.7
* the default value of the end of sentence characters field, '[\p{Ps}|\p{Pe}|\p{Pi}|\p{Pf}|\p{Po}|\p{S}]', is incorrect (the '|' characters are literal inside a character class) -> fix it to '[\p{Ps}\p{Pe}\p{Pi}\p{Pf}\p{Po}\p{S}]'
** MD: *OK* fixed in 0.7.7
* whenever any tokenizer parameter field is edited, the import module runs with the message "MISSING TOKENIZER KEY: punct_strong"; does this mean that orthographic sentences are never analyzed?
* if the separators and elision fields are left empty, the import module cannot run and dumps a stack trace -> run without those parameters or force default values before running
* if the elision parameter field is left empty, the import module is run with "Tokenizer parametrized with regElision=['‘’]"
* the 'Reset' button restores the parameter values from the import.xml file -> it should restore TXM default values instead
* what is the purpose of the 'OK' button? -> suggestion: remove the 'OK' button
** MD: *OK* removed the "OK" button in 0.7.7
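
For reference, a minimal Java sketch of why the two default values above were wrong: inside a character class, '|' is a literal pipe rather than an alternation operator, so the old separators pattern also splits on '|'. It assumes the field value is compiled as a java.util.regex pattern and used to split text; the class name SeparatorClassDemo and its split helper are illustrative only and are not part of TXM.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative only: not TXM code.
class SeparatorClassDemo {
    // Old default: the '|' inside [...] is a literal pipe character.
    static final Pattern OLD_SEPARATORS = Pattern.compile("[\\p{Z}|\\p{C}]+");
    // Fixed default: separator (Z) and other/control (C) categories only.
    static final Pattern NEW_SEPARATORS = Pattern.compile("[\\p{Z}\\p{C}]+");

    // Splits the text on every run of characters matched by the pattern.
    static String[] split(Pattern separators, String text) {
        return separators.split(text);
    }

    public static void main(String[] args) {
        String text = "a|b c";
        // Old pattern wrongly treats '|' as a word separator.
        System.out.println(Arrays.toString(split(OLD_SEPARATORS, text))); // prints [a, b, c]
        // Fixed pattern keeps "a|b" together and splits only on the space.
        System.out.println(Arrays.toString(split(NEW_SEPARATORS, text))); // prints [a|b, c]
    }
}
```

The same reasoning applies to the end of sentence characters field: with the old value, a '|' in the text would be classified as an end of sentence character.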
