Bug #2709

Updated by Serge Heiden 7 months ago

To reproduce the bug, create a source folder named "СИНДАРИН" and place any XML or text file in it.
[Note that the "С" and "А" characters are Cyrillic letters, not the usual ASCII "C" and "A" characters.]
Then call any import module and select the "СИНДАРИН" folder.
TXM proposes "СИНДАРИН" for the corpus name.
Click OK.
The import process hangs and no message appears in the console.
Close the import form, exit TXM.
Re-open TXM, set log level to "ALL" and try to start an import module.
The following message appears in the console:
"Checking corpus name validity with '[A-Z][-A-Z0-9]+': СИНДАРИН" and nothing else happens.
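The check reported in that message can be reproduced outside TXM. The sketch below is only an illustration in Python, not TXM's own code: it assumes the pattern must match the whole name, and `is_valid_corpus_name` is a hypothetical helper name.

```python
import re
import unicodedata

# Pattern quoted in the TXM console message.
CORPUS_NAME_PATTERN = re.compile(r"[A-Z][-A-Z0-9]+")


def is_valid_corpus_name(name: str) -> bool:
    """Return True if the whole name matches the pattern (assumed full match)."""
    return CORPUS_NAME_PATTERN.fullmatch(name) is not None


folder_name = "СИНДАРИН"  # Cyrillic letters, as in the reproduction steps

print(is_valid_corpus_name(folder_name))  # False: [A-Z] only covers ASCII letters
print(is_valid_corpus_name("SINDARIN"))   # True
print(unicodedata.name(folder_name[0]))   # CYRILLIC CAPITAL LETTER ES
print(unicodedata.name("S"))              # LATIN CAPITAL LETTER S
```

The Cyrillic name looks like an uppercase ASCII name on screen, but none of its characters fall inside `[A-Z]`, so the validity check can never succeed for it.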

To make imports work again, you need to delete the user/TXM 0.8.0/.txm folder.

Possible solutions:
* Do not propose an invalid CQP corpus name if the folder name consists only of non-ASCII characters (from the first Unicode plane)
* If the corpus name normalization returns an empty sequence, display an error message "** corpus name normalization: no valid character available to build the corpus name (try to use some American English - ASCII - characters in the folder name)" and return to the corpus name definition stage
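The second solution could look roughly like the following. This is a minimal sketch in Python, not TXM's implementation: the normalization rules (uppercase, drop everything outside `[-A-Z0-9]`, drop leading digits and hyphens) are assumptions derived from the pattern in the console message, and `normalize_corpus_name` and `CorpusNameError` are hypothetical names.

```python
import re

# Pattern quoted in the TXM console message.
VALID_NAME = re.compile(r"[A-Z][-A-Z0-9]+")


class CorpusNameError(ValueError):
    """Hypothetical error raised when no valid corpus name can be derived."""


def normalize_corpus_name(folder_name: str) -> str:
    """Derive a candidate corpus name from a folder name, or fail loudly."""
    # Uppercase, then keep only the ASCII characters the pattern allows.
    kept = re.sub(r"[^-A-Z0-9]", "", folder_name.upper())
    # The name must start with a letter: drop leading digits and hyphens.
    kept = re.sub(r"^[-0-9]+", "", kept)
    if not kept:
        raise CorpusNameError(
            "** corpus name normalization: no valid character available "
            "to build the corpus name"
        )
    if VALID_NAME.fullmatch(kept) is None:
        raise CorpusNameError(
            f"** corpus name normalization: '{kept}' is too short"
        )
    return kept


try:
    normalize_corpus_name("СИНДАРИН")
except CorpusNameError as err:
    # The caller can show the message and go back to the name dialog
    # instead of leaving the import stuck.
    print(err)
```

Raising an explicit error gives the import form something to display and a clear point at which to return to the corpus name definition stage.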

Possible help:

* Use transliteration/vocalization rules to suggest a valid corpus name
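As a rough illustration of the transliteration idea, the Python sketch below maps a few Cyrillic capitals to Latin letters before filtering. The table is deliberately partial and is not taken from any transliteration standard, and `suggest_corpus_name` is a hypothetical name. A real implementation would need a complete, agreed-upon rule set per script.

```python
import re

# Partial, illustrative Cyrillic-to-Latin table (uppercase only).
# Assumption for this sketch: it does not follow any particular standard.
CYRILLIC_TO_LATIN = {
    "А": "A", "Б": "B", "В": "V", "Г": "G", "Д": "D", "Е": "E",
    "З": "Z", "И": "I", "К": "K", "Л": "L", "М": "M", "Н": "N",
    "О": "O", "П": "P", "Р": "R", "С": "S", "Т": "T", "У": "U",
    "Ф": "F",
}


def suggest_corpus_name(folder_name: str) -> str:
    """Transliterate known characters, then drop anything still not allowed."""
    transliterated = "".join(
        CYRILLIC_TO_LATIN.get(ch, ch) for ch in folder_name.upper()
    )
    return re.sub(r"[^-A-Z0-9]", "", transliterated)


print(suggest_corpus_name("СИНДАРИН"))  # SINDARIN
```

The suggested name should still go through the usual validity check, since characters missing from the table are simply dropped and the result may be empty or too short.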
