Bug #871

Updated by Matthieu Decorde about 4 years ago

When the search engine is used in 'memory mode', CQP cannot access corpora whose paths contain accented or special characters.

A) *Document* the problem: See (FR) https://groupes.renater.fr/wiki/txm-users/public/faq#sous_windows_txm_07_et_versions_ulterieures_aucune_requete_cql_ne_fonctionne_sur_aucun_corpus

B) Temporary *Solution*:
* under Windows, check whether the corpus directory path complies with the current CQP character constraints (no accented or special characters)
* if not, ask the user to change the 'TXM User Home directory' preference to something compatible with the current CQP constraints
* corpus directory path must be checked at installation and when the 'TXM User Home directory' preference is changed
* TXM should not give the user the impression that it works until the 'TXM User Home directory' complies with the current CQP constraints
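
The path check described in the list above could be sketched as follows. This is only an illustration: the helper name @path_is_ascii_only@ is hypothetical, and it assumes that "compatible with CQP" can be approximated as "printable 7-bit ASCII only", which may be stricter or looser than the real CQP constraints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of TXM or CWB.
 * Assumption: a path is considered safe for the current CQP when every
 * byte is printable 7-bit ASCII (0x20..0x7E). */
bool path_is_ascii_only(const char *path)
{
    assert(path != NULL);
    for (const unsigned char *p = (const unsigned char *)path; *p != '\0'; ++p) {
        if (*p < 0x20 || *p > 0x7E) {
            return false;   /* accented, non-Latin or control byte found */
        }
    }
    return true;
}
```

A check of this kind would run at installation and whenever the 'TXM User Home directory' preference changes; when it returns false, the user is asked to pick another directory.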

C) Definitive *Solution*

Change the CQP registry directory access code to comply with the pathname constraints of current operating systems.

h3. Solution

* Replace the standard file system I/O functions with their GLib equivalents (fopen -> g_fopen, etc.)
* Change the registry files' encoding to UTF-8: on Windows and Mac OS X the system encoding is not UTF-8, and these files contain the DATA path to the index files.

See changes: https://groupes.renater.fr/wiki/txm-info/specs_search_engine
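
To illustrate why replacing fopen with a UTF-8-aware function helps, here is a minimal sketch of what such a wrapper has to do. It is not GLib's implementation of g_fopen and not the actual CWB patch; the name @fopen_utf8@ is hypothetical. The assumption is that paths are stored as UTF-8 (as in the re-encoded registry files) and that on Windows they must be converted to UTF-16 before reaching the file system API.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <stdlib.h>
#include <windows.h>
#endif

/* Hypothetical wrapper: opens a file whose path is given in UTF-8.
 * Illustration only, NOT GLib's g_fopen. */
FILE *fopen_utf8(const char *utf8_path, const char *mode)
{
    assert(utf8_path != NULL && mode != NULL);
#ifdef _WIN32
    /* Windows: convert both arguments to UTF-16 and use the wide-char API,
     * so the result does not depend on the system code page. */
    wchar_t wmode[16];
    int n = MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, NULL, 0);
    if (n <= 0 || MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) <= 0) {
        return NULL;
    }
    wchar_t *wpath = malloc((size_t)n * sizeof *wpath);
    if (wpath == NULL) {
        return NULL;
    }
    MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, wpath, n);
    FILE *f = _wfopen(wpath, wmode);
    free(wpath);
    return f;
#else
    /* POSIX: file names are byte strings, so UTF-8 passes straight through. */
    return fopen(utf8_path, mode);
#endif
}
```

With a wrapper like this (or g_fopen itself), a registry directory named "registré" can be opened as long as its path is passed as UTF-8, which is why the registry files also need to be stored in UTF-8.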

h3. Validation test

In a Windows TXM session:
* get new CWB binaries from: "labo_ana_corpus/..."
* rename the "registry" directory to "registré"
* update TXM preferences to use the "registré" directory
* restart engines
* select GRAAL corpus, word count should appear in the status bar

* Windows XP: *OK MD*
* Windows 7 32bit:
* Windows 7 64bit:
* Mac OS X:
* Linux 32bit: *OK MD*
* Linux 64bit: *OK MD*