Bug #2869
TIGERSearch, managing directories containing accents
Status: | New | Start date: | 07/02/2020 |
---|---|---|---|
Priority: | Normal | Due date: | |
Assignee: | - | % Done: | 0% |
Category: | SearchEngine | Spent time: | - |
Target version: | TXM 0.8.4 | | |
Description
TIGERSearch fails to open a corpus configuration file (corpus_config.xml) whose path contains accented characters (e.g. "Télécharger").
Solution 0
Display an error message if the path contains accents and abort the command.
Sample code to detect accents:
    import com.ibm.icu.text.Transliterator

    // Remove accents from characters with an ICU transform,
    // see http://userguide.icu-project.org/transforms/general
    removeAccentsTransform = "NFD; [:M:] Remove; NFC"
    path_no_accents = Transliterator.getInstance(removeAccentsTransform).transform(path)
    if (path != path_no_accents) {
        // the path contains accents: display the error message and abort the command
    }
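If ICU is not available where the check runs, the same test can probably be done with the JDK alone. The sketch below swaps the ICU transform for `java.text.Normalizer` plus a regular expression. The class name `AccentCheck`, the method names and the error message are hypothetical, chosen only for illustration; they are not part of TXM or TIGERSearch.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of TXM or TIGERSearch.
final class AccentCheck {
    private AccentCheck() {}

    // Returns true when removing combining marks changes the path,
    // i.e. when the path contains accented characters.
    static boolean containsAccents(String path) {
        // Decompose (NFD), drop combining marks (\p{M}), recompose (NFC):
        // the same three steps as the ICU rule "NFD; [:M:] Remove; NFC".
        String decomposed = Normalizer.normalize(path, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        String recomposed = Normalizer.normalize(stripped, Normalizer.Form.NFC);
        return !path.equals(recomposed);
    }

    // Aborts with an exception when the path contains accents.
    // The message text is an assumption, not the wording used by TXM.
    static void requireNoAccents(String path) {
        if (containsAccents(path)) {
            throw new IllegalArgumentException(
                "TIGERSearch cannot open a corpus whose path contains accents: " + path);
        }
    }

    public static void main(String[] args) {
        // \u00e9 is "e with acute accent", written as an escape to keep the source ASCII-only.
        System.out.println(containsAccents("/home/user/T\u00e9l\u00e9charger/corpus_config.xml"));
        System.out.println(containsAccents("/home/user/Downloads/corpus_config.xml"));
    }
}
```

The command would call `requireNoAccents` on the corpus path before handing it to TIGERSearch, then catch the exception to display the message.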
Solution 1
Change the way XML files are opened and read in the TIGERSearch core libraries, then update the libraries of the TIGERSearch TXM extension.
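This ticket does not say how the TIGERSearch libraries currently open the file, so the following is only a sketch of one possible direction, not the actual fix. It assumes the failure comes from passing the path to the XML parser as a string. The file is opened through `java.nio.file` instead, and the parser receives the already-open stream, so it never has to interpret the path itself. `CorpusConfigReader` is a hypothetical class name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical reader, not the TIGERSearch implementation.
final class CorpusConfigReader {
    private CorpusConfigReader() {}

    // Parses an XML file from a stream opened by the file system API,
    // so the parser is never given the path as a string or URI.
    static Document read(Path configFile)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = Files.newInputStream(configFile)) {
            // No system ID is supplied here, so relative references inside
            // the document (for example an external DTD) would not resolve.
            return builder.parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java CorpusConfigReader <path to corpus_config.xml>
        Document doc = read(Paths.get(args[0]));
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Whether this removes the failure depends on where the path is mishandled inside TIGERSearch, which would need to be confirmed against the library sources.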
History
#1 Updated by Matthieu Decorde almost 3 years ago
- Category set to SearchEngine
#2 Updated by Serge Heiden about 2 years ago
- Description updated (diff)
#3 Updated by Matthieu Decorde about 2 years ago
- Target version changed from TXM 0.8.2 to TXM 0.8.4