Bug #2869

TIGERSearch, managing directories containing accents

Added by Matthieu Decorde 12 months ago. Updated 9 days ago.

Status: New
Start date: 07/02/2020
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: SearchEngine
Spent time: -
Target version: TXM 0.8.3

Description

TIGERSearch fails to open a corpus configuration (the corpus_config.xml file) when its path contains accented characters (e.g. "Télécharger").

Solution 0

Display an error message if the path contains accents and abort the command.

Sample code to detect accents:

import com.ibm.icu.text.Transliterator

// Remove accents with an ICU transform: decompose (NFD), drop combining marks, recompose (NFC).
// See http://userguide.icu-project.org/transforms/general
def removeAccentsTransform = "NFD; [:M:] Remove; NFC"
def path_no_accents = Transliterator.getInstance(removeAccentsTransform).transform(path)

// The path contains accents if removing them changes it.
boolean hasAccents = (path != path_no_accents)
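
For reference, the same check can be sketched in plain Java without the ICU dependency, using java.text.Normalizer from the JDK. This is an alternative technique, not the code used in TXM: the class name AccentCheck, its method names and the example path are hypothetical. Like the ICU transform, it flags any character whose decomposition contains a combining mark.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of TXM or TIGERSearch.
final class AccentCheck {
    private AccentCheck() {}

    // Decompose (NFD), drop combining marks (\p{M}), then recompose (NFC).
    static String stripAccents(String path) {
        String decomposed = Normalizer.normalize(path, Normalizer.Form.NFD);
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        return Normalizer.normalize(withoutMarks, Normalizer.Form.NFC);
    }

    // The path contains accents if stripping them changes it.
    static boolean containsAccents(String path) {
        return !stripAccents(path).equals(path);
    }

    public static void main(String[] args) {
        // Example path (assumption); \u00e9 is "e" with an acute accent.
        String path = "/home/user/T\u00e9l\u00e9charger/corpus_config.xml";
        if (containsAccents(path)) {
            System.err.println("Path contains accented characters: " + path);
        }
    }
}
```

In Solution 0, the command would display the error message and abort when containsAccents returns true for the corpus path.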

Solution 1

Change the way XML files are opened and read in the TIGERSearch core libraries, then update the libraries of the TIGERSearch TXM extension.

History

#1 Updated by Matthieu Decorde 10 months ago

  • Category set to SearchEngine

#2 Updated by Serge Heiden 3 months ago

  • Description updated (diff)

#3 Updated by Matthieu Decorde 9 days ago

  • Target version changed from TXM 0.8.2 to TXM 0.8.3
