Bug #2869: TIGERSearch, managing directories containing accents - Plateforme TXM - Forge du Centre Blaise Pascal

Bug #2869

TIGERSearch, managing directories containing accents

Added by Matthieu Decorde more than 5 years ago. Updated more than 4 years ago.

Status: New

Start date: 02/07/2020

Priority: Normal

Due date:

Assignee:

% Done:

Category: SearchEngine

Spent time:

Target version: TXM 0.8.4

Description

TIGERSearch fails to open a corpus configuration file (corpus_config.xml) whose path contains accented characters (e.g. "Télécharger").

Solution 0

Display an error message if the path contains accents and abort the command.

Sample code to detect accents:

import com.ibm.icu.text.Transliterator

// Strip accents with an ICU transform, see http://userguide.icu-project.org/transforms/general
def removeAccentsTransform = "NFD; [:M:] Remove; NFC"
def path_no_accents = Transliterator.getInstance(removeAccentsTransform).transform(path)

// If stripping the accents changed the path, the path contains accents
if (path != path_no_accents) {
    println "Error: the path contains accents: " + path
    return // abort the command
}
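The same check can also be sketched without the ICU dependency, using only the JDK's `java.text.Normalizer`. This is a swapped-in technique, not the code used in TXM: the class name `AccentedPathCheck`, its method names, and the error message are hypothetical and chosen for illustration. The sketch decomposes the path (NFD) and looks for combining marks, which mirrors the `[:M:]` set of the ICU transform, then aborts by throwing an exception.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of TXM or TIGERSearch.
final class AccentedPathCheck {

    // Returns true if the path contains at least one combining mark after
    // canonical decomposition (NFD), i.e. at least one accented character.
    static boolean containsAccents(String path) {
        String decomposed = Normalizer.normalize(path, Normalizer.Form.NFD);
        return decomposed.codePoints().anyMatch(cp -> {
            int type = Character.getType(cp);
            // Mn, Mc and Me together form the Unicode "Mark" category ([:M:]).
            return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
        });
    }

    // Aborts the calling command by throwing when the path contains accents.
    static void requireNoAccents(String path) {
        if (containsAccents(path)) {
            throw new IllegalArgumentException(
                "TIGERSearch cannot open a corpus whose path contains accents: " + path);
        }
    }
}
```

A caller would invoke `AccentedPathCheck.requireNoAccents(path)` before handing the path to TIGERSearch and show the exception message to the user.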

Solution 1

Change the way XML files are opened and read in the TIGERSearch core libraries, then update the libraries of the TIGERSearch TXM extension.
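The ticket does not show how TIGERSearch currently opens its XML files, so the following is only a sketch of one possible direction. It assumes the failure comes from the file path being turned into a URL or system ID string somewhere in the parser call. The class `CorpusConfigReader` and its `read` method are hypothetical. The idea is to open the file through `java.nio.file` and give the parser a byte stream, so the accented path is never interpreted as a URL.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical reader for illustration; not the actual TIGERSearch code.
final class CorpusConfigReader {

    // Parses the XML file from a stream opened by the file system API,
    // instead of passing a path or URL string to the parser.
    static Document read(Path configFile) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = Files.newInputStream(configFile)) {
            return builder.parse(in);
        }
    }
}
```

One limitation of this sketch: a stream parsed without a system ID cannot resolve relative references such as an external DTD, so a real change would have to check whether the TIGERSearch XML files rely on them.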

History

#1 Updated by Matthieu Decorde about 5 years ago

Category set to SearchEngine

#2 Updated by Serge Heiden more than 4 years ago

Description updated

#3 Updated by Matthieu Decorde more than 4 years ago

Target version changed from TXM 0.8.2 to TXM 0.8.4
