Bug #3243

XML import modules: force conversion of non-ASCII characters in element and attribute names before conversion to the CQP format

Added by Serge Heiden more than 3 years ago.

Status: New    Start date: 22/04/2022
Priority: Normal    Due date:
Assignee: -    % Done: 0%
Category: Import    Spent time: -
Target version: TXM 0.8.4

Description

Currently, if an XML element name contains an accented character, CQP fails when the corpus Properties command accesses the values of the corresponding structural attribute.

For example, with this element in the source:

<répondant>

We get:

** Échec de lecture des valeurs de la propriété de structure répondant_n : org.txm.searchengine.cqp.clientExceptions.CqiClientException: org.txm.searchengine.cqp.serverException.CqiClErrorInternal: 
Stacktrace: 
[1]      org.txm.searchengine.cqp.corpus.StructuralUnitProperty.          getValues  StructuralUnitProperty.java, 139
[2]  org.txm.properties.core.functions.CorpusPropertiesComputer.stepStructuralUnits  CorpusPropertiesComputer.java, 199
[3]  org.txm.properties.core.functions.CorpusPropertiesComputer.           _compute  CorpusPropertiesComputer.java, 464

Solution

Force normalization of XML element and attribute names in the XML import modules, using a Unicode character conversion method, before the data is written in the CQP format.
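A minimal sketch of such a conversion, assuming that decomposing accented letters and stripping their combining marks is an acceptable normalization. The class `XmlNameNormalizer` and the method `toAsciiName` are hypothetical names used for illustration, not part of TXM, and the set of characters kept (ASCII letters, digits, underscore) is an assumption about what CQP accepts:

```java
import java.text.Normalizer;

// Hypothetical helper, not part of TXM: maps an XML element or attribute
// name to an ASCII-only name before it is written in the CQP format.
final class XmlNameNormalizer {

    private XmlNameNormalizer() {
    }

    static String toAsciiName(String name) {
        // NFD decomposition splits an accented letter into its base letter
        // followed by combining marks, e.g. U+00E9 becomes "e" + U+0301.
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);

        // Remove the combining marks (Unicode category M).
        String stripped = decomposed.replaceAll("\\p{M}+", "");

        // Assumption: only ASCII letters, digits and "_" are safe.
        // Any other character (including ":" and characters without a
        // decomposition, such as U+0153) is replaced by "_".
        return stripped.replaceAll("[^A-Za-z0-9_]", "_");
    }
}
```

With this sketch, `toAsciiName("répondant")` returns `repondant`. Two distinct source names can map to the same normalized name (for example `répondant` and `repondant`), so an import module using this approach would also need to detect such collisions.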
