Bug #3243

XML import modules: force the conversion of non-ASCII characters in element and attribute names before converting to the CQP format

Added by Serge Heiden about 1 year ago.

Status: New
Start date: 04/22/2022
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Import
Spent time: -
Target version: TXM 0.8.4

Description

Currently, if an XML element name contains an accented character, CQP fails when accessing the values of the corresponding structural attribute, for example when running the corpus Properties command.

For example, with this source element:

<répondant>

We get the following error:

** Failed to read the values of the structural property répondant_n: org.txm.searchengine.cqp.clientExceptions.CqiClientException: org.txm.searchengine.cqp.serverException.CqiClErrorInternal:
Stacktrace:
[1] org.txm.searchengine.cqp.corpus.StructuralUnitProperty.getValues (StructuralUnitProperty.java, 139)
[2] org.txm.properties.core.functions.CorpusPropertiesComputer.stepStructuralUnits (CorpusPropertiesComputer.java, 199)
[3] org.txm.properties.core.functions.CorpusPropertiesComputer._compute (CorpusPropertiesComputer.java, 464)

Solution

Force the normalization of XML element and attribute names through a Unicode character conversion method.
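A minimal sketch of one possible conversion method, in Java. It assumes that ASCII-only names are safe for CQP. The class `CqpNameNormalizer` and its method `toCqpName` are hypothetical and are not part of TXM. The sketch applies Unicode NFD decomposition (`java.text.Normalizer`), removes the resulting combining marks, and then replaces any remaining character outside `[A-Za-z0-9_]` with `_`. It illustrates the idea and does not represent the method TXM adopts.

```java
import java.text.Normalizer;

/**
 * Hypothetical helper (not part of TXM): converts an XML element or
 * attribute name into an ASCII-only name before it is written to the
 * CQP format.
 */
public final class CqpNameNormalizer {

    private CqpNameNormalizer() {}

    public static String toCqpName(String xmlName) {
        // NFD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT
        String decomposed = Normalizer.normalize(xmlName, Normalizer.Form.NFD);
        // Drop the combining marks and keep the base letters
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Characters without a decomposition (e.g. "œ") or other
        // non-identifier characters (e.g. "-") become '_'
        return stripped.replaceAll("[^A-Za-z0-9_]", "_");
    }

    public static void main(String[] args) {
        System.out.println(toCqpName("répondant")); // prints "repondant"
        System.out.println(toCqpName("cœur"));      // prints "c_ur"
    }
}
```

Different source names can map to the same result (for example `répondant` and `repondant`), so an import module using this kind of conversion would probably also need to detect or report such collisions.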
