Bug #3243
XML import modules: force conversion of non-ASCII characters in element and attribute names before producing the CQP format
Status: | New | Start date: | 04/22/2022 |
---|---|---|---|
Priority: | Normal | Due date: | |
Assignee: | - | % Done: | 0% |
Category: | Import | Spent time: | - |
Target version: | TXM 0.8.4 | | |
Description
Currently, if an XML element name contains an accented character, CQP fails when the corpus Properties command reads the values of the corresponding structural attribute.
For example, source:
<répondant>
We get:
** Échec de lecture des valeurs de la propriété de structure répondant_n : org.txm.searchengine.cqp.clientExceptions.CqiClientException: org.txm.searchengine.cqp.serverException.CqiClErrorInternal:
Stacktrace:
[1] org.txm.searchengine.cqp.corpus.StructuralUnitProperty.getValues StructuralUnitProperty.java, 139
[2] org.txm.properties.core.functions.CorpusPropertiesComputer.stepStructuralUnits CorpusPropertiesComputer.java, 199
[3] org.txm.properties.core.functions.CorpusPropertiesComputer._compute CorpusPropertiesComputer.java, 464
Solution
Force normalization of XML element and attribute names with a Unicode character conversion method, so that the names passed to CQP contain only ASCII characters.
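As an illustration only, one possible conversion could look like the sketch below. It is written in Python for brevity and is not TXM code: the function name `normalize_name` is hypothetical, and the choice of `_` as a placeholder for characters that have no ASCII base letter is an assumption, not something this ticket specifies.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Return an ASCII-only version of an XML element or attribute name.

    Hypothetical helper, not part of TXM.
    """
    # NFD decomposition splits "é" into "e" followed by U+0301
    # (combining acute accent).
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks and keep the base letters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Characters that are still outside ASCII (for example "œ") have no
    # decomposition; replace them with "_" (an arbitrary choice here).
    return "".join(ch if ch.isascii() else "_" for ch in stripped)


print(normalize_name("répondant"))  # prints "repondant"
```

With such a conversion, distinct source names can collapse to the same normalized name (for example `<répondant>` and `<repondant>`), so an import module would probably also need to detect or report such collisions.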