Bug #1488

TBX: 0.7.7, allow XML special characters entities in word or structure properties in all XML based import modules

Added by Serge Heiden about 4 years ago. Updated over 3 years ago.

Status: New
Start date: 09/10/2015
Priority: Normal
Due date:
Assignee: -
% Done: 70%
Category: Import
Spent time: -
Target version: TXM 0.7.8

Description

Currently, the Compiler translates any XML entity found in XML attribute values, such as '&lt;', '&quot;' or '&amp;', into the corresponding native character ('<', '"' or '&'), which breaks the CWB syntax.

Note: those entities are built from the sources by previous steps of the import process. If the entities are already present in the sources, they pass to the CWB format without translation and the import is OK.
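
The failure mode can be illustrated with a minimal Python sketch. It is not TXM code: the `w` element and `pos` attribute are hypothetical, and a standard XML parser stands in for the stricter consumer on the CWB side. It only shows that a value read through an XML parser comes back decoded, and that writing it out unchanged produces a tag that is no longer well formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical source fragment: the attribute value contains an entity.
source = '<w pos="a&lt;b">token</w>'

# Reading the attribute through a parser yields the native character.
decoded = ET.fromstring(source).get("pos")

# Writing the decoded value back without re-escaping it.
naive = f'<w pos="{decoded}">token</w>'


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Here `decoded` is `a<b`, `is_well_formed(source)` is true and `is_well_formed(naive)` is false, because a raw '<' is not allowed inside an attribute value.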

Solution

  • a) keep the entities in the CWB output
  • b) change CWB syntax to allow < (change
    • MD: the "-x" option is already set
    • MD: all TXM 0.7.8 import modules now write "&amp;"s and "&lt;"s
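
Option a) could be sketched as follows, in Python for illustration only. TXM's import modules are not shown here, and `encode_property` and `structure_tag` are hypothetical helpers. The sketch relies on `xml.sax.saxutils.escape`, which escapes '&', '<' and '>' and accepts extra mappings, used here for '"'.

```python
from xml.sax.saxutils import escape, unescape

# Extra mapping so that double quotes are escaped as well.
_QUOTE = {'"': "&quot;"}
_UNQUOTE = {"&quot;": '"'}


def encode_property(value: str) -> str:
    """Escape XML special characters in a property value."""
    return escape(value, _QUOTE)


def decode_property(value: str) -> str:
    """Reverse encode_property, e.g. for display."""
    return unescape(value, _UNQUOTE)


def structure_tag(name: str, props: dict) -> str:
    """Build an opening tag whose attribute values keep their entities."""
    attrs = " ".join(f'{k}="{encode_property(v)}"' for k, v in props.items())
    return f"<{name} {attrs}>" if attrs else f"<{name}>"
```

For example, `structure_tag("text", {"title": 'A & "B" < C'})` gives `<text title="A &amp; &quot;B&quot; &lt; C">`, and `decode_property` restores the original value when it has to be shown to the user.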

History

#1 Updated by Matthieu Decorde about 4 years ago

  • Target version changed from TXM 0.7.8 to TXM 0.8.0a (split/restructuration)

#2 Updated by Serge Heiden about 4 years ago

  • Description updated (diff)

#3 Updated by Serge Heiden about 4 years ago

  • Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.7.8

#4 Updated by Serge Heiden about 4 years ago

  • Subject changed from TBX: 0.7.7, allow < character in word or structure properties in all XML based import modules to TBX: 0.7.7, allow XML special characters entities in word or structure properties in all XML based import modules
  • Description updated (diff)

#5 Updated by Matthieu Decorde almost 4 years ago

  • Description updated (diff)
  • % Done changed from 0 to 80

#6 Updated by Serge Heiden almost 4 years ago

What about '"'?
  • it is an XML reserved character
  • it is sensitive when encoding XML element attribute values

#7 Updated by Serge Heiden over 3 years ago

  • % Done changed from 80 to 70
