Bug #2520

RCP: 0.7.9, CQP recursive structure management conflicts with some structure names given in sources, for all XML-based import modules

Added by Serge Heiden over 2 years ago. Updated 4 months ago.

Status: New
Start date: 03/20/2019
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Import
Spent time: -
Target version: TXM 0.8.3

Description

Currently CQP manages tag recursion by renaming nested tags (and attributes) with a suffix index: tag/tag/tag -> tag/tag1/tag2.

So, in a TEI context, cwb-encode renames nested div tags to div1, div2, etc., depending on the nesting depth of the div.
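
The renaming rule can be illustrated with a minimal Python sketch. This is not cwb-encode code: `renamed_structures` is a hypothetical helper that only simulates the tag/tag/tag -> tag/tag1/tag2 rule described above, and reports which generated names also appear as tag names in the source.

```python
import xml.etree.ElementTree as ET


def renamed_structures(xml_text):
    """Simulate depth-suffix renaming (illustration only, not cwb-encode).

    A tag nested inside itself at depth d > 0 is renamed tag + str(d).
    Returns (generated_names, names_that_also_exist_in_the_source).
    """
    source_names = set()
    generated = set()

    def walk(elem, depths):
        source_names.add(elem.tag)
        depth = depths.get(elem.tag, 0)
        if depth > 0:
            generated.add(f"{elem.tag}{depth}")
        depths[elem.tag] = depth + 1   # entering one more level of this tag
        for child in elem:
            walk(child, depths)
        depths[elem.tag] = depth       # leaving the element

    walk(ET.fromstring(xml_text), {})
    return generated, generated & source_names


if __name__ == "__main__":
    sample = "<text><div><div><p/></div><div1 n='1'/></div></text>"
    print(renamed_structures(sample))
```

For the sample above, the nested div is renamed to div1, which is also a tag name used in the source, so div1 is reported as a conflict.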

If the sources already contain div1, div2, etc. tags, there is a name conflict, at least in the declaration of the structures in the registry file:

attributes:setup_attribute(): Warning: 
  Attribute div1 of type Structural Attribute already defined in corpus teig
REGISTRY ERROR (../registry/teig): Structure attribute div1 declared twice -- semantic error
REGISTRY ERROR (../registry/teig): Parse Error.

Extract of the registry file:

$ grep -i div1 ../registry/teig 
# <div1 n=".."> ... </div1>
STRUCTURE div1
STRUCTURE div1_n               # [annotations]
# (3 levels of embedding: <div>, <div1>, <div2>, <div3>).
STRUCTURE div1

As a result, the corpus is not usable.

Solution 0

Always use the 'div0123456todiv.xsl' XSLT stylesheet in the 2-front step of XTZ for XML-TEI encoded sources.
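
For readers without the stylesheet at hand, the kind of normalization it suggests can be sketched in Python. The behaviour is an assumption inferred from the file name only (div0 to div6 mapped to div); the actual 'div0123456todiv.xsl' may do more or differ in details, and `normalize_divs` is a hypothetical helper, not part of TXM.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: numbered TEI divisions are div0..div6, optionally namespaced
# in ElementTree's "{namespace}local" notation.
NUMBERED_DIV = re.compile(r"^(\{[^}]*\})?div[0-6]$")


def normalize_divs(xml_text):
    """Rename div0..div6 elements to plain div, keeping attributes and content."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        match = NUMBERED_DIV.match(elem.tag)
        if match:
            elem.tag = (match.group(1) or "") + "div"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(normalize_divs("<text><div1 n='1'><div2><p/></div2></div1></text>"))
```

After such a pass the sources contain only div elements, so the suffixed names generated for nested divs no longer collide with names coming from the sources.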

Solution 1

Don't declare structures twice.
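
As a rough illustration of what a check could look for, the following Python sketch lists structure names declared more than once in a registry file. It assumes only the line format visible in the extract above (`STRUCTURE <name>`, with `#` starting a comment); `duplicate_structures` is a hypothetical helper, not an existing TXM or CWB function.

```python
from collections import Counter


def duplicate_structures(registry_text):
    """Return the sorted structure names declared more than once."""
    names = []
    for line in registry_text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop trailing comments
        if len(fields) >= 2 and fields[0] == "STRUCTURE":
            names.append(fields[1])
    return sorted(name for name, count in Counter(names).items() if count > 1)


if __name__ == "__main__":
    extract = """\
# <div1 n=".."> ... </div1>
STRUCTURE div1
STRUCTURE div1_n               # [annotations]
# (3 levels of embedding: <div>, <div1>, <div2>, <div3>).
STRUCTURE div1
"""
    print(duplicate_structures(extract))
```

Run on the registry extract quoted in the description, this reports div1 as declared twice.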

Solution 2

Re-implement recursion management to prevent such conflicts.

History

#1 Updated by Serge Heiden 8 months ago

  • Description updated

#2 Updated by Matthieu Decorde 4 months ago

  • Target version changed from TXM 0.8.2 to TXM 0.8.3
