Bug #2520
RCP: 0.7.9, CQP recursive structures management in conflict with some structure names given in sources, for all XML based import modules
Status: | New | Start date: | 03/20/2019 |
---|---|---|---|
Priority: | Normal | Due date: | |
Assignee: | - | % Done: | 0% |
Category: | Import | Spent time: | - |
Target version: | TXM 0.8.2 | | |
Description
Currently, cwb-encode renames nested div tags to div1, div2, etc., according to the nesting depth of the div.
If the sources already contain div1, div2, etc. tags, there is a name conflict, at least in the declaration of the structures in the REGISTRY:
    attributes:setup_attribute(): Warning: Attribute div1 of type Structural Attribute already defined in corpus teig
    REGISTRY ERROR (../registry/teig): Structure attribute div1 declared twice -- semantic error
    REGISTRY ERROR (../registry/teig): Parse Error.
Extract of the registry file:
    $ grep -i div1 ../registry/teig
    # <div1 n=".."> ... </div1>
    STRUCTURE div1
    STRUCTURE div1_n # [annotations]
    # (3 levels of embedding: <div>, <div1>, <div2>, <div3>).
    STRUCTURE div1
As a result, the corpus is not usable.
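The collision can be reproduced in miniature with the Python sketch below. It assumes that a recursive structure declared with N levels of embedding occupies the names div, div1, ..., divN, as the registry comment suggests. The function names `expand_structure` and `find_conflicts` are hypothetical and are not part of CWB or TXM.

```python
from collections import Counter


def expand_structure(name, depth):
    """Names a structure would occupy: name, name1, ..., name<depth> (assumed scheme)."""
    return [name] + [f"{name}{level}" for level in range(1, depth + 1)]


def find_conflicts(declarations):
    """Return the structure names that would be declared more than once.

    declarations: list of (structure_name, levels_of_embedding) pairs.
    """
    counts = Counter()
    for name, depth in declarations:
        counts.update(expand_structure(name, depth))
    return sorted(name for name, count in counts.items() if count > 1)


# A recursive <div> with 3 levels of embedding, plus a <div1> found in the sources.
print(find_conflicts([("div", 3), ("div1", 0)]))  # ['div1']
```

A check of this kind could run before the registry is written, so that the import fails with a clear message instead of producing an unusable corpus.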
Solution 0
Always use the 'div0123456todiv.xsl' XSLT stylesheet in the 2-front step of XTZ for XML-TEI encoded sources.
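The content of that stylesheet is not shown in this ticket. Judging only from its file name, it presumably rewrites numbered div0 ... div6 elements as plain div elements. The Python sketch below illustrates that kind of normalization under this assumption. It is not the stylesheet itself, and `numbered_divs_to_div` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Matches div0 ... div6, with or without an XML namespace prefix in Clark notation.
NUMBERED_DIV = re.compile(r"^(\{[^}]*\})?div[0-6]$")


def numbered_divs_to_div(xml_text):
    """Rename every div0 ... div6 element to div, keeping attributes and children."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        match = NUMBERED_DIV.match(element.tag)
        if match:
            # Preserve the namespace part of the tag, if any.
            element.tag = (match.group(1) or "") + "div"
    return ET.tostring(root, encoding="unicode")


sample = '<text><div1 n="1"><div2><p>x</p></div2></div1></text>'
print(numbered_divs_to_div(sample))
```

After such a pass the sources contain only div elements, so the depth-based names generated at encoding time no longer clash with names coming from the sources.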
Solution 1
Don't declare structures twice.
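A minimal sketch of this idea in Python, assuming the declarations are available as a list of registry lines before the file is written. The function `drop_duplicate_structures` is hypothetical. It keeps the first declaration of each name and drops later ones, which removes the parse error but does not by itself decide whether the two div1 structures should really be merged.

```python
def drop_duplicate_structures(registry_lines):
    """Keep only the first STRUCTURE declaration for each attribute name."""
    seen = set()
    kept = []
    for line in registry_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "STRUCTURE":
            name = parts[1]
            if name in seen:
                continue  # already declared: skip the repeated declaration
            seen.add(name)
        kept.append(line)
    return kept


lines = [
    '# <div1 n=".."> ... </div1>',
    "STRUCTURE div1",
    "STRUCTURE div1_n # [annotations]",
    "# (3 levels of embedding: <div>, <div1>, <div2>, <div3>).",
    "STRUCTURE div1",
]
for line in drop_duplicate_structures(lines):
    print(line)
```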
Solution 2
Re-implement recursion management to prevent such conflicts.