Bug #528
Task #490: RCP: 0.7.5 Fix 0.7.5 beta bugs
RCP: 0.7.5, xml/w import module, some xml tags are indexed as words
| Status: | Closed | Start date: | 01/17/2014 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | Matthieu Decorde | % Done: | 100% |
| Category: | Import | Spent time: | - |
| Target version: | TXM 0.7.5 | | |
Description
Some XML tags from the source document appear as words in lexical indexes, e.g. tags matching
</?ab.*> in the Schiller corpus (check the source documents and the binary corpus at /SpUV/Schiller).
The same sources were imported correctly with TXM 0.7.2 using the same parameters.
In the BVHEPISTEMON2014 corpus, such misinterpreted tags are very numerous.
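A quick way to confirm the symptom is to scan a word list exported from the lexical index for entries that match the pattern quoted above. The following is only a minimal sketch: the class name `TagLikeTokens`, the method `findTagLike`, and the sample words are hypothetical and are not part of TXM.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TagLikeTokens {
    // Pattern quoted in the report: an opening or closing tag whose name starts with "ab".
    private static final Pattern AB_TAG = Pattern.compile("</?ab.*>");

    /** Returns the entries that match the tag pattern over their full length. */
    static List<String> findTagLike(List<String> words) {
        return words.stream()
                .filter(w -> AB_TAG.matcher(w).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for an exported lexical index.
        List<String> sample = List.of("Freude", "<ab>", "</ab>", "Funken", "<ab type=\"x\">");
        System.out.println(findTagLike(sample));
    }
}
```

Note that the pattern is deliberately loose: it also matches any other element whose name begins with "ab", so hits should be checked against the source documents.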
History
#1 Updated by Alexey Lavrentev over 9 years ago
- Description updated (diff)
#2 Updated by Matthieu Decorde over 9 years ago
- % Done changed from 0 to 70
fix bugs in the SattributeListener class:
- structure depth
- missing properties
#3 Updated by Matthieu Decorde over 9 years ago
- Parent task set to #490
#4 Updated by Matthieu Decorde over 9 years ago
I've added a test after the cwb-encode call to check whether the registry file was created. This should help people spot the bug.
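The check described in this comment could look roughly like the sketch below. It is not the actual TXM code: the class `RegistryCheck`, its methods, and the way the encoder command and registry path are passed in are all assumptions made for illustration, and no cwb-encode options are implied.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class RegistryCheck {
    /** True if the expected registry file exists as a regular file. */
    static boolean registryCreated(Path registryFile) {
        return Files.isRegularFile(registryFile);
    }

    /**
     * Runs the given encoder command (supplied by the caller), then fails
     * with an explicit message if the registry file was not produced.
     */
    static void encodeAndCheck(List<String> command, Path registryFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Encoder exited with code " + exitCode);
        }
        if (!registryCreated(registryFile)) {
            throw new IOException("Registry file was not created: " + registryFile);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demonstrates the check alone, using a temporary file as a stand-in.
        Path tmp = Files.createTempFile("registry", ".tmp");
        System.out.println(registryCreated(tmp));
        Files.delete(tmp);
        System.out.println(registryCreated(tmp));
    }
}
```

Failing immediately after the encoding step, with the missing path in the message, points the user at the import stage rather than at a later, less obvious error when the corpus is opened.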
#5 Updated by Matthieu Decorde over 9 years ago
- % Done changed from 70 to 100
#6 Updated by Matthieu Decorde over 9 years ago
- Status changed from New to Closed
#7 Updated by Matthieu Decorde about 8 years ago
- Category set to Import