Bug #2059
RCP: 0.7.8, fix pre-encoded word properties in XML/w+CSV
Status: | New | Start date: | 03/07/2017
---|---|---|---
Priority: | Urgent | Due date: |
Assignee: | - | % Done: | 80%
Category: | Import | Spent time: | -
Target version: | TXM 0.8.2 | |
Description
Currently, if a <w> element in an XML source pre-encodes a property that TreeTagger can also produce, the TreeTagger value is added to the word alongside the pre-encoded one. The pre-encoded value should be left untouched, since pre-encoding has priority over on-the-fly annotation.
For example, the following XML source:
établissements membres et d’un organisme de recherche associé, l’INSERM. <w frpos="PUN">■</w> L’Université Claude Bernard, qui forme chaque année 40 000 étudiants dans les sciences
produces the following TXM text:
établissements membres et d’un organisme de recherche associé, l’INSERM. ■ L’Université Claude Bernard, qui forme chaque année 40 000 étudiants dans les sciences
Where the '■' word properties are:
- frpos:PUN
- n:4516
- frpos:NOM
- frlemma:■
instead of the following correct TXM text:
établissements membres et d’un organisme de recherche associé, l’INSERM. ■ L’Université Claude Bernard, qui forme chaque année 40 000 étudiants dans les sciences
Where the '■' word properties are:
- frpos:PUN
- n:4516
- frlemma:■
Solution
Add a new import parameter to enable or disable the correction of existing annotations. For details, see https://groupes.renater.fr/wiki/txm-info/public/annotation/tal_treetagger
History
#1 Updated by Matthieu Decorde about 6 years ago
- Priority changed from Normal to Urgent
#2 Updated by Sebastien Jacquot over 5 years ago
- Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0
#3 Updated by Matthieu Decorde over 4 years ago
- Target version changed from TXM 0.8.0 to TXM 0.8.2
#4 Updated by Matthieu Decorde almost 3 years ago
- Description updated (diff)
#5 Updated by Matthieu Decorde over 2 years ago
- % Done changed from 0 to 80
There was no need for an option: if a property value is already set in the XML source files, it is used.
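The priority rule can be illustrated with a minimal Python sketch. This is not TXM's import code; the function name `merge_word_properties` and the dictionary standing in for the TreeTagger output are hypothetical, chosen only to reproduce the '■' example from the description.

```python
import xml.etree.ElementTree as ET


def merge_word_properties(w_xml, tagger_props):
    """Return a word's properties, letting pre-encoded XML attributes win.

    Hypothetical helper: `tagger_props` stands in for the on-the-fly
    annotations that a tagger would produce for this word.
    """
    element = ET.fromstring(w_xml)
    # Start from the on-the-fly annotations...
    merged = dict(tagger_props)
    # ...then overwrite them with any value the source already encodes.
    merged.update(element.attrib)
    return merged


# The '■' word from the description: frpos is pre-encoded as PUN,
# while the (assumed) tagger output proposes NOM.
props = merge_word_properties(
    '<w frpos="PUN">■</w>',
    {"frpos": "NOM", "frlemma": "■"},
)
print(props)  # {'frpos': 'PUN', 'frlemma': '■'}
```

The pre-encoded `frpos:PUN` is kept and `frpos:NOM` is dropped, while `frlemma`, which the source does not encode, still comes from the tagger.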