Bug #1978

RCP, 0.7.8, outside text to edit bug

Added by Matthieu Decorde almost 7 years ago. Updated over 6 years ago.

Status: New
Start date: 12/22/2016
Priority: Normal
Due date:
Assignee: -
% Done: 80%
Category: Import
Spent time: -
Target version: TXM 0.7.8

Description

<extent>- Numéros de pages saisies : de 13r à 13r ;
            - Nombre de mots d'après la base des descripteurs : <num xml:id="nb_mots_given">115</num> ;
            - Nombre de tokens (mots et ponctuations) d'après la base des descripteurs : <num xml:id="nb_tokens_given">130</num>.
    </extent>

The element above is outside text and should not be tokenized, yet its content is tokenized after the first </num> element:

<extent>
- Numéros de pages saisies : de 13r à 13r ;
            - Nombre de mots d'après la base des descripteurs : <num xml:id="nb_mots_given">
115</num>
<w id="w_strasbBfm_2" n="2">;</w>
<w id="w_strasbBfm_3" n="3">-</w>
<w id="w_strasbBfm_4" n="4">Nombre</w>
<w id="w_strasbBfm_5" n="5">de</w>
<w id="w_strasbBfm_6" n="6">tokens</w>
<w id="w_strasbBfm_7" n="7">(</w>
<w id="w_strasbBfm_8" n="8">mots</w>
<w id="w_strasbBfm_9" n="9">et</w>
<w id="w_strasbBfm_10" n="10">ponctuations</w>
<w id="w_strasbBfm_11" n="11">)</w>
<w id="w_strasbBfm_12" n="12">d'</w>
<w id="w_strasbBfm_13" n="13">après</w>
<w id="w_strasbBfm_14" n="14">la</w>
<w id="w_strasbBfm_15" n="15">base</w>
<w id="w_strasbBfm_16" n="16">des</w>
<w id="w_strasbBfm_17" n="17">descripteurs</w>
<w id="w_strasbBfm_18" n="18">:</w>
<num xml:id="nb_tokens_given">
130</num>
<w id="w_strasbBfm_20" n="20">.</w>
</extent>

Solution

The word elements are identified using the "w|abbr|num" regular expression. When a word element was identified, the outside-text boolean state variable was set to false, which broke the outside-text logic.
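
To illustrate the cause described above, here is a minimal Python sketch. It is not the TXM importer code. The event list, the `OUTSIDE_TEXT` set, the `tokenized_text` function and its `reset_on_word` flag are all hypothetical names invented for this example; only the "w|abbr|num" pattern comes from the report. The sketch assumes the fix is simply to leave the outside-text flag untouched when a word element is met.

```python
import re

# Pattern quoted in the report; everything else below is assumed for illustration.
WORD_ELEMENT = re.compile(r"w|abbr|num")
OUTSIDE_TEXT = {"extent"}  # hypothetical "outside text" configuration


def tokenized_text(events, reset_on_word):
    """Return the text chunks that would be handed to the tokenizer."""
    outside_text = False  # True while inside an outside-text element
    in_word = False       # True while inside an element already treated as a word
    chunks = []
    for kind, value in events:
        if kind == "start":
            if value in OUTSIDE_TEXT:
                outside_text = True
            elif WORD_ELEMENT.fullmatch(value):
                in_word = True
                if reset_on_word:
                    outside_text = False  # the behaviour described in the report
        elif kind == "end":
            if value in OUTSIDE_TEXT:
                outside_text = False
            elif WORD_ELEMENT.fullmatch(value):
                in_word = False
        elif kind == "text" and not outside_text and not in_word:
            chunks.append(value)
    return chunks


# Simplified version of the <extent> sample from the description.
EVENTS = [
    ("start", "extent"),
    ("text", "- Nombre de mots : "),
    ("start", "num"), ("text", "115"), ("end", "num"),
    ("text", " ; - Nombre de tokens : "),
    ("start", "num"), ("text", "130"), ("end", "num"),
    ("text", "."),
    ("end", "extent"),
]

buggy = tokenized_text(EVENTS, reset_on_word=True)
fixed = tokenized_text(EVENTS, reset_on_word=False)
```

With `reset_on_word=True`, the text following the first `</num>` reaches the tokenizer, which matches the output shown in the description. With `reset_on_word=False`, nothing inside `<extent>` is tokenized.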

History

#1 Updated by Matthieu Decorde over 6 years ago

  • Description updated
  • % Done changed from 0 to 80
