Bug #1179
Updated by Serge Heiden about 5 years ago
In XML format sources, when a word is pre-encoded with a <w>...</w> tag
and the word form contains an end of line, the resulting word form is incorrect
because the end of line is just removed from the graphic form.
For example: <w>parce
que</w> gives the word form 'parceque' instead of 'parce que'.
h3. Solution
Replace any whitespace character (new-line, carriage-return, tabulation...) with a space at tokenization level.
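As an illustration only, the replacement could look like the sketch below. It is not the TXM tokenizer code: the function name @normalize_word_form@ is hypothetical, and collapsing a run of several whitespace characters (for example a carriage-return followed by a new-line and indentation) into a single space is an assumption that goes slightly beyond the rule stated above.

```python
import re

# Hypothetical helper for illustration; not taken from the TXM code base.
# \s matches new-line, carriage-return, tabulation and other whitespace.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_word_form(raw: str) -> str:
    """Replace each run of whitespace inside a word form with one space."""
    # Assumption: a run such as "\r\n\t" becomes a single space, not three.
    return _WHITESPACE_RUN.sub(" ", raw)


# Content of <w>parce\nque</w> as read from the XML source.
print(normalize_word_form("parce\nque"))  # prints: parce que
```

Collapsing runs keeps the word form stable when the source is re-indented. If a strict one-for-one replacement is preferred, the pattern @\s@ can be used in place of @\s+@.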