Bug #1179

Updated by Serge Heiden over 4 years ago

In XML sources, when a word is pre-encoded with a <w>...</w> tag
and the word form contains an end of line, the resulting word form is incorrect
because the end of line is simply removed from the graphic form.

For example: <w>parce
que</w> gives the word form 'parceque' instead of 'parce que'.

h3. Solution

Replace any special (whitespace?) character (such as newline, carriage return, tab...) with a space at the tokenization level.
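
The proposed replacement could be sketched as follows. This is only an illustration in Python, not the tokenizer's actual code: the function name @normalize_word_form@ is hypothetical, and collapsing a run of several whitespace characters (for example carriage return + newline) into a single space, as well as trimming the ends, are assumptions that go slightly beyond the one-for-one replacement described above.

```python
import re

# Matches any run of whitespace characters (space, tab, newline, carriage return...).
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_word_form(form: str) -> str:
    """Hypothetical helper: replace each whitespace run in a <w> form with one space.

    Assumption: runs are collapsed and leading/trailing whitespace is trimmed,
    so that "parce\r\nque" does not end up with two consecutive spaces.
    """
    return _WHITESPACE_RUN.sub(" ", form).strip()


if __name__ == "__main__":
    # The content of <w>parce\nque</w> from the example above.
    print(normalize_word_form("parce\nque"))
```

With this normalization the form from the example keeps its internal separator instead of being glued into 'parceque'.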