Bug #1179
Updated by Serge Heiden more than 10 years ago
In XML sources, when a word is pre-encoded with a <w>...</w> tag
and the word form contains a line break, the resulting word form is incorrect
because the line break is simply removed from the graphical form.
For example: <w>parce
que</w> gives the word form 'parceque' instead of 'parce que'.
h3. Solution 1
Replace any special whitespace character (new-line, carriage-return, tabulation...) with a 'space' character at tokenization level.
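
A minimal sketch of what Solution 1 could look like, assuming the content of a <w> element is available as a plain Java string before it becomes the word form. The class and method names are hypothetical and do not come from the tokenizer code. Collapsing a run such as carriage-return + new-line into a single space is also an assumption, made so that Windows line endings do not produce two spaces.

```java
final class LineBreakNormalizer {

    // Hypothetical helper: replaces each run of new-line, carriage-return
    // and tabulation characters with a single space.
    // Assumption: a run (e.g. "\r\n") should yield one space, not several.
    static String normalize(String wordForm) {
        return wordForm.replaceAll("[\\n\\r\\t]+", " ");
    }

    public static void main(String[] args) {
        // The example from this ticket: the line break becomes a space.
        System.out.println(normalize("parce\nque"));
    }
}
```

With this approach only the three listed characters are affected; any other whitespace character is left untouched.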
h3. Solution 2
Replace any whitespace character, as defined by Java, with a 'space' character.
Java whitespace characters are defined by the "isWhitespace method":(http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-)
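
Solution 2 could be sketched as follows, again assuming the word form is handled as a plain Java string. The class and method names are hypothetical. Collapsing consecutive whitespace characters into one space is an assumption that the ticket does not state.

```java
final class WhitespaceNormalizer {

    // Hypothetical helper: replaces every run of characters for which
    // Character.isWhitespace(int) is true with a single space.
    static String normalize(String wordForm) {
        StringBuilder out = new StringBuilder(wordForm.length());
        boolean previousWasSpace = false;
        for (int i = 0; i < wordForm.length(); ) {
            int codePoint = wordForm.codePointAt(i);
            i += Character.charCount(codePoint);
            if (Character.isWhitespace(codePoint)) {
                // Assumption: emit only one space per whitespace run.
                if (!previousWasSpace) {
                    out.append(' ');
                }
                previousWasSpace = true;
            } else {
                out.appendCodePoint(codePoint);
                previousWasSpace = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The example from this ticket: the line break becomes a space.
        System.out.println(normalize("parce\nque"));
    }
}
```

Note that @Character.isWhitespace@ returns false for non-breaking spaces (U+00A0, U+2007, U+202F), so a word form containing one of them would pass through this sketch unchanged.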