Bug #917

Updated by Serge Heiden almost 5 years ago

*Issue* In an Ancient Greek corpus, the alphabetical sort of the right context does not follow the [Ancient] Greek collation rules defined by the Unicode Consortium for that writing system.

Sorting the right context of a concordance of [word="πυρετὸς"], with left context set to 0 and right context set to 1, currently gives (selected lines):
<pre>
Epid_V πυρετὸς αὖθις
Epid_V πυρετὸς βληχρός
Epid_V πυρετὸς δὲ
Epid_V πυρετὸς εἶχε
Epid_V πυρετὸς εἶχεν
Epid_V πυρετὸς εἶχεν
Epid_V πυρετὸς εἶχεν
Epid_V πυρετὸς ξυνεχὴς
Epid_V πυρετὸς οὐ
Epid_V πυρετὸς οὐκ
Epid_V πυρετὸς παρείπετο
Epid_V πυρετὸς ἐπέβαλε
Epid_V πυρετὸς ἐπέλαβε
Epid_V πυρετὸς ἐπέλαβε
Epid_V πυρετὸς ἐπέλαβεν
Epid_V πυρετὸς ἐπεγίνετο
Epid_V πυρετὸς ἐπεῖχε
Epid_V πυρετὸς ἔλαβε
</pre>

But:
<pre>
Epid_V πυρετὸς εἶχε
Epid_V πυρετὸς εἶχεν
Epid_V πυρετὸς εἶχεν
Epid_V πυρετὸς εἶχεν
</pre>
lines should be immediately followed by the:
<pre>
Epid_V πυρετὸς ἐπέβαλε
Epid_V πυρετὸς ἐπέλαβε
Epid_V πυρετὸς ἐπέλαβε
Epid_V πυρετὸς ἐπέλαβεν
Epid_V πυρετὸς ἐπεγίνετο
Epid_V πυρετὸς ἐπεῖχε
Epid_V πυρετὸς ἔλαβε
</pre>
lines.

*Origin* Setting the 'lang' property of the corpus to 'grc' (Ancient Greek) or to 'el' (Modern Greek) in the 'import.xml' of the binary corpus does not change the behavior of the Java collation rules in TXM.
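
The order reported above is consistent with a comparison by raw character code: precomposed polytonic letters such as ἐ (U+1F10) live in the Greek Extended block, far above the basic letters ε (U+03B5) and π (U+03C0). The following is a minimal sketch of that effect, not TXM code; the class and method names are hypothetical, and whether the TXM comparator actually falls back to this behavior is an assumption.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CodeUnitOrderDemo {
    /** Sorts NFC (precomposed) copies with String.compareTo, i.e. by UTF-16 code unit. */
    static List<String> sortByCodeUnit(List<String> words) {
        List<String> copy = new ArrayList<>();
        for (String w : words) {
            // Force the precomposed form so the first character of ἐ... is U+1F10.
            copy.add(Normalizer.normalize(w, Normalizer.Form.NFC));
        }
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Print each word with the code point of its first character.
        for (String w : sortByCodeUnit(List.of("ἐπέβαλε", "παρείπετο", "εἶχε"))) {
            System.out.printf("U+%04X %s%n", w.codePointAt(0), w);
        }
    }
}
```

With this comparison, ἐπέβαλε lands after παρείπετο, which matches the misplaced block in the concordance.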

*Solution* Currently no solution.

*Status* We need to check how the Java collation system handles the 'grc' and 'el' languages. The following word list should be sorted correctly:
<pre>
αὖθις
βληχρός
δὲ
εἶχε
εἶχεν
ἐπέβαλε
ἐπέλαβε
ἐπέλαβεν
ἐπεγίνετο
ἐπεῖχε
ἔλαβε
ξυνεχὴς
οὐ
οὐκ
παρείπετο
</pre>
See "Unicode Collation Algorithm Demo" in Java: http://www.unicode.org/reports/tr10/Sample.
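
As a way to experiment with the word list above, here is a minimal sketch of a diacritic-insensitive sort using only the Java standard library. It approximates a first-level (base letter) comparison by decomposing each word to NFD and dropping combining marks; it is not an implementation of the Unicode Collation Algorithm and not a proposed TXM fix. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class GreekBaseLetterSort {
    /** Decomposes to NFD and removes combining marks, leaving base letters only. */
    static String baseLetters(String word) {
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Orders by base letters first; the original string only breaks ties. */
    static List<String> sortIgnoringDiacritics(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Comparator<String> byBase = Comparator.comparing(GreekBaseLetterSort::baseLetters);
        copy.sort(byBase.thenComparing(Comparator.naturalOrder()));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of(
                "αὖθις", "βληχρός", "δὲ", "εἶχε", "εἶχεν",
                "ἐπέβαλε", "ἐπέλαβε", "ἐπέλαβεν", "ἐπεγίνετο", "ἐπεῖχε",
                "ἔλαβε", "ξυνεχὴς", "οὐ", "οὐκ", "παρείπετο");
        sortIgnoringDiacritics(words).forEach(System.out::println);
    }
}
```

Under this approximation all words beginning with ε, ἐ or ἔ are grouped together before ξυνεχὴς. Within that group diacritics are ignored, so ἔλαβε sorts before the ἐπ- words, and ἐπεγίνετο and ἐπεῖχε sort between ἐπέβαλε and ἐπέλαβε. A full implementation of the Unicode Collation Algorithm, such as the one in ICU4J, would also define the order of words that differ only by diacritics.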
