Feature #2042: RCP: X.X, Import annotations from Glozz corpus command - Plateforme TXM - Forge du Centre Blaise Pascal

Feature #2042

Updated by Matthieu Decorde more than 7 years ago

Currently, in the "Import a Glozz corpus..." command, the tokenization of the TXT+CSV import module seems sufficient for raw-text sources annotated with Analec. However, for XML-TRS source files, or parts of files, the tokenization of the XML-TRS import module does not work.

h3. Solution

Add an "Import annotations from Glozz..." command that aligns Analec annotations (expressed as character positions) with the Analec annotations of a TXM corpus (expressed as word positions).

The algorithm must deal with missing or added characters in the base text character flow.
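
As an illustration only, here is a minimal Python sketch of one possible way to do such an alignment. It is not the TXM implementation: the function name @align_annotation@, the choice of Python's standard @difflib.SequenceMatcher@ and the reconstruction of the corpus character flow by joining words with single spaces are all assumptions made for this example. Characters that exist on only one side are simply left unmatched, which is how the sketch tolerates missing or added characters.

```python
from difflib import SequenceMatcher


def align_annotation(start, end, glozz_text, words):
    """Map a character span [start, end) of glozz_text to word positions.

    Hypothetical helper: returns (first_word, last_word) as 0-based indices
    into `words`, or None when the span touches no word of the corpus.
    """
    # Rebuild the corpus character flow from its tokens, remembering which
    # word owns each character (None for the separators we add ourselves).
    owner = []
    for index, word in enumerate(words):
        if index > 0:
            owner.append(None)
        owner.extend([index] * len(word))
    corpus_text = " ".join(words)

    # Matching blocks pair up runs of identical characters in both flows;
    # characters missing from or added to either flow fall between blocks.
    matcher = SequenceMatcher(None, glozz_text, corpus_text, autojunk=False)
    touched = set()
    for a, b, size in matcher.get_matching_blocks():
        low, high = max(a, start), min(a + size, end)
        for k in range(low, high):
            word_index = owner[b + (k - a)]
            if word_index is not None:
                touched.add(word_index)

    if not touched:
        return None
    return min(touched), max(touched)


# Example: the annotated text has an extra space that the corpus lacks,
# and the corpus tokenizer has split the final period into its own word.
glozz_text = "The  cat sat on the mat."
words = ["The", "cat", "sat", "on", "the", "mat", "."]

begin = glozz_text.index("the mat")
print(align_annotation(begin, begin + len("the mat"), glozz_text, words))  # (4, 5)
```

A real import would also have to decide what to do with annotations that match no word at all (the sketch returns @None@) and would likely need a faster alignment than @difflib@ for large corpora.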
