Feature #2050: RCP: X.X, improve XTZ import module - Plateforme TXM - Forge du Centre Blaise Pascal

Feature #2050

Updated by Alexey Lavrentev more than 8 years ago

This ticket groups various features that need improvement in the XTZ module.

# Use $pagination-element and not pb when building the facs edition
# Simplify producing multi-faceted editions (from XML-TXM format)
#* it is currently very complicated to keep tags (e.g. page breaks) inside a non-default edition facet
# When not re-tokenizing, check that all word elements have an @id (or do not rely on w/@id for back-to-text)
# Handle XML-TXM as input format
#* currently XML-TXM can be imported via the XTZ module (word properties and default editions are correct), but injection of morpho-syntactic annotation is broken because txm:form elements end up nested
#* TXM should detect txm:form and txm:ana child nodes of the word element and transfer them correctly to XML-TXM
# Handle nested word-level elements
#* currently, if you have num/w in the source file, nested w elements are created in the XML-TXM file
#* if nested word-level elements are detected, only the lowest level should be considered as a token by TXM
# Implement alternative ways of defining text order in the corpus
## textorder column in metadata.csv (currently implemented)
## text/@textorder
## XPath
# Implement metadata definition through XPath (as in the current TEI-BFM module)
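
The "lowest level only" rule for nested word-level elements can be illustrated with a minimal Python sketch. It is not TXM code: the function name, the @WORD_TAGS@ set and the sample document are hypothetical, and namespaces are ignored for brevity (real TEI sources would need namespace-qualified tag names).

```python
import xml.etree.ElementTree as ET

# Hypothetical set of element names treated as word-level by the import.
WORD_TAGS = {"w", "num"}


def innermost_word_elements(root, word_tags=WORD_TAGS):
    """Return word-level elements that contain no other word-level element.

    Elements are returned in document order, so the result can serve as
    the token sequence when word-level elements are nested.
    """
    tokens = []
    for elem in root.iter():
        if elem.tag not in word_tags:
            continue
        # elem.iter() also yields elem itself, so exclude it explicitly.
        has_nested = any(
            desc.tag in word_tags for desc in elem.iter() if desc is not elem
        )
        if not has_nested:
            tokens.append(elem)
    return tokens


# Illustrative input: one num wrapping two w elements, one bare w, one bare num.
SAMPLE = (
    "<text><p><num><w>twenty</w><w>one</w></num> "
    "<w>cats</w> <num>3</num></p></text>"
)

root = ET.fromstring(SAMPLE)
tokens = [(e.tag, e.text) for e in innermost_word_elements(root)]
print(tokens)
```

In this sketch the outer @num@ is skipped because it wraps @w@ elements, while the standalone @num@ is kept as a token because it has no word-level descendants.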
