Feature #1548: RCP: X.X, add XTZ import module - Plateforme TXM - Forge du Centre Blaise Pascal

Feature #1548

Updated by Matthieu Decorde almost 10 years ago

* copy the XML/w+CSV import module to XTZ+CSV
** menu entry
** scripts in scripts/import
* add management of new sub-directories in the source directory
** 'dtd' sub-directory: contains the DTD files used by the XSL stylesheets
** 'css' sub-directory: contains the CSS files used by the HTML pages of the editions
** 'xsl' sub-directory: contains one sub-directory per type of XSL stylesheet (a sub-directory that is absent or empty is not used)
*** '1-split-merge' sub-sub-directory: contains an XSL stylesheet used to split or merge source files so that they fit the TXM corpus model (1 text = 1 file)
**** this XSL receives a "binary-src-dir-path" parameter giving the path where the result files must be written
**** the standard XSL output file of this stylesheet *is not used*
**** examples: split-texts.xsl or merge-files.xsl
*** '2-front' sub-sub-directory: contains the XSL stylesheets that process the sources at the beginning of the import process (replaces the 'front XSL' section mechanism). The stylesheets are applied in the lexicographical order of their file names.
**** example: txm-filter-teip5-xmlw-preserve.xsl
*** '3-posttok' sub-sub-directory: contains the XSL stylesheets that process the xml-txm representation of the sources after the tokenization step (all words are encoded). The stylesheets are applied in the lexicographical order of their file names.
**** examples: reduce-caesura.xsl, build-word-ref.xsl
*** '4-edition' sub-sub-directory: contains the XSL stylesheets that build the HTML edition from the xml-txm representation, using the pagination computed by the pager. The stylesheets are applied in the lexicographical order of their file names, and their file names are used to name the editions produced.
**** example: applied in this order, 1-default-html.xsl then 2-default-pager.xsl build the 'default' edition, followed by 3-facs-html.xsl then 4-facs-html.xsl, which build the facsimile edition holding the images
**** all these XSL stylesheets receive the following parameters: "number-words-per-page", "pagination-element", "import-xml-path".
**** each XSL writes the ID of the first word of every HTML file it produces: <pre><meta name="description" content="{ID of the first word}"/></pre>. If the page contains no word, the "content" value is "w_0".
** all sub-directories are copied to the binary corpus
* modify the import form:
** add a "Plans textuels" (textual planes) section
*** list of the tags encoding out-of-text content (to be transformed into a regexp)
*** list of the tags encoding out-of-text content to be edited, i.e. displayed in the edition (to be transformed into a regexp)
** remove the "front XSL" section
*** note: "add parameter" is broken
** move the "font" section after "editions" and before "commands"
** modify the Éditions section
*** "Editions" -> "Éditions"
*** add an 'images' URI declaration (see below)
<pre>
[x] Build the edition

Number of words per page [500] Pagination element [pb]
Local facsimile image directory [...]
</pre>
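
The 'xsl' step sub-directories described above could be discovered roughly as follows. This is a minimal Python sketch, not TXM's actual implementation: it assumes that only regular files ending in `.xsl` count as stylesheets, and `collect_xsl_steps` and `edition_name` are hypothetical helpers. `edition_name` also assumes the `<order>-<edition>-<role>.xsl` naming pattern inferred from the '4-edition' example.

```python
from pathlib import Path

# Step sub-directories of <source>/xsl, in the order the import runs them.
XSL_STEPS = ["1-split-merge", "2-front", "3-posttok", "4-edition"]


def collect_xsl_steps(source_dir):
    """Map each usable step name to its stylesheets, sorted by file name.

    Absent or empty step directories are left out (the step is not used).
    Assumption: only regular files ending in '.xsl' are stylesheets.
    """
    xsl_root = Path(source_dir) / "xsl"
    steps = {}
    for step in XSL_STEPS:
        step_dir = xsl_root / step
        if not step_dir.is_dir():
            continue  # absent directory: step skipped
        stylesheets = sorted(
            (p for p in step_dir.iterdir() if p.is_file() and p.suffix == ".xsl"),
            key=lambda p: p.name,  # lexicographical order of file names
        )
        if stylesheets:  # empty directory: step skipped
            steps[step] = stylesheets
    return steps


def edition_name(xsl_file):
    """Hypothetical: '1-default-html.xsl' -> 'default', '3-facs-html.xsl' -> 'facs'."""
    stem = Path(xsl_file).stem
    parts = stem.split("-")
    return parts[1] if len(parts) >= 3 else stem
```

Because the order is purely lexicographical, '10-x.xsl' sorts before '2-x.xsl'; zero-padded prefixes such as '02-' avoid that surprise.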
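
The two "Plans textuels" tag lists are to be transformed into regular expressions. One plausible conversion, sketched in Python, joins the escaped element names into a single anchored alternation. The input format (names separated by commas and/or spaces) and the helper name `tags_to_regex` are assumptions; the form's actual syntax is not specified in this issue.

```python
import re


def tags_to_regex(field_value):
    """Turn 'teiHeader, front back' into '^(?:teiHeader|front|back)$'.

    Assumption: names are separated by commas and/or whitespace.
    Returns None when the field holds no element name.
    """
    names = [n for n in re.split(r"[,\s]+", field_value) if n]
    if not names:
        return None
    # Escape each name so characters such as '.' are matched literally.
    return "^(?:" + "|".join(re.escape(n) for n in names) + ")$"
```

The anchors keep `front` from matching `frontispiece` even when the pattern is applied with a search rather than a full match.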

* transfer the edition macros into the XTZ import module
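
Edition building in this module relies on the pager settings ("number-words-per-page", "pagination-element") and on the first-word convention of the '4-edition' stylesheets (`w_0` for a page without words). The Python sketch below only illustrates that convention. It assumes, without confirmation from this issue, that a new page starts at each pagination element and whenever a page reaches the word limit, and it models the xml-txm content as hypothetical `(tag, id)` pairs; `paginate` and `description_meta` are illustrative names.

```python
import html


def paginate(tokens, words_per_page=500, pagination_element="pb"):
    """Group word IDs into pages.

    tokens: hypothetical (tag, id) pairs, e.g. ("w", "w_12") or ("pb", None).
    Assumption: a page ends at each pagination element and when it is full.
    """
    pages = [[]]
    for tag, ident in tokens:
        if tag == pagination_element:
            # A leading pagination element does not create an empty first page.
            if pages[-1] or len(pages) > 1:
                pages.append([])
        elif tag == "w":
            if len(pages[-1]) >= words_per_page:
                pages.append([])
            pages[-1].append(ident)
    return pages


def description_meta(page):
    """<meta> element recording the page's first word ID ('w_0' if empty)."""
    first_id = page[0] if page else "w_0"
    return f'<meta name="description" content="{html.escape(first_id)}"/>'
```

Two consecutive pagination elements yield an empty page, whose `<meta>` content is therefore `w_0`.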

*Later*

Integrate the content of the XMLText2MetadataCSV macro so that metadata is pulled directly from the teiHeaders.
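
Pulling metadata straight from the teiHeader could look like the following Python sketch. The column-to-path mapping in `FIELDS`, the helper names, and the use of the file base name as the `id` column are assumptions for illustration; the actual XMLText2MetadataCSV macro may select different fields.

```python
import csv
import io
import xml.etree.ElementTree as ET
from pathlib import Path

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical mapping: metadata column -> path below the <TEI> root element.
FIELDS = {
    "title": "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
    "author": "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:author",
}


def header_metadata(xml_source):
    """Read FIELDS from one TEI document (str or bytes); missing -> ''."""
    root = ET.fromstring(xml_source)
    row = {}
    for column, path in FIELDS.items():
        element = root.find(path, TEI)
        text = "".join(element.itertext()) if element is not None else ""
        row[column] = " ".join(text.split())  # normalize whitespace
    return row


def metadata_csv(texts):
    """texts: {text id: TEI source}. Returns CSV text with an 'id' column first."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", *FIELDS])
    writer.writeheader()
    for text_id, source in texts.items():
        writer.writerow({"id": text_id, **header_metadata(source)})
    return out.getvalue()


def texts_from_dir(src_dir):
    """Assumption: one text per *.xml file, identified by its base name."""
    return {p.stem: p.read_bytes() for p in sorted(Path(src_dir).glob("*.xml"))}
```

Passing `texts_from_dir(...)` to `metadata_csv` yields one CSV row per source file, with empty cells where a header field is missing.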
