Feature #1548

RCP: X.X, add XTZ import module

Added by Matthieu Decorde about 4 years ago. Updated over 3 years ago.

Status: New    Start date: 10/02/2015
Priority: Normal    Due date:
Assignee: -    % Done: 80%

Category: Import    Spent time: -
Target version: TXM 0.7.8

Description

See the full specification (which lists three other XTZ-related tickets): https://groupes.renater.fr/wiki/txm-info/public/import_xtz

  • copy the XML/w+CSV import module to create the XTZ+CSV import module
    • menu entry
    • scripts in scripts/import
  • add management of new sub-directories in the source directory
    • 'dtd' sub-directory contains the DTD files used by the XSL stylesheets
    • 'css' sub-directory contains the CSS files used by the HTML pages of the editions
      • MD: the pager must declare the css files in each HTML page with a path "css/cssfilename.css"
      • MD: the css directory must be copied next to the HTML pages for each edition (Groovy or XSL)
    • 'xsl' sub-directory contains different types of XSL sub-directories (if a directory is absent or empty it is not used)
      • '1-split-merge' sub-sub-directory containing an XSL stylesheet used to split or merge source files to adapt them to the TXM corpus model (1 text = 1 file)
        • this XSL receives a "binary-src-dir-path" parameter giving the path where result files must be written
        • the standard XSL output file of this stylesheet is not used
        • examples: split-texts.xsl or merge-files.xsl
      • '2-front' sub-sub-directory containing the XSL stylesheets to process the sources at the beginning of the import process (replaces the 'front XSL' section mechanism). The XSL are applied in the lexicographical order of their file names.
        • examples: txm-filter-teip5-xmlw-preserve.xsl
      • '3-posttok' sub-sub-directory containing the XSL stylesheets to process the xml-txm representation of the sources after the tokenization phase (all words are encoded). The XSL are applied in the lexicographical order of their file names.
        • examples: reduce-caesura.xsl, build-word-ref.xsl
      • '4-edition' sub-sub-directory containing the XSL stylesheets to build the HTML edition from the xml-txm representation using the pagination done by the pager. The XSL are applied in the lexicographical order of their file names.
        • example: in order, 1-default-html.xsl and 2-default-pager.xsl build the 'default' edition, then 3-facs-html.xsl and 4-facs-html.xsl build the facsimile edition that holds the images
        • all XSL receive the following parameters: "number-words-per-page", "pagination-element", "import-xml-path".
          • Note: these XSL parameters are optional (MD: tested)
        • The XSL file writes the ID of the first word of each HTML page it produces:
          <meta name="description" content="{ID of the first word}"/>
          If the page contains no word, the "content" value is "w_0".
        • Their file names are used to name the editions produced
    • all sub-directories are copied to the binary corpus
  • modify the import form:
    • add a "Plans textuels" (textual planes) section
      • list of the elements encoding out-of-text content (neither indexed nor edited) (transform to Regexp)
        • MD: added the import parameter "element.ignored.always" (previously encoded in properties files)
      • list of the elements encoding out-of-text content to be edited (displayed in the edition) (transform to Regexp)
        • MD: added the import parameter "element.edited.only" (previously encoded in properties files)
    • remove "front XSL" section
      • note: "add parameter" is broken
    • move "font" section after "editions" and before "commands"
    • modify Éditions section
      • "Editions" -> "Éditions"
      • add an 'images' URI declaration (see the form mock-up below)
        [x] Build the edition
        
        Words per page [500]    Pagination element [pb]
        Local facsimile image directory [...]
        
  • Import script update strategy:
    • if the import script is missing, it is retrieved from the Groovy files of the Toolbox jar
    • if the script exists and is older than the Toolbox jar script, it is replaced
  • transfer the edition macros into the XTZ import module
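The 'xsl' sub-directory rules listed above (numbered step directories, absent or empty directories skipped, stylesheets applied in the lexicographical order of their file names) can be sketched in Python as follows. This is a minimal illustration, not TXM's actual Groovy implementation. The function name `xsl_pipeline` is hypothetical, and restricting each step to files ending in `.xsl` is an assumption.

```python
from pathlib import Path

# Step sub-directories of the 'xsl' source sub-directory, in processing order.
XSL_STEPS = ["1-split-merge", "2-front", "3-posttok", "4-edition"]


def xsl_pipeline(source_dir):
    """Hypothetical helper: map each used step to its stylesheets, in application order."""
    xsl_root = Path(source_dir) / "xsl"
    pipeline = {}
    for step in XSL_STEPS:
        step_dir = xsl_root / step
        if not step_dir.is_dir():
            continue  # absent directory: the step is not used
        # Assumption: only *.xsl files are stylesheets; sort by file name only.
        sheets = sorted(
            (p for p in step_dir.iterdir() if p.is_file() and p.suffix == ".xsl"),
            key=lambda p: p.name,
        )
        if sheets:  # empty directory: the step is not used
            pipeline[step] = sheets
    return pipeline
```

Because the ordering is purely lexicographical, a file named 10-x.xsl sorts before 2-x.xsl. Single-digit or zero-padded prefixes avoid that surprise.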

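The import script update strategy listed above can be sketched as follows. This assumes the Toolbox jar's Groovy script has already been extracted to a file, so that both dates can be compared as file modification times (reading the entry date directly from the jar is left out). The helper `update_import_script` and its return values are hypothetical.

```python
import shutil
from pathlib import Path


def update_import_script(local_script, toolbox_script):
    """Hypothetical helper applying the update rule to one import script.

    Returns "installed", "replaced" or "kept".
    """
    local, bundled = Path(local_script), Path(toolbox_script)
    if not local.exists():
        # Missing script: retrieve it from the Toolbox copy.
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(bundled, local)  # copy2 also copies the modification time
        return "installed"
    if local.stat().st_mtime < bundled.stat().st_mtime:
        # Existing script older than the Toolbox copy: replace it.
        shutil.copy2(bundled, local)
        return "replaced"
    # Existing script is as recent as, or newer than, the Toolbox copy.
    return "kept"
```

Since `shutil.copy2` preserves the modification time, a freshly copied script counts as up to date on the next check, and a script the user edited after installation is kept.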
Later

Integrate the XMLText2MetadataCSV macro content to pull metadata from teiHeaders directly.


Related issues

related to Feature #1701: RCP: X.X, facsimile edition of XTZ import module New 10/02/2015

History

#1 Updated by Matthieu Decorde about 4 years ago

  • Description updated

#2 Updated by Serge Heiden about 4 years ago

  • Description updated

#3 Updated by Serge Heiden about 4 years ago

  • Description updated

#4 Updated by Serge Heiden about 4 years ago

  • Description updated

#5 Updated by Alexey Lavrentev about 4 years ago

  • Description updated

#6 Updated by Alexey Lavrentev about 4 years ago

  • Description updated

#7 Updated by Matthieu Decorde about 4 years ago

  • Description updated

#8 Updated by Matthieu Decorde about 4 years ago

  • Description updated

#9 Updated by Matthieu Decorde about 4 years ago

  • Description updated

#10 Updated by Matthieu Decorde about 4 years ago

  • Description updated

#11 Updated by Matthieu Decorde about 4 years ago

  • Description updated

#12 Updated by Matthieu Decorde about 4 years ago

  • Description updated

#13 Updated by Matthieu Decorde about 4 years ago

  • Description updated

#14 Updated by Matthieu Decorde about 4 years ago

  • % Done changed from 0 to 70

#15 Updated by Matthieu Decorde about 4 years ago

  • Description updated

#16 Updated by Matthieu Decorde about 4 years ago

  • Description updated

#17 Updated by Matthieu Decorde over 3 years ago

  • % Done changed from 70 to 80

#18 Updated by Serge Heiden over 3 years ago

  • Description updated
