Feature #1548: RCP: X.X, add XTZ import module - Plateforme TXM - Forge du Centre Blaise Pascal

Feature #1548

RCP: X.X, add XTZ import module

Added by Matthieu Decorde about 10 years ago. Updated almost 2 years ago.

Status: Closed

Start date: 02/10/2015

Priority: Normal

Due date:

Assignee: -

% Done: 100%

Category: Import

Spent time: -

Target version:

Description

See full specification (listing three other tickets related to XTZ): https://groupes.renater.fr/wiki/txm-info/public/import_xtz

copy XML/w+CSV import to XTZ+CSV
- entry menu
- scripts in scripts/import
add management of new sub-directories in the source directory
- 'dtd' sub-directory contains the dtd files to use with XSLs.
  - See http://docs.oracle.com/javase/7/docs/api/javax/xml/stream/XMLInputFactory.html
- 'css' sub-directory contains the css files to use with HTML pages in editions
  - MD: the pager must declare the css files in each HTML page with a path "css/cssfilename.css"
  - MD: the css directory must be copied next to the HTML pages for each edition (Groovy or XSL)
- 'xsl' sub-directory contains different types of XSL sub-directories (if a directory is absent or empty it is not used)
  - '1-split-merge' sub-sub-directory containing an XSL stylesheet used to split or merge source files to adapt them to the TXM corpus model (1 text = 1 file)
    - this XSL receives a "binary-src-dir-path" parameter with a path to write result files
    - the standard XSL output file of this stylesheet is not used
    - examples: split-texts.xsl or merge-files.xsl
  - '2-front' sub-sub-directory containing the XSL stylesheets to process the sources at the beginning of the import process (replaces the 'front XSL' section mechanism). The XSL are applied in the lexicographical order of their file names.
    - examples: txm-filter-teip5-xmlw-preserve.xsl
  - '3-posttok' sub-sub-directory containing the XSL stylesheets to process the xml-txm representation of the sources after the tokenization phase (all words are encoded). The XSL are applied in the lexicographical order of their file names.
    - examples: reduce-caesura.xsl, build-word-ref.xsl
  - '4-edition' sub-sub-directory containing the XSL stylesheets to build the HTML edition from the xml-txm representation using the pagination done by the pager. The XSL are applied in the lexicographical order of their file names.
    - example: applied in order, 1-default-html.xsl and 2-default-pager.xsl build the 'default' edition, then 3-facs-html.xsl and 4-facs-html.xsl build the facsimile edition that holds the images
    - all XSL receive the following parameters: "number-words-per-page", "pagination-element", "import-xml-path".
      - Note: these XSL parameters are not mandatory (MD: tested)
    - The XSL stylesheet writes the ID of the first word in each HTML file it produces:
```
<meta name="description" content="{id of the first word}"/>
```
      - If there is no word in the page, the "content" value is "w_0"
    - The file name of each XSL stylesheet is used to name the edition it produces
- all sub-directories are copied to the binary corpus
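
The stage rules above (an absent or empty stage directory is skipped, stylesheets run in the lexicographical order of their file names) could be sketched as follows. This is only an illustration: the class and method names (`XslStage`, `stylesheets`) are hypothetical and not taken from the TXM code base, it assumes the stylesheets are plain `.xsl` files placed directly in the stage directory, and it reads "lexicographical order" as plain `String` comparison of file names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper, not TXM code: lists the stylesheets of one 'xsl' stage. */
final class XslStage {

    /**
     * Returns the *.xsl files found in xslDir/stageName, sorted by file name.
     * The list is empty when the stage directory is absent or holds no stylesheet,
     * so the caller can simply skip the stage.
     */
    static List<Path> stylesheets(Path xslDir, String stageName) throws IOException {
        Path stageDir = xslDir.resolve(stageName);
        if (!Files.isDirectory(stageDir)) {
            return Collections.emptyList(); // absent stage: not used
        }
        List<Path> result = new ArrayList<>();
        try (Stream<Path> entries = Files.list(stageDir)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> p.getFileName().toString().endsWith(".xsl"))
                   .forEach(result::add);
        }
        // Assumption: lexicographical order = String.compareTo on the file names.
        result.sort((a, b) ->
                a.getFileName().toString().compareTo(b.getFileName().toString()));
        return result;
    }
}
```

With such a helper, a caller would request the stages '2-front', '3-posttok' and '4-edition' in turn and apply each returned stylesheet in list order. Numeric prefixes such as 1-, 2- sort as text, so 10-foo.xsl would run before 2-bar.xsl.
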
modify the import form:
- add section "Plans textuels"
  - list of the elements encoding out-of-text content (neither indexed nor edited) (transform to Regexp)
    - MD: added an import parameter "element.ignored.always" (formerly coded in properties files)
  - list of the elements encoding out-of-text content to edit (displayed in the edition) (transform to Regexp)
    - MD: added an import parameter "element.edited.only" (formerly coded in properties files)
- remove "front XSL" section
  - note: "add parameter" is broken
- move "font" section after "editions" and before "commands"
- modify Éditions section
  - "Editions" -> "Éditions"
  - add 'images' URI declaration (see below)
```
[x] Construire l'édition

Nombre de mots par page [500]    Élément de pagination [pb]
Répertoire local d'images de facsimilés [...]
```
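
The two element lists of the "Plans textuels" section are marked "transform to Regexp". A minimal sketch of one possible conversion is given here, assuming the user types a comma-separated list of element names. The separator, the class name `ElementListRegex` and the exact matching rule are assumptions for illustration; the ticket does not specify them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not TXM code: turns an element list into a regular expression. */
final class ElementListRegex {

    /**
     * Builds an alternation such as \Qnote\E|\QteiHeader\E from "note, teiHeader".
     * Assumption: names are separated by commas; surrounding blanks are ignored.
     */
    static Pattern toPattern(String elementList) {
        List<String> quoted = new ArrayList<>();
        for (String name : elementList.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                quoted.add(Pattern.quote(trimmed)); // match the name literally
            }
        }
        return Pattern.compile(String.join("|", quoted));
    }
}
```

The resulting pattern would be tested against a whole element name with `matcher(name).matches()`, so that "note" does not also select "notes". The pattern text could then be stored in a parameter such as "element.ignored.always".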

Import script update strategy:
- if the import script is missing, the script is retrieved from the Toolbox jar Groovy files
- if the script exists and its date is older than the Toolbox jar script, then it is replaced
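
The two rules above can be sketched as a single check. The sketch assumes that the script shipped in the Toolbox jar has already been extracted to a plain file and that "date" means the last-modified time; reading directly from the jar would need a different API. The name `ImportScriptUpdater` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper, not TXM code: keeps an installed import script up to date. */
final class ImportScriptUpdater {

    /**
     * Copies the bundled script over the installed one when the installed script
     * is missing or older. Returns true when a copy was made.
     */
    static boolean refresh(Path bundled, Path installed) throws IOException {
        boolean missing = Files.notExists(installed);
        boolean outdated = !missing
                && Files.getLastModifiedTime(installed)
                        .compareTo(Files.getLastModifiedTime(bundled)) < 0;
        if (!missing && !outdated) {
            return false; // installed script is at least as recent: keep it
        }
        Files.createDirectories(installed.toAbsolutePath().getParent());
        // COPY_ATTRIBUTES keeps the bundled timestamp, so the next check sees
        // the installed copy as current instead of replacing it again.
        Files.copy(bundled, installed,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.COPY_ATTRIBUTES);
        return true;
    }
}
```

Note that under this rule a script edited by the user is kept only as long as its date stays more recent than the bundled one; a Toolbox update with a newer script would overwrite it.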
transfer the edition macros into the XTZ import module

Later

Integrate the XMLText2MetadataCSV macro content to pull metadata from teiHeaders directly.

Related issues

History

#1 Updated by Matthieu Decorde about 10 years ago

Description updated

#2 Updated by Serge Heiden about 10 years ago

Description updated

#3 Updated by Serge Heiden about 10 years ago

Description updated

#4 Updated by Serge Heiden about 10 years ago

Description updated

#5 Updated by Alexey Lavrentev about 10 years ago

Description updated

#6 Updated by Alexey Lavrentev about 10 years ago

Description updated

#7 Updated by Matthieu Decorde about 10 years ago

Description updated

#8 Updated by Matthieu Decorde about 10 years ago

Description updated

#9 Updated by Matthieu Decorde about 10 years ago

Description updated

#10 Updated by Matthieu Decorde about 10 years ago

Description updated

#11 Updated by Matthieu Decorde about 10 years ago

Description updated

#12 Updated by Matthieu Decorde about 10 years ago

Description updated

#13 Updated by Matthieu Decorde almost 10 years ago

Description updated

#14 Updated by Matthieu Decorde almost 10 years ago

% Done changed from 0 to 70

#15 Updated by Matthieu Decorde almost 10 years ago

Description updated

#16 Updated by Matthieu Decorde almost 10 years ago

Description updated

#17 Updated by Matthieu Decorde over 9 years ago

% Done changed from 70 to 80

#18 Updated by Serge Heiden over 9 years ago

Description updated

#19 Updated by Sebastien Jacquot almost 2 years ago

Status changed from New to Closed

#20 Updated by Sebastien Jacquot almost 2 years ago

% Done changed from 80 to 100
