Bug #2793

XTZ, XML/w and Transcriber Import, XLSX and ODS metadata files not supported when XSL directory is present

Added by Alexey Lavrentev more than 5 years ago. Updated more than a year ago.

Status: Closed    Start date: 08/04/2020
Priority: Urgent    Due date:
Assignee: -    % Done: 100%
Category: Import    Spent time: -
Target version: TXM 0.8.2

Description

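The failure occurs because TXM also applies the XSLT stylesheets of the xsl/ directory to the XLSX and ODS metadata files found in the source directory. These files are ZIP archives, not XML documents, so the XML parser rejects them at their first byte (line 1, column 1). The module works properly with metadata.csv.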
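To illustrate the cause, the following Java sketch (the class SpreadsheetIsNotXml is hypothetical and not part of TXM) builds a small ZIP archive in memory, the container format used by XLSX and ODS files, and feeds it to the JDK's default SAX parser. Since such files begin with the ZIP signature bytes "PK", the parser should fail in the prolog, which is consistent with the line 1, column 1 position reported in the logs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class: shows why an XSLT step cannot read XLSX or ODS files. */
public class SpreadsheetIsNotXml {

    /** Builds a tiny ZIP archive in memory, the container format of XLSX and ODS files. */
    static byte[] fakeSpreadsheet() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write("<table/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** True if the data starts with the ZIP local file header signature "PK\3\4". */
    static boolean looksLikeZip(byte[] data) {
        return data.length >= 4 && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    /** Parses the bytes as XML; returns {line, column} of the parse error, or null if well-formed. */
    static int[] parseErrorPosition(byte[] data) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(data), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return new int[] { e.getLineNumber(), e.getColumnNumber() };
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xlsxLike = fakeSpreadsheet();
        System.out.println("ZIP signature: " + looksLikeZip(xlsxLike));
        int[] pos = parseErrorPosition(xlsxLike);
        System.out.println(pos == null ? "Parsed as XML" : "Parse error at line " + pos[0] + ", column " + pos[1]);
    }
}
```

The parser never gets past the first character, so whatever stylesheet is applied, the XSL step cannot succeed on these files; they have to be excluded before the transformation.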

Starting TXM 0.8.0.2221 (2019-08-30 14h42)…
TXM is ready.
The JOUBERTXTZOG corpus will be created from the /home/alavrent/Bureau/Ex5/joubert-xtz-og directory.
The 'annotate' import parameter has been activated since TreeTagger is installed.
Saving import parameters…
Starting the Groovy import script xtzLoader.groovy.
[[id, auteur, titre, extrait, date, ville], [joubert1579_1-02, Joubert, Laurent, Erreurs populaires, Livre 1, ch. 2, 1579, Bordeaux], [joubert1587_1-02, Joubert, Laurent, Erreurs populaires, Livre 1, ch. 2, 1587, Paris]]
-- Split-Merge XSL Step with /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/1-split-merge
-- Front XSL Step with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front directory.
ApplyXsl2 with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl stylesheet.
-- Applying /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl XSL to 3 (from /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src) files with parameters: {output-directory=file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG/} on directory /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src result written in /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG
003 .
Error on line 1 column 1 of metadata.xlsx:
  SXXP0003: Error reported by XML parser: Content is not allowed in prolog.
net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.xlsx; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:425)
    at net.sf.saxon.event.Sender.send(Sender.java:178)
    at net.sf.saxon.Controller.transform(Controller.java:1790)
    at org.txm.importer.ApplyXsl2.process(ApplyXsl2.java:304)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:437)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:363)
    at org.txm.importer.ApplyXsl2$processImportSources.call(Unknown Source)
    ...
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)
Caused by: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.xlsx; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    ...
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:405)
    ... 37 more
---------
org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.xlsx; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    ...
    at org.txm.scripts.importer.xtz.xtzLoader.run(xtzLoader.groovy:58)
    at groovy.util.GroovyScriptEngine.run(GroovyScriptEngine.java:599)
    at org.txm.groovy.core.GroovyScriptedImportEngine._build(GroovyScriptedImportEngine.java:123)
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)

With an ODS metadata file:

Saving import parameters…
Starting the Groovy import script xtzLoader.groovy.
Warning: the 7the column name is empty
-- Split-Merge XSL Step with /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/1-split-merge
-- Front XSL Step with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front directory.
ApplyXsl2 with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl stylesheet.
-- Applying /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl XSL to 3 (from /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src) files with parameters: {output-directory=file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG/} on directory /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src result written in /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG
003 ...
Error on line 1 column 1 of metadata.ods:
  SXXP0003: Error reported by XML parser: Content is not allowed in prolog.
net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.ods; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:425)
    at net.sf.saxon.event.Sender.send(Sender.java:178)
    at net.sf.saxon.Controller.transform(Controller.java:1790)
    at org.txm.importer.ApplyXsl2.process(ApplyXsl2.java:304)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:437)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:363)
    ...
    at org.txm.scripts.importer.xtz.XTZImport.start(XTZImport.groovy:86)
    at org.txm.importer.xtz.ImportModule.process(ImportModule.java:242)
    at org.txm.importer.xtz.ImportModule$process$2.call(Unknown Source)
    at org.txm.scripts.importer.xtz.xtzLoader.run(xtzLoader.groovy:58)
    at groovy.util.GroovyScriptEngine.run(GroovyScriptEngine.java:599)
    at org.txm.groovy.core.GroovyScriptedImportEngine._build(GroovyScriptedImportEngine.java:123)
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)
Caused by: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.ods; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    ...
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:405)
    ... 31 more
---------
org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.ods; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    ...
    at org.txm.scripts.importer.xtz.XTZImport.start(XTZImport.groovy:86)
    at org.txm.importer.xtz.ImportModule.process(ImportModule.java:242)
    at org.txm.importer.xtz.ImportModule$process$2.call(Unknown Source)
    at org.txm.scripts.importer.xtz.xtzLoader.run(xtzLoader.groovy:58)
    at groovy.util.GroovyScriptEngine.run(GroovyScriptEngine.java:599)
    at org.txm.groovy.core.GroovyScriptedImportEngine._build(GroovyScriptedImportEngine.java:123)
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)

Solution

Update the ApplyXSL file filters so that metadata files are not passed to the XSL steps, and fix the data source selection in the XML/w and XTZ import modules.
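One way to implement such a file filter is sketched below in Java. The class XslSourceFilter and its extension list are hypothetical, and TXM's actual ApplyXsl2 filter may differ: the filter hands only regular files to the XSL step and skips files whose extension marks them as metadata tables (csv, xlsx, ods), compared case-insensitively.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical filter (not TXM's actual ApplyXsl2 code): selects the source files an XSL
 * step should transform and skips metadata tables, which are not XML.
 */
public class XslSourceFilter implements FileFilter {

    // Metadata table formats mentioned in this report; assumption: matched by extension only.
    private static final List<String> METADATA_EXTENSIONS = Arrays.asList("csv", "xlsx", "ods");

    /** Returns the lower-cased extension of a file name, or "" if it has none. */
    static String extension(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Name-only check, usable without touching the file system. */
    static boolean acceptName(String name) {
        return !METADATA_EXTENSIONS.contains(extension(name));
    }

    @Override
    public boolean accept(File file) {
        // Only regular files are candidates; sub-directories such as xsl/ are skipped.
        return file.isFile() && acceptName(file.getName());
    }

    public static void main(String[] args) {
        String[] names = { "t1.xml", "metadata.csv", "metadata.xlsx", "metadata.ODS" };
        for (String name : names) {
            System.out.println(name + " -> " + (acceptName(name) ? "transform" : "skip"));
        }
    }
}
```

A caller would select the XSL inputs with something like `sourceDir.listFiles(new XslSourceFilter())`. A denylist of metadata extensions is used here rather than an allowlist of .xml files because some import modules read sources with other extensions (for example, Transcriber's .trs files).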

Validation test

- Download the attached archive xslodsmetadata.zip.
- Import it with the XML/w and XTZ import modules.
- Expected result: the import does not fail and only one text, "t1", is present.

xslodsmetadata.zip - for validation test (56.08 KB) Matthieu Decorde, 05/06/2020 12:11

xslodsmetadata.zip (92.34 KB) Matthieu Decorde, 27/05/2021 11:59

Associated revisions

Revision 3123
Added by Matthieu Decorde more than 4 years ago

fix xlsx extension filtering refs #2793

History

#1 Updated by Matthieu Decorde more than 5 years ago

  • Category set to Import
  • Target version changed from TXM 0.8.2 to TXM 0.8.1

#2 Updated by Matthieu Decorde more than 5 years ago

  • Subject changed from RCP: 0.8.0, XTZ Import, XLSX and ODS metadata files not supported when XSL directory present to 0.8.0, XTZ Import, XLSX and ODS metadata files not supported when XSL directory is present

#3 Updated by Matthieu Decorde more than 5 years ago

#4 Updated by Serge Heiden more than 4 years ago

  • Subject changed from 0.8.0, XTZ Import, XLSX and ODS metadata files not supported when XSL directory is present to XTZ, XML/w and Transcriber Import, XLSX and ODS metadata files not supported when XSL directory is present
  • Priority changed from Normal to Urgent
  • Target version changed from TXM 0.8.1 to TXM 0.8.2
  • % Done changed from 80 to 0

The bug is still present in the Transcriber import module.

#5 Updated by Matthieu Decorde more than 4 years ago

Updated the attached test directory.

#6 Updated by Sebastien Jacquot more than a year ago

  • % Done changed from 80 to 100

#7 Updated by Sebastien Jacquot more than a year ago

  • Status changed from New to Closed
