Bug #2793

0.8.0, XTZ Import, XLSX and ODS metadata files not supported when XSL directory is present

Added by Alexey Lavrentev 7 months ago. Updated 5 months ago.

Status: New
Start date: 04/08/2020
Priority: Normal
Due date:
Assignee: -
% Done: 80%
Category: Import
Spent time: -
Target version: TXM 0.8.1

Description

This happens because TXM tries to run XSLT transformations on the XLSX and ODS files. The module works properly with metadata.csv.
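
XLSX and ODS files are both ZIP containers, so their first bytes are the signature "PK" followed by 0x03 0x04 rather than XML markup. That is consistent with the parser failing at line 1, column 1 with "Contenu non autorisé dans le prologue" (the French rendering of the Xerces message "Content is not allowed in prolog"). The Java sketch below illustrates this with a small in-memory archive standing in for metadata.xlsx or metadata.ods; the class and method names are hypothetical and are not part of TXM.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only (not TXM code).
final class ZipSignatureCheck {

    /** Returns true when the data starts with the ZIP local file header signature "PK\3\4". */
    static boolean looksLikeZip(byte[] head) {
        return head.length >= 4
                && head[0] == 'P' && head[1] == 'K'
                && head[2] == 3 && head[3] == 4;
    }

    /** Builds a one-entry ZIP archive in memory, standing in for a spreadsheet file. */
    static byte[] sampleZip() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write("<root/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\"?><TEI/>".getBytes(StandardCharsets.UTF_8);
        System.out.println("archive looks like ZIP: " + looksLikeZip(sampleZip()));
        System.out.println("XML looks like ZIP: " + looksLikeZip(xml));
    }
}
```

An XML parser given such a file sees the letter "P" where it expects "<" or an XML declaration, which is why the error is reported at the very first character.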

XLSX:

Démarrage de TXM 0.8.0.2221 (2019-08-30 14h42)…
TXM est prêt.
The JOUBERTXTZOG corpus will be created from the /home/alavrent/Bureau/Ex5/joubert-xtz-og directory.
The 'annotate' import parameter has been activated since TreeTagger is installed.
Sauvegarde des paramètres d'importation…
Démarrage du script d'import Groovy xtzLoader.groovy.
[[id, auteur, titre, extrait, date, ville], [joubert1579_1-02, Joubert, Laurent, Erreurs populaires, Livre 1, ch. 2, 1579, Bordeaux], [joubert1587_1-02, Joubert, Laurent, Erreurs populaires, Livre 1, ch. 2, 1587, Paris]]
-- Split-Merge XSL Step with /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/1-split-merge
-- Front XSL Step with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front directory.
ApplyXsl2 with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl stylesheet.
-- Applying /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl XSL to 3 (from /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src) files with parameters: {output-directory=file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG/} on directory /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src result written in /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG
003 .Error on line 1 column 1 of metadata.xlsx:
  SXXP0003: Error reported by XML parser: Contenu non autorisé dans le prologue.
net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.xlsx; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:425)
    at net.sf.saxon.event.Sender.send(Sender.java:178)
    at net.sf.saxon.Controller.transform(Controller.java:1790)
    at org.txm.importer.ApplyXsl2.process(ApplyXsl2.java:304)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:437)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:363)
    at org.txm.importer.ApplyXsl2$processImportSources.call(Unknown Source)
    ...
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)
Caused by: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.xlsx; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    ...
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:405)
    ... 37 more
---------
org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.xlsx; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    ...
    at org.txm.scripts.importer.xtz.xtzLoader.run(xtzLoader.groovy:58)
    at groovy.util.GroovyScriptEngine.run(GroovyScriptEngine.java:599)
    at org.txm.groovy.core.GroovyScriptedImportEngine._build(GroovyScriptedImportEngine.java:123)
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)

ODS:

Sauvegarde des paramètres d'importation…
Démarrage du script d'import Groovy xtzLoader.groovy.
Warning: the 7the column name is empty
-- Split-Merge XSL Step with /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/1-split-merge
-- Front XSL Step with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front directory.
ApplyXsl2 with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl stylesheet.
-- Applying /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl XSL to 3 (from /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src) files with parameters: {output-directory=file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG/} on directory /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src result written in /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG
003 ...Error on line 1 column 1 of metadata.ods:
  SXXP0003: Error reported by XML parser: Contenu non autorisé dans le prologue.
net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.ods; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:425)
    at net.sf.saxon.event.Sender.send(Sender.java:178)
    at net.sf.saxon.Controller.transform(Controller.java:1790)
    at org.txm.importer.ApplyXsl2.process(ApplyXsl2.java:304)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:437)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:363)
    ...
    at org.txm.scripts.importer.xtz.XTZImport.start(XTZImport.groovy:86)
    at org.txm.importer.xtz.ImportModule.process(ImportModule.java:242)
    at org.txm.importer.xtz.ImportModule$process$2.call(Unknown Source)
    at org.txm.scripts.importer.xtz.xtzLoader.run(xtzLoader.groovy:58)
    at groovy.util.GroovyScriptEngine.run(GroovyScriptEngine.java:599)
    at org.txm.groovy.core.GroovyScriptedImportEngine._build(GroovyScriptedImportEngine.java:123)
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)
Caused by: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.ods; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    ...
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:405)
    ... 31 more
---------
org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.ods; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    a...
    at org.txm.scripts.importer.xtz.XTZImport.start(XTZImport.groovy:86)
    at org.txm.importer.xtz.ImportModule.process(ImportModule.java:242)
    at org.txm.importer.xtz.ImportModule$process$2.call(Unknown Source)
    at org.txm.scripts.importer.xtz.xtzLoader.run(xtzLoader.groovy:58)
    at groovy.util.GroovyScriptEngine.run(GroovyScriptEngine.java:599)
    at org.txm.groovy.core.GroovyScriptedImportEngine._build(GroovyScriptedImportEngine.java:123)
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)

Solution

Update the ApplyXSL file filters and fix data source selection in the XML/w and XTZ import modules.
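
As a rough illustration of what such a file filter could look like, here is a minimal Java sketch. It assumes that only files with an .xml extension should be passed to the XSL steps. That assumption, along with the class and method names, is hypothetical and does not reflect the actual ApplyXsl2 code.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

// Hypothetical filter, for illustration only (not the actual TXM fix).
final class XmlSourceFilter implements FileFilter {

    /** Accepts visible regular files whose name ends in ".xml". */
    @Override
    public boolean accept(File file) {
        return !file.isDirectory() && !file.isHidden() && acceptsName(file.getName());
    }

    /** Name-only check, case-insensitive, so metadata.xlsx, metadata.ods and metadata.csv are skipped. */
    static boolean acceptsName(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".xml");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"t1.xml", "metadata.xlsx", "metadata.ods", "metadata.csv"}) {
            System.out.println(name + " accepted: " + acceptsName(name));
        }
    }
}
```

A filter like this could be passed to `File.listFiles(FileFilter)` when collecting the source files, so that metadata spreadsheets in the source directory never reach the XSLT processor.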

Validation test

- download attached archive: xslodsmetadata.zip
- import with XML/w and XTZ
- the import does not fail and only one text, "t1", is present

xslodsmetadata.zip - for validation test (56.1 kB) Matthieu Decorde, 06/05/2020 12:11 pm

History

#1 Updated by Matthieu Decorde 7 months ago

  • Category set to Import
  • Target version changed from TXM 0.8.2 to TXM 0.8.1

#2 Updated by Matthieu Decorde 6 months ago

  • Subject changed from RCP: 0.8.0, XTZ Import, XLSX and ODS metadata files not supported when XSL directory present to 0.8.0, XTZ Import, XLSX and ODS metadata files not supported when XSL directory is present

#3 Updated by Matthieu Decorde 5 months ago
