Bug #2793

XTZ, XML/w and Transcriber Import, XLSX and ODS metadata files not supported when XSL directory is present

Added by Alexey Lavrentev over 1 year ago. Updated 5 months ago.

Status: New
Start date: 04/08/2020
Priority: Urgent
Due date:
Assignee: -
% Done: 80%
Category: Import
Spent time: -
Target version: TXM 0.8.2

Description

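TXM tries to run the XSLT transformations on the XLSX and ODS metadata files in the source directory. These files are not XML, so the XML parser fails at line 1, column 1 with "Contenu non autorisé dans le prologue" ("Content is not allowed in prolog"). The module works properly with metadata.csv.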

Démarrage de TXM 0.8.0.2221 (2019-08-30 14h42)…
TXM est prêt.
The JOUBERTXTZOG corpus will be created from the /home/alavrent/Bureau/Ex5/joubert-xtz-og directory.
The 'annotate' import parameter has been activated since TreeTagger is installed.
Sauvegarde des paramètres d'importation…
Démarrage du script d'import Groovy xtzLoader.groovy.
[[id, auteur, titre, extrait, date, ville], [joubert1579_1-02, Joubert, Laurent, Erreurs populaires, Livre 1, ch. 2, 1579, Bordeaux], [joubert1587_1-02, Joubert, Laurent, Erreurs populaires, Livre 1, ch. 2, 1587, Paris]]
-- Split-Merge XSL Step with /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/1-split-merge
-- Front XSL Step with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front directory.
ApplyXsl2 with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl stylesheet.
-- Applying /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl XSL to 3 (from /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src) files with parameters: {output-directory=file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG/} on directory /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src result written in /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG
003 .Error on line 1 column 1 of metadata.xlsx:
  SXXP0003: Error reported by XML parser: Contenu non autorisé dans le prologue.
net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.xlsx; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:425)
    at net.sf.saxon.event.Sender.send(Sender.java:178)
    at net.sf.saxon.Controller.transform(Controller.java:1790)
    at org.txm.importer.ApplyXsl2.process(ApplyXsl2.java:304)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:437)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:363)
    at org.txm.importer.ApplyXsl2$processImportSources.call(Unknown Source)
    ...
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)
Caused by: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.xlsx; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    ...
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:405)
    ... 37 more
---------
org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.xlsx; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    ...
    at org.txm.scripts.importer.xtz.xtzLoader.run(xtzLoader.groovy:58)
    at groovy.util.GroovyScriptEngine.run(GroovyScriptEngine.java:599)
    at org.txm.groovy.core.GroovyScriptedImportEngine._build(GroovyScriptedImportEngine.java:123)
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)

ODS:

Sauvegarde des paramètres d'importation…
Démarrage du script d'import Groovy xtzLoader.groovy.
Warning: the 7the column name is empty
-- Split-Merge XSL Step with /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/1-split-merge
-- Front XSL Step with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front directory.
ApplyXsl2 with the /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl stylesheet.
-- Applying /home/alavrent/Bureau/Ex5/joubert-xtz-og/xsl/2-front/01-txm-front-teip5-og-xtz-joubert-removeAncor.xsl XSL to 3 (from /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src) files with parameters: {output-directory=file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG/} on directory /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src result written in /home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/txm/JOUBERTXTZOG
003 ...Error on line 1 column 1 of metadata.ods:
  SXXP0003: Error reported by XML parser: Contenu non autorisé dans le prologue.
net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.ods; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:425)
    at net.sf.saxon.event.Sender.send(Sender.java:178)
    at net.sf.saxon.Controller.transform(Controller.java:1790)
    at org.txm.importer.ApplyXsl2.process(ApplyXsl2.java:304)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:437)
    at org.txm.importer.ApplyXsl2.processImportSources(ApplyXsl2.java:363)
    ...
    at org.txm.scripts.importer.xtz.XTZImport.start(XTZImport.groovy:86)
    at org.txm.importer.xtz.ImportModule.process(ImportModule.java:242)
    at org.txm.importer.xtz.ImportModule$process$2.call(Unknown Source)
    at org.txm.scripts.importer.xtz.xtzLoader.run(xtzLoader.groovy:58)
    at groovy.util.GroovyScriptEngine.run(GroovyScriptEngine.java:599)
    at org.txm.groovy.core.GroovyScriptedImportEngine._build(GroovyScriptedImportEngine.java:123)
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)
Caused by: org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.ods; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    ...
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:405)
    ... 31 more
---------
org.xml.sax.SAXParseException; systemId: file:/home/alavrent/TXM-0.8.0/corpora/JOUBERTXTZOG/src/metadata.ods; lineNumber: 1; columnNumber: 1; Contenu non autorisé dans le prologue.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    ...
    at org.txm.scripts.importer.xtz.XTZImport.start(XTZImport.groovy:86)
    at org.txm.importer.xtz.ImportModule.process(ImportModule.java:242)
    at org.txm.importer.xtz.ImportModule$process$2.call(Unknown Source)
    at org.txm.scripts.importer.xtz.xtzLoader.run(xtzLoader.groovy:58)
    at groovy.util.GroovyScriptEngine.run(GroovyScriptEngine.java:599)
    at org.txm.groovy.core.GroovyScriptedImportEngine._build(GroovyScriptedImportEngine.java:123)
    at org.txm.core.engines.ScriptedImportEngine.build(ScriptedImportEngine.java:56)
    at org.txm.objects.Project._compute(Project.java:320)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2224)
    at org.txm.core.results.TXMResult.compute(TXMResult.java:2143)
    at org.txm.rcp.handlers.scripts.ExecuteImportScript$2.run(ExecuteImportScript.java:146)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)
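
The parser error in both traces can probably be reproduced outside TXM. The sketch below is not TXM code: it feeds the first bytes of a ZIP archive (XLSX and ODS files are ZIP containers) to the JDK SAX parser and reports where parsing stops. The class name `NonXmlParseDemo` and the helper `parsePosition` are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class NonXmlParseDemo {

    /**
     * Hypothetical helper: returns "line:column" of the first fatal parse
     * error, or "ok" when the bytes are well-formed XML.
     */
    static String parsePosition(byte[] content) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(content), new DefaultHandler());
            return "ok";
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        // ZIP local file header signature: 'P' 'K' 0x03 0x04.
        byte[] zipHeader = {'P', 'K', 3, 4};
        System.out.println("zip header: " + parsePosition(zipHeader));

        // A minimal well-formed document for comparison.
        byte[] xml = "<text id=\"t1\"/>".getBytes(StandardCharsets.UTF_8);
        System.out.println("xml:        " + parsePosition(xml));
    }
}
```

The ZIP header is rejected before any element is read, which matches the "line 1 column 1" position in the Saxon messages above. The stylesheet itself is therefore unlikely to be the cause.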

Solution

Update the ApplyXSL file filters and fix the data source selection in the XML/w and XTZ import modules.
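
A minimal sketch of the kind of file filter meant here, assuming the XSL step should only receive files whose name ends with ".xml". This is not the actual ApplyXsl2 code; the class `XslSourceFilter` and its methods are hypothetical, and the real filter may use a different rule (for example, excluding known metadata extensions).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class XslSourceFilter {

    /** Hypothetical rule: a source is XML when its name ends with ".xml" (case-insensitive). */
    static boolean isXmlSource(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".xml");
    }

    /** Lists the regular files of srcDir that would be passed to the XSL step, sorted by path. */
    static List<Path> selectXmlSources(Path srcDir) throws IOException {
        try (Stream<Path> files = Files.list(srcDir)) {
            return files.filter(Files::isRegularFile)
                        .filter(XslSourceFilter::isXmlSource)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway source directory similar to the one in the report.
        Path dir = Files.createTempDirectory("src");
        for (String name : new String[] {"t1.xml", "metadata.xlsx", "metadata.ods", "metadata.csv"}) {
            Files.createFile(dir.resolve(name));
        }
        // Only t1.xml should remain; the metadata files are skipped.
        System.out.println(selectXmlSources(dir));
    }
}
```

An allow-list on ".xml" is the simplest choice for a sketch. A deny-list of metadata extensions would keep XML sources with unusual extensions, but it has to be updated whenever a new metadata format is supported.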

Validation test

- download the attached archive: xslodsmetadata.zip
- import it with the XML/w and XTZ import modules
- the import does not fail and only one text, "t1", is present

xslodsmetadata.zip - for validation test (56.1 kB) Matthieu Decorde, 06/05/2020 12:11 pm

xslodsmetadata.zip (92.3 kB) Matthieu Decorde, 05/27/2021 11:59 am

Associated revisions

Revision 3123
Added by Matthieu Decorde 5 months ago

fix xlsx extension filtering refs #2793

History

#1 Updated by Matthieu Decorde over 1 year ago

  • Category set to Import
  • Target version changed from TXM 0.8.2 to TXM 0.8.1

#2 Updated by Matthieu Decorde over 1 year ago

  • Subject changed from RCP: 0.8.0, XTZ Import, XLSX and ODS metadata files not supported when XSL directory present to 0.8.0, XTZ Import, XLSX and ODS metadata files not supported when XSL directory is present

#3 Updated by Matthieu Decorde over 1 year ago

#4 Updated by Serge Heiden 5 months ago

  • Subject changed from 0.8.0, XTZ Import, XLSX and ODS metadata files not supported when XSL directory is present to XTZ, XML/w and Transcriber Import, XLSX and ODS metadata files not supported when XSL directory is present
  • Priority changed from Normal to Urgent
  • Target version changed from TXM 0.8.1 to TXM 0.8.2
  • % Done changed from 80 to 0

Bug still present in Transcriber import module.

#5 Updated by Matthieu Decorde 5 months ago

Updated the attached test directory.
