Bug #2096

RCP: 0.7.8, ODT/DOC/RTF + CSV import module breaks on source file name containing special characters (blanks?)

Added by Serge Heiden over 6 years ago. Updated over 4 years ago.

Status: New
Start date: 04/01/2017
Priority: Normal
Due date: -
Assignee: -
% Done: 0%
Category: Import
Spent time: -
Target version: TXM X.X

Description

In the following configuration:

  TXM    0.7.8.201702141439    org.txm.rcp    null

Importing a DOCX file named "ANTRACT - proposition détaillée V4.docx" produces the following error:

  Saving import parameters...
  -- CONVERTER - Converting source files
  .** ODT TO TEI /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/workflowdocx8561635276831476395sfsdf.odt /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/ANTRACT - proposition détaillée V4.xml
  Error
    I/O error reported by XML parser processing /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/files-ANTRACT - proposition détaillée V4.xml/content.xml: no protocol: /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/files-ANTRACT - proposition détaillée V4.xml/content.xml
  DOCX to ODT to TEI failed: /home/sheiden/Documents/antract/antract/ANTRACT - proposition détaillée V4.docx: net.sf.saxon.trans.XPathException: I/O error reported by XML parser processing /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/files-ANTRACT - proposition détaillée V4.xml/content.xml: no protocol: /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/files-ANTRACT - proposition détaillée V4.xml/content.xml
  Apr 01, 2017 1:31:04 PM org.artofsolving.jodconverter.office.ProcessPoolOfficeManager stop
  INFO: stopping
  Apr 01, 2017 1:31:04 PM org.artofsolving.jodconverter.office.OfficeConnection$1 disposing
  INFO: disconnected: 'socket,host=127.0.0.1,port=2002,tcpNoDelay=1'
  Apr 01, 2017 1:31:05 PM org.artofsolving.jodconverter.office.ManagedOfficeProcess doEnsureProcessExited
  INFO: process exited with code 0
  Apr 01, 2017 1:31:05 PM org.artofsolving.jodconverter.office.ProcessPoolOfficeManager stop
  INFO: stopped

  Retrieving data folders and style files
  [/home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/files-ANTRACT - proposition détaillée V4.xml, /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/ANTRACT - proposition détaillée V4.xml]
  [Fatal Error] ANTRACT - proposition détaillée V4.xml:1:1: Premature end of file.
  ** Error while executing the Groovy script: org.xml.sax.SAXParseException; systemId: file:///home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/ANTRACT - proposition détaillée V4.xml; lineNumber: 1; columnNumber: 1; Premature end of file.
  Search engine started.

If the file is renamed ANTRACTV4.docx, the import works correctly.
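To see why the blanks in the name may matter, assuming (as the Saxon "no protocol" message suggests) that the path of content.xml ends up being treated as a URI somewhere in the conversion, the following Java sketch checks how java.net.URI handles the two names. It only illustrates JDK behaviour and is not TXM code; `UriCheck` is a hypothetical helper, and the renamed path assumes the same "files-" naming scheme seen in the log.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Paths;

final class UriCheck {

    // Returns true if java.net.URI accepts the raw string unchanged as a URI reference.
    static boolean isValidUriReference(String raw) {
        try {
            new URI(raw);
            return true;
        } catch (URISyntaxException e) {
            System.out.println("rejected: " + e.getMessage());
            return false;
        }
    }

    // Converts a file-system path into an escaped file: URI (each blank becomes %20).
    static URI toFileUri(String rawPath) {
        return Paths.get(rawPath).toUri();
    }

    public static void main(String[] args) {
        // Path copied from the log in this report.
        String withBlanks = "/home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/"
                + "files-ANTRACT - proposition détaillée V4.xml/content.xml";
        // Hypothetical path for the renamed ANTRACTV4.docx (assumed naming scheme).
        String withoutBlanks = "/home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/"
                + "files-ANTRACTV4.xml/content.xml";

        System.out.println(isValidUriReference(withBlanks));    // false: a blank is illegal in a URI path
        System.out.println(isValidUriReference(withoutBlanks)); // true
        System.out.println(toFileUri(withBlanks));              // blanks percent-encoded as %20
    }
}
```

Per the java.net.URI documentation, non-ASCII characters such as "é" are permitted, so in this sketch it is the blank that gets rejected. Converting the path with Path.toUri() (or File.toURI()) before handing it to the XML parser yields a valid file: URI whether or not the name contains blanks.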

Diagnostics

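  • the Saxon "no protocol" error (the message of a java.net.MalformedURLException) suggests a problem in how the file is addressed for opening: the path of content.xml seems to be handled as a raw file-system path where a URL is expected, and a path containing blanks is not a valid URI.
  • Remark: even though the XML file extracted from the DOCX ends up empty, the import module carries on. This is an error: the import script should detect the situation and stop with an error.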
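The remark above asks for the import to stop as soon as a converted file is unusable. A minimal Java sketch of such a guard follows, assuming it is run on each converted XML file before the next import step. `ConversionGuard` and `checkConvertedXml` are hypothetical names, not TXM API; the actual import script reported in the log is a Groovy script.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class ConversionGuard {

    // Hypothetical check: throws if the converter produced no file, an empty file or malformed XML.
    static void checkConvertedXml(Path xml) throws IOException {
        if (!Files.isRegularFile(xml)) {
            throw new IOException("conversion produced no file: " + xml);
        }
        if (Files.size(xml) == 0) {
            throw new IOException("conversion produced an empty file: " + xml);
        }
        try {
            // Parse the whole document once; a SAXException means it is not well-formed.
            // SAXParser.parse(File, ...) turns the File into an escaped URI itself, so blanks are handled.
            SAXParserFactory.newInstance().newSAXParser().parse(xml.toFile(), new DefaultHandler());
        } catch (SAXException | ParserConfigurationException e) {
            throw new IOException("conversion produced malformed XML: " + xml, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("guard");
        // A name with blanks, mimicking the report; the file is left empty on purpose.
        Path empty = Files.createFile(dir.resolve("ANTRACT - proposition V4.xml"));
        Path good = Files.write(dir.resolve("ok.xml"), "<TEI/>".getBytes(StandardCharsets.UTF_8));
        for (Path p : new Path[] {empty, good}) {
            try {
                checkConvertedXml(p);
                System.out.println("ok: " + p.getFileName());
            } catch (IOException e) {
                System.out.println("import stopped: " + e.getMessage());
            }
            Files.delete(p);
        }
        Files.delete(dir);
    }
}
```

Checking well-formedness as well as size also catches truncated output, and the failure names the offending converted file instead of surfacing later as a generic SAXParseException in another step.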

History

#1 Updated by Serge Heiden over 6 years ago

  • File deleted (bug-xml-editor-grammar.png)

#2 Updated by Sebastien Jacquot about 5 years ago

  • Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0

#3 Updated by Matthieu Decorde over 4 years ago

  • Target version changed from TXM 0.8.0 to TXM X.X
