Bug #2096
RCP: 0.7.8, ODT/DOC/RTF + CSV import module breaks on source file name containing special characters (blanks?)
Status: | New | Start date: | 04/01/2017 |
---|---|---|---|
Priority: | Normal | Due date: | |
Assignee: | - | % Done: | 0% |
Category: | Import | Spent time: | - |
Target version: | TXM X.X | | |
Description
In the following configuration:
TXM 0.7.8.201702141439 org.txm.rcp null
Importing a DOCX file named "ANTRACT - proposition détaillée V4.docx" produces the following error:
Sauvegarde des paramètres d'importation...
-- CONVERTER - Converting source files
.** ODT TO TEI /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/workflowdocx8561635276831476395sfsdf.odt /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/ANTRACT - proposition détaillée V4.xml
Error I/O error reported by XML parser processing /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/files-ANTRACT - proposition détaillée V4.xml/content.xml: no protocol: /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/files-ANTRACT - proposition détaillée V4.xml/content.xml
DOCX to ODT to TEI failed: /home/sheiden/Documents/antract/antract/ANTRACT - proposition détaillée V4.docx: net.sf.saxon.trans.XPathException: I/O error reported by XML parser processing /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/files-ANTRACT - proposition détaillée V4.xml/content.xml: no protocol: /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/files-ANTRACT - proposition détaillée V4.xml/content.xml
avr. 01, 2017 1:31:04 PM org.artofsolving.jodconverter.office.ProcessPoolOfficeManager stop
INFOS: stopping
avr. 01, 2017 1:31:04 PM org.artofsolving.jodconverter.office.OfficeConnection$1 disposing
INFOS: disconnected: 'socket,host=127.0.0.1,port=2002,tcpNoDelay=1'
avr. 01, 2017 1:31:05 PM org.artofsolving.jodconverter.office.ManagedOfficeProcess doEnsureProcessExited
INFOS: process exited with code 0
avr. 01, 2017 1:31:05 PM org.artofsolving.jodconverter.office.ProcessPoolOfficeManager stop
INFOS: stopped
Retrieving data folders and style files
[/home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/files-ANTRACT - proposition détaillée V4.xml, /home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/ANTRACT - proposition détaillée V4.xml]
[Fatal Error] ANTRACT - proposition détaillée V4.xml:1:1: Fin prématurée du fichier.
** Erreur lors de l'exécution du script groovy : org.xml.sax.SAXParseException; systemId: file:///home/sheiden/TXM/corpora/ANTRACT/txm/ANTRACT/ANTRACT - proposition détaillée V4.xml; lineNumber: 1; columnNumber: 1; Fin prématurée du fichier.
Moteur de recherche lancé.
If the file is renamed to "ANTRACTV4.docx", the import works correctly.
Diagnostics
- The Saxon "no protocol" URL error suggests a problem in how the file is addressed when it is opened: the path in the message contains blanks and carries no protocol prefix.
- Remark: even though the file extracted from the DOCX ends up empty, the import module continues. This is an error: the import script should diagnose the situation and stop on error.
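The two diagnostics above can be illustrated with a small, self-contained Java sketch. It is not taken from the TXM sources: the class ImportPathCheck and its methods are hypothetical names, and the assumption that the failing code hands a raw path string to the XML parser is only a guess based on the "no protocol" message. The sketch shows that a path containing a blank is not a valid URI as-is, that File.toURI() produces a usable "file:" identifier, and how an empty intermediate file could be rejected before the next import step.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, NOT part of TXM: illustrates the two diagnostics.
class ImportPathCheck {

    // Returns true if the raw path string can be parsed as a URI as-is.
    // A path containing a blank cannot, which is one way a parser can end
    // up with an unusable system identifier.
    static boolean isValidRawUri(String rawPath) {
        try {
            new URI(rawPath);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Builds a system identifier through File.toURI(), which adds the
    // "file:" scheme and percent-encodes blanks.
    static String toSystemId(File file) {
        return file.toURI().toString();
    }

    // Fails fast when an intermediate file is missing or empty, instead of
    // passing it on to the next import step.
    static void requireNonEmpty(Path file) throws IOException {
        if (!Files.isRegularFile(file) || Files.size(file) == 0) {
            throw new IOException("Conversion produced no content: " + file);
        }
    }

    public static void main(String[] args) throws IOException {
        // Illustrative path only; the directory is an assumption.
        String raw = "/tmp/ANTRACT - proposition détaillée V4.xml";
        System.out.println("raw path is a valid URI: " + isValidRawUri(raw));
        System.out.println("system id: " + toSystemId(new File(raw)));

        // Simulate a conversion step that left an empty output file.
        Path empty = Files.createTempFile("empty output", ".xml");
        try {
            requireNonEmpty(empty);
        } catch (IOException e) {
            System.out.println("Stopping import: " + e.getMessage());
        } finally {
            Files.delete(empty);
        }
    }
}
```

Whether the actual fix belongs in the Groovy import script or in the converter call is not established here; the sketch only shows the kind of check and conversion that would address both remarks.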
History
#1 Updated by Serge Heiden over 6 years ago
- File deleted (bug-xml-editor-grammar.png)
#2 Updated by Sebastien Jacquot about 5 years ago
- Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0
#3 Updated by Matthieu Decorde over 4 years ago
- Target version changed from TXM 0.8.0 to TXM X.X