Task #3342

Load, corpus restoration, ignore broken/deleted corpora

Added by Serge Heiden 4 months ago. Updated about 1 month ago.

Status: New
Start date: 02/15/2023
Priority: Normal
Due date:
Assignee: -
% Done: 80%
Category: Corpus
Spent time: -
Target version: TXM 0.8.3

Description

Currently, TXM tries to restore deleted corpora and prints a stack trace before stopping the load.

Here is a typical sample console log:

Binary corpus /home/sheiden/TXM-0.8.2/corpora/CAPITAINEFRACASSE is in 0.8.0 format.
Binary corpus /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST is in 0.8.0 format.
** Error: the input directory /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST does not conform to the TXM binary corpus format: corpus ignored.
TXM needs the directories /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST/HTML, /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST/data, /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST/registry and /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST/.settings.
** Failed to load the corpus from directory /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST. Could not find the required components.
Stacktrace: 
[1]           org.txm.rcp.commands.workspace.Load080BinaryCorpus.         loadBinaryCorpusAsDirectory  Load080BinaryCorpus.java, 158
[2]              org.txm.rcp.commands.workspace.LoadBinaryCorpus.         loadBinaryCorpusAsDirectory  LoadBinaryCorpus.java, 362
[3]    org.txm.rcp.commands.workspace.LoadBinaryCorporaDirectory.loadBinaryCorpusFromCorporaDirectory  LoadBinaryCorporaDirectory.java, 380
[4]  org.txm.rcp.commands.workspace.LoadBinaryCorporaDirectory$1.                                 run  LoadBinaryCorporaDirectory.java, 168
Binary corpus /home/sheiden/TXM-0.8.2/corpora/ELEMENTS-HOBBES is in 0.8.0 format.
Binary corpus /home/sheiden/TXM-0.8.2/corpora/ELEMENTS-HOBBES-03-09 is in 0.8.0 format.

The CONLLU-TEST corpus should be ignored or at least no stacktrace should be displayed (it is not a TXM internal error).

Solution

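Detect whether the corpus directory is well-formed, i.e. contains the HTML, data, registry and .settings subdirectories named in the log above, before trying to load the corpus. A malformed directory is skipped without a stack trace.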
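
The pre-load check could be sketched roughly as follows. This is only an illustration, not TXM's actual code: the class CorpusDirectoryCheck and its methods are hypothetical names, and the list of required subdirectories is taken from the console log above rather than from the TXM sources.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the TXM code base. */
final class CorpusDirectoryCheck {

    // Subdirectories the console log reports as required (assumed to be the complete list).
    static final List<String> REQUIRED = List.of("HTML", "data", "registry", ".settings");

    /** Returns the names of the required subdirectories that are absent from corpusDir. */
    static List<String> missingParts(Path corpusDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isDirectory(corpusDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** True if corpusDir is a directory containing every required subdirectory. */
    static boolean isWellFormed(Path corpusDir) {
        return Files.isDirectory(corpusDir) && missingParts(corpusDir).isEmpty();
    }

    /**
     * Lists the well-formed corpus directories directly under corporaDir.
     * Malformed directories are skipped with a one-line message, no stack trace.
     */
    static List<Path> loadableCorpora(Path corporaDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries =
                 Files.newDirectoryStream(corporaDir, p -> Files.isDirectory(p))) {
            for (Path dir : entries) {
                List<String> missing = missingParts(dir);
                if (missing.isEmpty()) {
                    result.add(dir);
                } else {
                    System.out.println("Skipping " + dir + ": missing " + missing);
                }
            }
        }
        return result;
    }
}
```

With a check of this kind run first, a directory such as CONLLU-TEST would be reported once and left out of the list passed to the loader, so the loader itself never reaches the point where it currently fails.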

History

#1 Updated by Matthieu Decorde about 1 month ago

  • Subject changed from Load, corpus restoration, ignore deleted corpora to Load, corpus restoration, ignore broken/deleted corpora
  • Description updated
  • % Done changed from 0 to 80
