Task #3342
Load, corpus restoration, ignore broken/deleted corpora
Status: | New | Start date: | 02/15/2023
---|---|---|---
Priority: | Normal | Due date: |
Assignee: | - | % Done: | 80%
Category: | Corpus | Spent time: | -
Target version: | TXM 0.8.3 | |
Description
Currently, TXM tries to restore deleted or broken corpora and prints a stack trace before stopping the load.
Here is a typical sample console log:
Le corpus binaire /home/sheiden/TXM-0.8.2/corpora/CAPITAINEFRACASSE est au format 0.8.0.
Le corpus binaire /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST est au format 0.8.0.
** Erreur : le dossier d'entrée /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST n'est pas conforme au format de corpus binaire de TXM : corpus ignoré. TXM a besoin des dossiers /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST/HTML, /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST/data, /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST/registry et /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST/.settings.
** Échec du chargement du corpus à partir du dossier /home/sheiden/TXM-0.8.2/corpora/CONLLU-TEST. Impossible de trouver les composants nécessaires.
Stacktrace:
[1] org.txm.rcp.commands.workspace.Load080BinaryCorpus.loadBinaryCorpusAsDirectory Load080BinaryCorpus.java, 158
[2] org.txm.rcp.commands.workspace.LoadBinaryCorpus.loadBinaryCorpusAsDirectory LoadBinaryCorpus.java, 362
[3] org.txm.rcp.commands.workspace.LoadBinaryCorporaDirectory.loadBinaryCorpusFromCorporaDirectory LoadBinaryCorporaDirectory.java, 380
[4] org.txm.rcp.commands.workspace.LoadBinaryCorporaDirectory$1.run LoadBinaryCorporaDirectory.java, 168
Le corpus binaire /home/sheiden/TXM-0.8.2/corpora/ELEMENTS-HOBBES est au format 0.8.0.
Le corpus binaire /home/sheiden/TXM-0.8.2/corpora/ELEMENTS-HOBBES-03-09 est au format 0.8.0.
The CONLLU-TEST corpus should be ignored, or at least no stack trace should be displayed, since this is not a TXM internal error.
Solution
Check that the directory is well-formed before trying to load the corpus; a malformed directory is skipped.
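A minimal sketch of such a pre-check, assuming that "well-formed" means the four subdirectories named in the console message above (HTML, data, registry and .settings) are all present. The class and method names here are hypothetical and are not taken from the TXM code base; the real check in TXM may test more than this.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of TXM: it only illustrates the pre-check idea.
final class BinaryCorpusDirectoryCheck {

    // Subdirectories listed in the error message of the console log (assumed to be the full requirement).
    static final String[] REQUIRED_SUBDIRECTORIES = { "HTML", "data", "registry", ".settings" };

    // Returns the names of the required subdirectories that are absent from corpusDirectory.
    static List<String> missingSubdirectories(File corpusDirectory) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED_SUBDIRECTORIES) {
            if (!new File(corpusDirectory, name).isDirectory()) {
                missing.add(name);
            }
        }
        return missing;
    }

    // A directory is considered well-formed when it exists and nothing is missing.
    static boolean isWellFormed(File corpusDirectory) {
        return corpusDirectory.isDirectory() && missingSubdirectories(corpusDirectory).isEmpty();
    }

    // Lists the corpus directories worth loading; malformed ones are reported in one line and skipped.
    static List<File> loadableCorpora(File corporaDirectory) {
        List<File> loadable = new ArrayList<>();
        File[] children = corporaDirectory.listFiles(File::isDirectory);
        if (children == null) {
            return loadable; // not a directory, or an I/O error occurred
        }
        for (File child : children) {
            List<String> missing = missingSubdirectories(child);
            if (missing.isEmpty()) {
                loadable.add(child);
            } else {
                System.out.println("Corpus ignored: " + child + " (missing: " + missing + ")");
            }
        }
        return loadable;
    }
}
```

With a check of this kind run before the load step, a directory such as CONLLU-TEST would produce a single "corpus ignored" message and no stack trace, because the loader is never invoked on it.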
History
#1 Updated by Matthieu Decorde about 1 month ago
- Subject changed from Load, corpus restoration, ignore deleted corpora to Load, corpus restoration, ignore broken/deleted corpora
- Description updated
- % Done changed from 0 to 80