Bug #763
RCP: 0.7.5: Fix concordance export memory usage
Status: | Closed | Start date: | 04/24/2014
---|---|---|---
Priority: | Normal | Due date: |
Assignee: | - | % Done: | 100%
Category: | Commands | Spent time: | -
Target version: | TXM 0.7.6 | |
Description
The current concordance export loads all concordance lines into memory before writing them to the CSV file, which exhausts memory for large concordance results.
Solution
A solution is to write the concordance lines in packets, so that packets already written can be garbage-collected and memory consumption stays flat instead of growing with the number of lines.
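A minimal Java sketch of packet-based export, assuming a result object that can return a sub-range of lines on demand. This is not TXM's actual Concordance.toTxt(...) code: the LineSource interface, exportInPackets and syntheticSource are hypothetical names, and the tab separator and the absence of field quoting are assumptions made to keep the example short.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class PacketExport {

    /** Hypothetical accessor for a concordance result (not TXM's API). */
    public interface LineSource {
        int size();
        /** Returns the lines with indices in [start, end). */
        List<String[]> getLines(int start, int end);
    }

    /** Writes a header, then the result lines packet by packet (tab-separated, unquoted). */
    public static long exportInPackets(LineSource source, String[] header,
                                       Writer out, int packetSize) throws IOException {
        if (packetSize <= 0) throw new IllegalArgumentException("packetSize must be > 0");
        out.write(String.join("\t", header));
        out.write('\n');
        long written = 1; // header line
        int total = source.size();
        for (int start = 0; start < total; start += packetSize) {
            int end = Math.min(start + packetSize, total);
            // Only the current packet is referenced here; the previous one
            // becomes unreachable and can be garbage-collected.
            List<String[]> packet = source.getLines(start, end);
            for (String[] fields : packet) {
                out.write(String.join("\t", fields));
                out.write('\n');
                written++;
            }
            out.flush(); // push this packet to the underlying stream
        }
        return written;
    }

    /** Illustrative source that builds each packet on demand instead of holding all lines. */
    public static LineSource syntheticSource(int n) {
        return new LineSource() {
            public int size() { return n; }
            public List<String[]> getLines(int start, int end) {
                List<String[]> lines = new ArrayList<>(end - start);
                for (int i = start; i < end; i++) {
                    lines.add(new String[] {"left" + i, "kw" + i, "right" + i});
                }
                return lines;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        long n = exportInPackets(syntheticSource(12),
                new String[] {"Left", "Keyword", "Right"}, sw, 5);
        System.out.println(n); // prints 13: 12 result lines + 1 header
    }
}
```

The memory gain only materializes if getLines(start, end) builds each packet lazily; if the source already holds every line in one list, splitting the writes into packets does not lower peak memory. The demo uses a packet size of 5 so that packet boundaries are exercised on a tiny input.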
Two development steps:
A- Create a macro (to provide a quick answer)
B- Fix Concordance.toTxt(...) in the sources and produce an update
Macro acceptance test
- Download the macro archive (attachment)
- Copy the "export" folder contained in the archive into the TXM macro folder ($TXMHOME/scripts/macro, not "macros"!)
- Run a concordance of "[]" on BROWN
- Run the macro on the concordance
- Check that the number of lines (wc -l) equals the number of results + 1 (the header)
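The line-count check in the last step can also be scripted. The Java sketch below counts newline bytes the way wc -l does (a '\n' byte never occurs inside a multi-byte UTF-8 sequence, so this is safe for UTF-8 files). ExportCheck, countNewlines and matchesResultCount are hypothetical names, and the expected count assumes exactly one header line and no line breaks inside fields.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExportCheck {

    /** Counts '\n' bytes in the file, which is what `wc -l` reports. */
    public static long countNewlines(Path file) throws IOException {
        long count = 0;
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') count++;
                }
            }
        }
        return count;
    }

    /** True if the export has one line per result plus one header line. */
    public static boolean matchesResultCount(Path file, long resultCount) throws IOException {
        return countNewlines(file) == resultCount + 1;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ExportCheck <exported file> <number of concordance results>
        Path file = Paths.get(args[0]);
        long results = Long.parseLong(args[1]);
        long lines = countNewlines(file);
        System.out.println(lines + " lines, expected " + (results + 1)
                + (lines == results + 1 ? " (OK)" : " (MISMATCH)"));
    }
}
```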
Update acceptance test
- ...
- Run a concordance of "[]" on BROWN
- Run the macro on the concordance
- Check that the number of lines (wc -l) equals the number of results + 1 (the header)
History
#1 Updated by Matthieu Decorde almost 9 years ago
Packet size will be set to 5000.
R code to plot the measured export time (in milliseconds) against packet size (in lines), for a concordance of 900k lines on a standard Linux workstation:
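size <- c(10, 100, 1000, 5000, 10000, 50000, 100000)       # packet size (lines)
time <- c(57152, 50090, 49592, 49525, 50394, 53320, 58106)  # export time (ms)
plot(size, time, type = "p", log = "x", xlab = "Packet size (lines)", ylab = "Export time (ms)")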
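The same measurements can be scanned programmatically for the fastest packet size. A short Java sketch using the values from the R vectors (PacketSizeTiming and fastestIndex are illustrative names):

```java
public class PacketSizeTiming {

    /** Returns the index of the smallest value in times. */
    public static int fastestIndex(long[] times) {
        int best = 0;
        for (int i = 1; i < times.length; i++) {
            if (times[i] < times[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Measurements from the R vectors (900k-line concordance).
        int[] size = {10, 100, 1000, 5000, 10000, 50000, 100000};          // packet size (lines)
        long[] timeMs = {57152, 50090, 49592, 49525, 50394, 53320, 58106}; // export time (ms)
        int best = fastestIndex(timeMs);
        System.out.println(size[best] + " lines -> " + timeMs[best] + " ms"); // prints: 5000 lines -> 49525 ms
    }
}
```

With these numbers the minimum is 49525 ms at 5000 lines, consistent with the chosen packet size, although 1000 lines (49592 ms) is only 67 ms slower; between 100 and 10000 lines every measurement stays within about 0.9 s of the minimum, and the time rises at both extremes.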
#2 Updated by Matthieu Decorde almost 9 years ago
- Description updated
#3 Updated by Serge Heiden almost 9 years ago
- Description updated
#4 Updated by Matthieu Decorde almost 9 years ago
- % Done changed from 0 to 50
A macro fixing the bug is currently under test.
#5 Updated by Alexey Lavrentev almost 9 years ago
- Description updated
#6 Updated by Sebastien Jacquot almost 9 years ago
It worked very well with the BROWN corpus.
Number of lines in concordance: 1 161 028
Number of lines in .tsv file: 1 161 029
Elapsed time: 106621 ms
#7 Updated by Matthieu Decorde almost 9 years ago
- % Done changed from 50 to 80
#8 Updated by Sebastien Jacquot over 8 years ago
- % Done changed from 80 to 90
#9 Updated by Matthieu Decorde over 8 years ago
- % Done changed from 90 to 100
#10 Updated by Matthieu Decorde over 8 years ago
- Status changed from New to Closed