Feature #2167

Caches on Java side some R data values

Added by Sebastien Jacquot over 2 years ago. Updated 7 days ago.

Status: New
Start date: 04/25/2017
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Stats / R
Spent time: -
Target version: TXM 0.8.1

Description

Discuss whether or not some R data values should be cached on the Java side.
Currently, some results do this and others don't.
E.g. there currently seems to be a bug with CAH (hierarchical clustering) on large corpora: R consumes up to 2 GB of RAM and never finishes. Caching might help.

E.g. Lexical Table: add a cache to the getFreqs() method so that R is not queried every time the frequencies are needed.
(This needs discussion because we may want the Java object to always reflect the R object. Currently, this is not uniform across TXM results.)
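One way to make caching uniform across results, while still letting the Java side follow the R object, is a lazily computed value that can be explicitly invalidated whenever the R object is recomputed. The sketch below is a hypothetical helper, `CachedValue`, which is not part of TXM; the `compute` supplier stands in for whatever R query a given result performs.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of TXM): computes a value on first access
 * and keeps it until invalidate() is called, e.g. after the R object it
 * mirrors has been recomputed.
 */
class CachedValue<T> {
    private final Supplier<T> compute; // e.g. a function that queries R
    private T value;                   // null until computed, or after invalidate()

    CachedValue(Supplier<T> compute) {
        this.compute = compute;
    }

    /** Returns the cached value, calling the supplier only when nothing is cached. */
    synchronized T get() {
        if (value == null) {
            // note: a supplier returning null is simply called again next time
            value = compute.get();
        }
        return value;
    }

    /** Drops the cached value so that the next get() queries the source again. */
    synchronized void invalidate() {
        value = null;
    }
}
```

A result would call `invalidate()` from the code path that re-runs its R computation, so the Java side does not serve values computed from a stale R object.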

Original:

    @Override
    public List<Integer> getFreqs() {
        ArrayList<Integer> freqs = new ArrayList<Integer>();
        ArrayList<double[]> cols = new ArrayList<double[]>();
        int Nrows = this.getNRows();
        int Ncols = this.getNColumns();
        for (int i = 0; i < Ncols; i++)
            try {
                cols.add(this.getCol(i).asDoubleArray());
            } catch (RException e) {
                // TODO Auto-generated catch block
                org.txm.utils.logger.Log.printStackTrace(e);
            } catch (RWorkspaceException e) {
                // TODO Auto-generated catch block
                org.txm.utils.logger.Log.printStackTrace(e);
            } catch (StatException e) {
                // TODO Auto-generated catch block
                org.txm.utils.logger.Log.printStackTrace(e);
            }
        int sum = 0;

        for (int i = 0; i < Nrows; i++) {
            sum = 0;
            for (int j = 0; j < Ncols; j++)
                sum += (int) cols.get(j)[i];
            freqs.add(sum);
        }
        return freqs;
    }
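The computation itself is a row-wise sum over the columns fetched from R. Isolated from R access, it can be sketched as a pure function; `RowFrequencies` and `rowSums` are hypothetical names. Note that in the code above, if `getCol(i)` throws, the exception is only logged, that column is missing from `cols`, and `cols.get(j)` later fails with an `IndexOutOfBoundsException` (as soon as the table has at least one row). The sketch therefore checks the column count up front.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: row frequencies of a lexical table, given its columns. */
class RowFrequencies {

    /**
     * Sums each row across all columns. Each array holds the counts of one
     * column for every row; the counts arrive from R as doubles.
     */
    static List<Integer> rowSums(List<double[]> cols, int nRows, int nCols) {
        if (cols.size() != nCols) {
            // a column could not be fetched: fail clearly instead of IndexOutOfBounds later
            throw new IllegalStateException("expected " + nCols + " columns, got " + cols.size());
        }
        List<Integer> freqs = new ArrayList<>(nRows);
        for (int i = 0; i < nRows; i++) {
            int sum = 0;
            for (double[] col : cols) {
                sum += (int) col[i]; // same truncating cast as the original code
            }
            freqs.add(sum);
        }
        return freqs;
    }
}
```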


Proposal:

    @Override
    public List<Integer> getFreqs() {
        // caching: compute the frequencies on the first call only, then reuse them
        if (this.frequencies == null) {
            this.frequencies = new ArrayList<Integer>();
            ArrayList<double[]> cols = new ArrayList<double[]>();
            int Nrows = this.getNRows();
            int Ncols = this.getNColumns();
            for (int i = 0; i < Ncols; i++)
                try {
                    cols.add(this.getCol(i).asDoubleArray());
                } catch (RException e) {
                    // TODO Auto-generated catch block
                    org.txm.utils.logger.Log.printStackTrace(e);
                } catch (RWorkspaceException e) {
                    // TODO Auto-generated catch block
                    org.txm.utils.logger.Log.printStackTrace(e);
                } catch (StatException e) {
                    // TODO Auto-generated catch block
                    org.txm.utils.logger.Log.printStackTrace(e);
                }
            int sum = 0;

            for (int i = 0; i < Nrows; i++) {
                sum = 0;
                for (int j = 0; j < Ncols; j++)
                    sum += (int) cols.get(j)[i];
                this.frequencies.add(sum);
            }
        }
        return this.frequencies;
    }
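The proposal can be checked in isolation by counting how often the data source is queried. In the sketch below, a hypothetical `ColumnSource` interface stands in for `getCol(i).asDoubleArray()` and the R exceptions are omitted. The `invalidateFrequencies()` method is also hypothetical (the proposal has no way to reset the cache); it shows how the cache could be kept in step with the R object. The cached list is returned read-only so that callers cannot modify the shared copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the R side: returns one column of the table. */
interface ColumnSource {
    double[] getColumn(int index);
}

/** Minimal sketch of a lexical table that caches its row frequencies. */
class CachedLexicalTable {
    private final ColumnSource source;
    private final int nRows;
    private final int nCols;
    private List<Integer> frequencies; // null until first computed

    CachedLexicalTable(ColumnSource source, int nRows, int nCols) {
        this.source = source;
        this.nRows = nRows;
        this.nCols = nCols;
    }

    /** Computes the row sums on the first call only; later calls reuse the cache. */
    List<Integer> getFreqs() {
        if (frequencies == null) {
            int[] sums = new int[nRows];
            for (int j = 0; j < nCols; j++) {
                double[] col = source.getColumn(j); // one query per column
                for (int i = 0; i < nRows; i++) {
                    sums[i] += (int) col[i];
                }
            }
            List<Integer> result = new ArrayList<>(nRows);
            for (int s : sums) {
                result.add(s);
            }
            frequencies = Collections.unmodifiableList(result);
        }
        return frequencies;
    }

    /** Forgets the cached frequencies, e.g. after the R table has been recomputed. */
    void invalidateFrequencies() {
        frequencies = null;
    }
}
```

Without such an invalidation hook, the Java cache can silently diverge from the R object once the table is recomputed, which is the uniformity concern raised in the description.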

History

#1 Updated by Sebastien Jacquot over 2 years ago

  • Description updated

#2 Updated by Sebastien Jacquot over 2 years ago

  • Description updated

#3 Updated by Sebastien Jacquot about 1 year ago

  • Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.1

#4 Updated by Sebastien Jacquot 7 days ago

  • Subject changed from Lexical Table, add cache to getFreqs() method to not query R each time frequencies are needed to Caches on Java side some R data values
  • Description updated
  • Category changed from Development to Stats / R

#5 Updated by Sebastien Jacquot 7 days ago

  • Description updated
