Feature #870

TBX: X.X, Add texts start positions to the R object produced by the Corpus SendToR command

Added by Matthieu Decorde over 5 years ago. Updated over 4 years ago.

Status: Feedback
Start date: 06/18/2014
Priority: Normal
Due date:
Assignee: -
% Done: 90%
Category: Stats / R
Spent time: -
Target version: TXM 0.7.7

Description

Add the start and end positions of 'text' structures to the R object (Dataframe) created by the Corpus SendToR command.

Solution

  • add a 'structs' vector of structure vectors: OK
  • add a vector per text structure to the 'structs' vector, under the name 'text': OK
  • add a 'lex' vector of lexicons (lexicons must lose the 'lex' string in their names): OK
  • each vector element is a vector of [start, end] Integer positions: OK. The start positions are in $structs$text$start and the end positions in $structs$text$end
  • position values start at 0 for the first word of a corpus or a sub-corpus: OK for sub-corpora with no holes. For non-contiguous sub-corpora, see #1048

Other structures can be transferred later; see ticket #1031.
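The layout described above can be sketched in plain R. This is only a toy stand-in, not the object TXM actually builds: the values, the column names and the `text_rows` helper are invented for illustration, and the end positions are assumed to be inclusive.

```r
# Toy stand-in for the object produced by SendToR; all values are invented.
Corpus1 <- list(data = matrix(0L, nrow = 10, ncol = 2,
                              dimnames = list(NULL, c("word", "pos"))))
Corpus1$lex$word <- c("le", "la", "de")

# Two texts covering words 0..3 and 4..9 (0-based positions, as in the ticket).
Corpus1$structs$text$start <- c(0L, 4L)
Corpus1$structs$text$end   <- c(3L, 9L)

# Hypothetical helper: rows of the data matrix belonging to text i.
# R indexes from 1, so positions are shifted by one; 'end' is assumed inclusive.
text_rows <- function(corpus, i) {
  s <- corpus$structs$text
  seq(s$start[i] + 1L, s$end[i] + 1L)
}

# Number of words per text under the same inclusive-end assumption.
text_lengths <- Corpus1$structs$text$end - Corpus1$structs$text$start + 1L
```

With such a layout, `Corpus1$data[text_rows(Corpus1, 2), ]` would select the rows of the second text.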

Validation test

  • start TXM
  • select DISCOURS corpus
  • call SendToR command
  • in R, the following command displays the two "start" and "end" lists:
    print(Corpus1$structs$text)
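Beyond printing the two lists, the last step could be turned into a small sanity check. The sketch below is an assumption about what "valid" means here (same number of starts and ends, first start at 0, starts sorted, each start not after its end); `check_text_positions` and the `toy` object are hypothetical names, not part of TXM. The log in this ticket assigns to `Corpus1$structs`; since `$` partially matches names on R lists, `Corpus1$struct` resolves to the same element as long as no other element name begins with "struct", but the full name is used here.

```r
# Hypothetical check of the text start/end vectors; not part of TXM.
check_text_positions <- function(corpus) {
  s <- corpus$structs$text
  if (is.null(s$start) || is.null(s$end)) return(FALSE)
  length(s$start) == length(s$end) &&  # one end per start
    s$start[1] == 0 &&                 # first word of the corpus is position 0
    !is.unsorted(s$start) &&           # texts appear in corpus order
    all(s$start <= s$end)              # no text ends before it starts
}

# Invented example values standing in for a real corpus.
toy <- list(structs = list(text = list(start = c(0L, 4L), end = c(3L, 9L))))
ok <- check_text_positions(toy)
print(toy$structs$text)
```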
    

MD: OK on Linux64 and Mac OS X

History

#1 Updated by Serge Heiden over 5 years ago

  • Subject changed from TBX: X.X, Add corpus structure start-end positions to the result of the SendToR command to TBX: X.X, Add texts (or other structures) start-end positions to the R object produced by the Corpus SendToR command
  • Description updated (diff)

#2 Updated by Matthieu Decorde about 5 years ago

  • Description updated (diff)

#3 Updated by Matthieu Decorde about 5 years ago

  • Subject changed from TBX: X.X, Add texts (or other structures) start-end positions to the R object produced by the Corpus SendToR command to TBX: X.X, Add texts start positions to the R object produced by the Corpus SendToR command
  • Description updated (diff)
  • % Done changed from 0 to 70

#4 Updated by Matthieu Decorde about 5 years ago

  • Category set to Stats / R

#5 Updated by Matthieu Decorde almost 5 years ago

  • Description updated (diff)

#6 Updated by Matthieu Decorde almost 5 years ago

  • Target version changed from TXM 0.7.7 to TXM 0.7.6

#7 Updated by Matthieu Decorde almost 5 years ago

  • Description updated (diff)
  • % Done changed from 70 to 80

#8 Updated by Matthieu Decorde almost 5 years ago

  • Description updated (diff)

#9 Updated by Matthieu Decorde almost 5 years ago

  • Description updated (diff)

#10 Updated by Matthieu Decorde almost 5 years ago

  • % Done changed from 80 to 70

#11 Updated by Matthieu Decorde almost 5 years ago

  • Description updated (diff)
  • % Done changed from 70 to 80

#12 Updated by Sebastien Jacquot almost 5 years ago

  • Target version changed from TXM 0.7.6 to TXM 0.7.7

#13 Updated by Matthieu Decorde almost 5 years ago

  • Status changed from New to Feedback

#14 Updated by Matthieu Decorde almost 5 years ago

  • Description updated (diff)

#15 Updated by Matthieu Decorde over 4 years ago

  • Description updated (diff)
  • % Done changed from 80 to 90

#16 Updated by Sebastien Jacquot over 4 years ago

I didn't understand the last step of the validation test. In any case, here is the log produced by the first steps:

Reval : Corpus1 <- matrix(ncol=7, nrow=105191)
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@26109640
Reval : Corpus1[,1] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@3629ba4a
Reval : Corpus1[,2] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@5ad49518
Reval : Corpus1[,3] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@3a88893e
Reval : Corpus1[,4] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@59ea6377
Reval : Corpus1[,5] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@2b2847bf
Reval : Corpus1[,6] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@617e62bb
Reval : Corpus1[,7] <- tmp
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@308414fe
Reval : colnames(Corpus1) <- tmpcol
Reval : Corpus1 <- list(data=Corpus1)
LexiqueDISCOURS
Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans42
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@2f7e4894
Reval : Corpus1$lex$word <- tmp
LexiqueDISCOURS
Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans23
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@bcf04e8
Reval : Corpus1$lex$id <- tmp
LexiqueDISCOURS
Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans19
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@444b2166
Reval : Corpus1$lex$sid <- tmp
LexiqueDISCOURS
Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans0
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@207dd291
Reval : Corpus1$lex$pid <- tmp
LexiqueDISCOURS
Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans15
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@248b3e41
Reval : Corpus1$lex$pos <- tmp
LexiqueDISCOURS
Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans15
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@31a43025
Reval : Corpus1$lex$func <- tmp
LexiqueDISCOURS
Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans32
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@1254e699
Reval : Corpus1$lex$lemma <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@5a1f3ec6
Reval : Corpus1$structs$text$start <- text_limits
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@39526763
Reval : Corpus1$structs$text$end <- text_limits
DISCOURS >> Corpus1
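Read as plain R, the `Reval` lines of this log amount to roughly the sequence sketched below. The values of `tmp`, `tmpcol` and `text_limits` are pushed into the workspace by TXM and are not shown in the log, so the ones used here (and the `columns` list that feeds the loop) are invented. In particular, `text_limits` is assumed to be reassigned between the two `structs` assignments, since the log evaluates the same variable name for both start and end.

```r
# Invented stand-ins for the vectors TXM adds to the R workspace.
n_words <- 6L
tmpcol  <- c("word", "pos")
columns <- list(c(0L, 1L, 2L, 0L, 1L, 3L),
                c(0L, 1L, 0L, 0L, 1L, 2L))

# Empty matrix with one column per property, filled column by column.
Corpus1 <- matrix(ncol = length(tmpcol), nrow = n_words)
for (j in seq_along(columns)) {
  tmp <- columns[[j]]
  Corpus1[, j] <- tmp
}
colnames(Corpus1) <- tmpcol

# The matrix is then wrapped in a list so that lexicons and structures can be attached.
Corpus1 <- list(data = Corpus1)

tmp <- c("le", "chat", "dort", "vite")
Corpus1$lex$word <- tmp

# Same variable name for both assignments, as in the log (assumed reassigned in between).
text_limits <- c(0L, 3L)
Corpus1$structs$text$start <- text_limits
text_limits <- c(2L, 5L)
Corpus1$structs$text$end <- text_limits
```

After such a sequence, `print(Corpus1$structs$text)` would show the "start" and "end" vectors that the last validation step refers to.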
