Feature #870
TBX: X.X, Add texts start positions to the R object produced by the Corpus SendToR command
| Status: | Feedback | Start date: | 06/18/2014 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 90% |
| Category: | Stats / R | Spent time: | - |
| Target version: | TXM 0.7.7 | | |
Description
We can add the start and end positions of the 'text' structures to the R object (Dataframe) created by the command.
Solution
- add a 'structs' vector of structure vectors OK
- add a vector named 'text' to 'structs' for the text structure OK
- add a 'lex' vector of lexicons (lexicon names must lose their 'lex' string) OK
- each text is described by integer [start, end] positions OK: the start positions are stored in $structs$text$start and the end positions in $structs$text$end
- position values start at 0 for the first word of a corpus or sub-corpus: OK for sub-corpora without holes. For non-contiguous sub-corpora, see #1048
Other structures can be transferred later; see ticket #1031.
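As an illustration of how the start and end vectors can be used, here is a minimal R sketch. It builds a toy stand-in for `Corpus1` (the values and the `word_id` column name are hypothetical), assumes 0-based positions as described above and, following the CQP convention, an inclusive end position (an assumption not stated in this ticket), then converts each text's range into 1-based row indices of `Corpus1$data`.

```r
# Toy stand-in for the object built by SendToR (hypothetical values):
# 7 words split into two texts, positions 0-3 and 4-6.
Corpus1 <- list(
  data = matrix(1:7, ncol = 1, dimnames = list(NULL, "word_id")),
  structs = list(text = list(start = c(0L, 4L), end = c(3L, 6L)))
)

# Return, for each text, the 1-based row indices of corpus$data it covers.
# Assumes 0-based positions and an inclusive end position.
text_rows <- function(corpus) {
  s <- corpus$structs$text$start
  e <- corpus$structs$text$end
  mapply(function(a, b) seq.int(a + 1L, b + 1L), s, e, SIMPLIFY = FALSE)
}

rows <- text_rows(Corpus1)
lengths(rows)                             # number of words per text: 4 3
Corpus1$data[rows[[2]], , drop = FALSE]   # words of the second text
```

The shift by 1 is needed because R indexes rows from 1 while TXM positions start at 0.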
Validation test
- start TXM
- select DISCOURS corpus
- call SendToR command
- the following R command displays the two lists, "start" and "end":
print(Corpus1$structs$text)
MD: OK on Linux64 and Mac OS X
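A few automatic checks could complement the manual validation above. The R sketch below is hypothetical (`check_text_limits` is not a TXM function) and assumes 0-based, inclusive positions, a corpus or sub-corpus without holes, and one row of `Corpus1$data` per word.

```r
# Hypothetical sanity checks on Corpus1$structs$text, assuming 0-based,
# inclusive positions, no holes, and one row of corpus$data per word.
check_text_limits <- function(corpus) {
  s <- corpus$structs$text$start
  e <- corpus$structs$text$end
  stopifnot(
    is.numeric(s), is.numeric(e),
    length(s) == length(e),
    s[1] == 0,                             # first word is at position 0
    all(s <= e),                           # every text is non-empty
    all(s[-1] == e[-length(e)] + 1),       # texts follow each other without holes
    e[length(e)] == nrow(corpus$data) - 1  # last text ends on the last word
  )
  invisible(TRUE)
}
```

After running SendToR, `check_text_limits(Corpus1)` either returns silently or stops at the first failing condition.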
History
#1 Updated by Serge Heiden over 6 years ago
- Subject changed from TBX: X.X, Add corpus structure start-end positions to the result of the SendToR command to TBX: X.X, Add texts (or other structures) start-end positions to the R object produced by the Corpus SendToR command
- Description updated (diff)
#2 Updated by Matthieu Decorde over 6 years ago
- Description updated (diff)
#3 Updated by Matthieu Decorde over 6 years ago
- Subject changed from TBX: X.X, Add texts (or other structures) start-end positions to the R object produced by the Corpus SendToR command to TBX: X.X, Add texts start positions to the R object produced by the Corpus SendToR command
- Description updated (diff)
- % Done changed from 0 to 70
#4 Updated by Matthieu Decorde over 6 years ago
- Category set to Stats / R
#5 Updated by Matthieu Decorde over 6 years ago
- Description updated (diff)
#6 Updated by Matthieu Decorde over 6 years ago
- Target version changed from TXM 0.7.7 to TXM 0.7.6
#7 Updated by Matthieu Decorde over 6 years ago
- Description updated (diff)
- % Done changed from 70 to 80
#8 Updated by Matthieu Decorde over 6 years ago
- Description updated (diff)
#9 Updated by Matthieu Decorde over 6 years ago
- Description updated (diff)
#10 Updated by Matthieu Decorde over 6 years ago
- % Done changed from 80 to 70
#11 Updated by Matthieu Decorde over 6 years ago
- Description updated (diff)
- % Done changed from 70 to 80
#12 Updated by Sebastien Jacquot over 6 years ago
- Target version changed from TXM 0.7.6 to TXM 0.7.7
#13 Updated by Matthieu Decorde over 6 years ago
- Status changed from New to Feedback
#14 Updated by Matthieu Decorde over 6 years ago
- Description updated (diff)
#15 Updated by Matthieu Decorde almost 6 years ago
- Description updated (diff)
- % Done changed from 80 to 90
#16 Updated by Sebastien Jacquot almost 6 years ago
I didn't understand the last step of the validation test; anyway, here is the log from the first steps:
Reval : Corpus1 <- matrix(ncol=7, nrow=105191)
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@26109640
Reval : Corpus1[,1] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@3629ba4a
Reval : Corpus1[,2] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@5ad49518
Reval : Corpus1[,3] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@3a88893e
Reval : Corpus1[,4] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@59ea6377
Reval : Corpus1[,5] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@2b2847bf
Reval : Corpus1[,6] <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@617e62bb
Reval : Corpus1[,7] <- tmp
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@308414fe
Reval : colnames(Corpus1) <- tmpcol
Reval : Corpus1 <- list(data=Corpus1)
LexiqueDISCOURS Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans42
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@2f7e4894
Reval : Corpus1$lex$word <- tmp
LexiqueDISCOURS Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans23
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@bcf04e8
Reval : Corpus1$lex$id <- tmp
LexiqueDISCOURS Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans19
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@444b2166
Reval : Corpus1$lex$sid <- tmp
LexiqueDISCOURS Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans0
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@207dd291
Reval : Corpus1$lex$pid <- tmp
LexiqueDISCOURS Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans15
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@248b3e41
Reval : Corpus1$lex$pos <- tmp
LexiqueDISCOURS Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans15
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@31a43025
Reval : Corpus1$lex$func <- tmp
LexiqueDISCOURS Lexique du sous-corpus {0} calculé en {1} msDISCOURS dans32
CHAR_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@1254e699
Reval : Corpus1$lex$lemma <- tmp
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@5a1f3ec6
Reval : Corpus1$structs$text$start <- text_limits
INT_VECTOR_ADDED_TO_WORKSPACE[Ljava.lang.Object;@39526763
Reval : Corpus1$structs$text$end <- text_limits
DISCOURS >> Corpus1