Feature #3353

Corpus, sample texts to n first words

Added by Serge Heiden 3 months ago. Updated about 1 month ago.

Status: New
Start date: 03/14/2023
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Corpus
Spent time: -
Target version: TXM 0.8.4

Description

Help sample a corpus, using one of the following approaches:

a)
  • import
    • cut each text to its first n words after tokenization
      • add a 'Sampling' section to the import parameters form
      • add a 'Sample texts to [ ] first words' parameter
      • add an optional 'Cut at sentence boundaries' parameter

or

b)
  • update
    • add a new corpus command 'Sample texts at n first words' (operating on the XML-TXM pivot)
      • add a 'Number of words' parameter
      • add an optional 'Cut at sentence boundaries' parameter
      • update the corpus

or

c)
  • update
    • add a new corpus command 'Sample texts from sub-corpus' (operating on the XML-TXM pivot, using the sub-corpus matches)
      • for example, with a sub-corpus built from the query <text> []{1,10000} with MatchingStrategy set to 'longest'
      • update the corpus
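
The cutting behaviour common to the three options could be sketched as follows. This is only an illustration in Python, not TXM code: the function name `sample_first_words`, the `SENTENCE_END` set and the use of punctuation tokens as sentence boundaries are all assumptions (in the XML-TXM pivot, sentence boundaries would presumably come from sentence markup instead). It also assumes that 'Cut at sentence boundaries' means backing off to the last complete sentence inside the first n words, which the ticket does not specify.

```python
from typing import List

# Assumption: a token is sentence-final when it is one of these
# punctuation marks. Real sentence boundaries may come from markup.
SENTENCE_END = {".", "!", "?"}


def sample_first_words(tokens: List[str], n: int,
                       cut_at_sentence: bool = False) -> List[str]:
    """Keep at most the first n tokens of one text (hypothetical helper).

    With cut_at_sentence=True, the window is shortened to end on the
    last sentence-final token it contains. If the window contains no
    sentence boundary, the hard cut at n tokens is kept.
    """
    if n <= 0:
        return []
    head = tokens[:n]
    # Nothing was cut off, or no boundary handling requested.
    if not cut_at_sentence or len(head) == len(tokens):
        return head
    # Walk backwards to the last sentence-final token in the window.
    for i in range(len(head) - 1, -1, -1):
        if head[i] in SENTENCE_END:
            return head[: i + 1]
    return head


if __name__ == "__main__":
    text = "A b c . D e f . G h".split()
    print(sample_first_words(text, 6))
    print(sample_first_words(text, 6, cut_at_sentence=True))
```

Backing off rather than extending keeps every sampled text at or below n words, which matches the upper bound in the example query of option c); extending to the next boundary would be an equally plausible reading.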

History

#1 Updated by Serge Heiden about 1 month ago

  • Description updated
  • Target version changed from TXM 0.8.4 to TXM 0.8.3

#2 Updated by Matthieu Decorde about 1 month ago

  • Target version changed from TXM 0.8.3 to TXM 0.8.4
