Feature #3373
Macro, Corpus, sample texts to n first words
Status: | Closed | Start date: | 14/03/2023 |
---|---|---|---|
Priority: | Normal | Due date: | |
Assignee: | - | % Done: | 100% |
Category: | Corpus | Spent time: | - |
Target version: | TXM 0.8.3 | | |
Description
Help to sample a corpus at:
a)- import
- cut texts at n first words after tokenization
- add 'Sampling/Échantillonnage' section in import parameters form
- add 'Sample texts to [ ] first words' parameter
- add 'Cut at sentence boundaries (inclusive)' option parameter
or
b)- update
- add new corpus command 'Sample texts at n first words' (on XML-TXM pivot)
- add 'Number of words' parameter
- add 'Cut at sentence boundaries (inclusive)' option parameter
- update corpus
or
c)- update
- add new corpus command 'Sample texts from sub-corpus' (on XML-TXM pivot from sub-corpus matches)
- for example with sub-corpus built with the query
<text> []{1,10000}
and MatchingStrategy set to 'longest'
- update corpus
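The sampling behaviour described in options a) and b) can be sketched as follows. This is only an illustration in Python, not TXM code: the function name `sample_first_words`, its parameters, and the representation of a text as a list of tokenized sentences are all assumptions. It also assumes that "Cut at sentence boundaries (inclusive)" means the sentence containing the n-th word is kept whole, which the ticket does not state explicitly.

```python
def sample_first_words(sentences, n, cut_at_sentence_boundary=False):
    """Keep roughly the first n words of one text.

    sentences: list of sentences, each a list of word tokens (hypothetical input shape).
    n: number of words to keep per text.
    cut_at_sentence_boundary: if True, the sentence containing the n-th word
        is kept entirely (assumed meaning of "inclusive").
    """
    kept = []
    for sentence in sentences:
        remaining = n - len(kept)
        if remaining <= 0:
            break  # quota reached, drop the rest of the text
        if cut_at_sentence_boundary or len(sentence) <= remaining:
            kept.extend(sentence)  # whole sentence fits, or is kept inclusively
        else:
            kept.extend(sentence[:remaining])  # hard cut in the middle of a sentence
    return kept


text = [["a", "b", "c"], ["d", "e"], ["f"]]
hard_cut = sample_first_words(text, 4)
inclusive_cut = sample_first_words(text, 4, cut_at_sentence_boundary=True)
```

With the inclusive option, a sampled text can be slightly longer than n words, since the last kept sentence is never split.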
Solution
Create the corpus/TruncateTextsAtFirstWords macro, which samples the XML-TXM files of a TXM corpus. It takes one parameter: the number of words to keep per text.
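To illustrate what truncating one XML-TXM file to its first words could look like, here is a minimal Python sketch. It is not the macro's actual code. It assumes that each word is encoded as a `w` element (matched by local name, whatever the namespace), and the helper names `local_name` and `truncate_words` are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def truncate_words(root, n):
    """Remove every `w` element after the first n, in document order.

    Returns the number of removed elements. Assumes words are `w` elements.
    """
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    words = [el for el in root.iter() if local_name(el.tag) == "w"]
    for w in words[n:]:
        parent_of[w].remove(w)
    return max(0, len(words) - n)


doc = ET.fromstring(
    '<text xmlns="http://www.tei-c.org/ns/1.0">'
    "<p><w>a</w><w>b</w><w>c</w></p><p><w>d</w></p>"
    "</text>"
)
removed = truncate_words(doc, 2)
remaining_words = [el.text for el in doc.iter() if local_name(el.tag) == "w"]
```

This sketch leaves emptied containers (such as the second `p` above) in place. Whether those should be pruned as well is a design choice the ticket does not address.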
History
#1 Updated by Serge Heiden over 2 years ago
- Description updated
#2 Updated by Sebastien Jacquot over 1 year ago
- % Done changed from 80 to 100
#3 Updated by Sebastien Jacquot over 1 year ago
- Status changed from New to Closed