Feature #1636: RCP: X.X, word tag and skip tokenization import parameters - Plateforme TXM - Forge du Centre Blaise Pascal

Feature #1636

RCP: X.X, word tag and skip tokenization import parameters

Added by Matthieu Decorde almost 10 years ago. Updated almost 2 years ago.

Status: Closed
Start date: 08/01/2016
Priority: Normal
Due date:
Assignee: -
% Done: 100%
Category: Import
Spent time: -
Target version:

Description

See specifications at https://groupes.renater.fr/wiki/txm-info/public/import_xtz#modify_the_import_form.

Add new import parameters:

- word tag: specifies the XML element that encodes words
- don't tokenize: if selected, no tokenization is performed (no w element is created)
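
The intended effect of the two parameters can be illustrated with a small Python sketch. It is not TXM's implementation: the function name `lexical_units`, its arguments, and the whitespace-only splitting are assumptions made for illustration. Elements named by the word tag are always taken as words; any other text is split into words only when tokenization is enabled.

```python
import xml.etree.ElementTree as ET


def lexical_units(xml_text, word_tag="w", tokenize=True):
    """Return the word forms found in xml_text, in document order.

    Hypothetical helper: word_tag mirrors the "word tag" parameter and
    tokenize=False mirrors the "don't tokenize" option.
    """
    root = ET.fromstring(xml_text)
    words = []

    def visit(elem):
        if elem.tag == word_tag:
            # Pre-encoded word: keep its whole text content as one unit.
            words.append("".join(elem.itertext()).strip())
            return
        if tokenize and elem.text:
            # Assumption: plain whitespace splitting stands in for real tokenization.
            words.extend(elem.text.split())
        for child in elem:
            visit(child)
            # Text that follows a child belongs to the current element.
            if tokenize and child.tail:
                words.extend(child.tail.split())

    visit(root)
    return words


sample = "<p>Hello <w>aujourd'hui</w> world</p>"
print(lexical_units(sample))                  # pre-encoded word plus tokenized text
print(lexical_units(sample, tokenize=False))  # only the pre-encoded word
```

With tokenization disabled, only the elements that already carry the chosen word tag contribute words, which is why the word tag has to match the source encoding.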

Solution

Available only in the XTZ+CSV import.

Change the "Lexical Segmentation" section of the import form to:

Unités lexicales

Balise de mots : w
Segmenter [o]/n
- Caractères séparateurs
  - Espaces
  - Ponctuations
- Caractères d'élision
- Caractères de fin de phrase

Lexical Units

Word tag: w
Tokenize [y]/n
- Separator characters
  - Spaces
  - Punctuation
- Elision characters
- End-of-sentence characters
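
The form layout above can be read as a small settings structure in which the character options only matter while tokenization is enabled. The Python sketch below makes that dependency explicit. The class name `LexicalUnits`, its field names, and the `active_parameters` method are hypothetical and are not TXM identifiers; it also assumes that the bracketed choice in the form marks tokenization as enabled by default. No default character sets are assumed.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Sub-options that sit under the "Tokenize" toggle in the form.
TOKENIZER_OPTIONS = ("spaces", "punctuation", "elision", "end_of_sentence")


@dataclass
class LexicalUnits:
    word_tag: str = "w"      # "Word tag" field, default value shown in the form
    tokenize: bool = True    # "Tokenize" toggle, assumed enabled by default
    # None means "not set"; no TXM default values are assumed here.
    spaces: Optional[str] = None
    punctuation: Optional[str] = None
    elision: Optional[str] = None
    end_of_sentence: Optional[str] = None

    def active_parameters(self):
        """Return only the parameters that apply for the current toggle."""
        params = asdict(self)
        if not self.tokenize:
            # Assumption: character settings are ignored without tokenization.
            for key in TOKENIZER_OPTIONS:
                params.pop(key)
        return params


settings = LexicalUnits(tokenize=False, elision="'")
print(settings.active_parameters())
```

Dropping the character options when tokenization is off mirrors the form, where they are nested under the toggle.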

Related issues

History

#1 Updated by Matthieu Decorde over 9 years ago

% Done changed from 0 to 80

#2 Updated by Alexey Lavrentev over 9 years ago

Description updated

#3 Updated by Sebastien Jacquot almost 2 years ago

Status changed from New to Closed

#4 Updated by Sebastien Jacquot almost 2 years ago

% Done changed from 80 to 100
