Feature #3051: Tokenizer, separate the XML parsing and the String tokenization processes - Plateforme TXM - Forge du Centre Blaise Pascal

Feature #3051

Status:

New

Start date:

09/04/2021

Priority:

Normal

Due date:

Assignee:

% Done:

Category:

Annotation

Spent time:

Target version:

Description

The TXM tokenizer class (SimpleXMLTokenizer) must be split into two classes:

SimpleXMLTokenizer, which handles the XML parsing, and SimpleStringTokenizer, which it uses by default to tokenize text.

This will make it possible to work with another StringTokenizer (such as the UDPipe tokenizer).
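A minimal sketch of the proposed split, to make the intended separation concrete. It is not the TXM implementation: the interface `StringTokenizer`, the classes `SimpleStringTokenizerSketch`, `WhitespaceTokenizerSketch` and `XmlTokenizerSketch`, the `<w>` wrapper element, and the tokenization regex are all assumptions made for illustration. The whitespace tokenizer only stands in for an alternative implementation such as a UDPipe-backed one; no UDPipe API is called.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical contract: turns a plain string into a list of tokens.
interface StringTokenizer {
    List<String> tokenize(String text);
}

// Hypothetical default: runs of letters/digits, or single punctuation marks.
class SimpleStringTokenizerSketch implements StringTokenizer {
    private static final Pattern TOKEN =
        Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }
}

// Stand-in for an alternative tokenizer (e.g. one backed by UDPipe).
class WhitespaceTokenizerSketch implements StringTokenizer {
    public List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return Collections.emptyList();
        return Arrays.asList(trimmed.split("\\s+"));
    }
}

// Only knows about XML; delegates every text node to the StringTokenizer.
class XmlTokenizerSketch {
    private final StringTokenizer stringTokenizer;

    XmlTokenizerSketch() { this(new SimpleStringTokenizerSketch()); }
    XmlTokenizerSketch(StringTokenizer stringTokenizer) {
        this.stringTokenizer = stringTokenizer;
    }

    String process(String xml) {
        try {
            XMLInputFactory inFactory = XMLInputFactory.newInstance();
            // Deliver each text node as a single CHARACTERS event.
            inFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader =
                inFactory.createXMLStreamReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        writer.writeStartElement(reader.getLocalName());
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            writer.writeAttribute(reader.getAttributeLocalName(i),
                                                  reader.getAttributeValue(i));
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        writer.writeEndElement();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Whitespace between tokens is dropped in this sketch.
                        for (String token : stringTokenizer.tokenize(reader.getText())) {
                            writer.writeStartElement("w");
                            writer.writeCharacters(token);
                            writer.writeEndElement();
                        }
                        break;
                    default:
                        break; // comments, PIs, namespaces: ignored here
                }
            }
            writer.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Invalid XML input", e);
        }
    }
}

class TokenizerSketchDemo {
    public static void main(String[] args) {
        String xml = "<text><p>Hello world!</p></text>";
        System.out.println(new XmlTokenizerSketch().process(xml));
        System.out.println(
            new XmlTokenizerSketch(new WhitespaceTokenizerSketch()).process(xml));
    }
}
```

With this layout, swapping the tokenization strategy only requires passing a different `StringTokenizer` to the constructor; the XML traversal code stays untouched.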

Related issues

Subject changed from Tokenizer, separate XML parsing from the String tokenization process to Tokenizer, separate the XML parsing and the String tokenization processes
