Feature #3051
Tokenizer, separate the XML parsing and the String tokenization processes
| Status: | New | Start date: | 04/09/2021 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 0% |
| Category: | Annotation | Spent time: | - |
| Target version: | TXM - Eltec 1.0 | | |
Description
The TXM tokenizer class (SimpleXMLTokenizer) must be split into two classes:
- SimpleXMLTokenizer
- SimpleStringTokenizer
By default, SimpleXMLTokenizer will use the SimpleStringTokenizer class to tokenize text nodes.
This will make it possible to plug in another string tokenizer (such as the UDPipe tokenizer).
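A minimal Java sketch of the proposed split, for illustration only. The `TextTokenizer` interface is hypothetical (it does not come from TXM), the regex token rule in `SimpleStringTokenizer` is an assumption rather than TXM's actual rules, and the XML side uses the standard StAX API while dropping namespaces, comments and processing instructions. The point it shows is the separation: the XML class only walks the document and delegates each text node to whatever string tokenizer it was given.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical pluggable interface: any string tokenizer (e.g. a UDPipe wrapper) could implement it. */
interface TextTokenizer {
    List<String> tokenize(String text);
}

/** Default string tokenizer: word runs and single punctuation marks (assumed rule, not TXM's). */
class SimpleStringTokenizer implements TextTokenizer {
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    @Override
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}

/** XML side: copies elements and delegates every text node to the injected TextTokenizer. */
class SimpleXMLTokenizer {
    private final TextTokenizer tokenizer;

    SimpleXMLTokenizer() {
        this(new SimpleStringTokenizer()); // default string tokenizer, as the feature proposes
    }

    SimpleXMLTokenizer(TextTokenizer tokenizer) {
        this.tokenizer = tokenizer;
    }

    /** Returns the XML with each token of each text node wrapped in a <w> element. */
    String process(String xml) {
        try {
            XMLInputFactory inFactory = XMLInputFactory.newInstance();
            // Merge adjacent character chunks so a word is never split across two events.
            inFactory.setProperty(XMLInputFactory.IS_COALESCING, true);
            XMLStreamReader in = inFactory.createXMLStreamReader(new StringReader(xml));
            StringWriter buffer = new StringWriter();
            XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            while (in.hasNext()) {
                switch (in.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        out.writeStartElement(in.getLocalName());
                        for (int i = 0; i < in.getAttributeCount(); i++) {
                            out.writeAttribute(in.getAttributeLocalName(i), in.getAttributeValue(i));
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        out.writeEndElement();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // The only place where string tokenization happens.
                        for (String token : tokenizer.tokenize(in.getText())) {
                            out.writeStartElement("w");
                            out.writeCharacters(token);
                            out.writeEndElement();
                        }
                        break;
                    default:
                        break; // comments, PIs, etc. are dropped in this sketch
                }
            }
            out.flush();
            out.close();
            in.close();
            return buffer.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML input", e);
        }
    }
}
```

Usage: `new SimpleXMLTokenizer().process("<p>Hello, world!</p>")` uses the default string tokenizer, while `new SimpleXMLTokenizer(otherTokenizer)` swaps in any other implementation of the hypothetical interface without touching the XML-handling code.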
History
#1 Updated by Matthieu Decorde about 2 years ago
- Subject changed from Tokenizer, separate XML parsing from the String tokenization process to Tokenizer, separate the XML parsing and the String tokenization processes