Feature #3049

UDPipe annotation engine, tokenizer

Added by Matthieu Decorde about 1 month ago.

Status: New
Start date: 04/09/2021
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Annotation
Spent time: -
Target version: TXM - Eltec 1.0

Description

Integrate the UDPipe annotation engine into TXM, in the same way as TreeTagger is integrated.

When it is available, the UDPipe annotation engine can be selected instead of TreeTagger in the Language section of the import form.

To work properly, the UDPipe tagger needs input that is tokenized the way its model expects.

During the import process, if the UDPipe annotation engine is selected, TXM will use the tokenization of the UDPipe model instead of its own tokenization rules.
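
The selection logic described above could be sketched roughly as follows. This is only an illustration in Python, not TXM code: the names `AnnotationEngine`, `select_tokenizer`, `txm_rule_tokenizer` and `udpipe_model_tokenizer` are hypothetical, and both tokenizers are simple stand-ins (the real ones would be TXM's rule-based tokenizer and the tokenizer bundled with the UDPipe model).

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Tokenizer = Callable[[str], List[str]]


def txm_rule_tokenizer(text: str) -> List[str]:
    """Stand-in for TXM's own tokenization rules (hypothetical regex)."""
    return re.findall(r"\w+|[^\w\s]", text)


def udpipe_model_tokenizer(text: str) -> List[str]:
    """Placeholder for the UDPipe model tokenization.

    A real integration would delegate to the loaded UDPipe model;
    whitespace splitting is used here only to keep the sketch runnable.
    """
    return text.split()


@dataclass(frozen=True)
class AnnotationEngine:
    """Hypothetical description of an engine offered in the import form."""
    name: str
    available: bool
    # None means "the engine has no tokenizer of its own, use TXM rules".
    tokenizer: Optional[Tokenizer] = None


# Assumed registry of engines; availability would be detected at runtime.
ENGINES: Dict[str, AnnotationEngine] = {
    "TreeTagger": AnnotationEngine("TreeTagger", available=True),
    "UDPipe": AnnotationEngine("UDPipe", available=True,
                               tokenizer=udpipe_model_tokenizer),
}


def select_tokenizer(engine_name: str,
                     engines: Dict[str, AnnotationEngine] = ENGINES,
                     default: Tokenizer = txm_rule_tokenizer) -> Tokenizer:
    """Return the tokenizer the import process should use for an engine."""
    engine = engines.get(engine_name)
    if engine is None or not engine.available:
        raise ValueError(f"annotation engine not available: {engine_name}")
    # An engine that ships its own tokenizer overrides the TXM rules.
    return engine.tokenizer if engine.tokenizer is not None else default


if __name__ == "__main__":
    tokenize = select_tokenizer("UDPipe")
    print(tokenize("Hello, world!"))
```

With this shape, TreeTagger imports keep the TXM rules, and only engines that declare their own tokenizer bypass them.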


Related issues

related to Feature #3051: Tokenizer, separate the XML parsing and the String tokeni... New 04/09/2021
