Feature #3004

Import, Tokenizer, re-tokenize words option

Added by Matthieu Decorde about 1 month ago. Updated 27 days ago.

Status:New Start date:01/22/2021
Priority:Normal Due date:
Assignee:- % Done:80%

Category:Import Spent time: -
Target version:TXM 0.8.2 - 13NOV 1.0

Description

If enabled, the Tokenizer can re-tokenize words that are already wrapped in a <w> element.

Enabled for the following import modules:
  • XTZ import
  • XML/w import
  • Transcriber import
UI labels:
  • re-tokenize pre-encoded words @flyover(Performs word segmentation within word encoding tags.)
  • re-segmenter lexicalement les mots pré-encodés @flyover(Réalise une segmentation en mots au sein des balises d'encodage de mots.)
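
To illustrate the behaviour described above, here is a minimal Python sketch of re-tokenizing text that is already wrapped in <w> elements. It is not TXM's implementation: the function name retokenize_words, the enabled flag and the regular expression used for segmentation are all hypothetical, and the sketch assumes un-namespaced <w> elements that contain only text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical segmentation rule: runs of word characters, or single
# punctuation marks. The real tokenizer rules are not described in this ticket.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def retokenize_words(root, enabled=True):
    """Split every <w> whose text holds several tokens into several <w>.

    Assumptions (not taken from the ticket): <w> has no namespace, and a
    <w> that contains child elements is left untouched.
    """
    if not enabled:
        return root  # option disabled: pre-encoded words are kept as-is

    for parent in list(root.iter()):
        new_children = []
        for child in list(parent):
            # Keep anything that is not a simple, text-only <w> element.
            if child.tag != "w" or len(child) > 0:
                new_children.append(child)
                continue

            tokens = TOKEN_RE.findall(child.text or "")
            if len(tokens) <= 1:
                new_children.append(child)
                continue

            for i, token in enumerate(tokens):
                # Attributes are copied to every new <w>; identifiers, if
                # any, would need to be made unique in a real import.
                w = ET.Element("w", dict(child.attrib))
                w.text = token
                # Only the last new <w> inherits the original trailing text.
                w.tail = child.tail if i == len(tokens) - 1 else None
                new_children.append(w)

        parent[:] = new_children  # replace the children in place
    return root


if __name__ == "__main__":
    doc = ET.fromstring('<s><w n="1">New York</w> <w n="2">is</w></s>')
    retokenize_words(doc)
    print(ET.tostring(doc, encoding="unicode"))
```

With the option disabled, the pre-encoded <w> elements are returned unchanged, which matches the idea of an opt-in import parameter.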

Associated revisions

Revision 3005
Added by Matthieu Decorde 27 days ago

add the re-tokenize import parameter refs #3004

History

#1 Updated by Matthieu Decorde about 1 month ago

  • Description updated (diff)

#2 Updated by Matthieu Decorde 27 days ago

  • Description updated (diff)

#3 Updated by Matthieu Decorde 27 days ago

  • Description updated (diff)
  • % Done changed from 0 to 30

The UI is ready.

#4 Updated by Matthieu Decorde 27 days ago

  • % Done changed from 30 to 80

Tested with the XTZ, XML/w and Transcriber import modules.
