Bug #1251

RCP: 0.7.6, fix tokenizer parameter fields in import form editor

Added by Serge Heiden over 4 years ago. Updated almost 4 years ago.

Status: In Progress
Start date: 02/04/2015
Priority: Normal
Due date:
Assignee: -
% Done: 80%
Category: Import
Spent time: -
Target version: TXM 0.7.7

Description

  • the default value of the word space separators field, '[\p{Z}|\p{C}]+', is incorrect (inside a character class '|' is a literal pipe, not alternation) -> fix it to '[\p{Z}\p{C}]+'
    • MD: OK, fixed in 0.7.7
  • the default value of the end of sentence characters field, '[\p{Ps}|\p{Pe}|\p{Pi}|\p{Pf}|\p{Po}|\p{S}]', is incorrect for the same reason -> fix it to '[\p{Ps}\p{Pe}\p{Pi}\p{Pf}\p{Po}\p{S}]'
    • MD: OK, fixed in 0.7.7
  • when any tokenizer parameter field is edited, the import module always runs with the message "MISSING TOKENIZER KEY: punct_strong"; does this mean that orthographic sentences are never analyzed?
    • MD: OK, fixed in 0.7.7: the "punct_strong" key was not updated correctly, which caused this message to be displayed
  • what is the purpose of the 'OK' button? -> suggestion: remove the 'OK' button
    • MD: OK, removed the "OK" button in 0.7.7
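Both regex fixes above come down to one rule: inside a character class [...], '|' is an ordinary character, so '[\p{Z}|\p{C}]+' also treats a literal pipe as a word separator. The following is a minimal sketch using Java's java.util.regex to compare the old and corrected default values. The choice of Java and the class and method names (TokenizerRegexCheck, tokens, inClass) are illustrative assumptions; this is not TXM's tokenizer code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper comparing the old and corrected default tokenizer patterns.
class TokenizerRegexCheck {
    // Word space separators field: old default and corrected default.
    static final String WORD_SEP_OLD = "[\\p{Z}|\\p{C}]+";
    static final String WORD_SEP_NEW = "[\\p{Z}\\p{C}]+";

    // End of sentence characters field: old default and corrected default.
    static final String EOS_OLD = "[\\p{Ps}|\\p{Pe}|\\p{Pi}|\\p{Pf}|\\p{Po}|\\p{S}]";
    static final String EOS_NEW = "[\\p{Ps}\\p{Pe}\\p{Pi}\\p{Pf}\\p{Po}\\p{S}]";

    // Splits text on every run of characters matched by the separator pattern.
    static String[] tokens(String separatorRegex, String text) {
        return Pattern.compile(separatorRegex).split(text);
    }

    // Returns true if the single character c is matched by the class pattern.
    static boolean inClass(String classRegex, char c) {
        return Pattern.compile(classRegex).matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        String text = "a|b c";
        // Inside [...] '|' is literal, so the old pattern also splits on the pipe.
        System.out.println(Arrays.toString(tokens(WORD_SEP_OLD, text)));
        // The corrected pattern splits only on separator (Z) and other (C) characters.
        System.out.println(Arrays.toString(tokens(WORD_SEP_NEW, text)));
        // '|' (U+007C) is a math symbol (Sm), so \p{S} already contains it.
        System.out.println(inClass(EOS_OLD, '|') + " " + inClass(EOS_NEW, '|'));
    }
}
```

Running main prints "[a, b, c]", then "[a|b, c]", then "true true". The last line shows that in the end of sentence field the stray '|' was redundant rather than harmful: U+007C VERTICAL LINE is in Unicode category Sm, which \p{S} already covers, so that correction cleans up the pattern without changing which characters match. In the word space separators field, by contrast, the old pattern really did split tokens on '|'.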

See ticket #1347 for further tokenizer import fixes.

History

#1 Updated by Serge Heiden over 4 years ago

  • Description updated

#2 Updated by Serge Heiden over 4 years ago

  • Description updated

#3 Updated by Matthieu Decorde over 4 years ago

  • Description updated
  • Status changed from New to In Progress
  • % Done changed from 0 to 30

#4 Updated by Matthieu Decorde about 4 years ago

  • Description updated
  • % Done changed from 30 to 40

#5 Updated by Matthieu Decorde almost 4 years ago

  • Description updated
  • % Done changed from 40 to 80
