Feature #2363

TBX: 0.7.9, move some TreeTagger default options to public preferences

Added by Serge Heiden over 1 year ago. Updated over 1 year ago.

Status: New
Start date: 04/09/2018
Priority: Normal
Due date:
Assignee: -
% Done: 70%
Category: TAL
Spent time: -
Target version: TXM Profiterole 1.0

Description

Currently, TXM passes the "-no-unknown" option to TreeTagger by default. This option is not exposed to the user, so it cannot be changed (for example, removed from the TreeTagger command line). For some projects, such as quality analysis of OCR results, it can be necessary to change this processing option. TreeTagger options like this one should be manageable by the end user.

Solution

  • build two sets of TreeTagger options:
    • a) private internal TreeTagger options -> that the user must not change, to avoid breaking TXM tools
    • b) public TreeTagger options (preferences) -> that the user can change
      • -no-unknown should be placed in this set
      • debug (show TreeTagger messages)
      • cap-heuristic (Look up unknown capitalized words in the list of lower-case words)
      • hyphen-heuristic (Turn on the heuristics for guessing the parts of speech of unknown hyphenated words)
      • prob (Print tag probabilities)
      • lex (Read auxiliary lexicon entries from file <f>)
      • wc (Read a word-class automaton from file <f>)
  • pre-set the 'TreeTagger command line options preferences' field with the 'public TreeTagger options'
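
The split proposed above could be sketched roughly as follows. This is only an illustration, not TXM code: the names PRIVATE_OPTIONS, DEFAULT_PUBLIC_OPTIONS and build_command are hypothetical, the content of the private set is an assumption, and the exact spelling of each flag should be checked against the help output of the installed TreeTagger.

```python
import shlex

# Assumption: flags TXM relies on internally and never exposes to the user.
# The actual private set used by TXM is not stated in this ticket.
PRIVATE_OPTIONS = ["-token", "-lemma", "-sgml"]

# Pre-set value of the public preference field. It holds "-no-unknown"
# because that is the option TXM currently applies by default.
DEFAULT_PUBLIC_OPTIONS = "-no-unknown"


def build_command(binary, parameter_file, public_options=DEFAULT_PUBLIC_OPTIONS):
    """Assemble a TreeTagger command line from private and public options.

    public_options is the raw text of the preference field, for example
    "-no-unknown -prob" or an empty string.
    """
    # shlex.split keeps quoted file names (e.g. for -lex or -wc) in one piece.
    user_options = shlex.split(public_options)

    # Refuse a user value that tries to re-declare a private option.
    clashes = [opt for opt in user_options if opt in PRIVATE_OPTIONS]
    if clashes:
        raise ValueError("reserved options: " + ", ".join(clashes))

    return [binary, *PRIVATE_OPTIONS, *user_options, parameter_file]
```

With this layout, clearing the preference field removes "-no-unknown" from the command line, while the private options stay in place whatever the user enters.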

History

#1 Updated by Serge Heiden over 1 year ago

  • Description updated (diff)

#2 Updated by Matthieu Decorde over 1 year ago

  • Description updated (diff)
  • % Done changed from 0 to 70
