Feature #2363

Updated by Matthieu Decorde 9 months ago

Currently, TXM passes the "-no-unknown" option to TreeTagger by default. This option is not exposed to the user, so it cannot be changed (for example, removed from the TreeTagger command line). Some projects, such as OCR output quality analysis, may need to change this processing option. Some TreeTagger options like this one should be manageable by the end user.

h3. Solution

* build two sets of TreeTagger options:
** a) private internal TreeTagger options -> options the user must not change, because changing them could break TXM tools
** b) public TreeTagger options (preferences) -> options the user can change
*** -no-unknown should be placed in this set
*** debug (show TreeTagger messages)
*** cap-heuristic (Look up unknown capitalized words in the list of lower-case words)
*** hyphen-heuristic (Turn on the heuristics for guessing the parts of speech of unknown hyphenated words)
*** prob (Print tag probabilities)
*** lex (Read auxiliary lexicon entries from file <f>)
*** wc (Read a word-class automaton from file <f>)

* pre-set the 'TreeTagger command line options preferences' field with the 'public TreeTagger options'
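The two-set design above can be illustrated with a minimal Java sketch. It does not reflect TXM's actual code. The class and method names are hypothetical, and the private options ("-token", "-lemma", "-sgml") are real TreeTagger flags chosen only as placeholders, since the ticket does not list which ones TXM relies on. Private options are always emitted first. The user's preference string (pre-set to "-no-unknown") is split on whitespace and appended, with any private flag the user typed dropped so it cannot be duplicated or overridden. The result follows TreeTagger's usage: options, then the parameter file, then the input file.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeTaggerCommandSketch {

    // Hypothetical private options: always passed, never shown to the user.
    // Placeholder values; the flags TXM actually needs are not listed in this ticket.
    static final List<String> PRIVATE_OPTIONS = List.of("-token", "-lemma", "-sgml");

    // Default content used to pre-set the "TreeTagger command line options" preference field.
    static final String DEFAULT_PUBLIC_OPTIONS = "-no-unknown";

    /** Splits the user-editable preference string into arguments, dropping any private flag. */
    static List<String> parsePublicOptions(String preference) {
        List<String> result = new ArrayList<>();
        if (preference == null) {
            return result;
        }
        for (String token : preference.trim().split("\\s+")) {
            // split() on an empty string yields [""], so skip empty tokens
            if (!token.isEmpty() && !PRIVATE_OPTIONS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    /** Builds the argument list: executable, private options, public options, parameter file, input file. */
    static List<String> buildCommand(String executable, String preference,
                                     String parameterFile, String inputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        cmd.addAll(PRIVATE_OPTIONS);
        cmd.addAll(parsePublicOptions(preference));
        cmd.add(parameterFile);
        cmd.add(inputFile);
        return cmd;
    }

    public static void main(String[] args) {
        // Default preference value
        System.out.println(buildCommand("tree-tagger", DEFAULT_PUBLIC_OPTIONS, "fr.par", "input.txt"));
        // A user who removed -no-unknown and enabled other public options
        System.out.println(buildCommand("tree-tagger", "-cap-heuristic -prob", "fr.par", "input.txt"));
    }
}
```

The returned list can be passed directly to `new ProcessBuilder(cmd)`. Options taking a file argument, such as "-lex <f>" or "-wc <f>", split into two tokens, which is what TreeTagger expects; file paths containing spaces would need quoting support that this sketch does not implement.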
