Feature #2363

TBX: 0.7.9, move some TreeTagger default options to public preferences

Added by Serge Heiden 11 months ago. Updated 6 months ago.

Status: New Start date: 09/04/2018
Priority: Normal Due date:
Assignee: - % Done:

70%

Category: TAL Spent time: -
Target version: TXM Profiterole 1.0

Description

Currently, TXM passes the "-no-unknown" TreeTagger option by default. This option is not exposed to the user, so it cannot be changed (for example, removed from the TreeTagger command line). For some projects, such as quality analysis of OCR results, this processing option needs to be changed. TreeTagger options like this one should be manageable by the end user.

Solution

  • build two sets of TreeTagger options:
    • a) private internal TreeTagger options -> that the user must not change, to avoid breaking TXM tools
    • b) public TreeTagger options (preferences) -> that the user can change
      • -no-unknown should be placed in this set
      • debug (show TreeTagger messages)
      • cap-heuristic (Look up unknown capitalized words in the list of lower-case words)
      • hyphen-heuristic (Turn on the heuristics for guessing the parts of speech of unknown hyphenated words)
      • prob (Print tag probabilities)
      • lex (Read auxiliary lexicon entries from file <f>)
      • wc (Read a word-class automaton from file <f>)
  • pre-set the 'TreeTagger command line options preferences' field with the 'public TreeTagger options'
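
The split proposed above can be sketched as follows. This is only an illustration in Python, not TXM's actual code: the function name, the constant names and the choice of "-token", "-lemma" and "-sgml" as private options are assumptions made for the example. The public preference field is pre-set with "-no-unknown", and the user may replace its content.

```python
import shlex

# Assumption: options TXM would keep private because its tools depend on them.
# The real private set is not specified in this ticket.
PRIVATE_OPTIONS = ("-token", "-lemma", "-sgml")

# Default content of the user-editable preference field (set b in the ticket).
PUBLIC_DEFAULT = "-no-unknown"


def build_treetagger_command(binary, model, input_file, public_pref=PUBLIC_DEFAULT):
    """Build the argument list: binary, private options, public options, files.

    Hypothetical helper: `public_pref` is the raw text of the preference field.
    """
    # Split the preference text the way a shell would, so "-lex my.lex" works.
    user_options = shlex.split(public_pref)

    # Refuse public options that duplicate a private one.
    clashes = [opt for opt in user_options if opt in PRIVATE_OPTIONS]
    if clashes:
        raise ValueError("reserved TreeTagger options: " + " ".join(clashes))

    return [binary, *PRIVATE_OPTIONS, *user_options, model, input_file]
```

With the default preference the command still contains "-no-unknown"; a user who sets the field to "-cap-heuristic -lex my.lex" gets a command without it, which is the behaviour requested for the OCR use case.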

History

#1 Updated by Serge Heiden 11 months ago

  • Description updated (diff)

#2 Updated by Matthieu Decorde 6 months ago

  • Description updated (diff)
  • % Done changed from 0 to 70
