/ - Diff - Plateforme TXM - Forge du Centre Blaise Pascal

Revision 2854

     /col7 {1.000 1.000 1.000 srgb} bind def
     /col8 {0.000 0.000 0.560 srgb} bind def
     /col9 {0.000 0.000 0.690 srgb} bind def
     /col10 {0.000 0.000 0.8.020 srgb} bind def
     /col11 {0.530 0.8.010 1.000 srgb} bind def
     /col10 {0.000 0.000 0.820 srgb} bind def
     /col11 {0.530 0.810 1.000 srgb} bind def
     /col12 {0.000 0.560 0.000 srgb} bind def
     /col13 {0.000 0.690 0.000 srgb} bind def
     /col14 {0.000 0.8.020 0.000 srgb} bind def
     /col14 {0.000 0.820 0.000 srgb} bind def
     /col15 {0.000 0.560 0.560 srgb} bind def
     /col16 {0.000 0.690 0.690 srgb} bind def
     /col17 {0.000 0.8.020 0.8.020 srgb} bind def
     /col17 {0.000 0.820 0.820 srgb} bind def
     /col18 {0.560 0.000 0.000 srgb} bind def
     /col19 {0.690 0.000 0.000 srgb} bind def
     /col20 {0.8.020 0.000 0.000 srgb} bind def
     /col20 {0.820 0.000 0.000 srgb} bind def
     /col21 {0.560 0.000 0.560 srgb} bind def
     /col22 {0.690 0.000 0.690 srgb} bind def
     /col23 {0.8.020 0.000 0.8.020 srgb} bind def
     /col23 {0.820 0.000 0.820 srgb} bind def
     /col24 {0.500 0.190 0.000 srgb} bind def
     /col25 {0.630 0.250 0.000 srgb} bind def
     /col26 {0.750 0.380 0.000 srgb} bind def
     /col27 {1.000 0.500 0.500 srgb} bind def
     /col28 {1.000 0.630 0.630 srgb} bind def
     /col29 {1.000 0.750 0.750 srgb} bind def
     /col30 {1.000 0.8.080 0.8.080 srgb} bind def
     /col31 {1.000 0.8.040 0.000 srgb} bind def
     /col30 {1.000 0.880 0.880 srgb} bind def
     /col31 {1.000 0.840 0.000 srgb} bind def
     end
     save
-...
     /col7 {1.000 1.000 1.000 srgb} bind def
     /col8 {0.000 0.000 0.560 srgb} bind def
     /col9 {0.000 0.000 0.690 srgb} bind def
     /col10 {0.000 0.000 0.8.020 srgb} bind def
     /col11 {0.530 0.8.010 1.000 srgb} bind def
     /col10 {0.000 0.000 0.820 srgb} bind def
     /col11 {0.530 0.810 1.000 srgb} bind def
     /col12 {0.000 0.560 0.000 srgb} bind def
     /col13 {0.000 0.690 0.000 srgb} bind def
     /col14 {0.000 0.8.020 0.000 srgb} bind def
     /col14 {0.000 0.820 0.000 srgb} bind def
     /col15 {0.000 0.560 0.560 srgb} bind def
     /col16 {0.000 0.690 0.690 srgb} bind def
     /col17 {0.000 0.8.020 0.8.020 srgb} bind def
     /col17 {0.000 0.820 0.820 srgb} bind def
     /col18 {0.560 0.000 0.000 srgb} bind def
     /col19 {0.690 0.000 0.000 srgb} bind def
     /col20 {0.8.020 0.000 0.000 srgb} bind def
     /col20 {0.820 0.000 0.000 srgb} bind def
     /col21 {0.560 0.000 0.560 srgb} bind def
     /col22 {0.690 0.000 0.690 srgb} bind def
     /col23 {0.8.020 0.000 0.8.020 srgb} bind def
     /col23 {0.820 0.000 0.820 srgb} bind def
     /col24 {0.500 0.190 0.000 srgb} bind def
     /col25 {0.630 0.250 0.000 srgb} bind def
     /col26 {0.750 0.380 0.000 srgb} bind def
     /col27 {1.000 0.500 0.500 srgb} bind def
     /col28 {1.000 0.630 0.630 srgb} bind def
     /col29 {1.000 0.750 0.750 srgb} bind def
     /col30 {1.000 0.8.080 0.8.080 srgb} bind def
     /col31 {1.000 0.8.040 0.000 srgb} bind def
     /col30 {1.000 0.880 0.880 srgb} bind def
     /col31 {1.000 0.840 0.000 srgb} bind def
     end
     save
-...
     gs 1 -1 sc (ty    \(NN:0.45, JJ:0.35, NP:0.2\)) col-1 sh gr
     /Times-Roman ff 600.00 scf sf
 9945 m
     gs 1 -1 sc (son  \(NP:0.8.0, NN:0.1, JJ:0.1\)) col-1 sh gr
     gs 1 -1 sc (son  \(NP:0.8, NN:0.1, JJ:0.1\)) col-1 sh gr
     /Times-Roman ff 600.00 scf sf
 11550 m
     gs 1 -1 sc (man \(NP:0.8.0, NN:0.2\)) col-1 sh gr
     gs 1 -1 sc (man \(NP:0.8, NN:0.2\)) col-1 sh gr
     /Times-Roman ff 600.00 scf sf
 10725 m
     gs 1 -1 sc (ton  \(NP:0.9, NN:0.05, JJ:0.05\)) col-1 sh gr

                       ************************
                       *  License Conditions  *
                       ************************
     concerning the use and distribution of the program system 'TreeTagger'.
     The license is granted by
     Helmut Schmid, Markusstraße 8, 72760 Reutlingen, Germany
     Email schmid@cis.lmu.de
 . The user can freely use TreeTagger for evaluation, research, and
        teaching purposes. Any commercial usage is forbidden without a
        separate commercial license available from the licensor.
 . The user is not allowed to distribute or sell the system to third
        parties without written permission from the licensor.
     			   NO WARRANTY
 . BECAUSE THE SYSTEM IS LICENSED FREE OF CHARGE, WE PROVIDE
     ABSOLUTELY NO WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE
     LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE LICENSOR PROVIDES THE
     SYSTEM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR
     IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
     MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK
     AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH THE USER.
     SHOULD THE SYSTEM PROVE DEFECTIVE, THE USER ASSUMES THE COST OF ALL
     NECESSARY SERVICING, REPAIR OR CORRECTION.
 . IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL THE LICENSOR BE
     LIABLE TO THE USER FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST
     MONIES, OR OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
     OUT OF THE USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS
     OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD
     PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAM)
     THE PROGRAM, EVEN IF THE USER HAS BEEN ADVISED OF THE POSSIBILITY OF
     SUCH DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY.
     The wording of this license agreement has been adapted from the
     license of the ALF system by Michael Hanus, Max-Planck-Institut
     Saarbruecken and the GnuEmacs General Public License (c) 1991 Free
     Software Foundation.

     /***************************************************************************/
     /* How to use the TreeTagger                                               */
     /* Author: Helmut Schmid, University of Stuttgart, Germany                 */
     /***************************************************************************/
     /*****************************************************************************/
     /* How to use the TreeTagger                                                 */
     /*****************************************************************************/
     The TreeTagger consists of two programs: train-tree-tagger is used to
-...
     input will be read from stdin. If neither an input file nor an output file
     is specified, the tagger will print to stdout.
     tree-tagger {-options-} <parameter file> {<input file> {<output file>}}
     tree-tagger <parameter file> <input file> <output file> {-eps <epsilon>}
            {-base} {-proto} {-sgml} {-token} {-lemma} {-beam <threshold>}
     Description of the command line arguments:
     * <parameter file>: Name of a parameter file which was created with the
       train-tree-tagger program.
     * <input file>: Name of the file which is to be tagged. Each token in this
       file has to be on a separate line. Tokens may contain blanks. It is possible
       file must be on a separate line. Tokens may contain blanks. It is possible
       to override the lexical information contained in the parameter file of the
       tagger by specifying a list of possible tags after a token. This list has
       to be preceded by a tab character and the elements are separated by tab
       characters. This pretagging feature could be used e.g. to ensure that
       to be preceded by a tab character. The tags are optionally followed by a
       floating point value to specify the probability of the tag. Adding such
       tag information in the tagger's input is sometimes useful to ensure that
       certain text-specific expressions are tagged properly.
       Punctuation marks must be on separate lines as well. Clitics (like "'s",
       "'re", and "'d" in English or "-la" and "-t-elle" in French) should be
       separated if they were separated in the training data. (The French and
       English parameter files available by ftp expect separation of clitics).
       English parameter files available by ftp, expect separation of clitics).
       Sample input file:
         He
         moved
         to
         New York City	NP
         New York City	NP 1.0
         .
     * <output file>: Name of the file to which the tagger should write its output.
-...
     * -lemma: tells the tagger to print the lemmas of the words also.
     * -sgml: tells the tagger to ignore tokens starting with '<' and ending
       with '>' (SGML tags).
     - -no-unknown: If an unknown word is encountered, emit the word form
       as lemma. This was previously the default behaviour. Now, the default
       behaviour is to print "<unknown>" as lemma.
     - -threshold <p>: This option tells the tagger to print all tags of a
       word with a probability higher than <p> times the largest probability.
       (The tagger will use a different algorithm in this case and the set of
       best tags might be different from the tags generated without this
       option.)
     - -prob: Print tag probabilities (in combination with option -threshold)
     - -pt-with-prob: If this option is specified, then each pretagging tag
       (see above) has to be followed by a whitespace and a tag probability
       value.
     - -pt-with-lemma: If this option is specified, then each pretagging tag
       (see above) has to be followed by a whitespace and a lemma. Lemmas may
       contain blanks.
       If both -pt-with-prob and -pt-with-lemma have been specified, then each
       pretagging tag is followed by a probability and a lemma in that order.
     The options below are for advanced users. Please, read the papers on the
     TreeTagger to fully understand their meaning.
     The options below are for advanced users. Read the papers on the TreeTagger
     to fully understand their meaning.
     * -proto: If this option is specified, the tagger creates a file named
       "lexicon-protocol.txt", which contains information about the degree of
-...
       hyphen has been found in the fullform lexicon.
     * -eps <epsilon>: Value which is used to replace zero lexical frequencies.
       This is the case if a word/tag pair is contained in the lexicon but not
       in the training corpus. The choice of this parameter has only minor
       influence on the tagging accuracy.
       in the training corpus. The default is 0.1. The choice of this parameter
       has some minor influence on tagging accuracy.
     * -beam <threshold>: If the tagger is slow, this option can be used to speed it up.
       Good values for <threshold> are in the range 0.001-0.00001.
     * -base: If this option is specified, only lexical information is used
       for tagging but no contextual information about the preceding tags.
       This option is only useful in order to obtain a baseline result
       to which to compare the actual tagger output.
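The selection rule stated for -threshold above (keep every tag whose probability is higher than <p> times the largest probability) can be illustrated with a small sketch. It only applies the rule to a distribution that is already known; it is not the tagger's own algorithm, which the text says differs when this option is used. The function name and the example distribution are assumptions made for illustration.

```python
def tags_above_threshold(tag_probs, p):
    """Return (tag, prob) pairs whose probability exceeds p times the largest one."""
    if not tag_probs:
        return []
    best = max(tag_probs.values())
    kept = [(tag, prob) for tag, prob in tag_probs.items() if prob > p * best]
    # Sort by descending probability so the most likely tag comes first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical distribution for a single word (not real tagger output).
example = {"NN": 0.45, "JJ": 0.35, "NP": 0.2}
selected = tags_above_threshold(example, 0.5)
```

With p = 0.5 the cutoff is half of the largest probability, so the least likely tag in the example is dropped.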
     There is another tagger program called "tree-tagger-flush" which
     flushes the output after reading an empty line. It expects a parameter
     file as argument and reads from stdin and writes to stdout. No command
     line options are supported. This program might be useful for
     implementing wrappers.
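The input format and the command synopsis described above could be prepared from a script roughly as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the parameter file name "english.par" is a placeholder, pretagging tags are written tab-separated, and a probability, when given, follows its tag after a blank. The command is only assembled, not executed.

```python
def format_input(tokens):
    """Render tokens in one-token-per-line format.

    Each item is either a plain string or a (token, pretags) pair, where
    pretags is a list of (tag, probability-or-None) pairs.
    """
    lines = []
    for item in tokens:
        if isinstance(item, str):
            lines.append(item)
            continue
        token, pretags = item
        fields = [token]
        for tag, prob in pretags:
            # Assumption: a blank separates the tag from its probability.
            fields.append(tag if prob is None else f"{tag} {prob}")
        # The tag list is preceded by a tab; tags are tab-separated here.
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


def build_command(parameter_file, input_file=None, output_file=None, options=()):
    """Assemble an argument list that follows the synopsis in the text."""
    if output_file is not None and input_file is None:
        raise ValueError("an output file requires an input file")
    command = ["tree-tagger", *options, parameter_file]
    if input_file is not None:
        command.append(input_file)
    if output_file is not None:
        command.append(output_file)
    return command


# Tokens may contain blanks; punctuation goes on its own line.
text = format_input(["He", "moved", "to", ("New York City", [("NP", 1.0)]), "."])
# "english.par", "in.txt" and "out.txt" are placeholder file names.
argv = build_command("english.par", "in.txt", "out.txt", options=["-token", "-lemma"])
```

The resulting list could be handed to a process launcher such as subprocess.run once the binary and a real parameter file are available.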
     Training
     --------
     Training is done with the *train-tree-tagger* program. It expects at least
     four command line arguments which are described below.
     train-tree-tagger {options} <lexicon> <open class file> <input file> <output file>
     train-tree-tagger <lexicon> <open class file> <input file> <output file>
                 {-cl <context length>} {-dtg <min. decision tree gain>}
                 {-ecw <eq. class weight>} {-atg <affix tree gain>} {-st <sent. tag>}
     Description of the command line arguments:
     * <lexicon>: name of a file which contains the fullform lexicon. Each line
       of the lexicon corresponds to one word form and contains the word form
       and a sequence of tag-lemma pairs. Each tag is preceded by a tab character
       and each lemma is preceded by a blank or tab character.
       itself followed by a Tab character and a sequence of tag-lemma pairs.
       The tags and lemmata are separated by whitespace.
       Example:
     aback	RB aback
     abacuses	NNS abacus
     abandon	VB abandon	VBP abandon
     abandoned	JJ abandoned	VBD abandon	VBN abandon
     abandon	VB abandon VBP abandon
     abandoned	JJ abandoned VBD abandon VBN abandon
     abandoning	VBG abandon
       Attention: Ordinal and cardinal numbers which consist of digits
       (like 1, 13, 1278 or 2. and 75.) should not be included in the
       lexicon. Otherwise, the tagger will not be able to learn how to tag
       numbers which are not listed in the lexicon. Numbers with unusual
       tags should be added to the lexicon, however. If the training
       program reports an error because the POS tag used for numbers is
       unknown, you should add a lexicon entry for one number.
       Remark: The tagger doesn't need the lemmata actually. If you do not have
       the lemma information or if you do not plan to annotate corpora with
       lemmas, you can replace the lemma with a dummy value, e.g. "-".
       Remark: The tagger doesn't need the lemmata for tagging actually. If
       you do not have the lemma information or if you do not plan to
       annotate corpora with lemmas, you can replace the lemma with a dummy
       value, e.g. "-".
     * <open class file>: name of a file which contains a list of open class tags
       i.e. possible tags of unknown word forms separated by whitespace.
     * <open class file>: name of a file which contains a list of open class
       tags i.e. possible tags of unknown word forms separated by whitespace.
       The tagger will use this information when it encounters unknown words,
       i.e. words which are not contained in the lexicon.
       Example: (for Penn Treebank tagset)
-...
     * <input file>: name of a file which contains tagged training data. The data
       must be in one-word-per-line format. This means that each line contains
       one token and one tag in that order separated by a tabulator.
       Punctuation marks are considered as tokens and must be tagged as well.
       The file should neither contain empty lines nor untagged SGML markup.
       Punctuation marks are considered as tokens and must have been tagged as well.
       Example:
     Pierre  NP
-...
       this parameter to 1.
     * -dtg <min. decision tree gain>: Threshold - If the information gain at a
       leaf node of the decision tree is below this threshold, the node is deleted.
     * -sw <weight>: A smoothing parameter, which determines how much the
       probability distribution of some decision tree node is smoothed with the
       probability distribution of the parent node.
       The default value is 0.7.
     * -ecw <eq. class weight>: weight of the equivalence class based probability
       estimates.
       estimates. The default is 0.15.
     * -atg <affix tree gain>: Threshold - If the information gain at a leaf of an
       affix tree is below this threshold, it is deleted. The default is 1.2.
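The training data format described above (one token and one tag per line, separated by a tab, with no empty lines) can be checked before training. The following is a sketch only; the function name and the sample lines are assumptions for illustration and are not part of the TreeTagger package.

```python
def check_training_lines(lines):
    """Return (line_number, message) pairs for lines that break the format."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            problems.append((number, "empty line"))
            continue
        parts = line.split("\t")
        # Exactly one tab must separate a non-empty token from a non-empty tag.
        if len(parts) != 2 or not parts[0] or not parts[1]:
            problems.append((number, "expected <token> TAB <tag>"))
    return problems


# Hypothetical sample: line 2 uses a blank instead of a tab, line 3 is empty.
sample = ["Pierre\tNP\n", "Vinken NP\n", "\n", ",\t,\n"]
issues = check_training_lines(sample)
```

Reporting line numbers rather than raising on the first error makes it easier to clean a large corpus in one pass.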
     The accuracy of the TreeTagger usually improves, if different settings
     of the above parameters are tested and the best combination is chosen.
     Caveat: Make sure that the lexicon and the training corpus contain no
     extra blanks. If the word form, for instance, is followed by a blank
     and a tab character, the blank will be considered part of the word.
     The accuracy of the TreeTagger is usually slightly improved, if different
     settings of the above parameters are tested and the best combination is
     chosen.
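The lexicon layout described in this section (a word form, a tab character, then tag-lemma pairs separated by whitespace) can be read with a few lines of code. This is a sketch, not the parser used by train-tree-tagger: the function name is hypothetical, and it assumes that tags and lemmas contain no blanks. It also rejects a blank directly before the tab, which would otherwise be taken as part of the word form.

```python
def parse_lexicon_line(line):
    """Split a lexicon line into (word_form, [(tag, lemma), ...])."""
    line = line.rstrip("\n")
    word_form, sep, rest = line.partition("\t")
    if not sep:
        raise ValueError("missing tab after the word form")
    if word_form != word_form.strip():
        # A blank next to the tab would be treated as part of the word.
        raise ValueError(f"extra blank in word form {word_form!r}")
    fields = rest.split()
    if not fields or len(fields) % 2 != 0:
        raise ValueError("expected a non-empty sequence of tag-lemma pairs")
    # Pair every tag (even positions) with the lemma that follows it.
    pairs = list(zip(fields[0::2], fields[1::2]))
    return word_form, pairs


entry = parse_lexicon_line("abandoned\tJJ abandoned\tVBD abandon\tVBN abandon")
```

Because the pairs are split on any whitespace, the same function accepts both layouts shown in the example, with pairs separated by tabs or by blanks.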

     This package contains the TreeTagger, a probabilistic part-of-speech
     tagger developed by Helmut Schmid. All rights are reserved by the
     Institute for Computational Linguistics at the University of Stuttgart.
     The programs have been compiled for Sun Sparcstations with SunOS operating
     system version 5.6 or higher.
     tagger developed by Helmut Schmid. All rights are reserved by Helmut
     Schmid.
     Files contained in this package:
-...
     - README                How to use the tagger
     - bin/train-tree-tagger training program
     - bin/tree-tagger       tagger program
     - bin/separate-punctuation program for tokenization (used by the shell scripts)
     - cmd/lookup.perl       Perl script for pretagging
     - doc/nemlap94.ps       paper describing the TreeTagger
     - doc/sigdat95.ps       paper describing the TreeTagger
     This package can be downloaded at
     http://www.ims.uni-stuttgart.de/Tools/DecisionTreeTagger.html
     http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger
     Also available at this URL:
     - parameter files
