/ - Diff - Plateforme TXM - Forge du Centre Blaise Pascal

        <feature url="features/org.txm.wordcloud.feature_1.0.0.1660.jar" id="org.txm.wordcloud.feature" version="1.0.0.1660">
           <category name="Commands"/>
        </feature>
-       <feature url="features/org.txm.treetagger.binaries.feature_1.0.0.1660.jar" id="org.txm.treetagger.binaries.feature" version="1.0.0.1660" os="linux,macosx,win32" ws="cocoa,gtk,win32">
+       <feature url="features/org.txm.treetagger.binaries.feature_1.0.0.1669.jar" id="org.txm.treetagger.binaries.feature" version="1.0.0.1669" os="" ws="">
           <category name="Annotation"/>
        </feature>
        <feature url="features/org.txm.treetagger.models.feature_1.0.0.1660.jar" id="org.txm.treetagger.models.feature" version="1.0.0.1660">

tmp/org.txm.treetagger.core.linux/META-INF/MANIFEST.MF (revision 1670)
     Bundle-Version: 1.0.0.qualifier
     Fragment-Host: org.txm.treetagger.core;bundle-version="1.0.0"
     Bundle-RequiredExecutionEnvironment: JavaSE-1.7
-    Eclipse-PlatformFilter: (osgi.os=linux)

     #Fri Jul 06 10:25:03 CEST 2018
     output..=bin/
-    bin.includes=META-INF/,.,plugin.xml,icons/,OSGI-INF/l10n/bundle.properties
+    bin.includes = META-INF/,\
+                   .,\
+                   plugin.xml,\
+                   icons/,\
+                   OSGI-INF/l10n/bundle.properties,\
+                   OSGI-INF/l10n/bundle_fr.properties,\
+                   OSGI-INF/l10n/bundle_ru.properties
     source..=src/
     qualifier=svn

     bin.includes = META-INF/,\
                    .,\
                    plugin.xml,\
                    OSGI-INF/l10n/bundle.properties,\
-                   icons/
+                   icons/,\
+                   OSGI-INF/
     source..=src/
     qualifier=svn

     #Fri Jul 06 10:25:03 CEST 2018
     output..=bin/
-    bin.includes=META-INF/,.,plugin.xml
+    bin.includes = META-INF/,\
+                   .,\
+                   plugin.xml,\
+                   OSGI-INF/
     source..=src/
     qualifier=svn

           id="org.txm.treetagger.binaries.feature"
           label="TreeTagger software"
           version="1.0.0.qualifier"
-          provider-name="Textometrie.org"
-          os="linux,macosx,win32"
-          ws="cocoa,gtk,win32">
+          provider-name="Textometrie.org">
        <description url="http://www.example.com/description">
           Install TreeTagger software / Installation du logiciel TreeTagger
-...
        <plugin
              id="org.txm.treetagger.core.linux"
              os="linux"
              ws="gtk"
              download-size="0"
              install-size="0"
              version="0.0.0"
-...
        <plugin
              id="org.txm.treetagger.core.macosx"
              os="macosx"
              ws="cocoa"
              download-size="0"
              install-size="0"
              version="0.0.0"
-...
        <plugin
              id="org.txm.treetagger.core.win32"
              os="win32"
              ws="win32"
              download-size="0"
              install-size="0"
              version="0.0.0"

     		String osname = System.getProperty("os.name").toLowerCase();
     		if (osname.contains("windows")) {
     			osname = "win32";
-    		} else if (osname.contains("macosx")) {
+    		} else if (osname.contains("mac os x")) {
     			osname = "macosx";
     		} else {
     			osname = "linux";
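
The hunk above tests the lowercased `os.name` value against "mac os x". A minimal, self-contained sketch of the same mapping follows; the class and method names (`OsNames`, `normalize`) are hypothetical and not part of the TXM code base, and the Linux fallback simply mirrors the final `else` branch of the hunk.

```java
import java.util.Locale;

/** Hypothetical helper; names are illustrative and not TXM API. */
final class OsNames {
    private OsNames() {}

    /** Maps a raw os.name value to "win32", "macosx" or "linux". */
    static String normalize(String rawOsName) {
        String name = rawOsName.toLowerCase(Locale.ROOT);
        if (name.contains("windows")) {
            return "win32";
        }
        // macOS JVMs report "Mac OS X", so the lowercased test needs the spaces.
        if (name.contains("mac os x")) {
            return "macosx";
        }
        // Same fallback as the hunk: anything else is treated as Linux.
        return "linux";
    }

    public static void main(String[] args) {
        System.out.println(normalize(System.getProperty("os.name")));
    }
}
```

Passing the raw value in as a parameter, rather than reading the system property inside the method, keeps the mapping testable on any platform.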

tmp/org.txm.treetagger.core.macosx/META-INF/MANIFEST.MF (revision 1670)
     Bundle-Version: 1.0.0.qualifier
     Fragment-Host: org.txm.treetagger.core
     Bundle-RequiredExecutionEnvironment: JavaSE-1.7
-    Eclipse-PlatformFilter: (osgi.os=macosx)

tmp/org.txm.treetagger.core.win32/META-INF/MANIFEST.MF (revision 1670)
     Bundle-Version: 1.0.0.qualifier
     Fragment-Host: org.txm.treetagger.core;bundle-version="1.0.0"
     Bundle-RequiredExecutionEnvironment: JavaSE-1.7
-    Eclipse-PlatformFilter: (osgi.os=win32)

     /****************************************************************************/
     /* How to use the TreeTagger                                                */
     /*                                                                          */
     /* Author: Helmut Schmid, CIS, Ludwig-Maximilians-Universität, Germany      */
     /****************************************************************************/
     The TreeTagger consists of two programs: the training program creates
     a parameter file from a fullform lexicon and a handtagged corpus. The
     tagger program reads the parameter file and annotates the text with
     part of speech and lemma information. Both programs print information
     about their usage when they are called without arguments.
     Tagging
     -------
     Tagging is done with the *tree-tagger* program.
     The first argument is the name of a parameter file which was generated
     with the train-tree-tagger program. Parameter files generated on
     different platforms or with older versions of train-tree-tagger will
     not work.
     The second argument is the input file. It must be in one-word-per-line
     format, i.e. each line contains one token (word, punctuation character
     or parenthesis) and should not exceed 1000 characters. Tokens may contain
     blanks. It is possible to override the lexical information contained
     in the parameter file of the tagger by specifying a list of possible
     tags after the token. This list has to be preceded by a tab character
     and the elements are separated by tab characters. Pretagging could be
     used e.g. to ensure that certain text-specific expressions are tagged
     correctly. Clitics (like "'s", "'re", and "'d" in English or "-la" and
     "-t-elle" in French) have to be separated if they were separated in
     the training data. (The French and English parameter files available
     by ftp expect separation of clitics.)
     Sample input file:
     He
     moved
     to
     New York City	NP
+    .
     The third argument is the name of the output file. The output is also
     in one-word-per-line format. Depending on the specified options, it
     will contain columns with tokens, tags and lemmas. If the third
     argument is missing, the output will be printed to standard output. If
     the second argument is missing as well, input is read from standard
     input.
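
The input format described above (one token per line, optionally followed by tab-separated pretagging tags) can be produced with a small helper. This is only a sketch: the class and method names (`TaggerInput`, `line`) are hypothetical, and rejecting tabs and newlines inside a token is an assumption drawn from the format description, not something the README spells out.

```java
/** Hypothetical helper for writing tree-tagger input lines. */
final class TaggerInput {
    private TaggerInput() {}

    /** Formats one input line: the token, then each pretagging tag preceded by a tab. */
    static String line(String token, String... pretags) {
        // Tokens may contain blanks, but a tab would be read as the start of the tag list.
        if (token.indexOf('\t') >= 0 || token.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("token must not contain tab or newline");
        }
        StringBuilder sb = new StringBuilder(token);
        for (String tag : pretags) {
            sb.append('\t').append(tag);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reproduces the sample input file shown above.
        System.out.println(line("He"));
        System.out.println(line("moved"));
        System.out.println(line("to"));
        System.out.println(line("New York City", "NP"));
        System.out.println(line("."));
    }
}
```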
     Options:
     -token: Prints the token as well.
     -lemma: Prints the lemma as well.
     -sgml:  Don't tag SGML annotations, i.e. lines starting with '<' and ending
             with '>'.
     -threshold <p>: Print all tags with a probability higher than <p> times the
             probability of the best tag.
     -prob:  Print tag probabilities (requires option -threshold)
     -no-unknown: Print the token rather than <unknown> for unknown lemmas
     -quiet: Don't print status messages
     -pt-with-lemma: If this option is specified, then each pretagging tag
             (see above) has to be followed by whitespace and a lemma.
     -pt-with-prob: If this option is specified, then each pretagging tag
             (see above) has to be followed by whitespace and a tag probability
             value. If -pt-with-prob and -pt-with-lemma have been specified,
             then each pretagging tag is followed by a probability and a lemma
             in that order.
     -files f: Read the names of input and output files pairwise from the
             file f. The format of f is the lexicon file format described below.
     -lex f: Read auxiliary lexicon entries from the file f.
     -eos-tag <tag>: The SGML tag <tag> signals the end of a sentence.
             This option implies the option -sgml
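
The -threshold rule in the list above keeps every tag whose probability is higher than <p> times the probability of the best tag. The sketch below illustrates only that arithmetic on an in-memory map; it is not TreeTagger's implementation, and the names `ThresholdFilter` and `keep` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the -threshold selection rule. */
final class ThresholdFilter {
    private ThresholdFilter() {}

    /** Returns the tags whose probability is strictly higher than p times the best probability. */
    static Map<String, Double> keep(Map<String, Double> tagProbs, double p) {
        double best = 0.0;
        for (double v : tagProbs.values()) {
            best = Math.max(best, v);
        }
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : tagProbs.entrySet()) {
            if (e.getValue() > p * best) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> probs = new LinkedHashMap<>();
        probs.put("NN", 0.8);
        probs.put("VB", 0.15);
        probs.put("JJ", 0.05);
        // With p = 0.1 the cutoff is 0.1 * 0.8, so NN and VB pass and JJ does not.
        System.out.println(keep(probs, 0.1).keySet());
    }
}
```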
     Some more exotic options:
     -proto: Print lexical information for each word
       The lexicon type is signalled by one of the characters
       f: The word was found in the full form lexicon.
       c: The word in lowercase was found in the lexicon
       h: The word contains a hyphen and the word following the hyphen was found
          in the full form lexicon; e.g. instead of "table-wine" only "wine" has
          been found.
       s: The word has been looked up in the suffix lexicon
       p: Tags have been assigned by pretagging.
     -gramotron: Same as -proto but with a different format
     -proto-with-prob: Same as -proto but with lexical tag probabilities
     -print-prob-tree: Print the transition probability tree and exit
     -eps <epsilon>: Value which is used to replace zero lexical frequencies.
       Zero frequencies occur when a word/tag pair is contained in the lexicon
       but not in the training corpus. The default is 0.1.
     -base:  Use only lexical probabilities for tagging. This option is only
       useful to obtain a baseline result to which the actual tagger output is
       compared.
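
As a sketch of how a Java caller might assemble a tagging command from the pieces described above: the README fixes only the order of the three file arguments (parameter file, input file, output file), so placing the options before them is an assumption here. The class name `TaggerCommand` and the file names in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for a tree-tagger command line. */
final class TaggerCommand {
    private TaggerCommand() {}

    /** Returns the argument list: program, options, parameter file, input file, output file. */
    static List<String> build(String parFile, String inFile, String outFile, String... options) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tree-tagger");
        // Assumption: options precede the positional file arguments.
        for (String option : options) {
            cmd.add(option);
        }
        cmd.add(parFile);
        cmd.add(inFile);
        cmd.add(outFile);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("french-utf8.par", "in.txt", "out.txt", "-token", "-lemma", "-sgml");
        System.out.println(String.join(" ", cmd));
        // To run it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain blanks.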
     Training
     --------
     Training is done with the *train-tree-tagger* program. If the program is
     called without arguments, the following output is printed:
     USAGE: train-tree-tagger <lexicon> <open class file> <infile> <outfile>
            {-cl <context length>} {-dtg <min. decision tree gain>}
            {-ecw <eq. class weight>} {-atg <affix tree gain>} {-st <sent. tag>}
     Description of the command line arguments:
     * <lexicon>: name of a file which contains the fullform lexicon. Each line
       of the lexicon corresponds to one word form and contains the word form
       itself followed by a Tab character and a sequence of tag-lemma pairs.
       The tags and lemmata are separated by whitespace.
     Example:
     aback	RB aback
     abacuses	NNS abacus
     abandon	VB abandon	VBP abandon
     abandoned	JJ abandoned	VBD abandon	VBN abandon
     abandoning	VBG abandon
       Important: Ordinal and cardinal numbers which consist of digits
       should not be included in the lexicon. Otherwise, the tagger will
       not be able to learn how to tag numbers which are not listed in the
       lexicon. Numbers with unusual tags should be added to the lexicon,
       however.
       Remark: The tagger doesn't need the lemmata for tagging. If
       you do not have the lemma information or if you do not plan to
       annotate corpora with lemmas, you can replace the lemma with a dummy
       value, e.g. "-".
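
The lexicon line format described above (word form, a tab, then tag-lemma pairs separated by whitespace) can be read with a few lines of Java. This is a minimal sketch with hypothetical names (`LexiconLine`, `parse`); it assumes that neither tags nor lemmas contain whitespace, which holds for the example lines but is not guaranteed by the README.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for one line of a fullform lexicon. */
final class LexiconLine {
    final String wordForm;
    /** Each element is a two-element array {tag, lemma}. */
    final List<String[]> tagLemmaPairs;

    private LexiconLine(String wordForm, List<String[]> tagLemmaPairs) {
        this.wordForm = wordForm;
        this.tagLemmaPairs = tagLemmaPairs;
    }

    static LexiconLine parse(String line) {
        // The word form ends at the first tab; it may itself contain blanks.
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("missing tab after word form: " + line);
        }
        String wordForm = line.substring(0, tab);
        String[] fields = line.substring(tab + 1).trim().split("\\s+");
        // An empty remainder yields one empty field, which the parity check rejects.
        if (fields.length % 2 != 0) {
            throw new IllegalArgumentException("expected tag-lemma pairs: " + line);
        }
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < fields.length; i += 2) {
            pairs.add(new String[] {fields[i], fields[i + 1]});
        }
        return new LexiconLine(wordForm, pairs);
    }

    public static void main(String[] args) {
        LexiconLine entry = parse("abandoned\tJJ abandoned\tVBD abandon\tVBN abandon");
        for (String[] pair : entry.tagLemmaPairs) {
            System.out.println(entry.wordForm + " -> tag " + pair[0] + ", lemma " + pair[1]);
        }
    }
}
```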
     * <open class file>: name of a file which contains a list of open class tags
       i.e. possible tags of unknown word forms. This information is needed to
       estimate likely tags of unknown words. This file would typically contain
       adverb, adjective, noun, proper name and perhaps verb tags, but not
       prepositions, determiners, pronouns or numbers.
     * <input file>: name of a file which contains tagged training data. The data
       must be in one-word-per-line format. This means that each line contains
       one token and one tag in that order separated by a tab character.
       Punctuation marks are considered as tokens and must have been tagged as well.
     Example:
     Pierre	NP
     Vinken	NP
     ,	,
     61	CD
     years	NNS
     * <output file>: name of the file in which the resulting tagger parameters
       are stored.
     The following parameters are optional:
     * -cl <context length>: number of preceding words forming the tagging
       context. The default is 2 which corresponds to a trigram context. For
       small training corpora and/or large tagsets, it could be useful to reduce
       this parameter to 1.
     * -dtg <min. decision tree gain>: Threshold - If the information gain at a
       leaf node of the decision tree is below this threshold, the node is deleted.
       The default value is 0.7.
     * -ecw <eq. class weight>: weight of the equivalence class based probability
       estimates. The default is 0.15.
     * -atg <affix tree gain>: Threshold - If the information gain at a leaf of an
       affix tree is below this threshold, it is deleted. The default is 1.2.
     * -st <sent. tag>: the end-of-sentence part-of-speech tag, i.e. the tag which
       is assigned to sentence punctuation like ".", "!", "?".
       Default is "SENT". It is important to set this option properly if your
       tag for sentence punctuation is not "SENT".
     The accuracy of the TreeTagger usually improves a bit if different
     settings of the above parameters are tested and the best combination
     is chosen.
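
Testing different parameter settings, as suggested above, is easier when the training command is assembled programmatically. The sketch below follows the USAGE line (four file arguments, then the optional parameters) and starts from the documented defaults; the class name `TrainCommand`, its field names and the file names in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for a train-tree-tagger command line. */
final class TrainCommand {
    // Defaults as documented above.
    int contextLength = 2;
    double minDecisionTreeGain = 0.7;
    double eqClassWeight = 0.15;
    double affixTreeGain = 1.2;
    String sentenceTag = "SENT";

    /** Returns the argument list in the order given by the USAGE line. */
    List<String> build(String lexicon, String openClassFile, String inFile, String outFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("train-tree-tagger");
        cmd.add(lexicon);
        cmd.add(openClassFile);
        cmd.add(inFile);
        cmd.add(outFile);
        cmd.add("-cl");
        cmd.add(Integer.toString(contextLength));
        cmd.add("-dtg");
        cmd.add(Double.toString(minDecisionTreeGain));
        cmd.add("-ecw");
        cmd.add(Double.toString(eqClassWeight));
        cmd.add("-atg");
        cmd.add(Double.toString(affixTreeGain));
        cmd.add("-st");
        cmd.add(sentenceTag);
        return cmd;
    }

    public static void main(String[] args) {
        TrainCommand train = new TrainCommand();
        // A smaller context, as recommended for small corpora or large tagsets.
        train.contextLength = 1;
        System.out.println(String.join(" ", train.build("lexicon.txt", "openclass.txt", "train.txt", "out.par")));
    }
}
```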

     /****************************************************************************/
     /* How to install the Windows version of the TreeTagger                     */
     /*                                                                          */
     /* Author: Helmut Schmid, CIS, Ludwig-Maximilians-Universität, Germany      */
     /****************************************************************************/
     This is the Windows distribution of the TreeTagger.
     It contains the following files:
     - tree-tagger.exe: the tagger program
     - train-tree-tagger.exe: the training program
     - utf8-tokenize.perl: A Perl script which transforms the tagger input
                         into one-word-per-line format
     - *-abbreviations:  abbreviation lists required by the tokenizer
     - tag-*.bat:        batch files for different languages which call
                         the tokenizer and the tagger
     - chunk-*.bat:      batch files for POS tagging and chunking
     Installation
     ------------
     1. Install a Perl interpreter (if you have not already installed one).
        You can download a Perl interpreter for Windows for free at
        http://www.activestate.com/activeperl/
     2. Move the TreeTagger directory to the root directory of drive C:.
     3. Download the PC parameter files for the languages you need, decompress
        them (e.g. using Winzip or 7zip) and move them to the subdirectory lib.
        Rename the parameter files to <language>-utf8.par
        Example: Rename french-par-linux-3.2-utf8.bin to french-utf8.par
        Non-UTF8 parameter files are not supported anymore.
     4. Add the path C:\TreeTagger\bin to the PATH environment variable.
     5. Open a shell and type the command
        set PATH=C:\TreeTagger\bin;%PATH%
     6. Change to the directory C:\TreeTagger
     7. Now you can test the tagger, e.g. by analyzing this file with the command
        tag-english INSTALL.txt
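
The renaming rule in the installation steps above (<language>-utf8.par) can be sketched as a small helper. It assumes that downloaded file names follow the `<language>-par-...` pattern of the single example given; the README does not state this pattern in general, and the names `ParFileName` and `target` are hypothetical.

```java
/** Hypothetical helper deriving the expected parameter file name. */
final class ParFileName {
    private ParFileName() {}

    /** Maps e.g. "french-par-linux-3.2-utf8.bin" to "french-utf8.par". */
    static String target(String downloadedName) {
        // Assumption: the language is everything before the first "-par-".
        int idx = downloadedName.indexOf("-par-");
        if (idx <= 0) {
            throw new IllegalArgumentException("unexpected parameter file name: " + downloadedName);
        }
        return downloadedName.substring(0, idx) + "-utf8.par";
    }

    public static void main(String[] args) {
        System.out.println(target("french-par-linux-3.2-utf8.bin"));
    }
}
```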
     If you install the TreeTagger in a different directory, you have to
     modify the first path in the batch files tag-*.bat.
     Michaela Atterer told me that she had difficulties installing the
     TreeTagger on a Windows XP system. She recommends the following
     work-around.
     1. Windows XP:
     -Right click on "My Computer"
     -Select the "Advanced" tab
     -Click on "Environment Variables"
     -Click on New: enter PATH and C:\TreeTagger\bin\;%PATH%
     If the files have been unpacked into a single directory, you should
     restore the following directory structure:
     TreeTagger:
     INSTALL.txt  README.txt  bin  cmd  lib
     TreeTagger/bin:
     tag-english.bat  tag-german.bat   tag-spanish.bat        tree-tagger.exe
     tag-french.bat   tag-italian.bat  train-tree-tagger.exe
     TreeTagger/cmd:
     mwl-lookup.perl  tokenize.pl
     TreeTagger/lib:
     english-abbreviations  german-abbreviations   spanish-abbreviations
     french-abbreviations   italian-abbreviations  spanish-mwls
     Note that the TreeTagger comes without a graphical interface. You have
     to run it by entering a command in a command line window. If you prefer
     a graphical interface, try the one provided by Ciarán Ó Duibhín at
     http://www.smo.uhi.ac.uk/~oduibhin/oideasra/interfaces/winttinterface.htm




Revision 1670