       <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
       <html xmlns="http://www.w3.org/1999/xhtml">
           <head>
               <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
               <title></title>
           </head>
           <body>
               <div style="padding-left:25px;">
                   <h1>SRCMF corpus: TIGERSearch web interface</h1>
                   <h2>Contents</h2>
                   <ul>
                       <li><a href="#interface">Using the TIGERSearch web interface</a></li>
                       <li><a href="#query">Writing a simple query</a></li>
                       <li><a href="#concordances">Exporting a concordance</a></li>
                       <li><a href="#tags">Tagset used</a></li>
                       <li><a href="#sample">Sample queries</a></li>
                   </ul>
                   <h2><a name="interface"></a>Using the TIGERSearch web interface</h2>
                   <h3>Writing a query and browsing the results</h3>
                   <p>In the TigerSearch tab, queries are entered in the top panel, and matching sentences
                       are shown in tree form in the bottom panel. A tutorial on TigerSearch queries may be
                       found in the section “<a href="#query">Writing a simple query</a>”.</p>
                   <ul>
                        <li>Type your query in the top panel (e.g. <tt>#pivot:[word = "Tristran"]</tt>).</li>
                       <li>Click on the ‘Search’ button at the bottom right of the panel.</li>
                   </ul>
                   <p>If the query is well-formed, and if there are matching results in the corpus, the
                       first tree in the forest will appear in the bottom panel.</p>
                   <p>The central bar gives the number of matches and the position of the sentence in the
                       corpus, in the form <i>sent: [sentence number] [match number] / [total matching
                           sentences].</i> Note that subgraph navigation is not yet implemented, and the
                       interface does <strong>not</strong> show the total number of matches, only the
                       number of matching sentences. You can navigate through the forest of matches using
                       the forward and back arrows on this bar. The ‘Export’ button displays the current
                        tree as an SVG file in the browser, which can then be saved to disk. The ‘Export
                       Concordance’ button allows matching sentences to be exported in <a
                           href="#concordances">concordance form</a>.</p>
                   <h3>Exporting the results</h3>
                   <p>To export the results of your query, click the ‘Export Concordance’ button. An export
                       window will appear, with the following options:</p>
                   <ul>
                       <li><p><strong>Type</strong></p>
                           <p>Three concordances are currently implemented:</p>
                           <ul>
                               <li>basic concordance</li>
                               <li>single word pivot concordance</li>
                               <li>pivot and block concordance</li>
                           </ul>
                           <p>It is important to note that these concordances use the names of TigerSearch
                               variables from the query to structure the concordance. <strong>No
                                   concordance will be produced if your query does not contain a
                                       <tt>#pivot</tt> variable.</strong> The pivot and block concordance
                               requires at least one additional <tt>#blockXX</tt> variable.</p>
                           <p>Further documentation for these concordances may be found in the section “<a
                                   href="#concordances">Exporting a concordance</a>”.</p></li>
                       <li><p><strong>Context (number of words)</strong></p>
                           <p>Sets the size of the context preceding and following the pivot.</p></li>
                       <li><p><strong>Restore punctuation</strong></p>
                           <p>Adds punctuation from the BFM’s digitized edition to the exported
                               concordance. It will also restore words excluded from the TIGERSearch corpus
                               (e.g. lacunae, AOI in the <i>Chanson de Roland</i>).</p></li>
                       <li><p><strong>Properties to show in concordance</strong></p>
                           <p>Select which features of terminal and non-terminal nodes should be shown in
                               the concordance. This function is only active for the ‘pivot and block
                               concordance’.</p></li>
                   </ul>
                   <p>When you have filled in the form:</p>
                   <ul>
                       <li>Click the ‘OK’ button.</li>
                   </ul>
                   <p>After a short delay, a new tab will open in your browser, containing the concordance
                       in plain text tabular format (.csv).</p>
                   <ul>
                       <li>Save this file to disk using the ‘File &gt; Save As...’ menu in your
                           browser.</li>
                   </ul>
                   <h3>Viewing the concordance</h3>
                   <p>To view and manipulate the concordance, you will need to use a spreadsheet
                       package.</p>
                   <ul>
                       <li>Open the spreadsheet application.</li>
                        <li>Select ‘File &gt; Open...’ from the menu bar.</li>
                       <li>Ensure that the file list is showing either ‘All files’ or ‘CSV text
                           files’.</li>
                       <li>Select the saved .csv file.</li>
                   </ul>
                   <p>You will need to correctly configure your spreadsheet software to read the file. We
                       recommend using LibreOffice or OpenOffice Calc, which will prompt the user for
                       settings whenever a .csv file is opened. The following settings are required for the
                       import to function:</p>
                   <ul>
                       <li>Character set: Unicode (UTF-8);</li>
                       <li>Separated by Tab (ONLY);</li>
                       <li>Merge delimiters OFF;</li>
                       <li>Text delimiter: NONE (empty box)</li>
                   </ul>
                   <p>Troubleshooting likely problems:</p>
                   <ul>
                       <li>If accented characters do not appear correctly &gt; check the character set is
                           UTF-8;</li>
                       <li>If some rows do not seem to have the correct number of columns &gt; check that
                           Text Delimiter is set to nothing (the default is usually double quote, which
                           will cause an error where the text contains double quotes), merge delimiters is
                           OFF, and TAB is the only separator selected.</li>
                       <li>If zeros appear rather than punctuation (unlikely) &gt; use the ‘Fields’ section
                           of the import window to set every column type to ‘Text’ rather than
                           ‘Standard’.</li>
                   </ul>
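                    <p>For readers who prefer a script to a spreadsheet, the import settings above can be
                        mirrored with the standard <tt>csv</tt> module in Python. This is only a sketch:
                        the sample rows and the function name <tt>read_concordance</tt> are invented for
                        illustration, and the column layout of a real export depends on the concordance
                        type.</p>

```python
import csv
import io

def read_concordance(stream):
    """Read a tab-separated concordance without any text delimiter.

    delimiter="\t" corresponds to 'Separated by Tab (ONLY)'.
    quoting=csv.QUOTE_NONE corresponds to 'Text delimiter: NONE', so a
    double quote inside the text is kept as an ordinary character.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader]

# Invented sample data: four columns, one cell containing a double quote.
sample = 'id_1\tleft "context\tTristran\tright context\nid_2\t\tTristran\ttes niés\n'
rows = read_concordance(io.StringIO(sample))

for row in rows:
    print(len(row), row)
```

                    <p>To read a saved export instead of the sample string, the file would be opened
                        with <tt>open(path, encoding="utf-8", newline="")</tt> and passed to the same
                        function.</p>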
                   <h2><a name="query"></a>Writing a simple query</h2>
                   <p>The following section will enable you to write simple TIGERSearch queries for the
                       SRCMF corpus. It is not comprehensive, and must be read in conjunction with:</p>
                   <ul>
                       <li>chapter III of the <a target="_blank"
                               href="http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/manual_html.html"
                               >TIGERSearch user’s guide</a></li>
                   </ul>
                   <h3>Nodes in the TS graph</h3>
                   <p>A TigerSearch graph is made up of two types of nodes: terminal and non-terminal
                       nodes. In the graph viewer, terminal nodes appear at the bottom of the graph, while
                       non-terminal nodes are represented by labelled white ovals, as shown in the example
                           <i>je puis dire</i>.</p>
                   <img src="images/jepuisdire.png" alt="Example TIGERSearch tree" />
                    <p>Each node has a number of features (see section “<a href="#tags">Tagset used</a>”).</p>
                   <h4>SRCMF: ‘split’ nodes</h4>
                   <p>In a true dependency graph, words form the only nodes.</p>
                   <p>In the TigerXML SRCMF corpus, each ‘word’ in the dependency structure is in fact
                       split between a terminal node (which contains the lexical form and the PoS tag of
                       the word itself) and a non-terminal node (which contains the syntactic features of
                       the structure headed by the word). The non-terminal node and the terminal node are
                       linked by an edge labelled ‘L’ (for lexical realization).</p>
                   <p>In the example tree, an ‘L’ edge links:</p>
                   <ul>
                       <li>the terminal node <i>puis</i> to the non-terminal node ‘Snt’: these nodes
                           represent the finite verb which heads the sentence;</li>
                       <li>the terminal node <i>je</i> to the non-terminal node ‘SjPer’: these nodes
                           represent the subject of the sentence <i>je</i>;</li>
                       <li>the terminal node <i>dire</i> to the non-terminal node ‘AuxA’: these nodes
                           represent the infinitive verb <i>dire</i>.</li>
                   </ul>
                   <p>A ‘D’ edge links the ‘Snt’ node to the non-terminal nodes ‘SjPer’ and ‘AuxA’: this
                       indicates that the subject <i>je</i> and the ‘auxiliated’ infinitive <i>dire</i>
                       depend on the main verb <i>puis</i>.</p>
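                    <p>The split between terminal and non-terminal nodes can be pictured as a small data
                        structure. The sketch below models the example tree <i>je puis dire</i> in Python;
                        the node identifiers (<tt>t1</tt>, <tt>n1</tt>, ...) and the helper functions are
                        hypothetical and do not reflect the TigerXML file format.</p>

```python
# Terminal nodes carry the word form; ids are hypothetical.
terminals = {
    "t1": {"word": "je"},
    "t2": {"word": "puis"},
    "t3": {"word": "dire"},
}
# Non-terminal nodes carry the syntactic function (cat).
nonterminals = {
    "n1": {"cat": "Snt"},
    "n2": {"cat": "SjPer"},
    "n3": {"cat": "AuxA"},
}
# Each edge is (parent, label, child): 'L' for lexical realization,
# 'D' for dependency.
edges = [
    ("n1", "L", "t2"),
    ("n2", "L", "t1"),
    ("n3", "L", "t3"),
    ("n1", "D", "n2"),
    ("n1", "D", "n3"),
]

def children(parent, label):
    """Ids of all nodes linked to parent by an edge with the given label."""
    return [child for (p, lab, child) in edges if p == parent and lab == label]

def lexical_head(nt_id):
    """Word form of the single terminal linked to a non-terminal by 'L'."""
    (terminal_id,) = children(nt_id, "L")
    return terminals[terminal_id]["word"]

# Function and head word of every dependant of the 'Snt' node.
dependants = {nonterminals[c]["cat"]: lexical_head(c) for c in children("n1", "D")}
print(lexical_head("n1"), dependants)
```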
                   <h4>SRCMF corpus node features</h4>
                   <p>The SRCMF corpus has the following node features:</p>
                   <p><i>Terminal nodes:</i></p>
                   <ul>
                       <li><tt>word</tt>: the word form</li>
                       <li><tt>pos</tt>: part-of-speech tag (Cattex)</li>
                       <li><tt>form</tt>: whether the text is verse or prose, and position of the word in
                           the line of verse.</li>
                   </ul>
                   <p><i>Non-terminal nodes:</i></p>
                   <ul>
                       <li><tt>cat</tt>: function of the structure headed by the node</li>
                       <li><tt>type</tt>: morpho-syntactic category of the node (VFin, VPar, VInf, NV)</li>
                       <li><tt>headpos</tt>: part-of-speech tag of the head word</li>
                       <li><tt>coord</tt>: set to ‘y’ if the structure forms part of a coordination</li>
                       <li><tt>dom</tt>: underscore-separated list of all functions dominated by the node
                           (e.g. for the ‘Snt’ node above ‘AuxA_SjPer’)</li>
                   </ul>
                   <p>For simple queries, we will focus mainly on the <tt>word</tt>, <tt>pos</tt> and
                           <tt>cat</tt> features.</p>
                   <h4>Defining the feature specifications of a node</h4>
                   <p>Node feature specifications are written between [square brackets] and take the
                       following form:</p>
                   <ul>
                       <li><tt>[feature operator "value"]</tt></li>
                   </ul>
                   <p>where <i>value</i> is a string or</p>
                   <ul>
                       <li><tt>[feature operator /value/]</tt></li>
                   </ul>
                   <p>where <i>value</i> is a regular expression. Permitted <i>operators</i> are ‘=’
                       (equals) and ‘!=’ (does not equal). For example, the following expression identifies
                       all nodes where <tt>cat</tt> is "SjPer" (personal subject):</p>
                   <ul>
                       <li><tt>[cat = "SjPer"]</tt></li>
                   </ul>
                   <p>If we wish to include impersonal subjects (i.e. "SjPer" and "SjImp") we can use a
                       regular expression:</p>
                   <ul>
                       <li><tt>[cat = /Sj.*/]</tt></li>
                   </ul>
                   <p>We can identify all nodes which are <i>not</i> subjects:</p>
                   <ul>
                       <li><tt>[cat != /Sj.*/]</tt></li>
                   </ul>
                    <p>We may also use the conjunction (&amp;) operator within the square brackets to specify
                       several properties. For example, we can search for subordinate clause subjects by
                       requiring the subject to be headed by a finite verb (<tt>type</tt> is "VFin"):</p>
                   <ul>
                       <li><tt>[cat = /Sj.*/ &amp; type = "VFin"]</tt></li>
                   </ul>
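                    <p>The behaviour of these feature specifications can be imitated in a few lines of
                        Python. The sketch assumes that a regular expression must match the whole feature
                        value (hence <tt>re.fullmatch</tt>); the function names and the sample nodes are
                        invented for illustration and are not part of TIGERSearch.</p>

```python
import re

def check_feature(node, feature, pattern, negated=False):
    """Imitate [feature = /pattern/], or [feature != /pattern/] when negated.

    Assumption: the pattern has to match the entire value.
    """
    hit = re.fullmatch(pattern, node[feature]) is not None
    return hit != negated

def node_matches(node, *specs):
    """Conjunction of several feature tests, like the operator inside the brackets."""
    return all(check_feature(node, *spec) for spec in specs)

# Invented sample of non-terminal nodes.
nodes = [
    {"cat": "SjPer", "type": "NV"},
    {"cat": "SjImp", "type": "NV"},
    {"cat": "SjPer", "type": "VFin"},
    {"cat": "Snt", "type": "VFin"},
]

subjects = [n for n in nodes if node_matches(n, ("cat", "Sj.*"))]
non_subjects = [n for n in nodes if node_matches(n, ("cat", "Sj.*", True))]
clausal_subjects = [
    n for n in nodes if node_matches(n, ("cat", "Sj.*"), ("type", "VFin"))
]
print(len(subjects), len(non_subjects), len(clausal_subjects))
```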
                   <h4>Assigning a variable name to a node</h4>
                    <p>A variable name may be assigned to a node definition. Variables are useful for
                        referring to the same node several times in a complex query, and are also used to
                        indicate the pivot node to the concordance scripts.</p>
                   <p>Variable definitions adopt the following syntax:</p>
                   <ul>
                       <li><tt>#name:[&lt;definition&gt;]</tt></li>
                   </ul>
                   <p>where <i>definition</i> is a feature specification as described above. Note that
                       variable names must begin with hash (#) and are separated from their definition by a
                       colon (:).</p>
                    <p>For example, we may wish to construct a concordance in which the subject forms the pivot.
                       We define the #pivot variable as follows:</p>
                   <ul>
                       <li><tt>#pivot:[cat = /Sj.*/]</tt></li>
                   </ul>
                   <h3>Node relations</h3>
                    <p>All but the simplest queries will require more than one node to be defined, and
                        will usually require the relationship between the nodes to be specified.</p>
                   <p>For example, suppose we wish to identify all subjects headed by the word
                           <i>Tristran</i>. First, we define the subject:</p>
                   <ul>
                       <li><tt>#subject:[cat = /Sj.*/]</tt></li>
                   </ul>
                   <p>Second, we define the word Tristran as a terminal node:</p>
                   <ul>
                       <li><tt>#tristran:[word = "Tristran"]</tt></li>
                   </ul>
                   <p>Finally, we must indicate the relationship between the nodes. The relationship
                       between a non-terminal node and the terminal node representing its lexical content
                       in the TigerSearch graph is one of direct dominance, labelled ‘L’ (lexical).</p>
                   <h4>Direct dominance</h4>
                   <p>In TigerSearch, direct dominance is expressed by using the operator ‘&gt;’ with the
                       following syntax:</p>
                   <ul>
                       <li><tt>node &gt;[label] node2</tt></li>
                   </ul>
                   <p>where <i>node</i> and <i>node2</i> are feature specifications or node variables, and
                       label (optional) is a string.</p>
                   <p>To identify subjects headed by the word <i>Tristran</i>, the relationship between
                       nodes #subject and #tristran is expressed as follows:</p>
                   <ul>
                       <li><tt>#subject &gt;L #tristran</tt></li>
                   </ul>
                   <h4>Left corner dominance</h4>
                    <p>The ‘&gt;@l’ operator specifies the leftmost terminal node dominated at any depth by a
                       non-terminal node. It has the following syntax:</p>
                   <ul>
                       <li><tt>node &gt;@l tnode</tt></li>
                   </ul>
                   <p>where <i>node</i> and <i>tnode</i> are feature specifications or node variables, and
                           <i>tnode</i> is a terminal node.</p>
                   <p>For example, instead of searching for all subjects which are headed by the word
                           <i>Tristran</i>, we may wish to identify all subjects <strong>beginning</strong>
                       with the word <i>Tristran</i>. This relation would be written as follows:</p>
                   <ul>
                       <li><tt>#subject &gt;@l #tristran</tt></li>
                   </ul>
                    <p>Note that there is also a right corner dominance operator ‘&gt;@r’.</p>
                   <h4>Precedence</h4>
                   <p>The precedence operator ‘.*’ permits the user to specify the word order of two
                       terminal nodes with the following syntax:</p>
                   <ul>
                       <li><tt>tnode .* tnode2</tt></li>
                   </ul>
                   <p>where <i>tnode</i> and <i>tnode2</i> are feature specifications or node variables
                       representing terminal nodes.</p>
                   <p> For example, suppose we wish to identify all sentences in which the word Tristran
                       heads the subject and precedes the main clause verb.</p>
                   <p>We need to add two additional conditions to the query in the previous section. First,
                       we need to identify the terminal node containing the main verb of the sentence: i.e.
                       the lexical realization of the non-terminal node ‘Snt’:</p>
                   <ul>
                       <li><tt>#snt:[cat = "Snt"] &gt;L #verb</tt></li>
                   </ul>
                   <p>You may have noticed that #verb has no feature specification. This is perfectly valid
                       in TigerSearch query syntax. In practice, we know that only one node can be linked
                        to #snt by an ‘L’ relation in the corpus. #verb is thus defined by its relation to
                       #snt rather than by its features.</p>
                   <p>We then need to specify that the word Tristran precedes the verb:</p>
                   <ul>
                       <li><tt>#tristran .* #verb</tt></li>
                   </ul>
                    <p>Finally, we need to clarify that #subject is the subject of #snt. Otherwise, we
                       risk finding subjects of a subordinate clause which happen to precede the main
                       clause verb:</p>
                   <ul>
                       <li><tt>#snt &gt;D #subject</tt></li>
                   </ul>
                   <p>Putting it all together, the query is as follows:</p>
                   <ul>
                       <li><tt>#subject:[cat = /Sj.*/] &gt;L #tristran:[word = "Tristran"] <br /> &amp;
                               #snt:[cat = "Snt"] &gt;L #verb <br /> &amp; #tristran .* #verb <br /> &amp;
                               #snt &gt;D #subject</tt></li>
                   </ul>
                   <p>There is also a direct precedence operator, ‘.’, which specifies that the two
                       terminal nodes must be directly adjacent.</p>
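                    <p>One way to read the combined query is as a search for four nodes which satisfy
                        every condition at once. The Python sketch below spells this out on a toy graph
                        for an invented two-word clause <i>Tristran vint</i>; the node identifiers are
                        hypothetical, regular expressions are assumed to match the whole value, and the
                        nested loops illustrate the logic only, not how TigerSearch evaluates
                        queries.</p>

```python
import re

# Toy graph; all ids are hypothetical.
words = {"t1": "Tristran", "t2": "vint"}   # terminal id -> word form
position = {"t1": 0, "t2": 1}              # terminal id -> word order
cats = {"n1": "Snt", "n2": "SjPer"}        # non-terminal id -> cat
edges = {("n1", "L", "t2"), ("n2", "L", "t1"), ("n1", "D", "n2")}

def find_matches():
    """Return every (subject, tristran, snt, verb) satisfying the query."""
    results = []
    for subject, cat in cats.items():
        # #subject:[cat = /Sj.*/]
        if re.fullmatch("Sj.*", cat) is None:
            continue
        for tristran, word in words.items():
            # #subject >L #tristran:[word = "Tristran"]
            if word != "Tristran" or (subject, "L", tristran) not in edges:
                continue
            for snt, snt_cat in cats.items():
                # #snt:[cat = "Snt"] and #snt >D #subject
                if snt_cat != "Snt" or (snt, "D", subject) not in edges:
                    continue
                for verb in words:
                    # #snt >L #verb and #tristran .* #verb
                    if (snt, "L", verb) in edges and position[verb] > position[tristran]:
                        results.append((subject, tristran, snt, verb))
    return results

matches = find_matches()
print(matches)
```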
                   <h4>Negation</h4>
                   <p>It is important to learn one (extremely frustrating) golden rule of Tiger query
                       syntax:</p>
                   <ul>
                       <li>you can negate a feature specification (e.g. <tt>[cat != "SjPer"]</tt>);</li>
                       <li>you can negate a relation between nodes (e.g. <tt>#subject !&gt;L
                           #tristran</tt>)</li>
                       <li><strong>but you can’t negate the existence of a node!</strong></li>
                   </ul>
                   <p>In practice, this means that when we write:</p>
                   <ul>
                       <li><tt>#snt:[cat = "Snt"] !&gt;D #subject:[cat = /Sj.*/]</tt></li>
                   </ul>
                    <p>we have <strong>not</strong> found all null subject main clauses. Instead, we have
                        asked for a main clause node (#snt) and a subject node (#subject) in the same
                        graph, such that #subject is <strong>not</strong> a direct dependant of #snt.
                        TigerSearch will therefore return every sentence which contains a subject attached
                        elsewhere, typically in a subordinate clause.</p>
                   <p>The SRCMF corpus provides a partial work-around for this problem by using the
                           <i>dom</i> feature. The <i>dom</i> feature of a non-terminal node lists the cat
                       features of all nodes linked to it by a ‘D’ edge in alphabetical order separated by
                       an underscore. For example, the ‘Snt’ node in the example tree has two dependants:
                       SjPer and AuxA. It therefore has a <i>dom</i> property ‘AuxA_SjPer’.</p>
                   <p>As a result, we can identify all main clauses without subjects by negating the
                           <i>dom</i> feature:</p>
                   <ul>
                       <li><tt>#snt:[cat = "Snt" &amp; dom != /.*Sj.*/]</tt></li>
                   </ul>
                   <p>This will return all ‘Snt’ nodes whose <i>dom</i> property does not contain the
                       characters ‘Sj’: in other words, a main clause without an expressed subject.</p>
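                    <p>The way the <i>dom</i> value is built, and why negating it works, can be sketched
                        in Python. The function names are invented, and the sketch assumes that the
                        regular expression must match the whole <i>dom</i> value, which is why the pattern
                        begins and ends with <tt>.*</tt>.</p>

```python
import re

def dom_value(dependant_cats):
    """Join the cat values of all 'D' dependants, sorted alphabetically."""
    return "_".join(sorted(dependant_cats))

def lacks_subject(dom):
    """Imitate [dom != /.*Sj.*/]: true when no function in dom contains 'Sj'."""
    return re.fullmatch(".*Sj.*", dom) is None

with_subject = dom_value(["SjPer", "AuxA"])   # the 'Snt' node of the example tree
without_subject = dom_value(["AuxA"])         # an invented clause with no subject

print(with_subject, lacks_subject(with_subject))
print(without_subject, lacks_subject(without_subject))
```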
                   <h4>Syntactic variation</h4>
                   <p>TigerSearch syntax is quite flexible, and we may express queries in a number of ways.
                       For example, the query identifying all subjects headed by the word <i>Tristran</i>
                       may be expressed using three statements...</p>
                   <ul>
                       <li><tt>#subject:[cat = /Sj.*/] <br /> &amp; #tristran:[word = "Tristran"] <br />
                               &amp; #subject &gt;L #tristran</tt></li>
                   </ul>
                   <p>... or two statements, e.g.:</p>
                   <ul>
                       <li><tt>#subject:[cat = /Sj.*/] <br /> &amp; #subject &gt;L #tristran:[word =
                               "Tristran"]</tt></li>
                   </ul>
                   <p>... or one statement:</p>
                   <ul>
                       <li><tt>#subject:[cat = /Sj.*/] &gt;L #tristran:[word = "Tristran"]</tt></li>
                   </ul>
                   <p>... or without variable names:</p>
                   <ul>
                       <li><tt>[cat = /Sj.*/] &gt;L [word = "Tristran"]</tt></li>
                   </ul>
                   <p>Where multiple statements are used, the order of statements is irrelevant.
                       Confusingly for programmers, you may reference variables before assigning a value,
                       e.g.:</p>
                   <ul>
                       <li><tt>#subject &gt;L #tristran &amp; #tristran:[word = "Tristran"] &amp;
                               #subject:[cat = /Sj.*/]</tt></li>
                   </ul>
                    <h2><a name="concordances"></a>Exporting a concordance</h2>
                   <p>The SRCMF project has developed a number of concordances to present the results of
                       TigerSearch queries in tabular format. Three concordances are currently
                       implemented:</p>
                   <ul>
                       <li>basic concordance</li>
                       <li>single word pivot concordance</li>
                       <li>pivot and block concordance</li>
                   </ul>
                   <p>These concordances produce a text CSV file.</p>
                   <h3>Principles</h3>
                   <p>The concordances use the names of variables from the TigerSearch query to identify
                       the syntactic constituents which should form the focus of the table. All
                       concordances require a #pivot variable to be present in the query.</p>
                   <p>For example, the following query is correct in TigerSearch, but <strong>will
                           not</strong> produce a concordance:</p>
                   <ul>
                       <li><tt>[word = /Tristr?a[nm][sz]?/]</tt></li>
                   </ul>
                   <p>To produce a concordance, the query must identify a node as the #pivot, for
                       example:</p>
                   <ul>
                       <li><tt><strong>#pivot:</strong>[word = /Tristr?a[nm][sz]?/]</tt></li>
                   </ul>
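                    <p>The regular expression in this query is written to cover several spellings of the
                        name. The Python sketch below shows which forms it accepts, assuming that the
                        pattern must match the whole word; the list of candidate spellings is
                        illustrative and is not drawn from the corpus.</p>

```python
import re

# Pattern taken from the query: optional second 'r', final 'n' or 'm',
# optional case ending 's' or 'z'.
PATTERN = "Tristr?a[nm][sz]?"

candidates = [
    "Tristran", "Tristan", "Tristram",
    "Tristranz", "Tristrans",
    "Tristrant",   # extra final letter: rejected
    "tristran",    # lower-case initial: rejected
]

matched = [w for w in candidates if re.fullmatch(PATTERN, w)]
rejected = [w for w in candidates if w not in matched]
print(matched)
print(rejected)
```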
                   <h3>Basic concordance</h3>
                   <p>The basic concordance has four columns:</p>
                   <ul>
                       <li>sentence ID</li>
                       <li>left context</li>
                       <li>pivot</li>
                       <li>right context</li>
                   </ul>
                   <p>The #pivot can be any node in the syntactic tree, either a single word or a larger
                       structure. Currently, only lexical information (not annotation) can be shown in the
                       basic concordance.</p>
                   <p>For example, we may wish to create a concordance of all the main clause subjects
                       containing the word ‘Tristran’:</p>
                   <ul>
                       <li><tt>#snt:[cat = "Snt"] &gt;D #pivot:[cat = "SjPer"] &amp; #pivot &gt;* [word =
                               /Tristr?a[nm][sz]?/]</tt></li>
                   </ul>
                   <p>Note that the #pivot variable is attached to the subject node (cat = "SjPer").</p>
                   <p>Below is a selection of the results from the concordance:</p>
                   <table border="1">
                       <tr>
                           <th>ID</th>
                           <th>contexte gauche</th>
                           <th>pivot</th>
                           <th>contexte droite</th>
                       </tr>
                       <tr>
                           <td>beroul_pb:8_lb:234_1263227636.06</td>
                           <td>di por averté Ce saciés vos de verité Atant s' en est Iseut tornee</td>
                           <td>Tristran</td>
                           <td>l' a plorant salüee Sor le perron de marbre bis Tristran s' apuie ce</td>
                       </tr>
                       <tr>
                           <td>beroul_pb:13_lb:415_1264876249.02</td>
                           <td># croiz Einz croiz parole fole et vaine Ma bone foi me fera saine Tristran
                               [remest] a qui * mot poise </td>
                           <td>Tristran tes niés </td>
                           <td>vint soz cel pin Qui * est laienz en cel jardin Si me manda</td>
                       </tr>
                       <tr>
                           <td>beroul_pb:134_lb:4365_1268928771.68</td>
                           <td>moi le reçoive En sus l' atent s' espee tient Goudoïne autre voie tient</td>
                           <td>Tristran [remest] a qui * mot poise</td>
                           <td>Ist du * buison cela part toise Mais por noient quar cil s' esloigne</td>
                       </tr>
                   </table>
                   <p>Note that the pivot may be one or more words.</p>
                   <h3>What do the square brackets ([]), slashes (/), asterisks (*) and hashes (#)
                       mean?</h3>
                   <p>The third example in the above table contains [square brackets] in the pivot. These
                       are used in all concordances to indicate <strong>words which occur between parts of
                           a discontinuous syntactic constituent</strong>.</p>
                   <p>The annotated subject in this sentence is <i>Tristran ... a qui mot poise</i>. The
                       main verb of the sentence, <i>remest</i>, is not part of the subject, but occurs
                       between its two parts. The verb <i>remest</i> is included in the pivot column, but
                       surrounded by square brackets.</p>
                   <p>This means that:</p>
                   <ul>
                       <li>the pivot column contains <strong>all parts</strong> of discontinuous
                           pivots;</li>
                       <li>reading the concordance from left to right will always give the original
                           sentence.</li>
                   </ul>
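                    <p>The bracket convention can be illustrated with a short Python sketch. It is a
                        simplified reconstruction, not the code of the concordance scripts: the function
                        name, the token list and the word indices are assumptions made for the
                        example.</p>

```python
def split_around_pivot(tokens, pivot_indices):
    """Split a sentence into left context, pivot and right context.

    Words lying between the first and last pivot word but not belonging
    to the pivot are kept in the pivot column inside square brackets.
    """
    first, last = min(pivot_indices), max(pivot_indices)
    pivot_words = [
        tokens[i] if i in pivot_indices else "[" + tokens[i] + "]"
        for i in range(first, last + 1)
    ]
    left = " ".join(tokens[:first])
    pivot = " ".join(pivot_words)
    right = " ".join(tokens[last + 1:])
    return left, pivot, right

# Simplified tokens; the subject is 'Tristran ... a qui mot poise'.
tokens = ["voie", "tient", "Tristran", "remest", "a", "qui", "mot", "poise",
          "Ist", "du", "buison"]
subject = {2, 4, 5, 6, 7}

left, pivot, right = split_around_pivot(tokens, subject)
print(left, "|", pivot, "|", right)
```

                    <p>Removing the brackets and joining the three columns gives back the original
                        word order.</p>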
                   <p>Slashes (/) indicate division between sentences in the syntactic annotation. These
                       will not correspond to the editor’s division into sentences as shown in the
                       punctuation.</p>
                   <p>Asterisks (*) indicate that the preceding word has two syntactic functions (e.g.
                           <i>qui</i> in <i>a qui mot poise</i> is both a relator and a subject). They may
                       usually be ignored.</p>
                   <p>Hashes (#) are related to the representation of coordination, and may always be
                       ignored.</p>
                   <h3>Single word pivot concordance</h3>
                   <p>The single word pivot concordance has a variable number of columns, based on the
                       following structure:</p>
                   <ul>
                       <li>ID</li>
                       <li>Left context outside the SRCMF sentence containing the pivot</li>
                       <li>Left context within the SRCMF sentence containing the pivot</li>
                       <li>Pivot</li>
                       <li>Structure headed by the pivot</li>
                       <li>Function of the structure headed by the pivot</li>
                       <li>Right context within the SRCMF sentence containing the pivot</li>
                       <li>Right context outside the SRCMF sentence containing the pivot</li>
                   </ul>
                   <p>The single word pivot concordance is designed to give as much information as possible
                       about a single word. For example, a concordance could be created around the word
                       "Tristran":</p>
                   <ul>
                       <li><tt>#pivot:[word = /Tristr?a[nm][sz]?/]</tt></li>
                   </ul>
                   <p>Below is a selection of the results from the concordance (some columns are
                       omitted):</p>
                   <table border="1">
                       <tr>
                           <th>Left context in sentence</th>
                           <th>Pivot</th>
                           <th>Pivot-headed structure</th>
                           <th>Right context in sentence</th>
                       </tr>
                       <tr>
                           <td>Sire</td>
                           <td>Tristran</td>
                           <td>Tristran</td>
                           <td>por Deu le roi Si grant pechié avez de moi Qui * me mandez a itel ore</td>
                       </tr>
                       <tr>
                           <td></td>
                           <td>Tristran</td>
                           <td>Tristran tes niés</td>
                           <td>tes niés vint soz cel pin Qui * est laienz en cel jardin</td>
                       </tr>
                       <tr>
                           <td># Que por Yseut que por</td>
                           <td>Tristranz</td>
                           <td>que por Tristranz</td>
                           <td>Mervellose joie menoient</td>
                       </tr>
                   </table>
                   <p>The ‘pivot-headed structure’ gives the noun phrase of which the word <i>Tristran</i>
                       is head. In the second example, for instance, the word <i>Tristran</i> heads the
                       structure <i>Tristran tes niés</i>.</p>
                   <p>Note that words appearing in the ‘pivot-headed structure’ column are also found in
                       the two context columns. The original sentence may be read across the columns left
                       context — pivot — right context.</p>
                   <h3>Pivot and block concordance</h3>
                   <h4>Introduction</h4>
                   <p>The pivot and block concordance is designed to highlight the position of certain
                       constituents, called ‘blocks’ (e.g. the subject) with respect to a pivot (e.g. the
                       verb). The resulting CSV files are complex, with a large number of columns, and are
                       intended as the basis for more detailed analysis in spreadsheet software.</p>
                   <p>The pivot and block concordance has the following basic structure:</p>
                   <ul>
                       <li>ID</li>
                       <li>Left context outside the SRCMF sentence containing the pivot</li>
                       <li>Left context within the SRCMF sentence containing the pivot</li>
                       <li>Pre-pivot blocks</li>
                       <li>Pivot</li>
                       <li>Post-pivot blocks</li>
                       <li>Right context within the SRCMF sentence containing the pivot</li>
                       <li>Right context outside the SRCMF sentence containing the pivot</li>
                   </ul>
                   <p>As with the other concordances, TigerSearch queries must define a #pivot variable.
                       However, any number of variables whose name begins ‘#block’ may be defined. At least
                       one ‘#blockXX’ variable is required.</p>
                   <p>For example, the following query will generate a pivot and block concordance to show
                       the position of the subject (#block1) with respect to the finite verb (#pivot):</p>
                   <ul>
                       <li><tt>#snt:[cat = "Snt"] &gt;D #block1:[cat = "SjPer"] &amp; #snt &gt;L
                               #pivot</tt></li>
                   </ul>
                   <p>In essence, the central section of the resulting concordance will take the following
                       form:</p>
                   <table border="1">
                       <tr>
                           <th>Left context</th>
                           <th>Block</th>
                           <th>Pivot</th>
                           <th>Block</th>
                           <th>Right context</th>
                       </tr>
                       <tr>
                           <td></td>
                           <td>Li rois</td>
                           <td>pense</td>
                           <td></td>
                           <td>que par folie Sire Tristran vos aie amé</td>
                       </tr>
                       <tr>
                           <td>Si</td>
                           <td></td>
                           <td>voient</td>
                           <td>il</td>
                           <td># Deu et son reigne</td>
                       </tr>
                   </table>
                   <p>Where the subject is pre-verbal, it appears in the block column to the left of the
                       pivot. Where it is post-verbal, it appears in the block column to the right of the
                       pivot.</p>
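                   <p>Rows of this kind can also be classified outside a spreadsheet. The Python sketch
                       below reports on which side of the pivot a block appears. It assumes the simplified
                       five-column layout of the table above; the function name <tt>block_position</tt> and
                       the tuple layout are illustrative assumptions, and a real export contains more
                       columns.</p>

```python
def block_position(pre_block, post_block):
    """Classify where the blocks sit relative to the pivot.

    pre_block and post_block are the cell values of the block columns to the
    left and to the right of the pivot (hypothetical argument names).
    """
    has_pre = bool(pre_block.strip())
    has_post = bool(post_block.strip())
    if has_pre and has_post:
        return "both"
    if has_pre:
        return "pre-pivot"
    if has_post:
        return "post-pivot"
    return "none"


# Simplified rows: (left context, block, pivot, block, right context)
rows = [
    ("", "Li rois", "pense", "", "que par folie Sire Tristran vos aie amé"),
    ("Si", "", "voient", "il", "# Deu et son reigne"),
]
for left, pre, pivot, post, right in rows:
    print(pivot, block_position(pre, post))
```

                   <p>A row in which both block columns are filled is reported as ‘both’.</p>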
                   <h4>Why are there square brackets ([]) and curly brackets ({}) in the concordance?</h4>
                   <p>As with other concordances, square brackets denote <strong>words occurring between
                           two parts of a discontinuous unit</strong>. The difference in this concordance
                       is that blocks may be discontinuous, as well as the pivot.</p>
                   <p>Curly brackets denote <strong>words which occur between the block and the
                           pivot</strong> (or, in more complex examples, between two blocks).</p>
                   <table border="1">
                       <tr>
                           <th>Left context</th>
                           <th>Block</th>
                           <th>Pivot</th>
                           <th>Block</th>
                           <th>Right context</th>
                       </tr>
                       <tr>
                           <td></td>
                           <td>Vos {n'}</td>
                           <td>entendez</td>
                           <td></td>
                           <td>pas la raison</td>
                       </tr>
                       <tr>
                           <td>Dex qel pitié</td>
                           <td></td>
                           <td>Faisoit</td>
                           <td>{a} {mainte} {gent} li chiens</td>
                           <td></td>
                       </tr>
                       <tr>
                           <td></td>
                           <td>Ta parole [est] [tost] [entendue] Que li rois la roïne prent</td>
                           <td>est</td>
                           <td></td>
                           <td>tost entendue Que li rois la roïne prent</td>
                       </tr>
                       <tr>
                           <td></td>
                           <td>Tuit [s'] [escrïent] la gent du * reigne {s'}</td>
                           <td>escrïent</td>
                           <td></td>
                           <td>la gent du * reigne</td>
                       </tr>
                   </table>
                   <p>In the table above, note the use of curly brackets in the first example to mark the
                       negative adverb <i>n’</i>, which occurs between the subject-block <i>vos</i> and the
                       verb-pivot <i>entendez</i>. In the second example, the prepositional phrase <i>a
                           mainte gent</i> is marked with curly brackets, as it separates the verb-pivot
                           <i>Faisoit</i> from the post-verbal subject-block <i>li chiens</i>.</p>
                   <p>In the third example, a discontinuous subject <i>Ta parole ... que li rois la roïne
                           prent</i> appears in a pre-verbal block. <strong>The pre- or post-verbal
                           position of a block is determined by the position of its first word relative to
                           the pivot</strong>. The words <i>est tost entendue</i>, which separate the two
                       parts of the block, are marked with square brackets. </p>
                   <p>In the fourth example, the word <i>s’</i> appears (i) in square brackets, between the
                       two halves of a discontinuous subject-block and (ii) in curly brackets, between the
                       first part of the discontinuous subject <i>Tuit</i> and the verb-pivot
                           <i>escrïent</i>.</p>
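                   <p>When analysing an exported block cell, it can be useful to separate the block’s own
                       words from the bracketed material. The following Python sketch does this, assuming
                       that every intervening word is bracketed individually, as in the examples above.
                       The function name <tt>split_cell</tt> is hypothetical and not part of TXM.</p>

```python
def split_cell(cell):
    """Split a block cell into (own words, [..] words, {..} words).

    Assumes each intervening word carries its own pair of brackets,
    as in the example rows of the concordance.
    """
    own, inside, between = [], [], []
    for tok in cell.split():
        if tok.startswith("[") and tok.endswith("]"):
            inside.append(tok[1:-1])    # word between two parts of the block
        elif tok.startswith("{") and tok.endswith("}"):
            between.append(tok[1:-1])   # word between the block and the pivot
        else:
            own.append(tok)             # word belonging to the block itself
    return " ".join(own), inside, between


print(split_cell("{a} {mainte} {gent} li chiens"))
# ('li chiens', [], ['a', 'mainte', 'gent'])
```

                   <p>The first element of the result is the block with all bracketed words removed.</p>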
                   <h4>Why are there so many columns? I only asked for one block!</h4>
                   <p>The pivot and block concordance shows <strong>only one result per pivot</strong>.
                       Continuing to work with the same example, if a single verb-pivot has multiple
                       subject-blocks (which is quite possible in cases of coordination), each subject
                       occupies a separate column:</p>
                   <table border="1">
                       <tr>
                           <th>Block3</th>
                           <th>Block2</th>
                           <th>Block1</th>
                           <th>Pivot</th>
                           <th>Block</th>
                       </tr>
                       <tr>
                           <td>Ne tor</td>
                           <td>ne mur</td>
                           <td>ne fort chastel {Ne} {me}</td>
                           <td>tendra</td>
                           <td></td>
                       </tr>
                   </table>
                   <p>However, due to the way the number of columns is calculated, it is possible that some
                       will be empty. These may be deleted in the spreadsheet software, if you wish.</p>
                   <p>Note that the concordance will <strong>never</strong> represent the two halves of a
                           <strong>single discontinuous</strong> block in separate columns. The following
                       representation therefore indicates a coordination:</p>
                   <table border="1">
                       <tr>
                           <th>Left context</th>
                           <th>Block</th>
                           <th>Pivot</th>
                           <th>Block</th>
                           <th>Right context</th>
                       </tr>
                       <tr>
                           <td></td>
                           <td>Tristran {en}</td>
                           <td>bese</td>
                           <td>{la} {roïne} {Et} ele</td>
                           <td>lui par la saisine</td>
                       </tr>
                   </table>
                   <p>The SRCMF annotation of the sentence in this table identifies <strong>two coordinated
                           subjects</strong> of the verb <i>bese</i>. One is pre-verbal (<i>Tristran</i>),
                       one is post-verbal (<i>ele</i>); each occupies a separate block.</p>
                   <h3>Adding annotation information</h3>
                   <p>When a concordance is launched from the TXM-web interface, you may specify which
                       properties of terminal and non-terminal nodes you wish to see in the
                       concordance.</p>
                   <ul>
                       <li>On the ‘Export Concordance’ form, use the drop-down lists of ‘Non-terminal
                           features’ and ‘Terminal Features’.</li>
                       <li>Select the features of terminal and non-terminal nodes that you wish to show in
                           the concordance from the two drop-down lists.</li>
                       <li>Click ‘OK’.</li>
                   </ul>
                   <p>Each added property will be placed in a separate column next to the block or pivot.
                       For example, if the ‘cat’ property is selected for non-terminal nodes, and the ‘pos’
                       property is selected for terminal nodes, the query above will produce the following
                       concordance:</p>
                   <table border="1">
                       <tr>
                           <th>Left context</th>
                           <th>Block</th>
                           <th>Block Cat</th>
                           <th>Pivot</th>
                           <th>Pivot Pos</th>
                           <th>Block</th>
                           <th>Block Cat</th>
                           <th>Right context</th>
                       </tr>
                       <tr>
                           <td></td>
                           <td>Li rois</td>
                           <td>SjPer</td>
                           <td>pense</td>
                           <td>VERcjg</td>
                           <td></td>
                           <td></td>
                           <td>que par folie Sire Tristran vos aie amé</td>
                       </tr>
                       <tr>
                           <td>Si</td>
                           <td></td>
                           <td></td>
                           <td>voient</td>
                           <td>VERcjg</td>
                           <td>il</td>
                           <td>SjPer</td>
                           <td># Deu et son reigne</td>
                       </tr>
                   </table>
                   <h2><a name="tags"></a>Tagset</h2>
                   <h3>Non-terminal nodes</h3>
                   <p>Non-terminal nodes have the following properties and values:</p>
                   <h4>cat</h4>
                   <p>Gives the syntactic function of the element. For more details, please refer to the <a
                           target="_blank" href="http://srcmf.org">SRCMF
                           website</a>.</p>
                   <ul>
                       <li><a name="Apst"></a><strong>Apst</strong>: Vocative (fr. apostrophe)</li>
                       <li><a name="AtObj"></a><strong>AtObj</strong>: Object attribute</li>
                       <li><a name="AtRfc"></a><strong>AtRfc</strong>: Attribute of reflexive pronoun</li>
                       <li><a name="AtSj"></a><strong>AtSj</strong>: Subject attribute</li>
                       <li><a name="Aux"></a><strong>Aux</strong>: Auxiliated non-finite verb (neither
                           passive nor active)</li>
                       <li><a name="AuxA"></a><strong>AuxA</strong>: Auxiliated non-finite verb
                           (active)</li>
                       <li><a name="AuxP"></a><strong>AuxP</strong>: Auxiliated non-finite verb
                           (passive)</li>
                       <li><a name="Circ"></a><strong>Circ</strong>: Adjunct (fr. circonstant)</li>
                       <li><a name="Cmpl"></a><strong>Cmpl</strong>: Complement</li>
                       <li><a name="Coo"></a><strong>Coo</strong>: Coordination</li>
                       <li><a name="GpCoo"></a><strong>GpCoo</strong>: Coordinated group (conjunct)</li>
                       <li><a name="Insrt"></a><strong>Insrt</strong>: Inserted clause</li>
                       <li><a name="Intj"></a><strong>Intj</strong>: Interjection</li>
                       <li><a name="ModA"></a><strong>ModA</strong>: Modifier (attached)</li>
                       <li><a name="ModD"></a><strong>ModD</strong>: Dislocated (detached) modifier</li>
                       <li><a name="Ng"></a><strong>Ng</strong>: Negation</li>
                       <li><a name="NgPrt"></a><strong>NgPrt</strong>: Negative particle (e.g. <i>pas</i>,
                               <i>mie</i>)</li>
                       <li><a name="nSnt"></a><strong>nSnt</strong>: Non-sentence</li>
                       <li><a name="Obj"></a><strong>Obj</strong>: Object</li>
                       <li><a name="RelC"></a><strong>RelC</strong>: Coordinating relator</li>
                       <li><a name="RelNC"></a><strong>RelNC</strong>: Non-coordinating relator</li>
                       <li><a name="Regim"></a><strong>Regim</strong>: Regime</li>
                       <li><a name="Rfc"></a><strong>Rfc</strong>: Reflexive pronoun</li>
                       <li><a name="Rfx"></a><strong>Rfx</strong>: Doubled reflexive pronoun (e.g. <i>nous
                               ... <strong>nous-mêmes</strong></i>)</li>
                       <li><a name="SjImp"></a><strong>SjImp</strong>: Impersonal subject</li>
                       <li><a name="SjPer"></a><strong>SjPer</strong>: Personal subject</li>
                       <li><a name="Snt"></a><strong>Snt</strong>: Sentence</li>
                   </ul>
                   <h4>type</h4>
                   <p>Gives the syntactic category of the head of the structure.</p>
                   <ul>
                       <li><a name="VFin"></a><strong>VFin</strong>: Finite verb form</li>
                       <li><a name="VInf"></a><strong>VInf</strong>: Infinitive</li>
                       <li><a name="VPar"></a><strong>VPar</strong>: Participle</li>
                       <li><a name="nV"></a><strong>nV</strong>: Non-verbal</li>
                   </ul>
                   <h4>dom</h4>
                   <p>A ‘dom’ property is added to each non-terminal node in the tree listing the functions
                       of all its dependants and relators in alphabetical order, separated by underscores.
                       For example, if a finite verb has a subject, object and two adjuncts, the property
                       [dom = "Circ_Circ_Obj_SjPer"] will be added.</p>
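                   <p>The construction of the value can be sketched in Python as follows. This is an
                       illustration only, not the code used to build the corpus: the function name
                       <tt>build_dom</tt> is hypothetical, and Python’s default (case-sensitive) string
                       ordering is assumed to stand in for ‘alphabetical order’.</p>

```python
def build_dom(functions):
    """Join the functions of a node's dependants and relators into a dom value."""
    # Sort the labels, then join them with underscores.
    # Assumption: plain case-sensitive sorting matches the corpus ordering.
    return "_".join(sorted(functions))


# A finite verb with a subject, an object and two adjuncts:
print(build_dom(["SjPer", "Obj", "Circ", "Circ"]))  # Circ_Circ_Obj_SjPer
```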
                   <p>This resolves to an extent the problem of ‘negative’ queries. Recall that it is
                       impossible to query the non-existence of a node:</p>
                   <ul>
                       <li><tt>#clause:[type = "VFin"] !&gt;D #suj:[cat = "SjPer"]</tt></li>
                   </ul>
                   <p>Contrary to appearances, this query DOES NOT mean ‘node #suj does not exist’: it
                       means that the node #suj exists, but is not dependent on #clause.</p>
                   <p>However, it is possible to find all finite verbs without a subject by using the dom
                       property of the finite verb:</p>
                   <ul>
                       <li><tt>#clause:[type = "VFin" &amp; dom != /.*SjPer.*/]</tt></li>
                   </ul>
                   <p>The query specifies that we wish to find a node #clause which is a finite verb and
                       does not have the string ‘SjPer’ in the list of dependent nodes given by the dom
                       property.</p>
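                   <p>The same test can be applied to dom values taken from an exported concordance. The
                       Python sketch below assumes that the pattern must match the whole value, which is
                       what the leading and trailing <tt>.*</tt> in the query suggest; the function name
                       <tt>lacks_personal_subject</tt> is hypothetical. Note that only ‘SjPer’ is tested,
                       so a node with an impersonal subject (‘SjImp’) still counts as having no personal
                       subject.</p>

```python
import re

# Same pattern as in the query: any dom value containing the label SjPer.
HAS_PERSONAL_SUBJECT = re.compile(r".*SjPer.*")


def lacks_personal_subject(dom):
    """Return True if the dom value lists no personal subject (SjPer)."""
    # fullmatch mirrors the assumed whole-value matching of the query pattern.
    return HAS_PERSONAL_SUBJECT.fullmatch(dom) is None


for dom in ["Circ_Circ_Obj_SjPer", "Circ_Obj", "Obj_SjImp"]:
    print(dom, lacks_personal_subject(dom))
```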
                   <h4>coord</h4>
                   <p>A ‘coord’ property is added to each non-terminal node in the tree. If the node
                       represents a coordinated structure, [coord = "y"].</p>
                   <p>For example, in the sentence <i>Sade et douz est quanqu’est de li</i> (gcoin1: p. 3,
                       l. 31), <i>sade</i> and <i>douz</i> are coordinated AtSj. The non-terminal nodes
                       dominating the words <i>sade</i> and <i>douz</i> have the properties [cat = "AtSj"
                       &amp; coord="y"].</p>
                   <p>The ‘coord’ property exists primarily to allow non-coordinated structures to be
                       identified. In the original format, this is not possible, as it would require a
                       query specifying the non-existence of a node [cat = "Coo"]. However, with the coord
                       property, it is possible to restrict a query to non-coordinated structures only:</p>
                   <ul>
                       <li><tt>#suj:[cat = "SjPer" &amp; coord != "y"]</tt></li>
                   </ul>
                   <h4>headpos</h4>
                   <p>A ‘headpos’ property is added to each non-terminal node in the tree. If the text is
                       correctly annotated at the deep level, each non-terminal node representing a
                       structure should directly dominate at most one terminal node in the tree, the word
                       representing the lexical content of the head of the structure. If this is the case,
                       the ‘headpos’ property is equal to the ‘pos’ property of the dominated terminal
                       node. Thus:</p>
                   <ul>
                       <li><tt>#node:[headpos = "NOMcom"]</tt></li>
                   </ul>
                   <p>is equivalent to:</p>
                   <ul>
                       <li><tt>#node &gt;L #lexnode:[pos = "NOMcom"]</tt></li>
                   </ul>
                   <p>The headpos property does not improve the usability of the corpus in TigerSearch, but
                       is useful in producing concordances, providing a more detailed morpho-syntactic tag
                       for the head of a structure than the SRCMF ‘nV’ (non-verbal) type tag.</p>
                   <p>If the non-terminal node directly dominates more than one terminal node, the
                       algorithm generating the headpos property makes a calculated guess as to which word
                       is the head, and inserts the tag of this word as the ‘headpos’. For example, if a
                       non-terminal node dominates a word with pos ‘NOMcom’ and a word with pos ‘DETdef’,
                       the algorithm will guess that the noun is the head, and insert the headpos
                       ‘NOMcom?’.</p>
                   <p>Note that headpos values which have been ‘guessed’ are always suffixed by a question
                       mark (e.g. NOMcom?). There will be no guessed headpos values in texts with full NP
                       annotation.</p>
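                   <p>When headpos values are exported in a concordance, guessed values can be told apart
                       from certain ones by testing for the question mark. A minimal Python sketch, in
                       which the function name <tt>parse_headpos</tt> is hypothetical:</p>

```python
def parse_headpos(value):
    """Split a headpos value into (tag, guessed)."""
    # Guessed values carry a trailing question mark, e.g. NOMcom?
    if value.endswith("?"):
        return value[:-1], True
    return value, False


print(parse_headpos("NOMcom?"))  # ('NOMcom', True)
print(parse_headpos("NOMcom"))   # ('NOMcom', False)
```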
                   <h3>Terminal nodes</h3>
                   <p>Terminal nodes have the following properties:</p>
                   <h4>pos</h4>
                   <p>Part-of-speech tag (Cattex). For more information, please refer to the <a
                           target="_blank" href="http://bfm.ens-lyon.fr/article.php3?id_article=323">Cattex
                           documentation</a> on the <a target="_blank" href="http://bfm.ens-lyon.fr/">BFM website</a>.</p>
                   <h4>form</h4>
                   <p>Each word has a property “form”. For texts in prose, the value of the “form” tag is
                       always “prose”. For texts in verse, the form tag is:</p>
                   <ul>
                       <li>“vers_first” for the first word in a line;</li>
                       <li>“vers_end” for the last word in a line;</li>
                       <li>“vers” for other words.</li>
                   </ul>
                   <p>It is thus possible to formulate a TigerSearch query focusing on words at the
                       beginning or end of a line of verse:</p>
                   <ul>
                       <li><tt>[word = "Tristran" &amp; form = "vers_end"]</tt></li>
                   </ul>
                   <p>In <i>Aucassin and Nicolete</i>, the form tag correctly distinguishes the verse and
                       prose sections of the text.</p>
                   <h4>q</h4>
                   <p>Each word has a property “q”. This is equal to ‘y’ when the word occurs as part of
                       direct discourse, and ‘n’ when it does not. This annotation is automatically
                       generated by the BFM team from the position of quote marks in the text.</p>
                   <h2><a name="sample"></a>Sample queries</h2>
                   <p>The following sample queries may be tested by copying and pasting them into the query
                       panel.</p>
                   <p>Find all main clause verbs:<br />
                       <tt>[cat = "Snt"]</tt></p>
                   <p>Find all structures introduced by a preposition:<br />
                       <tt>#n >R #relnc:[cat = "RelNC"]<br /> &amp; #relnc >L [pos = /PRE.*/]</tt><br />
                   </p>
                   <p>Find all post-verbal NP subjects:<br />
                       <tt>#verb:[type = "VFin"] >D #suj:[cat = "SjPer" &amp; type="nV"]<br /> &amp; #suj
                           >L [pos = /NOM.*/] <br /> &amp; #suj >@l #sword<br /> &amp; #verb >L
                           #vword<br /> &amp; #vword .* #sword</tt></p>
                   <p>Find indefinite subjects introduced by <q>qui</q>:<br />
                       <tt>[type = "VFin"] >D #suj:[cat = "SjPer"]<br /> &amp; #suj >R #relnc:[cat =
                           "RelNC"]<br /> &amp; ( #relnc >L [word = /[QqKk]u?i/]<br /> | #relnc >~dupl
                           [word = /[QqKk]u?i/] )</tt><br /></p>
                   <p>Find sentences with coordinated subjects:<br />
                       <tt>#coo:[cat = "Coo"] >~coord #sj1:[cat = "SjPer"]<br /> &amp; #coo >~coord
                           #sj2:[cat = "SjPer"]<br /> &amp; #sj1 $ #sj2</tt></p>
                   <p>Find sentences with possible <q>gapping</q> of the finite verb (i.e. coordination of
                       subject–predicate pairs):<br />
                       <tt>#gpcoo1:[cat = "GpCoo"] >~ #suj1:[cat = "SjPer"]<br /> &amp; #gpcoo1 $.*
                           #gpcoo2:[cat = "GpCoo"]<br /> &amp; #gpcoo2 >~ #suj2:[cat = "SjPer"]<br /> &amp;
                           #gpcoo1 >~ #pred1:[cat = /Cmpl|Obj|AtSj/]<br /> &amp; #gpcoo2 >~ #pred2:[cat =
                           /Cmpl|Obj|AtSj/]<br /></tt>
                   </p>
                   <h1>Useful links</h1>
                   <ul>
                       <li><a target="_blank" href="https://listes.cru.fr/wiki/srcmf/index">SRCMF wiki</a></li>
                       <li><a target="_blank" href="http://srcmf.org">SRCMF website</a></li>
                       <li><a
                           target="_blank" href="http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/oldindex.shtml"
                               >TIGERSearch website</a></li>
                       <li><a target="_blank" href="http://bfm.ens-lyon.fr/">BFM website</a></li>
                       <li><a target="_blank" href="http://textometrie.ens-lyon.fr/?lang=en">TXM website</a></li>
                   </ul>
               </div>
           </body>
       </html>
