<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
        <title></title>
    </head>
    <body>
        <div style="padding-left:25px;">
            <h1>SRCMF corpus: TIGERSearch web interface</h1>
            <h2>Contents</h2>
            <ul>
                <li><a href="#interface">Using the TIGERSearch web interface</a></li>
                <li><a href="#query">Writing a simple query</a></li>
                <li><a href="#concordances">Exporting a concordance</a></li>
                <li><a href="#tags">Tagset used</a></li>
                <li><a href="#sample">Sample queries</a></li>
            </ul>
            <h2><a name="interface"></a>Using the TIGERSearch web interface</h2>
            <h3>Writing a query and browsing the results</h3>
            <p>In the TigerSearch tab, queries are entered in the top panel, and matching sentences
                are shown in tree form in the bottom panel. A tutorial on TigerSearch queries may be
                found in the section “<a href="#query">Writing a simple query</a>”.</p>
            <ul>
                <li>Type your query in the top panel (e.g. <tt>#pivot:[word = "Tristran"]</tt>).</li>
                <li>Click on the ‘Search’ button at the bottom right of the panel.</li>
            </ul>
            <p>If the query is well-formed, and if there are matching results in the corpus, the
                first tree in the forest will appear in the bottom panel.</p>
            <p>The central bar gives the number of matches and the position of the sentence in the
                corpus, in the form <i>sent: [sentence number] [match number] / [total matching
                    sentences].</i> Note that subgraph navigation is not yet implemented, and the
                interface does <strong>not</strong> show the total number of matches, only the
                number of matching sentences. You can navigate through the forest of matches using
                the forward and back arrows on this bar. The ‘Export’ button displays the current
                tree as an .SVG file in the browser, which can be saved and downloaded. The ‘Export
                Concordance’ button allows matching sentences to be exported in <a
                    href="#concordances">concordance form</a>.</p>
            <h3>Exporting the results</h3>
            <p>To export the results of your query, click the ‘Export Concordance’ button. An export
                window will appear, with the following options:</p>
            <ul>
                <li><p><strong>Type</strong></p>
                    <p>Three concordances are currently implemented:</p>
                    <ul>
                        <li>basic concordance</li>
                        <li>single word pivot concordance</li>
                        <li>pivot and block concordance</li>
                    </ul>
                    <p>It is important to note that these concordances use the names of TigerSearch
                        variables from the query to structure the concordance. <strong>No
                            concordance will be produced if your query does not contain a
                                <tt>#pivot</tt> variable.</strong> The pivot and block concordance
                        requires at least one additional <tt>#blockXX</tt> variable.</p>
                    <p>Further documentation for these concordances may be found in the section “<a
                            href="#concordances">Exporting a concordance</a>”.</p></li>
                <li><p><strong>Context (number of words)</strong></p>
                    <p>Sets the size of the context preceding and following the pivot.</p></li>
                <li><p><strong>Restore punctuation</strong></p>
                    <p>Adds punctuation from the BFM’s digitized edition to the exported
                        concordance. It will also restore words excluded from the TIGERSearch corpus
                        (e.g. lacunae, AOI in the <i>Chanson de Roland</i>).</p></li>
                <li><p><strong>Properties to show in concordance</strong></p>
                    <p>Select which features of terminal and non-terminal nodes should be shown in
                        the concordance. This function is only active for the ‘pivot and block
                        concordance’.</p></li>
            </ul>
            <p>When you have filled in the form:</p>
            <ul>
                <li>Click the ‘OK’ button.</li>
            </ul>
            <p>After a short delay, a new tab will open in your browser, containing the concordance
                in plain text tabular format (.csv).</p>
            <ul>
                <li>Save this file to disk using the ‘File &gt; Save As...’ menu in your
                    browser.</li>
            </ul>
            <h3>Viewing the concordance</h3>
            <p>To view and manipulate the concordance, you will need to use a spreadsheet
                package.</p>
            <ul>
                <li>Open the spreadsheet application.</li>
                <li>Select ‘File &gt; Open...’ from the menu.</li>
                <li>Ensure that the file list is showing either ‘All files’ or ‘CSV text
                    files’.</li>
                <li>Select the saved .csv file.</li>
            </ul>
            <p>You will need to correctly configure your spreadsheet software to read the file. We
                recommend using LibreOffice or OpenOffice Calc, which will prompt the user for
                settings whenever a .csv file is opened. The following settings are required for the
                import to function:</p>
            <ul>
                <li>Character set: Unicode (UTF-8);</li>
                <li>Separated by Tab (ONLY);</li>
                <li>Merge delimiters OFF;</li>
                <li>Text delimiter: NONE (empty box)</li>
            </ul>
            <p>Troubleshooting likely problems:</p>
            <ul>
                <li>If accented characters do not appear correctly, check that the character set is
                    UTF-8;</li>
                <li>If some rows do not seem to have the correct number of columns, check that
                    Text Delimiter is set to nothing (the default is usually double quote, which
                    will cause an error where the text contains double quotes), that merge
                    delimiters is OFF, and that TAB is the only separator selected;</li>
                <li>If zeros appear rather than punctuation (unlikely), use the ‘Fields’ section
                    of the import window to set every column type to ‘Text’ rather than
                    ‘Standard’.</li>
            </ul>
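            <p>For readers who prefer a script to a spreadsheet, the import settings above can be
                mirrored in a few lines of Python. This is only a minimal sketch and not part of
                the SRCMF tools: the function name <tt>read_concordance</tt> and the sample row are
                invented for illustration, and the code assumes that the export is tab-separated
                UTF-8 text, as described above.</p>

```python
import csv
import io


def read_concordance(stream):
    """Return the rows of a tab-separated concordance as lists of strings."""
    reader = csv.reader(
        stream,
        delimiter="\t",          # Tab is the only separator
        quoting=csv.QUOTE_NONE,  # no text delimiter: double quotes are ordinary text
    )
    # Empty fields are kept, which corresponds to 'merge delimiters OFF'.
    return [row for row in reader]


# Invented sample row. A saved export could instead be opened with
# open(path, encoding="utf-8", newline=""), where path is your own file name.
sample = 'id_1\tleft words\tTristran\tright "quoted" words\n'
rows = read_concordance(io.StringIO(sample))
print(rows[0])
```

            <p>Disabling quoting plays the same role as leaving the text delimiter box empty: a
                double quote inside the text does not swallow the tabs that follow it.</p>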
            <h2><a name="query"></a>Writing a simple query</h2>
            <p>The following section will enable you to write simple TIGERSearch queries for the
                SRCMF corpus. It is not comprehensive, and must be read in conjunction with:</p>
            <ul>
                <li>chapter III of the <a target="_blank"
                        href="http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/manual_html.html"
                        >TIGERSearch user’s guide</a></li>
            </ul>
            <h3>Nodes in the TS graph</h3>
            <p>A TigerSearch graph is made up of two types of nodes: terminal and non-terminal
                nodes. In the graph viewer, terminal nodes appear at the bottom of the graph, while
                non-terminal nodes are represented by labelled white ovals, as shown in the example
                    <i>je puis dire</i>.</p>
            <img src="images/jepuisdire.png" alt="Example TIGERSearch tree" />
            <p>Each node has a number of features (see section “<a href="#tags">Tagset used</a>”).</p>
            <h4>SRCMF: ‘split’ nodes</h4>
            <p>In a true dependency graph, words form the only nodes.</p>
            <p>In the TigerXML SRCMF corpus, each ‘word’ in the dependency structure is in fact
                split between a terminal node (which contains the lexical form and the PoS tag of
                the word itself) and a non-terminal node (which contains the syntactic features of
                the structure headed by the word). The non-terminal node and the terminal node are
                linked by an edge labelled ‘L’ (for lexical realization).</p>
            <p>In the example tree, an ‘L’ edge links:</p>
            <ul>
                <li>the terminal node <i>puis</i> to the non-terminal node ‘Snt’: these nodes
                    represent the finite verb which heads the sentence;</li>
                <li>the terminal node <i>je</i> to the non-terminal node ‘SjPer’: these nodes
                    represent the subject of the sentence <i>je</i>;</li>
                <li>the terminal node <i>dire</i> to the non-terminal node ‘AuxA’: these nodes
                    represent the infinitive verb <i>dire</i>.</li>
            </ul>
            <p>A ‘D’ edge links the ‘Snt’ node to the non-terminal nodes ‘SjPer’ and ‘AuxA’: this
                indicates that the subject <i>je</i> and the ‘auxiliated’ infinitive <i>dire</i>
                depend on the main verb <i>puis</i>.</p>
            <h4>SRCMF corpus node features</h4>
            <p>The SRCMF corpus has the following node features:</p>
            <p><i>Terminal nodes:</i></p>
            <ul>
                <li><tt>word</tt>: the word form</li>
                <li><tt>pos</tt>: part-of-speech tag (Cattex)</li>
                <li><tt>form</tt>: whether the text is verse or prose, and position of the word in
                    the line of verse.</li>
            </ul>
            <p><i>Non-terminal nodes:</i></p>
            <ul>
                <li><tt>cat</tt>: function of the structure headed by the node</li>
                <li><tt>type</tt>: morpho-syntactic category of the node (VFin, VPar, VInf, NV)</li>
                <li><tt>headpos</tt>: part-of-speech tag of the head word</li>
                <li><tt>coord</tt>: set to ‘y’ if the structure forms part of a coordination</li>
                <li><tt>dom</tt>: underscore-separated list of all functions dominated by the node
                    (e.g. for the ‘Snt’ node above ‘AuxA_SjPer’)</li>
            </ul>
            <p>For simple queries, we will focus mainly on the <tt>word</tt>, <tt>pos</tt> and
                    <tt>cat</tt> features.</p>
            <h4>Defining the feature specifications of a node</h4>
            <p>Node feature specifications are written between [square brackets] and take the
                following form:</p>
            <ul>
                <li><tt>[feature operator "value"]</tt></li>
            </ul>
            <p>where <i>value</i> is a string or</p>
            <ul>
                <li><tt>[feature operator /value/]</tt></li>
            </ul>
            <p>where <i>value</i> is a regular expression. Permitted <i>operators</i> are ‘=’
                (equals) and ‘!=’ (does not equal). For example, the following expression identifies
                all nodes where <tt>cat</tt> is "SjPer" (personal subject):</p>
            <ul>
                <li><tt>[cat = "SjPer"]</tt></li>
            </ul>
            <p>If we wish to include impersonal subjects (i.e. "SjPer" and "SjImp") we can use a
                regular expression:</p>
            <ul>
                <li><tt>[cat = /Sj.*/]</tt></li>
            </ul>
            <p>We can identify all nodes which are <i>not</i> subjects:</p>
            <ul>
                <li><tt>[cat != /Sj.*/]</tt></li>
            </ul>
            <p>We may also use the conjunction (&amp;) operator within the square brackets to specify
                several properties. For example, we can search for subordinate clause subjects by
                requiring the subject to be headed by a finite verb (<tt>type</tt> is "VFin"):</p>
            <ul>
                <li><tt>[cat = /Sj.*/ &amp; type = "VFin"]</tt></li>
            </ul>
            <h4>Assigning a variable name to a node</h4>
            <p>A variable name may be assigned to a node definition. Variables are useful for
                referring to the same node several times in a complex query and are also used to
                indicate the pivot node to concordance scripts.</p>
            <p>Variable definitions adopt the following syntax:</p>
            <ul>
                <li><tt>#name:[&lt;definition&gt;]</tt></li>
            </ul>
            <p>where <i>definition</i> is a feature specification as described above. Note that
                variable names must begin with hash (#) and are separated from their definition by a
                colon (:).</p>
            <p>For example, we may wish to construct a concordance in which the subject forms the
                pivot. We define the #pivot variable as follows:</p>
            <ul>
                <li><tt>#pivot:[cat = /Sj.*/]</tt></li>
            </ul>
            <h3>Node relations</h3>
            <p>All but the simplest queries will require more than one node to be defined, and
                will usually require the relationship between the nodes to be specified.</p>
            <p>For example, suppose we wish to identify all subjects headed by the word
                    <i>Tristran</i>. First, we define the subject:</p>
            <ul>
                <li><tt>#subject:[cat = /Sj.*/]</tt></li>
            </ul>
            <p>Second, we define the word <i>Tristran</i> as a terminal node:</p>
            <ul>
                <li><tt>#tristran:[word = "Tristran"]</tt></li>
            </ul>
            <p>Finally, we must indicate the relationship between the nodes. The relationship
                between a non-terminal node and the terminal node representing its lexical content
                in the TigerSearch graph is one of direct dominance, labelled ‘L’ (lexical).</p>
            <h4>Direct dominance</h4>
            <p>In TigerSearch, direct dominance is expressed by using the operator ‘&gt;’ with the
                following syntax:</p>
            <ul>
                <li><tt>node &gt;[label] node2</tt></li>
            </ul>
            <p>where <i>node</i> and <i>node2</i> are feature specifications or node variables, and
                <i>label</i> (optional) is a string.</p>
            <p>To identify subjects headed by the word <i>Tristran</i>, the relationship between
                nodes #subject and #tristran is expressed as follows:</p>
            <ul>
                <li><tt>#subject &gt;L #tristran</tt></li>
            </ul>
            <h4>Left corner dominance</h4>
            <p>The ‘&gt;@l’ operator specifies the leftmost terminal node dominated at any depth by a
                non-terminal node. It has the following syntax:</p>
            <ul>
                <li><tt>node &gt;@l tnode</tt></li>
            </ul>
            <p>where <i>node</i> and <i>tnode</i> are feature specifications or node variables, and
                    <i>tnode</i> is a terminal node.</p>
            <p>For example, instead of searching for all subjects which are headed by the word
                    <i>Tristran</i>, we may wish to identify all subjects <strong>beginning</strong>
                with the word <i>Tristran</i>. This relation would be written as follows:</p>
            <ul>
                <li><tt>#subject &gt;@l #tristran</tt></li>
            </ul>
            <p>Note that there is also a right corner dominance operator ‘&gt;@r’.</p>
            <h4>Precedence</h4>
            <p>The precedence operator ‘.*’ permits the user to specify the word order of two
                terminal nodes with the following syntax:</p>
            <ul>
                <li><tt>tnode .* tnode2</tt></li>
            </ul>
            <p>where <i>tnode</i> and <i>tnode2</i> are feature specifications or node variables
                representing terminal nodes.</p>
            <p>For example, suppose we wish to identify all sentences in which the word Tristran
                heads the subject and precedes the main clause verb.</p>
            <p>We need to add two additional conditions to the query in the previous section. First,
                we need to identify the terminal node containing the main verb of the sentence: i.e.
                the lexical realization of the non-terminal node ‘Snt’:</p>
            <ul>
                <li><tt>#snt:[cat = "Snt"] &gt;L #verb</tt></li>
            </ul>
            <p>You may have noticed that #verb has no feature specification. This is perfectly valid
                in TigerSearch query syntax. In practice, we know that only one node can be linked
                to #snt by an ‘L’ relation in the corpus. #verb is thus defined by its relation to
                #snt rather than by its features.</p>
            <p>We then need to specify that the word Tristran precedes the verb:</p>
            <ul>
                <li><tt>#tristran .* #verb</tt></li>
            </ul>
            <p>Finally, we need to clarify that #subject is the subject of #snt. Otherwise, we
                risk finding subjects of a subordinate clause which happen to precede the main
                clause verb:</p>
            <ul>
                <li><tt>#snt &gt;D #subject</tt></li>
            </ul>
            <p>Putting it all together, the query is as follows:</p>
            <ul>
                <li><tt>#subject:[cat = /Sj.*/] &gt;L #tristran:[word = "Tristran"] <br /> &amp;
                        #snt:[cat = "Snt"] &gt;L #verb <br /> &amp; #tristran .* #verb <br /> &amp;
                        #snt &gt;D #subject</tt></li>
            </ul>
            <p>There is also a direct precedence operator, ‘.’, which specifies that the two
                terminal nodes must be directly adjacent.</p>
            <h4>Negation</h4>
            <p>It is important to learn one (extremely frustrating) golden rule of Tiger query
                syntax:</p>
            <ul>
                <li>you can negate a feature specification (e.g. <tt>[cat != "SjPer"]</tt>);</li>
                <li>you can negate a relation between nodes (e.g. <tt>#subject !&gt;L
                    #tristran</tt>)</li>
                <li><strong>but you can’t negate the existence of a node!</strong></li>
            </ul>
            <p>In practice, this means that when we write:</p>
            <ul>
                <li><tt>#snt:[cat = "Snt"] !&gt;D #subject:[cat = /Sj.*/]</tt></li>
            </ul>
            <p>we have <strong>not</strong> found all null subject main clauses. Instead, we have
                asked for sentences (#snt) which contain a subject node (#subject) which is
                    <strong>not</strong> the subject of a sentence. TigerSearch will return all
                sentences with subjects in a subordinate clause.</p>
            <p>The SRCMF corpus provides a partial work-around for this problem by using the
                    <i>dom</i> feature. The <i>dom</i> feature of a non-terminal node lists the cat
                features of all nodes linked to it by a ‘D’ edge in alphabetical order separated by
                an underscore. For example, the ‘Snt’ node in the example tree has two dependants:
                SjPer and AuxA. It therefore has a <i>dom</i> property ‘AuxA_SjPer’.</p>
            <p>As a result, we can identify all main clauses without subjects by negating the
                    <i>dom</i> feature:</p>
            <ul>
                <li><tt>#snt:[cat = "Snt" &amp; dom != /.*Sj.*/]</tt></li>
            </ul>
            <p>This will return all ‘Snt’ nodes whose <i>dom</i> property does not contain the
                characters ‘Sj’: in other words, a main clause without an expressed subject.</p>
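            <p>The way the <i>dom</i> value is built, and why negating it works, can be illustrated
                with a short Python sketch. The function names are hypothetical and the code is not
                taken from the SRCMF tools. It assumes that ‘alphabetical order’ means a plain
                string sort, and that a regular expression has to match the whole feature value,
                which would explain why the pattern is written <tt>/.*Sj.*/</tt>.</p>

```python
import re


def dom_value(dependant_cats):
    """Join the cat labels of a node's 'D' dependants, sorted, with underscores."""
    return "_".join(sorted(dependant_cats))


def lacks_subject(dom):
    """Mimic the test dom != /.*Sj.*/ (assumes whole-value regex matching)."""
    return re.fullmatch(r".*Sj.*", dom) is None


# The 'Snt' node of the example tree has the dependants SjPer and AuxA.
example = dom_value(["SjPer", "AuxA"])
print(example)                           # AuxA_SjPer
print(lacks_subject(example))            # False: a subject is listed
print(lacks_subject(dom_value(["AuxA"])))  # True: no 'Sj' label among the dependants
```

            <p>Because the list of dependants is stored on the ‘Snt’ node itself, the absence of a
                subject becomes a property of an existing node, which is something a query is able
                to test.</p>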
            <h4>Syntactic variation</h4>
            <p>TigerSearch syntax is quite flexible, and we may express queries in a number of ways.
                For example, the query identifying all subjects headed by the word <i>Tristran</i>
                may be expressed using three statements...</p>
            <ul>
                <li><tt>#subject:[cat = /Sj.*/] <br /> &amp; #tristran:[word = "Tristran"] <br />
                        &amp; #subject &gt;L #tristran</tt></li>
            </ul>
            <p>... or two statements, e.g.:</p>
            <ul>
                <li><tt>#subject:[cat = /Sj.*/] <br /> &amp; #subject &gt;L #tristran:[word =
                        "Tristran"]</tt></li>
            </ul>
            <p>... or one statement:</p>
            <ul>
                <li><tt>#subject:[cat = /Sj.*/] &gt;L #tristran:[word = "Tristran"]</tt></li>
            </ul>
            <p>... or without variable names:</p>
            <ul>
                <li><tt>[cat = /Sj.*/] &gt;L [word = "Tristran"]</tt></li>
            </ul>
            <p>Where multiple statements are used, the order of statements is irrelevant.
                Confusingly for programmers, you may reference variables before assigning a value,
                e.g.:</p>
            <ul>
                <li><tt>#subject &gt;L #tristran &amp; #tristran:[word = "Tristran"] &amp;
                        #subject:[cat = /Sj.*/]</tt></li>
            </ul>
            <h2><a name="concordances"></a>Using concordances</h2>
            <p>The SRCMF project has developed a number of concordances to present the results of
                TigerSearch queries in tabular format. Three concordances are currently
                implemented:</p>
            <ul>
                <li>basic concordance</li>
                <li>single word pivot concordance</li>
                <li>pivot and block concordance</li>
            </ul>
            <p>These concordances produce a text CSV file.</p>
            <h3>Principles</h3>
            <p>The concordances use the names of variables from the TigerSearch query to identify
                the syntactic constituents which should form the focus of the table. All
                concordances require a #pivot variable to be present in the query.</p>
            <p>For example, the following query is correct in TigerSearch, but <strong>will
                    not</strong> produce a concordance:</p>
            <ul>
                <li><tt>[word = /Tristr?a[nm][sz]?/]</tt></li>
            </ul>
            <p>To produce a concordance, the query must identify a node as the #pivot, for
                example:</p>
            <ul>
                <li><tt><strong>#pivot:</strong>[word = /Tristr?a[nm][sz]?/]</tt></li>
            </ul>
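            <p>To see which spellings the regular expression in these queries accepts, it can be
                tried out in Python. This is only an approximation: Python’s <tt>re</tt> module
                stands in for the TIGERSearch regular expression dialect, which may differ in
                detail, and the sketch assumes that the pattern has to match the whole word form.
                The names <tt>PIVOT_PATTERN</tt> and <tt>matches_pivot</tt> and the list of test
                forms are chosen for illustration.</p>

```python
import re

# Same pattern as in the query: optional second 'r', final n or m, optional s or z.
PIVOT_PATTERN = re.compile(r"Tristr?a[nm][sz]?")


def matches_pivot(word):
    """Return True if the whole word form matches the pattern."""
    return PIVOT_PATTERN.fullmatch(word) is not None


for form in ["Tristran", "Tristan", "Tristranz", "Tristrams", "Iseut"]:
    print(form, matches_pivot(form))
```

            <p>Only the last form is rejected, so a single #pivot definition covers the spelling and
                case variants of the name.</p>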
            <h3>Basic concordance</h3>
            <p>The basic concordance has four columns:</p>
            <ul>
                <li>sentence ID</li>
                <li>left context</li>
                <li>pivot</li>
                <li>right context</li>
            </ul>
            <p>The #pivot can be any node in the syntactic tree, either a single word or a larger
                structure. Currently, only lexical information (not annotation) can be shown in the
                basic concordance.</p>
            <p>For example, we may wish to create a concordance of all the main clause subjects
                containing the word ‘Tristran’:</p>
            <ul>
                <li><tt>#snt:[cat = "Snt"] &gt;D #pivot:[cat = "SjPer"] &amp; #pivot &gt;* [word =
                        /Tristr?a[nm][sz]?/]</tt></li>
            </ul>
            <p>Note that the #pivot variable is attached to the subject node (cat = "SjPer").</p>
            <p>Below is a selection of the results from the concordance:</p>
            <table border="1">
                <tr>
                    <th>ID</th>
                    <th>left context</th>
                    <th>pivot</th>
                    <th>right context</th>
                </tr>
                <tr>
                    <td>beroul_pb:8_lb:234_1263227636.06</td>
                    <td>di por averté Ce saciés vos de verité Atant s' en est Iseut tornee</td>
                    <td>Tristran</td>
                    <td>l' a plorant salüee Sor le perron de marbre bis Tristran s' apuie ce</td>
                </tr>
                <tr>
                    <td>beroul_pb:13_lb:415_1264876249.02</td>
                    <td># croiz Einz croiz parole fole et vaine Ma bone foi me fera saine Tristran
                        [remest] a qui * mot poise </td>
                    <td>Tristran tes niés </td>
                    <td>vint soz cel pin Qui * est laienz en cel jardin Si me manda</td>
                </tr>
                <tr>
                    <td>beroul_pb:134_lb:4365_1268928771.68</td>
                    <td>moi le reçoive En sus l' atent s' espee tient Goudoïne autre voie tient</td>
                    <td>Tristran [remest] a qui * mot poise</td>
                    <td>Ist du * buison cela part toise Mais por noient quar cil s' esloigne</td>
                </tr>
            </table>
            <p>Note that the pivot may be one or more words.</p>
            <h3>What do the square brackets ([]), slashes (/), asterisks (*) and hashes (#)
                mean?</h3>
            <p>The third example in the above table contains [square brackets] in the pivot. These
                are used in all concordances to indicate <strong>words which occur between parts of
                    a discontinuous syntactic constituent</strong>.</p>
            <p>The annotated subject in this sentence is <i>Tristran ... a qui mot poise</i>. The
                main verb of the sentence, <i>remest</i>, is not part of the subject, but occurs
                between its two parts. The verb <i>remest</i> is included in the pivot column, but
                surrounded by square brackets.</p>
            <p>This means that:</p>
            <ul>
                <li>the pivot column contains <strong>all parts</strong> of discontinuous
                    pivots;</li>
                <li>reading the concordance from left to right will always give the original
                    sentence.</li>
            </ul>
            <p>Slashes (/) indicate division between sentences in the syntactic annotation. These
                will not correspond to the editor’s division into sentences as shown in the
                punctuation.</p>
            <p>Asterisks (*) indicate that the preceding word has two syntactic functions (e.g.
                    <i>qui</i> in <i>a qui mot poise</i> is both a relator and a subject). They may
                usually be ignored.</p>
            <p>Hashes (#) are related to the representation of coordination, and may always be
                ignored.</p>
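            <p>When a concordance is processed by script rather than read by eye, these markers can
                be removed to recover the running text. The following Python sketch is one possible
                way of doing so. The function names are hypothetical, and the code assumes that the
                slash, asterisk and hash markers always appear as separate, space-delimited tokens,
                as they do in the sample rows above.</p>

```python
def rebuild_text(left, pivot, right):
    """Join the three text columns of a basic concordance row in reading order."""
    parts = (left.strip(), pivot.strip(), right.strip())
    return " ".join(part for part in parts if part)


def strip_markers(text):
    """Drop the brackets around interrupting words and standalone /, * and # markers."""
    tokens = text.replace("[", "").replace("]", "").split()
    return " ".join(tok for tok in tokens if tok not in {"/", "*", "#"})


# Pivot of the third row in the table above.
pivot = "Tristran [remest] a qui * mot poise"
print(strip_markers(pivot))  # Tristran remest a qui mot poise
```

            <p>Note that removing the square brackets discards the information that <i>remest</i>
                is not part of the pivot constituent, so this step is only suitable when the plain
                word sequence is all that is needed.</p>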
            <h3>Single word pivot concordance</h3>
            <p>The single word pivot concordance has a variable number of columns, based on the
                following structure:</p>
            <ul>
                <li>ID</li>
                <li>Left context outside the SRCMF sentence containing the pivot</li>
                <li>Left context within the SRCMF sentence containing the pivot</li>
                <li>Pivot</li>
                <li>Structure headed by the pivot</li>
                <li>Function of the structure headed by the pivot</li>
                <li>Right context within the SRCMF sentence containing the pivot</li>
                <li>Right context outside the SRCMF sentence containing the pivot</li>
            </ul>
            <p>The single word pivot concordance is designed to give as much information as possible
                about a single word. For example, a concordance could be created around the word
                "Tristran":</p>
            <ul>
                <li><tt>#pivot:[word = /Tristr?a[nm][sz]?/]</tt></li>
            </ul>
            <p>Below is a selection of the results from the concordance (some columns are
                omitted):</p>
            <table border="1">
                <tr>
                    <th>Left context in sentence</th>
                    <th>Pivot</th>
                    <th>Pivot-headed structure</th>
                    <th>Right context in sentence</th>
                </tr>
                <tr>
                    <td>Sire</td>
                    <td>Tristran</td>
                    <td>Tristran</td>
                    <td>por Deu le roi Si grant pechié avez de moi Qui * me mandez a itel ore</td>
                </tr>
                <tr>
                    <td></td>
                    <td>Tristran</td>
                    <td>Tristran tes niés</td>
                    <td>tes niés vint soz cel pin Qui * est laienz en cel jardin</td>
                </tr>
                <tr>
                    <td># Que por Yseut que por</td>
                    <td>Tristranz</td>
                    <td>que por Tristranz</td>
                    <td>Mervellose joie menoient</td>
                </tr>
            </table>
            <p>The ‘pivot-headed structure’ gives the noun phrase of which the word <i>Tristran</i>
                is head. In the second example, for instance, the word <i>Tristran</i> heads the
                structure <i>Tristran tes niés</i>.</p>
            <p>Note that words appearing in the ‘pivot-headed structure’ column are also found in
                the two context columns. The original sentence may be read across the columns in
                the order left context, pivot, right context.</p>
496
            <h3>Pivot and block concordance</h3>
            <h4>Introduction</h4>
            <p>The pivot and block concordance is designed to highlight the position of certain
                constituents, called ‘blocks’ (e.g. the subject), with respect to a pivot (e.g. the
                verb). The resulting CSV files are complex, with a large number of columns, and are
                intended as the basis for more detailed analysis in spreadsheet software.</p>
            <p>The pivot and block concordance has the following basic structure:</p>
            <ul>
                <li>ID</li>
                <li>Left context outside the SRCMF sentence containing the pivot</li>
                <li>Left context within the SRCMF sentence containing the pivot</li>
                <li>Pre-pivot blocks</li>
                <li>Pivot</li>
                <li>Post-pivot blocks</li>
                <li>Right context within the SRCMF sentence containing the pivot</li>
                <li>Right context outside the SRCMF sentence containing the pivot</li>
            </ul>
            <p>As with the other concordances, TigerSearch queries must define a #pivot variable.
                In addition, any number of variables whose names begin with ‘#block’ may be defined;
                at least one ‘#blockXX’ variable is required.</p>
            <p>For example, the following query will generate a pivot and block concordance to show
                the position of the subject (#block1) with respect to the finite verb (#pivot):</p>
            <ul>
                <li><tt>#snt:[cat = "Snt"] &gt;D #block1:[cat = "SjPer"] &amp; #snt &gt;L
                        #pivot</tt></li>
            </ul>
            <p>In essence, the central section of the resulting concordance will take the following
                form:</p>
            <table border="1">
                <tr>
                    <th>Left context</th>
                    <th>Block</th>
                    <th>Pivot</th>
                    <th>Block</th>
                    <th>Right context</th>
                </tr>
                <tr>
                    <td></td>
                    <td>Li rois</td>
                    <td>pense</td>
                    <td></td>
                    <td>que par folie Sire Tristran vos aie amé</td>
                </tr>
                <tr>
                    <td>Si</td>
                    <td></td>
                    <td>voient</td>
                    <td>il</td>
                    <td># Deu et son reigne</td>
                </tr>
            </table>
            <p>Where the subject is pre-verbal, it appears in the block column to the left of the
                pivot. Where it is post-verbal, it appears in the block column to the right of the
                pivot.</p>
            <h4>Why are there square brackets ([]) and curly brackets ({}) in the concordance?</h4>
            <p>As with other concordances, square brackets denote <strong>words occurring between
                    two parts of a discontinuous unit</strong>. The difference in this concordance
                is that blocks may be discontinuous, as well as the pivot.</p>
            <p>Curly brackets denote <strong>words which occur between the block and the
                    pivot</strong> (or, in more complex examples, between two blocks).</p>
            <table border="1">
                <tr>
                    <th>Left context</th>
                    <th>Block</th>
                    <th>Pivot</th>
                    <th>Block</th>
                    <th>Right context</th>
                </tr>
                <tr>
                    <td></td>
                    <td>Vos {n'}</td>
                    <td>entendez</td>
                    <td></td>
                    <td>pas la raison</td>
                </tr>
                <tr>
                    <td>Dex qel pitié</td>
                    <td></td>
                    <td>Faisoit</td>
                    <td>{a} {mainte} {gent} li chiens</td>
                    <td></td>
                </tr>
                <tr>
                    <td></td>
                    <td>Ta parole [est] [tost] [entendue] Que li rois la roïne prent</td>
                    <td>est</td>
                    <td></td>
                    <td>tost entendue Que li rois la roïne prent</td>
                </tr>
                <tr>
                    <td></td>
                    <td>Tuit [s'] [escrïent] la gent du * reigne {s'}</td>
                    <td>escrïent</td>
                    <td></td>
                    <td>la gent du * reigne</td>
                </tr>
            </table>
            <p>In the table above, note the use of curly brackets in the first example to mark the
                negative adverb <i>n’</i>, which occurs between the subject-block <i>vos</i> and the
                verb-pivot <i>entendez</i>. In the second example, the prepositional phrase <i>a
                    mainte gent</i> is marked with curly brackets, as it separates the verb-pivot
                    <i>Faisoit</i> from the post-verbal subject-block <i>li chiens</i>.</p>
            <p>In the third example, a discontinuous subject <i>Ta parole ... que li rois la roïne
                    prent</i> appears in a pre-verbal block. <strong>The pre- or post-verbal
                    position of a block is determined by the position of its first word relative to
                    the pivot</strong>. The words <i>est tost entendue</i>, which separate the two
                parts of the block, are marked with square brackets. </p>
            <p>In the fourth example, the word <i>s’</i> appears (i) in square brackets, between the
                two halves of a discontinuous subject-block and (ii) in curly brackets, between the
                first part of the discontinuous subject <i>Tuit</i> and the verb-pivot
                    <i>escrïent</i>.</p>
            <h4>Why are there so many columns? I only asked for one block!</h4>
            <p>The pivot and block concordance shows <strong>only one result per pivot</strong>.
                Continuing to work with the same example, if a single verb-pivot has multiple
                subject-blocks (which is quite possible in cases of coordination), each subject
                occupies a separate column:</p>
            <table border="1">
                <tr>
                    <th>Block3</th>
                    <th>Block2</th>
                    <th>Block1</th>
                    <th>Pivot</th>
                    <th>Block</th>
                </tr>
                <tr>
                    <td>Ne tor</td>
                    <td>ne mur</td>
                    <td>ne fort chastel {Ne} {me}</td>
                    <td>tendra</td>
                    <td></td>
                </tr>
            </table>
            <p>However, due to the way the number of columns is calculated, it is possible that some
                will be empty. These may be deleted in the spreadsheet software, if you wish.</p>
            <p>Note that the concordance will <strong>never</strong> represent the two halves of a
                    <strong>single discontinuous</strong> block in separate columns. The following
                representation therefore indicates a coordination:</p>
            <table border="1">
                <tr>
                    <th>Left context</th>
                    <th>Block</th>
                    <th>Pivot</th>
                    <th>Block</th>
                    <th>Right context</th>
                </tr>
                <tr>
                    <td></td>
                    <td>Tristran {en}</td>
                    <td>bese</td>
                    <td>{la} {roïne} {Et} ele</td>
                    <td>lui par la saisine</td>
                </tr>
            </table>
            <p>The SRCMF annotation of the sentence in this table identifies <strong>two coordinated
                    subjects</strong> of the verb <i>bese</i>. One is pre-verbal (<i>Tristran</i>),
                the other is post-verbal (<i>ele</i>); each occupies a separate block.</p>
            <h3>Adding annotation information</h3>
            <p>When a concordance is launched from the TXM-web interface, you may specify which
                properties of terminal and non-terminal nodes you wish to see in the
                concordance.</p>
            <ul>
                <li>On the ‘Export Concordance’ form, use the drop-down lists of ‘Non-terminal
                    features’ and ‘Terminal Features’.</li>
                <li>Select the features of terminal and non-terminal nodes that you wish to show in
                    the concordance from the two drop-down lists.</li>
                <li>Click ‘OK’.</li>
            </ul>
            <p>Each added property will be placed in a separate column next to the block or pivot.
                For example, if the ‘cat’ property is selected for non-terminal nodes, and the ‘pos’
                property is selected for terminal nodes, the query above will produce the following
                concordance:</p>
            <table border="1">
                <tr>
                    <th>Left context</th>
                    <th>Block</th>
                    <th>Block Cat</th>
                    <th>Pivot</th>
                    <th>Pivot Pos</th>
                    <th>Block</th>
                    <th>Block Cat</th>
                    <th>Right context</th>
                </tr>
                <tr>
                    <td></td>
                    <td>Li rois</td>
                    <td>SjPer</td>
                    <td>pense</td>
                    <td>VERcjg</td>
                    <td></td>
                    <td></td>
                    <td>que par folie Sire Tristran vos aie amé</td>
                </tr>
                <tr>
                    <td>Si</td>
                    <td></td>
                    <td></td>
                    <td>voient</td>
                    <td>VERcjg</td>
                    <td>il</td>
                    <td>SjPer</td>
                    <td># Deu et son reigne</td>
                </tr>
            </table>
            <h2><a name="tags"></a>Tagset</h2>
            <h3>Non-terminal nodes</h3>
            <p>Non-terminal nodes have the following properties and values:</p>
            <h4>cat</h4>
            <p>Gives the syntactic function of the element. For more details, please refer to the <a
                    target="_blank" href="http://srcmf.org">SRCMF
                    website</a>.</p>
            <ul>
                <li><a name="Apst"></a><strong>Apst</strong>: Vocative (fr. apostrophe)</li>
                <li><a name="AtObj"></a><strong>AtObj</strong>: Object attribute</li>
                <li><a name="AtRfc"></a><strong>AtRfc</strong>: Attribute of reflexive pronoun</li>
                <li><a name="AtSj"></a><strong>AtSj</strong>: Subject attribute</li>
                <li><a name="Aux"></a><strong>Aux</strong>: Auxiliated non-finite verb (neither
                    passive nor active)</li>
                <li><a name="AuxA"></a><strong>AuxA</strong>: Auxiliated non-finite verb
                    (active)</li>
                <li><a name="AuxP"></a><strong>AuxP</strong>: Auxiliated non-finite verb
                    (passive)</li>
                <li><a name="Circ"></a><strong>Circ</strong>: Adjunct (fr. circonstant)</li>
                <li><a name="Cmpl"></a><strong>Cmpl</strong>: Complement</li>
                <li><a name="Coo"></a><strong>Coo</strong>: Coordination</li>
                <li><a name="GpCoo"></a><strong>GpCoo</strong>: Coordinated group (conjunct)</li>
                <li><a name="Insrt"></a><strong>Insrt</strong>: Inserted clause</li>
                <li><a name="Intj"></a><strong>Intj</strong>: Interjection</li>
                <li><a name="ModA"></a><strong>ModA</strong>: Modifier (attached)</li>
                <li><a name="ModD"></a><strong>ModD</strong>: Dislocated (detached) modifier</li>
                <li><a name="Ng"></a><strong>Ng</strong>: Negation</li>
                <li><a name="NgPrt"></a><strong>NgPrt</strong>: Negative particle (e.g. <i>pas</i>,
                        <i>mie</i>)</li>
                <li><a name="nSnt"></a><strong>nSnt</strong>: Non-sentence</li>
                <li><a name="Obj"></a><strong>Obj</strong>: Object</li>
                <li><a name="RelC"></a><strong>RelC</strong>: Coordinating relator</li>
                <li><a name="RelNC"></a><strong>RelNC</strong>: Non-coordinating relator</li>
                <li><a name="Regim"></a><strong>Regim</strong>: Regime</li>
                <li><a name="Rfc"></a><strong>Rfc</strong>: Reflexive pronoun</li>
                <li><a name="Rfx"></a><strong>Rfx</strong>: Doubled reflexive pronoun (e.g. <i>nous
                        ... <strong>nous-mêmes</strong></i>)</li>
                <li><a name="SjImp"></a><strong>SjImp</strong>: Impersonal subject</li>
                <li><a name="SjPer"></a><strong>SjPer</strong>: Personal subject</li>
                <li><a name="Snt"></a><strong>Snt</strong>: Sentence</li>
            </ul>
            <h4>type</h4>
            <p>Gives the syntactic category of the head of the structure.</p>
            <ul>
                <li><a name="VFin"></a><strong>VFin</strong>: Finite verb form</li>
                <li><a name="VInf"></a><strong>VInf</strong>: Infinitive</li>
                <li><a name="VPar"></a><strong>VPar</strong>: Participle</li>
                <li><a name="nV"></a><strong>nV</strong>: Non-verbal</li>
            </ul>
            <h4>dom</h4>
            <p>A ‘dom’ property is added to each non-terminal node in the tree listing the functions
                of all its dependants and relators in alphabetical order, separated by underscores.
                For example, if a finite verb has a subject, object and two adjuncts, the property
                [dom = "Circ_Circ_Obj_SjPer"] will be added.</p>
            <p>This resolves to an extent the problem of ‘negative’ queries. Recall that it is
                impossible to query the non-existence of a node:</p>
            <ul>
                <li><tt>#clause:[type = "VFin"] !&gt;D #suj:[cat = "SjPer"]</tt></li>
            </ul>
            <p>Contrary to appearances, this query DOES NOT mean ‘node #suj does not exist’: it
                means that the node #suj exists, but is not dependent on #clause.</p>
            <p>However, it is possible to find all finite verbs without a subject by using the dom
                property of the finite verb:</p>
            <ul>
                <li><tt>#clause:[type = "VFin" &amp; dom != /.*SjPer.*/]</tt></li>
            </ul>
            <p>The query specifies that we wish to find a node #clause which is a finite verb and
                does not have the string ‘SjPer’ in the list of dependent nodes given by the dom
                property.</p>
            <h4>coord</h4>
            <p>A ‘coord’ property is added to each non-terminal node in the tree. If the node
                represents a coordinated structure, [coord = "y"].</p>
            <p>For example, in the sentence <i>Sade et douz est quanqu’est de li</i> (gcoin1: p. 3,
                l. 31), <i>sade</i> and <i>douz</i> are coordinated AtSj. The non-terminal nodes
                dominating the words <i>sade</i> and <i>douz</i> have the properties [cat = "AtSj"
                &amp; coord="y"].</p>
            <p>The ‘coord’ property exists primarily to allow non-coordinated structures to be
                identified. In the original format, this is not possible, as it would require a
                query specifying the non-existence of a node [cat = "Coo"]. However, with the coord
                property, it is possible to restrict a query to non-coordinated structures only:</p>
            <ul>
                <li><tt>#suj:[cat = "SjPer" &amp; coord != "y"]</tt></li>
            </ul>
            <h4>headpos</h4>
            <p>A ‘headpos’ property is added to each non-terminal node in the tree. If the text is
                correctly annotated at the deep level, each non-terminal node representing a
                structure should directly dominate at most one terminal node in the tree, the word
                representing the lexical content of the head of the structure. If this is the case,
                the ‘headpos’ property is equal to the ‘pos’ property of the dominated terminal
                node. Thus:</p>
            <ul>
                <li><tt>#node:[headpos = "NOMcom"]</tt></li>
            </ul>
            <p>is equivalent to:</p>
            <ul>
                <li><tt>#node &gt;L #lexnode:[pos = "NOMcom"]</tt></li>
            </ul>
            <p>The headpos property does not improve the usability of the corpus in TigerSearch, but
                is useful in producing concordances, providing a more detailed morpho-syntactic tag
                for the head of a structure than the SRCMF ‘nV’ (non-verbal) type tag.</p>
            <p>If the non-terminal node directly dominates more than one terminal node, the
                algorithm generating the headpos property makes an educated guess as to which word
                is the head, and inserts the tag of this word as the ‘headpos’. For example, if a
                non-terminal node dominates a word with pos ‘NOMcom’ and a word with pos ‘DETdef’,
                the algorithm will guess that the noun is the head, and insert the headpos
                ‘NOMcom?’.</p>
            <p>Note that headpos values which have been ‘guessed’ are always suffixed by a question
                mark (e.g. NOMcom?). There will be no guessed headpos values in texts with full NP
                annotation.</p>
            <h3>Terminal nodes</h3>
            <p>Terminal nodes have the following properties:</p>
            <h4>pos</h4>
            <p>Part-of-speech tag (Cattex). For more information, please refer to the <a
                    target="_blank" href="http://bfm.ens-lyon.fr/article.php3?id_article=323">Cattex
                    documentation</a> on the <a target="_blank" href="http://bfm.ens-lyon.fr/">BFM website</a>.</p>
            <h4>form</h4>
            <p>Each word has a property “form”. For texts in prose, the value of the “form” tag is
                always “prose”. For texts in verse, the form tag is:</p>
            <ul>
                <li>“vers_first” for the first word in a line;</li>
                <li>“vers_end” for the last word in a line;</li>
                <li>“vers” for other words.</li>
            </ul>
            <p>It is thus possible to formulate a TigerSearch query focusing on words at the
                beginning or end of a line of verse:</p>
            <ul>
                <li><tt>[word = "Tristran" &amp; form = "vers_end"]</tt></li>
            </ul>
            <p>In <i>Aucassin and Nicolete</i>, the form tag correctly distinguishes the verse and
                prose sections of the text.</p>
            <h4>q</h4>
            <p>Each word has a property “q”. This is equal to ‘y’ when the word occurs as part of
                direct discourse, and ‘n’ when it does not. This annotation is automatically
                generated by the BFM team from the position of quote marks in the text.</p>
            <h2><a name="sample"></a>Sample queries</h2>
            <p> The following sample queries may be tested by copying and pasting into the query
                panel. </p>
            <p>Find all main clause verbs:<br />
                <tt>[cat = "Snt"]</tt></p>
            <p>Find all structures introduced by a preposition:<br />
                <tt>#n >R #relnc:[cat = "RelNC"]<br /> &amp; #relnc >L [pos = /PRE.*/]</tt><br />
            </p>
            <p>Find all post-verbal NP subjects:<br />
                <tt>#verb:[type = "VFin"] >D #suj:[cat = "SjPer" &amp; type="nV"]<br /> &amp; #suj
                    >L [pos = /NOM.*/] <br /> &amp; #suj >@l #sword<br /> &amp; #verb >L
                    #vword<br /> &amp; #vword .* #sword</tt></p>
            <p>Find indefinite subjects introduced by <q>qui</q>:<br />
                <tt>[type = "VFin"] >D #suj:[cat = "SjPer"]<br /> &amp; #suj >R #relnc:[cat =
                    "RelNC"]<br /> &amp; ( #relnc >L [word = /[QqKk]u?i/]<br /> | #relnc >~dupl
                    [word = /[QqKk]u?i/] )</tt><br /></p>
            <p>Find sentences with coordinated subjects:<br />
                <tt>#coo:[cat = "Coo"] >~coord #sj1:[cat = "SjPer"]<br /> &amp; #coo >~coord
                    #sj2:[cat = "SjPer"]<br /> &amp; #sj1 $ #sj2</tt></p>
            <p>Find sentences with possible <q>gapping</q> of the finite verb (i.e. coordination of
                subject–predicate pairs):<br />
                <tt>#gpcoo1:[cat = "GpCoo"] >~ #suj1:[cat = "SjPer"]<br /> &amp; #gpcoo1 $.*
                    #gpcoo2:[cat = "GpCoo"]<br /> &amp; #gpcoo2 >~ #suj2:[cat = "SjPer"]<br /> &amp;
                    #gpcoo1 >~ #pred1:[cat = /Cmpl|Obj|AtSj/]<br /> &amp; #gpcoo2 >~ #pred2:[cat =
                    /Cmpl|Obj|AtSj/]<br /></tt>
            </p>
            <h1> Useful links</h1>
            <ul>
                <li><a target="_blank" href="https://listes.cru.fr/wiki/srcmf/index">SRCMF wiki</a></li>
                <li><a target="_blank" href="http://srcmf.org">SRCMF website</a></li>
                <li><a
                    target="_blank" href="http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/oldindex.shtml"
                        >TIGERSearch website</a></li>
                <li><a target="_blank" href="http://bfm.ens-lyon.fr/">BFM website</a></li>
                <li><a target="_blank" href="http://textometrie.ens-lyon.fr/?lang=en">TXM website</a></li>
            </ul>
        </div>
    </body>
</html>