 

SRCMF corpus: TIGERSearch web interface

 


 
 

Using the TIGERSearch web interface

 

Writing a query and browsing the results

 

In the TigerSearch tab, queries are entered in the top panel, and matching sentences are shown in tree form in the bottom panel. A tutorial on TigerSearch queries may be found in the section “Writing a simple query”.

 
 
• Type your query in the top panel (e.g. #pivot:[word = "Tristran"]).
• Click the ‘Search’ button at the bottom right of the panel.
 

If the query is well-formed, and if there are matching results in the corpus, the first tree in the forest will appear in the bottom panel.

 

The central bar gives the number of matching sentences and the position of the current sentence in the corpus, in the form sent: [sentence number] [match number] / [total matching sentences]. Note that subgraph navigation is not yet implemented, and the interface does not show the total number of matches, only the number of matching sentences. You can navigate through the forest of matches using the forward and back arrows on this bar. The ‘Export’ button displays the current tree as an .SVG file in the browser, from which it can be saved. The ‘Export Concordance’ button allows matching sentences to be exported in concordance form.

 

Exporting the results

 

To export the results of your query, click the ‘Export Concordance’ button. An export window will appear, with the following options:

 
 
• Type

Three concordances are currently implemented:

• basic concordance
• single word pivot concordance
• pivot and block concordance

It is important to note that these concordances use the names of TigerSearch variables from the query to structure the concordance. No concordance will be produced if your query does not contain a #pivot variable. The pivot and block concordance requires at least one additional #blockXX variable.

 

Further documentation for these concordances may be found in the section “Exporting a concordance”.

• Context (number of words)

 

Sets the size of the context preceding and following the pivot.

• Restore punctuation

 

Adds punctuation from the BFM’s digitized edition to the exported concordance. It also restores words excluded from the TIGERSearch corpus (e.g. lacunae, AOI in the Chanson de Roland).

• Properties to show in concordance

 

Select which features of terminal and non-terminal nodes should be shown in the concordance. This function is only active for the ‘pivot and block concordance’.
 

When you have filled in the form:

 
 
• Click the ‘OK’ button.
 

After a short delay, a new tab will open in your browser, containing the concordance in plain text tabular format (.csv).

 
 
• Save this file to disk using the ‘File > Save As...’ menu in your browser.
 

Viewing the concordance

 

To view and manipulate the concordance, you will need to use a spreadsheet package.

• Select ‘File > Open...’ from the menu.
• Ensure that the file list is showing either ‘All files’ or ‘CSV text files’.
• Select the saved .csv file.
 

You will need to configure your spreadsheet software correctly to read the file. We recommend using LibreOffice or OpenOffice Calc, which will prompt the user for settings whenever a .csv file is opened. The following settings are required for the import to function:

• Character set: Unicode (UTF-8);
• Separated by Tab (ONLY);
• Merge delimiters OFF;
• Text delimiter: NONE (empty box).
 

Troubleshooting likely problems:

 
 
• If accented characters do not appear correctly > check that the character set is UTF-8;
• If some rows do not seem to have the correct number of columns > check that Text Delimiter is set to nothing (the default is usually double quote, which will cause an error where the text contains double quotes), that merge delimiters is OFF, and that TAB is the only separator selected;
• If zeros appear rather than punctuation (unlikely) > use the ‘Fields’ section of the import window to set every column type to ‘Text’ rather than ‘Standard’.
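If you would rather process the export in a script than in a spreadsheet, the same settings can be reproduced with Python's standard csv module. The following is a minimal sketch, not part of the SRCMF tools; the function names parse_concordance and read_concordance are hypothetical.

```python
import csv
import io


def parse_concordance(text):
    """Split the text of a concordance export into rows of columns.

    Mirrors the Calc settings above: tab is the only separator, adjacent
    tabs give empty columns (merge delimiters OFF), and there is no text
    delimiter, so double quotes are kept as ordinary characters.
    """
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def read_concordance(path):
    # The export is UTF-8; newline="" leaves line endings to the csv module.
    with open(path, encoding="utf-8", newline="") as f:
        return parse_concordance(f.read())


print(parse_concordance('id1\t"Ma bone foi\tTristran\t\n'))
```

The important setting is quoting=csv.QUOTE_NONE. With the module's default quoting, a cell beginning with a double quote would be read as a quoted field, and tabs inside it would no longer split columns. This is the scripted counterpart of the Text delimiter problem described above.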
 

Writing a simple query

 

The following section will enable you to write simple TIGERSearch queries for the SRCMF corpus. It is not comprehensive, and must be read in conjunction with:

 
 

Nodes in the TS graph

 

A TigerSearch graph is made up of two types of nodes: terminal and non-terminal nodes. In the graph viewer, terminal nodes appear at the bottom of the graph, while non-terminal nodes are represented by labelled white ovals, as shown in the example je puis dire.

   

Each node has a number of features (see the section “Tagset”).

 

SRCMF: ‘split’ nodes

 

In a true dependency graph, words form the only nodes.

 

In the TigerXML SRCMF corpus, each ‘word’ in the dependency structure is in fact split between a terminal node (which contains the lexical form and the PoS tag of the word itself) and a non-terminal node (which contains the syntactic features of the structure headed by the word). The non-terminal node and the terminal node are linked by an edge labelled ‘L’ (for lexical realization).

 

In the example tree, an ‘L’ edge links:

 
 
• the terminal node puis to the non-terminal node ‘Snt’: these nodes represent the finite verb which heads the sentence;
• the terminal node je to the non-terminal node ‘SjPer’: these nodes represent the subject je;
• the terminal node dire to the non-terminal node ‘AuxA’: these nodes represent the infinitive dire.
 

A ‘D’ edge links the ‘Snt’ node to the non-terminal nodes ‘SjPer’ and ‘AuxA’: this indicates that the subject je and the ‘auxiliated’ infinitive dire depend on the main verb puis.

 

SRCMF corpus node features

 

The SRCMF corpus has the following node features:

 

Terminal nodes:

 
 
• word: the word form
• pos: part-of-speech tag (Cattex)
• form: whether the text is verse or prose, and the position of the word in the line of verse.
 

Non-terminal nodes:

 
 
• cat: function of the structure headed by the node
• type: morpho-syntactic category of the node (VFin, VPar, VInf, nV)
• coord: set to ‘y’ if the structure forms part of a coordination
• dom: underscore-separated list of all functions dominated by the node (e.g. ‘AuxA_SjPer’ for the ‘Snt’ node above)
 

For simple queries, we will focus mainly on the word, pos and cat features.

 

Defining the feature specifications of a node

 

Node feature specifications are written between [square brackets] and take the following form:

• [feature operator "value"]

where value is a string, or

• [feature operator /value/]

where value is a regular expression. Permitted operators are ‘=’ (equals) and ‘!=’ (does not equal). For example, the following expression identifies all nodes where cat is "SjPer" (personal subject):

• [cat = "SjPer"]

If we wish to include impersonal subjects (i.e. "SjPer" and "SjImp"), we can use a regular expression:

• [cat = /Sj.*/]

We can identify all nodes which are not subjects:

• [cat != /Sj.*/]

We may also use the conjunction (&) operator within the square brackets to specify several properties. For example, we can search for subordinate clause subjects by requiring the subject to be headed by a finite verb (type is "VFin"):

• [cat = /Sj.*/ & type = "VFin"]
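To make the meaning of a bracketed specification concrete, here is a small Python model of how [feature operator value] conditions combine. It is a sketch under stated assumptions, not how TIGERSearch is implemented: a node is modelled as a plain dict of feature values, and a /regex/ is assumed to have to match the whole value (which is why the examples write /Sj.*/ rather than /Sj/). The names satisfies and matches are hypothetical.

```python
import re


def satisfies(node, feature, op, value):
    """Test one [feature op value] condition against a node.

    node is a hypothetical dict of feature -> value; value is either a
    plain string ("value") or a compiled pattern (/value/).
    """
    actual = node.get(feature, "")
    if isinstance(value, re.Pattern):
        # Assumption: the regular expression must match the whole value.
        hit = value.fullmatch(actual) is not None
    else:
        hit = actual == value
    if op == "=":
        return hit
    if op == "!=":
        return not hit
    raise ValueError(f"unsupported operator: {op!r}")


def matches(node, *conditions):
    """All conditions joined by & inside one pair of brackets."""
    return all(satisfies(node, *cond) for cond in conditions)


SJ = re.compile(r"Sj.*")
np_subject = {"cat": "SjPer", "type": "nV"}
clause_subject = {"cat": "SjPer", "type": "VFin"}

# [cat = "SjPer"]
print(matches(np_subject, ("cat", "=", "SjPer")))
# [cat != /Sj.*/]
print(matches(np_subject, ("cat", "!=", SJ)))
# [cat = /Sj.*/ & type = "VFin"]
print(matches(clause_subject, ("cat", "=", SJ), ("type", "=", "VFin")))
```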
 

Assigning a variable name to a node

 

A variable name may be assigned to the node definition. Variables are useful for referring to the same node several times in a complex query, and are also used to indicate the pivot node to the concordance scripts.

Variable definitions adopt the following syntax:

• #name:[<definition>]

where definition is a feature specification as described above. Note that variable names must begin with a hash (#) and are separated from their definition by a colon (:).

For example, we may wish to construct a concordance in which the subject forms the pivot. We define the #pivot variable as follows:

• #pivot:[cat = /Sj.*/]
 

Node relations

 

All but the simplest queries will require more than one node to be defined, and will usually require the relationship between the nodes to be specified.

For example, suppose we wish to identify all subjects headed by the word Tristran. First, we define the subject:

• #subject:[cat = /Sj.*/]

Second, we define the word Tristran as a terminal node:

• #tristran:[word = "Tristran"]

Finally, we must indicate the relationship between the nodes. The relationship between a non-terminal node and the terminal node representing its lexical content in the TigerSearch graph is one of direct dominance, labelled ‘L’ (lexical).

Direct dominance

In TigerSearch, direct dominance is expressed by using the operator ‘>’ with the following syntax:

• node >[label] node2

where node and node2 are feature specifications or node variables, and label (optional) is a string.

To identify subjects headed by the word Tristran, the relationship between nodes #subject and #tristran is expressed as follows:

• #subject >L #tristran
 

Left corner dominance

 

The ‘>@l’ operator specifies the leftmost terminal node dominated at any depth by a non-terminal node. It has the following syntax:

• node >@l tnode

where node and tnode are feature specifications or node variables, and tnode is a terminal node.

For example, instead of searching for all subjects which are headed by the word Tristran, we may wish to identify all subjects beginning with the word Tristran. This relation would be written as follows:

• #subject >@l #tristran
 

Note that there is also a right corner dominance operator ‘>@r’.

 

Precedence

 

The precedence operator ‘.*’ permits the user to specify the word order of two terminal nodes with the following syntax:

• tnode .* tnode2

where tnode and tnode2 are feature specifications or node variables representing terminal nodes.

For example, suppose we wish to identify all sentences in which the word Tristran heads the subject and precedes the main clause verb.

We need to add three conditions to the query built in the section on direct dominance. First, we need to identify the terminal node containing the main verb of the sentence, i.e. the lexical realization of the non-terminal node ‘Snt’:

• #snt:[cat = "Snt"] >L #verb

You may have noticed that #verb has no feature specification. This is perfectly valid in TigerSearch query syntax. In practice, we know that only one node can be linked to #snt by an ‘L’ relation in the corpus. #verb is thus defined by its relation to #snt rather than by its features.

We then need to specify that the word Tristran precedes the verb:

• #tristran .* #verb

Finally, we need to specify that #subject is the subject of #snt. Otherwise, we risk finding subjects of a subordinate clause which happen to precede the main clause verb:

• #snt >D #subject

Putting it all together, the query is as follows:

• #subject:[cat = /Sj.*/] >L #tristran:[word = "Tristran"]
& #snt:[cat = "Snt"] >L #verb
& #tristran .* #verb
& #snt >D #subject

There is also a direct precedence operator, ‘.’, which specifies that the two terminal nodes must be directly adjacent.
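The relations introduced above can be pictured with a toy model of the example tree je puis dire. The Python sketch below is purely illustrative: the classes Terminal and NonTerminal and the helper functions are hypothetical, and they simply restate the definitions of labelled direct dominance, left and right corner dominance, precedence and direct precedence in terms of word positions.

```python
from dataclasses import dataclass, field


@dataclass
class Terminal:
    word: str
    position: int  # index of the word in the sentence


@dataclass
class NonTerminal:
    cat: str
    edges: list = field(default_factory=list)  # (label, child) pairs


def dominates_directly(node, child, label=None):
    """node >label child: child hangs from node by one edge with that label."""
    return any(c is child and (label is None or lab == label)
               for lab, c in node.edges)


def terminals(node):
    """All terminal nodes dominated by node at any depth."""
    if isinstance(node, Terminal):
        return [node]
    return [t for _, child in node.edges for t in terminals(child)]


def left_corner(node):   # target of node >@l
    return min(terminals(node), key=lambda t: t.position)


def right_corner(node):  # target of node >@r
    return max(terminals(node), key=lambda t: t.position)


def precedes(t1, t2):           # t1 .* t2
    return t1.position < t2.position


def directly_precedes(t1, t2):  # t1 . t2
    return t2.position == t1.position + 1


je, puis, dire = Terminal("je", 0), Terminal("puis", 1), Terminal("dire", 2)
sjper = NonTerminal("SjPer", [("L", je)])
auxa = NonTerminal("AuxA", [("L", dire)])
snt = NonTerminal("Snt", [("L", puis), ("D", sjper), ("D", auxa)])

print(dominates_directly(snt, sjper, "D"))  # #snt >D #subject
print(left_corner(snt).word)                # leftmost word under Snt
print(precedes(je, dire), directly_precedes(je, dire))
```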

 

Negation

 

It is important to learn one (extremely frustrating) golden rule of Tiger query syntax:

• you can negate a feature specification (e.g. [cat != "SjPer"]);
• you can negate a relation between nodes (e.g. #subject !>L #tristran);
• but you can’t negate the existence of a node!

In practice, this means that when we write:

• #snt:[cat = "Snt"] !>D #subject:[cat = /Sj.*/]

we have not found all null subject main clauses. Instead, we have asked for sentences (#snt) which contain a subject node (#subject) which is not a dependant of #snt. TigerSearch will therefore return all sentences with subjects in a subordinate clause.

The SRCMF corpus provides a partial work-around for this problem in the dom feature. The dom feature of a non-terminal node lists the cat features of all nodes linked to it by a ‘D’ edge, in alphabetical order, separated by underscores. For example, the ‘Snt’ node in the example tree has two dependants, SjPer and AuxA. It therefore has the dom property ‘AuxA_SjPer’.

As a result, we can identify all main clauses without subjects by negating the dom feature:

• #snt:[cat = "Snt" & dom != /.*Sj.*/]

This will return all ‘Snt’ nodes whose dom property does not contain the characters ‘Sj’: in other words, main clauses without an expressed subject.
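The dom mechanism can be sketched in Python as follows. The sketch assumes that dom is built by sorting the dependants’ cat values as plain strings, and that a TIGERSearch /regex/ must match the whole value (hence the .* on either side of Sj). The function names are hypothetical, and the second dependant list is an invented subjectless clause, not a corpus example.

```python
import re


def dom_value(dependant_cats):
    """Build a dom value from the cat values of a node's dependants.

    Assumption: a plain string sort, which agrees with the examples in
    this document ('AuxA_SjPer', 'Circ_Circ_Obj_SjPer').
    """
    return "_".join(sorted(dependant_cats))


def lacks_subject(dom):
    # Counterpart of [dom != /.*Sj.*/]: no 'Sj' anywhere in the value.
    return re.fullmatch(r".*Sj.*", dom) is None


for deps in (["SjPer", "AuxA"], ["Obj", "Circ"]):
    dom = dom_value(deps)
    print(dom, lacks_subject(dom))
```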

 

Syntactic variation

 

TigerSearch syntax is quite flexible, and we may express queries in a number of ways. For example, the query identifying all subjects headed by the word Tristran may be expressed using three statements...

• #subject:[cat = /Sj.*/]
& #tristran:[word = "Tristran"]
& #subject >L #tristran

... or two statements, e.g.:

• #subject:[cat = /Sj.*/]
& #subject >L #tristran:[word = "Tristran"]

... or one statement:

• #subject:[cat = /Sj.*/] >L #tristran:[word = "Tristran"]

... or without variable names:

• [cat = /Sj.*/] >L [word = "Tristran"]

Where multiple statements are used, the order of statements is irrelevant. Confusingly for programmers, you may reference variables before assigning a value to them, e.g.:

• #subject >L #tristran & #tristran:[word = "Tristran"] & #subject:[cat = /Sj.*/]
 

Using concordances

 

The SRCMF project has developed a number of concordances to present the results of TigerSearch queries in tabular format. Three concordances are currently implemented:

• basic concordance
• single word pivot concordance
• pivot and block concordance

These concordances produce a text CSV file.

Principles

The concordances use the names of variables from the TigerSearch query to identify the syntactic constituents which should form the focus of the table. All concordances require a #pivot variable to be present in the query.

For example, the following query is correct in TigerSearch, but will not produce a concordance:

• [word = /Tristr?a[nm][sz]?/]

To produce a concordance, the query must identify a node as the #pivot, for example:

• #pivot:[word = /Tristr?a[nm][sz]?/]
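To check which spellings the regular expression /Tristr?a[nm][sz]?/ covers (optional second r, final n or m, optional s or z), it can be tried out in Python. This sketch assumes, as in TIGERSearch, that the pattern must cover the whole word; the candidate spellings are illustrative and are not taken from the corpus.

```python
import re

# The pattern used in the #pivot queries above.
TRISTRAN = re.compile(r"Tristr?a[nm][sz]?")


def is_tristran(word):
    # Assumption: as in TIGERSearch, the pattern must cover the whole word.
    return TRISTRAN.fullmatch(word) is not None


# Illustrative spellings only; they are not taken from the corpus.
candidates = ["Tristran", "Tristranz", "Tristans", "Tristam", "Tristrant", "Yseut"]
print([w for w in candidates if is_tristran(w)])
```

A partial match such as re.search would also accept Tristrant, since it contains Tristran; whole-value matching rules such forms out.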
 

Basic concordance

 

The basic concordance has four columns:

 
 
• sentence ID
• left context
• pivot
• right context

The #pivot can be any node in the syntactic tree, either a single word or a larger structure. Currently, only lexical information (not annotation) can be shown in the basic concordance.

For example, we may wish to create a concordance of all the main clause subjects containing the word ‘Tristran’:

• #snt:[cat = "Snt"] >D #pivot:[cat = "SjPer"] & #pivot >* [word = /Tristr?a[nm][sz]?/]
 

Note that the #pivot variable is attached to the subject node (cat = "SjPer").

 

Below is a selection of the results from the concordance:

   
| ID | Left context | Pivot | Right context |
| beroul_pb:8_lb:234_1263227636.06 | di por averté Ce saciés vos de verité Atant s' en est Iseut tornee | Tristran | l' a plorant salüee Sor le perron de marbre bis Tristran s' apuie ce |
| beroul_pb:13_lb:415_1264876249.02 | # croiz Einz croiz parole fole et vaine Ma bone foi me fera saine | Tristran tes niés | vint soz cel pin Qui * est laienz en cel jardin Si me manda |
| beroul_pb:134_lb:4365_1268928771.68 | moi le reçoive En sus l' atent s' espee tient Goudoïne autre voie tient | Tristran [remest] a qui * mot poise | Ist du * buison cela part toise Mais por noient quar cil s' esloigne |

Note that the pivot may be one or more words.

 

What do the square brackets ([]), slashes (/), asterisks (*) and hashes (#)   mean?

 

The third example in the above table contains [square brackets] in the pivot. These are used in all concordances to indicate words which occur between parts of a discontinuous syntactic constituent.

The annotated subject in this sentence is Tristran ... a qui mot poise. The main verb of the sentence, remest, is not part of the subject, but occurs between its two parts. The verb remest is included in the pivot column, but surrounded by square brackets.

This means that:

• the pivot column contains all parts of discontinuous pivots;
• reading the concordance from left to right will always give the original sentence.
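The bracketing convention can be sketched in a few lines of Python, assuming a row is built from the sentence’s tokens and the set of token positions that belong to the pivot. The function names are hypothetical and this is not the SRCMF export script. The pivot column runs from the first to the last pivot word, and any word in between that is not part of the pivot is wrapped in square brackets, so left context, pivot and right context concatenate back to the sentence once the brackets are ignored.

```python
def pivot_column(tokens, pivot_positions):
    """Render a (possibly discontinuous) constituent.

    Words between its first and last words that do not belong to it
    are wrapped in [square brackets].
    """
    first, last = min(pivot_positions), max(pivot_positions)
    return " ".join(
        tok if i in pivot_positions else f"[{tok}]"
        for i, tok in enumerate(tokens[first:last + 1], start=first)
    )


def concordance_row(tokens, pivot_positions):
    """Return (left context, pivot, right context) for one sentence."""
    first, last = min(pivot_positions), max(pivot_positions)
    left = " ".join(tokens[:first])
    right = " ".join(tokens[last + 1:])
    return left, pivot_column(tokens, pivot_positions), right


# Tokens from the third row of the table above (context shortened).
tokens = "Goudoïne autre voie tient Tristran remest a qui * mot poise".split()
# Tristran ... a qui * mot poise is the subject; remest is the main verb.
print(concordance_row(tokens, {4, 6, 7, 8, 9, 10}))
```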
 

Slashes (/) indicate divisions between sentences in the syntactic annotation. These do not necessarily correspond to the editor’s division into sentences as shown in the punctuation.

 

Asterisks (*) indicate that the preceding word has two syntactic functions (e.g.   qui in a qui mot poise is both a relator and a subject). They may   usually be ignored.

 

Hashes (#) are related to the representation of coordination, and may always be   ignored.

 

Single word pivot concordance

 

The single word pivot concordance has a variable number of columns, based on the following structure:

• ID
• Left context outside the SRCMF sentence containing the pivot
• Left context within the SRCMF sentence containing the pivot
• Pivot
• Structure headed by the pivot
• Function of the structure headed by the pivot
• Right context within the SRCMF sentence containing the pivot
• Right context outside the SRCMF sentence containing the pivot

The single word pivot concordance is designed to give as much information as possible about a single word. For example, a concordance could be created around the word "Tristran":

• #pivot:[word = /Tristr?a[nm][sz]?/]

Below is a selection of the results from the concordance (some columns are omitted):

   
| Left context in sentence | Pivot | Pivot-headed structure | Right context in sentence |
| Sire | Tristran | Tristran | por Deu le roi Si grant pechié avez de moi Qui * me mandez a itel ore |
| | Tristran | Tristran tes niés | tes niés vint soz cel pin Qui * est laienz en cel jardin |
| # Que por Yseut que por | Tristranz | que por Tristranz | Mervellose joie menoient |

The ‘pivot-headed structure’ gives the noun phrase of which the word Tristran is head. In the second example, for instance, the word Tristran heads the structure Tristran tes niés.

Note that words appearing in the ‘pivot-headed structure’ column are also found in the two context columns. The original sentence may be read across the columns left context, pivot, right context.

 

Pivot and block concordance

 

Introduction

 

The pivot and block concordance is designed to highlight the position of certain constituents, called ‘blocks’ (e.g. the subject), with respect to a pivot (e.g. the verb). The resulting CSV files are complex, with a large number of columns, and are intended as the basis for more detailed analysis in spreadsheet software.

The pivot and block concordance has the following basic structure:

• ID
• Left context outside the SRCMF sentence containing the pivot
• Left context within the SRCMF sentence containing the pivot
• Pre-pivot blocks
• Pivot
• Post-pivot blocks
• Right context within the SRCMF sentence containing the pivot
• Right context outside the SRCMF sentence containing the pivot

As with the other concordances, TigerSearch queries must define a #pivot variable. In addition, any number of variables whose name begins with ‘#block’ may be defined; at least one ‘#blockXX’ variable is required.

For example, the following query will generate a pivot and block concordance to show the position of the subject (#block1) with respect to the finite verb (#pivot):

• #snt:[cat = "Snt"] >D #block1:[cat = "SjPer"] & #snt >L #pivot
 

In essence, the central section of the resulting concordance will take the following   form:

   
| Left context | Block | Pivot | Block | Right context |
| | Li rois | pense | | que par folie Sire Tristran vos aie amé |
| Si | | voient | il | # Deu et son reigne |

Where the subject is pre-verbal, it appears in the block column to the left of the pivot. Where it is post-verbal, it appears in the block column to the right of the pivot.

Why are there square brackets ([]) and curly brackets ({}) in the concordance?

As with other concordances, square brackets denote words occurring between two parts of a discontinuous unit. The difference in this concordance is that blocks, as well as the pivot, may be discontinuous.

Curly brackets denote words which occur between the block and the pivot (or, in more complex examples, between two blocks).

   
| Left context | Block | Pivot | Block | Right context |
| | Vos {n'} | entendez | | pas la raison |
| Dex qel pitié | | Faisoit | {a} {mainte} {gent} li chiens | |
| | Ta parole [est] [tost] [entendue] Que li rois la roïne prent | est | | tost entendue Que li rois la roïne prent |
| | Tuit [s'] [escrïent] la gent du * reigne {s'} | escrïent | | la gent du * reigne |

In the table above, note the use of curly brackets in the first example to mark the negative adverb n’, which occurs between the subject-block vos and the verb-pivot entendez. In the second example, the prepositional phrase a mainte gent is marked with curly brackets, as it separates the verb-pivot Faisoit from the post-verbal subject-block li chiens.

In the third example, a discontinuous subject Ta parole ... que li rois la roïne prent appears in a pre-verbal block. The pre- or post-verbal position of a block is determined by the position of its first word relative to the pivot. The words est tost entendue, which separate the two parts of the block, are marked with square brackets.

In the fourth example, the word s’ appears (i) in square brackets, between the two halves of a discontinuous subject-block, and (ii) in curly brackets, between the first part of the discontinuous subject, Tuit, and the verb-pivot escrïent.
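The two rules described above (a block counts as pre- or post-pivot according to its first word, and words separating it from the pivot go in curly brackets) can be sketched over token positions. This is a hypothetical Python illustration reproducing the first, second and fourth examples in the table, not the SRCMF implementation; it assumes that the first word of the pivot is the reference point.

```python
def block_side(block_positions, pivot_positions):
    """'pre' if the block's first word precedes the pivot, else 'post'."""
    return "pre" if min(block_positions) < min(pivot_positions) else "post"


def curly_words(tokens, block_positions, pivot_positions):
    """Words lying between a block and the pivot, shown as {word}."""
    if block_side(block_positions, pivot_positions) == "pre":
        # From the last block word before the pivot up to the pivot.
        start = max(i for i in block_positions if i < min(pivot_positions)) + 1
        end = min(pivot_positions)
    else:
        # From the last pivot word before the block up to the block.
        start = max(i for i in pivot_positions if i < min(block_positions)) + 1
        end = min(block_positions)
    return [f"{{{tokens[i]}}}" for i in range(start, end)
            if i not in block_positions and i not in pivot_positions]


vos = "Vos n' entendez pas la raison".split()
print(block_side({0}, {2}), curly_words(vos, {0}, {2}))

faisoit = "Dex qel pitié Faisoit a mainte gent li chiens".split()
print(block_side({7, 8}, {3}), curly_words(faisoit, {7, 8}, {3}))

tuit = "Tuit s' escrïent la gent du * reigne".split()
print(block_side({0, 3, 4, 5, 6, 7}, {2}), curly_words(tuit, {0, 3, 4, 5, 6, 7}, {2}))
```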

 

Why are there so many columns? I only asked for one block!

 

The pivot and block concordance shows only one result per pivot. Continuing to work with the same example, if a single verb-pivot has multiple subject-blocks (which is quite possible in cases of coordination), each subject occupies a separate column:

   
| Block3 | Block2 | Block1 | Pivot | Block |
| Ne tor | ne mur | ne fort chastel {Ne} {me} | tendra | |

However, due to the way the number of columns is calculated, it is possible that some   will be empty. These may be deleted in the spreadsheet software, if you wish.

 

Note that the concordance will never represent the two halves of a single discontinuous block in separate columns. The following representation therefore indicates a coordination:

   
| Left context | Block | Pivot | Block | Right context |
| | Tristran {en} | bese | {la} {roïne} {Et} ele | lui par la saisine |

The SRCMF annotation of the sentence in this table identifies two coordinated subjects of the verb bese. One is pre-verbal (Tristran), one is post-verbal (ele); each occupies a separate block.

 

 

When a concordance is launched from the TXM-web interface, you may specify which properties of terminal and non-terminal nodes you wish to see in the concordance.

• On the ‘Export Concordance’ form, use the drop-down lists of ‘Non-terminal features’ and ‘Terminal Features’.
• Select the features of terminal and non-terminal nodes that you wish to show in the concordance from the two drop-down lists.
• Click ‘OK’.

Each added property will be placed in a separate column next to the block or pivot. For example, if the ‘cat’ property is selected for non-terminal nodes, and the ‘pos’ property is selected for terminal nodes, the query above will produce the following concordance:

   
| Left context | Block | Block Cat | Pivot | Pivot Pos | Block | Block Cat | Right context |
| | Li rois | SjPer | pense | VERcjg | | | que par folie Sire Tristran vos aie amé |
| Si | | | voient | VERcjg | il | SjPer | # Deu et son reigne |

Tagset

 

Non-terminal nodes

 

Non-terminal nodes have the following properties and values:

 

cat

 

Gives the syntactic function of the element. For more details, please refer to the SRCMF website.

• Apst: Vocative (fr. apostrophe)
• AtObj: Object attribute
• AtRfc: Attribute of reflexive pronoun
• AtSj: Subject attribute
• Aux: Auxiliated non-finite verb (neither passive nor active)
• AuxA: Auxiliated non-finite verb (active)
• AuxP: Auxiliated non-finite verb (passive)
• Circ: Adjunct
• Cmpl: Complement
• Coo: Coordination
• GpCoo: Coordinated group (conjunct)
• Insrt: Inserted clause
• Intj: Interjection
• ModA: Modifier (attached)
• ModD: Dislocated (detached) modifier
• Ng: Negation
• NgPrt: Negative particle (e.g. pas, mie)
• nSnt: Non-sentence
• Obj: Object
• RelC: Coordinating relator
• RelNC: Non-coordinating relator
• Regim: Regime
• Rfc: Reflexive pronoun
• Rfx: Doubled reflexive pronoun (e.g. nous ... nous-mêmes)
• SjImp: Impersonal subject
• SjPer: Personal subject
• Snt: Sentence
 

type

 

Gives the syntactic category of the head of the structure.

 
 
• VFin: Finite verb form
• VInf: Infinitive
• VPar: Participle
• nV: Non-verbal
 

dom

 

A ‘dom’ property is added to each non-terminal node in the tree, listing the functions of all its dependants and relators in alphabetical order, separated by underscores. For example, if a finite verb has a subject, an object and two adjuncts, the property [dom = "Circ_Circ_Obj_SjPer"] will be added.

This resolves, to an extent, the problem of ‘negative’ queries. Recall that it is impossible to query the non-existence of a node:

• #clause:[type = "VFin"] !>D #suj:[cat = "SjPer"]

Contrary to appearances, this query DOES NOT mean ‘node #suj does not exist’: it means that the node #suj exists, but is not dependent on #clause.

However, it is possible to find all finite verbs without a subject by using the dom property of the finite verb:

• #clause:[type = "VFin" & dom != /.*SjPer.*/]

The query specifies that we wish to find a node #clause which is a finite verb and does not have the string ‘SjPer’ in the list of dependent nodes given by the dom property.

 

coord

 

A ‘coord’ property is added to each non-terminal node in the tree. If the node represents a coordinated structure, [coord = "y"].

For example, in the sentence Sade et douz est quanqu’est de li (gcoin1: p. 3, l. 31), sade and douz are coordinated AtSj. The non-terminal nodes dominating the words sade and douz have the properties [cat = "AtSj" & coord = "y"].

The ‘coord’ property exists primarily to allow non-coordinated structures to be identified. In the original format, this is not possible, as it would require a query specifying the non-existence of a node [cat = "Coo"]. However, with the coord property, it is possible to restrict a query to non-coordinated structures only:

• #suj:[cat = "SjPer" & coord != "y"]
 

 

headpos

A ‘headpos’ property is added to each non-terminal node in the tree. If the text is correctly annotated at the deep level, each non-terminal node representing a structure should directly dominate at most one terminal node in the tree: the word representing the lexical content of the head of the structure. If this is the case, the ‘headpos’ property is equal to the ‘pos’ property of the dominated terminal node. Thus:

• #node:[headpos = "NOMcom"]

is equivalent to:

• #node >L #lexnode:[pos = "NOMcom"]

The headpos property does not improve the usability of the corpus in TigerSearch, but is useful in producing concordances, providing a more detailed morpho-syntactic tag for the head of a structure than the SRCMF ‘nV’ (non-verbal) type tag.

If the non-terminal node directly dominates more than one terminal node, the algorithm generating the headpos property makes a calculated guess as to which word is the head, and inserts the tag of this word as the ‘headpos’. For example, if a non-terminal node dominates a word with pos ‘NOMcom’ and a word with pos ‘DETdef’, the algorithm will guess that the noun is the head, and insert the headpos ‘NOMcom?’.

Note that headpos values which have been ‘guessed’ are always suffixed by a question mark (e.g. NOMcom?). There will be no guessed headpos values in texts with full NP annotation.
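The behaviour described above (copy the pos of a single dominated word; otherwise guess, and mark the guess with ‘?’) can be sketched in Python. Only the NOMcom/DETdef case comes from this document: the priority order HEAD_PRIORITY below is a hypothetical stand-in for whatever heuristic the SRCMF conversion actually uses, and returning None for a node with no lexical word is likewise an assumption.

```python
# Hypothetical order of preference when guessing the head; this is NOT
# the SRCMF rule, which is only illustrated above (NOMcom beats DETdef).
HEAD_PRIORITY = ("NOM", "PRO", "ADJ", "VER", "ADV")


def headpos(pos_tags):
    """headpos for a non-terminal node, given the pos tags of the
    terminal nodes it directly dominates."""
    if not pos_tags:
        return None  # assumption: no lexical head, no value
    if len(pos_tags) == 1:
        return pos_tags[0]  # unambiguous: copy the pos
    for prefix in HEAD_PRIORITY:
        for tag in pos_tags:
            if tag.startswith(prefix):
                return tag + "?"  # guessed values end in '?'
    return pos_tags[0] + "?"


print(headpos(["NOMcom"]), headpos(["DETdef", "NOMcom"]))
```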

 

Terminal nodes

 

Terminal nodes have the following properties:

 

pos

 

Part-of-speech tag (Cattex). For more information, please refer to the Cattex   documentation on the BFM website.

 

form

 

Each word has a property “form”. For texts in prose, the value of “form” is always “prose”. For texts in verse, the value of “form” is:

• “vers_first” for the first word in a line;
• “vers_end” for the last word in a line;
• “vers” for other words.
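The assignment of form values in verse can be expressed as a short Python sketch, assuming the text is available as a list of lines, each a list of words. The function name verse_forms is hypothetical. How a one-word line is tagged is not stated above; the sketch assumes it is treated as line-initial, so that case should be regarded as hypothetical.

```python
def verse_forms(lines):
    """Return the 'form' value of every word, line by line."""
    result = []
    for words in lines:
        forms = ["vers"] * len(words)
        if words:
            forms[-1] = "vers_end"
            # Assumption: a one-word line keeps 'vers_first'.
            forms[0] = "vers_first"
        result.append(forms)
    return result


line = ["Sor", "le", "perron", "de", "marbre", "bis"]
print(list(zip(line, verse_forms([line])[0])))
```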
 

It is thus possible to formulate a TS query focusing on words at the beginning or end of a line of verse:

• [word = "Tristran" & form = "vers_end"]
 

In Aucassin and Nicolete, the form tag correctly distinguishes the verse and   prose sections of the text.

 

q

 

Each word has a property “q”. This is equal to ‘y’ when the word occurs as part of   direct discourse, and ‘n’ when it does not. This annotation is automatically   generated by the BFM team from the position of quote marks in the text.

 

Sample queries

 

The following sample queries may be tested by copying and pasting them into the query panel.

Find all main clause verbs:
  [cat = "Snt"]

Find all structures introduced by a preposition:
  #n >R #relnc:[cat = "RelNC"]
  & #relnc >L [pos = /PRE.*/]

Find all post-verbal NP subjects:
  #verb:[type = "VFin"] >D #suj:[cat = "SjPer" & type = "nV"]
  & #suj >L [pos = /NOM.*/]
  & #suj >@l #sword
  & #verb >L #vword
  & #vword .* #sword

Find indefinite subjects introduced by qui:
  [type = "VFin"] >D #suj:[cat = "SjPer"]
  & #suj >R #relnc:[cat = "RelNC"]
  & ( #relnc >L [word = /[QqKk]u?i/]
  | #relnc >~dupl [word = /[QqKk]u?i/] )

Find sentences with coordinated subjects:
  #coo:[cat = "Coo"] >~coord #sj1:[cat = "SjPer"]
  & #coo >~coord #sj2:[cat = "SjPer"]
  & #sj1 $ #sj2

Find sentences with possible gapping of the finite verb (i.e. coordination of subject–predicate pairs):
  #gpcoo1:[cat = "GpCoo"] >~ #suj1:[cat = "SjPer"]
  & #gpcoo1 $.* #gpcoo2:[cat = "GpCoo"]
  & #gpcoo2 >~ #suj2:[cat = "SjPer"]
  & #gpcoo1 >~ #pred1:[cat = /Cmpl|Obj|AtSj/]
  & #gpcoo2 >~ #pred2:[cat = /Cmpl|Obj|AtSj/]