SRCMF corpus: TIGERSearch web interface

Using the TIGERSearch web interface
Writing a simple query
Exporting a concordance
Tagset used
Sample queries

Using the TIGERSearch web interface

Writing a query and browsing the results

In the TigerSearch tab, queries are entered in the top panel, and matching sentences are shown in tree form in the bottom panel. A tutorial on TigerSearch queries may be found in the section “Writing a simple query”.

Type your query in the top panel (e.g. #pivot:[word = "Tristran"])
Click on the ‘Search’ button at the bottom right of the panel.

If the query is well-formed, and if there are matching results in the corpus, the first tree in the forest will appear in the bottom panel.

The central bar gives the number of matches and the position of the sentence in the corpus, in the form sent: [sentence number] [match number] / [total matching sentences]. Note that subgraph navigation is not yet implemented, and the interface does not show the total number of matches, only the number of matching sentences. You can navigate through the forest of matches using the forward and back arrows on this bar. The ‘Export’ button displays the current tree as an .SVG file in the browser, which can be saved and downloaded. The ‘Export Concordance’ button allows matching sentences to be exported in concordance form.

Exporting the results

To export the results of your query, click the ‘Export Concordance’ button. An export window will appear, with the following options:

Type

Three concordances are currently implemented:
- basic concordance
- single word pivot concordance
- pivot and block concordance
It is important to note that these concordances use the names of TigerSearch variables from the query to structure the concordance. No concordance will be produced if your query does not contain a #pivot variable. The pivot and block concordance requires at least one additional #blockXX variable.

Further documentation for these concordances may be found in the section “Exporting a concordance”.

Context (number of words)

Sets the size of the context preceding and following the pivot.

Restore punctuation

Adds punctuation from the BFM’s digitized edition to the exported concordance. It will also restore words excluded from the TIGERSearch corpus (e.g. lacunae, AOI in the Chanson de Roland).

Properties to show in concordance

Select which features of terminal and non-terminal nodes should be shown in the concordance. This function is only active for the ‘pivot and block concordance’.

When you have filled in the form:

Click the ‘OK’ button.

After a short delay, a new tab will open in your browser, containing the concordance in plain text tabular format (.csv).

Save this file to disk using the ‘File > Save As...’ menu in your browser.

Viewing the concordance

To view and manipulate the concordance, you will need to use a spreadsheet package.

Open the spreadsheet application.
Select ‘File > Open...’ from the toolbar.
Ensure that the file list is showing either ‘All files’ or ‘CSV text files’.
Select the saved .csv file.

You will need to correctly configure your spreadsheet software to read the file. We recommend using LibreOffice or OpenOffice Calc, which will prompt the user for settings whenever a .csv file is opened. The following settings are required for the import to function:

Character set: Unicode (UTF-8);
Separated by Tab (ONLY);
Merge delimiters OFF;
Text delimiter: NONE (empty box)

Troubleshooting likely problems:

If accented characters do not appear correctly > check the character set is UTF-8;
If some rows do not seem to have the correct number of columns > check that Text Delimiter is set to nothing (the default is usually double quote, which will cause an error where the text contains double quotes), merge delimiters is OFF, and TAB is the only separator selected.
If zeros appear rather than punctuation (unlikely) > use the ‘Fields’ section of the import window to set every column type to ‘Text’ rather than ‘Standard’.
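
The same import settings can also be applied outside a spreadsheet. The following is a minimal Python sketch, assuming only what is stated above: the export is UTF-8 text, separated by tabs, with no text delimiter. The function name read_concordance and the two-line sample are invented for illustration and are not part of the SRCMF tools.

```python
import csv
import io


def read_concordance(handle):
    """Read a tab-separated concordance into a list of rows (lists of strings)."""
    # QUOTE_NONE mirrors "Text delimiter: NONE": double quotes in the text
    # are kept as ordinary characters instead of being read as delimiters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader]


# Hypothetical sample standing in for an exported file.
sample = 'ID\tleft\tpivot\tright\nid1\tSire\tTristran\tpor "Deu" le roi\n'
rows = read_concordance(io.StringIO(sample, newline=""))
```

To read a saved export instead of the sample, open the file with encoding="utf-8" and newline="" and pass the file object to the function.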

Writing a simple query

The following section will enable you to write simple TIGERSearch queries for the SRCMF corpus. It is not comprehensive, and must be read in conjunction with:

chapter III of the TIGERSearch user’s guide

Nodes in the TS graph

A TigerSearch graph is made up of two types of nodes: terminal and non-terminal nodes. In the graph viewer, terminal nodes appear at the bottom of the graph, while non-terminal nodes are represented by labelled white ovals, as shown in the example je puis dire.

Each node has a number of features (see section “Tagset used”).

SRCMF: ‘split’ nodes

In a true dependency graph, words form the only nodes.

In the TigerXML SRCMF corpus, each ‘word’ in the dependency structure is in fact split between a terminal node (which contains the lexical form and the PoS tag of the word itself) and a non-terminal node (which contains the syntactic features of the structure headed by the word). The non-terminal node and the terminal node are linked by an edge labelled ‘L’ (for lexical realization).

In the example tree, an ‘L’ edge links:

the terminal node puis to the non-terminal node ‘Snt’: these nodes represent the finite verb which heads the sentence;
the terminal node je to the non-terminal node ‘SjPer’: these nodes represent the subject of the sentence je;
the terminal node dire to the non-terminal node ‘AuxA’: these nodes represent the infinitive verb dire.

A ‘D’ edge links the ‘Snt’ node to the non-terminal nodes ‘SjPer’ and ‘AuxA’: this indicates that the subject je and the ‘auxiliated’ infinitive dire depend on the main verb puis.

SRCMF corpus node features

The SRCMF corpus has the following node features:

Terminal nodes:

word: the word form
pos: part-of-speech tag (Cattex)
form: whether the text is verse or prose, and position of the word in the line of verse.

Non-terminal nodes:

cat: function of the structure headed by the node
type: morpho-syntactic category of the node (VFin, VPar, VInf, nV)
headpos: part-of-speech tag of the head word
coord: set to ‘y’ if the structure forms part of a coordination
dom: underscore-separated list of all functions dominated by the node (e.g. for the ‘Snt’ node above ‘AuxA_SjPer’)

For simple queries, we will focus mainly on the word, pos and cat features.

Defining the feature specifications of a node

Node feature specifications are written between [square brackets] and take the following form:

[feature operator "value"]

where value is a string or

[feature operator /value/]

where value is a regular expression. Permitted operators are ‘=’ (equals) and ‘!=’ (does not equal). For example, the following expression identifies all nodes where cat is "SjPer" (personal subject):

[cat = "SjPer"]

If we wish to include impersonal subjects (i.e. "SjPer" and "SjImp") we can use a regular expression:

[cat = /Sj.*/]

We can identify all nodes which are not subjects:

[cat != /Sj.*/]

We may also use the conjunction operator (&) within the square brackets to specify several properties. For example, we can search for subjects which are themselves subordinate clauses by requiring the subject to be headed by a finite verb (type is "VFin"):

[cat = /Sj.*/ & type = "VFin"]
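
The effect of these feature specifications can be imitated in Python. This is an illustrative sketch only: it assumes that a regular expression must match the whole feature value (which is consistent with patterns used in this guide such as /.*Sj.*/), and the node dictionaries and the function matches are invented for the example.

```python
import re

# Invented non-terminal nodes, each a dictionary of feature values.
nodes = [
    {"cat": "SjPer", "type": "nV"},
    {"cat": "SjImp", "type": "nV"},
    {"cat": "SjPer", "type": "VFin"},
    {"cat": "Obj", "type": "nV"},
]


def matches(node, feature, pattern, negate=False):
    """Imitate [feature = /pattern/], or [feature != /pattern/] if negate."""
    # fullmatch: the pattern must cover the entire value (an assumption).
    found = re.fullmatch(pattern, node[feature]) is not None
    return found != negate


# [cat = /Sj.*/]
subjects = [n for n in nodes if matches(n, "cat", r"Sj.*")]
# [cat != /Sj.*/]
non_subjects = [n for n in nodes if matches(n, "cat", r"Sj.*", negate=True)]
# [cat = /Sj.*/ & type = "VFin"]
clausal_subjects = [
    n for n in nodes
    if matches(n, "cat", r"Sj.*") and matches(n, "type", r"VFin")
]
```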

Assigning a variable name to a node

A variable name may be assigned to the node definition. These are useful to refer to the same node several times in a complex query and are also used to indicate the pivot node to concordance scripts.

Variable definitions adopt the following syntax:

#name:[<definition>]

where definition is a feature specification as described above. Note that variable names must begin with hash (#) and are separated from their definition by a colon (:).

For example, we may wish to construct a concordance in which the subject forms the pivot. We define the #pivot variable as follows:

#pivot:[cat = /Sj.*/]

Node relations

All but the most simple queries will require more than one node to be defined, and will usually require the relationship between the nodes to be specified.

For example, suppose we wish to identify all subjects headed by the word Tristran. First, we define the subject:

#subject:[cat = /Sj.*/]

Second, we define the word Tristran as a terminal node:

#tristran:[word = "Tristran"]

Finally, we must indicate the relationship between the nodes. The relationship between a non-terminal node and the terminal node representing its lexical content in the TigerSearch graph is one of direct dominance, labelled ‘L’ (lexical).

Direct dominance

In TigerSearch, direct dominance is expressed by using the operator ‘>’ with the following syntax:

node >[label] node2

where node and node2 are feature specifications or node variables, and label (optional) is a string.

To identify subjects headed by the word Tristran, the relationship between nodes #subject and #tristran is expressed as follows:

#subject >L #tristran

Left corner dominance

The ‘>@l’ operator specifies the leftmost terminal node dominated at any depth by a non-terminal node. It has the following syntax:

node >@l tnode

where node and tnode are feature specifications or node variables, and tnode is a terminal node.

For example, instead of searching for all subjects which are headed by the word Tristran, we may wish to identify all subjects beginning with the word Tristran. This relation would be written as follows:

#subject >@l #tristran

Note that there is also a right corner dominance operator ‘>@r’.

Precedence

The precedence operator ‘.*’ permits the user to specify the word order of two terminal nodes with the following syntax:

tnode .* tnode2

where tnode and tnode2 are feature specifications or node variables representing terminal nodes.

For example, suppose we wish to identify all sentences in which the word Tristran heads the subject and precedes the main clause verb.

We need to add two additional conditions to the query in the previous section. First, we need to identify the terminal node containing the main verb of the sentence: i.e. the lexical realization of the non-terminal node ‘Snt’:

#snt:[cat = "Snt"] >L #verb

You may have noticed that #verb has no feature specification. This is perfectly valid in TigerSearch query syntax. In practice, we know that only one node can be linked to #snt by an ‘L’ relation in the corpus. #verb is thus defined by its relation to #snt rather than by its features.

We then need to specify that the word Tristran precedes the verb:

#tristran .* #verb

Finally, we need to specify that #subject is the subject of #snt. Otherwise, we risk finding subjects of a subordinate clause which happen to precede the main clause verb:

#snt >D #subject

Putting it all together, the query is as follows:

#subject:[cat = /Sj.*/] >L #tristran:[word = "Tristran"] & #snt:[cat = "Snt"] >L #verb & #tristran .* #verb & #snt >D #subject

There is also a direct precedence operator, ‘.’, which specifies that the two terminal nodes must be directly adjacent.

Negation

It is important to learn one (extremely frustrating) golden rule of Tiger query syntax:

you can negate a feature specification (e.g. [cat != "SjPer"]);
you can negate a relation between nodes (e.g. #subject !>L #tristran)
but you can’t negate the existence of a node!

In practice, this means that when we write:

#snt:[cat = "Snt"] !>D #subject:[cat = /Sj.*/]

we have not found all null subject main clauses. Instead, we have asked for sentences (#snt) which contain a subject node (#subject) which is not the subject of a sentence. TigerSearch will return all sentences with subjects in a subordinate clause.

The SRCMF corpus provides a partial work-around for this problem by using the dom feature. The dom feature of a non-terminal node lists the cat features of all nodes linked to it by a ‘D’ edge in alphabetical order separated by an underscore. For example, the ‘Snt’ node in the example tree has two dependants: SjPer and AuxA. It therefore has a dom property ‘AuxA_SjPer’.

As a result, we can identify all main clauses without subjects by negating the dom feature:

#snt:[cat = "Snt" & dom != /.*Sj.*/]

This will return all ‘Snt’ nodes whose dom property does not contain the characters ‘Sj’: in other words, a main clause without an expressed subject.
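
The logic of the dom feature can be sketched in Python. This is not the script used to build the corpus: the function names are invented, and the use of plain sorted() is an assumption about what ‘alphabetical order’ means in practice.

```python
import re


def dom_value(dependant_cats):
    """Join the cat values of a node's dependants into a dom string."""
    # Plain sorted() is an assumption about the ordering used in the corpus.
    return "_".join(sorted(dependant_cats))


def has_no_subject(dom):
    """Imitate [dom != /.*Sj.*/]: true if 'Sj' occurs nowhere in dom."""
    return re.fullmatch(r".*Sj.*", dom) is None


# The 'Snt' node of the example tree has the dependants SjPer and AuxA.
example_dom = dom_value(["SjPer", "AuxA"])
```

Here example_dom is "AuxA_SjPer", so has_no_subject(example_dom) is false, whereas a dom value such as "Circ_Obj" passes the test.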

Syntactic variation

TigerSearch syntax is quite flexible, and we may express queries in a number of ways. For example, the query identifying all subjects headed by the word Tristran may be expressed using three statements...

#subject:[cat = /Sj.*/] & #tristran:[word = "Tristran"] & #subject >L #tristran

... or two statements, e.g.:

#subject:[cat = /Sj.*/] & #subject >L #tristran:[word = "Tristran"]

... or one statement:

#subject:[cat = /Sj.*/] >L #tristran:[word = "Tristran"]

... or without variable names:

[cat = /Sj.*/] >L [word = "Tristran"]

Where multiple statements are used, the order of statements is irrelevant. Confusingly for programmers, you may reference variables before assigning a value, e.g.:

#subject >L #tristran & #tristran:[word = "Tristran"] & #subject:[cat = /Sj.*/]

Exporting a concordance

The SRCMF project has developed a number of concordances to present the results of TigerSearch queries in tabular format. Three concordances are currently implemented:

basic concordance
single word pivot concordance
pivot and block concordance

These concordances produce a text CSV file.

Principles

The concordances use the names of variables from the TigerSearch query to identify the syntactic constituents which should form the focus of the table. All concordances require a #pivot variable to be present in the query.

For example, the following query is correct in TigerSearch, but will not produce a concordance:

[word = /Tristr?a[nm][sz]?/]

To produce a concordance, the query must identify a node as the #pivot, for example:

#pivot:[word = /Tristr?a[nm][sz]?/]
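
The naming requirement can be checked before a query is submitted. The sketch below is a rough illustration, not part of the interface: the function name is invented, it only looks at variable names, and it does not parse the query, so a ‘#’ inside a quoted string or regular expression would be misread as a variable.

```python
import re


def concordances_allowed_by_names(query):
    """List the concordance types whose variable-name requirements are met."""
    # Variable names start with '#'; collect every name used in the query.
    names = set(re.findall(r"#(\w+)", query))
    if "pivot" not in names:
        return []  # no #pivot variable: no concordance is produced
    types = ["basic", "single word pivot"]
    if any(name.startswith("block") for name in names):
        types.append("pivot and block")
    return types


without_pivot = concordances_allowed_by_names("[word = /Tristr?a[nm][sz]?/]")
with_pivot = concordances_allowed_by_names("#pivot:[word = /Tristr?a[nm][sz]?/]")
```

The check says nothing about whether the query is well-formed or whether the pivot is a single word; it only reflects the rule that every concordance needs a #pivot variable.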

Basic concordance

The basic concordance has four columns:

sentence ID
left context
pivot
right context

The #pivot can be any node in the syntactic tree, either a single word or a larger structure. Currently, only lexical information (not annotation) can be shown in the basic concordance.

For example, we may wish to create a concordance of all the main clause subjects containing the word ‘Tristran’:

#snt:[cat = "Snt"] >D #pivot:[cat = "SjPer"] & #pivot >* [word = /Tristr?a[nm][sz]?/]

Note that the #pivot variable is attached to the subject node (cat = "SjPer").

Below is a selection of the results from the concordance:

ID	contexte gauche	pivot	contexte droite
beroul_pb:8_lb:234_1263227636.06	di por averté Ce saciés vos de verité Atant s' en est Iseut tornee	Tristran	l' a plorant salüee Sor le perron de marbre bis Tristran s' apuie ce
beroul_pb:13_lb:415_1264876249.02	# croiz Einz croiz parole fole et vaine Ma bone foi me fera saine	Tristran tes niés	vint soz cel pin Qui * est laienz en cel jardin Si me manda
beroul_pb:134_lb:4365_1268928771.68	moi le reçoive En sus l' atent s' espee tient Goudoïne autre voie tient	Tristran [remest] a qui * mot poise	Ist du * buison cela part toise Mais por noient quar cil s' esloigne

Note that the pivot may be one or more words.

What do the square brackets ([]), slashes (/), asterisks (*) and hashes (#) mean?

The third example in the above table contains [square brackets] in the pivot. These are used in all concordances to indicate words which occur between parts of a discontinuous syntactic constituent.

The annotated subject in this sentence is Tristran ... a qui mot poise. The main verb of the sentence, remest, is not part of the subject, but occurs between its two parts. The verb remest is included in the pivot column, but surrounded by square brackets.

This means that:

the pivot column contains all parts of discontinuous pivots;
reading the concordance from left to right will always give the original sentence.

Slashes (/) indicate division between sentences in the syntactic annotation. These will not correspond to the editor’s division into sentences as shown in the punctuation.

Asterisks (*) indicate that the preceding word has two syntactic functions (e.g. qui in a qui mot poise is both a relator and a subject). They may usually be ignored.

Hashes (#) are related to the representation of coordination, and may always be ignored.
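
When the markers get in the way of further processing, they can be removed with a few lines of Python. The sketch below assumes, as in the sample above, that each marker is a separate token and that square brackets enclose a single word; the function names are invented for the example.

```python
MARKERS = {"*", "#", "/"}


def plain_text(*cells):
    """Join concordance cells and remove the annotation markers."""
    words = []
    for token in " ".join(cells).split():
        if token in MARKERS:
            continue  # markers that stand alone as tokens
        # Square brackets wrap a single word: keep the word itself.
        words.append(token.strip("[]"))
    return " ".join(words)


def constituent_only(cell):
    """Keep only the words of the constituent, dropping bracketed words."""
    return " ".join(
        token for token in cell.split()
        if token not in MARKERS and not token.startswith("[")
    )


pivot = "Tristran [remest] a qui * mot poise"
sentence_part = plain_text("Goudoïne autre voie tient", pivot, "Ist du * buison")
subject = constituent_only(pivot)
```

plain_text follows the principle that reading across the columns gives the original sentence, while constituent_only recovers the discontinuous subject without the interposed verb.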

Single word pivot concordance

The single word pivot concordance has a variable number of columns, based on the following structure:

ID
Left context outside the SRCMF sentence containing the pivot
Left context within the SRCMF sentence containing the pivot
Pivot
Structure headed by the pivot
Function of the structure headed by the pivot
Right context within the SRCMF sentence containing the pivot
Right context outside the SRCMF sentence containing the pivot

The single word pivot concordance is designed to give as much information as possible about a single word. For example, a concordance could be created around the word "Tristran":

#pivot:[word = /Tristr?a[nm][sz]?/]

Below is a selection of the results from the concordance (some columns are omitted):

Left context in sentence	Pivot	Pivot-headed structure	Right context in sentence
Sire	Tristran	Tristran	por Deu le roi Si grant pechié avez de moi Qui * me mandez a itel ore
	Tristran	Tristran tes niés	tes niés vint soz cel pin Qui * est laienz en cel jardin
# Que por Yseut que por	Tristranz	que por Tristranz	Mervellose joie menoient

The ‘pivot-headed structure’ gives the noun phrase of which the word Tristran is head. In the second example, for instance, the word Tristran heads the structure Tristran tes niés.

Note that words appearing in the ‘pivot-headed structure’ column are also found in the two context columns. The original sentence may be read across the columns left context — pivot — right context.

Pivot and block concordance

Introduction

The pivot and block concordance is designed to highlight the position of certain constituents, called ‘blocks’ (e.g. the subject) with respect to a pivot (e.g. the verb). The resulting CSV files are complex, with a large number of columns, and are intended as the basis for more detailed analysis in spreadsheet software.

The pivot and block concordance has the following basic structure:

ID
Left context outside the SRCMF sentence containing the pivot
Left context within the SRCMF sentence containing the pivot
Pre-pivot blocks
Pivot
Post-pivot blocks
Right context within the SRCMF sentence containing the pivot
Right context outside the SRCMF sentence containing the pivot

As with the other concordances, TigerSearch queries must define a #pivot variable. In addition, at least one variable whose name begins with ‘#block’ (‘#blockXX’) is required; any number of such variables may be defined.

For example, the following query will generate a pivot and block concordance to show the position of the subject (#block1) with respect to the finite verb (#pivot):

#snt:[cat = "Snt"] >D #block1:[cat = "SjPer"] & #snt >L #pivot

In essence, the central section of the resulting concordance will take the following form:

Left context	Block	Pivot	Block	Right context
	Li rois	pense		que par folie Sire Tristran vos aie amé
Si		voient	il	# Deu et son reigne

Where the subject is pre-verbal, it appears in the block column to the left of the pivot. Where it is post-verbal, it appears in the block column to the right of the pivot.
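
Once the concordance has been read into rows, the position of the subject can be derived from which block column is filled. The sketch below assumes the simplified five-column layout of the table above (left context, block, pivot, block, right context); a real export has more columns, so the unpacking would need to be adapted. The function name is invented.

```python
def subject_position(row):
    """Classify a five-column row by which block column holds the subject."""
    _left, pre_block, _pivot, post_block, _right = row
    before = bool(pre_block.strip())
    after = bool(post_block.strip())
    if before and after:
        return "both"
    if before:
        return "pre-verbal"
    if after:
        return "post-verbal"
    return "none"


rows = [
    ["", "Li rois", "pense", "", "que par folie Sire Tristran vos aie amé"],
    ["Si", "", "voient", "il", "# Deu et son reigne"],
]
positions = [subject_position(row) for row in rows]
```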

Why are there square brackets ([]) and curly brackets ({}) in the concordance?

As with other concordances, square brackets denote words occurring between two parts of a discontinuous unit. The difference in this concordance is that blocks may be discontinuous, as well as the pivot.

Curly brackets denote words which occur between the block and the pivot (or, in more complex examples, between two blocks).

Left context	Block	Pivot	Block	Right context
	Vos {n'}	entendez		pas la raison
Dex qel pitié		Faisoit	{a} {mainte} {gent} li chiens
	Ta parole [est] [tost] [entendue] Que li rois la roïne prent	est		tost entendue Que li rois la roïne prent
	Tuit [s'] [escrïent] la gent du * reigne {s'}	escrïent		la gent du * reigne

In the table above, note the use of curly brackets in the first example to mark the negative adverb n’, which occurs between the subject-block Vos and the verb-pivot entendez. In the second example, the prepositional phrase a mainte gent is marked with curly brackets, as it separates the verb-pivot Faisoit from the post-verbal subject-block li chiens.

In the third example, a discontinuous subject Ta parole ... que li rois la roïne prent appears in a pre-verbal block. The pre- or post-verbal position of a block is determined by the position of its first word relative to the pivot. The words est tost entendue, which separate the two parts of the block, are marked with square brackets.

In the fourth example, the word s’ appears (i) in square brackets, between the two halves of a discontinuous subject-block and (ii) in curly brackets, between the first part of the discontinuous subject Tuit and the verb-pivot escrïent.
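
The two kinds of bracket can be separated mechanically. The following Python sketch sorts the words of a block cell into three groups; it assumes that every bracketed word is a separate token, as in the table above, and the function and key names are invented for the example.

```python
def split_block_cell(cell):
    """Sort the words of a block cell into three lists by their brackets."""
    parts = {"block": [], "inside_gap": [], "to_pivot": []}
    for token in cell.split():
        if token in {"*", "#"}:
            continue  # annotation markers, not words
        if token.startswith("[") and token.endswith("]"):
            # Word between two parts of a discontinuous block.
            parts["inside_gap"].append(token[1:-1])
        elif token.startswith("{") and token.endswith("}"):
            # Word between the block and the pivot (or another block).
            parts["to_pivot"].append(token[1:-1])
        else:
            parts["block"].append(token)
    return parts


fourth = split_block_cell("Tuit [s'] [escrïent] la gent du * reigne {s'}")
```

For the fourth example this gives the subject words Tuit la gent du reigne, with s’ and escrïent in the gap and s’ again between block and pivot.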

Why are there so many columns? I only asked for one block!

The pivot and block concordance shows only one result per pivot. Continuing to work with the same example, if a single verb-pivot has multiple subject-blocks (which is quite possible in cases of coordination), each subject occupies a separate column:

Block3	Block2	Block1	Pivot	Block
Ne tor	ne mur	ne fort chastel {Ne} {me}	tendra

However, due to the way the number of columns is calculated, it is possible that some will be empty. These may be deleted in the spreadsheet software, if you wish.

Note that the concordance will never represent the two halves of a single discontinuous block in separate columns. The following representation therefore indicates a coordination:

Left context	Block	Pivot	Block	Right context
	Tristran {en}	bese	{la} {roïne} {Et} ele	lui par la saisine

The SRCMF annotation of the sentence in this table identifies two coordinated subjects of the verb bese. One is pre-verbal (Tristran), one is post-verbal (ele); each occupies a separate block.

Adding annotation information

When a concordance is launched from the TXM-web interface, you may specify which properties of terminal and non-terminal nodes you wish to see in the concordance.

On the ‘Export Concordance’ form, use the drop-down lists of ‘Non-terminal features’ and ‘Terminal Features’.
Select the features of terminal and non-terminal nodes that you wish to show in the concordance from the two drop-down lists.
Click ‘OK’.

Each added property will be placed in a separate column next to the block or pivot. For example, if the ‘cat’ property is selected for non-terminal nodes, and the ‘pos’ property is selected for terminal nodes, the query above will produce the following concordance:

Left context	Block	Block Cat	Pivot	Pivot Pos	Block	Block Cat	Right context
	Li rois	SjPer	pense	VERcjg			que par folie Sire Tristran vos aie amé
Si			voient	VERcjg	il	SjPer	# Deu et son reigne

Tagset used

Non-terminal nodes

Non-terminal nodes have the following properties and values:

cat

Gives the syntactic function of the element. For more details, please refer to the SRCMF website.

Apst: Vocative (fr. apostrophe)
AtObj: Object attribute
AtRfc: Attribute of reflexive pronoun
AtSj: Subject attribute
Aux: Auxiliated non-finite verb (neither passive nor active)
AuxA: Auxiliated non-finite verb (active)
AuxP: Auxiliated non-finite verb (passive)
Circ: Adjunct (fr. circonstant)
Cmpl: Complement
Coo: Coordination
GpCoo: Coordinated group (conjunct)
Insrt: Inserted clause
Intj: Interjection
ModA: Modifier (attached)
ModD: Dislocated (detached) modifier
Ng: Negation
NgPrt: Negative particle (e.g. pas, mie)
nSnt: Non-sentence
Obj: Object
RelC: Coordinating relator
RelNC: Non-coordinating relator
Regim: Regime
Rfc: Reflexive pronoun
Rfx: Doubled reflexive pronoun (e.g. nous ... nous-mêmes)
SjImp: Impersonal subject
SjPer: Personal subject
Snt: Sentence

type

Gives the syntactic category of the head of the structure.

VFin: Finite verb form
VInf: Infinitive
VPar: Participle
nV: Non-verbal

dom

A ‘dom’ property is added to each non-terminal node in the tree listing the functions of all its dependants and relators in alphabetical order, separated by underscores. For example, if a finite verb has a subject, object and two adjuncts, the property [dom = "Circ_Circ_Obj_SjPer"] will be added.

This resolves to an extent the problem of ‘negative’ queries. Recall that it is impossible to query the non-existence of a node:

#clause:[type = "VFin"] !>D #suj:[cat = "SjPer"]

Contrary to appearances, this query DOES NOT mean ‘node #suj does not exist’: it means that the node #suj exists, but is not dependent on #clause.

However, it is possible to find all finite verbs without a subject by using the dom property of the finite verb:

#clause:[type = "VFin" & dom != /.*SjPer.*/]

The query specifies that we wish to find a node #clause which is a finite verb and does not have the string ‘SjPer’ in the list of dependants given by the dom property.

coord

A ‘coord’ property is added to each non-terminal node in the tree. If the node represents a coordinated structure, [coord = "y"].

For example, in the sentence Sade et douz est quanqu’est de li (gcoin1: p. 3, l. 31), sade and douz are coordinated AtSj. The non-terminal nodes dominating the words sade and douz have the properties [cat = "AtSj" & coord="y"].

The ‘coord’ property exists primarily to allow non-coordinated structures to be identified. In the original format, this is not possible, as it would require a query specifying the non-existence of a node [cat = "Coo"]. However, with the coord property, it is possible to restrict a query to non-coordinated structures only:

#suj:[cat = "SjPer" & coord != "y"]

headpos

A ‘headpos’ property is added to each non-terminal node in the tree. If the text is correctly annotated at the deep level, each non-terminal node representing a structure should directly dominate at most one terminal node in the tree, the word representing the lexical content of the head of the structure. If this is the case, the ‘headpos’ property is equal to the ‘pos’ property of the dominated terminal node. Thus:

#node:[headpos = "NOMcom"]

is equivalent to:

#node >L #lexnode:[pos = "NOMcom"]

The headpos property does not improve the usability of the corpus in TigerSearch, but is useful in producing concordances, providing a more detailed morpho-syntactic tag for the head of a structure than the SRCMF ‘nV’ (non-verbal) type tag.

If the non-terminal node directly dominates more than one terminal node, the algorithm generating the headpos property makes a calculated guess as to which word is the head, and inserts the tag of this word as the ‘headpos’. For example, if a non-terminal node dominates a word with pos ‘NOMcom’ and a word with pos ‘DETdef’, the algorithm will guess that the noun is the head, and insert the headpos ‘NOMcom?’.

Note that headpos values which have been ‘guessed’ are always suffixed by a question mark (e.g. NOMcom?). There will be no guessed headpos values in texts with full NP annotation.
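
When headpos values are shown in a concordance, the question mark can be separated from the tag so that guessed values may be filtered or counted. A minimal Python sketch follows; the function name is invented for the example.

```python
def parse_headpos(value):
    """Split a headpos value into (tag, guessed)."""
    # A trailing question mark marks a value guessed by the algorithm.
    guessed = value.endswith("?")
    tag = value[:-1] if guessed else value
    return tag, guessed


certain = parse_headpos("NOMcom")
guessed = parse_headpos("NOMcom?")
```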

Terminal nodes

Terminal nodes have the following properties:

pos

Part-of-speech tag (Cattex). For more information, please refer to the Cattex documentation on the BFM website.

form

Each word has a property “form”. For texts in prose, the value of the “form” tag is always “prose”. For texts in verse, the form tag is:

“vers_first” for the first word in a line;
“vers_end” for the last word in a line;
“vers” for other words.

It is thus possible to formulate a TS query focusing on words at the beginning or end of a line of verse:

[word = "Tristran" & form = "vers_end"]

In Aucassin and Nicolete, the form tag correctly distinguishes the verse and prose sections of the text.

q

Each word has a property “q”. This is equal to ‘y’ when the word occurs as part of direct discourse, and ‘n’ when it does not. This annotation is automatically generated by the BFM team from the position of quote marks in the text.

Sample queries

The following sample queries may be tested by copying and pasting into the query panel.

Find all main clause verbs:
[cat = "Snt"]

Find all structures introduced by a preposition:
#n >R #relnc:[cat = "RelNC"] & #relnc >L [pos = /PRE.*/]

Find all post-verbal NP subjects:
#verb:[type = "VFin"] >D #suj:[cat = "SjPer" & type="nV"] & #suj >L [pos = /NOM.*/] & #suj >@l #sword & #verb >L #vword & #vword .* #sword

Find indefinite subjects introduced by qui:
[type = "VFin"] >D #suj:[cat = "SjPer"] & #suj >R #relnc:[cat = "RelNC"] & ( #relnc >L [word = /[QqKk]u?i/] | #relnc >~dupl [word = /[QqKk]u?i/] )

Find sentences with coordinated subjects:
#coo:[cat = "Coo"] >~coord #sj1:[cat = "SjPer"] & #coo >~coord #sj2:[cat = "SjPer"] & #sj1 $ #sj2

Find sentences with possible gapping of the finite verb (i.e. coordination of subject–predicate pairs):
#gpcoo1:[cat = "GpCoo"] >~ #suj1:[cat = "SjPer"] & #gpcoo1 $.* #gpcoo2:[cat = "GpCoo"] & #gpcoo2 >~ #suj2:[cat = "SjPer"] & #gpcoo1 >~ #pred1:[cat = /Cmpl|Obj|AtSj/] & #gpcoo2 >~ #pred2:[cat = /Cmpl|Obj|AtSj/]