Feature #2027
Updated by Serge Heiden over 8 years ago
Add corpus parameters to control how concordance lines display CQP tokens and structural units
h3. Solution 1: tokens and structures 1
* use the corpus-language parameter/property to apply the txt-renderer for that language when building the contexts of concordance lines
** a txt-renderer is a component that produces raw text (txt), using language-specific typographic rules to display the graphical forms of tokens
** the Pager already uses a txt-renderer component to build edition pages content
* add a new concordance-display-opening-structural-unit-separator corpus parameter (the separator displayed for an opening structural unit)
** possible values: '<', '(', '|'
* add a new concordance-display-opening-structural-units corpus parameter (the structural units whose opening is displayed)
** possible values: div,sp,p
* add a new concordance-display-closing-structural-unit-separator corpus parameter (the separator displayed for a closing structural unit)
** possible values: '>', ')', '|'
* add a new concordance-display-closing-structural-units corpus parameter (the structural units whose closing is displayed)
** possible values: div,sp,p
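The four separator parameters above could drive the display of a context roughly as follows. This is only a sketch in Python: the event tuples (@('tok', form)@, @('open', name)@, @('close', name)@) and the @render_context@ function are hypothetical and are not part of TXM or CQP; only the parameter names come from this ticket. The language-specific typographic rules of the txt-renderer are reduced here to a single space between items.

```python
# Hypothetical corpus parameter values; only the names come from this ticket.
params = {
    "concordance-display-opening-structural-unit-separator": "<",
    "concordance-display-opening-structural-units": "div,sp,p",
    "concordance-display-closing-structural-unit-separator": ">",
    "concordance-display-closing-structural-units": "p",
}


def render_context(events, params):
    """Render one concordance context as raw text.

    `events` is a flat list of ('tok', form), ('open', name) or
    ('close', name) tuples. This representation is an assumption
    made for the example.
    """
    open_sep = params["concordance-display-opening-structural-unit-separator"]
    close_sep = params["concordance-display-closing-structural-unit-separator"]
    # Unit lists are read as comma-separated structure names.
    open_units = set(
        filter(None, params["concordance-display-opening-structural-units"].split(","))
    )
    close_units = set(
        filter(None, params["concordance-display-closing-structural-units"].split(","))
    )

    out = []
    for kind, value in events:
        if kind == "tok":
            out.append(value)
        elif kind == "open" and value in open_units:
            out.append(open_sep)
        elif kind == "close" and value in close_units:
            out.append(close_sep)
        # Boundaries of units that are not listed are silently skipped.
    return " ".join(out)


events = [
    ("open", "sp"), ("open", "p"),
    ("tok", "To"), ("tok", "be"),
    ("close", "p"), ("close", "sp"),
    ("open", "sp"),
    ("tok", "Or"), ("tok", "not"),
    ("close", "sp"),
]
print(render_context(events, params))  # < < To be > < Or not
```

With these values the closing of @sp@ is not displayed, because only @p@ is listed in concordance-display-closing-structural-units.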
h3. Solution 2: tokens versus structures 2
For some studies on infra-word units (like syllables), a double corpus strategy is often used: ...
* corpus A builds CQP tokens as text words and the infra-word units are encoded through concatenation inside the word form (e.g. 'gi-ra-fe')
* corpus B builds CQP tokens as infra-word units and text words are encoded as <word> structures (e.g. <word form="girafe" seg-form="gi-ra-fe">gi ra fe</word>)
To display word forms correctly in text editions and concordance lines:
* add a new word-as-structure corpus parameter
** possible values: 'word' or <empty>
** if the parameter value is <empty>, use corpus tokens to render text words
** if the parameter value is the name of a structure, use that structure to render text words
When building text editions and concordance line contexts, use that parameter value to render word forms correctly, which makes them easier to read.
Remark: some XSLT stylesheets already exist to build correct text editions for these two cases.
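The behaviour of the word-as-structure parameter can be sketched as follows, in Python. The token representation (a dict with a @form@ and the enclosing structures) and the @id@ field identifying one structure instance are hypothetical; only the @word@ structure and its @form@ attribute come from the corpus B example above. Tokens that lie outside the named structure are assumed to be rendered as they are.

```python
def render_text(tokens, word_as_structure=""):
    """Render text words from CQP-like tokens.

    `word_as_structure` mirrors the proposed corpus parameter:
    empty means "tokens are words", otherwise it names the structure
    whose `form` attribute holds the word to display.
    """
    if not word_as_structure:
        return " ".join(t["form"] for t in tokens)

    words = []
    last_id = None
    for t in tokens:
        struct = t.get("structs", {}).get(word_as_structure)
        if struct is None:
            # Token outside the structure: display it unchanged.
            words.append(t["form"])
            last_id = None
        elif struct["id"] != last_id:
            # First token of a new structure instance: display the word once.
            words.append(struct["form"])
            last_id = struct["id"]
    return " ".join(words)


# Corpus B style: one token per syllable, words encoded as <word> structures.
corpus_b = [
    {"form": "la", "structs": {"word": {"id": 1, "form": "la"}}},
    {"form": "gi", "structs": {"word": {"id": 2, "form": "girafe"}}},
    {"form": "ra", "structs": {"word": {"id": 2, "form": "girafe"}}},
    {"form": "fe", "structs": {"word": {"id": 2, "form": "girafe"}}},
]

print(render_text(corpus_b))          # la gi ra fe
print(render_text(corpus_b, "word"))  # la girafe
```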