Feature #1571

Updated by Serge Heiden almost 4 years ago

It would be useful to break down the specificity calculation by one additional (secondary) word property.

For example, one could break down the specificity of lemmas by their part of speech (pos).

We could display only a synthetic view of the most specific lemmas above a certain threshold (such as the banality threshold):
<pre>
frlemma/frpos  NOM.*                  ADJ.*                               ...
chirac         solidarité, emploi,    nouveau, fidèle, public, ...
               avenir, projet,        républicain, sûr ...
               mondialisation ...
dg             coopération,           scientifique, économique, ...
               développement, but,    français, algérien, fécond ...
               peuple, rapport ...
giscard        vœu, bonheur,          simple, français, intelligent, ...
               unité, liberté,        difficile, actif ...
               ami ...
</pre>

h3. Solution

* Build a specificity table based on a partition index for each secondary property value or set of values and sort it by score:
** Specificity(VOEUX/text@loc, frpos="NOM.*"/frlemma)
** Specificity(VOEUX/text@loc, frpos="ADJ.*"/frlemma)
** ...
* Merge the specificity tables, putting the most specific units in the table cells, with the secondary property values (or sets of values) as columns and the original partition values as rows, or the reverse
* Add a new 'Filter specificity scores' parameter, with 'yes' as the default value
* When FilterSpecificityScores is true, don't display units under the banality threshold in the results table
* A further 'Maximum number of units to display in results' parameter could be added to also filter by number of units. This parameter should be discussed together with the Vmax parameter of Lexicon and Index.
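
The merge and filter steps above could be sketched as follows. This is only an illustration in Python, not TXM code: the function name @merge_specificity_tables@, its parameters, the default threshold of 2.0 and the scores in the usage example are all assumptions made up for the sketch. It takes the per-property specificity scores as already computed and only keeps positive (over-use) scores at or above the threshold.

```python
from typing import Dict, List, Optional

# Hypothetical input shape: tables[pos_pattern][part][unit] -> specificity score.
# The scores are assumed to be computed beforehand, one table per secondary
# property value (or set of values), e.g. frpos="NOM.*" and frpos="ADJ.*".
ScoreTable = Dict[str, Dict[str, float]]


def merge_specificity_tables(
    tables: Dict[str, ScoreTable],
    filter_scores: bool = True,          # stands in for 'Filter specificity scores'
    banality_threshold: float = 2.0,     # assumed default, to be discussed
    max_units: Optional[int] = None,     # stands in for 'Maximum number of units'
) -> Dict[str, Dict[str, List[str]]]:
    """Return merged[part][pos_pattern] -> units sorted by decreasing score."""
    merged: Dict[str, Dict[str, List[str]]] = {}
    for pos_pattern, table in tables.items():
        for part, unit_scores in table.items():
            # Sort the units of this cell by score, most specific first.
            ranked = sorted(unit_scores.items(), key=lambda kv: kv[1], reverse=True)
            if filter_scores:
                # Drop units under the banality threshold.
                ranked = [(u, s) for u, s in ranked if s >= banality_threshold]
            if max_units is not None:
                # Keep only the first max_units units of the cell.
                ranked = ranked[:max_units]
            merged.setdefault(part, {})[pos_pattern] = [u for u, _ in ranked]
    return merged


# Usage with invented scores (illustration only, not real results):
tables = {
    "NOM.*": {"chirac": {"solidarité": 6.1, "emploi": 4.3, "ami": 0.4}},
    "ADJ.*": {"chirac": {"nouveau": 5.0, "simple": 1.2}},
}
merged = merge_specificity_tables(tables, max_units=2)
print(merged)
# {'chirac': {'NOM.*': ['solidarité', 'emploi'], 'ADJ.*': ['nouveau']}}
```

Partition values are used as rows and secondary property values as columns here; swapping the two nesting levels would give the transposed layout.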


See also:
* Spécificités d'auteurs dans Le Surréalisme au service de la Révolution, Marie-Renée Guyard, Mots, 1981, vol. 2, no. 1, pp. 95-122, http://www.persee.fr/doc/mots_0243-6450_1981_num_2_1_1023
