Bug #2280
Updated by Benedicte Pincemin more than 6 years ago
/** The TT enclitics. */
public static String FClitic_en = "'(s|re|ve|d|m|em|ll)|n['‘’]t";
public static String PClitic_fr = '[dcjlmnstDCJLNMST][\'‘’]|[Qq]u[\'‘’]|[Jj]usqu[\'‘’]|[Ll]orsqu[\'‘’]|[Pp]uisqu[\'‘’]|[Qq]uoiqu[\'‘’]';
public static String FClitic_fr = '-t-elles?|-t-ils?|-t-on|-ce|-elles?|-ils?|-je|-la|-les?|-leur|-lui|-mêmes?|-m[\'‘’]|-moi|-nous|-on|-toi|-tu|-t[\'‘’]|-vous|-en|-y|-ci|-là';
public static String PClitic_it = '[dD][ae]ll[\'‘’]|[nN]ell[\'‘’]|[Aa]ll[\'‘’]|[lLDd][\'‘’]|[Ss]ull[\'‘’]|[Qq]uest[\'‘’]|[Uu]n[\'‘’]|[Ss]enz[\'‘’]|[Tt]utt[\'‘’]';
public static String FClitic_gl = '-la|-las|-lo|-los|-nos';
BP 2019-04-08 - Contribution to the diagnosis
PClitic_fr should also handle "y'" and "Y'" (especially for speech transcriptions). Cf. the INDEX of .'.+ in the LEMAN corpus (Fmin=2):
y'a 127
y'en 30
Y'a 21
Y'en 8
y'avait 4
y'aura 3
y'ait 2
See also the Montpellier team's experiments on the Rivesaltes corpus (Matrice project, Copil of April 5th, 2019).
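
To make the suggestion concrete, here is a minimal sketch of what extending PClitic_fr with "y'" / "Y'" could look like. It is only an illustration, not the actual TXM tokenizer code: the class name, the constant PCLITIC_FR_WITH_Y and the helper splitProclitic are hypothetical, and the sketch assumes a proclitic is detached only at the start of a token. The curly apostrophes are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the TXM sources.
class PCliticFrSketch {
    // Apostrophe class used throughout the ticket: ' plus U+2018 and U+2019.
    static final String APOS = "['\u2018\u2019]";

    // PClitic_fr as quoted in this ticket, rewritten as a Java string.
    static final String PCLITIC_FR =
        "[dcjlmnstDCJLNMST]" + APOS + "|[Qq]u" + APOS + "|[Jj]usqu" + APOS
        + "|[Ll]orsqu" + APOS + "|[Pp]uisqu" + APOS + "|[Qq]uoiqu" + APOS;

    // Proposed extension (assumption): one extra alternative for "y'" and "Y'".
    static final String PCLITIC_FR_WITH_Y = PCLITIC_FR + "|[yY]" + APOS;

    // Assumption: a proclitic is split off only when it starts the token
    // and is followed by at least one more character.
    static String[] splitProclitic(String token, String cliticRegex) {
        Matcher m = Pattern.compile("^(" + cliticRegex + ")(.+)$").matcher(token);
        if (m.matches()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return new String[] { token }; // no proclitic found: token left whole
    }

    public static void main(String[] args) {
        // Forms taken from the LEMAN index above, plus one regular elision.
        for (String t : new String[] { "y'a", "Y'en", "y'avait", "l'ami" }) {
            System.out.println(t + " -> "
                + String.join(" | ", splitProclitic(t, PCLITIC_FR_WITH_Y)));
        }
    }
}
```

With the pattern as currently quoted, a token such as "y'a" is left whole; with the extra alternative it is split after the apostrophe, like the other elided forms. Whether "y'" should be split this way in every corpus is a separate decision for the maintainers.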