Feature #1093

GWT: x.x, add concordance context cut

Added by Matthieu Decorde almost 5 years ago. Updated almost 5 years ago.

Status:Closed Start date:10/22/2014
Priority:Normal Due date:
Assignee:- % Done:100%

Category:- Spent time: -
Target version:Portal 0.6.1

Description

See #943

Currently "text" is the only structure used to cut concordance contexts.
  • It must be possible to cut contexts at several structures (for example "text", or "plu" & "deplu", or "p")
  • The import parameters define which structures cut the concordance contexts
    • Step 1: add "context_limit" parameter to "import.xml" file.
      • default text value is "text"
      • default @type value is "list"
      • can contain multiple values (ex: "text,plu,deplu")
      • if @type="list", the context limit query will be <struct1>[]|<struct2>[]|...|<structN>[]
      • if @type="query", text value must be a CQL query
    • document how to patch "import.xml" for this parameter
    • test with Apocalypse corpus
    • Step 2: replace patch by complete import parameter interface
      • build an import UI for this parameter
      • save the field in "import.xml"
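
The "list" and "query" rules of Step 1 can be illustrated with a minimal sketch. This is not the TXM implementation: the function name `build_context_limit_query` is hypothetical, and falling back to "text" for an empty value is an assumption based on the default stated above.

```python
def build_context_limit_query(value: str, limit_type: str = "list") -> str:
    """Turn the text value of the context limit parameter into a CQL query.

    Hypothetical helper, written only to illustrate the rules in Step 1.
    """
    if limit_type == "query":
        # With @type="query" the text value is already a CQL query.
        return value.strip()
    if limit_type != "list":
        raise ValueError(f"unsupported type: {limit_type!r}")
    # With @type="list" the value is a comma-separated list of structure names.
    names = [name.strip() for name in value.split(",") if name.strip()]
    if not names:
        # Assumption: an empty value falls back to the default "text".
        names = ["text"]
    # One "<struct>[]" alternative per structure, joined with "|".
    return "|".join(f"<{name}>[]" for name in names)


print(build_context_limit_query("text,plu,deplu"))
# prints <text>[]|<plu>[]|<deplu>[]
```

With @type="query" the value is passed through unchanged, so checking that it is valid CQL is left to the search engine.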

Document the new import parameter element in the admin documentation: uis/ui@command=concordance/context_limit, with its @type attribute and text value

Validation test - Portal

  • Call concordance on GRAAL corpus with query "<p> []"
  • Add the new import configuration to the import.xml file of the GRAAL corpus (see Admin manual):
    ...
    <uis>
      <ui command="concordance">
        <context_limits type="list">text,p</context_limits>
      </ui>
    </uis>
    ...
    
  • Reload the GRAAL corpus (open question: is it really unnecessary to re-import and produce a new binary corpus with RCP?)
  • Reload the user session
  • Call concordance on GRAAL corpus with query "<p> []", ensure left contexts are empty
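
The manual patch of import.xml in the protocol above could also be scripted. The following is a minimal sketch under stated assumptions: the element name `context_limits` follows the snippet in this protocol (the description calls the parameter "context_limit"), the root element name `import` is hypothetical, and `set_context_limits` is an illustrative helper, not part of TXM.

```python
import xml.etree.ElementTree as ET


def set_context_limits(root, structures, limit_type="list"):
    """Create or update uis/ui[@command='concordance']/context_limits.

    Hypothetical helper; element names follow the validation snippet.
    """
    uis = root.find("uis")
    if uis is None:
        uis = ET.SubElement(root, "uis")
    ui = uis.find("ui[@command='concordance']")
    if ui is None:
        ui = ET.SubElement(uis, "ui", {"command": "concordance"})
    limits = ui.find("context_limits")
    if limits is None:
        limits = ET.SubElement(ui, "context_limits")
    # Overwrite any previous setting so the call can be repeated safely.
    limits.set("type", limit_type)
    limits.text = ",".join(structures)
    return limits


# Assumption: a stand-in document; a real import.xml has more content.
root = ET.fromstring("<import><uis/></import>")
set_context_limits(root, ["text", "p"])
print(ET.tostring(root, encoding="unicode"))
```

For a real corpus the tree would be read with `ET.parse` and saved with `ElementTree.write`; the corpus and the user session still have to be reloaded afterwards, as in the protocol.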

Related issues

related to Feature #943: RCP: X.X, use structure limits in concordance contexts New 07/16/2014
related to Feature #1124: RCP: x.x, add concordance context cut Feedback 10/22/2014

History

#1 Updated by Matthieu Decorde almost 5 years ago

  • % Done changed from 0 to 30

#2 Updated by Serge Heiden almost 5 years ago

  • Description updated (diff)

#3 Updated by Matthieu Decorde almost 5 years ago

  • Description updated (diff)
  • % Done changed from 30 to 70

#4 Updated by Matthieu Decorde almost 5 years ago

  • % Done changed from 70 to 80

#5 Updated by Matthieu Decorde almost 5 years ago

  • Description updated (diff)

#6 Updated by Matthieu Decorde almost 5 years ago

  • Description updated (diff)

#7 Updated by Matthieu Decorde almost 5 years ago

  • Status changed from New to Feedback

#8 Updated by Alexey Lavrentev almost 5 years ago

  • % Done changed from 80 to 70

Test failed because the protocol is underspecified.

  • no clear distinction between what concerns RCP and what concerns the portal
    • where should the Discours corpus be called from?
    • does "exit TXM" mean stopping the portal?
    • is it enough to modify import.xml and reload the corpus on the portal?
  • the new RCP import interface does not seem to be available
  • which sources and import module should be used for Discours? CNR+CSV?
  • the Discours corpus is not on the TEST portal

I modified import.xml for GRAAL and reloaded the corpus on the portal: no effect.

I got the Discours CNR source files from SourceForge and tried to import them with the CNR+CSV module. There are many error messages (see below), and the character encoding is not recognized correctly by default. Adding the concordance limit parameter to import.xml and re-importing the corpus has no effect.

Execution du script : /home/alexey/TXM/scripts/import/discoursLoader.groovy
-- IMPORTER - Reading source files
Trying to read metadata from: /media/alexey/data/xml/discours/metadata.csv
Converting CNR to XML:Errors in file 01_DeGaulle.cnr. Wrong number of columns at lines: [1, 26, 62, 116, 153, 222, 237, 286, 421, 430, 451, 504, 544, 576, 614, 651, 695, 704, 752, 769, 777]
Errors in file 02_DeGaulle.cnr. Wrong number of columns at lines: [1, 111, 212, 386, 492, 603, 804, 974, 1055, 1103, 1213, 1324, 1444, 1500, 1607, 1620, 1629]
Errors in file 03_DeGaulle.cnr. Wrong number of columns at lines: [1, 68, 205, 352, 486, 639, 875, 996, 1090, 1099]
Errors in file 04_DeGaulle.cnr. Wrong number of columns at lines: [1, 26, 212, 624, 794, 862, 932, 1021, 1214, 1342, 1392, 1612, 1702, 1769, 1957, 2022, 2194, 2359, 2392, 2437, 2601, 2711, 2984, 3326, 3434, 3455, 3460, 3565, 3644, 3927, 4024, 4171, 4236, 4329, 4395, 4414, 4451, 4514, 4586, 4699, 4879, 4923, 4985, 5104, 5402, 5547, 5674, 5734, 5835, 6053, 6060, 6242, 6533, 6773, 6954, 7005]
Errors in file 05_DeGaulle.cnr. Wrong number of columns at lines: [1, 190, 256, 369, 436, 517, 619, 680, 844, 943, 1119, 1308, 1441, 1589, 1698, 1884, 1978, 2131, 2226, 2293, 2419, 2469]
Errors in file 06_DeGaulle.cnr. Wrong number of columns at lines: [1, 507, 856, 1064, 1192, 1414, 1530, 1671, 1929, 1983, 2165, 2201, 2248, 2328, 2448, 2467, 2506, 2630, 2668, 2740, 2818, 2937, 3066, 3124, 3300, 3363, 3443, 3500, 3623, 3653, 3730, 3762, 3845, 3912, 3993, 4061, 4177, 4210, 4256, 4340, 4557, 4588, 4636, 4667, 4761, 4863, 4917, 4939, 4961, 5029, 5064, 5251, 5317, 5358, 5392, 5452, 5489, 5531, 5612, 5675, 5761, 5834, 5930, 5999, 6036, 6087, 6133, 6259, 6286, 6338, 6428, 6519, 6582, 6634, 6717, 6832, 7233, 7399, 7493, 7554, 7618, 7625, 7642, 7650, 7679]
Errors in file 07_DeGaulle.cnr. Wrong number of columns at lines: [1, 47, 142, 185, 226, 290, 369, 485, 605, 669, 707, 716, 731, 831, 833, 957, 1084, 1265, 1449, 1531, 1610, 1668, 1780, 1870, 2013, 2098, 2238, 2384, 2405, 2509, 2623, 2627, 2655, 2688, 2693, 2730, 2746, 2808, 2917, 3040, 3086, 3250, 3359, 3403, 3465, 3479, 3562, 3806, 3841, 3889, 4063, 4118, 4174, 4292, 4358, 4398, 4443, 4644, 4877, 4996, 5126, 5264, 5311, 5390, 5425, 5544, 5617, 5728, 5773, 5930, 6076, 6271, 6362, 6468, 6509, 6642, 6646, 6668, 6749, 6891, 6956, 7102, 7188, 7309, 7420, 7461]
Errors in file 08_DeGaulle.cnr. Wrong number of columns at lines: [1, 43, 156, 211, 391, 461, 540, 615, 726, 889, 1062, 1093, 1153, 1241, 1320, 1527, 1617, 1769, 1809, 1883, 1888, 1893]
Errors in file 09_DeGaulle.cnr. Wrong number of columns at lines: [1, 117, 315, 431, 602, 825, 1043, 1045, 1098, 1272, 1464, 1587, 1747, 1854, 2055, 2191, 2302, 2307, 2312]
Errors in file 10_DeGaulle.cnr. Wrong number of columns at lines: [1, 6, 60, 211, 292, 368, 434, 491, 556, 782, 981, 1205, 1326, 1464, 1737, 1765, 1774, 1792, 1823, 1851, 1888, 1935, 1939, 2069, 2071, 2073, 2075, 2078, 2265, 2492, 2572, 2593, 2612, 2751, 2940, 2942, 3114, 3204, 3304, 3374, 3506, 3601, 3652, 3686, 3688, 3822, 3896, 3967, 4104, 4273, 4338, 4586, 4614, 5013, 5046, 5116, 5341, 5350, 5351, 5497, 5740, 5761, 5883, 6092, 6244, 6447, 6582, 6747]
Errors in file 11_DeGaulle.cnr. Wrong number of columns at lines: [1, 77, 158, 160, 235, 397, 475, 482, 486, 493, 494, 574, 796, 1076, 1230, 1318, 1565, 1769, 1943, 2013, 2018, 2023, 2029, 2059, 2065, 2074, 2086, 2149, 2198, 2203, 2208]
Errors in file 12_DeGaulle.cnr. Wrong number of columns at lines: [1, 6, 131, 179, 375, 600, 1037, 1240, 1284, 1289, 1366, 1614, 1746, 1872, 2002, 2072, 2179, 2345, 2417, 2480, 2532, 2562, 2645, 2704, 2824, 2829, 2895, 2981, 3108, 3210, 3348, 3426, 3601, 3643, 3712, 3778, 3929, 3931, 4023, 4361, 4638, 4908, 4938, 5143, 5348, 5351, 5525, 5547, 5673, 5716, 5747, 5755, 5807, 5942, 6010, 6046, 6256, 6274, 6360, 6563, 6595, 6638, 6818, 6866, 6911, 6915, 7078, 7240, 7398, 7459, 7519, 7537]
Errors in file 13_DeGaulle.cnr. Wrong number of columns at lines: [1, 13, 123, 176, 235, 398, 417, 457, 721, 852, 1103, 1167, 1175, 1179, 1344, 1418, 1420, 1562, 1755, 2098, 2345, 2739, 2742, 2744, 2949, 3064, 3186, 3335, 3598, 3841, 4122, 4279, 4539, 4769, 4952, 4987, 5013, 5115, 5316, 5318, 5332, 5582, 5788, 5858, 5965, 6269, 6459, 6589, 6633, 6734, 6831, 6839, 6865, 7012, 7251, 7451, 7733, 7976, 8262, 8467, 8702, 8731, 8776]
Errors in file 14_DeGaulle.cnr. Wrong number of columns at lines: [1, 123, 185, 271, 310, 339, 443, 565, 648, 779, 819, 880, 956, 958, 964, 966, 971, 973, 1092, 1126, 1294, 1299, 1304]
Errors in file 15_DeGaulle.cnr. Wrong number of columns at lines: [1, 89, 113, 141, 163, 204, 216, 248, 279, 544, 606, 608, 685, 710, 717, 968, 1123, 1325, 1522, 1700, 1708, 1710, 1898, 2123, 2221, 2284, 2296, 2466, 2694, 2760, 3048, 3275, 3389, 3499, 3728, 3962, 3990, 4017, 4022, 4135, 4265, 4378, 4523, 4740, 4776, 4778, 5031, 5183, 5295, 5524, 5722, 5783, 5785, 5932, 5935, 5937, 6005, 6238, 6531, 6841, 7110, 7155, 7208, 7222, 7269, 7324, 7418, 7426]
Errors in file 16_DeGaulle.cnr. Wrong number of columns at lines: [1, 27, 188, 280, 532, 646, 759, 964, 1013, 1018, 1023]
Errors in file 17_DeGaulle.cnr. Wrong number of columns at lines: [1, 6, 40, 48, 69, 74, 78, 82, 87, 134, 157, 316, 318, 387, 478, 739, 741, 743, 745, 812, 1074, 1085, 1263, 1516, 1629, 1752, 2015, 2047, 2185, 2423, 2604, 2764, 2787, 2796, 2815, 2824, 3094, 3344, 3619, 3696, 3839, 4086, 4124, 4126, 4157, 4159, 4162, 4164, 4167, 4171, 4242, 4247, 4249, 4254, 4354, 4419, 4427, 4549, 4675, 4681, 4868, 5121, 5365, 5438, 5465, 5626, 5660, 5732, 5736, 5780, 6005, 6099, 6208, 6340, 6552, 6605, 6644, 6684, 6766, 6892, 6901]
Errors in file 18_DeGaulle.cnr. Wrong number of columns at lines: [1, 6, 73, 294, 558, 850, 1026, 1198, 1272, 1297, 1302, 1307]
Errors in file 19_DeGaulle.cnr. Wrong number of columns at lines: [1, 6, 31, 175, 378, 677, 780, 1031, 1183, 1214, 1219, 1224]
Errors in file 20_DeGaulle.cnr. Wrong number of columns at lines: [1, 11, 12, 47, 86, 205, 234, 395, 448, 565, 681, 778, 865, 1060, 1235, 1482, 1761, 1828, 2041, 2223, 2363, 2444, 2583, 2684, 2756, 2831, 2936, 3014, 3170, 3172, 3445, 3461, 3639, 3930, 4061, 4065, 4142, 4250, 4252, 4387, 4596, 4700, 4757, 4761, 4808, 4813, 4844, 4897, 4902, 4906, 4919, 4922, 4937, 5076, 5082, 5094, 5193, 5234, 5236, 5399, 5556, 5588, 5716, 5723, 5734, 5788, 5810, 5952, 6017, 6149, 6189, 6379, 6636, 6811, 6905, 7329, 7407, 7694, 7732, 7979, 7990, 8003, 8025, 8030, 8033, 8075, 8105, 8114, 8236, 8237, 8244, 8315, 8398, 8472, 8577, 8646, 8717, 8815, 8851, 8941, 8985, 8994]
Errors in file 21_DeGaulle.cnr. Wrong number of columns at lines: [1, 66, 224, 356, 524, 559, 637, 653, 675, 677, 697, 699, 762, 767, 772]
Errors in file 22_DeGaulle.cnr. Wrong number of columns at lines: [1, 59, 354, 514, 700, 806, 903, 1081, 1232, 1492, 1556, 1558, 1603, 1674, 1843, 1846, 1930, 2113, 2138, 2140, 2280, 2418, 2533, 2783, 2807, 2811, 2979, 3081, 3144, 3299, 3512, 3617, 3702, 3830, 3939, 4027, 4071, 4083, 4218, 4301, 4302, 4457, 4666, 4826, 4880, 4890, 4903, 4907, 5013, 5092, 5185, 5360, 5558, 5886, 5990, 5992, 6054, 6225, 6273, 6433, 6436, 6490, 6786, 6795]
Errors in file 23_DeGaulle.cnr. Wrong number of columns at lines: [1, 111, 339, 463, 489, 696, 850, 1039, 1270, 1388, 1519, 1617, 1622, 1627]
Errors in file 24_DeGaulle.cnr. Wrong number of columns at lines: [1, 199, 503, 597, 609, 716, 950, 1067, 1075, 1232, 1450, 1574, 1730, 1735, 1739, 2022, 2156, 2192, 2492, 2567, 2573, 2622, 2626, 2743, 2986, 3032, 3221, 3270, 3278, 3471, 3482, 3497, 3664, 3762, 3915, 3957, 4186, 4318, 4616, 4647, 4650, 4656]
Errors in file 25_Pompidou.cnr. Wrong number of columns at lines: [35, 72, 88, 188, 256, 442, 463, 552, 588, 659, 716, 772, 831, 927, 1036, 1139, 1186, 1390, 1467, 1649, 1682, 1800, 1912, 2058]
Errors in file 26_Pompidou.cnr. Wrong number of columns at lines: [83, 207, 280, 385, 488]
Errors in file 27_Pompidou.cnr. Wrong number of columns at lines: [105, 167, 274, 398, 403]
Errors in file 28_Pompidou.cnr. Wrong number of columns at lines: [107, 245, 395]
Errors in file 29_Pompidou.cnr. Wrong number of columns at lines: [1, 24, 81, 184, 234, 254, 329, 349, 390, 496, 600, 685, 765, 912, 1032, 1076, 1204, 1254, 1347, 1361, 1484, 1570, 1649, 1655, 1664, 1710, 1713, 1726, 1818, 1917, 2020, 2139, 2158, 2169, 2227, 2349, 2414, 2475, 2525, 2526, 2596, 2600, 2730, 2805, 2934, 3020, 3087, 3243, 3320, 3329]
Can't find CNR file : 30_Pompidou.cnr
Can't find CNR file : 31_Pompidou.cnr
Can't find CNR file : 32_Pompidou.cnr
Can't find CNR file : 33_Pompidou.cnr
Can't find CNR file : 34_Pompidou.cnr
Can't find CNR file : 35_Pompidou.cnr
Can't find CNR file : 36_Pompidou.cnr
Can't find CNR file : 37_Giscard.cnr
Can't find CNR file : 38_Giscard.cnr
Can't find CNR file : 39_Giscard.cnr
Can't find CNR file : 40_Giscard.cnr
Can't find CNR file : 41_Giscard.cnr
Can't find CNR file : 42_Giscard.cnr
Can't find CNR file : 43_Giscard.cnr
Can't find CNR file : 44_Giscard.cnr
Can't find CNR file : 45_Giscard.cnr
Can't find CNR file : 46_Giscard.cnr
Can't find CNR file : 47_Giscard.cnr
Can't find CNR file : 48_Giscard.cnr
Can't find CNR file : 49_Giscard.cnr
Can't find CNR file : 50_Giscard.cnr
Can't find CNR file : 51_Giscard.cnr

-- ANNOTATE - Running NLP tools
No annotation to do
-- COMPILING - Building Search Engine indexes
01_DeGaulle, 02_DeGaulle, 03_DeGaulle, 04_DeGaulle, 05_DeGaulle, 
06_DeGaulle, 07_DeGaulle, 08_DeGaulle, 09_DeGaulle, 10_DeGaulle, 
11_DeGaulle, 12_DeGaulle, 13_DeGaulle, 14_DeGaulle, 15_DeGaulle, 
16_DeGaulle, 17_DeGaulle, 18_DeGaulle, 19_DeGaulle, 20_DeGaulle, 
21_DeGaulle, 22_DeGaulle, 23_DeGaulle, 24_DeGaulle, 25_Pompidou, 
26_Pompidou, 27_Pompidou, 28_Pompidou, 29_PompidoupAttrs : [id, sid, pid, pos, func, lemma, sent, para, ref]
sAttrs : [text:+loc+type+date+id+base+project, s:+id, p:+id, txmcorpus:+lang]
-- EDITION - Building edition
Paginating texts: 
.............................
Importation terminée : 18 sec (18968 ms)

#9 Updated by Matthieu Decorde almost 5 years ago

  • Description updated (diff)

#10 Updated by Alexey Lavrentev almost 5 years ago

  • Description updated (diff)

Test failed on the TEST portal with the Brown corpus: I re-imported the corpus with the new parameter and loaded it to the portal, but the left context is still displayed.
Maybe I need to update my TXM RCP?
I tried to set the update level to DEV but got:

Cannot complete the install because of a conflicting dependency.
  Software being installed: TXM 0.7.6.201410231004 (org.txm.rcpapplication.product 0.7.6.201410231004)
  Software currently installed: XML Editor 1.0.0.201407021602 (XMLEditor.feature.feature.group 1.0.0.201407021602)
  Only one of the following can be installed at once:
    Core Runtime 3.9.0.v20130326-1255 (org.eclipse.core.runtime 3.9.0.v20130326-1255)
    Core Runtime 3.10.0.v20140318-2214 (org.eclipse.core.runtime 3.10.0.v20140318-2214)
  Cannot satisfy dependency:
    From: XML Editor 1.0.0.201407021602 (XMLEditor.feature.feature.group 1.0.0.201407021602)
    To: org.eclipse.core.runtime [3.9.0.v20130326-1255]
  Cannot satisfy dependency:
    From: Eclipse e4 Rich Client Platform 1.3.100.v20140909-1633 (org.eclipse.e4.rcp.feature.group 1.3.100.v20140909-1633)
    To: org.eclipse.core.runtime [3.10.0.v20140318-2214]
  Cannot satisfy dependency:
    From: TXM 0.7.6.201410231004 (org.txm.rcpapplication.product 0.7.6.201410231004)
    To: org.eclipse.e4.rcp.feature.group [1.3.100.v20140909-1633]

#11 Updated by Matthieu Decorde almost 5 years ago

  • Description updated (diff)

#12 Updated by Matthieu Decorde almost 5 years ago

Need a new version of TXM and Portal for further tests

#13 Updated by Matthieu Decorde almost 5 years ago

  • % Done changed from 70 to 80

#14 Updated by Alexey Lavrentev almost 5 years ago

  • Description updated (diff)

#15 Updated by Alexey Lavrentev almost 5 years ago

  • % Done changed from 80 to 90

OK (only tested on the Test portal with Graal corpus)

#16 Updated by Alexey Lavrentev almost 5 years ago

  • Description updated (diff)

#17 Updated by Alexey Lavrentev almost 5 years ago

Not tested on the BFM portal, as this parameter should not be activated there

#18 Updated by Alexey Lavrentev almost 5 years ago

  • Status changed from Feedback to Closed
  • % Done changed from 90 to 100
