Support #1058

AP: Correction of word properties: how to verify?

Added by Serge Heiden about 4 years ago.

Status: New    Start date: 16/10/2014
Priority: Normal    Due date:
Assignee: -    % Done: 0%

Category: Macros    Spent time: -
Target version: Support

Description

I found where the problem was: one file in the corpus had problems with special characters, and I replaced it in the meantime, that is, after exporting the TSV table, so TXM could not recognize those words during the reimport.

Nevertheless, it would indeed be useful to see in the log the ids of the words that have not been re-injected. It would be great if Matthieu could do something about it.
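Until the macro reports this itself, a rough check could be done outside TXM by comparing the exported table with the list of ids actually re-injected. A minimal Python sketch, assuming (hypothetically) a tab-separated TSV with a header row and an `id` column, and assuming you can obtain the set of re-injected word ids by some other means; neither layout is a documented TXM format:

```python
import csv
import io


def missing_ids(tsv_text, reinjected_ids):
    """Return ids listed in the exported TSV that are absent from reinjected_ids.

    Hypothetical layout: tab-separated, header row, an 'id' column.
    """
    # QUOTE_NONE keeps words containing quote characters intact.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    found = set(reinjected_ids)
    # Preserve table order so each missing id can be located in the TSV.
    return [row["id"] for row in reader if row["id"] not in found]


sample = (
    "id\tword\tptpos\tptpos2\n"
    "w_1\tgrande\tADJ\tADJ\n"
    "w_2\tcasa\tNOM\tNOM\n"
    "w_3\tbom\tADV\tADJ\n"
)
print(missing_ids(sample, {"w_1", "w_3"}))  # ['w_2']
```

With a real file, the text would be read with `open(path, encoding="utf-8").read()`; an explicit encoding matters here, given that the original problem involved special characters.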

**

I understood the principles of the macro, but I still have the following related questions:

1. I have difficulty understanding exactly what the figures returned by the verification queries you sent me mean, compared with the figures I obtain from the TSV file.

1.1 Maybe the easiest way for me to check whether I got it right is to show you the query logs and how I interpret them.

TSV file:

Total number of lines to be exported: 15757

Total number of lines where ptpos differs from ptpos2: 2863

Total number of lines without alterations: 12894
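The three TSV figures are consistent with each other (15757 = 2863 + 12894). A minimal Python sketch of how such a summary could be recomputed from the table, assuming a tab-separated file with a header row and columns named `ptpos` (original tag) and `ptpos2` (corrected tag); the column names come from the queries in this ticket, but the file layout is an assumption:

```python
import csv
import io


def tsv_summary(tsv_text):
    """Count total, changed and unchanged rows of a correction table.

    Assumes (hypothetically) a tab-separated table with a header row
    containing 'ptpos' and 'ptpos2' columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    total = changed = 0
    for row in reader:
        total += 1
        # A row counts as changed when the corrected tag differs from the original.
        if row["ptpos2"] != row["ptpos"]:
            changed += 1
    return {"total": total, "changed": changed, "unchanged": total - changed}


sample = (
    "id\tword\tptpos\tptpos2\n"
    "w_1\tgrande\tADJ\tADJ\n"
    "w_2\tcasa\tNOM\tNOM\n"
    "w_3\tbom\tADV\tADJ\n"
)
print(tsv_summary(sample))  # {'total': 3, 'changed': 1, 'unchanged': 2}
```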

"Index of <[ptpos2!="__UNDEF__" & ptpos2!=ptpos]> with property: [word] in the corpus: CORPUSCORRIGIDOADJECTIVOS

Done: 638 items for 2860 occurrences" 

Does this mean that 638 words (as distinct forms) had their word properties changed and that, counting repeated words, a total of 2860 changed lines have been re-injected? That would mean 3 lines are missing compared with my TSV results.

"Index of <[ptpos2!="__UNDEF__" & ptpos2=ptpos]> with property: [word] in the corpus: CORPUSCORRIGIDOADJECTIVOS

Done: 2104 items for 12882 occurrences" 

Does this mean that 2104 words (as distinct forms) did not have their word properties changed and that, counting repeated words, a total of 12882 unchanged lines have been re-injected? That would mean 12 lines are missing compared with my TSV results.
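The two gaps can be checked with simple arithmetic. A short Python sketch using only the figures quoted in this ticket (the helper name is illustrative):

```python
def missing_lines(tsv_count, index_occurrences):
    """Lines counted in the TSV but not found by the corresponding TXM index."""
    return tsv_count - index_occurrences


missing_changed = missing_lines(2863, 2860)      # ptpos2 != ptpos
missing_unchanged = missing_lines(12894, 12882)  # ptpos2 == ptpos
print(missing_changed, missing_unchanged, missing_changed + missing_unchanged)
# 3 12 15
```

The combined gap of 15 lines equals the difference between the TSV total (15757) and the 15742 occurrences reported by the `ptpos2!="__UNDEF__"` index that follows, so the three readings agree with each other.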

"Index of <[ptpos2!="__UNDEF__"]> with property: [word] in the corpus: CORPUSCORRIGIDOADJECTIVOS

Done: 2668 items for 15742 occurrences" 

What does 2668 mean? Shouldn't it be the sum of the two previous item counts (638 + 2104)?
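One possible reading, assuming each index counts every distinct value of `word` once per query: occurrences of the two queries are disjoint (a token cannot satisfy both `ptpos2=ptpos` and `ptpos2!=ptpos`), so they add up exactly, but forms are not disjoint. A word form corrected in some lines and left unchanged in others would appear in both indexes. A Python sketch of that set arithmetic, with hypothetical toy forms:

```python
def form_overlap(changed_forms, unchanged_forms, all_forms):
    """Distinct forms present in both indexes: |A| + |B| - |A ∪ B|."""
    return changed_forms + unchanged_forms - all_forms


# Occurrences are disjoint: a token has either ptpos2 == ptpos or ptpos2 != ptpos.
assert 2860 + 12882 == 15742

# Forms are not disjoint, so the item counts do not simply add up.
print(form_overlap(638, 2104, 2668))  # 74

# Toy illustration with hypothetical forms: "grande" was corrected in one
# line and left unchanged in another, so it appears in both indexes.
changed = {"grande", "bom"}
unchanged = {"grande", "casa"}
print(len(changed | unchanged))  # 3, not 4
```

Under that assumption, 74 word forms would occur both with and without a correction.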

2. How are lines numbered in the TXM-TEI files? Does numbering restart with each new file, or does it carry over from one file to the next?
Which operations change line numbers? For example, can you confirm that introducing sections changes line numbers?

3. Introducing sections after running the macro "InjectWordProp table".

After correcting the word properties, I still need to modify my files to introduce sections.

What is the best way to do it? Can I convert the exported corrected corpus back into a Transcriber file? Can I introduce sections in the exported corpus after the corrections, that is, directly in the TXM-TEI format?

4. I have also noticed that once I import the corrected corpus, I lose any indication of the suppressed interviewer turns: I just get continuous text, without any time or turn markers. Is there a way around this?
