Bug #1201

Updated by Serge Heiden more than 10 years ago

h3. Context and origin of the bug description (mail, TXM version, OS, etc.)

* TXM 0.7.6.201409191440
* Ubuntu 12.04 64-bit
* TextTranscription2TRSMacro.groovy
** last modification: Mon. 08 Dec. 2014 18:09:42 CET
** size: 2,403 bytes

h3. Diagnostic

Currently, the TextTranscription2TRS macro builds a TRS file that the Transcriber software is not able to open and edit.

This prevents users from editing the resulting TRS file (to add time bullets, for example) and contradicts the TXM manual, which states that this macro produces a file conforming to the TRS format.

h3. Hypothesis

This may come from a special add-on in the macro that handles additional section properties not available in a standard TRS file.

Here is the description of the transcription conventions for encoding those section properties (translated from the French original):

<pre>
Sections

The transcription can be divided into sections, characterized by
properties. A section starts with a line in the following format:
[property1="a value" property2="another value"]
"property1" and "property2" are names of section properties, which
take the values "a value" and "another value" until the
next section.
Section start lines must follow these rules:
– A property name must not contain accents, spaces or
punctuation. Tip: spaces can be replaced with
underscores (_)
– The property value must be enclosed in straight double quotes "..."
– Properties are separated by a space
– A new section closes the section that precedes it.
</pre>
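
To make the convention above concrete, here is a minimal sketch of how a section start line could be recognized and split into properties. It is written in Python for illustration only and is not taken from the Groovy macro; the names @PROPERTY_RE@ and @parse_section_line@ are hypothetical, and restricting property names to ASCII letters, digits and underscores is an assumption derived from the "no accent, space or punctuation" rule.

```python
import re

# Assumed pattern for one property: name="value". Names are limited to
# unaccented ASCII letters, digits and underscores (an interpretation of
# the convention, not the macro's actual rule).
PROPERTY_RE = re.compile(r'([A-Za-z0-9_]+)="([^"]*)"')


def parse_section_line(line):
    """Return the section properties as a dict, or None if the line
    is not a well-formed section start line."""
    text = line.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return None
    body = text[1:-1].strip()
    properties = {}
    position = 0
    for match in PROPERTY_RE.finditer(body):
        gap = body[position:match.start()]
        # Between two properties only whitespace is allowed, and at least
        # one space is required; nothing may precede the first property.
        if gap.strip() or (properties and not gap):
            return None
        properties[match.group(1)] = match.group(2)
        position = match.end()
    # Reject trailing garbage and empty brackets.
    if body[position:].strip() or not properties:
        return None
    return properties
```

A line that breaks one of the rules (an accented property name, a comma between properties) makes the function return @None@, so the caller can treat it as ordinary transcription text or report it.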

The result is that the following transcription source:
<pre>
[theme="Thème 1: Préparation du travail: recherche et mise à disposition des documents manquant" organisation="class"]
</pre>

is transformed by the TextTranscription2TRS macro into:
<pre>
<Section type="report" topic="" startTime="112.003" endTime="294.0" theme="Thème 1: Préparation du travail: recherche et mise à disposition des documents manquant" organisation="class">
</pre>

h3. Transcriber import tests

# the 'P1S8 30 avril 2014.odt' transcription is transformed to 'P1S8 30 avril 2014.trs' by the TextTranscription2TRS macro (version of 8th April 2014)
# the 'P1S8 30 avril 2014.trs' file is opened by Transcriber 1.5.2 on Linux

h4. Report of all the errors encountered

h5. "Sync in Section" error

Transcriber error window:

!trs-import-section-sync-error.png!

P1S8 transcription lines involved:

!p1s8-line-15-sync-error.png!

h5. "Comment in Section" error

Transcriber error window:

!trs-import-section-comment-error.png!

P1S8 transcription lines involved:

!p1s8_line-19-comment-error.png!

h5. "Comment in Section" error 2

Transcriber error window:

!trs-import-section-comment-error2.png!

P1S8 transcription lines involved:

!p1s8-line-24-comment-error2.png!

h5. "Comment in Section" error 3

Transcriber error window:

!trs-import-section-comment-error3.png!

P1S8 transcription lines involved:

!p1s8-line-52-comment-error3.png!

h5. "@theme (and @organisation) in Section" error

Transcriber error window:

!trs-import-section-theme-error.png!

P1S8 transcription lines involved:

!p1s8-line-64-section-theme-error.png!

h4. Analysis of all the errors encountered

The problems are not limited to the overloading of the <Section> element.

The generated file violates the Trans DTD, whose structure is:
<pre>
Trans (audio_filename?, scribe?, xml:lang?, version?, version_date?, elapsed_time="0")
 |
 |- Speakers
 |   |
 |   += Speaker* (id, name, check?, type?, dialect?, accent?, scope?)
 |
 |- Topics
 |   |
 |   += Topic* (id, desc)
 |
 +- Episode (program?, air_date?)
     |
     +- Section* (type, topic?, startTime, endTime)
         |
         +- Turn* (speaker?, startTime, endTime, mode?, fidelity?, channel?)
             |
             |= Sync (time)
             |
             |= Background (time, type, level?)
             |
             |= Comment (desc)
             |
             |= Who (nb)
             |
             |= Vocal (desc)
             |
             += Event (type="noise", extent="instantaneous", desc)
</pre>

<Sync> and <Comment> can only be children of <Turn>.
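
The constraints read from the DTD summary above can be checked mechanically before trying to open a file in Transcriber. The following is a rough sketch in Python using the standard @xml.etree.ElementTree@ module; it is not part of TXM or Transcriber, the function name @find_section_violations@ is hypothetical, and it only checks the two <Section> rules discussed in this report rather than performing full DTD validation.

```python
import xml.etree.ElementTree as ET

# Attributes and child elements that the DTD summary allows on <Section>.
SECTION_ATTRIBUTES = {"type", "topic", "startTime", "endTime"}
SECTION_CHILDREN = {"Turn"}


def find_section_violations(trs_xml):
    """Return (section_number, problem) pairs for every <Section>
    that carries an extra attribute or a non-<Turn> direct child."""
    root = ET.fromstring(trs_xml)
    violations = []
    for number, section in enumerate(root.iter("Section"), start=1):
        for name in section.attrib:
            if name not in SECTION_ATTRIBUTES:
                violations.append((number, "unexpected attribute @" + name))
        for child in section:
            if child.tag not in SECTION_CHILDREN:
                violations.append(
                    (number, "<" + child.tag + "> is a direct child of <Section>")
                )
    return violations
```

Applied to a file like the one produced from the P1S8 transcription, such a check would be expected to list the same three kinds of problems that Transcriber reports one at a time: <Sync> in <Section>, <Comment> in <Section>, and the @theme and @organisation attributes.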

h3. Solution

A) do not put <Sync> elements as children of <Section>, but as children of the next <Turn>

B) do not put <Comment> elements as children of <Section>, but as children of the next <Turn>

C) add a 'keepOutputFormatTranscriberConformant' parameter to the macro:
* a 'yes' value (the default) means that special section properties encoded in the ODT source must not be transferred to any XML encoding that violates the standard TRS DTD (such as extra attributes on <Section> elements). They must instead be transferred as <Comment> children of <Turn> elements.
* a 'no' value means that the macro may generate an XML-TRS file that TXM can import but that Transcriber cannot open for post-editing.
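
The three points above could be combined into a single post-processing step on each <Section> element. The sketch below is in Python and only illustrates the idea; it is not the macro's Groovy code. The function name @make_section_conformant@, the @keep_conformant@ argument (standing in for the proposed 'keepOutputFormatTranscriberConformant' parameter) and the choice to store the extra properties as one <Comment> whose @desc@ repeats the name="value" pairs are all assumptions.

```python
import xml.etree.ElementTree as ET

STANDARD_SECTION_ATTRIBUTES = {"type", "topic", "startTime", "endTime"}


def make_section_conformant(section, keep_conformant=True):
    """Rewrite one <Section> element in place.

    A/B: <Sync> and <Comment> direct children move into the next <Turn>.
    C:   if keep_conformant, non-standard attributes become a <Comment>.
    Returns the elements that could not be placed (no following <Turn>).
    """
    moved = []
    if keep_conformant:
        extras = [(k, v) for k, v in section.attrib.items()
                  if k not in STANDARD_SECTION_ATTRIBUTES]
        for name, _ in extras:
            del section.attrib[name]
        if extras:
            # Assumed encoding: one comment holding all name="value" pairs.
            desc = " ".join('%s="%s"' % pair for pair in extras)
            moved.append(ET.Element("Comment", {"desc": desc}))
    for child in list(section):
        if child.tag in ("Sync", "Comment"):
            section.remove(child)
            moved.append(child)
        elif child.tag == "Turn" and moved:
            # Place the moved elements first, ahead of the Turn's own text.
            moved[-1].tail = (moved[-1].tail or "") + (child.text or "")
            child.text = None
            for offset, element in enumerate(moved):
                child.insert(offset, element)
            moved = []
    return moved
```

With @keep_conformant=False@ the extra attributes stay on <Section>, which matches the 'no' case: the file remains importable by TXM but not by Transcriber. Elements returned by the function had no following <Turn> in their section, and the caller has to decide where to put them.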
