Bug #1201

Macro TextTranscription2TRS: make output compatible with Transcriber

Added by Serge Heiden over 4 years ago. Updated 4 months ago.

Status: New
Start date: 12/14/2014
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Macros
Spent time: -
Target version: TXM X.X

Description

Context and origin of the bug description (mail, TXM version, OS, etc.)

  • TXM 0.7.6.201409191440
  • Ubuntu 12.04 64-bit
  • TextTranscription2TRSMacro.groovy
    • last modification: Mon. 08 Dec. 2014 18:09:42 CET
    • size: 2,403 bytes

Diagnostic

Currently, the TextTranscription2TRS macro builds a TRS file that the Transcriber software cannot open or edit.

This prevents users from editing the resulting TRS file (to add time bullets, for example) and contradicts the TXM manual, which states that this macro produces a file conforming to the TRS format.

Hypothesis

This may come from a special add-on in the macro that manages additional section properties not available in a standard TRS file.

Here is the description of the transcription conventions used to encode those section properties (translated from French):

Sections

The transcription can be divided into sections, each characterized by
properties. A section begins with a line in the following format:
[property1="a value" property2="another value"]
"property1" and "property2" are names of section properties, which take
the values "a value" and "another value" until the next section.
Section start lines must follow these rules:
– A property name must not contain accents, spaces or punctuation.
Tip: spaces can be replaced by underscores (_)
– The property value must be enclosed in straight double quotes "..."
– Properties are separated by a space
– A new section closes the section that precedes it.
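
As an illustration of these conventions, here is a minimal Python sketch of how a section start line could be recognized and split into properties. It is not taken from TextTranscription2TRSMacro.groovy: the function name `parse_section_line` and the regular expressions are assumptions made for this example, and "no accents, spaces or punctuation" is interpreted here as "ASCII letters, digits and underscores only".

```python
import re

# Hypothetical helper, not the macro's actual code.
# One property: an ASCII name (no accents, spaces or punctuation),
# an equals sign, and a value between straight double quotes.
_PROPERTY = r'[A-Za-z0-9_]+="[^"]*"'
# A section start line: properties separated by single spaces, in brackets.
_SECTION_LINE = re.compile(
    r'^\[\s*(' + _PROPERTY + r'(?: ' + _PROPERTY + r')*)\s*\]$')
_PAIR = re.compile(r'([A-Za-z0-9_]+)="([^"]*)"')


def parse_section_line(line):
    """Return the section properties as a dict, or None if the line
    does not follow the section start conventions."""
    match = _SECTION_LINE.match(line.strip())
    if match is None:
        return None
    return dict(_PAIR.findall(match.group(1)))
```

A line such as `[theme="Thème 1" organisation="class"]` yields a dictionary with the keys `theme` and `organisation`, while a line whose property name contains an accent is rejected.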

The result is that the following transcription source:

[theme="Thème 1: Préparation du travail: recherche et mise à disposition des documents manquant" organisation="class"]

is transformed by the TextTranscription2TRS macro to:

    <Section type="report" topic="" startTime="112.003" endTime="294.0" theme="Thème 1: Préparation du travail: recherche et mise à disposition des documents manquant" organisation="class">

Transcriber import tests

  1. the 'P1S8 30 avril 2014.odt' transcription is transformed to 'P1S8 30 avril 2014.trs' by the TextTranscription2TRS macro (version of 8th April 2014)
  2. the 'P1S8 30 avril 2014.trs' file is opened by Transcriber 1.5.2 on Linux

Report of all the errors encountered

"Sync in Section" error

Transcriber error window:

P1S8 transcription lines involved:

"Comment in Section" error

Transcriber error window:

P1S8 transcription lines involved:

"Comment in Section" error 2

Transcriber error window:

P1S8 transcription lines involved:

"Comment in Section" error 3

Transcriber error window:

P1S8 transcription lines involved:

"@theme (and @organisation) in Section" error

Transcriber error window:

P1S8 transcription lines involved:

Analysis of all the errors encountered

The problems are not only related to the overloading of the <Section> element.

The generated file violates the Trans DTD, whose structure is:

Trans (audio_filename?, scribe?, xml:lang?, version?, version_date?, elapsed_time="0")
|
|- Speakers 
|  |
|  += Speaker* (id, name, check?, type?, dialect?, accent?, scope?)
|     
|- Topics 
|  |
|  += Topic* (id, desc)
|     
+- Episode (program?, air_date?)
   |
   +- Section* (type, topic?, startTime, endTime)
      |
      +- Turn* (speaker?, startTime, endTime, mode?, fidelity?, channel?)
         |
         |= Sync (time)
         |  
         |= Background (time, type, level?)
         |  
         |= Comment (desc)
         |  
         |= Who (nb)
         |  
         |= Vocal (desc)
         |  
         += Event (type="noise", extent="instantaneous", desc)

<Sync> and <Comment> can only be children of <Turn>.
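
The errors reported by Transcriber can be reproduced outside of Transcriber with a small check against the tree above. The following Python sketch is only an illustration: `find_violations` is a hypothetical name, and the rules it applies (only the four listed attributes on <Section>, only <Turn> elements inside <Section>) are read from the tree shown in this ticket, not from the DTD file itself.

```python
import xml.etree.ElementTree as ET

# Attributes the tree above lists for <Section>.
SECTION_ATTRIBUTES = {"type", "topic", "startTime", "endTime"}


def find_violations(trs_xml):
    """Return a list of problems found in a TRS document given as a string."""
    problems = []
    root = ET.fromstring(trs_xml)
    for section in root.iter("Section"):
        # Extra attributes such as @theme or @organisation.
        for name in sorted(set(section.attrib) - SECTION_ATTRIBUTES):
            problems.append("@%s in Section" % name)
        # Per the tree above, <Section> only contains <Turn> elements.
        for child in section:
            if child.tag != "Turn":
                problems.append("%s in Section" % child.tag)
    return problems
```

Run on a file produced by the macro, this should list the same kinds of problems as the Transcriber error windows ("Sync in Section", "Comment in Section", "@theme in Section").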

Solution

A) Don't put <Sync> elements as children of <Section>; put them as children of the next <Turn>.

B) Don't put <Comment> elements as children of <Section>; put them as children of the next <Turn>.

C) Add a 'keepOutputFormatTranscriberConformant' parameter to the macro:
  • a 'yes' value (default value) means that any encoding in the ODT source related to special section properties must not be transferred to an XML encoding (such as attributes of <Section> elements) that violates the standard TRS DTD. That encoding must be transferred as <Comment> children of <Turn> elements.
  • a 'no' value means that an XML-TRS file can be generated that can be imported by TXM but not opened by Transcriber for post-editing.
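
To make the three proposals concrete, here is a rough Python sketch of the transformation applied to an already generated TRS tree. It is not the macro's Groovy code. Several points are assumptions made for the example and are not specified in this ticket: the function name `make_conformant`, the `name="value"` format of the generated <Comment> descriptions, leaving sections without any <Turn> unchanged, and attaching elements that follow the last <Turn> to that last <Turn>.

```python
import xml.etree.ElementTree as ET

SECTION_ATTRIBUTES = {"type", "topic", "startTime", "endTime"}
MISPLACED = {"Sync", "Comment"}


def _prepend(turn, elements):
    """Insert elements at the start of turn, before its existing text."""
    if not elements:
        return
    for element in elements:
        element.tail = None
    elements[-1].tail = turn.text  # keep the turn text after the moved elements
    turn.text = None
    for index, element in enumerate(elements):
        turn.insert(index, element)


def make_conformant(root, keep_output_format_transcriber_conformant=True):
    for section in list(root.iter("Section")):
        turns = section.findall("Turn")
        if not turns:
            continue  # assumption: a section without <Turn> is left unchanged
        pending = []
        if keep_output_format_transcriber_conformant:
            # C) special section properties become <Comment> elements
            for name in sorted(set(section.attrib) - SECTION_ATTRIBUTES):
                value = section.attrib.pop(name)
                # assumption: this desc format is illustrative only
                desc = '%s="%s"' % (name, value)
                pending.append(ET.Element("Comment", {"desc": desc}))
        for child in list(section):
            if child.tag == "Turn":
                _prepend(child, pending)
                pending = []
            elif child.tag in MISPLACED:
                # A) and B) move <Sync> and <Comment> into the next <Turn>
                section.remove(child)
                pending.append(child)
        for element in pending:
            # assumption: elements after the last <Turn> go to that <Turn>
            element.tail = None
            turns[-1].append(element)
    return root
```

With the parameter set to False, the <Sync> and <Comment> elements are still moved (A and B) but the extra <Section> attributes are kept, which corresponds to the 'no' value described above.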

trs-import-section-sync-error.png (4 kB) Serge Heiden, 01/19/2015 06:48 pm

p1s8-line-15-sync-error.png (22.5 kB) Serge Heiden, 01/19/2015 06:48 pm

trs-import-section-comment-error.png (4.5 kB) Serge Heiden, 01/19/2015 06:56 pm

p1s8-line-15-sync-error.png (22.5 kB) Serge Heiden, 01/19/2015 06:56 pm

p1s8_line-19-comment-error.png (40.1 kB) Serge Heiden, 01/19/2015 06:56 pm

p1s8-line-24-comment-error2.png (50.5 kB) Serge Heiden, 01/19/2015 06:56 pm

p1s8-line-52-comment-error3.png (23.7 kB) Serge Heiden, 01/19/2015 06:56 pm

p1s8-line-64-section-theme-error.png (25.9 kB) Serge Heiden, 01/19/2015 06:56 pm

trs-import-section-theme-error.png (4.6 kB) Serge Heiden, 01/19/2015 06:56 pm

trs-import-section-comment-error3.png (4.5 kB) Serge Heiden, 01/19/2015 06:56 pm

trs-import-section-comment-error2.png (4.1 kB) Serge Heiden, 01/19/2015 06:56 pm


Related issues

related to Task #179: Import Transcription: result openable by Transcriber New 06/26/2013

History

#1 Updated by Serge Heiden over 4 years ago

  • Description updated (diff)

#2 Updated by Serge Heiden over 4 years ago

#4 Updated by Serge Heiden over 4 years ago

  • Description updated (diff)

#5 Updated by Serge Heiden over 4 years ago

  • Tracker changed from Feature to Bug
  • Subject changed from Macro TextTranscription2TRS: add a keepOutputFormatTranscriberConformant parameter to Macro TextTranscription2TRS: make output compatible with Transcriber
  • Description updated (diff)

#6 Updated by Matthieu Decorde almost 4 years ago

  • Target version changed from TXM 0.7.8 to TXM 0.8.0a (split/restructuration)

#7 Updated by Sebastien Jacquot about 1 year ago

  • Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0

#8 Updated by Matthieu Decorde 4 months ago

  • Target version changed from TXM 0.8.0 to TXM X.X
