Bug #1201
Macro TextTranscription2TRS: make output compatible with Transcriber
Status: | New | Start date: | 12/14/2014 |
---|---|---|---|
Priority: | Normal | Due date: | |
Assignee: | - | % Done: | 0% |
Category: | Macros | Spent time: | - |
Target version: | TXM X.X | | |
Description
Context and origin of the bug description (mail, TXM version, OS, etc.)
- TXM 0.7.6.201409191440
- Ubuntu 12.04 64-bit
- TextTranscription2TRSMacro.groovy
  - last modification: Mon. 08 Dec. 2014 18:09:42 CET
  - size: 2,403 bytes
Diagnostic
Currently, the TextTranscription2TRS macro builds a TRS file that the Transcriber software cannot open or edit.
This prevents users from editing the resulting TRS file (to add time bullets, for example) and contradicts the TXM manual, which states that this macro produces a file conforming to the TRS format.
Hypothesis
This may come from a special add-on in the macro that manages additional section properties not available in a standard TRS file.
Here is the description of the transcription conventions used to encode those section properties (translated from French):
Sections
The transcription can be divided into sections, each characterized by properties. A section begins with a line in the following format:
[property1="a value" property2="another value"]
"property1" and "property2" are the names of section properties, which take the values "a value" and "another value" until the next section. Section start lines must follow these rules:
- A property name must not contain accents, spaces, or punctuation. Tip: spaces can be replaced by underscores (_).
- A property value must be enclosed in straight double quotes "...".
- Properties are separated by a space.
- A new section closes the section that precedes it.
The result is that the following transcription source:
[theme="Thème 1: Préparation du travail: recherche et mise à disposition des documents manquant" organisation="class"]
is transformed by the TextTranscription2TRS macro to:
<Section type="report" topic="" startTime="112.003" endTime="294.0" theme="Thème 1: Préparation du travail: recherche et mise à disposition des documents manquant" organisation="class">
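The section line convention quoted above can be illustrated with a small parser. This is a minimal sketch in Python for readability (the macro itself is a Groovy script); the function name `parse_section_line` and both regular expressions are hypothetical and are not taken from the macro's source.

```python
import re

# Hypothetical patterns, written from the conventions quoted in this ticket.
SECTION_LINE = re.compile(r'^\[(.*)\]$')
# Property names: no accents, spaces, or punctuation; values in straight double quotes.
PROPERTY = re.compile(r'([A-Za-z0-9_]+)="([^"]*)"')

def parse_section_line(line):
    """Return the section properties as a dict, or None if this is not a section line."""
    match = SECTION_LINE.match(line.strip())
    if match is None:
        return None
    return dict(PROPERTY.findall(match.group(1)))

line = ('[theme="Thème 1: Préparation du travail: recherche et mise à '
        'disposition des documents manquant" organisation="class"]')
props = parse_section_line(line)
print(sorted(props))          # ['organisation', 'theme']
print(props["organisation"])  # class
```

Each key of the returned dict is what currently ends up as an extra attribute on the `<Section>` element.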
Transcriber import tests
- the 'P1S8 30 avril 2014.odt' transcription is transformed to 'P1S8 30 avril 2014.trs' by the TextTranscription2TRS macro (version of 8th April 2014)
- the 'P1S8 30 avril 2014.trs' file is opened with Transcriber 1.5.2 on Linux
Report of all the errors encountered
"Sync in Section" error
Transcriber error window:
P1S8 transcription lines involved:
"Comment in Section" error
Transcriber error window:
P1S8 transcription lines involved:
"Comment in Section" error 2
Transcriber error window:
P1S8 transcription lines involved:
"Comment in Section" error 3
Transcriber error window:
P1S8 transcription lines involved:
"@theme (and @organisation) in Section" error
Transcriber error window:
P1S8 transcription lines involved:
Analysis of all the errors encountered
Problems are not only related to the <Section> element overloading.
The generated file violates the Trans DTD, whose element structure is:
Trans (audio_filename?, scribe?, xml:lang?, version?, version_date?, elapsed_time="0")
|
|- Speakers
|  |
|  += Speaker* (id, name, check?, type?, dialect?, accent?, scope?)
|
|- Topics
|  |
|  += Topic* (id, desc)
|
+- Episode (program?, air_date?)
   |
   +- Section* (type, topic?, startTime, endTime)
      |
      +- Turn* (speaker?, startTime, endTime, mode?, fidelity?, channel?)
         |
         |= Sync (time)
         |
         |= Background (time, type, level?)
         |
         |= Comment (desc)
         |
         |= Who (nb)
         |
         |= Vocal (desc)
         |
         += Event (type="noise", extent="instantaneous", desc)
<Sync> and <Comment> can only be children of <Turn>.
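The three kinds of error reported above could be detected before opening the file in Transcriber. The following is a minimal sketch in Python using only the standard library; the function name `find_violations` and the sample document are hypothetical, and the check covers only what the DTD outline above states about `<Section>`. It is not a full DTD validation.

```python
import xml.etree.ElementTree as ET

# Attributes that the DTD outline lists for <Section>.
SECTION_ATTRIBUTES = {"type", "topic", "startTime", "endTime"}
# Elements that the outline places under <Turn>, not directly under <Section>.
TURN_ONLY = {"Sync", "Background", "Comment", "Who", "Vocal", "Event"}

def find_violations(root):
    """List the violations found in <Section> elements, named as in this ticket."""
    violations = []
    for section in root.iter("Section"):
        for name in sorted(set(section.attrib) - SECTION_ATTRIBUTES):
            violations.append(f"@{name} in Section")
        for child in section:
            if child.tag in TURN_ONLY:
                violations.append(f"{child.tag} in Section")
    return violations

# Hypothetical sample reproducing the three kinds of error.
sample = """<Trans><Episode>
<Section type="report" topic="" startTime="112.003" endTime="294.0" theme="T1" organisation="class">
<Sync time="112.003"/>
<Comment desc="note"/>
<Turn startTime="112.003" endTime="294.0"><Sync time="112.003"/>text</Turn>
</Section>
</Episode></Trans>"""

violations = find_violations(ET.fromstring(sample))
print(violations)
# ['@organisation in Section', '@theme in Section', 'Sync in Section', 'Comment in Section']
```

The `<Sync>` inside the `<Turn>` is not reported, since that placement is allowed.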
Solution
A) Don't emit <Sync> elements as children of <Section>; emit them as children of the next <Turn>.
B) Don't emit <Comment> elements as children of <Section>; emit them as children of the next <Turn>.
C) Add a 'keepOutputFormatTranscriberConformant' parameter to the macro:
- a 'yes' value (the default) means that section properties encoded in the ODT source must not be transferred to any XML encoding that violates the standard TRS DTD (such as extra attributes on <Section> elements). They must instead be transferred as <Comment> children of <Turn> elements.
- a 'no' value means that the macro may generate an XML-TRS file that TXM can import but that Transcriber cannot open for post-editing.
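Points A, B and C can be sketched as a post-processing pass over the generated XML. This is only an illustration in Python (the macro is a Groovy script and could apply the same logic while building the file); the function name `make_conformant`, its `keep_conformant` argument standing in for the proposed parameter, and the `name="value"` layout of the comment text are all assumptions.

```python
import xml.etree.ElementTree as ET

SECTION_ATTRIBUTES = {"type", "topic", "startTime", "endTime"}
MISPLACED = {"Sync", "Comment"}

def make_conformant(root, keep_conformant=True):
    """Move non-standard <Section> content into the next <Turn>, in place."""
    if not keep_conformant:
        return root  # the 'no' value: keep the TXM-only encoding untouched
    pending = []  # elements waiting for the next <Turn>
    for section in list(root.iter("Section")):
        # C) non-standard section properties become <Comment> elements
        for name in [n for n in section.attrib if n not in SECTION_ATTRIBUTES]:
            value = section.attrib.pop(name)
            pending.append(ET.Element("Comment", desc=f'{name}="{value}"'))
        for child in list(section):
            if child.tag in MISPLACED:
                # A) and B) detach <Sync>/<Comment> found directly under <Section>
                section.remove(child)
                child.tail = None
                pending.append(child)
            elif child.tag == "Turn" and pending:
                # Keep the turn's leading text after the moved elements.
                pending[-1].tail = child.text
                child.text = None
                for position, element in enumerate(pending):
                    child.insert(position, element)
                pending = []
    if pending:
        raise ValueError("no <Turn> follows the last moved elements")
    return root

# Hypothetical sample input.
sample = ('<Trans><Episode><Section type="report" topic="" startTime="0" '
          'endTime="9" theme="T1"><Sync time="0"/><Comment desc="note"/>'
          '<Turn startTime="0" endTime="9">hello</Turn></Section></Episode></Trans>')
root = make_conformant(ET.fromstring(sample))
section = root.find("Episode/Section")
turn = section.find("Turn")
print([child.tag for child in section])  # ['Turn']
print([child.tag for child in turn])     # ['Comment', 'Sync', 'Comment']
```

Elements are carried over to the next `<Turn>` even across a section boundary, and the sketch raises an error rather than silently dropping elements when no `<Turn>` follows; the macro may need a different policy for that case.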
Related issues
History
#1 Updated by Serge Heiden over 8 years ago
- Description updated (diff)
#2 Updated by Serge Heiden over 8 years ago
- File trs-import-section-sync-error.png added
- File p1s8-line-15-sync-error.png added
- Description updated (diff)
#3 Updated by Serge Heiden over 8 years ago
- File trs-import-section-comment-error.png added
- File p1s8-line-15-sync-error.png added
- File p1s8_line-19-comment-error.png added
- File p1s8-line-24-comment-error2.png added
- File p1s8-line-52-comment-error3.png added
- File p1s8-line-64-section-theme-error.png added
- File trs-import-section-theme-error.png added
- File trs-import-section-comment-error3.png added
- File trs-import-section-comment-error2.png added
- File trs-import-section-comment-error.png added
- Description updated (diff)
#4 Updated by Serge Heiden over 8 years ago
- Description updated (diff)
#5 Updated by Serge Heiden over 8 years ago
- Tracker changed from Feature to Bug
- Subject changed from Macro TextTranscription2TRS: add a keepOutputFormatTranscriberConformant parameter to Macro TextTranscription2TRS: make output compatible with Transcriber
- Description updated (diff)
#6 Updated by Matthieu Decorde over 7 years ago
- Target version changed from TXM 0.7.8 to TXM 0.8.0a (split/restructuration)
#7 Updated by Sebastien Jacquot almost 5 years ago
- Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0
#8 Updated by Matthieu Decorde about 4 years ago
- Target version changed from TXM 0.8.0 to TXM X.X