Feature #2358

RCP X.X: XTZ-CSV import module, set "teiHeader" as default value for out-of-text parameter in import parameters input form

Added by Serge Heiden almost 2 years ago. Updated 9 months ago.

Status: New
Start date: 03/16/2018
Priority: High
Due date:
Assignee: -
% Done: 0%
Category: Import
Spent time: -
Target version: TXM X.X

Description

XTZ+CSV is geared toward TEI-encoded texts.

TEI-encoded texts have teiHeaders, and teiHeader content always needs to be excluded from the text content.

More and more people use the XTZ+CSV import module but never think to set teiHeader in the out-of-text parameter, so they regularly get teiHeader content in their text body (like user name, etc.).

Solution

Set "teiHeader" as the default value for the out-of-text parameter in the XTZ-CSV import parameters input form.

The power user can reset the out-of-text parameter value at any time if he needs to, and the novice user will have her life simplified.
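To illustrate the intended effect, here is a minimal Python sketch, not TXM's actual import code. It assumes an out-of-text parameter that is a list of element names and that defaults to "teiHeader". The names `DEFAULT_OUT_OF_TEXT`, `drop_out_of_text` and `text_content` are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical default, mirroring the proposal: "teiHeader" is pre-filled.
DEFAULT_OUT_OF_TEXT = ("teiHeader",)


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def drop_out_of_text(root, out_of_text=DEFAULT_OUT_OF_TEXT):
    """Remove every element whose local name is listed in out_of_text."""
    for parent in list(root.iter()):
        for child in list(parent):
            if isinstance(child.tag, str) and local_name(child.tag) in out_of_text:
                parent.remove(child)
    return root


def text_content(root):
    """Return all character data under root, with whitespace normalized."""
    return " ".join("".join(root.itertext()).split())


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><author>jdoe</author></titleStmt></fileDesc></teiHeader>
  <text><body><p>Hello world.</p></body></text>
</TEI>"""

# Without the default, the header leaks into the text: "jdoe Hello world."
print(text_content(ET.fromstring(SAMPLE)))
# With the proposed default, only the body remains: "Hello world."
print(text_content(drop_out_of_text(ET.fromstring(SAMPLE))))
```

A power user can still override the default in this sketch, for example by passing `out_of_text=()` to keep everything.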

History

#1 Updated by Alexey Lavrentev over 1 year ago

It may be better to set teiHeader as "Out of text to edit", as currently "Out of text" elements are completely deleted from XML-TXM corpora, which may result in the loss of useful metadata for future use of the texts processed by TXM.

#2 Updated by Serge Heiden over 1 year ago

Currently:
  • text metadata are already managed directly by TXM (typically some teiHeader contents are transferred at some point to text element attributes or metadata.csv files)
  • metadata elements are also already rendered at the beginning of text editions by most default pagers

If you want to be able to manage the teiHeader content yourself, we must first define the relations between default TXM import behaviors and the various mechanisms that help customize TXM import.

For example, in the XTZ+CSV context, can we simultaneously keep the default Pager metadata renderer and keep the teiHeader content to edit by default?

#3 Updated by Alexey Lavrentev over 1 year ago

The problem here is not managing metadata from teiHeader with TXM but losing it for future use with other applications (cf. section 3.7 of the User Manual, http://txm.sourceforge.net/doc/manual/0.7.9/fr/manual26.xhtml#toc108, which invites users to reuse source files processed by TXM).

Currently teiHeader is not displayed in the edition even though it is set to "out of text to edit" (because TXM adds its own teiHeader if the source document does not have one).

#4 Updated by Serge Heiden over 1 year ago

OK so your comment is not related to an import but to an export issue.

But the "Out of text to edit" parameter is still related to text edition production: the source teiHeader content may contribute to metadata production, and text edition production involves rendering metadata.

a) can you still suggest an answer to the previous question? "in the XTZ+CSV context, can we simultaneously keep the default Pager metadata renderer and keep the teiHeader content to edit by default?" (especially when some metadata rendered in the edition come from the teiHeader in the sources)

b) can you suggest how to render by default in an edition a teiHeader "... to edit"?

Note: a solution to the export issue could be a new textual plan - ignore but keep?

#5 Updated by Alexey Lavrentev over 1 year ago

I think the problem comes from the current "out-of-text" behavior: it should not delete the elements in XML-TEI-TXM but make them invisible to TXM.
Restructuring source files (such as deleting some XML elements) should be done by XSLT rather than by import module parameters.
Placing teiHeader into "out of text to edit" is a workaround for the current state of TXM.
That being said, I can try to answer the questions:
a) yes, it can take various forms to be defined, e.g.
Title page :
Section 1: Selected information from teiHeader
Section 2: TXM metadata
b) yes, we can use TEI stylesheets as a model, I can make a first draft proposal based on them

#6 Updated by Serge Heiden over 1 year ago

1) I think the problem comes from the current "out-of-text" behavior: it should not delete the elements in XML-TEI-TXM but make them invisible to TXM.

Currently the "out-of-text" behavior is implemented by removing the targets at some point in the import process.

Do you have a suggestion for how the targets could be made "invisible to TXM" at some point in the import process?
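One possible reading of "invisible to TXM", sketched below in Python purely for discussion, is to skip the target subtrees when collecting words while leaving the XML tree untouched. This is an assumption about what such a behavior could look like, not a description of the TXM import process. `visible_words` and its `hidden` parameter are hypothetical names.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def visible_words(elem, hidden=("teiHeader",)):
    """Yield words under elem, skipping hidden subtrees without deleting them."""
    if local_name(elem.tag) in hidden:
        return  # the subtree is ignored here but stays in the document
    if elem.text:
        yield from elem.text.split()
    for child in elem:
        yield from visible_words(child, hidden)
        # The tail belongs to the parent's text, so it stays visible.
        if child.tail:
            yield from child.tail.split()


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><author>jdoe</author></titleStmt></fileDesc></teiHeader>
  <text><body><p>Hello world.</p></body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)
words = list(visible_words(root))  # ["Hello", "world."]
# The header is still present in the tree and would survive a later export.
header_kept = root.find("{http://www.tei-c.org/ns/1.0}teiHeader") is not None
```

In this sketch the header never reaches the word stream, yet serializing `root` afterwards still includes it, which is the property comment #5 asks for.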

2) Restructuring source files (such as deleting some XML elements) should be done by XSLT rather than by import module parameters.

  • user perspective: currently the TXM design priority is to help end users import XML sources with "simple/imperfect/sufficient in most cases" concepts, parameters and UI. Forcing end users to use XSLT to import XML sources would be a dramatic change to TXM design priorities.
  • developer/scripter perspective: XSLT is a nice technology to manipulate XML, but it is not the only one. Currently the TXM import process is developed with all the available XML-oriented Java/Groovy APIs (DOM, StAX...), XSLT and XQuery (later) for various purposes. Each technology has its pros and cons, and should be used for its pros depending on the context, be documented and be maintained.

3) a) yes, it can take various forms to be defined, e.g.
Title page :
Section 1: Selected information from teiHeader
Section 2: TXM metadata

We should discuss and decide the semantics involved in all this:
  • I would place TXM metadata first, if I had to mix the information
  • 'Section 1' and 'Section 2' are not very transparent to me as a way to distinguish between the editorial rendering of information coming from the teiHeader (and not already present in the edition of the TEI text, like the title page for example) and the editorial rendering of information coming from the metadata, which can itself come from the teiHeader.

4) b) yes, we can use TEI stylesheets as a model, I can make a first draft proposal based on them

OK. Maybe we could host a link to it in a spec page in the txm-info wiki?

#7 Updated by Sebastien Jacquot over 1 year ago

  • Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0

#8 Updated by Matthieu Decorde 9 months ago

  • Target version changed from TXM 0.8.0 to TXM X.X
