Bug #2160

RCP: X.X, words not highlighted in editions

Added by Matthieu Decorde over 2 years ago. Updated 5 months ago.

Status: New
Start date: 04/19/2017
Priority: High
Due date:
Assignee: -
% Done: 60%
Category: Import
Spent time: -
Target version: TXM 0.8.1

Description

For some texts, words are not highlighted in editions.

The IDs of the words that are not highlighted contain characters that break the CSS ID syntax rules (e.g. " ", "(" and others).

Discussion

Word IDs are either built as <text identifier + number> or come from the sources.

If we generate the word IDs in the import modules, we must normalize/reduce text names to a text identifier, at the level of the corpus.

Three strategies:
  • a) normalize/reduce characters or morphemes
  • b) escape characters
  • c) manage <text name>:<automatic text identifier> hash

b) requires escaping according to the syntax in which the identifier is read, for example CSS syntax. Different escaping algorithms may therefore be needed depending on the context. See the XXX Java library, which escapes for many different syntaxes.
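As an illustration of strategy b) for the CSS case only, here is a minimal sketch of an escaping routine. It is modeled on the identifier serialization rules of the CSSOM specification (the ones behind CSS.escape in browsers); the class name CssIdEscaper is hypothetical and is not part of TXM or of the XXX library mentioned above.

```java
/** Sketch only: escapes a word ID so that it can follow '#' in a CSS selector. */
final class CssIdEscaper {

    private CssIdEscaper() {}

    static String escape(String id) {
        StringBuilder out = new StringBuilder();
        int length = id.length();
        for (int i = 0; i < length; ) {
            int cp = id.codePointAt(i);
            boolean first = (i == 0);
            boolean secondAfterHyphen = (i == 1 && id.charAt(0) == '-');
            boolean digit = (cp >= '0' && cp <= '9');
            if (cp == 0) {
                // NULL cannot be represented: use the replacement character
                out.append('\uFFFD');
            } else if (cp <= 0x1F || cp == 0x7F || ((first || secondAfterHyphen) && digit)) {
                // control characters and leading digits: hexadecimal escape plus a space
                out.append('\\').append(Integer.toHexString(cp)).append(' ');
            } else if (first && cp == '-' && length == 1) {
                // a lone hyphen is not a valid identifier
                out.append("\\-");
            } else if (cp >= 0x80 || cp == '-' || cp == '_' || digit
                    || (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')) {
                // characters allowed as-is in a CSS identifier
                out.appendCodePoint(cp);
            } else {
                // any other ASCII character (space, parentheses, ...): backslash escape
                out.append('\\').appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With this sketch, an ID such as "text (1)_w_12" would be written `#text\ \(1\)_w_12` in a selector. The ID stored in the corpus is unchanged; only the selector text is escaped, so a different routine would still be needed for CQL or any other syntax.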

c) requires using the hash in various contexts, e.g. concordance references.
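Strategy c) could be sketched as a small two-way table between text names and automatically generated identifiers. Everything here is an assumption made for illustration: the class name TextIdRegistry, the "t" + counter identifier scheme, and the in-memory storage (a real corpus would need to persist the table).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: maps each text name to a generated, syntax-safe identifier and back. */
final class TextIdRegistry {

    private final Map<String, String> idByName = new LinkedHashMap<>();
    private final Map<String, String> nameById = new LinkedHashMap<>();

    /** Returns the identifier of a text name, creating it on first use. */
    String idFor(String textName) {
        String id = idByName.get(textName);
        if (id == null) {
            // hypothetical scheme: "t" followed by the registration order
            id = "t" + (idByName.size() + 1);
            idByName.put(textName, id);
            nameById.put(id, textName);
        }
        return id;
    }

    /** Returns the original text name, or null if the identifier is unknown. */
    String nameFor(String textId) {
        return nameById.get(textId);
    }
}
```

Word IDs would then be built from the generated identifier, while contexts that show text names to the user, such as concordance references, would call nameFor to get the original name back.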

Solution

Define the simplest common syntax that is compatible with both CSS ID syntax and CQL syntax.

Apply a): fix the XMLw to XML-TXM step of the import modules, in the XML2Ana class:
  • normalize/reduce the text ID to this common syntax
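The normalization step could look like the sketch below. The target syntax is an assumption, since the ticket does not define it yet: ASCII letters, digits, "_" and "-", starting with a letter, which is valid as a CSS ID and contains no regular expression metacharacters for CQL. The class name TextIdNormalizer and the "t" prefix are hypothetical; this is not the actual XML2Ana code.

```java
import java.text.Normalizer;

/** Sketch only: reduces a text name to an assumed restricted identifier syntax. */
final class TextIdNormalizer {

    private TextIdNormalizer() {}

    static String normalize(String textName) {
        // decompose accented letters, then drop the combining marks (É becomes E)
        String ascii = Normalizer.normalize(textName, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        // replace each run of disallowed characters with a single underscore
        String reduced = ascii.replaceAll("[^A-Za-z0-9_-]+", "_")
                // and trim underscores at both ends
                .replaceAll("^_+|_+$", "");
        // the identifier must start with a letter: add a prefix otherwise
        if (reduced.isEmpty() || !Character.isLetter(reduced.charAt(0))) {
            reduced = "t" + reduced;
        }
        return reduced;
    }
}
```

For example, "Le Cid (1637)" would become "Le_Cid_1637". Two different text names can reduce to the same identifier (for instance "a b" and "a(b"), so the import step would also have to detect and resolve collisions at the level of the corpus.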

Solution 2

  • add a new import option "force word id generation" for corpora that already have word IDs.
  • add a new load option "force word id generation" for corpora that already have word IDs.

Related issues

related to Bug #2353: DOC: X.X, Windows words not highlighted in editions New 04/19/2017
related to Bug #2354: RCP: X.X, page break: words not highlighted in editions New 04/19/2017

History

#1 Updated by Matthieu Decorde almost 2 years ago

  • Description updated (diff)

#2 Updated by Matthieu Decorde almost 2 years ago

  • Description updated (diff)
  • % Done changed from 80 to 60

#3 Updated by Serge Heiden over 1 year ago

  • Description updated (diff)
  • Priority changed from Normal to High
  • Target version changed from TXM 0.7.8 to TXM 0.8.0a (split/restructuration)

#4 Updated by Sebastien Jacquot about 1 year ago

  • Target version changed from TXM 0.8.0a (split/restructuration) to TXM 0.8.0

#5 Updated by Matthieu Decorde 5 months ago

  • Target version changed from TXM 0.8.0 to TXM 0.8.1
