Task #1666

Task #1630: TBX: improve performances of import process

TBX: improve performance of the tokenizing process

Added by Sebastien Jacquot almost 4 years ago. Updated over 3 years ago.

Status: New
Start date: 02/10/2016
Priority: Normal
Due date:
Assignee: -
% Done: 80%
Category: Import
Spent time: -
Target version: TXM 0.7.8

Description

Several improvements can be made to the Groovy code of the tokenizing sections.

Streams

Wrap FileOutputStreams with BufferedOutputStreams.
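
As a rough illustration of the wrapping described above, here is a minimal Java sketch (the same java.io classes are used from Groovy). The class name BufferedWriteSketch and the method writeTokens are hypothetical and do not come from the TXM sources; the sketch only assumes that tokens are written one per line.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not taken from the TXM code base.
class BufferedWriteSketch {

    // Writes each token on its own line. The BufferedOutputStream collects
    // the many small writes in memory and hands them to the underlying
    // FileOutputStream in larger chunks.
    static void writeTokens(Path target, String[] tokens) throws IOException {
        try (OutputStream out =
                 new BufferedOutputStream(new FileOutputStream(target.toFile()))) {
            for (String token : tokens) {
                out.write(token.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            }
        } // try-with-resources closes the stream, which also flushes the buffer
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("tokens", ".txt");
        writeTokens(tmp, new String[] {"Hello", ",", "world"});
        System.out.print(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

The gain depends on how many small writes the tokenizer performs, so it is worth measuring on a real corpus.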

Compiling REGEX patterns

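  • compile all REGEX patterns that are used in Groovy scripts (use "= ~", i.e. assign the result of the ~ pattern operator, which yields a java.util.regex.Pattern instead of a String)
  • e.g. replace reg3pts = /\A(.*)(\.\.\.)(.*)\Z/ with reg3pts = ~/\A(.*)(\.\.\.)(.*)\Z/
  • also compile the patterns used in replaceAll(), split(), etc., then call these methods on the compiled Pattern (split()) or on its Matcher (replaceAll(), matches()) rather than on the String class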
  • see Pattern.compile() for Java code sections
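
For the Java code sections, the idea might look like the sketch below. It reuses the reg3pts expression quoted above; the class name ThreeDotsSketch and the method splitOnThreeDots are hypothetical and only serve the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, not taken from the TXM code base.
class ThreeDotsSketch {

    // Compiled once when the class is loaded, instead of on every call.
    // Same expression as reg3pts in the description, with Java string escaping.
    static final Pattern REG_3PTS = Pattern.compile("\\A(.*)(\\.\\.\\.)(.*)\\Z");

    // Returns {text before, "...", text after}, or null when the input
    // contains no "..." (or contains a line break, which "." does not match).
    static String[] splitOnThreeDots(String s) {
        Matcher m = REG_3PTS.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] parts = splitOnThreeDots("wait...here");
        System.out.println(String.join("|", parts));
    }
}
```

String.matches(), String.replaceAll() and String.split() with a multi-character expression compile the expression again on each call, which is the cost a shared Pattern avoids.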

Use static Groovy compilation to avoid runtime reflection

see:

  • import groovy.transform.CompileStatic
  • import static groovy.transform.TypeCheckingMode.SKIP
  • @CompileStatic
  • @CompileStatic(SKIP) => may be used when a script cannot be statically compiled because of the Groovy syntax it uses. A better solution is to remove that syntax so the code can be statically compiled
  • add @CompileStatic before "public class SimpleTokenizerXml" in SimpleTokenizer.groovy
  • edit protected String standardChecks(String s) so it can be statically compiled
  • compile all reg3pts, regPunct, etc. member patterns using "= ~" instead of "="
  • extract the patterns passed to the split(), replaceAll() and matches() methods and store them, compiled, as members, e.g.:
regSplitWhiteSpaces = Pattern.compile(TokenizerClasses.whitespaces);
regLN = Pattern.compile("\n");
regCTRL = Pattern.compile("\\p{C}");
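
A possible shape for such members, sketched in Java under several assumptions: the class name TokenizerPatternsSketch and the method cleanAndSplit are hypothetical, and WHITESPACES is only a stand-in for TokenizerClasses.whitespaces, whose real value is not given in this task. The expressions are written without surrounding slashes, since Pattern.compile() takes the expression itself and would treat a "/" as a literal character to match.

```java
import java.util.regex.Pattern;

// Hypothetical class, not taken from SimpleTokenizer.groovy.
class TokenizerPatternsSketch {

    // Assumption: placeholder for TokenizerClasses.whitespaces.
    static final String WHITESPACES = "\\s+";

    // Member patterns, compiled once and reused for every input string.
    static final Pattern REG_SPLIT_WHITESPACES = Pattern.compile(WHITESPACES);
    static final Pattern REG_LN = Pattern.compile("\n");
    static final Pattern REG_CTRL = Pattern.compile("\\p{C}");

    // Illustrative cleanup only: line breaks become spaces, remaining
    // control and other \p{C} characters are dropped, then the text is
    // split on whitespace.
    static String[] cleanAndSplit(String s) {
        String noLineBreaks = REG_LN.matcher(s).replaceAll(" ");
        String noControls = REG_CTRL.matcher(noLineBreaks).replaceAll("");
        return REG_SPLIT_WHITESPACES.split(noControls.trim());
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", cleanAndSplit("a\nb\u0007c  d")));
    }
}
```

In the Groovy class the same fields would be declared as typed members (Pattern rather than def), which also helps the class pass @CompileStatic type checking.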

History

#1 Updated by Sebastien Jacquot almost 4 years ago

  • Parent task set to #1630

#2 Updated by Matthieu Decorde over 3 years ago

  • Description updated

#3 Updated by Matthieu Decorde over 3 years ago

  • Description updated

#4 Updated by Matthieu Decorde over 3 years ago

  • % Done changed from 0 to 50

#5 Updated by Matthieu Decorde over 3 years ago

  • % Done changed from 50 to 60

#6 Updated by Matthieu Decorde over 3 years ago

  • % Done changed from 60 to 80

Checked that the import modules of the import menu run correctly.
