Task #1666

Updated by Matthieu Decorde almost 5 years ago

Several improvements could be made to the Groovy code of the tokenizing sections.

h3. Streams

Wrap FileOutputStreams with BufferedOutputStreams.
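
A minimal sketch of the wrapping, written in Java syntax (the same classes are usable from Groovy). The class name, method name and temporary file are hypothetical and only serve the illustration; the real tokenizer code is not shown in this ticket.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class BufferedWriteSketch {

    // Writes each token on its own line through a buffered stream, so the
    // many small writes are grouped before they reach the underlying file.
    static void writeTokens(File target, String[] tokens) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            for (String token : tokens) {
                out.write(token.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            }
        } // closing the buffered stream flushes it and closes the file stream
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("tokens", ".txt"); // hypothetical output file
        target.deleteOnExit();
        writeTokens(target, new String[] {"Hello", ",", "world"});
        System.out.print(new String(Files.readAllBytes(target.toPath()), StandardCharsets.UTF_8));
    }
}
```

The try-with-resources block matters here: a buffered stream that is never flushed or closed can leave the end of the output unwritten.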

h3. Compiling REGEX patterns

* compile all REGEX patterns that are used in Groovy scripts (use "= ~")
* e.g. replace reg3pts = /\A(.*)(\.\.\.)(.*)\Z/ with reg3pts = ~/\A(.*)(\.\.\.)(.*)\Z/
* also compile the patterns used in replaceAll(), split(), etc., then call these methods on the compiled Pattern (split()) or on its Matcher (replaceAll(), matches()) rather than on the String
* in Java code sections, use Pattern.compile()
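
The points above could be sketched as follows for the reg3pts expression, in Java syntax. The class and method names are hypothetical; only the regular expression itself comes from this ticket.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Reg3ptsSketch {

    // Compiled once and reused for every input; same expression as reg3pts.
    static final Pattern REG_3PTS = Pattern.compile("\\A(.*)(\\.\\.\\.)(.*)\\Z");

    // Returns the text before the ellipsis, the ellipsis, and the text after it,
    // or null when the input contains no "...".
    static String[] splitEllipsis(String s) {
        Matcher m = REG_3PTS.matcher(s); // no recompilation of the pattern here
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] parts = splitEllipsis("wait...what");
        System.out.println(String.join("|", parts));
    }
}
```

By contrast, String.matches(), String.replaceAll() and, for most patterns, String.split() compile their regular expression argument on every call, which is the cost the precompiled member avoids.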

h3. Use static Groovy compilation to avoid runtime reflection


* import groovy.transform.CompileStatic
* import static groovy.transform.TypeCheckingMode.SKIP
* @CompileStatic
* @CompileStatic(SKIP) => may be used when a script cannot be statically compiled because it relies on dynamic Groovy syntax. A better solution is to remove that syntax so the code can be statically compiled

* add @CompileStatic before "public class SimpleTokenizerXml" in SimpleTokenizer.groovy
* edit protected String standardChecks(String s) so it can be statically compiled
* compile all reg3pts, regPunct, etc. member patterns using "= ~" instead of "="
* extract the patterns passed to the split(), replaceAll() and matches() methods and store them compiled as members, e.g.:

regSplitWhiteSpaces = Pattern.compile(TokenizerClasses.whitespaces);
regLN = Pattern.compile("\n"); // no slash delimiters inside a quoted string
regCTRL = Pattern.compile("\\p{C}");
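
A possible shape for such members and their use, sketched in Java syntax. It assumes that regLN is meant to match a newline and regCTRL a single control character, written without slash delimiters inside the quoted string. The WHITESPACES constant is a stand-in for TokenizerClasses.whitespaces, whose actual value is not shown in this ticket, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class TokenizerPatternsSketch {

    // Stand-in for TokenizerClasses.whitespaces (assumption, real value unknown).
    static final String WHITESPACES = "\\s+";

    // Compiled once per tokenizer instance and reused for every call.
    private final Pattern regSplitWhiteSpaces = Pattern.compile(WHITESPACES);
    private final Pattern regLN = Pattern.compile("\n");
    private final Pattern regCTRL = Pattern.compile("\\p{C}");

    // Replaces newlines with spaces, then drops the remaining control characters.
    String clean(String s) {
        String noNewlines = regLN.matcher(s).replaceAll(" ");
        return regCTRL.matcher(noNewlines).replaceAll("");
    }

    // Splits the cleaned text on whitespace using the compiled pattern.
    String[] split(String s) {
        return regSplitWhiteSpaces.split(clean(s));
    }

    // True when the whole string is exactly one control character.
    boolean isControl(String s) {
        return regCTRL.matcher(s).matches();
    }

    public static void main(String[] args) {
        TokenizerPatternsSketch t = new TokenizerPatternsSketch();
        System.out.println(String.join("|", t.split("one\ntwo three")));
    }
}
```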