Saxon processor and regex issues with {} vs {{}}
@wendell I'm curious about a group of errors I ran into. They depend on whether line 466 of the ucp-text-macros.xsl sheet uses single or double curly brackets in the regex:
Some versions of Saxon work only with the single-bracket version of this sheet, while others work only with the double-bracket version.
This has been resolved for the moment but I'd like to understand it.
There are 3 different versions of Saxon I've used:
- SaxonHE9-8-0-1J: what we'd bundled with XSweet core, until I recently replaced it with SaxonHE9-9-1-1J
- SaxonHE9-9-1-1J: currently included with XSweet core
- Saxon/C 1.1.2: built from Saxon 9.8.0.15. Currently used in a PHP implementation of XSweet that's supporting the .docx uploads for Editoria instances being used in production.
The change that started this was the following line:
<xsl:variable name="livechar">[^\s\p{Ps}\p{Pe}"']</xsl:variable>
being changed to have more curly brackets:
<xsl:variable name="livechar">[^\s\p{{Ps}}\p{{Pe}}"']</xsl:variable>
With this change, running XSweet with SaxonHE9-8-0-1J didn't work. ucp-text-macros.xsl failed to produce an output, throwing the following error:
[] : "([^\s\p{{Ps}}\p{{Pe}}"'])
Error at char 8 in xsl:sequence/@select on line 289 column 75 of ucp-text-macros.xsl:
FORX0002: Syntax error at char 9 in regular expression: Unknown character category: {Ps
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#269)
processing xsw:sequence/xsw:match[5]
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#509)
processing xsw:sequence
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#269)
processing sequence/munge-quotes[1]
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#186)
processing sequence
in built-in template rule for /html/body[1]/div[1]/p[1] in the unnamed mode
in built-in template rule for /html in the unnamed mode
Syntax error at char 9 in regular expression: Unknown character category: {Ps
Error on line 1 column 1 of jures_img-12UCPTEXTED.xhtml:
SXXP0003: Error reported by XML parser: Premature end of file.
org.xml.sax.SAXParseException; systemId: file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../outputs/jures_img/jures_img-12UCPTEXTED.xhtml; lineNumber: 1; columnNumber: 1; Premature end of file.
So the runner_scripts broke. I reverted this back to single curly brackets, so it worked with SaxonHE9-8-0-1J again.
But then I tried to deploy it to the production servers using Saxon/C 1.1.2, and got the following error message:
[] : "([^\s\p\p"'])
Error evaluating (($original, ...)) in xsl:sequence/@select on line 289 column 75 of ucp-text-macros.xsl:
FORX0002: Syntax error at char 8 in regular expression: Expected '{' after \112. Failed
while atomizing the result of template match="xsw:sequence". Failed while atomizing the
result of template match="xsw:sequence"
In template rule with match="element(Q{http://coko.foundation/xsweet}sequence)" on line 258 of ucp-text-macros.xsl
invoked by xsl:apply-templates at file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#186
In template rule with match="element(Q{http://www.w3.org/1999/xhtml}p)" on line 173 of ucp-text-macros.xsl
invoked by built-in template rule (shallow-copy)
In template rule with match="text()[fn:not(...)]" on line 254 of ucp-text-macros.xsl
Syntax error at char 8 in regular expression: Expected '{' after \112. Failed while atomizing the result of template match="xsw:sequence". Failed while atomizing the result of template match="xsw:sequence"
Adding the extra "{}"s back fixed the problem.
SaxonHE9-9-1-1J behaved like Saxon/C 1.1.2: it worked with the double brackets and threw the same error about the single ones.
Since those servers pull from the master branches of the XSweet repos when they run an update task, I replaced SaxonHE9-8-0-1J with SaxonHE9-9-1-1J.
The runner scripts work again, we're using the double brackets, and all is well in the world. But I'd be really interested to hear what your change was about :)
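One possible explanation, offered as an assumption and not confirmed anywhere in this thread: XSLT 3.0 has text value templates. When expand-text="yes" is in scope, {...} inside text is evaluated as an XPath expression, and {{ and }} are the escapes for literal braces. That would be consistent with the Saxon/C error above, where the regex is reported as [^\s\p\p"'] with {Ps} and {Pe} gone. Whether ucp-text-macros.xsl actually has expand-text in scope, and why SaxonHE9-8-0-1J treats the braces differently, is not something this sketch answers. The stylesheet below is a standalone illustration with made-up variable names, not code from XSweet:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Text value templates off: braces are ordinary characters. -->
    <xsl:variable name="literal" expand-text="no">[^\s\p{Ps}\p{Pe}"']</xsl:variable>

    <!-- Text value templates on: doubled braces are escapes for literal
         braces, so this should yield the same string as $literal. -->
    <xsl:variable name="escaped" expand-text="yes">[^\s\p{{Ps}}\p{{Pe}}"']</xsl:variable>

    <!-- Text value templates on, single braces: {Ps} and {Pe} are evaluated
         as XPath (child elements named Ps / Pe). Assuming the input has no
         such elements, both evaluate to the empty string. -->
    <xsl:variable name="expanded" expand-text="yes">[^\s\p{Ps}\p{Pe}"']</xsl:variable>

    <!-- Print the three resulting strings, one per line. -->
    <xsl:value-of select="string($literal), string($escaped), string($expanded)"
                  separator="&#10;"/>
  </xsl:template>

</xsl:stylesheet>
```

Run against any small XML input with an XSLT 3.0 processor, the first two lines of output should be identical and the third should have lost {Ps} and {Pe}. If a given Saxon version prints something else, that would point to the version difference rather than the regex itself.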
Thanks Wendell!