XSweet issues
https://gitlab.coko.foundation/groups/XSweet/-/issues
2018-07-27T04:13:07Z

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/143
Remove <spacing> element
2018-07-27T04:13:07Z
Alex Theg

In Prado, Ch 1, some paragraphs are composed of very small snippets of text enclosed by `<spacing>` tags.
The `<spacing>` tags should be removed, joining the text inside and outside of them into one string.
```html
<p style="margin-left: 5pt; margin-right: 2.15pt; margin-top: 0.5pt; text-indent: 36pt">
<spacing>Man</spacing>u
<spacing>e</spacing>l
<spacing> B</spacing>o
<spacing>telh</spacing>o
<spacing> </spacing>de
<spacing> Lacer</spacing>da
<spacing> </spacing>w
<spacing>a</spacing>s
...
```

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/145
Collapse adjacent and repeated inline formatting tags
2018-07-27T04:46:12Z
Bruno Herfst (hello@brunoherfst.com)

Extracted Source:
<span style="font-style: italic"><span class="My Italic Style">italicised</span></span>
Result:
<em><em>italicised</em></em>

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/150
Hyperlinks in footnotes broken
2018-07-27T16:58:45Z
Bruno Herfst (hello@brunoherfst.com)

Hyperlinks in footnotes become internal DOC reference:
<a href="../customXml/item1.xml">
Expected it to be:
<a href="http://www.example.com">
[footnote-hyperlink.docx](/uploads/c03e49e543009d239dd03b2fb3606dca/footnote-hyperlink.docx)

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/147
How do we test?
2018-07-30T08:12:02Z
Bruno Herfst (hello@brunoherfst.com)

I want to write some tests to validate the behavior of XSweet. Is there a preference for how that is done? Add a test folder with some test source documents and their expected output? Or use an existing testing framework like [xspec](https://github.com/expath/xspec)?

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/146
Extract Break Types
2018-07-31T06:50:38Z
Bruno Herfst (hello@brunoherfst.com)

Break types are lost in conversion to HTML.
1 Page Break
2 Column Break
3 Next Page
4 Section Break
5 Even Page Break
5 Odd Page Break
6 Section Break
Could they be converted to classes: `<br class='page-break'>`

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/153
Update XSweet to work with outline and list HTML attributes
2018-08-07T12:51:33Z
Alex Theg

After #152 is finished, and these are added as HTML attributes:
* `-xsweet-outline-level`
* `-xsweet-list-level`
We will need to:
1. Update how list handling works to use the new attribute
2. Update where heading promotion looks for this data
3. Remove the above information from the CSS `style`

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/15
Ornament detector
2018-08-07T13:01:25Z
Alex Theg

Authors often include ornaments to create divisions within chapters in their Word files. These are things like:
> chapter content chapter content chapter content.
> `* * *`
> Back to more chapter content
Authors can and do implement these in a few different ways:
1. Any number of text dividers: `***`, `*****`, `- - -`, etc.
2. Using a horizontal rule in Word
It would be good as an enhancement step (not extraction) to be able to port these into Wax, so I propose we implement the following:
1. Add an optional enhancement step to HTMLevator that can recognize a range of ornaments and convert them into `<hr>`s
2. Then, add a step into Editoria Typescript to convert `<hr>`s into ornaments for Wax. There's a ticket in for implementing this in Wax (https://gitlab.coko.foundation/wax/wax/issues/178), so we'll need to wait for this to be implemented to have a target format for ornaments. But it should be a straightforward mapping.
But we can start on the first part: ornament recognition.
I think there are 2 parts to this that would get us most of the way there:
## 1. Recognizing text ornaments
I think the rule for this is pretty simple: any paragraph that contains ONLY any combination of
* asterisks
* spaces
* hyphens
* en dashes
* em dashes
* tabs
is an ornament. The paragraph and its content should be clobbered and replaced with an `<hr>`
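
As a rough illustration of the rule above, here is a minimal Python sketch (not XSLT, and not HTMLevator code). The function name `is_text_ornament` is hypothetical, and treating an empty or whitespace-only paragraph as "not an ornament" is an assumption, since the ticket doesn't say either way.

```python
import re

# Characters the ticket lists: asterisks, spaces, tabs, hyphens,
# en dashes (U+2013) and em dashes (U+2014).
ORNAMENT_CHARS = re.compile(r"[* \t\-\u2013\u2014]+")


def is_text_ornament(paragraph_text: str) -> bool:
    """Return True if the paragraph contains ONLY ornament characters.

    Assumption: a paragraph made up solely of spaces/tabs (or nothing at all)
    is an empty paragraph, not an ornament.
    """
    if paragraph_text.strip(" \t") == "":
        return False
    return ORNAMENT_CHARS.fullmatch(paragraph_text) is not None


if __name__ == "__main__":
    for sample in ["***", "* * *", "- - -", "*\t*", "Chapter * One", "   "]:
        print(repr(sample), is_text_ornament(sample))
```

A paragraph for which this returns true would be the one to replace with an `<hr>`.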
## 2. Convert horizontal rules to `<hr>`s
In Word, on a new line, typing 3 or more hyphens in a row then hitting enter creates a horizontal rule. Under the hood, it's achieved by applying a bottom border to the previous paragraph, like so:
How it looks in Word:
>
Content
***
content
The OOXML:
```xml
<w:p w14:paraId="4F67C0DD" w14:textId="77777777" w:rsidR="00B82E58" w:rsidRDefault="00C8440C">
<w:pPr>
<w:pBdr><w:bottom w:val="single" w:sz="6" w:space="1" w:color="auto"/></w:pBdr>
</w:pPr>
<w:r>
<w:t>Content</w:t>
</w:r>
</w:p>
<w:p w14:paraId="5A5EC17D" w14:textId="77777777" w:rsidR="00C8440C" w:rsidRDefault="00C8440C">
<w:r>
<w:t>content</w:t>
</w:r><w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/>
</w:p>
```
So, HTMLevator would need to recognize this bottom border, and add an `<hr>` after the end of that paragraph.
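
To make the detection concrete, here is a small Python sketch that works directly on OOXML like the sample above. It is only an illustration under assumptions: HTMLevator operates on the extracted HTML rather than on `document.xml`, the helper names are hypothetical, and the list of "no border" values is a guess.

```python
import xml.etree.ElementTree as ET
from typing import List

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Values treated as "no visible border" (assumed, not exhaustive).
NO_BORDER = {None, "nil", "none"}


def has_bottom_border(paragraph: ET.Element) -> bool:
    """True if the paragraph carries a w:pBdr/w:bottom with a visible value."""
    bottom = paragraph.find("w:pPr/w:pBdr/w:bottom", NS)
    if bottom is None:
        return False
    return bottom.get(f"{{{W_NS}}}val") not in NO_BORDER


def rule_after_paragraph(body_xml: str) -> List[bool]:
    """For each w:p in document order, say whether an <hr> should follow it."""
    root = ET.fromstring(body_xml)
    return [has_bottom_border(p) for p in root.iter(f"{{{W_NS}}}p")]


# Trimmed version of the two paragraphs shown in the ticket.
SAMPLE = f"""<w:body xmlns:w="{W_NS}">
  <w:p>
    <w:pPr>
      <w:pBdr><w:bottom w:val="single" w:sz="6" w:space="1" w:color="auto"/></w:pBdr>
    </w:pPr>
    <w:r><w:t>Content</w:t></w:r>
  </w:p>
  <w:p><w:r><w:t>content</w:t></w:r></w:p>
</w:body>"""

if __name__ == "__main__":
    print(rule_after_paragraph(SAMPLE))
```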
What do you think?

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/9
Don't promote ornamental breaks to headers
2018-08-07T13:01:26Z
Alex Theg

The most common ornamental divider authors use is a series of asterisks, which may or may not be separated by spaces or tabs.
* "***"
* "*****"
* "* * *"
* `*<span class="tab"></span>*`
Paragraphs containing only such a pattern (with any number of spaces or tabs between asterisks) should be excluded from consideration for promotion to headings.

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/142
CSS for hanging paragraphs
2018-08-07T14:24:43Z
Alex Theg

XSweet extracts regular paragraph indentation from Word into CSS correctly, but it needs a tweak to how it handles hanging paragraphs.
Indentation without hanging works great:
One indent no hanging: `<w:ind w:left="720"/>` -> `<p style="margin-left: 36pt">`
Two indent no hanging: `<w:ind w:left="1440"/>` -> `<p style="margin-left: 72pt">`
But the indentation with hanging needs another CSS property to be correct:
One indent hanging:
`<w:ind w:left="1440" w:hanging="720"/>` -> `<p style="padding-left: 36pt; text-indent: -36pt">`
It needs a `margin-left: 36pt;` added in addition to what's already there to be correct.
Two indent hanging:
`<w:ind w:left="2160" w:hanging="720"/>` -> `<p style="padding-left: 36pt; text-indent: -36pt">`
It needs a `margin-left: 72pt;` added in addition to what's already there and then it's correct.
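
The arithmetic behind the two cases above can be sketched in Python. This assumes `w:ind` values are in twentieths of a point (720 = 36pt, consistent with the examples) and that the missing margin is always `left` minus `hanging`. The function name `indent_to_css` is hypothetical and this is not the XSweet stylesheet itself.

```python
TWIPS_PER_POINT = 20  # w:ind values are twentieths of a point (720 -> 36pt)


def indent_to_css(left_twips: int, hanging_twips: int = 0) -> str:
    """Build a CSS declaration string for a w:ind left/hanging pair."""
    left_pt = left_twips / TWIPS_PER_POINT
    hanging_pt = hanging_twips / TWIPS_PER_POINT
    if hanging_twips == 0:
        # Plain indentation: already handled correctly today.
        return f"margin-left: {left_pt:g}pt"
    # Hanging indentation: padding + negative text-indent as now,
    # plus the margin that is currently missing.
    margin_pt = left_pt - hanging_pt
    return (
        f"margin-left: {margin_pt:g}pt; "
        f"padding-left: {hanging_pt:g}pt; "
        f"text-indent: -{hanging_pt:g}pt"
    )


if __name__ == "__main__":
    print(indent_to_css(720))
    print(indent_to_css(1440, 720))
    print(indent_to_css(2160, 720))
```

With this split, `margin-left` plus `padding-left` always adds up to the full `w:left` value.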
Here's a test docx: [hanging.docx](/uploads/459ecfb10d4e6c42caf16f4983c52142/hanging.docx)

1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/151
Update binary references to use extracted copies, rather than originals
2018-08-08T09:07:37Z
Alex Theg

Things like embedded images, media, and math are all stored in the .docx directory. For the HTML extraction, these files should be copied over to the same directory as the HTML files. That way, they're easily accessible, and the HTML doesn't require the input .docx file to stay where it originally was. However, XSLT doesn't allow for file system manipulation by itself. That task will fall to INK, which is slated to be rebuilt in JavaScript (rather than RoR). Once that is complete, XSweet should be updated to reference copies of the binaries in the output directory, rather than directly referencing the binaries of the original .docx file.

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/152
Semantic data mixed with style data?
2018-08-30T07:25:11Z
Bruno Herfst (hello@brunoherfst.com)

I noticed that XSweet saves semantic data as style info:
<p style="font-family: Tahoma; font-size: 18pt; -xsweet-outline-level: 1">
Should `-xsweet-outline-level` become a `data-*` attribute?
<p style="font-family: Tahoma; font-size: 18pt;" data-xsweet-outline-level="1">

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/158
Some more Word detritus
2018-10-10T04:56:20Z
Wendell Piez

To catch WordML elements so far unaccounted for -- we should consider the following matching and whether there isn't info to be captured e.g. from `caps` or `highlight`. The rest should be cleaned up in a "scrub" phase:
```
<xsl:template match="noProof | iCs">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="caps | spacing | highlight | webHidden">
<span class="{local-name()}">
<xsl:apply-templates/>
</span>
</xsl:template>
```

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/40
Preserve inline formatting on note callouts
2018-10-15T19:51:48Z
Alex Theg

The editors at UCP have found a strange bug related to how notes appear in Wax when the note callouts (the inline note numbers in the main content) have inline formatting applied to them (underlining, italics, bold, etc.).
The root cause of the issue is a bug with the Substance Javascript library that Wax is built with, but we can avoid the bug altogether with a tweak to XSweet:
If a note callout has inline formatting tags applied to it, those tags are preserved in the final HTML extraction from XSweet core. But, when Typescript transforms the note and callout formats from their HTML to the Wax-specific note and callout format, any inline formatting tags applied on the note callout are dropped.
The fix for the bug is to update Typescript to preserve inline formatting tags on the note callouts all the way through Typescript.
Here's an example of an instance that causes the bug in Substance:
The original Word .docx here has an italicized note callout. The whole paragraph is italicized, with italics before, on, and after the note callout:
![Screen_Shot_2018-09-24_at_10.32.27_AM](/uploads/e988514d9dd04aa5d4e39766e61cf187/Screen_Shot_2018-09-24_at_10.32.27_AM.png)
Here's the result of XSweet Core - the inline italics on the note callout are still there (the `<i>`s inside the `<span class="EndnoteReference">`):
```html
<p style="font-size: 14pt">
<span style="font-size: 14pt">“
<i>When you go out to battle... that makes war with you, until it is subdued.</i>
<span class="EndnoteReference">
<i><a class="endnoteReference" href="#en6">6</a></i>
</span>
<i>“</i>
</span>
</p>
```
And, you can see that in this final result that's ported into Editoria, the `<i>` tags have all been converted to `<em>` tags, and the tags within the note callout have been dropped:
```html
<p>"
<em>When you go out to battle that makes war with you, until it is subdued.
</em>
<note data-id="en6" />
<em>"</em>
</p>
```
The request for this ticket is: if there are `<u>`, `<b>`, or `<em>` tags enclosing a note callout, preserve those through Typescript.

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/17
Add another numeral span rule
2018-11-05T23:07:14Z
Alex Theg

UCP encountered the following situation, which requires a new rule to catch:
* Author's original: `digit space en-dash space digit`
* This rule was applied: `space en-dash space` becomes `space em-dash space`
* The final result was this: `digit space em-dash space digit`
The current numeral span rule is that `hyphen/minus digit` gets converted to `en-dash digit`.
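
To make the interaction concrete, here is a small Python sketch of the two existing rules as described in this ticket, plus the rule it goes on to propose. The function names are hypothetical and the regexes are my reading of the rule descriptions, not the actual `ucp-text-macros.xsl` patterns.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def spaced_en_to_em(text: str) -> str:
    """Existing rule: `space en-dash space` becomes `space em-dash space`."""
    return text.replace(f" {EN_DASH} ", f" {EM_DASH} ")


def hyphen_digit_to_en(text: str) -> str:
    """Existing numeral span rule: `hyphen/minus digit` becomes `en-dash digit`."""
    return re.sub(r"-(?=\d)", EN_DASH, text)


def spaced_em_between_digits_to_en(text: str) -> str:
    """Proposed rule: `digit space em-dash space digit` becomes `digit en-dash digit`."""
    return re.sub(rf"(?<=\d) {EM_DASH} (?=\d)", EN_DASH, text)


if __name__ == "__main__":
    original = f"5 {EN_DASH} 7"            # author's original
    after_em = spaced_en_to_em(original)   # what the current rules produce
    fixed = spaced_em_between_digits_to_en(after_em)
    print(repr(original), repr(after_em), repr(fixed))
```

The lookbehind and lookahead keep the digits themselves out of the match, so an em dash between words is left alone.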
To it, we should add this rule: `digit space em-dash space digit` becomes `digit en-dash digit`

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/18
Error from 2-in-1 detect list sheet / Saxon version issue
2018-12-19T20:11:42Z
Alex Theg

@wendell I've got a question I'm stuck on for you:
I modified the `itemize-lists` sheet to add these detected itemized lists and wrap them in `ol`s. I also chained the two other sheets into one `DETECT-ITEMIZE-lists.xsl` sheet. That works in the IDE but not with my scripts. It seems to be a Saxon version issue: the IDE uses Saxon-HE 9-8-0-12, but the scripts use SaxonHE 9-8-0-1.
Using the original SaxonHE 9-8-0-1 processor with the scripts, I get this error:
```
Error at char 12 in xsl:variable/@select on line 42 column 48 of DETECT-ITEMIZE-lists.xsl:
FOXT0002: The transform option xslt-version is higher than the XSLT version supported by
this processor
at xsl:apply-templates (file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../applications/htmlevator/applications/list-detect/DETECT-ITEMIZE-lists.xsl#28)
processing /xsw:transform
The transform option xslt-version is higher than the XSLT version supported by this processor
```
Next, by swapping in 9-8-0-12 or higher in the scripts, the above complaint goes away, but then I get a new one about the UCP cleanups, which don't process:
```
Error at char 8 in xsl:sequence/@select on line 283 column 79 of ucp-text-macros-new.xsl:
FORX0002: Syntax error at char 8 in regular expression: Expected '{' after \112
at xsl:apply-templates (file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros-new.xsl#269)
processing xsw:sequence/xsw:match[5]
at xsl:apply-templates (file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros-new.xsl#503)
processing xsw:sequence
at xsl:apply-templates (file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros-new.xsl#269)
processing sequence/munge-quotes[1]
at xsl:apply-templates (file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros-new.xsl#186)
processing sequence
in built-in template rule for /html/body[1]/div[1]/p[1] in the unnamed mode
in built-in template rule for /html in the unnamed mode
Syntax error at char 8 in regular expression: Expected '{' after \112
Error on line 1 column 1 of List_test_2-11UCPTEXTED.xhtml:
SXXP0003: Error reported by XML parser: Premature end of file.
org.xml.sax.SAXParseException; systemId: file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../outputs/lists/List_test_2-11UCPTEXTED.xhtml; lineNumber: 1; columnNumber: 1; Premature end of file.
```
What I've done for the moment, to get it working (and it seems to), is remove the offending line (40) from the `DETECT-ITEMIZE-lists.xsl` sheet, and stick with 9-8-0-1:
```xml
'xslt-version' : xs:decimal($xslt-spec/@version),
```
2 questions:
1. What have I broken by doing this? :)
2. Any thoughts on the best way to fix the errors so I can add this back in? Probably by updating Saxon and the UCP macro sheet?
Thanks Wendell.

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/159
Word and Zotero citation formats
2019-02-12T23:25:31Z
Alex Theg

The two citation formats that would be good to capture are:
1. References added with Word's own References tool
2. Zotero references in Word
This ticket describes the format of both kinds of references, and we should decide on a target output format.
# MSWord References
References added from within recent versions of Word use the following format:
## Inline reference callouts from `word/document.xml`
The visible text in a Word document looks like this. The parenthetical is an inserted citation.
![citation_screen](/uploads/f40ed6d0a034ceea430611958027cca0/citation_screen.png)
Users choose one reference style for the document (APA, Chicago, etc.), which determines the format both of the bibliography and the inline reference callouts.
The OOXML for a citation is as follows:
```xml
<w:sdt>
<w:sdtPr><w:id w:val="836273690"/><w:citation/></w:sdtPr>
<w:sdtContent>
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r>
<w:instrText xml:space="preserve">CITATION Sam02 \l 1033
</w:instrText>
</w:r>
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
<w:r>
<w:rPr><w:noProof/></w:rPr>
<w:t>(Paul, 2002)</w:t>
</w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:sdtContent>
</w:sdt>
```
I believe the `Sam02` tag in this block - `<w:instrText xml:space="preserve">CITATION Sam02 \l 1033</w:instrText>` - is used to connect the callout to the corresponding full citation & data.
The above text is from the `word/document.xml` file. The full citation with all its data generally lives in a file called `customXML/item1.xml`. There's a bit more to it than that, but we'll consider it true for the purpose of this ticket.
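
As a sketch of how that tag could be pulled out of the field instruction, here is a small Python helper. The name `citation_tag` is hypothetical, and it assumes the tag is always the first whitespace-delimited token after `CITATION`, which is only what the single example above shows.

```python
import re
from typing import Optional

# "CITATION <tag> [switches...]": capture the first token after the keyword.
CITATION_FIELD = re.compile(r"\s*CITATION\s+(\S+)")


def citation_tag(instr_text: str) -> Optional[str]:
    """Return the source tag from a CITATION field instruction, else None."""
    match = CITATION_FIELD.match(instr_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(citation_tag("CITATION Sam02 \\l 1033 "))
```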
## Full citation data
Citation data, stored in a `customXML` file, takes the general format of the following example:
```xml
<b:Sources SelectedStyle="/APASixthEditionOfficeOnline.xsl" StyleName="APA" xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">
<b:Source>
<b:Tag>Sam02</b:Tag>
<b:SourceType>Book</b:SourceType>
<b:Guid>{A7E1436C-B7FE-2441-A039-CD8DA09E5981}</b:Guid>
<b:Author>
<b:Author>
<b:NameList>
<b:Person>
<b:Last>Paul</b:Last>
<b:First>Sam</b:First>
<b:Middle>Jim</b:Middle>
</b:Person>
</b:NameList>
</b:Author>
<b:Editor>
<b:NameList>
<b:Person>
<b:Last>Zimms</b:Last>
<b:First>Steven</b:First>
<b:Middle>Carl</b:Middle>
</b:Person>
</b:NameList>
</b:Editor>
<b:Translator>
<b:NameList>
<b:Person>
<b:Last>Slims</b:Last>
<b:First>Handy</b:First>
</b:Person>
</b:NameList>
</b:Translator>
</b:Author>
<b:Title>Cell</b:Title>
<b:City>Davis</b:City>
<b:StateProvince>CA</b:StateProvince>
<b:CountryRegion>USA</b:CountryRegion>
<b:Publisher>Avid</b:Publisher>
<b:Year>2002</b:Year>
<b:Volume>1</b:Volume>
<b:NumberVolumes>1</b:NumberVolumes>
<b:Pages>107</b:Pages>
<b:ShortTitle>C</b:ShortTitle>
<b:StandardNumber>2</b:StandardNumber>
<b:Edition>Fourth</b:Edition>
<b:Comments>Bob Loblaw's Law Blog</b:Comments>
<b:RefOrder>1</b:RefOrder>
</b:Source>
...
</b:Sources>
```
All of the references data lives inside a `<b:Sources>` tag. The `<Sources>` tag specifies namespaces and the Word document's specified bibliographic style. The chosen references style doesn't affect the format or content of the underlying source data.
```xml
<b:Sources
SelectedStyle="/APASixthEditionOfficeOnline.xsl" StyleName="APA"
xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
>
```
or, as another example:
```
<b:Sources
SelectedStyle="/CHICAGO.XSL" StyleName="Chicago"
...continues as above
```
There are many different citation types available in Word, which share some of the same data tags but also have some of their own.
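
To illustrate how a callout tag such as `Sam02` could be matched to its source record, here is a minimal Python sketch over a trimmed copy of the sample data above. The helper names are hypothetical, and it assumes each `<b:Source>` has at most one `<b:Tag>` and that the author path matches the Book example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

B_NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
NS = {"b": B_NS}


def sources_by_tag(sources_xml: str) -> Dict[str, ET.Element]:
    """Index every b:Source element by the text of its b:Tag child."""
    root = ET.fromstring(sources_xml)
    index = {}
    for source in root.findall("b:Source", NS):
        tag = source.findtext("b:Tag", namespaces=NS)
        if tag:
            index[tag] = source
    return index


def first_author_last_name(source: ET.Element) -> Optional[str]:
    """Last name of the first listed author, following the nesting in the sample."""
    return source.findtext(
        "b:Author/b:Author/b:NameList/b:Person/b:Last", namespaces=NS
    )


# Trimmed from the example in this ticket.
SAMPLE = f"""<b:Sources xmlns:b="{B_NS}" StyleName="APA">
  <b:Source>
    <b:Tag>Sam02</b:Tag>
    <b:SourceType>Book</b:SourceType>
    <b:Author><b:Author><b:NameList><b:Person>
      <b:Last>Paul</b:Last><b:First>Sam</b:First>
    </b:Person></b:NameList></b:Author></b:Author>
    <b:Title>Cell</b:Title>
    <b:Year>2002</b:Year>
  </b:Source>
</b:Sources>"""

if __name__ == "__main__":
    source = sources_by_tag(SAMPLE)["Sam02"]
    print(first_author_last_name(source), source.findtext("b:Year", namespaces=NS))
```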
## MS Word reference bibliographies
Authors can choose to insert an automatically-generated bibliography into the Word document, based off of the references cited. As XSweet stands currently, this bibliography does not get extracted and is removed from the HTML. However, it should be relatively simple to ensure that the bibliography is extracted and passed through as text into the HTML.
# Zotero citations
Zotero citations, inserted into Word documents via the Zotero plugin, can be set to appear as either endnotes or footnotes. They come out in the HTML exactly like any other endnotes and footnotes, with no special class or other identifiers. If there are existing, non-Zotero citation endnotes or footnotes in the .docx, the Zotero citations will mix in with them.
Zotero manages the text and format of the inserted references, and will also generate a bibliography from them. If a bibliography is inserted, it is extracted into the HTML as a simple series of `<p class="Bibliography" ...>` paragraphs, one for each source.
So unlike MS Word references, Zotero references embedded in Word are not tagged semantically.

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/160
Warnings and errors with Saxon 9.9.1.1
2019-03-05T18:28:54Z
Alex Theg

XSweet currently ships with Saxon 9.8.0.1. However, upgrading to use 9.9.1.1 causes some warnings and errors.
First, the newest version of Saxon shows the following 3 warnings, twice from `handle-notes.xsl` and once in the `make-header-escalator-xslt.xsl` sheet:
```
Warning in xsl:variable/@select on line 38 column 69 of handle-notes.xsl:
SXWN9000: in {div[}:
The keyword 'div' in this context means 'child::div'. If this was intended, use
'child::div' or './div' to avoid this warning.
Warning in xsl:variable/@select on line 51 column 70 of handle-notes.xsl:
SXWN9000: in {div[}:
The keyword 'div' in this context means 'child::div'. If this was intended, use
'child::div' or './div' to avoid this warning.
Warning in xsl:apply-templates/@select in make-header-escalator-xslt.xsl:
SXWN9000: in {div[}:
The keyword 'div' in this context means 'child::div'. If this was intended, use
'child::div' or './div' to avoid this warning.
```
This is just a noisy warning and doesn't cause any problems, but might be worth addressing.
The `ucp-text-macros.xsl` sheet, though, causes a breaking error:
```
Error evaluating (($original, ...)) in xsl:sequence/@select on line 283 column 79 of ucp-text-macros.xsl:
FORX0002: Syntax error at char 8 in regular expression: Expected '{' after \112. Failed
while atomizing the result of template match="xsw:sequence". Failed while atomizing the
result of template match="xsw:sequence"
In template rule with match="element(Q{http://coko.foundation/xsweet}sequence)" on line 258 of ucp-text-macros.xsl
invoked by xsl:apply-templates at file:/Users/atheg/Desktop/x_demo/koosum/XSweet_runner_scripts/XSweet-master-2edfadb9b06cf13d4f70701da62af809c9d55136/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#186
In template rule with match="element(Q{http://www.w3.org/1999/xhtml}p)" on line 173 of ucp-text-macros.xsl
invoked by built-in template rule (shallow-copy)
In template rule with match="text()[fn:not(...)]" on line 254 of ucp-text-macros.xsl
Syntax error at char 8 in regular expression: Expected '{' after \112. Failed while atomizing the result of template match="xsw:sequence". Failed while atomizing the result of template match="xsw:sequence"
Error on line 1 column 1 of test-11UCPTEXTED.xhtml:
SXXP0003: Error reported by XML parser: Premature end of file.
org.xml.sax.SAXParseException; systemId: file:/Users/atheg/Desktop/x_demo/koosum/XSweet_runner_scripts/XSweet-master-2edfadb9b06cf13d4f70701da62af809c9d55136/scripts/../outputs/test/test-11UCPTEXTED.xhtml; lineNumber: 1; columnNumber: 1; Premature end of file.
```
I'll examine further for a fix.

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/161
Add warning for images not supported by browsers
2019-05-03T18:45:49Z
Alex Theg

Word allows some embedded image formats that are not supported by web browsers (e.g. `.emf`).
When XSweet encounters exotic image formats like this, it may be worthwhile to add some kind of warning, be it:
* a warning to the STDOUT for the processor
* an inline comment in the HTML
* perhaps even a literal DOM element warning
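
A rough Python sketch of the check behind any of these options follows. The set of "browser-friendly" extensions, the function names, and the wording of the comment are all assumptions for illustration. Only `.emf` as an unsupported format comes from this ticket.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumed allow-list of formats browsers generally render; adjust as needed.
BROWSER_FRIENDLY = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp"}


def unsupported_images(paths: Iterable[str]) -> List[str]:
    """Return the image paths whose extension is not on the allow-list."""
    return [
        path
        for path in paths
        if PurePosixPath(path).suffix.lower() not in BROWSER_FRIENDLY
    ]


def warning_comment(path: str) -> str:
    """Hypothetical inline HTML comment for one unsupported image."""
    return f"<!-- XSweet warning: {path} may not display in web browsers -->"


if __name__ == "__main__":
    for path in unsupported_images(["word/media/image1.emf", "word/media/image2.PNG"]):
        print(warning_comment(path))  # the STDOUT option from the list above
```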
See https://gitlab.coko.foundation/xpub/xpub-epmc/issues/63 for an example.

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/19
Saxon processor and regex issues with {} vs {{}}
2019-05-08T00:53:59Z
Alex Theg

@wendell I'm curious about this group of errors I ran into, that has to do with whether line 466 of the `ucp-text-macros.xsl` sheet uses single or double curly brackets in the regex:
https://gitlab.coko.foundation/XSweet/HTMLevator/blob/08358bcc0cc60b69fbb508762438583d7d8c485e/applications/ucp-cleanup/ucp-text-macros.xsl#L466
Some versions of Saxon work only with the single brackets version of this sheet, but not with the double brackets, while others work only with the double brackets but not with single brackets.
This has been resolved for the moment but I'd like to understand it.
There are 3 different versions of Saxon I've used:
* `SaxonHE9-8-0-1J`: what we'd bundled with XSweet core, but I recently replaced it with
* `SaxonHE9-9-1-1J`: currently included with XSweet core
* `Saxon/C 1.1.2`: built from Saxon 9.8.0.15. Currently used in a PHP implementation of XSweet that's supporting the .docx uploads for Editoria instances being used in production.
The change that started this was this:
```
<xsl:variable name="livechar">[^\s\p{Ps}\p{Pe}"']</xsl:variable>
```
being changed to have more curly brackets:
```
<xsl:variable name="livechar">[^\s\p{{Ps}}\p{{Pe}}"']</xsl:variable>
```
With this change, running XSweet with `SaxonHE9-8-0-1J` didn't work. `ucp-text-macros.xsl` failed to produce an output, throwing the following error:
```
[] : "([^\s\p{{Ps}}\p{{Pe}}"'])
Error at char 8 in xsl:sequence/@select on line 289 column 75 of ucp-text-macros.xsl:
FORX0002: Syntax error at char 9 in regular expression: Unknown character category: {Ps
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#269)
processing xsw:sequence/xsw:match[5]
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#509)
processing xsw:sequence
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#269)
processing sequence/munge-quotes[1]
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#186)
processing sequence
in built-in template rule for /html/body[1]/div[1]/p[1] in the unnamed mode
in built-in template rule for /html in the unnamed mode
Syntax error at char 9 in regular expression: Unknown character category: {Ps
Error on line 1 column 1 of jures_img-12UCPTEXTED.xhtml:
SXXP0003: Error reported by XML parser: Premature end of file.
org.xml.sax.SAXParseException; systemId: file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../outputs/jures_img/jures_img-12UCPTEXTED.xhtml; lineNumber: 1; columnNumber: 1; Premature end of file.
```
So the runner_scripts broke. I reverted this back to single curly brackets, so it worked with `SaxonHE9-8-0-1J` again.
But, then I tried to deploy it to the production servers using `Saxon/C 1.1.2`, and got the following error message:
```
[] : "([^\s\p\p"'])
Error evaluating (($original, ...)) in xsl:sequence/@select on line 289 column 75 of ucp-text-macros.xsl:
FORX0002: Syntax error at char 8 in regular expression: Expected '{' after \112. Failed
while atomizing the result of template match="xsw:sequence". Failed while atomizing the
result of template match="xsw:sequence"
In template rule with match="element(Q{http://coko.foundation/xsweet}sequence)" on line 258 of ucp-text-macros.xsl
invoked by xsl:apply-templates at file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#186
In template rule with match="element(Q{http://www.w3.org/1999/xhtml}p)" on line 173 of ucp-text-macros.xsl
invoked by built-in template rule (shallow-copy)
In template rule with match="text()[fn:not(...)]" on line 254 of ucp-text-macros.xsl
Syntax error at char 8 in regular expression: Expected '{' after \112. Failed while atomizing the result of template match="xsw:sequence". Failed while atomizing the result of template match="xsw:sequence"
```
Adding the extra "{}"s back fixed the problem.
`SaxonHE9-9-1-1J` behaved like `Saxon/C 1.1.2`: it worked with the double brackets and threw the same error about the single ones.
Since those servers pull from the `master` branches of the XSweet repos when they run an update task, I replaced `SaxonHE9-8-0-1J` with `SaxonHE-9-9-1-1J`.
The runner scripts work again, we're using the double brackets, and all is well in the world again. But I'd be really interested to hear what your change was about :)
Thanks Wendell!

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/62
Handling whitespace-only formatting
2019-07-07T21:34:20Z
Alex Theg

Bakker ch1, see #56 for files
There are 5 headers of the same level, but one of them doesn't get promoted like the others. This seems to be caused by a `<tab>` at the end of the heading "The heroic migrant and the end of migration".
These all get promoted to h1:
* Keeping the monies flowing the times of crises
* The limits of migrant inclusion
* Migration, state-led transnationalism, and development
* The Washington Consensus and beyond: the continuing significance of market fundamentalism in development policy and practice
This one doesn't:
* The heroic migrant and the end of migration
Here's one that gets promoted, just after join-elements and before the header promotion steps:
````html
<p class="Default" style="font-size: 12pt; font-style: italic; margin-bottom: 6pt"><i>The limits of migrant inclusion</i></p>
````
This is the offending tab (at least, I think it's the tab keeping this from being recognized as a header):
````html
<p class="Default" style="font-size: 12pt; margin-bottom: 6pt"><i>The heroic migrant and the end of migration</i>
<tab/>
</p>
````
Perhaps a cleaning step that strips out trailing tabs before promotion? I can't think where a trailing tab would ever be meaningful.

1.0.0