XSweet issues
https://gitlab.coko.foundation/groups/XSweet/-/issues
2022-05-09T12:02:31Z
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/169
Preserve tab spaces
2022-05-09T12:02:31Z
Bharathydasan
Tab characters are converted to a `<span class="tab"> <!-- tab --></span>` element in the initial conversion, but that element is removed later in the conversion pipeline, at the `15EDITORIABASIC.xhtml` step, where the span elements are stripped. Track changes for tab characters, however, are preserved properly by the `<ins>` and `<del>` handling that was added earlier.
Only the elements `<pre>` and `<code>` in HTML can render the tab spaces. My question is how these tab spaces are handled in `Editoria` or `Wax`.
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/168
test
2020-05-09T16:10:56Z
Ghost User
test
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/167
Support options for different handling of track changes
2020-09-20T20:46:25Z
Alex Theg
Now that the basic functionality to extract character insertion/deletion track changes into HTML is part of the initial extraction sheet, it's easy to comment that bit out if that's not desired.
We should consider how to give options to XSweet users who may want to handle track changes differently: accept them all, reject them all, drop them at a later stage, etc.
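To make the accept/reject options concrete, here is a minimal Python sketch of what each choice would do to extracted HTML that carries `<ins>` and `<del>` markup. It is only an illustration: XSweet itself is XSLT, the function name `resolve_track_changes` is hypothetical, and the regex approach assumes simple, non-nested `<ins>`/`<del>` elements.

```python
import re


def resolve_track_changes(html: str, mode: str) -> str:
    """Accept or reject all tracked changes in a fragment of HTML.

    Hypothetical helper: assumes <ins>/<del> elements are not nested.
    """
    if mode not in ("accept", "reject"):
        raise ValueError("mode must be 'accept' or 'reject'")
    # Accepting keeps inserted text and drops deleted text; rejecting is the reverse.
    keep, drop = ("ins", "del") if mode == "accept" else ("del", "ins")
    # Remove the dropped elements together with their content.
    html = re.sub(rf"<{drop}\b[^>]*>.*?</{drop}>", "", html, flags=re.S)
    # Unwrap the kept elements, leaving their content in place.
    return re.sub(rf"</?{keep}\b[^>]*>", "", html)
```

A runtime flag would then simply select the `mode` value, with a third option that leaves the markup untouched.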
* This could possibly be managed with runtime flags
* We could also have a standalone sheet in the pipeline that a user could tweak to specify how to handle track changes: accept them all, reject them all, etc. This could work by passing in a runtime option, commenting/uncommenting the desired sections, etc.
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/166
Track changes for tables?
2020-10-29T08:27:20Z
Alex Theg
Tracking changes to tables came up in this week's discussion with @bharathydasan and @wendell. There may not ultimately be a need for this, but if there is, it can come after the initial TC implementation.
This could also be potentially very complex. @wendell suggested that as a way to limit the scope/complexity of a solution, a simple whole-table before/after comparison could be surfaced, rather than a granular representation of the specific changes.
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/165
Paragraph-level formatting track changes
2020-06-17T02:26:16Z
Alex Theg
This is a placeholder ticket for handling paragraph-level track changes. See also #162 and #164 for additional context.
Paragraph-level formatting changes will be ignored for a first implementation of track changes. However, they may merit revisiting at some point. From @bharathydasan:
> there are few things stored in paragraph properties such as center, left, and right alignments that need attention
https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/20
Make list extraction and list detection consistent with each other
2019-07-28T21:49:37Z
Alex Theg
* `XSweet/applications/list-promote/PROMOTE-lists.xsl` rebuilds true lists from Word as HTML lists.
* `HTMLevator/applications/list-detect/DETECT-ITEMIZE-LISTS.xsl` detects plain-text versions of numbered lists, and creates true HTML `ol`s out of them.
Their functionality should be standardized:
* List extraction puts the list item text inside a `p`. List detection does not wrap the list item text in a `p`.
* List extraction also extracts some additional information from the .docx, including list level. While it's not present on detected lists, the detection sheet should at least not remove these attributes, but it does.
[List_test_2-7PROMOTELISTS.xhtml](/uploads/8d4c60289eaa849a47406437f31f0787/List_test_2-7PROMOTELISTS.xhtml)
[List_test_2-8DETECTLISTS.xhtml](/uploads/10cc8cdca140e52abed28d68602c3ed8/List_test_2-8DETECTLISTS.xhtml)
https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/19
Saxon processor and regex issues with {} vs {{}}
2019-05-08T00:53:59Z
Alex Theg
@wendell I'm curious about a group of errors I ran into, which have to do with whether line 466 of the `ucp-text-macros.xsl` sheet uses single or double curly brackets in the regex:
https://gitlab.coko.foundation/XSweet/HTMLevator/blob/08358bcc0cc60b69fbb508762438583d7d8c485e/applications/ucp-cleanup/ucp-text-macros.xsl#L466
Some versions of Saxon work only with the single brackets version of this sheet, but not with the double brackets, while others work only with the double brackets but not with single brackets.
This has been resolved for the moment but I'd like to understand it.
There are 3 different versions of Saxon I've used:
* `SaxonHE9-8-0-1J`: what we'd bundled with XSweet core, but I recently replaced it with
* `SaxonHE9-9-1-1J`: currently included with XSweet core
* `Saxon/C 1.1.2`: built from Saxon 9.8.0.15. Currently used in a PHP implementation of XSweet that's supporting the .docx uploads for Editoria instances being used in production.
The change that started this was this:
```
<xsl:variable name="livechar">[^\s\p{Ps}\p{Pe}"']</xsl:variable>
```
being changed to have more curly brackets:
```
<xsl:variable name="livechar">[^\s\p{{Ps}}\p{{Pe}}"']</xsl:variable>
```
With this change, running XSweet with `SaxonHE9-8-0-1J` didn't work. `ucp-text-macros.xsl` failed to produce an output, throwing the following error:
```
[] : "([^\s\p{{Ps}}\p{{Pe}}"'])
Error at char 8 in xsl:sequence/@select on line 289 column 75 of ucp-text-macros.xsl:
FORX0002: Syntax error at char 9 in regular expression: Unknown character category: {Ps
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#269)
processing xsw:sequence/xsw:match[5]
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#509)
processing xsw:sequence
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#269)
processing sequence/munge-quotes[1]
at xsl:apply-templates (file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#186)
processing sequence
in built-in template rule for /html/body[1]/div[1]/p[1] in the unnamed mode
in built-in template rule for /html in the unnamed mode
Syntax error at char 9 in regular expression: Unknown character category: {Ps
Error on line 1 column 1 of jures_img-12UCPTEXTED.xhtml:
SXXP0003: Error reported by XML parser: Premature end of file.
org.xml.sax.SAXParseException; systemId: file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../outputs/jures_img/jures_img-12UCPTEXTED.xhtml; lineNumber: 1; columnNumber: 1; Premature end of file.
```
So the runner_scripts broke. I reverted this back to single curly brackets, so it worked with `SaxonHE9-8-0-1J` again.
But, then I tried to deploy it to the production servers using `Saxon/C 1.1.2`, and got the following error message:
```
[] : "([^\s\p\p"'])
Error evaluating (($original, ...)) in xsl:sequence/@select on line 289 column 75 of ucp-text-macros.xsl:
FORX0002: Syntax error at char 8 in regular expression: Expected '{' after \112. Failed
while atomizing the result of template match="xsw:sequence". Failed while atomizing the
result of template match="xsw:sequence"
In template rule with match="element(Q{http://coko.foundation/xsweet}sequence)" on line 258 of ucp-text-macros.xsl
invoked by xsl:apply-templates at file:/Users/atheg/Desktop/runner_test/XSweet_runner_scripts/XSweet-master-3baf18b7dcc799a1a4a7a21c533404b8b100de0b/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#186
In template rule with match="element(Q{http://www.w3.org/1999/xhtml}p)" on line 173 of ucp-text-macros.xsl
invoked by built-in template rule (shallow-copy)
In template rule with match="text()[fn:not(...)]" on line 254 of ucp-text-macros.xsl
Syntax error at char 8 in regular expression: Expected '{' after \112. Failed while atomizing the result of template match="xsw:sequence". Failed while atomizing the result of template match="xsw:sequence"
```
Adding the extra "{}"s back fixed the problem.
`SaxonHE9-9-1-1J` behaved like `Saxon/C 1.1.2`: it worked with the double brackets and threw the same error about the single ones.
Since those servers pull from the `master` branches of the XSweet repos when they run an update task, I replaced `SaxonHE9-8-0-1J` with `SaxonHE9-9-1-1J`.
The runner scripts work again, we're using the double brackets, and all is well in the world again. But I'd be really interested to hear what your change was about :)
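One reading that is consistent with both error messages above, though not confirmed anywhere in this ticket, is that the processors differ in whether they treat braces in the variable's text content as an XSLT 3.0 text value template. The Python toy below models only that hypothesis; it is not Saxon's code, and `expand_text_value_template` is a made-up name. With expansion on, `{{` and `}}` collapse to literal braces and a single-braced `{Ps}` is evaluated as an expression, which is modeled here as selecting nothing.

```python
import re

SINGLE = r"""[^\s\p{Ps}\p{Pe}"']"""
DOUBLE = r"""[^\s\p{{Ps}}\p{{Pe}}"']"""


def expand_text_value_template(text: str, enabled: bool) -> str:
    """Toy model of brace handling in stylesheet text (an assumption, not Saxon)."""
    if not enabled:
        # Braces are passed through untouched, so doubled braces reach the regex engine.
        return text

    def replace(match: re.Match) -> str:
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        # {expr}: stand-in for an expression that selects nothing.
        return ""

    return re.sub(r"\{\{|\}\}|\{[^{}]*\}", replace, text)


print(expand_text_value_template(SINGLE, enabled=True))
print(expand_text_value_template(DOUBLE, enabled=True))
print(expand_text_value_template(DOUBLE, enabled=False))
```

Under this model, single braces with expansion on leave a bare `\p\p`, matching the "Expected '{'" error, and double braces with expansion off leave `\p{{Ps}}`, matching the "Unknown character category: {Ps" error.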
Thanks Wendell!
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/161
Add warning for images not supported by browsers
2019-05-03T18:45:49Z
Alex Theg
Word allows some embedded image formats that are not supported by web browsers (e.g. `.emf`).
When XSweet encounters exotic image formats like this, it may be worthwhile to add some kind of warning, be it:
* a warning to the STDOUT for the processor
* an inline comment in the HTML
* perhaps even a literal DOM element warning
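As a rough illustration of the first option, the check could be as simple as comparing each embedded image's extension against a list of formats browsers are known to display. The sketch below is Python rather than XSLT, the function name `image_warnings` is hypothetical, and the `BROWSER_SAFE` set is an assumption that would need to be agreed on.

```python
from pathlib import PurePosixPath

# Assumed set of browser-friendly formats; adjust to whatever the project decides.
BROWSER_SAFE = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp"}


def image_warnings(image_paths):
    """Return one warning string per image whose extension is not browser-safe."""
    warnings = []
    for path in image_paths:
        suffix = PurePosixPath(path).suffix.lower()
        if suffix not in BROWSER_SAFE:
            warnings.append(
                f"Warning: {path} uses format '{suffix}', which browsers may not display"
            )
    return warnings


for message in image_warnings(["word/media/image1.png", "word/media/image2.emf"]):
    print(message)
```

The same strings could instead be written into the HTML as comments or as visible elements, per the other two options.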
See https://gitlab.coko.foundation/xpub/xpub-epmc/issues/63 for an example.
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/160
Warnings and errors with Saxon 9.9.1.1
2019-03-05T18:28:54Z
Alex Theg
XSweet currently ships with Saxon 9.8.0.1. However, upgrading to use 9.9.1.1 causes some warnings and errors.
First, the newest version of Saxon shows the following 3 warnings, twice from `handle-notes.xsl` and once in the `make-header-escalator-xslt.xsl` sheet:
```
Warning in xsl:variable/@select on line 38 column 69 of handle-notes.xsl:
SXWN9000: in {div[}:
The keyword 'div' in this context means 'child::div'. If this was intended, use
'child::div' or './div' to avoid this warning.
Warning in xsl:variable/@select on line 51 column 70 of handle-notes.xsl:
SXWN9000: in {div[}:
The keyword 'div' in this context means 'child::div'. If this was intended, use
'child::div' or './div' to avoid this warning.
Warning in xsl:apply-templates/@select in make-header-escalator-xslt.xsl:
SXWN9000: in {div[}:
The keyword 'div' in this context means 'child::div'. If this was intended, use
'child::div' or './div' to avoid this warning.
```
This is just a noisy warning and doesn't cause any problems, but might be worth addressing.
The `ucp-text-macros.xsl` sheet, though, causes a breaking error:
```
Error evaluating (($original, ...)) in xsl:sequence/@select on line 283 column 79 of ucp-text-macros.xsl:
FORX0002: Syntax error at char 8 in regular expression: Expected '{' after \112. Failed
while atomizing the result of template match="xsw:sequence". Failed while atomizing the
result of template match="xsw:sequence"
In template rule with match="element(Q{http://coko.foundation/xsweet}sequence)" on line 258 of ucp-text-macros.xsl
invoked by xsl:apply-templates at file:/Users/atheg/Desktop/x_demo/koosum/XSweet_runner_scripts/XSweet-master-2edfadb9b06cf13d4f70701da62af809c9d55136/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros.xsl#186
In template rule with match="element(Q{http://www.w3.org/1999/xhtml}p)" on line 173 of ucp-text-macros.xsl
invoked by built-in template rule (shallow-copy)
In template rule with match="text()[fn:not(...)]" on line 254 of ucp-text-macros.xsl
Syntax error at char 8 in regular expression: Expected '{' after \112. Failed while atomizing the result of template match="xsw:sequence". Failed while atomizing the result of template match="xsw:sequence"
Error on line 1 column 1 of test-11UCPTEXTED.xhtml:
SXXP0003: Error reported by XML parser: Premature end of file.
org.xml.sax.SAXParseException; systemId: file:/Users/atheg/Desktop/x_demo/koosum/XSweet_runner_scripts/XSweet-master-2edfadb9b06cf13d4f70701da62af809c9d55136/scripts/../outputs/test/test-11UCPTEXTED.xhtml; lineNumber: 1; columnNumber: 1; Premature end of file.
```
I'll examine further for a fix.
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/159
Word and Zotero citation formats
2019-02-12T23:25:31Z
Alex Theg
The two citation formats that would be good to capture are:
1. References added with Word's own References tool
2. Zotero references in Word
This ticket describes the format of both kinds of references, and we should decide on a target output format.
# MSWord References
References added from within recent versions of Word use the following format:
## Inline reference callouts from `word/document.xml`
The visible text in a Word document looks like this. The parenthetical is an inserted citation.
![citation_screen](/uploads/f40ed6d0a034ceea430611958027cca0/citation_screen.png)
Users choose one reference style for the document (APA, Chicago, etc.), which determines the format both of the bibliography and the inline reference callouts.
The OOXML for a citation is as follows:
```xml
<w:sdt>
<w:sdtPr><w:id w:val="836273690"/><w:citation/></w:sdtPr>
<w:sdtContent>
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r>
<w:instrText xml:space="preserve">CITATION Sam02 \l 1033
</w:instrText>
</w:r>
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
<w:r>
<w:rPr><w:noProof/></w:rPr>
<w:t>(Paul, 2002)</w:t>
</w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:sdtContent>
</w:sdt>
```
I believe the `Sam02` tag in this block - `<w:instrText xml:space="preserve">CITATION Sam02 \l 1033</w:instrText>` - is used to connect the callout to the corresponding full citation & data.
The above text is from the `word/document.xml` file. The full citation with all its data generally lives in a file called `customXML/item1.xml`. There's a bit more to it than that, but we'll consider it true for the purpose of this ticket.
## Full citation data
Citation data, stored in a `customXML` file, takes the general format of the following example:
```xml
<b:Sources SelectedStyle="/APASixthEditionOfficeOnline.xsl" StyleName="APA" xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">
<b:Source>
<b:Tag>Sam02</b:Tag>
<b:SourceType>Book</b:SourceType>
<b:Guid>{A7E1436C-B7FE-2441-A039-CD8DA09E5981}</b:Guid>
<b:Author>
<b:Author>
<b:NameList>
<b:Person>
<b:Last>Paul</b:Last>
<b:First>Sam</b:First>
<b:Middle>Jim</b:Middle>
</b:Person>
</b:NameList>
</b:Author>
<b:Editor>
<b:NameList>
<b:Person>
<b:Last>Zimms</b:Last>
<b:First>Steven</b:First>
<b:Middle>Carl</b:Middle>
</b:Person>
</b:NameList>
</b:Editor>
<b:Translator>
<b:NameList>
<b:Person>
<b:Last>Slims</b:Last>
<b:First>Handy</b:First>
</b:Person>
</b:NameList>
</b:Translator>
</b:Author>
<b:Title>Cell</b:Title>
<b:City>Davis</b:City>
<b:StateProvince>CA</b:StateProvince>
<b:CountryRegion>USA</b:CountryRegion>
<b:Publisher>Avid</b:Publisher>
<b:Year>2002</b:Year>
<b:Volume>1</b:Volume>
<b:NumberVolumes>1</b:NumberVolumes>
<b:Pages>107</b:Pages>
<b:ShortTitle>C</b:ShortTitle>
<b:StandardNumber>2</b:StandardNumber>
<b:Edition>Fourth</b:Edition>
<b:Comments>Bob Loblaw's Law Blog</b:Comments>
<b:RefOrder>1</b:RefOrder>
</b:Source>
...
</b:Sources>
```
All of the references data lives inside a `<b:Sources>` tag. The `<b:Sources>` tag specifies namespaces and the Word document's specified bibliographic style. The chosen references style doesn't affect the format or content of the underlying source data.
```xml
<b:Sources
SelectedStyle="/APASixthEditionOfficeOnline.xsl" StyleName="APA"
xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
>
```
or, as another example:
```
<b:Sources
SelectedStyle="/CHICAGO.XSL" StyleName="Chicago"
...continues as above
```
There are many different citation types available in Word, which share some of the same data tags but also have some of their own.
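If the callout really is linked to its source through the tag, as guessed above, the lookup could be sketched like this in Python. This is an illustration only: `citation_tag` and `find_source` are hypothetical helpers, the sample XML is a cut-down copy of the example in this ticket, and the link via `<b:Tag>` is the assumption being tested.

```python
import xml.etree.ElementTree as ET

B = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"

SOURCES_XML = f"""<b:Sources xmlns:b="{B}">
  <b:Source>
    <b:Tag>Sam02</b:Tag>
    <b:SourceType>Book</b:SourceType>
    <b:Title>Cell</b:Title>
    <b:Year>2002</b:Year>
  </b:Source>
</b:Sources>"""


def citation_tag(instr_text):
    """Pull the tag out of a field instruction such as 'CITATION Sam02 \\l 1033'."""
    parts = instr_text.split()
    if len(parts) >= 2 and parts[0] == "CITATION":
        return parts[1]
    return None


def find_source(sources_xml, tag):
    """Return the simple (leaf) fields of the <b:Source> whose <b:Tag> matches."""
    root = ET.fromstring(sources_xml)
    for source in root.findall(f"{{{B}}}Source"):
        if source.findtext(f"{{{B}}}Tag") == tag:
            # Skip nested structures such as the author name lists.
            return {child.tag.split("}")[1]: child.text
                    for child in source if len(child) == 0}
    return None


tag = citation_tag(r"CITATION Sam02 \l 1033")
print(tag, find_source(SOURCES_XML, tag))
```

Nested data such as the author, editor, and translator name lists is deliberately left out here and would need its own handling.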
## MS Word reference bibliographies
Authors can choose to insert an automatically-generated bibliography into the Word document, based off of the references cited. As XSweet stands currently, this bibliography does not get extracted and is removed from the HTML. However, it should be relatively simple to ensure that the bibliography is extracted and passed through as text into the HTML.
# Zotero citations
Zotero citations, inserted into Word documents via the Zotero plugin, can be set to appear as either endnotes or footnotes. They come out in the HTML exactly like any other endnotes and footnotes, with no special class or other identifiers. If there are existing, non-Zotero citation endnotes or footnotes in the .docx, the Zotero citations will mix in with them.
Zotero manages the text and format of the inserted references, and will also generate a bibliography from them. If a bibliography is inserted, it is extracted into the HTML as a simple series of `<p class="Bibliography" ...>` paragraphs, one for each source.
So unlike MS Word references, Zotero references embedded in Word are not tagged semantically.
https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/18
Error from 2-in-1 detect list sheet / Saxon version issue
2018-12-19T20:11:42Z
Alex Theg
@wendell I've got a question I'm stuck on for you:
I modified the `itemize-lists` sheet to add these detected itemized lists and wrap them in `ol`s. I also chained the two other sheets into one `DETECT-ITEMIZE-lists.xsl` sheet. That works in the IDE but not with my scripts. It seems to be a Saxon version issue: the IDE uses Saxon-HE 9-8-0-12, but the scripts use SaxonHE 9-8-0-1.
Using the original SaxonHE 9-8-0-1 processor with the scripts, I get this error:
```
Error at char 12 in xsl:variable/@select on line 42 column 48 of DETECT-ITEMIZE-lists.xsl:
FOXT0002: The transform option xslt-version is higher than the XSLT version supported by
this processor
at xsl:apply-templates (file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../applications/htmlevator/applications/list-detect/DETECT-ITEMIZE-lists.xsl#28)
processing /xsw:transform
The transform option xslt-version is higher than the XSLT version supported by this processor
```
Next, by swapping in 9-8-0-12 or higher in the scripts, the above complaint goes away, but then I get a new one about the UCP cleanups, which don't process:
```
Error at char 8 in xsl:sequence/@select on line 283 column 79 of ucp-text-macros-new.xsl:
FORX0002: Syntax error at char 8 in regular expression: Expected '{' after \112
at xsl:apply-templates (file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros-new.xsl#269)
processing xsw:sequence/xsw:match[5]
at xsl:apply-templates (file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros-new.xsl#503)
processing xsw:sequence
at xsl:apply-templates (file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros-new.xsl#269)
processing sequence/munge-quotes[1]
at xsl:apply-templates (file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../applications/htmlevator/applications/ucp-cleanup/ucp-text-macros-new.xsl#186)
processing sequence
in built-in template rule for /html/body[1]/div[1]/p[1] in the unnamed mode
in built-in template rule for /html in the unnamed mode
Syntax error at char 8 in regular expression: Expected '{' after \112
Error on line 1 column 1 of List_test_2-11UCPTEXTED.xhtml:
SXXP0003: Error reported by XML parser: Premature end of file.
org.xml.sax.SAXParseException; systemId: file:/Users/atheg/Desktop/lists_develop/recent_commit/test/XSweet_runner_scripts/XSweet-master-36ec4971e6213e2891146e67b2be5efe570a4484/scripts/../outputs/lists/List_test_2-11UCPTEXTED.xhtml; lineNumber: 1; columnNumber: 1; Premature end of file.
```
What I've done for the moment, to get it working (and it seems to), is remove the offending line (40) from the `DETECT-ITEMIZE-lists.xsl` sheet, and stick with 9-8-0-1:
```xml
'xslt-version' : xs:decimal($xslt-spec/@version),
```
2 questions:
1. What have I broken by doing this? :)
2. Any thoughts on the best way to fix the errors so I can add this back in? Probably by updating Saxon and the UCP macro sheet?
Thanks Wendell.
https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/40
Preserve inline formatting on note callouts
2018-10-15T19:51:48Z
Alex Theg
The editors at UCP have found a strange bug related to how notes appear in Wax when the note callouts (the inline note numbers in the main content) have inline formatting applied to them (underlining, italics, bold, etc.).
The root cause of the issue is a bug in the Substance JavaScript library that Wax is built with, but we can avoid the bug altogether with a tweak to XSweet:
If a note callout has inline formatting tags applied to it, those tags are preserved in the final HTML extraction from XSweet core. But, when Typescript transforms the note and callout formats from their HTML to the Wax-specific note and callout format, any inline formatting tags applied on the note callout are dropped.
The fix for the bug is to update Typescript to preserve inline formatting tags on the note callouts all the way through Typescript.
Here's an example of an instance that causes the bug in Substance:
The original Word .docx here has an italicized note callout. The whole paragraph is italicized, with italics before, on, and after the note callout:
![Screen_Shot_2018-09-24_at_10.32.27_AM](/uploads/e988514d9dd04aa5d4e39766e61cf187/Screen_Shot_2018-09-24_at_10.32.27_AM.png)
Here's the result of XSweet Core - the inline italics on the note callout are still there (the `<i>`s inside the `<span class="EndnoteReference">`):
```html
<p style="font-size: 14pt">
<span style="font-size: 14pt">“
<i>When you go out to battle... that makes war with you, until it is subdued.</i>
<span class="EndnoteReference">
<i><a class="endnoteReference" href="#en6">6</a></i>
</span>
<i>“</i>
</span>
</p>
```
And, you can see that in this final result that's ported into Editoria, the `<i>` tags have all been converted to `<em>` tags, and the tags within the note callout have been dropped:
```html
<p>"
<em>When you go out to battle that makes war with you, until it is subdued.
</em>
<note data-id="en6" />
<em>"</em>
</p>
```
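To illustrate the desired behaviour, here is a small Python sketch that rewrites a callout to the `<note>` form while keeping any inline formatting wrapped around it. It is not how Typescript is implemented (that is XSLT); `convert_callouts` is a hypothetical name, and the regex assumes the exact callout markup shown above, with `<i>` mapped to `<em>` as in the Editoria output.

```python
import re

# Matches the callout span, capturing any inline tags directly around the link.
CALLOUT = re.compile(
    r'<span class="EndnoteReference">\s*'
    r'(?P<open>(?:<(?:i|b|u|em)>\s*)*)'
    r'<a class="endnoteReference" href="#(?P<id>[^"]+)">[^<]*</a>'
    r'(?P<close>(?:\s*</(?:i|b|u|em)>)*)\s*'
    r'</span>'
)


def convert_callouts(html: str) -> str:
    """Replace callout spans with <note/> while preserving inline formatting tags."""
    def replace(match: re.Match) -> str:
        opening = re.sub(r"\s+", "", match["open"]).replace("<i>", "<em>")
        closing = re.sub(r"\s+", "", match["close"]).replace("</i>", "</em>")
        return f'{opening}<note data-id="{match["id"]}" />{closing}'

    return CALLOUT.sub(replace, html)


sample = (
    '<span class="EndnoteReference">\n'
    '<i><a class="endnoteReference" href="#en6">6</a></i>\n'
    '</span>'
)
print(convert_callouts(sample))
```

With the italicized callout from the example, the note element ends up wrapped in `<em>` rather than losing its formatting.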
The request for this ticket is: if there are `<u>`, `<b>`, or `<em>` tags enclosing a note callout, preserve those through Typescript.
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/158
Some more Word detritus
2018-10-10T04:56:20Z
Wendell Piez
To catch WordML elements so far unaccounted for -- we should consider the following matching and whether there isn't info to be captured e.g. from `caps` or `highlight`. The rest should be cleaned up in a "scrub" phase:
```
<xsl:template match="noProof | iCs">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="caps | spacing | highlight | webHidden">
<span class="{local-name()}">
<xsl:apply-templates/>
</span>
</xsl:template>
```
https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/17
Add another numeral span rule
2018-11-05T23:07:14Z
Alex Theg
UCP encountered the following situation, which requires a new rule to catch:
* Author's original: `digit space en-dash space digit`
* This rule was applied: `space en-dash space` becomes `space em-dash space`
* The final result was this: `digit space em-dash space digit`
The current numeral span rule is that `hyphen/minus digit` gets converted to `en-dash digit`.
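A rule that catches the situation in the list above can be sketched as a single regular expression. The Python below is only an illustration of the matching logic (the real rules live in the XSLT text macros), `fix_numeral_span` is a hypothetical name, and "space" is assumed to mean a plain ASCII space.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"

# An em-dash with one space on each side, sitting directly between two digits.
SPACED_EM_DASH_BETWEEN_DIGITS = re.compile(rf"(?<=\d) {EM_DASH} (?=\d)")


def fix_numeral_span(text: str) -> str:
    """Collapse 'digit space em-dash space digit' to 'digit en-dash digit'."""
    return SPACED_EM_DASH_BETWEEN_DIGITS.sub(EN_DASH, text)


print(fix_numeral_span(f"pages 10 {EM_DASH} 12"))
```

Because the digits are matched with lookarounds, a spaced em-dash between words is left alone and only numeral spans are changed.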
To it, we should add this rule: `digit space em-dash space digit` becomes `digit en-dash digit`
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/154
Extract math from Word
2021-01-21T11:56:02Z
Alex Theg
It looks like there are 2 main ways of embedding math into .docx files (other than plain text):
1. Using the built-in equation editor. This uses a tag XML structure - no binaries, it's all inline:
```xml
<m:oMathPara>
<m:oMath>
```
2. MathType, the most common math add-on for Word, which uses math binaries that need to be extracted.
Both of these should be represented in MathML (as the standard for HTML5). It looks like we will have to define the mapping for the first option, which could be pretty time consuming. For MathType, we'll need to convert the binaries. @jure's made a Ruby gem that converts from MathType to MathML. It may be that we'll need to do a rewrite of this to use it, but it could be a helpful resource.
Alex Theg
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/153
Update XSweet to work with outline and list HTML attributes
2018-08-07T12:51:33Z
Alex Theg
After #152 is finished, and these are added as HTML attributes:
* `-xsweet-outline-level`
* `-xsweet-list-level`
We will need to:
1. Update how list handling works to use the new attribute
2. Update where heading promotion looks for this data
3. Remove the above information from the CSS `style`
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/152
Semantic data mixed with style data?
2018-08-30T07:25:11Z
Bruno Herfst
hello@brunoherfst.com
I noticed that XSweet saves semantic data as style info:
<p style="font-family: Tahoma; font-size: 18pt; -xsweet-outline-level: 1">
Should `-xsweet-outline-level` become a `data-*` attribute?
<p style="font-family: Tahoma; font-size: 18pt;" data-xsweet-outline-level="1">
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/151
Update binary references to use extracted copies, rather than originals
2018-08-08T09:07:37Z
Alex Theg
Things like embedded images, media, and math are all stored in the .docx directory. For the HTML extraction, these files should be copied over to the same directory as the HTML files. That way, they're easily accessible, and the HTML doesn't require the input .docx file to stay where it originally was. However, XSLT doesn't allow for file system manipulation by itself. That task will fall to INK, which is slated to be rebuilt in JavaScript (rather than RoR). Once that is complete, XSweet should be updated to reference copies of the binaries in the output directory, rather than directly referencing the binaries of the original .docx file.
https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/16
Test heading promotion method chooser
2018-07-26T22:00:41Z
Alex Theg
The criteria used for determining the heading promotion method are as follows:
* If the extracted HTML contains 2 or more `xsweet-outline-level` properties, then use the outline-level heading promotion
* Else, use the property-based classic method
These are OK for now, but could benefit from further refinement.
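The current criteria amount to a simple threshold check, sketched here in Python for clarity. The function name `choose_heading_method` and the returned labels are hypothetical, and counting raw substring occurrences is a simplification of whatever the sheet actually inspects.

```python
def choose_heading_method(html: str, threshold: int = 2) -> str:
    """Pick a heading promotion method from the extracted HTML.

    Counts occurrences of the `xsweet-outline-level` property; at or above
    the threshold, the outline-level method is chosen.
    """
    count = html.count("xsweet-outline-level")
    return "outline-level" if count >= threshold else "property-based"


sample = (
    '<p style="font-size: 18pt; -xsweet-outline-level: 1">One</p>'
    '<p style="font-size: 14pt; -xsweet-outline-level: 2">Two</p>'
)
print(choose_heading_method(sample))
```

Exposing the threshold as a parameter is one obvious place where the refinement mentioned above could start.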
This is carried forward from the previous issue: https://gitlab.coko.foundation/XSweet/XSweet/issues/123
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/150
Hyperlinks in footnotes broken
2018-07-27T16:58:45Z
Bruno Herfst
hello@brunoherfst.com
Hyperlinks in footnotes become an internal DOC reference:
<a href="../customXml/item1.xml">
Expected it to be:
<a href="http://www.example.com">
[footnote-hyperlink.docx](/uploads/c03e49e543009d239dd03b2fb3606dca/footnote-hyperlink.docx)