XSweet issues: https://gitlab.coko.foundation/groups/XSweet/-/issues (2022-05-12T11:46:58Z)

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/48
Split docx files at Heading 1 (2022-05-12T11:46:58Z, Ryan Dix-Peek)

**Description:** the purpose of this task is to support splitting a single docx file imported into Editoria. Editoria supports importing one docx file per chapter. This solution will allow a user to select an option to split a single docx file into multiple chapters.
![Screenshot_2021-11-17_at_14.31.31](/uploads/72f5f3a3d887b57b380155d1bde5aaaa/Screenshot_2021-11-17_at_14.31.31.png)
The proposed solution: introduce an XSweet pipeline that ~~would parse out files at an `h1` level~~ [see below comment] using XSLTs. A user interface will handle the selection of either 1) a multiple-docx or 2) a single-docx import. If a single docx is selected, the uploaded file will be split into chapters using ~~a `Heading 1` as a reference tag~~ (see comment below) to action the chapter split.
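As a rough illustration only (not the XSweet XSLT, and with the split marker left configurable, since the `h1` choice was later revised per the comments), splitting an HTML body into chapters at a heading boundary can be sketched as:

```python
import re

def split_chapters(html_body, tag="h1"):
    """Split an HTML body string into chapters, one per opening <tag>.

    A minimal sketch with a hypothetical function name; the tag is a
    parameter because the split marker was still under discussion.
    """
    # Zero-width lookahead keeps the heading with the chapter it opens.
    parts = re.split(rf"(?=<{tag}[\s>])", html_body)
    return [p for p in parts if p.strip()]

chapters = split_chapters("<h1>One</h1><p>a</p><h1>Two</h1><p>b</p>")
assert len(chapters) == 2
assert chapters[1].startswith("<h1>Two")
```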
Reference: Editoria docx splitter issue: https://gitlab.coko.foundation/editoria/editoria/-/issues/481
**Acceptance criteria:**
- a new XSweet pipeline will convert a single file into multiple chapters ~~split by `h1`~~ (see below comment).
- the pipeline that handles the conversion of multiple `h1` elements into `h2` and `h4` should be disabled for a single docx import; see #46

Suki Venkat

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/21
Reference lists imported with extra redundant numbering / Literal numbers not removed after list detection (2021-12-23T15:38:23Z, Bharathydasan)
While detecting and converting normal paragraphs to a list, literal list numbers are not removed from the text. This should be fixed in the scrub function (`scrub-literal-numbering-lists.xsl`). This issue is reported in [kotahi-711](https://gitlab.coko.foundation/kotahi/kotahi/-/issues/711).
![image](/uploads/73eb808fae75aa8f84bca6bfab498a81/image.png)
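The intended scrub behaviour can be sketched as follows (a Python illustration with a hypothetical `scrub_item` helper, not the actual `scrub-literal-numbering-lists.xsl` logic):

```python
import re

# Once a run of paragraphs has been detected as a numbered list, the
# literal numbers ("1.", "12)", ...) should be stripped from the item text.
LITERAL_NUMBER = re.compile(r"^\s*\d+[.)]\s+")

def scrub_item(text):
    """Remove a leading literal list number from one list-item's text."""
    return LITERAL_NUMBER.sub("", text)

assert scrub_item("1. Smith, J. (2020). A paper.") == "Smith, J. (2020). A paper."
assert scrub_item("12) Another reference") == "Another reference"
```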
@BenWh @ryandixpeek

Bharathydasan

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/49
Retain image file names on import (2022-04-05T05:50:36Z, Ryan Dix-Peek)

**Description:** the purpose of this task is to retain image file names for images imported into Kotahi via an MS Word file. MS Word files store the image names; these names need to be retained for all converted images in the object store.

Bharathydasan

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/173
Copyediting cleanups are not suitable for Spanish language (2022-05-16T09:15:37Z, Sofia Olguin)

## Context
The [HTMLevator copyediting cleanups](https://xsweet.org/documentation/htmlevator/) do the following:
>Any number of spaces before or after em dashes are removed
This is suitable for English-language texts, but in Spanish this generates errors in all the dialogue. In Spanish, dialogue is written like this:
—Hola —dijo el joven.
To reproduce:
- upload the attached Word file in Editoria
- check the chapter in Editoria and see how the space character disappears.
[dashSpace.docx](/uploads/7af2653f4e107dcf0ba33789e6b1ceb7/dashSpace.docx)
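A language-aware version of this cleanup could be sketched as follows (a Python illustration with hypothetical function names, not the HTMLevator implementation):

```python
import re

EM_DASH = "\u2014"  # —

def clean_em_dashes(text, lang="en"):
    """Collapse spaces around em dashes for English house style, but
    leave Spanish dialogue dashes untouched, where '—Hola —dijo el joven.'
    relies on the space before the mid-sentence dash."""
    if lang != "es":
        return re.sub(rf"\s*{EM_DASH}\s*", EM_DASH, text)
    return text

assert clean_em_dashes("word — word") == "word—word"
assert clean_em_dashes("—Hola —dijo el joven.", lang="es") == "—Hola —dijo el joven."
```

The same switch could later be keyed on an editorial house style rather than on language alone.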
## Suggested solution
The HTMLevator copyediting cleanups should be configurable to support:
* different use cases across languages
* different use cases across editorial house style guides

Dione Mentis (dione@coko.foundation)

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/174
Deploy an example service (2022-04-28T05:54:42Z, Adam Hyde, adam@coko.foundation)

We have long needed a web-based service, run from (linked from) xsweet.org, where folks can test uploading a docx file and see the results.

Ryan Dix-Peek

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/175
XSweet is broken (2022-05-10T04:55:01Z, Alexandros Georgantas)

When I try to build `job-xsweet` or try to use `xsweet-service` I am getting the following error.

Branch `wax2`

```
Error: Command failed: sh /home/node/xsweet/scripts/execute_chain.sh /home/node/xsweet/_conversion-221zp50jANrtTTM
server_1 | Error near {...es(../../pic:nvPicPr/pic:cN...} at char 25 in xsl:when/@test on line 814 column 137 of docx-html-extract.xsl:
server_1 | XPST0081 Namespace prefix 'pic' has not been declared
server_1 | Error in {../../pic:nvPicPr/pic:cNvPr} at char 17 in xsl:variable/@select on line 815 column 91 of docx-html-extract.xsl:
server_1 | XPST0081 Namespace prefix 'pic' has not been declared
server_1 | Error in {../../pic:nvPicPr/pic:cNvPr} at char 17 in xsl:when/@test on line 818 column 62 of docx-html-extract.xsl:
server_1 | XPST0081 Namespace prefix 'pic' has not been declared
server_1 | Error in {../../pic:nvPicPr/pic:cNvPr} at char 17 in xsl:value-of/@select on line 819 column 72 of docx-html-extract.xsl:
server_1 | XPST0081 Namespace prefix 'pic' has not been declared
server_1 | Error near {...es(../../pic:nvPicPr/pic:cN...} at char 25 in expression in xsl:when/@test on line 814 column 137 of docx-html-extract.xsl:
server_1 | XPST0081 Namespace prefix 'pic' has not been declared
server_1 | In template rule with match="xsw:transform" on line 56 of PIPELINE.xsl
server_1 | invoked by xsl:apply-templates at file:/home/node/xsweet/scripts/../XSweet/applications/PIPELINE.xsl#48
server_1 | invoked by xsl:iterate at file:/home/node/xsweet/scripts/../XSweet/applications/PIPELINE.xsl#43
server_1 | In template rule with match="/" on line 41 of PIPELINE.xsl
server_1 | Namespace prefix 'pic' has not been declared
server_1 | There was an error converting the document.
```

Bharathydasan

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/176
Import PDF & convert to Kotahi's HTML profile (2022-06-08T17:04:19Z, Ryan Dix-Peek)

**Description:** the purpose of this task is to support the import of PDFs into Kotahi through integration with Sciencebeam. Sciencebeam supports the conversion of PDFs to XML. We require conversion of PDF to HTML (the Kotahi HTML profile specifically).
Suggested solution: XSweet accepts docx, but the remaining pipelines support HTML. Convert the PDF to HTML and then feed the output through XSweet for the doc clean-up: PDF -> TEI-XML -> Docx -> XSweet -> Wax
**Acceptance criteria:**
- Ensure HTML is accessible in Wax.
- Extract manuscript metadata and populate the submission form, i.e. title, abstract and/or author name data.

Suki Venkat

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/177
Ingest Docx files containing binary math (2023-02-21T07:09:18Z, Ryan Dix-Peek)

**Issue description:** the purpose of this task is to support the import of Docx files that contain binary math (a format supported by MathType) and **view** the math in Wax.
Potential solution: use the XSweet pipeline to extract the GIF files on import and display the formulas as images in Wax.
[BinaryMath.docx](/uploads/73878e990b94ef590ae6e0e7c868a4ab/BinaryMath.docx)
Error message on import into Kotahi:
![Screenshot_2022-05-31_at_09.24.38](/uploads/b77b5e296378ad792cace63249b462ce/Screenshot_2022-05-31_at_09.24.38.png)
Wax content view:
![Screenshot_2022-05-31_at_11.54.33](/uploads/d14c37fc4857ac9ae99bd6f8ad81197b/Screenshot_2022-05-31_at_11.54.33.png)

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/178
Doesn't handle images correctly when converting to HTML (2022-07-05T04:09:56Z, Anna Khapsasova)

**Problem:**
When we convert docx to HTML, the XSL builds the wrong path to the media folder with images. As a result, the HTML doesn't contain images.
**Version:**
We are using the pubsweet/job-xsweet:1.5.4 image, which contains this issue.
**Fix:**
To fix it, you need to change `docx-html-extract.xsl`. Please replace the line

```xml
<xsl:variable name="docx-base" select="resolve-uri('.', document-uri(/))"/>
```

with

```xml
<xsl:variable name="docx-base" select="substring-before(resolve-uri('.', document-uri(/)), '/word')"/>
</xsl:variable>
```
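For illustration, here is what the two XPath expressions compute, sketched in Python (the URI is a made-up example; `document-uri(/)` points at `word/document.xml` inside the unpacked docx, and the fix strips the trailing `/word` segment from the base):

```python
doc_uri = "file:/tmp/unpacked-docx/word/document.xml"  # hypothetical example

# resolve-uri('.', document-uri(/)) -> the directory holding document.xml
old_base = doc_uri.rsplit("/", 1)[0] + "/"
# substring-before(resolve-uri('.', document-uri(/)), '/word')
new_base = old_base.split("/word")[0]

assert old_base == "file:/tmp/unpacked-docx/word/"
assert new_base == "file:/tmp/unpacked-docx"
```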
**Request**
This solution has already been tested and builds the correct path for images. Could you kindly fix the file and update the image with this fix?
Unfortunately I don't have permissions to create a branch and pull request inside the project.
Kind regards,
Anna

Suki Venkat

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/179
XSweet incorrectly adds 'http://' to links inserted within parentheses (2022-10-18T06:48:52Z, Ryan Dix-Peek)

When importing a docx file into Kotahi and viewing the manuscript content in Wax, XSweet mistakenly places an 'http://' in front of links that include a parenthesis, '(https' (an opening parenthesis followed by https).
Further points to note:
- The content of the docx is plain text.
- If the link text starts with 'www' then it makes sense to insert 'http://' in front of it, but this is incorrectly applied to the '(https' example.
URLs inserted within parentheses should be displayed as links.
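A scheme-aware linkification pass would avoid this. A minimal Python sketch (hypothetical, not XSweet's actual hyperlink handling): only prepend `http://` when the match has no scheme of its own, and stop the match before a closing parenthesis.

```python
import re

# Match http(s) URLs and bare www. links, stopping before whitespace or ')'.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s)]+")

def linkify(text):
    """Wrap URLs in anchors; add 'http://' only to scheme-less matches."""
    def repl(m):
        url = m.group(0)
        href = url if url.startswith(("http://", "https://")) else "http://" + url
        return f'<a href="{href}">{url}</a>'
    return URL_RE.sub(repl, text)

out = linkify("(https://example.org) and www.foo.bar")
assert '<a href="https://example.org">https://example.org</a>' in out
assert '<a href="http://www.foo.bar">www.foo.bar</a>' in out
```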
![Screenshot_2022-06-30_at_12.24.50](/uploads/ad802eed6314c8588e2dce3a42d10c1d/Screenshot_2022-06-30_at_12.24.50.png)

Bharathydasan

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/180
all paragraphs come in as H2s (2023-02-21T05:30:35Z, Dan Visel)

The attached .DOCX is a Kotahi test file; it comes in with all of its paragraphs as H2s (see screenshot of source inside of Kotahi). The same thing happens if I run it through http://pdf2html.cloud68.co/, which makes me think that this is XSweet – there's something about the file (the source of which I don't know) that's encoded incorrectly.
I don't have MS Word on my computer, but opening it up in Mac TextEdit shows some weirdness – all paragraphs are right-aligned, which is clearly incorrect. If I open it in Apple Pages, it looks more or less how I would expect it to.
This particular file isn't very important, but because Kotahi is processing a lot of Word docs coming from strange sources, we sometimes run into bugs that feel similar. (Most recently: display math is incorrectly coming in as H4s.) I don't know what they did to the DOCX to make it behave this way, though it would be nice if we could handle it?
[BodyMass.docx](/uploads/bcb5a0e6d8b6d92cf7c54875ac904f05/BodyMass.docx)
![Screen_Shot_2022-09-26_at_12.27.32_PM](/uploads/6983dda40058a13cc67808fc64f180d0/Screen_Shot_2022-09-26_at_12.27.32_PM.png)

https://gitlab.coko.foundation/XSweet/XSweet_runner_scripts/-/issues/4
"ruby xsweet_downloader.rb" does not work (2022-11-27T13:23:40Z, Andreas Jung)

Trying to run the downloader results in ZIP errors:

```
ajung@dev2.zopyx.com ➜ XSweet_runner_scripts git:(master) ruby xsweet_downloader.rb
/home/ajung/src/XSweet_runner_scripts/xsweet.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/src/XSweet_runner_scripts/xsweet.zip or
/home/ajung/src/XSweet_runner_scripts/xsweet.zip.zip, and cannot find /home/ajung/src/XSweet_runner_scripts/xsweet.zip.ZIP, period.
Error: no "view" rule for type "text/plain" passed its test case
(for more information, add "--debug=1" on the command line)
Warning: program returned non-zero exit code #768
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXXM4NF22/L91147-8317TMP.zip or
/tmp/lynxXXXXM4NF22/L91147-8317TMP.zip.zip, and cannot find /tmp/lynxXXXXM4NF22/L91147-8317TMP.zip.ZIP, period.
/usr/bin/xdg-open: 882: links2: not found
/usr/bin/xdg-open: 882: elinks: not found
/usr/bin/xdg-open: 882: links: not found
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXX0Fol16/L91160-2819TMP.zip or
/tmp/lynxXXXX0Fol16/L91160-2819TMP.zip.zip, and cannot find /tmp/lynxXXXX0Fol16/L91160-2819TMP.zip.ZIP, period.
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/.w3m/w3mtmp91167-0.zip or
/home/ajung/.w3m/w3mtmp91167-0.zip.zip, and cannot find /home/ajung/.w3m/w3mtmp91167-0.zip.ZIP, period.
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/src/XSweet_runner_scripts/typescript.zip or
/home/ajung/src/XSweet_runner_scripts/typescript.zip.zip, and cannot find /home/ajung/src/XSweet_runner_scripts/typescript.zip.ZIP, period.
Error: no "view" rule for type "text/plain" passed its test case
(for more information, add "--debug=1" on the command line)
Warning: program returned non-zero exit code #768
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXXPusFVT/L91202-9790TMP.zip or
/tmp/lynxXXXXPusFVT/L91202-9790TMP.zip.zip, and cannot find /tmp/lynxXXXXPusFVT/L91202-9790TMP.zip.ZIP, period.
/usr/bin/xdg-open: 882: links2: not found
/usr/bin/xdg-open: 882: elinks: not found
/usr/bin/xdg-open: 882: links: not found
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXXbruXm5/L91215-1479TMP.zip or
/tmp/lynxXXXXbruXm5/L91215-1479TMP.zip.zip, and cannot find /tmp/lynxXXXXbruXm5/L91215-1479TMP.zip.ZIP, period.
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/.w3m/w3mtmp91222-0.zip or
/home/ajung/.w3m/w3mtmp91222-0.zip.zip, and cannot find /home/ajung/.w3m/w3mtmp91222-0.zip.ZIP, period.
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/src/XSweet_runner_scripts/htmlevator.zip or
/home/ajung/src/XSweet_runner_scripts/htmlevator.zip.zip, and cannot find /home/ajung/src/XSweet_runner_scripts/htmlevator.zip.ZIP, period.
Error: no "view" rule for type "text/plain" passed its test case
(for more information, add "--debug=1" on the command line)
Warning: program returned non-zero exit code #768
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXXYNyi79/L91257-5254TMP.zip or
/tmp/lynxXXXXYNyi79/L91257-5254TMP.zip.zip, and cannot find /tmp/lynxXXXXYNyi79/L91257-5254TMP.zip.ZIP, period.
/usr/bin/xdg-open: 882: links2: not found
/usr/bin/xdg-open: 882: elinks: not found
/usr/bin/xdg-open: 882: links: not found
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXXn1Vd7K/L91270-5542TMP.zip or
/tmp/lynxXXXXn1Vd7K/L91270-5542TMP.zip.zip, and cannot find /tmp/lynxXXXXn1Vd7K/L91270-5542TMP.zip.ZIP, period.
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/.w3m/w3mtmp91277-0.zip or
/home/ajung/.w3m/w3mtmp91277-0.zip.zip, and cannot find /home/ajung/.w3m/w3mtmp91277-0.zip.ZIP, period.
typescript_name
xsweet_name
htmlevator_name
```

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/181
MathML equations are not converted correctly (2023-09-13T12:30:42Z, Ryan Dix-Peek)

Here's what I see in the original doc: instead of misrepresenting a `<p>` as an `<h4>` (following an empty `<h4>`), what's actually happening is that we have a `<p>` turning into nested `<h4><h4>`.
Useful links:
- Issue description, fixes and examples: https://gitlab.coko.foundation/kotahi/kotahi/-/issues/1023
- Testing feedback: https://gitlab.coko.foundation/kotahi/kotahi/-/issues/1023#note_107694

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/182
paragraphs don't come in with tags (2023-03-01T05:14:29Z, Dan Visel)

We have a Word doc we're ingesting in Kotahi that's not behaving as it should be – there's a list of maybe 100 citations at the end of the doc (all are regular paragraphs), and the first 15 come in correctly, then the last 85 all as a single paragraph. I ran the document through http://pdf2html.cloud68.co to see what XSweet was doing and noticed this:
![Screenshot_2023-02-22_at_5.03.24_PM](/uploads/e63a1ff25a0f28fafed8829989db8e2f/Screenshot_2023-02-22_at_5.03.24_PM.png)
The doc on the left looks correct, but if you look at the source on the right, the citations that begin "Buxton" and "Chen" don't have any tag around them at all. The browser displays them as paragraphs because they're between other block-level elements. When it comes into Kotahi/Wax, which doesn't expect a block-level tag to be missing, the results are inconsistent.
One thing I notice while looking at the list of citations: almost every citation contains a hyperlink. Six don't; four of those don't get paragraph tags. Aside from the hyperlinks, the citations appear to have no formatting.
Here's how Word's XML is marking up three paragraphs: the first and the last come in normally, the middle one doesn't get a paragraph tag:
```xml
<w:p w:rsidR="00A07C5C" w:rsidRPr="00361E5B" w:rsidRDefault="00A07C5C" w:rsidP="0077453B">
<w:pPr>
<w:pStyle w:val="BodyText"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
<w:spacing w:after="14" w:line="360" w:lineRule="auto"/>
<w:ind w:left="454" w:hanging="454"/>
<w:jc w:val="both"/>
<w:rPr>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>Bush, K., Zhou, S., Cisler, J., Bian, J., Hazaroglu, O., Gillispie, K., Yoshigoe, K., Kilts, C., 2015. A deconvolution-based approach to identifying large-scale effective connectivity. Magnetic Resonance Imaging 33, 1290</w:t>
</w:r>
<w:r w:rsidR="00EE6F93" w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>–</w:t>
</w:r>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>1298. doi:</w:t>
</w:r>
<w:hyperlink r:id="rId33" w:history="1">
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="Hyperlink"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:sz w:val="18"/>
<w:szCs w:val="18"/>
<w:u w:val="none"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>10.1016/j.mri.2015.07.015</w:t>
</w:r>
</w:hyperlink>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>
.
</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00A07C5C" w:rsidRPr="00361E5B" w:rsidRDefault="00A07C5C" w:rsidP="0077453B">
<w:pPr>
<w:pStyle w:val="BodyText"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
<w:spacing w:after="14" w:line="360" w:lineRule="auto"/>
<w:ind w:left="454" w:hanging="454"/>
<w:jc w:val="both"/>
<w:rPr>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>Buxton, R.B., Wong, E.C., Frank, L.R., 1998. Dynamics of blood flow and oxygenation changes during brain activation: the balloon model. Magnetic resonance in medicine 39, 855</w:t>
</w:r>
<w:r w:rsidR="00EE6F93" w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>–</w:t>
</w:r>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>864.</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00A07C5C" w:rsidRPr="00361E5B" w:rsidRDefault="00A07C5C" w:rsidP="0077453B">
<w:pPr>
<w:pStyle w:val="BodyText"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
<w:spacing w:after="14" w:line="360" w:lineRule="auto"/>
<w:ind w:left="454" w:hanging="454"/><w:jc w:val="both"/>
<w:rPr>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/><w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>Caballero-Gaudes, C., Moia, S., Panwar, P., Bandettini, P.A., Gonzalez-Castillo, J., 2019. A deconvolution algorithm for multi-echo functional MRI: Multi-echo sparse paradigm free mapping. NeuroImage 202, 116081. doi:</w:t>
</w:r>
<w:hyperlink r:id="rId34" w:history="1">
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="Hyperlink"/><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:sz w:val="18"/>
<w:szCs w:val="18"/>
<w:u w:val="none"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>10.1016/j.neuroimage.2019.116081</w:t>
</w:r>
</w:hyperlink>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>.</w:t>
</w:r>
</w:p>
```

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/22
Some minus signs get dropped partway through pipeline (2023-07-17T16:59:53Z, A Galtman)

When converting a Word document to HTML that has many equations, I noticed that some of the minus signs were missing in the HTML. Most of the missing characters are present through step 11 of the pipeline. (One negative sign was missing in all .xhtml files, so it might represent a different issue entirely.)
I'm attaching a .docx file and two .xhtml files. (The .docx file is excerpted from a document authored by someone other than me, and I haven't looked carefully at the styles or character usage.) The step 11 .xhtml file represents the Word document well, except one missing negative sign. The step 12 .xhtml file is missing several minus signs.
[missing-minus-signs.docx](/uploads/4d7fffc4823bbfa93ca1c8db1ffb0964/missing-minus-signs.docx)
[missing-minus-signs-11RINSED.xhtml](/uploads/edea8a246c3174eae9dcdb175dbee75a/missing-minus-signs-11RINSED.xhtml)
[missing-minus-signs-12UCPTEXTED.xhtml](/uploads/bc0e800d3d2d34c35a268947a08c3c23/missing-minus-signs-12UCPTEXTED.xhtml)

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/23
Subscript brackets in Word come out with asymmetric formatting in HTML (2023-07-17T18:00:00Z, A Galtman)

In Word, a math expression has a subscript "[_n_]" where the _n_ is italic and the square brackets are not italic. In HTML starting from step 12 of the pipeline, I see the following:
- If a variable name to the immediate left of the subscript is not italic, then the left bracket is (correctly) not italic but the right bracket is italic. Both brackets are subscripted, which is correct.
- If a variable name to the immediate left of the subscript is italic, then the left bracket is italic and not subscripted. The right bracket is italic and (correctly) subscripted.
I'm attaching a .docx file that reproduces the problem, as well as my HTML outputs from steps 11 and 12 of the pipeline. The step 11 HTML looks as expected, while the step 12 HTML shows the incorrect formatting.
[subscript-brackets.docx](/uploads/9244c13f42cc7946c6403017ee00eb07/subscript-brackets.docx)
[subscript-brackets-11RINSED.xhtml](/uploads/975a86db02bafea44c25c2763a5af8bd/subscript-brackets-11RINSED.xhtml)
[subscript-brackets-12UCPTEXTED.xhtml](/uploads/da914fef8f13814647083b7fe2168e85/subscript-brackets-12UCPTEXTED.xhtml)

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/24
(Minor) Warning about ambiguous rule match in scrub.xsl (2023-07-17T18:33:26Z, A Galtman)

`scrub.xsl` produces a warning about an ambiguous rule match. The two templates with overlapping match patterns have the same content, so there is no doubt that the output is correct. However, it would be nice to avoid the warning.

```
Warning
XTDE0540: Ambiguous rule match for /html/body[1]/div[1]/p[130]/noProof[1]/div[1]/p[2]
Matches both
"element(Q{http://www.w3.org/1999/xhtml}p)//element()[(empty((docOrder(docOrder(descendant::element()))) except (((((docOrder(docOrder(descendant::element(Q{http://www.w3.org/1999/xhtml}tab)))) | (docOrder(docOrder(descendant::element(Q{http://www.w3.org/1999/xhtml}span))))) | (docOrder(docOrder(descendant::element(Q{http://www.w3.org/1999/xhtml}b))))) | (docOrder(docOrder(descendant::element(Q{http://www.w3.org/1999/xhtml}i))))) | (docOrder(docOrder(descendant::element(Q{http://www.w3.org/1999/xhtml}u))))))) and (not(string(.)))]" on line 56 of [path...]/XSweet-core/scripts/../applications/docx-extract/scrub.xsl
and "element(Q{http://www.w3.org/1999/xhtml}p)//element()[not(matches(convertUntyped(data(.)), "\S", ""))]" on line 46 of [path...]/XSweet-core/scripts/../a
```
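Not a fix per se, but for reference: Saxon's XTDE0540 can be silenced without changing the output by giving one of the overlapping templates an explicit `priority`. A sketch only, under the assumption that the overlap is intentional; the chosen value is arbitrary, not the project's actual fix:

```xml
<!-- Sketch only: when two templates deliberately match the same nodes with
     identical bodies, an explicit priority on one of them resolves the
     ambiguity and suppresses XTDE0540. The priority value is an assumption. -->
<xsl:template match="p//*[not(matches(., '\S'))]" priority="2">
  <!-- ... same body as the overlapping template ... -->
</xsl:template>
```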
The templates in question have these `match` patterns:
- Line 46: `p//*[not(matches(.,'\S'))]` (maybe change it to `p//*[string(.) and not(matches(.,'\S'))]`?)
- Line 56: `p//*[empty(.//* except (.//tab|.//span|.//b|.//i|.//u)) and not(string(.))]`

---

**Pipeline step 12 inserts extra period at end of sentence (related to text-based equation, maybe)**
https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/25 · 2023-07-17 · A Galtman

In Word, start with the sentence "The answer is _F_(_x_)." where the letters are italic and the parentheses are not italic.
In step 12 of the pipeline, the parentheses become italic and there is an extra period at the end of the sentence: "The answer is _F(x)._."
I'm attaching a Word document with this source content, the correct-looking step 11 HTML, and the step 12 HTML that exhibits the issue.
[extra-period.docx](/uploads/018dfb09d3af71ad53ce7793cb4b2c85/extra-period.docx)
[extra-period-11RINSED.xhtml](/uploads/72d2b00787bd47bf494efec321551810/extra-period-11RINSED.xhtml)
[extra-period-12UCPTEXTED.xhtml](/uploads/4b9d386a2fbbb5c4dd96d86961a4e8fe/extra-period-12UCPTEXTED.xhtml)

---

**Backslashes in math equations are not being converted correctly**
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/184 · 2023-10-02 · Ryan Dix-Peek

What seems to be happening is that backslashes are getting escaped; for example, `\sin x` turns into `\\sin x`. Some fixes addressing this were recently integrated into Kotahi via [this MR](https://gitlab.coko.foundation/kotahi/kotahi/-/merge_requests/987), but this should really be handled during conversion.
Example; https://kotahi.kotahidev.cloud68.co/kotahi/versions/7fb2d295-046a-40fc-8faf-13de3d3a5b10/decision
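For anyone hunting for the cause: in XPath 2.0, the backslash is an escape character inside the replacement string of `replace()`, so a well-meaning text-scrubbing pass can end up doubling backslashes. A hypothetical illustration of the failure mode, not XSweet's actual code:

```xml
<!-- Hypothetical illustration, not XSweet's actual code. In XPath 2.0's
     replace(), '\\' in the pattern matches one literal backslash, and
     '\\\\' in the replacement emits two. A template like this would
     therefore rewrite each '\' as '\\', turning '\sin x' into '\\sin x'. -->
<xsl:template match="text()[contains(., '\')]">
  <xsl:value-of select="replace(., '\\', '\\\\')"/>
</xsl:template>
```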
![Screenshot_2023-10-02_at_08.56.22](/uploads/9c99571b3f070476513dfc24040af287/Screenshot_2023-10-02_at_08.56.22.png)

---

**Page break in certain cases causes lost content**
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/185 · 2023-11-18 · Dan Visel

We have a document from a client in Kotahi that imports incorrectly in XSweet: the first page of content is missing. It's a confidential file, so I'm not sharing it publicly.
The first page consists of headers, then a table which contains text (an abstract for the paper). After the table there is a page break (in its own paragraph); the regular text starts on the next page. When we import to XSweet, the first page content is entirely missing.
If I go in and delete the page break, the title page content imports correctly. If the paragraph with the page break in it (which appears empty) is given text content, the content above it also imports correctly. It's only when the paragraph containing the page break is otherwise empty that the problematic case happens.
Here's what's in the `document.xml` at the point at which the problem happens:
```xml
<w:p w14:paraId="6907B6C5" w14:textId="75D1BA40" w:rsidR="00E82AA4" w:rsidRPr="00440252" w:rsidRDefault="00E82AA4" w:rsidP="00585668">
<w:pPr>
<w:spacing w:before="120" w:after="0"/>
<w:jc w:val="both"/>
<w:rPr>
<w:rFonts w:cstheme="majorBidi"/>
<w:noProof/>
<w:sz w:val="20"/>
<w:szCs w:val="20"/>
</w:rPr>
<w:sectPr w:rsidR="00E82AA4" w:rsidRPr="00440252" w:rsidSect="002D7B7B">
<w:headerReference w:type="default" r:id="rId13"/>
<w:footerReference w:type="default" r:id="rId14"/>
<w:headerReference w:type="first" r:id="rId15"/>
<w:footerReference w:type="first" r:id="rId16"/>
<w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="0" w:gutter="0"/>
<w:pgNumType w:start="125"/>
<w:cols w:space="720"/>
<w:titlePg/>
<w:docGrid w:linePitch="360"/>
</w:sectPr>
</w:pPr>
<w:bookmarkStart w:id="1" w:name="_Hlk138064635"/>
</w:p>
```
That `<w:sectPr>` seems to be the representation of the page break. (In Word, this document does have a title page template, and it does have its own distinct header and footer, which are being called here.) There's no `<w:t>` with text content in that particular paragraph; maybe because of that it's getting deleted? Or possibly this is causing problems because it's immediately after a table? Page breaks in and of themselves don't seem to cause problems.
There's a little bit of background on the `<w:sectPr>` element [here](http://officeopenxml.com/WPsection.php):
> A section's properties are stored in a **sectPr** element. For all sections except the last section, the **sectPr** element is stored as a child element of the last paragraph in the section.
The problem here might be that there's not really a last paragraph in the section if the section ends with a table? Or it's possible that the page break is coming from the page template, rather than a page break that has been inserted manually. The XML doesn't show a standard Word XML page break, which looks like this:
```xml
<w:pPr>
<w:pageBreakBefore/>
</w:pPr>
```
(these appear later in the text) or like this: `<w:br w:type="page" />`. The page break here seems to be coming from the page template rather than from a manually inserted break.
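For whoever picks this up, here is the shape of a possible fix, assuming the bug is that a paragraph whose only "content" is a `<w:pPr>` carrying a `<w:sectPr>` is treated as empty and dropped together with the section it closes. The match pattern follows standard WordprocessingML; the template body is a placeholder, not XSweet's actual extraction logic:

```xml
<!-- Sketch only, not XSweet's actual extraction logic. Assumes the usual
     WordprocessingML namespace binding on the stylesheet:
     xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" -->

<!-- A paragraph whose w:pPr carries a w:sectPr marks the end of a section.
     Even when it contains no w:t text (as in the paragraph above), it must
     not be treated as empty and discarded along with the content that
     precedes it; emit a placeholder marker instead of dropping it. -->
<xsl:template match="w:p[w:pPr/w:sectPr][not(.//w:t)]">
  <div class="section-break"/>
</xsl:template>
```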