XSweet issues
https://gitlab.coko.foundation/groups/XSweet/-/issues
2024-03-18T04:47:28Z
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/187
document with watermark isn't parsed correctly
2024-03-18T04:47:28Z
Dan Visel
A Kotahi client submitted a document with a watermark on the first page (not attached because it's a client document). This doesn't go through XSweet (testing was done on XSweet without Kotahi); if the watermark is removed, the document does go through.
Not sure if this is a particular type of watermark that's causing problems – if I make a test document in LibreOffice with a watermark and send it through XSweet, it works.
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/186
Word simple styles not extracted correctly
2024-01-09T16:56:27Z
Sidorela Uku
In both ketida and kotahi I tried uploading a simple docx file, created with Google Docs, which includes basic styles like:
- a sentence styled in italic
- a sentence styled in bold
- a sentence styled in bold and italic
- a sentence styled in underlined
After the conversion, all the styles for the sentences were set to italic.
I suppose those styles are expected to be extracted correctly during the conversion.
The docx that I uploaded is: [docx-with-simple-style.docx](/uploads/b4d8f63a3319ec3bd585c7cc1a9d5923/docx-with-simple-style.docx)
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/185
page break in certain cases causes lost content
2023-11-18T21:31:01Z
Dan Visel
We have a document from a client in Kotahi that imports incorrectly in XSweet: the first page of content is missing. It's a confidential file, so I'm not sharing it publicly.
The first page consists of headers, then a table which contains text (an abstract for the paper). After the table there is a page break (in its own paragraph); the regular text starts on the next page. When we import to XSweet, the first page content is entirely missing.
If I go in and delete the page break, the title page content imports correctly. If the paragraph with the page break in it (which appears empty) is given text content, the content above it also imports correctly. It's only when the paragraph containing the page break is otherwise empty that the problematic case happens.
Here's what's in the `document.xml` at the point at which the problem happens:
```plaintext
<w:p w14:paraId="6907B6C5" w14:textId="75D1BA40" w:rsidR="00E82AA4" w:rsidRPr="00440252" w:rsidRDefault="00E82AA4" w:rsidP="00585668">
<w:pPr>
<w:spacing w:before="120" w:after="0"/>
<w:jc w:val="both"/>
<w:rPr>
<w:rFonts w:cstheme="majorBidi"/>
<w:noProof/>
<w:sz w:val="20"/>
<w:szCs w:val="20"/>
</w:rPr>
<w:sectPr w:rsidR="00E82AA4" w:rsidRPr="00440252" w:rsidSect="002D7B7B">
<w:headerReference w:type="default" r:id="rId13"/>
<w:footerReference w:type="default" r:id="rId14"/>
<w:headerReference w:type="first" r:id="rId15"/>
<w:footerReference w:type="first" r:id="rId16"/>
<w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="0" w:gutter="0"/>
<w:pgNumType w:start="125"/>
<w:cols w:space="720"/>
<w:titlePg/>
<w:docGrid w:linePitch="360"/>
</w:sectPr>
</w:pPr>
<w:bookmarkStart w:id="1" w:name="_Hlk138064635"/>
</w:p>
```
That `<w:sectPr>` seems to be the representation of the page break. (In Word, this document does have a title page template, and it does have its own distinct header and footer, which are being called here.) There's no `<w:t>` with text content in that particular paragraph; maybe because of that it's getting deleted? Or possibly this is causing problems because it's immediately after a table? Page breaks in and of themselves don't seem to cause problems.
There's a little bit of background on the `<w:sectPr>` element [here](http://officeopenxml.com/WPsection.php):
> A section's properties are stored in a **sectPr** element. For all sections except the last section, the **sectPr** element is stored as a child element of the last paragraph in the section.
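A minimal sketch for finding the suspect paragraphs in a `document.xml`, assuming the trigger really is a paragraph that carries a `<w:sectPr>` but has no `<w:t>` text. The function name and the sample markup are hypothetical; the sample is a cut-down stand-in for the confidential file, not a copy of it.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def section_break_only_paragraphs(document_xml: str):
    """Return the w:p elements that hold a w:sectPr but no w:t text."""
    root = ET.fromstring(document_xml)
    hits = []
    for para in root.iter(f"{{{W}}}p"):
        has_sect = para.find("w:pPr/w:sectPr", NS) is not None
        has_text = any((t.text or "").strip() for t in para.iter(f"{{{W}}}t"))
        if has_sect and not has_text:
            hits.append(para)
    return hits


# Hypothetical sample: a table, then an empty section-ending paragraph, then body text.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
  <w:tbl><w:tr><w:tc><w:p><w:r><w:t>Abstract</w:t></w:r></w:p></w:tc></w:tr></w:tbl>
  <w:p><w:pPr><w:sectPr><w:titlePg/></w:sectPr></w:pPr></w:p>
  <w:p><w:r><w:t>Body text</w:t></w:r></w:p>
</w:body></w:document>"""

hits = section_break_only_paragraphs(SAMPLE)
print(len(hits))
```

Running this over the real `word/document.xml` would show whether the lost first page always ends in such a paragraph, which might help separate the "empty paragraph" theory from the "immediately after a table" theory.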
The problem here might be that there's not really a last paragraph in the section if the section ends with a table? Or it's possible that the page break is coming from the page template, rather than a page break that has been inserted manually. The XML doesn't show a standard Word XML page break, which looks like this:
```plaintext
<w:pPr>
<w:pageBreakBefore/>
</w:pPr>
```
(these appear later in the text) or like this: `<w:br w:type="page" />`. The page break seems to be coming from the page template.
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/184
Backslashes in math equations are not being converted correctly
2023-10-02T12:15:31Z
Ryan Dix-Peek
What seems to be happening is that backslashes are getting escaped. For example, `\sin x` turns into `\\sin x`. There were some fixes recently integrated into Kotahi via [this MR](https://gitlab.coko.foundation/kotahi/kotahi/-/merge_requests/987) that were addressing this, but this should be handled during conversion.
Example; https://kotahi.kotahidev.cloud68.co/kotahi/versions/7fb2d295-046a-40fc-8faf-13de3d3a5b10/decision
![Screenshot_2023-10-02_at_08.56.22](/uploads/9c99571b3f070476513dfc24040af287/Screenshot_2023-10-02_at_08.56.22.png)
https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/25
Pipeline step 12 inserts extra period at end of sentence (related to text-based equation, maybe)
2023-07-17T18:51:54Z
A Galtman
In Word, start with the sentence, "The answer is _F_(_x_)." where the letters are italic and the parentheses are not italic.
In step 12 of the pipeline, the parentheses become italic and there is an extra period at the end of the sentence: "The answer is _F(x)._."
I'm attaching a Word document with this source content, the correct-looking step 11 HTML, and the step 12 HTML that exhibits the issue.
[extra-period.docx](/uploads/018dfb09d3af71ad53ce7793cb4b2c85/extra-period.docx)
[extra-period-11RINSED.xhtml](/uploads/72d2b00787bd47bf494efec321551810/extra-period-11RINSED.xhtml)
[extra-period-12UCPTEXTED.xhtml](/uploads/4b9d386a2fbbb5c4dd96d86961a4e8fe/extra-period-12UCPTEXTED.xhtml)
https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/24
(Minor) Warning about ambiguous rule match in scrub.xsl
2023-07-17T18:33:26Z
A Galtman
`scrub.xsl` produces a warning about an ambiguous rule match. The two templates with overlapping match patterns have the same content, so there is no doubt that the output is correct. However, it would be nice to avoid the warning.
```
Warning
XTDE0540: Ambiguous rule match for /html/body[1]/div[1]/p[130]/noProof[1]/div[1]/p[2]
Matches both
"element(Q{http://www.w3.org/1999/xhtml}p)//element()[(empty((docOrder(docOrder(descendant::element()))) except (((((docOrder(docOrder(descendant::element(Q{http://www.w3.org/1999/xhtml}tab)))) | (docOrder(docOrder(descendant::element(Q{http://www.w3.org/1999/xhtml}span))))) | (docOrder(docOrder(descendant::element(Q{http://www.w3.org/1999/xhtml}b))))) | (docOrder(docOrder(descendant::element(Q{http://www.w3.org/1999/xhtml}i))))) | (docOrder(docOrder(descendant::element(Q{http://www.w3.org/1999/xhtml}u))))))) and (not(string(.)))]" on line 56 of [path...]/XSweet-core/scripts/../applications/docx-extract/scrub.xsl
and "element(Q{http://www.w3.org/1999/xhtml}p)//element()[not(matches(convertUntyped(data(.)), "\S", ""))]" on line 46 of [path...]/XSweet-core/scripts/../a
```
The templates in question have these `match` patterns:
- Line 46: `p//*[not(matches(.,'\S'))]` (maybe change it to `p//*[string(.) and not(matches(.,'\S'))]`?)
- Line 56: `p//*[empty(.//* except (.//tab|.//span|.//b|.//i|.//u)) and not(string(.))]`
https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/23
Subscript brackets in Word come out with asymmetric formatting in HTML
2023-07-17T18:00:00Z
A Galtman
In Word, a math expression has a subscript "[_n_]" where the _n_ is italic and the square brackets are not italic. In HTML starting from step 12 of the pipeline, I see the following:
- If a variable name to the immediate left of the subscript is not italic, then the left bracket is (correctly) not italic but the right bracket is italic. Both brackets are subscripted, which is correct.
- If a variable name to the immediate left of the subscript is italic, then the left bracket is italic and not subscripted. The right bracket is italic and (correctly) subscripted.
I'm attaching a .docx file that reproduces the problem, as well as my HTML outputs from steps 11 and 12 of the pipeline. The step 11 HTML looks as expected, while the step 12 HTML shows the incorrect formatting.
[subscript-brackets.docx](/uploads/9244c13f42cc7946c6403017ee00eb07/subscript-brackets.docx)
[subscript-brackets-11RINSED.xhtml](/uploads/975a86db02bafea44c25c2763a5af8bd/subscript-brackets-11RINSED.xhtml)
[subscript-brackets-12UCPTEXTED.xhtml](/uploads/da914fef8f13814647083b7fe2168e85/subscript-brackets-12UCPTEXTED.xhtml)
https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/22
Some minus signs get dropped partway through pipeline
2023-07-17T16:59:53Z
A Galtman
When converting a Word document that has many equations to HTML, I noticed that some of the minus signs were missing in the HTML. Most of the missing characters are present through step 11 of the pipeline. (One negative sign was missing in all .xhtml files, so it might represent a different issue entirely.)
I'm attaching a .docx file and two .xhtml files. (The .docx file is excerpted from a document authored by someone other than me, and I haven't looked carefully at the styles or character usage.) The step 11 .xhtml file represents the Word document well, except one missing negative sign. The step 12 .xhtml file is missing several minus signs.
[missing-minus-signs.docx](/uploads/4d7fffc4823bbfa93ca1c8db1ffb0964/missing-minus-signs.docx)
[missing-minus-signs-11RINSED.xhtml](/uploads/edea8a246c3174eae9dcdb175dbee75a/missing-minus-signs-11RINSED.xhtml)
[missing-minus-signs-12UCPTEXTED.xhtml](/uploads/bc0e800d3d2d34c35a268947a08c3c23/missing-minus-signs-12UCPTEXTED.xhtml)
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/183
Target format for xPubedit Typescript
2023-04-26T02:30:42Z
Alex Theg
In order for .docx content to port into the new WYSIWYG editor @christos is building (xpubedit), we'll need to build steps to make the necessary tweaks. Fortunately, the required format is closer to valid HTML than what Wax requires.
* ~~@christos can you please double check me here?~~ confirmed
* @wendell there's nothing to do yet but have a quick look
In the meantime, I am going to try to get xpubedit running with the PubSweet development kit.
# Target format
## General document format
A self-closing `<style />` tag prevents content from loading in the editor. Serializing as HTML5 avoids this, so HTML5 serialization should be the last step.
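To illustrate the difference between the two serializations, here is a small sketch using Python's standard `xml.etree.ElementTree` rather than the XSLT pipeline itself (that substitution is an assumption made for the sake of a runnable example; in the pipeline the equivalent would presumably be an HTML output method on the final step). The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical pipeline output with an empty style element.
root = ET.fromstring(
    "<html><head><style /></head><body><p>I'm a paragraph!</p></body></html>"
)

# XML serialization collapses the empty element into a self-closing tag.
xml_out = ET.tostring(root, encoding="unicode", method="xml")

# HTML serialization writes an explicit end tag for non-void elements.
html_out = ET.tostring(root, encoding="unicode", method="html")

print(xml_out)
print(html_out)
```

The only point of the sketch is that the choice of serializer, not the content, decides whether `<style />` or `<style></style>` reaches the editor.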
## Paragraphs
Paragraphs are the same as the HTML:
`<p>I'm a paragraph!</p>`
## Headings
Headings are native HTML:
`<h1>` `<h2>` `<h3>` etc.
## Lists
Lists, numbered and unnumbered, don't need any changing:
### Numbered list:
```
<ol>
<li>
<p>first li</p>
</li>
<li>
<p>second li</p>
<p> second paragraph of second li </p>
</li>
<li>
<p>third one </p>
</li>
<li><p>fourth</p>
<ol>
<li><p> Nested 1 </p>
</li>
<li><p> Nested 2</p>
</li>
</ol>
</li>
</ol>
```
### Bulleted list:
```
<ul>
<li>
<p>first li</p>
</li>
<li>
<p>second li</p>
<p> second paragraph of second li </p>
</li>
<li>
<p>third one </p>
</li>
<li><p>fourth</p>
<ul>
<li><p> Nested 1 </p>
</li>
<li><p> Nested 2</p>
</li>
</ul>
</li>
</ul>
```
## Tables
I believe the current XSweet table extractor should work fine:
```
<table>
<tr>
<th colspan=3 data-colwidth="100,0,0">Wide header</th>
</tr>
<tr>
<td>One</td>
<td>Two</td>
<td>Three</td>
</tr>
<tr>
<td>Four</td>
<td>Five</td>
<td>Six</td>
</tr>
</table>
```
xPubedit also allows for `tbody` elements (and, I'm assuming, `thead` and `tfoot`), although we don't have any use for them at the moment:
```
<table>
<tbody>
<tr>
<th colspan=3 data-colwidth="100,0,0">Wide header</th>
</tr>
<tr>
<td>One</td>
<td>Two</td>
<td>Three</td>
</tr>
<tr>
<td>Four</td>
<td>Five</td>
<td>Six</td>
</tr>
</tbody>
</table>
```
## Notes
Notes have yet to be implemented in the editor, but once they are, I will update this with the format.
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/182
paragraphs don't come in with tags
2023-03-01T05:14:29Z
Dan Visel
We have a Word doc we're ingesting in Kotahi that's not behaving as it should be – there's a list of maybe 100 citations at the end of the doc (all are regular paragraphs), and the first 15 come in correctly, then the last 85 all as a single paragraph. I ran the document through http://pdf2html.cloud68.co to see what XSweet was doing and noticed this:
![Screenshot_2023-02-22_at_5.03.24_PM](/uploads/e63a1ff25a0f28fafed8829989db8e2f/Screenshot_2023-02-22_at_5.03.24_PM.png)
The doc on the left looks correct, but if you look at the source on the right, the citations that begin "Buxton" and "Chen" don't have any tag around them at all. The browser's displaying them as paragraphs because they're between other block-level elements. Kotahi/Wax expects every paragraph to arrive inside a block-level tag, so when one is missing the results are inconsistent.
One thing I notice while looking at the list of citations: almost every citation contains a hyperlink. Six don't; four of those don't get paragraph tags. Aside from the hyperlinks, the citations appear to have no formatting.
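A quick way to spot untagged citations in XSweet's output is to look for text that sits directly inside the container instead of inside a child element. The sketch below is only a diagnostic aid: the helper name is hypothetical, the sample citations are shortened stand-ins, and real XSweet output would be namespaced XHTML with the container found first.

```python
import xml.etree.ElementTree as ET


def untagged_text_runs(container):
    """Collect non-blank text sitting directly inside `container`,
    i.e. text that is not wrapped in any child element."""
    runs = []
    if container.text and container.text.strip():
        runs.append(container.text.strip())
    for child in container:
        # In ElementTree, text following a child's end tag is that child's tail.
        if child.tail and child.tail.strip():
            runs.append(child.tail.strip())
    return runs


# Hypothetical output: the middle citation has lost its <p> wrapper.
SAMPLE = """<body>
<p>Bush, K., 2015. A deconvolution-based approach.</p>
Buxton, R.B., 1998. Dynamics of blood flow.
<p>Caballero-Gaudes, C., 2019. A deconvolution algorithm.</p>
</body>"""

body = ET.fromstring(SAMPLE)
stray = untagged_text_runs(body)
print(stray)
```

Comparing the stray runs against the citations that lack hyperlinks would confirm or rule out the correlation noted above.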
Here's how Word's XML is marking up three paragraphs: the first and the last come in normally, the middle one doesn't get a paragraph tag:
```xml
<w:p w:rsidR="00A07C5C" w:rsidRPr="00361E5B" w:rsidRDefault="00A07C5C" w:rsidP="0077453B">
<w:pPr>
<w:pStyle w:val="BodyText"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
<w:spacing w:after="14" w:line="360" w:lineRule="auto"/>
<w:ind w:left="454" w:hanging="454"/>
<w:jc w:val="both"/>
<w:rPr>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>Bush, K., Zhou, S., Cisler, J., Bian, J., Hazaroglu, O., Gillispie, K., Yoshigoe, K., Kilts, C., 2015. A deconvolution-based approach to identifying large-scale effective connectivity. Magnetic Resonance Imaging 33, 1290</w:t>
</w:r>
<w:r w:rsidR="00EE6F93" w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>–</w:t>
</w:r>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>1298. doi:</w:t>
</w:r>
<w:hyperlink r:id="rId33" w:history="1">
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="Hyperlink"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:sz w:val="18"/>
<w:szCs w:val="18"/>
<w:u w:val="none"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>10.1016/j.mri.2015.07.015</w:t>
</w:r>
</w:hyperlink>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>
.
</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00A07C5C" w:rsidRPr="00361E5B" w:rsidRDefault="00A07C5C" w:rsidP="0077453B">
<w:pPr>
<w:pStyle w:val="BodyText"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
<w:spacing w:after="14" w:line="360" w:lineRule="auto"/>
<w:ind w:left="454" w:hanging="454"/>
<w:jc w:val="both"/>
<w:rPr>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>Buxton, R.B., Wong, E.C., Frank, L.R., 1998. Dynamics of blood flow and oxygenation changes during brain activation: the balloon model. Magnetic resonance in medicine 39, 855</w:t>
</w:r>
<w:r w:rsidR="00EE6F93" w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>–</w:t>
</w:r>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>864.</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00A07C5C" w:rsidRPr="00361E5B" w:rsidRDefault="00A07C5C" w:rsidP="0077453B">
<w:pPr>
<w:pStyle w:val="BodyText"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
<w:spacing w:after="14" w:line="360" w:lineRule="auto"/>
<w:ind w:left="454" w:hanging="454"/><w:jc w:val="both"/>
<w:rPr>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/><w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>Caballero-Gaudes, C., Moia, S., Panwar, P., Bandettini, P.A., Gonzalez-Castillo, J., 2019. A deconvolution algorithm for multi-echo functional MRI: Multi-echo sparse paradigm free mapping. NeuroImage 202, 116081. doi:</w:t>
</w:r>
<w:hyperlink r:id="rId34" w:history="1">
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="Hyperlink"/><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:sz w:val="18"/>
<w:szCs w:val="18"/>
<w:u w:val="none"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>10.1016/j.neuroimage.2019.116081</w:t>
</w:r>
</w:hyperlink>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>.</w:t>
</w:r>
</w:p>
```
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/181
MathML equations are not converted correctly
2023-09-13T12:30:42Z
Ryan Dix-Peek
Here's what I see in the original doc; instead of misrepresenting a `<p>` as an `<h4>` (following an empty `<h4>`), what's actually happening is that we have a `<p>` turning into nested `<h4><h4>`.
Useful links;
- Issue description, fixes and examples; https://gitlab.coko.foundation/kotahi/kotahi/-/issues/1023
- Testing feedback; https://gitlab.coko.foundation/kotahi/kotahi/-/issues/1023#note_107694
https://gitlab.coko.foundation/XSweet/XSweet_runner_scripts/-/issues/4
"ruby xsweet_downloader.rb" does not work
2022-11-27T13:23:40Z
Andreas Jung
Trying to run the downloader results in ZIP errors:
```
ajung@dev2.zopyx.com ➜ XSweet_runner_scripts git:(master) ruby xsweet_downloader.rb
/home/ajung/src/XSweet_runner_scripts/xsweet.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/src/XSweet_runner_scripts/xsweet.zip or
/home/ajung/src/XSweet_runner_scripts/xsweet.zip.zip, and cannot find /home/ajung/src/XSweet_runner_scripts/xsweet.zip.ZIP, period.
Error: no "view" rule for type "text/plain" passed its test case
(for more information, add "--debug=1" on the command line)
Warning: program returned non-zero exit code #768
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXXM4NF22/L91147-8317TMP.zip or
/tmp/lynxXXXXM4NF22/L91147-8317TMP.zip.zip, and cannot find /tmp/lynxXXXXM4NF22/L91147-8317TMP.zip.ZIP, period.
/usr/bin/xdg-open: 882: links2: not found
/usr/bin/xdg-open: 882: elinks: not found
/usr/bin/xdg-open: 882: links: not found
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXX0Fol16/L91160-2819TMP.zip or
/tmp/lynxXXXX0Fol16/L91160-2819TMP.zip.zip, and cannot find /tmp/lynxXXXX0Fol16/L91160-2819TMP.zip.ZIP, period.
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/.w3m/w3mtmp91167-0.zip or
/home/ajung/.w3m/w3mtmp91167-0.zip.zip, and cannot find /home/ajung/.w3m/w3mtmp91167-0.zip.ZIP, period.
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/src/XSweet_runner_scripts/typescript.zip or
/home/ajung/src/XSweet_runner_scripts/typescript.zip.zip, and cannot find /home/ajung/src/XSweet_runner_scripts/typescript.zip.ZIP, period.
Error: no "view" rule for type "text/plain" passed its test case
(for more information, add "--debug=1" on the command line)
Warning: program returned non-zero exit code #768
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXXPusFVT/L91202-9790TMP.zip or
/tmp/lynxXXXXPusFVT/L91202-9790TMP.zip.zip, and cannot find /tmp/lynxXXXXPusFVT/L91202-9790TMP.zip.ZIP, period.
/usr/bin/xdg-open: 882: links2: not found
/usr/bin/xdg-open: 882: elinks: not found
/usr/bin/xdg-open: 882: links: not found
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXXbruXm5/L91215-1479TMP.zip or
/tmp/lynxXXXXbruXm5/L91215-1479TMP.zip.zip, and cannot find /tmp/lynxXXXXbruXm5/L91215-1479TMP.zip.ZIP, period.
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/.w3m/w3mtmp91222-0.zip or
/home/ajung/.w3m/w3mtmp91222-0.zip.zip, and cannot find /home/ajung/.w3m/w3mtmp91222-0.zip.ZIP, period.
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/src/XSweet_runner_scripts/htmlevator.zip or
/home/ajung/src/XSweet_runner_scripts/htmlevator.zip.zip, and cannot find /home/ajung/src/XSweet_runner_scripts/htmlevator.zip.ZIP, period.
Error: no "view" rule for type "text/plain" passed its test case
(for more information, add "--debug=1" on the command line)
Warning: program returned non-zero exit code #768
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXXYNyi79/L91257-5254TMP.zip or
/tmp/lynxXXXXYNyi79/L91257-5254TMP.zip.zip, and cannot find /tmp/lynxXXXXYNyi79/L91257-5254TMP.zip.ZIP, period.
/usr/bin/xdg-open: 882: links2: not found
/usr/bin/xdg-open: 882: elinks: not found
/usr/bin/xdg-open: 882: links: not found
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/lynxXXXXn1Vd7K/L91270-5542TMP.zip or
/tmp/lynxXXXXn1Vd7K/L91270-5542TMP.zip.zip, and cannot find /tmp/lynxXXXXn1Vd7K/L91270-5542TMP.zip.ZIP, period.
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/ajung/.w3m/w3mtmp91277-0.zip or
/home/ajung/.w3m/w3mtmp91277-0.zip.zip, and cannot find /home/ajung/.w3m/w3mtmp91277-0.zip.ZIP, period.
typescript_name
xsweet_name
htmlevator_name
```
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/180
all paragraphs come in as H2s
2023-02-21T05:30:35Z
Dan Visel
The attached .DOCX is a Kotahi test file; it comes in with all of its paragraphs as H2s (see screenshot of source inside of Kotahi). The same thing happens if I run it through http://pdf2html.cloud68.co/, which makes me think that this is XSweet – there's something about the file (the source of which I don't know) that's encoded incorrectly.
I don't have MS Word on my computer, but opening it up in Mac TextEdit shows some weirdness – all paragraphs are right-aligned, which is clearly incorrect. If I open it in Apple Pages, it looks more or less how I would expect it to.
This particular file isn't very important, but because Kotahi is processing a lot of Word docs coming from strange sources, we sometimes run into bugs that feel similar. (Most recently: display math is incorrectly coming in as H4s.) I don't know what they did to the DOCX to make it behave this way, though it would be nice if we could handle it?
[BodyMass.docx](/uploads/bcb5a0e6d8b6d92cf7c54875ac904f05/BodyMass.docx)
![Screen_Shot_2022-09-26_at_12.27.32_PM](/uploads/6983dda40058a13cc67808fc64f180d0/Screen_Shot_2022-09-26_at_12.27.32_PM.png)
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/179
XSweet incorrectly adds 'http://' to links inserted within parenthesis
2022-10-18T06:48:52Z
Ryan Dix-Peek
When importing a docx file into Kotahi and viewing the manuscript content in Wax, XSweet mistakenly places an 'http://' in front of links that are preceded by an opening parenthesis, as in '(https'.
Further points to note;
- The content of the docx is plain text.
- If the link text started with 'www' then it makes sense to insert the 'http://' in front on 'wwww', but this is incorrectly applied to the '(https' example.
URLs inserted within parenthesis should be display as links.
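The expected behaviour can be sketched as follows. This is a minimal Python illustration, not XSweet's actual XSLT: the `linkify` function and its regular expression are hypothetical and only show the rule described above, where 'http://' is prepended when the link text starts with 'www' and text that already carries a scheme is left alone, even directly after an opening parenthesis.

```python
import re

# Hypothetical pattern: a URL starts with an explicit scheme or with "www."
# and runs until whitespace, a parenthesis, or an angle bracket.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s()<>]+", re.IGNORECASE)


def linkify(text: str) -> str:
    """Wrap URLs in <a> tags, adding 'http://' only when no scheme is present."""

    def to_anchor(match: re.Match) -> str:
        url = match.group(0)
        has_scheme = url.lower().startswith(("http://", "https://"))
        href = url if has_scheme else "http://" + url
        return f'<a href="{href}">{url}</a>'

    return URL_RE.sub(to_anchor, text)


if __name__ == "__main__":
    print(linkify("(https://example.org/a)"))
    print(linkify("(www.example.org)"))
```

The sketch does not handle trailing punctuation (a full stop directly after a URL would be captured as part of it); a real implementation would need to trim that.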
![Screenshot_2022-06-30_at_12.24.50](/uploads/ad802eed6314c8588e2dce3a42d10c1d/Screenshot_2022-06-30_at_12.24.50.png)
Bharathydasan
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/178
Doesn't handle images correctly when converting to html
2022-07-05T04:09:56Z
Anna Khapsasova
**Problem:**
When we convert docx to html, the XSL builds the wrong path to the media folder containing the images. As a result, the html doesn't contain the images.
**Version:**
We are using the pubsweet/job-xsweet:1.5.4 image, which contains this issue.
**Fix:**
To fix it, you need to change docx-html-extract.xsl.
Please change the line
<xsl:variable name="docx-base" select="resolve-uri('.', document-uri(/))"/>
to
<xsl:variable name="docx-base" select="substring-before(resolve-uri('.', document-uri(/)), '/word')"/>
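To make the effect of the proposed change concrete, here is a rough Python analogue of the two XPath expressions. It is a sketch under assumptions: the file URIs are invented for illustration, the function names are hypothetical, and it assumes the usual unzipped docx layout in which the main part lives at `word/document.xml`.

```python
from urllib.parse import urljoin


def docx_base_original(document_uri: str) -> str:
    """Analogue of resolve-uri('.', document-uri(/)): the folder of the document."""
    return urljoin(document_uri, ".")


def docx_base_proposed(document_uri: str) -> str:
    """Analogue of substring-before(resolve-uri('.', document-uri(/)), '/word').

    Like XPath substring-before, this returns the text before the FIRST
    occurrence of '/word', or an empty string when '/word' does not occur.
    """
    resolved = urljoin(document_uri, ".")
    head, separator, _ = resolved.partition("/word")
    return head if separator else ""


if __name__ == "__main__":
    uri = "file:///tmp/job/word/document.xml"  # illustrative path only
    print(docx_base_original(uri))  # folder that contains document.xml
    print(docx_base_proposed(uri))  # package root, one level above 'word'
```

One caveat with this approach: because the match is on the first '/word' in the URI, a working directory whose own path contains '/word' (for example `/home/wordsmith/...`) would be cut too early.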
**Request**
This solution has already been tested and builds the correct path for images. Could you kindly fix the file and update the image with this fix?
Unfortunately, I don't have permissions to create a branch and pull request inside the project.
Kind regards,
Anna
Suki Venkat
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/177
Ingest Docx files containing binary math
2023-02-21T07:09:18Z
Ryan Dix-Peek
**Issue description:** The purpose of this task is to support the import of Docx files that contain binary math (a format supported by Mathtype) and to **view** the math in Wax.
Potential solution: use the XSweet pipeline to extract the gif files on import and display the formulas as images in Wax.
[BinaryMath.docx](/uploads/73878e990b94ef590ae6e0e7c868a4ab/BinaryMath.docx)
Error message on import into Kotahi:
![Screenshot_2022-05-31_at_09.24.38](/uploads/b77b5e296378ad792cace63249b462ce/Screenshot_2022-05-31_at_09.24.38.png)
Wax content view:
![Screenshot_2022-05-31_at_11.54.33](/uploads/d14c37fc4857ac9ae99bd6f8ad81197b/Screenshot_2022-05-31_at_11.54.33.png)
Bharathydasan
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/176
Import PDF & convert to Kotahi's HTML profile
2022-06-08T17:04:19Z
Ryan Dix-Peek
**Description:** The purpose of this task is to support the import of PDFs into Kotahi through integration with Sciencebeam. Sciencebeam supports the conversion of PDFs to XML. We require conversion of PDF to HTML (Kotahi HTML profile specifically).
Suggested solution: XSweet accepts docx, but the remaining pipelines support HTML. Convert PDF to HTML and then feed the output through XSweet for the doc clean-up: PDF -> TEI-XML -> Docx -> XSweet -> Wax
**Acceptance criteria:**
- Ensure HTML is accessible in Wax.
- Extract manuscript metadata and populate the submission form, i.e. title, abstract and/or author name data.
Suki Venkat
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/175
XSweet is broken
2022-05-10T04:55:01Z
Alexandros Georgantas
When I try to build `job-xsweet` or try to use `xsweet-service`, I get the following error.
Branch `wax2`
```
Error: Command failed: sh /home/node/xsweet/scripts/execute_chain.sh /home/node/xsweet/_conversion-221zp50jANrtTTM
server_1 | Error near {...es(../../pic:nvPicPr/pic:cN...} at char 25 in xsl:when/@test on line 814 column 137 of docx-html-extract.xsl:
server_1 | XPST0081 Namespace prefix 'pic' has not been declared
server_1 | Error in {../../pic:nvPicPr/pic:cNvPr} at char 17 in xsl:variable/@select on line 815 column 91 of docx-html-extract.xsl:
server_1 | XPST0081 Namespace prefix 'pic' has not been declared
server_1 | Error in {../../pic:nvPicPr/pic:cNvPr} at char 17 in xsl:when/@test on line 818 column 62 of docx-html-extract.xsl:
server_1 | XPST0081 Namespace prefix 'pic' has not been declared
server_1 | Error in {../../pic:nvPicPr/pic:cNvPr} at char 17 in xsl:value-of/@select on line 819 column 72 of docx-html-extract.xsl:
server_1 | XPST0081 Namespace prefix 'pic' has not been declared
server_1 | Error near {...es(../../pic:nvPicPr/pic:cN...} at char 25 in expression in xsl:when/@test on line 814 column 137 of docx-html-extract.xsl:
server_1 | XPST0081 Namespace prefix 'pic' has not been declared
server_1 | In template rule with match="xsw:transform" on line 56 of PIPELINE.xsl
server_1 | invoked by xsl:apply-templates at file:/home/node/xsweet/scripts/../XSweet/applications/PIPELINE.xsl#48
server_1 | invoked by xsl:iterate at file:/home/node/xsweet/scripts/../XSweet/applications/PIPELINE.xsl#43
server_1 | In template rule with match="/" on line 41 of PIPELINE.xsl
server_1 | Namespace prefix 'pic' has not been declared
server_1 | There was an error converting the document.
```
Bharathydasan
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/174
Deploy an example service
2022-04-28T05:54:42Z
Adam Hyde
adam@coko.foundation
We have long needed a web-based service running from (or linked from) xsweet.org where folks can test uploading a docx file and see the results.
Ryan Dix-Peek
https://gitlab.coko.foundation/XSweet/XSweet/-/issues/173
Copyediting cleanups are not suitable for Spanish language
2022-05-16T09:15:37Z
Sofia Olguin
## Context
The [HTMLevator copyediting cleanups](https://xsweet.org/documentation/htmlevator/) do the following:
>Any number of spaces before or after em dashes are removed
This is suitable for English-language texts, but in Spanish it generates errors in all the dialogs. In Spanish, dialog is written like this:
—Hola —dijo el joven.
To reproduce:
- Upload the attached Word file in Editoria.
- Check the chapter in Editoria and see how the space characters disappear.
[dashSpace.docx](/uploads/7af2653f4e107dcf0ba33789e6b1ceb7/dashSpace.docx)
## Suggested solution
The HTMLevator copyediting cleanups should be configurable to support:
* different use cases between languages
* different use cases between editorial house style guides
Dione Mentis
dione@coko.foundation
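One way the suggested configurability for issue 173 could look is sketched below in Python. This is an illustration only, not HTMLevator's XSLT: `clean_dashes`, the `lang` parameter, and the `TIGHT_DASH_LANGS` setting are hypothetical names, and the assumption is simply that the em dash cleanup is applied only for languages (or house styles) that opt in.

```python
import re

EM_DASH = "\u2014"

# Hypothetical configuration: languages whose style removes spaces around
# em dashes. Spanish ("es") is deliberately absent, so its dialog dashes
# keep their leading space.
TIGHT_DASH_LANGS = {"en"}


def clean_dashes(text: str, lang: str = "en") -> str:
    """Remove spaces around em dashes, but only for opted-in languages."""
    if lang not in TIGHT_DASH_LANGS:
        return text
    return re.sub(rf" *{EM_DASH} *", EM_DASH, text)


if __name__ == "__main__":
    print(clean_dashes("wait \u2014 what", lang="en"))
    print(clean_dashes("\u2014Hola \u2014dijo el joven.", lang="es"))
```

A house style guide could be handled the same way, by keying the setting on a style profile rather than on the language code.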