XSweet issues
https://gitlab.coko.foundation/XSweet/XSweet/-/issues
2023-02-21T07:09:18Z

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/177
Ingest Docx files containing binary math
2023-02-21T07:09:18Z
Ryan Dix-Peek

**Issue description:** the purpose of this task is to support importing Docx files that contain binary math (a format supported by MathType) and to **view** the math in Wax.
Potential solutions: use the XSweet pipeline to extract the GIF files on import and display the formulas as images in Wax.
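As a rough illustration of the extraction step proposed above, here is a minimal Python sketch that lists and reads the image parts of a .docx package. It is not XSweet code. It assumes the preview images sit under `word/media/` (Word's usual layout, not something this issue confirms for the attached file), and the suffix list and function names are hypothetical.

```python
import io
import zipfile

# Assumption: preview images are stored under word/media/ in the package.
MEDIA_PREFIX = "word/media/"
# Assumption: which suffixes count as formula previews is a guess.
IMAGE_SUFFIXES = (".gif", ".wmf", ".emf", ".png")


def list_embedded_images(docx_bytes: bytes) -> list[str]:
    """Return the sorted names of image parts stored in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return sorted(
            name
            for name in package.namelist()
            if name.startswith(MEDIA_PREFIX)
            and name.lower().endswith(IMAGE_SUFFIXES)
        )


def extract_embedded_images(docx_bytes: bytes) -> dict[str, bytes]:
    """Map each embedded image name to its raw bytes."""
    names = list_embedded_images(docx_bytes)
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return {name: package.read(name) for name in names}
```

Mapping each extracted image back to its position in the text would still require following the relationship ids in `word/document.xml`, which this sketch does not attempt.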
[BinaryMath.docx](/uploads/73878e990b94ef590ae6e0e7c868a4ab/BinaryMath.docx)
Error message on import into Kotahi:
![Screenshot_2022-05-31_at_09.24.38](/uploads/b77b5e296378ad792cace63249b462ce/Screenshot_2022-05-31_at_09.24.38.png)
Wax content view:
![Screenshot_2022-05-31_at_11.54.33](/uploads/d14c37fc4857ac9ae99bd6f8ad81197b/Screenshot_2022-05-31_at_11.54.33.png)
Bharathydasan

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/180
all paragraphs come in as H2s
2023-02-21T05:30:35Z
Dan Visel

The attached .DOCX is a Kotahi test file; it comes in with all of its paragraphs as H2s (see screenshot of source inside of Kotahi). The same thing happens if I run it through http://pdf2html.cloud68.co/, which makes me think that this is XSweet – there's something about the file (the source of which I don't know) that's encoded incorrectly.
I don't have MS Word on my computer, but opening it up in Mac TextEdit shows some weirdness – all paragraphs are right-aligned, which is clearly incorrect. If I open it in Apple Pages, it looks more or less how I would expect it to.
This particular file isn't very important, but because Kotahi is processing a lot of Word docs coming from strange sources, we sometimes run into bugs that feel similar. (Most recently: display math is incorrectly coming in as H4s.) I don't know what they did to the DOCX to make it behave this way, though it would be nice if we could handle it?
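One way to see what a file like this declares is to tally the paragraph styles and alignments in `word/document.xml`. The Python sketch below does only that. It does not reproduce whatever logic XSweet uses to decide on headings, and the function name is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tally_paragraph_properties(docx_bytes: bytes) -> dict[str, Counter]:
    """Count the w:pStyle and w:jc values declared on each paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    styles, alignments = Counter(), Counter()
    for paragraph in root.iter(f"{{{W}}}p"):
        style = paragraph.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        jc = paragraph.find(f"{{{W}}}pPr/{{{W}}}jc")
        # Paragraphs with no explicit value are counted under "(none)".
        styles[style.get(f"{{{W}}}val") if style is not None else "(none)"] += 1
        alignments[jc.get(f"{{{W}}}val") if jc is not None else "(none)"] += 1
    return {"styles": styles, "alignments": alignments}
```

If nearly every paragraph reports the same style or a `right` alignment, that would at least confirm the oddity is in the source file rather than introduced during conversion.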
[BodyMass.docx](/uploads/bcb5a0e6d8b6d92cf7c54875ac904f05/BodyMass.docx)
![Screen_Shot_2022-09-26_at_12.27.32_PM](/uploads/6983dda40058a13cc67808fc64f180d0/Screen_Shot_2022-09-26_at_12.27.32_PM.png)

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/181
MathML equations are not converted correctly
2023-09-13T12:30:42Z
Ryan Dix-Peek

Here's what I see in the original doc: instead of misrepresenting a `<p>` as an `<h4>` (following an empty `<h4>`), what's actually happening is that we have a `<p>` turning into nested `<h4><h4>`.
Useful links:
- Issue description, fixes and examples: https://gitlab.coko.foundation/kotahi/kotahi/-/issues/1023
- Testing feedback: https://gitlab.coko.foundation/kotahi/kotahi/-/issues/1023#note_107694

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/182
paragraphs don't come in with tags
2023-03-01T05:14:29Z
Dan Visel

We have a Word doc we're ingesting in Kotahi that's not behaving as it should – there's a list of maybe 100 citations at the end of the doc (all are regular paragraphs), and the first 15 come in correctly, then the last 85 all come in as a single paragraph. I ran the document through http://pdf2html.cloud68.co to see what XSweet was doing and noticed this:
![Screenshot_2023-02-22_at_5.03.24_PM](/uploads/e63a1ff25a0f28fafed8829989db8e2f/Screenshot_2023-02-22_at_5.03.24_PM.png)
The doc on the left looks correct, but if you look at the source on the right, the citations that begin "Buxton" and "Chen" don't have any tag around them at all. The browser's displaying them as paragraphs because they're between other block-level elements. Kotahi/Wax expects every run of text to sit inside a block-level tag, so when these come in without one the results are inconsistent.
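To find every unwrapped citation without reading the source by eye, something like the following Python sketch could scan the converted HTML for text that has no block-level ancestor. The tag sets and names are my own assumptions, not anything taken from XSweet or Wax.

```python
from html.parser import HTMLParser

# Assumption: these are the tags treated as acceptable block-level wrappers.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "div",
              "blockquote", "pre", "td", "th", "figcaption"}
# Text inside these is not document content, so it is skipped.
IGNORED_TAGS = {"head", "script", "style"}
# Void elements never get a closing tag, so they are not tracked.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class BareTextFinder(HTMLParser):
    """Collect text nodes that have no block-level ancestor."""

    def __init__(self):
        super().__init__()
        self._open = []       # stack of currently open tags
        self.bare_text = []   # text found outside any block wrapper

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self._open:
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if any(t in BLOCK_TAGS or t in IGNORED_TAGS for t in self._open):
            return
        self.bare_text.append(text)


def find_bare_text(html: str) -> list[str]:
    finder = BareTextFinder()
    finder.feed(html)
    finder.close()
    return finder.bare_text
```

Run against the converter output, this would return the "Buxton" and "Chen" citations along with any others that lost their wrapper. Text inside a stray inline element such as `<a>` is reported too.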
One thing I notice while looking at the list of citations: almost every citation contains a hyperlink. Six don't; four of those don't get paragraph tags. Aside from the hyperlinks, the citations appear to have no formatting.
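The hyperlink observation could be checked against the source by listing which paragraphs in `word/document.xml` contain no `w:hyperlink` element. This is a Python sketch with a hypothetical function name. It only looks at `w:hyperlink` elements, so links stored as field codes would be missed.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs_without_hyperlinks(document_xml: str) -> list[str]:
    """Return the text of each w:p that contains no w:hyperlink element."""
    root = ET.fromstring(document_xml)
    found = []
    for paragraph in root.iter(f"{{{W}}}p"):
        if paragraph.find(f".//{{{W}}}hyperlink") is None:
            # Join all runs so the citation reads as one string.
            text = "".join(t.text or "" for t in paragraph.iter(f"{{{W}}}t"))
            found.append(text)
    return found
```

Comparing this list with the citations that arrive untagged would show whether the missing hyperlink is actually what separates them.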
Here's how Word's XML is marking up three paragraphs: the first and the last come in normally, the middle one doesn't get a paragraph tag:
```xml
<w:p w:rsidR="00A07C5C" w:rsidRPr="00361E5B" w:rsidRDefault="00A07C5C" w:rsidP="0077453B">
<w:pPr>
<w:pStyle w:val="BodyText"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
<w:spacing w:after="14" w:line="360" w:lineRule="auto"/>
<w:ind w:left="454" w:hanging="454"/>
<w:jc w:val="both"/>
<w:rPr>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>Bush, K., Zhou, S., Cisler, J., Bian, J., Hazaroglu, O., Gillispie, K., Yoshigoe, K., Kilts, C., 2015. A deconvolution-based approach to identifying large-scale effective connectivity. Magnetic Resonance Imaging 33, 1290</w:t>
</w:r>
<w:r w:rsidR="00EE6F93" w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>–</w:t>
</w:r>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>1298. doi:</w:t>
</w:r>
<w:hyperlink r:id="rId33" w:history="1">
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="Hyperlink"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:sz w:val="18"/>
<w:szCs w:val="18"/>
<w:u w:val="none"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>10.1016/j.mri.2015.07.015</w:t>
</w:r>
</w:hyperlink>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>.</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00A07C5C" w:rsidRPr="00361E5B" w:rsidRDefault="00A07C5C" w:rsidP="0077453B">
<w:pPr>
<w:pStyle w:val="BodyText"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
<w:spacing w:after="14" w:line="360" w:lineRule="auto"/>
<w:ind w:left="454" w:hanging="454"/>
<w:jc w:val="both"/>
<w:rPr>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>Buxton, R.B., Wong, E.C., Frank, L.R., 1998. Dynamics of blood flow and oxygenation changes during brain activation: the balloon model. Magnetic resonance in medicine 39, 855</w:t>
</w:r>
<w:r w:rsidR="00EE6F93" w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>–</w:t>
</w:r>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>864.</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00A07C5C" w:rsidRPr="00361E5B" w:rsidRDefault="00A07C5C" w:rsidP="0077453B">
<w:pPr>
<w:pStyle w:val="BodyText"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
<w:spacing w:after="14" w:line="360" w:lineRule="auto"/>
<w:ind w:left="454" w:hanging="454"/>
<w:jc w:val="both"/>
<w:rPr>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>Caballero-Gaudes, C., Moia, S., Panwar, P., Bandettini, P.A., Gonzalez-Castillo, J., 2019. A deconvolution algorithm for multi-echo functional MRI: Multi-echo sparse paradigm free mapping. NeuroImage 202, 116081. doi:</w:t>
</w:r>
<w:hyperlink r:id="rId34" w:history="1">
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="Hyperlink"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:sz w:val="18"/>
<w:szCs w:val="18"/>
<w:u w:val="none"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>10.1016/j.neuroimage.2019.116081</w:t>
</w:r>
</w:hyperlink>
<w:r w:rsidRPr="00361E5B">
<w:rPr>
<w:rStyle w:val="BodyTextChar1"/>
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/>
<w:color w:val="000000" w:themeColor="text1"/>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF" w:themeFill="background1"/>
</w:rPr>
<w:t>.</w:t>
</w:r>
</w:p>
```

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/184
Backslashes in math equations are not being converted correctly
2023-10-02T12:15:31Z
Ryan Dix-Peek

What seems to be happening is that backslashes are getting escaped. For example, `\sin x` turns into `\\sin x`. There were some fixes recently integrated into Kotahi via [this MR](https://gitlab.coko.foundation/kotahi/kotahi/-/merge_requests/987) that were addressing this, but this should be handled during conversion.
Example: https://kotahi.kotahidev.cloud68.co/kotahi/versions/7fb2d295-046a-40fc-8faf-13de3d3a5b10/decision
![Screenshot_2023-10-02_at_08.56.22](/uploads/9c99571b3f070476513dfc24040af287/Screenshot_2023-10-02_at_08.56.22.png)
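For illustration only, a downstream workaround for the doubling described above might look like the Python sketch below. It is not the code from the linked MR and not the conversion-side fix this issue asks for, and the function name is hypothetical.

```python
def collapse_doubled_backslashes(tex: str) -> str:
    """Turn each doubled backslash back into a single one.

    Caveat: a legitimate double backslash (a LaTeX line break) is
    collapsed as well, which is one reason a fix at conversion time
    is preferable to post-processing like this.
    """
    return tex.replace("\\\\", "\\")


# Raw strings keep the backslashes literal: two in, one out.
restored = collapse_doubled_backslashes(r"\\sin x")
```

Here `restored` is the single-backslash form `\sin x`.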