
fix(*): handle double backslashes in math in uploaded HTML

Dan Visel requested to merge xsweet-math-fixes into main

As noted in #1433, math was sometimes coming out of the new XSweet container with doubled backslashes. When the HTML is treated as a JavaScript string, an extra backslash can be inserted before each LaTeX command (for example, `\frac` becomes `\\frac`). This MR strips the extra backslash from double-backslashed LaTeX. The fix works with both the problematic Amnet document and the Russian document we had a few weeks back.
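To illustrate the failure mode and the cleanup, here is a minimal sketch. It is not the regexes from this MR. It assumes one plausible way the doubling happens, namely a string being JSON-serialized (which escapes `\` as `\\`) and then used as-is, and the helper name `collapseDoubledBackslashes` is hypothetical. The regex only collapses a doubled backslash when a letter follows it, so genuine LaTeX line breaks such as `\\ ` or `\\[2pt]` are left alone.

```typescript
// Hypothetical helper, not the regexes used in this MR.
// Collapses "\\" to "\" when it directly precedes a command name,
// e.g. "\\frac" -> "\frac". A doubled backslash followed by a space or "["
// (a real LaTeX line break) is not touched.
function collapseDoubledBackslashes(tex: string): string {
  // In a replacement *string*, "\\" is a single backslash; only "$" is special.
  return tex.replace(/\\\\(?=[A-Za-z])/g, "\\");
}

// One assumed way doubling can arise: JSON.stringify escapes each backslash,
// and the quoted result gets used as if it were the original text.
const original = String.raw`\frac{a}{b} + \sqrt{x}`;
const doubled = JSON.stringify(original).slice(1, -1); // strip the surrounding quotes

console.log(doubled);                             // \\frac{a}{b} + \\sqrt{x}
console.log(collapseDoubledBackslashes(doubled)); // \frac{a}{b} + \sqrt{x}
```

One trade-off: a real line break written directly against a letter (e.g. `a\\b`) would be misread as a doubled `\b` command, so a stricter pattern may be needed if such input shows up.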

I strongly suspect that this is coming out of XSweet itself: we see the same results if we run files through http://pdf2html.cloud68.co with "Skip mml2tex" turned off. Part of the problem seems to be that, both on the server side (in the XSweet container) and on the client side, we are sometimes running the HTML through Cheerio, which escapes the math. More generally, LaTeX embedded in HTML is always likely to cause problems because of this kind of escaping and unescaping mismatch.
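A small sketch of why escape/unescape mismatches corrupt LaTeX in HTML. The `escapeHtml` and `unescapeHtml` functions below are hypothetical stand-ins, not Cheerio's actual serializer. The point is that escaping is not idempotent: one escape pass followed by one unescape pass round-trips cleanly, but an extra escape pass (for example, HTML going through a parse/serialize step twice) leaves `&amp;` inside the TeX that MathJax eventually sees.

```typescript
// Hypothetical minimal HTML escaping, for illustration only (not Cheerio's).
function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Reverse mapping; "&amp;" is decoded last so "&amp;lt;" is not decoded twice.
function unescapeHtml(s: string): string {
  return s.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&");
}

// "&" is LaTeX's column separator, so matrices are an easy casualty.
const tex = String.raw`\begin{matrix} a & b \end{matrix}`;

const roundTrip = unescapeHtml(escapeHtml(tex));              // matches tex
const escapedTwice = escapeHtml(escapeHtml(tex));             // contains "&amp;amp;"
const brokenTex = unescapeHtml(escapedTwice);                 // still contains "&amp;"

console.log(roundTrip);
console.log(brokenTex);
```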

If we could get MathML out of XSweet instead of TeX, we could in principle convert the MathML to LaTeX for MathJax only when it is needed, which would avoid this problem. Of course, MathML going through Cheerio might have its own issues, since it is a different flavor of XML.

(thanks to Ben for the great regexes here!)
