XSweet issues
https://gitlab.coko.foundation/XSweet/XSweet/-/issues
2018-10-10T04:56:20Z

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/158
Some more Word detritus
2018-10-10T04:56:20Z
Wendell Piez

To catch WordML elements so far unaccounted for, we should consider the following matching and whether there isn't info to be captured, e.g. from `caps` or `highlight`. The rest should be cleaned up in a "scrub" phase:
```
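<!-- Drop these wrappers but keep their content. -->
<xsl:template match="noProof | iCs">
  <xsl:apply-templates/>
</xsl:template>

<!-- Keep these as spans, classed by element name, until we decide what to capture. -->
<xsl:template match="caps | spacing | highlight | webHidden">
  <span class="{local-name()}">
    <xsl:apply-templates/>
  </span>
</xsl:template>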
```

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/143
Remove <spacing> element
2018-07-27T04:13:07Z
Alex Theg

In Prado, Ch 1, some paragraphs are composed of very small snippets of text enclosed by `<spacing>` tags.
The `<spacing>` tags should be removed, joining the text inside and outside of them into one string.
```html
<p style="margin-left: 5pt; margin-right: 2.15pt; margin-top: 0.5pt; text-indent: 36pt">
<spacing>Man</spacing>u
<spacing>e</spacing>l
<spacing> B</spacing>o
<spacing>telh</spacing>o
<spacing> </spacing>de
<spacing> Lacer</spacing>da
<spacing> </spacing>w
<spacing>a</spacing>s
...
```

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/142
CSS for hanging paragraphs
2018-08-07T14:24:43Z
Alex Theg

XSweet extracts regular paragraph indentation from Word into CSS correctly, but it needs a tweak to how it handles hanging paragraphs.
Indentation without hanging works great:
One indent no hanging: `<w:ind w:left="720"/>` -> `<p style="margin-left: 36pt">`
Two indent no hanging: `<w:ind w:left="1440"/>` -> `<p style="margin-left: 72pt">`
But the indentation with hanging needs another CSS property to be correct:
One indent hanging:
`<w:ind w:left="1440" w:hanging="720"/>` -> `<p style="padding-left: 36pt; text-indent: -36pt">`
It needs a `margin-left: 36pt;` added in addition to what's already there to be correct.
Two indent hanging:
`<w:ind w:left="2160" w:hanging="720"/>` -> `<p style="padding-left: 36pt; text-indent: -36pt">`
It needs a `margin-left: 72pt;` added in addition to what's already there and then it's correct.
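
The arithmetic behind these cases can be sketched as follows. This is a minimal Python illustration, not XSweet's XSLT: `hanging_indent_css` is a hypothetical helper, and it assumes `w:left` and `w:hanging` are in twentieths of a point (720 = 36pt, as in the examples above) and that the missing margin is `w:left` minus `w:hanging`.

```python
def hanging_indent_css(left_twips: int, hanging_twips: int = 0) -> str:
    """Build an inline CSS string from w:ind values given in twips (1/20 pt).

    Hypothetical helper for illustration only; not part of XSweet.
    """
    def pt(twips: int) -> str:
        # 20 twips = 1pt; ":g" drops a trailing ".0" (36.0 -> "36")
        return f"{twips / 20:g}pt"

    if hanging_twips == 0:
        # Plain indent: the whole w:left value becomes the margin.
        return f"margin-left: {pt(left_twips)}"

    # Hanging indent: the first line starts at (left - hanging),
    # and the remaining lines are pushed in by the hanging amount.
    return (
        f"margin-left: {pt(left_twips - hanging_twips)}; "
        f"padding-left: {pt(hanging_twips)}; "
        f"text-indent: -{pt(hanging_twips)}"
    )


print(hanging_indent_css(720))
print(hanging_indent_css(1440, 720))
print(hanging_indent_css(2160, 720))
```

With this split, the existing `padding-left` and `text-indent` values stay as they are and only the `margin-left` term is new.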
Here's a test docx: [hanging.docx](/uploads/459ecfb10d4e6c42caf16f4983c52142/hanging.docx)
1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/140
Formatting issues with nested spans and Word styles
2018-05-01T13:52:44Z
Alex Theg

This is somewhat related to #131
[small_caps_example.docx](/uploads/c5e9961e5f3248e8d6ace6c892d97048/small_caps_example.docx)
In the attached example, "Acknowledgements" comes through in bold and small caps but it should not - it uses the Word style "BookTitle + Not Bold, Not Small caps". It looks like this is a question of nested spans, the priority in which the formatting is resolved, and how Word style modifiers are extracted into the html.
Here's the html after the join step:
```html
<p style="margin-bottom: 0pt">
<span style="font-variant: normal; font-weight: bold">
<span class="BookTitle">
<span style="font-weight: normal">Acknowledgements</span>
</span>
</span>
<a class="bookmarkStart" id="docx-bookmark_0">
<!-- bookmark ='_GoBack'-->
</a>
<a href="#docx-bookmark_0">
<!-- bookmark end -->
</a>
</p>
```
Here it is after the collapse step. At this point, I believe the innermost span's `font-weight: normal` should have been passed to the outer `class="BookTitle"` span, but it is not:
```html
<p style="margin-bottom: 0pt">
<span style="font-variant: normal; font-weight: bold">
<span class="BookTitle">Acknowledgements</span>
</span>
<a class="bookmarkStart" id="docx-bookmark_0">
<!-- bookmark ='_GoBack'-->
</a>
<a href="#docx-bookmark_0">
<!-- bookmark end -->
</a>
</p>
```
And here is the final rinsed html:
```html
<h2 style="margin-bottom: 0pt">
<span style="font-variant: normal; font-weight: bold">
<span class="BookTitle">Acknowledgements</span>
</span>
<a class="bookmarkStart" id="docx-bookmark_0"><!-- bookmark ='_GoBack'--></a>
<a href="#docx-bookmark_0"><!-- bookmark end --></a>
</h2>
```
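
One way to picture the expected resolution for the nested spans above: walk the spans from outermost to innermost, let later inline declarations override earlier ones, and put the merged style on the single span that carries the class. The Python below is only a sketch of that rule; `parse_style` and `collapse_spans` are hypothetical names, and XSweet's actual collapse step is XSLT and may resolve priorities differently.

```python
def parse_style(style: str) -> dict:
    """Split an inline style string into a {property: value} dict."""
    decls = {}
    for part in style.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            decls[prop.strip()] = value.strip()
    return decls


def collapse_spans(spans):
    """Merge a chain of nested spans into one (class, style) pair.

    `spans` lists (class_name_or_None, style_string) from outermost
    to innermost; inner declarations override outer ones.
    """
    classes = []
    merged = {}
    for class_name, style in spans:
        if class_name:
            classes.append(class_name)
        merged.update(parse_style(style))  # inner wins on conflicts
    style_out = "; ".join(f"{prop}: {value}" for prop, value in merged.items())
    return " ".join(classes), style_out


# The three nested spans from the join-step output above.
print(collapse_spans([
    (None, "font-variant: normal; font-weight: bold"),
    ("BookTitle", ""),
    (None, "font-weight: normal"),
]))
```

Because the merged declarations sit inline on the same element as `class="BookTitle"`, they are no longer overridden by the class rule the way an inherited value from an outer span would be.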
Also, the `font-variant: normal` needs to be passed down to the innermost span, or else it's clobbered by the `BookTitle` styling on the innermost span.
1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/132
Invisible bib entry in Horton visible in HTML
2019-07-07T23:05:58Z
Alex Theg

How come? For Alex to investigate.
"U.S. Dept. of Labor. 2006. Census of Fatal Occupational Injuries."
XML:
```xml
<w:p w14:paraId="59927B05" w14:textId="77777777" w:rsidR="00DA5911" w:rsidRPr="00DA5911" w:rsidRDefault="00DA5911" w:rsidP="00DA5911">
<w:pPr>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
<w:spacing w:line="0" w:lineRule="auto"/>
<w:rPr>
<w:rFonts w:ascii="ff6" w:eastAsia="Times New Roman" w:hAnsi="ff6" w:cs="Times New Roman"/>
<w:color w:val="231F20"/>
<w:sz w:val="102"/>
<w:szCs w:val="102"/>
</w:rPr>
</w:pPr>
<w:proofErr w:type="gramStart"/>
<w:r w:rsidRPr="00DA5911">
<w:rPr>
<w:rFonts w:ascii="ff6" w:eastAsia="Times New Roman" w:hAnsi="ff6" w:cs="Times New Roman"/>
<w:color w:val="231F20"/>
<w:sz w:val="102"/>
<w:szCs w:val="102"/>
</w:rPr>
<w:t>U.S. Dept. of Labor.</w:t>
</w:r>
<w:proofErr w:type="gramEnd"/>
<w:r w:rsidRPr="00DA5911">
<w:rPr>
<w:rFonts w:ascii="ff6" w:eastAsia="Times New Roman" w:hAnsi="ff6" w:cs="Times New Roman"/>
<w:color w:val="231F20"/>
<w:sz w:val="102"/>
<w:szCs w:val="102"/>
</w:rPr>
<w:t xml:space="preserve">2006. Census of Fatal Occupational Injuries.</w:t>
</w:r>
</w:p>
<w:p w14:paraId="63865577" w14:textId="77777777" w:rsidR="00DA5911" w:rsidRPr="00DA5911" w:rsidRDefault="00DA5911" w:rsidP="00DA5911">
<w:pPr><w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/><w:spacing w:line="0" w:lineRule="auto"/>
<w:rPr><w:rFonts w:ascii="ff6" w:eastAsia="Times New Roman" w:hAnsi="ff6" w:cs="Times New Roman"/><w:color w:val="231F20"/><w:sz w:val="102"/><w:szCs w:val="102"/></w:rPr>
</w:pPr>
<w:r w:rsidRPr="00DA5911">
<w:rPr><w:rFonts w:ascii="ff6" w:eastAsia="Times New Roman" w:hAnsi="ff6" w:cs="Times New Roman"/><w:color w:val="231F20"/><w:sz w:val="102"/><w:szCs w:val="102"/></w:rPr>
<w:t xml:space="preserve">Washington, D.C., Bureau of Labor Statistics.
</w:t>
</w:r>
</w:p>
```
HTML:
```html
<p>
<span style="font-family: ff6; color: #231F20; font-size: 51pt">U.S. Dept. of Labor.</span>
<span style="font-family: ff6; color: #231F20; font-size: 51pt"> 2006. Census of Fatal Occupational Injuries.</span>
</p>
<p>
<span style="font-family: ff6; color: #231F20; font-size: 51pt">Washington, D.C., Bureau of Labor Statistics. </span>
</p>
```
1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/105
Incorrect fonts in html - coming from w:rFonts attributes?
2018-04-24T05:21:50Z
Alex Theg

Brinton Ch 8 has some incorrect fonts coming through into the html. The following text is all Times in Word:
>Even though ‘ulama’ like Qaradawi assume that images...
However, it comes through in the rinsed html in 3 different fonts: Times, Menlo Regular, and Helvetica. It looks like it has to do with the `w:rFonts` attributes: `w:cs`, `w:eastAsia`, `w:ascii` and `w:hAnsi`. These specify the font to use for certain character types.
The word "Qaradawi" is extracted as Helvetica:
```xml
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:eastAsia="Helvetica"/>
</w:rPr>
<w:t>Qaradawi</w:t>
</w:r>
```
And " assume that " is extracted as Menlo Regular:
```xml
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:eastAsia="Helvetica" w:cs="Menlo Regular"/>
</w:rPr>
<w:t xml:space="preserve"> assume that</w:t>
</w:r>
```
The html doesn't specify different fonts for different character types in the same way. How does XSweet handle these `w:rFonts` attributes? Since this displays in the original Word as all Times, I am guessing that Word doesn't consider any of the characters in these runs to be of the type `w:eastAsia` or `w:cs`, but I'm not sure how it decides what kind of character it's looking at. Do you have a better idea what's going on here?
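
To make the question concrete, here is a heavily simplified Python sketch of per-character-type font selection. It is an assumption-laden illustration, not Word's real algorithm and not XSweet code: `font_for_run` is a hypothetical helper that uses `w:ascii` for pure-ASCII text, `w:hAnsi` otherwise, and never picks `w:eastAsia` or `w:cs`, since deciding those would need Unicode range tables that are left out here.

```python
from typing import Optional


def font_for_run(text: str, rfonts: dict) -> Optional[str]:
    """Pick a font name for a run, or None to inherit from the paragraph.

    `rfonts` holds the w:rFonts attributes without the prefix, e.g.
    {"eastAsia": "Helvetica", "cs": "Menlo Regular"}.
    Simplification: only w:ascii and w:hAnsi are ever applied.
    """
    if all(ord(ch) < 0x80 for ch in text):
        return rfonts.get("ascii")
    return rfonts.get("hAnsi")


# Runs from the example: neither sets w:ascii or w:hAnsi,
# so under this rule no font-family would be emitted for them.
print(font_for_run("Qaradawi", {"eastAsia": "Helvetica"}))
print(font_for_run(" assume that", {"eastAsia": "Helvetica", "cs": "Menlo Regular"}))
```

Under this rule the two problem runs would inherit their font rather than being tagged Helvetica or Menlo Regular, which matches what Word displays.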
Here's the full XML:
```xml
<w:p w14:paraId="3E8B35BD" w14:textId="77777777" w:rsidR="00DE7EE7" w:rsidRPr="009337E2" w:rsidRDefault="00DE7EE7" w:rsidP="00DE7EE7">
<w:pPr><w:widowControl w:val="0"/>
<w:tabs><w:tab w:val="left" w:pos="560"/><w:tab w:val="left" w:pos="1120"/><w:tab w:val="left" w:pos="1680"/><w:tab w:val="left" w:pos="2240"/><w:tab w:val="left" w:pos="2800"/><w:tab w:val="left" w:pos="3360"/><w:tab w:val="left" w:pos="3920"/><w:tab w:val="left" w:pos="4480"/><w:tab w:val="left" w:pos="5040"/><w:tab w:val="left" w:pos="5600"/><w:tab w:val="left" w:pos="6160"/><w:tab w:val="left" w:pos="6720"/></w:tabs><w:autoSpaceDE w:val="0"/><w:autoSpaceDN w:val="0"/><w:adjustRightInd w:val="0"/><w:spacing w:line="480" w:lineRule="auto"/>
<w:rPr><w:rFonts w:cs="Times"/></w:rPr>
</w:pPr>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:eastAsia="Helvetica" w:cs="Times New Roman"/>
<w:color w:val="000000"/>
<w:szCs w:val="20"/>
</w:rPr>
<w:tab/>
</w:r>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:cs="Times"/>
</w:rPr>
<w:t xml:space="preserve">Even though</w:t>
</w:r>
<w:r w:rsidR="00BA3E1D">
<w:rPr>
<w:rFonts w:cs="Times"/>
</w:rPr>
<w:t>‘ulama’</w:t>
</w:r>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:cs="Times"/>
</w:rPr>
<w:t xml:space="preserve"> like </w:t>
</w:r>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:eastAsia="Helvetica"/>
</w:rPr>
<w:t>Qaradawi</w:t>
</w:r>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:eastAsia="Helvetica" w:cs="Menlo Regular"/>
</w:rPr>
<w:t xml:space="preserve"> assume that</w:t>
</w:r>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:cs="Times"/>
</w:rPr>
<w:t xml:space="preserve"> images of certain objects </w:t>
</w:r>
```
Here's how it's extracted:
```html
<p>
<span style="font-family: Times New Roman"><tab/></span>
<span style="font-family: Times">Even though </span>
<span style="font-family: Times">‘ulama’</span>
<span style="font-family: Times"> like </span>
<span style="font-family: Helvetica">Qaradawi</span>
<span style="font-family: Menlo Regular"> assume that</span>
<span style="font-family: Times"> images of certain objects </span>
```
And here's the final html:
```html
<p><span class="tab"><!-- tab --></span>
<span style="font-family: Times">Even though ‘ulama’ like </span>
<span style="font-family: Helvetica">Qaradawi</span>
<span style="font-family: Menlo Regular"> assume that</span>
<span style="font-family: Times"> images of certain objects
```
1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/77
Chapter headings come through in blue
2017-08-16T19:56:39Z
Alex Theg

In Buchbinder's book, all of the chapter headings come through into the HTML with blue coloring. It seems to be caused by a `style="color: auto"` attribute. The headings display as black in Word, but light blue in browsers (Chrome and Firefox).
Here are examples of what's causing it:
```html
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Introduction</h1>
```
```html
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Chapter One </h1>
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">The Bottom of the Funnel </h1>
```
```html
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Chapter Three </h1>
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Sticky Brains </h1>
```
```html
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Chapter Four </h1>
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Treating the Family</h1>
```
```html
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Chapter Five </h1>
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Locating Pain in Societal Stress</h1>
```
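
If the fix is simply to stop emitting `color: auto`, the filtering could look like the Python sketch below. `drop_auto_color` is a hypothetical helper written for illustration; the real change would belong in XSweet's XSLT.

```python
def drop_auto_color(style: str) -> str:
    """Remove a `color: auto` declaration from an inline style string."""
    kept = []
    for decl in style.split(";"):
        decl = decl.strip()
        if not decl:
            continue
        prop, _, value = decl.partition(":")
        # Only the exact `color` property is dropped, not e.g. background-color.
        if prop.strip().lower() == "color" and value.strip().lower() == "auto":
            continue
        kept.append(decl)
    return "; ".join(kept)


print(drop_auto_color(
    "color: auto; font-family: Garamond; font-size: 14pt; "
    "font-weight: bold; margin-bottom: 0pt"
))
```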
I'll mark this as low priority, since this is an html-only improvement. The color gets scrubbed out by typescript.
Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/62
Handling whitespace-only formatting
2019-07-07T21:34:20Z
Alex Theg

Bakker ch1, see #56 for files
There are 5 headers of the same level, but one of them doesn't get promoted like the others. It seems to be caused by a `<tab>` at the end of the heading "The heroic migrant and the end of migration".
These all get promoted to h1:
* Keeping the monies flowing the times of crises
* The limits of migrant inclusion
* Migration, state-led transnationalism, and development
* The Washington Consensus and beyond: the continuing significance of market fundamentalism in development policy and practice
This one doesn't:
* The heroic migrant and the end of migration
Here's one that gets promoted, just after join-elements and before the header promotion steps:
````html
<p class="Default" style="font-size: 12pt; font-style: italic; margin-bottom: 6pt"><i>The limits of migrant inclusion</i></p>
````
This is the offending tab (at least, I think it's the tab keeping this from being recognized as a header):
````html
<p class="Default" style="font-size: 12pt; margin-bottom: 6pt"><i>The heroic migrant and the end of migration</i>
<tab/>
</p>
````
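
Assuming the trailing `<tab/>` really is what blocks promotion, removing it before the promotion step could be sketched like this in Python. `strip_trailing_tabs` is a hypothetical function using the standard `xml.etree.ElementTree` module; it is not part of the XSweet pipeline, which would do this in XSLT.

```python
import xml.etree.ElementTree as ET


def strip_trailing_tabs(p: ET.Element) -> ET.Element:
    """Remove empty <tab/> children at the end of `p`.

    A tab counts as trailing only if nothing but whitespace follows it.
    """
    children = list(p)
    while children:
        last = children[-1]
        is_empty_tab = last.tag == "tab" and len(last) == 0 and not (last.text or "").strip()
        only_whitespace_after = not (last.tail or "").strip()
        if not (is_empty_tab and only_whitespace_after):
            break
        p.remove(last)
        children.pop()
    return p


para = ET.fromstring(
    '<p class="Default" style="font-size: 12pt; margin-bottom: 6pt">'
    "<i>The heroic migrant and the end of migration</i>\n<tab/>\n</p>"
)
strip_trailing_tabs(para)
print([child.tag for child in para])
```

A tab that is followed by text, such as a leading tab in a paragraph, is left alone.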
Perhaps a cleaning step that strips out trailing tabs before promotion? I can't think where a trailing tab would ever be meaningful.
1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/42
Handle highlighting
2020-06-03T15:08:18Z
Alex Theg

Opening this issue because I'm looking at an example, but I'm going to put it on hold for now as it's a low priority.
From Green, Ch 1, "Fig. 6 about here" is highlighted green in Word, and comes through as a highlight tag in the HTML, but does not actually appear in the HTML as highlighted:
```html
<p style="font-weight: bold; font-size: 18pt">
<highlight>[Fig. 6 about here.]</highlight>
</p>
```
1. Should we try to catch highlighting?
2. If so, do we care about preserving the original color?
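
If the answer to both questions is yes, one option is to turn the highlight into a `background-color`. The Python sketch below only illustrates the idea: `highlight_span` is a hypothetical helper, the hex values in `HIGHLIGHT_TO_CSS` are assumed renderings of a few Word highlight names, and unknown names fall back to a classed span without a color.

```python
from html import escape

# Assumed CSS equivalents for a few Word highlight names (not exhaustive).
HIGHLIGHT_TO_CSS = {
    "yellow": "#FFFF00",
    "green": "#00FF00",
    "cyan": "#00FFFF",
    "magenta": "#FF00FF",
}


def highlight_span(text: str, color_name: str) -> str:
    """Wrap `text` in a span that preserves the highlight where the color is known."""
    css_color = HIGHLIGHT_TO_CSS.get(color_name)
    if css_color is None:
        # Unknown color: keep the fact that it was highlighted, drop the color.
        return f'<span class="highlight">{escape(text)}</span>'
    return f'<span style="background-color: {css_color}">{escape(text)}</span>'


print(highlight_span("[Fig. 6 about here.]", "green"))
```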