XSweet issues
https://gitlab.coko.foundation/groups/XSweet/-/issues
2018-05-29T01:38:51Z

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/7
Hyperlink inferrer tags some file names as links that shouldn't be
2018-05-29T01:38:51Z
Alex Theg

In an author docx that lists captions for images to be included in the book, these strings get linked as hyperlinks:
* 04_IntroTodosSantos.jpg
* 13_ClinicScale_20R06.jpg
They shouldn't be, since they don't point to anything. Could we add a small adjustment to be sure things like this don't get linked? Perhaps a validation like "must have at least one slash if it ends in a file extension" or something similar?

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/132
Invisible bib entry in Horton visible in HTML
2019-07-07T23:05:58Z
Alex Theg

How come? For Alex to investigate.
"U.S. Dept. of Labor. 2006. Census of Fatal Occupational Injuries."
XML:
```xml
<w:p w14:paraId="59927B05" w14:textId="77777777" w:rsidR="00DA5911" w:rsidRPr="00DA5911" w:rsidRDefault="00DA5911" w:rsidP="00DA5911">
<w:pPr>
<w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
<w:spacing w:line="0" w:lineRule="auto"/>
<w:rPr>
<w:rFonts w:ascii="ff6" w:eastAsia="Times New Roman" w:hAnsi="ff6" w:cs="Times New Roman"/>
<w:color w:val="231F20"/>
<w:sz w:val="102"/>
<w:szCs w:val="102"/>
</w:rPr>
</w:pPr>
<w:proofErr w:type="gramStart"/>
<w:r w:rsidRPr="00DA5911">
<w:rPr>
<w:rFonts w:ascii="ff6" w:eastAsia="Times New Roman" w:hAnsi="ff6" w:cs="Times New Roman"/>
<w:color w:val="231F20"/>
<w:sz w:val="102"/>
<w:szCs w:val="102"/>
</w:rPr>
<w:t>U.S. Dept. of Labor.</w:t>
</w:r>
<w:proofErr w:type="gramEnd"/>
<w:r w:rsidRPr="00DA5911">
<w:rPr>
<w:rFonts w:ascii="ff6" w:eastAsia="Times New Roman" w:hAnsi="ff6" w:cs="Times New Roman"/>
<w:color w:val="231F20"/>
<w:sz w:val="102"/>
<w:szCs w:val="102"/>
</w:rPr>
<w:t xml:space="preserve">2006. Census of Fatal Occupational Injuries.</w:t>
</w:r>
</w:p>
<w:p w14:paraId="63865577" w14:textId="77777777" w:rsidR="00DA5911" w:rsidRPr="00DA5911" w:rsidRDefault="00DA5911" w:rsidP="00DA5911">
<w:pPr><w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/><w:spacing w:line="0" w:lineRule="auto"/>
<w:rPr><w:rFonts w:ascii="ff6" w:eastAsia="Times New Roman" w:hAnsi="ff6" w:cs="Times New Roman"/><w:color w:val="231F20"/><w:sz w:val="102"/><w:szCs w:val="102"/></w:rPr>
</w:pPr>
<w:r w:rsidRPr="00DA5911">
<w:rPr><w:rFonts w:ascii="ff6" w:eastAsia="Times New Roman" w:hAnsi="ff6" w:cs="Times New Roman"/><w:color w:val="231F20"/><w:sz w:val="102"/><w:szCs w:val="102"/></w:rPr>
<w:t xml:space="preserve">Washington, D.C., Bureau of Labor Statistics.
</w:t>
</w:r>
</w:p>
```
HTML:
```html
<p>
<span style="font-family: ff6; color: #231F20; font-size: 51pt">U.S. Dept. of Labor.</span>
<span style="font-family: ff6; color: #231F20; font-size: 51pt"> 2006. Census of Fatal Occupational Injuries.</span>
</p>
<p>
<span style="font-family: ff6; color: #231F20; font-size: 51pt">Washington, D.C., Bureau of Labor Statistics. </span>
</p>
```

Milestone: 1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/114
Ensure endnotes appear with their numbers at the end
2017-08-16T23:29:59Z
Alex Theg

See b_04_ch_3_Bakker, and #2, where this issue was first reported:
The inline callout is fine, but the third endnote doesn’t show its number next to the text of the note at the end of the document. This is an error in the Word doc, which comes through into the html. Clicking the inline note callout ("3") in the html takes you to the correct corresponding note, but it's not labeled "3". This is because the note is missing its `<w:endnoteRef/>` in the OOXML:
```xml
<w:r>
<w:rPr><w:sz w:val="24"/><w:szCs w:val="24"/><w:vertAlign w:val="superscript"/></w:rPr><w:endnoteRef/></w:r>
<w:r>
```
All the other note references have this `<w:endnoteRef/>` tag. This element is extracted into the html as `<span class="endnoteRef">`, and that’s where the corresponding note number gets inserted.
To fix this, we could insert this `<span class="endnoteRef">` into the html in the proper place (inside `<div class="docx-endnote">` for every note) even if it doesn’t exist. Since XSweet renumbers the notes, it should have a list of the notes anyway.
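A minimal sketch of that repair, written in Python for illustration rather than as part of XSweet's XSLT pipeline. It assumes the HTML is well-formed and un-namespaced, that each note is a `<div class="docx-endnote">`, and that the marker belongs at the start of the note's first `<p>`; the function name `ensure_endnote_refs` is hypothetical.

```python
import xml.etree.ElementTree as ET


def ensure_endnote_refs(root):
    """Add an empty <span class="endnoteRef"/> to every endnote div that lacks one.

    Returns the number of markers added.
    """
    added = 0
    # Copy the matches into a list so the tree is not modified while iterating.
    for div in list(root.iter("div")):
        if div.get("class") != "docx-endnote":
            continue
        if any(span.get("class") == "endnoteRef" for span in div.iter("span")):
            continue  # this note already has its marker
        # Assumption: the marker goes first in the note's first paragraph,
        # falling back to the div itself when there is no paragraph.
        target = div.find("p")
        if target is None:
            target = div
        marker = ET.Element("span", {"class": "endnoteRef"})
        marker.tail = target.text  # keep the note text, now after the marker
        target.text = None
        target.insert(0, marker)
        added += 1
    return added
```

Running this before the step that fills in note numbers would give every note a slot for its number, whether or not the Word source carried the `<w:endnoteRef/>`.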
I'll put this on hold as a validation step that can come after 1.0.0.

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/105
Incorrect fonts in html - coming from w:rFonts attributes?
2018-04-24T05:21:50Z
Alex Theg

Brinton Ch 8 has some incorrect fonts coming through into the html. The following text is all Times in Word:
>Even though ‘ulama’ like Qaradawi assume that images...
However, it comes through in the rinsed html in 3 different fonts: Times, Menlo Regular, and Helvetica. It looks like it has to do with the `w:rFonts` attributes: `w:cs`, `w:eastAsia`, `w:ascii` and `w:hAnsi`. These specify the font to use for certain character types.
The word "Qaradawi" is extracted as Helvetica:
```xml
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:eastAsia="Helvetica"/>
</w:rPr>
<w:t>Qaradawi</w:t>
</w:r>
```
And " assume that " is extracted as Menlo Regular:
```xml
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:eastAsia="Helvetica" w:cs="Menlo Regular"/>
</w:rPr>
<w:t xml:space="preserve"> assume that</w:t>
</w:r>
```
The html doesn't specify different fonts for different character types in the same way. How does XSweet handle these `w:rFonts` attributes? Since this displays in the original Word as all Times, I am guessing that Word doesn't consider any of the characters in these runs to be of the type `w:eastAsia` or `w:cs`, but I'm not sure how it decides what kind of character it's looking at. Do you have a better idea what's going on here?
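One reading of the OOXML rules for `w:rFonts` is that the font slot is chosen per character by Unicode range: `w:ascii` for U+0000 to U+007F, `w:cs` for complex-script ranges such as Arabic and Hebrew, `w:eastAsia` for East Asian ranges, and `w:hAnsi` for the rest. The Python sketch below encodes a deliberately simplified version of that reading to test the guess above. The ranges are incomplete, it ignores `w:hint`, `w:cs`/`w:rtl` run properties and theme fonts, and the function names are hypothetical. It is not a description of how XSweet or Word actually work.

```python
def font_slot(ch):
    """Guess which w:rFonts slot applies to one character (simplified)."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ascii"
    # Hebrew, Arabic, Syriac, Thaana: a deliberately incomplete list
    if 0x0590 <= cp <= 0x07BF:
        return "cs"
    # CJK punctuation and kana, unified ideographs, Hangul syllables: also incomplete
    if 0x3000 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF or 0xAC00 <= cp <= 0xD7AF:
        return "eastAsia"
    return "hAnsi"


def fonts_in_play(text, rfonts, inherited="(inherited)"):
    """Map each slot the text actually needs to the font the run names for it.

    rfonts holds the run's w:rFonts attributes without the w: prefix.
    Slots the run does not set fall back to the inherited font.
    """
    slots = {font_slot(ch) for ch in text}
    return {slot: rfonts.get(slot, inherited) for slot in sorted(slots)}
```

Under these assumptions, `fonts_in_play("Qaradawi", {"eastAsia": "Helvetica"})` only needs the `ascii` slot, which the run does not set, so the inherited font would apply. That would be consistent with Word showing the run in Times, and would suggest deriving `font-family` from `w:ascii`/`w:hAnsi` for Latin text rather than from `w:eastAsia` or `w:cs`.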
Here's the full XML:
```xml
<w:p w14:paraId="3E8B35BD" w14:textId="77777777" w:rsidR="00DE7EE7" w:rsidRPr="009337E2" w:rsidRDefault="00DE7EE7" w:rsidP="00DE7EE7">
<w:pPr><w:widowControl w:val="0"/>
<w:tabs><w:tab w:val="left" w:pos="560"/><w:tab w:val="left" w:pos="1120"/><w:tab w:val="left" w:pos="1680"/><w:tab w:val="left" w:pos="2240"/><w:tab w:val="left" w:pos="2800"/><w:tab w:val="left" w:pos="3360"/><w:tab w:val="left" w:pos="3920"/><w:tab w:val="left" w:pos="4480"/><w:tab w:val="left" w:pos="5040"/><w:tab w:val="left" w:pos="5600"/><w:tab w:val="left" w:pos="6160"/><w:tab w:val="left" w:pos="6720"/></w:tabs><w:autoSpaceDE w:val="0"/><w:autoSpaceDN w:val="0"/><w:adjustRightInd w:val="0"/><w:spacing w:line="480" w:lineRule="auto"/>
<w:rPr><w:rFonts w:cs="Times"/></w:rPr>
</w:pPr>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:eastAsia="Helvetica" w:cs="Times New Roman"/>
<w:color w:val="000000"/>
<w:szCs w:val="20"/>
</w:rPr>
<w:tab/>
</w:r>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:cs="Times"/>
</w:rPr>
<w:t xml:space="preserve">Even though</w:t>
</w:r>
<w:r w:rsidR="00BA3E1D">
<w:rPr>
<w:rFonts w:cs="Times"/>
</w:rPr>
<w:t>‘ulama’</w:t>
</w:r>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:cs="Times"/>
</w:rPr>
<w:t xml:space="preserve"> like </w:t>
</w:r>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:eastAsia="Helvetica"/>
</w:rPr>
<w:t>Qaradawi</w:t>
</w:r>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:eastAsia="Helvetica" w:cs="Menlo Regular"/>
</w:rPr>
<w:t xml:space="preserve"> assume that</w:t>
</w:r>
<w:r w:rsidRPr="009337E2">
<w:rPr>
<w:rFonts w:cs="Times"/>
</w:rPr>
<w:t xml:space="preserve"> images of certain objects </w:t>
</w:r>
```
Here's how it's extracted:
```html
<p>
<span style="font-family: Times New Roman"><tab/></span>
<span style="font-family: Times">Even though </span>
<span style="font-family: Times">‘ulama’</span>
<span style="font-family: Times"> like </span>
<span style="font-family: Helvetica">Qaradawi</span>
<span style="font-family: Menlo Regular"> assume that</span>
<span style="font-family: Times"> images of certain objects </span>
```
And here's the final html
```html
<p><span class="tab"><!-- tab --></span>
<span style="font-family: Times">Even though ‘ulama’ like </span>
<span style="font-family: Helvetica">Qaradawi</span>
<span style="font-family: Menlo Regular"> assume that</span>
<span style="font-family: Times"> images of certain objects
```

Milestone: 1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/98
"Chapter One - " text dropped from extraction
2018-03-15T20:05:00Z
Alex Theg

Boyles Ch 1 [e._Boyles_Chapter_1.docx](/uploads/86a58da52d6b60a5568a284f3225f592/e._Boyles_Chapter_1.docx)
[e.BoylesChapter1-1EXTRACTED.html](/uploads/dee5c6517f5f46d846e33d0922fa20cb/e.BoylesChapter1-1EXTRACTED.html)
[e.BoylesChapter1-9RINSED.html](/uploads/da01d16780e6f8cc2dd7e8b02877db1c/e.BoylesChapter1-9RINSED.html)
The top level header reads "Chapter One - Introduction" in Word. But the text "Chapter One - " is not coming through from Word into html. It seems to have been implemented as some kind of list in Word, but I can't find any evidence of it in the initial extraction.
![Screen_Shot_2017-04-21_at_1.42.36_AM](/uploads/55402036b54e43e7429bb576ce93278c/Screen_Shot_2017-04-21_at_1.42.36_AM.png)
Is there an easy fix?

Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/92
Header promotion example for Alex to check
2019-07-07T22:40:39Z
Alex Theg

In Gilbert, fwd: [a03_fwd.docx](/uploads/69b43e22a61426ec09e83037cf875c35/a03_fwd.docx)
Why is "Holly Near" not labeled a header, but "12/21/2014" is?
For Alex to check after next iteration of header promotion

Milestone: 1.0.0
Assignee: Alex Theg

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/87
Don't distinguish between otherwise equivalent style signatures on "text-align: left" alone
2019-07-07T22:39:46Z
Alex Theg

Garcia bibliography: [e_BibliographyGGv6.docx](/uploads/c994d731e262919ab6c4a84d8df7ec66/e_BibliographyGGv6.docx)
Alex, check this for entries and lines that are promoted to headers. Hopefully considering the display text differently (#81) will stop some of the spurious promotions.

Milestone: 1.0.0
Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/83
Improve header level guesses by considering order of appearance
2018-03-14T20:27:18Z
Alex Theg

As mentioned on the 4/15/17 comment on ticket #81:
How can header promotion consider the order of appearance of the headers to improve the heading level it guesses? How is it currently deciding which level to promote things to? Does it consider the resulting structure of the document?
We'll probably want to implement some checks for this. My first thought is that once header promotion has identified all the different heading levels (say there are 3 different formats, so it knows for sure there should be 3 different heading levels), it could then order them according to whatever looks like it produces the most credible heading structure.
As an example, let's say it finds 3 levels of headers and initially promotes them as follows:
h2
h1
h1
h3
h1
h3
h3
h1
A final step would realize that changing the order around to get this structure makes better sense:
h1
h2
h2
h3
h2
h3
h3
h2
and make the change.
We'd need to give it a few rules it can use to score the structures and choose the best one. A good start could be to say that *generally* (not rigidly) 1) lower level headers should be nested under higher level ones, and 2) sequential heading levels should either stay the same or increment or decrement by 1 level.
For improving the accuracy of heading levels, I don't think the formatting itself will ever tell us much about what level a group of headings should be. That's because authors use formatting in so many different and entirely inconsistent ways. Considered in a vacuum, there's no reasonable way to say something like "bolding denotes a higher level heading than underlining, which denotes a higher level than italics". I think the best way to improve header level inferring will be by looking at their order of appearance and what that might say about the structure.
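The scoring idea above can be sketched as follows, in Python for illustration only. It tries every way of relabelling the detected levels and keeps the one with the lowest penalty under the two rules proposed here. The function names and the penalty weights are assumptions, and the brute-force search is only sensible for the handful of levels a chapter usually has.

```python
from itertools import permutations


def structure_penalty(levels):
    """Score a heading sequence; lower means a more credible structure."""
    penalty = 0
    seen = set()
    prev = None
    for level in levels:
        # Rule 1: a heading below the top level should come after a heading one level up.
        if level > 1 and (level - 1) not in seen:
            penalty += 1
        # Rule 2: consecutive headings should differ by at most one level.
        if prev is not None and abs(level - prev) > 1:
            penalty += abs(level - prev) - 1
        seen.add(level)
        prev = level
    return penalty


def best_relabelling(levels):
    """Relabel the distinct levels so the sequence has the lowest penalty.

    Ties keep the earliest candidate, and the identity mapping is tried first,
    so an already credible structure is left unchanged.
    """
    labels = sorted(set(levels))
    best = None
    for perm in permutations(range(1, len(labels) + 1)):
        mapping = dict(zip(labels, perm))
        candidate = [mapping[level] for level in levels]
        score = structure_penalty(candidate)
        if best is None or score < best[0]:
            best = (score, candidate)
    return best[1]
```

For the example above, `best_relabelling([2, 1, 1, 3, 1, 3, 3, 1])` returns `[1, 2, 2, 3, 2, 3, 3, 2]`, the reordered structure shown. Because the rules are meant to hold only *generally*, the penalties are soft scores rather than hard constraints.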
What do you think?

Milestone: 1.0.0
Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/78
Buchbinder Ch 2 headings don't get promoted
2018-03-14T20:41:17Z
Alex Theg

Compare Chapters 1 and 2 of Buchbinder
[c_Buchbinder_Chap_1.docx](/uploads/bca82798d963a32c026fecfcd868aa76/c_Buchbinder_Chap_1.docx)
[output_Buchbinder_Chap_1.zip](/uploads/c4d1c813323a451c901734e12f6649af/output_Buchbinder_Chap_1.zip)
[c_Buchbinder_Chap_2.docx](/uploads/623a4ca2c779cd0571cac9bd4b0507fa/c_Buchbinder_Chap_2.docx)
[output_Buchbinder_Chap_2.zip](/uploads/e0e37f742260d239b6e89a7129733a90/output_Buchbinder_Chap_2.zip)
Header promotion works well in Ch 1, but misses all the headings beyond the very top level in Ch 2. In Ch 1, everything that gets promoted is a real header.
In Ch 2, the Chapter title and subtitle get caught as h1s, but none of the other headings get promoted. There are 25 h2 promotions, but they are all either entirely empty, filled only with tabs, or a visual divider made of asterisks.
Any ideas as to the differences between chapters 1 and 2 that cause 1 to work well but 2 to be less accurate? It doesn't appear to be Word styles.

Milestone: 1.0.0
Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/77
Chapter headings come through in blue
2017-08-16T19:56:39Z
Alex Theg

In Buchbinder's book, all of the chapters come through into the HTML with blue coloring. It seems to be caused by a `style="color: auto"` attribute. The headings display as black in Word, but light blue in browsers (Chrome and Firefox).
Here are examples of what's causing it:
```html
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Introduction</h1>
```
```html
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Chapter One </h1>
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">The Bottom of the Funnel </h1>
```
```html
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Chapter Three </h1>
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Sticky Brains </h1>
```
```html
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Chapter Four </h1>
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Treating the Family</h1>
```
```html
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Chapter Five </h1>
<h1 class="Subtitle" style="color: auto; font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt">Locating Pain in Societal Stress</h1>
```
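The `auto` value in these examples looks like Word's "automatic" text colour passed straight through, and `auto` is not a CSS colour keyword. A filter could simply drop that declaration from the `style` attribute. The Python sketch below shows the idea on a style string. The function name is hypothetical, and the naive split on `;` assumes no declaration value contains a semicolon.

```python
def drop_auto_color(style):
    """Return the style string without any 'color: auto' declaration."""
    kept = []
    for declaration in style.split(";"):
        if not declaration.strip():
            continue  # skip empty pieces, e.g. after a trailing semicolon
        prop, _, value = declaration.partition(":")
        if prop.strip().lower() == "color" and value.strip().lower() == "auto":
            continue  # the declaration to remove
        kept.append(f"{prop.strip()}: {value.strip()}")
    return "; ".join(kept)
```

Applied to the heading styles above, this leaves `font-family: Garamond; font-size: 14pt; font-weight: bold; margin-bottom: 0pt` and keeps real colour values such as `color: #231F20` untouched.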
I'll mark this as low priority, since this is an html-only improvement. The color gets scrubbed out by typescript.

Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/76
Brinton Ch 2: an incorrect h4 promotion
2019-07-07T22:22:05Z
Alex Theg

Brinton Ch 2: [b02_Brinton_Chapter_2.docx](/uploads/6fbc8fffc4a354e2f90e8c6cbdfae434/b02_Brinton_Chapter_2.docx)
[output_brinton_ch_2.zip](/uploads/fe7a849192953b6e043e6dc1681709b8/output_brinton_ch_2.zip)
This bit gets promoted to an h4 but it's not:
`<h4 class="FootnoteText">According to an interview done with Sha‘rawi:</h4>`
Can you tell why? It's listed as FootnoteText, not sure if that contributes to it or not.
There is also another empty tag that gets promoted to an h4, but since it's empty it gets removed in typescript (there's a request in to move the empty header removal into header promotion itself - #55).

Milestone: 1.0.0
Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/72
Catch "Subtitle" styles
2018-03-15T17:27:31Z
Alex Theg

See #71 for Boyles files
The chapter's subtitle ("A Startling Turn of Events") doesn't get promoted to a heading. There are 2 issues here:
1. It is labeled as a Word style "Subtitle." This should be one of the styles that XSweet pays attention to for consideration as a header. In this entire book, the author's used styles h1-4 to mark the different heading levels correctly. I think the conversion pipeline is noting this, as all the properly labeled headers are coming through accurately, so it would be good to stir class="Subtitle" into the mix too.
2. (more of a note to self) Even if the author hadn't styled this, it's short and centered, so should be caught by the "if it's centered and short it's a header" rule, once that is implemented.

Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/68
Note on end of header stops promotion
2019-07-07T22:32:26Z
Alex Theg

See #67 for files, Bowen Ch 1
The heading "The War on Terroir" isn't getting promoted to an h1, like all the other headings in the chapter (which, apart from the very 1st chapter title, are all the same level). This is because there's a note at the end of the heading:
![Screen_Shot_2017-04-09_at_4.58.12_AM](/uploads/a421fc609d8d6dc2361af6eabb229839/Screen_Shot_2017-04-09_at_4.58.12_AM.png)
Removing the note and re-running the conversion fixes the issue. Is there an easy fix for this, so a note at the end of the header doesn't stop promotion?

Milestone: 1.0.0
Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/67
Check that headers promoted to the same level are internally consistent
2018-03-15T17:44:14Z
Alex Theg

b Bowen Chapter 1: [b_Bowen_Chapter_1.docx](/uploads/1d451dda0fc0a1b5a2c5afc8f93eeb60/b_Bowen_Chapter_1.docx)
Output: [output_b_Bowen_Chapter_1.zip](/uploads/fd1d441a66ac8f328a7cf3922661a6a0/output_b_Bowen_Chapter_1.zip)
Overall this chapter comes through quite nicely, and it offers an interesting case for header promotion.
The only headers this has are 1) a chapter title, and 2) 11 same-level headings in the body of the text.
1. The chapter title is bold and centered
2. The other headings are bold but not centered
Beyond that, the text is the same as the rest of the content. Word styles don't seem to factor into this one. So, here we have two levels of headers that are different from each other only in that one is centered and one is not. I think this is an important clue that these are different heading levels.
I've just proposed a step to check whether different heading levels are formatted the same, and combine them if they are (#64). The other side of that coin would be a step that ensures that all the headings promoted to the same level are *internally* consistent on some key criteria, and I think text alignment is one of those points.
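A rough sketch of such a consistency check, in Python for illustration. It assumes each promoted heading is available as a `(level, style)` pair, where `style` is the inline CSS string, and that a missing `text-align` means left alignment. Both function names are hypothetical.

```python
from collections import defaultdict


def alignment_of(style):
    """Read text-align from an inline style string."""
    for declaration in style.split(";"):
        prop, _, value = declaration.partition(":")
        if prop.strip().lower() == "text-align":
            return value.strip().lower()
    return "left"  # assumption: no declaration means default left alignment


def inconsistent_levels(headings):
    """Return {level: [alignments]} for levels whose headings disagree on alignment."""
    by_level = defaultdict(set)
    for level, style in headings:
        by_level[level].add(alignment_of(style))
    return {level: sorted(found) for level, found in by_level.items() if len(found) > 1}
```

In the Bowen case, a centered chapter title promoted alongside left-aligned body headings would be reported as `{1: ["center", "left"]}`, flagging that level as a candidate for splitting into two.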
What do you think?

Milestone: 1.0.0
Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/64
Promotion step to combine similarly formatted headings to same level
2018-03-15T17:46:13Z
Alex Theg

Bakker Ch 1:
[b_02_ch_1_Bakker.docx](/uploads/eb8fb4e69594a4462a16f781d74d2379/b_02_ch_1_Bakker.docx)
Bakker Ch 1 conversion:[output_b02_Ch_1_Bakker.zip](/uploads/c96603d3d4db9be2f485c75d80715269/output_b02_Ch_1_Bakker.zip)
See issue #56 for a full description. In this chapter, one of the heading levels, consistently formatted as italic but using different Word styles, comes through marked as 2 different header levels. This suggests there could be a step at the end of header promotion to compare the formatting of each heading level it's identified to the formatting on other heading levels and combine similarly formatted headers to the same level.

Milestone: 1.0.0
Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/62
Handling whitespace-only formatting
2019-07-07T21:34:20Z
Alex Theg

Bakker ch1, see #56 for files
There are 5 headers of the same level, but one of them doesn't get promoted like the others. Seems to be caused by a `<tab>` at the end of the heading "The heroic migrant and the end of migration".
These all get promoted to h1:
* Keeping the monies flowing the times of crises
* The limits of migrant inclusion
* Migration, state-led transnationalism, and development
* The Washington Consensus and beyond: the continuing significance of market fundamentalism in development policy and practice
This one doesn't:
* The heroic migrant and the end of migration
Here's one that gets promoted, just after join-elements and before the header promotion steps:
````html
<p class="Default" style="font-size: 12pt; font-style: italic; margin-bottom: 6pt"><i>The limits of migrant inclusion</i></p>
````
This is the offending tab (at least, I think it's the tab keeping this from being recognized as a header):
````html
<p class="Default" style="font-size: 12pt; margin-bottom: 6pt"><i>The heroic migrant and the end of migration</i>
<tab/>
</p>
````
Perhaps a cleaning step that strips out trailing tabs before promotion? I can't think where a trailing tab would ever be meaningful.

Milestone: 1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/60
Invisible-to-the-eye changes in Word style cause paragraphs to be promoted to headers
2019-07-07T22:10:58Z
Alex Theg

See #59 for the word files and outputs for Berry
In Berry, invisible (not-format changing) changes in Word style are causing paragraphs to be promoted to headers.
Most of the main content is styled in Word as: "Normal + (Latin) Garamond". These examples from Berry chapter 2 show the style changes that cause the erroneous promotion:
* Comment Text: The 3 paragraphs following "It is certainly true"
* Body Text: "Many new threads"
* Body Text 2: "In 1906, amid"
* Block Text: "The Mountaineers is an association"
And in ch 3, 2 regular paragraphs are styled as headings behind the scenes and thus get promoted.
Although a length filter would solve this (#58), this also relates to the policy about what Word styles do and don't matter, and how to handle them (soon to be documented on the wiki). These sound like styles we'd want to ignore for header promotion. Are these specific Word styles important to header promotion, or is it rather simply that the style has changed from the rest of the content?

Milestone: 1.0.0
Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/59
Some epigraphs, not others, being promoted to headers
2019-07-07T22:35:47Z
Alex Theg

[berry.zip](/uploads/33c200a740753230a1f3c0f2f8c3b08c/berry.zip)
[output_berry.zip](/uploads/e2f33cf0126f7681c096caef3b2ae834/output_berry.zip)
In Berry, each chapter begins with an epigraph. Sometimes the epigraph is promoted to a heading, but not always. The epigraph is promoted in the intro, ch1, ch2, and the conclusion. It is not promoted in ch3 or ch4.
Can you tell what's causing the difference, and how to keep these from being labeled headings? I can't see that it's a Word styles issue. A filter to not tag a header if it's too long (#58) could stop most of these, but fixing whatever else is going on could stop future errors too.

Milestone: 1.0.0
Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/52
Making @class values HTML-safe
2017-08-16T19:19:22Z
Wendell Piez

XSweet produces HTML 'class' attributes with labels reflecting Word Styles ("Paragraph Styles" and "Character Styles") assigned in the source document.
We are already normalizing these names into HTML-safe versions by removing spaces and other unwanted characters, but at risk of confusion when there are name collisions (in the rare case where a document has both "Header 1" and "Header1" styles, and they are different). We are also not making provision for some peculiarities of HTML @class, such as the fact that they are not case-sensitive.
A separate filter to rewrite style names to safe values would account for all of this properly. It should:
* Remove spaces and unwanted characters
* Cast to lower case
* Relabel any resulting collisions with distinguishing suffixes
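The three steps above can be sketched as follows, in Python for illustration rather than as the XSLT filter itself. The function name, the `-2`, `-3` suffix scheme, and the `style` fallback for names that normalize to nothing are all assumptions. The character class keeps only ASCII letters, digits, hyphens and underscores, so non-ASCII style names would need a wider rule.

```python
import re


def safe_class_names(style_names):
    """Map Word style names to lower-cased, collision-free class names."""
    mapping = {}
    used = set()
    for name in style_names:
        if name in mapping:
            continue  # the same style name always gets the same class
        # Steps 1 and 2: cast to lower case, then drop spaces and unwanted characters.
        base = re.sub(r"[^a-z0-9_-]", "", name.lower()) or "style"
        # Step 3: add a distinguishing suffix when the result is already taken.
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = f"{base}-{n}"
        used.add(candidate)
        mapping[name] = candidate
    return mapping
```

With "Header 1" and "Header1" both present, the first becomes `header1` and the second `header1-2`. The returned mapping can be applied to both the `class` attributes and the selectors in the embedded CSS, so the two stay in step.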
This needs to happen both on @class, and inside enclosed CSS (wherein classes are referenced), for comprehensiveness.

Assignee: Wendell Piez

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/4
Mapping `<code>` and code blocks
2018-03-16T17:50:57Z
Wendell Piez

Editoria accepts both `<code>` (inline) and code blocks in the form:
```<pre><code> { content } </code></pre>```
The problem will be detecting these in HTMLTypescript, where we are more likely to have things like `<span class="someCode">` (where the `someCode` style may or may not have a monospace font), or `<span style="font-family: Courier New">` or (for a code block) `<p style="font-family: Courier New; font-size: 14pt">`.
Seeing actual examples will help. Assembling a list of known monospace fonts (on platforms used by authors) could also help - a filter could detect these and act accordingly.

Assignee: Wendell Piez
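As a starting point for such a filter, the sketch below (Python, for illustration only) checks an inline `style` string against a small list of monospace fonts. The list is a guess at common fonts on authors' platforms and would need extending from real examples; the function names are hypothetical.

```python
# Assumption: a starter list only, to be extended from real manuscripts.
KNOWN_MONOSPACE = {
    "courier",
    "courier new",
    "consolas",
    "menlo",
    "monaco",
    "lucida console",
    "andale mono",
}


def font_families(style):
    """Return the lower-cased font names from a style string's font-family."""
    for declaration in style.split(";"):
        prop, _, value = declaration.partition(":")
        if prop.strip().lower() == "font-family":
            return [f.strip().strip("'\"").lower() for f in value.split(",") if f.strip()]
    return []


def looks_like_code(style):
    """True when any declared font family is a known monospace font."""
    return any(family in KNOWN_MONOSPACE for family in font_families(style))
```

A `<span>` whose style passes `looks_like_code` could then be mapped to `<code>`, and a `<p>` to `<pre><code>`. Class-based cases such as `class="someCode"` would need the class's CSS rule resolved to a style string first.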