XSweet issues
https://gitlab.coko.foundation/groups/XSweet/-/issues
2018-03-28T17:00:13Z

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/25
Removing italicized tab at start of paragraph drops one <em> tag
2018-03-28T17:00:13Z
Alex Theg

For Bakker Chapter 1:
There is an italicized tab that starts one of the paragraphs in this chapter. The tab gets removed in the very last Editoria reduce step, but the opening `<em>` tag also gets dropped, leaving only one self-closing `<em/>`. This tag causes an error in Wax, so the chapter cannot be opened for reading and editing.
Here's what the HTML looks like after the Editoria notes step:
```html
<p class="Default" style="font-family: Helvetica; font-size: 12pt">
<i>
<span class="tab"><!-- tab --></span>
</i>
This historical documentary material...
```
Then, during the Editoria basic step, the `<i>` tags get converted into `<em>`s:
```html
<p class="Default" style="font-family: Helvetica; font-size: 12pt">
<em>
<span class="tab"><!-- tab --></span>
</em>
This historical documentary material...
```
Finally, in the Editoria reduce step, the tab is removed, leaving only an `<em/>` in its place:
```html
<p><em/>This historical documentary material...
```
So, when the Editoria reduce step removes tabs at the beginning of paragraphs, it also needs to remove both the opening and closing formatting tags, if there are any.
1.0.0

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/26
Double quotation marks followed by punctuation face the wrong way
2018-03-28T22:10:22Z
Alex Theg

In some instances where punctuation is placed outside of the closing double quotation mark, the UCP cleanup step inserts the wrong directional double quotation mark.
Here are a few examples:
* `“Send $300 to Mexico for $15“,`
* `“Send $300 to Mexico for $15“.`
* `“fraud“?`
It is a bit hard to see, but both quotation marks are left double quotation marks, when in fact the last one should be a right double quotation mark. This occurs regardless of whether the original double quotation marks are directional and/or facing the right way. Punctuation outside the quotes is not technically correct but would still be good to catch.
This does _not_ happen with single quotes, which come through correctly.
I don't know what the best search pattern for this is, but it might be something like:
`double quotation mark + punctuation mark (-,.:;?!)` -> `right double quotation mark + punctuation mark`
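The search pattern above could be sketched with a regular expression. This is only an illustration in Python, not the XSLT used in the pipeline; the function name `fix_closing_quotes` is made up for the example, and it assumes that runs of periods have already been converted to the ellipsis character, so an opening quote is never directly followed by a period.

```python
import re

RIGHT_DQ = "\u201d"

# Any double quotation mark (straight, left, or right) that is directly
# followed by one of the punctuation marks listed in the issue: - , . : ; ? !
QUOTE_BEFORE_PUNCT = re.compile(r'["\u201c\u201d](?=[-,.:;?!])')


def fix_closing_quotes(text: str) -> str:
    """Turn a double quote that precedes punctuation into a right double quote."""
    return QUOTE_BEFORE_PUNCT.sub(RIGHT_DQ, text)
```

A quotation that opens with the ellipsis character (u2026) is left alone, because that character is not in the punctuation list.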
The only punctuation that I can think of that would ever start a quotation is an ellipsis, which wouldn't be covered by the above rule.
1.0.0

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/21
Add UCP cleanup ingest/output macros to Typescript
2021-03-18T16:16:43Z
Alex Theg

UCP runs a series of cleanup macros on book chapters, both when they are ingested before editing and again at output, so that none of the problems the cleanups fix are accidentally reintroduced. Since these cleanups are to be used at the beginning and the end of the editing process, and since different presses will have slightly different cleanups, these changes should all be housed in a single XSLT sheet.
Here are several cleanups to implement - there may be more to follow. I am checking this with Erich at UCP, so we will probably add to this list going forward:
- [x] Hyphens between numerals should be converted to en dashes: "2-3" -> "2–3"
- [x] Double spaces should be converted to single spaces, anywhere they're found: <pre>"...touches.  However, the..." -> "...touches. However, the..."</pre>
- [x] Spaces around em dashes should be removed (any number of consecutive spaces before or after an em dash)
* "that sentence —as I’ve done" -> "that sentence—as I’ve done"
* "that sentence — as I’ve done" -> "that sentence—as I’ve done"
- [x] Series of periods converted to ellipses
* "..." -> "…"
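As a rough illustration of the four cleanups above, here is a minimal Python sketch. It is not the XSLT implementation; the function name `basic_cleanups` is hypothetical, and it assumes that a "series of periods" means exactly three consecutive periods.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"
ELLIPSIS = "\u2026"


def basic_cleanups(text: str) -> str:
    # Hyphen between two numerals -> en dash ("2-3" -> "2\u20133").
    text = re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)
    # Runs of two or more spaces -> a single space.
    text = re.sub(r" {2,}", " ", text)
    # Any number of spaces before or after an em dash are removed.
    text = re.sub(" *" + EM_DASH + " *", EM_DASH, text)
    # Three consecutive periods -> ellipsis character (assumption: exactly three).
    text = re.sub(r"\.{3}", ELLIPSIS, text)
    return text
```

The order matters slightly: collapsing double spaces first means the em dash rule only ever sees single spaces, although it would handle longer runs as well.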
# Update
I have gotten through a lot of the macro, but it is long. Here are some of the remaining cleanups; I will post the rest tomorrow morning. These are in no special order. @wendell, you can draw from these and start checking them off as you can; let me know if any of these need clarification. There are plenty more rules left. The most complicated is probably smart quotes. The macro actually deletes and replaces all the "s and 's and lets Word's auto-formatting determine their direction when it inserts them again, which I don't think is an option for us. In any event, more to come!
- [x] Two adjacent hyphens become an em dash: "--" -> "—"
- [x] An en dash surrounded on both sides by spaces should be converted to an em dash: " – " -> " — "
- [x] Equal signs should be surrounded on either side by one and only one space: " = "
- [x] Replace runs of multiple consecutive spaces with just one space
- [ ] ~~Replace runs of multiple consecutive tabs with just one tab~~ Update: we scrub these out anyway in preparation for Wax, so this is not necessary
- [x] Spaces touching tabs should be removed
- [x] Remove spaces at the very beginning and ends of `p`s
- [x] Remove tabs that end a paragraph (not ones that start)
- [x] Delete empty paragraphs (I believe we are already doing this)
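The checked items in this list can be sketched in Python as follows. This is a hedged illustration only: `clean_paragraph` and `clean_paragraphs` are hypothetical names, paragraphs are modeled as plain strings rather than `p` elements, and the issue does not say how the en dash rule should be ordered relative to the earlier rule that strips spaces around em dashes, so that interaction is left out.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def clean_paragraph(text: str) -> str:
    # Two adjacent hyphens become an em dash.
    text = text.replace("--", EM_DASH)
    # An en dash with a space on both sides becomes an em dash.
    text = text.replace(" " + EN_DASH + " ", " " + EM_DASH + " ")
    # Equal signs get exactly one space on either side.
    text = re.sub(r" *= *", " = ", text)
    # Runs of consecutive spaces collapse to one space.
    text = re.sub(r" {2,}", " ", text)
    # Spaces touching tabs are removed.
    text = re.sub(r" *\t *", "\t", text)
    # Spaces at the very beginning and end of the paragraph are removed.
    text = text.strip(" ")
    # Tabs that end a paragraph are removed; leading tabs are kept.
    text = re.sub(r"\t+$", "", text)
    return text


def clean_paragraphs(paragraphs):
    """Clean every paragraph and drop the ones that end up empty."""
    cleaned = (clean_paragraph(p) for p in paragraphs)
    return [p for p in cleaned if p]
```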
## Quotation marks
All straight, non-directional single and double quotes should be converted into "smart" directional quotes, depending on context. Since the original macro uses Word's auto-formatting, we'll have to make the rules for determining which direction they should point.
Straight quotation marks:
* u0022: quotation mark
* u0027: apostrophe
Should all be replaced by one of the following:
* u2018: left single quotation mark
* u2019: right single quotation mark
* u201c: left double quotation mark
* u201d: right double quotation mark
### Replacement rules from macro:
- [x] ' -> right or left single quotation mark (u2018 or u2019)
- [x] '' -> right or left double quotation mark (u201c or u201d)
- [x] ` -> right or left single quotation mark (u2018 or u2019)
- [x] `` -> right or left double quotation mark (u201c or u201d)
- [x] em dash+right double quote (u2014+u201d) -> em dash+left double quote (u2014+u201c)
- [x] left double quote+em dash (u201c+u2014)-> right double quote+em dash (u201d+u2014)
The following 3 search patterns should look for a straight single quote or a left single quote and replace it with a right single quote:
- [x] " 'em" or " ‘em" (space+u0027+"em" or space+u2018+"em") -> " ’em" (space+u2019+"em")
- [x] "'n'" or "‘n‘" (u0027+"n"+u0027 or u2018+"n"+u2018) -> "’n’" (u2019+"n"+u2019)
- [x] " 'tis" or " ‘tis" (space+u0027+"tis" or space+u2018+"tis") -> " ’tis" (space+u2019+"tis")
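The three contraction patterns above might look like this as regular expressions. This is a Python sketch, not the pipeline's XSLT; `fix_contractions` is a hypothetical name. Two details are assumptions that the issue does not state: the word boundary after "em" and "tis", added so that words such as "embers" are not touched, and the fact that the 'n' pattern also accepts a mix of straight and left quotes.

```python
import re

RIGHT_SQ = "\u2019"


def fix_contractions(text: str) -> str:
    # space + (straight or left single quote) + "em" or "tis"
    # -> the quote becomes a right single quote.
    text = re.sub(r"(?<= )['\u2018](?=(?:em|tis)\b)", RIGHT_SQ, text)
    # (straight or left single quote) + "n" + (straight or left single quote)
    # -> right single quote on both sides.
    text = re.sub(r"['\u2018]n['\u2018]", RIGHT_SQ + "n" + RIGHT_SQ, text)
    return text
```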
Then:
- [ ] ~~Insert hair space (u200a) btwn pairs of single/double quotes. Note that order of operations matters; this assumes that straight quotes and apostrophes have been replaced with their directional counterparts.~~ update: tracking in #28
* left single quote+left double quote (u2018+u201c)
* left double quote+left single quote (u201c+u2018)
* right single quote+right double quote (u2019+u201d)
* right double quote+right single quote (u201d+u2019)
### Directional rules
Here are my proposed rules for direction. They would have to be executed before all of the above rules from the macro:
- [x] First, replace all 4 directional quotation marks with their non-directional counterparts:
* u2018 and u2019 -> u0027
* u201c and u201d -> u0022
* also \` and `` to their respective u0027 and u0022
Then:
- [x] apostrophe+alphabetical character (u0027+letter) -> left single quotation mark (u2018+letter)
- [x] alphabetical character+apostrophe (letter+u0027) -> alphabetical character+right single quotation mark (letter+u2019)
- [x] quotation mark+alphabetical character (u0022+letter) -> left double quotation mark+alphabetical character (u201c+letter)
- [x] alphabetical character+quotation mark (letter+u0022) -> alphabetical character+right double quotation mark (letter+u201d)
In any case, these will probably need some refinement but double check me and let me know what you think!
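To make the proposed directional rules concrete, here is a small Python sketch (the names `straighten` and `curl` are hypothetical). It deviates from the listed order in one respect, as an assumption of this sketch: the two "letter + quote" rules run before the two "quote + letter" rules, because otherwise the apostrophe in a contraction such as "don't" would be caught by the "apostrophe + letter" rule and turned into a left quote.

```python
import re

LEFT_SQ, RIGHT_SQ = "\u2018", "\u2019"
LEFT_DQ, RIGHT_DQ = "\u201c", "\u201d"
LETTER = r"[^\W\d_]"  # any Unicode letter


def straighten(text: str) -> str:
    """Step 1: replace directional quotes and backticks with straight quotes."""
    text = text.replace("``", '"')
    text = re.sub("[\u201c\u201d]", '"', text)
    text = re.sub("[\u2018\u2019`]", "'", text)
    return text


def curl(text: str) -> str:
    """Step 2: assign a direction based on the neighboring letter."""
    text = straighten(text)
    # letter + quote -> right (closing) quote; done first, see note above.
    text = re.sub("(?<=" + LETTER + ")'", RIGHT_SQ, text)
    text = re.sub("(?<=" + LETTER + ')"', RIGHT_DQ, text)
    # quote + letter -> left (opening) quote.
    text = re.sub("'(?=" + LETTER + ")", LEFT_SQ, text)
    text = re.sub('"(?=' + LETTER + ")", LEFT_DQ, text)
    return text
```

A quote that touches no letter at all, such as the closing quote in `"Hi," she said`, stays straight under these four rules, which is one place where the refinement mentioned above would be needed.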
### Formatting
- [x] Convert underlining to italics
- [ ] ~~Convert bold to italics~~ update: tracking in #29
We currently convert literal `<u>` tags into `<i>`s in the “Editoria basic” step. But that can sometimes get scrubbed out in the “Editoria reduce” step. We should also catch underlining, italics, and bold when they are specified in the CSS style, which we’re not currently doing. Wax looks for an `<em>` tag for italics. So, the following should all be converted into text wrapped in `<em>`:
* `<i>`
* `<b>`
* `<u>`
* `<p style="font-weight: bold">`
* `<p style="font-style: italic">`
* `<p style="text-decoration: underline">`
Once this is implemented, we should also update the “Push mappings” to reflect this.
- [ ] ~~Force punctuation to match formatting of preceding word~~ tracking in #27
Since we're porting into Wax, we don't need to worry about fonts/font size. The only thing I can think to catch is formatting (italics, bold, underline). And, since all of these should get flattened to `<em>`s, I think this could be as simple as ensuring that if the preceding word is `<em>`, the trailing punctuation is as well. These are the punctuation marks that this rule should apply to:
* ,
* .
* :
* ;
* ?
* !
### Rules already implemented
The following cleanups don't require any additional coding, since XSweet is handling these as it should already:
* Remove page breaks and section breaks
* Page breaks are extracted as `<br class="br">`, and the pipeline replaces these by breaking paragraphs on `br`s
* Section breaks are dropped, since we’re not explicitly catching them
* Remove any comments: already happens, since we don’t handle them
* Delete headers and footers: we’re already dropping these
* Remove soft hyphens: these do not come through into the HTML.
1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/46
Font sizing issue in scrub step
2021-05-07T04:30:38Z
Alex Theg

See b_02_ch_1_Bakker.docx.
All the text in the Word doc seems to be 12pt Helvetica. Before the scrub step, this all seems to come through properly. After the scrub, though, a big portion of the document, from "Migration, remittances, and development: three vignettes" through to the "Introduction" heading, is in 11pt font.
The font changes for the endnotes, but I suspect that's an entirely separate issue that arises because they're endnotes.

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/22
Insert directional single quote before year abbreviations
2018-02-27T23:09:53Z
Alex Theg

This issue relates to the UCP cleanup macro (#21).
A straight apostrophe (u0027) or a left single quote (u2018) followed by a numeral (0-9) should be replaced with a right single quote (u2019).
This will ensure that abbreviated years get the right quotation mark.
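A minimal Python sketch of this rule, with the hypothetical name `fix_year_apostrophes`, could be a single substitution. Note that, as stated, the rule would also change a single-quoted passage that happens to begin with a numeral.

```python
import re

RIGHT_SQ = "\u2019"

# A straight apostrophe or left single quote directly followed by a digit 0-9.
QUOTE_BEFORE_DIGIT = re.compile(r"['\u2018](?=[0-9])")


def fix_year_apostrophes(text: str) -> str:
    return QUOTE_BEFORE_DIGIT.sub(RIGHT_SQ, text)
```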
'97 -> ’97
‘97 -> ’97

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/41
{ths} tags throughout Green
2017-04-03T21:07:12Z
Alex Theg

Any ideas what these {ths} tags might be? They're sprinkled throughout the book, so if they don't add anything, let's clean them out.
See chapter 1 as an example. The tags render in the HTML like this: "Islam in Sistan. C.{ths}E. Bosworth’s" with the "{ths}" in purple.
Here's the underlying HTML (this is all inside an enclosing p tag):
```html
...Islam in Sistan. C.<span style="color: #800080; font-family: GentiumAlt">{ths}</span>E. Bosworth’s
```
Original XML:
```xml
<w:r w:rsidR="005266F1">
<w:rPr>
<w:rFonts w:ascii="GentiumAlt" w:hAnsi="GentiumAlt"/>
<w:color w:val="800080"/>
</w:rPr>
<w:t>{ths}</w:t>
</w:r>
```

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/23
Add nonbreaking spaces between initials
2018-02-27T23:10:10Z
Alex Theg

This issue relates to the UCP cleanup macro (#21).
Spaces between initials should be replaced with non-breaking spaces.
The search should look for:
* uppercase letter (A-Z) + period + space + uppercase letter (A-Z) + period
and replace it with:
* uppercase letter (A-Z) + period + nbsp (00A0) + uppercase letter (A-Z) + period
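The basic search and replace described above can be sketched in Python with lookarounds, so that only the space itself is replaced and overlapping initials ("J. R. R.") are all handled. The name `join_initials` is hypothetical, and the pattern follows the issue literally, so it does not check what comes before the first initial.

```python
import re

NBSP = "\u00a0"

# A space that sits between "uppercase letter + period" on both sides.
SPACE_BETWEEN_INITIALS = re.compile(r"(?<=[A-Z]\.) (?=[A-Z]\.)")


def join_initials(text: str) -> str:
    return SPACE_BETWEEN_INITIALS.sub(NBSP, text)
```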
As a bonus, it would be nice if we could detect initials without spaces between the letters. So, before the above replacement would take place, we should look for the following pattern:
* space +
* any number of repetitions of a capital letter (A-Z) + a period + capital letter + period, etc.
* + a space
and add nbsps between the letters.
THEN, catch the following common abbreviations that would have been erroneously sucked up by the above rule. All of these should _not_ have spaces between the letters:
* U.S.
* D.C.
* A.M.
* P.M.
* A.D.
* B.C.
* B.C.E.
* A.C.E.

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/40
Dropped spaces due to long strings of repeated tags
2017-04-03T21:07:12Z
Alex Theg

The book by Green has 4 parts, each with an introductory section. Part 1 comes through rinsed really nicely and very clean, but parts 2, 3, and 4 drop almost all the spaces.
For whatever reason, large portions of this book are being extracted with each word wrapped in its own tag. A big part of Part 1 gets extracted as long strings of `<iCs>` tags, with empty iCs tags for spaces:
```html
<iCs>remained</iCs>
<iCs></iCs>
<iCs>the</iCs>
<iCs></iCs>
<iCs>majority</iCs>
<iCs></iCs>
<iCs>population</iCs>
<iCs></iCs>
<iCs>of</iCs>
<iCs></iCs>
<iCs>many</iCs>
```
These all get collapsed into a p tag with spaces preserved and the HTML looks great.
Parts 2, 3, and 4, though, have strings of spans instead:
```html
<span style="font-size: 12pt">Safavids,</span>
<span style="font-size: 12pt"></span>
<span style="font-size: 12pt">and</span>
<span style="font-size: 12pt"></span>
<span style="font-size: 12pt">Uzbeks—seized</span>
<span style="font-size: 12pt"></span>
<span style="font-size: 12pt">control</span>
<span style="font-size: 12pt"></span>
```
When this happens, the content ends up in one long p tag with no spaces.
The introduction has a long string of `<lang>` tags, similar to the iCs tags above. They don’t cause dropped spaces.
This is related to #35 but may be caused by a different underlying issue.
Alex Theg

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/24
Eliminate spaces before punctuation
2018-04-06T22:22:06Z
Alex Theg

This issue relates to the UCP cleanup macro (#21).
Any number of spaces before any of the following punctuation marks should be removed:
* ,
* ;
* :
* !
* ?
* )
* ]
* }
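As a sketch, this rule is a single substitution in Python (`remove_space_before_punct` is a hypothetical name). One assumption is flagged in the code: the period is not in the list above, but the issue's own example removes a space before a period, so the sketch includes it.

```python
import re

# One or more spaces directly before , ; : ! ? ) ] }
# The trailing "." in the character class is an assumption taken from the
# issue's example sentence; it is not in the bulleted list.
SPACES_BEFORE_PUNCT = re.compile(r" +(?=[,;:!?)\]}.])")


def remove_space_before_punct(text: str) -> str:
    return SPACES_BEFORE_PUNCT.sub("", text)
```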
E.g. "Here's my sentence ." -> "Here's my sentence."

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/70
Contradictory styles make a quote display as bold in the browser
2018-03-28T21:54:45Z
Alex Theg

Bowen, ch 6: [b_Bowen_Chapter_6.docx](/uploads/cf42db0827d247ab201bae29d39cf463/b_Bowen_Chapter_6.docx)
[output_bowen_ch_6.zip](/uploads/b2dbbd8c07fb2a8b29f2e2f2d88b4bd4/output_bowen_ch_6.zip)
Low priority issue; let's come back to it later:
There's one small inline quotation that displays as bold in the browser ("a blatant example..."). This is surely because it was copy/pasted into Word from somewhere else; while the surrounding text is style "Normal," this quote is style "Strong + Not Bold" (lol). As a result, the quote is bolded when viewed in the browser.
This is low priority, because
1. once this goes through Typescript, the aberrant styling gets removed as one that's not important, so it would only help improve the html
2. the solution seems complicated for very small payoff
So let's keep this on the backburner for now.
1.0.0
Wendell Piez

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/38
Contain table info in the HTML
2017-04-03T21:07:12Z
Alex Theg

In Best, Ch. 1, there is a very simple table example named "MILESTONES IN STUDENT LOANS". In the browser the HTML table looks like this:
![Screen_Shot_2016-10-18_at_4.07.59_PM](/uploads/6cb2e4b742f59e318076b595aaeec97d/Screen_Shot_2016-10-18_at_4.07.59_PM.png)
Here's the HTML
```html
<p style="font-family: Arial">center0
<p style="font-weight: bold">MILESTONES IN STUDENT LOANS</p>
<p>1971 -- $1,000,000,000 in new student loans</p>020000
<p style="font-weight: bold">MILESTONES IN STUDENT LOANS</p>
<p>1971 -- $1,000,000,000 innew student loans</p>
</p>
```
The "center0" and "020000" look like they're attributes of the table ("center0" comes from a noProof tag - see #6).
Tables are complex enough that we need to discuss them separately, but it would be great if we could keep the table attributes from showing up in the HTML. If someone were to recreate the table in HTML manually, we'd want them to be able to drop it into place. Maybe we could try to replace all the table info with a placeholder tag as a first step?
Alex Theg

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/27
Force punctuation to match formatting of preceding word
2018-04-02T18:29:55Z
Alex Theg

As part of the macro cleanups, we should force punctuation to match the formatting of the preceding word. Let's do that for the following:
* ,
* .
* :
* ;
* ?
* !
Current example:
```xml
<w:p w14:paraId="369F0F7E" w14:textId="2A1BB479" w:rsidR="00733D7F" w:rsidRDefault="00733D7F">
<w:pPr>
<w:rPr><w:rFonts w:ascii="Helvetica" w:eastAsia="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/></w:rPr>
</w:pPr>
<w:r>
<w:rPr><w:rFonts w:ascii="Helvetica" w:eastAsia="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/><w:b/></w:rPr>
<w:t>this is all bold except for the period</w:t>
</w:r>
<w:r>
<w:rPr><w:rFonts w:ascii="Helvetica" w:eastAsia="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/></w:rPr>
<w:t>.</w:t>
</w:r>
</w:p>
```
...results in...
```html
<p style="font-family: Helvetica"><b>this is all bold except for the period</b>.</p>
```
1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/34
Superscripts
2017-04-03T21:07:12Z
Alex Theg

The Word doc for chapter 1 of the Berry book - "b01_Chapter1" - shows the "th" part of "13th and 18th" as superscripts in the 1st paragraph below the heading.
After the initial extraction, the "th"s are wrapped inside `<vertalign>` tags. The scrub step changes that to a span, and the join elements step wraps that into the surrounding p tag. So, the vertaligns disappear and the superscripts do not come through into the HTML.
It looks like Word uses the vertalign for superscripts and probably subscripts too. Can we catch this and carry it over into the HTML? I wonder if there are other ways Word implements sub- and superscripts.
Wendell Piez

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/28
Insert hair space (u200a) btwn pairs of single/double quotes
2018-04-02T18:30:14Z
Alex Theg

As part of the macro cleanups, we should insert a hair space (u200a) between pairs of single/double quotes. Note that order of operations matters; this assumes that straight quotes and apostrophes have been replaced with their directional counterparts.
* left single quote+left double quote (u2018+u201c)
* left double quote+left single quote (u201c+u2018)
* right single quote+right double quote (u2019+u201d)
* right double quote+right single quote (u201d+u2019)
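The four pairs above translate directly into a small lookup. This is a hedged Python sketch with a hypothetical function name, and like the rule itself it assumes the quotes are already directional when it runs.

```python
HAIR_SPACE = "\u200a"

# (first quote, second quote) pairs from the list above.
QUOTE_PAIRS = [
    ("\u2018", "\u201c"),  # left single + left double
    ("\u201c", "\u2018"),  # left double + left single
    ("\u2019", "\u201d"),  # right single + right double
    ("\u201d", "\u2019"),  # right double + right single
]


def insert_hair_spaces(text: str) -> str:
    for first, second in QUOTE_PAIRS:
        text = text.replace(first + second, first + HAIR_SPACE + second)
    return text
```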
This currently partially works. See the following example inputs and outputs: the characters in Word on the left and the final Typescript output on the right.
* `"'quote'"` -> `<p style="font-family: Helvetica">“ ‘quote’ ”</p>`
* works properly; hs between both pairs of quotes
* `'"quote"'` -> `<p style="font-family: Helvetica">‘“quote” ’</p>`
* hs between the 2nd quotes but not the 1st
* `'”quote"‘` -> `<p style="font-family: Helvetica">‘“quote” ’</p>`
* hs between the 2nd quotes but not the 1st
* `‘"quote"’` -> `<p style="font-family: Helvetica">‘“quote” ’</p>`
* hs between the 2nd quotes but not the 1st
* `""quote""` -> `<p style="font-family: Helvetica">“ “quote””</p>`
* hs between the 1st quotes but not the 2nd
1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/30
Capture horizontal alignment
2017-04-03T21:07:12Z
Alex Theg

Extraction should capture the horizontal alignment of the text in the Word doc.
This will be one of the most important styling elements to translate into the HTML, since it's so closely tied into headings.
Wendell Piez

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/29
Typescript macro cleanup should convert bold to italic
2018-04-02T16:26:24Z
Alex Theg

Currently, `editoria-basic.xsl` converts `<b>` to `<strong>`, which is perfect, especially since people may want to port into Wax without using the UCP macro cleanup step.
Before that, though, `editoria-tune.xsl` should convert `<b>` to `<i>`, in exactly the same manner that it's converting `<u>` to `<i>`. It currently passes `<b>` tags through unaltered.
1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/29
Incorrect underlining and italics
2017-04-03T21:07:12Z
Alex Theg

Much of the Bakker book comes through in HTML with underlining that's not present in the original docx file.
One example: "b_01_Part_I_Bakker.docx." All the text comes through as `<u>`s.
I'll add some more examples to this with the next few chapters.
Wendell Piez

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/30
Remove both open and close formatting tags when removing tabs
2018-04-21T02:54:24Z
Alex Theg

In the last typescript xslt (reduce), tabs are removed.
When there's formatting wrapping a single tab span, the opening formatting tag is removed with the tab, but not the closing one. This happens at least with `<em>` tags, but I wouldn't be surprised if it happens with other tags.
Here's an example:
Output of editoria basic step:
```html
<p style="font-family: Times New Roman; margin-bottom: 12pt">
<em><span class="tab"><!-- tab --></span></em>Hermeneutical Innovations in Advaita Vedānta Intellectual History.” Unpublished
</p>
```
Then after editoria reduce:
```html
<p>
<em/>Hermeneutical Innovations in Advaita Vedānta Intellectual History.” Unpublished
</p>
```
This mismatched closing tag doesn't stop the page from showing properly in the browser, but it does prevent it from loading at all in Wax.

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/18
Consider adding another bit of "scrub" logic to scrub.xsl
2017-04-03T21:07:12Z
Wendell Piez

Oftentimes HTML results from .docx show that formatting was controlled at the inline level, not the paragraph level, so we get things like:
```
<p>
<span style="font-size: 18">A: Nobody can beat me! I am the best showman in the whole history of man. </span>
</p>
```
We might consider removing the `span` and promoting its properties to the `p`.
Don't do this when there's a `@class` collision; also think through `@style`.
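To illustrate the idea, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It is not `scrub.xsl`; the function name `promote_sole_span` is hypothetical. It takes the cautious route on the open questions above: it refuses to promote when `class` or `style` appears on both elements, rather than trying to merge them.

```python
import xml.etree.ElementTree as ET


def promote_sole_span(p: ET.Element) -> bool:
    """If <p> holds exactly one <span> and nothing else, move the span's
    class/style onto the <p> and unwrap the span. Returns True if changed."""
    if len(p) != 1 or p[0].tag != "span":
        return False
    span = p[0]
    # Text outside the span means the span does not cover the whole paragraph.
    if (p.text or "").strip() or (span.tail or "").strip():
        return False
    # Only class and style are handled; anything else is left alone.
    if set(span.attrib) - {"class", "style"}:
        return False
    # Collision on @class or @style: do nothing (merging is left open).
    if set(span.attrib) & set(p.attrib):
        return False
    p.attrib.update(span.attrib)
    p.text = span.text
    for child in list(span):
        p.append(child)
    p.remove(span)
    return True
```

Applied to the example above, the `font-size` style ends up on the `p` and the `span` disappears; a paragraph whose `p` and `span` both carry a `class` is returned unchanged.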
Wendell Piez

https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/31
Alex: confirm superscripts working properly after updating `master` to `staging`
2018-04-21T02:49:02Z
Alex Theg

Testing for Alex:
* current INK recipe drops `<sup>` tags in typescript reduce step
* `staging` works correctly
Confirm resolved after updating live INK recipe