HTMLevator issues
https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues
2018-05-03T22:42:18Z

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/6
Some closing tags dropped and blocks of text repeated in the macro text cleanup step
2018-05-03T22:42:18Z
Alex Theg

Picking up an issue from an existing ticket (the seastar part of https://gitlab.coko.foundation/XSweet/HTMLevator/issues/3). Upon further review, it does not appear to have anything to do with the "force punctuation formatting to match preceding word formatting" rule.
This is a widespread issue, and I will post examples below as I encounter them.
In some instances (maybe all; I will investigate), the closing format tag is dropped and the text that follows it is duplicated up to the next opening format tag, so that text appears twice. In one of the repeated chunks, sentences are separated by two spaces; in the other, the double spaces have been replaced by single spaces.
I can also confirm that this issue happens in exactly the same way for inline bold, underline, and italic tags.
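While collecting examples, a rough check like the following could help flag affected paragraphs. It is only a sketch and not part of HTMLevator: the function names and the `tolerance` threshold are assumptions for illustration. It compares the visible text length before and after the cleanup step and reports paragraphs that grew noticeably.

```python
import re


def visible_text(html: str) -> str:
    """Strip tags and collapse runs of whitespace to single spaces."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()


def looks_duplicated(before: str, after: str, tolerance: float = 1.2) -> bool:
    """Return True when the cleaned text is much longer than the rinsed text.

    `tolerance` is an arbitrary, hypothetical threshold: cleanup steps may
    legitimately add or remove a few characters, but should not add whole
    sentences.
    """
    return len(visible_text(after)) > tolerance * len(visible_text(before))
```

Because whitespace is collapsed before comparing, the double-space versus single-space difference between the repeated chunks does not affect the result.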
# Example 1
Rinsed HTML:
```html
<p class="Default" style="font-family: Helvetica; font-size: 12pt; margin-bottom: 6pt">
One of the central promises of change that former Mexican president Vicente Fox made in the run-up to his victorious election in 2000 was that he would govern on behalf of 118 million Mexicans – a number that included both the 100 million people residing within the territorial confines of the Mexican nation-state as well as the 18 million
<i>mexicanos en el exterior</i>
, the imagined community of Mexican migrants and their descendents living abroad. In recognition of their economic contributions to Mexico, and their continued commitment to the nation, Fox often referred to those
<i>mexicanos en el exterior</i>
as heroes. In this, president Fox was part of an expanding chorus of leaders from major migrant-sending states, from Ireland to the Philippines, who have celebrated the heroic contributions of migrants to their homelands over recent decades. For Fox, this heroic imagery took perhaps its grandest form on December 3, 2000, just three days into the presidency. That day Fox held his first public event and opened the official presidential residence, Los Pinos, for a meeting with migrant leaders. In his official address, the newly inaugurated president waxed eloquently about the spirit and tenacity of the migrant, about the set of characteristics that migrants shared with a curious amalgam of historical figures:
</p>
```
Macro text cleanups applied:
```html
<p class="Default" style="font-family: Helvetica; font-size: 12pt; margin-bottom: 6pt">
One of the central promises of change that former Mexican president Vicente Fox made in the run-up to his victorious election in 2000 was that he would govern on behalf of 118 million Mexicans—a number that included both the 100 million people residing within the territorial confines of the Mexican nation-state as well as the 18 million
<i>mexicanos en el exterior,</i>
the imagined community of Mexican migrants and their descendents living abroad. In recognition of their economic contributions to Mexico, and their continued commitment to the nation, Fox often referred to those
<i>mexicanos en el exterior as heroes. In this, president Fox was part of an expanding chorus of leaders from major migrant-sending states, from Ireland to the Philippines, who have celebrated the heroic contributions of migrants to their homelands over recent decades. For Fox, this heroic imagery took perhaps its grandest form on December 3, 2000, just three days into the presidency. That day Fox held his first public event and opened the official presidential residence, Los Pinos, for a meeting with migrant leaders. In his official address, the newly inaugurated president waxed eloquently about the spirit and tenacity of the migrant, about the set of characteristics that migrants shared with a curious amalgam of historical figures:
</i> as heroes. In this, president Fox was part of an expanding chorus of leaders from major migrant-sending states, from Ireland to the Philippines, who have celebrated the heroic contributions of migrants to their homelands over recent decades. For Fox, this heroic imagery took perhaps its grandest form on December 3, 2000, just three days into the presidency. That day Fox held his first public event and opened the official presidential residence, Los Pinos, for a meeting with migrant leaders. In his official address, the newly inaugurated president waxed eloquently about the spirit and tenacity of the migrant, about the set of characteristics that migrants shared with a curious amalgam of historical figures:
</p>
```
1.0.0

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/2
Handling paragraph-level formatting?
2018-04-26T20:51:18Z
Alex Theg

With #1 finished, the mapping of inline formatting tags is working properly (`<b>` and `<u>` become `<i>`).
However, when an entire paragraph is formatted with bold, underlining, or italics, that property is promoted to the paragraph level. Consequently, bolding and underlining aren't mapped to italics, and even if they were, paragraph-level bolding, italics, and underlining are all dropped by Typescript.
Here's an example.
Initial extraction:
```html
<p><span style="font-family: Helvetica"><b>Bold</b></span></p>
<p><span style="font-family: Helvetica"><i>italics</i></span></p>
<p><span style="font-family: Helvetica"><u>Underline</u></span></p>
```
After the `rinse` step, properties are on the paragraph:
```html
<p style="font-family: Helvetica; font-weight: bold">Bold</p>
<p style="font-family: Helvetica; font-style: italic">Italics</p>
<p style="font-family: Helvetica; text-decoration: underline">Underlined</p>
```
Finally, in the last Editoria reduce step, the `style` properties are dropped and the above becomes the following:
```html
<p>Bold</p>
<p>Italics</p>
<p>Underlined</p>
```
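As a rough illustration of how this formatting could be kept, the sketch here wraps a paragraph's content in inline tags whenever its `style` attribute carries one of the three properties. It is written in Python purely for illustration and is not the pipeline's actual code: the function name, the mapping table, and the regex-based matching (which only handles `<p>` elements whose sole attribute is `style`) are all assumptions.

```python
import re

# Hypothetical mapping from paragraph-level CSS declarations to inline tags.
STYLE_TO_TAG = {
    "font-weight: bold": "b",
    "font-style: italic": "i",
    "text-decoration: underline": "u",
}

# Simplification: matches only <p style="..."> with no other attributes.
_PARAGRAPH = re.compile(r'<p style="([^"]*)">(.*?)</p>', re.DOTALL)


def wrap_paragraph_formatting(html: str) -> str:
    """Wrap paragraph content in an inline tag for each matching declaration."""

    def wrap(match: re.Match) -> str:
        style, content = match.group(1), match.group(2)
        declarations = [d.strip() for d in style.split(";")]
        for declaration, tag in STYLE_TO_TAG.items():
            if declaration in declarations:
                content = f"<{tag}>{content}</{tag}>"
        # The style attribute is left in place; a later step may drop it.
        return f'<p style="{style}">{content}</p>'

    return _PARAGRAPH.sub(wrap, html)
```

Run on the rinse output above, this would leave `<b>Bold</b>`, `<i>Italics</i>`, and `<u>Underlined</u>` inside their paragraphs, so the inline mappings from #1 could then apply.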
One solution might be to add a step in Typescript before the UCP cleanups that looks for one of these:
* `font-weight: bold`
* `font-style: italic`
* `text-decoration: underline`
And handles them by adding an opening `<b>`, `<u>`, or `<i>` right after the opening `<p>`, and the related closing tag just before the `</p>`. What do you think?

1.0.0

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/3
Force punctuation to match formatting of preceding word
2018-04-21T02:43:19Z
Alex Theg

As part of the macro cleanups, we should force punctuation to match formatting of preceding word. Let's do that for the following:
* ,
* .
* :
* ;
* ?
* !
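The intended rule can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the macro cleanup's real implementation: the function name is hypothetical, and it only handles `<b>`, `<i>`, and `<u>` tags that are immediately followed by one or more of the listed marks.

```python
import re

# The punctuation marks listed in this issue.
PUNCTUATION = ",.:;?!"

# A closing inline formatting tag followed directly by one or more listed marks.
_TRAILING = re.compile(r"(</(?:b|i|u)>)([" + re.escape(PUNCTUATION) + r"]+)")


def pull_punctuation_inside(html: str) -> str:
    """Move punctuation that directly follows </b>, </i>, or </u> inside that tag."""
    return _TRAILING.sub(r"\2\1", html)
```

For instance, `pull_punctuation_inside("<b>word</b>.")` returns `<b>word.</b>`, while text with a space between the closing tag and the next character is left unchanged.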
Current example:
```xml
<w:p w14:paraId="369F0F7E" w14:textId="2A1BB479" w:rsidR="00733D7F" w:rsidRDefault="00733D7F">
<w:pPr>
<w:rPr><w:rFonts w:ascii="Helvetica" w:eastAsia="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/></w:rPr>
</w:pPr>
<w:r>
<w:rPr><w:rFonts w:ascii="Helvetica" w:eastAsia="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/><w:b/></w:rPr>
<w:t>this is all bold except for the period</w:t>
</w:r>
<w:r>
<w:rPr><w:rFonts w:ascii="Helvetica" w:eastAsia="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/></w:rPr>
<w:t>.</w:t>
</w:r>
</w:p>
```
...results in...
```html
<p style="font-family: Helvetica"><b>this is all bold except for the period</b>.</p>
```
1.0.0

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/5
After cleanup, some words duplicated
2018-04-09T15:27:38Z
Alex Theg

As of a commit today (I think one of the HTMLevator ones), something weird is happening with this chunk of text:
This:
```html
<p style="font-family: Helvetica">
<b>Outstanding </b>
<u>Underline</u>
<b> issues:</b>
</p>
```
Becomes
```html
<p style="font-family: Helvetica">
<b>Outstanding Underline</b>
<u>Underline</u>
<b> issues:</b>
</p>
```
Any ideas what's causing this?

1.0.0

https://gitlab.coko.foundation/XSweet/HTMLevator/-/issues/4
Insert hair space (u200a) btwn pairs of single/double quotes
2018-04-06T18:42:23Z
Alex Theg

As part of the macro cleanups, we should insert a hair space (u200a) between pairs of single/double quotes. Note that order of operations matters; this assumes that straight quotes and apostrophes have already been replaced with their directional counterparts.
* left single quote+left double quote (u2018+u201c)
* left double quote+left single quote (u201c+u2018)
* right single quote+right double quote (u2019+u201d)
* right double quote+right single quote (u201d+u2019)
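The four pairs above can be expressed as a small substitution. The following Python sketch is only an illustration of the rule, not the cleanup step's actual code; the function and constant names are assumptions, and it presumes the straight quotes have already been converted to directional ones.

```python
import re

HAIR_SPACE = "\u200a"

# Zero-width positions between the four adjacent quote pairs listed above.
_BETWEEN_PAIRS = re.compile(
    "(?<=\u2018)(?=\u201c)"   # left single + left double
    "|(?<=\u201c)(?=\u2018)"  # left double + left single
    "|(?<=\u2019)(?=\u201d)"  # right single + right double
    "|(?<=\u201d)(?=\u2019)"  # right double + right single
)


def insert_hair_spaces(text: str) -> str:
    """Insert U+200A between each listed pair of adjacent directional quotes."""
    return _BETWEEN_PAIRS.sub(HAIR_SPACE, text)
```

Using lookarounds keeps the quote characters themselves out of the match, so only the hair space is inserted. Pairs of identical quotes (for example two left double quotes in a row) are not in the list and are left alone by this sketch.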
This currently works only partially. In the following examples, the characters typed in Word are on the left and the final Typescript output is on the right.
* `"'quote'"` -> `<p style="font-family: Helvetica">“ ‘quote’ ”</p>`
  * works properly; hair space between both pairs of quotes
* `'"quote"'` -> `<p style="font-family: Helvetica">‘“quote” ’</p>`
  * hair space between the 2nd quotes but not the 1st
* `'”quote"‘` -> `<p style="font-family: Helvetica">‘“quote” ’</p>`
  * hair space between the 2nd quotes but not the 1st
* `‘"quote"’` -> `<p style="font-family: Helvetica">‘“quote” ’</p>`
  * hair space between the 2nd quotes but not the 1st
* `""quote""` -> `<p style="font-family: Helvetica">“ “quote””</p>`
  * hair space between the 1st quotes but not the 2nd

1.0.0