XSweet issues (https://gitlab.coko.foundation/groups/XSweet/-/issues), feed last updated 2023-02-03T09:52:17Z

**Translate HTML-like track changes into Wax 2 target format** ([editoria_typescript#43](https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/43); author: Alex Theg; updated 2023-02-03T09:52:17Z)

https://gitlab.coko.foundation/XSweet/XSweet/issues/172 defines the target format for track changes (TCs) in Wax.
TCs currently make it through the `editoria-basic.xsl` step, but are dropped by the `editoria-reduce.xsl` step. I think we should convert TCs to the Wax 2 format as part of the `editoria-basic.xsl` step, before `editoria-reduce.xsl` runs, and then make sure `editoria-reduce.xsl` passes them through.
Here are the tag translations:
* `<ins>` maps to `<span class="insertion">`
* `<del>` maps to `<span class="deletion">`
* `<track-change>` maps to `<span class="format-change">`
These two attributes' values should be passed through unchanged and set on all TCs in Wax:
* `id` to `data-id`. Example values: `"tc-1", "tc-2"`.
* `data-author` to `data-username`. E.g. `"Alex Theg"`
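The pipeline does these conversions in XSLT (`editoria-basic.xsl`). Purely to illustrate the mapping, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It covers only the three tag translations and the two pass-through attributes; `to_wax_tc` is a hypothetical helper name, and the `datetime`, `data-addedtype`, and `data-oldtype` conversions are left out because their target formats are still open.

```python
import xml.etree.ElementTree as ET

# Tag translations: XSweet HTML-like element -> class on a Wax 2 <span>.
TC_CLASS = {
    "ins": "insertion",
    "del": "deletion",
    "track-change": "format-change",
}

# Attributes copied unchanged, under their Wax 2 names.
PASS_THROUGH = {
    "id": "data-id",
    "data-author": "data-username",
}


def to_wax_tc(elem: ET.Element) -> ET.Element:
    """Build the Wax 2 <span> for one XSweet track-change element."""
    span = ET.Element("span", {"class": TC_CLASS[elem.tag]})
    for src, dst in PASS_THROUGH.items():
        if src in elem.attrib:
            span.set(dst, elem.attrib[src])
    # Keep the changed content (text plus any nested markup) and the tail.
    span.text = elem.text
    span.extend(list(elem))
    span.tail = elem.tail
    return span


tc = ET.fromstring('<ins id="tc-1" data-author="Alex Theg">new text</ins>')
print(ET.tostring(to_wax_tc(tc), encoding="unicode"))
# prints <span class="insertion" data-id="tc-1" data-username="Alex Theg">new text</span>
```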
Additionally, these attributes need their values transformed, not just passed through:
* `datetime` needs to be converted into the datetime format Wax uses, then set as `data-date` on every TC
* `data-addedtype` and `data-oldtype`, when present, need to be converted to Wax's target format and set as `data-after` and `data-before`, respectively. Not only does the format of the values need to be modified, but we probably also need to decide what formatting is worth saving. That may be an Amnet-specific step. For example, you may want to save underline, bold, and italics, and ignore everything else.
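A sketch of the `data-addedtype` to `data-after` conversion, under two assumptions this issue does not settle: that the XSweet value is a CSS-style declaration string (the formatting TCs from #164 express styling in CSS syntax), and that Wax wants a JSON array of mark names like the `["strong"]` in the #172 example. The whitelist keeps only bold, italics, and underline; the mark names `em` and `underline` are guesses.

```python
import json

# Hypothetical whitelist: (CSS property, value) -> Wax mark name.
KEEP = {
    ("font-weight", "bold"): "strong",
    ("font-style", "italic"): "em",
    ("text-decoration", "underline"): "underline",
}


def css_to_wax_marks(css: str) -> str:
    """Turn 'prop: value; ...' into a JSON list of the kept Wax marks."""
    marks = []
    for decl in css.split(";"):
        if ":" not in decl:
            continue  # skip empty or malformed declarations
        prop, value = (part.strip().lower() for part in decl.split(":", 1))
        mark = KEEP.get((prop, value))
        if mark and mark not in marks:
            marks.append(mark)
    return json.dumps(marks)


print(css_to_wax_marks("font-weight: bold; color: red"))  # prints ["strong"]
```

The same function would serve `data-oldtype` to `data-before`. When the result is set as an attribute through an XML serializer, its double quotes are escaped as `&quot;` automatically.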
@christos I have questions about these three attributes and how to handle them:
* `data-user`: could we also pass the `data-author` value from XSweet into this attribute and have it serve as a unique user identifier, in addition to setting it as `data-username`?
* `data-group`: what are the possible values? Is it either `"main"` or something else? Can Wax 2 assign this, or should XSweet?
* `datetime`: @christos, what format does Wax use for dates and times? Does MS Word's timestamp (e.g. `2020-02-05T19:02:00Z`) need to be converted?

Assignee: Bharathydasan

**Capture list type** ([XSweet#148](https://gitlab.coko.foundation/XSweet/XSweet/-/issues/148); author: Alex Theg; updated 2022-03-09T11:23:00Z)

From #106 @GitBruno
XSweet should extract the list type from lists, in addition to what it does now, and hold onto it as a property, `xsweet-list-type`.
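As a possible starting point for that mapping: in OOXML, a list level's number format lives in `numbering.xml` as the `w:val` of a `w:numFmt` inside each `w:lvl` (a paragraph reaches it through `w:numPr/w:numId`, then `w:num`, then `w:abstractNum`). The Python sketch here handles only that last hop. The target values, borrowed from CSS `list-style-type`, are placeholders rather than agreed `xsweet-list-type` values, and `list_type` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# WordprocessingML namespace, in ElementTree's {uri}local form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Placeholder mapping from w:numFmt values to xsweet-list-type values.
NUMFMT_TO_LIST_TYPE = {
    "decimal": "decimal",
    "lowerLetter": "lower-alpha",
    "upperLetter": "upper-alpha",
    "lowerRoman": "lower-roman",
    "upperRoman": "upper-roman",
    "bullet": "disc",
}


def list_type(lvl: ET.Element) -> Optional[str]:
    """Return a list type for one w:lvl element, or None if unmapped."""
    fmt = lvl.find(f"{W}numFmt")
    if fmt is None:
        return None
    return NUMFMT_TO_LIST_TYPE.get(fmt.get(f"{W}val"))
```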
@atheg todo: investigate the most commonly occurring list types in Word and specify mappings from the Word OOXML to `xsweet-list-type` property values.

Assignees: Dione Mentis (dione@coko.foundation), Bharathydasan

**Import tables for Wax 2** ([editoria_typescript#41](https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/41); author: Alex Theg; updated 2022-03-09T11:21:11Z)

Wax 2 supports tables. The differences between what the XSweet pipeline currently produces and the format Wax 2 uses are small:
* XSweet's `<td>`s have a `style` attribute; Wax does not
* XSweet's `<p>`s do not have a `class` but Wax's do
* XSweet doesn't wrap the table in a `<tbody>` tag; Wax does
XSweet output:
```html
<table>
<tr>
<td style="border-bottom-style: solid; border-bottom-width: 0.5pt; border-left-style: solid; border-left-width: 0.5pt; border-right-style: solid; border-right-width: 0.5pt; border-top-style: solid; border-top-width: 0.5pt; vertical-align: top">
<p>One</p>
</td>
<td style="border-bottom-style: solid; border-bottom-width: 0.5pt; border-left-style: solid; border-left-width: 0.5pt; border-right-style: solid; border-right-width: 0.5pt; border-top-style: solid; border-top-width: 0.5pt; vertical-align: top">
<p>Two</p>
</td>
<td style="border-bottom-style: solid; border-bottom-width: 0.5pt; border-left-style: solid; border-left-width: 0.5pt; border-right-style: solid; border-right-width: 0.5pt; border-top-style: solid; border-top-width: 0.5pt; vertical-align: top">
<p>Three</p>
</td>
</tr>
<tr>
<td style="border-bottom-style: solid; border-bottom-width: 0.5pt; border-left-style: solid; border-left-width: 0.5pt; border-right-style: solid; border-right-width: 0.5pt; border-top-style: solid; border-top-width: 0.5pt; vertical-align: top">
<p>Four</p>
</td>
<td style="border-bottom-style: solid; border-bottom-width: 0.5pt; border-left-style: solid; border-left-width: 0.5pt; border-right-style: solid; border-right-width: 0.5pt; border-top-style: solid; border-top-width: 0.5pt; vertical-align: top">
<p>Five</p>
</td>
<td style="border-bottom-style: solid; border-bottom-width: 0.5pt; border-left-style: solid; border-left-width: 0.5pt; border-right-style: solid; border-right-width: 0.5pt; border-top-style: solid; border-top-width: 0.5pt; vertical-align: top">
<p>Six</p>
</td>
</tr>
<tr>
<td style="border-bottom-style: solid; border-bottom-width: 0.5pt; border-left-style: solid; border-left-width: 0.5pt; border-right-style: solid; border-right-width: 0.5pt; border-top-style: solid; border-top-width: 0.5pt; vertical-align: top">
<p>Seven</p>
</td>
<td style="border-bottom-style: solid; border-bottom-width: 0.5pt; border-left-style: solid; border-left-width: 0.5pt; border-right-style: solid; border-right-width: 0.5pt; border-top-style: solid; border-top-width: 0.5pt; vertical-align: top">
<p>Eight</p>
</td>
<td style="border-bottom-style: solid; border-bottom-width: 0.5pt; border-left-style: solid; border-left-width: 0.5pt; border-right-style: solid; border-right-width: 0.5pt; border-top-style: solid; border-top-width: 0.5pt; vertical-align: top">
<p>Nine</p>
</td>
</tr>
</table>
```
Wax 2 table:
```xml
<table>
<tbody>
<tr>
<td>
<p class="paragraph">one</p>
</td>
<td>
<p class="paragraph">two</p>
</td>
<td>
<p class="paragraph">three</p>
</td>
</tr>
<tr>
<td>
<p class="paragraph">four</p>
</td>
<td>
<p class="paragraph">five</p>
</td>
<td>
<p class="paragraph">six</p>
</td>
</tr>
<tr>
<td>
<p class="paragraph">seven</p>
</td>
<td>
<p class="paragraph">eight</p>
</td>
<td>
<p class="paragraph">nine</p>
</td>
</tr>
</tbody>
</table>
```
When an XSweet table like the first example is passed in, Wax appears to gloss over the differences seamlessly, adding `class` attributes and a `tbody` tag, and dropping the `style` information entirely. That is great news.
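Since Wax seems to absorb these differences itself, no conversion appears to be required. For reference, though, the three differences listed above could be reconciled explicitly with something like this Python sketch (`normalize_table` is a hypothetical name, and it assumes the table parses as XML).

```python
import xml.etree.ElementTree as ET


def normalize_table(table: ET.Element) -> ET.Element:
    """Reshape an XSweet <table> in place to match the Wax 2 sample."""
    # 1. Wrap the rows in a <tbody>.
    rows = [child for child in table if child.tag == "tr"]
    if rows:
        tbody = ET.Element("tbody")
        for row in rows:
            table.remove(row)
            tbody.append(row)
        table.append(tbody)
    # 2. Drop cell styling, which Wax does not use.
    for td in table.iter("td"):
        td.attrib.pop("style", None)
    # 3. Give every paragraph Wax's class.
    for p in table.iter("p"):
        p.set("class", "paragraph")
    return table
```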
To complete table import into Wax 2, the only change needed (that I can identify) is to pass tables through the final `editoria-reduce.xsl` step. Currently, that step drops all table tags, keeping only the cell contents.

Assignee: Bharathydasan

**Notes format for Wax 2** ([editoria_typescript#44](https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/44); author: Alex Theg; updated 2022-03-09T11:14:20Z)

# Target format for Wax 2
Unlike Wax 1, which has inline note callouts that reference the contents of the notes at the end of the file, Wax 2 keeps the contents of the notes inline, where they're referenced.
```html
<p class="paragraph">Quisque posuere fermentum.<footnote id="de87b2a3-6186-440e-9249-fb07ce9ff344" data-group="notes">note 1 with some content </footnote> Duis ut volutpat nunc.<footnote id="b8cc2013-ba52-48cb-b42c-dbe407de0c69" data-group="notes">note 2</footnote> Nunc elementum id nulla nec tempor. Sed fringilla lacinia diam non tempus.</p>
```
All notes (both endnotes and footnotes) should be turned into `<footnote>` tags. The `<footnote>`s should have two attributes:
1. A unique `id`
2. A `data-group`, which should have a value of "notes"
This transformation needs to happen in the Editoria Typescript steps.
# How it works now
The starting format for notes is the output of the final step of the XSweet chain: the HTML-like format. This will not change. At the end of the XSweet chain, notes take this format:
```html
<div class="docx-body">
<p>Simple notes test. Here’s a footnote.<span class="FootnoteReference"><a class="footnoteReference" href="#fn1">a</a></span> And another footnote.<span class="FootnoteReference"><a class="footnoteReference" href="#fn2">b</a></span> And here’s an
endnote.<span class="EndnoteReference"><a class="endnoteReference" href="#en1">1</a></span> And another endnote.<span class="EndnoteReference"><a class="endnoteReference" href="#en2">2</a></span></p>
</div>
<div class="docx-endnotes">
<div class="docx-endnote" id="en1">
<p class="EndnoteText" style="font-size: 10pt"><span class="EndnoteReference"><span class="endnoteRef">1</span></span> First endnote!</p>
</div>
<div class="docx-endnote" id="en2">
<p class="EndnoteText" style="font-size: 10pt"><span class="EndnoteReference"><span class="endnoteRef">2</span></span> Second endnote!</p>
</div>
</div>
<div class="docx-footnotes">
<div class="docx-footnote" id="fn1">
<p class="FootnoteText" style="font-size: 10pt"><span class="FootnoteReference"><span class="footnoteRef">a</span></span> The first footnote!</p>
</div>
<div class="docx-footnote" id="fn2">
<p class="FootnoteText" style="font-size: 10pt"><span class="FootnoteReference"><span class="footnoteRef">b</span></span> Second footnote</p>
</div>
</div>
```
At the end of the Editoria Typescript process, notes come out looking like this:
```html
<container id="main">
<p>Simple notes test. Here’s a footnote.<note data-id="fn1"/> And another footnote.<note data-id="fn2"/> And here’s an endnote.<note data-id="en1"/> And another endnote.<note data-id="en2"/></p>
</container>
<div id="notes">
<note-container id="container-en1">
<p> First endnote!</p>
</note-container>
<note-container id="container-en2">
<p> Second endnote!</p>
</note-container>
<note-container id="container-fn1">
<p> The first footnote!</p>
</note-container>
<note-container id="container-fn2">
<p> Second footnote</p>
</note-container>
</div>
```
The steps that will need an update are `editoria-basic.xsl` and `editoria-reduce.xsl`.
Currently, the `editoria-basic.xsl` step transforms the inline note callouts:
```html
Before:
<span class="FootnoteReference"><a class="footnoteReference" href="#fn1">a</a></span>
After:
<note data-id="fn1"><!-- implicit --></note>
```
It also transforms the note content itself:
1. turning the `<div class="docx-endnote">` into a `<note-container>`
2. prepending "`container-`" to the `id`
3. dropping out the `<span class="EndnoteReference">` and its contents
```html
Before:
<div class="docx-endnote" id="en1">
<p class="EndnoteText" style="font-size: 10pt">
<span class="EndnoteReference">
<span class="endnoteRef">1</span>
</span>
First endnote!
</p>
</div>
After:
<note-container id="container-en1">
<p class="EndnoteText" style="font-size: 10pt">
First endnote!
</p>
</note-container>
```
The `editoria-reduce.xsl` step leaves the note callout as-is, but strips extraneous attributes (`class`, `style`, etc.) from the note's `<p>`:
```html
<note-container id="container-en1">
<p> First endnote!</p>
</note-container>
```
# Implementation notes
The `editoria-basic.xsl` step needs to be updated to:
* Find the corresponding note content for every callout, by matching the callout's `href` value to the `id` of the endnote/footnote
* Copy that note content inline, into a `<footnote>` tag, generating a UUID (see https://gitlab.coko.foundation/XSweet/editoria_typescript/issues/43#note_52278) and setting it as the `<footnote>`'s `id` attribute. Also, add a `data-group="notes"` attribute to it.
* Clean up the old callouts and notes
* For the callouts, the new `<footnote>` tags should completely replace the `<span class="FootnoteReference">` and `<span class="EndnoteReference">` and their contents.
* The notes section at the end of the docx should be dropped completely. The `editoria-basic.xsl` step takes the `<div class="docx-endnotes">` and `<div class="docx-footnotes">` and collapses them both into one `<div id="notes">`. This should be updated to simply drop the `<div class="docx-endnotes">` and `<div class="docx-footnotes">` tags and their contents.
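A minimal Python sketch of these steps, for illustration only (the real change belongs in `editoria-basic.xsl`). It assumes the XSweet output parses as XML under a single root, uses `uuid.uuid4()` for the ids, and simply concatenates the paragraphs of a multi-paragraph note, which may not be what Wax wants. All function names are hypothetical.

```python
import copy
import uuid
import xml.etree.ElementTree as ET

REF_CLASSES = {"FootnoteReference", "EndnoteReference"}
NOTE_CLASSES = {"docx-footnote", "docx-endnote"}
SECTION_CLASSES = {"docx-footnotes", "docx-endnotes"}


def append_text(elem, text):
    """Append text after elem's last child (or to elem.text if it has none)."""
    if not text:
        return
    if len(elem):
        elem[-1].tail = (elem[-1].tail or "") + text
    else:
        elem.text = (elem.text or "") + text


def make_footnote(note_div):
    """Copy a note's content into a new <footnote>, minus the marker span."""
    fn = ET.Element("footnote", {"id": str(uuid.uuid4()), "data-group": "notes"})
    for p in note_div.iter("p"):
        append_text(fn, p.text)
        for child in p:
            if child.get("class") not in REF_CLASSES:
                clone = copy.deepcopy(child)
                clone.tail = None
                fn.append(clone)
            append_text(fn, child.tail)
    if fn.text:
        fn.text = fn.text.lstrip()  # drop the space left after the marker
    return fn


def inline_notes(root):
    parent = {child: p for p in root.iter() for child in p}
    notes = {d.get("id"): d for d in root.iter("div")
             if d.get("class") in NOTE_CLASSES}
    callouts = [s for s in root.iter("span") if s.get("class") in REF_CLASSES]
    for span in callouts:
        link = span.find("a")
        if link is None:  # the marker inside the note itself, not a callout
            continue
        note = notes.get(link.get("href", "").lstrip("#"))
        if note is None:
            continue
        fn = make_footnote(note)
        fn.tail = span.tail  # keep the text that followed the callout
        host = parent[span]
        host.insert(list(host).index(span), fn)
        host.remove(span)
    # Finally drop the end-of-document note sections entirely.
    for div in [d for d in root.iter("div") if d.get("class") in SECTION_CLASSES]:
        parent[div].remove(div)
```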
The `editoria-reduce.xsl` step will likely need some minor tweaks to update specific references to outdated notes tags.

Assignee: Bharathydasan

**Math** ([editoria_typescript#45](https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/45); author: Christos; updated 2022-03-09T10:47:36Z)

XSweet captures equations created with Word’s built-in equation tool (Office MathML format) and passes them through to the HTML as standard MathML ([docs](https://xsweet.org/documentation/xsweet-core/#math)).
The Wax editor should be passed LaTeX so that users can easily edit the math after the Word docs are imported (see the [mml2tex](https://github.com/transpect/mml2tex) library). The math will be rendered with MathJax (https://www.mathjax.org/).

Assignee: Bharathydasan

**Track changes target format for Wax 2** ([XSweet#172](https://gitlab.coko.foundation/XSweet/XSweet/-/issues/172); author: Alex Theg; updated 2022-03-01T08:41:15Z)

@christos is working on the next version of the Wax editor for Editoria, and it will include some changes to the way certain elements are coded. We laid out the current format of TCs in #162; going forward, this issue will serve as the reference for the expected target format.
Rather than using a `<track-change>` element, track changes will be a `<span>` tag with a class describing the specific type of change:
* `<span class="insertion">`
* `<span class="deletion">`
* `<span class="format-change">`
Track change elements will have the following attributes:
* `data-id`: A unique identifier string. There is no specific requirement for the format, just that something's passed in.
* `data-user`: A unique identifier string for the user. There is no specific format required, but something does need to be passed in. Down the line, this is going to change to `data-userid`.
* `data-username`: When changes are made within the editor, this will be the user's username. We won't have access to usernames since we're coming from Word, so pass in the author name as-is.
* `data-date`: the date of the change, in the format ____
* `data-group`: what part of the content the change happened in. Either "main" or else "____"
For formatting track changes, these two additional attributes will apply:
* `data-before`: The format before the change. As a starting point, there is no need to pass in anything here, and it's fine to omit the attribute altogether.
* `data-after`: The formatting change that has been made. More important than the `data-before` attribute.
If `data-id` and `data-group` are both omitted, then the right-hand side track change pane won't show the change (see attached screenshot), and the change will only appear inline. This would be fine as a starting point.
![side_callout](/uploads/8007c007d801d716bd99355f86cc70d9/side_callout.png)
Example:
```html
<p class="paragraph">
<strong>
<span class="format-change"
data-id="9b863ae5-a1ba-4714-93e9-3674b3ef3edf"
data-user="1234"
data-username="demo"
data-date="5316364"
data-before="[]"
data-after='["strong"]'
data-group="main">
this
</span>
</strong>
is a paragraph.
</p>
```
@christos, here are some questions I have:
1. Is this correct, that TCs will be a `<span>` rather than a `<p>` tag?
1. What format does the date need to be in?
1. What are the possible values for `data-group`? `main` and what? Do we need to include this in what we pass in?
1. Can I get a check about when the side callout won't be available? Are the two relevant fields `data-id` and `data-group`? Do both need to be missing before the side callout won't show up, or just one? If I'm wrong, can you correct what I wrote above by the screenshot?
1. We're going to have the `data-before` data. Is there any reason not to include it if we already have it?

**Track changes not supported in list format conversion** ([XSweet#170](https://gitlab.coko.foundation/XSweet/XSweet/-/issues/170); author: Bharathydasan; updated 2022-03-01T08:32:01Z)

Hi @atheg and @wendell,
I ran some tests on list styles and found that the conversion process completely removes track changes, and even inline formatting, from list elements. I believe this should be addressed before adding anything new to list styles.
1. List (numbering) formats are not captured:
-- I looked through the existing issues and found that this was already raised in #148 and #149, which are still open.
2. Inline format tags are removed in lists:
-- Inline formats are completely lost in the conversion pipeline at the '8DETECTLISTS' stage.
3. Track change tags are removed in lists:
-- Similarly, track changes (ins/del) are also removed at the '8DETECTLISTS' stage.

Assignee: Alex Theg

**Tables not being retained upon import** ([editoria_typescript#47](https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/47); author: Ryan Dix-Peek; updated 2021-10-26T10:06:29Z)

**Issue description:** table structure is not retained; only the text pulls through when importing .docx files into Kotahi. The [XSweet Core pipeline](https://xsweet.org/documentation/xsweet-core/) should handle table conversion, but this does not appear to be happening. 'Import tables for Wax 2' was implemented in #41, but the issue persists.
See the Kotahi issue for [further context](https://gitlab.coko.foundation/kotahi/kotahi/-/issues/613).

**Font sizing issue in scrub step** ([XSweet#46](https://gitlab.coko.foundation/XSweet/XSweet/-/issues/46); author: Alex Theg; updated 2021-05-07T04:30:38Z)

See b_02_ch_1_Bakker.docx.
All the text in the Word doc seems to be 12pt Helvetica. Before the scrub step, it all seems to come through properly. After the scrub, though, a big portion of the document, from "Migration, remittances, and development: three vignettes" through to the "Introduction" heading, is in 11pt font.
The font also changes for the endnotes, but I suspect that's an entirely separate issue, caused by their being endnotes.

**Add UCP cleanup ingest/output macros to Typescript** ([editoria_typescript#21](https://gitlab.coko.foundation/XSweet/editoria_typescript/-/issues/21); author: Alex Theg; updated 2021-03-18T16:16:43Z)

UCP runs a series of cleanup macros on book chapters, both when they are ingested before editing and again at output, to prevent any of the problems they fix from being accidentally reintroduced. Since these cleanups are to be used at the beginning and the end of the editing process, and since different presses will have slightly different cleanups, these changes should all be housed in a single XSLT sheet.
Here are several cleanups to implement; there may be more to follow. I am checking this with Erich at UCP, so we will probably add to this list going forward:
- [x] Hyphens between numerals should be converted to en dashes: "2-3" -> "2–3"
- [x] Double spaces should be converted to single spaces, anywhere they're found: <pre>"...touches.  However, the..." -> "...touches. However, the..."</pre>
- [x] Spaces around em dashes should be removed (any number of consecutive spaces before or after an em dash)
* "that sentence —as I’ve done" -> "that sentence—as I’ve done"
* "that sentence — as I’ve done" -> "that sentence—as I’ve done"
- [x] Series of periods converted to ellipses
* "..." -> "…"
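The checked-off rules above are plain pattern replacements. Here they are as Python regular expressions, as an illustration only: in the pipeline they would live in the single XSLT sheet and run on text nodes, so a pattern split across inline markup would be missed. `cleanup` is a hypothetical name.

```python
import re

# Order matters: collapse runs of spaces before touching the dashes.
RULES = [
    (re.compile(r"(?<=\d)-(?=\d)"), "\u2013"),  # 2-3 -> 2–3 (en dash)
    (re.compile(r" {2,}"), " "),                # runs of spaces -> one
    (re.compile(r" *\u2014 *"), "\u2014"),      # no spaces around an em dash
    (re.compile(r"\.\.\."), "\u2026"),          # ... -> ellipsis character
]


def cleanup(text: str) -> str:
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```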
# Update
I have gotten through a lot of the macro, but it is long. Here are some of the remaining cleanups. I will post the rest tomorrow morning. These are in no special order, and @wendell, you can draw from these and start checking them off as you can; let me know if any of these need clarification. There are plenty more rules left. The most complicated is probably smart quotes. The macro actually deletes and reinserts all the "s and 's, letting Word's auto-formatting determine their direction when it inserts them again, which I don't think is an option for us. In any event, more to come!
- [x] Two adjacent hyphens become an em dash: "--" -> "—"
- [x] An en dash surrounded on both sides by spaces should be converted to an em dash: " – " -> " — "
- [x] Equal signs should be surrounded on either side by one and only one space: " = "
- [x] Replace runs of multiple consecutive spaces with just one space
- [ ] ~~Replace runs of multiple consecutive tabs with just one tab~~ Update: we scrub these out anyway in preparation for Wax, so this is not necessary
- [x] Spaces touching tabs should be removed
- [x] Remove spaces at the very beginning and ends of `p`s
- [x] Remove tabs that end a paragraph (not ones that start)
- [x] Delete empty paragraphs (I believe we are already doing this)
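Similarly, a Python sketch of this second batch, operating on a hypothetical list of paragraph strings so that the start/end-of-paragraph rules and empty-paragraph deletion have something to act on. As with the first batch, the real rules would be XSLT, and `clean_paragraphs` is an illustrative name.

```python
import re

INLINE_RULES = [
    (re.compile(r"--"), "\u2014"),          # -- -> em dash
    (re.compile(r" \u2013 "), " \u2014 "),  # spaced en dash -> spaced em dash
    (re.compile(r" *= *"), " = "),          # exactly one space around =
    (re.compile(r" {2,}"), " "),            # runs of spaces -> one
    (re.compile(r" *\t *"), "\t"),          # no spaces touching tabs
]


def clean_paragraphs(paragraphs: list) -> list:
    cleaned = []
    for p in paragraphs:
        for pattern, replacement in INLINE_RULES:
            p = pattern.sub(replacement, p)
        p = p.strip(" ")      # spaces at the start and end of the paragraph
        p = p.rstrip("\t ")   # trailing tabs go; leading tabs are kept
        if p:                 # delete empty paragraphs
            cleaned.append(p)
    return cleaned
```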
## Quotation marks
All straight, non-directional single and double quotes should be converted into "smart" directional quotes, depending on context. Since the original macro uses Word's auto-formatting, we'll have to make the rules for determining which direction they should point.
Straight quotation marks:
* u0022: quotation mark
* u0027: apostrophe
Should all be replaced by one of the following:
* u2018: left single quotation mark
* u2019: right single quotation mark
* u201c: left double quotation mark
* u201d: right double quotation mark
### Replacement rules from macro:
- [x] ' -> right or left single quotation mark (u2018 or u2019)
- [x] '' -> right or left double quotation mark (u201c or u201d)
- [x] ` -> right or left single quotation mark (u2018 or u2019)
- [x] `` -> right or left double quotation mark (u201c or u201d)
- [x] em dash+right double quote (u2014+u201d) -> em dash+left double quote (u2014+u201c)
- [x] left double quote+em dash (u201c+u2014)-> right double quote+em dash (u201d+u2014)
The following 3 search patterns should look for a straight single quote or a left single quote and replace it with a right single quote:
- [x] " 'em" or " ‘em" (space+u0027+"em" or space+u2018+"em") -> " ’em" (space+u2019+"em")
- [x] "'n'" or "‘n‘" (u0027+"n"+u0027 or u2018+"n"+u2018) -> "’n’" (u2019+"n"+u2019)
- [x] " 'tis" or " ‘tis" (space+u0027+"tis" or space+u2018+"tis") -> " ’tis" (space+u2019+"tis")
Then:
- [ ] ~~Insert hair space (u200a) btwn pairs of single/double quotes. Note that order of operations matters; this assumes that straight quotes and apostrophes have been replaced with their directional counterparts.~~ update: tracking in #28
* left single quote+left double quote (u2018+u201c)
* left double quote+left single quote (u201c+u2018)
* right single quote+right double quote (u2019+u201d)
* right double quote+right single quote (u201d+u2019)
### Directional rules
Here are my proposed rules for direction. They would have to be executed before all of the above rules from the macro:
- [x] First, replace all 4 directional quotation marks with their non-directional counterparts:
* u2018 and u2019 -> u0027
* u201c and u201d -> u0022
* also \` and `` to their respective u0027 and u0022
Then:
- [x] apostrophe+alphabetical character (u0027+letter) -> left single quotation mark (u2018+letter)
- [x] alphabetical character+apostrophe (letter+u0027) -> alphabetical character+right single quotation mark (letter+u2019)
- [x] quotation mark+alphabetical character (u0022+letter) -> left double quotation mark+alphabetical character (u201c+letter)
- [x] alphabetical character+quotation mark (letter+u0022) -> alphabetical character+right double quotation mark (letter+u201d)
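A Python sketch of the proposed directional rules, followed by the 'em/'n'/'tis exceptions from the macro (adapted to match what the directional step produces). One refinement is built in as an assumption: the closing (letter+quote) rules run before the opening (quote+letter) rules, because applying apostrophe+letter first would turn the apostrophe in "don't" into a left quote. The hair-space and em-dash rules are omitted, and `smarten_quotes` is a hypothetical name.

```python
import re

LETTER = r"[^\W\d_]"  # any Unicode letter

# Step 1: flatten directional quotes and backticks to straight quotes.
# `` must be replaced before a single `.
FLATTEN = [("``", '"'), ("`", "'"),
           ("\u2018", "'"), ("\u2019", "'"),
           ("\u201c", '"'), ("\u201d", '"')]

# Step 2: the proposed directional rules, closing rules first.
DIRECTIONAL = [
    (re.compile(rf"(?<={LETTER})'"), "\u2019"),  # letter+' -> right single
    (re.compile(rf"'(?={LETTER})"), "\u2018"),   # '+letter -> left single
    (re.compile(rf'(?<={LETTER})"'), "\u201d"),  # letter+" -> right double
    (re.compile(rf'"(?={LETTER})'), "\u201c"),   # "+letter -> left double
]

# Step 3: macro exceptions where a leading quote is really an apostrophe.
EXCEPTIONS = [
    (re.compile(r"(?<= )\u2018(?=em\b|tis\b)"), "\u2019"),  # 'em, 'tis
    (re.compile(r"\u2018n\u2019"), "\u2019n\u2019"),         # 'n'
]


def smarten_quotes(text: str) -> str:
    for src, dst in FLATTEN:
        text = text.replace(src, dst)
    for pattern, replacement in DIRECTIONAL + EXCEPTIONS:
        text = pattern.sub(replacement, text)
    return text
```

Quotes next to punctuation, such as the closing quote in `"Hello,"`, match neither letter-based rule and stay straight, which is one of the refinements these rules will probably need.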
In any case, these will probably need some refinement but double check me and let me know what you think!
### Formatting
- [x] Convert underlining to italics
- [ ] ~~Convert bold to italics~~ update: tracking in #29
We currently convert literal `<u>` tags into `<i>`s in the “Editoria basic” step, but that can sometimes get scrubbed out in the “Editoria reduce” step. We should also catch underlining, italics, and bold when they're specified in the CSS style, which we’re not currently doing. Wax looks for an `<em>` tag for italics. So, the following should all be converted into text wrapped in `<em>`:
* `<i>`
* `<b>`
* `<u>`
* `<p style="font-weight: bold">`
* `<p style="font-style: italic">`
* `<p style="text-decoration: underline">`
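A Python sketch of the flattening, assuming the content parses as XML. It renames `<i>`, `<b>`, and `<u>` to `<em>`, and treats any element whose `style` mentions one of the three declarations as emphasized by wrapping its contents in `<em>`. The helper names are hypothetical, and nested emphasis is simply left nested.

```python
import re
import xml.etree.ElementTree as ET

EMPHASIS_TAGS = {"i", "b", "u"}
EMPHASIS_STYLE = re.compile(
    r"font-weight:\s*bold|font-style:\s*italic|text-decoration:\s*underline")


def wrap_contents(elem: ET.Element, tag: str) -> None:
    """Move all of elem's content into a single new child element."""
    wrapper = ET.Element(tag)
    wrapper.text, elem.text = elem.text, None
    for child in list(elem):
        elem.remove(child)
        wrapper.append(child)
    elem.append(wrapper)


def flatten_emphasis(root: ET.Element) -> None:
    # Snapshot first, because the tree is modified during the walk.
    for elem in list(root.iter()):
        if elem.tag in EMPHASIS_TAGS:
            elem.tag = "em"
        elif EMPHASIS_STYLE.search(elem.get("style", "")):
            wrap_contents(elem, "em")
```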
Once this is implemented, we should also update the “Push mappings” to reflect this.
- [ ] ~~Force punctuation to match formatting of preceding word~~ tracking in #27
Since we're porting into Wax, we don't need to worry about fonts/font size. The only thing I can think to catch is formatting (italics, bold, underline). And, since all of these should get flattened to `<em>`s, I think this could be as simple as ensuring that if the preceding word is `<em>`, the trailing punctuation is as well. These are the punctuation marks that this rule should apply to:
* ,
* .
* :
* ;
* ?
* !
### Rules already implemented
The following cleanups don't require any additional coding, since XSweet is handling these as it should already:
* Remove page breaks and section breaks
* Page breaks are extracted as `<br class="br">`, and the pipeline replaces these by breaking paragraphs on `br`s
* Section breaks are dropped, since we’re not explicitly catching them
* Remove any comments: already happens, since we don’t handle them
* Delete headers and footers: we’re already dropping these
* Remove soft hyphens: these do not come through into the HTML.

Milestone: 1.0.0

**Express the most recently applied formatting inline on formatting track changes** ([XSweet#171](https://gitlab.coko.foundation/XSweet/XSweet/-/issues/171); author: Alex Theg; updated 2020-10-29T14:57:25Z)

The formatting track changes @bharathydasan has been working on in #164 express the styling information in CSS style syntax, rather than using inline tags. At some point in the pipeline, the styling from the CSS should be expressed inline, for two reasons.
First and most importantly, it's what Wax expects to see, per the target format provided by @christos:
```html
<track-change-format status="add-formating"
oldtype="[]"
addedtype="[{" username ":"demo ","type ":"strong "}]"
user-id="1"
username="demo">
<strong>test</strong>
</track-change-format>
```
Additionally, by duplicating the CSS styling from formatting track changes as inline tags, HTML in the browser would display the inline formatting even if it ignored the track change tags altogether.
That it's what Wax expects is the more important point. If we set aside the second point, this could also plausibly live in the Editoria Typescript portion of the pipeline, but I think this really belongs as one of the final steps targeting HTML (maybe before the `final-rinse.xsl` step?).
@bharathydasan and @wendell have started a discussion on #164 (specifically [here](https://gitlab.coko.foundation/XSweet/XSweet/issues/164#note_45058)), which we can carry forward on this issue.
I'd propose that we all agree on the target format explicitly before building anything. My suggestion would be to use exactly the target format for Wax, perhaps differing only in the exact tags used if there's a good reason (e.g. we might use `<b>` for the HTML and `<strong>` for Wax).

**Track changes to inline formatting** ([XSweet#164](https://gitlab.coko.foundation/XSweet/XSweet/-/issues/164); author: Alex Theg; updated 2020-09-20T21:13:16Z)

@bharathydasan, would you mind outlining your approach to capturing track changes to formatting in this ticket? From the `track-changes` branch `README.md`:
> Formatting Changes: Formatting changes are recorded in `w:rPrChange`, all the...@bharathydasan, would you mind outlining your approach to capturing track changes to formatting in this ticket? From the `track-changes` branch `README.md`:
> Formatting Changes: Formatting changes are recorded in `w:rPrChange`, all the inline and other formattings should be retained during the transformation. Development for formatting changes will start once the development on Character changes is completed.
Specifically, it would be great to have a list of the formatting changes you're targeting to capture initially, so we can have a reference and then discuss what is or isn't included/ignored/transformed etc. in the first implementation. Thanks!

Assignee: Bharathydasan

**WIP - Handle track changes** ([XSweet#162](https://gitlab.coko.foundation/XSweet/XSweet/-/issues/162); author: Alex Theg; updated 2020-09-20T21:07:49Z)

XSweet should handle track changes.
# Current .docx track changes format
* __ANY OTHERS? TEST A LARGER GROUP OF FORMATTING CHANGES__
## Information included in OOXML track changes
The following data exists in the .docx for each tracked change, as attributes on the same tags that mark track changes: `<w:ins>`, `<w:del>`, and `<w:rPrChange>`:
* `w:id`: integers assigned mostly sequentially to track changes as they are made (a few things other than track changes get assigned these, such as `<w:bookmark...>`s).
* `w:author`
* `w:date`: timestamp in ISO 8601 format
Example: `<w:ins w:id="1" w:author="Alex Theg" w:date="2020-02-03T07:44:00Z">`
## Insertions
Insertions are denoted by a `<w:ins>` tag.
Inline text: For the simplest insertions, inline within a text run, the preceding run ends (`</w:r>`), then there's an insertion start tag (`<w:ins...>`), then a normal `<w:r><w:t>` inside it, followed by the normal closing tags and a `</w:ins>`:
```xml
<w:p w14:paraId="5363EBC2" w14:textId="0075D436" w:rsidR="00105FD3" w:rsidRDefault="005D2EF4">
<w:r>
<w:t>Now let's do some inline additions starting after the colon but before the space:</w:t>
</w:r>
<w:ins w:id="12" w:author="Alex Theg" w:date="2020-02-05T19:07:00Z">
<w:r>
<w:t xml:space="preserve">
here's the inline insertion which ends on the w in now. now
</w:t>
</w:r>
</w:ins>
<w:r>
<w:t>! The exclamation mark is the first untracked character again.</w:t>
</w:r>
</w:p>
```
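A minimal sketch of pulling inserted text out of a paragraph, assuming the simple structure above, where each `<w:ins>` is a direct child of `<w:p>` and wraps ordinary runs. The function `inserted_text` is hypothetical; insertions nested inside other inline containers are not handled.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def inserted_text(paragraph_xml):
    """Return (id, text) for each <w:ins> that is a direct child of a <w:p>."""
    p = ET.fromstring(paragraph_xml)
    result = []
    for ins in p.findall(W + "ins"):
        # Inserted runs are ordinary <w:r><w:t> content inside the TC wrapper
        text = "".join(t.text or "" for t in ins.iter(W + "t"))
        result.append((ins.get(W + "id"), text))
    return result


para = (
    f'<w:p xmlns:w="{W_NS}">'
    "<w:r><w:t>Now let's do some inline additions:</w:t></w:r>"
    '<w:ins w:id="12" w:author="Alex Theg" w:date="2020-02-05T19:07:00Z">'
    '<w:r><w:t xml:space="preserve"> here is the insertion</w:t></w:r></w:ins>'
    "<w:r><w:t>!</w:t></w:r></w:p>"
)
print(inserted_text(para))
```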
If an insertion covers multiple paragraphs, then each paragraph will have a `<w:pPr><w:rPr><w:ins.../>` with a timestamp that matches the timestamp of the last tracked change from that paragraph. I _think_ a single whole-paragraph insertion doesn't necessarily get this `<w:pPr><w:rPr><w:ins.../>`. All inserted text within the paragraph is still marked as a normal text insertion (`<w:ins...><w:r><w:t>`), so it appears the `<w:pPr>` marker can usually be ignored. The only exception is the insertion of an entirely blank paragraph, e.g. when you hit return twice, the empty paragraph created by the first keystroke. That empty para looks like this:
```xml
<w:p w14:paraId="2661557F" w14:textId="77777777" w:rsidR="00105FD3" w:rsidRDefault="00105FD3">
<w:pPr>
<w:rPr><w:ins w:id="7" w:author="Alex Theg" w:date="2020-02-05T19:02:00Z"/></w:rPr>
</w:pPr>
</w:p>
```
This is worth verifying as TC support is being built, but it may also be a moot point:
* IIRC, we may already disregard _everything_ in `<w:pPr>`s (low degree of confidence here)
* Pretty sure we also remove `<p>`s with no content somewhere in the pipeline (higher confidence)
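If the blank-paragraph case does need handling, detection could look something like this hypothetical sketch, which treats a paragraph as a blank insertion when its `<w:pPr><w:rPr>` carries a `<w:ins>` marker and the paragraph contains no non-whitespace text:

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def is_blank_inserted_paragraph(paragraph_xml):
    """True if the paragraph mark is tracked as inserted and the para has no text."""
    p = ET.fromstring(paragraph_xml)
    # Paragraph-level marker: <w:pPr><w:rPr><w:ins .../></w:rPr></w:pPr>
    marked = p.find(f"{W}pPr/{W}rPr/{W}ins") is not None
    # Any non-whitespace text in <w:t> or <w:delText> counts as content
    has_text = any(
        (el.text or "").strip()
        for el in p.iter()
        if el.tag in (W + "t", W + "delText")
    )
    return marked and not has_text


blank = (
    f'<w:p xmlns:w="{W_NS}"><w:pPr><w:rPr>'
    '<w:ins w:id="7" w:author="Alex Theg" w:date="2020-02-05T19:02:00Z"/>'
    "</w:rPr></w:pPr></w:p>"
)
filled = (
    f'<w:p xmlns:w="{W_NS}"><w:pPr><w:rPr>'
    '<w:ins w:id="8" w:author="Alex Theg" w:date="2020-02-05T19:04:00Z"/>'
    '</w:rPr></w:pPr><w:ins w:id="9" w:author="Alex Theg" w:date="2020-02-05T19:02:00Z">'
    "<w:r><w:t>2 of 2</w:t></w:r></w:ins></w:p>"
)
print(is_blank_inserted_paragraph(blank), is_blank_inserted_paragraph(filled))
```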
## Deletions
Deletions are denoted by a `<w:del` tag.
Inline text: simple inline text deletions look the same as inline text insertions, except that the deleted text is wrapped in a `<w:delText>` tag instead of the regular `<w:t>` tag (the surrounding `<w:r>` is still there). This means they'd need slightly different handling than the contents of a text insertion TC:
```xml
<w:p w14:paraId="507F6657" w14:textId="77777777" w:rsidR="00803599" w:rsidRDefault="00803599" w:rsidP="00803599">
<w:r>
<w:t xml:space="preserve">
The quick
</w:t>
</w:r>
<w:del w:id="0" w:author="Alex Theg" w:date="2020-02-08T17:18:00Z">
<w:r w:rsidDel="00803599">
<w:delText xml:space="preserve">
brown fox
</w:delText>
</w:r>
</w:del>
<w:r>
<w:t>jumped over the lazy dog.</w:t>
</w:r>
</w:p>
```
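A minimal sketch of that slightly different handling: a `<w:t>`-based extractor finds nothing inside a deletion, so the text has to be read from `<w:delText>`. As before, the function name is hypothetical and only `<w:del>` elements that are direct children of the paragraph are considered.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def deleted_text(paragraph_xml):
    """Return (id, text) for each <w:del> that is a direct child of a <w:p>."""
    p = ET.fromstring(paragraph_xml)
    result = []
    for d in p.findall(W + "del"):
        # Deleted runs keep <w:r> but hold their text in <w:delText>, not <w:t>
        text = "".join(t.text or "" for t in d.iter(W + "delText"))
        result.append((d.get(W + "id"), text))
    return result


para = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t xml:space="preserve">The quick </w:t></w:r>'
    '<w:del w:id="0" w:author="Alex Theg" w:date="2020-02-08T17:18:00Z">'
    '<w:r><w:delText xml:space="preserve">brown fox </w:delText></w:r></w:del>'
    "<w:r><w:t>jumped over the lazy dog.</w:t></w:r></w:p>"
)
print(deleted_text(para))
```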
Multi-paragraph deletions are handled similarly to insertions:
* Paragraphs in a multi-para deletion have a `<w:pPr><w:rPr><w:del.../>` set on the para level.
* The deletions are still tracked as regular deletions: wrapped in `<w:del...>` tags.
* For deletions, the text is inside `<w:delText>` tags, instead of `<w:t>` tags.
## Inline formatting changes
_see issue #164_
Inline formatting changes are denoted by a `<w:rPrChange` tag that includes the track change id, author, and date data.
Formatting changes happen at the `<w:r>` level. Where the format tag appears depends on whether the formatting is being applied or removed:
Added formatting follows the pattern below. Note that the applied format tag comes between `<w:rPr>` and `<w:rPrChange...>`, and the `<w:rPr/>` tag inside the `<w:rPrChange...>` is self-closing:
```xml
<w:r...>
<w:rPr>
<w:[NAME OF FORMAT TAG APPLIED]/>
<w:rPrChange w:id="[ID]" w:author="[AUTHOR]" w:date="[ISO8601 TIMESTAMP]">
<w:rPr/>
</w:rPrChange>
</w:rPr>
<w:t>Hello there.</w:t>
</w:r>
```
Here is the generic pattern for formatting removal. Note that the format tag being removed is inside the `<w:rPr>` tags inside the `<w:rPrChange>` tags:
```xml
<w:r...>
<w:rPr>
<w:rPrChange w:id="[ID]" w:author="[AUTHOR]" w:date="[ISO8601 TIMESTAMP]">
<w:rPr>
<w:[NAME OF FORMAT TAG REMOVED]/>
</w:rPr>
</w:rPrChange>
</w:rPr>
<w:t>Hello there.</w:t>
</w:r>
```
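One way to handle both patterns with a single rule is to diff the outer `<w:rPr>` (formatting after the change) against the `<w:rPr>` inside `<w:rPrChange>` (formatting before it). This minimal Python sketch assumes, as the examples suggest, that the inner `<w:rPr>` holds the run's complete pre-change formatting; the names are hypothetical, and explicit off values such as `<w:b w:val="0"/>` are not interpreted.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def _describe(rpr):
    """Describe each run property as 'localName' or 'localName=val'."""
    found = set()
    if rpr is None:
        return found
    for el in rpr:
        if el.tag == W + "rPrChange":
            continue  # the change record itself is not a format
        name = el.tag.split("}", 1)[1]
        val = el.get(W + "val")
        found.add(f"{name}={val}" if val is not None else name)
    return found


def format_change(run_xml):
    """Return (added, removed) formats for one <w:r> carrying <w:rPrChange>."""
    r = ET.fromstring(run_xml)
    rpr = r.find(W + "rPr")
    change = rpr.find(W + "rPrChange") if rpr is not None else None
    if change is None:
        return set(), set()
    after = _describe(rpr)                      # formatting after the change
    before = _describe(change.find(W + "rPr"))  # formatting before the change
    return after - before, before - after


ATTRS = 'w:author="Alex Theg" w:date="2020-02-05T19:18:00Z"'
add_bold = (
    f'<w:r xmlns:w="{W_NS}"><w:rPr><w:b/><w:rPrChange w:id="1" {ATTRS}>'
    "<w:rPr/></w:rPrChange></w:rPr><w:t>bold</w:t></w:r>"
)
remove_bold = (
    f'<w:r xmlns:w="{W_NS}"><w:rPr><w:rPrChange w:id="7" {ATTRS}>'
    "<w:rPr><w:b/></w:rPr></w:rPrChange></w:rPr><w:t>bold</w:t></w:r>"
)
add_sub = (
    f'<w:r xmlns:w="{W_NS}"><w:rPr><w:vertAlign w:val="subscript"/>'
    f'<w:rPrChange w:id="4" {ATTRS}><w:rPr/></w:rPrChange></w:rPr><w:t>sub</w:t></w:r>'
)
print(format_change(add_bold), format_change(remove_bold), format_change(add_sub))
```

For the added-bold run the result is `({'b'}, set())`, and for the removed-bold run it is `(set(), {'b'})`; any formatting present both before and after the change cancels out.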
Here are examples of inline formatting being applied and removed for:
* bold
* italics
* small caps
* subscript
* superscript
* highlighting
```xml
<!-- add bold -->
<w:r w:rsidR="00DD440B" w:rsidRPr="00DD440B">
<w:rPr>
<w:b/>
<w:rPrChange w:id="1" w:author="Alex Theg" w:date="2020-02-05T19:18:00Z">
<w:rPr/>
</w:rPrChange>
</w:rPr>
<w:t>bold</w:t>
</w:r>
<!-- remove bold -->
<w:r w:rsidRPr="008D6A23">
<w:rPr>
<w:rPrChange w:id="7" w:author="Alex Theg" w:date="2020-02-05T19:20:00Z">
<w:rPr>
<w:b/>
</w:rPr>
</w:rPrChange>
</w:rPr>
<w:t>bold</w:t>
</w:r>
<!-- add italics -->
<w:r w:rsidR="00DD440B" w:rsidRPr="00DD440B">
<w:rPr>
<w:i/>
<w:rPrChange w:id="2" w:author="Alex Theg" w:date="2020-02-05T19:18:00Z">
<w:rPr/>
</w:rPrChange>
</w:rPr>
<w:t>italics</w:t>
</w:r>
<!-- remove italics -->
<w:r w:rsidRPr="008D6A23">
<w:rPr>
<w:rPrChange w:id="8" w:author="Alex Theg" w:date="2020-02-05T19:20:00Z">
<w:rPr>
<w:i/>
</w:rPr>
</w:rPrChange>
</w:rPr>
<w:t>italics</w:t>
</w:r>
<!-- add small caps -->
<w:r w:rsidR="00DD440B" w:rsidRPr="00DD440B">
<w:rPr>
<w:smallCaps/>
<w:rPrChange w:id="3" w:author="Alex Theg" w:date="2020-02-05T19:18:00Z">
<w:rPr/>
</w:rPrChange>
</w:rPr>
<w:t>small-caps</w:t>
</w:r>
<!-- remove small caps -->
<w:r w:rsidRPr="008D6A23">
<w:rPr>
<w:rPrChange w:id="9" w:author="Alex Theg" w:date="2020-02-05T19:20:00Z">
<w:rPr>
<w:smallCaps/>
</w:rPr>
</w:rPrChange>
</w:rPr>
<w:t>small-caps</w:t>
</w:r>
<!-- add subscript -->
<w:r w:rsidR="00DD440B" w:rsidRPr="00DD440B">
<w:rPr>
<w:vertAlign w:val="subscript"/>
<w:rPrChange w:id="4" w:author="Alex Theg" w:date="2020-02-05T19:18:00Z">
<w:rPr/>
</w:rPrChange>
</w:rPr>
<w:t>subscript</w:t>
</w:r>
<!-- remove subscript -->
<w:r w:rsidRPr="008D6A23">
<w:rPr>
<w:rPrChange w:id="10" w:author="Alex Theg" w:date="2020-02-05T19:20:00Z">
<w:rPr>
<w:vertAlign w:val="subscript"/>
</w:rPr>
</w:rPrChange>
</w:rPr>
<w:t>subscript</w:t>
</w:r>
<!-- add superscript -->
<w:r w:rsidR="00DD440B" w:rsidRPr="00DD440B">
<w:rPr>
<w:vertAlign w:val="superscript"/>
<w:rPrChange w:id="5" w:author="Alex Theg" w:date="2020-02-05T19:19:00Z">
<w:rPr/>
</w:rPrChange>
</w:rPr>
<w:t>superscript</w:t>
</w:r>
<!-- remove superscript -->
<w:r w:rsidRPr="008D6A23">
<w:rPr>
<w:rPrChange w:id="11" w:author="Alex Theg" w:date="2020-02-05T19:20:00Z">
<w:rPr>
<w:vertAlign w:val="superscript"/>
</w:rPr>
</w:rPrChange>
</w:rPr>
<w:t>superscript</w:t>
</w:r>
<!-- add highlight -->
<w:r w:rsidR="00DD440B" w:rsidRPr="00DD440B">
<w:rPr>
<w:highlight w:val="yellow"/>
<w:rPrChange w:id="6" w:author="Alex Theg" w:date="2020-02-05T19:19:00Z">
<w:rPr/>
</w:rPrChange>
</w:rPr>
<w:t>highlight</w:t>
</w:r>
<!-- remove highlight -->
<w:r w:rsidRPr="008D6A23">
<w:rPr>
<w:rPrChange w:id="12" w:author="Alex Theg" w:date="2020-02-05T19:20:00Z">
<w:rPr>
<w:highlight w:val="yellow"/>
</w:rPr>
</w:rPrChange>
</w:rPr>
<w:t>highlight</w:t>
</w:r>
```
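These six cases could be mapped to readable names, with anything unrecognised ignored until the list of formats to capture is agreed. A minimal sketch; the `LABELS` table and the set of values treated as "off" are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

# Hypothetical mapping from the six run properties above to readable labels.
# Keys are a local name, or (local name, w:val) when the value matters.
LABELS = {
    "b": "bold",
    "i": "italics",
    "smallCaps": "small caps",
    ("vertAlign", "subscript"): "subscript",
    ("vertAlign", "superscript"): "superscript",
    "highlight": "highlight",
}
# Assumed w:val values that switch a property off, e.g. <w:b w:val="0"/>
OFF_VALUES = {"0", "false", "off", "none"}


def labels_for_rpr(rpr_xml):
    """Return sorted labels for recognised formats in one <w:rPr>; others are ignored."""
    rpr = ET.fromstring(rpr_xml)
    labels = set()
    for el in rpr:
        name = el.tag.split("}", 1)[1]
        val = el.get(W + "val")
        if val in OFF_VALUES:
            continue
        label = LABELS.get((name, val)) or LABELS.get(name)
        if label:
            labels.add(label)
    return sorted(labels)


print(labels_for_rpr(
    f'<w:rPr xmlns:w="{W_NS}"><w:b/><w:vertAlign w:val="superscript"/>'
    '<w:highlight w:val="yellow"/></w:rPr>'
))
```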
## Whole-paragraph formatting changes
_see issue #165_
Whole-para formatting changes are reflected at the paragraph level under `<w:pPr>`, in the same format as on the runs. Just as the para-level insertion and deletion markers can probably be _generally_ ignored (with the possible exception of blank paras), my guess is that formatting changes marked at the para level can be ignored entirely, since the same formatting change appears to also be marked on every `<w:r>` in the paragraph. Here's an example of a whole-para formatting change:
```xml
<w:p w14:paraId="2D7ADC77" w14:textId="31895971" w:rsidR="00C61225" w:rsidRPr="000432AB" w:rsidRDefault="00C61225">
<w:pPr>
<w:rPr>
<w:b/>
<w:rPrChange w:id="14" w:author="Alex Theg" w:date="2020-02-05T19:27:00Z">
<w:rPr/>
</w:rPrChange>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="000432AB">
<w:rPr>
<w:b/>
<w:rPrChange w:id="15" w:author="Alex Theg" w:date="2020-02-05T19:27:00Z">
<w:rPr/>
</w:rPrChange>
</w:rPr>
<w:t>Here's a whole-para formatting change. Does it show up where it would if it was just a word, or change the whole paragraph's properties?</w:t>
</w:r>
<w:r w:rsidR="000432AB" w:rsidRPr="000432AB">
<w:rPr>
<w:b/>
<w:rPrChange w:id="16" w:author="Alex Theg" w:date="2020-02-05T19:27:00Z">
<w:rPr/>
</w:rPrChange>
</w:rPr>
<w:t xml:space="preserve">
Bit at end selected</w:t>
</w:r>
</w:p>
```
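If para-level formatting changes are indeed redundant, extraction can select only the `<w:rPrChange>` elements under runs and skip the one under `<w:pPr>`. A minimal sketch with hypothetical function names, run against a trimmed copy of the example above:

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def run_level_change_ids(paragraph_xml):
    """IDs of formatting changes recorded on runs (<w:r>/<w:rPr>/<w:rPrChange>)."""
    p = ET.fromstring(paragraph_xml)
    return [c.get(W + "id") for c in p.findall(f"{W}r/{W}rPr/{W}rPrChange")]


def para_level_change_ids(paragraph_xml):
    """IDs of formatting changes recorded on the paragraph mark (under <w:pPr>)."""
    p = ET.fromstring(paragraph_xml)
    return [c.get(W + "id") for c in p.findall(f"{W}pPr/{W}rPr/{W}rPrChange")]


ATTRS = 'w:author="Alex Theg" w:date="2020-02-05T19:27:00Z"'
para = (
    f'<w:p xmlns:w="{W_NS}">'
    f'<w:pPr><w:rPr><w:b/><w:rPrChange w:id="14" {ATTRS}><w:rPr/></w:rPrChange>'
    "</w:rPr></w:pPr>"
    f'<w:r><w:rPr><w:b/><w:rPrChange w:id="15" {ATTRS}><w:rPr/></w:rPrChange></w:rPr>'
    "<w:t>Here's a whole-para formatting change.</w:t></w:r>"
    f'<w:r><w:rPr><w:b/><w:rPrChange w:id="16" {ATTRS}><w:rPr/></w:rPrChange></w:rPr>'
    "<w:t>Bit at end selected</w:t></w:r></w:p>"
)
print(run_level_change_ids(para), para_level_change_ids(para))
```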
## Adjacent track changes
Sometimes, adjacent TC insertions are marked as multiple TCs in the OOXML but treated as a single TC insertion in Word:
```xml
<w:p w14:paraId="2FB11FDC" w14:textId="7955A033" w:rsidR="00105FD3" w:rsidRDefault="00105FD3">
<w:pPr>
<w:rPr><w:ins w:id="2" w:author="Alex Theg" w:date="2020-02-05T19:02:00Z"/></w:rPr>
</w:pPr>
<w:ins w:id="3" w:author="Alex Theg" w:date="2020-02-05T19:02:00Z">
<w:r>
<w:t xml:space="preserve">1 of 2
</w:t>
</w:r>
</w:ins>
<w:ins w:id="4" w:author="Alex Theg" w:date="2020-02-05T19:01:00Z">
<w:r>
<w:t xml:space="preserve">I'm turning on TC again now. The return between this and the previous TC-insertion para above it is not tracked. Now I'll hit enter twice and start a new
</w:t>
</w:r>
</w:ins>
<w:ins w:id="5" w:author="Alex Theg" w:date="2020-02-05T19:02:00Z">
<w:r>
<w:t>para, making this one long multi-para insertion</w:t>
</w:r>
</w:ins>
<w:ins w:id="6" w:author="Alex Theg" w:date="2020-02-05T19:01:00Z">
<w:r>
<w:t>.</w:t>
</w:r>
</w:ins>
</w:p>
<w:p w14:paraId="2661557F" w14:textId="77777777" w:rsidR="00105FD3" w:rsidRDefault="00105FD3">
<w:pPr>
<w:rPr><w:ins w:id="7" w:author="Alex Theg" w:date="2020-02-05T19:02:00Z"/></w:rPr>
</w:pPr>
</w:p>
<w:p w14:paraId="7F19C05A" w14:textId="719C1229" w:rsidR="00105FD3" w:rsidRDefault="00105FD3">
<w:pPr>
<w:rPr><w:ins w:id="8" w:author="Alex Theg" w:date="2020-02-05T19:04:00Z"/></w:rPr>
</w:pPr>
<w:ins w:id="9" w:author="Alex Theg" w:date="2020-02-05T19:02:00Z">
<w:r>
<w:t>2 of 2 It's still the same insertion</w:t>
</w:r>
</w:ins>
<w:ins w:id="10" w:author="Alex Theg" w:date="2020-02-05T19:03:00Z">
<w:r>
<w:t>. TC left on for the next enter,</w:t>
</w:r>
</w:ins>
<w:ins w:id="11" w:author="Alex Theg" w:date="2020-02-05T19:04:00Z">
<w:r>
<w:t xml:space="preserve">
but not the one after that.</w:t>
</w:r>
</w:ins>
</w:p>
```
The same thing goes for deletions: adjacent deletions marked separately in the OOXML are treated as a single deletion in Word:
```xml
<w:p w14:paraId="47DD4588" w14:textId="20155193" w:rsidR="00803599" w:rsidRDefault="00803599" w:rsidP="004A18F2">
<w:pPr><w:outlineLvl w:val="0"/></w:pPr>
<w:r>
<w:t>1. Can you have adjacent deletions in the underlying .</w:t>
</w:r><w:proofErr w:type="spellStart"/>
<w:r>
<w:t>docx</w:t>
</w:r><w:proofErr w:type="spellEnd"/>
<w:r>
<w:t xml:space="preserve">
that are treated as one
</w:t>
</w:r><w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/>
<w:del w:id="1" w:author="Alex Theg" w:date="2020-02-08T17:27:00Z">
<w:r w:rsidDel="00BF16DE">
<w:delText>TC</w:delText>
</w:r>
<w:r w:rsidDel="00573FCF">
<w:delText>, sim</w:delText>
</w:r>
</w:del>
<w:del w:id="2" w:author="Alex Theg" w:date="2020-02-08T17:25:00Z">
<w:r w:rsidDel="00AC7454">
<w:delText xml:space="preserve">ilar
</w:delText>
</w:r>
<w:r w:rsidDel="004A18F2">
<w:delText>to insertions</w:delText>
</w:r>
</w:del>
<w:r>
<w:t>?</w:t>
</w:r>
</w:p>
```
Adjacent but separately marked text insertions or text deletions are still treated by Word as a single change, even if they were made non-sequentially. E.g. if an author makes:
1. Tracked insertion A
2. Tracked insertion B somewhere else in the document
3. Tracked insertion C adjacent to tracked insertion A
then the touching insertions A and C would be accepted or rejected together in Word.
Inline formatting changes appear to behave the same way: identical adjacent tracked formatting changes are treated in Word as one change to be accepted or rejected together.
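Reproducing Word's grouping would mean merging consecutive sibling TCs of the same type. A minimal sketch, with two loud assumptions: zero-width markers such as `<w:proofErr>` and `<w:bookmarkStart>` are treated as not breaking adjacency, and author and date are not compared. It also works within a single paragraph only, so it would not join a multi-paragraph insertion like the one above.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

TC_TAGS = {W + "ins", W + "del"}
# Assumption: zero-width markers like these do not break adjacency
TRANSPARENT = {W + "proofErr", W + "bookmarkStart", W + "bookmarkEnd"}


def adjacent_groups(paragraph_xml):
    """Group consecutive same-type <w:ins>/<w:del> siblings within one paragraph."""
    p = ET.fromstring(paragraph_xml)
    groups = []
    prev = None  # tag of the previous TC sibling, or None after ordinary content
    for child in p:
        if child.tag in TRANSPARENT:
            continue
        if child.tag not in TC_TAGS:
            prev = None
            continue
        if child.tag == prev:
            groups[-1]["ids"].append(child.get(W + "id"))
        else:
            groups.append({"type": child.tag.split("}", 1)[1], "ids": [child.get(W + "id")]})
        prev = child.tag
    return groups


para = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:ins w:id="3"><w:r><w:t>1 of 2 </w:t></w:r></w:ins>'
    '<w:ins w:id="4"><w:r><w:t>more text </w:t></w:r></w:ins>'
    '<w:proofErr w:type="spellStart"/>'
    '<w:ins w:id="5"><w:r><w:t>still inserted</w:t></w:r></w:ins>'
    "<w:r><w:t> untracked </w:t></w:r>"
    '<w:del w:id="1"><w:r><w:delText>TC, sim</w:delText></w:r></w:del>'
    '<w:del w:id="2"><w:r><w:delText>ilar</w:delText></w:r></w:del>'
    "</w:p>"
)
print(adjacent_groups(para))
```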
# Current XSweet behavior (no handling)
The initial `docx-html-extract.xsl` doesn't currently recognize any of the track changes markup from the OOXML as important or worth saving. As such, it drops the surrounding tags and change data, but passes through the content tags within, since those _are_ important. The result of this is effectively to:
* Accept all text additions, leaving them in place
* Reject all text deletions, as the text just gets passed through.
* __formatting__
# Target formats
UPDATE:
* The HTML-like format for text insertions and deletions is reflected in the subsequent comments on this ticket
* #164 reflects the HTML-like format for formatting changes
* #172 reflects a newer target format for TCs for Wax 2
ORIGINAL:
From @christos, as a starting point:
## Addition
```html
<track-change status="add" user-id="1" username="demo" color='{"addition":"#4990e2","deletion":"#c00"}'> some addition</track-change>
```
## Deletion
```html
<track-change status="delete" user-id="1" username="demo" color='{"addition":"#4990e2","deletion":"#c00"}'>test</track-change>
```
## add Format change
```html
<track-change-format status="add-formating" oldtype="[]" addedtype='[{"username":"demo","type":"strong"}]' user-id="1" username="demo"><strong>test</strong></track-change-format>
```
## remove and add format change
```html
<track-change-format status="delete-formating" oldtype='[{"username":"demo","type":"strong"}]' addedtype="[]" user-id="1" username="demo">
<track-change-format status="add-formating" oldtype="[]" addedtype='[{"username":"demo","type":"emphasis"}]' user-id="1" username="demo"><em>test</em></track-change-format>
</track-change-format>
```
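The attribute values in these examples are JSON, which is easy to garble when it sits inside double-quoted attributes. A minimal Python sketch of serialising one element with the JSON escaped; the function and parameter names are hypothetical, and `status` values such as `add-formating` are passed through exactly as they appear in the starting-point format.

```python
import html
import json


def track_change_format(inner_html, status, old_types, added_types, user_id, username):
    """Serialise one <track-change-format> element in the starting-point shape above."""

    def json_attr(types):
        # One {"username", "type"} object per format, as in the examples above
        data = [{"username": username, "type": t} for t in types]
        # json.dumps emits double quotes; escape them so the attribute stays well formed
        return html.escape(json.dumps(data, separators=(",", ":")), quote=True)

    return (
        f'<track-change-format status="{status}" '
        f'oldtype="{json_attr(old_types)}" addedtype="{json_attr(added_types)}" '
        f'user-id="{user_id}" username="{html.escape(username, quote=True)}">'
        f"{inner_html}</track-change-format>"
    )


out = track_change_format("<strong>test</strong>", "add-formating", [], ["strong"], 1, "demo")
print(out)
```

Any XML or HTML parser turns `&quot;` back into `"` when reading the attribute, so the parsed value is the original JSON string.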
# Architecture
UPDATE: #167 exists for considering different options/modes for handling TCs
ORIGINAL:
* Initial XSweet extraction XSL sheet (`docx-html-extract.xsl`) ~~will need to preserve the OOXML track changes markup.~~ should know _whether_ to preserve track changes at all. It's important that this first sheet can take a yes or no, because there's plenty of handling down-pipeline that cleans up the HTML by joining adjacent tags of the same type and attributes together, pushes formatting from inline to para level, etc., and I can see TC markup preventing a lot of that from happening. Thus, if someone doesn't need TC preserved, they should be able to opt out and get the best result from all those cleanups.
* Subsequent XSL sheets will need to know how to pass through the relevant OOXML track changes tags.
* The `final-rinse.xsl` sheet in the `html-polish` group would need to know _how_ to handle track changes, if they're passed to it. At a minimum, this step would need to know whether to:
1. Keep the OOXML track change markup as-is, to be transformed into a target format in a subsequent step
2. Accept (or reject) all changes, to preserve XSweet's ability to output valid HTML, if that is the target format
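Option 2 can be sketched per paragraph as below. This uses Python's `xml.etree.ElementTree` purely for illustration (XSweet itself is a chain of XSLT sheets), the function names are hypothetical, and it ignores paragraph-level markers and `<w:rPrChange>` formatting changes.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"
ET.register_namespace("w", W_NS)  # serialise with the familiar w: prefix


def resolve_all(paragraph_xml, mode):
    """Accept or reject every <w:ins>/<w:del> that is a direct child of one <w:p>."""
    if mode not in ("accept", "reject"):
        raise ValueError("mode must be 'accept' or 'reject'")
    p = ET.fromstring(paragraph_xml)
    keep, drop = (W + "ins", W + "del") if mode == "accept" else (W + "del", W + "ins")
    new_children = []
    for child in p:
        if child.tag == drop:
            continue  # remove the tracked span and its text
        if child.tag == keep:
            for run in child:  # unwrap: keep the runs, lose the TC wrapper
                for dt in run.iter(W + "delText"):
                    dt.tag = W + "t"  # restored deletions become ordinary text
                new_children.append(run)
        else:
            new_children.append(child)
    for child in list(p):
        p.remove(child)
    p.extend(new_children)
    return ET.tostring(p, encoding="unicode")


def visible_text(paragraph_xml):
    """Concatenate the <w:t> text of a paragraph."""
    return "".join(t.text or "" for t in ET.fromstring(paragraph_xml).iter(W + "t"))


para = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t xml:space="preserve">The quick </w:t></w:r>'
    '<w:del w:id="0" w:author="Alex Theg" w:date="2020-02-08T17:18:00Z">'
    '<w:r><w:delText xml:space="preserve">brown fox </w:delText></w:r></w:del>'
    '<w:ins w:id="1" w:author="Alex Theg" w:date="2020-02-08T17:19:00Z">'
    '<w:r><w:t xml:space="preserve">red cat </w:t></w:r></w:ins>'
    "<w:r><w:t>jumped</w:t></w:r></w:p>"
)
print(visible_text(resolve_all(para, "accept")))  # The quick red cat jumped
print(visible_text(resolve_all(para, "reject")))  # The quick brown fox jumped
```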
Temp testing docx:
[beautified_document.xml](/uploads/40657ddb31cf7ab503befa63791c549a/beautified_document.xml)
[Better_tc_test.docx](/uploads/d01599d31c432324f11fe9889e0c93de/Better_tc_test.docx)

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/163
CONTRIBUTING.md (Alex Theg, 2020-04-16T04:33:15Z)

@adam @wendell I'm curious whether you think this seems like a good plan for XSweet contributions going forward (especially the "Getting your contributions merged" section). I'm certainly open to changes. This is very similar to the guidance provided for contributing to Pubsweet and Editoria.
# CONTRIBUTING
XSweet is both an open source software project (https://gitlab.coko.foundation/XSweet) and an open community. We welcome people of all kinds to join the community and contribute with knowledge, skills, and expertise. Everyone is welcome in our chat room (https://mattermost.coko.foundation/coko/channels/xsweet).
In order to contribute to XSweet, you're expected to follow a few sensible guidelines.
## Search first, ask questions later
If you want to add a new XSLT step or change an existing step, or if you've experienced a bug or want to discuss something in the issue trackers, please search in the relevant repos to find out whether an issue already exists before you start developing. Issue lists for the main XSweet repos include:
* https://gitlab.coko.foundation/XSweet/XSweet/issues
* https://gitlab.coko.foundation/XSweet/HTMLevator/issues
* https://gitlab.coko.foundation/XSweet/editoria_typescript/issues
When in doubt about which repo your issue should be posted to, post it to XSweet.
## Discuss your contribution before you build
Please let us know about the contribution you plan to make before you start it. Either comment on a relevant existing issue, or open a new issue if you can't find an existing one. This helps us avoid duplicating effort and increases the likelihood that your contributions will be accepted. You can also ask in the chat room (https://mattermost.coko.foundation/coko/channels/xsweet) if you are unsure.
For contributions in the form of discussions and suggestions, feel free to open an RFC (request for comments) as an issue at any time, so XSweet community members and maintainers can join the discussion.
## Branches
We maintain master as the production branch of the three main XSweet repos.
If you wish to contribute to XSweet, create a branch and then open a merge request following this procedure:
1. Create a user account on Coko GitLab: http://gitlab.coko.foundation
2. Clone the desired repo's master branch with `git clone <XSweet repo URL>`
3. Create a new branch and work off that. Please name the branch descriptively so it identifies the feature you are working on. You can push the branch to GitLab at any time.
## Getting your contributions merged
All merge requests should fulfill these two simple rules:
1. Consensus should have been established in its corresponding GitLab issue
2. Merge requests shouldn't break existing functionality
Once you're ready to submit your contributions:
* Generate a Merge Request (analogous to a GitHub Pull Request) from the GitLab UI on your branch, but do not assign this request to anyone.
As you submit your merge request, ask for feedback from both @wendell and @atheg:
* @wendell will review the code and provide any comments, suggestions, etc. about implementation
* @atheg will QC the code and provide feedback on a functional level, e.g. does it work?
* We also encourage feedback and discussion from as many people as possible on merge requests!
* After discussion at this stage and/or altering your branch based on the discussion, you'll receive approval from both @wendell and @atheg. At this point, assign your merge request to @atheg to be merged. You do this from the GitLab UI on your branch.
## Bug reports, feature requests, and support questions
Bugs should be reported as issues on the appropriate repo (when in doubt, add your issue to the XSweet/XSweet repo). Questions can be posted as issues, or posed in the XSweet Mattermost channel at https://mattermost.coko.foundation/coko/channels/xsweet

https://gitlab.coko.foundation/XSweet/XSweet_runner_scripts/-/issues/3
Next steps for track changes with Amnet (Alex Theg, 2020-03-20T01:28:50Z)

Next steps for moving forward with preserving track changes:
1. @Bharathydasan to update runner scripts so they work in a Windows 10 environment, to get a comfortable dev environment set up. These 2 scripts in particular need updating:
* https://gitlab.coko.foundation/XSweet/XSweet_runner_scripts/blob/master/xsweet_runner.rb
* https://gitlab.coko.foundation/XSweet/XSweet_runner_scripts/blob/master/execute_chain.sh
2. Once that's done, Amnet will consider a strategy for passing through track changes. Any thoughts and discussion can take place on this issue: (https://gitlab.coko.foundation/XSweet/XSweet/issues/162)
3. Before any track changes development happens, Amnet, @wendell, @atheg, and @adam will meet next week to discuss the approach and talk through the overall strategy and architecture for capturing track changes.

https://gitlab.coko.foundation/XSweet/XSweet_runner_scripts/-/issues/2
"xsweet_downloader.rb" doesn't work in Windows 10 OS (Bharathydasan, 2020-03-01T14:27:26Z)

xsweet_downloader.rb throws a "No such file or directory" error on Windows 10, but works well on macOS.
***Error:***
Traceback (most recent call last):
1: from xsweet_downloader.rb:30:in ``<main>'`
xsweet_downloader.rb:30:in ``': No such file or directory - xdg-open E:/XSweet/XSweet_runner_scripts/xsweet.zip (Errno::ENOENT)
![xsweet_runner_script](/uploads/689793bf48f97d006ee2b793c0569e3e/xsweet_runner_script.gif)

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/138
Spaces between superscripts dropped in join step (Alex Theg, 2019-07-09T19:02:41Z)

This may or may not be related to https://gitlab.coko.foundation/XSweet/XSweet/issues/44
When superscripts are separated by regular spaces, the join step collapses them into one superscript and drops the separating spaces entirely. Example:
Join input:
```html
<p>...impact-related injury.<sup>6</sup> <sup>7</sup> <sup>8</sup> Then, there...</p>
```
Join output:
```html
<p>...linear impact-related injury.<sup>678</sup> Then, there...</p>
```
Instead, the non-superscript spaces should stay where they are.

1.0.0

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/139
Enable upload of larger docx files (Alex Theg, 2019-07-07T23:20:08Z)

This is twofold:
1. It looks like there's a maximum docx size limit of 1MB. We should determine where this is coming from (INK? Editoria?) and increase it a bit.
2. A very long docx, even one that isn't particularly large in file size, can cause a timeout. Example: Gabbard Pt 1.

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/133
Numbering from Horton bib (Alex Theg, 2019-07-07T23:09:30Z)

Sources nested under "Castañeda, Heide" in Horton bib are really an automatically numbered list starting at 2007.
Follow numbering references through to `numbering.xml`; generate plan for extracting.
Only the first bullet gets properly extracted; revisit this once this bug has been fixed (ticket number #106).

https://gitlab.coko.foundation/XSweet/XSweet/-/issues/115
Heading 1 and 2 styles coming through as the same style (Chris Jennings, 2019-07-07T22:54:34Z)

I have a Word document with styles thus: Heading 1, Heading 2, Heading 3
All come into Editoria as Heading 1
![Screenshot_2017-08-16_12.59.20](/uploads/effb89e725cbd140b01c647a0f3bc71a/Screenshot_2017-08-16_12.59.20.png)
![Screen_Shot_2017-08-16_at_13.00.03](/uploads/769ebd996140ec2609478b8381eadfcd/Screen_Shot_2017-08-16_at_13.00.03.png)

1.0.0