@@ -10,22 +10,22 @@ Bash scripts are an expediency. Soon we should be able to run these from INK.
|
|
|
|
|
### Steps
|
|
|
|
|
|
## [docx-html-extract.xsl](docx-html-extract.xsl)
|
|
|
## [docx-html-extract.xsl](./docx-html-extract.xsl)
|
|
|
|
|
|
Extracts data from the Word document as literally as we can manage, producing output that is nominally HTML5 (syntactically and idiomatically) while capturing all *relevant* information from the Word source (treated as a data object representing a formatted artifact, in print or in a UI).
|
|
|
|
|
|
## [handle-notes.xsl](handle-notes.xsl)
|
|
|
## [handle-notes.xsl](./handle-notes.xsl)
|
|
|
|
|
|
Resolves `endnote` constructs from the Word document and re-renders them in a normalized form.
|
|
|
|
|
|
## [scrub.xsl](scrub.xsl)
|
|
|
## [scrub.xsl](./scrub.xsl)
|
|
|
|
|
|
Performs cleanup operations: regularizes the CSS in `@style` attributes (one of the ways information is captured from the source) and removes other noise, e.g. spurious or redundant element types carried over from the Word document, and paragraphs or formatting wrappers with no contents.
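
As an illustration of the "no contents" cleanup only, here is a minimal XSLT 2.0 sketch. It is not the contents of `scrub.xsl`. It assumes the input is in the XHTML namespace, and the list of element names treated as wrappers (`p`, `span`, `b`, `i`, `u`) is a hypothetical choice made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: these five element names are the wrappers worth scrubbing.
       An element matches only if it has no child nodes at all, so
       whitespace-only elements such as <span> </span> are left alone. -->
  <xsl:template match="*[self::p or self::span or self::b or self::i or self::u]
                        [not(node())]"/>

</xsl:stylesheet>
```

A single pass removes only elements that are already empty in the input: a `p` whose only child is an empty `span` survives the first pass and is removed if the stylesheet is applied again.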
|
|
|
|
|
|
## [join-elements.xsl](join-elements.xsl)
|
|
|
## [join-elements.xsl](./join-elements.xsl)
|
|
|
|
|
|
Collapses runs of contiguous elements that have the same effect into a single element. For example, `<u>Moby </u><u>Dick</u>` is rewritten as `<u>Moby Dick</u>`. (Word files that have been edited heavily are especially prone to this.)
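
The collapsing described here can be sketched with `xsl:for-each-group` and `group-adjacent` in XSLT 2.0. This is an illustrative sketch under stated assumptions, not the logic of `join-elements.xsl`: it assumes XHTML-namespaced input, regroups only the direct children of `p`, and merges only attribute-free `u` elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: only the children of p are regrouped. -->
  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Adjacent attribute-free u elements share the key 'u';
           every other node gets a unique key and stays on its own. -->
      <xsl:for-each-group select="node()"
          group-adjacent="if (self::u and not(@*)) then 'u'
                          else concat('other-', generate-id())">
        <xsl:choose>
          <xsl:when test="self::u and not(@*)">
            <!-- One u wrapper around the contents of the whole run. -->
            <xsl:copy>
              <xsl:apply-templates select="current-group()/node()"/>
            </xsl:copy>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Any node between two `u` elements, including a whitespace-only text node, ends the run, so `<u>Moby</u> <u>Dick</u>` is left unchanged by this sketch. Extending it to other inline elements would mean deriving the grouping key from the element name and its attributes.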
|
|
|
|
|
|
## [zorba-map.xsl](zorba-map.xsl)
|
|
|
## [zorba-map.xsl](./zorba-map.xsl)
|
|
|
|
|
|
Handles mappings of element patterns specific to the "Zorba" sample inputs, such as mapping patterns of font/bold/italic formatting to headers.
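
To show what such a mapping might look like, here is a hypothetical XSLT 2.0 sketch. The pattern is invented for illustration and is not taken from `zorba-map.xsl`: it treats a paragraph whose sole content is a single `b` element as a header and maps it to `h2`. It assumes XHTML-namespaced input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical rule: a p with exactly one child node, and that
       node is a b element, becomes an h2. The choice of h2 is an
       assumption made for this example. -->
  <xsl:template match="p[count(node()) = 1][b]">
    <h2 xmlns="http://www.w3.org/1999/xhtml">
      <!-- Keep the paragraph's attributes, drop the b wrapper. -->
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="b/node()"/>
    </h2>
  </xsl:template>

</xsl:stylesheet>
```

Rules of this kind depend on how a particular set of documents was formatted, which is presumably why they sit in a sample-specific step and not in the general cleanup.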