Last edited by Wendell Piez Jan 20, 2017

docx editoria step manifest

docx -> Editoria step manifest

Production of HTML from a .docx file input is accomplished by executing a sequence of XSLT 2.0 transformations (tested under Saxon HE). We do not specify how these transformations are accomplished, maintained or managed internally, or whether/where their intermediate results are exposed or made accessible. (At least in some scenarios it is expected that intermediate results may be of use or interest to consumers.)

A particular XSLT stylesheet (aka 'transformation specification' or 'transform') will be found in one of two GitLab repositories (so far): XSweet or Editoria Typescript (this repository). Within the pipeline, these stylesheets are grouped functionally, but the grouping is internal and not exposed.

For the most part, the output of each XSLT in the sequence is designated as primary input to the next transformation. There may be occasional complications, for example a pipeline that generates an XSLT dynamically and then applies it.

The primary input of the first transformation in the sequence is assumed to be a document.xml document extracted from an MS Word .docx file (which is a zip archive). Its neighbor files, including endnotes.xml, footnotes.xml and styles.xml, must be present for their contents or settings to be included.
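
As an illustration of the extraction described above, here is a minimal Python sketch that unzips document.xml and whichever neighbor files are present. The part names under word/ are the standard locations in a .docx package; the function name extract_word_parts and the output directory layout are assumptions made for this example and are not part of XSweet.

```python
import zipfile
from pathlib import Path

# Standard part names inside a .docx (Office Open XML) package.
PRIMARY = "word/document.xml"
NEIGHBORS = ["word/footnotes.xml", "word/endnotes.xml", "word/styles.xml"]


def extract_word_parts(docx_path, out_dir):
    """Unzip document.xml plus any neighbor parts that exist.

    Returns the list of part names written. Hypothetical helper, for
    illustration only.
    """
    out = Path(out_dir)
    written = []
    with zipfile.ZipFile(docx_path) as archive:
        names = set(archive.namelist())
        if PRIMARY not in names:
            raise ValueError(f"{docx_path} has no {PRIMARY}")
        for part in [PRIMARY] + NEIGHBORS:
            # Neighbor files are optional: a document without notes
            # may simply lack footnotes.xml or endnotes.xml.
            if part in names:
                archive.extract(part, str(out))
                written.append(part)
    return written
```

Extracting the neighbors into the same directory as document.xml keeps them where a stylesheet resolving relative paths would expect them, which is presumably why they "must be present".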

Formally, each intermediate file is expected to be (or can be regarded or expressed as) an XML file sans XML declaration, using HTML5 element names and semantics (as optimized for Editoria intake), but no DOCTYPE declaration. (Thus: the <html> tag is on the first line, with no prologue.) Such a file is "system agnostic" and can be read as either XML or HTML. (An exception to this rule is the XSLT that is generated dynamically for header promotion, which will of course not be HTML of any sort.)
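
The constraints on intermediate files can be checked mechanically. The following Python sketch is one possible reading of them (no XML declaration, no DOCTYPE, <html> first, well-formed as XML); the function name check_intermediate is hypothetical, and the check says nothing about whether the element names are valid HTML5.

```python
import xml.etree.ElementTree as ET


def check_intermediate(text):
    """Return a list of problems; an empty list means the text fits."""
    problems = []
    head = text.lstrip()
    if head.startswith("<?xml"):
        problems.append("has an XML declaration")
    if head.upper().startswith("<!DOCTYPE"):
        problems.append("has a DOCTYPE declaration")
    if not head.startswith("<html"):
        problems.append("does not begin with the <html> tag")
    try:
        # Well-formedness only; this does not validate against HTML5.
        ET.fromstring(text)
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
    return problems
```

For example, check_intermediate("<html><body><p>Hi</p></body></html>") returns an empty list, while a file with an unclosed <p> is reported as not well-formed.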

XSLT Repositories

Find the repositories here:

  • https://gitlab.coko.foundation/wendell/XSweet/tree/ink-api-publish, or a copy
  • https://gitlab.coko.foundation/wendell/editoria_typescript/tree/ink-api-publish (this repository)

When retrieving the files from GitLab, expand the variables in the file paths listed below as follows:

  • Expand $XSweet to https://gitlab.coko.foundation/wendell/XSweet/raw/ink-api-publish/applications (note the extra applications subdirectory)
  • Expand $editoria-typescript to https://gitlab.coko.foundation/wendell/editoria_typescript/raw/ink-api-publish
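
The expansion rule can be sketched in a few lines of Python. The two base URLs are the ones given above; the names BASES and expand are made up for this example.

```python
# Base URLs as given in this manifest.
BASES = {
    "$XSweet": "https://gitlab.coko.foundation/wendell/XSweet/raw/ink-api-publish/applications",
    "$editoria-typescript": "https://gitlab.coko.foundation/wendell/editoria_typescript/raw/ink-api-publish",
}


def expand(path):
    """Replace a leading $XSweet or $editoria-typescript with its base URL."""
    prefix, sep, rest = path.partition("/")
    if prefix not in BASES or not sep:
        raise ValueError(f"unknown or missing path variable in {path!r}")
    return BASES[prefix] + "/" + rest


print(expand("$XSweet/docx-extract/scrub.xsl"))
```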

Process sequence

  1. docx extraction
    • $XSweet/docx-extract/docx-html-extract.xsl
    • $XSweet/docx-extract/handle-notes.xsl
    • $XSweet/docx-extract/scrub.xsl
    • $XSweet/docx-extract/join-elements.xsl
    • $XSweet/docx-extract/collapse-paragraphs.xsl
  2. Header promotion (note process branch)
    • $XSweet/header-promote/digest-paragraphs.xsl
    • $XSweet/header-promote/make-header-escalator-xslt.xsl
    • Apply resulting XSLT back to collapse-paragraphs.xsl output (result) to produce HTML input to next step
  3. Finalize XSweet / HTML Typescript
    • $XSweet/html-polish/final-rinse.xsl
  4. Prep for Editoria (Editoria Typescript)
    • $editoria-typescript/editoria-notes.xsl
    • $editoria-typescript/editoria-basic.xsl
    • $editoria-typescript/editoria-reduce.xsl
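
To make the chaining and the header-promotion branch concrete, the Python sketch below builds (but does not run) one Saxon command line per step. It rests on several assumptions that this page does not confirm: that the stylesheets sit in local checkouts (the directory names are placeholders), that Saxon HE is invoked as `java -jar` with the `-s:`, `-xsl:` and `-o:` options, that digest-paragraphs.xsl output feeds make-header-escalator-xslt.xsl, and that intermediate files are written to a work directory with the names shown. How the transformations are actually managed is deliberately left unspecified above.

```python
from pathlib import PurePosixPath

EXTRACT = [
    "docx-extract/docx-html-extract.xsl",
    "docx-extract/handle-notes.xsl",
    "docx-extract/scrub.xsl",
    "docx-extract/join-elements.xsl",
    "docx-extract/collapse-paragraphs.xsl",
]
DIGEST = "header-promote/digest-paragraphs.xsl"
MAKE_ESCALATOR = "header-promote/make-header-escalator-xslt.xsl"
FINAL_RINSE = "html-polish/final-rinse.xsl"
EDITORIA = ["editoria-notes.xsl", "editoria-basic.xsl", "editoria-reduce.xsl"]


def plan(document_xml, xsweet="XSweet/applications",
         editoria="editoria_typescript", workdir="work",
         saxon_jar="saxon-he.jar"):
    """Return (commands, final_output) for the whole sequence.

    All default paths are hypothetical placeholders.
    """
    commands = []

    def step(source, stylesheet, ext="html"):
        # Output names are numbered so intermediate results stay inspectable.
        stem = PurePosixPath(stylesheet).stem
        out = f"{workdir}/{len(commands) + 1:02d}-{stem}.{ext}"
        commands.append(["java", "-jar", saxon_jar,
                         f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{out}"])
        return out

    current = document_xml
    for sheet in EXTRACT:
        current = step(current, f"{xsweet}/{sheet}")
    collapsed = current  # kept: the branch below returns to this file

    # Branch: digest the paragraphs, generate an XSLT from the digest,
    # then apply that generated XSLT to the collapse-paragraphs output.
    digest = step(collapsed, f"{xsweet}/{DIGEST}", ext="xml")
    escalator = step(digest, f"{xsweet}/{MAKE_ESCALATOR}", ext="xsl")
    current = step(collapsed, escalator)

    current = step(current, f"{xsweet}/{FINAL_RINSE}")
    for sheet in EDITORIA:
        current = step(current, f"{editoria}/{sheet}")
    return commands, current
```

Each returned command could be passed to subprocess.run in order. Keeping the plan separate from execution makes the one non-linear step easy to see: the eighth command takes its source from the fifth and its stylesheet from the seventh.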

XSweet also includes other modules and functionalities not presently being used for the Editoria load. These include html-tweak, css-abstract, and post-processes that produce plain text outputs or analytic profiles of inputs.
