# XSweet

XSweet is a set of tools supporting data acquisition, editorial, and document production workflows on an XML stack with XML/HTML/CSS interfaces. We like the XML stack (XSLT in particular) for these purposes because it is well suited to encapsulating discrete document transformation processes, which yields performant, scalable, reusable, and robust solutions in a 'pluggable' way. XSweet should "just work", but it should also be adaptable.
## Aims:

- Conversion of MS Word "Office Open" XML (aka WordML) into "HTML typescript"
- Arbitrary HTML / CSS mapping and munging (HTML tweak)
- Validation services against ad hoc (project-based) schemas and constraint sets
- Conversion from an editorial system (Enhanced typescript) into structured targets (e.g. TEI, JATS/BITS)
## Design constraints:

- All open source (specifications and components)
- W3C XSLT 2.0 is okay
- INK will provide a pipelining infrastructure, but pipelines need to be operational outside INK as well.
## XSweet components (so far)

Will each of these be an INK recipe?
### [docx-extract](docx-extract-recipe)

Input: an unbundled `.docx` file, specifically its `document.xml`

Output: an HTML file (probably valid)
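
As a rough illustration of the input/output contract above, here is a minimal Python sketch (XSweet itself is XSLT; this is not its implementation). It assumes the input is the text of a `document.xml` and maps each body-level `w:p` to a flat HTML `p`, ignoring styles, tables, and everything else. The function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML main namespace, as used in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def document_xml_to_html(document_xml: str) -> str:
    """Map each body-level w:p in a document.xml string to a flat HTML <p>."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    # Only direct children of w:body; paragraphs inside tables are skipped.
    for p in root.iterfind(".//w:body/w:p", NS):
        # Concatenate the text runs (w:t) of the paragraph in document order.
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        paragraphs.append(f"<p>{escape(text)}</p>")
    body = "\n".join(paragraphs)
    return f"<html><body>\n{body}\n</body></html>"


# Hypothetical sample input: two paragraphs, the second split over two runs.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Chapter One</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Tom &amp; </w:t></w:r><w:r><w:t>Jerry</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

if __name__ == "__main__":
    print(document_xml_to_html(SAMPLE))
```

The sketch deliberately produces "flat" output: no headings or sections are inferred at this stage, which leaves that work to later steps.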
### [html-validate](html-validate-recipe)

This should include HTML5 validation plus any ad hoc (HTML typescript) validation.
### [html-tweak](html-tweak-recipe)

Makes any adjustments to HTML @class/@style via an externally exposed driver.
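
One way to picture a driver-controlled tweak is sketched below in Python, purely as an illustration. The driver format (a dict with `rename_class` and `drop_style_properties` keys) is an assumption made up for this example, not XSweet's actual driver, and the input is assumed to be well-formed (XML-parseable) HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical driver: kept outside the transformation logic so a project
# can change the mapping without touching the code.
DRIVER = {
    "rename_class": {"Heading1": "title", "BodyText": "body"},
    "drop_style_properties": {"font-family", "font-size"},
}


def tweak(html: str, driver: dict) -> str:
    """Apply the driver's @class and @style adjustments to well-formed HTML."""
    root = ET.fromstring(html)
    for el in root.iter():
        if "class" in el.attrib:
            names = el.attrib["class"].split()
            renamed = [driver["rename_class"].get(n, n) for n in names]
            el.set("class", " ".join(renamed))
        if "style" in el.attrib:
            kept = []
            for decl in el.attrib["style"].split(";"):
                prop = decl.split(":", 1)[0].strip().lower()
                if prop and prop not in driver["drop_style_properties"]:
                    kept.append(decl.strip())
            if kept:
                el.set("style", "; ".join(kept))
            else:
                # Nothing left to declare: remove the attribute entirely.
                del el.attrib["style"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input.
SAMPLE = (
    '<div><p class="Heading1" style="font-size: 18pt; font-weight: bold">Intro</p>'
    '<p class="BodyText" style="font-family: Arial">Text</p></div>'
)

if __name__ == "__main__":
    print(tweak(SAMPLE, DRIVER))
```

Keeping the mapping in data rather than in code is the point of the design: the same step can serve different projects with different drivers.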
### [header-promote](header-promote-recipe)

Converts p elements into h1-h2 headings based on heuristics.
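
The heuristics are not spelled out here, so the following Python sketch invents two for illustration only: a short, wholly bold paragraph with no closing punctuation is treated as a heading, and an all-caps one ranks as `h1` rather than `h2`. The cutoff value and function names are hypothetical, and the input is assumed to be well-formed HTML.

```python
import xml.etree.ElementTree as ET

MAX_HEADING_CHARS = 80  # assumed cutoff; headings are usually short


def heading_level(p):
    """Return 'h1', 'h2', or None for a <p>, using illustrative heuristics."""
    text = "".join(p.itertext()).strip()
    if not text or len(text) > MAX_HEADING_CHARS:
        return None
    if text.endswith((".", ",", ";", ":")):
        return None
    children = list(p)
    # Heuristic: the paragraph is wholly bold (one <b> child, no loose text).
    wholly_bold = (
        len(children) == 1
        and children[0].tag == "b"
        and not (p.text or "").strip()
        and not (children[0].tail or "").strip()
    )
    if not wholly_bold:
        return None
    # Heuristic: all-caps bold lines rank above mixed-case ones.
    return "h1" if text.isupper() else "h2"


def promote_headers(html: str) -> str:
    """Rename qualifying <p> elements to <h1>/<h2>, leaving content as is."""
    root = ET.fromstring(html)
    for p in root.iter("p"):
        level = heading_level(p)
        if level:
            p.tag = level
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input.
SAMPLE = (
    "<body><p><b>PART ONE</b></p><p><b>A short title</b></p>"
    "<p>Ordinary <b>text</b> here.</p></body>"
)

if __name__ == "__main__":
    print(promote_headers(SAMPLE))
```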
### [ucp-adjust](ucp-adjust-recipe)

Adjustments warranted by or for UCP and its projects, e.g. @class='Hypertext' -> hypertext links.
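
The `Hypertext` example could look roughly like this Python sketch. It assumes, without confirmation from the project, that the text of the marked element is itself the link target; the function name is hypothetical and the input is assumed to be well-formed HTML.

```python
import xml.etree.ElementTree as ET


def link_hypertext(html: str) -> str:
    """Turn elements with class 'Hypertext' into <a> links.

    Assumption: the element's text content is itself the link target.
    """
    root = ET.fromstring(html)
    for el in root.iter():
        if "Hypertext" in el.get("class", "").split():
            target = "".join(el.itertext()).strip()
            if target:
                el.tag = "a"
                el.set("href", target)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input.
SAMPLE = '<p>See <span class="Hypertext">http://example.org/</span> for details.</p>'

if __name__ == "__main__":
    print(link_hypertext(SAMPLE))
```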
### [html-funnel](html-funnel-recipe)

Populates a structured HTML section skeleton with contents from flat HTML typescript (i.e. converts from unstructured to structured form, following an externally specified structure).
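
To make the flat-to-structured idea concrete, here is a small Python sketch under invented assumptions: the skeleton is a set of `section` elements whose `title` attribute matches the text of an `h2` in the flat input, and each section receives that heading plus everything up to the next `h2`. Neither the skeleton format nor the matching rule comes from XSweet.

```python
import xml.etree.ElementTree as ET


def funnel(flat_html: str, skeleton_html: str) -> str:
    """Fill each skeleton <section> with the flat content under its heading.

    Assumption: a section's 'title' attribute matches the text of an <h2>
    in the flat input; its content runs until the next <h2>.
    """
    flat = ET.fromstring(flat_html)
    skeleton = ET.fromstring(skeleton_html)
    sections = {s.get("title"): s for s in skeleton.iter("section")}
    current = None
    for el in list(flat):
        if el.tag == "h2":
            # Switch target section; None if the skeleton has no match.
            current = sections.get("".join(el.itertext()).strip())
        if current is not None:
            current.append(el)
    return ET.tostring(skeleton, encoding="unicode")


# Hypothetical sample inputs.
FLAT = (
    "<body><h2>Intro</h2><p>One.</p>"
    "<h2>Methods</h2><p>Two.</p><p>Three.</p></body>"
)
SKELETON = '<body><section title="Intro" /><section title="Methods" /></body>'

if __name__ == "__main__":
    print(funnel(FLAT, SKELETON))
```

In this sketch, content that precedes the first matching heading, or sits under a heading the skeleton does not name, is silently dropped; a real step would need a policy for such leftovers.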
### [html-polish](html-polish-recipe)

Cleanup of arbitrary HTML: removes redundancies and normalizes anomalies.

---
(old contents: move some to docx-extract page)

A suite of XSLT 2.0 stylesheets for `.docx` data extraction and refinement into HTML, for editorial workflows (and perhaps eventually for other tasks, and possibly not always in XSLT 2.0).
## Scope and goals