... | ... | @@ -10,17 +10,17 @@ We aim to develop and share an open source toolkit on a commodity platform (XSLT |
|
|
|
|
|
"Good enough" means that the tools are serviceable (or better) in actual document conversion workflows, while producing results at least as good (for these purposes) as available alternatives and pathways.
|
|
|
|
|
|
An important consideration for these purposes are that these stylesheets need to work on arbitrary Word inputs, not just Word documents written to templates or other constraint sets. Another is that the results do not have to be good enough to publish, just good enough to be worth editing further.
|
|
|
An important consideration for these purposes is that these stylesheets need to work (or may need to work) on arbitrary Word inputs, not just Word documents written to templates or (implicitly or explicitly) other constraint sets. Another is that the results do not have to be good enough to publish, just good enough to be worth editing further.
|
|
|
|
|
|
## A pipeline, not a stylesheet
|
|
|
|
|
|
For maximum flexibility and maintainability, we deploy an XSLT-based solution not as a single transformation but as a series of transformations to be arranged in a sequence (pipeline). Considered as a black box, such a sequence (in which each XSLT reads input from the result of the previous XSLT) is the same as a transformation. Since this is exactly analogous to INK's processing model (an INK 'recipe' is a pipeline) we can deploy this straightforwardly on INK, while also remaining platform independent with respect to pipelining technology. (I.e., the same XSLTs in the same sequence will work the same in another environment; this is commodity/standard XSLT 2.0.)
|
|
|
For maximum flexibility and maintainability, we deploy an XSLT-based solution not as a single transformation but as a series of transformations to be arranged in a sequence (pipeline). Since this is exactly analogous to INK's processing model (an INK 'recipe' is a pipeline), for ongoing projects we can deploy this straightforwardly on INK, while also remaining platform independent with respect to pipelining technology. (I.e., the same XSLTs in the same sequence will work the same in another environment; this is commodity/standard XSLT 2.0.)
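As a rough illustration of the pipeline idea (not the project's actual XSLT, and not INK's API), the sketch below uses plain Python functions as stand-ins for individual XSLT steps. The stage names `drop_empty_paragraphs` and `promote_headings`, the `style="Heading1"` attribute, and the sample input are all hypothetical. Each stage reads the tree produced by the previous one, and the composed pipeline behaves like a single transformation when viewed from outside.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable, Sequence

# A stage takes a tree and returns a tree, like one XSLT in the sequence.
Stage = Callable[[ET.Element], ET.Element]


def drop_empty_paragraphs(root: ET.Element) -> ET.Element:
    """Hypothetical stage: remove <p> elements with no text and no children."""
    # Snapshot the element list so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = len(child) == 0 and not (child.text or "").strip()
            if child.tag == "p" and is_empty:
                parent.remove(child)
    return root


def promote_headings(root: ET.Element) -> ET.Element:
    """Hypothetical stage: turn <p style="Heading1"> into <h1>."""
    for p in list(root.iter("p")):
        if p.get("style") == "Heading1":
            p.tag = "h1"
            del p.attrib["style"]
    return root


def pipeline(stages: Sequence[Stage]) -> Stage:
    """Compose stages in order; the result is itself a single Stage."""
    def run(root: ET.Element) -> ET.Element:
        return reduce(lambda tree, stage: stage(tree), stages, root)
    return run


# The same stages in the same order give the same result wherever they run.
convert = pipeline([drop_empty_paragraphs, promote_headings])
doc = ET.fromstring('<body><p style="Heading1">Title</p><p/><p>Text</p></body>')
result = ET.tostring(convert(doc), encoding="unicode")
```

Because `pipeline` returns something with the same shape as a single stage, stages can be reordered, dropped, or reused in other sequences without changing how the caller invokes them. In this sketch the stages hand over in-memory trees, so no intermediate result is serialized and re-parsed.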
|
|
|
|
|
|
Among other advantages this gives us is transparency. Since each XSLT does less, holes and bugs are easier to find and fill than in a single relatively opaque XSLT (which may run pipelines internally). A suite of smaller XSLTs should be easier to understand, maintain and modify than a single monolithic stylesheet.
|
|
|
One advantage this gives us is transparency. Since each XSLT does less, holes and bugs are easier to find and fill than in a single relatively opaque XSLT (which may run pipelines internally). A suite of smaller XSLTs should be easier to understand, maintain and modify than a single monolithic stylesheet.
|
|
|
|
|
|
Another advantage is flexibility. We may be able to deploy suites of modules to be used together and separately in "mix-and-match" combinations.
|
|
|
|
|
|
If we run into performance issues due to overhead in this architecture (e.g. for parsing/serialization of temporary results) we can consider alternatives or improvements.
|
|
|
If we run into performance issues due to overhead in this architecture (e.g. for parsing/serialization of temporary results) we can consider strategies for mitigation.
|
|
|
|
|
|
Experience has shown that converting Word .docx to structured markup is a hard problem. We believe one reason it has been difficult is that assumptions have been made about requirements that do not actually apply in many or most cases -- and in particular, that do not apply when a significant editing phase is planned for _after_ conversion.
|
|
|
|
... | ... | |