Having a 'black box' solution to get data from MS Word into publishing workflows, even one that works well, is a necessary but not a sufficient condition for what will come next. It is difficult to make progress in developing these workflows without such a tool. Yet even when we have one that works tolerably well, it will not be enough: we can always expect to have to augment or tweak it, at any rate at the edges.
Works authored in Word are too variable in form and purpose for any single tool to handle them all; at the same time, for some works, the creative process does not always (or even usually) end when the book goes to the designer -- or even, today, to the printer, or whatever the contemporary equivalents are. One Size Fits All (or even most) is a worthy goal, but if such a solution were possible, one imagines it would exist by now -- such is the demand for it -- much as HTML Tidy, cURL and other open source utilities exist for various common or ubiquitous tasks. Despite a couple of seeming near-misses and plenty of workaround pathways, we don't have a tool that can reliably and simply deliver clean markup out of Word -- or at least, the clean markup we need, out of the Word data we have. More puzzling still, this is not because the work hasn't been done. But when we look at the "solutions," it turns out that the very terms of their success also reveal the problem.
The available alternatives, both proprietary and open source, all constrain themselves to handling only a subset of Word documents (however defined), while still requiring some level of customization even within this subset, whether for further subsets or even for individual documents. The required customizations can take the form of tool development or tuning (configuration, extension or modification of the tool), of handwork on the documents themselves, or of both. This typically requires a level of expert engagement that often makes the work prohibitively difficult, at least in a number of economic settings, especially "small shops," where expertise is either cheap or non-existent. (The kind of expertise we are talking about tends to be the non-existent kind.)
Not being able to provide a 100% solution, however, does not necessarily make an 80% solution (if it really were that) less useful or less valuable. On the contrary, getting most of the way there might be a big advance, at any rate if it presented us with options on how best to take a "data conversion" further, perhaps even, ideally, into territory where the only expertise we need is the cheap kind, or at least a kind we can offer at a reasonable rate. So if there has to be editing, normalization and enhancement, that might be regarded as a feature, and a situation to be encouraged, not a bug.
## Towards a solution