... | @@ -28,7 +28,7 @@ However an important consideration is that none of these are in scope for this t |
|
|
|
|
|
* In general, any "semantic" interpretation of ad-hoc (local) names in the Word document. For example, a segment marked with the style "Italic" will be so marked in the HTML (as a `<span class="Italic">`), but it will not be marked as HTML `i` or represented as italic in any other way.
|
|
|
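The pass-through of a local style name can be sketched as follows. This is only an illustration in Python, not the XSLT itself; the function name `run_to_span` is hypothetical, and we assume the style name contains no spaces (a name like "Heading 1" would need some normalization rule that is not decided here).

```python
from xml.etree import ElementTree as ET


def run_to_span(style_name: str, text: str) -> str:
    """Wrap text in a span whose class is the Word style name, unchanged.

    Hypothetical helper: the style name is carried across as data only.
    No HTML <i>, no CSS, no other reading of what "Italic" might mean.
    """
    span = ET.Element("span", {"class": style_name})
    span.text = text
    return ET.tostring(span, encoding="unicode")


if __name__ == "__main__":
    print(run_to_span("Italic", "some text"))
```

Whether a class named "Italic" should ever display as italic is then a question for downstream consumers, not for the extraction.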
|
|
|
|
|
None of these rules are absolute. However, as long as we do not lose data content coming across, the only thing we really care about in our format is that it be (a) legible and intelligible to target users and applications, (b) as economical and tractable as possible, and (c) 'truthful' in its representations. Requirements (b) and (c) are at odds, since economy means leaving things out. We want to represent only what is both true and useful. Because this is not yet known (and indeed because it may vary from one case to the next), the particulars of the target format (as regards element types, attribute values, etc.) are probably best defined "under load" (that is, in use). We like HTML because it is a vernacular and developers know what to expect from it -- so it gives us some (broad) boundaries going forward.
|
|
None of these rules are absolute. In particular, because it will be difficult to be both comprehensive and succinct (economical), the particulars of the target format (as regards element types, attribute values, etc.) are probably best defined "under load" (that is, in use). We like HTML because it is a vernacular and developers know what to expect from it -- so it gives us some (broad) boundaries going forward.
|
|
|
|
|
|
Development of a formal spec for such a format is an item tbd. For now, we intend to "produce pudding" that can be proven by eating it.
|
|
|
|
|
|
|
... | @@ -50,9 +50,9 @@ Since there is much to be done to get to that point, this means being vigilant f |
|
|
|
|
|
No provision is made for passing page headers, for example, through into the HTML in any form.
|
|
|
|
|
|
|
|
However, at deployment time, no provision is made for handling tables, for example, and we know we will have to handle them. So we already know we will be fixing up the XSLT to work for these cases. But what about cases we haven't seen yet?
|
|
However, as this is being written, no provision is yet made for handling tables, for example, and we know we will have to handle them. So we already know we will be fixing up the XSLT to work for these cases. But what about cases we haven't seen yet?
|
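One way to get early warning of cases we haven't seen yet is to list the element types in the input that the transformation does not claim to handle. The following Python sketch assumes the input is WordprocessingML (the `document.xml` inside a .docx); the `HANDLED` set and the function name are hypothetical stand-ins for whatever the XSLT actually matches.

```python
from collections import Counter
from xml.etree import ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical: local names of the elements our XSLT has templates for.
HANDLED = {"document", "body", "p", "r", "t"}


def unhandled_elements(xml_text: str, handled=HANDLED) -> Counter:
    """Count elements in the input whose local name is not in `handled`."""
    counts = Counter()
    for el in ET.fromstring(xml_text).iter():
        # Tags look like "{namespace}local"; keep only the local part.
        local = el.tag.rpartition("}")[2]
        if local not in handled:
            counts[local] += 1
    return counts


SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:r><w:t>Intro</w:t></w:r></w:p>
<w:tbl><w:tr><w:tc><w:p><w:r><w:t>Cell</w:t></w:r></w:p></w:tc></w:tr></w:tbl>
</w:body></w:document>"""

if __name__ == "__main__":
    for name, n in sorted(unhandled_elements(SAMPLE).items()):
        print(f"unhandled: w:{name} x{n}")
```

Run over the sample, this reports the table elements (`tbl`, `tr`, `tc`) as unhandled, which is the situation described above. A report like this says nothing about whether content was lost, only that something unfamiliar came in.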
|
|
|
|
|
We need to have robust mechanisms for detecting problems in data extraction (or any transformation), _especially lost data_, for ameliorating such problems in the instance (sometimes they may not be fatal errors), and for maintaining and improving the XSLTs.
|
|
We need to have robust mechanisms for detecting problems in data extraction, _especially lost data_, for ameliorating such problems in the instance (sometimes they may not be fatal errors), and for maintaining and improving the XSLTs so that such problems don't recur.
|
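A crude check for lost data is to compare the words in the source with the words in the result. The sketch below is one possible approach in Python, not an existing part of the pipeline: the function names are hypothetical, and it assumes both the source and the HTML result are well-formed XML and that every text node in the source counts as content.

```python
import re
from collections import Counter
from xml.etree import ElementTree as ET


def words(xml_text: str) -> Counter:
    """Count the words found in all text nodes of an XML document."""
    text = " ".join(ET.fromstring(xml_text).itertext())
    return Counter(re.findall(r"\w+", text))


def lost_words(source_xml: str, result_html: str) -> Counter:
    """Words that occur more often in the source than in the result.

    Counter subtraction keeps only positive counts, so text that the
    transformation adds or reorders is ignored; only losses show up.
    """
    return words(source_xml) - words(result_html)


if __name__ == "__main__":
    source = "<doc><p>Alpha beta</p><tbl><cell>gamma</cell></tbl></doc>"
    result = "<html><body><p>Alpha beta</p></body></html>"
    print(lost_words(source, result))
```

An empty result does not prove the output is right (words could be in the wrong place), but a non-empty one is a cheap signal that something was dropped, and it need not be treated as a fatal error.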
|
|
|
|
|
Operationally, what will be the best way to specify corrections / improvements? (Could use Issues on this here GitLab.)
|
|
|
|
|
|
|
... | | ... | |