... | ... | @@ -16,13 +16,18 @@ We'll do this using basic-brain-dead HTML/CSS: basically a few structural divs f |
|
|
|
|
|
### Variability
|
|
|
|
|
|
HTML Typescript isn't one thing, because it is a transitional format. (So the same document may be in one form of HTML Typescript early in an editing process, another one later -- before being "lifted" out of HTML Typescript altogether. There are a couple of consistent differences between HTML Typescript and other species of HTML, which make it recognizable:
|
|
|
HTML Typescript isn't one thing, because it is a transitional format. (So the same document may be in one form of HTML Typescript early in an editing process, another one later -- before being "lifted" out of HTML Typescript altogether.) There are a couple of consistent differences between HTML Typescript and other species of HTML, which make it recognizable:
|
|
|
|
|
|
* At least early in editing, it will be mostly flat.
|
|
|
* At least early in editing, it will be mostly flat. (No application-oriented scaffolding to speak of, and certainly not lots of deeply nested divs.)
|
|
|
|
|
|
* Not much richness of tagging. No HTML5 'semantic' elements such as 'header' or 'aside'. Mostly just `p` elements with inline elements, `span` and the like.
|
|
|
|
|
|
* It will be *presentational*. For many purposes in document processing of course this is a complete no-no! But in HTML Typescript it is a virtue as long as working with the data includes a forensic process -- that is, as long as we are still interested in "what the author wrote (in the Word document)".
|
|
|
In other words, this looks much like what you might see for a Word processor -- except the volume is turned way way down so you can actually hear the signal.
|
|
|
|
|
|
* The tagging will be *presentational*. For many purposes in document processing this is of course a complete no-no! But in HTML Typescript we like to see presentational tagging as long as working with the data still includes a forensic process -- that is, as long as we are still interested in "what the author wrote (in the Word document)". In other words, when converting data, this is information we want to hang onto at least until we know for sure that we don't want it.
|
|
|
|
|
|
(Of course cleaning up the cruft is exactly how we get "nicer" HTML Typescript out of "noisier" HTML Typescript.)
|
|
|
|
|
|
By 'presentational' of course we mean that tagging is devoted to describing presentational or "formatting" (and generally 'visual') properties of the text, without abstract labeling of "semantic" categories.
|
|
|
|
|
|
* Structure will be "hidden" in presentational features. For example, margin shifts may indicate things like block quotes or excerpts.
|
... | ... | @@ -33,6 +38,47 @@ By 'presentational' of course we mean that tagging is devoted to describing pres |
|
|
|
|
|
* HTML Typescript, in other words, doesn't have the data the way you eventually want it. What it does have is all the *distinctions* in the data that you need to map it into the controlled form of your choice. (Assuming, of course, it maps at all. And where it doesn't, it will expose the issues.)
|
|
|
|
|
|
### Sample
|
|
|
|
|
|
```
|
|
|
<p style="font-size: 12pt; text-indent: 36pt">Take Emerson. Emerson is always on the verge of making himself exceptional—either exceptionally puny, ineffective, and futile, or exceptionally stable and transparent. He gave his “Laws of Writing” to the young George Woodbury one day in 1860. There are ten of them:</p>
|
|
|
<p style="font-size: 12pt; margin-left: 36pt; text-indent: 0pt">1. Write not at all unless you have something new.</p>
|
|
|
<p style="font-size: 12pt; margin-left: 36pt; text-indent: 0pt">2. Write <i>it</i>, and not before, behind, and about it.</p>
|
|
|
<p style="font-size: 12pt; margin-left: 36pt; text-indent: 0pt">3. Have nothing of the plan visible—nor firstly, secondly, or thirdly. Show the body, not the ligaments.</p>
|
|
|
```
|
|
|
|
|
|
(Example from [Epigram Microphone](http://pausepress.net/EpigramMicrophone.html), which was made with XSweet.)
|
|
|
|
|
|
Open this in any HTML browser and you will see something quite consistent with the source data. The only indicators that there is a shift in structure (which the eye can see as the "beginning of the list") are in the indent and margin settings.
|
|
|
|
|
|
In a subsequent processing phase, we might use a filter to remove the font settings (as not informative of useful distinctions) and rewrite the styles to reduce verbosity. (This is still HTML Typescript.)
|
|
|
|
|
|
```
|
|
|
<p class="xsw_indent36pt">Take Emerson. Emerson is always on the verge of making himself exceptional—either exceptionally puny, ineffective, and futile, or exceptionally stable and transparent. He gave his “Laws of Writing” to the young George Woodbury one day in 1860. There are ten of them:</p>
|
|
|
<p class="xsw_marginleft36ptindent0pt">1. Write not at all unless you have something new.</p>
|
|
|
<p class="xsw_marginleft36ptindent0pt">2. Write <i>it</i>, and not before, behind, and about it.</p>
|
|
|
<p class="xsw_marginleft36ptindent0pt">3. Have nothing of the plan visible—nor firstly, secondly, or thirdly. Show the body, not the ligaments.</p>
|
|
|
```
|
|
|
|
|
|
while our CSS has
|
|
|
```
|
|
|
.xsw_indent36pt { text-indent: 36pt }
|
|
|
.xsw_marginleft36ptindent0pt { margin-left: 36pt; text-indent: 0pt }
|
|
|
```
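A filter of this kind is easy to sketch. The following Python is only an illustration of the idea, not XSweet's own code (XSweet does its work in its own pipeline): the set of "noise" properties, the abbreviation table, and the function names `parse_style`, `class_for` and `rewrite` are all assumptions made up for this example, chosen so that the output matches the class names shown above.

```python
import re

# Properties treated as noise in this sketch (an assumption, not XSweet's actual rule set).
NOISE_PROPERTIES = {"font-size", "font-family"}

# Short names used when building class names, chosen to reproduce the names in the example.
ABBREVIATIONS = {"text-indent": "indent", "margin-left": "marginleft"}


def parse_style(style):
    """Split an inline style string into an ordered list of (property, value) pairs."""
    pairs = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue
        prop, value = declaration.split(":", 1)
        pairs.append((prop.strip().lower(), value.strip()))
    return pairs


def class_for(style):
    """Return (class_name, css_rule) for a style string, or (None, None) if only noise remains."""
    kept = [(p, v) for p, v in parse_style(style) if p not in NOISE_PROPERTIES]
    if not kept:
        return None, None
    name = "xsw_" + "".join(
        ABBREVIATIONS.get(p, p.replace("-", "")) + re.sub(r"[^0-9A-Za-z]", "", v)
        for p, v in kept
    )
    body = "; ".join(f"{p}: {v}" for p, v in kept)
    return name, f".{name} {{ {body} }}"


def rewrite(html):
    """Replace each style attribute with a class attribute; return (html, css)."""
    rules = {}

    def swap(match):
        name, rule = class_for(match.group(1))
        if name is None:
            return ""  # nothing but noise: drop the attribute altogether
        rules[name] = rule
        return f' class="{name}"'

    cleaned = re.sub(r'\s+style="([^"]*)"', swap, html)
    return cleaned, "\n".join(rules.values())
```

Calling `rewrite` on the first sample paragraph yields the `xsw_indent36pt` class and its one-line CSS rule. Note that the distinctions survive: two paragraphs get the same class only if their remaining (non-noise) properties were identical.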
|
|
|
|
|
|
The intent is to reduce the "noise" -- turn down the background static so we can see represented exactly what the Word document original represents, the way it represents it. From there, our editorial process can go forward.
|
|
|
|
|
|
BTW since our "editorial process" is set up with tools, there's nothing to prevent us from deploying a filter that would turn the above into something more like what we know we want, maybe something like:
|
|
|
|
|
|
<p>Take Emerson. Emerson is always on the verge of making himself exceptional—either exceptionally puny, ineffective, and futile, or exceptionally stable and transparent. He gave his “Laws of Writing” to the young George Woodbury one day in 1860. There are ten of them:</p>
|
|
|
<ol>
|
|
|
<li>Write not at all unless you have something new.</li>
|
|
|
<li>Write <i>it</i>, and not before, behind, and about it.</li>
|
|
|
<li>Have nothing of the plan visible—nor firstly, secondly, or thirdly. Show the body, not the ligaments.</li>
|
|
|
...</ol>
|
|
|
<p class="continuing">...</p>
|
|
|
|
|
|
But this is no longer HTML Typescript. (It's something more like "HTML Galley Proof".)
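For what it's worth, here is a rough Python sketch of what such a list-lifting filter might look like. It is hypothetical, not part of XSweet: it assumes the paragraphs arrive one per string, that the class name `xsw_marginleft36ptindent0pt` from the example above is the signal for "list item", and that items carry a typed "N. " label. The name `lift_lists` is invented here.

```python
import re

# A paragraph with the list-like class from the example and a leading "N. " label.
# (Both conditions are assumptions for this sketch; real data will need its own rule.)
ITEM = re.compile(r'<p class="xsw_marginleft36ptindent0pt">\d+\.\s+(.*)</p>$')


def lift_lists(paragraphs):
    """Wrap runs of numbered, indented paragraphs in <ol>/<li>; pass the rest through as-is."""
    out, in_list = [], False
    for para in paragraphs:
        match = ITEM.match(para)
        if match:
            if not in_list:
                out.append("<ol>")
                in_list = True
            # The typed number is dropped, since <ol> supplies the numbering.
            out.append(f"<li>{match.group(1)}</li>")
        else:
            if in_list:
                out.append("</ol>")
                in_list = False
            out.append(para)
    if in_list:
        out.append("</ol>")
    return out
```

The point of the sketch is that the rule only works because the HTML Typescript kept the margin and indent distinctions. Had they been thrown away earlier, there would be nothing left to key the mapping on.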
|
|
|
|
|
|
### Translation principles
|
|
|
|
... | ... | @@ -93,4 +139,10 @@ XSweet components can be applied to perform certain regularizations, such as pro |
|
|
|
|
|
### Sticky bits
|
|
|
|
|
|
### Sausage inputs
|
|
|
|
|
|
Here is an example of Word data (Office Open XML, or WordML) -- the source of the HTML Typescript example given above. What XSweet does is read this, and produce that.
|
|
|
|
|
|
```
|
|
|
<w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000" w:rsidRPr="00000000"><w:pPr><w:keepNext w:val="0"/><w:keepLines w:val="0"/><w:widowControl w:val="0"/><w:spacing w:line="480" w:lineRule="auto"/><w:ind w:firstLine="720"/><w:contextualSpacing w:val="0"/></w:pPr><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">Take Emerson. Emerson is always on the verge of making himself exceptional—either exceptionally puny, ineffective, and futile, or exceptionally stable and transparent. He gave his “Laws of Writing” to the young George Woodbury one day in 1860. There are ten of them:</w:t></w:r></w:p><w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000" w:rsidRPr="00000000"><w:pPr><w:keepNext w:val="0"/><w:keepLines w:val="0"/><w:widowControl w:val="0"/><w:spacing w:line="480" w:lineRule="auto"/><w:ind w:left="720" w:firstLine="0"/><w:contextualSpacing w:val="0"/></w:pPr><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">1. Write not at all unless you have something new.</w:t></w:r></w:p><w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000" w:rsidRPr="00000000"><w:pPr><w:keepNext w:val="0"/><w:keepLines w:val="0"/><w:widowControl w:val="0"/><w:spacing w:line="480" w:lineRule="auto"/><w:ind w:left="720" w:firstLine="0"/><w:contextualSpacing w:val="0"/></w:pPr><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">2. 
Write </w:t></w:r><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:i w:val="1"/><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">it</w:t></w:r><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">, and not before, behind, and about it.</w:t></w:r></w:p><w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000" w:rsidRPr="00000000"><w:pPr><w:keepNext w:val="0"/><w:keepLines w:val="0"/><w:widowControl w:val="0"/><w:spacing w:line="480" w:lineRule="auto"/><w:ind w:left="720" w:firstLine="0"/><w:contextualSpacing w:val="0"/></w:pPr><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">3. Have nothing of the plan visible—nor firstly, secondly, or thirdly. Show the body, not the ligaments.</w:t></w:r></w:p>
|
|
|
```
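To see how the numbers line up between the two samples: WordprocessingML records indents (`w:ind`) in twentieths of a point and font sizes (`w:sz`) in half-points, so `w:firstLine="720"` becomes `text-indent: 36pt` and `w:sz w:val="24"` becomes `font-size: 12pt`. The Python below is a minimal sketch of just that arithmetic, using the standard library's `xml.etree.ElementTree`. It is not how XSweet itself is implemented, and the helper names (`w`, `points`, `paragraph_style`) and the trimmed-down `SAMPLE` paragraph are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The WordprocessingML namespace bound to the w: prefix.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(name):
    """Return the namespace-qualified form of a w: element or attribute name."""
    return f"{{{W}}}{name}"


def points(n):
    """Format a number of points without a trailing .0 (36.0 becomes '36pt')."""
    return f"{n:g}pt"


def paragraph_style(p):
    """Derive an inline CSS string from one w:p: first run's size, then paragraph indents."""
    declarations = []
    size = p.find(f"{w('r')}/{w('rPr')}/{w('sz')}")
    if size is not None:
        # w:sz is measured in half-points.
        declarations.append(f"font-size: {points(int(size.get(w('val'))) / 2)}")
    ind = p.find(f"{w('pPr')}/{w('ind')}")
    if ind is not None:
        for attr, prop in (("left", "margin-left"), ("firstLine", "text-indent")):
            value = ind.get(w(attr))
            if value is not None:
                # w:ind values are measured in twentieths of a point.
                declarations.append(f"{prop}: {points(int(value) / 20)}")
    return "; ".join(declarations)


# A cut-down version of the second paragraph above (most properties omitted).
SAMPLE = (
    f'<w:p xmlns:w="{W}"><w:pPr><w:ind w:left="720" w:firstLine="0"/></w:pPr>'
    '<w:r><w:rPr><w:sz w:val="24"/></w:rPr>'
    '<w:t>1. Write not at all unless you have something new.</w:t></w:r></w:p>'
)

print(paragraph_style(ET.fromstring(SAMPLE)))
```

Run on `SAMPLE`, this prints the same style string the HTML Typescript sample carries for its list paragraphs. Everything else in the WordML (the `rsid` attributes, `keepNext`, `widowControl` and so on) is simply not mapped, which is where most of the noise reduction comes from.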