@@ -8,9 +8,9 @@ In particular, this means that insofar as as information available only from Wor
### Design principles
A typewriter can be used to create an artifact (namely a typed MS or typescript) amenable to a type- and print-based publication process. Although a material object, the typescript also typically provides for a kind of *encoding* (indeed in more than one modality) by which it can communicate intentions from author to editors. Of course a typescript is also a "platform" for changes, with a kind of "production loop" built around it -- wherein it might be said to evolve in form, from submitted fair copy to galleys to printed production.
A typewriter can be used to create an artifact (namely a typed MS or typescript) amenable to a type- and print-based publication process. Although a material object, the typescript also typically provides for a kind of *encoding* (indeed in more than one modality) by which it can communicate intentions from author to editors. Of course a typescript is also a "platform" for changes, with a kind of "production loop" built around it -- wherein it might be said to evolve in form, from submitted "fair copy" to galleys to printed production.
We aim to provide the same sort of "paper functionality" in HTML. It is not -- quite -- an electronic doodle pad (consider SVG for that) -- what we see are recognizably documents, with the features of formatted documents. But look at the code and you'll see -- once you get past the sheer verbosity of it -- it isn't a formally controlled or even very regular arrangement. Paradoxically, since no structure is imposed or formal control exerted, what stands out is consistencies in the *way things are made to appear* (in typescript, on a printed page; with HTML, in a commodity browser) -- and it proves that those consistencies are precisely the guideposts we want, to make further inferences.
We aim to provide the same sort of "paper functionality" in HTML. It is not -- quite -- an electronic doodle pad, nor a publishing application, but something in between. What we see are recognizably documents, with the features of formatted documents. But look at the code and you'll see -- once you get past the sheer verbosity of it -- the data is not in a formally controlled or even very regular arrangement. Paradoxically, since no structure is imposed or formal control exerted, what stands out is consistencies in the *way things are made to appear* (when you 'hit print' or view in a commodity browser) -- and as it turns out, these consistencies are precisely the guideposts we want, to make further inferences.
We'll do this using basic, brain-dead HTML/CSS: a few structural `div` elements for framing, then `p` elements with an assortment of inline mixed content including `b`, `i`, `u`. We will resort to CSS to describe things that are not easily described using tags alone (such as margins and indents on paragraphs). But nothing should be obscure to the web developer.
@@ -22,13 +22,13 @@ HTML Typescript isn't one thing, because it is a transitional format. (So the sa
* Not much richness of tagging. No HTML5 'semantic' elements such as 'header' or 'aside'. Mostly just `p` elements with inline elements, `span` and the like.
In other words, this looks much like what you might see for a Word processor -- except the volume is turned way way down so you can actually hear the signal.
In other words, this looks much like the kind of language you would use in a simple program to control basic print layout -- maybe something like a "word processor" except without all the application's superstructure and internal weirdnesses. (See example below.) In effect the static is turned way way down so you can actually hear the signal.
* The tagging will be *presentational*. For many purposes in document processing of course this is a complete no-no! But in HTML Typescript, we like to see presentational tagging as long as working with the data still includes a forensic process -- that is, as long as we are still interested in "what the author wrote (in the Word document)". In other words, in converting data this is information we want to hang onto at least until we know for sure, we don't want it.
* The tagging will be *presentational*. For many purposes in document processing of course this is a complete no-no! But in HTML Typescript, we like to see presentational tagging as long as working with the data still includes a forensic process -- that is, as long as we are still interested in "what the author wrote (in the Word document)". (Why? Because putting that formatting in is what the author actually did.) In other words, in converting data this is information we want to hang onto at least until we know for sure that we don't want it (because we know it is meaningless or we have captured its meaning a better way).
(Of course cleaning up the cruft is exactly how we get "nicer" HTML Typescript out of "noisier" HTML Typescript.)
And of course cleaning up the cruft is exactly how we get "nicer" HTML Typescript out of "noisier" HTML Typescript.
By 'presentational' of course we mean that tagging is devoted to describing presentational or "formatting" (and generally 'visual') properties of the text, without abstract labeling of "semantic" categories.
By 'presentational' of course we mean that tagging is devoted to describing presentational or "formatting" (and generally 'visual') properties of the text, without abstract labeling of "semantic" categories. Font shifts and margins are the big ones.
* Structure will be "hidden" in presentational features. For example, margin shifts may indicate things like block quotes or excerpts.
@@ -38,7 +38,7 @@ By 'presentational' of course we mean that tagging is devoted to describing pres
* HTML Typescript, in other words, doesn't have the data the way you eventually want it. What it does have, is all the *distinctions* in the data, that you need to map it into the controlled form of your choice. (Assuming, of course, it maps at all. And where it doesn't it will expose the issues.)
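
The first point in the list above (not much richness of tagging) is easy to check mechanically. What follows is only a rough Python sketch, not part of XSweet: the `ALLOWED` set is a hypothetical inventory assembled from the element names mentioned in this document, and `TagTally`, `tally_tags` and `unexpected_tags` are names made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical inventory for illustration only: the element names this
# document mentions (div for framing, p, span, and inline b / i / u).
ALLOWED = {"div", "p", "span", "b", "i", "u"}


class TagTally(HTMLParser):
    """Count how often each element name is opened in a fragment."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tally_tags(html):
    """Return a Counter of element names used in the HTML string."""
    parser = TagTally()
    parser.feed(html)
    parser.close()
    return parser.counts


def unexpected_tags(html):
    """List element names that fall outside the ALLOWED inventory."""
    return sorted(set(tally_tags(html)) - ALLOWED)


sample = (
    '<div><p style="margin-left: 36pt">Show the <i>body</i>, '
    "not the <b>ligaments</b>.</p><aside>note</aside></div>"
)
print(dict(tally_tags(sample)))
print(unexpected_tags(sample))  # ['aside']
```

A tally like this says nothing about what the tagging means; it only shows how small the vocabulary is, and flags anything (here a stray `aside`) that would not be expected in HTML Typescript.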
### Sample
### Example
```
<p style="font-size: 12pt; text-indent: 36pt">Take Emerson. Emerson is always on the verge of making himself exceptional—either exceptionally puny, ineffective, and futile, or exceptionally stable and transparent. He gave his “Laws of Writing” to the young George Woodbury one day in 1860. There are ten of them:</p>
@@ -47,7 +47,7 @@ By 'presentational' of course we mean that tagging is devoted to describing pres
<p style="font-size: 12pt; margin-left: 36pt; text-indent: 0pt">3. Have nothing of the plan visible—nor firstly, secondly, or thirdly. Show the body, not the ligaments.</p>
```
(Example from (Epigram Microphone)[http://pausepress.net/EpigramMicrophone.html] which was made with XSweet.)
(Example from [Epigram Microphone](http://pausepress.net/EpigramMicrophone.html) which was produced using XSweet. XSweet generated HTML Typescript, which was converted up into NISO BITS then back down into a publication HTML.)
Open this in any HTML browser and you will see something quite consistent with the source data. The only indicators that there is a shift in structure (which the eye can see as the "beginning of the list") are in the indent and margin settings.
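
To make the point concrete, here is a minimal Python sketch of the kind of inference those indent and margin settings allow: it groups consecutive paragraphs that share a non-zero `margin-left` into candidate blocks (a list, a block quote, an excerpt). It is an illustration under stated assumptions, not how XSweet does it. The helper names are invented, only `pt` lengths are understood, and the markup for item 1 in the sample is assumed to match item 3, since the example above elides it.

```python
import re
from html.parser import HTMLParser


def parse_style(style):
    """Split a CSS style attribute into a {property: value} dict."""
    props = {}
    for decl in style.split(";"):
        if ":" in decl:
            name, value = decl.split(":", 1)
            props[name.strip().lower()] = value.strip()
    return props


def points(value):
    """Read a length such as '36pt' as a float; anything else counts as 0."""
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)pt", value or "")
    return float(match.group(1)) if match else 0.0


class ParagraphMargins(HTMLParser):
    """Collect (margin-left in pt, text) for each p element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._margin = None  # None means "not inside a p"
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            style = parse_style(dict(attrs).get("style") or "")
            self._margin = points(style.get("margin-left"))
            self._text = []

    def handle_data(self, data):
        if self._margin is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._margin is not None:
            self.paragraphs.append((self._margin, "".join(self._text)))
            self._margin = None


def indented_runs(html):
    """Group consecutive paragraphs that share a non-zero left margin."""
    parser = ParagraphMargins()
    parser.feed(html)
    parser.close()
    runs, current = [], []
    for margin, text in parser.paragraphs:
        if margin > 0 and (not current or current[0][0] == margin):
            current.append((margin, text))
            continue
        if current:
            runs.append([t for _, t in current])
        current = [(margin, text)] if margin > 0 else []
    if current:
        runs.append([t for _, t in current])
    return runs


# Abbreviated from the example above; the item 1 markup is an assumption.
sample = """
<p style="font-size: 12pt; text-indent: 36pt">Take Emerson. There are ten of them:</p>
<p style="font-size: 12pt; margin-left: 36pt; text-indent: 0pt">1. Write not at all unless you have something new.</p>
<p style="font-size: 12pt; margin-left: 36pt; text-indent: 0pt">3. Have nothing of the plan visible.</p>
"""
for run in indented_runs(sample):
    print(run)
```

Whether such a run is a list, a quotation or something else is a separate decision; the sketch only recovers the *distinction* that the formatting carries.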
@@ -144,5 +144,103 @@ XSweet components can be applied to perform certain regularizations, such as pro
Here is an example of Word data (Office Open XML or WordML) -- the source of the HTML Typescript example given above. What XSweet does is read this and produce that.
```
<w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000" w:rsidRPr="00000000"><w:pPr><w:keepNext w:val="0"/><w:keepLines w:val="0"/><w:widowControl w:val="0"/><w:spacing w:line="480" w:lineRule="auto"/><w:ind w:firstLine="720"/><w:contextualSpacing w:val="0"/></w:pPr><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">Take Emerson. Emerson is always on the verge of making himself exceptional—either exceptionally puny, ineffective, and futile, or exceptionally stable and transparent. He gave his “Laws of Writing” to the young George Woodbury one day in 1860. There are ten of them:</w:t></w:r></w:p><w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000" w:rsidRPr="00000000"><w:pPr><w:keepNext w:val="0"/><w:keepLines w:val="0"/><w:widowControl w:val="0"/><w:spacing w:line="480" w:lineRule="auto"/><w:ind w:left="720" w:firstLine="0"/><w:contextualSpacing w:val="0"/></w:pPr><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">1. Write not at all unless you have something new.</w:t></w:r></w:p><w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000" w:rsidRPr="00000000"><w:pPr><w:keepNext w:val="0"/><w:keepLines w:val="0"/><w:widowControl w:val="0"/><w:spacing w:line="480" w:lineRule="auto"/><w:ind w:left="720" w:firstLine="0"/><w:contextualSpacing w:val="0"/></w:pPr><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">2. 
Write </w:t></w:r><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:i w:val="1"/><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">it</w:t></w:r><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">, and not before, behind, and about it.</w:t></w:r></w:p><w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000" w:rsidRPr="00000000"><w:pPr><w:keepNext w:val="0"/><w:keepLines w:val="0"/><w:widowControl w:val="0"/><w:spacing w:line="480" w:lineRule="auto"/><w:ind w:left="720" w:firstLine="0"/><w:contextualSpacing w:val="0"/></w:pPr><w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:smallCaps w:val="0"/><w:sz w:val="24"/><w:szCs w:val="24"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">3. Have nothing of the plan visible—nor firstly, secondly, or thirdly. Show the body, not the ligaments.</w:t></w:r></w:p>
<w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000"
     w:rsidRPr="00000000">
  <w:pPr>
    <w:keepNext w:val="0"/>
    <w:keepLines w:val="0"/>
    <w:widowControl w:val="0"/>
    <w:spacing w:line="480" w:lineRule="auto"/>
    <w:ind w:firstLine="720"/>
    <w:contextualSpacing w:val="0"/>
  </w:pPr>
  <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
    <w:rPr>
      <w:smallCaps w:val="0"/>
      <w:sz w:val="24"/>
      <w:szCs w:val="24"/>
      <w:rtl w:val="0"/>
    </w:rPr>
    <w:t xml:space="preserve">Take Emerson. Emerson is always on the verge of making himself exceptional—either exceptionally puny, ineffective, and futile, or exceptionally stable and transparent. He gave his “Laws of Writing” to the young George Woodbury one day in 1860. There are ten of them:</w:t>
  </w:r>
</w:p>
<w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000"
     w:rsidRPr="00000000">
  <w:pPr>
    <w:keepNext w:val="0"/>
    <w:keepLines w:val="0"/>
    <w:widowControl w:val="0"/>
    <w:spacing w:line="480" w:lineRule="auto"/>
    <w:ind w:left="720" w:firstLine="0"/>
    <w:contextualSpacing w:val="0"/>
  </w:pPr>
  <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
    <w:rPr>
      <w:smallCaps w:val="0"/>
      <w:sz w:val="24"/>
      <w:szCs w:val="24"/>
      <w:rtl w:val="0"/>
    </w:rPr>
    <w:t xml:space="preserve">1. Write not at all unless you have something new.</w:t>
  </w:r>
</w:p>
<w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000"
     w:rsidRPr="00000000">
  <w:pPr>
    <w:keepNext w:val="0"/>
    <w:keepLines w:val="0"/>
    <w:widowControl w:val="0"/>
    <w:spacing w:line="480" w:lineRule="auto"/>
    <w:ind w:left="720" w:firstLine="0"/>
    <w:contextualSpacing w:val="0"/>
  </w:pPr>
  <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
    <w:rPr>
      <w:smallCaps w:val="0"/>
      <w:sz w:val="24"/>
      <w:szCs w:val="24"/>
      <w:rtl w:val="0"/>
    </w:rPr>
    <w:t xml:space="preserve">2. Write </w:t>
  </w:r>
  <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
    <w:rPr>
      <w:i w:val="1"/>
      <w:smallCaps w:val="0"/>
      <w:sz w:val="24"/>
      <w:szCs w:val="24"/>
      <w:rtl w:val="0"/>
    </w:rPr>
    <w:t xml:space="preserve">it</w:t>
  </w:r>
  <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
    <w:rPr>
      <w:smallCaps w:val="0"/>
      <w:sz w:val="24"/>
      <w:szCs w:val="24"/>
      <w:rtl w:val="0"/>
    </w:rPr>
    <w:t xml:space="preserve">, and not before, behind, and about it.</w:t>
  </w:r>
</w:p>
<w:p w:rsidR="00000000" w:rsidDel="00000000" w:rsidP="00000000" w:rsidRDefault="00000000"
     w:rsidRPr="00000000">
  <w:pPr>
    <w:keepNext w:val="0"/>
    <w:keepLines w:val="0"/>
    <w:widowControl w:val="0"/>
    <w:spacing w:line="480" w:lineRule="auto"/>
    <w:ind w:left="720" w:firstLine="0"/>
    <w:contextualSpacing w:val="0"/>
  </w:pPr>
  <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
    <w:rPr>
      <w:smallCaps w:val="0"/>
      <w:sz w:val="24"/>
      <w:szCs w:val="24"/>
      <w:rtl w:val="0"/>
    </w:rPr>
    <w:t xml:space="preserve">3. Have nothing of the plan visible—nor firstly, secondly, or thirdly. Show the body, not the ligaments.</w:t>
  </w:r>
</w:p>
```
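
The relationship between this WordML and the HTML Typescript example can be sketched in a few lines of Python. This is emphatically not XSweet's implementation, only an illustration of the arithmetic involved: WordprocessingML gives `w:ind` values in twentieths of a point and `w:sz` in half-points, so `w:firstLine="720"` comes out as `text-indent: 36pt` and `w:sz w:val="24"` as `font-size: 12pt`. The function names are invented for this sketch, the font size is simply read from the first run of each paragraph (an assumption), and the sample wraps abbreviated paragraphs in a `w:body` element only so that the namespace can be declared.

```python
import xml.etree.ElementTree as ET

# The WordprocessingML main namespace, bound to the usual "w" prefix.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def w_attr(element, name):
    """Fetch a w:-namespaced attribute such as w:val or w:left."""
    return element.get(f"{{{W}}}{name}")


def paragraph_style(p):
    """Build a CSS style string from a w:p element's indent and font size.

    Assumption for this sketch: the font size of the first run stands in
    for the whole paragraph.
    """
    props = []
    sz = p.find("w:r/w:rPr/w:sz", NS)
    if sz is not None:
        props.append(("font-size", int(w_attr(sz, "val")) / 2))  # half-points
    ind = p.find("w:pPr/w:ind", NS)
    if ind is not None:
        left = w_attr(ind, "left")
        first_line = w_attr(ind, "firstLine")
        if left is not None:
            props.append(("margin-left", int(left) / 20))  # twentieths of a pt
        if first_line is not None:
            props.append(("text-indent", int(first_line) / 20))
    return "; ".join(f"{name}: {value:g}pt" for name, value in props)


# Abbreviated from the Word data above; w:body is added as a wrapper.
sample = f"""<w:body xmlns:w="{W}">
  <w:p>
    <w:pPr><w:ind w:firstLine="720"/></w:pPr>
    <w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:t>Take Emerson.</w:t></w:r>
  </w:p>
  <w:p>
    <w:pPr><w:ind w:left="720" w:firstLine="0"/></w:pPr>
    <w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:t>1. Write not at all unless you have something new.</w:t></w:r>
  </w:p>
</w:body>"""

styles = [paragraph_style(p) for p in ET.fromstring(sample).findall("w:p", NS)]
for style in styles:
    print(style)
# font-size: 12pt; text-indent: 36pt
# font-size: 12pt; margin-left: 36pt; text-indent: 0pt
```

The two strings printed match the `style` attributes in the HTML Typescript example, which is the sense in which the formatting distinctions survive the conversion while the application's superstructure (the `rsid` attributes, `keepNext`, `widowControl` and so on) is left behind.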