@@ -14,6 +14,26 @@ We aim to provide the same sort of "paper functionality" in HTML. It is not -- q

We'll do this using basic, brain-dead HTML/CSS: a few structural `div` elements for framing, then `p` elements with an assortment of inline mixed content including `b`, `i`, and `u`. We resort to CSS to describe things that are not easily described using tags alone (such as margins and indents on paragraphs). But nothing should be obscure to a web developer.

### Variability

HTML Typescript isn't one thing, because it is a transitional format. (So the same document may be in one form of HTML Typescript early in an editing process, and in another later -- before being "lifted" out of HTML Typescript altogether.) Still, there are several consistent differences between HTML Typescript and other species of HTML, which make it recognizable:
* At least early in editing, it will be mostly flat.

* Not much richness of tagging. No HTML5 'semantic' elements such as `header` or `aside`. Mostly just `p` elements with inline elements, `span` and the like.

* It will be *presentational*. For many purposes in document processing, of course, this is a complete no-no! But in HTML Typescript it is a virtue, as long as working with the data includes a forensic process -- that is, as long as we are still interested in "what the author wrote (in the Word document)".

  By 'presentational', of course, we mean that tagging is devoted to describing presentational or "formatting" (and generally 'visual') properties of the text, without abstract labeling of "semantic" categories.

* Structure will be "hidden" in presentational features. For example, margin shifts may indicate things like block quotes or excerpts.

* Most of the action happens in `class` and `style` attributes. In particular, `class` may be overloaded (more than one value may be assigned).

* Yet all this is flexible, and these assignments (element name, `class` and `style`) may be rewritten/refactored along a processing pipeline (aka INK service or recipe) -- so HTML Typescript data that is very 'raw' can (for example) be refined and "fitted" to the needs of a particular HTML client or environment (such as an editor).

* HTML Typescript, in other words, doesn't have the data the way you eventually want it. What it does have is all the *distinctions* in the data that you need to map it into the controlled form of your choice. (Assuming, of course, it maps at all. And where it doesn't, it will expose the issues.)
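To make the points above concrete, here is a minimal sketch of one refinement step, written in Python purely for illustration. Everything specific in it is an assumption rather than part of HTML Typescript or of any actual pipeline: the sample markup, the class names (`docx-body`, `Normal`, `Heading1`, `centered`), the rule that a left margin signals a block quote, and the helper names `parse_style` and `refine`. It shows flat, presentational input whose distinctions live in `class` and `style`, and a rewrite that promotes some of those distinctions to element names while leaving the rest untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical "raw" input: flat, presentational, structure implied by
# class and style values (all names here are made up for illustration).
RAW = """<div class="docx-body">
<p class="Normal">As the author <i>insists</i>:</p>
<p class="Normal" style="margin-left: 36pt">A long quotation set off by a margin shift.</p>
<p class="Heading1 centered" style="text-align: center"><b>Chapter One</b></p>
</div>"""


def parse_style(value):
    """Turn a style string such as 'a: b; c: d' into {'a': 'b', 'c': 'd'}."""
    props = {}
    for decl in (value or "").split(";"):
        if ":" in decl:
            name, val = decl.split(":", 1)
            props[name.strip()] = val.strip()
    return props


def refine(root):
    """One assumed pipeline step: promote recognizable distinctions to element names."""
    # Copy the matches into a list first, since we rename elements as we go.
    for p in list(root.iter("p")):
        classes = (p.get("class") or "").split()  # class may carry several values
        style = parse_style(p.get("style"))
        if "Heading1" in classes:
            p.tag = "h1"
        elif "margin-left" in style:
            p.tag = "blockquote"
        # Anything unrecognized stays a plain p; class and style are kept
        # either way, so no distinction is lost by this step.
    return root


root = refine(ET.fromstring(RAW))
print(ET.tostring(root, encoding="unicode"))
```

With this sample, the three paragraphs come out as `p`, `blockquote` and `h1`, in that order. The `class` and `style` attributes are deliberately left in place, so a later step (or a human) can still see what the rule was based on, and anything the rule did not recognize remains visible rather than being silently normalized.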
### Translation principles

* When extracting from a word processor, we design everything to come through by default wherever possible. But we don't take this to an extreme; it doesn't actually mean we have to capture literally everything. For example, in a word processor document, we treat certain parts of the documentary apparatus (such as page headers or page layout settings) as incidental and dispensable by design. (Because part of our job is to provide those things.) But we don't drop anything by accident.