... | ... | @@ -4,15 +4,17 @@ See page on the original [Pre-alpha Specs](pre-alpha-specs) (before redesign) |
|
|
|
|
|
HTMLevator supports structural induction and "section type inferencing" in conversion of data from (appropriately coded) `.docx` files into HTML. It is designed to be used with XSweet.
|
|
|
|
|
|
*Structural induction* means HTMLevator will produce `<section>` elements where needed to "wrap" (structure) unorganized contents. *Section type inferencing* means recognizing, for example, a **Conclusions** section (by means of its title and/or other properties) and submitting it to appropriate handling -- including validation, to detect whether and where such a section is required, expected or permitted.
|
|
|
HTMLevator currently includes three separate applications. They can be used together or separately, although one of them is much less useful without another, so they are best used in combination.
|
|
|
|
|
|
HTMLevator relies on [XSweet](https://gitlab.coko.foundation/wendell/XSweet) (a companion project), via its Header Promotion pathway, to produce `h1`-`h6` elements in the HTML extracted from Word `.docx` files wherever paragraphs are assigned Paragraph Styles named "Header 1" through "Header 6".
|
|
|
Header promotion - converts paragraph (`p`) elements in HTML into `h1`-`h6` elements. It uses one of several means to determine which paragraphs receive this treatment. The most robust is to configure it yourself with a styles mapping file, a runtime configuration that can be made sensitive to consistent code points in your inputs. If your data is sufficiently regular, another method may be less onerous: if asked, HTMLevator's header promotion will 'guess' appropriate headers based on a ranking of format (style) attributes in the inputs.
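To make the idea of "guessing" headers from a ranking of format attributes more concrete, here is a minimal Python sketch. It is not HTMLevator's or XSweet's actual algorithm: the `guess_header_levels` function, the `font_size` and `bold` keys, and the rule "anything ranked above the most common format is a header" are all assumptions made for illustration.

```python
from collections import Counter


def guess_header_levels(paragraphs, max_levels=6):
    """Return a guessed header level (1-6) or None for each paragraph.

    Hypothetical input: dicts with 'font_size' (number) and 'bold' (bool).
    """
    if not paragraphs:
        return []

    def signature(p):
        return (p["font_size"], p["bold"])

    # Assume the most frequent format is ordinary body text.
    body = Counter(signature(p) for p in paragraphs).most_common(1)[0][0]
    # Formats ranking above body text (larger, or same size but bold)
    # become header candidates, strongest first.
    candidates = sorted(
        {signature(p) for p in paragraphs if signature(p) > body},
        reverse=True,
    )
    levels = {sig: i + 1 for i, sig in enumerate(candidates[:max_levels])}
    return [levels.get(signature(p)) for p in paragraphs]
```

A heuristic like this only works when formatting is applied consistently, which is why an explicit styles mapping remains the more robust option.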
|
|
|
|
|
|
Prepare your Word file for HTMLevator by assigning "Header 1" - "Header 6" styles to your section titles at their respective levels of hierarchy. (Assign the style to the first line of the title only. Subtitles or subsequent lines of multi-line titles should not use these styles.) Within Word (since by default these styles are bound to the appropriate Outline level), this nominal structure can ordinarily be displayed in the Outline View, even before XSweet/HTMLevator is run.
|
|
|
Section inferencing - this requires structural induction.
|
|
|
|
|
|
Once these Paragraph Style assignments are found in the `.docx`, XSweet and HTMLevator do the rest: XSweet makes HTML with `h1`-`h6`, then HTMLevator makes nested sections for the detected headers. (It then goes on to process these sections further, if and as necessary.)
|
|
|
*Structural induction* means HTMLevator will produce `<section>` elements where needed to "wrap" (structure) unorganized contents.
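As a rough illustration of structural induction (not the XSLT that HTMLevator actually provides), the following Python sketch nests a flat sequence of blocks into sections keyed on header level. The `induce_sections` function and the `(tag, text)` tuple representation are assumptions made for this example. A header that skips a level is simply nested under the nearest open section, which may differ from how the real stylesheet treats such cases.

```python
import re


def induce_sections(blocks):
    """Nest a flat list of (tag, text) blocks into sections by header level."""
    root = {"level": 0, "title": None, "content": []}
    stack = [root]  # open sections, outermost first
    for tag, text in blocks:
        match = re.fullmatch(r"h([1-6])", tag)
        if match is None:
            # Non-header content belongs to the innermost open section.
            stack[-1]["content"].append((tag, text))
            continue
        level = int(match.group(1))
        # A new header closes every open section at the same or a deeper level.
        while stack[-1]["level"] >= level:
            stack.pop()
        section = {"level": level, "title": text, "content": []}
        stack[-1]["content"].append(section)
        stack.append(section)
    return root


blocks = [
    ("p", "Preamble"),
    ("h1", "Introduction"),
    ("p", "Some text"),
    ("h2", "Background"),
    ("p", "More text"),
    ("h1", "Conclusions"),
    ("p", "Closing text"),
]
tree = induce_sections(blocks)
```

Here `tree["content"]` holds the leading paragraph followed by two top-level sections, with "Background" nested inside "Introduction".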
|
|
|
|
|
|
HTMLevator can also be used on files with no such preparation, but results will vary: its efficacy depends entirely on whether, and how well, XSweet header promotion detects `h1`-`h6` in your file.
|
|
|
Over and above structural induction, *section type inferencing* means recognizing, for example, a **Conclusions** section (by means of its title and/or other properties) and submitting it to appropriate handling -- including validation, to detect whether and where such a section is required, expected or permitted. HTMLevator currently does not provide for section type inferencing, except to note that it is a natural requirement and one that can be readily accomplished in this architecture.
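Since HTMLevator does not yet implement section type inferencing, the following Python sketch is purely illustrative of what title-based recognition could look like. The `SECTION_TYPES` table, its patterns, and the `infer_section_type` function are hypothetical; a real rule set would be specific to a workflow and might consult properties other than the title.

```python
import re

# Optional leading section number such as "5." or "2.3 " (an assumption).
NUMBERING = r"^\s*(?:\d+(?:\.\d+)*\.?\s+)?"

# Hypothetical title patterns, checked in order.
SECTION_TYPES = [
    ("conclusions", re.compile(NUMBERING + r"conclusions?\b", re.IGNORECASE)),
    ("references", re.compile(NUMBERING + r"(?:references|bibliography)\b", re.IGNORECASE)),
]


def infer_section_type(title):
    """Return the first section type whose pattern matches the title, else None."""
    for name, pattern in SECTION_TYPES:
        if pattern.match(title):
            return name
    return None
```

Sections whose titles match no pattern come back as `None`, leaving them untyped rather than forcing a guess.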
|
|
|
|
|
|
On HTML files whose section levels are *regularly* and *systematically* indicated by a "regular order" of headers, the XSLT provided here will reliably create a nested section structure.
|
|
|
|
|
|
More details:
|
|
|
|
... | ... | @@ -75,7 +77,7 @@ correct, leading with para contents then h3; |
|
|
skipping levels at the front;
|
|
|
skipping levels inside)
|
|
|
|
|
|
## Validation of structures/content types
|
|
|
## For future development - validation of structures/content types
|
|
|
|
|
|
Once structures have been induced (inferred or projected over the element sequence), they need to be validated against rule sets appropriate to their workflows.
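As a sketch of what validation against a rule set might involve, the Python function below checks a list of inferred section types against required and permitted types. Nothing here reflects an existing HTMLevator feature: `validate_sections`, its parameters, and the message wording are assumptions for illustration only.

```python
def validate_sections(found, required=(), permitted=None):
    """Return a list of problems for the section types found in a document.

    found:     section type names, in document order
    required:  types that must appear at least once
    permitted: types allowed besides the required ones; None allows anything
    """
    problems = []
    for name in required:
        if name not in found:
            problems.append(f"missing required section: {name}")
    if permitted is not None:
        allowed = set(permitted) | set(required)
        for name in found:
            if name not in allowed:
                problems.append(f"section not permitted: {name}")
    return problems
```

An empty result means the document satisfies the rule set; different workflows would supply different `required` and `permitted` lists.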
|
|
|
|
... | ... | |