Last edited by Alex Theg May 02, 2018

Home

NOTE: THIS REPO WIKI INFORMATION HAS BEEN SUPERSEDED BY DOCUMENTATION ON THE PROJECT WEBSITE. SEE THE XSWEET PROJECT WEBSITE FOR THE MOST UP-TO-DATE DOCUMENTATION.

HTMLevator - draft specs

[Note 20180118: see the repo directory readme docs for up-to-date descriptions; this remains from early requirements analysis and should be regarded as superseded.]

See page on the original Pre-alpha Specs (before redesign)

HTMLevator supports structural induction and "section type inferencing" in conversion of data from (appropriately coded) .docx files into HTML. It is designed to be used with XSweet.

HTMLevator currently includes three separate applications. They can be used together or separately, although they are best used in combination, since one of them is unlikely to be as useful without another.

HTML Tweak

A general purpose "mapper" enabling class or style values on HTML elements to be systematically mapped. So for example, all "font-style: italic" can be made class='emph' ... write whatever mappings you need. Very useful for data cleanup and consolidation, and/or as prep for any other steps.
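As an illustration of the kind of mapping described above, here is a minimal sketch in Python rather than in the project's own XSLT. The mapping table `STYLE_TO_CLASS` and the functions `parse_style` and `tweak` are hypothetical names invented for this example; they are not HTML Tweak's actual configuration format or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table: (CSS property, value) -> class name to add.
STYLE_TO_CLASS = {
    ("font-style", "italic"): "emph",
    ("font-weight", "bold"): "strong",
}

def parse_style(style):
    """Split a style attribute into a list of (property, value) pairs."""
    pairs = []
    for decl in style.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            pairs.append((prop.strip(), value.strip()))
    return pairs

def tweak(root):
    """Replace mapped style declarations with class names, in place."""
    for el in root.iter():
        style = el.get("style")
        if style is None:
            continue
        classes = el.get("class", "").split()
        kept = []
        for prop, value in parse_style(style):
            cls = STYLE_TO_CLASS.get((prop.lower(), value.lower()))
            if cls is None:
                kept.append(f"{prop}: {value}")  # unmapped: leave as style
            elif cls not in classes:
                classes.append(cls)
        if classes:
            el.set("class", " ".join(classes))
        if kept:
            el.set("style", "; ".join(kept))
        else:
            del el.attrib["style"]
    return root

doc = ET.fromstring(
    '<p>Some <span style="font-style: italic; color: red">text</span></p>'
)
tweak(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Declarations with no entry in the table are left in the style attribute, so the mapping can be built up incrementally.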

Header promotion

XSweet can convert paragraphs (p elements) in HTML into h1-h6 elements. It uses one of several means to determine which paragraphs receive this treatment. The most robust is to configure it yourself with a styles mapping file, a runtime configuration that you set up yourself (and which can be made sensitive to consistent code points in your inputs). If your data is sufficiently regular, another method may be less onerous: if asked, HTMLevator's header promotion will 'guess' appropriate headers based on a ranking of format (style) attributes in the inputs.
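The 'guessing' approach can be pictured with a small Python sketch. The heuristic here is an assumption made purely for illustration: it ranks paragraphs by an explicit font-size in points, treats the most frequent size as body text, and promotes larger sizes to h1, h2 and so on. The names `font_size` and `promote_headers` are hypothetical, and the actual ranking used by HTMLevator may weigh other style attributes.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

SIZE_RE = re.compile(r"font-size:\s*([0-9.]+)pt")

def font_size(p):
    """Return the paragraph's font size in points, or None if not stated."""
    m = SIZE_RE.search(p.get("style", ""))
    return float(m.group(1)) if m else None

def promote_headers(root):
    """Rename p elements to h1-h6 by font-size rank (assumed heuristic)."""
    paras = [p for p in root.iter("p") if font_size(p) is not None]
    if not paras:
        return root
    # Assumption: the most frequent size is body text; larger sizes are headers.
    body = Counter(font_size(p) for p in paras).most_common(1)[0][0]
    bigger = {font_size(p) for p in paras if font_size(p) > body}
    ranked = sorted(bigger, reverse=True)[:6]  # at most six header levels
    level = {size: i + 1 for i, size in enumerate(ranked)}
    for p in paras:
        size = font_size(p)
        if size in level:
            p.tag = f"h{level[size]}"
    return root

sample = ET.fromstring(
    "<body>"
    '<p style="font-size: 18pt">Title</p>'
    '<p style="font-size: 11pt">Text</p>'
    '<p style="font-size: 14pt">Subtitle</p>'
    '<p style="font-size: 11pt">Text</p>'
    '<p style="font-size: 11pt">Text</p>'
    "</body>"
)
promote_headers(sample)
print([child.tag for child in sample])
```

A heuristic like this only works when the input is formatted consistently, which is why the explicit mapping file is described as the more robust option.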

Structural induction

(What follows remains from original notes as to requirements: see the repository readme for an up-to-date description of the implementation.)

Any sequence of HTML elements that begins with a header (h1-h6) is wrapped as a section. Within each section, the header and the (block-level) elements that travel with it come first, followed by nested sections for the contiguous lower-level headers that come after them.

I.e. h1 h2 h2 h3 h1 h2 h3 becomes section (h1 section (h2) section (h2 section (h3) ) ) section (h1 section (h2 section (h3) ) ).

In

h1
h2
h3
h2
h3
h1
h2
h2
h3

Out

section
  h1
  section
    h2
    section
      h3
  section
    h2
    section
      h3
section
  h1
  section
    h2
  section
    h2
    section
      h3

Notes:

  • Paragraphs and all other elements travel with the immediately preceding header
  • Paragraphs and blocks preceding the first header appear without a section wrapper (before the first section)
  • Hence sequences with no headers are unchanged
  • The logic should apply to 'section' elements as well as to other wrapper elements
    • Hence, a properly sectioned HTML is returned unchanged, but one whose headers do not lead sections is "repaired".
  • When levels are skipped (e.g. h4 appearing before h3), the extra section wrappers should not appear. Such a section is wrapped as if it were at a higher level, although its header still indicates its 'presentation' level.

(Examples: correct, leading with h1; correct, leading with para contents then h1; correct, leading with h3; correct, leading with para contents then h3; skipping levels at the front; skipping levels inside)
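The nesting rules above can be sketched with a simple stack of open sections. This is an illustrative Python sketch, not the project's XSLT; the function names `induce_sections` and `outline` are hypothetical. It handles only a flat run of children and does not descend into existing section or wrapper elements, so it covers the first three notes and the skipped-level rule but not the "repair" case.

```python
import re
import xml.etree.ElementTree as ET

HEADER_RE = re.compile(r"^h([1-6])$")

def induce_sections(body):
    """Wrap a flat run of children into nested section elements, in place."""
    children = list(body)
    for child in children:
        body.remove(child)
    stack = []  # (header level, open section element), outermost first
    for child in children:
        m = HEADER_RE.match(child.tag)
        if m:
            level = int(m.group(1))
            # Close every open section at the same or a deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            parent = stack[-1][1] if stack else body
            section = ET.SubElement(parent, "section")
            stack.append((level, section))
        # Headers and other blocks alike go into the innermost open section;
        # anything before the first header stays directly in the body.
        target = stack[-1][1] if stack else body
        target.append(child)
    return body

def outline(el, depth=0):
    """Return an indented list of tag names, for inspection."""
    lines = []
    for child in el:
        lines.append("  " * depth + child.tag)
        if child.tag == "section":
            lines.extend(outline(child, depth + 1))
    return lines

# The "In" sequence from the example above.
flat = ET.fromstring(
    "<body><h1/><h2/><h3/><h2/><h3/><h1/><h2/><h2/><h3/></body>"
)
induce_sections(flat)
print("\n".join(outline(flat)))
```

Because a new section is attached to whatever section is currently open, a skipped level (say, h4 directly after h1) nests one step down with no empty wrappers in between.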

Section type inferencing

(At time of writing, these requirements are not addressed by HTMLevator.)

Section type inferencing means recognizing, for example, a Conclusions section (by means of its title and/or other properties) and submitting it to appropriate handling -- including validation, to detect whether and where such a section is required, expected or permitted. HTMLevator currently does not provide for section type inferencing, except to note that it is a natural requirement and one that can be readily accomplished in this architecture.

On HTML files whose section levels are regularly and systematically indicated by a "regular order" of headers, the XSLT provided here will reliably create a nested section structure.

For future development - validation of structures/content types

Once structures have been induced (inferred or projected over the element sequence), they need to be validated against rule sets appropriate to their workflows.

For example, a journal may have validation rules such as these:

  • There must be top-level sections entitled "Introduction", "Methods and Materials", "Conclusion[s]" and "Bibliography". ("Conclusion[s]" means the 's' is optional.)
  • They must appear in that order.
  • An "Acknowledgements" section may optionally appear after "Conclusion[s]" but before "Bibliography".
  • Sections at lower levels may be named anything except the names given.
  • No section name can be repeated at the top level. (It's okay to repeat subsection names as long as they avoid the top-level section names.)
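To make the example rules concrete, here is a minimal sketch of how they might be checked once section titles have been extracted. It is written in Python for illustration only; the constants and the `validate` function are hypothetical, and the sketch assumes that "the names given" covers all five titles, including "Acknowledgements".

```python
import re

# Required top-level titles, in order; "Conclusions?" makes the 's' optional.
REQUIRED = ["Introduction", "Methods and Materials", "Conclusions?", "Bibliography"]
OPTIONAL = "Acknowledgements"
RESERVED = REQUIRED + [OPTIONAL]  # assumption: all five names are reserved

def first_index(pattern, titles):
    """Return the index of the first title fully matching pattern, or None."""
    for i, title in enumerate(titles):
        if re.fullmatch(pattern, title):
            return i
    return None

def validate(top_titles, sub_titles):
    """Return a list of error messages; an empty list means valid."""
    errors = []
    found = []
    for pattern in REQUIRED:
        i = first_index(pattern, top_titles)
        if i is None:
            errors.append(f"missing required section: {pattern}")
        else:
            found.append(i)
    if found != sorted(found):
        errors.append("required sections are out of order")
    ack = first_index(OPTIONAL, top_titles)
    if ack is not None:
        concl = first_index("Conclusions?", top_titles)
        bib = first_index("Bibliography", top_titles)
        if concl is None or bib is None or not concl < ack < bib:
            errors.append("Acknowledgements is misplaced")
    if len(set(top_titles)) != len(top_titles):
        errors.append("repeated top-level section name")
    for title in sub_titles:
        if any(re.fullmatch(p, title) for p in RESERVED):
            errors.append(f"reserved name used below top level: {title}")
    return errors

good = ["Introduction", "Methods and Materials", "Conclusion",
        "Acknowledgements", "Bibliography"]
print(validate(good, ["Sampling", "Results"]))
```

Keeping the rules in plain data (the lists at the top) is one way to leave room for the configurability mentioned below.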

The specific validation rules enforced by HTMLevator are TBD, based on requirements. We also expect to face a requirement to make these rules configurable. This may be done via a "driver" config file or a meta-stylesheet (a la Schematron).

For now, we presume we will use XPath within XSLT to test against constraints, piping validation results directly into outputs and/or into a separate report (TBD).
