# Header Promotion Logic

In `digest-paragraphs.xsl`, the paragraphs of a document (as produced by XSweet extraction) are passed through a sequence of operations that determine whether and where to promote certain paragraphs into HTML **h1** through **h6**.

The actual conversion is performed by a separate stylesheet, which is produced by filtering the results of this initial transform.
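What that generated stylesheet does to the document can be approximated in a few lines of Python. This is a minimal sketch, not XSweet's implementation: `promote` and its `levels` argument (a mapping from paragraph class to header level) are hypothetical, and the sketch only renames matching `p` elements in the XHTML namespace, leaving their attributes and content as they are.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def promote(root, levels):
    """Rename each <p> whose class appears in `levels` to <h1>..<h6>.

    `levels` is a hypothetical {class: level} mapping; in XSweet the equivalent
    information comes from the results of digest-paragraphs.xsl.
    Attributes (including @class) and content are left untouched.
    """
    # Materialize the list first so renaming elements cannot disturb iteration.
    for p in list(root.iter(f"{{{XHTML_NS}}}p")):
        level = levels.get(p.get("class"))
        if level is not None and 1 <= level <= 6:
            p.tag = f"{{{XHTML_NS}}}h{level}"
    return root


doc = ET.fromstring(
    '<body xmlns="http://www.w3.org/1999/xhtml">'
    '<p class="SectionHeading">Introduction</p>'
    '<p class="FreeForm">Body text.</p>'
    "</body>"
)
promote(doc, {"SectionHeading": 2})
print([el.tag.split("}")[1] for el in doc])  # ['h2', 'p']
```

Doing the renaming in a separate pass, as XSweet does with its generated stylesheet, keeps the analysis and the rewrite independent: the mapping can be inspected or adjusted before any document is changed.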

The initial transform also produces intermediate results alongside its final results, so that each step can be inspected.

This intermediate output takes the form of a single `body` element. It presents the results in reverse order: the first element shown in this HTML (`div[@class='grouped']`) is the result of the *last* step of the following sequence:

* Proxy (paragraph contents are discarded, but certain properties are captured)
* Measure (proxies are annotated with information about their occurrences together)
* Assimilate (proxies are gathered into groups and counted, with certain properties averaged across each group)
* Filter (only proxies that are sufficiently header-like are kept)
* Group (header levels are assigned by sorting on properties)
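The five steps above can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not a port of `digest-paragraphs.xsl`: the `Para` input type, the thresholds in `is_header_like`, and the choice to rank header levels by font size alone are all hypothetical stand-ins for the stylesheet's actual criteria. The summary fields are named after the `data-*` attributes in the sample output.

```python
from dataclasses import dataclass
from itertools import groupby
from statistics import mean, mode


@dataclass
class Para:
    cls: str         # paragraph style class, e.g. "SectionHeading" (hypothetical input type)
    fontsize: float  # nominal font size
    text: str


@dataclass
class Proxy:
    cls: str
    fontsize: float
    length: int
    allcaps: bool
    lastchar: str
    run: int = 1     # set by measure()


def proxy(paras):
    """Proxy: discard paragraph text, keeping only a few properties of it."""
    return [Proxy(p.cls, p.fontsize, len(p.text), p.text.isupper(), p.text[-1:])
            for p in paras]


def measure(proxies):
    """Measure: record the length of each run of consecutive same-class proxies."""
    for _, run in groupby(proxies, key=lambda q: q.cls):
        members = list(run)
        for q in members:
            q.run = len(members)
    return proxies


def assimilate(proxies):
    """Assimilate: one summary per class, with a count and averaged properties."""
    by_class = {}
    for q in proxies:
        by_class.setdefault(q.cls, []).append(q)
    return [
        {
            "class": cls,
            "fontsize": mode(q.fontsize for q in qs),  # most common size in the group
            "count": len(qs),
            "average_length": mean(q.length for q in qs),
            "average_run": mean(q.run for q in qs),
            "always_caps": all(q.allcaps for q in qs),
            "never_fullstop": all(q.lastchar != "." for q in qs),
        }
        for cls, qs in by_class.items()
    ]


def is_header_like(g, max_length=200, max_run=2):
    """Filter: HYPOTHETICAL thresholds; the real tests live in digest-paragraphs.xsl."""
    return (g["never_fullstop"]
            and g["average_length"] <= max_length
            and g["average_run"] <= max_run)


def group_levels(summaries):
    """Group: rank by font size (an assumption), largest = h1, at most six levels."""
    sizes = sorted({g["fontsize"] for g in summaries}, reverse=True)[:6]
    level_of = {size: i + 1 for i, size in enumerate(sizes)}
    return {g["class"]: level_of[g["fontsize"]]
            for g in summaries if g["fontsize"] in level_of}


def promote_plan(paras):
    """Run all five steps and return {paragraph class: header level}."""
    summaries = assimilate(measure(proxy(paras)))
    return group_levels([g for g in summaries if is_header_like(g)])


doc = [
    Para("Title", 18, "Header Promotion"),
    Para("SectionHeading", 14, "Background"),
    Para("FreeForm", 12, "Body text ends with a full stop."),
    Para("FreeForm", 12, "So does this paragraph."),
    Para("SectionHeading", 14, "Method"),
    Para("FreeForm", 12, "More body text."),
]
print(promote_plan(doc))  # {'Title': 1, 'SectionHeading': 2}
```

With this toy input, the `FreeForm` class is dropped at the filter step because its paragraphs end in full stops, and the two remaining classes are ranked by font size. Note that the sketch decides per class, not per paragraph, which matches the class-keyed proxies in the sample output.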

```xml
<body xmlns="http://www.w3.org/1999/xhtml">
  <div class="grouped">
    <div class="hX"> <!-- one or more of these: indicates which elements to cast to a given header level -->
      <p class="SectionHeading" data-nominal-fontsize="12" data-count="1" data-average-length="73"
        data-average-run="1" data-always-caps="false" data-never-fullstop="true"/>
    </div>
  </div>
  <div class="filtered"><!-- proxies remaining for elements to be cast -->
    <p class="SectionHeading" data-nominal-fontsize="12" data-count="1" data-average-length="73"
      data-average-run="1" data-always-caps="false" data-never-fullstop="true"/>
  </div>
  <div class="assimilated"><!-- proxies grouped and counted, with their analytic results -->
    <p class="FreeForm" data-nominal-fontsize="12" data-count="60" data-average-length="910.52"
      data-average-run="44.7" data-always-caps="false" data-never-fullstop="false"/>
    <!-- etc. -->
  </div>
  <div class="measured"><!-- proxy copies of paragraphs, with certain properties externalized -->
    <p data-lastchar=" " data-allcaps="false" data-length="541" class="FreeForm"/>
    <!-- etc. -->
  </div>
</body>
```
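Because the steps appear last-first, a script reading this output has to reverse them. The following is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`; `pipeline_steps` and `header_assignments` are hypothetical helpers, and the sketch assumes that each child of `div.grouped` carries a concrete level class such as `h1` or `h2` where the sample shows the placeholder `hX`.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def pipeline_steps(body):
    """Div classes in execution order (the output lists the last step first)."""
    return [div.get("class") for div in reversed(body.findall(f"{XHTML}div"))]


def header_assignments(body):
    """Map each paragraph class to the level div that contains it in div.grouped.

    Assumes level divs carry classes like "h1"/"h2" (shown generically as "hX").
    """
    grouped = body.find(f"{XHTML}div[@class='grouped']")
    result = {}
    if grouped is None:
        return result
    for level_div in grouped.findall(f"{XHTML}div"):
        for p in level_div.findall(f"{XHTML}p"):
            result[p.get("class")] = level_div.get("class")
    return result


# Abbreviated copy of the sample output, with "h2" standing in for "hX".
sample = """<body xmlns="http://www.w3.org/1999/xhtml">
  <div class="grouped"><div class="h2"><p class="SectionHeading" data-count="1"/></div></div>
  <div class="filtered"><p class="SectionHeading" data-count="1"/></div>
  <div class="assimilated"><p class="FreeForm" data-count="60"/></div>
  <div class="measured"><p class="FreeForm" data-length="541"/></div>
</body>"""

body = ET.fromstring(sample)
print(pipeline_steps(body))      # ['measured', 'assimilated', 'filtered', 'grouped']
print(header_assignments(body))  # {'SectionHeading': 'h2'}
```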

The XSLT source, [digest-paragraphs.xsl](https://gitlab.coko.foundation/wendell/XSweet/blob/master/applications/header-promote/digest-paragraphs.xsl), also contains comments documenting the process.