The Word doc for chapter 1 of the Berry book - "b01_Chapter1" - shows the "th" part of "13th and 18th" as superscripts in the 1st paragraph below the heading.
After the initial extraction, the "th"s are wrapped inside <vertalign> tags. The scrub step changes each <vertalign> to a span, and the join elements step then merges that span into the surrounding p tag. So the vertaligns disappear and the superscripts do not come through into the HTML.
It looks like Word uses the vertalign for superscripts and probably subscripts too. Can we catch this and carry it over into the HTML? I wonder if there are other ways Word implements subscripts and superscripts.
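
One possible approach, as a rough sketch only: rename the <vertalign> elements to sup/sub before the scrub step gets a chance to turn them into spans. I don't know what our extracted <vertalign> elements actually look like, so the "val" attribute and its "superscript"/"subscript" values below are assumptions (modelled on how Word names them), and convert_vertalign is a hypothetical helper, not something that exists in the pipeline today.

```python
import xml.etree.ElementTree as ET

# Assumed mapping: the attribute name "val" and these values are guesses
# about the extracted markup, not confirmed against real output.
VERTALIGN_TO_HTML = {"superscript": "sup", "subscript": "sub"}


def convert_vertalign(root):
    """Rename <vertalign> elements to <sup>/<sub> in place (hypothetical helper)."""
    # Materialize the list first so renaming tags cannot disturb iteration.
    for el in list(root.iter("vertalign")):
        tag = VERTALIGN_TO_HTML.get(el.get("val", ""))
        if tag is None:
            continue  # unknown or baseline alignment: leave for later steps
        el.tag = tag
        el.attrib.pop("val", None)
    return root


sample = (
    '<p>the 13<vertalign val="superscript">th</vertalign> and '
    '18<vertalign val="superscript">th</vertalign> centuries</p>'
)
root = convert_vertalign(ET.fromstring(sample))
html = ET.tostring(root, encoding="unicode")
print(html)
```

If the scrub step already leaves sup and sub elements alone, running something like this just before it should be enough for the superscripts to survive into the HTML. That would need checking against the real extraction output.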