Consider adding another bit of "scrub" logic to scrub.xsl
HTML converted from .docx often shows that formatting was applied at the inline level rather than the paragraph level, so we get things like:
<p>
<span style="font-size: 18">A: Nobody can beat me! I am the best showman in the whole history of man. </span>
</p>
We might consider removing the span and promoting its properties to the p.
Don't do this when there's a @class collision (both the p and the span carry a @class); also think through how @style should be handled.
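
A minimal sketch of what such a rule could look like, written as a standalone XSLT 1.0 stylesheet rather than as a patch to scrub.xsl. It makes several assumptions not stated above: the elements are in no namespace (an XHTML-namespaced input would need a prefix or a default XPath namespace), "promotable" means the p has exactly one element child, a span, and no non-whitespace text of its own, and the @style question is answered conservatively by skipping any p that already has its own @style alongside the span's.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: copy anything not matched by a more specific rule. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumed promotion rule:
       - the p has exactly one element child, and it is a span
       - the p has no non-whitespace text of its own
       - no @class collision (p and span do not both carry @class)
       - no @style collision (conservative choice; merging is left open) -->
  <xsl:template match="p[count(*) = 1][span]
                        [not(text()[normalize-space()])]
                        [not(@class and span/@class)]
                        [not(@style and span/@style)]">
    <xsl:copy>
      <!-- Attributes must be written before any child content. -->
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="span/@*"/>
      <!-- Unwrap the span: keep its contents, drop the wrapper. -->
      <xsl:apply-templates select="span/node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the example above, this should yield a single `<p style="font-size: 18">` holding the text directly, with the span gone. Other attributes that could collide (for example @id) are not guarded here; if both elements carry the same attribute, the span's value would replace the p's, so a real rule would likely need a broader check. Comments or processing instructions sitting directly inside the p would also be dropped by this sketch.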