Inline italics lost because "font-style: italic" applied at paragraph level
In Powell00, the heading “[b]The Hittite-Hurrian Kingship in Heaven and The Song of Ullikummi” contains inline italics that don’t come through, because the paragraph has font-style: italic applied to it. As a result, the whole line renders in italics in the browser, and the distinction between the normal text and the italic text is hidden. Removing the paragraph-level italic styling makes the inline italics visible again.
Everything in the paragraph is contained within a series of spans; there is no text in this <p> that's not enclosed by another tag. In Word, the text in the spans that don't specify italics comes through as normal, unitalicized text. But in the extracted version, the entire line is italicized, because font-style is an inherited CSS property: a span that doesn't set its own font-style picks up italic from the enclosing <p>. Put a different way, in Word, the spans in the paragraph that specify style seem to override all the paragraph-level style information, but in the extracted HTML, they don't.
Is there a good way to correct for this?
This is how it's extracted:
<p style="font-family: Times New Roman; font-style: italic; color: #19191B">
<span style="font-family: Times New Roman; font-weight: bold; color: #19191B">
<b>[b]</b>
</span>
<span style="font-family: Times New Roman; color: #19191B">The Hittite-Hurrian
</span>
<span style="font-family: Times New Roman; font-style: italic; color: #19191B">
<i>Kingship in Heaven </i>
</span>
<span style="font-family: Times New Roman; color: #19191B">and </span>
<span style="font-family: Times New Roman; font-style: italic; color: #19191B">
<i>The Song of Ullikummi</i>
</span>
</p>
It's also interesting to note that this paragraph gets promoted to a header, so the bolding, while preserved, is not apparent in a browser, since browsers render headers in bold by default (it is apparent if you change this back to a <p> again though).
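One possible way to correct for the paragraph-level italics is a post-processing pass over the extracted HTML that drops font-style: italic from <p> style attributes and leaves the spans alone. The following is only a rough sketch, not the extractor's actual code: the function name strip_paragraph_italic is hypothetical, and it assumes (as in the example above) that the style attribute is double-quoted and that all text in the paragraph sits inside spans carrying their own styling.

```python
import re

# Matches the style attribute of a <p> tag only (not <span>, <pre>, etc.).
P_STYLE = re.compile(r'(<p\b[^>]*?\bstyle=")([^"]*)(")', re.IGNORECASE)


def strip_paragraph_italic(html: str) -> str:
    """Remove 'font-style: italic' from <p> style attributes.

    Hypothetical helper: span-level styles are left untouched, so spans
    that explicitly ask for italics keep them.
    """
    def fix(match: re.Match) -> str:
        # Split the style value into individual declarations.
        declarations = [d.strip() for d in match.group(2).split(";") if d.strip()]
        # Keep everything except the italic declaration.
        kept = [
            d for d in declarations
            if d.replace(" ", "").lower() != "font-style:italic"
        ]
        return match.group(1) + "; ".join(kept) + match.group(3)

    return P_STYLE.sub(fix, html)
```

If a paragraph could also contain bare text outside any span, this approach would lose that text's italics; in that case, adding font-style: normal to the spans that don't specify a font-style might be the safer fix.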