"font-family" and special character handling
In the "a3_Bakker_ack.docx" document, special characters are displayed in a different font from the rest of the text. The characters I've found causing the issue are:
- é
- í
- ¡ (inverted exclamation mark)
- “ and ”
- ‘ and ’
These characters (surely not an exhaustive list) each seem to get their own span, explicitly set to Helvetica:
<span style="font-family: Helvetica; font-size: 12pt">“</span>
As a result, they also split each paragraph into several sequential spans rather than just one. Removing the font-family declaration from these spans makes them render in the same font as the rest of the text. The original document was set in Helvetica.
The fact that the original was in Helvetica isn't semantically important, and I see it already gets stripped from most of the text. Could the same be done for these and other special characters, removing their overly specific font information? That might also keep paragraphs from being chopped into so many spans.
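To illustrate the request, here is a minimal post-processing sketch. It assumes the converter's output is well-formed XHTML and that each paragraph is a flat `<p>` of `<span>` elements. The function names `strip_font_family` and `clean_paragraph` are hypothetical and not part of any converter's API. It first drops the `font-family` declaration from every span, then merges adjacent spans whose remaining attributes are identical.

```python
import xml.etree.ElementTree as ET


def strip_font_family(style: str) -> str:
    """Drop any font-family declaration, keeping the others in order."""
    decls = [d.strip() for d in style.split(";") if d.strip()]
    kept = [d for d in decls if d.split(":", 1)[0].strip().lower() != "font-family"]
    return "; ".join(kept)


def clean_paragraph(p: ET.Element) -> None:
    """Hypothetical cleanup: strip font-family, then merge identical neighbours."""
    # Step 1: remove font-family from every span (drop the attribute if empty).
    for span in p.iter("span"):
        style = strip_font_family(span.get("style", ""))
        if style:
            span.set("style", style)
        else:
            span.attrib.pop("style", None)

    # Step 2: merge adjacent sibling spans that now have identical attributes,
    # provided no text sits between them and neither has child elements.
    children = list(p)
    i = 0
    while i < len(children) - 1:
        cur, nxt = children[i], children[i + 1]
        if (
            cur.tag == nxt.tag == "span"
            and cur.attrib == nxt.attrib
            and not cur.tail
            and len(cur) == 0
            and len(nxt) == 0
        ):
            cur.text = (cur.text or "") + (nxt.text or "")
            cur.tail = nxt.tail
            p.remove(nxt)
            children.pop(i + 1)
        else:
            i += 1


if __name__ == "__main__":
    html = (
        '<p><span style="font-size: 12pt">He said </span>'
        '<span style="font-family: Helvetica; font-size: 12pt">“</span>'
        '<span style="font-size: 12pt">hola</span>'
        '<span style="font-family: Helvetica; font-size: 12pt">”</span></p>'
    )
    p = ET.fromstring(html)
    clean_paragraph(p)
    # Prints: <p><span style="font-size: 12pt">He said “hola”</span></p>
    print(ET.tostring(p, encoding="unicode"))
```

Spans are only merged when nothing separates them, so whitespace or text between spans is never lost; a real fix inside the converter would more likely avoid emitting the extra spans in the first place.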