Incorrect font family applied to a paragraph
See Best, References:
Almost all the references are in p or span elements that specify font-family: Arial. But there is one entry that displays as Times New Roman in the html.
This snippet from the final html (rinsed) shows 2 bibliography entries. The first entry displays (incorrectly) as Times New Roman in the browser, while the second is a typical entry that displays correctly:
<p style="margin-left: 36pt; padding-left: 36pt; text-indent: -36pt">Aud, Susan, William, Hussar, Frank Johnson, Grace Kena, Erin Roth, Eileen Manning, Xiaolel Wang, and Jijun Zhang. 2012. <i>The Condition of Education 2012</i>. Washington: National Center for Education Statistics. (http://nces.ed.gov/pubs2012/2012045.pdf--retrieved May 15, 2013)<span style="font-family: Times New Roman; font-size: 10.5pt"> </span></p>
<p style="font-family: Arial; font-size: 12pt; margin-left: 36pt; padding-left: 36pt; text-indent: -36pt">Aud, Susan, William, Hussar, Grace Kena, Kevin Bianco, Lauren Frohlich, Jana Kemp, and Kim Tahan. 2011. <i>The Condition of Education 2011</i>. Washington: National Center for Education Statistics. (http://nces.ed.gov/pubs2011/2011033.pdf--retrieved May 16, 2013). </p>
The correct (second) entry above specifies font-family: Arial in the style attribute of its p tag, but the first one doesn't.
Looking at the initial extraction shows the reason. While the paragraph consists mostly of spans with style="font-family: Arial", there's one empty span at the very end with style="font-family: Times New Roman":
<p style="margin-left: 36pt; text-indent: -36pt; padding-left: 36pt">
<span style="font-family: Arial; font-size: 12pt">Aud, S</span>
<span style="font-family: Arial; font-size: 12pt">usan, William, Hussar, Frank</span>
<span style="font-family: Arial; font-size: 12pt"></span>
<span style="font-family: Arial; font-size: 12pt">Johnson, Grace Kena, Erin Roth, Eileen Manning, Xiaolel Wang,
</span>
<span style="font-family: Arial; font-size: 12pt">and
</span>
<span style="font-family: Arial; font-size: 12pt">Jijun Zhang.
</span>
<span style="font-family: Arial; font-size: 12pt">2012.</span>
<span style="font-family: Arial; font-size: 12pt"></span>
<span style="font-family: Arial; font-size: 12pt"></span>
<span style="font-family: Arial; font-size: 12pt">
<i>The Condition of Education 2012</i>
</span>
<span style="font-family: Arial; font-size: 12pt">. Washington:
</span>
<span style="font-family: Arial; font-size: 12pt">National Center for Education Statistics.
</span>
<span style="font-family: Arial; font-size: 12pt">
(</span>
<span style="font-family: Arial; font-size: 12pt">http://nces.ed.gov/pubs2012/2012045.pdf--retrieved May 15</span>
<span style="font-family: Arial; font-size: 12pt">, 2013)</span>
<span style="font-family: Times New Roman; font-size: 10.5pt"></span>
</p>
It seems that the presence of more than one font family among the spans is enough to keep font-family from being added to the paragraph style, even though the only span that uses Times New Roman contains no text.
Can you update this to ignore fonts that are applied to empty tags? That would keep invisible tags sprinkled into a Word doc from causing incorrect fonts in the html.
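To make the request concrete, here is a minimal Python sketch of the behavior I have in mind. It is not the tool's actual code: the names font_family, SpanFontCollector, and common_font are hypothetical, and it assumes the paragraph font is only set when every span agrees on one family. The only point is that spans with no visible text (empty or whitespace-only) are skipped when collecting fonts.

```python
from html.parser import HTMLParser


def font_family(style):
    """Return the font-family value from an inline style string, or None."""
    for declaration in (style or "").split(";"):
        name, _, value = declaration.partition(":")
        if name.strip().lower() == "font-family":
            return value.strip()
    return None


class SpanFontCollector(HTMLParser):
    """Hypothetical helper: collect fonts of spans that contain visible text."""

    def __init__(self):
        super().__init__()
        self.stack = []     # one [font, has_text] entry per open span
        self.fonts = set()  # fonts used by spans with visible text

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.stack.append([font_family(dict(attrs).get("style")), False])

    def handle_data(self, data):
        # Whitespace-only text does not count as visible content.
        if data.strip():
            for entry in self.stack:
                entry[1] = True

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            font, has_text = self.stack.pop()
            if font and has_text:
                self.fonts.add(font)


def common_font(paragraph_html):
    """Return the single font shared by all non-empty spans, else None."""
    collector = SpanFontCollector()
    collector.feed(paragraph_html)
    collector.close()
    return collector.fonts.pop() if len(collector.fonts) == 1 else None
```

With the problem paragraph above, the trailing Times New Roman span has no visible text, so only Arial would be collected and it could be moved up into the paragraph style. A paragraph whose non-empty spans really do use different fonts would still return None and be left alone.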