Ignore all character style information from the paragraph level
Some paragraphs that look like normal text in Word are being extracted as bold, which has the added effect of sabotaging header promotion.
Here are 2 examples of where the extraction goes wrong:
bBowenChapter1.docx outputs_bowen_ch1.zip
Bad bolding:
- "In the second half of the 19th century..."
- "In a modern global food system characterized..."
- "When people want to show how protecting terroir..."
bBowenChapter3.docx outputs_bowen_ch3.zip
Bad bolding:
- "The standard that regulates tequila production..."
- "As I discuss later in this chapter, several..."
- "Tequila’s regulatory infrastructure sounds like something..."
- "The main premise behind any DO is..."
In both of these cases, the bad bolding prevents almost all of the headings that should be promoted from being caught. Headers are denoted with bold. If you look at the digest-paragraph output, you'll see that the p style is correctly included in the assimilated list of discrete styles, but it doesn't make it through into the "filtered" group: the improperly bolded paragraphs skew its data-average-length too high (120+ characters on average), so the style is dropped from consideration.
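To illustrate the skew described above, here is a minimal sketch of an average-length filter. It is not the tool's actual code: the function names, the constant, and the sample data are all hypothetical, and the only detail taken from this report is the 120-character figure.

```python
from statistics import mean

# Hypothetical limit, based on the 120-character figure mentioned in this report.
MAX_AVG_HEADING_LENGTH = 120


def average_length(paragraphs):
    """Mean character count of the paragraphs that share one style."""
    return mean(len(text) for text in paragraphs)


def passes_length_filter(paragraphs, limit=MAX_AVG_HEADING_LENGTH):
    """True when the style's average paragraph length stays under the limit."""
    return average_length(paragraphs) < limit


# Invented placeholder data: three short headings in the bold style.
headings = ["h" * 15, "h" * 20, "h" * 25]

# Invented placeholder data: two body paragraphs wrongly extracted as bold.
misbolded = ["b" * 400, "b" * 400]

clean_ok = passes_length_filter(headings)               # average is 20
skewed_ok = passes_length_filter(headings + misbolded)  # average is 172
```

With only the real headings the style stays under the limit, but two long misbolded paragraphs are enough to push the average over it, and the whole style (headings included) would be dropped.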
So, fixing the underlying incorrect bolding issue will kill two birds with one stone. I will also keep an eye out for other places where a "<120char" rule stops correct promotions.
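One possible shape for the fix, as a rough sketch only. It assumes (unconfirmed for these files) that the stray bold comes from paragraph-level run properties, i.e. `w:pPr/w:rPr`, which in WordprocessingML format the paragraph mark rather than the visible text. The helper names and the sample XML are hypothetical, and the sketch deliberately skips style inheritance.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def is_on(element):
    """A toggle such as <w:b/> is on unless w:val explicitly turns it off."""
    if element is None:
        return False
    return element.get(f"{{{W}}}val", "true") not in ("0", "false", "off")


def run_text(run):
    """Concatenated text of all <w:t> children of a run."""
    return "".join(t.text or "" for t in run.findall("w:t", NS))


def paragraph_is_bold(p):
    """Bold only if every non-empty run carries its own <w:b/>.

    Paragraph-level <w:pPr>/<w:rPr> is never consulted.
    """
    runs = [r for r in p.findall("w:r", NS) if run_text(r).strip()]
    return bool(runs) and all(is_on(r.find("w:rPr/w:b", NS)) for r in runs)


# Hypothetical sample: the first paragraph is bold only at the paragraph level,
# the second has bold set on the run itself.
SAMPLE = f"""
<w:body xmlns:w="{W}">
  <w:p>
    <w:pPr><w:rPr><w:b/></w:rPr></w:pPr>
    <w:r><w:t>Body text whose paragraph mark happens to be bold.</w:t></w:r>
  </w:p>
  <w:p>
    <w:r><w:rPr><w:b/></w:rPr><w:t>A real heading</w:t></w:r>
  </w:p>
</w:body>
""".strip()

body = ET.fromstring(SAMPLE)
flags = [paragraph_is_bold(p) for p in body.findall("w:p", NS)]
```

Here `flags` is `[False, True]`: the body paragraph is no longer treated as bold, while the heading with run-level bold still is.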