Capitalization & the `<caps>` tag
For the Bakker TOC, it looks like the author used both bolding and all caps to denote some structural elements (h1 & h2). Once it's been transformed, the caps text is carried over as `<caps>`, while the chapter headings are `…`. These look the same in my browser.
Since the extracted HTML has upper and lower case letters and it's tagged as `<caps>`, I'm assuming the author typed it in normally, then chose to upcase it all after the fact. I'm guessing that if the author had simply typed in caps in the first place, we wouldn't get the nice `<caps>` tag.
This seems good to keep in mind for post-processing and suggesting structure: pay attention to all caps, and really pay attention to `<caps>`.
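
As a rough illustration of that post-processing idea, here is a minimal Python sketch that scans extracted HTML and collects two kinds of heading candidates: text inside a `<caps>` element, and text that was typed in all caps. The class name `CapsHeadingFinder`, the function `find_heading_candidates`, the signal labels, and the sample markup are all made up for this example; the real extracted HTML may nest things differently, so treat this as a starting point rather than the actual pipeline.

```python
from html.parser import HTMLParser


class CapsHeadingFinder(HTMLParser):
    """Collect heading candidates: <caps> runs and text typed in all caps.

    Hypothetical helper for illustration; not part of any existing tool.
    """

    def __init__(self):
        super().__init__()
        self.caps_depth = 0   # greater than 0 while inside a <caps> element
        self.candidates = []  # (signal, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names and accepts non-standard tags.
        if tag == "caps":
            self.caps_depth += 1

    def handle_endtag(self, tag):
        if tag == "caps" and self.caps_depth:
            self.caps_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.caps_depth:
            # Strong signal: the author applied all-caps formatting.
            self.candidates.append(("caps-tag", text))
        elif text.isupper():
            # Weaker signal: the text itself was typed in capitals.
            self.candidates.append(("typed-caps", text))


def find_heading_candidates(html):
    """Return (signal, text) pairs for likely structural elements."""
    finder = CapsHeadingFinder()
    finder.feed(html)
    finder.close()
    return finder.candidates


# Made-up sample markup, not taken from the Bakker file.
sample = "<p><caps>Part One</caps></p><p>CHAPTER ONE</p><p>Plain body text.</p>"
for signal, text in find_heading_candidates(sample):
    print(signal, text)
```

Keeping the two signals separate lets a later step weight `<caps>` more heavily than plain typed capitals, which matches the hunch above that the tag reflects a deliberate formatting choice.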