Fallback for unknown inline elements
XSweet currently emits elements inside <p> by passing through everything from the Word source except elements known in advance, which can be safely discarded or cast to HTML equivalents. Since this "known subset" of Word semantics covers 99%+ of cases in the wild, we rarely get invalid HTML as a result (that is, HTML containing an element from the Word source that has no meaning in HTML and hasn't already been cast). This is by design: we would prefer to see these elements (permitting us to trap them and extend coverage to them) than to pretend they weren't there. (Some day we can do a comprehensive survey of WordML and reduce the set of unknowns to zero, but until then, unknowns will occasionally come through.)
For the Editoria filter, such unknown/non-HTML elements may or may not be acceptable, so we need a rule for "things not known about in advance". The solution may be a fallback: "when there is no mapping, produce a flag". It would consult a "white list" of element types permissible in Editoria (strong, em, etc.). The form of the flag produced (span.unexpected-X?) needs to be defined in advance.
For simplicity, it would be good to have the same rule for unknown paragraphs, headers, list items and any other mixed content.
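
As a rough illustration of the fallback rule, here is a minimal sketch in Python (not XSweet's own XSLT, and not a proposed implementation). It assumes the flag takes the form span.unexpected-X, which is still an open question above. The whitelist contents and the function name flag_unknown_inline are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: the real set of element types permitted in
# Editoria is not specified here, so these names are placeholders.
ALLOWED_INLINE = {"strong", "em", "span", "a", "sub", "sup"}


def flag_unknown_inline(paragraph, allowed=ALLOWED_INLINE):
    """Rewrite, in place, each descendant not in `allowed` as
    <span class="unexpected-NAME">, keeping its text and children."""
    for el in paragraph.iter():
        if el is paragraph:
            continue  # the containing <p> itself is not checked here
        # ElementTree stores namespaced tags as "{uri}local"; keep the local part.
        local_name = el.tag.rsplit("}", 1)[-1]
        if local_name not in allowed:
            # Assumption: the element's original attributes are dropped so
            # that the output span carries only the flag class.
            el.attrib.clear()
            el.tag = "span"
            el.set("class", f"unexpected-{local_name}")
    return paragraph


p = ET.fromstring("<p>Some <strong>bold</strong> and <smartTag>odd</smartTag> text</p>")
print(ET.tostring(flag_unknown_inline(p), encoding="unicode"))
# prints: <p>Some <strong>bold</strong> and <span class="unexpected-smartTag">odd</span> text</p>
```

Because the original element name survives in the class value, unknowns stay visible downstream and can be trapped later. The same check could be applied to paragraphs, headers and list items by swapping in a different whitelist and wrapper element.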