Among these, the choice turns out to be fairly clear. Only a made-for-purpose language designed specifically to expose this information (the "made for purpose" tagging shown) even comes close. Among document formats, HTML5 is the clear winner as an initial target for a Word Extractor. (Note the word *initial*: we say nothing here about what we might eventually improve it into.)
This leaves it to the editorial team to do what is really important. That is, rather than impose or infer any (purported) "semantics", we regard the proper function of the extraction process as reflecting the distinctions already given in the WordML source, in whatever form they are given. These distinctions, being the necessary points of semantic inflection, may provide a basis on which semantics (whether of labels or of structural relations) can then be exposed and expressed by an *editorial* process.
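As a rough illustration of this principle, here is a Python sketch. It is ours, not the extractor's actual implementation. It walks WordprocessingML of the kind found in a `.docx` file's `word/document.xml` and carries across only what the source states outright:

* Paragraph style names (`w:pStyle`) and character style names (`w:rStyle`) become `class` attributes.
* The italic and bold toggles (`w:i`, `w:b`) become `<i>` and `<b>`.

The function names, the sample fragment, and the decision to map style names directly onto `class` are assumptions made for illustration. Real documents also carry hyperlinks, fields, tables and more, which this sketch ignores.

```python
import html
import xml.etree.ElementTree as ET

# WordprocessingML namespace, as used in word/document.xml inside a .docx.
# Sketch only: function names here are illustrative, not the project's code.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def is_on(prop):
    """Toggle properties such as w:i may be switched off with w:val="0"/"false"."""
    return prop is not None and prop.get(W + "val", "true") not in ("0", "false", "off")


def run_to_html(run):
    """Render one w:r, passing through only distinctions the source states."""
    text = html.escape("".join(t.text or "" for t in run.iter(W + "t")))
    rpr = run.find(W + "rPr")
    if rpr is None:
        return text
    rstyle = rpr.find(W + "rStyle")
    if rstyle is not None:
        # A named character style is kept as a class, not interpreted.
        text = f'<span class="{html.escape(rstyle.get(W + "val", ""))}">{text}</span>'
    if is_on(rpr.find(W + "i")):
        text = f"<i>{text}</i>"
    if is_on(rpr.find(W + "b")):
        text = f"<b>{text}</b>"
    return text


def paragraph_to_html(p):
    """Render one w:p; its paragraph style, if any, becomes a class."""
    style = p.find(f"{W}pPr/{W}pStyle")
    cls = f' class="{html.escape(style.get(W + "val", ""))}"' if style is not None else ""
    # Direct-child runs only; runs inside w:hyperlink etc. are out of scope here.
    body = "".join(run_to_html(r) for r in p.findall(W + "r"))
    return f"<p{cls}>{body}</p>"


def extract(document_xml):
    """Map every w:p in the given WordML fragment to an HTML5 <p>."""
    root = ET.fromstring(document_xml)
    return "\n".join(paragraph_to_html(p) for p in root.iter(W + "p"))


SAMPLE = """<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Sources</w:t></w:r></w:p>
  <w:p>
    <w:r><w:t xml:space="preserve">See </w:t></w:r>
    <w:r><w:rPr><w:i/></w:rPr><w:t>Moby-Dick</w:t></w:r>
    <w:r><w:t xml:space="preserve"> and </w:t></w:r>
    <w:r><w:rPr><w:i/></w:rPr><w:t>passim</w:t></w:r>
    <w:r><w:t>.</w:t></w:r>
  </w:p>
</w:body>"""

print(extract(SAMPLE))
# <p class="Heading1">Sources</p>
# <p>See <i>Moby-Dick</i> and <i>passim</i>.</p>
```

Both italic runs come out as the same plain `<i>`. The sketch deliberately does not decide whether *Moby-Dick* is a cited title and *passim* is not, because the WordML does not say so.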
In other words, it is the proper function not of an extractor but of its target editing environment to let users supply any structure not given explicitly in the source data. The same environment should also let them discover rules that depend on rationales the source does not state, such as rules that distinguish a 'title.cited' from some other sort of italics. Since we expect to have an editing environment that provides this level of control and capability, our .docx extraction need not worry about it.
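To make the division of labor concrete, the sketch below shows the kind of rule that would live in the editing environment rather than in the extractor. It makes two hypothetical assumptions:

* Extraction has produced plain `<i>` elements for italic text.
* An editor has supplied the missing rationale as a list of works actually cited.

`mark_cited_titles` and the regular-expression approach are illustrative choices only. A real editing environment would more likely operate on a parsed DOM than on markup strings.

```python
import re

# Hypothetical editorial rule (not part of any extractor): the editor supplies
# a rationale the .docx never states, here a set of works actually cited,
# and only the italics matching it are relabelled.
# Assumes extractor output with bare <i>...</i> elements containing plain text.
ITALIC = re.compile(r"<i>([^<]*)</i>")


def mark_cited_titles(html_text: str, cited: set) -> str:
    def relabel(m: re.Match) -> str:
        # Compared as HTML text, so titles containing & or < would need escaping first.
        if m.group(1) in cited:
            return f'<i class="title.cited">{m.group(1)}</i>'
        return m.group(0)  # emphasis, foreign words, etc. stay as they were

    return ITALIC.sub(relabel, html_text)


print(mark_cited_titles("<p>See <i>Moby-Dick</i> and <i>passim</i>.</p>", {"Moby-Dick"}))
# <p>See <i class="title.cited">Moby-Dick</i> and <i>passim</i>.</p>
```

The rule adds a label only where the editor's knowledge justifies it. Every other italic passes through unchanged, so nothing the extractor reported is lost.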