@@ -52,6 +52,8 @@ Following extraction, data (now HTML) may be piped through a sequence of steps,
The goal of the entire pipeline (not just the first step) is a representation, as clean and simple as possible, of the 'labeling' of document parts that is implicit in the formatting and style names of the Word document. The pipeline should represent the (nominal) semantics given in the source data, with minimal (ideally no) 'interpolation' of semantics beyond them.

Separating the requirements into extract and refine phases lets us design each phase independently. Both phases will probably include customization layers eventually; initially, our goal is to see how much we can achieve with generic logic alone.
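The extract/refine split can be pictured as two independent sequences of steps, each followed by an optional customization layer. The Python below is only a minimal sketch under stated assumptions: the project's actual steps are stylesheets, and `run_phase`, `run_pipeline`, `toy_extract`, `drop_empty_paragraphs` and the `StyleName: text` input format are all hypothetical stand-ins invented for illustration.

```python
import re
from html import escape
from typing import Callable, Sequence

# Hypothetical model: every step maps a document string to a new document string,
# so steps compose freely and either phase can be redesigned without touching the other.
Step = Callable[[str], str]


def run_phase(doc: str, generic: Sequence[Step], custom: Sequence[Step] = ()) -> str:
    """Apply a phase's generic steps first, then any project-specific (custom) steps."""
    for step in [*generic, *custom]:
        doc = step(doc)
    return doc


def run_pipeline(source: str,
                 extract: Sequence[Step], refine: Sequence[Step],
                 extract_custom: Sequence[Step] = (),
                 refine_custom: Sequence[Step] = ()) -> str:
    """Extract turns the source into HTML; refine only ever sees that HTML."""
    html = run_phase(source, extract, extract_custom)
    return run_phase(html, refine, refine_custom)


# --- Toy steps (hypothetical; not part of any real stylesheet set) ---

def toy_extract(source: str) -> str:
    """Stand-in for extraction: each 'StyleName: text' line becomes <p class="StyleName">.
    The style name is carried over as a label only; no semantics are inferred from it."""
    paras = []
    for line in source.splitlines():
        style, _, text = line.partition(": ")
        paras.append(f'<p class="{escape(style)}">{escape(text)}</p>')
    return "\n".join(paras)


def drop_empty_paragraphs(html: str) -> str:
    """Generic refinement: drop empty paragraphs (assumes one <p> per line, as toy_extract emits)."""
    kept = [line for line in html.splitlines()
            if not re.fullmatch(r'<p class="[^"]*"></p>', line)]
    return "\n".join(kept)


if __name__ == "__main__":
    source = "Title: On Pipelines\nNormal: \nNormal: Body text."
    print(run_pipeline(source, extract=[toy_extract], refine=[drop_empty_paragraphs]))
    # prints:
    # <p class="Title">On Pipelines</p>
    # <p class="Normal">Body text.</p>
```

Because customization steps are passed separately and run after the generic ones, a project can start with empty `extract_custom` and `refine_custom` lists and see how far the generic logic alone gets it.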
## Iterative development model

Because many specific requirements for data capture and representation can only be defined in use, feedback from projects is essential to the further development of these stylesheets.