Refactor document storage
The way manuscripts and their versions (plus reviews, decisions, etc.) are currently stored in the DB is very inflexible, and it is not well suited to storing document history or to adding other document types such as author proofing feedback. It also involves unnecessary duplication of code. I would like a more generic and flexible schema that permits different collections of documents and more fine-grained versioning:
- The `DocSet` is the top-level container: the thing that appears in the list on the Manuscripts page.
- It contains multiple separate `Doc`s, which include different major versions of the manuscript/preprint, plus reviews, summaries, decisions and other related items. One of these is the `mainDoc`, usually the latest version of the manuscript/preprint.
- We keep a set of `DocRelation`s that tell us, e.g., that doc B is a review of doc A, doc C is also a review of A, and doc D is a summary of B and C. The `context` can hold further information that may be of use for generating DOCMAPs, etc.
- A `Doc` has multiple `DocVersion`s. These are not versions of record but finer-grained snapshots: the version I've just edited versus the version someone else edited 5 minutes ago. If I'm editing a document such as a review that I haven't yet submitted, `isPendingSubmission` will be true, and this version will remain private to me. Until a first version has been submitted, the doc won't exist as far as the client is concerned.
- The JSON `data` in the `DocVersion` will contain everything we currently store in `manuscript.meta` and `manuscript.submission`. We'll just keep a very basic title at the `DocSet` level for convenience.
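To make the proposed structure concrete, here is a minimal TypeScript sketch of the entities described in the list above. Only `DocSet`, `Doc`, `mainDoc`, `DocRelation`, `context`, `DocVersion`, `isPendingSubmission` and `data` come from the proposal; every other field name (`docType`, `relationType`, `userId`, `updatedAt`, etc.) and the `relatedDocs` helper are hypothetical, chosen only for illustration.

```typescript
// Any JSON value, used for DocVersion.data and DocRelation.context.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }

interface DocVersion {
  id: string
  docId: string
  userId: string // hypothetical: who made these edits
  updatedAt: Date // hypothetical: time of the last edit in this version
  isPendingSubmission: boolean // true = private draft, visible only to its author
  data: Json // everything currently in manuscript.meta and manuscript.submission
}

interface Doc {
  id: string
  docSetId: string
  docType: string // hypothetical values: 'manuscript', 'review', 'decision', 'summary'
  versions: DocVersion[]
}

interface DocRelation {
  fromDocId: string // e.g. the review
  toDocId: string // e.g. the manuscript version it reviews
  relationType: string // hypothetical values: 'reviewOf', 'summaryOf'
  context: Json // extra information, e.g. for DOCMAP generation
}

interface DocSet {
  id: string
  title: string // the "very basic title" kept at DocSet level
  mainDocId: string // the mainDoc, usually the latest manuscript version
  docs: Doc[]
  relations: DocRelation[]
}

// Hypothetical helper: all docs in the set that stand in the given relation
// to a target doc, e.g. every review of doc A. Preserves the order of set.docs.
function relatedDocs(set: DocSet, targetDocId: string, relationType: string): Doc[] {
  const ids = new Set(
    set.relations
      .filter(r => r.toDocId === targetDocId && r.relationType === relationType)
      .map(r => r.fromDocId),
  )
  return set.docs.filter(d => ids.has(d.id))
}
```

With the example from the list (B and C review A, D summarises B and C), `relatedDocs(set, 'A', 'reviewOf')` yields docs B and C, and `relatedDocs(set, 'B', 'summaryOf')` yields D. Keeping relations as a flat list rather than nesting reviews inside manuscripts is what lets a summary point at several docs at once.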
For versioning, I'm thinking we should create a new version whenever edits are made by a different user from the one who made the previous edit, or when 5 minutes have elapsed since the previous edit (and `isPendingSubmission` isn't true).
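The versioning rule above can be sketched as a pure function deciding whether an incoming edit starts a new `DocVersion` or updates the previous one in place. This is a minimal sketch under one stated assumption: the parenthetical is read as "while the previous version is still pending submission, edits always overwrite it", since that draft is private to one user anyway. `VersionStamp`, `shouldCreateNewVersion` and `NEW_VERSION_INTERVAL_MS` are hypothetical names.

```typescript
// Hypothetical subset of DocVersion fields that the rule inspects.
interface VersionStamp {
  userId: string // user who made the previous edit
  updatedAt: Date // time of the previous edit
  isPendingSubmission: boolean
}

// The proposed 5-minute window, in milliseconds.
const NEW_VERSION_INTERVAL_MS = 5 * 60 * 1000

// Returns true if an edit by `editorId` at `editTime` should create a new
// version, false if it should update the previous version in place.
function shouldCreateNewVersion(prev: VersionStamp | null, editorId: string, editTime: Date): boolean {
  // No version yet: the first edit always creates one.
  if (prev === null) return true
  // Assumption: an unsubmitted (private) draft is always updated in place.
  if (prev.isPendingSubmission) return false
  // A different user from the previous editor always starts a new version.
  if (prev.userId !== editorId) return true
  // Same user: start a new version once 5 minutes have elapsed.
  return editTime.getTime() - prev.updatedAt.getTime() >= NEW_VERSION_INTERVAL_MS
}
```

For example, if Alice edits at 12:00 and again at 12:01, the second edit updates the same version; an edit by Bob at 12:01, or by Alice at 12:06, would create a new one. If the intended reading is instead that pending drafts skip only the 5-minute rule, the `isPendingSubmission` check would move below the user comparison.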
Refactoring steps
- (optional) We would benefit from more extensive integration tests covering the full standard workflows.
- Make the submission form work the same way as the decision and review forms: all form data should be stored in the `submission` object, and the `SupplementaryFiles` and `VisualAbstract` fields should store file references within the form data.
- I propose creating the GraphQL API for this structure before we change the database schema: it will obtain data from the existing schema and restructure it to mimic the new schema. We can keep both old and new APIs running in parallel and gradually port client code to use the new structure.
- Once the old API is completely phased out, we can refactor the back-end and DB.
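The "new API over the old schema" step amounts to an adapter that reshapes existing rows into the new structure. The sketch below is heavily hedged: `LegacyManuscript` and `LegacyReview` are simplified, hypothetical stand-ins for the current tables (the real schema has many more fields), and `toDocSet`, the `docType` values, the `'reviewOf'`/`'decisionOn'` relation types and reusing the first version's id as the `DocSet` id are all illustrative assumptions.

```typescript
// Simplified, hypothetical legacy shapes (not the real table definitions).
interface LegacyReview {
  id: string
  userId: string
  isDecision: boolean // decisions and reviews assumed to share one table
  isSubmitted: boolean
  data: Record<string, unknown>
  updated: Date
}

interface LegacyManuscript {
  id: string
  submitterId: string
  meta: Record<string, unknown>
  submission: Record<string, unknown>
  updated: Date
  reviews: LegacyReview[]
}

// Minimal versions of the proposed entities, repeated so this block stands alone.
interface DocVersion { userId: string; updatedAt: Date; isPendingSubmission: boolean; data: Record<string, unknown> }
interface Doc { id: string; docType: string; versions: DocVersion[] }
interface DocRelation { fromDocId: string; toDocId: string; relationType: string; context: Record<string, unknown> }
interface DocSet { id: string; title: string; mainDocId: string; docs: Doc[]; relations: DocRelation[] }

// Map all major versions of one legacy manuscript (oldest first) to a DocSet.
function toDocSet(versions: LegacyManuscript[]): DocSet {
  if (versions.length === 0) throw new Error('a DocSet needs at least one manuscript version')
  const docs: Doc[] = []
  const relations: DocRelation[] = []

  for (const ms of versions) {
    // Each legacy manuscript version becomes its own Doc; meta and submission
    // are carried over together into the version's JSON data.
    docs.push({
      id: ms.id,
      docType: 'manuscript',
      versions: [{ userId: ms.submitterId, updatedAt: ms.updated, isPendingSubmission: false,
        data: { meta: ms.meta, submission: ms.submission } }],
    })
    // Reviews and decisions become Docs related to the manuscript version they concern.
    for (const r of ms.reviews) {
      docs.push({
        id: r.id,
        docType: r.isDecision ? 'decision' : 'review',
        versions: [{ userId: r.userId, updatedAt: r.updated, isPendingSubmission: !r.isSubmitted, data: r.data }],
      })
      relations.push({ fromDocId: r.id, toDocId: ms.id, relationType: r.isDecision ? 'decisionOn' : 'reviewOf', context: {} })
    }
  }

  // The latest manuscript version becomes the mainDoc and supplies the title.
  const latest = versions[versions.length - 1]
  const rawTitle = latest.meta['title']
  const title = typeof rawTitle === 'string' ? rawTitle : ''
  return { id: versions[0].id, title, mainDocId: latest.id, docs, relations }
}
```

Because the adapter only reads the existing data, both APIs can run side by side; once the client no longer uses the old API, the same mapping could guide the data migration for the DB refactor.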
Benefits
- Submissions, reviews and decisions are treated the same, meaning we need only a single set of code to deal with all of these.
- New artifacts/document types, such as author proofing feedback, will be very easy to add.
- Publishing can become much simpler and more flexible, using simple mappings rather than specialised code to publish the various artifacts.
- Helps rationalise the confusing and error-prone system of manuscript versioning we currently have. Querying to get a doc will always return the most recent version, while prior versions of record can be obtained with a subquery.
- It distinguishes between submitted and unsubmitted data. With decisions and submissions we do this very poorly.
- It keeps all version information needed for auditing, diffing and rollback.
- Simplifies production of rich DOCMAPs.
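The claims that a query always returns the most recent version and that submitted and unsubmitted data are kept apart can be illustrated with a small in-memory sketch. In practice this would be a DB query; `visibleVersions`, `latestVersion` and `submittedHistory` are hypothetical helpers, and the visibility rule (a viewer sees everyone's submitted versions plus their own pending drafts) is taken from the `isPendingSubmission` description above.

```typescript
// Hypothetical subset of DocVersion fields needed here.
interface DocVersion {
  id: string
  userId: string
  updatedAt: Date
  isPendingSubmission: boolean
}

// Submitted versions are visible to everyone; pending drafts only to their author.
function visibleVersions(versions: DocVersion[], viewerId: string): DocVersion[] {
  return versions.filter(v => !v.isPendingSubmission || v.userId === viewerId)
}

// The newest version this viewer may see, or null if none is visible
// (i.e. the doc doesn't exist yet as far as this viewer is concerned).
function latestVersion(versions: DocVersion[], viewerId: string): DocVersion | null {
  let latest: DocVersion | null = null
  for (const v of visibleVersions(versions, viewerId)) {
    if (latest === null || v.updatedAt.getTime() > latest.updatedAt.getTime()) latest = v
  }
  return latest
}

// Submitted versions in chronological order, e.g. for auditing, diffing or rollback.
// filter() returns a new array, so sorting does not mutate the input.
function submittedHistory(versions: DocVersion[]): DocVersion[] {
  return versions
    .filter(v => !v.isPendingSubmission)
    .sort((a, b) => a.updatedAt.getTime() - b.updatedAt.getTime())
}
```

With two submitted versions (Alice, then Bob) and a newer pending draft by Alice, Alice's query returns her draft, Bob's returns his submitted version, and the history lists only the two submitted versions.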