Wholebook load into PMC proposal
The following workflow is proposed for CoKo review:
- CoKo deposits a book package and a meta JSON file to /ftp/ingest/book/. The package contains all XML, PDF, image, and supplementary data files for the book.
- That triggers the Task Manager (TM) session ingest_book_wholebook, which will run asynchronously for several minutes.
- TM loads the book to PMC and sends a success / failure notification JSON back to CoKo via Kafka. The topic name is ingest_book_wholebook_receipt.
Proposed file formats for the load to PMC:
a) JSON meta file vasleepap.1234567890.2020_05_15-09_30_19.json
{
  "package_id": 1234567890,  // Reference ID for the package. Generated by CoKo and used by NCBI mainly for reporting back the status of the package ingest.
  "domain": "vasleepap",  // PMCBook domain name of the book being loaded.
  "main_xml": "vasleepap.bxml",  // Filename of the primary XML file to use as the processing entry point.
  "thumb": "vasleepap.png",  // Filename of the book thumbnail.
  "package": "vasleepap.1234567890.2020_05_15-09_30_19.zip",  // Name of the package data file. Follows the pattern domain.package_id.timestamp.
  "target_database": "prod",  // One of "prod", "preview", or "dev". These are aliases for the database to load to (prod=PMCBook, preview=PMCBookTest, dev=PMCA3Book).
  "release": true,  // Boolean flag: whether to release the book.
  "notification_recipients": {  // Email addresses that NCBI Task Manager (not CoKo) sends notifications to.
    "success": ["bookshelf@ncbi.nlm.nih.gov", "fritz@publisher.org"],
    "failure": ["bookshelf@ncbi.nlm.nih.gov"]
  }
}
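As an illustration of the naming pattern domain.package_id.timestamp, the following Python sketch builds the meta file name and its contents. It is not part of the proposal: the helper names package_basename and build_meta are hypothetical, the timestamp layout (YYYY_MM_DD-HH_MM_SS) is inferred from the example file name above, and naming main_xml and thumb after the domain is an assumption taken from the same example.

```python
import json
from datetime import datetime

ALLOWED_TARGETS = ("prod", "preview", "dev")


def package_basename(domain, package_id, when):
    """Return 'domain.package_id.timestamp' (timestamp layout assumed from the example)."""
    return f"{domain}.{package_id}.{when.strftime('%Y_%m_%d-%H_%M_%S')}"


def build_meta(domain, package_id, when, target_database, release,
               success_emails, failure_emails):
    """Return (meta_filename, meta_dict) for one book package."""
    if target_database not in ALLOWED_TARGETS:
        raise ValueError(f"unknown target_database: {target_database!r}")
    base = package_basename(domain, package_id, when)
    meta = {
        "package_id": package_id,
        "domain": domain,
        "main_xml": f"{domain}.bxml",  # assumption: named after the domain
        "thumb": f"{domain}.png",      # assumption: named after the domain
        "package": f"{base}.zip",
        "target_database": target_database,
        "release": release,
        "notification_recipients": {
            "success": list(success_emails),
            "failure": list(failure_emails),
        },
    }
    return f"{base}.json", meta


if __name__ == "__main__":
    name, meta = build_meta(
        "vasleepap", 1234567890, datetime(2020, 5, 15, 9, 30, 19),
        "preview", False,
        ["bookshelf@ncbi.nlm.nih.gov"], ["bookshelf@ncbi.nlm.nih.gov"],
    )
    print(name)
    print(json.dumps(meta, indent=2))
```

json.dumps emits plain JSON without the // comments, which appear in the example above for documentation only.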
b) Successful response
{
  "package_id": 1234567890,  // Loading job reference ID
  "status": 0,  // 0 = success; any other value is a failure
  "timestamp": "2020-11-07 15:14:59",  // Completion time
  "url": "https://www.ncbi.nlm.nih.gov/books/NBK123.2",  // URL to view the loaded book
  "notices": [
    {
      "name": "floatcheck",
      "severity": "WARNING",
      "assignee": "PMC",
      "timestamp": "2020-11-07 15:14:59",
      "message": " No crossreference to floating object: fig microplates.F4"
    },
    {
      "name": "tablecheck",
      "severity": "WARNING",
      "assignee": "PMC",
      "timestamp": "2020-11-07 15:14:59",
      "message": " Table integrity check failed: microplates.T4"
    }
  ]
}
c) Failed response
{
  "package_id": 1234567890,  // Loading job reference ID
  "status": 3,  // Status code, non-zero means an error
  "timestamp": "2020-11-07 15:14:59",  // Completion time
  "url": "http://ipmc-prod.be-md.ncbi.nlm.nih.gov:5701/internal/utils/tm/index.fcgi?s=monitor&sel=1&sessid=8904088",  // Task Manager URL to view the session log with errors
  "notices": [
    {
      "name": "stylecheck",
      "severity": "ERROR",
      "assignee": "publisher",
      "timestamp": "2020-11-07 15:14:59",
      "message": "Empty element: <sup/>"
    }
  ]
}
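On the receiving side, a consumer of the ingest_book_wholebook_receipt topic could interpret either response along these lines. This is only a sketch under assumptions: summarize_receipt is a hypothetical helper, it takes the already-decoded JSON as a dict (the Kafka consumption itself is omitted), and grouping notices by assignee is one possible way to route them, not something the proposal prescribes.

```python
def summarize_receipt(receipt):
    """Reduce a receipt dict to a success flag, the URL, and notices grouped by assignee."""
    grouped = {}
    for notice in receipt.get("notices", []):
        # Messages in the examples may carry leading whitespace, so strip it.
        line = f'{notice["severity"]} {notice["name"]}: {notice["message"].strip()}'
        grouped.setdefault(notice["assignee"], []).append(line)
    return {
        "package_id": receipt["package_id"],
        "ok": receipt["status"] == 0,  # 0 is success; any other value is a failure
        "url": receipt["url"],         # book URL on success, TM session log on failure
        "notices_by_assignee": grouped,
    }


if __name__ == "__main__":
    failed = {
        "package_id": 1234567890,
        "status": 3,
        "timestamp": "2020-11-07 15:14:59",
        "url": "http://example.invalid/tm-session-log",  # placeholder URL
        "notices": [
            {
                "name": "stylecheck",
                "severity": "ERROR",
                "assignee": "publisher",
                "timestamp": "2020-11-07 15:14:59",
                "message": "Empty element: <sup/>",
            }
        ],
    }
    print(summarize_receipt(failed))
```

A successful load can still carry WARNING notices, so the sketch keeps the notices regardless of the status value.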