Pass metadata and settings to PDF2XML vendor for tagging
Expected behaviour
Requirements here: #404 (closed)
Package to Apex from BCMS
Sample package: BCMS-to-APEX-sample.zip
Metadata: content added in the metadata UI of the relevant book, corresponding to the following manifest file types:
- cover
And:
- book metadata in JSON
- collection metadata in JSON
- settings in JSON
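A minimal sketch of what the Metadata folder could contain, assuming the package is a zip archive. The entry names (`metadata/cover.jpg`, `metadata/book-meta.json`, `metadata/collection-meta.json`, `metadata/settings.json`) and the helper `write_metadata_folder` are hypothetical placeholders; the agreed manifest names come from #404 and the sample package, not from this sketch.

```python
import io
import json
import zipfile
from typing import Any, Dict, Optional


def write_metadata_folder(zf: zipfile.ZipFile,
                          book_meta: Dict[str, Any],
                          collection_meta: Dict[str, Any],
                          settings: Dict[str, Any],
                          cover: Optional[bytes] = None) -> None:
    """Write Metadata folder entries into an open package archive.

    Entry names are hypothetical placeholders, not the agreed manifest names.
    """
    if cover is not None:
        # Cover image, written only when one is available
        zf.writestr("metadata/cover.jpg", cover)
    # Book metadata, collection metadata and settings, each serialized as JSON
    zf.writestr("metadata/book-meta.json", json.dumps(book_meta, indent=2))
    zf.writestr("metadata/collection-meta.json", json.dumps(collection_meta, indent=2))
    zf.writestr("metadata/settings.json", json.dumps(settings, indent=2))


# Usage: build an in-memory package without a cover image
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    write_metadata_folder(zf, {"source_type": "Book"}, {"collection_id": "wtcollect"}, {})
```

The point of the sketch is that the JSON metadata files are written unconditionally, while the cover stays optional; today the folder only ever holds the cover.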
Current behaviour
PDF2XML taggers report that no metadata or settings are being provided in packages, even when the values have been completed. Currently:
Metadata folder: contains a cover image only when the image has been extracted from the source PDF. No metadata is sent back from Apex:
- Wholebook use case: any metadata sent to Apex is written into the metadata node in wholebook.xml
- Chapter use case: BCMS writes book metadata from the UI into the chapter.xml file prior to Load to PMC
As with wholebook XML, at minimum the following NCBI-specific metadata and settings should be sent to taggers:
Coko will pass the following pieces of NCBI metadata to XML conversion and PDF tagging:
- domain name (required)
- collection id (optional)
- collection title (TBD: whether this can be a plain string, or requires formatting and must be passed during Load to PMC)
- source type (required)
- publisher name (optional)
- publisher location (optional)
All other data or metadata will be in the PDF or support folders.
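A minimal sketch of selecting these fields before packaging, assuming the field names used in the sample JSON below and the required/optional split from the list above. The helper name `build_ncbi_metadata` is hypothetical, and omitting empty optional fields (rather than sending empty strings) is an assumption, not an agreed rule.

```python
from typing import Any, Dict, List

# Required/optional split follows the list in this issue (hypothetical constants)
REQUIRED_FIELDS = ("domain", "source_type")
OPTIONAL_FIELDS = ("collection_id", "collection_title", "publisher_name", "publisher_loc")


def build_ncbi_metadata(book: Dict[str, Any]) -> Dict[str, Any]:
    """Select the NCBI fields to send to taggers from a book's metadata and settings.

    Raises ValueError if a required field is missing or empty. Optional fields
    with no value are omitted (an assumption, not an agreed rule).
    """
    missing: List[str] = [f for f in REQUIRED_FIELDS if not book.get(f)]
    if missing:
        raise ValueError(f"missing required NCBI metadata: {', '.join(missing)}")
    out = {f: book[f] for f in REQUIRED_FIELDS}
    out.update({f: book[f] for f in OPTIONAL_FIELDS if book.get(f)})
    return out
```

Failing loudly on a missing domain or source type makes the current "no metadata provided" situation surface in BCMS instead of at the vendor.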
{
"job_id": 9876543210, // Reference ID for the XML conversion job. It's generated by Coko and used by NCBI to report back the status of the conversion.
"user_name": "jordandc", // Username of the user initiating the conversion
"main_xml": "978-3-030-47318-1_Book.xml", // Filename of the primary XML file to use as the conversion entry point.
"domain": "spr9783030473181", // PMCBook domain name of the book being converted
"collection_id": "wtcollect", // PMCBook domain name of the collection that the book being converted belongs to (OPTIONAL)
"collection_title": "Wellcome Monographs", // Title of the collection that the book being converted belongs to (OPTIONAL)
"source_type": "Book", // BCMS assigned "source-type" for the book being converted
"publisher_name": "Springer", // BCMS assigned name of the bibliographic publisher (OPTIONAL)
"publisher_loc": "Cham (CH)", // BCMS assigned publication place (OPTIONAL)
"package": "spr9783030473181.9876543210.2020_05_15-09_25_45.zip", // Name of the package with the XML file, in domain.job_id.timestamp format
"notification_recipients": { // Lists of email addresses for the NCBI Task Manager (not Coko) to send notifications to
"success": ["bookshelf@ncbi.nlm.nih.gov","fritz@publisher.org"],
"failure": ["bookshelf@ncbi.nlm.nih.gov"]
}
}
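The annotated sample above uses `//` comments, which strict JSON does not allow, so the file actually sent should be serialized without them. A minimal sketch, assuming the field names from the sample and that the timestamp is formatted like `2020_05_15-09_25_45` (whether it is UTC or local time is not specified here). The helpers `package_name` and `build_job_json` are hypothetical.

```python
import json
from datetime import datetime
from typing import Any, Dict, List


def package_name(domain: str, job_id: int, when: datetime) -> str:
    """Build the package filename in domain.job_id.timestamp format."""
    return f"{domain}.{job_id}.{when.strftime('%Y_%m_%d-%H_%M_%S')}.zip"


def build_job_json(job_id: int, user_name: str, main_xml: str,
                   ncbi_meta: Dict[str, Any], success: List[str],
                   failure: List[str], when: datetime) -> str:
    """Serialize the job description sent alongside the package."""
    payload: Dict[str, Any] = {
        "job_id": job_id,
        "user_name": user_name,
        "main_xml": main_xml,
        # domain, source_type and any optional collection/publisher fields
        **ncbi_meta,
        "package": package_name(ncbi_meta["domain"], job_id, when),
        "notification_recipients": {"success": success, "failure": failure},
    }
    # json.dumps emits strict JSON: no comments, no trailing commas
    return json.dumps(payload, indent=2)
```

Generating the file with a serializer rather than by hand also avoids errors such as a missing comma between fields.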
Steps to reproduce
- Create a PDF wholebook
- Enter the source type, publisher name, and publisher location using the metadata UI
- Add a collection via settings
- Check that all of the above values are sent to the tagging vendor in JSON, per the provided specifications
- Check the converted XML to confirm that all of these values are correctly tagged, per the same specifications
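The JSON check in the steps above could be scripted. A minimal sketch, assuming the values entered in the UI are available as a dict keyed by the sample JSON field names; the helper `mismatched_fields` is hypothetical and only covers the JSON step, not the converted-XML check.

```python
import json
from typing import Any, Dict, Tuple


def mismatched_fields(expected: Dict[str, Any], sent_json: str) -> Dict[str, Tuple[Any, Any]]:
    """Compare values entered in the UI with the JSON sent to the tagging vendor.

    Returns {field: (expected, actual)} for every field that is missing or
    differs; an empty dict means the check passes.
    """
    sent = json.loads(sent_json)
    return {k: (v, sent.get(k)) for k, v in expected.items() if sent.get(k) != v}
```

Reporting the expected and actual value per field makes it clear whether a value was dropped entirely (actual is `None`) or passed through incorrectly.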
Environment
Possible solution
QA Steps
[To be completed by Coko once dev is done]
Scheduling
Fixing this issue is required for Priority 1: Deploy MVP