FTP integration with Apex (PDF2XML vendor)
Hi @lathrops1 and @Kireev @John.kopanas
Proposal updated 10 Sept
- BCMS receives PDF submission by FTP (#570 and #571) or user upload in the UI
- Files are shown in the UI, assuming the minimum submission requirements are met. (See #438 (closed) for resolving submission errors.)
- If this is the first submission, BCMS creates a vendor-meta.xml file that's included in the package. Example structure:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book-submit SYSTEM "books-submit-vendor.dtd">
<book-submit bcms-id="bcms319" job-id="438ab1" submission-type="chapter">
</book-submit>
- Files are packaged; the package is named in the format domain.bcmsid.timestamp
- Files are posted to Apex FTP at /ftp/APEX/new, either by the manual "Submit" button or automatically if coming from FTP without errors, according to the folder structure here
- Apex picks up the package and does the conversion work
- Apex posts the completed package (named in a syntax of their choice) to /ftp/APEX/done, according to the folder structure here
- NCBI polls the FTP site and sends a JSON notification to Coko in the format agreed in #566 (closed).
- If package is chapter, BCMS writes book metadata into chapter.xml
- If book belongs to collection, BCMS writes collection-meta into chapter.xml or wholebook.xml file
- BCMS posts package to Load to PMC
- NCBI reports errors or a successful preview in a Kafka notification, as done for the Word and XML workflows.
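A minimal Python sketch of the vendor-meta.xml and package-naming steps in the list above. The function names, the timestamp format, and the example domain are assumptions made for illustration; the proposal only fixes the attribute names shown in the sample and the domain.bcmsid.timestamp pattern.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET


def build_vendor_meta(bcms_id: str, job_id: str, submission_type: str) -> str:
    """Return vendor-meta.xml content mirroring the sample structure."""
    root = ET.Element(
        "book-submit",
        {
            "bcms-id": bcms_id,
            "job-id": job_id,
            "submission-type": submission_type,
        },
    )
    # short_empty_elements=False keeps the explicit </book-submit> closing tag.
    body = ET.tostring(root, encoding="unicode", short_empty_elements=False)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE book-submit SYSTEM "books-submit-vendor.dtd">\n'
        + body
        + "\n"
    )


def package_name(domain: str, bcms_id: str, when: datetime) -> str:
    """Build a domain.bcmsid.timestamp name.

    Assumption: a compact UTC timestamp; the proposal does not specify one.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{domain}.{bcms_id}.{stamp}"


if __name__ == "__main__":
    print(build_vendor_meta("bcms319", "438ab1", "chapter"))
    # "exampledomain" is a made-up domain for illustration only.
    print(package_name("exampledomain", "bcms319", datetime.now(timezone.utc)))
```

Whatever timestamp format is finally agreed, keeping it free of characters that are awkward in FTP paths (spaces, colons) would avoid surprises on the Apex side.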
Package to Apex from BCMS
Sample package: BCMS-to-APEX-sample.zip
Source Files uploaded to "source" or "book component" section, corresponding to manifest file types:
- book OR hybrid
- chapter OR manuscript OR prepub
- fm
- part
- addendum
- appendix
Suppl Files uploaded to "Supplementary" section, corresponding to manifest file types:
- supplement: files that are linked in the source document
Metadata Content added in the metadata UI of the relevant book, corresponding to manifest file types:
- cover
And:
- Book meta in json
- Collection meta in json
- settings in json
Notes:
- If there is no book cover in book Meta UI AND the book belongs to a collection that has a setting “copy cover to all books”, then include the collection cover in this folder. (Note: A simpler implementation may be to always copy collection cover to book metadata UI whenever a book is submitted to a collection with this setting).
- If no cover image is supplied, Apex extracts from source PDF.
- Book Meta and Collection Meta: To be determined which fields specifically (Dione is reviewing info provided by Stacy)
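The cover rule in the notes above could be sketched as follows. This is an illustration only: the function and parameter names are hypothetical, and it implements the first option described (choose the cover at packaging time) rather than the simpler "always copy to the book metadata UI" alternative.

```python
from typing import Optional


def cover_for_package(
    book_cover: Optional[str],
    collection_cover: Optional[str],
    copy_cover_to_all_books: bool,
) -> Optional[str]:
    """Pick the cover file to place in the Metadata folder, if any."""
    # A cover set in the book metadata UI always wins.
    if book_cover:
        return book_cover
    # Otherwise fall back to the collection cover when the setting allows it.
    if copy_cover_to_all_books and collection_cover:
        return collection_cover
    # No cover supplied: Apex extracts one from the source PDF.
    return None
```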
Support Files uploaded to "Support" section, corresponding to FTP file types:
- meta.xml file
- alt-text
- notes
- source: for any other source material, such as: Word version of the PDF containing images to be extracted by APEX
And:
- vendor-meta.xml (file created by Coko, which must not be edited or deleted)
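As a rough sketch of how the file types listed in this section could map onto package folders: the folder names come from the headings above, while the exact type strings for the support files ("meta", "alt-text", "notes", "source") and the function names are assumptions. The JSON metadata files are left out because their file names are not specified here.

```python
from collections import defaultdict

# Manifest file types that belong in the Source folder.
SOURCE_TYPES = {
    "book", "hybrid", "chapter", "manuscript", "prepub",
    "fm", "part", "addendum", "appendix",
}
# Assumed type strings for files uploaded to the "Support" section.
SUPPORT_TYPES = {"meta", "alt-text", "notes", "source"}


def folder_for(file_type: str) -> str:
    """Return the package folder for a given file type."""
    if file_type in SOURCE_TYPES:
        return "Source"
    if file_type == "supplement":
        return "Suppl"
    if file_type == "cover":
        return "Metadata"
    if file_type in SUPPORT_TYPES:
        return "Support"
    raise ValueError(f"unknown file type: {file_type!r}")


def plan_package(files):
    """Group (file_name, file_type) pairs by folder.

    vendor-meta.xml is always added to Support, since BCMS creates it.
    """
    layout = defaultdict(list)
    for name, file_type in files:
        layout[folder_for(file_type)].append(name)
    layout["Support"].append("vendor-meta.xml")
    return dict(layout)
```

Note that "source" as a support type and the Source folder are different things in this sketch: the former covers extra source material such as a Word version of the PDF.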
Package from Apex to NCBI to BCMS
Sample package (empty except for the required vendor-meta.xml): APEX-to-NCBI-to-BCMS-sample.zip
Folder: Converted Contains one converted XML file.
Folder: Bookshelf Display PDFs Containing the source PDFs that have been renamed by the vendor as follows:
- book PDF: Book domain name
- chapter PDF: chapter id
These are sent to Load to PMC and displayed as a download on the Bookshelf website.
Folder: Metadata Contains a cover image only in the case where the image has been extracted from the source PDF. No metadata is sent back from Apex:
- Wholebook use case: Any metadata sent to Apex is written into the metadata node in the wholebook.xml
- Chapter use case: BCMS writes Book metadata from UI into chapter.xml file prior to load to PMC
Folder: Suppl Contains only suppl files that have been created by Apex. These must be named according to NCBI's submission file naming specs.
Folder: Images Contains the images extracted from Source and referenced in the converted xml.
Folder: Support The vendor-meta.xml file that was provided. Apex should not edit this file as this may lead to submission errors.
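A few of the rules for the returned package lend themselves to an automated check before loading to PMC. The sketch below assumes the folder names appear in the archive exactly as written above and that member paths use forward slashes (as returned by, for example, zipfile.ZipFile.namelist()); the function name and messages are hypothetical.

```python
from typing import List, Optional


def check_returned_package(
    names: List[str],
    sent_vendor_meta: Optional[bytes] = None,
    returned_vendor_meta: Optional[bytes] = None,
) -> List[str]:
    """Return a list of problems found in a package coming back from Apex."""
    problems = []

    # The Converted folder should hold exactly one converted XML file.
    converted = [
        n for n in names
        if n.startswith("Converted/") and n.lower().endswith(".xml")
    ]
    if len(converted) != 1:
        problems.append(
            f"expected exactly one XML file in Converted/, found {len(converted)}"
        )

    # vendor-meta.xml must come back, and Apex should not have edited it.
    if "Support/vendor-meta.xml" not in names:
        problems.append("Support/vendor-meta.xml is missing")
    elif sent_vendor_meta is not None and returned_vendor_meta != sent_vendor_meta:
        problems.append("Support/vendor-meta.xml differs from the file sent")

    return problems
```

An empty list means the package passed these checks; naming rules for the Suppl files and the renamed PDFs are not covered here.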