*XML* workflow: Renaming `book_pdf` file with domain name
Context
NCBI systems require the wholebook Bookshelf display PDF to be named with the domain name. These files can be submitted by a user through the UI or via FTP. The domain name is not known to the submitter at time of submission, but can be looked up in the book metadata UI.
PDFs for individual chapters of the book may be submitted at the same time.
Proposal
MVP deployment development
For the MVP deployment, only uploads via the UI are handled.
- User uploads PDF(s) in UI
- User adds the `book_pdf` tag to one file (the file matching the wholebook)
- BCMS automatically renames the PDF to match the `domain name` when sending package
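The MVP behaviour above can be sketched roughly as follows. This is an illustration only, not BCMS code: the `PackageFile` record, the `rename_book_pdf` function, and the domain name `examplebook` are hypothetical, and the sketch assumes the renamed file keeps its original extension.

```python
from dataclasses import dataclass, replace
from pathlib import PurePosixPath
from typing import List


@dataclass(frozen=True)
class PackageFile:
    """Hypothetical record for one uploaded file and its tag."""
    name: str
    tag: str


def rename_book_pdf(files: List[PackageFile], domain_name: str) -> List[PackageFile]:
    """Return a copy of the file list with the book_pdf file named after the domain.

    Assumption: exactly one file carries the book_pdf tag, and the
    original extension is preserved.
    """
    tagged = [f for f in files if f.tag == "book_pdf"]
    if len(tagged) != 1:
        raise ValueError(f"expected exactly one book_pdf file, found {len(tagged)}")

    renamed = []
    for f in files:
        if f.tag == "book_pdf":
            suffix = PurePosixPath(f.name).suffix  # e.g. ".pdf"
            f = replace(f, name=f"{domain_name}{suffix}")
        renamed.append(f)
    return renamed


upload = [
    PackageFile("final_print_version.pdf", "book_pdf"),
    PackageFile("chapter1.pdf", "chapter_pdf"),
]
package = rename_book_pdf(upload, "examplebook")
print([f.name for f in package])  # ['examplebook.pdf', 'chapter1.pdf']
```

Files with any other tag pass through unchanged, which matches the point that chapter PDFs submitted alongside the wholebook PDF are not renamed by this step.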
This development was approved and confirmed completed in #403 (comment 97692). All discussion beyond the agreement for MVP is saved in the issue history section.
QA steps for MVP deployment development
The steps below cover as much as the QA team can verify. NCBI can also check on their side that the received package has the correct changes.
- When a book is submitted via the UI with a file tagged as `book_pdf`: once the file status changes to loading preview, download the corresponding package from the FTP folder `/ingest/book` and confirm that the file tagged as `book_pdf` has been renamed. This is a bit easier to check if you click reload preview, as the package stays in this folder for only 2 minutes.
- When an FTP package is sent whose meta.xml also has a file tagged as `book_pdf`: once the book goes to status loading preview again, confirm that the package in the `/ingest/book` folder has the file renamed with the book domain name.
Issue history
Use cases
NCBI system requires that:
- Bookshelf display PDFs for a wholebook (submitted as file type `book_pdf` in FTP submissions) are named with the PMC Book `domain name`. This is not known to the submitter at time of submission, but can be looked up in the book metadata UI.
- Bookshelf display PDFs for a chapter (submitted as file type `chapter_pdf` in FTP submissions) are named with the `chapter id` for chapter submissions only. This is known to the submitter at time of submission, and can be looked up in the book component metadata UI. It's up to the user to name this file correctly in FTP submissions; if it is not correctly named, this will result in a load to PMC error (of severity type warning).
Expected flow
FTP submission of wholebook:
- Organization submits an FTP wholebook package with one file typed as `book_pdf` (the package may also include multiple files of `chapter_pdf` type).
- BCMS creates book and domain (if it doesn't already exist).
- BCMS renames `book_pdf` file with the book `domain name`.
- BCMS submits book to XML conversion, load to PMC, etc.
- If Organization submits another package, BCMS renames `book_pdf` with the `domain name` again and creates next file version.
UI Upload of wholebook:
- Book is created in UI by user; domain is created.
- User selects upload to Bookshelf Display PDF section --> BCMS validates that the filename matches the `domain name`; if the filename does not match, then show error "Filename does not match the book domain name: `domain name`"
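The filename check in this flow could look something like the sketch below. The function name `validate_book_pdf_filename` and the domain name `examplebook` are hypothetical, and the sketch assumes that "matches" means the filename without its extension equals the domain name exactly (case-sensitive); the issue does not spell this out.

```python
from pathlib import PurePosixPath
from typing import Optional


def validate_book_pdf_filename(filename: str, domain_name: str) -> Optional[str]:
    """Return None if the filename matches the domain name, else an error message.

    Assumption: the comparison ignores the file extension and is
    case-sensitive.
    """
    stem = PurePosixPath(filename).stem
    if stem == domain_name:
        return None
    return f"Filename does not match the book domain name: {domain_name}"


print(validate_book_pdf_filename("examplebook.pdf", "examplebook"))
print(validate_book_pdf_filename("final_print.pdf", "examplebook"))
```

Returning the message rather than raising keeps the check easy to surface as a UI error next to the upload field.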
[Original issue]
Hi @lathrops1
cc @John.kopanas and @latternm
Documenting what we discussed:
Organizations can submit the published PDF versions with the XML. These are provided as downloads on the book or chapter page on the Bookshelf website.
XML book component submissions
The manifest file identifies the PDF as `chapter_pdf`, for example:
chapter eh0072_3dprinting.xml
chapter_pdf eh0072_3dprinting.pdf
image eh0072_3dprinting.tif
meta meta.xml
The `chapter_pdf` file needs to be renamed to match the `book-part-id` created during the source XML conversion step. This `chapter_pdf` file must be sent to NCBI at Load to PMC step.
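To make the manifest format shown above concrete, here is a rough sketch that reads the `type filename` pairs and plans the `chapter_pdf` rename. The function names and the `book-part-id` value `ch3dprinting` are made up for illustration, and the sketch assumes each manifest line holds a file type and a filename separated by whitespace, which is all the example shows.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

MANIFEST = """\
chapter eh0072_3dprinting.xml
chapter_pdf eh0072_3dprinting.pdf
image eh0072_3dprinting.tif
meta meta.xml
"""


def parse_manifest(text: str) -> List[Tuple[str, str]]:
    """Split each non-empty line into a (file_type, filename) pair."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        file_type, filename = line.split(maxsplit=1)
        entries.append((file_type, filename.strip()))
    return entries


def plan_chapter_pdf_rename(
    entries: List[Tuple[str, str]], book_part_id: str
) -> Tuple[str, str]:
    """Return (old_name, new_name) for the single chapter_pdf in a chapter package.

    Assumption: a chapter submission has exactly one chapter_pdf, and the
    book_part_id is already known from the conversion step.
    """
    pdfs = [name for file_type, name in entries if file_type == "chapter_pdf"]
    if len(pdfs) != 1:
        raise ValueError(f"expected exactly one chapter_pdf, found {len(pdfs)}")
    old_name = pdfs[0]
    return old_name, f"{book_part_id}{PurePosixPath(old_name).suffix}"


entries = parse_manifest(MANIFEST)
print(plan_chapter_pdf_rename(entries, "ch3dprinting"))
```

The sketch only covers the single-chapter case; how several `chapter_pdf` files in one wholebook package map to their `book-part-id` values is left open here.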
- Firstly, is my above description correct?
- For my own understanding, why must the `chapter_pdf` filename match `book-part-id`?
- How is this renaming currently handled?
XML book submissions
For the whole book scenario, the manifest file identifies the book PDF as `book_pdf`. Additionally, there can be multiple `chapter_pdf` files supplied, one for each chapter in the book, as in this manifest example:
- The `book_pdf` filename must match the domain ID (the `book-id` provided in the FTP submission).
- The `chapter_pdf` filenames must match `book-part-id`. Again, these ids are only known after source book.xml is converted.
- Both `book_pdf` and `chapter_pdf` files must be sent to NCBI at Load to PMC step.
- How are these multiple `chapter_pdf` files currently matched to the book.xml nodes and renamed? Is it possible for NCBI to continue with the same process?
- In the source XML, does each book-part node have an id? Could that be used as the filename for the PDFs? Or can the `chapter_pdf` filenames be included in the source book.xml file (assuming the XML would still be valid)?
- How common is the use case of multiple `chapter_pdf` files submitted in one book package?