Inconsistent behavior in permitting ingest of tagged XML from PDF
cc @douglassue
Expected behaviour
System creates an error if chapter-processed files do not meet the file naming specifications, regardless of whether they are posted to FTP by taggers or uploaded manually.
Current behaviour
System ingests chapter-processed tagged files from PDF that are not named the same as the source file. Such a file is processed all the way to the publish state.
System creates an error if a user tries to manually upload an edited tagged file that is not named the same as the source file.
Steps to reproduce
- Download files and package as tagged files from PDF2XML taggers
- Once published, download converted file and try to reupload
Solution
Since this problem occurs when the Vendor submits a converted file that should not be accepted, this should be considered a "Tagging error". The solution described here is documented as user story 5a in the BCMS scoping outcomes list.
- Apply a new Status "Tagging error" to the PDF workflow (with the user actions defined in the Status/Action sheet)
- When the Vendor submits a Chapter package with a converted file that does not have the same filename as the existing source file, put the chapter into a "Tagging error" state.
- Report the following error on the errors tab
Name | Category | Severity | Assignee | Error Message |
---|---|---|---|---|
file naming | Tagging error | error | Vendor | The converted file name must match the source file name |
(This is included on the error tracking sheet)
- Email the Sys admin assigned to the Org and the PDF2XML Vendor (this will only be done by #1296)
- The Vendor submits a corrected package and the status of the chapter updates accordingly, or a System admin downloads the converted file, renames it, and uploads it to resolve the "Tagging error" status.
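The filename check and the resulting error record could look roughly like the sketch below. It is only an illustration, not the BCMS implementation: the function name `check_converted_filename` and the record keys are hypothetical, and it assumes that "same filename" means the names match once the extension is dropped (for example a `.pdf` source and an `.xml` converted file), compared case-sensitively.

```python
from pathlib import PurePosixPath
from typing import Optional


def check_converted_filename(source_name: str, converted_name: str) -> Optional[dict]:
    """Return an error record if the names differ, otherwise None.

    Hypothetical helper for illustration only.
    """
    # Assumption: names are compared without their extensions, case-sensitively.
    if PurePosixPath(source_name).stem == PurePosixPath(converted_name).stem:
        return None
    # Field values are taken from the error table in this issue.
    return {
        "name": "file naming",
        "category": "Tagging error",
        "severity": "error",
        "assignee": "Vendor",
        "message": "The converted file name must match the source file name",
    }
```

If the same check runs for both FTP submissions and manual uploads, the two ingest paths behave consistently, which is the expected behaviour described above.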
Priority
Y, if there is no workaround to prevent file version and content integrity issues on Bookshelf
Remaining tasks
Note NCBI task remaining to complete in #804 (comment 97990)
QA Steps
- For PDF chapter-processed books, we validate the name of any file uploaded to the converted section via the UI. If it doesn't match the source file name, the file is not uploaded.
- The tagging-errors status can be caused by a submission via FTP. Steps to reproduce:
1. Create a PDF chapter-processed book
2. Upload any source file
3. Download the vendor-meta.xml from the book component file tab
4. Connect to the APEX FTP
5. Create a package similar to compress18aug.zip, in which the filename in the converted folder doesn't match the source. Replace the vendor-meta.xml with the one you downloaded in step 3.
6. Submit the source file for tagging
7. Submit the package created in step 5 to the FTP folder /done/testbcms
8. When the package is sent to NCBI, the converted file will appear in the UI and the status will be tagging-errors.
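For repeated QA runs, the mismatched test package could be built with a small script instead of by hand. The sketch below is an assumption-heavy illustration: the helper name `build_test_package` is hypothetical, and the archive layout (vendor-meta.xml at the root, the converted file under a `converted/` folder) is inferred from the steps above rather than from a documented package specification. Compare the result against compress18aug.zip before relying on it.

```python
import io
import zipfile


def build_test_package(vendor_meta: bytes, converted_name: str, converted_xml: bytes) -> bytes:
    """Build an in-memory zip with a deliberately misnamed converted file.

    Hypothetical helper; the layout is assumed, not taken from a spec.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # vendor-meta.xml as downloaded from the book component file tab.
        archive.writestr("vendor-meta.xml", vendor_meta)
        # Converted file whose name intentionally differs from the source.
        archive.writestr(f"converted/{converted_name}", converted_xml)
    return buffer.getvalue()
```

The returned bytes can be written to a .zip file and submitted to the FTP folder as described in the steps above.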