Progress review agenda 16 August 2022
Hi @lathrops1 @Kireev @latternm @ErinS @deniskar @douglassue @jordandc @jeffbeckncbi
(cc @bela @danjela @DioneMentis @John.kopanas @pokhi @rudresh @Shanthi_B @shubtiwari @sidorelauku @vignesh03)
Update on release this week
We will do a release tomorrow which will include:
URL | Title |
---|---|
#1076 (closed) | Should not be able to publish Collection TOC unless at least one members (book) is published |
#1078 (closed) | Should not be able to publish collection TOC with chapter-processed books in Published status but without any Published TOC |
#1237 (closed) | Display heading level & Expand heading level |
#1321 (closed) | Inaccurate Component and File Status |
#1375 (closed) | Chapter processed book: Metadata for contributors on book components is not updated from the converted file. |
#1376 (closed) | upload error when uploading converted file |
Subject to timing and outcomes of our final stages of QA, I expect the release to also include:
URL | Title |
---|---|
#804 | Inconsistent behavior in permitting ingest of tagged XML from PDF |
#1219 (closed) | Special character: & not tagged properly |
#1281 (closed) | Permissions for Editors on Grants for Books and Book components. |
#1352 (closed) | Image File extensions don't match when case on file extension differs and create the same version number for each |
#1365 (closed) | Chapter-processed book under wrong collection name navigation |
#1373 (closed) | Chapter processed metadata modal: Book subtitle is missing |
#1330 (closed) | Add Enviroment variable which will show the Bcms login or will redirect to Login NCBI |
And other issues may be included in the release as well if they pass QA.
Other priority MVP issues are in progress or are discussion items to revisit with NCBI to confirm the scope.
Issues I'd like to discuss today (some may be continued in the project manager's meeting tomorrow)
#1332 (closed): Multiple versions appearing on Collection pages
Since related TOC work in #1078 (closed) and #1076 (closed) is ready to be released, we need to finalise the intended scope of #1332 (closed).
We have received the information that 'I cannot answer what the future might hold, but currently all versions of a book have been part of the same collection'. Does that mean we should assume all versions of a book always belong to the same collection, or do we need to accommodate versions of a book belonging to different collections?
Assuming all versions of a book always belong to the same collection, we have the information that 'Collection pages should display latest created book version'. As with the TOC work, does this refer to the latest created version that is published? Or to the latest created version regardless of status, in which case the previously published version of a book would be removed from the collection page as soon as the new version is in 'New version' status?
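To make the two readings concrete for discussion, here is a minimal sketch. The `BookVersion` fields, the status labels and all function names are hypothetical and are not taken from the BCMS code; it also assumes that only a published version can be shown on a collection page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BookVersion:
    # Hypothetical, simplified model of one version of a book.
    version: int
    status: str  # assumed labels: "published" or "new-version"


def latest_created(versions: List[BookVersion]) -> Optional[BookVersion]:
    """Reading A: the latest created version, regardless of status."""
    return max(versions, key=lambda v: v.version, default=None)


def latest_published(versions: List[BookVersion]) -> Optional[BookVersion]:
    """Reading B: the latest created version that is published."""
    published = [v for v in versions if v.status == "published"]
    return max(published, key=lambda v: v.version, default=None)


def shown_on_collection_page(chosen: Optional[BookVersion]) -> bool:
    # Assumption: only a published version can appear on the collection page.
    return chosen is not None and chosen.status == "published"


versions = [BookVersion(1, "published"), BookVersion(2, "new-version")]
reading_a = latest_created(versions)    # picks version 2, which is not published
reading_b = latest_published(versions)  # picks version 1, which stays visible
```

Under reading A the book would disappear from the collection page while version 2 is in 'New version' status; under reading B version 1 remains until version 2 is published.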
#1302 (closed): Book previews sometimes don't show the preview
From the example preview provided, we are loading the iframe received from NCBI, but the link from the preview gives a 404 error saying that the page doesn't exist. In issue #1302 (closed) we have added more examples based on our testing. In many cases we can't replicate the problem even when we use the same content in a new book, so we don't think it's related to the content but rather to NCBI systems.
#1361 (closed) and #933 (closed): Supported converted file sizes
We have run into issues with files of 10 MB, as described in #1361 (closed), since the tool we use for reading metadata does not support files that big, and we had never encountered files over 5 MB before. From the spreadsheet Denis shared with us, it seems there are files as big as 17 MB (for example dia3ed: Diabetes in America. 3rd edition). We are unsure of the exact limit of our tool and need a data dump of all files to test: we know 10 MB is a problem, but we need to see whether 9 MB, 8 MB etc. are a problem too, to assess the extent of the limitation.
Can NCBI provide a data dump of all the files you are migrating or expecting to receive?
We thought the issue reported in #933 (closed) feedback was due to unexpected entities in the file, which we can move past with further development, but the example file provided there is also 10 MB, so we can expect timeouts and also failures to read metadata for files this big.
We need to confirm the priority for these issues (which may be determined by testing results with example files to be provided) since development to use another library to read metadata is a big change and we can’t give an estimate until investigating further with example files and establishing the exact needs.
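Once example files are available, the size testing described above could be run along these lines. This is only a sketch under stated assumptions: `read_metadata` is a hypothetical stand-in for whatever call our metadata tool exposes, failures are assumed to surface as exceptions (timeouts would need separate handling), and the 1 MB bucket width is arbitrary.

```python
import math
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional

MB = 1024 * 1024


def results_by_size(
    paths: Iterable[Path],
    read_metadata: Callable[[Path], object],  # hypothetical metadata reader
) -> Dict[int, List[bool]]:
    """Group pass/fail results into 1 MB buckets, keyed by the bucket's upper bound in MB."""
    results: Dict[int, List[bool]] = defaultdict(list)
    for path in paths:
        size_mb = max(1, math.ceil(path.stat().st_size / MB))
        try:
            read_metadata(path)
            ok = True
        except Exception:
            # Assumption: the reader signals an unsupported file by raising.
            ok = False
        results[size_mb].append(ok)
    return dict(results)


def smallest_failing_bucket(results: Dict[int, List[bool]]) -> Optional[int]:
    """Return the smallest size bucket (in MB) containing at least one failure, if any."""
    failing = [size for size, oks in results.items() if not all(oks)]
    return min(failing) if failing else None
```

Running this over the data dump would show whether failures start at 10 MB or earlier, which is the information we need before estimating any change of library.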
#1346 (closed): Should not be able to add Content type for chapter-processed books, but should be able to add Content type for chapter versions, and other related changes
We should discuss and further scope the needs, but there are risks in making such fundamental changes this close to deployment.
A brief summary of some of the proposed changes, as I understand them at present, is as follows:
- Remove content type and version number from chapter processed books at the book level: we can do this, but it does contradict what we have in our Book Versions epic: 'NCBI expects that in the future users will need book versions to support changing from wholebook-processing to chapter-processing', where content type would be required for that. The epic does also say 'NCBI has also confirmed that when this support is required, this will not include support for a book that has both book versions and chapter versions', so would we not instead develop a restriction so that if you turn on the setting for 'support multiple chapter published versions' in a chapter processed book (perhaps moved to the new book step), you can't add a content type at the book level for that book?
- In chapter processed books, chapter published versions which are 'Author manuscripts' have a processing instruction (PI) in book metadata, but in the BCMS, NCBI don't want the user to apply this value at the book level (i.e. they should apply it at the chapter level, and we write it into the book metadata for those chapters only). Possibly we could overwrite the processing instructions for book metadata with the PI from the chapter. But we are not sure it is sound to develop it this way, and there are related risks we'd need to examine.
- In chapter processed books, for chapter versions, NCBI has asked that for MVP we add the content type to each chapter published version (which as per point 2 would need to be written into the book metadata of the chapter but not applied at the book level), and allow the user to input a custom version number for each chapter version. Because PMC doesn't support content type, versions with the same number will be considered duplicates: e.g. if Manuscript v1 of a chapter is published and then (maybe later, but unlikely at deployment) Final full text v1 of that chapter, PMC would overwrite Manuscript v1. I want to be sure that this is desired (and understand why custom version numbers are needed) and that the proposal is technically okay for NCBI.
- We should support all four content types (version names) for chapter published versions in the PDF workflow: currently only manuscript is supported, and it is applied at the book level and is therefore present in book metadata for all chapters in that book.
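The overwrite concern in the third point can be illustrated with a small sketch. It assumes, as described above, that PMC identifies a chapter version by chapter and version number only and ignores content type; the record shape and the function name are hypothetical and do not represent any PMC or BCMS API.

```python
from typing import Dict, List, Tuple

# (chapter id, content type, version number): hypothetical record shape.
ChapterVersion = Tuple[str, str, int]


def find_overwrites(
    versions: List[ChapterVersion],
) -> List[Tuple[ChapterVersion, ChapterVersion]]:
    """Return (overwritten, overwriting) pairs when the key ignores content type."""
    seen: Dict[Tuple[str, int], ChapterVersion] = {}
    overwrites = []
    for record in versions:  # assumed to be in publication order
        chapter_id, _content_type, number = record
        key = (chapter_id, number)  # content type is deliberately not part of the key
        if key in seen:
            overwrites.append((seen[key], record))
        seen[key] = record
    return overwrites


published = [
    ("ch1", "Manuscript", 1),
    ("ch1", "Final full text", 1),  # same chapter and number as the manuscript
    ("ch2", "Manuscript", 1),
]
collisions = find_overwrites(published)
```

Here `collisions` contains one pair: Final full text v1 of `ch1` replaces Manuscript v1 of `ch1`, while `ch2` is unaffected.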
Besides these specific concerns and questions about #1346 (closed) above, my other concern is that:
Book component versions are approximately 2 years old and are used everywhere in the system. If you have a CPB (chapter processed book) with ten published chapters and then upload ten files in bulk upload, the chapter versions increment automatically. This has been developed per docs here and here (note this does apply to all workflows). In that bulk upload flow the user can't add a version name (content type) or custom version number to a chapter. We would have to remove the entire bulk upload version-incrementing process, and versions could no longer be incremented automatically. The change also affects TOC generation. Bulk upload and TOC generation are both key functionalities that have been built on the understanding that the BCMS automatically increments version numbers and doesn't allow a chapter-level version name (content type).
I understand that NCBI have said we can't have book versions AND multiple published chapter versions for the same book. Before the information in #1346 (closed), however, we had not understood that we can't have a version number and content type at the book level for chapter processed books. Since we don't support multiple book versions of a chapter processed book, NCBI's statement holds at present: although chapter processed books have a content type (version name) and a version number of 1, there will not be another version of these books.
Relatedly, we should discuss how these PIs are used by NCBI, in connection with the new issue #1378 (closed) (Support for processing instructions in chapter-processed books / collections) this week, which unfortunately I haven't had a chance to review yet.