SET UP / CREATE: ID Use Cases and Proposals (along with other searchable fields)
Updated 20 May 2021
This diagram, copied below, reflects the decision made in this issue.
For Phase 1 we agreed to support search by BCMS ID and title.
[Original description]
On Aug 11, 2020, while project scoping and design sessions were taking place, the following IDs for Bookshelf were supplied as ID use cases, based on time-intensive, exhaustive data pulls and analysis. They relate to the following:
- Rich metadata for tagged data
- Mapping and identifying content in BCMS to other integration points
- Finding content in BCMS
These are the IDs we asked to be supported:
Collection
- PMCID
- BCMSID*
- PUBLISHERID
- ISSN
- FUNDERID (As supported by NLMGrantHub and in XML data with the authority assignee for the id)*
- NLMCATALOGID*
- Volume and Issue (they just need to be done by filter and sort in table)
- Collection Name?
Book
- PMCID
- BCMSID*
- PUBLISHERID
- DATASUPPLIERID*
- DOI
- NBKID*
- PMID*
- ISBN
- FUNDERID (As supported by NLMGrantHub and in XML data with the authority assignee for the id)*
- AWARDID (As supported by NLMGrantHub and in XML data with the authority assignee for the id)*
- ORCID
- NLMCATALOGID*
- Title?
- Author / Editor Name?
Book Part
- PMCID
- BCMSID*
- PUBLISHERID
- DATASUPPLIERID*
- DOI
- NBKID*
- PMID*
- FUNDERID (As supported by NLMGrantHub and in XML data with the authority assignee for the id)*
- AWARDID (As supported by NLMGrantHub and in XML data with the authority assignee for the id)*
- ORCID
- Book Part Title?
- Author / Editor Name?
- Rare: E-LOCATION
Organization
- FUNDERID (As supported by NLMGrantHub and in XML data with the authority assignee for the id)*
- BCMSORGID*
- NLMPUB ID (from publisher portal)*
- NLMAGREEMENT ID (from publisher portal)*
- Organization name?
- Publisher name? (may be imprint of an Org)
Key:
*An ID that is not always provided in XML data, but is a persistent ID for Bookshelf content that is meaningful for discovery, retrievability, identification, policy, etc.
?Signifies data that may or may not be associated with a unique ID, but that we think is important data users will search on to find content in the system
Note: we have users who will also want to add a wildcard list of strings to retrieve more than one record in BCMS
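The wildcard-list requirement in the note above could look roughly like the following. This is a minimal sketch only: the function name `match_ids` and the sample ID values are hypothetical, and it assumes shell-style wildcards (`*`, `?`) with case-insensitive matching, none of which has been decided in this issue.

```python
from fnmatch import fnmatchcase


def match_ids(patterns, known_ids):
    """Return every known ID that matches at least one wildcard pattern.

    Matching is case-insensitive (an assumption); results keep the order
    of known_ids and each ID appears at most once.
    """
    cleaned = [p.strip().upper() for p in patterns if p.strip()]
    matched = []
    for candidate in known_ids:
        if any(fnmatchcase(candidate.upper(), p) for p in cleaned):
            matched.append(candidate)
    return matched


# Hypothetical ID values, for illustration only.
records = ["BCMS101", "BCMS102", "BCMS210", "NBK5000"]
print(match_ids(["BCMS10*", "NBK5???"], records))
```

A user would paste a list of strings such as `BCMS10*` and get back every matching record in one request, rather than searching ID by ID.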
We **NOW** have the following immediate technical issues:
- How to map NCBI domains to Bookshelf BCMS IDs
- How to map the book-id attribute of book-submit in the FTP File Submission XML Meta Information (meta.xml) to BOTH the NCBI domain AND the BCMS ID
- What ID is named when a collection or book record is created in BCMS and how is that mapped to other domains?
- In Phase 1, which IDs do we support and how?
- Which IDs can be used for data integrity checks to match possible duplicates, or submissions from multiple submission points that are not in communication?
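To make the meta.xml mapping question concrete, here is a rough sketch of reading the submitted book-id and resolving it to both a BCMS ID and an NCBI domain. Only the `book-submit` element and its `book-id` attribute come from this issue. The sample XML content, the lookup tables, and the names `read_submit_id` and `resolve` are hypothetical placeholders, not the actual file layout or BCMS design.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta.xml content. Only the book-submit element and its
# book-id attribute are named in this issue; the value is made up.
SAMPLE_META = '<book-submit book-id="examplebook"></book-submit>'

# Hypothetical lookup tables keyed by the submitted book-id.
SUBMIT_TO_BCMS = {"examplebook": "BCMS1"}
SUBMIT_TO_DOMAIN = {"examplebook": "examplebook-domain"}


def read_submit_id(meta_xml):
    """Return the book-id attribute of book-submit, or None if absent."""
    root = ET.fromstring(meta_xml)
    # book-submit may be the root element or nested; handle both cases.
    node = root if root.tag == "book-submit" else root.find(".//book-submit")
    return None if node is None else node.get("book-id")


def resolve(meta_xml):
    """Map the submitted book-id to BOTH a BCMS ID and an NCBI domain."""
    submit_id = read_submit_id(meta_xml)
    return {
        "submit_id": submit_id,
        "bcms_id": SUBMIT_TO_BCMS.get(submit_id),
        "ncbi_domain": SUBMIT_TO_DOMAIN.get(submit_id),
    }


print(resolve(SAMPLE_META))
```

An unknown book-id resolves to `None` for both targets, which is the case where staff would need to create or link a record by hand.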
I propose the following for long-term vision and hopefully minimum requirements for Phase 1 implementation:
- Every container record in BCMS receives a BCMS ID with the syntax BCMS#, numbered consecutively from the time of first creation
- This BCMS ID is used to map to ALL other IDs for the following use cases:
  - Mapping to existing legacy PMCBook domains
  - Mapping to, or creating, new PMCBook domains
  - Checking for database integrity (possible dupes or updates) based on a DOI, PUBLISHERID, ISBN, or other unique persistent IDs for the BOOK or CHAPTER
  - Mapping to IDs in FTP-provided meta.xml files (see below)
- We give new names to the meta.xml book-submit book-id attributes (not to be tagged in BITS XML to go to PMC Load) that are recorded in BCMS ONLY for mapping and findability:
  - BOOKSUBMITID
  - CHAPTERSUBMITID
  - COLLECTIONSUBMITID
  - AGENCYSUBMITID
- In Phase 1, we require that all non-BCMS, non-NCBI IDs be provided in the supplied source files, EXCEPT for funded submissions. For funded submissions, we define in PUBLISHER / COLLECTION metadata templates which IDs (and other metadata fields) are required, and these are output in some easy export file (Excel?) to send to the PDF2XML tagger to add to the BITS BXML converted files. I propose we only require ID fields that we know will be necessary for integrity checks to avoid duplicate records in Bookshelf and PubMed (e.g., DOI OR ISBN). This set of IDs WILL be tagged in BXML according to current Bookshelf tagging guidelines.
- For search and findability in Phase 1, we permit either search or hackable URLs by “meaningful” IDs for Bookshelf Staff / External Users, AND title search. Which of the two needs to be further determined based on the expected users in Phase 1 and what is easiest to implement. I'm not confident what will be "meaningful" for external users.
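The proposals above (consecutive BCMS IDs, mapping to all other IDs, integrity checks, and search by BCMS ID and title) can be sketched together as a small in-memory registry. This is an illustration under stated assumptions, not a proposed implementation: the class and method names are hypothetical, numbering is assumed to start at 1, the set of integrity-check ID types is taken from the examples in the list, and ID values are compared case-insensitively.

```python
from dataclasses import dataclass, field
from itertools import count

# Assumption: ID types used for duplicate checks, per the examples above.
INTEGRITY_ID_TYPES = ("DOI", "PUBLISHERID", "ISBN")


@dataclass
class Record:
    bcms_id: str
    title: str
    external_ids: dict = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory stand-in for BCMS container records."""

    def __init__(self):
        self._counter = count(1)  # assumption: numbering starts at 1
        self._records = {}        # bcms_id -> Record
        self._index = {}          # (id_type, normalized value) -> bcms_id

    @staticmethod
    def _key(id_type, value):
        return (id_type.upper(), value.strip().lower())

    def find_duplicate(self, external_ids):
        """Return the BCMS ID already holding one of these IDs, if any."""
        for id_type in INTEGRITY_ID_TYPES:
            value = external_ids.get(id_type)
            if value and self._key(id_type, value) in self._index:
                return self._index[self._key(id_type, value)]
        return None

    def create(self, title, external_ids):
        """Assign the next BCMS ID unless an integrity check flags a dupe.

        external_ids is expected to use upper-case type names as keys,
        e.g. {"DOI": "...", "BOOKSUBMITID": "..."}.
        """
        existing = self.find_duplicate(external_ids)
        if existing is not None:
            raise ValueError(f"possible duplicate of {existing}")
        bcms_id = f"BCMS{next(self._counter)}"
        self._records[bcms_id] = Record(bcms_id, title, dict(external_ids))
        for id_type, value in external_ids.items():
            self._index[self._key(id_type, value)] = bcms_id
        return bcms_id

    def by_external_id(self, id_type, value):
        """Map any recorded external ID back to its BCMS ID."""
        return self._index.get(self._key(id_type, value))

    def search(self, query):
        """Phase 1 style search: exact BCMS ID or substring of the title."""
        q = query.strip().lower()
        if not q:
            return []
        return [r.bcms_id for r in self._records.values()
                if r.bcms_id.lower() == q or q in r.title.lower()]


# Hypothetical values, for illustration only.
registry = Registry()
first = registry.create("Example Book", {"DOI": "10.1000/example"})
print(first, registry.search("example"))
```

Keeping a single index keyed by ID type and value means the same lookup serves both findability (any recorded ID leads to the BCMS ID) and the duplicate check at creation time.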
If a proposed Wireframe is necessary, I can create one on request.