Enhance BCMS logs for regular monitoring, troubleshooting, and resolution of application instability
Context
Background
This relates to searching application logs in #1493 (closed). The BCMS currently has extensive logs; however, updates will be needed. For example, we should review all the API endpoints that a user initiates and add any missing logs.
User Story to Satisfy
BCMS users frequently see the application show a red pop-up error, go blank, and/or return a Gateway error. Users then usually refresh their browser repeatedly to try to recover what they were doing. Despite extensive requests to users to provide screenshots or videos, or to describe what they were doing when this happened, we have not been able to fully identify the source of all these crashes without access to logs as part of maintaining the system. We concluded that any action in the BCMS that could be a potential cause of instability must be sufficiently logged, with enough data for reliable monitoring and troubleshooting, so that root causes can be identified and rapidly resolved as part of supported maintenance. All of this is necessary for the stability of Bookshelf operations.
Proposal
NCBI to provide a prioritised list of log expectations. From the proposal:
...developers can search logs by crashes by time, frequency, BCMS URL, any other relevant variables (to be defined).
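As a rough illustration of the kind of search described in the proposal, the sketch below filters crash events by time window and URL and counts them per URL. It assumes logs are written one JSON object per line with `timestamp`, `event`, and `url` fields. These names and the format are hypothetical and are not taken from the current BCMS logs, and the relevant variables are still to be defined.

```python
import json
from collections import Counter
from datetime import datetime
from typing import Iterable, Iterator, Optional


def search_crashes(
    lines: Iterable[str],
    start: datetime,
    end: datetime,
    url_prefix: Optional[str] = None,
) -> Iterator[dict]:
    """Yield crash records with start <= timestamp < end, optionally by URL.

    Assumes each line is a JSON object with hypothetical fields
    "timestamp" (ISO 8601 with offset), "event", and "url".
    start and end must be timezone-aware, like the logged timestamps.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("event") != "crash":
            continue
        when = datetime.fromisoformat(record["timestamp"])
        if not (start <= when < end):
            continue
        if url_prefix and not record.get("url", "").startswith(url_prefix):
            continue
        yield record


def crash_frequency(records: Iterable[dict]) -> Counter:
    """Count crash records per URL to show where crashes cluster."""
    return Counter(record.get("url", "unknown") for record in records)
```

Passing the output of `search_crashes` to `crash_frequency` gives a per-URL count for the chosen time window, which covers the "time, frequency, BCMS URL" variables named in the proposal.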
Design
NA
Acceptance criteria
- Logs will report any instability of the application, including:
  - crashes or outages, with date and time of event
  - restarts of the application, with date and time of event
- All activities conducted by users and the system can be queried from the logs, including:
  - creation or updates to users and organizations and their settings
  - creation or updates to book, book part, and collection components and their metadata and settings
  - creation and update of parts and their metadata
  - manual or FTP uploads of files, including packages ingested by tagging vendors or NCBI and/or their failed ingests
  - all supported workflow actions, including submission, loading to preview, and publishing of components, including book and collection TOCs
  - management of chapter-processed contents through move and repeat functionality
- All logs will include the following details:
  - user ID (with the understanding that the username will be available if/when we expose user history in the UI) or the system that conducted the action, required
  - date and time of the action, all in the same time zone, where that time zone is known and documented, required
  - domain name, when relevant
  - BCMS ID, when relevant, for instance, when creating components and uploading / processing files in those components
  - filenames with component.version number, when relevant, for instance, for uploads of files and workflow actions
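To make the required and "when relevant" details concrete, here is a minimal sketch of a helper that builds one log line. The function name, the field names, the JSON format, and the choice of UTC as the single documented time zone are all assumptions for illustration. The ticket does not specify any of them.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_log_entry(
    action: str,
    actor: Optional[str] = None,
    domain: Optional[str] = None,
    bcms_id: Optional[str] = None,
    filename: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Return one JSON log line (hypothetical format) with the listed details."""
    if not action:
        raise ValueError("action is required")
    moment = now or datetime.now(timezone.utc)
    if moment.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    entry = {
        # Required: every timestamp uses one documented zone (UTC assumed here).
        "timestamp": moment.astimezone(timezone.utc).isoformat(),
        # Required: the user ID, or "system" when no user initiated the action.
        "actor": actor or "system",
        "action": action,
    }
    # "When relevant" details are included only if the caller supplies them.
    optional = {"domain": domain, "bcms_id": bcms_id, "filename": filename}
    entry.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(entry, sort_keys=True)
```

Leaving optional fields out rather than logging empty values keeps entries short. Whether absent or explicit null fields are easier to query is a design choice to agree on with NCBI.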
Definition of ready
- BCMS User Story / Context has been well defined
- The priority of the user story is specified and agreed
- Digital assets added (design, database schema, mockups etc. if relevant)
- Coko Technical Proposal approved by NCBI
- Testable Acceptance Criteria approved by NCBI
- Estimate of effort to complete (time or points)
- The issue has been broken down into development tasks (if necessary)
- Requirements Clarified
- The product owner and development team agree that the user story is ready for development
- NCBI adds “Dev_Ready”
Definition of done
- All coding tasks are finished and implemented
- QA approved
- Deployed and tested on “ncbidev” (by Coko team)
- Deployed and tested on “ncbi” (by NCBI team)
- Acceptance Criteria Met
Implementation
Alternative approaches (if applicable)
Scheduling
- Milestone is linked
- Iteration is linked
- Dependencies: ("None" or list issue numbers if relevant)
- Development estimate is added to issue time tracking
Development estimate for enhancing existing BCMS logs: 2 days at start of project; then 5 days spread over the project.