[Kotahidev] Dashboard errors (database load)
Users are experiencing intermittent errors on the Dashboard.
Feedback from @BenWh
The intermittent dashboard crashes you're seeing on Kotahidev are a result of too many database connections at once. It happens when there are too many database queries and/or the queries are taking too long to complete. It's an error coming from the database itself, and it's probably starting to appear now because we have so many manuscripts in the system on kotahidev, meaning that in some cases queries may run longer, and in other cases we might be running more queries in parallel to get data relating to individual manuscripts.
There are several possible things we can do about this:
- Most crudely, we can increase the number of permitted connections, though this may increase memory overheads for the DB. This is something Cloud68 would have to do per database. I'd prefer not to have to do this in the long run, as the app should ideally work with databases with default settings.
- We can hunt in code for queries that are being run in parallel, such as within a forEach, and chunk them so that no more than, say, 10 run simultaneously, and the rest wait until those are finished.
- We can hunt in code for queries that should be combined, or queries that are wasteful because we already have the information. In some cases withGraphFetched could be replaced with withGraphJoined.
- We can hunt for queries that are long-running due to being poorly written or lacking the appropriate DB indexes.
- We can do some database logging to find frequent and long-running queries.
- To avoid the same DB query being repeated by multiple resolvers (and by graphql-shield code), we can use the DataLoader library to batch and cache DB requests
- some frequently accessed and rarely changed queries, such as group and config data, could be cached in memory
- We could possibly use a specialised cache like Redis