DevOps - Prometheus & Grafana for Kotahi monitoring and alerting
EC2:
- create a docker-compose file for the self-hosted stack "cAdvisor + Node Exporter + Prometheus + Grafana"
- deploy the stack on the eLife Kotahi prod server
- add Grafana dashboards for OS and Docker metrics
- add a Mattermost notification channel in Grafana
- send Grafana admin credentials to Paul and user credentials to Ryan, Ben and Vadim
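
A minimal sketch of what the compose file for the four-service stack could look like, built as a Python dict and printed as JSON (Compose generally accepts JSON because it is valid YAML). Everything here is an assumption rather than the deployed configuration: the image names, the published ports, the host mounts, the `grafana-data` volume name and the `./prometheus.yml` path are common defaults chosen for illustration.

```python
import json


def build_compose():
    """Return a Compose definition for cAdvisor, Node Exporter, Prometheus and Grafana."""
    return {
        "services": {
            "cadvisor": {
                # Assumed image; cAdvisor reads container stats from these host paths.
                "image": "gcr.io/cadvisor/cadvisor",
                "volumes": [
                    "/:/rootfs:ro",
                    "/var/run:/var/run:ro",
                    "/sys:/sys:ro",
                    "/var/lib/docker/:/var/lib/docker:ro",
                ],
                "restart": "unless-stopped",
            },
            "node-exporter": {
                # Assumed image; exposes OS metrics on port 9100 inside the compose network.
                "image": "prom/node-exporter",
                "restart": "unless-stopped",
            },
            "prometheus": {
                "image": "prom/prometheus",
                # Hypothetical local scrape config mounted at the default config path.
                "volumes": ["./prometheus.yml:/etc/prometheus/prometheus.yml:ro"],
                "ports": ["9090:9090"],
                "depends_on": ["cadvisor", "node-exporter"],
                "restart": "unless-stopped",
            },
            "grafana": {
                "image": "grafana/grafana",
                "ports": ["3000:3000"],
                # Named volume so dashboards and users survive container restarts.
                "volumes": ["grafana-data:/var/lib/grafana"],
                "depends_on": ["prometheus"],
                "restart": "unless-stopped",
            },
        },
        "volumes": {"grafana-data": {}},
    }


if __name__ == "__main__":
    # Print the definition; redirect to a file to use it with Compose.
    print(json.dumps(build_compose(), indent=2))
```

The output can be redirected to a file such as `docker-compose.json` and passed to Compose with `-f`. Only Prometheus and Grafana publish ports in this sketch, so the two exporters stay reachable only from inside the compose network.
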
EKS:
- install Prometheus in the eks-elife-kotahi cluster and check targets
- install Grafana in the eks-elife-kotahi cluster and check access
- import the needed dashboards into Grafana (imported https://grafana.com/grafana/dashboards/10670)
- configure and check the notification channel "mattermost-elife-kotahi-alerts"
- configure the domain elife-grafana.kotahi.cloud and a TLS certificate in ACM
- add accounts for Paul, Ryan, Ben, Vadim and Mihail in Grafana and send credentials
- install Loki and Promtail in the eks-elife-kotahi cluster for container log parsing
- create a Loki datasource and check Kotahi logs through it in Grafana Explore
- import and check "Loki Dashboard quick search" and "Logging Dashboard via Loki" in Grafana (https://grafana.com/grafana/dashboards/12019, https://grafana.com/grafana/dashboards/12611)
- document all installation/configuration steps in https://gitlab.coko.foundation/kotahi/endava-infra/-/blob/master/k8s/monitoring/README.md
- check the connection of the Prometheus/Loki datasources from the eLife k8s cluster to the Grafana Cloud account
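
The "check targets" step above can also be scripted against the Prometheus HTTP API instead of the web UI. The sketch below is one possible way to do it, not the procedure used in this task. It relies on the `/api/v1/targets` endpoint and its `activeTargets` entries (`labels`, `scrapeUrl`, `health`, `lastError`). The base URL passed to `fetch_targets` and the addresses in `SAMPLE` are hypothetical.

```python
import json
import urllib.request


def fetch_targets(base_url, timeout=10):
    """Fetch the targets payload from a Prometheus server, e.g. via a port-forward."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_targets(payload):
    """Return (job, scrape_url, last_error) for every active target that is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            t.get("labels", {}).get("job", "?"),
            t.get("scrapeUrl", "?"),
            t.get("lastError", ""),
        )
        for t in targets
        if t.get("health") != "up"
    ]


# Hypothetical payload in the shape returned by /api/v1/targets, for an offline check.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "node-exporter"},
                "scrapeUrl": "http://10.0.0.1:9100/metrics",
                "health": "up",
                "lastError": "",
            },
            {
                "labels": {"job": "cadvisor"},
                "scrapeUrl": "http://10.0.0.2:8080/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
        ]
    },
}

if __name__ == "__main__":
    for job, url, err in unhealthy_targets(SAMPLE):
        print(f"{job} {url} {err}")
```

Against a live cluster, `unhealthy_targets(fetch_targets("http://localhost:9090"))` would do the same check, assuming Prometheus has been made reachable on that address (for example with a port-forward). An empty list means every active target reported `up`.
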
sources: