add prometheus metrics for errors when getting certificates through acme (typically from let's encrypt)

and add an alerting rule for it.
we certainly want a heads-up when there are issues with the certificates.
Mechiel Lukkien
2025-02-06 15:12:36 +01:00
parent 1277d78cb1
commit e5e15a3965
2 changed files with 29 additions and 2 deletions

@@ -8,6 +8,11 @@ groups:
     annotations:
       summary: unhandled panic
 
+  - alert: mox-acme-request-cert-errors
+    expr: increase(mox_autotls_cert_request_errors_total[1h]) > 0
+    annotations:
+      summary: errors requesting tls certificates with acme
+
   - alert: mox-ip-on-dns-blocklist
     expr: mox_dnsbl_ips_success < 1
     annotations:
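
The new rule fires when the error counter grew at any point in the last hour. As a rough illustration of what `increase(...[1h]) > 0` checks, here is a minimal Go sketch over raw counter samples. The names `counterIncrease` and `shouldAlert` and the sample values are hypothetical and not part of mox or Prometheus. The model is simplified: Prometheus also extrapolates to the window boundaries and returns no result for fewer than two samples, which this sketch ignores.

```go
package main

import "fmt"

// counterIncrease returns how much a counter grew across the given samples
// (oldest first). A drop between two samples is treated as a counter reset,
// e.g. a process restart, so the new value counts as growth from zero.
// Hypothetical helper: a simplified model, not the Prometheus implementation.
func counterIncrease(samples []float64) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur < prev {
			// Counter reset: everything in cur accumulated after the restart.
			total += cur
		} else {
			total += cur - prev
		}
	}
	return total
}

// shouldAlert mirrors the rule's condition "increase(...) > 0".
func shouldAlert(samples []float64) bool {
	return counterIncrease(samples) > 0
}

func main() {
	quiet := []float64{4, 4, 4, 4}   // no new errors in the window
	failing := []float64{4, 4, 5, 7} // three new errors in the window
	restarted := []float64{4, 0, 1}  // reset, then one new error

	fmt.Println(shouldAlert(quiet))     // false
	fmt.Println(shouldAlert(failing))   // true
	fmt.Println(shouldAlert(restarted)) // true
}
```

Alerting on the increase rather than on the counter's absolute value means errors from before the window stop triggering the alert once an hour passes without new ones.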