smtpserver: add prometheus metric for failing starttls handshakes for incoming deliveries

and add an alerting rule that fires if the failure rate exceeds 10% (e.g. due
to an expired certificate).

the prometheus metric includes a reason, including potential tls alerts, if
remote smtp clients send those (openssl s_client -starttls does).

inspired by issue #237, where incoming connections were aborted by remote. such
errors would show up as "eof" in the metrics.
Mechiel Lukkien
2024-11-29 12:43:21 +01:00
parent 09e7ddba9e
commit afb182cb14
5 changed files with 63 additions and 5 deletions

@@ -62,9 +62,14 @@ groups:
   # the alerts below can be used to keep a closer eye or when starting to use mox,
   # but can be noisy, or you may not be able to prevent them.
+  - alert: mox-incoming-delivery-starttls-errors
+    expr: sum by (instance) (increase(mox_smtpserver_delivery_starttls_errors_total[1h])) / sum by (instance) (increase(mox_smtpserver_delivery_starttls_total[1h])) > 0.1
+    annotations:
+      summary: starttls handshake errors for >10% of incoming smtp delivery connections
   # change period to match your expected incoming message rate.
   - alert: mox-no-deliveries
-    expr: sum(rate(mox_smtpserver_delivery_total{result="delivered"}[6h])) == 0
+    expr: sum by (instance) (rate(mox_smtpserver_delivery_total{result="delivered"}[6h])) == 0
     annotations:
       summary: no mail delivered for 6 hours