# Expand Kapacitor rules to alert on failed validate_drp for HSC and CFHT separately

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels: None
• Story Points: 1
• Team: SQuaRE

#### Description

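We recently found that the deadman alert on the validate_drp processing in the nightly was looking at any data landing in the validate_drp measurements, regardless of dataset. This meant that if just one of the two runs (HSC or CFHT) failed, there would be no notification. The rules should be expanded so that each dataset is monitored separately.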
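
The difference between the combined check and a per-dataset check can be sketched in plain Python. This is only an illustration of the logic, not the actual Kapacitor rule: the function names, the timestamps, and the 24-hour window are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def any_data_alive(last_seen, now, window):
    """Combined check: healthy as long as ANY dataset reported in the window."""
    return any(now - seen <= window for seen in last_seen.values())


def silent_datasets(last_seen, expected, now, window):
    """Per-dataset check: list every expected dataset with no recent data."""
    return sorted(
        name
        for name in expected
        if name not in last_seen or now - last_seen[name] > window
    )


# Hypothetical timestamps: HSC reported recently, CFHT has been silent.
now = datetime(2019, 6, 1, 12, 0)
window = timedelta(hours=24)
last_seen = {
    "HSC": now - timedelta(hours=2),
    "CFHT": now - timedelta(hours=50),
}

combined_ok = any_data_alive(last_seen, now, window)
silent = silent_datasets(last_seen, ["HSC", "CFHT"], now, window)

print(combined_ok)  # the combined check still looks healthy
print(silent)       # the per-dataset check flags the silent dataset
```

With one rule per dataset, a silent CFHT run raises its own alert even while HSC data keeps arriving.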

#### Activity

Simon Krughoff added a comment -

I believe I have implemented this, but I don't know exactly how to test it. See rules named "Is validate_drp running for HSC" and "Is validate_drp running for CFHT" here.

Simon Krughoff added a comment -

Angelo Fausti will you have a look at these and let me know if you think they are alright? If there is a way to test them without making the nightly fail, let me know.

Angelo Fausti added a comment - edited

Simon Krughoff, that looks great. I see the new notifications in #dm-squash-alerts.

I think we can remove the original one... Just did that, and it's nice to see the @ mention working.

  validade_drp status changed to {{.Level}} for CFHT <@U06DGJCTB> please check. 

Angelo Fausti added a comment -

Also, the Kapacitor command line client has a record/replay feature that can be used to test the alert rules. I have never used it, but it seems very useful:

https://docs.influxdata.com/kapacitor/v1.5/working/cli_client/#replay
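
A rough sketch of how the record/replay commands might be assembled from Python follows. The `kapacitor record stream` and `kapacitor replay` subcommands and their `-task`, `-duration`, and `-recording` flags are as I recall them from the Kapacitor 1.5 documentation and should be checked against the page linked above. The task ID and recording ID are hypothetical placeholders, and the commands are only printed, not executed.

```python
def record_stream_cmd(task_id, duration):
    """Build the command that records live stream data for a task."""
    return ["kapacitor", "record", "stream", "-task", task_id, "-duration", duration]


def replay_cmd(task_id, recording_id):
    """Build the command that replays a saved recording against a task."""
    return ["kapacitor", "replay", "-recording", recording_id, "-task", task_id]


# Hypothetical task ID; the real rule IDs are not given in this ticket.
task = "validate_drp_cfht"

record_line = " ".join(record_stream_cmd(task, "60s"))
replay_line = " ".join(replay_cmd(task, "RECORDING_ID"))

print(record_line)
print(replay_line)
```

Passing the argument lists to `subprocess.run` would execute them on a host where the `kapacitor` client is installed and configured.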

Simon Krughoff added a comment -

The replay functionality may be exactly what I want. Thanks!

If it's ok with you, I'm going to mark this done.

Angelo Fausti added a comment -

Sounds good. Adding more info to this ticket from our discussion on Slack, and marking as reviewed.

• Confirming that the Kapacitor HTTP API does support the record/replay functionality. We could wrap that in the squash client for testing alert rules and notifications.
• Currently it is not possible to test alert rules from the Chronograf UI, but there's an open issue on GitHub for that.
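
One way such a wrapper might look is sketched below. It only builds the HTTP requests and does not send them. The endpoint paths (`/kapacitor/v1/recordings/stream`, `/kapacitor/v1/replays`) and the JSON fields are as I recall them from the Kapacitor 1.5 HTTP API documentation and should be verified there. The base URL, task ID, and recording ID are assumptions for illustration, and none of this reflects the actual squash client code.

```python
import json
from urllib import request

# Assumption: Kapacitor listening on its default port on the local host.
KAPACITOR_URL = "http://localhost:9092"


def build_post(path, payload):
    """Build (but do not send) a JSON POST request to the Kapacitor API."""
    return request.Request(
        KAPACITOR_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def record_stream_request(task_id, stop):
    """Request a stream recording for a task until the given stop time."""
    return build_post(
        "/kapacitor/v1/recordings/stream", {"task": task_id, "stop": stop}
    )


def replay_request(task_id, recording_id):
    """Request a replay of a saved recording against a task."""
    payload = {
        "task": task_id,
        "recording": recording_id,
        "clock": "fast",          # replay as fast as possible
        "recording-time": False,  # shift recorded times to the present
    }
    return build_post("/kapacitor/v1/replays", payload)


# Hypothetical identifiers, for illustration only.
req = replay_request("validate_drp_cfht", "RECORDING_ID")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending a request would be a matter of passing it to `urllib.request.urlopen` once the endpoint details have been confirmed.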
Simon Krughoff added a comment -

The rules were triggered over the weekend and seemed to work generally as expected.


#### People

• Assignee: Simon Krughoff
• Reporter: Simon Krughoff
• Reviewers: Angelo Fausti
• Watchers: Angelo Fausti, Simon Krughoff