  1. Data Management
  2. DM-20808

Expand Kapacitor rules to alert on failed validate_drp for HSC and CFHT separately


Details

    • Story
    • Status: Done
    • Resolution: Done
    • None
    • squash
    • None

    Description

      We recently found that the deadman alert on the validate_drp processing in the nightly was only checking whether any data at all landed in the validate_drp measurements. This meant that if just one of the HSC or CFHT runs failed, there would be no notification, because data from the other run kept the alert quiet.
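
The difference between the old and new behavior can be illustrated with a small Python sketch. This is not the Kapacitor rule itself; the function names, the 24-hour window, the timestamps, and the representation of points as (dataset, timestamp) pairs are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def aggregate_deadman(points, now, window):
    """Old behavior (sketch): alert only if NO point from any dataset is recent."""
    cutoff = now - window
    return not any(ts >= cutoff for _, ts in points)


def silent_datasets(points, expected, now, window):
    """New behavior (sketch): report each expected dataset with no recent point."""
    cutoff = now - window
    seen = {dataset for dataset, ts in points if ts >= cutoff}
    return sorted(set(expected) - seen)


# Hypothetical situation: HSC reported 2 hours ago, CFHT last reported 30 hours ago.
now = datetime(2019, 8, 1, 12, 0)  # arbitrary illustrative timestamp
window = timedelta(hours=24)       # assumed deadman window
points = [
    ("HSC", now - timedelta(hours=2)),
    ("CFHT", now - timedelta(hours=30)),
]

print(aggregate_deadman(points, now, window))                 # single combined check
print(silent_datasets(points, ["HSC", "CFHT"], now, window))  # per-dataset check
```

With these inputs the combined check stays silent because HSC data is still arriving, while the per-dataset check flags CFHT.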

          Activity

            krughoff Simon Krughoff (Inactive) added a comment:

            I believe I have implemented this, but I don't know exactly how to test it. See the rules named "Is validate_drp running for HSC" and "Is validate_drp running for CFHT" here.

            krughoff Simon Krughoff (Inactive) added a comment:

            afausti, will you have a look at these and let me know if you think they are all right? If there is a way to test them without making the nightly fail, let me know.
            afausti Angelo Fausti added a comment (edited):

            krughoff that looks great, I see the new notifications at #dm-squash-alerts.

            I think we can remove the original one... Just did that, and it's nice to see the @ mention working.

             validade_drp status changed to {{.Level}} for CFHT <@U06DGJCTB> please check.

            afausti Angelo Fausti added a comment:

            Also, the Kapacitor command line client has a record/replay feature that can be used to test the alert rules. I have never used it, but it seems very useful:

            https://docs.influxdata.com/kapacitor/v1.5/working/cli_client/#replay

            krughoff Simon Krughoff (Inactive) added a comment:

            The replay functionality may be exactly what I want. Thanks!

            If it's OK with you, I'm going to mark this done.

            afausti Angelo Fausti added a comment:

            Sounds good.

            Adding more info to this ticket from our discussion on Slack, and marking as reviewed.

            • Confirming that the Kapacitor HTTP API does support the record/replay functionality; we could wrap that in the squash client for testing alert rules and notifications.
            • Currently it is not possible to test alert rules from the Chronograf UI, but there's an open issue on GitHub for that.
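
A wrapper of the kind suggested above might start from something like the following Python sketch. It only builds the HTTP requests and does not send them. The endpoint paths and body fields (`task`, `stop`, `recording`, `clock`, `recording-time`) follow my reading of the Kapacitor 1.5 HTTP API documentation and should be checked against it; the helper names and the task and recording IDs are hypothetical, and the base URL assumes a local Kapacitor on its default port.

```python
import json
from urllib.request import Request

# Assumption: Kapacitor reachable locally on its default port.
KAPACITOR_URL = "http://localhost:9092"


def _post(url, body):
    """Build (but do not send) a JSON POST request."""
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def record_stream_request(task_id, stop, base_url=KAPACITOR_URL):
    """Request a stream recording for a task until the RFC 3339 time `stop`."""
    return _post(f"{base_url}/kapacitor/v1/recordings/stream",
                 {"task": task_id, "stop": stop})


def replay_request(task_id, recording_id, base_url=KAPACITOR_URL):
    """Request a replay of a stored recording against a task."""
    body = {
        "task": task_id,
        "recording": recording_id,
        "clock": "fast",          # replay as fast as possible
        "recording-time": False,  # shift recorded times to the present
    }
    return _post(f"{base_url}/kapacitor/v1/replays", body)


# Hypothetical IDs, for illustration only.
req = replay_request("validate_drp_cfht", "example-recording-id")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending a request would be a matter of passing it to `urllib.request.urlopen`, ideally against a test Kapacitor instance so that replayed alerts do not reach the production Slack channel.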

            krughoff Simon Krughoff (Inactive) added a comment:

            The rules were triggered over the weekend and seemed to work generally as expected.

            People

              krughoff Simon Krughoff (Inactive)
              krughoff Simon Krughoff (Inactive)
              Angelo Fausti
              Angelo Fausti, Simon Krughoff (Inactive)