  1. Data Management
  2. DM-20808

Expand Kapacitor rules to alert on failed validate_drp for HSC and CFHT separately

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: squash
    • Labels:
      None

      Description

      We recently found that the deadman alert on the validate_drp processing in the nightly was just looking at any data landing in the validate_drp measurements. This meant that if just one of HSC or CFHT failed, there would be no notification.
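
      To illustrate the gap, here is a minimal sketch in plain Python (not the actual Kapacitor rule). It assumes each measurement point carries a dataset label and that the deadman window is 24 hours; the `dataset` label, the timestamps, and the window are hypothetical and not taken from the real rules.

```python
from datetime import datetime, timedelta


def is_silent(points, now, window, dataset=None):
    """Return True if no point arrived within `window` before `now`.

    `points` is a list of (timestamp, dataset) tuples. With `dataset=None`
    every point counts, which mimics a single deadman over all data.
    """
    cutoff = now - window
    return not any(
        ts >= cutoff and (dataset is None or ds == dataset)
        for ts, ds in points
    )


# Hypothetical situation: HSC reported 3 hours ago, CFHT last reported 51 hours ago.
now = datetime(2019, 1, 2, 12, 0)
window = timedelta(hours=24)
points = [
    (now - timedelta(hours=3), "HSC"),
    (now - timedelta(hours=51), "CFHT"),
]

# One deadman over everything: HSC data keeps it quiet.
combined_alert = is_silent(points, now, window)

# One deadman per dataset: the missing CFHT data is noticed.
per_dataset_alerts = {
    ds: is_silent(points, now, window, dataset=ds) for ds in ("HSC", "CFHT")
}

print(combined_alert)      # False
print(per_dataset_alerts)  # {'HSC': False, 'CFHT': True}
```

      With a single combined check the HSC points mask the CFHT failure, which is why this ticket asks for one rule per dataset.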

            Activity

            krughoff Simon Krughoff added a comment -

            I believe I have implemented this, but I don't know exactly how to test. See rules named "Is validate_drp running for HSC" and "Is validate_drp running for CFHT" here.

            krughoff Simon Krughoff added a comment -

            Angelo Fausti will you have a look at these and let me know if you think they are alright? If there is a way to test them without making the nightly fail, let me know.

            afausti Angelo Fausti added a comment - - edited

            Simon Krughoff that looks great, I see the new notifications at #dm-squash-alerts

            I think we can remove the original one... Just did that, and it's nice to see the @ mention working.

             validade_drp status changed to {{.Level}} for CFHT <@U06DGJCTB> please check.
            

            afausti Angelo Fausti added a comment -

            Also, the Kapacitor command line client has the record/replay feature that can be used to test the alert rules. I never used it but it seems very useful:

            https://docs.influxdata.com/kapacitor/v1.5/working/cli_client/#replay
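
            As a rough sketch of what that workflow might look like, the snippet below only builds the command lines and does not run them. The `record stream` and `replay` subcommands and their `-task`, `-duration`, and `-recording` flags are as I understand them from the Kapacitor 1.5 CLI documentation linked above and should be checked there; the task ID and recording ID are hypothetical placeholders.

```python
def record_cmd(task_id, duration="60s"):
    """Build the command that records live stream data for a task."""
    return ["kapacitor", "record", "stream", "-task", task_id, "-duration", duration]


def replay_cmd(task_id, recording_id):
    """Build the command that replays a saved recording against a task."""
    return ["kapacitor", "replay", "-recording", recording_id, "-task", task_id]


# Hypothetical task ID; the real rules were created through Chronograf
# and their task IDs are not given in this ticket.
TASK_ID = "validate_drp_cfht"

print(" ".join(record_cmd(TASK_ID)))
print(" ".join(replay_cmd(TASK_ID, "RECORDING_ID")))
```

            The record command prints a recording ID when it finishes; that ID is what would replace the `RECORDING_ID` placeholder in the replay command.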

            krughoff Simon Krughoff added a comment -

            The replay functionality may be exactly what I want. Thanks!

            If it's ok with you, I'm going to mark this done.

            afausti Angelo Fausti added a comment -

            Sounds good.

            Adding more info to this ticket from our discussion on slack, and marking as reviewed.

            • Confirming that the Kapacitor HTTP API does support the record/replay functionality. We could wrap that in the squash client for testing alert rules and notifications.
            • Currently it is not possible to test alert rules from the Chronograf UI, but there's an open issue on GH for that.
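
            A wrapper along those lines might start from something like the sketch below. It only constructs the HTTP requests and does not send them. The endpoint paths and JSON fields (`/kapacitor/v1/recordings/stream`, `/kapacitor/v1/replays`, `task`, `stop`, `recording`, `clock`, `recording-time`) are my reading of the Kapacitor 1.5 HTTP API documentation and should be verified against it. The base URL, task ID, and function names are hypothetical and are not part of the squash client.

```python
import json
from urllib.request import Request

# Assumption: Kapacitor listening on its default port on localhost.
BASE_URL = "http://localhost:9092"


def _post(path, body):
    """Build (but do not send) a JSON POST request to the Kapacitor API."""
    return Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def record_stream_request(task_id, stop_time):
    """Request a stream recording for `task_id` that ends at `stop_time` (RFC 3339)."""
    return _post("/kapacitor/v1/recordings/stream", {"task": task_id, "stop": stop_time})


def replay_request(task_id, recording_id):
    """Request a replay of `recording_id` against `task_id`, as fast as possible."""
    body = {
        "task": task_id,
        "recording": recording_id,
        "clock": "fast",
        "recording-time": False,
    }
    return _post("/kapacitor/v1/replays", body)


# Hypothetical IDs for illustration only.
req = replay_request("validate_drp_cfht", "RECORDING_ID")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

            Sending a request would be a matter of passing it to `urllib.request.urlopen`; that step is left out here so the sketch can be inspected without a running Kapacitor instance.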
            krughoff Simon Krughoff added a comment -

            The rules were triggered over the weekend and seemed to work generally as expected.


              People

              Assignee:
              krughoff Simon Krughoff
              Reporter:
              krughoff Simon Krughoff
              Reviewers:
              Angelo Fausti
              Watchers:
              Angelo Fausti, Simon Krughoff
