Details
Type: Story
Status: Done
Resolution: Done
Fix Version/s: None
Component/s: ts_auxiliary_telescope, ts_main_telescope
Labels:
Story Points: 3
Sprint: TSSW Sprint - Aug 15 - Aug 29
Team: Telescope and Site
Urgent?: No
Description
Update the Watcher to support escalating alarms to OpsGenie. This requires:
- Add a configuration field for the OpsGenie URL (it could instead be read from an env var, but I will use configuration).
- Enhance the escalation config field to support multiple responders.
- Pick an env var to hold the authentication key (it is a secret, so it must not be part of configuration).
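A minimal Python sketch of how these three pieces could fit together when creating an alert. It assumes the configured URL is the OpsGenie alert endpoint (e.g. `https://api.opsgenie.com/v2/alerts`), that the key lives in a hypothetical env var named `ESCALATION_KEY`, and that the request uses the `GenieKey` authorization header and `responders` list of the OpsGenie Alert API. The function name `make_create_alert_request` is hypothetical; the sketch only builds the request and does not send it.

```python
import json
import os
import urllib.request

# Hypothetical env var name; the ticket only says "pick an env var".
ESCALATION_KEY_ENV_VAR = "ESCALATION_KEY"


def make_create_alert_request(url, message, responders, env=None):
    """Build (but do not send) a request that creates an OpsGenie alert.

    url: alert endpoint from Watcher configuration (not from an env var).
    message: short alert text, e.g. the alarm name and reason.
    responders: list of dicts such as [{"name": "night-crew", "type": "team"}].
    env: mapping to read the key from; defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get(ESCALATION_KEY_ENV_VAR)
    if not key:
        # The key is a secret, so it must come from the environment,
        # never from the (shareable) configuration.
        raise RuntimeError(f"Env var {ESCALATION_KEY_ENV_VAR} is not set")
    body = {"message": message, "responders": responders}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            # OpsGenie REST calls authenticate with a "GenieKey <key>" header.
            "Authorization": f"GenieKey {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen(req)` (or an async HTTP client inside the Watcher). As we understand the OpsGenie API, alert creation is processed asynchronously and the response carries a request ID rather than the alert ID, so the alert ID may need a follow-up lookup before it can be reported.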
Also tweak the alarm event schema, while keeping the code compatible with the current ts_xml:
- Include the ID of the escalated alert, to link the alarm to the alert on the OpsGenie web site: rename "escalated" to "escalatedId". If escalation fails, set it to "Failed: ...reason...". Leave it blank if the alarm has not been escalated.
- Modify the description of escalateTo to document the new format: a JSON-encoded string of [{"name": ..., "type": "team"}, {...}] instead of a simple name.
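A small Python sketch of the two field formats described above. The helper names are hypothetical, and treating an empty escalateTo string as "do not escalate" is an assumption, not something this ticket specifies.

```python
import json

# Prefix for a failed escalation, per the ticket: "Failed: ...reason...".
FAILED_PREFIX = "Failed: "


def encode_escalate_to(responders):
    """Encode responders as the JSON string stored in escalateTo.

    Each responder is a dict such as {"name": "night-crew", "type": "team"}.
    """
    for responder in responders:
        if "name" not in responder or "type" not in responder:
            raise ValueError(f"Responder {responder!r} needs 'name' and 'type'")
    return json.dumps(responders)


def decode_escalate_to(escalate_to):
    """Decode escalateTo; assume an empty string means "do not escalate"."""
    return json.loads(escalate_to) if escalate_to else []


def escalated_id_value(alert_id="", failure_reason=""):
    """Return the value for the escalatedId field of the alarm event.

    - "" if the alarm has not been escalated
    - the OpsGenie alert ID if escalation succeeded
    - "Failed: <reason>" if escalation failed
    """
    if failure_reason:
        return FAILED_PREFIX + failure_reason
    return alert_id
```

For example, `encode_escalate_to([{"name": "night-crew", "type": "team"}])` returns `'[{"name": "night-crew", "type": "team"}]'`, and `escalated_id_value(failure_reason="timeout")` returns `"Failed: timeout"`.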
If an alarm is acknowledged, try to close the associated OpsGenie alert. This prevents people from being needlessly woken up, and also simplifies handling the alert's escalation state: clear the escalation ID when the alarm is acknowledged. (If not then, when? We don't want stale data.)
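A minimal Python sketch of the acknowledge path, assuming the OpsGenie close-alert call is `POST <alert endpoint>/<id>/close?identifierType=id` with an optional `user`/`note` body. The class `EscalationState` is hypothetical bookkeeping, not existing Watcher code. It always clears the escalation ID first, so the alarm cannot report stale data even if closing the alert later fails.

```python
import json
import urllib.parse
import urllib.request


class EscalationState:
    """Hypothetical per-alarm escalation bookkeeping."""

    def __init__(self, url, key):
        self.url = url.rstrip("/")  # alert endpoint from configuration
        self.key = key  # API key read from the env var
        self.escalated_id = ""  # "" means not escalated

    def acknowledge(self, user):
        """Clear escalated_id and return a request that closes the alert.

        Returns None if there is no OpsGenie alert to close (never
        escalated, or escalation failed).
        """
        alert_id = self.escalated_id
        self.escalated_id = ""  # clear first: no stale escalation ID
        if not alert_id or alert_id.startswith("Failed: "):
            return None
        query = urllib.parse.urlencode({"identifierType": "id"})
        close_url = f"{self.url}/{urllib.parse.quote(alert_id)}/close?{query}"
        body = {"user": user, "note": "Acknowledged in the Watcher"}
        return urllib.request.Request(
            close_url,
            data=json.dumps(body).encode(),
            headers={
                "Authorization": f"GenieKey {self.key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
```

Because closing is only a "try", a failure to send the returned request would reasonably be logged rather than block the acknowledgement itself.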