  1. Data Management
  2. DM-37496

Sasquatch (inc EFD) development and operational support

    Details

    • Type: Epic
    • Status: In Progress
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Epic Name:
      sqre-s23-efd-1-ops
    • Story Points:
      70
    • WBS:
      CS002.1.P1.OM.03.06.01.01
    • Team:
      SQuaRE
    • Cycle:
      Spring 2023

      Description

      Sasquatch is SQuaRE's scalar telemetry curation service that, inter alia, provides the Engineering and Facilities Database (EFD) at the summit (sqr-068.lsst.io). This system has been in operation since the beginning of AuxTel observing and was adopted by Telescope and Site post facto on the basis that it met the original EFD requirements. While formal verification is still pending, the system is clearly performing above specification, so ongoing improvements and new features come under operations.

      This cycle Sasquatch, which was already deployed at the summit, will also replace SQuaSH as the science metrics service. Furthermore, network permitting, EFD data will be replicated to the USDF. Related work will also be included here. 

          Stories in Epic (Custom Issue Matrix)

          Key Summary Story Points Assignee Status
          DM-33801

          Implement a query to list tracts based on the selected dataset in the DRP metrics dashboard

          1.4 Angelo Fausti To Do
          DM-36002

          Issue replicating LFA URLs recorded at the Summit to the USDF

          Angelo Fausti To Do
          DM-34985

          Describe Sasquatch use cases in SQR-068

          1.4 Angelo Fausti To Do
          DM-39943

          Deploy source InfluxDB at USDF

          1.4 Angelo Fausti To Do
          DM-39102

          Fix monitoring of the InfluxDB data partition at the Summit

          1.4 Angelo Fausti To Do
          DM-39239

          InfluxDB ingress needs annotations to increase the proxy timeouts

          1.4 Angelo Fausti To Do
          DM-39236

          Sasquatch REST proxy authentication

          2.8 Angelo Fausti To Do
          DM-39397

          Enable InfluxDB v2 at USDF dev

          4.2 Angelo Fausti To Do
          DM-39393

          Add the annotate() method to the EFD client

          1.4 Angelo Fausti To Do
          DM-40005

          Fix default values for MirrorMaker 2 in Sasquatch

          0.7 Angelo Fausti To Do
          DM-39611

          Make kafdrop a shared Helm subchart in Phalanx

          2.8 Angelo Fausti To Do
          DM-38913

          Review kafdrop Kafka user authorization configuration

          Angelo Fausti To Do
          DM-30492

          Plan migration to InfluxDB 2.x in Sasquatch

          Angelo Fausti To Do
          DM-29912

          Sasquatch troubleshooting guide

          4.2 Angelo Fausti To Do
          DM-27808

          Plan migration of SQuaSH production data to Sasquatch at USDF

          Angelo Fausti To Do
          DM-29216

          Data missing in LDF EFD compared to summit EFD

          4 Angelo Fausti To Do
          DM-31914

          Implement the filter transformation in kafka-aggregator

          8.4 Angelo Fausti To Do
          DM-31898

          Run Flux tasks with Kapacitor 1.6

          2.8 Angelo Fausti To Do
          DM-31862

          Migrate annotations from InfluxDB 1.x to 2.x

          4.2 Angelo Fausti To Do
          DM-31577

          Create a separate database for log events in the EFD

          2.8 Angelo Fausti To Do
          DM-28568

          Address Simon's comments on squash-sandbox deployment

          5.6 Angelo Fausti To Do
          DM-37515

          Disable FIPS mode in the Strimzi Cluster Operator

          1.4 Angelo Fausti To Do
          DM-37443

          Investigate Kafka connect errors during Summit outage

          Angelo Fausti To Do
          DM-37250

          Set up EFD replication from TTS to USDF dev (and vice versa)

          2.8 Angelo Fausti To Do
          DM-36755

          Replace InfluxDB Sink connector by Telegraf Kafka Consumer

          2.8 Angelo Fausti To Do
          DM-36751

          User documentation for sending analysis_tools metrics to Sasquatch

          1.4 Angelo Fausti To Do
          DM-36728

          Send analysis_tools metrics to Sasquatch

          Angelo Fausti To Do
          DM-36430

          Telegraf Kafka Consumer status and auto restart features

          1.4 Angelo Fausti To Do
          DM-38784

          Monitor access to port 9094 from USDF to Summit

          1.4 Angelo Fausti To Do
          DM-38782

          Characterize Sasquatch latency for EFD data

          7 Angelo Fausti To Do
          DM-40742

          Modernize or get rid of kafka-connect-manager

          Angelo Fausti To Do
          DM-40741

          Enable MirrorMaker2 auto-restart feature in Sasquatch

          0.7 Angelo Fausti To Do
          DM-40515

          Test CSC is not recording data to the expected InfluxDB measurement 

          1.4 Angelo Fausti To Do
          DM-40114

          Review system-test EFD example notebooks

          0.7 Angelo Fausti To Do
          DM-16293

          Use Kapacitor HTTP API to create alert rules programmatically

          2.8 Angelo Fausti To Do
          DM-40782

          Extend ts-salkafka authorization in Sasquatch

          0.7 Angelo Fausti To Do
          DM-18056

          QAWG-REC-36: The SQuaSH system should be closely coupled to the drill-down environment; in particular, the former should use the latter to enable drill-down functionality into particular metric values.

          Angelo Fausti To Do
          DM-18054

          QAWG-REC-34: SQuaSH should issue alerts to developers and key stakeholders on regressions in important metric values

          Angelo Fausti To Do
          DM-40914

          Deploy the Telegraf-based connectors at USDF

          1.4 Angelo Fausti To Do
          DM-35106

          Add memory and CPU limits for k8s apps based on actual usage

          3 Adam Thornton In Progress
          DM-39936

          Schema ID conflict at USDF dev

          1.4 Angelo Fausti In Progress
          DM-39230

          Restore old EFD data from Summit to USDF

          Angelo Fausti In Progress
          DM-39723

          Investigate inconsistencies in the EFD data between Summit and EFD

          4.2 Angelo Fausti In Progress
          DM-40008

          Add anti affinity to InfluxDB pod at USDF

          0.7 Angelo Fausti In Progress
          DM-40004

          Restore EFD shards 1300, 1291, 1282, 1274, and 1271 to USDF

          1.4 Angelo Fausti In Progress
          DM-39518

          Restore Summit EFD backup after major data loss at USDF

          1.4 Angelo Fausti In Progress
          DM-37460

          Restore EFD shards 1029, 1038, 1047, 1056, 1066, 1073, 1082, 1091, 1100 to USDF

          2.8 Angelo Fausti In Progress
          DM-40486

          Transfer shards 1354, 1345, 1336, 1327, 1318, 1309 to USDF

          1.4 Angelo Fausti In Progress
          DM-40075

          Create persistent volume for testing InfluxDB backup on Pillan

          1.4 Angelo Fausti In Progress
          DM-39942

          Deploy source InfluxDB for restoring historical data at Base EFD

          1.4 Angelo Fausti In Progress
           
          DM-39964

          Set appropriate resources requests for influxdb at idfdev and idfdint environments

          0.7 Angelo Fausti Done
           
          DM-39930

          Review InfluxDB limits and request configuration at summit

          1.4 Angelo Fausti Done
           
          DM-39890

          Summit maintenance window July 5th

          1.4 Angelo Fausti Done
           
          DM-39210

          Reduce retention period for the EFD database at the Summit

          1.4 Angelo Fausti Done
           
          DM-39135

          Add new lsst.camera namespace in Sasquatch

          0.7 Angelo Fausti Done
           
          DM-39087

          Deploy repairer connectors to Summit Sasquatch

          2.8 Angelo Fausti Done
           
          DM-39272

          Revert Chronograf to version 1.9.4 due to the ":" bug in the query editor

          0.7 Angelo Fausti Done
           
          DM-39010

          Summit upgrade window on May 3rd

          0.7 Angelo Fausti Done
           
          DM-38996

          Sasquatch documentation updates

          1.4 Angelo Fausti Done
           
          DM-38990

          Add new lsst.rubintv namespace in Sasquatch

          0.7 Angelo Fausti Done
           
          DM-38985

          Configure a connector for the lsst.rubintv Kafka topics in Sasquatch

          1.4 Angelo Fausti Done
           
          DM-39235

          Refresh EFD introductory notebooks and link them from Sasquatch documentation

          2.8 Angelo Fausti Done
           
          DM-39396

          SIT-COM 2023 Bootcamp preparation

          2.8 Angelo Fausti Done
           
          DM-39394

          Add new lsst.verify namespace to Sasquatch

          0.7 Angelo Fausti Done
           
          DM-39717

          Add new lsst.lf namespace in Sasquatch

          0.7 Angelo Fausti Done
           
          DM-39693

          Deploy telegraf-kafka-consumer at TTS

          2.8 Angelo Fausti Done
           
          DM-40006

          Change InfluxDB Sink connector error policy to RETRY

          1.4 Angelo Fausti Done
           
          DM-39545

          Deploy a staging InfluxDB on Manke

          2.8 Angelo Fausti Done
           
          DM-39507

          Deploy repairer connectors to recover data in Kafka at USDF

          1.4 Angelo Fausti Done
           
          DM-39677

          Enable topic creation/deletion in Kafdrop on data-dev

          0.7 Angelo Fausti Done
           
          DM-39664

          Deploy MirrorMaker 2 for replicating EFD data at the Summit (Yagan) to Base (Manke)

          7 Angelo Fausti Done
           
          DM-39658

          Example notebook to send topics to Sasquatch via the REST Proxy

          0.7 Angelo Fausti Done
           
          DM-39652

          Summit maintenance window Jun 14

          1.4 Angelo Fausti Done
           
          DM-39642

          Upgrade Sasquatch Kafka to version 3.4.0

          1.4 Angelo Fausti Done
           
          DM-39440

          Allow LOVE hostnames in the REST proxy CORS configuration

          2.8 Angelo Fausti Done
           
          DM-38877

          Refresh of the squash-api

          7 Angelo Fausti Done
           
          DM-38907

          Create sample dashboard in Chronograf to show how to display evt/cmd as vertical lines in timeseries

          4.2 Angelo Fausti Done
           
          DM-38636

          Deploy InfluxDB Sink connectors for topics in the lsst.dm, lsst.debug and lsst.example namespaces for Sasquatch

          2.8 Angelo Fausti Done
           
          DM-38588

          Roll back Chronograf to version 1.9.4

          1.4 Angelo Fausti Done
           
          DM-37754

          Renew wildcard TLS certificate for lsst.codes

          0.7 Angelo Fausti Done
           
          DM-37752

          Make figures for FAFF2 report

          4.2 Angelo Fausti Done
           
          DM-37724

          Update EFD example notebooks after migration to USDF

          2.8 Angelo Fausti Done
           
          DM-37898

          Auto-generate Avro schemas from analysis_tools metrics' records

          2.8 Angelo Fausti Done
           
          DM-38263

          Investigate schema ID mismatch after replicating the Schema Registry topic (BTS->USDF dev)

          1.4 Angelo Fausti Done
           
          DM-38244

          Test telegraf-kafka-consumer changes

          1.4 Angelo Fausti Done
           
          DM-37377

          Evaluate confluent REST proxy for Sasquatch

          7 Angelo Fausti Done
           
          DM-38080

          RSP deployment on the BTS cluster

          4.2 Angelo Fausti Done
           
          DM-38064

          Updates to sqr-034

          1.4 Angelo Fausti Done
           
          DM-36735

          Design InfluxDB schema for analysis_tools metrics

          1.4 Angelo Fausti Done
           
          DM-37859

          Automate Kafka topic creation via Sasquatch REST proxy

          1.4 Angelo Fausti Done
           
          DM-37813

          Summit maintenance window Feb 1

          0.7 Angelo Fausti Done
           
          DM-37591

          Investigation on schemas for Butler metric datasets

          7 Angelo Fausti Done
           
          DM-37585

          Release kafka-connect-manager 1.0.2

          0.5 Angelo Fausti Done
           
          DM-37580

          Upgrade Strimzi Operator to version 0.32.0

          0.5 Angelo Fausti Done
           
          DM-37633

          Add support for InfluxDB tags in kafka-connect-manager

          4.2 Angelo Fausti Done
           
          DM-37685

          Create prompt-processing Kafka user in Sasquatch

          2.8 Angelo Fausti Done
           
          DM-38471

          Check Sasquatch TTS errors after ceph pool filled up

          0.7 Angelo Fausti Done
           
          DM-38462

          Investigate MirrorMakerSource connector task failures

          2.8 Angelo Fausti Done
           
          DM-38456

          Summit maintenance window Apr 5

          1.4 Angelo Fausti Done
           
          DM-37146

          Restore EFD shards 976, 987, 996, 1002, 1011, 1020 to USDF

          2.8 Angelo Fausti Done
           
          DM-38543

          Sasquatch user guide documentation

          7 Angelo Fausti Done
           
          DM-38336

          Give read access for prompt-processing user on the lsst.sal.ScriptQueue.logevent_nextVisit topic

          0.7 Angelo Fausti Done
           
          DM-38315

          Fine tune MM2 configuration for Summit->USDF replication

          1.4 Angelo Fausti Done
           
          DM-38308

          Updates on sqr-034 and sqr-068

          1.4 Angelo Fausti Done
           
          DM-38135

          Set up EFD replication from BTS to USDF dev

          2.8 Angelo Fausti Done
           
          DM-38134

          RSP installer: VaultSecret resource for the argocd-secret is not being created

          0.7 Angelo Fausti Done
           
          DM-38183

          Roll out Sasquatch at the Base

          1.4 Angelo Fausti Done
           
          DM-38147

          Summit maintenance window Mar 1

          0.7 Angelo Fausti Done
           
          DM-37976

          Restore EFD shard 1109 to USDF

          1.4 Angelo Fausti Done
           
          DM-37967

          Test REST proxy with Kafka replication

          1.4 Angelo Fausti Done
           
          DM-38483

          Strimzi operator is crashing at the usdfprod environment

          0.7 Angelo Fausti Done
           
          DM-38950

          Renew wildcard TLS certificate for lsst.codes

          0.7 Angelo Fausti Done
           
          DM-36088

          Deploy MirrorMaker2 configuration at USDF prod cluster

          5.6 Angelo Fausti Done
           
          DM-40723

          Troubleshooting offset corruption in Kafka at USDF

          1.4 Angelo Fausti Done
           
          DM-40712

          Scale up of REST Proxy to 3 replicas in Sasquatch

          0.7 Angelo Fausti Done
           
          DM-40703

          Disable EFD InfluxDB Sink connectors at USDF dev

          0.7 Angelo Fausti Done
           
          DM-40664

          Review InfluxDB Sink connector configuration across all environments

          1.4 Angelo Fausti Done
           
          DM-40662

          Fix Kapacitor alerts crosstalk at base

          1.4 Angelo Fausti Done
           
          DM-40661

          Change Strimzi log level to INFO in production environments

          0.7 Angelo Fausti Done
           
          DM-40655

          Upgrade Kafka to 3.5.1 in Sasquatch

          1.4 Angelo Fausti Done
           
          DM-40510

          Configure podAntiAffinity for Kafka in Sasquatch

          1.4 Angelo Fausti Done
           
          DM-40385

          Enable repairer connectors at the USDF EFD

          1.4 Angelo Fausti Done
           
          DM-40165

          Increase kafka data partition size on Manke

          0.7 Angelo Fausti Done
           
          DM-40071

          Increase MTM1M3 and MTMount telemetry throughput in Sasquatch

          1.4 Angelo Fausti Done
           
          DM-40070

          Troubleshooting offset corruption in Kafka at USDF

          1.4 Angelo Fausti Done
           
          DM-40272

          Example notebook to query historical data on a shard basis

          2.8 Angelo Fausti Done
           
          DM-40888

          Expose Schema Registry API in Sasquatch

          1.4 Angelo Fausti Done
           
          DM-40858

          Deploy the Telegraf-based connectors on the base cluster

          2.8 Angelo Fausti Done
           
          DM-40758

          Update Sasquatch environments documentation

          0.7 Angelo Fausti Done
           
          DM-40900

          Compare Ceph write throughput between Manke and Yagan

          2.8 Angelo Fausti Done
           
          DM-35638

          Write sqr-070: Rubin telegraf operator for Phalanx

          4 Adam Thornton Won't Fix
           
          DM-38949

          Refresh of the squash-api (cont.)

          Angelo Fausti Won't Fix
           
          DM-36749

          Implement Kafka consumer input plugin with the JSON parser for analysis_tools metrics

          1.4 Angelo Fausti Invalid

              People

              Assignee:
              afausti Angelo Fausti
              Reporter:
              frossie Frossie Economou
              Watchers:
              Frossie Economou