Data Management / DM-16449

Instrument the SQuaSH API with Telegraf

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      We noticed a significant performance degradation in the SQuaSH API after DM-16300 when posting verification jobs to InfluxDB. I suspect the problem is that Flask, Redis, and Celery run in the same pod, and its growing memory usage causes the pods to be evicted. But we need more instrumentation to understand what's going on. I have started this with the Honeycomb Python client. Since we already use InfluxDB and Chronograf for the science pipelines metrics, I think Telegraf is a good option for monitoring the SQuaSH API.


              People

              • Assignee: Angelo Fausti (afausti)
              • Reporter: Angelo Fausti (afausti)
              • Watchers: Angelo Fausti
