DM-27431

Set up data collection services for testing Qserv at NCSA

    Details

    • Type: Story
    • Status: In Progress
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None
    • Story Points:
      3
    • Sprint:
      DB_F20_09, DB_S21_12, DB_F21_06
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      Goals

      This is a continuation of work started in DM-26100.

      The current ticket documents the effort of setting up, configuring, and testing the data collection and visualization services needed for testing Qserv at NCSA. Though the initial focus of the effort is KPM50, the services are also going to be used for Qserv performance testing in other contexts, such as regression testing.

      Here is an incomplete list of services to be installed or affected:

      Each instance of Qserv installed at NCSA will be instrumented with a separate collection of services to be run on the primary or satellite master nodes of the instance. The mapping of the services to specific nodes is shown below:

      Qserv instance  host                 service                     comments
      small           lsst-qserv-master04  influxdb (to be installed)  time-series database for collecting test data
      small           lsst-qserv-master04  grafana (to be installed)   Web-based visualization services for the test data
      small           lsst-qserv-master03  nginx (existing)            Web proxy for accessing visualization services by users
      large           lsst-qserv-master02  influxdb (to be installed)  time-series database for collecting test data
      large           lsst-qserv-master02  grafana (to be installed)   Web-based visualization services for the test data
      large           lsst-qserv-master01  nginx (existing)            Web proxy for accessing visualization services by users
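
      For automation scripts, the same mapping could be kept as data rather than prose. The following is only a sketch: the rows are copied from the table above (with the visualization service spelled as Grafana), while the `Service` type and the `host_of` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Service(NamedTuple):
    instance: str    # Qserv instance: "small" or "large"
    host: str        # node that runs the service
    name: str        # service name
    installed: bool  # True if the service already exists on the host


# Rows copied from the service-to-node table in this ticket.
SERVICES = [
    Service("small", "lsst-qserv-master04", "influxdb", False),
    Service("small", "lsst-qserv-master04", "grafana", False),
    Service("small", "lsst-qserv-master03", "nginx", True),
    Service("large", "lsst-qserv-master02", "influxdb", False),
    Service("large", "lsst-qserv-master02", "grafana", False),
    Service("large", "lsst-qserv-master01", "nginx", True),
]


def host_of(instance: str, service: str) -> str:
    """Return the host that runs `service` for the given Qserv instance."""
    for row in SERVICES:
        if row.instance == instance and row.name == service:
            return row.host
    raise KeyError(f"no service {service!r} for instance {instance!r}")
```

      For example, `host_of("small", "nginx")` returns the proxy host of the small instance, which a test client could use to build the URL it sends metrics to.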

      All services will be run inside properly configured Docker containers. All aspects of configuring and managing the services shall be documented in the Git package qserv-ncsa. Configuration and automation scripts will be put into that package too.

      Persistent data of the influxdb service would be placed on the SSD-based local filesystems of the satellite hosts:

      /qserv_backup
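
      A minimal sketch of how the influxdb container could be started with its data kept on that filesystem. Everything except `/qserv_backup` is an assumption made for illustration: the image tag, the container name, the `influxdb` subdirectory, and the use of InfluxDB 1.x defaults (HTTP port 8086, data directory `/var/lib/influxdb` inside the container). The actual scripts are expected to live in qserv-ncsa and may differ.

```python
import shlex
from typing import List


def influxdb_run_command(
    data_dir: str = "/qserv_backup/influxdb",  # hypothetical subdirectory of the filesystem above
    image: str = "influxdb:1.8",               # assumed image tag
    name: str = "qserv-test-influxdb",         # hypothetical container name
    port: int = 8086,                          # InfluxDB's default HTTP port
) -> List[str]:
    """Build the argument list for `docker run`; nothing is executed here."""
    return [
        "docker", "run",
        "--detach",
        "--name", name,
        "--restart", "unless-stopped",
        # Published on all interfaces because the nginx proxy runs on another host.
        "--publish", f"{port}:8086",
        # Bind-mount the host directory so the data survives container restarts.
        "--volume", f"{data_dir}:/var/lib/influxdb",
        image,
    ]


# Print the command for review instead of running it.
print(shlex.join(influxdb_run_command()))
```

      Returning an argument list rather than a shell string keeps the command safe to pass to `subprocess.run` later, without quoting problems.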
      

      missing filesystem: It seems that this filesystem is not presently mounted at lsst-qserv-master02. Get in touch and resolve this matter directly with the NCSA team. See [IHS-4334] for progress on resolving this issue.

      Accessing and using the services

      It's expected that the main test application implemented for Qserv performance (see DM-26100) would be run on the LSST development (lsst-devl*) or Verification Cluster machines (the SLURM cluster). Metrics obtained from the application would be ingested into the corresponding influxdb service via the Qserv instance's nginx server configured as a proxy.
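
      The ingest path described above can be sketched as follows. This assumes the InfluxDB 1.x HTTP API (a POST of line-protocol text to `/write?db=...`); the ticket does not state the InfluxDB version, and the proxy URL, database name, measurement, tags, and fields used here are all hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Dict, Iterable, Union

FieldValue = Union[bool, int, float, str]


def _escape_key(value: str) -> str:
    """Escape commas, equals signs, and spaces in tag keys, tag values, and field keys."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _escape_measurement(value: str) -> str:
    """Escape commas and spaces in a measurement name."""
    return value.replace(",", "\\,").replace(" ", "\\ ")


def _format_field(value: FieldValue) -> str:
    """Render a field value using line-protocol type conventions."""
    if isinstance(value, bool):  # must be checked before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement: str, tags: Dict[str, str],
            fields: Dict[str, FieldValue], timestamp_ns: int) -> str:
    """Format one data point as a line-protocol record."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape_measurement(measurement)
    for key in sorted(tags):  # sorted tag keys are recommended for ingest performance
        head += f",{_escape_key(key)}={_escape_key(tags[key])}"
    body = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    return f"{head} {body} {timestamp_ns}"


def write_points(base_url: str, database: str, lines: Iterable[str],
                 timeout: float = 10.0) -> int:
    """POST line-protocol records through the proxy; returns the HTTP status code."""
    query = urllib.parse.urlencode({"db": database, "precision": "ns"})
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/write?{query}",
        data="\n".join(lines).encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

      A test application would build records with `to_line(...)` and pass them to `write_points("https://<proxy-host>/<influxdb-path>", "<database>", lines)`, where the URL is whatever location the instance's nginx server forwards to influxdb. Batching several lines per request keeps the number of round trips through the proxy low.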

      It would also be possible, where applicable, to run the test application inside the latest version of the qserv/qserv:deps-* container directly on the satellite machines of the corresponding Qserv clusters.

      Next steps

      It's understood that the effort focuses primarily on the infrastructure for testing Qserv. Subsequent efforts (to be documented in tickets linked to the current one) would also include:

      • configuring grafana plots for visualizing test data
      • studying the possibility of showing additional metrics on the grafana plots, obtained from other sources such as the existing monitoring services for the relevant machines, storage, and networks at NCSA
      • obtaining Qserv-specific metrics directly from the Replication System of Qserv and putting them into the influxdb database for further visualization alongside the test metrics
      • investigating the possibility of setting up a registry of the tests, which would encapsulate various information relevant to the tests, such as the name of a test, configuration parameters of the test, test conditions, the date, time and duration of the test, a version of Qserv, a link to a collection of the grafana plots, etc.
      • a Web-based "push-button" solution for configuring, initiating and managing tests (in this case the tests would be run on the satellite machines)
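
      As an illustration of the test registry idea, one registry entry could look like the sketch below. The fields follow the attributes listed above; the class name, field names, and the JSON serialization are assumptions, since no registry design exists yet.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class RegistryEntry:
    """One test run as it might be recorded in a registry (hypothetical layout)."""
    name: str                 # name of the test
    qserv_version: str        # version of Qserv under test
    started: datetime         # date and time the test started
    duration: timedelta       # how long the test ran
    parameters: Dict[str, str] = field(default_factory=dict)  # test configuration
    conditions: str = ""      # free-form description of the test conditions
    plots_url: str = ""       # link to the collection of plots for this run

    def to_json(self) -> str:
        """Serialize the entry; times become ISO 8601 text and seconds."""
        doc = asdict(self)
        doc["started"] = self.started.isoformat()
        doc["duration"] = self.duration.total_seconds()
        return json.dumps(doc, sort_keys=True)
```

      Storing the duration in seconds and the start time as ISO 8601 text keeps entries readable by tools that do not understand Python's date types.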


              People

              Assignee:
              gapon Igor Gaponenko
              Reporter:
              gapon Igor Gaponenko
              Reviewers:
              Fritz Mueller
              Watchers:
              Andy Salnikov, Fritz Mueller, Igor Gaponenko, Nate Pease
