DM-22120 (Data Management)

ap_verify scales poorly to large runs

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ap_verify, verify
    • Labels:
      None
    • Templates:
    • Story Points:
      2
    • Sprint:
      AP F19-6 (November)
    • Team:
      Alert Production

      Description

      Chris Morrison reports that ap_verify can time out when run over large datasets on lsst-dev. This is partly because of the time needed to run the pipeline itself, and partly because of the time needed to iterate through all metrics (MetricsControllerTask was never parallelized). ap_verify also offers no error recovery options beyond rerunning the entire pipeline.

      Both ap_verify's current control system and MetricsControllerTask will become obsolete with Gen 3, where responsibility for workflow management (and any checkpointing) will lie with the pipeline activator. Rather than try to design proper restart behavior into ap_verify now, provide a --skip-completed command-line flag that does the following:

      • runs ap_pipe with the --reuse-outputs-from all command-line argument, which skips completed pipeline steps (currently, up through association);
      • makes MetricsControllerTask check for a job file associated with each data ID and skip that data ID if the file already exists.

      This flag should be enough to let us retry large runs efficiently until Gen 2 is retired.
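
      The two behaviors proposed above can be sketched roughly as follows. This is a minimal illustration and not the actual ap_verify or MetricsControllerTask code: the function names, the job-file naming scheme, and the `measure` callback are all hypothetical. Only the --skip-completed flag and the --reuse-outputs-from all argument come from this ticket.

```python
import argparse
import os


def parse_args(argv):
    """Parse a stand-in command line that offers only --skip-completed."""
    parser = argparse.ArgumentParser(prog="ap_verify_sketch")
    parser.add_argument(
        "--skip-completed", action="store_true",
        help="Reuse existing pipeline outputs and skip data IDs "
             "that already have a job file.")
    return parser.parse_args(argv)


def ap_pipe_args(base_args, skip_completed):
    """Return the arguments to forward to ap_pipe (hypothetical helper)."""
    args = list(base_args)
    if skip_completed:
        # Flag named in the ticket: skips already-completed pipeline steps.
        args += ["--reuse-outputs-from", "all"]
    return args


def job_file_for(data_id, output_dir):
    """Map a data ID to a job file path (assumed naming scheme)."""
    suffix = "_".join(f"{key}{data_id[key]}" for key in sorted(data_id))
    return os.path.join(output_dir, f"ap_verify.{suffix}.verify.json")


def measure_metrics(data_ids, output_dir, measure, skip_completed):
    """Call measure(data_id, path) for each data ID; return the IDs processed.

    With skip_completed set, a data ID whose job file already exists
    is treated as finished and is not measured again.
    """
    processed = []
    for data_id in data_ids:
        path = job_file_for(data_id, output_dir)
        if skip_completed and os.path.exists(path):
            continue
        measure(data_id, path)
        processed.append(data_id)
    return processed
```

      In this sketch a rerun after a timeout only measures the data IDs that lack a job file. It treats any existing job file as complete, so a file left half-written by an interrupted run would be skipped as well.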

              People

              • Assignee:
                Krzysztof Findeisen
              • Reporter:
                Krzysztof Findeisen
              • Reviewers:
                Chris Morrison
              • Watchers:
                Chris Morrison, Jim Bosch, Krzysztof Findeisen