  DM-12888 (project: Data Management)

Refactor pipe_analysis scripts to be able to process data in parallel

    Details

    • Type: Story
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: pipe_analysis, QA
    • Labels:
      None
    • Team:
      Data Release Production

      Description

      As running pipe_analysis scripts on significant chunks of data becomes more common, it is increasingly desirable for the scripts to use cluster computing resources in the same way that other driver scripts do. It seems the tasks could be refactored to inherit from BatchParallelTask rather than just CmdLineTask. As of now, there doesn't seem to be a way to parallelize the computations or plot generation; Lauren MacArthur tested the -j flag, and that seemed to fail on the cluster at lsst-dev.
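
      To illustrate the kind of parallelism being requested, here is a minimal sketch that fans independent work units out across worker processes. It uses only Python's standard concurrent.futures module, not BatchParallelTask or any pipe_analysis code; analyze_unit, run_parallel, and the integer unit IDs are hypothetical names chosen for illustration, and the arithmetic stands in for the real analysis and plotting.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def analyze_unit(unit_id):
    """Hypothetical stand-in for the per-unit analysis and plotting work."""
    # Placeholder computation; a real task would load catalogs and make plots.
    return unit_id, sum(i * i for i in range(unit_id + 1))


def run_parallel(unit_ids, num_workers=4, executor_cls=ProcessPoolExecutor):
    """Process independent units concurrently; return results keyed by unit.

    executor_cls defaults to ProcessPoolExecutor for CPU-bound work;
    ThreadPoolExecutor can be passed instead when the work is I/O-bound.
    """
    with executor_cls(max_workers=num_workers) as executor:
        # map() yields results in input order, one (unit_id, value) pair each.
        return dict(executor.map(analyze_unit, unit_ids))
```

      This only works if each unit can be processed without shared state, which is an assumption about the analysis scripts rather than something the ticket establishes. When run as a script with the default process pool, the call to run_parallel should sit under an if __name__ == "__main__": guard.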

      Currently, since the processing happens serially, whenever I want to run the scripts on a significant chunk of data, I end up launching a Slurm job array with each subjob requesting a single core, which ends up clogging the cluster. If IHS-576 gets implemented, that will help, but perhaps this refactoring would not be difficult and could be done sooner?
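
      As a rough illustration of the alternative to one single-core subjob per work unit, the sketch below groups units into a smaller number of batches, each of which could be handed to one multi-core subjob. The helper name chunk_work_units and the integer unit IDs are hypothetical, and the code does not talk to Slurm.

```python
def chunk_work_units(unit_ids, units_per_job):
    """Split unit_ids into consecutive batches of at most units_per_job items."""
    if units_per_job < 1:
        raise ValueError("units_per_job must be at least 1")
    # Step through the list in strides; the final batch may be shorter.
    return [unit_ids[i:i + units_per_job]
            for i in range(0, len(unit_ids), units_per_job)]
```

      With ten units and four units per job, this produces three batches instead of ten single-core subjobs.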

          Activity

          There are no comments yet on this issue.

            People

            • Assignee:
              Unassigned
            • Reporter:
              Tim Morton (tmorton)
            • Watchers (7):
              Hsin-Fang Chiang, Jim Bosch, John Swinbank, Lauren MacArthur, Paul Price, Tim Morton, Yusra AlSayyad
            • Votes:
              0