Details
- Type: Story
- Status: Won't Fix
- Resolution: Done
- Fix Version/s: None
- Component/s: pipe_analysis, QA
- Labels: None
- Team: Data Release Production
Description
As running pipe_analysis scripts on significant chunks of data becomes more common, it becomes increasingly desirable for the scripts to use cluster computing resources in the same way that other driver scripts do. It seems the tasks could be refactored to inherit from BatchParallelTask rather than just CmdLineTask. As of now, there doesn't seem to be a way to parallelize the computations or plot generation; Lauren MacArthur tested the -j flag, and that seemed to bomb on the cluster at lsst-dev.
Currently, since the processing happens in serial, whenever I want to run the scripts on a significant chunk of data, I end up launching a Slurm job array with each subjob requesting a single core, which ends up clogging the cluster. If IHS-576 gets implemented, that will help, but perhaps this refactoring would not be difficult and could get done sooner?
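For illustration only, here is a minimal sketch of the general pattern being asked for: one job that fans independent data units out over several cores, instead of one single-core subjob per unit. It uses only the Python standard library and is not the pipe_analysis, CmdLineTask, or BatchParallelTask API. The names `analyze_unit` and `run_units`, and the use of plain integers as data IDs, are hypothetical placeholders.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def analyze_unit(data_id):
    """Hypothetical stand-in for the per-unit work (e.g. one set of plots)."""
    return f"analyzed data unit {data_id}"


def run_units(data_ids, worker=analyze_unit, num_cores=4,
              executor_cls=ProcessPoolExecutor):
    """Run `worker` on every data ID using up to `num_cores` workers.

    Results come back in the same order as `data_ids`. Passing
    ThreadPoolExecutor as `executor_cls` gives the same interface with
    threads instead of processes (an assumption-free swap, since both
    classes share the Executor API).
    """
    with executor_cls(max_workers=num_cores) as pool:
        return list(pool.map(worker, data_ids))


if __name__ == "__main__":
    # One multi-core job handles all eight units.
    for line in run_units(range(8), num_cores=4):
        print(line)
```

A real refactoring would presumably go through the stack's own batch machinery rather than a bare process pool; the sketch only shows why a single multi-core allocation is gentler on the scheduler than many single-core subjobs.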
After the gen3 migration, it'll be trivially parallelizable.