# Try and document running ap_pipe on the Verification Cluster with SLURM


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s: None
• Labels: None
• Story Points: 6
• Sprint: AP F18-1, AP F18-2
• Team:

#### Description

The suggestion is to hand-construct SLURM commands rather than write a new pipe_driver, since SuperTask will soon make the latter workflow obsolete.

#### Activity

Krzysztof Findeisen added a comment:

It might be useful to look at what Eric did on DM-12960.

Meredith Rawls added a comment:

I followed Eric's example as Krzysztof suggested. This is my workflow on lsst-dev:

• Edit the files make_run_ap_pipe_conf.py, run_ap_pipe.sh, and run_ap_pipe.sl as desired
• $ python make_run_ap_pipe_conf.py
• $ sbatch run_ap_pipe.sl
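The contents of Eric's scripts are not shown in this ticket. As a rough illustration of the first step, here is a hypothetical Python sketch that assumes (not confirmed here) that run_ap_pipe.conf is an `srun --multi-prog` configuration file, with one line per task giving a 0-based task rank, the program, and its arguments, and that run_ap_pipe.sh takes a single ccdnum argument. The function name `make_multiprog_conf` is illustrative only.

```python
"""Hypothetical sketch of a make_run_ap_pipe_conf.py-style generator.

Assumes (not confirmed by this ticket) that the SLURM job starts one task
per CCD with `srun --multi-prog run_ap_pipe.conf`, and that run_ap_pipe.sh
takes a single ccdnum argument.
"""

from pathlib import Path


def make_multiprog_conf(ccdnums, worker="./run_ap_pipe.sh"):
    """Return the text of an srun --multi-prog configuration file.

    Each line maps one 0-based task rank to the worker script and the
    CCD number that task should process.
    """
    lines = [f"{rank} {worker} {ccd}" for rank, ccd in enumerate(ccdnums)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # DECam ccdnums 1 through 62, the range these scripts assume.
    Path("run_ap_pipe.conf").write_text(make_multiprog_conf(range(1, 63)))
```

Under this assumption, the batch script would likely need to request as many tasks as the file has lines (62 here), since srun expects each task rank to have an entry in the configuration file.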

Processing the entire hits2015 dataset through ap_pipe took just a few hours. The results may be viewed in /project/mrawls/hits2015/rerun/slurm1.

One problem occurred: 15 srun tasks exited with code 1, due to a problem with ap_association I/O (see DM-15114). However, an association.db of 154 MB was still created, which suggests that most of the DIAObjects were stored correctly. Follow-up ticket DM-15081 will take a closer look at what was stored in the database.

Meredith Rawls added a comment:

Would you please take a look at this, Eric Bellm?

Eric Bellm added a comment:

I have some suggestions on GitHub for making these scripts somewhat more general, so they are easier to use on other datasets.

Meredith Rawls added a comment:

Thanks for your comments and offline discussions, Eric Bellm. I pushed some updates, and the scripts now seem to run fine on lsst-dev. Can you please re-review? The scripts still assume ccdnums 1 through 62, but I don't think anything else is hardcoded for DECam. A user can pass all the main ap_pipe command-line arguments at runtime via the new prep_ap_pipe.sh script.

Meredith Rawls added a comment:

The final form of the prep_ap_pipe.sh script generates three files for an ap_pipe SLURM job based on user inputs (rerun, repo, calib and template locations, filter, camera, and desired PPDB location): run_ap_pipe.conf, run_ap_pipe.sh, and run_ap_pipe.sl.

After running prep_ap_pipe.sh, the user submits the job with sbatch run_ap_pipe.sl. The SLURM job then processes all CCDs (DECam or HSC) in parallel, with each CCD working through the visits in the input repository sequentially.
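To make the "CCDs in parallel, visits in sequence" layout concrete, here is a minimal Python sketch of how the other two generated files might look. It is not the actual prep_ap_pipe.sh: the ap_pipe.py flags shown (including `--template`), the `#SBATCH` values, and passing visit numbers in directly are all assumptions, the function names are hypothetical, and the camera and PPDB-location inputs are left out.

```python
"""Hypothetical sketch of the worker (.sh) and batch (.sl) scripts.

Assumptions, not taken from prep_ap_pipe.sh: the ap_pipe.py flags, the
#SBATCH values, and the explicit visit list. Camera and PPDB handling
are omitted.
"""


def make_worker_script(repo, calib, template, rerun, filt, visits):
    """Return a bash worker that processes one CCD over all visits in turn."""
    visit_list = " ".join(str(v) for v in visits)
    # The ap_pipe.py invocation below is an assumed, illustrative form;
    # check the real flags with `ap_pipe.py --help` before using it.
    return f"""#!/bin/bash
# Usage: run_ap_pipe.sh CCDNUM
ccd=$1
for visit in {visit_list}; do
    # Visits run one after another within this task.
    ap_pipe.py {repo} --calib {calib} --template {template} \\
        --rerun {rerun} --id visit=$visit ccdnum=$ccd filter={filt}
done
"""


def make_batch_script(ntasks, conf="run_ap_pipe.conf", hours=12):
    """Return an sbatch script that runs one srun task per conf line."""
    return f"""#!/bin/bash
#SBATCH --job-name=ap_pipe
#SBATCH --ntasks={ntasks}
#SBATCH --time={hours}:00:00
srun --multi-prog {conf}
"""
```

For example, `make_batch_script(62)` would request one task per DECam ccdnum 1 through 62. A caller would write each string to run_ap_pipe.sh and run_ap_pipe.sl, make the worker executable, and then submit with `sbatch run_ap_pipe.sl`. Keeping the visit loop inside each task means SLURM only has to schedule one long-running task per CCD.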


#### People

Assignee: Meredith Rawls
Reporter: John Swinbank
Reviewers: Eric Bellm
Watchers: Eric Bellm, John Swinbank, Krzysztof Findeisen, Meredith Rawls