  1. Data Management
  2. DM-14259

Try and document running ap_pipe on the Verification Cluster with SLURM

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      6
    • Epic Link:
    • Sprint:
      AP F18-1, AP F18-2
    • Team:
      Alert Production

      Description

      The suggestion is to hand-construct SLURM commands rather than write a new pipe_driver, since SuperTask will soon make the latter workflow obsolete.

        Attachments

          Issue Links

            Activity

            krzys Krzysztof Findeisen added a comment -

            It might be useful to look at what Eric did on DM-12960.

            mrawls Meredith Rawls added a comment -

            I followed Eric's example as Krzysztof suggested. This is my workflow on lsst-dev:

            • Edit the files make_run_ap_pipe_conf.py, run_ap_pipe.sh, and run_ap_pipe.sl as desired
            • $ python make_run_ap_pipe_conf.py
            • $ sbatch run_ap_pipe.sl

            Processing the entire hits2015 dataset through ap_pipe took just a few hours. The results may be viewed in /project/mrawls/hits2015/rerun/slurm1.

            One problem was encountered: 15 srun tasks exited with code 1 because of a problem with ap_association IO (see DM-15114). However, a 154 MB association.db was created, which suggests most of the DIAObjects were stored correctly. Follow-up ticket DM-15081 will take a closer look at what was stored in the database.
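
As a first check on what ended up in association.db, one could list its tables and row counts. The following is a minimal sketch that assumes the file is an ordinary SQLite database; the function name summarize_sqlite is hypothetical, and no table names are assumed because the schema is not described here.

```python
import sqlite3
from pathlib import Path


def summarize_sqlite(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    # Open read-only so that inspecting the results cannot modify them.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        # Table names come from the database itself, so quote them defensively.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
    finally:
        conn.close()
```

Calling summarize_sqlite("path/to/association.db") would return a dictionary keyed by table name, which gives a quick view of how many rows were written before the failed tasks exited.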

            mrawls Meredith Rawls added a comment -

            Would you please take a look at this, Eric Bellm?

            ebellm Eric Bellm added a comment -

            I have some suggestions on Github for making this modestly more general to ease use on other datasets.

            mrawls Meredith Rawls added a comment -

            Thanks for your comments and offline discussions, Eric Bellm. I pushed some updates and the scripts seem to be running fine on lsst-dev now. Can you please re-review? The scripts still assume there are ccdnums 1 through 62, but I don't think there is any other hardcoded DECam stuff. A user can pass all the main ap_pipe command-line arguments at runtime via the new prep_ap_pipe.sh script.
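
To illustrate how the remaining ccdnum assumption could be made a parameter, here is a rough sketch that builds a configuration in the format read by srun --multi-prog, with one task rank per CCD. It is not the content of the actual scripts: the function name make_multiprog_conf is hypothetical, and the idea that run_ap_pipe.sh takes the ccdnum as its only argument is an assumption made for illustration.

```python
def make_multiprog_conf(ccdnums, command="./run_ap_pipe.sh"):
    """Build srun --multi-prog configuration text, one task rank per CCD."""
    lines = []
    for rank, ccdnum in enumerate(ccdnums):
        # Each line is "<task rank> <executable> <args>"; ranks start at 0.
        # Passing the ccdnum as the sole argument is an assumption.
        lines.append(f"{rank} {command} {ccdnum}\n")
    return "".join(lines)


# The DECam case mentioned above: ccdnums 1 through 62.
decam_conf = make_multiprog_conf(range(1, 63))
```

With the CCD list passed in rather than hardcoded, a different camera would only need a different sequence of CCD identifiers.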

            mrawls Meredith Rawls added a comment -

            The final form of the prep_ap_pipe.sh script generates three files for an ap_pipe slurm job based on user inputs (rerun, repo, calib and template locations, filter, camera, and desired PPDB location). The three files it creates are run_ap_pipe.conf, run_ap_pipe.sh, and run_ap_pipe.sl.

            After running prep_ap_pipe.sh, the user should submit the job with sbatch run_ap_pipe.sl. The slurm job then processes all CCDs (DECam or HSC) in parallel and processes the visits in the input repository sequentially.
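
For readers unfamiliar with this kind of job, the sketch below renders a minimal batch script that requests one Slurm task per CCD and launches them with srun --multi-prog, which is one way to get the CCD-parallel behavior described above. It is an assumption about the general shape of such a file, not the actual run_ap_pipe.sl; the function name render_sbatch_script, the job name, the time limit, and the output file pattern are all hypothetical.

```python
def render_sbatch_script(n_tasks, conf_file="run_ap_pipe.conf",
                         job_name="ap_pipe", time_limit="12:00:00"):
    """Return the text of a minimal sbatch script with one task per CCD."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={n_tasks}",        # one task per CCD
        f"#SBATCH --time={time_limit}",       # hypothetical wall-clock limit
        "#SBATCH --output=ap_pipe_%j.out",    # %j expands to the job ID
        "",
        # Each task rank runs the command listed for it in the conf file.
        f"srun --multi-prog {conf_file}",
    ]
    return "\n".join(lines) + "\n"


script_text = render_sbatch_script(n_tasks=62)
```

Site-specific settings such as the partition or node count are left out on purpose, since they depend on the cluster configuration.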


              People

              Assignee:
              mrawls Meredith Rawls
              Reporter:
              swinbank John Swinbank
              Reviewers:
              Eric Bellm
              Watchers:
              Eric Bellm, John Swinbank, Krzysztof Findeisen, Meredith Rawls
              Votes:
              0
