DM-32579 (Data Management)

Fix Memory monitoring for Rubin PanDA jobs

          Activity

          yesw Shuwei Ye added a comment -

          Hi Michelle Gower ,

          I did not realize that I should click the "merge" button. I just made the merge.

          Shuwei

           

          mgower Michelle Gower added a comment -

          Thanks, Shuwei Ye, for merging and for helping me test the backport. I have the backport in the ticket branch, but I'm having trouble getting runs through to test it, and the one run that did go through last night didn't work, so I'm not sure what's wrong.

           

          LSST stack version: r23_0_1_rc4

          ctrl_bps branch (https://github.com/lsst/ctrl_bps; should only be needed on the submit side): tickets/DM-32579-v23

          Here's the YAML with 3 jobs: the first and last jobs should report success, and the middle job should always fail (bad pipetask command line):

           

          includeConfigs:
          - ${CTRL_BPS_DIR}/config/bps_idf.yaml
          project: dev
          campaign: quick
          pipelineYaml: "${OBS_LSST_DIR}/pipelines/imsim/DRP.yaml#isr"
          runPreCmdOpts: "--bad"
          payload:
            payloadName: prmon/shouldFail/r23_0_1_rc4
            butlerConfig: s3://butler-us-central1-panda-dev/dc2/butler-external.yaml
            inCollection: "2.2i/defaults/test-med-1"
            dataQuery: "instrument='LSSTCam-imSim' and skymap='DC2' and exposure in (214433) and detector=2"
            sw_image: "lsstsqre/centos:7-stack-lsst_distrib-r23_0_1_rc4"
          

           

          You don't have to do anything with the branch or PR. I will handle all the GitHub steps once someone has verified that the backport actually works in the r23_0_1_rc4 environment. Thanks again for helping.

           

           

          yesw Shuwei Ye added a comment -

          Hi Michelle Gower ,

          You asked for the container image "lsstsqre/centos:7-stack-lsst_distrib-r23_0_1_rc4", but I could not find such an image tag at https://hub.docker.com/r/lsstsqre/centos/tags

          Shuwei

           

           

          mgower Michelle Gower added a comment -

          Whew.  Where did you find any error messages about that?  

          It is actually: lsstsqre/centos:7-stack-lsst_distrib-v23_0_1_rc4

          (Not sure why the jupyter one uses an r whereas this one uses a v)

          mgower Michelle Gower added a comment -

          Never mind. I found it in the pilot stdout. At various points today I clicked on something and was told "not found", but this time I must have clicked in all the right places.
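For reference, a sketch of how the payload section of the test YAML would presumably look with the image tag from the correction above (v23_0_1_rc4 rather than r23_0_1_rc4). This assumes every other key stays exactly as originally posted; only sw_image changes, and keeping the r-prefixed name in payloadName is a guess, since that value appears to be only a label.

```yaml
payload:
  # Label only; left as posted (assumption: it does not need to match the image tag)
  payloadName: prmon/shouldFail/r23_0_1_rc4
  butlerConfig: s3://butler-us-central1-panda-dev/dc2/butler-external.yaml
  inCollection: "2.2i/defaults/test-med-1"
  dataQuery: "instrument='LSSTCam-imSim' and skymap='DC2' and exposure in (214433) and detector=2"
  # Corrected tag: the Docker Hub image uses a "v" prefix, not "r"
  sw_image: "lsstsqre/centos:7-stack-lsst_distrib-v23_0_1_rc4"
```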


            People

            Assignee:
            yesw Shuwei Ye
            Reporter:
            yesw Shuwei Ye
            Reviewers:
            Michelle Gower
            Watchers:
            Hsin-Fang Chiang, Kian-Tat Lim, Michelle Gower, Sergey Padolski, Shuwei Ye, Tim Jenness