Fix Memory monitoring for Rubin PanDA jobs

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Team: Ops Middleware
• Urgent?: No

Shuwei Ye added a comment -

I did not realize that I should click the "merge" button. I just made the merge.

Shuwei

Michelle Gower added a comment -

Thanks Shuwei Ye for merging and helping me test the backport. I have the backport in the ticket branch, but I'm having trouble getting runs through to test it. The one run that did go through last night didn't work, so I'm not sure what's up.

LSST stack version: r23_0_1_rc4

https://github.com/lsst/ctrl_bps branch (should only be needed on submit side):  tickets/DM-32579-v23

Here's the yaml with 3 jobs where the first and last jobs should report successful and the middle job should always fail (bad pipetask command line):

includeConfigs:
  - ${CTRL_BPS_DIR}/config/bps_idf.yaml

project: dev
campaign: quick

pipelineYaml: "${OBS_LSST_DIR}/pipelines/imsim/DRP.yaml#isr"
runPreCmdOpts: "--bad"

payload:
  payloadName: prmon/shouldFail/r23_0_1_rc4
  butlerConfig: s3://butler-us-central1-panda-dev/dc2/butler-external.yaml
  inCollection: "2.2i/defaults/test-med-1"
  dataQuery: "instrument='LSSTCam-imSim' and skymap='DC2' and exposure in (214433) and detector=2"
  sw_image: "lsstsqre/centos:7-stack-lsst_distrib-r23_0_1_rc4"

You don't have to do anything with the branch or PR. I will do all the GitHub stuff once someone has verified that it actually works in the r23_0_1_rc4 environment. Thanks again for helping.

Shuwei Ye added a comment -

You asked for the container image "lsstsqre/centos:7-stack-lsst_distrib-r23_0_1_rc4", but I could not find such an image tag on https://hub.docker.com/r/lsstsqre/centos/tags.

Shuwei

Michelle Gower added a comment -

Whew.  Where did you find any error messages about that?

It is actually: lsstsqre/centos:7-stack-lsst_distrib-v23_0_1_rc4

(Not sure why the jupyter one uses an r whereas this one uses a v)

Michelle Gower added a comment -

Never mind. I found it in the pilot stdout. At various points today I've clicked on something and been told it was not found, but this time I must have clicked in all the right places.

Assignee: Shuwei Ye
Reporter: Shuwei Ye
Reviewers: Michelle Gower
Watchers: Hsin-Fang Chiang, Kian-Tat Lim, Michelle Gower, Sergey Padolski, Shuwei Ye, Tim Jenness