Data Management / DM-8156

add support for Slurm to allocateNodes.py

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ctrl_execute
    • Labels: None

      Description

      Add support for allocating HTCondor nodes through Slurm via the allocateNodes.py command. This is related to the work being done for DM-8154.

            Activity

            hchiang2 Hsin-Fang Chiang added a comment -

            Not directly relevant to this ticket, but I noticed that ctrl_orca/ups/ctrl_orca.table contains these two setup lines that Ray added a long time ago:

            setupOptional(condor)
            setupOptional(condor_glidein)
            

            But I don't see packages named "condor" or "condor_glidein" in our lsst GitHub organization. Are they still needed?

            spietrowicz Steve Pietrowicz added a comment -

            Those can be removed, so I'll do that.

            spietrowicz Steve Pietrowicz added a comment -

            Can you take another look at this?

            After your review, I found a bug that allowed two things to happen: a job that didn't specify a node set could run on allocated nodes, and nodes that weren't tagged with a node set could run jobs that did specify a node set.

            We never ran across either issue before because of the way we ran things, but because of the way we now do allocations, it's possible to have an independent job owned by the user take over nodes that it shouldn't.

            Basically, this keeps users from accidentally disrupting their own jobs.
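
            The two cases above can be illustrated with a small sketch. This is not the code from ctrl_execute; the function names and the use of None for "no node set" are assumptions made for illustration only. It models the check as two-sided, in the spirit of HTCondor matchmaking, where both the job and the node must accept the match:

```python
from typing import Optional


def job_accepts_node(job_set: Optional[str], node_set: Optional[str]) -> bool:
    """Job side (hypothetical): a job that names a node set only accepts
    nodes tagged with that same set. None means no node set was given."""
    return job_set is None or job_set == node_set


def node_accepts_job(job_set: Optional[str], node_set: Optional[str]) -> bool:
    """Node side (hypothetical): a node tagged with a node set only accepts
    jobs that name that same set. None means the node is untagged."""
    return node_set is None or node_set == job_set


def matches(job_set: Optional[str], node_set: Optional[str]) -> bool:
    """A job runs on a node only when both sides agree."""
    return job_accepts_node(job_set, node_set) and node_accepts_job(job_set, node_set)


# Case 1: a job without a node set must not land on an allocated (tagged) node.
print(matches(None, "set_a"))
# Case 2: an untagged node must not run a job that names a node set.
print(matches("set_a", None))
# The intended pairing still works.
print(matches("set_a", "set_a"))
```

            With only one of the two checks in place, one of the first two cases would slip through, which is the kind of accidental takeover described above.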

            hchiang2 Hsin-Fang Chiang added a comment -

            The new changes in ctrl_platform_verification look good to me.

            I tried runOrca with a stack ProcessCcd task and it worked great.

            spietrowicz Steve Pietrowicz added a comment -

            merged ctrl_execute and ctrl_execute
            ctrl_platform_verification was added, but will be renamed to ctrl_platform_lsstvc


              People

              Assignee:
              spietrowicz Steve Pietrowicz
              Reporter:
              spietrowicz Steve Pietrowicz
              Reviewers:
              Hsin-Fang Chiang
              Watchers:
              Hsin-Fang Chiang, Steve Pietrowicz
