RFC-414 (project: Request For Comments)

Configure Slurm to accept jobs that use only partial nodes on the lsst-dev verification cluster

    Details

    • Type: RFC
    • Status: Retired
    • Resolution: Done
    • Component/s: DM
    • Labels:
      None

      Description

      As described in IHS-576, allowing the verification cluster at NCSA to accept jobs that use partial nodes would improve the productivity of the development team. Currently, if I launch a Slurm job that requests only a single core, it occupies a whole node (48 cores). This wastes resources when jobs do not actually use all those cores, and it often creates an unnecessary backlog in the queue while many cores sit idle on fully allocated nodes. I have experienced this, as have Lauren MacArthur, Nate Lust, and probably others.
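As a rough illustration of the inefficiency described above, the following sketch compares how many nodes a batch of identical jobs consumes under whole-node allocation versus per-core packing. It is a simplified model, not a description of the cluster's scheduler: it assumes 48 cores per node (the figure quoted above), that every job requests the same number of cores, and it ignores memory and other limits. The function names are hypothetical.

```python
import math

# Assumption: 48 cores per node, as quoted in this RFC; memory limits are ignored.
CORES_PER_NODE = 48


def nodes_needed(n_jobs: int, cores_per_job: int = 1, exclusive: bool = True) -> int:
    """Nodes consumed by n_jobs identical jobs under a simple packing model."""
    if not 1 <= cores_per_job <= CORES_PER_NODE:
        raise ValueError("cores_per_job must be between 1 and CORES_PER_NODE")
    if exclusive:
        # Whole-node allocation: every job occupies an entire node.
        return n_jobs
    # Partial-node allocation: pack as many jobs onto each node as its cores allow.
    jobs_per_node = CORES_PER_NODE // cores_per_job
    return math.ceil(n_jobs / jobs_per_node)


def idle_cores(n_jobs: int, cores_per_job: int = 1, exclusive: bool = True) -> int:
    """Cores allocated to the batch but not requested by any job."""
    allocated = nodes_needed(n_jobs, cores_per_job, exclusive) * CORES_PER_NODE
    return allocated - n_jobs * cores_per_job


if __name__ == "__main__":
    for mode in (True, False):
        print(f"exclusive={mode}: nodes={nodes_needed(100, exclusive=mode)}, "
              f"idle cores={idle_cores(100, exclusive=mode)}")
```

Under this model, 100 single-core jobs tie up 100 nodes with 4,700 idle cores when each job gets a whole node, versus 3 nodes with 44 idle cores when jobs can share nodes.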

      I am proposing (as requested in IHS-576) that the verification cluster (VC) be configured to allow partial nodes to be allocated. The Princeton Slurm clusters (e.g., tiger, perseus, and della) are all configured to allow partial-node allocation, which is very useful, for example, for launching job arrays of many serial jobs. This change would make the VC much more useful to the development team.
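For the job-array use case mentioned above, the following sketch generates a Slurm batch script for an array of serial, single-core tasks. The #SBATCH options (--array, --ntasks, --cpus-per-task, --mem-per-cpu, --time, --output with the %A/%a filename patterns) and the SLURM_ARRAY_TASK_ID environment variable are standard Slurm features; the command "python process_chunk.py", the memory and time values, and the helper name make_array_script are hypothetical placeholders. Whether these tasks actually share nodes depends on site configuration: it assumes a consumable-resource node selection plugin (for example SelectType=select/cons_res, or select/cons_tres on newer Slurm releases) rather than select/linear, which allocates whole nodes. That configuration change is what this RFC requests.

```python
def make_array_script(
    n_tasks: int,
    command: str,
    mem_per_cpu: str = "4G",
    time_limit: str = "01:00:00",
    job_name: str = "serial-array",
) -> str:
    """Return an sbatch script that runs `command <index>` once per array task."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    directives = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --array=0-{n_tasks - 1}",  # one array element per serial task
        "#SBATCH --ntasks=1",                 # each element is a single task...
        "#SBATCH --cpus-per-task=1",          # ...that needs only one core
        f"#SBATCH --mem-per-cpu={mem_per_cpu}",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --output=slurm-%A_%a.out",   # %A = array job ID, %a = array index
        "",
        # Slurm sets SLURM_ARRAY_TASK_ID to this element's index at run time.
        f'{command} "$SLURM_ARRAY_TASK_ID"',
    ]
    return "\n".join(directives) + "\n"


if __name__ == "__main__":
    # Hypothetical command; replace with the real per-task program.
    print(make_array_script(100, "python process_chunk.py"), end="")
```

The generated file would be submitted with sbatch. On a cluster that allows partial-node allocation, many of these single-core tasks could then run side by side on one node, subject to memory limits, instead of each holding a full 48-core node.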

            Activity

            pdomagala Paul Domagala [X] (Inactive) added a comment -

            IHS-612 (Implement debug and normal queues for developers on the verification cluster) will probably be implemented before the break. IHS-576 (Configure slurm to accept jobs to use only partial nodes) will follow shortly after the break.

            tjenness Tim Jenness added a comment -

            Paul Domagala [X], I think you are telling me that this RFC can be Adopted and that IHS-612 and IHS-576 are the triggered work. When those tickets are closed, the RFC can be marked as Implemented. If you do not reply, I will assume that my interpretation is correct and adjust this ticket accordingly.

            pdomagala Paul Domagala [X] (Inactive) added a comment -

            Tim Jenness your interpretation is correct.

            tjenness Tim Jenness added a comment -

            Adopting this based on feedback from Paul Domagala [X]. Work is being done in IHS-576 and IHS-612.

            tjenness Tim Jenness added a comment -

            This RFC has been superseded by work on HTCondor.

              People

              Assignee:
              pdomagala Paul Domagala [X] (Inactive)
              Reporter:
              tmorton Tim Morton [X] (Inactive)
              Watchers:
              Andrew Loftus [X] (Inactive), Bob Armstrong, Brian Van Klaveren, Hsin-Fang Chiang, Jim Bosch, John Parejko, John Swinbank, Kian-Tat Lim, Paul Domagala [X] (Inactive), Paul Price, Pim Schellart [X] (Inactive), Tim Jenness, Tim Morton [X] (Inactive)
              Votes:
              1
              Watcher count:
              13