Details
Type: RFC
Status: Retired
Resolution: Done
Component/s: DM
Labels: None
Description
As described in IHS-576, the development team would be more productive if the verification cluster (VC) at NCSA accepted jobs that use partial nodes. Currently, a job launched via Slurm that requests only a single core is allocated a whole node (48 cores). This wastes resources when jobs do not use all of those cores, and it often creates an unnecessary backlog in the queue while many cores sit idle on fully allocated nodes. I have experienced this, as have Lauren MacArthur, Nate Lust, and probably others.
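To put rough numbers on the idle-core problem, here is a small sketch of the arithmetic. Only the 48-core node size comes from this ticket. The function names, the job counts, and the assumption that partial-node jobs pack perfectly onto nodes are illustrative, so the shared-node figures are a best case rather than a prediction of what the scheduler would do.

```python
import math

CORES_PER_NODE = 48  # node size stated in this RFC


def nodes_needed(n_jobs: int, cores_per_job: int = 1, share_nodes: bool = False) -> int:
    """Nodes occupied by n_jobs independent jobs (hypothetical helper)."""
    if not 1 <= cores_per_job <= CORES_PER_NODE:
        raise ValueError("cores_per_job must be between 1 and CORES_PER_NODE")
    if not share_nodes:
        # Whole-node allocation: every job holds a full node.
        return n_jobs
    # Partial-node allocation, assuming ideal packing of jobs onto nodes.
    jobs_per_node = CORES_PER_NODE // cores_per_job
    return math.ceil(n_jobs / jobs_per_node)


def idle_cores(n_jobs: int, cores_per_job: int = 1, share_nodes: bool = False) -> int:
    """Allocated cores that no job is actually using."""
    allocated = nodes_needed(n_jobs, cores_per_job, share_nodes) * CORES_PER_NODE
    return allocated - n_jobs * cores_per_job


if __name__ == "__main__":
    for share in (False, True):
        print(
            f"share_nodes={share}: "
            f"{nodes_needed(100, share_nodes=share)} nodes, "
            f"{idle_cores(100, share_nodes=share)} idle cores"
        )
```

Under these assumptions, 100 single-core jobs hold 100 nodes with whole-node allocation but would fit on 3 nodes if nodes could be shared.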
I am proposing (as requested in IHS-576) that the VC be configured to allow partial nodes to be allocated. The Princeton Slurm clusters (e.g., tiger, perseus, and della) are all configured so that partial nodes can be used, which is very useful for workloads such as job arrays of many serial jobs. This change would make the VC much more useful to the development team.
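As an illustration of the job-array use case, the sketch below builds a Slurm batch script in which each array task requests a single core. The `--array`, `--nodes`, `--ntasks`, and `--cpus-per-task` options and the `SLURM_ARRAY_TASK_ID` variable are standard Slurm. The helper name, the job name, and the `./process_one.sh` command are hypothetical, and whether such tasks actually share a node depends on how the cluster is configured.

```python
def serial_array_script(n_tasks: int, command: str, job_name: str = "serial-array") -> str:
    """Return the text of an sbatch script for n_tasks single-core array tasks.

    Hypothetical helper: `command` is whatever serial program each task runs;
    it receives the array task ID as its only argument.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --array=0-{n_tasks - 1}",  # one array task per serial job
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks=1",                # each array task is a single process
        "#SBATCH --cpus-per-task=1",         # ...using a single core
        "",
        f'{command} "$SLURM_ARRAY_TASK_ID"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the script to a file, then submit it with: sbatch serial_array.sh
    print(serial_array_script(200, "./process_one.sh"), end="")
```

With whole-node allocation, each of these 200 tasks would occupy a 48-core node. With partial-node allocation, many of them could run side by side on the same node.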
IHS-612 (Implement debug and normal queues for developers on the verification cluster) will probably be implemented before break. IHS-576 (Configure slurm to accept jobs to use only partial nodes) will probably be implemented shortly after break.