Data Management / DM-4835

Allow slurm to request total CPUs rather than nodes*processors.

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ctrl_pool
    • Labels: None
    • Story Points: 2
    • Sprint: Science Pipelines DM-W16-6
    • Team: Data Release Production

      Description

      On some systems, we are asked to request a total number of tasks, rather than specify a combination of nodes and processors per node.

      Requesting a total also makes sense for the SMP option, where only the number of cores matters.

      This is a port of HSC-1369.
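
      As a rough illustration of the two request styles described above, the sketch below builds Slurm resource directives either from a total task count or from a nodes/processors pair. The function name `slurm_resource_lines` and its parameters are hypothetical and are not taken from ctrl_pool; `--ntasks`, `--nodes` and `--ntasks-per-node` are standard `sbatch` options, but whether ctrl_pool emits exactly these lines is an assumption.

```python
def slurm_resource_lines(cores=None, nodes=None, procs=None):
    """Return #SBATCH resource lines for a total or a nodes/procs request.

    Hypothetical helper for illustration only; not the ctrl_pool code.
    """
    if cores is not None:
        if nodes is not None or procs is not None:
            raise ValueError("give either cores, or nodes and procs, not both")
        # Total-task style: let the scheduler decide how to place the tasks.
        return ["#SBATCH --ntasks=%d" % cores]
    if nodes is None or procs is None:
        raise ValueError("need cores, or both nodes and procs")
    # Classic style: explicit node count and tasks per node.
    return ["#SBATCH --nodes=%d" % nodes,
            "#SBATCH --ntasks-per-node=%d" % procs]
```

      For example, `slurm_resource_lines(cores=24)` yields a single `--ntasks=24` line, while `slurm_resource_lines(nodes=2, procs=12)` yields the two-line nodes/tasks-per-node form.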

            Activity

            Paul Price added a comment:

            Code's in; still working out how to test it.

            price@price-laptop:~/LSST/ctrl/pool (tickets/DM-4835=) $ git sub
            commit 24e75f691507192d21d344c560aa314bef871eb9
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Tue Nov 17 15:03:22 2015 -0500
             
                parallel: use --cores to specify number of cores
                
                For some slurm systems and for SMP, the only thing that matters is
                the total number of cores requested, rather than a combination
                of nodes and number of processors per node.  Added support for
                specifying the number of cores (via --cores).  Made this optional
                for SMP (supporting the old --nodes=1 --procs=NN as a synonym for
                --cores=NN), optional for slurm, and disallowed for PBS.
                
                Cherry-picked from hscPipeBase commit fa131a6 (HSC-1369).
             
             python/lsst/ctrl/pool/parallel.py | 63 +++++++++++++++++++++++++++++++--------
             1 file changed, 51 insertions(+), 12 deletions(-)
             
            commit b37b46b867f40d00f4568df38dc15feb5a90152d
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Tue Jan 5 10:29:50 2016 -0500
             
                SmpBatch: fix mpiexec arguments
                
                Need to define 'self.mpiexec' after manipulating 'self.numCores', or
                using "--nodes=1 --procs=12" will result in attempting to use 0 cores.
                
                Cherry-picked from hscPipeBase commit c82797b.
             
             python/lsst/ctrl/pool/parallel.py | 3 ++-
             1 file changed, 2 insertions(+), 1 deletion(-)
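
            The ordering problem described in the second commit can be sketched as follows. This is a minimal illustration, not the ctrl_pool implementation: the class name `SmpBatchSketch` and its constructor arguments are hypothetical, and only the attribute names `numCores` and `mpiexec` come from the commit message. The point is that the `mpiexec` command is built only after the legacy `--nodes=1 --procs=NN` form has been folded into the core count; building it earlier would capture a core count of 0.

```python
class SmpBatchSketch(object):
    """Illustrative stand-in for an SMP batch class (hypothetical)."""

    def __init__(self, cores=0, nodes=0, procs=0):
        self.numCores = cores
        # Legacy form: --nodes=1 --procs=NN is a synonym for --cores=NN.
        if self.numCores == 0 and nodes == 1 and procs > 0:
            self.numCores = procs
        if self.numCores <= 0:
            raise ValueError("specify --cores, or --nodes=1 with --procs")
        # Build the mpiexec command only now, after numCores is final.
        # Doing this before the legacy handling would give "-n 0".
        self.mpiexec = "mpiexec -n %d" % self.numCores
```

            With this ordering, `SmpBatchSketch(nodes=1, procs=12).mpiexec` is `"mpiexec -n 12"`, the same as for `SmpBatchSketch(cores=12)`.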
            

            Paul Price added a comment:

            Nate Lust, would you please review these commits? I figure you're going to be playing with this functionality soon, so it would help to get your hands dirty. One of the commits is yours, but you can assume that I've reviewed it. Commits are on tickets/DM-4835 of ctrl_pool.

            pprice@tiger-sumire:~/LSST/ctrl/pool (tickets/DM-4835=) $ git sub
            commit 24e75f691507192d21d344c560aa314bef871eb9
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Tue Nov 17 15:03:22 2015 -0500
             
                parallel: use --cores to specify number of cores
                
                For some slurm systems and for SMP, the only thing that matters is
                the total number of cores requested, rather than a combination
                of nodes and number of processors per node.  Added support for
                specifying the number of cores (via --cores).  Made this optional
                for SMP (supporting the old --nodes=1 --procs=NN as a synonym for
                --cores=NN), optional for slurm, and disallowed for PBS.
                
                Cherry-picked from hscPipeBase commit fa131a6 (HSC-1369).
             
             python/lsst/ctrl/pool/parallel.py | 63 +++++++++++++++++++++++++++++++--------
             1 file changed, 51 insertions(+), 12 deletions(-)
             
            commit b37b46b867f40d00f4568df38dc15feb5a90152d
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Tue Jan 5 10:29:50 2016 -0500
             
                SmpBatch: fix mpiexec arguments
                
                Need to define 'self.mpiexec' after manipulating 'self.numCores', or
                using "--nodes=1 --procs=12" will result in attempting to use 0 cores.
                
                Cherry-picked from hscPipeBase commit c82797b.
             
             python/lsst/ctrl/pool/parallel.py | 3 ++-
             1 file changed, 2 insertions(+), 1 deletion(-)
             
            commit d9c033a0c5ee6574977c701fb85a9683151138f9
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Tue Feb 2 01:15:04 2016 +0000
             
                SlurmBatch: set minimum walltime of 1 min
                
                If less than 1 min, Slurm will produce an error.
             
             python/lsst/ctrl/pool/parallel.py | 2 +-
             1 file changed, 1 insertion(+), 1 deletion(-)
             
            commit 62c5339320b1819a2848762ed4e3c20f07abbfb5
            Author: Nate Lust <nlust@astro.princeton.edu>
            Date:   Wed Jan 20 14:48:05 2016 -0500
             
                Syntax bug fix
             
             python/lsst/ctrl/pool/parallel.py | 2 +-
             1 file changed, 1 insertion(+), 1 deletion(-)
             
            commit 72a676b98e689a2550679d3e8720daa367f8336c
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Mon Feb 1 20:34:20 2016 -0500
             
                BatchCmdLineTask: remove unnecessary arguments for batchWallTime
                
                Now that we have the number of cores specified, that's all
                the walltime calculation needs --- it doesn't need to know
                how the cores are distributed (and indeed it can't always
                know).
             
             python/lsst/ctrl/pool/parallel.py | 23 +++++------------------
             1 file changed, 5 insertions(+), 18 deletions(-)
             
            commit a2d911792811606879d67f1418b8f098f0c3a91a
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Mon Feb 1 12:30:36 2016 -0500
             
                Add script that demonstrates and tests BatchCmdLineTask
                
                The Slurm, PBS, etc. functionality can't be tested with the
                usual unit tests, but we can make it easy for the developer
                to exercise things. Here we add a very simple demonstration
                script that is intended to show how BatchCmdLineTask can
                be used, and exercise it to ensure it's working.
             
             .gitignore                             |  1 +
             bin.src/SConscript                     |  3 ++
             bin.src/ctrlPoolDemo.py                |  3 ++
             python/lsst/ctrl/pool/test/__init__.py |  1 +
             python/lsst/ctrl/pool/test/demoTask.py | 76 ++++++++++++++++++++++++++++++++++
             ups/ctrl_pool.table                    |  1 +
             6 files changed, 85 insertions(+)
             
            commit 395be8054f570338af7c39f6fbef8f0f5d930fa7
            Author: Paul Price <price@astro.princeton.edu>
            Date:   Mon Feb 1 20:15:47 2016 -0500
             
                Add file with example command-lines.
                
                This is intended to help users exercise the demo script.
             
             demo.txt | 11 +++++++++++
             1 file changed, 11 insertions(+)
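
            The minimum-walltime fix in the SlurmBatch commit above amounts to a clamp. A minimal sketch, assuming the walltime is held in seconds and passed to Slurm in whole minutes (`--time` accepts a plain minute count); the function name `slurm_time_line` is hypothetical and rounding up is an assumption, since the commit message states only that values under 1 minute make Slurm produce an error.

```python
import math


def slurm_time_line(walltime_seconds):
    """Return an #SBATCH --time line, never requesting less than 1 minute.

    Hypothetical helper for illustration only; not the ctrl_pool code.
    """
    # Round up to whole minutes, then enforce the 1-minute floor.
    minutes = max(1, int(math.ceil(walltime_seconds / 60.0)))
    return "#SBATCH --time=%d" % minutes
```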
            

            Nate Lust added a comment:

            The only thing I really notice is in the demo.txt. It would be nice to have a line that says the specific arguments for input and rerun pertain to our test system tiger and will be necessarily different on other machines. Perhaps as a parenthetical.

            Nate Lust added a comment:

            Line 396 in parallel.py passes the wrong arguments to batchWallTime.

            Paul Price added a comment:

            The arguments to batchWallTime changed, and will need updating in user code (like stack.py that Nate is working on). I'm choosing not to RFC this since ctrl_pool isn't currently used beyond Nate (who is porting the first use of it into LSST land).

            Added a note in demo.txt about the commands being specific to the Princeton cluster and HSC data.

            Merged to master.
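
            For user code that needs updating, the idea behind the simplified walltime calculation can be sketched as below. This is an illustration under stated assumptions, not the real `batchWallTime` signature: the function name `batch_wall_time`, its parameters, and the formula (time per target, times number of targets, divided by total cores) are all hypothetical. The commits confirm only that the calculation now needs the total number of cores rather than how they are distributed.

```python
def batch_wall_time(time_per_target, num_targets, num_cores):
    """Estimate a walltime request from the total core count alone.

    Hypothetical sketch; the actual batchWallTime arguments may differ.
    """
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    # Work is assumed to divide evenly over the cores, however they
    # happen to be spread across nodes.
    return time_per_target * num_targets / float(num_cores)
```

            Under these assumptions, 24 targets at 600 seconds each on 12 cores gives `batch_wall_time(600, 24, 12) == 1200.0`.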


              People

              Assignee: Paul Price (price)
              Reporter: John Swinbank (swinbank)
              Reviewers: Nate Lust
              Watchers: John Swinbank, Nate Lust, Paul Price