Data Management / DM-13783

Make color-coded node-usage plot for S17B HSC PDR1 reprocessing and find total node-hours

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Using the SLURM JobIDs detailed here (https://confluence.lsstcorp.org/display/DM/S17B+HSC+PDR1+reprocessing), create a color-coded node-usage plot, like those created for DM-13699, and find the total node-hours of the entire run. Put the plot here, as well as on the Confluence page linked above.

        Attachments

          Issue Links

            Activity

            sthrush Samantha Thrush added a comment -

            After slightly modifying the usage code to work with the job names of the SLURM JobIDs provided, I found the total node-hours used to be 9383.9, and I created the usage-s17b_hsc_pdr1.png plot (attached above).

            The plot raises a few questions, however. Compared with the Verification Cluster Node Usage for PDR1 Reprocessing plot (https://confluence.lsstcorp.org/display/DM/S17B+HSC+PDR1+reprocessing?preview=/54859059/62263313/hourly.nodes.2.png), the two plots appear to disagree by a factor of 4 on how many nodes are in use during the "multiband" section. Additionally, the plot on Confluence doesn't seem to include the forcedPhotCcd.py data: those runs start about 450 hours into the project, whereas the Confluence plot stops at only 250 hours.
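
For reference, the node-hour total described above can be sketched as follows. This is not the usage code used for this ticket. It is a minimal illustration that assumes each job is available as a pipe-delimited record of JobID, JobName, NNodes, Start, and End, similar to what SLURM accounting can report. The records, job names, and dates below are made up.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical accounting records: JobID|JobName|NNodes|Start|End.
# These values are invented for illustration and are not from the S17B run.
SAMPLE = """\
1001|singleFrame|4|2017-10-01T00:00:00|2017-10-01T06:00:00
1002|multiband|2|2017-10-01T03:00:00|2017-10-01T04:30:00
1003|multiband|2|2017-10-01T05:00:00|2017-10-01T06:00:00
"""


def node_hours_by_task(text):
    """Sum nodes * elapsed hours per job name."""
    totals = defaultdict(float)
    for line in text.strip().splitlines():
        _job_id, name, nnodes, start, end = line.split("|")
        elapsed = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        totals[name] += int(nnodes) * elapsed.total_seconds() / 3600.0
    return dict(totals)


per_task = node_hours_by_task(SAMPLE)
total = sum(per_task.values())
print(per_task, total)
```

Grouping by job name gives the per-task totals that a color-coded plot would draw, and summing over the groups gives the total for the run.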

            sthrush Samantha Thrush added a comment -

            Hsin-Fang Chiang, how was the data for the Verification Cluster Node Usage for PDR1 Reprocessing plot (https://confluence.lsstcorp.org/display/DM/S17B+HSC+PDR1+reprocessing?preview=/54859059/62263313/hourly.nodes.2.png) collected? 

            hchiang2 Hsin-Fang Chiang added a comment -

            Greg Daues made the plot back then. There was actually a plot with the x-axis extending beyond 450 hours, but we somehow left it out, probably because forcedPhotCcd was a late addition to that processing campaign.

            The factor of 4 could mean there was a bug in the old plot. Rather than reexamining the old plot, I suggest double-checking your plot. If you are confident in it, please replace the plot on the Confluence page and add your results there (I believe you already have permission to edit the page).

            sthrush Samantha Thrush added a comment -

            Ok, I'll do that.  Thanks Hsin-Fang!

            gdaues Greg Daues added a comment -

            The original plot was based on tabulated job start and end times, and may have literally 'taken snapshots' of job/cluster activity at an instant rather than accumulating usage. Perhaps there is an error in the logic somewhere there. If there is a new plot based on Slurm accounting, yes, go ahead and label/declare the new results.
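
To illustrate the distinction between instantaneous snapshots and accumulated usage, here is a small sketch. It is not the logic behind either plot, and it is not offered as the explanation for the factor of 4. It only shows one way the two approaches can give different numbers. The job tuples (start hour, end hour, node count) are hypothetical.

```python
def snapshot_counts(jobs, n_hours):
    """Nodes in use at the instant each hour begins."""
    return [sum(n for s, e, n in jobs if s <= t < e) for t in range(n_hours)]


def binned_node_hours(jobs, n_hours):
    """Node-hours accumulated inside each one-hour bin."""
    bins = []
    for t in range(n_hours):
        total = 0.0
        for s, e, n in jobs:
            # Length of the part of [s, e) that falls inside [t, t + 1).
            overlap = min(e, t + 1) - max(s, t)
            if overlap > 0:
                total += n * overlap
        bins.append(total)
    return bins


# Hypothetical jobs: a 4-node job for two hours, then an 8-node job
# that starts and ends between two hourly sample points.
jobs = [(0.0, 2.0, 4), (2.25, 2.75, 8)]

print(snapshot_counts(jobs, 3))
print(binned_node_hours(jobs, 3))
```

In this made-up case the hourly snapshot never sees the short 8-node job, while the binned accumulation still credits its node-hours to the third hour.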

            sthrush Samantha Thrush added a comment -

            Thanks for the insight, Greg!

            sthrush Samantha Thrush added a comment - - edited

            Ok, after re-checking everything, I'm confident that my plot was correct.  I'll fix the confluence page.


              People

              Assignee:
              sthrush Samantha Thrush
              Reporter:
              sthrush Samantha Thrush
              Watchers:
              Greg Daues, Hsin-Fang Chiang, Samantha Thrush

                Dates

                Created:
                Updated:
                Resolved:
