Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: None
- Labels: None
- Story Points: 1
- Epic Link:
- Team: Data Facility
Description
While completing DM-13578, it was found that, because of the GPSF outage on 24 Oct. 2017, sacct reported the following errors when queried for some of the SLURM job IDs:
       JobID    JobName   NNodes    Elapsed      State ExitCode
------------ ---------- -------- ---------- ---------- --------
Conflicting JOB_STEP record for jobstep 95210.0 at line 263284 -- ignoring it
Conflicting JOB_STEP record for jobstep 95211.0 at line 263288 -- ignoring it
Conflicting JOB_TERMINATED record (COMPLETED) for job 95210 at line 263355 -- ignoring it
Conflicting JOB_TERMINATED record (COMPLETED) for job 95211 at line 263359 -- ignoring it
95210            mtWide        3   00:04:28  NODE_FAIL    127:0
95210.0      hydra_pmi+        3   00:04:27     FAILED      7:0
95210.1      hydra_pmi+        3   07:44:48  COMPLETED      0:0
95211          mtCosmos        4   00:04:04  NODE_FAIL    127:0
95211.0      hydra_pmi+        4   00:04:04     FAILED      7:0
95211.1      hydra_pmi+        4   10:10:13  COMPLETED      0:0
So although each of these jobs initially failed (step .0, FAILED; job state NODE_FAIL), it was later run successfully under the same JobID (step .1, COMPLETED). Modify usage.py so that such jobs are included.
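One possible way to pick up such jobs is sketched below. This is not the actual usage.py code: the function names (`parse_elapsed`, `completed_steps`) and the filtering rule (keep every job step whose State is COMPLETED, regardless of the job-level state) are assumptions made for illustration. The sketch also assumes the six-column sacct layout shown above and job names without spaces.

```python
from collections import defaultdict

# Sample taken from the sacct output quoted in this ticket.
SAMPLE = """\
       JobID    JobName   NNodes    Elapsed      State ExitCode
------------ ---------- -------- ---------- ---------- --------
Conflicting JOB_STEP record for jobstep 95210.0 at line 263284 -- ignoring it
Conflicting JOB_TERMINATED record (COMPLETED) for job 95210 at line 263355 -- ignoring it
95210            mtWide        3   00:04:28  NODE_FAIL    127:0
95210.0      hydra_pmi+        3   00:04:27     FAILED      7:0
95210.1      hydra_pmi+        3   07:44:48  COMPLETED      0:0
95211          mtCosmos        4   00:04:04  NODE_FAIL    127:0
95211.0      hydra_pmi+        4   00:04:04     FAILED      7:0
95211.1      hydra_pmi+        4   10:10:13  COMPLETED      0:0
"""


def parse_elapsed(text):
    """Convert an Elapsed value of the form [D-]HH:MM:SS to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def completed_steps(sacct_text):
    """Map JobID -> list of (nnodes, elapsed_seconds) for COMPLETED steps.

    Hypothetical helper: lines that are not six-column step records
    (headers, "Conflicting ..." warnings, job-level rows) are skipped.
    """
    steps = defaultdict(list)
    for line in sacct_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # warning lines and anything malformed
        job_step, _name, nnodes, elapsed, state, _exit_code = fields
        job_id, _, step = job_step.partition(".")
        if not (job_id.isdigit() and step.isdigit()):
            continue  # header, separator, or job-level row
        if state == "COMPLETED":
            steps[job_id].append((int(nnodes), parse_elapsed(elapsed)))
    return dict(steps)


if __name__ == "__main__":
    for job_id, records in completed_steps(SAMPLE).items():
        node_hours = sum(n * secs for n, secs in records) / 3600.0
        print(job_id, round(node_hours, 2))
```

With this rule, jobs 95210 and 95211 are counted through their successful step .1 even though the job-level row reports NODE_FAIL. Whether the short failed step .0 should also contribute to node-hours is a separate choice that this sketch does not make.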
Issue Links
relates to:
- DM-13699 Modify usage.py and usageplot.py to allow for color-coded plots (Done)
- DM-13783 Make color-coded node-usage plot for S17B HSC PDR1 reprocessing and find total node-hours (Done)
- DM-13815 Find elapsed code times from usage.py/usageplot.py (Done)
- DM-13816 Modify usage.py to allow the user to specify SLURM job names. (Done)
- DM-13818 Modify usage.py to output node-hours that take into account significant figures (Done)
- DM-13819 Create a top-level script to run both usage.py and usageplot.py (Done)
- DM-14054 Add command line option for resolution in usage.py (Done)
- DM-14111 Overhaul usage.py Readme file and delete key_len variable (Done)
- DM-13547 Plot the node utilization of the w_2018_03 RC1 reprocessing (Done)
- DM-13578 Plot the node utilization for RC1 Reprocessed Jobs (Done)
- DM-13618 Find node-hour usage from usage codes (Done)
The code has been completed and the review should be fairly quick.