Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: None
- Labels: None
- Story Points: 2
- Epic Link:
- Team: Data Facility
Description
Create node utilization vs. time plots for the following reprocessing weeks and SLURM job IDs:
w_2017_25: 75450, 75452, 75456, 75451, 75461, 75453, 74961, 74963, 74962, 74965, 74964, 75611, 75686, 75704, 74936, 74938, 74941, 74937, 74940, 74939, 74931, 74934, 74932, 74935, 74933, 74945, 74946, 74948, 74949, 74951, 74952, 74953, 74955, 74956, 74958, 74959, 74960, 75042, 75431, 75432, 75433, 74944
w_2017_27: 78843, 78841, 78845, 78375, 78847, 78377, 78877, 78844, 78380, 78855, 78848, 78376, 78370, 78846, 78379, 78372, 78838, 78378, 78371, 78840, 78374, 78839, 78373, 78842 (cannot find logs from mosaic)
w_2017_28: 77637, 77871, 78079, 78072, 77872, 78099, 78080, 77813, 77815, 77814, 77817, 77816, 78300, 77881, 77631, 77633, 77636, 77632, 77635, 77634, 77626, 77629, 77627, 77630, 77628, 77801, 77803, 77802, 77805, 77804, 77806, 77808, 77807, 77810, 77809, 77795, 77797, 77796, 77798, 77799, 77800
w_2017_30: 79563, 79561, 79565, 79432, 79567, 79434, 80036, 79564, 79437, 79742, 79568, 79433, 79427, 79566, 79436, 79429, 79558, 79435, 79428, 79560, 79431, 79559, 79430, 79562 (cannot find logs from mosaic)
w_2017_32: 81995, 82071, 82073, 82075, 82072, 82076, 82074, 82066, 82068, 82067, 82070, 82069, 82078, 82077, 82042, 82044, 82047, 82043, 82046, 82045, 82037, 82040, 82038, 82041, 82039, 82054, 82056, 82055, 82058, 82057, 82059, 82061, 82060, 82063, 82062, 82048, 82050, 82049, 82051, 82052, 82053
w_2017_34: 84984, 84986, 84985, 84988, 84987, 84989, 84991, 84990, 84993, 84992, 84978, 84980, 84979, 85011, 84982, 84983, 85110, 85112, 85114, 85111, 85215, 85113, 85105, 85107, 85106, 85109, 85108, 85542, 85687, 85541, 84808, 84810, 84813, 84809, 84812, 84811, 84803, 84806, 84804, 84807, 84805, 84814
w_2017_36: 87154, 87661, 87663, 87662, 87665, 87664, 87666, 87668, 87667, 87670, 87669, 87655, 87657, 87656, 87658, 87659, 87660, 87689, 87691, 87693, 87690, 87694, 87692, 87684, 87686, 87685, 87688, 87687, 87777, 88034, 87776, 87142, 87144, 87147, 87143, 87146, 87145, 87137, 87140, 87138, 87141, 87139
w_2017_38: 90037, 90144, 90146, 90145, 90401, 90147, 90149, 90151, 90150, 90400, 90152, 90138, 90140, 90139, 90399, 90142, 90143, 90410, 90412, 90417, 90411, 90424, 90416, 90196, 90198, 90197, 90409, 90199, 90428, 90453, 90427, 90029, 90031, 90034, 90030, 90379, 90032, 90024, 90027, 90035, 90378, 90026
w_2017_40: 92456, 92981, 92983, 92985, 92982, 92986, 92984, 92976, 92978, 92977, 92980, 92979, 93000, 93029, 92999, 92449, 92451, 92454, 92450, 92453, 92452, 92444, 92447, 92445, 92448, 92446, 92963, 92965, 92964, 92967, 92966, 92968, 92970, 92969, 92972, 92971, 92957, 92959, 92958, 92960, 92961, 92962
w_2017_42: 94858, 94909, 94911, 94910, 94913, 94912, 94914, 94916, 94915, 94918, 94917, 94903, 94905, 94904, 94906, 94907, 94908, 95124, 95126, 95128, 95125, 95129, 95127, 95119, 95121, 95120, 95123, 95122, 95211, 95210, 94852, 94854, 94857, 94853, 94856, 94855, 94847, 94850, 94848, 94851, 94849
w_2017_44: 99690, 99741, 99743, 99745, 99742, 99746, 99744, 99736, 99738, 99737, 99740, 99739, 100062, 99752, 99751, 99696, 99698, 99701, 99697, 99700, 99699, 99691, 99694, 99692, 99695, 99693, 99725, 99727, 99726, 99729, 99728, 99730, 99732, 99731, 99734, 99733, 99719, 99721, 99720, 99722, 99723, 99724
w_2017_46: 102428, 102541, 102543, 102542, 102545, 102544, 102546, 102548, 102547, 102550, 102549, 102535, 102537, 102536, 102538, 102539, 102540, 103850, 103852, 103854, 103851, 103855, 103853, 103839, 103841, 103840, 103843, 103842, 104331, 104149, 102422, 102424, 102427, 102423, 102426, 102425, 102411, 102414, 102412, 102415, 102413
w_2017_48: 105786, 105903, 105905, 105907, 105904, 105908, 105906, 105909, 105911, 105910, 105913, 105912, 106226, 106225, 105827, 105782, 105785, 105781, 105784, 105783, 105853, 105771, 105769, 105779, 105770, 105892, 105894, 105893, 105896, 105895, 105897, 105899, 105898, 105901, 105900, 105854, 105856, 105855, 105857, 105858, 105859
w_2017_50: 106683, 106691, 106693, 106692, 106695, 106694, 106696, 106698, 106697, 106700, 106699, 106685, 106687, 106686, 106688, 106689, 106690, 106708, 106710, 106712, 106709, 106713, 106711, 106703, 106705, 106704, 106707, 106706, 106861, 106855, 106621, 106623, 106626, 106622, 106625, 106624, 106616, 106619, 106617, 106620, 106618
w_2017_52: DM-12982
w_2018_02: 107589, 107591, 107593, 107590, 107596, 107592, 107584, 107586, 107585, 107588, 107587, 107598, 107597, 107558, 107560, 107563, 107559, 107562, 107561, 107553, 107556, 107554, 107557, 107555, 107574, 107576, 107575, 107578, 107577, 107579, 107581, 107580, 107583, 107582, 107568, 107570, 107569, 107571, 107572, 107573
w_2018_03: DM-13463
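A node utilization vs. time curve of the kind requested above can be derived from each job's start time, end time, and node count. The following is a minimal sketch of that idea only; it is not the code in usage.py or usageplot.py. The function names and the example job tuples are hypothetical and are not taken from the job IDs listed above.

```python
from collections import defaultdict


def utilization_steps(jobs):
    """Return [(time, nodes_in_use), ...] describing a step function.

    jobs: iterable of (start, end, nnodes) with start < end, times in hours.
    """
    deltas = defaultdict(int)
    for start, end, nnodes in jobs:
        deltas[start] += nnodes  # nodes are allocated when the job starts
        deltas[end] -= nnodes    # and released when it ends

    steps = []
    in_use = 0
    for t in sorted(deltas):
        in_use += deltas[t]
        steps.append((t, in_use))
    return steps


def node_hours(steps):
    """Integrate the step function: nodes in use times interval length."""
    return sum(n * (t_next - t) for (t, n), (t_next, _) in zip(steps, steps[1:]))


# Hypothetical jobs as (start hour, end hour, nodes); not real ticket data.
example_jobs = [(0.0, 2.0, 3), (1.0, 4.0, 4), (4.0, 5.0, 2)]
steps = utilization_steps(example_jobs)
total = node_hours(steps)
```

If matplotlib is available, the times and node counts in `steps` could be drawn with `matplotlib.pyplot.step(times, nodes, where="post")` so that each level holds until the next change.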
Attachments
Issue Links
- relates to
  - DM-13618 Find node-hour usage from usage codes (Done)
  - DM-13619 Modify usage.py for NODE_FAIL/COMPLETED failure case (Done)
  - DM-13699 Modify usage.py and usageplot.py to allow for color-coded plots (Done)
  - DM-13783 Make color-coded node-usage plot for S17B HSC PDR1 reprocessing and find total node-hours (Done)
  - DM-13815 Find elapsed code times from usage.py/usageplot.py (Done)
  - DM-13816 Modify usage.py to allow the user to specify SLURM job names. (Done)
  - DM-13818 Modify usage.py to output node-hours that take into account significant figures (Done)
  - DM-13819 Create a top-level script to run both usage.py and usageplot.py (Done)
  - DM-14054 Add command line option for resolution in usage.py (Done)
  - DM-14111 Overhaul usage.py Readme file and delete key_len variable (Done)
  - DM-13547 Plot the node utilization of the w_2018_03 RC1 reprocessing (Done)
I've attached all of the plots requested above. Overall, the plots were as expected and show node usage increasing over time (running multiBandDriver on the Cosmos data set takes more nodes than it did in the past, among other contributing factors). This increase is clearest when comparing the results from w_2017_25 and w_2018_03.
However, when I ran usage.py on the w_2017_42 job IDs, sacct reported the following failures (also included in usage_w42_err.out):
JobID        JobName    NNodes   Elapsed    State      ExitCode
------------ ---------- -------- ---------- ---------- --------
95210        mtWide     3        00:04:28   NODE_FAIL  127:0
95210.0      hydra_pmi+ 3        00:04:27   FAILED     7:0
95210.1      hydra_pmi+ 3        07:44:48   COMPLETED  0:0
95211        mtCosmos   4        00:04:04   NODE_FAIL  127:0
95211.0      hydra_pmi+ 4        00:04:04   FAILED     7:0
95211.1      hydra_pmi+ 4        10:10:13   COMPLETED  0:0
Initially, this caused usage.py to halt, but after removing those two jobs (95210 and 95211) from the job ID set and rerunning, I was able to create the plot. I did not encounter these errors in any of the other job ID sets provided.
However, this doesn't solve the core problem: the mtWide and mtCosmos jobs did complete successfully (their second job steps, 95210.1 and 95211.1, report COMPLETED), but the NODE_FAIL state on the parent jobs prevented them from being included in the usage.py data, and removing those jobs gives an unrealistic view of the node usage in this case. To resolve this, I'm currently modifying usage.py slightly so that it can grab the needed information for those two job IDs despite the error states. Once I do that, I'll upload the new w_2017_42 plot as well as the modified usage.py.
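One way such a NODE_FAIL/COMPLETED case could be handled is sketched below. This is an assumption about the general approach, not the actual change made to usage.py: the function names are hypothetical, and the input is simply the sacct rows quoted above, without the header. When a parent job is marked NODE_FAIL but one of its steps is COMPLETED, the sketch takes the node count and elapsed time from the completed step instead of the parent row.

```python
# sacct rows copied from the output above (header lines omitted).
SACCT_TEXT = """\
95210 mtWide 3 00:04:28 NODE_FAIL 127:0
95210.0 hydra_pmi+ 3 00:04:27 FAILED 7:0
95210.1 hydra_pmi+ 3 07:44:48 COMPLETED 0:0
95211 mtCosmos 4 00:04:04 NODE_FAIL 127:0
95211.0 hydra_pmi+ 4 00:04:04 FAILED 7:0
95211.1 hydra_pmi+ 4 10:10:13 COMPLETED 0:0
"""


def elapsed_hours(text):
    """Convert HH:MM:SS to hours (a day prefix like 1-02:03:04 is not handled)."""
    h, m, s = (int(part) for part in text.split(":"))
    return h + m / 60 + s / 3600


def node_hours_by_job(sacct_text):
    """Map each parent job name to node-hours, preferring COMPLETED steps
    when the parent row itself is marked NODE_FAIL."""
    rows = {}
    for line in sacct_text.splitlines():
        job_id, name, nnodes, elapsed, state, _exit_code = line.split()
        parent = job_id.split(".")[0]
        rows.setdefault(parent, []).append((job_id, name, int(nnodes), elapsed, state))

    result = {}
    for parent, entries in rows.items():
        top = next(e for e in entries if e[0] == parent)
        steps = [e for e in entries if e[0] != parent]
        chosen = [top]
        if top[4] == "NODE_FAIL":
            completed = [e for e in steps if e[4] == "COMPLETED"]
            if completed:
                chosen = completed  # fall back to the step that actually ran
        result[top[1]] = sum(n * elapsed_hours(el) for _, _, n, el, _ in chosen)
    return result


usage = node_hours_by_job(SACCT_TEXT)
```

With the rows above, this assigns mtWide 3 nodes for 07:44:48 and mtCosmos 4 nodes for 10:10:13. The few minutes spent in the failed first steps are left out here; whether they should count toward node usage is a separate choice.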