Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: ctrl_mpexec, PanDA
- Labels:
- Story Points: 2
- Epic Link:
- Sprint: DB_S22_12
- Team: Data Access and Database
- Urgent?: No
Description
Hi,
To improve the scheduling and rescheduling of failed jobs, we have discussed returning a jobReport.json file when the job terminates. I discussed this with the Pilot developer and checked the jobReport.json file used in ATLAS. My suggestions for this file are in the attachment. Feel free to add, update, or remove parts.
(Note: Pilot has some default signal handlers to catch SIGKILL, SIGTERM and other signals. They may not be what you want. This jobReport.json will be used as an aid to improve the scheduling.)
Thanks
Wen
For ATLAS, the jobReport.json is created at the end of the job.
The quanta information can be included in the jobReport. However, PanDA only knows the node name or job name; it does not know about the quanta inside the job.
The jobReport info will be saved as metadata of the job. It can be used to analyse the quanta.
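As a rough illustration of the comments above, here is a minimal Python sketch of a job writing a jobReport.json when it finishes, with per-quantum entries that could later be analysed from the job metadata. The field names (`exitCode`, `quanta`, `id`, `status`) and both helper functions are hypothetical; they are not taken from the attached proposal or from the ATLAS format.

```python
import json
import os
import tempfile


def write_job_report(path, exit_code, quanta):
    """Write a job report at the end of a job.

    All field names here are hypothetical placeholders, not the
    schema from the attachment or from ATLAS.
    """
    report = {
        "exitCode": exit_code,  # overall exit code of the job
        "quanta": quanta,       # one entry per quantum run inside the job
    }
    # Write to a temporary file first and rename it, so a reader never
    # sees a half-written report if the job is interrupted.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(report, handle, indent=2)
    os.replace(tmp_path, path)
    return report


def failed_quanta(report):
    """Return the ids of quanta whose (hypothetical) status is 'failed'."""
    return [q["id"] for q in report["quanta"] if q["status"] == "failed"]
```

Since PanDA only sees the job as a whole, a per-quantum list like this is one way the report could carry the detail needed to decide which jobs are worth rescheduling.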