Data Management: DM-33481

Middleware: jobReport from LSST executable

    Details

    • Story Points:
      2
    • Sprint:
      DB_S22_12
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      Hi,

      To improve the scheduling and rescheduling of failed jobs, we have discussed returning a jobreport.json file when the job terminates. I talked with the Pilot developer and checked the jobReport.json file used in ATLAS. My suggestions for this file are in the attachment. Feel free to add, update, or remove parts.

      (Note: Pilot has some default signal handlers to catch SIGKILL, SIGTERM and other signals. They may not be what you want. This jobreport.json will be used as an aid to improve the scheduling.)

       

      Thanks

      Wen

       

            Activity

            wguan25 Wen Guan added a comment -

            For ATLAS, the jobReport.json is created at the end of the job.

            The quanta information can be included in the jobReport. However, PanDA only knows the node name or job name; it does not know about the quanta inside the job.

            The jobReport info will be saved as metadata of the job. It can be used to analyse the quanta.

            salnikov Andy Salnikov added a comment -

            Summary of what has been implemented so far on this ticket:

            • I added a couple of pydantic classes (QuantumReport and JobReport) which contain the info that we want to save to jobreport.json
            • pipetask has a new option --job-report to specify the path for the job report file
            • SingleQuantumExecutor makes an instance of QuantumReport for each executed quantum
            • MPGraphExecutor collects those QuantumReports and packages them into a JobReport together with other job-level information
            • pipetask gets the JobReport from MPGraphExecutor and saves it to a JSON file
            • the information in JobReport is incomplete in the sense that exitCode is not known to the process itself; I expect there will be some post-processing of the JSON file to fill in the missing pieces
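The flow in the list above can be sketched roughly as follows. This is only an illustration, not the code merged on this ticket: it uses standard-library dataclasses as a dependency-free stand-in for the pydantic classes, the field names are taken from the example JSON in this comment, and the helper `build_job_report` and the `"failure"` status value are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class QuantumReport:
    """Per-quantum record (stand-in for the pydantic class of the same name)."""
    status: str
    dataId: Dict[str, Any]
    taskLabel: str
    exitCode: Optional[int] = None


@dataclass
class JobReport:
    """Job-level record holding the per-quantum reports."""
    status: str
    cmdLine: str
    quantaReports: List[QuantumReport] = field(default_factory=list)
    exitCode: Optional[int] = None  # not known to the process itself

    def to_json(self) -> str:
        # Drop every key whose value is None, at any depth, so that
        # unpopulated attributes do not appear in the output.
        def prune(obj: Any) -> Any:
            if isinstance(obj, dict):
                return {k: prune(v) for k, v in obj.items() if v is not None}
            if isinstance(obj, list):
                return [prune(v) for v in obj]
            return obj

        return json.dumps(prune(asdict(self)), indent=2)


def build_job_report(cmd_line: str, quanta: List[QuantumReport]) -> JobReport:
    """Hypothetical helper: package per-quantum reports into one job report.

    The "failure" status string is an assumption; the ticket only shows "success".
    """
    ok = all(q.status == "success" for q in quanta)
    return JobReport(
        status="success" if ok else "failure",
        cmdLine=cmd_line,
        quantaReports=list(quanta),
    )
```

Leaving the job-level exitCode as None and pruning it on output mirrors the point in the last bullet: the process cannot report its own exit code, so that attribute is simply absent until something else fills it in.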

            An example of the file generated by pipetask:

            {
              "status": "success",
              "cmdLine": "/home/salnikov/gen3-middleware/ctrl_mpexec/bin/pipetask --long-log --log-level=INFO run -j 12 -b /home/salnikov/gen3-middleware/ci_hsc_gen3/DATA/butler.yaml --input HSC/defaults --output HSC/runs/ci_hsc --register-dataset-types --mock --job-report report1.json --qgraph /tmp/tmp.HvjiKZF77T.qgraph",
              "quantaReports": [
                {
                  "status": "success",
                  "dataId": {
                    "instrument": "HSC",
                    "detector": 10,
                    "exposure": 903342
                  },
                  "taskLabel": "isr",
                  "exitCode": 0
                },
                {
                  "status": "success",
                  "dataId": {
                    "instrument": "HSC",
                    "detector": 11,
                    "exposure": 903344
                  },
                  "taskLabel": "isr",
                  "exitCode": 0
                },
            ................................
                {
                  "status": "success",
                  "dataId": {
                    "instrument": "HSC",
                    "skymap": "discrete/ci_hsc",
                    "tract": 0
                  },
                  "taskLabel": "consolidateForcedSourceOnDiaObjectTable",
                  "exitCode": 0
                }
              ]
            }
            

            (It does not include attributes that are not populated such as exitCode for the whole job).
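The post-processing mentioned above could look something like the sketch below. It assumes a wrapper (for example a pilot or batch script) that knows the process exit status and has access to the file written via --job-report. The function names are hypothetical and not part of pipetask; only the JSON keys come from the example above.

```python
import json
from collections import Counter
from typing import Any, Dict, Tuple


def fill_exit_code(report: Dict[str, Any], exit_code: int) -> Dict[str, Any]:
    """Return a shallow copy of the report with the job-level exitCode set."""
    updated = dict(report)
    updated["exitCode"] = exit_code
    return updated


def fill_exit_code_in_file(path: str, exit_code: int) -> None:
    """Read the JSON report at `path`, add the exit code, and write it back."""
    with open(path) as stream:
        report = json.load(stream)
    with open(path, "w") as stream:
        json.dump(fill_exit_code(report, exit_code), stream, indent=2)


def count_status_by_task(report: Dict[str, Any]) -> Dict[Tuple[str, str], int]:
    """Count quanta per (taskLabel, status) pair, e.g. to spot failing tasks."""
    counts: Counter = Counter()
    for quantum in report.get("quantaReports", []):
        counts[(quantum["taskLabel"], quantum["status"])] += 1
    return dict(counts)
```

A scheduler could then use the per-task counts to decide whether a job is worth rescheduling, though what it should do with them is outside the scope of this sketch.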

            salnikov Andy Salnikov added a comment -

            Michelle Gower, thanks for agreeing to review it! It's a reasonably small amount of code, and if anything is missing from the report classes we can add it on this or later tickets.

            mgower Michelle Gower added a comment -

            I have an overall question about whether "job" should be changed to something more pipetask-y, like "executor". I'm also not sure that the behavior I saw with a failed quantum is what is expected, and I have a few other miscellaneous items. It would be nice to get the "job" renaming done before this lands in a stack, but if the "pruned"/skipped statuses need to wait until another ticket, that's fine.

            salnikov Andy Salnikov added a comment -

            Thanks for the review and all the suggestions! Merged.


              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              wguan25 Wen Guan
              Reviewers:
              Michelle Gower
              Watchers:
              Andy Salnikov, Michelle Gower, Shuwei Ye, Tim Jenness, Torre Wenaus, Wen Guan
