DM-10351 (project: Data Management)

Concise progress tracking for long-running jobs

    Details

    • Type: Improvement
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: pipe_base
    • Labels: None
    • Story Points: 3
    • Team: Data Access and Database

      Description

      It would be desirable to have some way to inspect a long-running job and assess its fractional progress. When a BatchCmdLineTask is launched, it gets a wall-time limit based on an estimate of its execution time; however, if this estimate is off, the job may time out before completion, potentially wasting cluster time on an incomplete job. There is always a well-defined set of data being iterated over, so there should be some way to track status. E.g., I'd like to be able to glance at some sort of status file and see how complete the job is, along with an (updated) estimate of how long it has left. That way, if need be, I can update the Slurm time limit of the running job. Something like a tqdm progress bar written to a file would be ideal, but I don't know how well that would work with MPI; there are apparently suggestions that it can be made to work with multiprocessing, but I don't know if we can get that to work in our case. Even if not, something simpler would still be nice.
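
      As a rough illustration of the "something simpler" option, here is a minimal sketch of a status-file writer using only the Python standard library. It is not part of pipe_base or BatchCmdLineTask: the class name ProgressFile, the JSON layout, and the injectable clock argument are all assumptions made for this example. The estimate of time remaining is a plain linear extrapolation, which assumes each data item takes roughly the same time. In an MPI or multiprocessing run, this sketch further assumes that a single process (for example the one that collects results) calls update(), or that each worker writes its own file.

```python
import json
import os
import tempfile
import time


class ProgressFile:
    """Hypothetical helper: keep a small JSON status file up to date."""

    def __init__(self, path, total, clock=time.monotonic):
        if total <= 0:
            raise ValueError("total must be positive")
        self.path = path
        self.total = total      # number of data items the job iterates over
        self.done = 0           # number of items finished so far
        self._clock = clock     # injectable so the ETA logic can be tested
        self._start = clock()
        self._write()

    def update(self, n=1):
        """Record that n more items have finished and rewrite the file."""
        self.done = min(self.total, self.done + n)
        self._write()

    def snapshot(self):
        """Return the current status as a plain dictionary."""
        elapsed = self._clock() - self._start
        if self.done:
            # Linear extrapolation: assumes a roughly constant time per item.
            eta = elapsed * (self.total - self.done) / self.done
        else:
            eta = None  # no basis for an estimate yet
        return {
            "done": self.done,
            "total": self.total,
            "fraction": self.done / self.total,
            "elapsed_s": elapsed,
            "eta_s": eta,
        }

    def _write(self):
        # Write to a temporary file in the same directory, then rename it
        # over the target, so a reader never sees a half-written file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(self.snapshot(), f)
        os.replace(tmp, self.path)
```

      A job would create one tracker before its main loop, e.g. ProgressFile("status.json", total=len(data_ids)), and call update() after each item; here status.json and data_ids are placeholder names. Comparing eta_s in the file against the remaining wall-time shows whether the limit needs to be raised.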


              People

              • Assignee: Unassigned
              • Reporter: Tim Morton (tmorton)
              • Watchers: Fritz Mueller, John Swinbank, Tim Morton
