  1. Data Management
  2. DM-10351

Concise progress tracking for long-running jobs


    • Type: Improvement
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: pipe_base
    • Labels:
    • Templates:
    • Story Points:
    • Team:
      Data Access and Database


      It would be desirable to have some way to inspect a long-running job to assess its fractional progress. When a BatchCmdLineTask is launched, it gets a wall-time limit based on an estimate of its execution time; however, if this estimate is off, the job may time out before completion, potentially wasting cluster time on an incomplete job. There is always a well-defined set of data being iterated over, so there should be some way to track status. E.g., I'd like to be able to glance at some sort of status file and see how complete the job is, along with an (updated) estimate of how long it has left. That way, if need be, I can update the Slurm time limit of the running job. Something like a tqdm progress bar output to a file would be ideal, but I don't know how well that would work with MPI; there are apparently suggestions that it can be made to work with multiprocessing, but I don't know if we can get that to work in our case. Even if not, something simpler would be nice.
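
      As a rough illustration of the "status file" idea, here is a minimal single-process sketch using only the Python standard library. Everything in it is hypothetical: `ProgressFile`, `run_with_progress`, and the JSON field names are made up for this example and are not part of pipe_base or BatchCmdLineTask. The ETA is a naive estimate that assumes the remaining items take the same average time as the ones already processed.

```python
import json
import os
import tempfile
import time


class ProgressFile:
    """Hypothetical helper: records fractional progress and a naive ETA in a JSON file."""

    def __init__(self, path, total, clock=time.monotonic):
        self.path = path
        self.total = total          # size of the well-defined data set being iterated over
        self.done = 0
        self._clock = clock         # injectable so the ETA logic can be tested
        self._start = clock()
        self._write()

    def update(self, n=1):
        """Mark n more items as finished and refresh the status file."""
        self.done += n
        self._write()

    def status(self):
        elapsed = self._clock() - self._start
        # Naive ETA: average time per finished item times the number of items left.
        eta = elapsed / self.done * (self.total - self.done) if self.done else None
        return {
            "done": self.done,
            "total": self.total,
            "fraction": self.done / self.total if self.total else 1.0,
            "elapsed_s": round(elapsed, 1),
            "eta_s": None if eta is None else round(eta, 1),
        }

    def _write(self):
        # Write to a temporary file, then rename, so a reader never sees a partial file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(self.status(), f)
        os.replace(tmp, self.path)


def run_with_progress(data_ids, process, status_path):
    """Call process(data_id) for each item, updating the status file after each one."""
    tracker = ProgressFile(status_path, total=len(data_ids))
    for data_id in data_ids:
        process(data_id)
        tracker.update()
    return tracker
```

      With something like this, `cat status.json` on the running job would show the fraction complete and the remaining-time estimate. This sketch does not address the MPI or multiprocessing case; there, each worker would presumably need its own status file, or progress would have to be aggregated by one process.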




            • Assignee:
              tmorton Tim Morton
              Fritz Mueller, John Swinbank, Tim Morton
            • Votes:
              0
            • Watchers:
              3

