Data Management / DM-32066

BPS jobs with memory autoscaling enabled remain idle after the first run attempt

    Details

      Description

      Eli Rykoff reported that BPS jobs with automatic memory scaling enabled remain idle in the job queue on the verification cluster at NCSA if their first run attempt failed due to insufficient memory.

      Preliminary investigation suggests that the ClassAd expression governing memory scaling is at fault: RequestMemory evaluates to error, which prevents HTCondor from finding a matching resource. From the output generated by condor_q -better-analyze -reverse 1836647.0:

      Job 1836647.0 has the following attributes:
       
          TARGET.JobUniverse = 5
          TARGET.Nodeset = "NORMAL"
          TARGET.NumCkpts = 0
          TARGET.RequestCpus = 1
          TARGET.RequestDisk = 3750
          TARGET.RequestMemory = error
          TARGET.Walltime = 259200
      

      For future reference, I'm attaching the full output (1836647.0-analyze.out) as well as the job ClassAd (1836647.0-classad.out).
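
As a quick way to spot this symptom in saved analyzer output, the attribute listing can be scanned for values that did not evaluate. The following is a minimal sketch, not part of BPS or HTCondor: the helper name find_unevaluated_attributes is hypothetical, it assumes the "TARGET.Name = value" layout shown in the excerpt above, and flagging undefined alongside error is an added assumption.

```python
import re

# Matches lines such as "TARGET.RequestMemory = error" in the attribute
# listing shown above (layout assumed from the excerpt in this ticket).
ATTR_LINE = re.compile(r"^\s*TARGET\.(\w+)\s*=\s*(.+?)\s*$")


def find_unevaluated_attributes(analyze_output: str) -> dict:
    """Return attributes whose printed value is 'error' or 'undefined'.

    Hypothetical helper for illustration only; it parses text and does
    not talk to HTCondor.
    """
    suspicious = {}
    for line in analyze_output.splitlines():
        match = ATTR_LINE.match(line)
        if match is None:
            continue  # not an attribute line
        name, value = match.groups()
        if value.lower() in {"error", "undefined"}:
            suspicious[name] = value
    return suspicious


# The attribute block quoted in the description.
sample = """\
    TARGET.JobUniverse = 5
    TARGET.Nodeset = "NORMAL"
    TARGET.NumCkpts = 0
    TARGET.RequestCpus = 1
    TARGET.RequestDisk = 3750
    TARGET.RequestMemory = error
    TARGET.Walltime = 259200
"""

print(find_unevaluated_attributes(sample))  # {'RequestMemory': 'error'}
```

Run against the excerpt above, only RequestMemory is reported, which is consistent with the memory-scaling expression being the attribute that blocks matching.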

          Activity

          Each change is listed as Field: Original Value -> New Value.

          mkowalik Mikolaj Kowalik created issue
          mkowalik Mikolaj Kowalik made changes
              Labels: (none) -> ctr
          mkowalik Mikolaj Kowalik made changes
              Labels: ctr -> (none)
          mkowalik Mikolaj Kowalik made changes
              Component/s: (none) -> ctrl_bps [ 18701 ]
          mkowalik Mikolaj Kowalik made changes
              Status: To Do [ 10001 ] -> In Progress [ 3 ]
          mkowalik Mikolaj Kowalik made changes
              Reviewers: (none) -> Michelle Gower [ mgower ]
              Status: In Progress [ 3 ] -> In Review [ 10004 ]
          mgower Michelle Gower made changes
              Status: In Review [ 10004 ] -> Reviewed [ 10101 ]
          mkowalik Mikolaj Kowalik made changes
              Resolution: (none) -> Done [ 10000 ]
              Status: Reviewed [ 10101 ] -> Done [ 10002 ]
          mkowalik Mikolaj Kowalik made changes
              Labels: (none) -> backport-v23
          mgower Michelle Gower made changes
              Labels: backport-v23 -> backport-v23 gen3-middleware
          yusra Yusra AlSayyad made changes
              Labels: backport-v23 gen3-middleware -> backport-approved backport-v23 gen3-middleware
          mkowalik Mikolaj Kowalik made changes
              Labels: backport-approved backport-v23 gen3-middleware -> backport-approved backport-done backport-v23 gen3-middleware

            People

            Assignee:
            mkowalik Mikolaj Kowalik
            Reporter:
            mkowalik Mikolaj Kowalik
            Reviewers:
            Michelle Gower
            Watchers:
            Michelle Gower, Mikolaj Kowalik
