  Data Management / DM-14888

osx tarball builds hitting 8 hour timelimit; allow per configuration timelimits

    Details

      Description

      Currently, all eups tarball builds are given an 8 hour timelimit. However, within the last week, OSX builds have been frequently hitting this limit and being restarted. It isn't clear if this is due to the build taking more wallclock time or some sort of deadlock.

      Examples from the last 4 days:

      https://ci.lsst.codes/blue/organizations/jenkins/release%2Ftarball/detail/tarball/2935/pipeline/
      https://ci.lsst.codes/blue/organizations/jenkins/release%2Ftarball/detail/tarball/2933/pipeline/
      https://ci.lsst.codes/blue/organizations/jenkins/release%2Ftarball/detail/tarball/2928/pipeline/
      https://ci.lsst.codes/blue/organizations/jenkins/release%2Ftarball/detail/tarball/2909/pipeline/

      8 hours is already excessive for a linux tarball build, so it seems reasonable to allow a timelimit value per build configuration. That way the osx timelimit may be increased (and the linux timelimit decreased).
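A minimal sketch of the per-configuration idea, written in Python rather than the actual Jenkins pipeline code. Each build configuration may carry its own timelimit and otherwise falls back to the current global 8 hour value. The names `DEFAULT_TIMELIMIT_HOURS`, `TIMELIMIT_OVERRIDES_HOURS`, and `timelimit_for`, the configuration labels, and the override values are all hypothetical and only illustrate the lookup.

```python
from typing import Mapping, Optional

# The single global limit described in this ticket.
DEFAULT_TIMELIMIT_HOURS = 8

# Hypothetical per-configuration overrides; labels and values are illustrative.
TIMELIMIT_OVERRIDES_HOURS = {
    "osx": 12,
    "linux": 4,
}


def timelimit_for(config_label: str,
                  overrides: Optional[Mapping[str, int]] = None) -> int:
    """Return the timelimit in hours for one build configuration.

    Configurations without an override get the global default.
    """
    if overrides is None:
        overrides = TIMELIMIT_OVERRIDES_HOURS
    hours = overrides.get(config_label, DEFAULT_TIMELIMIT_HOURS)
    if hours <= 0:
        raise ValueError(f"timelimit for {config_label!r} must be positive")
    return hours
```

With a lookup like this, raising the osx limit does not also loosen the limit for linux builds, and a new configuration with no override behaves as it does today.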

          Activity

          jhoblitt Joshua Hoblitt added a comment -

          There are also several OSX tarball builds that have failed with eups distrib: [Errno 54] Connection reset by peer. Ultimately, eups should retry upon HTTP errors, but I suspect this may be a sign of issues with either the macpros or the external networking at NOAO, as this doesn't seem to happen, at least not frequently, with the linux/docker builds.

          A possible explanation of the osx slowdown could be other VMs on the same macpro being in use at the same time. There's not much that could be done about that except purchasing additional macpros.
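The retry behaviour suggested above for eups could look roughly like the following Python sketch. It is not eups code: `retry_on_connection_error`, its parameters, and the exponential backoff policy are assumptions made for illustration. It retries a caller-supplied fetch function when the connection fails (Python raises `ConnectionResetError`, a subclass of `ConnectionError`, for "Connection reset by peer") and re-raises once the attempts are exhausted.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_connection_error(fetch: Callable[[], T],
                              attempts: int = 3,
                              base_delay: float = 1.0,
                              sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying with exponential backoff on connection errors.

    The last failure is re-raised so the build still fails visibly
    if the network problem persists.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the delay can be replaced in tests. A transient reset would then cost a few seconds instead of a restarted multi-hour build.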

          jhoblitt Joshua Hoblitt added a comment -

          Another 8 hour timeout: https://ci.lsst.codes/blue/organizations/jenkins/release%2Ftarball/detail/tarball/2949/pipeline

          I'm going to move this ticket into self review as a "wait and see" on the new 12 hour osx timelimit.

          jhoblitt Joshua Hoblitt added a comment -

          The OSX build seems to be completing in < 6 hours since the timelimit change was implemented. I'm planning to close this ticket tomorrow unless there's another build timeout.

          jhoblitt Joshua Hoblitt added a comment -

          The longest osx tarball build in the last few days has been 5 hours 6 mins – perhaps just lucky with cached state? I'm going to close this ticket out but it should be reopened if builds > 8 hours are observed again for further investigation.


            People

            • Assignee:
              jhoblitt Joshua Hoblitt
              Reporter:
              jhoblitt Joshua Hoblitt
              Reviewers:
              Joshua Hoblitt
              Watchers:
              Gabriele Comoretto, Joshua Hoblitt, Kian-Tat Lim, Tim Jenness