Data Management / DM-18373

error generating v17.0.1.rc1

          gcomoretto Gabriele Comoretto [X] (Inactive) added a comment -

          I relaunched the job to create the rc1.

          So far it seems it is going well.

          I suspect that a weekly test job was running in parallel yesterday, hammering on GitHub at the same... and that this may have caused the error:

           
          DEBUG:codekit:looking for existing tag: v17.0.1.rc1 in repo: lsst/lsst_ci
          DEBUG:codekit: not found: v17.0.1.rc1
          ERROR:codekit:1 pre-flight error(s)
          Caught: <class 'github.GithubException.GithubException'>
          In repo: lsst/galsim
          Message: error getting teams
          Exception Message: 502 {'message': 'Server Error'}
          DEBUG:codekit:github ratelimit: (3664, 5000)
          DEBUG:codekit:exit 1

          gcomoretto Gabriele Comoretto [X] (Inactive) added a comment -

          The re-run of rc1 completed fine.

          jhoblitt Joshua Hoblitt added a comment - - edited

          Gabriele Comoretto [X] As we have discussed on Slack before, HTTP 502s from GitHub are internal errors on their side that we can't do much about. Historically, they have come in waves for a short period of time and then aren't seen for months. This is also the reason git operation retrying was added to lsst-build and why the Jenkins pipelines try to run github-tag-release 3 times before giving up.
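
For illustration, the "try 3 times before giving up" behaviour can be sketched as a small process-level wrapper. This is not the actual Jenkins pipeline code; `run_with_retries` and its parameters are hypothetical names, and the command used below is only a placeholder for whatever would really be invoked.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay=0.0):
    """Run cmd up to `attempts` times; return the last exit code.

    Hypothetical helper: stops at the first run that exits with 0.
    """
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            break
        if attempt < attempts:
            # Give a transient upstream failure a moment to clear.
            time.sleep(delay)
    return returncode


# Placeholder command: a Python one-liner that exits successfully.
ok_cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]
bad_cmd = [sys.executable, "-c", "import sys; sys.exit(2)"]
```

Retrying the whole process is coarse: every run repeats all the pre-flight checks, so it only helps when the 5xx wave is short.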

          I took a brief look at adding GitHub request retrying to sqre-codekit upon HTTP 5xx responses. There are ~25 try blocks that would need to be updated to support operation retrying, and possibly some other call sites. One way to achieve this would be to move each try operation into a wrapper method and add a "retry" decorator (e.g., https://github.com/jd/tenacity), as there are already several types of exception that have special handling (e.g., https://github.com/lsst-sqre/sqre-codekit/blob/master/codekit/cli/github_tag_release.py#L565-L613). I fear this would end up being fairly ugly.

          Another option would be to write a wrapper around pygithub that supports retrying, or to modify pygithub itself. It appears that there is an open PR that implements the latter and is likely to be merged soon: https://github.com/PyGithub/PyGithub/pull/1002. This is by far my preferred approach, as it would avoid having to modify the existing error handling and avoid expending effort refactoring for a feature which almost certainly won't be a complete fix. I've opened DM-18331 as a backlog story and am planning to log only the research time on this ticket. Does that sound reasonable?
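
As a rough sketch of the wrapper-plus-decorator idea mentioned above: the decorator below is hand-rolled with the standard library rather than tenacity, and it is not code from sqre-codekit. `ServerError`, `retry_on_5xx`, and `get_teams` are hypothetical names; the only assumption about the real exception is that it exposes an integer `status` attribute holding the HTTP status code.

```python
import functools
import time


class ServerError(Exception):
    """Hypothetical stand-in for an API exception carrying an HTTP status."""

    def __init__(self, status, message=""):
        super().__init__(f"{status} {message}")
        self.status = status


def retry_on_5xx(attempts=3, delay=0.0):
    """Retry the wrapped call when it raises an exception with a 5xx status."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    status = getattr(exc, "status", None)
                    retryable = isinstance(status, int) and 500 <= status < 600
                    # Re-raise anything that is not a 5xx, or the final failure,
                    # so existing special-case handlers still see the exception.
                    if not retryable or attempt == attempts:
                        raise
                    time.sleep(delay)

        return wrapper

    return decorator


calls = {"count": 0}


@retry_on_5xx(attempts=3)
def get_teams():
    """Hypothetical operation: fails twice with a 502, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ServerError(502, "Server Error")
    return ["team-a"]
```

Each of the ~25 try blocks would need its operation moved into a function like `get_teams` for this to apply, which is the refactoring cost weighed above against retrying inside pygithub itself.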


            People

            Assignee:
            jhoblitt Joshua Hoblitt
            Reporter:
            gcomoretto Gabriele Comoretto [X] (Inactive)
            Reviewers:
            Gabriele Comoretto [X] (Inactive)
            Watchers:
            Frossie Economou, Gabriele Comoretto [X] (Inactive), Joshua Hoblitt, Kian-Tat Lim, Leanne Guy, Wil O'Mullane