Details
Type: Bug
Status: Done
Resolution: Done
Fix Version/s: None
Component/s: Continuous Integration, Developer Infrastructure
Labels: None
Story Points: 0.25
Epic Link:
Team: System Management
Description
Jenkins official-releases job
Gabriele Comoretto: As we discussed on Slack earlier, HTTP 502s from GitHub are an error internal to GitHub that we can't do much about. Historically, they have come in waves for a short period and then not been seen for months. This is also why retrying of git operations was added to lsst-build, and why the Jenkins pipelines try to run github-tag-release 3 times before giving up.
I took a brief look at adding retrying of GitHub requests to sqre-codekit upon HTTP 5xx responses. There are ~25 try blocks that would need to be updated to support retrying, plus possibly some other call sites. One way to achieve this would be to move each try operation into a wrapper method and add a "retry" decorator (e.g., https://github.com/jd/tenacity), since there are already several types of exceptions that have special handling (e.g., https://github.com/lsst-sqre/sqre-codekit/blob/master/codekit/cli/github_tag_release.py#L565-L613). I fear this would end up being fairly ugly.

Another option would be to write a wrapper around PyGithub that supports retrying, or to modify PyGithub itself. There appears to be an open PR implementing the latter that is likely to be merged soon: https://github.com/PyGithub/PyGithub/pull/1002. This is by far my preferred approach, as it would avoid modifying the existing error handling and avoid spending effort on refactoring for a feature that almost certainly won't be a complete fix.

I've opened DM-18331 as a backlog story and am planning to log only the research time on this ticket. Does that sound reasonable?
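For reference, the "wrapper method plus retry decorator" option might look roughly like the following. This is a minimal, stdlib-only sketch, not tenacity itself and not code from sqre-codekit. `GithubError`, `retry_on_5xx` and `create_tag` are hypothetical names. `GithubError` stands in for PyGithub's `github.GithubException`, which exposes the HTTP status code as `.status`. Only 5xx errors are retried. Anything else, including 4xx responses, propagates on the first attempt, so the existing special-case `except` clauses would keep working unchanged.

```python
import functools
import time


class GithubError(Exception):
    """Hypothetical stand-in for github.GithubException; carries an HTTP status."""

    def __init__(self, status, message=""):
        super().__init__(message)
        self.status = status


def retry_on_5xx(attempts=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Retry the wrapped call when it raises a GithubError with a 5xx status.

    Waits `delay` seconds after the first failure, multiplying the wait by
    `backoff` after each further failure. `sleep` is injectable for testing.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except GithubError as e:
                    # Re-raise non-server errors at once, and give up on the last attempt.
                    if not 500 <= e.status < 600 or attempt == attempts:
                        raise
                    sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator


# Usage: a hypothetical operation that fails twice with a 502, then succeeds.
calls = []


@retry_on_5xx(attempts=3, delay=0, sleep=lambda s: None)
def create_tag(name):
    calls.append(name)
    if len(calls) < 3:
        raise GithubError(502, "Bad Gateway")
    return f"tagged {name}"


print(create_tag("example-tag"))
```

Even in this small form, each of the ~25 try blocks would have to be lifted into its own decorated function. That is the refactoring cost that makes a retry built into PyGithub itself more attractive.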