Details

    • Sprint:
      DevOps Sprint 1, DevOps Sprint 2, DevOps Sprint 3
    • Team:
      SQuaRE

      Description

      When an operation initiated on the buildslave fails, send email either to [lsst-data] (or an equivalent developer list) if it was an lsst-build failure, or to the buildbot nanny if the error occurred while setting up the build process.

        Attachments

          Issue Links

            Activity

            robyn Robyn Allsman [X] (Inactive) added a comment -

            Starting to add email support to notify a mailing list of LSST developers that a build has failed. I propose to use a mailing list which contains only DM developers, and perhaps a few select others, rather than [lsst-data].

            I'll check whether [lsst-devel] or [dm-devel] is already set up.

            ktl Kian-Tat Lim added a comment -

            I suggest we use HipChat instead of E-mail. That may help change the culture towards using HC more.

            robyn Robyn Allsman [X] (Inactive) added a comment -

            DM-703 is to explore use of HipChat for buildbot failure notifications.

            robyn Robyn Allsman [X] (Inactive) added a comment - edited

            The attached files show the various status emails which occur:
            1) Successful build, unittest, doxygen generation, and comparison
            2) Failure due to fatal 'scons' build result
            3) Failure due to fatal unittest result
            4) Failure due to random error - system blocked, missing resource, failed doxydoc generation, failed end-to-end testing, etc.

            The email recipients are chosen based on how the buildbot job was triggered. If a user personally triggered the build using the DM_stack "ForceBuild" form, the email (success and failure) will only be sent to the email address provided by the user.
            If the DM_stack build was triggered by the gitolite-repo-change trigger, then the failure output will be sent to the listserv: [lsst-dm-dev].

            During development, the consensus formed that all status reports from the repo-change trigger should be sent to a listserv address. So redirecting based on the type of error was not included in this version. It is trivial to re-institute if the consensus swings back.
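
            The routing rule described above can be sketched roughly as follows. This is only an illustration, not the code in master.cfg: the function name, the trigger labels "force" and "repo-change", and the list address are all assumptions made for the example.

```python
from typing import List, Optional

# Assumed address: the ticket names the list only as [lsst-dm-dev].
LISTSERV = "lsst-dm-dev@example.org"


def choose_recipients(trigger: str, user_email: Optional[str] = None) -> List[str]:
    """Return the status-email recipients for one build.

    `trigger` is a hypothetical label: "force" for a ForceBuild form
    submission, "repo-change" for the gitolite-repo-change trigger.
    """
    if trigger == "force":
        # A user-triggered build reports only to the address the user gave.
        if not user_email:
            raise ValueError("ForceBuild requires the user's email address")
        return [user_email]
    if trigger == "repo-change":
        # Repo-change builds report to the developer list, whatever the error type.
        return [LISTSERV]
    raise ValueError(f"unknown trigger: {trigger!r}")
```

            Redirecting by error type, if the consensus swings back, would be a matter of adding a second argument to such a function.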

            Future work related to triggers and email notification:
            DM-703 adds HipChat notification on every build (success or failure).
            DM-955 adds stash-repo-change trigger to handle stash repositories.
            DM-956 adds to the status email the branches used during a build. Incorporating non-master branches is only possible during user-triggered ForceBuild.

            robyn Robyn Allsman [X] (Inactive) added a comment - edited

            Perry,
            This is so similar to the ticket just given to you for review that I ask you to review this one also. Again, if you cannot do the review, let me know.

            This review covers the existence of the email generation and delivery. The review of the buildbot master configuration file (where the mail is formatted and generated) is a different ticket, which you will be spared from reviewing.

            Please comment on additions or changes to the format which you would find helpful.

            To start that off, some planned upgrades are:

            • There is a ticket to update the email output to include the release stack build number, which has the format 'bNNN'. Buildbot deals with the BB number (aka BB#), which is a monotonically increasing integer assigned to each buildbot invocation. NOTE: This modification was just added to the buildbot master configuration. If you 'Force' a build, you'll see the new layout, which includes the EUPS-tag and the git branches used during the build.
            • Shortly there will be a ticket to create a permanent and accessible mapping between the BB# and the bNNN. The users are interested (I hope) in the BB# since it is used to point to the STDIO file from the entire stack build. The bNNN is needed because the daily life of the developer revolves around the stack, which is tagged alternately by the bNNN tags and/or the DM Release tags.
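
            As a rough idea of what a permanent BB#-to-bNNN mapping could look like, here is a minimal sketch that keeps the pairs in a JSON file. Everything in it is hypothetical (the file layout, `record_build`, `bb_numbers_for_tag`); the planned ticket may settle on something quite different. It does not assume the mapping is one-to-one.

```python
import json
from pathlib import Path
from typing import Dict, List


def record_build(mapping_file: Path, bb_number: int, build_tag: str) -> Dict[str, str]:
    """Record that buildbot invocation `bb_number` produced stack tag `build_tag`."""
    # Load the existing mapping, or start empty on first use.
    mapping = json.loads(mapping_file.read_text()) if mapping_file.exists() else {}
    mapping[str(bb_number)] = build_tag  # JSON object keys must be strings
    mapping_file.write_text(json.dumps(mapping, indent=2, sort_keys=True))
    return mapping


def bb_numbers_for_tag(mapping: Dict[str, str], build_tag: str) -> List[int]:
    """Return every BB# recorded for a given bNNN tag, in ascending order."""
    return sorted(int(bb) for bb, tag in mapping.items() if tag == build_tag)
```

            A developer who knows only the bNNN tag could then look up the BB# and from there reach the STDIO file for that build.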

            There are samples of the various types of email that are generated.

            pgee Perry Gee added a comment -

            This code at line of master.cfg in DM-947 is incorrect. It should read:

            a = re.compile("(?<=BUILD ID: )b[0-9]+")

            I basically reviewed 947, as it doesn't look like the change described here is actually on 477. I am trusting that, at the end of this check-in process, master.cfg will look the same as in DM-947. Except for this one bug, I think the email has the information that is required.
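
            To illustrate why the character class matters, the short sketch below applies the corrected pattern above to a made-up log line. The faulty pattern shown is a reconstruction that assumes a `[0-8]` character class was used in its place (the original line is not quoted in this ticket), and the helper name and sample text are likewise assumptions.

```python
import re

# Corrected pattern from the review: a lookbehind for "BUILD ID: ", then bNNN.
BUILD_ID = re.compile(r"(?<=BUILD ID: )b[0-9]+")

# Assumed form of the bug: the digit class stops at 8, so a 9 ends the match early.
BUILD_ID_BUGGY = re.compile(r"(?<=BUILD ID: )b[0-8]+")


def extract_build_id(log_text, pattern=BUILD_ID):
    """Return the first bNNN tag following 'BUILD ID: ', or None if absent."""
    match = pattern.search(log_text)
    return match.group(0) if match else None


sample = "# BUILD ID: b199"  # made-up log line
print(extract_build_id(sample))                  # b199
print(extract_build_id(sample, BUILD_ID_BUGGY))  # b1 (truncated at the first 9)
```

            Because the lookbehind is not part of the match, `group(0)` is the bare tag with no prefix to strip.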
            -------------------------------------
            I know this review is not about the website, but two things to consider:

            1. I think it is important that the build number and the BB number both be visible from both the waterfall and builders/DM_stack/builds/NNN pages. Since only the instigator of the build gets the email, any other user will be forced to search the stdio manually.

            2. The link from the website to the build output is called "stdio", whereas the same line from the email is called "logs".

            robyn Robyn Allsman [X] (Inactive) added a comment -

            Wow, good catch on the 0-8 vs 0-9. I will also update the email to reference the 'stdio log'.

            Your additional comments on the web interface are all very useful to me. Currently I've been using the canned web templates provided by buildbot, which I can personalize only to a limited extent. Information that is known at the start of the build can be displayed; information gleaned midway through the build is unavailable. So I can display the branches used for a build but not the eups-tag, since that datum is actually created halfway through the build.

            On a positive note, this release of buildbot is supposed to have much better capability for allowing me to write my own webpage for display. I haven't really looked into it yet so I can't promise anything. But maybe there is a way to create a webpage with the info you and others have requested.

            Thank you very much for handling this review on short notice.


              People

              • Assignee:
                robyn Robyn Allsman [X] (Inactive)
                Reporter:
                robyn Robyn Allsman [X] (Inactive)
                Reviewers:
                Perry Gee
                Watchers:
                Kian-Tat Lim, Perry Gee
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel

                    Time Tracking

                    Estimated:
                    Original Estimate - 1 week, 3 days
                    1w 3d
                    Remaining:
                    Remaining Estimate - 0 minutes
                    0m
                    Logged:
                    Time Spent - 1 week, 2 days Time Not Required
                    1w 2d