  1. Data Management
  2. DM-13346

eups install fails under jenkins when @ is in a path component

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Continuous Integration
    • Labels:
      None

      Description

      Jenkins will create workspace paths with @X appended to them, where X is a positive integer, in order to separate concurrent builds of the same job that are running on the same agent.  In general, this does not happen in our production environment, as only one build is allowed to use an agent at a time.  However, for reasons that aren't clear, Jenkins has been observed deciding to use the @X logic anyway. This may be triggered by a master restart and some sort of strangeness with durable tasks (DM-13343).

      Having @ in a path component seems to cause eups installation to fail on Linux but, strangely, not on OSX.

      [centos-6.py3] ::: Deploying eups 2.1.4
      [centos-6.py3] configure: creating ./config.status
      [centos-6.py3] config.status: creating Makefile
      [centos-6.py3] config.status: creating bin/eups
      [centos-6.py3] config.status: creating bin/eups_setup
      [centos-6.py3] config.status: creating ups/eups.table
      [centos-6.py3] Writing a csh startup script
      [centos-6.py3] Writing a sh startup script
      [centos-6.py3] Linking zsh/dash startup script from sh one for historical reasons
      [centos-6.py3] make: `git.version' is up to date.
      [centos-6.py3] Eups will use                               /home/jenkins-slave/workspace/stack-os-matrix@2/centos-6.py3/lsstsw/miniconda/bin/python
      [centos-6.py3] You will be installing ups in $EUPS_DIR   = /home/jenkins-slave/workspace/stack-os-matrix@2/centos-6.py3/lsstsw/eups/2.1.4
      [centos-6.py3] Eups will look for products in $EUPS_PATH = /home/jenkins-slave/workspace/stack-os-matrix@2/centos-6.py3/lsstsw/stack
      [centos-6.py3] Your EUPS database[s] will be               /home/jenkins-slave/workspace/stack-os-matrix@2/centos-6.py3/lsstsw/stack/ups_db
      [centos-6.py3] Your EUPS version is                        2.1.4
      [centos-6.py3] Your site configuration files will be in    /home/jenkins-slave/workspace/stack-os-matrix@2/centos-6.py3/lsstsw/stack/site
      [centos-6.py3] 
      [centos-6.py3] Please use "make install" if you want to install eups
      [centos-6.py3] Eups will use                               /home/jenkins-slave/workspace/stack-os-matrix@2/centos-6.py3/lsstsw/miniconda/bin/python
      [centos-6.py3] You will be installing ups in $EUPS_DIR   = /home/jenkins-slave/workspace/stack-os-matrix@2/centos-6.py3/lsstsw/eups/2.1.4
      [centos-6.py3] Eups will look for products in $EUPS_PATH = /home/jenkins-slave/workspace/stack-os-matrix@2/centos-6.py3/lsstsw/stack
      [centos-6.py3] Your EUPS database[s] will be               /home/jenkins-slave/workspace/stack-os-matrix@2/centos-6.py3/lsstsw/stack/ups_db
      [centos-6.py3] Your EUPS version is                        2.1.4
      [centos-6.py3] Your site configuration files will be in    /home/jenkins-slave/workspace/stack-os-matrix@2/centos-6.py3/lsstsw/stack/site
      [centos-6.py3] mkdir: cannot create directory `/home/jenkins-slave/workspace/stack-os-matrix': Permission denied
      [centos-6.py3] make: *** [install] Error 1
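
One detail in the log above is worth spelling out: the directory that mkdir fails to create is the workspace path cut off at the first @. The following is a minimal Python sketch of that observation only. It does not reproduce the eups Makefile, and the ticket does not establish which tool drops the rest of the path; the function names are hypothetical.

```python
import re
from typing import List

# Matches a Jenkins concurrent-build suffix such as "@2" at the end of a path component.
_BUILD_SUFFIX = re.compile(r"@\d+$")


def components_with_build_suffix(path: str) -> List[str]:
    """Return the path components that end in an @<integer> suffix."""
    return [part for part in path.split("/") if _BUILD_SUFFIX.search(part)]


def cut_at_first_at(path: str) -> str:
    """Return the part of the path that precedes the first '@'."""
    return path.split("@", 1)[0]


# $EUPS_DIR as reported in the log above.
eups_dir = (
    "/home/jenkins-slave/workspace/stack-os-matrix@2"
    "/centos-6.py3/lsstsw/eups/2.1.4"
)

print(components_with_build_suffix(eups_dir))  # ['stack-os-matrix@2']
print(cut_at_first_at(eups_dir))  # /home/jenkins-slave/workspace/stack-os-matrix
```

The second value matches the directory named in the "Permission denied" message, which is consistent with something in the install step treating @ as the end of the path.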
      

      I thought I had resolved this problem some time ago in the eups Makefile:

      https://github.com/RobertLuptonTheGood/eups/pull/103
      https://github.com/RobertLuptonTheGood/eups/pull/110

      It's possible that the fix doesn't work, that there's been a regression, or that there is an environmental component (e.g., the minimal env inside of a container). Regardless, it turns out that Jenkins can be configured to use a different concurrent build separator by setting the hudson.slaves.WorkspaceList JVM system property at startup.

      https://wiki.jenkins.io/display/JENKINS/Features+controlled+by+system+properties
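
As a rough illustration of the property described above, the Python sketch below builds the JVM argument that would set it. The helper name and the replacement separator "_" are assumptions made for the example; the ticket does not say which separator was chosen or how the flag is passed in this deployment.

```python
def workspace_separator_flag(separator: str) -> str:
    """Build the JVM argument that sets Jenkins' concurrent-build separator.

    hudson.slaves.WorkspaceList is the system property named in this ticket;
    the checks below are illustrative safeguards, not Jenkins rules.
    """
    if not separator:
        raise ValueError("separator must not be empty")
    if "@" in separator or "/" in separator:
        raise ValueError("separator must not contain '@' or '/'")
    return f"-Dhudson.slaves.WorkspaceList={separator}"


# Assumed replacement separator; any string free of '@' would serve the same purpose.
print(workspace_separator_flag("_"))  # -Dhudson.slaves.WorkspaceList=_
```

The resulting argument has to be present on the master's java command line at startup for the property to apply.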

      It also appears that editing this property via the groovy script console does not have an immediate effect; however, restarting the master with this property set on the command line does. It's possible that reconnecting the agents (due to the restart) is what actually made the change take effect.

      It seems more labor-efficient at this point to permanently remove the usage of @ rather than continue to battle this issue.

            Activity

            jhoblitt Joshua Hoblitt added a comment -

            This configuration change was tested "live" in production yesterday evening. As there was no obvious fallout overnight, the configuration change has been committed to git.

            jhoblitt Joshua Hoblitt added a comment -

            This problem "came back" after the last production deploy.  I believe I've figured out why it was broken.  There are conflicting hash variables in the puppet code that control the flags passed to the Jenkins master JVM, and the one with precedence is not the one the "fix" added the hudson.slaves.WorkspaceList option to.  However, I'm puzzled as to why the "fix" ever worked at all...

            The "new fix" should be simple – remove the duplicate hashes.
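
To illustrate the kind of precedence problem described above, here is a small Python sketch. The Puppet code itself is not shown in this ticket, so the variable names, the extra -Xmx flag, the "_" separator, and the rule that the preferred hash wins wholesale without merging are all assumptions made for the example.

```python
from typing import Dict


def effective_jvm_flags(
    preferred: Dict[str, str], fallback: Dict[str, str]
) -> Dict[str, str]:
    """Pick the hash that wins under an assumed 'first non-empty wins' rule.

    No merging happens, so anything set only in the losing hash is dropped.
    """
    return preferred if preferred else fallback


# Hypothetical duplicate hashes; the option was added to the one that loses.
preferred_flags = {"-Xmx": "4g"}
fallback_flags = {"-Xmx": "4g", "-Dhudson.slaves.WorkspaceList": "_"}

flags = effective_jvm_flags(preferred_flags, fallback_flags)
print("-Dhudson.slaves.WorkspaceList" in flags)  # False

# With the duplicate removed, the single remaining hash carries the option.
flags = effective_jvm_flags({}, fallback_flags)
print("-Dhudson.slaves.WorkspaceList" in flags)  # True
```

Under these assumptions the option is silently ignored until only one hash remains, which matches the shape of the "new fix".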

            jhoblitt Joshua Hoblitt added a comment -

            The "new fix" has been deployed to production and confirmed as having taken effect. I'm going to leave this ticket in self review to do a second follow-up later today.

            jhoblitt Joshua Hoblitt added a comment -

            This issue has not reoccurred and is probably safe to consider resolved.


              People

              • Assignee:
                jhoblitt Joshua Hoblitt
                Reporter:
                jhoblitt Joshua Hoblitt
                Reviewers:
                Joshua Hoblitt
                Watchers:
                Ian Sullivan, Joshua Hoblitt, Paul Price, Scott Daniel