Data Management / DM-26551

Jenkins stack-os-matrix fails due to leftover EUPS .lockDir

    Details

    • Type: Bug
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: jenkins
    • Labels:
      None
    • Team:
      Architecture
    • Urgent?:
      No

      Description

      stack-os-matrix jobs occasionally fail with the following error from eups declare: Unable to take exclusive lock on /j/ws/stack-os-matrix/[...]: locks are held by [user=jswarm, pid=...]

      The actual lock directory in the stack has a path like /project/jenkins/prod/agent-ldfc-ws-1/ws/stack-os-matrix/adacff179f/lsstsw/stack/cb4e2dc/.lockDir. Removing that directory and its contents resolves the problem, but the fix is manual. A Jenkins pipeline, sqre/infra/clean-locks, has been created so that anyone can do this cleanup without administrator intervention.
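
The manual cleanup described above could be scripted roughly as follows. This is only a sketch and is not the implementation of the sqre/infra/clean-locks pipeline: the function names and the dry_run parameter are hypothetical, and the only detail taken from this ticket is the .lockDir directory name. It assumes no build is using the workspace while it runs, since it removes every lock directory it finds.

```python
import os
import shutil
from pathlib import Path

# Directory name taken from the example path in this ticket.
LOCK_DIR_NAME = ".lockDir"


def find_lock_dirs(root):
    """Return every directory named .lockDir beneath root, sorted."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if LOCK_DIR_NAME in dirnames:
            found.append(Path(dirpath) / LOCK_DIR_NAME)
            # Do not descend into the lock directory itself.
            dirnames.remove(LOCK_DIR_NAME)
    return sorted(found)


def remove_lock_dirs(root, dry_run=True):
    """Remove leftover lock directories under root.

    With dry_run=True (the default) nothing is deleted; the function
    only reports what it would remove.
    """
    lock_dirs = find_lock_dirs(root)
    for lock_dir in lock_dirs:
        if not dry_run:
            shutil.rmtree(lock_dir)
    return lock_dirs
```

Calling remove_lock_dirs(workspace) first with the default dry_run=True lists the candidates, and a second call with dry_run=False deletes them.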

      The problem appears to be correlated with manual termination of previous jobs. The code in eups is supposed to clean up lock files and directories when interrupted, but perhaps something else is going on here.
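
For illustration, cleanup-on-interrupt of the kind mentioned above generally follows the pattern below. This is a generic sketch, not the eups code: exclusive_lock, exit_on_signal, and install_handlers are hypothetical names. It also shows one limit of the pattern: a handler can convert SIGTERM into a normal exit so that cleanup runs, but SIGKILL cannot be caught, so a process killed that way leaves its lock directory behind. Whether that is what happens in these jobs is not established by this ticket.

```python
import os
import shutil
import signal
import sys
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_dir):
    """Hold lock_dir as an exclusive lock for the duration of the block.

    os.mkdir fails with FileExistsError if the directory already exists,
    so only one holder can succeed at a time.
    """
    os.mkdir(lock_dir)
    try:
        yield lock_dir
    finally:
        # Runs on normal exit and on any exception, including SystemExit.
        shutil.rmtree(lock_dir, ignore_errors=True)


def exit_on_signal(signum, frame):
    """Turn a catchable signal into SystemExit so finally blocks run."""
    sys.exit(128 + signum)


def install_handlers():
    """Register the handler for catchable termination signals.

    SIGKILL is absent on purpose: it cannot be caught or handled.
    """
    signal.signal(signal.SIGTERM, exit_on_signal)
    signal.signal(signal.SIGINT, exit_on_signal)
```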

      Either fix manual job termination so that it cleans up properly, or turn off locking altogether for these jobs if that is safe and feasible.
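
For the second option, EUPS locking can, to my understanding, be disabled through a site configuration hook in a startup.py file. The fragment below is a sketch under that assumption and should be checked against the EUPS version used by these jobs. The file location in the comment is also an assumption, and whether disabling locking is safe for concurrent builds sharing a stack is the open question raised above.

```python
# Fragment for an EUPS startup.py file (for example ~/.eups/startup.py).
# EUPS executes this file with `hooks` already in scope, so this line is
# not meant to run as a standalone script.
hooks.config.site.lockDirectoryBase = None  # None is understood to disable locking
```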


            People

            Assignee:
            ktl Kian-Tat Lim
            Reporter:
            ktl Kian-Tat Lim
            Watchers:
            Kian-Tat Lim
