Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: Continuous Integration
- Labels: None
- Story Points: 1.5
- Epic Link:
- Team: SQuaRE
Description
Build #206 marked jenkins-el7-5 offline for cleanup, then threw an uncaught exception when a file could not be deleted for unknown reasons. Subsequent builds (#207+), which are supposed to detect nodes that the cleanup script unintentionally left offline, failed to return the node to service.
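For illustration, the failure mode above can be avoided by restoring the node's online state in a `finally` block. This is a minimal, language-neutral sketch in Python, not the actual Groovy cleanup script: the `Node` class, `wipe_workspace`, and `cleanup` are hypothetical names and do not correspond to the Jenkins API.

```python
class Node:
    """Hypothetical stand-in for a Jenkins agent; not the real Jenkins API."""

    def __init__(self, name):
        self.name = name
        self.offline = False


def wipe_workspace(node):
    """Placeholder for the real deletion step; always fails, to mimic build #206."""
    raise OSError(f"Unable to delete files on {node.name}")


def cleanup(node, wipe=wipe_workspace):
    """Take the node offline, wipe it, and always bring it back online.

    Returns True if the wipe succeeded and False if it raised an error.
    """
    node.offline = True
    try:
        wipe(node)
        return True
    except Exception as exc:
        # Report the failure instead of letting it escape the loop over nodes.
        print(f"Error with {node.name}: {exc}")
        return False
    finally:
        # Runs on both success and failure, so the node is never stranded offline.
        node.offline = False


node = Node("jenkins-el7-5")
ok = cleanup(node)
```

With this structure a failed deletion is still reported (`ok` is `False`), but the node ends up back in service either way.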
https://ci.lsst.codes/job/sqre/job/infrastructure/job/jenkins-node-cleanup/206/consoleText
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on jenkins-master (swarm) in workspace /home/jenkins-slave/workspace/sqre/infrastructure/jenkins-node-cleanup
found elcapitan-1
Node elcapitan-1 is offline
found elcapitan-2
Node elcapitan-2 is offline
found elcapitan-3
Node elcapitan-3 is offline
found jenkins-el6-1
node: jenkins-el6-1, free space: 434GB. Idle: true
Skipping jenkins-el6-1 based on disk threshhold
found jenkins-el6-2
node: jenkins-el6-2, free space: 172GB. Idle: true
Skipping jenkins-el6-2 based on disk threshhold
found jenkins-el7-1
node: jenkins-el7-1, free space: 523GB. Idle: true
Skipping jenkins-el7-1 based on disk threshhold
found jenkins-el7-2
node: jenkins-el7-2, free space: 441GB. Idle: false
Skipping jenkins-el7-2 based on disk threshhold
found jenkins-el7-3
node: jenkins-el7-3, free space: 811GB. Idle: true
Skipping jenkins-el7-3 based on disk threshhold
found jenkins-el7-4
node: jenkins-el7-4, free space: 1128GB. Idle: false
Skipping jenkins-el7-4 based on disk threshhold
found jenkins-el7-5
node: jenkins-el7-5, free space: 94GB. Idle: true
Failed to delete /home/jenkins-slave/workspace: java.io.IOException: remote file operation failed: /home/jenkins-slave/workspace at hudson.remoting.Channel@17506ed0:jenkins-el7-5: java.io.IOException: Unable to delete '/home/jenkins-slave/workspace/infrastructure/update-cmirror/local_mirror/linux-64/_license-1.1-py27_0.tar.bz2'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.
Error with jenkins-el7-5: groovy.lang.MissingPropertyException: No such property: allJobs for class: Script1
found jenkins-master
node: jenkins-master, free space: 479GB. Idle: false
Skipping jenkins-master based on disk threshhold
found lsst-dev
Skipping lsst-dev based on labels
found sierra-1
node: sierra-1, free space: 153GB. Idle: true
Skipping sierra-1 based on disk threshhold
found sierra-2
Node sierra-2 is offline
found sierra-3
node: sierra-3, free space: 160GB. Idle: true
Skipping sierra-3 based on disk threshhold
### SUMMARY
ERRORS with: jenkins-el7-5
ERRORS with: jenkins-el7-5
Offline: elcapitan-1
Offline: elcapitan-2
Offline: elcapitan-3
Offline: sierra-2
Skipped: jenkins-el6-1
Skipped: jenkins-el6-2
Skipped: jenkins-el7-1
Skipped: jenkins-el7-2
Skipped: jenkins-el7-3
Skipped: jenkins-el7-4
Skipped: jenkins-master
Skipped: lsst-dev
Skipped: sierra-1
Skipped: sierra-3
FATAL: assert failedNodes.size() == 0
       |           |      |
       |           2      false
       [hudson.plugins.swarm.SwarmSlave[jenkins-el7-5], hudson.plugins.swarm.SwarmSlave[jenkins-el7-5]]
Assertion failed:

assert failedNodes.size() == 0
       |           |      |
       |           2      false
       [hudson.plugins.swarm.SwarmSlave[jenkins-el7-5], hudson.plugins.swarm.SwarmSlave[jenkins-el7-5]]

    at org.codehaus.groovy.runtime.InvokerHelper.assertFailed(InvokerHelper.java:402)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(ScriptBytecodeAdapter.java:650)
    at Script1.run(Script1.groovy:190)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:585)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:623)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:594)
    at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SecureGroovyScript.evaluate(SecureGroovyScript.java:168)
    at hudson.plugins.groovy.SystemGroovy.run(SystemGroovy.java:95)
    at hudson.plugins.groovy.SystemGroovy.perform(SystemGroovy.java:59)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
    at hudson.model.Build$BuildExecution.build(Build.java:205)
    at hudson.model.Build$BuildExecution.doRun(Build.java:162)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
    at hudson.model.Run.execute(Run.java:1720)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:404)
Finished: FAILURE
https://ci.lsst.codes/job/sqre/job/infrastructure/job/jenkins-node-cleanup/207/consoleText
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on jenkins-master (swarm) in workspace /home/jenkins-slave/workspace/sqre/infrastructure/jenkins-node-cleanup
found elcapitan-1
Node elcapitan-1 is offline
found elcapitan-2
Node elcapitan-2 is offline
found elcapitan-3
Node elcapitan-3 is offline
found jenkins-el6-1
node: jenkins-el6-1, free space: 434GB. Idle: true
Skipping jenkins-el6-1 based on disk threshhold
found jenkins-el6-2
node: jenkins-el6-2, free space: 172GB. Idle: false
Skipping jenkins-el6-2 based on disk threshhold
found jenkins-el7-1
node: jenkins-el7-1, free space: 521GB. Idle: false
Skipping jenkins-el7-1 based on disk threshhold
found jenkins-el7-2
node: jenkins-el7-2, free space: 443GB. Idle: false
Skipping jenkins-el7-2 based on disk threshhold
found jenkins-el7-3
node: jenkins-el7-3, free space: 811GB. Idle: false
Skipping jenkins-el7-3 based on disk threshhold
found jenkins-el7-4
node: jenkins-el7-4, free space: 1120GB. Idle: false
Skipping jenkins-el7-4 based on disk threshhold
found jenkins-el7-5
node: jenkins-el7-5, free space: 1070GB. Idle: true
Skipping jenkins-el7-5 based on disk threshhold
found jenkins-master
node: jenkins-master, free space: 479GB. Idle: false
Skipping jenkins-master based on disk threshhold
found lsst-dev
Skipping lsst-dev based on labels
found sierra-1
node: sierra-1, free space: 153GB. Idle: true
Skipping sierra-1 based on disk threshhold
found sierra-2
Node sierra-2 is offline
found sierra-3
node: sierra-3, free space: 160GB. Idle: true
Skipping sierra-3 based on disk threshhold
### SUMMARY
Offline: elcapitan-1
Offline: elcapitan-2
Offline: elcapitan-3
Offline: sierra-2
Skipped: jenkins-el6-1
Skipped: jenkins-el6-2
Skipped: jenkins-el7-1
Skipped: jenkins-el7-2
Skipped: jenkins-el7-3
Skipped: jenkins-el7-4
Skipped: jenkins-el7-5
Skipped: jenkins-master
Skipped: lsst-dev
Skipped: sierra-1
Skipped: sierra-3
Finished: SUCCESS
Issue Links
- relates to: DM-11681 update-cmirror job creating files with bad ownership (Done)
The delete error was valid. It is a side effect of the conda mirror job running in a container and writing to a bind-mounted volume, so the UID of the files it creates does not match that of the Jenkins agent role user.
That job needs its UID mapping fixed as well.
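A rough way to confirm this kind of UID mismatch is to walk the workspace and list entries whose owner differs from the agent user. The sketch below is an assumption-laden illustration in Python, not part of the cleanup job: `files_not_owned_by` is a hypothetical helper, and the demo runs against a throwaway temporary directory rather than a real workspace.

```python
import os
import tempfile


def files_not_owned_by(root, uid):
    """Return paths under root whose owner UID differs from uid."""
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects the entry itself and does not follow symlinks.
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched


# Demo on a temporary directory; the file here is created by the current user.
with tempfile.TemporaryDirectory() as workspace:
    open(os.path.join(workspace, "example.tar.bz2"), "w").close()
    # Compared against our own effective UID, nothing should be reported.
    foreign = files_not_owned_by(workspace, os.geteuid())
    # Compared against a different UID, the file shows up as mismatched.
    other = files_not_owned_by(workspace, os.geteuid() + 1)
```

Directories are included in the check because permission to delete a file depends on write access to its parent directory, so directories owned by the container's UID are the likely blockers.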