# tssw jenkins agent jenkins-el7-1 running out of disk space


## Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
0.125
• Epic Link:
• Team:
SQuaRE

## Description

The jenkins-el7-1 agent has dropped below the free disk space limit several times in the last few weeks. A problem with workspace cleanup seems plausible, and this agent may also need a larger volume.

## Attachments

1. autoPopulation.png
245 kB

## Activity

Joshua Hoblitt added a comment -

It appears that this morning the free disk space is up to ~90GiB. I've gone ahead and updated the cleanup script to be in sync with the current DM version. Triggering a forced cleanup hit directories that can't be removed:

    .... trying to delete: /j/ws/_ts_mt_hexRot_middleware_develop
    Failed to delete /j/ws/_ts_mt_hexRot_middleware_develop: jenkins.util.io.CompositeIOException: Unable to delete '/j/ws/_ts_mt_hexRot_middleware_develop'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.

I am investigating.

Joshua Hoblitt added a comment -

There are many (more than a dozen) job workspaces that are owned by uid/gid 1891:1891:

    [root@jenkins-el7-1 ws]# ls -lad _ts_mt_hexRot_middleware_develop*
    drwxr-xr-x 14 1891          1891          242 Sep 30 09:45 _ts_mt_hexRot_middleware_develop
    drwxr-sr-x  2 jenkins-slave jenkins-slave   6 Sep 30 09:45 _ts_mt_hexRot_middleware_develop_tmp
    drwxr-sr-x  2 jenkins-slave jenkins-slave   6 Sep 30 09:44 _ts_mt_hexRot_middleware_develop@tmp

I've added a root crontab entry to fix up permissions once per hour:

    10 * * * * find /j/ws \! -user jenkins-slave -exec chown jenkins-slave:jenkins-slave {} \;
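
Before changing ownership in bulk, it can help to see which paths the `find` predicate actually matches. The following is a minimal sketch, not the script used on jenkins-el7-1: the helper name `list_foreign_owned` is hypothetical, and the path and user in the usage comments are simply the ones from the crontab entry.

```shell
# Sketch only: list_foreign_owned is a hypothetical helper, not part of
# Jenkins or of the cleanup script mentioned in this ticket.

# list_foreign_owned DIR OWNER
# Print every path under DIR that is NOT owned by OWNER (a user name or a
# numeric uid). Read-only: nothing on disk is changed.
list_foreign_owned() {
    find "$1" ! -user "$2" -print
}

# Example (dry run), using the path and user from the crontab entry:
#   list_foreign_owned /j/ws jenkins-slave
#
# The same predicate can drive the ownership fix. Ending -exec with "+"
# instead of "\;" passes many paths to each chown invocation:
#   find /j/ws ! -user jenkins-slave -exec chown jenkins-slave:jenkins-slave {} +
```

Running the listing first shows whether the foreign-owned paths are limited to a few job workspaces or spread across the whole tree.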

Joshua Hoblitt added a comment -

Fixing the permissions got a "force cleanup" working, but there was still ~90 GiB of files under /j/ws that weren't being cleaned up, so I did a manual delete.

Joshua Hoblitt added a comment -

Free disk space for jenkins-el7-1 is now showing as 246.63 GB. I've set the free space threshold back to 50 GiB. If this becomes a problem again I will likely expand the volume.
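
For checking free space against the threshold directly on the agent, outside of Jenkins, something like the following could be used. This is a sketch under assumptions: `free_gib` and `below_threshold` are hypothetical helper names, and it does not reproduce how Jenkins computes the figure it displays.

```shell
# Sketch only: both helpers are hypothetical and independent of Jenkins.

# free_gib PATH
# Print the whole GiB available on the filesystem that holds PATH.
# "df -P" uses the POSIX one-line-per-filesystem layout and "-k" reports
# 1024-byte blocks, so column 4 (Available) / 1048576 is GiB.
free_gib() {
    df -P -k "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# below_threshold PATH GIB
# Exit 0 when the free space for PATH is under GIB, non-zero otherwise.
below_threshold() {
    [ "$(free_gib "$1")" -lt "$2" ]
}

# Example, using the workspace path and 50 GiB threshold from this ticket:
#   if below_threshold /j/ws 50; then echo "low disk space on /j/ws"; fi
```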

Andy Clements added a comment -

Joshua Hoblitt Thanks for looking into this. Rob Bovill, Tiago Ribeiro, Te-Wei Tsai - Can someone look into this? Is the hexRot job new?

Te-Wei Tsai added a comment - edited

Joshua Hoblitt I am the owner of the ts_mt_hexRot_middleware repo. That repo was automatically added to the TSSW instance by the auto-population of "LSST Telescope & Site". Please help to remove that one.

Thanks!

BTW, the tests for the ts_mt_hexRot_middleware repo now run on the Jenkins instance operated by the T&S team.

Te-Wei Tsai added a comment -

To be clear, the TSSW Jenkins instance held by the DM team runs the Docker image as root. The TSSW Jenkins instance held by the T&S team runs the Docker image as jenkinsuser (uid: 1004, not root). This is why you see the error here.

Joshua Hoblitt added a comment -

Te-Wei Tsai Are you saying that the job https://ts-ci.lsst.codes/blue/organizations/jenkins/lsst-ts%2Fts_mt_hexRot_middleware/branches/ should not exist? It looks like there are Jenkinsfiles at the tips of branches.

Tiago Ribeiro added a comment -

I removed all the triggers from that build. It should not run anymore.

Te-Wei Tsai added a comment -

Joshua Hoblitt The Jenkinsfile of ts_mt_hexRot_middleware is designed for the T&S team's Jenkins instance and works around the permission issue there (uid: 1004). Therefore, it will fail on the DM team's Jenkins instance. Actually, I think the auto-population of "LSST Telescope & Site" on DM's Jenkins instance may not be a good idea. Thanks!


## People

• Assignee: Unassigned
• Reporter: Joshua Hoblitt
• Watchers: Andy Clements, Joshua Hoblitt, Te-Wei Tsai, Tiago Ribeiro
• Votes: 0

## Dates

• Created:
Updated:
Resolved: