# eups.lsst.codes sync from s3 does not update objects of identical size


## Details

• Type: Bug
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels: None
• Story Points: 1
• Team: SQuaRE

## Description

Fritz Mueller reported yesterday that the qserv-dev eups distrib tag was not updating after being published. The s3 object was confirmed to be correct but was not syncing to the k8s service. It was assumed at the time that this was a random case of s3 eventual consistency taking an excessively long time and the k8s pod was always getting an old version of the object. However, > 12 hours seems excessive for this.

Upon further investigation this morning, it appears that aws s3 sync from awscli, which is used to perform the sync, does not checksum the local file to determine if it is in sync with s3. All it does is look at the file size by default, and it can optionally compare timestamps (which isn't enabled) – there is no option to force checksums (i.e., an equivalent of rsync -c). This is rather unfortunate as s3 does have an ETag (md5) for all objects. E.g.,

    $ aws s3api head-object --bucket eups.lsst.codes --key stack/src/tags/qserv-dev.list
    {
        "AcceptRanges": "bytes",
        "LastModified": "Fri, 22 Jun 2018 01:14:45 GMT",
        "ContentLength": 2495,
        "ETag": "\"04d0d2da6b4b1107bb03453177813201\"",
        "VersionId": "null",
        "ContentType": "binary/octet-stream",
        "Metadata": {}
    }
    $ aws s3 cp s3://eups.lsst.codes/stack/src/tags/qserv-dev.list .
    download: s3://eups.lsst.codes/stack/src/tags/qserv-dev.list to ./qserv-dev.list
    $ md5sum qserv-dev.list
    04d0d2da6b4b1107bb03453177813201  qserv-dev.list

Demonstration that the s3 object and the stale eups.lsst.codes file are the same size:

    [root@pkgroot-rc-jh4lf /]# grep BUILD= /var/www/html/stack/src/tags/qserv-dev.list
    #BUILD=b3668
    [root@pkgroot-rc-jh4lf /]# ls -la /var/www/html/stack/src/tags/qserv-dev.list
    -rw-r--r-- 1 root root 2495 Jun 21 12:42 /var/www/html/stack/src/tags/qserv-dev.list

    $ grep BUILD= qserv-dev.list
    #BUILD=b3670
    $ ls -la qserv-dev.list
    -rw-rw-r--. 1 jhoblitt jhoblitt 2495 Jun 21 18:14 qserv-dev.list

## Attachments

## Activity

Joshua Hoblitt added a comment -

I'm timing the performance of s3cmd sync, which does compute an md5sum to compare against the ETag by default. If it comes in at < 20mins, I'm planning to switch over to it. Otherwise, aws s3 sync --exact-timestamps will be the fastest fix.

Joshua Hoblitt added a comment -

s3cmd appears to be vastly too slow to sync the entire bucket. The ultimate solution is probably to use a work queue to only copy files which have changed (which can be obtained from events on the s3 bucket).

I bailed out after > 20mins:

    download: 's3://eups.lsst.codes/stack/osx/10.9/clang-800.0.42.1/miniconda2-4.2.12-7c8e67/ip_diffim-13.0-28-gf4bc96c+11@DarwinX86.tar.gz' -> '/var/www/html/stack/osx/10.9/clang-800.0.42.1/miniconda2-4.2.12-7c8e67/ip_diffim-13.0-28-gf4bc96c+11@DarwinX86.tar.gz' [463 of 147924]
     65536 of 13176882 0% in 0s 467.28 kB/s^CSee ya!

    real    24m10.789s
    user    4m48.893s
    sys     0m22.416s

Joshua Hoblitt added a comment -

Hmm. It looks like aws s3 sync --exact-timestamps is going to cause almost all files to be re-downloaded as well.

Joshua Hoblitt added a comment -

Even with the --exact-timestamps flag, awscli was able to sync thousands of files in < 10mins.

    real    6m47.635s
    user    3m55.213s
    sys     0m53.356s

And a noop is ~3mins:

    [root@pkgroot-rc-jh4lf ~]# time aws s3 sync --delete --exact-timestamps "s3://${S3_BUCKET}" "$WWW_ROOT"

    real    2m58.257s
    user    1m51.405s
    sys     0m7.709s

Joshua Hoblitt added a comment - edited

A new jenkins job named sqre/infrastructure/build-s3sync has been merged to automate the build/push of the docker image.

Joshua Hoblitt added a comment -

I've restarted the pkgroot pod and the correct qserv-dev file is now present. This must have been a long-standing bug for any new version of a file with the exact same size – which really should only ever be a tag, but I suspect this has also been true of other packages that get recreated/published by eups if the jenkins agent didn't already have a cached version of the package.

    $ curl -sSL https://eups.lsst.codes/stack/src/tags/qserv-dev.list | grep BUILD
    #BUILD=b3670

Fritz Mueller added a comment -

Working now for me – thank you for the help!
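
To illustrate the failure mode discussed in this ticket, here is a minimal Python sketch; it is not the code used by the sync container. It assumes the object was uploaded in a single part without SSE-KMS, in which case the ETag is the hex MD5 of the content (as in the head-object output in the Description). For multipart uploads the ETag is not a plain MD5 and this check would not apply. The function names (md5_hex, in_sync_by_size, in_sync_by_etag, demo) and the two-line file contents are hypothetical stand-ins.

```python
import hashlib
import os
import tempfile


def md5_hex(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def in_sync_by_size(path, remote_size):
    """Size-only comparison: the check that missed the stale tag file."""
    return os.path.getsize(path) == remote_size


def in_sync_by_etag(path, remote_etag):
    """Checksum comparison against an MD5-style ETag.

    head-object returns the ETag wrapped in double quotes, so strip them.
    Assumption: single-part upload, so the ETag is the content MD5.
    """
    return md5_hex(path) == remote_etag.strip('"')


def demo():
    """Compare a stale local file with a same-size 'remote' object."""
    old = b"#BUILD=b3668\n"  # stale local content (hypothetical stand-in)
    new = b"#BUILD=b3670\n"  # current remote content, identical length
    remote_size = len(new)
    remote_etag = '"' + hashlib.md5(new).hexdigest() + '"'

    with tempfile.TemporaryDirectory() as tmp:
        stale = os.path.join(tmp, "qserv-dev.list")
        with open(stale, "wb") as fh:
            fh.write(old)
        return (
            in_sync_by_size(stale, remote_size),
            in_sync_by_etag(stale, remote_etag),
        )


if __name__ == "__main__":
    print(demo())
```

Calling demo() returns (True, False): the size-only check considers the stale file current, while the checksum comparison flags it as out of date. The cost is reading every local file, which is consistent with the s3cmd timing reported above.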


## People

• Assignee: Joshua Hoblitt
• Reporter: Joshua Hoblitt
• Reviewers: Fritz Mueller
• Watchers: Fritz Mueller, Frossie Economou, Gabriele Comoretto, Joshua Hoblitt