Data Management / DM-12780

prune eups.lsst.codes s3 backups

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Continuous Integration
    • Labels: None

      Description

      The daily backups for s3://eups.lsst.codes were growing unbounded. Backups have been pruned down to retain only the 1st day of each month prior to the current month, e.g.:

      #!/bin/bash

      #set -e
      set -o xtrace

      # Remove every 2017 daily backup from April through October except the
      # 1st of the month (days 02-30; day-31 prefixes, if any, are not touched).
      # Each "aws s3 rm" runs in the background so the prefixes are deleted in
      # parallel.
      for m in {4..10}; do
        mon=$(printf "%02d" $m)
        for d in {2..30}; do
          day=$(printf "%02d" $d)
          aws s3 rm --recursive "s3://eups.lsst.codes-backups/2017/${mon}/${day}" &
        done
      done
      

      Either a rotation script is needed or s3 object expiration needs to be set, i.e., backups taken on every day but the first of the month would have a 30 day expiration set on the objects.
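
      For illustration, a minimal sketch of the rotation-script option, assuming a daily cron run, a 30-day retention window, and GNU date; the bucket name is taken from above, everything else is an assumption:

      # Sketch only -- not a deployed script. Each daily run removes the
      # date-keyed backup from 30 days ago unless it was taken on the 1st
      # of the month (GNU date assumed).
      set -o xtrace

      backup_date=$(date -d '30 days ago' +%Y/%m/%d)

      # keep backups taken on the 1st of the month indefinitely
      if [[ "${backup_date##*/}" != "01" ]]; then
        aws s3 rm --recursive "s3://eups.lsst.codes-backups/${backup_date}"
      fi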


            Activity

            Joshua Hoblitt added a comment (edited)

            Updated backup cleanup procedure:

              # Build an image containing s3wipe (the "working" branch of the fork)
              # and start an interactive shell in it with the AWS credentials passed
              # through from the environment.
              git clone https://github.com/jhoblitt/s3wipe -b working
              cd s3wipe
              docker build -t s3wipe:latest . && \
                  docker run -ti \
                      -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
                      -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
                      --entrypoint=/bin/bash s3wipe:latest

              # Inside the container: delete every daily backup for January through
              # March 2018 except the 1st of the month, one backgrounded s3wipe run
              # per date prefix.
              set -o xtrace

              year=2018
              for m in {1..3}; do
                mon=$(printf "%02d" $m)
                for d in {2..31}; do
                  day=$(printf "%02d" $d)
                  ./s3wipe --id $AWS_ACCESS_KEY_ID \
                      --key $AWS_SECRET_ACCESS_KEY \
                      --batchsize 10 \
                      --deletethreads 3 \
                      --listthreads 1 \
                      --maxqueue 10000 \
                      --path "s3://eups.lsst.codes-backups/${year}/${mon}/${day}" &
                done
              done
            

            Joshua Hoblitt added a comment

            A backup retention policy was implemented and deployed yesterday using s3 bucket lifecycle policy rules, to prevent ongoing backup accumulation that would require manual cleanup. The backup bucket now has rules attached to the object prefixes daily/ (8 days), weekly/ (35 days, 7x5), and montly/ (217 days, 7x31). Objects under the monthly prefix are migrated to Glacier after 30 days.
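
            As a rough illustration only (this is not a dump of the deployed configuration; the rule IDs and exact JSON layout are assumed, and the montly/ prefix is spelled as in the comment above), rules like those described could be expressed as a bucket lifecycle policy along these lines:

              aws s3api put-bucket-lifecycle-configuration \
                  --bucket eups.lsst.codes-backups \
                  --lifecycle-configuration '{
                    "Rules": [
                      { "ID": "daily",   "Filter": { "Prefix": "daily/" },  "Status": "Enabled",
                        "Expiration": { "Days": 8 } },
                      { "ID": "weekly",  "Filter": { "Prefix": "weekly/" }, "Status": "Enabled",
                        "Expiration": { "Days": 35 } },
                      { "ID": "monthly", "Filter": { "Prefix": "montly/" }, "Status": "Enabled",
                        "Transitions": [ { "Days": 30, "StorageClass": "GLACIER" } ],
                        "Expiration": { "Days": 217 } }
                    ]
                  }'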

            The sqre/backup/s3backup-eups job was split up into separate jobs for each backup period:

            • sqre/backup/s3backup-eups-daily-cron
            • sqre/backup/s3backup-eups-weekly-cron
            • sqre/backup/s3backup-eups-montly-cron

            The new -daily-cron job was tested as working yesterday. However, the cron-triggered build this morning exited non-zero:

            copy failed: s3://****/stack/redhat/el7/devtoolset-6/miniconda3-4.3.21-10a4fa6/log4cxx-0.10.0.lsst7@Linux64.tar.gz to s3://****/daily/2018/03/29/2018-03-29T11:52:05Z/stack/redhat/el7/devtoolset-6/miniconda3-4.3.21-10a4fa6/log4cxx-0.10.0.lsst7@Linux64.tar.gz An error occurred (InvalidArgument) when calling the UploadPartCopy operation: Range specified is not valid for source object of size: 17601350
            

            This is likely the same failure mode as described in DM-12861. It isn't clear whether this is an s3 server-side glitch or a bug in awscli.

            Joshua Hoblitt added a comment

            I've opened an issue on `awscli`: https://github.com/aws/aws-cli/issues/3227

            Joshua Hoblitt added a comment

            Pruning of previous backups is complete, with the exception of the 1st of the month for the last 6 months. At some point in the future these will need to be cleaned up manually as well, as they will not be removed by the lifecycle rules, which are (intentionally) limited to object key prefixes.

            I've also added retrying to the s3backup script in an attempt to work around the random s3 upload failures.
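
            For illustration only, a retry wrapper along these lines would paper over intermittent failures like the UploadPartCopy error above. This is not the actual s3backup change; the copy command, source bucket, key layout, and retry parameters are assumptions modeled on the failed copy in the log.

              # Sketch only: retry a flaky command a few times before giving up.
              retry() {
                local n max=5 delay=10
                for ((n = 1; n <= max; n++)); do
                  "$@" && return 0
                  echo "attempt ${n}/${max} failed: $*" >&2
                  sleep "$delay"
                done
                return 1
              }

              # Example: back up the stack/ prefix into today's daily/ key
              # (bucket names and key layout assumed from the ticket).
              retry aws s3 cp --recursive \
                  "s3://eups.lsst.codes/stack" \
                  "s3://eups.lsst.codes-backups/daily/$(date -u +%Y/%m/%d/%Y-%m-%dT%H:%M:%SZ)/stack"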


              People

              • Assignee: Joshua Hoblitt
              • Reporter: Joshua Hoblitt
              • Watchers: Joshua Hoblitt
              • Votes: 0
