Data Management / DM-14631

end-to-end testing of the science-pipelines release process

    Details

      Description

      There are several components related to publishing an official science pipeline release:

      • jenkins instance
      • s3 bucket for eups tags/products
      • s3 bucket for doxygen builds
      • a site with Apache-style directory indexes to host the (s3-backed) EUPS_PKGROOT
      • a site to access doxygen
      • github org hosting git repos for eups products (uses github API)
      • remote versiondb git repo (pure git remote – not currently required to be github; must be writable)
      • https accessible copy of newinstall.sh
      • remote git repos for CI-related build tools, e.g., lsstsw, ci-scripts (not required to be github; read-only)
      • docker registry hosting images for build tools and science-pipeline base images (can be read-only)
      • docker registry to which science-pipeline release images are published (must be writable)
      • squash instance to publish validate_drp results to

      To date, there has never been complete end-to-end testing of the release process outside the production environment. DM-14138 added support to sqre-codekit for overriding previously hard-coded URLs, but further effort is needed before the jenkins jobs and the jenkins instance configuration can support alternative environments.
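One way to picture the kind of override this ticket asks for is a set of per-environment endpoints with defaults that a test deployment can replace. The sketch below is only an illustration: the `ReleaseEnv` class, the `RELEASE_` variable prefix, and every URL in it are hypothetical placeholders, not the actual sqre-codekit or jenkins-dm-jobs configuration.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReleaseEnv:
    """Endpoints a release run depends on (all values are placeholders)."""

    eups_pkgroot_url: str = "https://eups.example.org/stack"
    doxygen_url: str = "https://doxygen.example.org"
    versiondb_repo: str = "https://git.example.org/versiondb.git"
    newinstall_url: str = "https://eups.example.org/newinstall.sh"
    github_org: str = "example-org"
    docker_release_registry: str = "registry.example.org/releases"
    squash_url: str = "https://squash.example.org"


def load_release_env(environ=None, prefix="RELEASE_"):
    """Build a ReleaseEnv, letting PREFIX + FIELD_NAME override each default."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for f in fields(ReleaseEnv):
        key = prefix + f.name.upper()
        if key in environ:
            overrides[f.name] = environ[key]
    return ReleaseEnv(**overrides)


# A test environment overrides only the github org; the rest keep defaults.
test_env = load_release_env({"RELEASE_GITHUB_ORG": "example-org-test"})
print(test_env.github_org)
print(test_env.squash_url)
```

Keeping every endpoint in one structure makes it easy to see which of the components listed above still point at production when a test environment is assembled.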

            Activity

            jhoblitt Joshua Hoblitt added a comment -

            A number of general cleanups and updates to sandbox-jenkins-demo have been merged, solving some minor but long-running irritations.  It is also now possible to have multiple parallel deployment environments defined in the hiera tree, making it easier to separate out secrets by environment without confusion.

            I have started work on updating deploy-eups-pkgroot and am working on splitting common terraform patterns out of other deployments for reuse, e.g., https://github.com/lsst-sqre/terraform-gke-std

            jhoblitt Joshua Hoblitt added a comment - - edited

            Coarse summary of work on this ticket:

            • https://github.com/lsst-sqre/terraform-gke-std was split out of lsst-sqre/deploy-publish-release to provide a basic reusable gke configuration
            • lsst-sqre/deploy-eups-pkgroot was renamed to lsst-sqre/deploy-publish-release and refactored to be "pure" terraform rather than a mix of tf and k8s yaml config files – this eliminated the glue scripts that supported the tf -> k8s -> tf workflow AND the need to manually export secrets to jenkins, which had to be done for every jenkins job test env
            • lsst-sqre/sandbox-jenkins-demo had a general tf update/cleanup and now imports s3 "remote state" from lsst-sqre/deploy-publish-release to configure secrets/credentials
            • An error-handling bug in lsst-sqre/sqre-codekit was discovered and fixed
            • lsst-sqre/deploy-eups-redirect was renamed to lsst-sqre/terraform-pkgroot-redirect and refactored to be a tf module used by lsst-sqre/deploy-publish-release
            • The hiera hierarchy in lsst-sqre/sandbox-jenkins-demo was refactored to support in-tree configuration for multiple deployment envs, largely eliminating the need for a working branch with configuration changes per jenkins test env
            • linting of shell/yaml/markdown/docker/terraform/make was added, as appropriate, to most repositories that were touched in the course of this ticket
            • Hardcoded values and assumptions in lsst-sqre/jenkins-dm-jobs about the production environment (URLs, github repo names, etc.) were identified and either moved into yaml configuration files or pipeline logic was changed to pass more explicit information to triggered jobs.
            • There were multiple "cleanup" refactors of jenkins jobs, including consistent build parameter handling, de-duplication of code, logic rewrites intended to improve the clarity of the code, and a shift towards using groovy named parameters for methods with an arity > ~2.
            • A new release/official-release job was created to automate / allow testing of the "official" release workflow
            • The "demo" handling in lsst-sqre/ci-scripts was refactored to try to find a git ref matching the lsstsw "BRANCH" list – this was needed because changes merged to sci-pipe code right after v16_0_rc1 was cut required breaking changes to the "demo"
            • MANIFEST_ID or "manifest id" is now consistently used to refer to a manifest in a "versiondb" in place of BUILD=, BUILD_ID, bNNNN, bxxxx, etc.
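The ref matching described for the "demo" can be sketched as a first-match lookup over an ordered list of candidate refs. This is a minimal illustration and not the ci-scripts implementation: the `pick_ref` helper, the treatment of "BRANCH" as a whitespace-separated list in priority order, the fallback to master, and the sample ref names are all assumptions.

```python
def pick_ref(candidate_refs, available_refs, default="master"):
    """Return the first candidate that exists in the repo, else the default."""
    available = set(available_refs)
    for ref in candidate_refs:
        if ref in available:
            return ref
    return default


# BRANCH is treated here as a whitespace-separated, priority-ordered list.
branch = "tickets/DM-14631 v16_0_rc1"
# Hypothetical refs present in the "demo" repository.
demo_refs = ["master", "v16_0_rc1", "v15_0"]

print(pick_ref(branch.split(), demo_refs))  # prints v16_0_rc1
```

Checking candidates in priority order means a release candidate tag on the "demo" is used when it exists, and the default is used only when none of the requested refs are present.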

            An [almost] end-to-end release was demonstrated in a test env. The exceptions were that validate_drp data was not shipped to a "squash" env, nor was the experimental documenteer docs build pushed to an "lsst-the-docs" instance.

            All changes have now been merged to master. I'm going to leave this ticket open in self review for a day or two in case of fallout.

            jhoblitt Joshua Hoblitt added a comment -

            Fallout noticed so far:

            • science-pipelines/lsst_distrib failed trying to run the doxygen build/push on OSX
            • release/nightly-release failed while tagging git repos because it appears to have been using a stale lsstsqre/codekit image – the docker tag seems to have been lost, and the image should always be force-pulled anyway.
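The stale-image failure suggests always refreshing the image before use. The sketch below only assembles the command lines, so it runs without docker installed; the `docker_commands` helper, the `example-tag` tag, and the `--help` argument are hypothetical, and this is not how the jenkins jobs actually invoke docker.

```python
import shlex


def docker_commands(image, args):
    """Return [pull_cmd, run_cmd] so the pull always precedes the run."""
    # An explicit pull refreshes a locally cached tag from the registry.
    pull_cmd = ["docker", "pull", image]
    run_cmd = ["docker", "run", "--rm", image, *args]
    return [pull_cmd, run_cmd]


for cmd in docker_commands("lsstsqre/codekit:example-tag", ["--help"]):
    print(shlex.join(cmd))
```

Pulling unconditionally costs a registry round trip on every run, but it removes the dependence on whatever copy of the tag a build agent happens to have cached.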
            jhoblitt Joshua Hoblitt added a comment -

            All known fallout has been resolved.


              People

              • Assignee: jhoblitt Joshua Hoblitt
              • Reporter: jhoblitt Joshua Hoblitt
              • Reviewers: Joshua Hoblitt
              • Watchers: Gabriele Comoretto, Joshua Hoblitt