  Data Management / DM-13970

Investigate options for integrating AP with CI

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ap_verify
    • Labels: None
    • Story Points: 4
    • Epic Link:
    • Sprint: AP S18-5, AP S18-6
    • Team: Alert Production

      Description

      Take a look at how Jenkins works. Understand how jobs are scheduled. Think about how we could integrate a job which exercises the ap_verify system.

      It would be great if this included integration with SQuaSH, but if it did nothing other than pass/fail that would also be fine in the short term.

            Activity

            Krzysztof Findeisen added a comment (edited)

            It looks like the validate_drp job already has many of the features we want (in particular, the ability to run on multiple datasets). The CI script handles SQuaSH upload; perhaps we should consider removing this feature from ap_verify itself (which would certainly make SQuaRE happier).

            Krzysztof Findeisen added a comment (edited)

            ci_hsc is run as part of the lsst_distrib Jenkins job. It appears to be configured in Jenkins to run daily at midnight, although I don't have the permissions to confirm this directly. Jenkins uses a cron-like syntax to configure scheduling, so we should be free to schedule CI or mid-sized jobs however we like.
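
            For reference, a Jenkins Pipeline can declare a cron-style schedule itself. A minimal declarative sketch; the schedule spec and build step are illustrative, not the actual lsst_distrib configuration:

                pipeline {
                    agent any
                    triggers {
                        // 'H' lets Jenkins spread load across the hour; this
                        // requests one run per day during the midnight hour.
                        cron('H 0 * * *')
                    }
                    stages {
                        stage('run') {
                            steps {
                                sh './run_ci.sh'  // placeholder for the real build steps
                            }
                        }
                    }
                }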

            validate_drp is run as part of the nightly release pipeline. At some point we may want a similar integration for ap_verify.

            Krzysztof Findeisen added a comment

            I propose that ap_verify have a dedicated Jenkins job for running it on multiple small (<10 images) datasets, and another Jenkins job for running it on a specific larger dataset. Much of the code for these jobs can be shared with validate_drp.groovy, possibly after some generalization.

            The small-datasets job would need to (see the sketch after this list):

            • Set up parallel pipeline nodes (is that the correct term?) for each test dataset, much like validate_drp does. Each dataset would be identified by a command-line name like "HiTS2015". No suitable datasets exist at present.
            • On each branch, look up the name of the data repository from ${AP_VERIFY_DIR}/config/dataset_config.yaml and perform a Git LFS checkout, assuming lsst as the repository owner. Note that validate_drp provides repository information in the Jenkins pipeline file, but ap_verify needs the name anyway to locate the downloaded package; it is best to avoid duplicating the info.
            • Perform an Eups setup of the downloaded dataset.
            • Execute python ${AP_VERIFY_DIR}/bin/ap_verify.py --dataset [dataset name] --output [outputDir] --id [dataId] --metrics-file [filename] --silent. Note that at present each instance of ap_verify can handle only one data ID, so these may need to be managed by a calling script. validate_drp provides a dataset-specific script in examples/; making similar scripts for ap_verify may be an acceptable workaround until ap_verify can do dataset management and parallelism internally (some time after the introduction of SuperTask).
            • Run a "post-QA" step to take the metrics created in the metrics files (which may, as for validate_drp, include dataset-specific specifications for pass/fail analysis) and upload them to SQuaSH.
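
            A minimal sketch of the small-datasets job as a Jenkins scripted Pipeline, in the style of validate_drp.groovy. This is an assumption-laden outline, not a working configuration: the dataset names, repository URLs, data ID, and post-QA upload script are all hypothetical, and it assumes EUPS and Git LFS are already initialized on each node.

                // Hypothetical dataset names; no suitable datasets exist yet.
                def datasets = ['HiTS2015', 'Cosmos']

                def branches = [:]
                for (name in datasets) {
                    def dataset = name  // local copy so each closure captures its own value
                    branches[dataset] = {
                        node {
                            // In the real job the repository name would be read from
                            // ${AP_VERIFY_DIR}/config/dataset_config.yaml rather than
                            // derived from the dataset name as done here.
                            def repo = "ap_verify_${dataset.toLowerCase()}"
                            stage("checkout ${dataset}") {
                                sh "git clone https://github.com/lsst/${repo} && cd ${repo} && git lfs pull"
                            }
                            stage("run ${dataset}") {
                                // setup -r performs the EUPS setup of the downloaded dataset;
                                // the data ID is a placeholder.
                                sh """
                                    setup -r ${repo}
                                    python \${AP_VERIFY_DIR}/bin/ap_verify.py --dataset ${dataset} --output output_${dataset} --id "visit=12345" --metrics-file ${dataset}.verify.json --silent
                                """
                            }
                            stage("post-qa ${dataset}") {
                                // Placeholder for the SQuaSH upload step.
                                sh "./upload_to_squash.sh ${dataset}.verify.json"
                            }
                        }
                    }
                }
                parallel branches

            As in validate_drp.groovy, each dataset runs in its own parallel branch on its own node, so a failure in one dataset does not block the others.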

            The larger job would be similar, except it would not need multiple pipelines for dataset management. I don't think we can use pipeline nodes to speed up large dataset handling, because each is isolated from the others.

            Both jobs would be scheduled directly with the Jenkins scheduler, exact schedule TBD.

            Krzysztof Findeisen added a comment

            Hi John Swinbank, can you look over the proposal in my previous post and let me know what you think?

            Krzysztof Findeisen added a comment

            One addendum: given that lsst_ci runs some scripts from validate_drp, we may want to add AP runs to it as well.

            John Swinbank added a comment

            I'm sorry I've taken so long to get to this.

            I think the proposal above is great. As we discussed yesterday, there are two complicating factors:

            • The QAWG will likely weigh in on the way jobs like this are handled in CI; and
            • We need an appropriate rep from SQuaRE to weigh in on how best to set things up in Jenkins.

            Per our discussion of 2018-06-07, I suggest that we do not block further activity on the first point, but rather forge ahead in the interests of solving our short-term problems.

            Also per our discussion, and answering the second point, I hope we'll get a chance to sit down with Simon Krughoff to discuss this proposal with him soon. I don't think there's any further discussion we need at our end until that can happen, so I think it's fair to close this ticket.


              People

              • Assignee:
                Krzysztof Findeisen
              • Reporter:
                John Swinbank
              • Reviewers:
                John Swinbank
              • Watchers:
                John Swinbank, Krzysztof Findeisen
              • Votes:
                0
