Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-24260

Create Gen3 versions of ap_verify datasets

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      4
    • Sprint:
      AP S20-5 (April), AP S20-6 (May)
    • Team:
      Alert Production
    • Urgent?:
      No

      Description

      Produce a Gen3 version of existing ap_verify datasets, including the dataset compiled in DM-24259.

        Attachments

          Issue Links

            Activity

            No builds found.
            swinbank John Swinbank created issue -
            swinbank John Swinbank made changes -
            Field Original Value New Value
            Epic Link DM-22633 [ 427742 ]
            swinbank John Swinbank made changes -
            Link This issue blocks DM-21939 [ DM-21939 ]
            swinbank John Swinbank made changes -
            Link This issue blocks DM-22038 [ DM-22038 ]
            swinbank John Swinbank made changes -
            Link This issue is blocked by DM-24259 [ DM-24259 ]
            swinbank John Swinbank made changes -
            Description Create a “stub” dataset for testing HSC Gen3 processing in CI. The contents of this do not have to be (indeed, are not expected to be) scientifically meaningful: we are aiming for a smoke test to make sure nothing throws an exception, rather than outputs for further analysis.

            Please check with [~gkovacs], who might already have compiled some useful data for testing.
            Produce a Gen3 version of the dataset compiled in DM-24259.
            swinbank John Swinbank made changes -
            Link This issue is blocked by DM-21915 [ DM-21915 ]
            swinbank John Swinbank made changes -
            Epic Link DM-22633 [ 427742 ] DM-24341 [ 433028 ]
            krzys Krzysztof Findeisen made changes -
            Link This issue has to be finished together with DM-21915 [ DM-21915 ]
            krzys Krzysztof Findeisen made changes -
            Link This issue is blocked by DM-21915 [ DM-21915 ]
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            While the Gen 2 stub repository doesn't have transmission curve datasets, it should be possible to manually add them (using https://github.com/lsst/obs_subaru/blob/master/bin.src/installTransmissionCurves.py) before conversion to Gen 3.

            EDIT: the Gen 2->3 conversion script adds the transmission curves automatically

            Show
            krzys Krzysztof Findeisen added a comment - - edited While the Gen 2 stub repository doesn't have transmission curve datasets, it should be possible to manually add them (using https://github.com/lsst/obs_subaru/blob/master/bin.src/installTransmissionCurves.py ) before conversion to Gen 3. EDIT: the Gen 2->3 conversion script adds the transmission curves automatically
            krzys Krzysztof Findeisen made changes -
            Link This issue relates to DM-24402 [ DM-24402 ]
            krzys Krzysztof Findeisen made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            krzys Krzysztof Findeisen made changes -
            Link This issue blocks DM-24262 [ DM-24262 ]
            Hide
            krzys Krzysztof Findeisen added a comment -

            I'm expanding the scope of this issue to also cover ap_verify_hits2015 and its CI version. I need to test the Gen 2-3 conversion script (DM-21915) on both HSC and DECam anyway, and once the script is working adding Gen 3 support for all the repos is no more work than adding it for just one.

            Show
            krzys Krzysztof Findeisen added a comment - I'm expanding the scope of this issue to also cover ap_verify_hits2015 and its CI version. I need to test the Gen 2-3 conversion script ( DM-21915 ) on both HSC and DECam anyway, and once the script is working adding Gen 3 support for all the repos is no more work than adding it for just one.
            krzys Krzysztof Findeisen made changes -
            Description Produce a Gen3 version of the dataset compiled in DM-24259. Produce a Gen3 version of existing {{ap_verify}} datasets, including the dataset compiled in DM-24259.
            Summary Create “stub“ Gen3 HSC dataset for CI testing Create Gen3 versions of ap_verify datasets
            swinbank John Swinbank made changes -
            Sprint AP S20-5 (April) [ 986 ] AP S20-5 (April), AP S20-6 (May) [ 986, 987 ]
            krzys Krzysztof Findeisen made changes -
            Link This issue has to be done before DM-21915 [ DM-21915 ]
            krzys Krzysztof Findeisen made changes -
            Link This issue has to be finished together with DM-21915 [ DM-21915 ]
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            Hi Jim Bosch, this review involves five packages:

            • obs_base, bugfix to the converter
            • ap_verify_dataset_template, the dataset conversion script
            • ap_verify_ci_hits2015, ap_verify_hits2015, ap_verify_ci_cosmos_pdr2, the datasets being given Gen 3 repositories and matching config files

            Naturally, I'm only asking you to review the code in the first two packages (especially the dataset conversion script's use of the existing repository converter), though the datasets may be useful as example output.

            For those keeping track, I have not converted ap_verify_testdata because it depends on obs_test, which is incompatible with Gen 3. It may be possible to convert it after DM-24844.

            Show
            krzys Krzysztof Findeisen added a comment - - edited Hi Jim Bosch , this review involves five packages: obs_base , bugfix to the converter ap_verify_dataset_template , the dataset conversion script ap_verify_ci_hits2015 , ap_verify_hits2015 , ap_verify_ci_cosmos_pdr2 , the datasets being given Gen 3 repositories and matching config files Naturally, I'm only asking you to review the code in the first two packages (especially the dataset conversion script's use of the existing repository converter), though the datasets may be useful as example output. For those keeping track, I have not converted ap_verify_testdata because it depends on obs_test , which is incompatible with Gen 3. It may be possible to convert it after DM-24844 .
            krzys Krzysztof Findeisen made changes -
            Reviewers Jim Bosch [ jbosch ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            Hide
            tjenness Tim Jenness added a comment -

            I'm sorry but I don't understand how obs_test can be involved in anything that involves data from a real instrument? If we have lots of data for a pretend instrument in obs_test then we should fix it but I was under the impression that obs_test was a hacked together pretend instrument.

            Show
            tjenness Tim Jenness added a comment - I'm sorry but I don't understand how obs_test can be involved in anything that involves data from a real instrument? If we have lots of data for a pretend instrument in obs_test then we should fix it but I was under the impression that obs_test was a hacked together pretend instrument.
            Hide
            krzys Krzysztof Findeisen added a comment -

            Tim Jenness, obs_test has a couple of images from old lsstSim, per the readme. I am not aware of any requirement that data be from a real instrument.

            Show
            krzys Krzysztof Findeisen added a comment - Tim Jenness , obs_test has a couple of images from old lsstSim, per the readme . I am not aware of any requirement that data be from a real instrument.
            Hide
            tjenness Tim Jenness added a comment -

            Yes, I know that obs_test has a couple of old sims images. What I'm confused by is that ap_verify_testdata depends on obs_test.

            Show
            tjenness Tim Jenness added a comment - Yes, I know that obs_test has a couple of old sims images. What I'm confused by is that ap_verify_testdata depends on obs_test.
            Hide
            jbosch Jim Bosch added a comment -

            All looks good; I didn't have much to say, but there are a few comments on the _templates PR.

            I did not look at the collection structure in the converted repos - I think the code you're using to create them probably guarantees that they're close enough to what we're envisioning that it's not worth holding anything up right now. As a heads up I think we'll want to have another pass at defining those conventions more clearly in the future (RFC-663 is clearly proving to be at least insufficient, and in some cases maybe made poor choices), so these probably shouldn't be considered set in stone (and hopefully the code you delegate to will continue to make changes there propagate through to here automatically - though that may not be the case for code that uses the datasets created here.

            It's a bit unfortunate that daf_butler and ap_verify use "dataset" to mean very specific and quite different things, but I'm sure this is now too baked in to both to change, and it's not completely surprising given that the meaning of the word outside our codebase is somewhat ambiguous.

            Show
            jbosch Jim Bosch added a comment - All looks good; I didn't have much to say, but there are a few comments on the _templates PR. I did not look at the collection structure in the converted repos - I think the code you're using to create them probably guarantees that they're close enough to what we're envisioning that it's not worth holding anything up right now. As a heads up I think we'll want to have another pass at defining those conventions more clearly in the future ( RFC-663 is clearly proving to be at least insufficient, and in some cases maybe made poor choices), so these probably shouldn't be considered set in stone (and hopefully the code you delegate to will continue to make changes there propagate through to here automatically - though that may not be the case for code that uses the datasets created here. It's a bit unfortunate that daf_butler and ap_verify use "dataset" to mean very specific and quite different things, but I'm sure this is now too baked in to both to change, and it's not completely surprising given that the meaning of the word outside our codebase is somewhat ambiguous.
            jbosch Jim Bosch made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            Hide
            mrawls Meredith Rawls added a comment -

            Re. ap_verify_testdata depending on obs_test, I believe we did that because we needed some manner of "real" image data to run tests on and wanted to keep the testdata package from being any larger than it needed to be. Since perfectly acceptable test images already exist, and we didn't want to tie ap_verify_testdata to any particular obs_ package, we went with "generic" image data in obs_test. This could be pretty easily circumvented by putting a copy of the data that's currently loaded from obs_test directly in the ap_verify_testdata repo.

            Show
            mrawls Meredith Rawls added a comment - Re. ap_verify_testdata depending on obs_test, I believe we did that because we needed some manner of "real" image data to run tests on and wanted to keep the testdata package from being any larger than it needed to be. Since perfectly acceptable test images already exist, and we didn't want to tie ap_verify_testdata to any particular obs_ package, we went with "generic" image data in obs_test. This could be pretty easily circumvented by putting a copy of the data that's currently loaded from obs_test directly in the ap_verify_testdata repo.
            Hide
            krzys Krzysztof Findeisen added a comment -

            Thanks for the review! For the record, the current reference for ap_verify datasets' collection structure is the discussion in DM-21915 (I just realized that I should also document that in ap_verify_dataset_templates, since its documentation is the "official" spec for what an ap_verify dataset should look like).

            As for the term "dataset", yes, it's been a problem since the Gen 2 days. I could swear we had a Slack discussion about it a few years back, but now I can't find it.

            Show
            krzys Krzysztof Findeisen added a comment - Thanks for the review! For the record, the current reference for ap_verify datasets' collection structure is the discussion in DM-21915 (I just realized that I should also document that in ap_verify_dataset_templates , since its documentation is the "official" spec for what an ap_verify dataset should look like). As for the term "dataset", yes, it's been a problem since the Gen 2 days. I could swear we had a Slack discussion about it a few years back, but now I can't find it.
            krzys Krzysztof Findeisen made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]

              People

              Assignee:
              krzys Krzysztof Findeisen
              Reporter:
              swinbank John Swinbank
              Reviewers:
              Jim Bosch
              Watchers:
              Eric Bellm, Jim Bosch, John Swinbank, Krzysztof Findeisen, Meredith Rawls, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.