  1. Data Management
  2. DM-15142

Create CI-sized DECam Dataset

    Details

    • Story Points:
      4
    • Epic Link:
    • Sprint:
      AP F18-2, AP F18-3, AP F18-4
    • Team:
      Alert Production

      Description

      Create a dataset (in the Reiss sense) that is a subset of ap_verify_hits2015 for use in CI. The dataset must contain multiple visits to the same location in order to test source association, and should have at most ~10 images. See discussion on #dm for more details.

      In addition to containing only a subset of raw images, the new dataset should contain only those templates that are necessary to process the subset. Unfortunately, the only way I know of to figure out which templates get used is to run the pipeline and parse the log output.

        Attachments

          Issue Links

            Activity

            swinbank John Swinbank added a comment -

            Can you link to “discussion on #dm” please?

            I'm wondering why we'd need as many as 50-60 images. In the interests of speed and simplicity, I'd think we really only want a very few (perhaps two or three) images; that's all we need to make sure the machinery is working, and it seems better aligned with the “< 10” that we agreed on in DM-13970.

            krzys Krzysztof Findeisen added a comment - - edited

            Ok, per the more recent discussion I've reduced the requested dataset size. 2-3 might be too small, as I think for the sake of code coverage we do want both multiple CCDs and multiple epochs.

            krzys Krzysztof Findeisen added a comment - - edited

            I propose the following six-image dataset: HiTS visits 411420, 419802, and 412518 (all for HiTS field Blind15A_40) and CCDs 52 and 56 (these correspond to CCDs N21 and N25, which are adjacent but at different radii). That should let us cover as many bases as possible while sticking to a processing timescale of minutes.

            Edit: apologies for the spam. That was weird...

            ebellm Eric Bellm added a comment -

            This seems fine, but why these visits and CCDs specifically?

            It might be more interesting to look at some of the HiTS CCDs from the two overlapping fields, which might stress our machinery more effectively.

            krzys Krzysztof Findeisen added a comment -

            The specific numbers are mainly to provide a concrete proposal (though as I noted the CCDs let us try both edge and non-edge chips, which might probe different pipeline failure modes).

            Blind15A_40 is one of the two overlapping fields. I guess we could replace one of the visits with a Blind15A_42 visit, but I'm not sure I could match up the CCDs precisely enough...

            krzys Krzysztof Findeisen added a comment -

            Ok then, how about visits 411420 and 419802, chips 5 and 10 from Blind15A_40, plus visit 411371, chips 58 and 62 from Blind15A_42? That's the best use of the overlap that I can think of (see attached figure), and between chip 10 and chip 62 we get decent radial coverage.

            ebellm Eric Bellm added a comment -

            Looks great to me!

            krzys Krzysztof Findeisen added a comment -

            I've created a new dataset on GitHub/lsst with the images identified above. The new dataset weighs in at 3.1 GB. Most of the remaining size is in calib files (which, apparently, have a hardcoded assumption that data for chip N live in the Nth FITS extension) and templates (where we need multiple patches to completely cover each chip). Curiously, although most of the footprint is covered by two visits and roughly a third is covered by three, association finds 1600 DIAObjects with 1 source, 82 DIAObjects with 2 sources, and nothing with 3 sources. This is something to watch out for when setting up integration testing.

            Simon Krughoff, as the main user of the new dataset, would you be willing to review it?
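
A minimal sketch of how the association counts quoted above could be tallied in an integration test. It assumes only that each associated DIASource carries the ID of its DIAObject; the function name `sources_per_object` and the synthetic IDs are illustrative and are not part of any LSST API.

```python
from collections import Counter


def sources_per_object(object_ids):
    """Return a histogram: number of sources -> number of DIAObjects.

    object_ids: iterable with one DIAObject ID per associated DIASource.
    """
    per_object = Counter(object_ids)      # object ID -> number of sources
    return Counter(per_object.values())   # number of sources -> number of objects


# Synthetic IDs built to reproduce the counts quoted in the comment:
# 1600 objects with one source each, 82 objects with two sources each.
synthetic = list(range(1600)) + 2 * list(range(1600, 1682))
hist = sources_per_object(synthetic)
print(dict(hist))  # {1: 1600, 2: 82}
```

A test built on such a histogram could flag the case noted above, where no DIAObject has three sources even though part of the footprint is covered by three visits.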

            abh Andy Hanushevsky added a comment -

            Upgraded to the latest packages (this actually required a full reinstall) and switched over to Python 3, since Pandas support for Python 2 will, for all practical purposes, end at the start of 2019 (officially not until 2020, but there will be no Python 2 patches past the beginning of the year).

            krzys Krzysztof Findeisen added a comment -

            Andy Hanushevsky sorry, but did you mean to close a different issue?

            abh Andy Hanushevsky added a comment -

            Hi Krzysztof,

            Good grief, how did that happen? I clicked on the right link and the text
            said what I needed to see. How did I manage to close this one? Clearly,
            never meant to. Sorry about that, please reopen it.

            Andy

            swinbank John Swinbank added a comment -

            Hey Simon Krughoff — just a reminder of this one. Do you have an ETA on a review?

            krzys Krzysztof Findeisen added a comment -

            On DM-15224 Meredith Rawls and I just discovered that the Blind15A_42 CCDs don't line up as expected. We'll try to figure out why they don't line up and re-submit for review then.

            krzys Krzysztof Findeisen added a comment -

            The problem turned out to be my misreading of our maps of where the fields were located. I've changed the visit 411371 CCDs from 58 and 62 to 56 and 60; Meredith Rawls and I have verified through both ds9 inspection and running source association that these chips do overlap CCDs 5 and 10 from the other field.
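
For reference, the corrected selection can be written out as explicit per-image IDs. This is only a sketch: the visit and CCD numbers are the ones given in this thread, while the dictionary layout and the `data_ids` helper are illustrative assumptions rather than the format used by the dataset repository.

```python
from itertools import product

# Visit/CCD selection as stated in this thread (after the CCD correction).
# The layout of this dict is hypothetical, chosen only for illustration.
DATASET = {
    "Blind15A_40": {"visits": [411420, 419802], "ccds": [5, 10]},
    "Blind15A_42": {"visits": [411371], "ccds": [56, 60]},
}


def data_ids(dataset):
    """Return one {field, visit, ccd} dict per raw image in the selection."""
    ids = []
    for field, selection in dataset.items():
        for visit, ccd in product(selection["visits"], selection["ccds"]):
            ids.append({"field": field, "visit": visit, "ccd": ccd})
    return ids


ids = data_ids(DATASET)
print(len(ids))  # 6
```

Two visits times two CCDs in Blind15A_40 plus one visit times two CCDs in Blind15A_42 gives six raw images, within the limit of roughly 10 images set in the description.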

            krughoff Simon Krughoff added a comment -

            This all seems good. Sorry for taking so long.

            krzys Krzysztof Findeisen added a comment -

            Thanks for the review!


              People

              • Assignee:
                krzys Krzysztof Findeisen
                Reporter:
                krzys Krzysztof Findeisen
                Reviewers:
                Simon Krughoff
                Watchers:
                Eric Bellm, John Swinbank, Krzysztof Findeisen, Meredith Rawls, Simon Krughoff