DM-25990 (Data Management)

Reprocess HSC COSMOS medium dataset with ap_pipe

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      8
    • Sprint:
      AP F20-3 (August), AP F20-4 (September), AP F20-5 (October), AP F20-6 (November)
    • Team:
      Alert Production
    • Urgent?:
      No

      Description

      Redo DM-24252 (plus HSC-G band), but with Yusra's variance scaling scheme (https://github.com/lsst/pipe_tasks/compare/u/yusra/scaleVariance). Also make some reasonable choices about excluding problematic visits. Make a nice summary notebook.

      The subsequent reprocessing should consider DM-25988 as well as whatever other issues this reprocessing uncovers.

            Activity

            mrawls Meredith Rawls added a comment -

            As before, I processed 30+ full-field (103 CCD) visits in each of HSC-G, HSC-R2, HSC-I2, HSC-Z, and HSC-Y.

            Last time, there were lots of database lock errors and background noise sources. This time, I used PostgreSQL instead of SQLite and implemented difference image variance scaling. The results were generally very similar, but with no database lock errors. The difference image variance scaling likely removed a handful of background noise sources, but the difference is not stark.
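For readers unfamiliar with variance scaling, the general idea can be sketched as follows. This is an illustrative NumPy sketch, not the code from the pipe_tasks branch linked in the description. It assumes the scheme rescales the variance plane so that it matches the empirically measured pixel scatter. The helper name `variance_scale_factor`, the simulated image, and the noise levels are all made up for the example.

```python
import numpy as np


def variance_scale_factor(image, variance):
    """Return the factor by which to multiply `variance` so that
    image / sqrt(variance) has a robust standard deviation of about 1.

    Hypothetical helper for illustration only.
    """
    good = np.isfinite(image) & np.isfinite(variance) & (variance > 0)
    snr = image[good] / np.sqrt(variance[good])
    q1, q3 = np.percentile(snr, [25, 75])
    # For a Gaussian, sigma is about 0.7413 times the interquartile range.
    robust_sigma = 0.7413 * (q3 - q1)
    return robust_sigma ** 2


# Simulated pure-noise "difference image" with true sigma = 3 (assumed values).
rng = np.random.default_rng(42)
image = rng.normal(0.0, 3.0, size=(200, 200))
# Variance plane that underestimates the noise (it claims sigma = 2).
variance = np.full(image.shape, 4.0)

factor = variance_scale_factor(image, variance)
scaled_variance = variance * factor
```

In this toy case the factor comes out near 9/4, since the true variance is 9 and the variance plane claims 4. An underestimated variance plane makes noise peaks look more significant than they are, which is one way spurious background detections can arise.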

            The new notebook DM-25990-HSC-ap_pipe-take2.ipynb explores the APDB from this rerun. Some outstanding issues are listed in the notebook and also written here for easy reference.

            • There is a new APDB issue on DM-26700. The APDB thinks there are duplicate DIA Objects or DIA Forced Sources being written, and the write fails.
            • As before, there are many failures to retrieve a template which register as "No coadd PhotoCalib found!" This is being tracked on DM-25988, along with some ancillary issues below.
            • As before, some PSFs are erroneously large. However, now that the database lock error is fixed, the problem is more prevalent, to the point that Slurm log messages are too big and crash the whole job. This is likely what is causing fewer visits to be processed than my input visit lists suggest.
            • Valid HSC visit numbers are even, so an easy way to get rid of some of the wonky ones (especially prevalent in g and y) should be to delete the odd visit numbers. However, I revisited my input visit lists, and they don't have any odd numbers, so something odd (pun intended) is happening behind the scenes.
            • The difference image variance scaling didn't make a big difference at a glance. The plethora of sources is at least partly due to the fact that HSC images are deeper than DECam images, in terms of both telescope/optics/detector and exposure times. I am also not at all convinced that my templates are particularly great, though, and a bunch of ~0 `totFlux` measurements is a bit of a red flag.
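The odd-visit check described in the list above can be sketched in a few lines of Python. This is a minimal illustration, assuming the input visit list and the visits found in the rerun output are available as plain integers. The visit numbers and the helper name `split_visits_by_parity` are made up.

```python
def split_visits_by_parity(visits):
    """Split visit numbers into (even, odd) sorted lists."""
    even = sorted(v for v in visits if v % 2 == 0)
    odd = sorted(v for v in visits if v % 2 != 0)
    return even, odd


# Made-up visit numbers, for illustration only.
input_visits = {1202, 1204, 1206}
processed_visits = {1202, 1203, 1204}

even, odd = split_visits_by_parity(processed_visits)

# Visits that appear in the output but were never requested,
# and requested visits that never made it to the output.
unexpected = sorted(processed_visits - input_visits)
missing = sorted(input_visits - processed_visits)
```

Comparing `unexpected` with `odd` shows whether the odd-numbered visits account for everything that was not in the input lists.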
            mrawls Meredith Rawls added a comment - - edited

            Along with the usual commits in ap_pipe-notebooks, there are two related PRs in pipe_tasks and ap_association. One enables a new difference image variance scaling configuration option. The other temporarily forces all the APDB tables to use single-character filter names. DM-21333 should include a more permanent fix in line with RFC-730.
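As an illustration of what single-character filter names mean for this dataset, here is a hedged sketch. It is not the code from the ap_association PR. It assumes the obvious mapping from the five HSC filters processed here to their bands, and the names `HSC_FILTER_TO_BAND` and `to_single_char_filter` are hypothetical.

```python
# Assumed mapping from the HSC filter names used in this reprocessing
# to single-character band names (illustrative, not taken from the PR).
HSC_FILTER_TO_BAND = {
    "HSC-G": "g",
    "HSC-R2": "r",
    "HSC-I2": "i",
    "HSC-Z": "z",
    "HSC-Y": "y",
}


def to_single_char_filter(filter_name):
    """Return the single-character band for a known HSC filter name."""
    try:
        return HSC_FILTER_TO_BAND[filter_name]
    except KeyError:
        raise ValueError(f"Unknown filter name: {filter_name!r}") from None


bands = [to_single_char_filter(name) for name in HSC_FILTER_TO_BAND]
```

Raising on unknown names rather than guessing (for example, by taking the last character) keeps an unexpected filter from being silently mislabeled.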

            Jenkins is running at https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/32874/pipeline 

            sullivan Ian Sullivan added a comment -

            I had a few comments on the code changes, but am happy with the notebook as it is. We need to do a separate investigation into the erroneously large PSF sizes, and at least catch those and prevent them from crashing everything.
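One possible shape for such a guard is sketched below. It is only an illustration of "catch those" and not an existing pipeline feature. It assumes PSF sizes are available per (visit, CCD) as a Gaussian sigma in pixels. The function name `flag_large_psfs`, the 10-pixel cutoff, and the sample values are all hypothetical.

```python
def flag_large_psfs(psf_sigmas, max_sigma_pixels=10.0):
    """Return the entries whose PSF sigma is non-positive, NaN, or too large.

    psf_sigmas maps (visit, ccd) -> PSF sigma in pixels.
    The default cutoff is an arbitrary placeholder, not a tuned value.
    """
    return {
        key: sigma
        for key, sigma in psf_sigmas.items()
        # NaN fails every comparison, so it is flagged as well.
        if not (0 < sigma <= max_sigma_pixels)
    }


# Made-up measurements for illustration.
measurements = {
    (1202, 0): 1.8,
    (1202, 1): 250.0,
    (1204, 0): float("nan"),
}

bad = flag_large_psfs(measurements)
good_keys = sorted(set(measurements) - set(bad))
```

Flagged CCDs could then be skipped or logged with a single short message, so that one bad PSF does not flood the job logs.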

            mrawls Meredith Rawls added a comment -

            Thanks - I addressed your comments, messed up the ap_association GitHub history but fixed it, Jenkins passed, and now I'm going to merge.


              People

              • Assignee:
                mrawls Meredith Rawls
                Reporter:
                mrawls Meredith Rawls
                Reviewers:
                Ian Sullivan
                Watchers:
                Ian Sullivan, Meredith Rawls