DM-39203

Write moments-based star/galaxy classifier

Details

    • 20
    • Ops Pipelines 2023
    • Data Release Production
    • No

    Description

      We're currently using the "base_ClassificationExtendedness" "CatalogCalculation" plugin as our primary star/galaxy classifier everywhere. This is motivated by evidence from SDSS that this classifier was about as good as a simple S/G classifier could get. We do not have similarly strong evidence for our own implementation of extendedness (or the CModel measurements on which it heavily depends) being as good.

      More importantly, we're also using this classifier on single-visit catalogs in which CModel isn't being run, with GaussianFlux as the stand-in for the galaxy model flux. This has never been shown to be much good, aside from the general sense that we've been doing this for a long time and it has seemed to work.

      But "extendedness" has long been known to be sensitive to aperture corrections, and we've known for a while that our aperture corrections are noisy and sometimes biased, leading to an effort to reduce our dependence on them and do them more carefully. And DM-38733 has implicated extendedness in our problems with stellar-locus metric sensitivity.
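
      To illustrate the aperture-correction sensitivity, here is a deliberately simplified sketch of a flux-ratio cut in the style of extendedness. The function name, the omission of flux-error terms, and the threshold of 0.925 are assumptions for illustration, not a copy of the plugin's code or configuration.

```python
def extendedness_like(psf_flux, model_flux, flux_ratio=0.925):
    """Simplified flux-ratio classifier (illustrative assumption, not the real plugin).

    Returns 0.0 (point-like) when the PSF flux exceeds flux_ratio times the
    model flux, and 1.0 (extended) otherwise.
    """
    return 0.0 if psf_flux > flux_ratio * model_flux else 1.0


# A true point source: PSF and model fluxes agree, so it is classified as a star.
unbiased = extendedness_like(psf_flux=1000.0, model_flux=1000.0)

# The same source with a hypothetical 8% low bias in the aperture-corrected
# PSF flux now falls below the threshold and is classified as extended.
biased = extendedness_like(psf_flux=920.0, model_flux=1000.0)
```

      Because the cut acts directly on a ratio of aperture-corrected fluxes, any bias or noise in the corrections moves sources across the threshold.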

      So, it's long past time to replace our usage of extendedness on single-visit images with something better-motivated and ideally not dependent on aperture corrections, and the obvious solution is something that compares the source moments to the PSF moments. Doing that somewhat rigorously - using the uncertainties on the source moments to compute an actual likelihood - seems like a good start.
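
      As a rough sketch of what "compute an actual likelihood" could look like, the snippet below forms a chi-squared from the difference between source and PSF second moments and converts it to a tail probability for three degrees of freedom. The function name, the treatment of the PSF moments as noiseless, and the use of a chi-squared tail probability as the score are all assumptions for illustration; this is not the plugin's implementation.

```python
import math

import numpy as np


def point_source_score(source_moments, source_cov, psf_moments):
    """Chi-squared consistency of source moments with PSF moments (illustrative sketch).

    source_moments, psf_moments: sequences (Ixx, Iyy, Ixy).
    source_cov: 3x3 covariance of the source moments.
    The PSF moments are assumed noiseless here.
    Returns (chi2, p), where p is the chi-squared survival probability
    for 3 degrees of freedom; p near 1 means consistent with the PSF.
    """
    delta = np.asarray(source_moments, dtype=float) - np.asarray(psf_moments, dtype=float)
    cov = np.asarray(source_cov, dtype=float)
    chi2 = float(delta @ np.linalg.solve(cov, delta))
    # Closed-form survival function of a chi-squared variable with 3 dof.
    p = math.erfc(math.sqrt(chi2 / 2.0)) + math.sqrt(2.0 * chi2 / math.pi) * math.exp(-chi2 / 2.0)
    return chi2, p


psf = (4.0, 4.0, 0.0)
cov = np.diag([0.04, 0.04, 0.04])  # hypothetical moment uncertainties

chi2_star, p_star = point_source_score((4.0, 4.0, 0.0), cov, psf)
chi2_gal, p_gal = point_source_score((6.0, 6.0, 0.0), cov, psf)
```

      A source whose moments match the PSF gets a score near one, while a clearly larger source gets a score near zero. No aperture-corrected flux enters the calculation.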

      I think we should probably write this as a regular measurement plugin, not a CatalogCalculation; the latter was invented as a workaround for extendedness' dependency on aperture corrections (we first measure aperture corrections after running all regular plugins). Perhaps after we have a solid replacement, we can make extendedness into a regular plugin and drop the CatalogCalculation concept.

      Attachments

        1. completeness_purity.png (50 kB)
        2. Distribution of chi-squared values.png (23 kB)
        3. elis_fluxRatio_flux.png (290 kB)
        4. Flux ratios.png (28 kB)
        5. image-2023-06-02-12-40-20-242.png (114 kB)
        6. likelihood_lt_0_5.png (51 kB)
        7. likelihood_lt_0_85.png (61 kB)
        8. ROC_comparison.png (38 kB)
        9. ROC_moments.png (24 kB)
        10. scaledg.jpg (131 kB)
        11. screenshot-1.png (24 kB)
        12. screenshot-2.png (27 kB)

          Activity

            erykoff Eli Rykoff added a comment -

            New run on RC2 tract 9813 is done! This took a lot longer than my previous run because it turns out that running coadd measurement is surprisingly fast if you reject all the images from the coadd. Output is in u/erykoff/RC2/DM-39203-3/step8, and can be compared with my coadd plot run of the official w32: u/erykoff/RC2/DM-39203-2/step8-32compare

            Anyway, things are looking good now! Repeatability:

            band  w_2023_32 (mmag)  ticket (mmag)
            g     7.84              7.86
            r     9.47              9.55
            i     8.3               8.34
            z     7.03              7.05
            y     7.27              7.29

            And wFit (psf) goes from 24.32 mmag (using 2588 of 7467 stars) to 24.16 mmag (using 2530 of 7260 stars). yFit goes from 16.74 mmag (using 5087 of 11331 stars) to 16.66 mmag (using 4861 of 10927 stars).

            Finally, in terms of the (very questionable; see DM-40668) scaled size scatter cut for tract 9813, with the default settings on w32 we rejected 2361 detectors, and on this ticket we rejected 2613. I haven't broken out anything per band, but given the nature of the cut I'm not sure this is a problem (though it may need further investigation).

            kannawad Arun Kannawadi added a comment -

            With DM-41648 in, the set of moments to use for either classification or validation has expanded. I'm still going to classify with second-order moments and use fourth-order moments to validate.

            erykoff Eli Rykoff added a comment -

            I'm not against using the higher-order moments as additional QA checks, but they should definitely not be used for generic classification since we won't be running those in minimal/AP-type pipelines.

            kannawad Arun Kannawadi added a comment -

            After discussions with erykoff, I'm splitting this ticket so that it covers implementing the plugin and propagating the values everywhere we might want to use them. The actual switchover will take place in DM-42663 after validating the config values.

            Jenkins is running here: https://rubin-ci.slac.stanford.edu/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/523/pipeline

            kannawad Arun Kannawadi added a comment -

            And this is finally done! https://rubin-ci.slac.stanford.edu/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/562/pipeline/

            People

              kannawad Arun Kannawadi
              jbosch Jim Bosch
              Eli Rykoff
              Arun Kannawadi, Eli Rykoff, Eric Bellm, Jeffrey Carlin, Jim Bosch, John Parejko, Keith Bechtol, Peter Ferguson, Sophie Reed