  1. Data Management
  2. DM-6925

star selector and PSF determiner are selecting stars that are not valid point sources

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: meas_algorithms
    • Labels:
      None
    • Story Points:
      6
    • Epic Link:
    • Sprint:
      DRP F16-3, DRP F16-4, DRP F16-5
    • Team:
      Data Release Production

      Description

      When turning on CModel, a more robust extendedness classifier revealed that many of the stars being used as PSF candidates were being classified as extended, as shown in the attached plot. This plot was generated from the output of ci_hsc. Work should be done to determine why these stars are mistakenly being selected and to fix the bad behavior. Additionally, the temporary workaround in ci_hsc, which changed the success criterion for validating sources in validate.py, should be reverted from 85% back to 95%.

            Activity

            nlust Nate Lust created issue -
            swinbank John Swinbank made changes -
            Field Original Value New Value
            Description Added a link on "temporary work around in ci_hsc": https://github.com/lsst/ci_hsc/commit/6daf43ca41b6d192b6e1dbedb60cde0bec90b615 (the rest of the text is unchanged).
            swinbank John Swinbank added a comment -

            Worth adding that, at time of writing, the "temporary work around in ci_hsc" was only on the tickets/DM-4202 branch: it has not yet merged to master.

            swinbank John Swinbank made changes -
            Epic Link DM-6172 [ 24685 ]
            swinbank John Swinbank made changes -
            Link This issue relates to DM-4202 [ DM-4202 ]
            swinbank John Swinbank made changes -
            Story Points 4
            Team Data Release Production [ 10301 ]
            swinbank John Swinbank made changes -
            Assignee Perry Gee [ pgee ]
            swinbank John Swinbank made changes -
            Sprint DRP F16-3 [ 237 ]
            swinbank John Swinbank added a comment -

            Looks like Paul is uncovering a number of star selector issues on DM-7040. It's possible that fixing this might even be a byproduct of the work he's doing there. That means:

            • First thing to do when working on this is to try to reproduce it and make sure it's still an issue with the codebase at that time;
            • Next thing to do is to look at DM-7040 to see if anything relevant came up in the discussion there.
            price Paul Price added a comment -

            I don't believe this is connected with DM-7040, as that has to do with the size of the PsfCandidate, which can only cause sources to fall out of the candidates list, not include bad candidates.

            price Paul Price added a comment -

            Is it possible that we just need to tweak PSFEx's config so it's a bit more willing to reject outliers? I don't see how to do that easily though....

            pgee Perry Gee made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            swinbank John Swinbank made changes -
            Story Points 4 6
            pgee Perry Gee added a comment -

            Nate Lust, when you produced this graph, do you recall which catalog you used? And which filter? And I assume this was the measurement catalog for one of the coadds, since CModel is in the schema?

            I am not seeing as many used and extended objects as is depicted in your graph, so perhaps I am looking at the wrong data.

            nlust Nate Lust added a comment -

            It was on a coadd, but I can't remember if it was R or I.

            pgee Perry Gee added a comment -

            Running SConstruct produced one coadd in I and one in R. I assume that you are looking at the measurement catalog for one or both of those coadds. There appeared to be only 36 sources that were marked both calib_psfUsed and extendedness < 0.5. I am investigating those, but your graph seems to show a different population, so I thought I would ask whether we might be using different catalogs.

            pgee Perry Gee added a comment -

            Oh, and in both R and I this population is less than 5%, so I don't see why it would have triggered the problem with validation at 95%.
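
The kind of count being discussed here can be sketched with plain NumPy arrays standing in for the catalog columns. This is only an illustration and not the actual ci_hsc validate.py code: the function name `psf_star_fraction`, the 0.5 extendedness cut, and the toy arrays are assumptions. Only the column names `calib_psfUsed` and `base_ClassificationExtendedness_value` come from this thread.

```python
import numpy as np


def psf_star_fraction(psf_used, extendedness, cut=0.5):
    """Return (n_used, n_extended, star_fraction) for PSF-used sources.

    psf_used     -- boolean array, e.g. catalog['calib_psfUsed']
    extendedness -- float array, e.g.
                    catalog['base_ClassificationExtendedness_value']
    cut          -- assumed threshold: values below it count as point sources
    """
    psf_used = np.asarray(psf_used, dtype=bool)
    extendedness = np.asarray(extendedness, dtype=float)
    n_used = int(psf_used.sum())
    # NaN extendedness compares False, so it is counted as "not a star".
    is_star = extendedness < cut
    n_star = int((psf_used & is_star).sum())
    fraction = n_star / n_used if n_used else float("nan")
    return n_used, n_used - n_star, fraction


# Toy data (hypothetical): 40 PSF-used sources, 1 of them extended,
# followed by 5 extended sources that were not used for the PSF.
psf_used = np.array([True] * 40 + [False] * 5)
extendedness = np.array([0.0] * 39 + [1.0] + [1.0] * 5)

n_used, n_extended, fraction = psf_star_fraction(psf_used, extendedness)
print(n_used, n_extended, fraction)
print("passes 95% criterion:", fraction >= 0.95)
```

With real data, the two arrays would be read from the deepCoadd_meas catalog for each filter, and the resulting fraction compared against the 85% or 95% criterion.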

            nlust Nate Lust added a comment -

            Here is the plotting script:

            from lsst.daf.persistence.butler import Butler
            import matplotlib.pyplot as plt
            import numpy as np

            # Open the ci_hsc output repository and load the coadd measurement catalog.
            repo = Butler('/Users/nate/repos_lsst/ci_hsc/DATA/rerun/ci_hsc/')
            catalog = repo.get('deepCoadd_meas', {'filter': 'HSC-R', 'tract': 0, 'patch': '5,4'})
            # Loaded but not used below (note: HSC-I, while the catalog is HSC-R).
            image = repo.get('deepCoadd', {'filter': 'HSC-I', 'tract': 0, 'patch': '5,4'})
            # Despite the variable names, these are fluxes, not magnitudes.
            cmags = catalog['modelfit_CModel_flux']
            psfMags = catalog['slot_PsfFlux_flux']
            extendedness = catalog.get("base_ClassificationExtendedness_value")
            extendMask = extendedness == 1   # sources classified as extended
            used = catalog['calib_psfUsed']  # sources used to build the PSF model
            usedExtended = used & extendMask

            plt.ion()
            # x axis: -log10(CModel flux); y axis: log10(CModel flux) - log10(PSF flux)
            plt.plot(-1*np.log10(cmags[~used]),
                     -1*(np.log10(psfMags[~used]) - np.log10(cmags[~used])),
                     '.', label='PSF not used')
            plt.plot(-1*np.log10(cmags[used]),
                     -1*(np.log10(psfMags[used]) - np.log10(cmags[used])),
                     '.', label='PSF used')
            plt.plot(-1*np.log10(cmags[usedExtended]),
                     -1*(np.log10(psfMags[usedExtended]) - np.log10(cmags[usedExtended])),
                     '.', label='PSF used and extended')
            plt.legend()
            plt.xlabel('-log10(CModel_flux)')
            plt.ylabel('log10(CModel_flux) - log10(Psf_Flux)')
            

            pgee Perry Gee added a comment -

            I tried running your script, but found that np.log10(cmags[used==False]) is failing.

            194/6855 of the cmags in my table have negative values. I think I am running the latest ci_hsc with the latest stack (I'm using Tim's lsstsw arrangement and rebuilding everything). But this is making me suspicious that there is something wrong with my setup. Any ideas?
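
One way to keep np.log10 from failing on such values is to mask out non-positive and non-finite fluxes before taking the logarithm. The sketch below uses made-up arrays in place of the catalog columns, and `positive_finite` is a hypothetical helper, not part of the stack.

```python
import numpy as np


def positive_finite(*fluxes):
    """Boolean mask that is True where every flux array is finite and > 0."""
    mask = np.ones(np.shape(fluxes[0]), dtype=bool)
    for flux in fluxes:
        flux = np.asarray(flux, dtype=float)
        # NaN > 0 evaluates to False; isfinite additionally rejects +inf.
        mask &= np.isfinite(flux) & (flux > 0)
    return mask


# Made-up stand-ins for catalog['modelfit_CModel_flux'] and
# catalog['slot_PsfFlux_flux'].
cmags = np.array([120.0, -3.5, np.nan, 40.0, 0.0])
psfMags = np.array([100.0, 10.0, 55.0, np.nan, 8.0])

good = positive_finite(cmags, psfMags)
x = -np.log10(cmags[good])
y = np.log10(cmags[good]) - np.log10(psfMags[good])
print(good.sum(), "of", good.size, "sources kept")
```

The same mask can be combined with the other selections using `&`, for example `good & used`, so that every plotted subset only contains sources with a defined logarithm.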

            swinbank John Swinbank added a comment -

            Perhaps when Nate Lust gets back on Monday (he's on vacation today), he can check whether he can reproduce the original plot (or some similar demonstration of the problem)? That way we can be really sure whether there's an issue with Perry Gee's setup, or if the problem has been resolved in the course of other development, or if it still remains to be addressed.

            pgee Perry Gee added a comment -

            Nate Lust

            I get 7403 sources in the measurement of the I coadd, of which 6879 have a positive psfFlux and a positive CModel_flux. The rejected values are either <= 0.0 or NaN.

            I am able to make a plot something like what is shown above, but I don't have a population of stars which are used but classified as extended.

            At this point, both because you don't seem to have the negative flux values, and because of the difference in our plots, it might be a good idea for you to try to reproduce your results with the current stack.

            swinbank John Swinbank made changes -
            Sprint DRP F16-3 [ 237 ] DRP F16-3, DRP F16-4 [ 237, 246 ]
            swinbank John Swinbank made changes -
            Rank Ranked higher
            pgee Perry Gee added a comment -

            Could not reproduce this problem with the current stack and version of ci_hsc.

            There are some problem cases, many of them at the edges of the coadd. It may be that there are fewer samples covering these areas, and therefore lower SNR. But neither of the coadds in this practice set exceeded the 5% failure rate that was reported in this issue.

            pgee Perry Gee made changes -
            Resolution Done [ 10000 ]
            Status In Progress [ 3 ] Done [ 10002 ]
            swinbank John Swinbank added a comment -

            Hey Perry Gee, Nate Lust, I just want to be sure — is it true that Nate can't repeat his earlier results with the current version of the stack, or rather that Perry hasn't been able to?

            swinbank John Swinbank added a comment -

            Actually, regardless of the above: even if it's impossible for anybody to reproduce the original problem, this issue still isn't complete since the threshold in ci_hsc has not been reset per the original description. Please do so.

            swinbank John Swinbank made changes -
            Resolution Done [ 10000 ]
            Status Done [ 10002 ] In Progress [ 3 ]
            pgee Perry Gee added a comment -

            I just moved the validate criterion from 85% back to 95%. I think this is the reversal John was asking for. Then I tried to run ci_hsc to see if it would still run successfully.

            I again had trouble running it on my personal machine, so I ran it on Jenkins. It turned out that to get this to run, a change had to be made in obs_subaru. This is already checked in to master to fix a build break.

            The change in ci_hsc to move the threshold back to 95% seems to run OK with that change to obs_subaru.

            pgee Perry Gee made changes -
            Reviewers Nate Lust [ nlust ]
            Status In Progress [ 3 ] In Review [ 10004 ]
            nlust Nate Lust added a comment -

            Sorry for the delay in reviewing this; the reversion looks good to me. If ci_hsc runs without a problem, you are good to merge and close this ticket.

            nlust Nate Lust made changes -
            Status In Review [ 10004 ] Reviewed [ 10101 ]
            swinbank John Swinbank made changes -
            Sprint DRP F16-3, DRP F16-4 [ 237, 246 ] DRP F16-3, DRP F16-4, DRP F16-5 [ 237, 246, 252 ]
            swinbank John Swinbank made changes -
            Rank Ranked higher
            pgee Perry Gee made changes -
            Resolution Done [ 10000 ]
            Status Reviewed [ 10101 ] Done [ 10002 ]

              People

              • Assignee:
                pgee Perry Gee
                Reporter:
                nlust Nate Lust
                Reviewers:
                Nate Lust
                Watchers:
                John Swinbank, Nate Lust, Paul Price, Perry Gee