Data Management / DM-37386

Add a CenterAll flag during detect and measure


Details

    Description

      Following up from the diaSource sprint, we should explore adding `*CenterAll` flags during detect and measure (see PixelFlags.cc) for sources where the given bit (e.g. interpolated or saturated) is set for every pixel in the 3x3 center of the footprint. This contrasts with the existing `*Center` flag, which is set if any pixel in the center has that bit set.

      Once we set this flag, we should be able to use it to reject entire classes of bad sources before any measurement plugins are run on them. For example, if all of the central pixels are interpolated, there is no real source there.
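
      As a rough illustration of the intended distinction (not the actual PixelFlags.cc implementation; the 3x3 extraction and the names below are made up for this sketch), the two flags could be computed from the mask pixels around the source center like this:

      import numpy as np

      def center_flags(mask_cutout, bitmask):
          """Given the 3x3 mask-plane values around the source center and a bit
          mask, return (center, centerAll): center is True if any central pixel
          has the bit set, centerAll only if every central pixel has it set."""
          hit = (np.asarray(mask_cutout) & bitmask) != 0
          return hit.any(), hit.all()

      # Example: only the middle pixel of the 3x3 center is interpolated, so the
      # existing *Center flag fires but the proposed *CenterAll flag does not.
      INTERPOLATED = 1 << 2  # hypothetical bit value for the interpolated mask plane
      cutout = np.zeros((3, 3), dtype=np.int32)
      cutout[1, 1] = INTERPOLATED
      print(center_flags(cutout, INTERPOLATED))  # (True, False)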


          Activity

            Parejkoj John Parejko added a comment -

            This ticket should also help reduce the number of sources that end up off the image (see DM-37421).

            Parejkoj John Parejko added a comment -

            I've added this functionality on the meas_base branch. It does not measurably increase the plugin execution time on a 2000x2000 image with 100 sources, so I don't think there are any speed issues. Speed tested with the following in IPython:

            import itertools
            import numpy as np
            import lsst.geom
            import lsst.meas.base.tests

            # Build a synthetic 2000x2000 test image containing a 10x10 grid of sources.
            size = 2000
            bbox = lsst.geom.Box2I(lsst.geom.Point2I(0, 0),
                                   lsst.geom.Extent2I(size, size))
            dataset = lsst.meas.base.tests.TestDataset(bbox)
            for i, (x, y) in enumerate(itertools.product(np.arange(30, size, size/10),
                                                         np.arange(30, size, size/10))):
                dataset.addSource(1000 + 1000*i, lsst.geom.Point2D(x, y))

            # Run only the PixelFlags plugin and time it.
            tester = lsst.meas.base.tests.AlgorithmTestCase()
            task = tester.makeSingleFrameMeasurementTask("base_PixelFlags")
            exposure, catalog = dataset.realize(10.0, task.schema, randomSeed=0)

            %timeit -n10 -r7 task.run(catalog, exposure)
            

            Parejkoj John Parejko added a comment - edited

            A quick run of the ap_verify datasets with calibrateImage (not necessary for this ticket, but I figure I might as well start using it by default) shows that about 10% (dc2, hits2015) and 20-50% (cosmos_pdr2) of DiaSources on each image have one of the "All" flags that should cause that source to be rejected outright (interpolated/saturated/bad).

            Several cosmos_pdr2 exposures also have rather a lot of `streakCenterAll` set, which is worrying: it implies either that there are a lot of streaks, or that whatever streaks there are completely cover a lot of sources.

            Very few non-diffim sources (calibrateImage output) had `All` flags of any kind, which is probably to be expected. The ones that do are probably particularly bad!

            I think this is usable as it is now; we'll want to look in more detail at the flagged sources (e.g. cutouts) to double-check them before we start using it for rejection, and we'll have to decide where in the pipeline to do that rejection. I advocate for as soon as possible: preferably with a new plugin that runs right after PixelFlags.
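
            For concreteness, here is a minimal sketch of what that up-front rejection could look like (the flag column names and the cut itself are assumptions for illustration, not the eventual plugin):

            import numpy as np

            # Hypothetical set of "All" flags that should disqualify a source outright.
            REJECT_FLAGS = (
                "base_PixelFlags_flag_interpolatedCenterAll",
                "base_PixelFlags_flag_saturatedCenterAll",
                "base_PixelFlags_flag_badCenterAll",
            )

            def reject_all_flagged(catalog):
                """Return a deep copy of ``catalog`` with sources dropped whose 3x3
                center is entirely interpolated, saturated, or bad."""
                bad = np.zeros(len(catalog), dtype=bool)
                for name in REJECT_FLAGS:
                    bad |= np.asarray(catalog[name], dtype=bool)
                return catalog[~bad].copy(deep=True)

            Whether a cut like this lives in a dedicated plugin or in the calling task is the open question above.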

            Parejkoj John Parejko added a comment -

            Jenkins run with ci_hsc ci_imsim (though I'm pretty sure neither of those is necessary or informative): https://rubin-ci.slac.stanford.edu/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/745/pipeline

            Parejkoj John Parejko added a comment -

            Code I ran to look at the flags set on output catalogs in three ap_verify runs:

            import lsst.daf.butler
            # change the below to the desired output path
            butler = lsst.daf.butler.Butler('cosmos-DM-37386/repo',
                                            collections="ap_verify-output")
            # these two are the catalogs to check:
            # dataset = "initial_stars_footprints_detector"
            dataset = "goodSeeingDiff_diaSrc"
            data_ids = list(butler.registry.queryDataIds(("visit", "detector"),
                                                         datasets=dataset))
             
            def count_all_flagged(catalog):
                """Print, for each *All flag in the schema, how many sources have it set."""
                allFlagged = []
                for x in catalog.schema:
                    name = x.getField().getName()
                    if "All" in name:
                        allFlagged.append(name)

                for name in allFlagged:
                    print(catalog[name].sum(), name.split("_")[-1])
             
            for data_id in data_ids:
                catalog = butler.get(dataset, data_id)
                print(len(catalog), dataset, data_id)
                count_all_flagged(catalog)
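
            To turn those per-catalog counts into per-image fractions like the ones quoted above, one could also compute something like the following (a sketch; it assumes a source counts once if any *All flag is set):

            import numpy as np

            def fraction_all_flagged(catalog):
                """Fraction of sources with at least one *All flag set."""
                flagged = np.zeros(len(catalog), dtype=bool)
                for item in catalog.schema:
                    name = item.getField().getName()
                    if "All" in name:
                        flagged |= np.asarray(catalog[name], dtype=bool)
                return flagged.sum() / max(len(catalog), 1)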
            

            Parejkoj John Parejko added a comment -

            fred3m: Thanks for offering to review. Please don't hesitate to ask any questions on the PR: it's slightly non-obvious code, I think. I suggest looking at my comments above for some context, and reviewing the three commits separately.


            fred3m Fred Moolekamp added a comment -

            I left a comment requesting a name change that I think will clarify things for the future, but otherwise it LGTM.

            Parejkoj John Parejko added a comment -

            New Jenkins: https://rubin-ci.slac.stanford.edu/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/3/pipeline

            People

              Assignee: John Parejko
              Reporter: John Parejko
              Reviewers: Fred Moolekamp
              Watchers: Eric Bellm, Fred Moolekamp, Ian Sullivan, John Parejko, Meredith Rawls

              Dates

                Created:
                Updated:
                Resolved:
