RFC-750

Columns to identify blended children and isolated sources

    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM
    • Labels:
      None

      Description

      DM-7100 seeks to implement a "blendedness" flag to describe the blendedness of each source. I agree with the general sentiment expressed in that ticket but I think we can come up with a more useful flag or set of columns that will be more informative. The goal of this RFC is to make sure that people are familiar with the way that the current columns work and open up a discussion for potential changes that people might find useful.

      Background:
      When detection is run on an image it creates a footprint with at least one "peak" that is commonly interpreted as a new source, but in practice just represents a maximum in the smoothed out image. A record is created in the SourceCatalog for each "parent" with a footprint containing a SpanSet with all of the pixels in the image above the detection threshold and a PeakCatalog with an entry for each child peak.

      When either meas_deblender or meas_extensions_scarlet is run, several new columns are created, including:

      • deblend_nPeaks: the number of peaks in the footprint of a parent
      • deblend_nChild: the number of records in the catalog created for children of the parent due to deblending. In most cases this is the same as deblend_nPeaks, but it can differ if any peaks are culled or added, or if the blend fails.
      • deblend_parentNPeaks: the number of peaks in the parent footprint (i.e. the parent's deblend_nPeaks)

      When meas_deblender is run there are a number of different types of sources that one might want to distinguish. Parents that have only a single child are skipped by the deblender, since flux is conserved and there is nothing to deblend (these sources can be identified with deblend_nPeaks=deblend_nChild=1, parent=0). Parents that have N>1 peaks are sent to the deblender. As a result M (not necessarily equal to N) peaks are deblended and added to the SourceCatalog. These parents can be identified with parent=0, deblend_nPeaks=N, deblend_nChild=M. Each child will have parent=parentID (parentID is the unique source ID of the parent), deblend_nPeaks=deblend_nChild=0, deblend_parentNPeaks=N.

      When meas_extensions_scarlet is used as the deblender things are slightly more complicated. Because scarlet generates a model for each source, by design flux is not conserved and even "isolated" sources could have a scarlet model that is different than the flux contained in their footprint (for example there could be an undetected source or a feature that scarlet cannot currently model). In order to keep the measurements consistent between blended and isolated sources, by default we also model isolated sources with scarlet and give them their own entry in the SourceCatalog (parent=parentID, deblend_nPeaks=deblend_nChild=0, deblend_parentNPeaks=1).

      There are also a couple of outlier cases that must be handled. If a parent footprint is skipped because it is too large, then the parent will have parent=0, deblend_nPeaks=N, deblend_nChild=0, since none of the children were deblended and added to the catalog. If a parent footprint has too many sources to deblend, in meas_deblender the brightest K=maxNumberOfPeaks children are deblended and the parent will have parent=0, deblend_nPeaks=N, deblend_nChild=K. In meas_extensions_scarlet, simply dropping the fainter sources may result in catastrophic failure so the blend is skipped entirely.
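      The bookkeeping for parents with more than one peak, including the two outlier cases, can be sketched as follows. This is only an illustration of the rules described above, not the deblender code: the function name and arguments are hypothetical, `max_peaks` stands in for maxNumberOfPeaks, and culled or added peaks and failed blends are ignored, so in the normal case every peak becomes a child.

```python
def parent_child_count(n_peaks, deblender, max_peaks, too_large=False):
    """Return the deblend_nChild expected for a parent with more than one peak.

    Hypothetical helper: it only restates the rules in the text above and
    ignores culled/added peaks and failed blends.
    """
    if n_peaks <= 1:
        raise ValueError("this sketch only covers parents with more than one peak")
    if deblender not in ("meas_deblender", "meas_extensions_scarlet"):
        raise ValueError(f"unknown deblender: {deblender}")

    # A footprint that is too large is skipped: no children are added.
    if too_large:
        return 0

    if n_peaks > max_peaks:
        if deblender == "meas_deblender":
            # Only the brightest K = max_peaks children are deblended.
            return max_peaks
        # scarlet skips the blend entirely rather than dropping faint sources.
        return 0

    # Normal case: every peak is deblended into a child record.
    return n_peaks


print(parent_child_count(5, "meas_deblender", max_peaks=3))
print(parent_child_count(5, "meas_extensions_scarlet", max_peaks=3))
```

      In both skipped cases the parent keeps deblend_nPeaks=N, so the mismatch between deblend_nPeaks and deblend_nChild is what marks the blend as incompletely deblended.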

      So in summary here are the current types of sources that one might want to distinguish after deblending and how to select on them:

      • (parent=0, deblend_nChild=1): Isolated sources that have not been passed to the deblender
      • (parent!=0, deblend_parentNPeaks=1): Isolated sources that have been modeled by the deblender
      • (parent=0, deblend_nChild>1): Parents that are blends of multiple sources
      • (parent!=0, deblend_parentNPeaks>1): Sources that are blended with other sources
      • (parent=0, deblend_nChild=0, deblend_nPeaks>0): Parent blends that were skipped because they were either too large, contained too many children, or failed to deblend. Note that these also have deblend_skipped=True and possibly deblend_failed=True.
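      As a rough illustration, the selections above can be written as a single classifier. The sketch below uses plain dictionaries as stand-ins for SourceCatalog records (an assumption made to keep it self-contained); the column names are the ones listed above, while the function name and the category labels are invented for this example.

```python
def classify_source(record):
    """Map one record onto the five categories listed above."""
    if record["parent"] == 0:
        n_child = record["deblend_nChild"]
        if n_child == 1:
            return "isolated, not deblended"
        if n_child > 1:
            return "parent blend"
        if n_child == 0 and record["deblend_nPeaks"] > 0:
            return "skipped parent"
        return "unclassified"

    # Child records carry the peak count of their parent footprint.
    parent_n_peaks = record["deblend_parentNPeaks"]
    if parent_n_peaks == 1:
        return "isolated, modeled"
    if parent_n_peaks > 1:
        return "blended child"
    return "unclassified"


# Hypothetical records: an un-deblended isolated parent, a three-peak blend
# with one of its children, and a parent that was skipped.
records = [
    {"parent": 0, "deblend_nPeaks": 1, "deblend_nChild": 1, "deblend_parentNPeaks": 0},
    {"parent": 0, "deblend_nPeaks": 3, "deblend_nChild": 3, "deblend_parentNPeaks": 0},
    {"parent": 42, "deblend_nPeaks": 0, "deblend_nChild": 0, "deblend_parentNPeaks": 3},
    {"parent": 0, "deblend_nPeaks": 7, "deblend_nChild": 0, "deblend_parentNPeaks": 0},
]
labels = [classify_source(r) for r in records]
print(labels)
```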

      Discussion:
      First, I want to know whether there are any places in the stack that might be functioning incorrectly: when the above was implemented last year, I didn't do a sufficiently good job of making sure that everything downstream knows what to do with the columns provided.

      Second, I'm curious about people's opinions on how to implement things such as "isPrimary," which is designed to identify leaf nodes (sources with no children). There was a discussion in Slack in #scarlet-hsc-test in October about this, but since it has come up recently in #dm-science-pipelines I thought it was worth opening the discussion to the wider science pipelines team. Part of the problem is that there are now two different types of "primary" isolated sources: the parent records that have not been deblended, and a child record for each parent, where the child record has been modeled by scarlet.

      In some cases it might be advantageous to use the un-deblended version of an isolated source, while in others it will make more sense to use the scarlet-modeled isolated source, which will be more consistent with the blended source measurements. So it has been suggested that we add a new field to the setPrimaryFlagsTask that allows the user to choose between the modeled and un-modeled versions of a source (e.g. we could create a boolean field "useModel" in the config file that is true when the isPrimary key should use the scarlet model).

      What do people think about this, and are there any other suggestions that might help make the deblender outputs more useful? At first glance I feel like it might also be useful to add a deblend_parentNChild column to keep track of the number of children that were actually deblended from a source's parent.
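      To make the "useModel" suggestion concrete, here is a minimal sketch of how such a switch could pick between the two versions of an isolated source. It is not the setPrimaryFlagsTask implementation: records are plain dictionaries, the function name is hypothetical, only the leaf-node part of the selection is shown, and treating skipped parents as leaves is an assumption based on the "sources with no children" definition.

```python
def is_leaf(record, use_model):
    """Return True if this record is the leaf-node version of its source.

    `use_model` plays the role of the proposed boolean "useModel" option.
    """
    if record["parent"] == 0:
        n_child = record["deblend_nChild"]
        if n_child == 0:
            # Skipped parent: no children exist (assumed to count as a leaf).
            return True
        if n_child == 1:
            # Isolated source: keep the parent only if the model is not wanted.
            return not use_model
        # Parent of a real blend: its children are the leaves.
        return False

    if record["deblend_parentNPeaks"] == 1:
        # Modeled version of an isolated source.
        return use_model

    # Child deblended from a blend of several peaks.
    return True


isolated_parent = {"parent": 0, "deblend_nChild": 1, "deblend_parentNPeaks": 0}
isolated_model = {"parent": 11, "deblend_nChild": 0, "deblend_parentNPeaks": 1}

for use_model in (False, True):
    print(use_model, is_leaf(isolated_parent, use_model), is_leaf(isolated_model, use_model))
```

      Whichever value is chosen, exactly one of the two records for an isolated source is selected, so the source is never counted twice.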

            Activity

            lauren Lauren MacArthur added a comment -

            So my only hesitations on the above are as follows:

            • I'm uncomfortable using Primary as part of a name that does not imply any of its usual (and deeply ingrained) meanings (namely, the non-duplicate aspect).  In this case I think I would much prefer something "else".  I think isLeaf is now taken to mean "end node of the hierarchy tree with no children", so my (probably terrible) alternative suggestions are: Branch, Channel, Fork, ...  If no one else objects to the suggested use of Primary, then I will let this go (but wanted to emphasize this point in case it was missed, `cuz I'm sure it will trip me up going forward!)
            • The column isBlended has a doc that reads "True for each source that was deblended....", which I find quite jarring. It also reads to me that isBlended would be the logical NOT of isIsolated, but this turns out not to be the case. Please have a look at the discussion on the PR here where Fred provides a list of the different True/False possibilities (and my even-more-terrible name suggestions!)  Again, if no one else finds this confusing, I'm happy to let it go too!
            fred3m Fred Moolekamp added a comment -

            What about renaming `isDeblendedPrimary` -> `isDeblendedSource` and `isDeblendedModelPrimary` -> `isDeblendedModelSource`, making it clear that these are individual "sources" while also clearing up the "Primary" confusion?

            As for the use of `isBlended` and `isIsolated`, I think that at the very least the docs need to be updated to be less confusing, and I'm happy to make those changes. But I do like the names as is. Even though the deblender has attempted to "deblend" sources from their parent footprint, the results are still just models that contain our best guess as to what the blended sources would look like if we could measure them as isolated sources. But we know that in actuality those sources are blended, and even our best-guess models will introduce biases in terms of flux, shape, and centroid location. So knowing that the models are blended in the exposures is an important flag to propagate.

            Here's an attempt to update the docstrings. Please feel free to chime in if these explanations still seem unclear.

            Current docstrings:

            self.isBlendedKey = self.schema.addField(
                "detect_isBlended", type="Flag",
                doc="This source is deblended from a parent with more than one child."
            )

            Suggested update:

            self.isBlendedKey = self.schema.addField(
                "detect_isBlended", type="Flag",
                doc="This source was modeled by the deblender from a `Peak` "
                    "in a parent footprint that contained at least one other `Peak`"
            )
            

            with a longer explanation in the docstring:

                Returns
                -------
                isBlended : array-like of `bool`
                    True for each source modeled by the deblender from a `Peak`
                    in a parent footprint that contained at least one other `Peak`.
                    While these models can be approximated as isolated,
                    and measurements are made on them as if that's the case,
                    deblending is known to introduce biases in the shape and centroid
                    of objects, and it is important to know that the sources that these
                    models are based on are all blended in the true image.
                isIsolated : array-like of `bool`
                    True for isolated sources, regardless of whether or not they
                    were modeled by the deblender.
            

            It's going to take a day or so to run these through the RC2 reprocessing, but if there are no objections to the above then I will merge this PR when that reprocessing has completed.
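            One possible reading of the two docstrings above, written out as code. This is a hedged sketch and not the code from the PR: records are plain dictionaries, the function name is hypothetical, and the conditions are inferred from the column definitions in the RFC description. It also shows why `isBlended` is not simply the logical NOT of `isIsolated`.

```python
def blend_flags(record):
    """Return (isBlended, isIsolated) for one record, as read from the docstrings."""
    is_child = record["parent"] != 0

    # Modeled by the deblender from a Peak in a parent footprint that
    # contained at least one other Peak.
    is_blended = is_child and record["deblend_parentNPeaks"] > 1

    # Isolated whether or not the source was modeled by the deblender:
    # either a single-peak parent or the modeled child of one.
    is_isolated = (
        (not is_child and record["deblend_nPeaks"] == 1)
        or (is_child and record["deblend_parentNPeaks"] == 1)
    )
    return is_blended, is_isolated


# A parent with three peaks is neither "blended" (it is not a deblended
# model) nor "isolated", so both flags are False for it.
blend_parent = {"parent": 0, "deblend_nPeaks": 3, "deblend_parentNPeaks": 0}
blend_child = {"parent": 9, "deblend_nPeaks": 0, "deblend_parentNPeaks": 3}
print(blend_flags(blend_parent))
print(blend_flags(blend_child))
```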

            ktl Kian-Tat Lim added a comment -

            Sounds like isBlended is really more like fromBlend?

            lauren Lauren MacArthur added a comment -

            Oh, I do like that suggestion K-T!  And I'm happy with the Primary -> Source change.


              People

              Assignee:
              fred3m Fred Moolekamp
              Reporter:
              fred3m Fred Moolekamp
              Watchers:
              Colin Slater, Eric Bellm, Fred Moolekamp, Jim Bosch, John Parejko, Kian-Tat Lim, Lauren MacArthur, Leanne Guy, Robert Lupton, Yusra AlSayyad
              Votes:
              0
