DM-7100 seeks to implement a "blendedness" flag for each source. I agree with the general sentiment expressed in that ticket, but I think we can come up with a more informative flag or set of columns. The goal of this RFC is to make sure that people are familiar with the way the current columns work and to open a discussion of potential changes that people might find useful.
When detection is run on an image it creates footprints, each with at least one "peak" that is commonly interpreted as a new source but in practice just represents a local maximum in the smoothed image. A record is created in the SourceCatalog for each "parent", with a footprint containing a SpanSet with all of the pixels in the image above the detection threshold and a PeakCatalog with an entry for each child peak.
When either meas_deblender or meas_extensions_scarlet is run, several new columns are created, including:
- deblend_nPeaks: the number of peaks in the footprint of a parent
- deblend_nChild: the number of records in the catalog created for children of the parent due to deblending. In most cases this is the same as deblend_nPeaks, but it can differ if any peaks are culled or added, or if the blend fails.
- deblend_parentNPeaks: the number of peaks in the parent footprint (i.e. the parent's deblend_nPeaks)
When meas_deblender is run there are a number of different types of sources that one might want to distinguish. Parents that have only a single child are skipped by the deblender, since flux is conserved and there is nothing to deblend (these sources can be identified with deblend_nPeaks=deblend_nChild=1, parent=0). Parents that have N>1 peaks are sent to the deblender. As a result M (not necessarily equal to N) peaks are deblended and added to the SourceCatalog. These parents can be identified with parent=0, deblend_nPeaks=N, deblend_nChild=M. Each child will have parent=parentID (parentID is the unique source ID of the parent), deblend_nPeaks=deblend_nChild=0, deblend_parentNPeaks=N.
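As a rough illustration of the bookkeeping described above, here is a minimal sketch in plain Python that builds the column values for one parent footprint and its children. It does not use the afw table API; the function name meas_deblender_rows, the dict layout, and the child ID scheme are all made up for the example, and deblend_parentNPeaks is left off the parent row because its value there is not described above.

```python
def meas_deblender_rows(parent_id, n_peaks, n_deblended):
    """Column values for one parent footprint and its deblended children.

    Hypothetical helper for illustration only: rows are plain dicts and
    child IDs are invented as parent_id * 100 + index.
    """
    if n_peaks == 1:
        # Single-peak parents are skipped by meas_deblender, so no child
        # records are created for them.
        parent = {"id": parent_id, "parent": 0,
                  "deblend_nPeaks": 1, "deblend_nChild": 1}
        return parent, []

    # N > 1 peaks: M = n_deblended children end up in the catalog.
    parent = {"id": parent_id, "parent": 0,
              "deblend_nPeaks": n_peaks, "deblend_nChild": n_deblended}
    children = [
        {"id": parent_id * 100 + i, "parent": parent_id,
         "deblend_nPeaks": 0, "deblend_nChild": 0,
         "deblend_parentNPeaks": n_peaks}
        for i in range(1, n_deblended + 1)
    ]
    return parent, children


# A parent with N=4 peaks from which only M=3 children were deblended.
blend_parent, blend_children = meas_deblender_rows(7, n_peaks=4, n_deblended=3)
iso_parent, iso_children = meas_deblender_rows(8, n_peaks=1, n_deblended=0)
```

Note that every child carries the parent's peak count N rather than the number of children M that actually made it into the catalog.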
When meas_extensions_scarlet is used as the deblender things are slightly more complicated. Because scarlet generates a model for each source, flux is by design not conserved, and even an "isolated" source could have a scarlet model whose flux differs from the flux contained in its footprint (for example, there could be an undetected source or a feature that scarlet cannot currently model). In order to keep the measurements consistent between blended and isolated sources, by default we also model isolated sources with scarlet and give them their own entry in the SourceCatalog (parent=parentID, deblend_nPeaks=deblend_nChild=0, deblend_parentNPeaks=1).
There are also a couple of outlier cases that must be handled. If a parent footprint is skipped because it is too large, then the parent will have parent=0, deblend_nPeaks=N, deblend_nChild=0, since none of the children were deblended and added to the catalog. If a parent footprint has too many sources to deblend, in meas_deblender the brightest K=maxNumberOfPeaks children are deblended and the parent will have parent=0, deblend_nPeaks=N, deblend_nChild=K. In meas_extensions_scarlet, simply dropping the fainter sources may result in catastrophic failure so the blend is skipped entirely.
So in summary here are the current types of sources that one might want to distinguish after deblending and how to select on them:
- (parent=0, deblend_nChild=1): Isolated sources that have not been passed to the deblender
- (parent!=0, deblend_parentNPeaks=1): Isolated sources that have been modeled by the deblender
- (parent=0, deblend_nChild>1): Parents that are blends of multiple sources
- (parent!=0, deblend_parentNPeaks>1): Sources that are blended with other sources
- (parent=0, deblend_nChild=0, deblend_nPeaks>0): Parent blends that were skipped because they were either too large, contained too many children, or failed to deblend. Note that these also have deblend_skipped=True and possibly deblend_failed=True.
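The selections in the list above can be sketched as a small classifier. This is only an illustration in plain Python, not code from the stack: the Record dataclass is a hypothetical stand-in for a SourceCatalog row, the category labels are invented, and the deblend_parentNPeaks value used for parent rows is a placeholder since the list does not depend on it.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for one SourceCatalog row (not the afw API)."""
    parent: int
    deblend_nPeaks: int
    deblend_nChild: int
    deblend_parentNPeaks: int


def classify(rec):
    """Return an invented label for the category a record falls into."""
    if rec.parent == 0:
        if rec.deblend_nChild == 0 and rec.deblend_nPeaks > 0:
            return "skipped_parent"      # too large, too many children, or failed
        if rec.deblend_nChild == 1:
            return "isolated_parent"     # not passed to the deblender
        if rec.deblend_nChild > 1:
            return "blended_parent"      # blend of multiple sources
    else:
        if rec.deblend_parentNPeaks == 1:
            return "isolated_model"      # isolated source modeled by the deblender
        if rec.deblend_parentNPeaks > 1:
            return "blended_child"       # blended with other sources
    return "unclassified"


labels = [
    classify(Record(parent=0, deblend_nPeaks=1, deblend_nChild=1, deblend_parentNPeaks=0)),
    classify(Record(parent=12, deblend_nPeaks=0, deblend_nChild=0, deblend_parentNPeaks=1)),
    classify(Record(parent=0, deblend_nPeaks=3, deblend_nChild=3, deblend_parentNPeaks=0)),
    classify(Record(parent=15, deblend_nPeaks=0, deblend_nChild=0, deblend_parentNPeaks=3)),
    classify(Record(parent=0, deblend_nPeaks=40, deblend_nChild=0, deblend_parentNPeaks=0)),
]
```

Because the rules only look at deblend_nChild for parents, a multi-peak parent whose peaks were culled down to a single child would be labeled the same way as a truly isolated parent, which is one of the ambiguities worth discussing.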
First of all, I want to know if there are any places in the stack that might be functioning incorrectly. When the above was implemented last year I didn't do a sufficiently good job of making sure that everything downstream knows what to do with the columns provided.
Second, I'm curious about people's opinions on how to implement things such as "isPrimary," which is designed to identify leaf nodes (sources with no children). There was a discussion on Slack in #scarlet-hsc-test in October about this, but since it has come up recently in #dm-science-pipelines I thought it was worth opening the discussion to the wider science pipelines team. Part of the problem is that there are now two different types of "primary" isolated sources: the parent records that have not been deblended, and a child record for each such parent, where the child record has been modeled by scarlet.
In some cases it might be advantageous to use the un-deblended version of an isolated source, while in others it will make more sense to use the scarlet-modeled isolated source, which will be more consistent with the blended source measurements. It has therefore been suggested that we add a new field to the setPrimaryFlagsTask that allows the user to choose between the modeled and un-modeled versions of a source (e.g. we could create a boolean field "useModel" in the config file that is true when the isPrimary key should use the scarlet model).
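To make the suggestion concrete, here is a minimal sketch of how a "useModel" switch could pick between the two versions of an isolated source. It is plain Python over dict rows rather than the actual task, it covers only the leaf-node part of isPrimary, and the function name primary_flags and its use_model argument are hypothetical. It also assumes that skipped parents, which have no child records, count as leaf nodes.

```python
def primary_flags(rows, use_model):
    """Map record id -> leaf-node flag, for illustration only.

    rows are dicts with id, parent, deblend_nPeaks and, for children,
    deblend_parentNPeaks. use_model mimics the proposed "useModel" option.
    """
    # IDs of parents that actually have child records in the catalog.
    parents_with_children = {r["parent"] for r in rows if r["parent"] != 0}

    flags = {}
    for r in rows:
        if r["parent"] != 0:
            # Child record: blended children are always leaves; the model of
            # an isolated source is used only when use_model is set.
            isolated = r["deblend_parentNPeaks"] == 1
            flags[r["id"]] = use_model or not isolated
        else:
            # Parent record: a leaf if it has no child records, or if it is
            # isolated and the un-modeled version was requested.
            has_children = r["id"] in parents_with_children
            isolated = r["deblend_nPeaks"] == 1
            flags[r["id"]] = (not has_children) or (isolated and not use_model)
    return flags


catalog = [
    {"id": 1, "parent": 0, "deblend_nPeaks": 1},         # isolated parent
    {"id": 2, "parent": 1, "deblend_parentNPeaks": 1},   # its scarlet model
    {"id": 3, "parent": 0, "deblend_nPeaks": 2},         # blended parent
    {"id": 4, "parent": 3, "deblend_parentNPeaks": 2},
    {"id": 5, "parent": 3, "deblend_parentNPeaks": 2},
    {"id": 6, "parent": 0, "deblend_nPeaks": 5},         # skipped parent
]
with_model = primary_flags(catalog, use_model=True)
without_model = primary_flags(catalog, use_model=False)
```

With either setting exactly one of records 1 and 2 is flagged, so an isolated source is never counted twice.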
What do people think about this, and are there any other suggestions that might help make the deblender outputs more useful? At first glance I feel like it might also be useful to add a deblend_parentNChild column to keep track of the number of children that were actually deblended from a source's parent.
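For reference, the proposed deblend_parentNChild column amounts to a lookup of each child's parent record. A minimal sketch, again over plain dict rows and with a hypothetical function name (add_parent_n_child), might look like this; parent rows are left without the new column because its value for them has not been discussed.

```python
def add_parent_n_child(rows):
    """Return copies of rows with deblend_parentNChild filled in for children."""
    # deblend_nChild of every parent record, keyed by its source ID.
    n_child_by_id = {r["id"]: r["deblend_nChild"] for r in rows if r["parent"] == 0}

    out = []
    for r in rows:
        new_row = dict(r)
        if r["parent"] != 0:
            new_row["deblend_parentNChild"] = n_child_by_id[r["parent"]]
        out.append(new_row)
    return out


rows = [
    {"id": 3, "parent": 0, "deblend_nPeaks": 4, "deblend_nChild": 3},
    {"id": 4, "parent": 3, "deblend_nPeaks": 0, "deblend_nChild": 0},
    {"id": 5, "parent": 3, "deblend_nPeaks": 0, "deblend_nChild": 0},
]
updated = add_parent_n_child(rows)
```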