Fix Version/s: None
Sprint:DRP S21a (Dec Jan)
Team:Data Release Production
The scripts in pipe_analysis use a set of flag combinations to sub-select samples that represent the "best" entry for a given object while omitting "sky objects" (i.e. fake sources laid down in "empty" regions of the image to aid in sky background correction assessment). Duplication of a given object can happen intra-patch/ccd in that the undeblended "parent" object of a given blend family is included in the catalogs in addition to all the deblended "children" (it's only the latter that we want). On an inter-patch/tract level, duplicate entries can occur in the overlap regions. These are discriminated based on their position within the patch/tract via "isInner" flags (only the entry for which this is True is kept). There is also an isPrimary flag that is meant to provide "the absolute minimum science users need to do to get a viable set of objects with no duplicates and no sky objects." This flag is not used explicitly in the pipe_analysis scripts: they operate at the single-tract level, where there are no overlapping tracts and hence no inter-tract duplicates, so omitting "outer" tract objects is not desired. Rather, the scripts define their own set of flags that is essentially isPrimary minus the detect_isTractInner requirement.
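As a rough illustration of the "isPrimary minus detect_isTractInner" idea under the meas_deblender conventions, here is a minimal sketch on a toy catalog. The column names `detect_isPatchInner` and `merge_peak_sky`, the helper name, and all values are assumptions made for illustration; this is not the actual pipe_analysis selection code.

```python
import numpy as np

# Toy stand-in for an object catalog; all values are made up.
# "detect_isPatchInner" and "merge_peak_sky" are assumed column names.
cat = {
    "detect_isPatchInner": np.array([True, True, False, True, True]),
    "detect_isTractInner": np.array([True, False, True, True, True]),
    "deblend_nChild": np.array([0, 0, 0, 3, 0]),
    "merge_peak_sky": np.array([False, False, False, False, True]),
}


def primary_like_single_tract(cat):
    """Hypothetical helper: isPrimary-like mask without the tract-inner cut."""
    is_patch_inner = cat["detect_isPatchInner"]    # drop patch-overlap duplicates
    is_leaf = cat["deblend_nChild"] == 0           # drop undeblended parents
    is_sky = cat["merge_peak_sky"]                 # drop sky objects
    # detect_isTractInner is deliberately not applied here.
    return is_patch_inner & is_leaf & ~is_sky


mask = primary_like_single_tract(cat)
```

In this toy case the second row is kept even though it lies in the tract "outer" region, which is the behavior wanted for single-tract analysis.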
The flag settings are rooted in their behavior and meaning in the meas_deblender deblender conventions, which have been in play for the duration of the Rubin/LSST pipelines...up until DM-28323, which switched the default deblender to meas_extensions_scarlet. The scarlet deblender has some significant differences in the meaning of some of the flags it has in common with meas_deblender, and it also has some additional flags required to identify specific sub-samples. As such, a new set of flag combinations is required to represent the appropriate catalog sub-selections for our science and QA samples. Flag naming and conventions are not quite settled (follow RFC-750 and its implementation ticket DM-28542), but we can still make better use of the flags being set now to make the sample sub-selections we wish to include for our QA analyses. This ticket is to update those selections to be appropriate for scarlet deblended catalogs.
Indeed, I will implement the above once DM-28542 lands and is part of a weekly RC2 processing. The point here is to get the best set of flags we can based on the catalog flag entries available prior to that ticket getting merged. To this end, I have updated the sky source selection to only use the non-model isolated scarlet models via a selection on the sky catalog of:
(skyObjCat["parent"] == 0) & (skyObjCat["deblend_nChild"] == 1)
In my first pass at doing this, I noticed that the outer "ring" in the COSMOS 9813 tract was getting selected against. These are the sources without full-band coverage and, if I understand correctly, scarlet falls back to the effective "no model" measurement style for these objects. They get the "deblend_skipped" flag set and can be identified in a sky object-only catalog in that they all have:
(skyObjCat["parent"] == 0) & (skyObjCat["deblend_nChild"] == 0)
so I add these back in. Comments have been added in the code to make note of this.
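For reference, a minimal sketch of how the two sky-object selections above combine into one mask. The dictionary of NumPy arrays stands in for a sky object-only catalog, the values are made up, and the helper name is hypothetical. Each comparison is parenthesized because `&` binds more tightly than `==` in Python.

```python
import numpy as np

# Toy stand-in for a sky object-only catalog; all values are made up.
skyObjCat = {
    "parent": np.array([0, 0, 0, 7, 0]),
    "deblend_nChild": np.array([1, 0, 2, 0, 1]),
}


def select_sky_objects(cat):
    """Hypothetical helper combining the two selections quoted above."""
    is_top_level = cat["parent"] == 0
    # Sky objects with parent == 0 and exactly one child.
    is_isolated = is_top_level & (cat["deblend_nChild"] == 1)
    # Sky objects with parent == 0 and no children: the "outer ring"
    # sources without full-band coverage, added back in.
    is_skipped = is_top_level & (cat["deblend_nChild"] == 0)
    return is_isolated | is_skipped


sky_mask = select_sky_objects(skyObjCat)
```

Dropping the `is_skipped` term gives the "skipped excluded" variant of the sample.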
The sky object plots do indeed now look much closer to what we were getting with meas_deblender runs. However, there are still offsets to more positive sky measurements in the circular aperture measurements as demonstrated in the following plots:
w06 scarlet this ticket:
So, while things look much better, there is still a bias towards higher sky levels in the scarlet no-model plot compared to that of the w02 meas_deblender run.
To rule out the possibility of differences in source placement somehow creeping in, I made plots comparing them between the scarlet-based w06 run and the meas_deblender w02 run. Some examples:
They are nearly identical (first plot), but with the occasional exception (second plot). I believe these small differences stem from small differences (in particular in the fgcm calibrations) at the visit-level stage, leading to slight differences in the mask planes used for source placement selection (and have nothing to do with which deblender was run).
More examples for all patches in tract 9813 can be perused (for some finite amount of time!) here.
So, it seems there is still something "different" about what is getting passed into the measurement algorithms (perhaps a combination of Fred Moolekamp + Lee Kelvin could file a ticket to look into this in more detail?) Totally naïve speculation: are the boxes big enough (but...if not, should the algorithm fail/report NaN for flux? Most entries have the circular aperture flux flags set for both deblender outputs).
Ok, now for the main catalogs, I have gone with the approach of including scarlet-model based versions for both isolated and blended sources, whereas
RFC-750 concluded that the isPrimary sources should include the non-model versions for isolated sources. The reason for this is that I can't uniquely select the isolated scarlet-model sources (in order to swap them out for the non-model ones). This is due to an error in the propagation of the "deblend_parentNPeaks" flag in current catalogs (this will also be fixed on DM-28542). I have kicked off runs of all RC2 tracts/filters with the current selection. Only the 9813 HSC-I run is done so far (the others are waiting in the queue), but results will be trickling in here. As an example of why I think I've potentially got this right, look at the footprint distribution from the original w06 run:
vs. the one from this branch:
Note that the latter is missing the extra "branch", which I am assuming is attributed to the non-model isolated "leaf".
Would you mind giving this a look to make sure I have the logic of the current scarlet flag setting correct and that this is the "best" we can do at present for looking at consistent samples? This is meant as a stopgap until we can do this "properly" (after the changes in
DM-28542 make it into an RC2 weekly processing run).
Thanks, Fred. As to your main comment on the PR, yes, I did look at the distributions omitting the skipped sources. There are differences (of course), but they do not explain the trends noted above. Here is the histogram plot:
and the sky distributions for:
scarlet skipped excluded:
scarlet skipped included:
The “included” sample seems like the fairer comparison to me, but I’d be happy to leave them out if you feel strongly about it.
I recommend basing this off of
DM-28542, in which case you can just use
which will give you the unblended isolated sources and the sources from blends with >= 2 children, without any of the restrictions on patch or tract.