Fix Version/s: None
meas_extensions_scarlet seems to produce catalogs that are not sorted properly, at least in Gen3. This ticket is to figure out why, and to fix it. It is blocking DM-29888 because I can't get ci_hsc_gen3 to complete due to the parent sorting issue.
- is duplicated by
DM-29729 Debug and fix source ID failures in Gen3 ForcedPhotCoadd discovered in Gen3 RC2
- is triggered by
DM-29888 Add config field(s) to meas_extensions_scarlet to run on a subset of an input catalog
- relates to
DM-29936 Enable getting Children without repeatedly checking if the SourceCatalog is sorted
- Won't Fix
I think I've found the problem, though it's definitely one of those "how did that ever work" mysteries now. The Gen2 DeblendCoaddSourcesTask has these critical lines:
...which says, "when you add new records to this catalog, don't use this ID" (for each ID being copied from the mergeDet catalog).
I don't see any such calls in either of the Gen3 deblending PipelineTasks in pipe_tasks, in meas_extensions_scarlet, or in meas_deblender.
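To make the mechanism concrete, here is a toy sketch in plain Python of what reserving the copied IDs buys you. It does not use the afw.table API: `ToyIdFactory`, its `reserve` method, and the `deblend` helper are hypothetical stand-ins invented for illustration, and the real factory's ID layout is more involved than a simple counter.

```python
class ToyIdFactory:
    """Hypothetical stand-in for an ID factory; not the afw.table API."""

    def __init__(self):
        self._next = 1

    def __call__(self):
        """Return a fresh ID and advance the counter."""
        new_id = self._next
        self._next += 1
        return new_id

    def reserve(self, used_id):
        """Record that used_id is taken so later IDs come after it."""
        self._next = max(self._next, used_id + 1)


def deblend(parent_ids, factory, reserve_parent_ids):
    """Copy parent IDs into a new catalog, then append one child per parent."""
    catalog = list(parent_ids)
    if reserve_parent_ids:
        for pid in parent_ids:
            factory.reserve(pid)
    for _ in parent_ids:
        catalog.append(factory())
    return catalog


parents = [1, 2, 3]
without_reserve = deblend(parents, ToyIdFactory(), reserve_parent_ids=False)
with_reserve = deblend(parents, ToyIdFactory(), reserve_parent_ids=True)
print(without_reserve)  # [1, 2, 3, 1, 2, 3]
print(with_reserve)     # [1, 2, 3, 4, 5, 6]
```

Without the reservation step the appended records reuse IDs that are already in the catalog, so the result is neither unique nor sorted by ID. With it, new IDs always land after the copied ones.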
It looks like the right place to fix this is probably around
DM-17689 is absolved, except maybe being part of the "how did this ever work" mystery.
I implemented your suggested fix and tested it using ci_hsc_gen3 locally and it works. Jenkins is currently building: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/34085/pipeline.
As for "how did that ever work," I'm wondering whether Gen3 was run all the way through the measurement algorithms that would be offended by an unsorted catalog before mid-January. Once meas_extensions_scarlet became the default, and it rewrote the catalog anyway, it probably wasn't an issue. But my guess is that had it been tested using the old deblender, which also just appended the mergeDet catalog, this problem would have shown up sooner. If it had been run all the way through before January, then it's still a mystery to me too.
We have definitely been running ci_hsc_gen3 all the way through for a long time, but w14 was probably the first Gen3 RC2 run that got that far. Since this problem didn't cause a hard error all the time, I bet we just got lucky in ci_hsc_gen3 (until more recently something perturbed it, I guess).
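One way the "got lucky" scenario can happen, sketched under the assumption that ID lookups on the catalog behave like a binary search over a list that is expected to be sorted (the thread does not spell this out). The `find` helper and the ID values are hypothetical; the point is only that a binary search on unsorted data fails for some targets and succeeds for others, rather than raising every time.

```python
from bisect import bisect_left


def find(ids, target):
    """Binary search that assumes ids is sorted ascending.

    Returns the index of target, or None if the search does not land on it.
    """
    i = bisect_left(ids, target)
    if i < len(ids) and ids[i] == target:
        return i
    return None


sorted_ids = [1, 2, 3, 4, 5, 6]
unsorted_ids = [4, 5, 6, 1, 2, 3]

# Every ID is present in both lists, but only some are found in the unsorted one.
found_sorted = [t for t in sorted_ids if find(sorted_ids, t) is not None]
found_unsorted = [t for t in sorted_ids if find(unsorted_ids, t) is not None]
print(found_sorted)    # [1, 2, 3, 4, 5, 6]
print(found_unsorted)  # [2, 3]
```

Whether a given lookup succeeds depends on where the target happens to sit, so a small perturbation in catalog contents can turn a passing run into a failing one.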
Thanks for the quick review. Sorry I didn't merge last night, but I went to bed before Jenkins finished.
I see a few different ways to fix this: