Details
- Type: RFC
- Status: Implemented
- Resolution: Done
- Component/s: DM
- Labels: None
Description
Source selectors are typically written in Python, so they run significantly faster when they can use vector operations on a catalog instead of looping over each record. However, vector operations require a contiguous catalog. As a result, most of our source selectors contain two implementations: a vectorized one used when the catalog is contiguous, and a slow fallback used for non-contiguous catalogs. This duplication is difficult to maintain.
Non-contiguous catalogs are very rare in practice, because we prefer to flag sources that should be ignored rather than delete them. I therefore propose that source selectors require contiguous catalogs, and raise a specific, documented exception when that requirement is not met.
As for which exception to raise: I propose the one that afw table already raises when one attempts vector operations on a non-contiguous catalog in Python. That would avoid the need for an explicit contiguity test in most situations.
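To make the proposal concrete, here is a minimal sketch of a selector that has only the vectorized code path and rejects non-contiguous input. It uses a NumPy structured array as a stand-in for a catalog, not the afw table API. The names `NonContiguousCatalogError`, `select_sources`, `flux`, and `flag_bad` are hypothetical and chosen only for illustration. Unlike afw table, NumPy happily runs vector operations on strided views, so the stand-in needs the explicit check that the proposal hopes to avoid in the real code.

```python
import numpy as np


class NonContiguousCatalogError(RuntimeError):
    """Hypothetical exception; stands in for whatever afw table raises."""


def select_sources(catalog, min_flux):
    """Return the records brighter than min_flux that are not flagged bad.

    `catalog` is a NumPy structured array standing in for a source catalog.
    There is no per-record fallback loop: non-contiguous input is rejected.
    """
    if not catalog.flags.c_contiguous:
        raise NonContiguousCatalogError(
            "Input catalogs for source selection must be contiguous"
        )
    # Single vectorized pass over whole columns.
    good = (catalog["flux"] > min_flux) & ~catalog["flag_bad"]
    return catalog[good]


# Illustrative data only.
catalog = np.zeros(6, dtype=[("flux", "f8"), ("flag_bad", "?")])
catalog["flux"] = [1.0, 5.0, 10.0, 20.0, 0.5, 8.0]
catalog["flag_bad"] = [False, False, True, False, False, False]

selected = select_sources(catalog, min_flux=4.0)

# A strided view is non-contiguous and is rejected by the selector.
subset = catalog[::2]
try:
    select_sources(subset, min_flux=4.0)
    rejected = False
except NonContiguousCatalogError:
    rejected = True

# A caller who really has a non-contiguous catalog copies it first.
selected_from_copy = select_sources(np.ascontiguousarray(subset), min_flux=4.0)
```

The point of the sketch is the division of responsibility: the selector keeps a single implementation, and the rare caller holding a non-contiguous catalog pays for one copy before calling it.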
Issue Links
- is triggering
  - DM-9832 Cleanup and unify star selector call signatures (Done)
- relates to
  - DM-14529 "RuntimeError: Input catalogs for source selection must be contiguous" in ci_hsc (Done)
  - DM-8977 BaseSourceSelector._isBad() should use Keys instead of string fields (To Do)
  - DM-11568 SourceDetectionTask's outputs should be contiguous in memory (To Do)
  - DM-4878 Propagate flags from individual visit measurements to coadd measurements (Done)
It's worth keeping in mind that catalogs have uses other than holding sources within processCcd. I think discontiguous catalogs can be useful for certain small-scale operations.