The Gen3 Butler moves spatial lookups for reference catalogs (and essentially all other managed data products) into its internal Registry database.* That works fine for the HTM-sharded reference catalogs we've been using almost exclusively since we added support for them: it removes the need for much of the logic in LoadIndexedReferenceObjectsTask, while the reference catalog datasets themselves remain usable as-is. Moving the lookups into the Registry will also automatically add support for any sky pixelization supported by sphgeom.
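To make the benefit of registry-side lookups concrete, here is a minimal sketch of the idea. It is not the Butler or sphgeom API: the pixelization is a toy RA/Dec grid standing in for HTM, the "registry" is a plain dict, and the shard file names are hypothetical. It only shows how knowing the pixel-to-shard mapping lets us list the files a region needs, and notice a missing shard, before anything is launched.

```python
# Toy stand-in for a pixelization the registry understands. This is NOT HTM
# and NOT the sphgeom API; it is a fixed RA/Dec grid used only for illustration.
CELL_DEG = 10.0
N_RA = int(360 / CELL_DEG)


def pixel_index(ra_deg, dec_deg):
    """Return the toy pixel index containing a sky position (degrees)."""
    i_ra = int((ra_deg % 360.0) // CELL_DEG)
    i_dec = int((dec_deg + 90.0) // CELL_DEG)
    return i_dec * N_RA + i_ra


def pixels_overlapping(ra_min, ra_max, dec_min, dec_max):
    """Coarse lookup: all toy pixels touched by an RA/Dec box.

    Assumes 0 <= ra_min <= ra_max < 360 and -90 <= dec_min <= dec_max < 90
    (no RA wrap-around), which keeps the sketch short.
    """
    ra_cells = range(int(ra_min // CELL_DEG), int(ra_max // CELL_DEG) + 1)
    dec_cells = range(int((dec_min + 90.0) // CELL_DEG),
                      int((dec_max + 90.0) // CELL_DEG) + 1)
    return {i_dec * N_RA + i_ra for i_dec in dec_cells for i_ra in ra_cells}


def shards_for_region(registry, box):
    """Resolve a region to shard files up front; fail if any shard is missing."""
    needed = pixels_overlapping(*box)
    missing = sorted(needed.difference(registry))
    if missing:
        raise LookupError(f"no reference shard registered for pixels {missing}")
    return sorted(registry[p] for p in needed)


# Hypothetical registry contents: pixel index -> shard file name.
registry = {
    pixel_index(15.0, 5.0): "shard_A.fits",
    pixel_index(25.0, 5.0): "shard_B.fits",
}

# A region covered by the two registered shards resolves to exactly those files.
files = shards_for_region(registry, (12.0, 28.0, 2.0, 8.0))

# A region that spills into an unregistered pixel fails here, not at runtime.
try:
    shards_for_region(registry, (12.0, 38.0, 2.0, 8.0))
    failed_early = False
except LookupError:
    failed_early = True
```

With an opaque, loader-internal sharding scheme, neither the file list nor the early failure is available to whatever is orchestrating the run.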
The astrometry.net ReferenceObjectLoaderTask relies on its own internal sharding system, however, and that makes its interaction with the Gen3 Butler problematic. We have three options:
- We can continue to treat the sharding as opaque and internal to astrometry.net. This will make it impossible for the Butler to identify which astrometry.net-sharded files need to be transferred to any particular shared-nothing worker node, essentially requiring that they be available via a shared filesystem. The same problem will affect users who want to select a small but internally complete subset of a repository to transfer to, e.g., their laptops. It will also make a missing reference catalog a runtime failure, rather than a failure detected before the pipeline is launched.
- We can try to add support for astrometry.net's sharding system to the Gen3 Butler (probably by adding it to sphgeom). I'm not sure how much it deviates (if at all) from vanilla HEALPix; if it does, this could involve a lot of work, either reimplementing astrometry.net functionality ourselves or invoking APIs it considers private. But even if it doesn't deviate from vanilla HEALPix, this option is currently blocked by the lack of HEALPix support in sphgeom (something we ultimately need to address but have not yet scheduled).
- We can drop support for astrometry.net entirely when we drop support for the Gen2 middleware.
As the summary of this RFC suggests, I'd like to start by proposing that last option: it's by far the least work, it rids us of a third-party dependency that has been problematic at times (in terms of cross-platform build/install), and I don't think anyone is using the astrometry.net code now anyway.
*More precisely, it provides possibly-coarse first-pass spatial lookups and relationships that downstream code should in general refine, but that's beside the point for this RFC.