DAX needs a fiducial sky coordinate to use for spatial partitioning. It's important that all the `diaSources` associated into a `diaObject` end up on the same partition, so each `diaSource` can't simply be partitioned by its own measured position.
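To illustrate why a `diaSource`'s own position isn't good enough, here is a minimal sketch assuming a toy partitioner that cuts the sky into fixed 1-degree RA/Dec cells. The `chunk_id` function, the cell size, and the example rows are all hypothetical and are not the partitioning scheme DAX actually uses. Two detections of the same object that scatter across a cell edge land in different cells, while partitioning both by the `diaObject`'s coordinate keeps them together.

```python
import math

# Hypothetical cell size; the real DAX partitioning scheme is not modeled here.
CELL_DEG = 1.0


def chunk_id(ra, dec, cell_deg=CELL_DEG):
    """Toy partition key: (RA cell, Dec cell) indices for a position in degrees."""
    return (math.floor((ra % 360.0) / cell_deg), math.floor((dec + 90.0) / cell_deg))


# One diaObject just inside a cell edge, and two of its diaSources whose
# measured positions scatter to either side of that edge.
dia_object = {"diaObjectId": 1, "ra": 10.9999, "decl": -30.5}
dia_sources = [
    {"diaSourceId": 100, "diaObjectId": 1, "ra": 10.99995, "decl": -30.5},
    {"diaSourceId": 101, "diaObjectId": 1, "ra": 11.00002, "decl": -30.5},
]

# Partitioning each diaSource by its own position splits the object across cells.
own = {chunk_id(s["ra"], s["decl"]) for s in dia_sources}

# Partitioning by the diaObject's fiducial coordinate keeps every source together.
fiducial = {
    chunk_id(dia_object["ra"], dia_object["decl"])
    for s in dia_sources
    if s["diaObjectId"] == dia_object["diaObjectId"]
}

print(len(own), len(fiducial))  # prints "2 1"
```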
Fritz says (in the context of ForcedSource, but this applies to DiaSource too):
If you give us an ObjectId, for each one we need to look up the associated Object to get the Object's ra/dec to determine where to place it in order to ingest. So we need to have seen the Objects first to build a hash table or index for this, and that will be really big for the whole sky. If pipelines happen to "know" the fiducial ra/dec for the Object associated with a ForcedSource and generate those as separate columns in the ForcedSource output products, we can drive spatial sharding off that directly during ingest, which will be much more efficient and would allow us to take in ForcedSources before Objects.
We can easily strip the "redundant" ra/dec partitioning columns during ingest (we'd want to, to cut down on storage; every column in the final ForcedSource tables hurts a lot because there are so many rows.)
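The second approach in the quote can be sketched roughly as follows, assuming each incoming row already carries fiducial `coord_ra`/`coord_dec` columns. The `partition_key` and `ingest` functions, the toy 1-degree cell partitioner, and the `objectId`/`psfFlux` example columns are hypothetical stand-ins, not DAX's ingest code. The point is that routing needs nothing but the row itself (no ObjectId to ra/dec index), and the partitioning columns are dropped before storage.

```python
import math
from collections import defaultdict

# Hypothetical names of the fiducial partitioning columns carried by each row.
PARTITION_COLS = ("coord_ra", "coord_dec")


def partition_key(ra, dec, cell_deg=1.0):
    """Toy stand-in for the real spatial partitioner: an RA/Dec cell index."""
    return (math.floor((ra % 360.0) / cell_deg), math.floor((dec + 90.0) / cell_deg))


def ingest(rows):
    """Route each row to a partition using only its own fiducial columns.

    No Object table or ObjectId -> (ra, dec) lookup is consulted, so these rows
    can arrive before the Objects. The partitioning columns are stripped from
    the stored copy to save space.
    """
    partitions = defaultdict(list)
    for row in rows:
        key = partition_key(row["coord_ra"], row["coord_dec"])
        stored = {k: v for k, v in row.items() if k not in PARTITION_COLS}
        partitions[key].append(stored)
    return dict(partitions)


forced = [
    {"objectId": 7, "coord_ra": 150.2, "coord_dec": 2.1, "psfFlux": 1.5},
    {"objectId": 7, "coord_ra": 150.2, "coord_dec": 2.1, "psfFlux": 1.7},
    {"objectId": 8, "coord_ra": 45.9, "coord_dec": -10.3, "psfFlux": 0.4},
]
parts = ingest(forced)
```

Both ForcedSources of object 7 share a partition because they carry the same fiducial coordinate, even though the Object table was never read.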
Currently drpAssociation writes out a nicely normalized `goodSeeingDiff_assocDiaSrcTable` and `goodSeeingDiff_diaObjTable`. We could either denormalize `goodSeeingDiff_assocDiaSrcTable` by adding the diaObject's ra/decl (this is what we've been calling `coord_ra` and `coord_dec` in the parquet tables, and why you often see them in addition to ra/decl), or we could write another task that joins the two and writes out a new table specifically for ingest. Other ideas welcome! I don't know how DAX uses these parquet files for ingest, but if you write them to text files for bulk-loading speed, maybe we could even override that step to do the join before writing.
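The denormalizing option amounts to a join on the diaObject ID. Here is a minimal stdlib sketch, assuming both tables have been read into lists of row dicts and that the key column is `diaObjectId` with diaObject positions in `ra`/`decl`; those column names and the `denormalize` function are assumptions, not the actual drpAssociation schema. With pandas the same thing would be a `merge` on the ID column.

```python
def denormalize(assoc_dia_sources, dia_objects, id_col="diaObjectId"):
    """Attach each diaSource's parent diaObject ra/decl as coord_ra/coord_dec.

    Behaves like a join of the association table onto the diaObject table on
    `id_col`, keeping only the diaObject position. A diaSource whose diaObject
    is missing raises KeyError, since it would have no fiducial coordinate.
    """
    # Build the lookup once: diaObjectId -> (ra, decl).
    obj_pos = {o[id_col]: (o["ra"], o["decl"]) for o in dia_objects}
    out = []
    for src in assoc_dia_sources:
        ra, decl = obj_pos[src[id_col]]
        # Keep the diaSource's own ra/decl; add the diaObject's as partitioning columns.
        out.append({**src, "coord_ra": ra, "coord_dec": decl})
    return out


dia_objects = [{"diaObjectId": 1, "ra": 10.0, "decl": -30.0}]
assoc = [
    {"diaSourceId": 100, "diaObjectId": 1, "ra": 10.00003, "decl": -30.00001},
    {"diaSourceId": 101, "diaObjectId": 1, "ra": 9.99998, "decl": -29.99999},
]
rows = denormalize(assoc, dia_objects)
```

Every output row for diaObject 1 carries the same `coord_ra`/`coord_dec`, which is exactly what the partitioner needs, while the per-source `ra`/`decl` survive unchanged.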
Depending on when and how DAX calculates its spatial index, you could update the full history of DiaSources for a given DiaObject with the DiaObject's index to enforce this. That would save having to store extra information with each DiaSource.
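A rough sketch of that idea: compute the index once from the DiaObject's position and stamp it onto every DiaSource in that object's history, re-stamping if the object's position moves it to a different cell. The `spatial_index` function (a toy RA/Dec cell index), the `spatialIndex` field, and `restamp_history` are all hypothetical placeholders for whatever DAX actually computes and stores.

```python
import math


def spatial_index(ra, dec, cell_deg=1.0):
    """Toy spatial index (RA/Dec cell); a stand-in for whatever DAX actually computes."""
    return (math.floor((ra % 360.0) / cell_deg), math.floor((dec + 90.0) / cell_deg))


def restamp_history(dia_object, dia_sources):
    """Give every DiaSource of `dia_object` the DiaObject's spatial index.

    Returns how many DiaSources changed index, so a caller can tell whether
    that object's history has to be rewritten or moved to another partition.
    """
    idx = spatial_index(dia_object["ra"], dia_object["decl"])
    changed = 0
    for src in dia_sources:
        if src["diaObjectId"] != dia_object["diaObjectId"]:
            continue  # belongs to a different DiaObject
        if src.get("spatialIndex") != idx:
            src["spatialIndex"] = idx
            changed += 1
    return changed


obj = {"diaObjectId": 1, "ra": 10.9999, "decl": -30.5}
history = [
    {"diaSourceId": 100, "diaObjectId": 1},
    {"diaSourceId": 101, "diaObjectId": 1},
    {"diaSourceId": 200, "diaObjectId": 2},
]
print(restamp_history(obj, history))  # 2: both sources stamped for the first time
obj["ra"] = 11.0001  # the object's mean position drifts across a cell edge
print(restamp_history(obj, history))  # 2: both sources re-stamped with the new index
```

The DiaSources only ever carry the object's index, never their own, so they can't drift apart; the cost is rewriting an object's whole history whenever its index changes.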