  1. Data Management
  2. DM-31813

Add diaObjectId's coords to DiaSource Parquet Table before ingest





      DAX needs a fiducial sky coordinate to use for spatial partitioning. It is important that all the `diaSources` associated with a `diaObject` land on the same partition.

      Fritz says (in the context of ForcedSource, but it applies to DiaSource too):

      If you give us an ObjectId, for each one we need to look up the associated Object to get the Object's ra/dec to determine where to place it in order to ingest. So we need to have seen the Objects first to build a hash table or index for this, and that will be really big for the whole sky. If pipelines happen to "know" the fiducial ra/dec for the Object associated with a ForcedSource and generate those as separate columns in the ForcedSource output products, we can drive spatial sharding off that directly during ingest, which will be much more efficient and would allow us to take in ForcedSources before Objects.

      We can easily strip the "redundant" ra/dec partitioning columns during ingest (we'd want to to cut down on storage; every column in the final ForcedSource tables hurts a lot because there are so many rows.)
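      To make the idea concrete, here is a minimal sketch of ingest driven by fiducial columns. It is not how DAX actually partitions: `toy_chunk_id`, `shard_rows`, the stripe and chunk counts, and the `coord_ra`/`coord_dec` column names are all assumptions made for illustration. It shows only that rows can be placed without an Object lookup, and that the helper columns can be dropped before storage.

```python
def toy_chunk_id(ra_deg, dec_deg, num_stripes=18, chunks_per_stripe=36):
    """Map a sky position to a hypothetical chunk ID (illustration only)."""
    # Declination picks the stripe, right ascension picks the chunk within it.
    stripe = min(int((dec_deg + 90.0) / 180.0 * num_stripes), num_stripes - 1)
    chunk = min(int((ra_deg % 360.0) / 360.0 * chunks_per_stripe),
                chunks_per_stripe - 1)
    return stripe * chunks_per_stripe + chunk


def shard_rows(rows, fiducial_cols=("coord_ra", "coord_dec")):
    """Group rows by chunk using the fiducial columns, then strip those columns."""
    ra_col, dec_col = fiducial_cols
    shards = {}
    for row in rows:
        chunk = toy_chunk_id(row[ra_col], row[dec_col])
        # The partitioning columns are redundant once the row has been placed.
        stored = {k: v for k, v in row.items() if k not in fiducial_cols}
        shards.setdefault(chunk, []).append(stored)
    return shards


# Two ForcedSources of Object 7 carry the same fiducial position,
# so they land in the same shard without consulting an Object table.
rows = [
    {"forcedSourceId": 1, "objectId": 7, "coord_ra": 10.0, "coord_dec": -5.0},
    {"forcedSourceId": 2, "objectId": 7, "coord_ra": 10.0, "coord_dec": -5.0},
    {"forcedSourceId": 3, "objectId": 8, "coord_ra": 200.0, "coord_dec": 40.0},
]
shards = shard_rows(rows)
```

      Because the chunk depends only on the fiducial position, every row that shares an Object shares a chunk, regardless of the order in which the tables arrive.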

      Currently drpAssociation writes out a nicely normalized `goodSeeingDiff_assocDiaSrcTable` and `goodSeeingDiff_diaObjTable`. There are two options. We could denormalize `goodSeeingDiff_assocDiaSrcTable` and add the diaObject's ra/decl to it (this is what we've been calling coord_ra and coord_dec in the Parquet tables, and why you often see them in addition to ra/decl). Alternatively, we could write another task that joins the two tables and writes out a new table specifically for ingest. Other ideas welcome! I don't know how DAX uses these Parquet files for ingest, but if you write them to text files for bulk-loading speed, maybe we could even override that step to do the join before writing.
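      Either option amounts to a join on `diaObjectId`. A minimal pandas sketch follows, assuming both tables are already loaded as DataFrames, that `diaObjectId` is an ordinary column in each (in the real tables it may be the index), and that the diaObject position columns are named `ra` and `decl`. The helper name `add_dia_object_coords` and the sample values are hypothetical.

```python
import pandas as pd


def add_dia_object_coords(dia_sources, dia_objects):
    """Return dia_sources with the parent diaObject's position attached."""
    # Keep only the key and position columns, renamed so they do not
    # collide with each diaSource's own ra/decl.
    fiducial = dia_objects[["diaObjectId", "ra", "decl"]].rename(
        columns={"ra": "coord_ra", "decl": "coord_dec"}
    )
    # Left join keeps every diaSource row; validate raises an error
    # if diaObjectId is not unique in the object table.
    return dia_sources.merge(
        fiducial, on="diaObjectId", how="left", validate="many_to_one"
    )


# Made-up sample data for illustration.
dia_objects = pd.DataFrame(
    {"diaObjectId": [1, 2], "ra": [10.0, 20.0], "decl": [-5.0, 5.0]}
)
dia_sources = pd.DataFrame(
    {
        "diaSourceId": [100, 101, 102],
        "diaObjectId": [1, 1, 2],
        "ra": [10.0001, 9.9999, 20.0002],
        "decl": [-5.0001, -4.9999, 5.0001],
    }
)
joined = add_dia_object_coords(dia_sources, dia_objects)
```

      Each diaSource keeps its own measured ra/decl, while all diaSources of one diaObject share identical coord_ra/coord_dec values, which is the property the partitioning needs.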





              Unassigned Unassigned
              yusra Yusra AlSayyad
              Chris Morrison [X] (Inactive), Colin Slater, Eric Bellm, Fritz Mueller, Ian Sullivan, Yusra AlSayyad