RFC-863

Standardize on "coord_ra/dec" for the names of the "canonical" coordinates in catalog tables

    Details

    • Type: RFC
    • Status: Flagged
    • Resolution: Unresolved
    • Component/s: DM
    • Labels:
      None

      Description

      Currently (based on yml/imsim.yaml) we have several tables that use coord_ra / coord_dec for the "canonical" coordinates of a catalog entry: Object, Source, DiaSource, ForcedSource, ForcedSourceOnDiaObject.

      DiaObject appears to use ra / decl.

      Source and DiaSource have both pairs, and they are said to be actual duplicates in content, for reasons that are not entirely clear.

      Object, in addition to coord_ra / coord_dec, has <band>_ra and <band>_decl columns for each of the six bands. The former is a consolidated, canonical assessment of position, based on choosing a "reference band" and using the values from that band; the latter are the centroids in each filter band.  This is of course a scientifically meaningful distinction.

      (Added later) It is possible that in a future Science Pipelines release the canonical position would come from centroiding a multi-band coadd or some other means of combining the values from all bands, and not be equal to any of the per-band ones.

      This RFC asks that we standardize on the use of coord_ra / coord_dec for the "canonical" coordinates in all catalogs of individual objects/sources.  This will help users develop reliable expectations.  In cases where multiple coordinate sets are available for a catalog entry, as for Object, the others should receive appropriate names and descriptions that clarify their role.

      In practical terms, the only immediate consequence would be to change the name of the coordinate columns in the DiaObject table.
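
      As a rough illustration of that consequence, the sketch below scans a set of tables and proposes renames for any table that lacks the coord_ra / coord_dec pair but has ra / decl. The table and column lists are hypothetical, abbreviated stand-ins (including the ID column names) and are not the actual contents of yml/imsim.yaml; proposed_renames is an illustrative helper, not an existing tool.

```python
# Hypothetical, abbreviated column lists; NOT the actual contents of yml/imsim.yaml.
CANONICAL = ("coord_ra", "coord_dec")


def proposed_renames(tables, legacy=("ra", "decl")):
    """Return {table: {old_name: new_name}} for tables lacking the canonical pair."""
    renames = {}
    for name, columns in tables.items():
        if all(c in columns for c in CANONICAL):
            continue  # canonical pair already present; nothing to rename
        if all(c in columns for c in legacy):
            renames[name] = dict(zip(legacy, CANONICAL))
    return renames


tables = {
    "Object": ["objectId", "coord_ra", "coord_dec"],
    "Source": ["sourceId", "coord_ra", "coord_dec", "ra", "decl"],
    "DiaObject": ["diaObjectId", "ra", "decl"],
    "ForcedSource": ["forcedSourceId", "coord_ra", "coord_dec"],
}

print(proposed_renames(tables))
```

      With these assumed inputs, only DiaObject is flagged. Tables that already carry both pairs, such as Source, are left alone.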

      The "canonical" coordinates would also receive the "meta.main" IVOA UCD designation.  This will cause them to be used by default in spatial searches in IVOA-aware query clients, including the RSP Portal Aspect.
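
      A minimal sketch of how a client might use that designation to pick default position columns follows. It assumes column metadata is available as simple name/UCD records and that the per-band columns carry plain pos.eq.ra / pos.eq.dec UCDs; both are assumptions for illustration, and main_position_columns is a hypothetical helper, not part of any RSP or IVOA library.

```python
def main_position_columns(columns):
    """Return (ra_name, dec_name) for the columns flagged meta.main.

    Each entry in `columns` is a dict with "name" and "ucd" keys (assumed layout).
    A missing coordinate is returned as None.
    """
    def find(primary):
        for col in columns:
            words = col.get("ucd", "").split(";")
            # The primary UCD word comes first; meta.main is a secondary word.
            if words[0] == primary and "meta.main" in words[1:]:
                return col["name"]
        return None

    return find("pos.eq.ra"), find("pos.eq.dec")


# Hypothetical metadata for an Object-like table.
object_columns = [
    {"name": "coord_ra", "ucd": "pos.eq.ra;meta.main"},
    {"name": "coord_dec", "ucd": "pos.eq.dec;meta.main"},
    {"name": "g_ra", "ucd": "pos.eq.ra"},
    {"name": "g_decl", "ucd": "pos.eq.dec"},
]

print(main_position_columns(object_columns))
```

      Under these assumptions the per-band centroids are ignored and the coord_ra / coord_dec pair is selected, which is the behavior the RFC is aiming for in spatial searches.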

            Activity

            gpdf Gregory Dubois-Felsmann added a comment -

            I'm going to revise the RFC by tomorrow to call for "ra, dec", and extend the date for a few days to ensure that it's been clearly understood.

            yusra Yusra AlSayyad added a comment - - edited

            If this RFC applies to the DRP parquet tables, I can tell you that we currently use the following convention:

            (1) ra/decl for the position that was measured in this table, with <band>_ra/<band>_decl in the case of Object.

            • If there were no centroids measured in this task (e.g. ForcedSource), there will be no ra/decl.

            (2) coord_ra/coord_dec for the position that DAX should partition on. These are included as a convenience to you, requested by DAX.

            Why not partition on ra/decl? In the case of DiaSource see convo on DM-31813.

            I don't care what they're called, but I want to keep it clear they mean different things. Do you mean #1 or #2 by canonical?

            Currently after ingest and partitioning, you can delete all the coord_ra/coord_dec columns without any loss of information*!

            • Caveat being that if/when you delete those, you should keep the coord_ra/coord_dec in Object and rename them ra/dec; because I think having the reference ra/dec easily accessible without looking it up from the refBand is something users expect.
            fritzm Fritz Mueller added a comment -

            Yes, thanks Yusra AlSayyad for jogging my memory on this!

            Having "partitioning coordinates" available explicitly in the parquets is a huge boon for timely ingestion of the data.  Otherwise, foreign key links must be snapped for each "child table" row to find the row's destination partition, and some of those indices will be billions or trillions of rows.  The sequencing/construction/merging of those indices in time to be used for this also complicates things, vs. simply compiling them at the end.

            We would indeed be pleased to drop these columns during/after partitioning, though, if they need not or should not be part of the published catalogs (substantial storage savings = better scan times = better query times).

            This raises an additional question: if we were to drop them, where/how would we document/differentiate columns which might be in parquets, but not in the published database version of the catalogs?

            tjenness Tim Jenness added a comment -

            Gregory Dubois-Felsmann, this is a reminder that you were going to revise this RFC.

            Parejkoj John Parejko added a comment -

            I'll note here that RFC-906 and its implementation ticket DM-37196 have removed the decl and [band]_decl fields, and replaced them with dec and [band]_dec across the DM codebase and SDM output. We have a deprecation period for a few months (see Jim's comment on RFC-924) but all SDM output going forward should have ra/dec.

            This doesn't change the question of whether there should be separate coord_ra/coord_dec fields to partition on (I don't see what value they serve, unless it's related to the "dec" reserved word or to having specific coordinate units), and I'll leave that to this RFC's authors to sort out.


              People

              Assignee:
              gpdf Gregory Dubois-Felsmann
              Reporter:
              gpdf Gregory Dubois-Felsmann
              Watchers:
              Clare Saunders, Colin Slater, Eric Bellm, Fritz Mueller, Gregory Dubois-Felsmann, John Parejko, Kian-Tat Lim, Tim Jenness, Yusra AlSayyad

                Dates

                Created:
                Updated:
                Planned End:
