Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-924

Data model and API changes resulting from the APDB schema update

    XMLWordPrintable

    Details

    • Type: RFC
    • Status: Adopted
    • Resolution: Unresolved
    • Component/s: DM

      Description

      This is a followup to RFC-906 (updating the APDB schema), to clarify the large number of changes that result from it (as implemented on DM-37196). These changes will result a non-backwards compatible break once merged: datasets prior to the merge will likely not be able to be processed with code from after the merge, and many notebooks and other tools will not work without updates to their data models. As part of this RFC, I am asking to waive the deprecation period for API changes, as there doesn't seem to be a point of maintaining old APIs when the output data model has also changed.

      Summary of changes

      The changes caused by RFC-906 resulted in 14 PRs, 10 in lsst_distrib packages, and 4 in testdata and CI packages. All of the modified packages and their respective PRs are listed in a comment on DM-37196, where the implementation work for RFC-906 is occurring.

      The most significant breaking change is that declination coordinate fields are now `dec` everywhere, instead of often (but not always!) being `decl` in our output products. Once DM-37196 merges, there will be no more uses of `decl` anywhere in the science pipelines code. This includes the ExposureSummaryStats data product, isolated star catalogs, DiaSource/DiaObject/forced catalogs, visit summary tables, solar system catalogs generated from outside sources, and more. This change implements at least the spirit of RFC-863, though there are some details yet to be decided on that RFC.

      The names of many difference imaging-related flux fields have also changed (e.g. PSFlux->psfFlux, TOTFlux->scienceFlux). See the sdm_schemas PR for a detailed breakdown of all such changes. In order to keep our code and data model self-consistent, I also changed the names of many diaCalculation plugins and functions in meas_base, so that for example, the code calculating "psfFluxMean" is named "WeightedMeanDiaPsfFlux", not "WeightedMeanDiaPsFlux". I am not attempting to deprecate the old methods or make this API change backwards compatible, given that the data model changes mean that code outside of lsst_distrib will have to be updated anyway.

      I have replaced "filterName" with "band" to make the schemas more consistent with our butler field names, which takes care of DM-28503.

      Impact

      Any code outside of lsst_distrib (e.g. analysis_ap, notebooks, RubinTV, summit packages) that accessed any of the changed fields will have to be updated to work with data processed after the merge. It is up to the developers of those external tools whether they put in hooks to attempt to allow operating on data processed both before and after this change, or to make a clean break with the past.

      What to do about analysis_tools is my biggest open question: there is an ongoing sprint on that package and it will be used for the upcoming bootcamp, but it does have 9 instances of the decl->dec change, and may have more as the sprint continues. I propose to merge DM-37196 (once it is reviewed) on Tuesday, May 30th, just after the bootcamp is completed, so that these changes do not impact the bootcamp itself, and so that they are included in the end of May weekly. The bootcamp is working with earlier data, so it is in their interest to have analysis_tools maintain compatibility with that old data.

      A thought: this could be an opportunity for us to start a new /repo/main butler repository on USDF, given that many of the datasets in the current repo will not work with analysis code going forward.

      I will announce the merge and write up a post for Community about how "data processed beginning with weekly X is incompatible with previous processing".

        Attachments

          Issue Links

            Activity

            Hide
            Parejkoj John Parejko added a comment -

            I think that the above implies that I can merge everything on DM-37196, except for the analysis_tools PR? I don't know for sure whether any of my other changes could affect analysis; I hope not, but that's hard to know for sure. I'm not sure how to validate that it doesn't break "our ability to analyze", either, other than running ci_imsim and ci_hsc (I guess with the analysis_tools branch moved to a different branch?).

            I've put "To be removed after September 2023." in the pipe_tasks code that makes the "decl" duplicated column.

            Show
            Parejkoj John Parejko added a comment - I think that the above implies that I can merge everything on DM-37196 , except for the analysis_tools PR? I don't know for sure whether any of my other changes could affect analysis; I hope not, but that's hard to know for sure. I'm not sure how to validate that it doesn't break "our ability to analyze", either, other than running ci_imsim and ci_hsc (I guess with the analysis_tools branch moved to a different branch?). I've put "To be removed after September 2023." in the pipe_tasks code that makes the "decl" duplicated column.
            Hide
            jbosch Jim Bosch added a comment -

            That sounds plausible to me: analysis_tools, analysis_drp, and faro are the three packages that I know of for which we'd want to preserve compatibility with older run, so if you don't have any changes in the latter two, I think that's fine.  The ci_ packages won't help check this, because they always do new runs instead of analyzing old ones, but I think "best effort" is okay here.

            Show
            jbosch Jim Bosch added a comment - That sounds plausible to me: analysis_tools, analysis_drp, and faro are the three packages that I know of for which we'd want to preserve compatibility with older run, so if you don't have any changes in the latter two, I think that's fine.  The ci_ packages won't help check this, because they always do new runs instead of analyzing old ones, but I think "best effort" is okay here.
            Hide
            Parejkoj John Parejko added a comment -

            It turns out that there was no `decl` in the Object table, only `coord_dec` and the various `band_decl` fields.

            Show
            Parejkoj John Parejko added a comment - It turns out that there was no `decl` in the Object table, only `coord_dec` and the various `band_decl` fields.
            Hide
            Parejkoj John Parejko added a comment -

            I've added DM-39539 and DM-39540 as triggered tickets for this RFC. Those have somewhat nebulous merge dates (mid-July and mid-September, respectively), but I think it's up to DRP to decide on the exact criteria to do the merge. I think we can't mark this implemented until DM-39540 is merged.

            Show
            Parejkoj John Parejko added a comment - I've added DM-39539 and DM-39540 as triggered tickets for this RFC. Those have somewhat nebulous merge dates (mid-July and mid-September, respectively), but I think it's up to DRP to decide on the exact criteria to do the merge. I think we can't mark this implemented until DM-39540 is merged.
            Hide
            frossie Frossie Economou added a comment -
            Show
            frossie Frossie Economou added a comment - Leaving this for tracking...   https://lsstc.slack.com/archives/C2JPMCF5X/p1694109487642019

              People

              Assignee:
              Parejkoj John Parejko
              Reporter:
              Parejkoj John Parejko
              Watchers:
              Andy Salnikov, Colin Slater, Eric Bellm, Fritz Mueller, Frossie Economou, Gregory Dubois-Felsmann, Ian Sullivan, Jim Bosch, John Parejko, Kian-Tat Lim, Meredith Rawls, Tim Jenness, Yusra AlSayyad
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

                Dates

                Created:
                Updated:
                Planned End:

                  Jenkins

                  No builds found.