Details
- Type: RFC
- Status: Adopted
- Resolution: Unresolved
- Component/s: DM
- Labels:
Description
This is a followup to RFC-906 (updating the APDB schema), to clarify the large number of changes that result from it (as implemented on DM-37196). These changes will result in a non-backwards-compatible break once merged: datasets produced before the merge will likely not be processable with code from after the merge, and many notebooks and other tools will not work without updates to their data models. As part of this RFC, I am asking to waive the deprecation period for API changes, as there seems to be no point in maintaining old APIs when the output data model has also changed.
Summary of changes
The changes caused by RFC-906 resulted in 14 PRs, 10 in lsst_distrib packages, and 4 in testdata and CI packages. All of the modified packages and their respective PRs are listed in a comment on DM-37196, where the implementation work for RFC-906 is occurring.
The most significant breaking change is that declination coordinate fields are now `dec` everywhere, instead of often (but not always!) being `decl` in our output products. Once DM-37196 merges, there will be no more uses of `decl` anywhere in the science pipelines code. This includes the ExposureSummaryStats data product, isolated star catalogs, DiaSource/DiaObject/forced catalogs, visit summary tables, solar system catalogs generated from outside sources, and more. This change implements at least the spirit of RFC-863, though there are some details yet to be decided on that RFC.
The names of many difference imaging-related flux fields have also changed (e.g. PSFlux->psfFlux, TOTFlux->scienceFlux). See the sdm_schemas PR for a detailed breakdown of all such changes. In order to keep our code and data model self-consistent, I also changed the names of many diaCalculation plugins and functions in meas_base, so that for example, the code calculating "psfFluxMean" is named "WeightedMeanDiaPsfFlux", not "WeightedMeanDiaPsFlux". I am not attempting to deprecate the old methods or make this API change backwards compatible, given that the data model changes mean that code outside of lsst_distrib will have to be updated anyway.
I have replaced "filterName" with "band" to make the schemas more consistent with our butler field names, which takes care of DM-28503.
Impact
Any code outside of lsst_distrib (e.g. analysis_ap, notebooks, RubinTV, summit packages) that accessed any of the changed fields will have to be updated to work with data processed after the merge. It is up to the developers of those external tools whether to add hooks that allow operating on data processed both before and after this change, or to make a clean break with the past.
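For external tools that do want to read both old and new outputs, a minimal sketch of such a hook might look like the following. This is only an illustration, not code from any of the PRs: `RENAMED_COLUMNS` and `normalize_columns` are hypothetical names, rows are assumed to be plain dicts, and the mapping lists only two of the renames named in this RFC (see the sdm_schemas PR for the full set).

```python
# Hypothetical compatibility helper for tools outside lsst_distrib.
# Only renames explicitly named in this RFC are listed; extend as needed.
RENAMED_COLUMNS = {
    "decl": "dec",
    "filterName": "band",
}


def normalize_columns(row):
    """Return a copy of `row` keyed by the post-merge column names.

    `row` is assumed to be a dict mapping column name to value.
    """
    normalized = {}
    for name, value in row.items():
        new_name = RENAMED_COLUMNS.get(name, name)
        # If the new-style column was already seen (e.g. a table carrying
        # a duplicated "decl" column next to "dec"), keep the new-style value.
        if name != new_name and new_name in normalized:
            continue
        normalized[new_name] = value
    return normalized


old_row = {"ra": 10.0, "decl": -5.0, "filterName": "g"}
new_row = {"ra": 10.0, "dec": -5.0, "band": "g"}
print(normalize_columns(old_row) == normalize_columns(new_row))
```

Downstream code can then refer only to the new names (`dec`, `band`) regardless of when the data was processed.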
What to do about analysis_tools is my biggest open question: there is an ongoing sprint on that package and it will be used for the upcoming bootcamp, but it does have 9 instances of the decl->dec change, and may have more as the sprint continues. I propose to merge DM-37196 (once it is reviewed) on Tuesday, May 30th, just after the bootcamp is completed, so that these changes do not impact the bootcamp itself, and so that they are included in the end of May weekly. The bootcamp is working with earlier data, so it is in their interest to have analysis_tools maintain compatibility with that old data.
A thought: this could be an opportunity for us to start a new /repo/main butler repository on USDF, given that many of the datasets in the current repo will not work with analysis code going forward.
I will announce the merge and write up a post for Community about how "data processed beginning with weekly X is incompatible with previous processing".
Issue Links
- is triggering
  - DM-39540 Remove deprecated decl columns from DRP output (To Do)
  - DM-39503 Update analysis_ap to reflect APDB schema change (Done)
- relates to
  - DM-37196 Modernize APDB schema to reflect desired usage (decl -> dec) (Done)
  - RFC-863 Standardize on "coord_ra/dec" for the names of the "canonical" coordinates in catalog tables (Flagged)
  - RFC-906 APDB column renaming (Implemented)
I think that the above implies that I can merge everything on DM-37196, except for the analysis_tools PR? I don't know for sure whether any of my other changes could affect analysis; I hope not, but that's hard to know. I'm also not sure how to validate that it doesn't break "our ability to analyze", other than running ci_imsim and ci_hsc (I guess with the analysis_tools branch moved to a different branch?).
I've put "To be removed after September 2023." in the pipe_tasks code that makes the duplicated "decl" column.