As part of
RFC-57 ( DM-1674) the ingestion code was removed from datarel and placed in daf_ingest. daf_ingest was never added back into lsst_distrib. This RFC proposes that we put it back. I am assuming this is not controversial as it's nominally a rebranding of something that was already in lsst_distrib.
The one wrinkle is that daf_ingest does not build with Python 3 (see
- How much maintenance effort is required to add daf_ingest to lsst_distrib? That is, beyond making in Python 3 compatible, does it need further work?
- Do we have a pressing requirement for this package? If we add it, who will use it?
- Maybe a silly question, but... what does it actually do? I mean, I can speculate based on the name, but there seems to be a total absence of useful documentation (which is something that should certainly be fixed before it goes into _distrib).
I assumed from the name that this was necessary for ingest scripts to run. If that's correct, I would argue that this is essential and should be shipped with lsst_distrib.
If it's not then I don't know what it's for and I withdraw any opinion on the subject (other than that the package has a misleading name!).
John Swinbank it passes tests and there isn't much code in there so it may not be a large maintenance effort. As to the user,
DM-5766 implies that daf_ingest is needed.
As for "what does it do", I admit I am guilty of trying to clean up a loose end from
RFC-57. The lack of documentation is a problem. The tests are for ingesting a catalog and for indexing exposures. Fritz Mueller I guess one outcome of this discussion is that you need to find someone to add some basic documentation explaining the package.
DM-5766 is a long way off or doesn't really need daf_ingest then I'm happy to let it drop for now. If DM-5766 does not need daf_ingest then please clarify that ticket.
As for the name, it was chosen based on the datarel code used to ingest data releases.
|Watchers||John Swinbank, Kian-Tat Lim, Merlin Fisher-Levine, Paul Price, Tim Jenness [ John Swinbank, Kian-Tat Lim, Merlin Fisher-Levine, Paul Price, Tim Jenness ]||Fritz Mueller, John Swinbank, Kian-Tat Lim, Merlin Fisher-Levine, Paul Price, Tim Jenness [ Fritz Mueller, John Swinbank, Kian-Tat Lim, Merlin Fisher-Levine, Paul Price, Tim Jenness ]|
See also this confluence page: https://confluence.lsstcorp.org/display/SQRE/Bulge+Survey+Processing where daf_ingest was used and a comment was made that it is not yet part of lsst_distrib.
Also, the rename of the ingestion parts of datarel to daf_ingest was mentioned in the stack history page: https://confluence.lsstcorp.org/display/DM/DM+Stack+Package+History (making it a bit odd that the rename lost that code from lsst_distrib).
daf_ingest is also mentioned in our "stack tour" at https://confluence.lsstcorp.org/display/LSWUG/Tour+of+the+Software+Stack
See also this community post: https://community.lsst.org/t/database-ingest-of-exposure-metadata/481
DM-5766 is a good argument; I'd forgotten that daf_ingest was a prerequisite for that. I think we should encourage the relevant product owner (hi, Jim Bosch) to comment on its relative priority, but I do think that the spatial indexing is likely something we should carefully preserve.
I'm less clear about the catalogue ingestion code. I got the impression (possibly mistakenly) when I talked to Fritz Mueller about database ingestion a couple of months ago that this code wasn't really on his radar, and my guess was that this code had rotted. The major driver from SciPi will likely be ingesting results from upcoming ... data goodness(? QA? science validation? I'm lost) processing: I think it's a DAX call whether or not this is the appropriate code to use for that.
I think https://community.lsst.org/t/unified-exposure-metadata/1214 is also relevant, as far as I know daf_ingest is not ingesting the exposure metadata and a unified metadata as implemented in
DM-5503 was a requirement for that.
I hope to get to
DM-5766 soon. It would be good to get it done while Yusra AlSayyad is also playing with the warping. DM-5766 adds very useful functionality I'd love to have in the stack soon. I imagine implementing that will require adding daf_ingest as a dependency of pipe_tasks, which would bring it in as a dependency of lsst_apps, not just datarel. It looks like that will bring in sphgeom as well, which I don't anticipate adding any problems.
The actual ingest-processing-to-SQL-databases code has probably bitrotted quite a bit and I don't think it has ever been fully integrated with the rest of the stack (at least not for modern pipeline outputs; it's possible it was used to ingest the old Stripe 82 processing into the PDAC database). I don't think we need that code right now (at least in Science Pipelines), but it's a very important piece of the puzzle in integrating Science Pipelines with DAX/SUIT in future incarnations of the PDAC.
I don't see a clear gain or harm from adding daf_ingest to datarel earlier than
DM-5766 (it's just unnecessary work vs. letting part of RFC-57 linger).
Making pipe_tasks depend on daf_ingest because the former is actively using functionality provided by the latter certainly sits better with me than adding the dependency to _distrib on the basis that it might be useful someday.
I'd prefer not to be adding dependencies which pull in the likely-bitrotted table-ingest code, but I don't think it's worth either trying to veto the dependency or to split daf_ingest into multiple packages on that basis.
I will withdraw this RFC on the basis that
DM-5766 will pull daf_ingest into lsst_apps when the time is right and there is no need in the short term to add it to lsst_distrib just because the code from datarel used to be in lsst_distrib.
I do not believe that
DM-5766 will require an RFC to bring in daf_ingest based on the discussion in this RFC. If you feel that it does require one we can reopen this RFC at the time.
|Resolution||Done [ 10000 ]|
|Status||Proposed [ 10805 ]||Withdrawn [ 10605 ]|