Details
- Type: RFC
- Status: Withdrawn
- Resolution: Done
- Component/s: DM
- Labels: None
Description
We do not currently have a definition of the lsst_apps, lsst_obs, and lsst_distrib meta-packages in our documentation. It seems to me that these meta-packages are for the convenience of organizing the Stack code and might change from release to release, so they should be defined in pipelines.lsst.io. (A possible alternative might be to define them by fiat in LDM-148, but I think that is less desirable.) I propose to do so at the beginning of https://pipelines.lsst.io/install/index.html.
Pointers to the definitions should be placed in the DM Developer Guide developer.lsst.io, including Adding a Package, the lsstsw documentation, and even the Python Style Guide.
Proposed definitions:
- lsst_apps contains the Science Pipelines algorithmic code that we expect to use in production and its dependencies, including any packages needed to test and verify the others, such as a minimal set of obs_* observatory/camera definition packages.
- lsst_obs contains all supported obs_* packages.
- lsst_distrib contains additional LSST-supported software related to the Alert, Calibration Products, and Data Release Productions, including the following not in lsst_apps or lsst_obs:
  - Optional algorithmic code and plugins not expected to be used in production or testing
  - Production control software and configurations for organizing large-scale computations
  - Interfaces to other operational services
Attachments
- rfc.png (158 kB)
Issue Links
- blocks
  - DM-6551 Please make an lsst_x package that includes all the packages we support (Won't Fix)
  - RFC-351 Top-level Packages for Productions (Withdrawn)
- relates to
  - RFC-251 Make obs_cfht (and other obs_* packages) installable via eups distrib install (Implemented)
  - RFC-351 Top-level Packages for Productions (Withdrawn)
Activity
Other than Tim Jenness's comment, I think I'm ok with these definitions, though some better definition of "minimal set of obs_* packages" might be helpful.
Modified the definitions above to try to incorporate Tim's comment. The idea of the "minimal set of obs_* packages" is that it is whatever is needed to test everything else. I'm not sure how to state it more clearly.
Should we just rely on "everything else's" eups tables to define that then?
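One way to read this suggestion is sketched below: treat the "minimal set" as whatever obs_* packages the non-obs packages already declare as dependencies. This is only an illustration under stated assumptions. The `DEPENDS` mapping is a hypothetical stand-in for what would be read from each package's eups table file, and every package name and edge in it is made up.

```python
# Hypothetical dependency declarations, standing in for what each package's
# eups table file would list. All names and edges are invented for illustration.
DEPENDS = {
    "algo_a": ["core", "obs_cam1"],
    "algo_b": ["algo_a"],
    "core": [],
    "obs_cam1": ["core"],
    "obs_cam2": ["core"],
}


def minimal_obs_set(depends):
    """Collect the obs_* packages that some non-obs package depends on directly."""
    needed = set()
    for package, deps in depends.items():
        if package.startswith("obs_"):
            # Dependencies declared by obs_* packages themselves are not counted.
            continue
        needed.update(dep for dep in deps if dep.startswith("obs_"))
    return needed


print(sorted(minimal_obs_set(DEPENDS)))
```

In this toy data only `obs_cam1` is selected, because nothing outside the obs_* packages asks for `obs_cam2`. The sketch ignores obs_* packages that are needed only by other obs_* packages, which a real definition might have to address.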
As I mentioned in slack, I think it's important to have a consistent story for what packages we tell new (astronomer) users to install. With the current proposed delineations, I would probably say "lsst_apps + lsst_obs", which seems acceptably simple. So I would be ok with this proposal as is, but I'm not sure why we wouldn't just include all supported obs_* packages in lsst_apps (where the word "supported" serves as the gatekeeper).
RFC-251 defined lsst_obs in a flexible way to indicate that contents can vary between releases depending on breakage and demand. In particular it was designed to be a weaker statement than "supported", rather it implied "will not break things".
I'd like to clarify that "in production" for lsst_apps means "a production using the stack we have today", not something about operations. We have several meas_extensions packages that will be obsoleted by operations but are important stand-ins for functionality we haven't developed yet, and (largely to satisfy Colin's "consistent story for users") I think it's important those go in lsst_apps until they are obsoleted.
What's the intended audience for lsst_apps? Specifically: who wants to install the algorithmic code that will be used "in production" (noting Jim's caveat), but wants neither the additional science algorithms (from which I surmise that they are not a science user) nor the control software and interfaces (from which I surmise that they are not setting up a production system)?
Building on the above, the downside of including a few additional algorithmic packages (basically, meas_extensions_foo) in _apps seems minimal: why not just have all science code in _apps?
Additionally:
- According to DM-8256, lsst_obs simply contains all the obs packages we could find, with no implication they are supported (indeed, with an explicit statement that we'll drop them from the meta-package if they're broken).
- There are currently packages which exist in the "lsst" GitHub org and which are clearly science algorithms, but are not part of any meta-package. meas_extensions_ngmix is the first example that comes to mind. Should we take this opportunity to clarify what their status is? I'd expected deprecated code to live in "lsst-dm", and supported code to be in _apps or _distrib.
meas_extensions_ngmix probably does just belong with deprecated code in lsst-dm. It never got polished/useful enough to be something I'd want to support science users using, and I don't see us burning effort to change that (work on making other parts of ngmix usable with the stack will proceed from other angles).
meas_extensions_simpleShape occupies a somewhat interesting category of not being that useful in general productions but having a specific important role in processing HSC focus sensors. It's harmless (easy to maintain and no onerous dependencies) and I could imagine it being useful to science users in other rare cases. Given the harmlessness, it seems like lsst_apps is the right choice, but we may want to think about where we'd put hypothetical code that e.g. did have onerous dependencies and a similarly peripheral role.
There's also meas_artifacts, which is not yet ready for production use/support but might benefit from being CI'd to avoid accumulating bitrot, as it shows promise as something we might want to use in the future. I think it just belongs in lsst-dm without being in a metapackage, but it's worth thinking about.
All of the other meas_extensions packages I can think of clearly belong in lsst_apps.
Kian-Tat Lim I think we should also mention explicitly where the lsst_sims package goes. I assume it will be in lsst_distrib, but it would be good to be explicit.
First I've heard that lsst_sims would be in lsst_distrib. It could do that in the distant future when development cycles are merged and we all agree to fix things in sims. Sims doesn't work with pybind11 at the moment so it being in lsst_distrib would have really hurt us recently.
O.K. Forget what I just said. I thought one of the metapackages included lsst_sims, but it is itself a meta package. Sorry for the noise.
Did we converge?
Do we need more time on this? How does our lsst_apps definition relate to RFC-97?
OK, after a bit of a re-think based on the above comments, here is a new proposal based on functional lines:
For any given release/tag/branch:
- lsst_obs contains all camera-specific packages (mappers, mapper configurations, camera definitions, etc.) that are tested to work with other packages in the same release/tag/branch.
- lsst_apps contains all Science Pipelines code (and its dependencies) that is expected to be used for normal processing of data. All packages of the same release/tag/branch must work together. lsst_apps includes lsst_obs.
- lsst_extensions (new) contains Science Pipelines code that is expected to be used only in specialized cases. These packages may have unusual or extensive dependencies, may be used only for certain kinds of testing, etc. All packages of the same release/tag/branch must work together and with lsst_apps at the same release/tag/branch.
- lsst_distrib contains code that implements a production environment or services that depend on one or more components of lsst_apps. All packages of the same release/tag/branch must work together, and thus this is a top-level product for CI. lsst_distrib includes lsst_apps and lsst_extensions.
- lsst_release (new) contains additional code distributed by LSST DM (whether written by us or not) that is guaranteed to work for a given release but not necessarily tag/branch. In particular, dependencies on lsst_distrib packages may have fixed versions after a release – they need not work against the master branch. Note that dependencies are not allowed to stagnate and rely on outdated code, but they don't need to be continually updated. Packages within this metapackage are their own top-level products for CI. lsst_distrib is one of them.
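The inclusion relations in this proposal can be sketched as a small graph, which makes it easy to see what each metapackage pulls in transitively. This is a minimal illustration, not an implementation of anything in the Stack: the `INCLUDES` mapping and the `expand` helper are hypothetical names, and only the metapackage-to-metapackage relations stated in the list above are encoded (individual member packages are omitted).

```python
# Hypothetical model of the metapackage inclusion relations proposed above.
# Only the "includes"/"contains" edges stated in the proposal are encoded.
INCLUDES = {
    "lsst_obs": set(),
    "lsst_extensions": set(),
    "lsst_apps": {"lsst_obs"},
    "lsst_distrib": {"lsst_apps", "lsst_extensions"},
    "lsst_release": {"lsst_distrib"},
}


def expand(metapackage, includes=INCLUDES):
    """Return every metapackage reachable from `metapackage`, excluding itself."""
    seen = set()
    stack = list(includes[metapackage])
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(includes[name])
    return seen


for name in sorted(INCLUDES):
    print(name, "->", sorted(expand(name)))
```

Under these assumptions, building lsst_distrib would transitively cover lsst_apps, lsst_extensions, and lsst_obs, which matches its described role as a top-level product for CI.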
Other lsst_* packages:
- lsst_build and lsst_sims are separate products and are not contained within lsst_release.
- lsst_py3 is moved to legacy.
- lsst_libs and lsst_thirdparty are (at last) moved to legacy.
- lsst_distrib_tool is also moved to legacy unless someone wants to claim and save it.
- lsst_dm_stack_demo remains as is for now.
- lsst_ci contains data for additional quick testing of various cameras. I think this should become an additional default Jenkins top-level product but not part of lsst_distrib.
- lsst_qa contains data and tests for additional slower testing of various cameras. I think this should become an additional scheduled Jenkins top-level product.
I am asking for a stay to this RFC.
I am working on a counterproposal but won't be able to get to it before the DMLT.
Given Frossie Economou's comment above, let's bump the end date of this RFC to late June. After that, I think we have to move ahead: we can't stay blocked forever.
This is closely coupled to RFC-351: setting due date to match.
This RFC is past due. Can we lock Kian-Tat Lim, Frossie Economou, and Jim Bosch in a room until we get a consistent answer to this and RFC-351?
Changed end to the week after AHM. Frossie Economou and Kian-Tat Lim will discuss.
I am changing the end date to next week. I'm not entirely convinced that will help.
Are we changing the planned end again?
Kian-Tat Lim what's the plan?
I've been waiting for the "counterproposal" from Frossie Economou, but otherwise I think I stated a clear position above. If anyone wants to set up a meeting to discuss, I'm willing to attend.
I propose we
1. eliminate lsst_distrib
2. thin out the core packages in preparation for being able to do operational deployments of smaller and smaller discrete components
3. make a clear distinction for deployment, testing and verification between the data reduction apps, the services, and the control layers.
An example of how this could be done is attached to this ticket.
For the big bubble containing inter alia lsst_algorithms Simon Krughoff tells me it is plausible that we could break it down further to its included components.
Note that a corollary of this proposal is that only the blue components need to share a single build system, other components may pick other more appropriate tooling for dependency management and have different dependencies if necessary.
From a user docs perspective, I like Frossie's proposal. The central dotted region defines the software that is documented on "pipelines.lsst.io" itself. Each of the purple bubbles would also get developer-level codebase docs, and now it's clear that they're separate projects.
I think I like Frossie's proposal, but from a developer perspective, what would be the thing I rebuild to get "all the relevant packages" that I might want to touch as a developer, or that I'd want to test in Jenkins? lsst_distrib is a nice all-encompassing package. Would it be lsst_apps_opt? That's not an obvious name for a "test all the things" package, to me.
Given the layout of the diagram, does this imply that lsst_apps would not pull down any of lsst_testdata and thus, e.g., afwdata, testdata_cfht or testdata_jointcal? Is there a missing dependency arrow there, or does lsst_testdata contain the validation_data_* datasets?
At first glance, I'm not sure it's obvious whether existing packages fall into the green or blue areas. If, for example, pipe_tasks is part of lsst_pipeline, then it might be difficult for ctrl_pool to be under lsst_mpi.
In addition,
"Note that a corollary of this proposal is that only the blue components need to share a single build system, other components may pick other more appropriate tooling for dependency management and have different dependencies if necessary."
If the purple or green components use code from the blue, it may be difficult for them to avoid using the same package infrastructure (unless we can provide multiple packaging mechanisms for those blue components).
Kian-Tat Lim the breakdown in that diagram is not intended to be a definitive proposal. It's more to indicate that there are already logical groups of packages. As you point out, there are some packages that are ambiguous about where they reside. I would suggest those indicate instances where we should consider breaking the package along architectural lines. I also think there are relatively few of those (less than 5).
The point of the proposal is that there are logical groupings of packages that have interesting meaning in the context of development and deployment, so we should think about possibly making those logical groupings explicit. Are you lodging a complaint about the structure of Frossie Economou's proposal, or just pointing out that we need to think about it a little more?
I find it hard to understand what this proposal is without some written definitions of the different meta-packages.
(side note, lsst_apps might as well be renamed lsst_pipelines)
I think this is fine as a meta-proposal, but the devil may be in the details.
"I find it hard to understand what this proposal is without some written definitions of the different meta-packages."
I think we can provide that. We just didn't have time to figure out the exact breaking points.
It would be nice to see a hierarchy like the one suggested in the diagram ultimately reflected in the product tree in LDM-148. If there is a group of meta-packages, this should be at least conceptually described there, possibly with pointers to pipelines.lsst.io. I suggest we start getting some descriptions in place to focus the discussion. I would start in LDM-148 but am happy to see it elsewhere. One may consider later whether those repos which are "meta" associated should become consolidated.
Where do we want to go with this RFC? Are we adopting with some triggered tickets to do the reorg?
My understanding is that we're waiting on a more detailed proposal from Frossie Economou & Simon Krughoff before we are ready to adopt (or otherwise) this.
OK, sorry. I didn't realize we were holding things up (though I obviously did volunteer to put something together). I'll try to do that this week.
I am hereby unvolunteering Simon Krughoff
This seems to be a problem that is most appropriate for the new release manager to tackle (who is not SQuaRE). If we need to resolve this on a shorter timeline perhaps architecture can field something more tuned to the operational release and deployment cadences?
I just want to deprecate lsst_distrib. How it is done I have no strong opinion on.
We've had problems achieving consensus on this and do not have sufficient cycles (particularly in SQuaRE) to resolve this at this point. We will wait to rethink this until the new Release Manager comes on board to offer fresh eyes and energy.
"contains all LSST-supported software" is too broad because then you have to wonder why DAX and Qserv (for example) aren't included.