Details
Type: RFC
Status: Implemented
Resolution: Done
Component/s: DM
Labels: None
Description
Currently we store Task metadata in a PropertySet that contains PropertyList. This was done mostly because it was convenient at the time and supported some nice features like add() and splitting keys into a hierarchy on a period.
The downside of using PropertySet is that it makes pipe_base depend on daf_base and therefore requires a C++ compiler. This makes it effectively impossible for Task to be used outside the project (see also RFC-775 for related discussion on the dependencies of pipe_base).
Given that Task metadata is never passed into C++ code, there is no reason we have to use a specialist C++ class to implement it.
We propose that Task metadata be changed to use a pure Python Mapping class which reimplements the add() and single-level dot hierarchy but otherwise follows the standard dict-like interface.
As an interim measure, this new TaskMetadata class will support other PropertySet methods, but they will issue deprecation warnings and will be removed over time. In particular, code that calls .set() can already be replaced with standard dict [] assignment syntax.
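To make the proposal concrete, here is a minimal sketch of what such a mapping could look like. It is not the prototype mentioned in this RFC: the class name TaskMetadataSketch, the helper _locate, and the reading of "single-level" as "split on the first period only" are all assumptions made for illustration.

```python
import warnings
from collections.abc import MutableMapping


class TaskMetadataSketch(MutableMapping):
    """Hypothetical stand-in for the proposed TaskMetadata class."""

    def __init__(self):
        self._data = {}

    def _locate(self, key, create=False):
        """Return (dict, leaf name), splitting the key on its first period only."""
        parent, sep, leaf = key.partition(".")
        if not sep:
            return self._data, key
        if create:
            child = self._data.setdefault(parent, {})
        else:
            child = self._data[parent]  # KeyError if the parent is missing
        if not isinstance(child, dict):
            raise KeyError(key)
        return child, leaf

    def __getitem__(self, key):
        store, leaf = self._locate(key)
        return store[leaf]

    def __setitem__(self, key, value):
        store, leaf = self._locate(key, create=True)
        store[leaf] = value

    def __delitem__(self, key):
        store, leaf = self._locate(key)
        del store[leaf]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def add(self, key, value):
        """Append value to whatever is already stored under key."""
        store, leaf = self._locate(key, create=True)
        if leaf not in store:
            store[leaf] = value
        elif isinstance(store[leaf], list):
            store[leaf].append(value)
        else:
            store[leaf] = [store[leaf], value]

    def set(self, key, value):
        """Deprecated PropertySet-style setter; prefer md[key] = value."""
        warnings.warn("set() is deprecated; use md[key] = value",
                      DeprecationWarning, stacklevel=2)
        self[key] = value
```

With this sketch, md["task.runtime"] = 1.5 stores the value under md["task"]["runtime"], repeated md.add("task.note", ...) calls accumulate a list, and md.set(...) still works but emits a DeprecationWarning.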
A simple TaskMetadata prototype has already been written that builds lsst_distrib and ci_hsc_gen3.
During the transition, new and old forms of metadata will need to coexist in single repositories. The simplest approach would be to adopt a new dataset type name. If continuity is required, a migration script could be written to change the definition in existing repositories, and a formatter could be constructed that recognizes PropertySet and casts it to TaskMetadata. I would welcome opinions on this in the RFC.
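The "recognize the old form and cast" idea could be sketched roughly as follows. Everything here is hypothetical: LegacyMetadata is a toy stand-in rather than the real PropertySet API, to_new_metadata is an invented helper, and a plain nested dict stands in for TaskMetadata. The real formatter would also have to plug into the butler's formatter machinery, which this sketch ignores.

```python
from collections.abc import Mapping


class LegacyMetadata:
    """Hypothetical stand-in for a PropertySet-like object (not the real API)."""

    def __init__(self, values):
        self._values = dict(values)

    def names(self):
        # Flat, period-separated names such as "task.runtime".
        return list(self._values)

    def get(self, name):
        return self._values[name]


def to_new_metadata(obj):
    """Return metadata as a nested dict regardless of the stored form."""
    if isinstance(obj, Mapping):
        # Already in the new, dict-like form.
        return dict(obj)
    if isinstance(obj, LegacyMetadata):
        result = {}
        for name in obj.names():
            # Split on the first period only (assumed single-level hierarchy).
            parent, sep, leaf = name.partition(".")
            if sep:
                result.setdefault(parent, {})[leaf] = obj.get(name)
            else:
                result[name] = obj.get(name)
        return result
    raise TypeError(f"Unsupported metadata type: {type(obj).__name__}")
```

Calling code would then only ever see the new form, whichever form was stored in the repository.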
In conjunction with RFC-782 this should remove all C++ dependencies from the Task infrastructure.
Issue Links
- is triggering
  - DM-32682 Create TaskMetadata class to replace PropertySet in pipe_base (Done)
  - DM-32883 Investigate PropertySet to TaskMetadata migration (Done)
  - DM-33155 Investigate dynamic task metadata type selection in pipelines (Done)
- relates to
  - DM-33220 lsst.verify tasks assume metadata is PropertySet (Done)
  - RFC-804 Make dataset type names non-unique in data repositories (Withdrawn)
  - RFC-775 Reorganize pipelines and packages at the top of the Science Pipelines codebase (Adopted)
  - RFC-782 Remove Task dependency on lsst.log (Implemented)
Ah, I wasn't thinking about the in-memory type at all, and yes, that's important to figure out. It'd be ideal if we could switch to using TaskMetadata in code all the time, and have the backwards-compatibility layer kick in on write (as well as in dataset type registration). I'll take a quick look at the various code paths tomorrow and see how hard it looks.