Details
-
Type:
RFC
-
Status: Adopted
-
Resolution: Unresolved
-
Component/s: DM
-
Labels:
Description
This proposal attempts to address a few very different ease-of-use, complexity, and performance problems in the butler dimensions system via a coordinated set of deprecations.
These problems include:
- In many common cases (most DRP pipeline steps after step1, I think), the main bottleneck in QuantumGraph generation is the use of isinstance checks inside DataCoordinate.___getitem__ that check to see if the key is a str or Dimension instance; we do millions of these in the course of processing query results and transforming the results into a graph, and they're all unnecessary from the standpoint that the calling code can control which it passes in.
- daf_butler maintains a lot of custom dict-like and set-like containers, in large part to support similar kinds of Dimension vs. str flexibility (though some of these are also used for flexibility between DatasetType instances and their str names, which this RFC does not address).
- The DataCoordinate class's handling of implied dimensions has long been a source of confusion (if you've ever wondered what the "..." means when converting a data ID to a string, or why "physical_filter" often doesn't appear in keys(), those are pieces of this). This comes down to a "pick any two" consistency problem between wanting DataCoordinate to formally be a Mapping, wanting DataCoordinate equality to involve only required dimensions, and wanting to allow DataCoordinate instances to hold values for implied dimensions as well.
- The DimensionGraph class has a bad name, especially given the many other graphs in the Gen3 middleware. DimensionGraph is more like sorted set (it's vaguely graph-like only in the sense that the sorting is topological), and it doesn't really provide any graph-based operations.
The classes in question here are low-level butler primitives that are absolutely ubiquitous. The changes I'd like to them probably won't disturb most of the code that uses them, but I still think they merit a deprecation period, and I'm proposing them together because I think that provides a good way to manage the transition. Here's what I'd like to change:
- Introduce a new DimensionGroup class and deprecate DimensionGraph in favor of it. Everywhere the DimensionGraph API uses a Dimension or DimensionElement instance, DimensionGroup will use only string names, but they will have the same underlying role and operations. Any other APIs that return DimensionGraph instances will also be deprecated in favor of similar ones that return DimensionGroup.
- Deprecate DataCoordinate.__getitem__ support for Dimension keys, leaving support for str only there.
- Deprecate DataCoordinate's keys, values, items, and __iter__ methods, and hence effectively deprecate its inheritance from collections.abc.Mapping.
- Similarly deprecate Mapping inheritance and DimensionElement key-lookup from DataCoordinate.record, leaving only str key lookup (note that the keys of this mapping are already available from the DataCoordinate.graph.elements, or in the future, DataCooordinate.group.elements).
- Deprecate the DataCoordinate.full view property, and instead add a true-Mapping .mapping view with str only keys that include both required and (if present) implied dimension values. Also add true Mapping DataCoordinate.required and DataCoordinate.implied views with str-only keys.
The net effect of this is that Dimension instances would largely disappear from Butler APIs and user awareness; they would be used only by mostly-internal code that needs to poke at the definitions of the dimensions, rather than merely identify them. A lot of isinstance checks and similar special-casing of types will go away. I think DataCoordinate will be a lot more intuitive - in many cases users would just do DataCoordinate.mapping to get the complete dict-like thing they usually want, but (unlike that mapping!) equality comparisons on the DataCoordinate instances themselves would behave the way we want (only check required dimensions). And nobody will be left wondering what kind of graph a DimensionGraph is.
I am not certain I will dive into implementing this RFC as soon as it is adopted; there are a lot of middleware priorities we are juggling right now, and it may make the most sense to add just enough of the new proposed APIs to speed up QG generation first, without deprecating anything (and that wouldn't require an RFC). But I would still like to have the larger vision understood and accepted before adding more ways to do things, and it's conceivable that some of the simplifications here will look good as a first step towards other more priority-driven changes.
Escalating since this is explicitly about changing APIs that people outside of Butler are using.