The proposals in this RFC are split into three stages:
- Stage 1 proposals are fully backwards compatible (and essentially already implemented on DM-19392) but make design choices at least partly predicated on additional backwards-incompatible (Stage 2) changes. I would like at least some scaled down version of the Stage 1 proposal for moderately high-priority DM work (
DM-18610). The rest can be deferred indefinitely, or until I have weekend/20%-time to work on it.
- Stage 2 proposals are backwards incompatible at a manageable level, either because a deprecation-based migration seems doable or because current (especially external) dependence on these existing APIs or behavior is essentially nonexistent.
- Stage 3 proposals are not actually being proposed in this RFC; they're either notes for future potential RFCs, or they're they're things I wish we could have done differently but feel we are now essentially locked into. I'd be happy for others to convince me I'm wrong about the latter (especially on future RFCs, or their operations-era equivalent).
(all Stage 1)
IntervalI and IntervalD are 1-d counterparts to Box2I and Box2D, which can now be constructed from a pair of x and y intervals, and have accessors to return them. Many box methods are now implemented by delegating to interval implementations, and I hope more can in the future (more on this below).
The APIs and behavior of intervals are not as similar to those of boxes as they could be, however, because they're instead more similar to those of the sphgeom library, and I'm proposing here that we use those instead as a model for how we should (carefully!) evolve the box APIs.
These differences include:
- IntervalD (like IntervalI, Box2I, and all box and interval classes in sphgeom) is defined as a closed interval, with both endpoints considered to be contained by the interval, while Box2D is defined to be half-open, with the minimum included and the maximum excluded. A closed interval permits non-empty zero-size intervals (i.e. points) to be represented exactly, and the symmetry simplifies the implementation.
- In C++, every interval mutator has a factory variant that returns a new object instead, with names differing in the tense of the verb (e.g. "expandTo" and "expandedTo"). The names of the methods themselves come from or are inspired by sphgeom's more precise method names. The current box API has only the mutator forms of these methods, with names that do not lend themselves as well to adding factory variants.
- In Python, only the factory variants are available, making the interval classes unmodifiable from Python, and hence safe to use as the return value of Python properties. The new getX and getY interval accessors on the box classes thus are accompanied by .x and .y read-only properties. Most interval accessors (which return int or float) also have associated properties.
- In C++, interval construction primarily goes through static methods (i.e. the named constructor idiom) like fromMinMax, fromMinSize, and fromCenterSize. The box classes have a static method (makeCenteredBox) for one set of arguments and use regular constructors for the rest (though they can get away with this because Point and Extent have different types, while the bounds and size of an interval do not).
- In Python, interval construction goes through a single constructor that requires all arguments to be passed as keywords, allowing it to take any sane combination of min, max, center, and size.
Making the interval classes intentionally similar to sphgeom and intentionally different from the current box classes only makes sense if we can update the Box, Point, and Extent classes:
- We should add the sphgeom/interval mutators and factory methods (erode[d]By, dilate[d]By, shift[ed]By, reflect[ed]About, expand[ed]To, clip[ped]To) to the box classes (Stage 1). This would mean we would have multiple, differently-named ways to perform some operations, as I consider removing the old methods (grow, shift, flipLR, FlipTB, include, clip) - or at least some of them - to be infeasible (i.e., Stage 3), though I would support documentation-level statements that the new methods are preferred in new code.
- We should add read-only properties for all scalar and interval accessors to Box, Point, and Extent (Stage 1).
- We should deprecate and ultimately remove all mutators (namely setX and setY) from Point and Extent in Python, eventually allowing them to be unmodifiable from Python and usable in properties (Stage 2).
- We should redefine Box2D to be a closed geometry (Stage 2), allowing it to delegate more to IntervalD in its implementation (and have better, less surprising behavior, IMO). I do not believe a deprecation period for this change is doable, but I don't think any downstream non-unittest code will notice the change in behavior.
- We should make box "emptiness" separable (Stage 2), allowing a box to be empty in one dimension without forgetting its extent in the other. Such a box would still be considered empty overall, but it would allow even more delegation from box to interval (in fact, the box implementation could become essentially trivial), and more importantly it permits a box to represent an interval exactly at one limit, just as an interval can represent a point at one limit. I do not believe a deprecation period for this change is doable, but I don't think any downstream non-unittest code will notice the change in behavior.
- We should rename makeCenteredBox to fromCenterSize (in C++), after a deprecation period, to match Interval's C++ API (Stage 2), and in Python support this operation via keyword arguments to the constructor (Stage 2). Making all box construction in Python require keyword arguments is probably infeasible and unnecessary given the strong typing of Point and Extent, though requiring them would make code easier to read (Stage 3).
DM-19392 adds several Python helper methods to several geom classes to make them more usable with NumPy. Most of these are straightforward additions (Stage 1):
- IntervalI is iterable (over the points within it), and also has an arange method to return those points in a 1-d numpy.ndarray.
- Box2I similarly has a grid method that returns a pair of 2-d numpy.ndarrays with the x and y coordinates within it (yes, just numpy.meshgrid on Interval.arange).
- All intervals, boxes, and sphgeom regions now have a vectorized contains method that takes separate arrays of x, y, and (for sphgeom) z and returns a bool array. These should work on arrays of any shape, and return an array of the same shape.
- LinearTransform and AffineTransform now have vectorized function call operators that take separate arrays of x and y and return separate arrays for transformed x and y.
- SpherePoint now has a toUnitXYZ static method that transforms arrays of longitude and latitude (with a required AngleUnit argument) to the x, y, z
unit-vector arrays expected by the sphgeom region classes.
These additions are worth mentioning in this RFC because they imply that several other vectorized operations should be available, but those are a bit more problematic:
- The number of input and output dimensions in an afw.geom.Transform may (with a generic endpoint) not be known until runtime, making it at least difficult to provide APIs that take a fixed number of single-coordinate arrays without (possibly breaking) changes to the class (Stage 2, at least for TransformPoint2ToPoint2I).
- We have two Angle classes - one in sphgeom and one in geom - but neither provides a safe way to represent an array of angles (they are also both mutable in Python, which is also unfortunate). At least some of these problems should probably be solved - perhaps by using astropy objects - before adding methods that accept or return vectorized arrays of angles to SkyWcs or sphgeom regions. (Note that these would differ from the existing vectorized methods in SkyWcs that operate on lists/vectors of SpherePoints; those are still difficult and inefficient if what you have are column arrays from afw.table, astropy.table, or Pandas, as is often the case).
Much of this RFC concerns additions to the Python API at the pybind11 layer or other intentional differences between C++ and Python APIs, and at present there's no way for users to find out about them. Even in a world in which we have tooling to create Sphinx docs from Doxygen content, documenting these Python interfaces precisely will be difficult.
Because the geom package is so central and ubiquitous, I think it is reasonable to consider writing completely distinct Python docstrings (in the pybind11 source files) in this package, and living with whatever duplication that entails. If others agree, I believe there are no technical obstacles to us starting to do that now (let's call that Stage 2). But when mechanisms to use Doxygen to provide Sphinx docstrings do arrive, those mechanisms would need to support having distinct Python and C++ documentation for the same class in some cases.