Uploaded image for project: 'Request For Comments'
  1. Request For Comments
  2. RFC-593

Interval, sphgeom API consistency, and other enhancements to the geom package

    XMLWordPrintable

    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM
    • Labels:
      None

      Description

      Proposal Stages

      The proposals in this RFC are split into three stages:

      • Stage 1 proposals are fully backwards compatible (and essentially already implemented on DM-19392) but make design choices at least partly predicated on additional backwards-incompatible (Stage 2) changes.  I would like at least some scaled down version of the Stage 1 proposal for moderately high-priority DM work (DM-18610).  The rest can be deferred indefinitely, or until I have weekend/20%-time to work on it.
      • Stage 2 proposals are backwards incompatible at a manageable level, either because a deprecation-based migration seems doable or because current (especially external) dependence on these existing APIs or behavior is essentially nonexistent.
      • Stage 3 proposals are not actually being proposed in this RFC; they're either notes for future potential RFCs, or they're they're things I wish we could have done differently but feel we are now essentially locked into. I'd be happy for others to convince me I'm wrong about the latter (especially on future RFCs, or their operations-era equivalent).

      Interval Classes

      (all Stage 1)

      IntervalI and IntervalD are 1-d counterparts to Box2I and Box2D, which can now be constructed from a pair of x and y intervals, and have accessors to return them. Many box methods are now implemented by delegating to interval implementations, and I hope more can in the future (more on this below).

      The APIs and behavior of intervals are not as similar to those of boxes as they could be, however, because they're instead more similar to those of the sphgeom library, and I'm proposing here that we use those instead as a model for how we should (carefully!) evolve the box APIs.

      These differences include:

      • IntervalD (like IntervalI, Box2I, and all box and interval classes in sphgeom) is defined as a closed interval, with both endpoints considered to be contained by the interval, while Box2D is defined to be half-open, with the minimum included and the maximum excluded. A closed interval permits non-empty zero-size intervals (i.e. points) to be represented exactly, and the symmetry simplifies the implementation.
      • In C++, every interval mutator has a factory variant that returns a new object instead, with names differing in the tense of the verb (e.g. "expandTo" and "expandedTo"). The names of the methods themselves come from or are inspired by sphgeom's more precise method names. The current box API has only the mutator forms of these methods, with names that do not lend themselves as well to adding factory variants.
      • In Python, only the factory variants are available, making the interval classes unmodifiable from Python, and hence safe to use as the return value of Python properties. The new getX and getY interval accessors on the box classes thus are accompanied by .x and .y read-only properties. Most interval accessors (which return int or float) also have associated properties.
      • In C++, interval construction primarily goes through static methods (i.e. the named constructor idiom) like fromMinMax, fromMinSize, and fromCenterSize. The box classes have a static method (makeCenteredBox) for one set of arguments and use regular constructors for the rest (though they can get away with this because Point and Extent have different types, while the bounds and size of an interval do not).
      • In Python, interval construction goes through a single constructor that requires all arguments to be passed as keywords, allowing it to take any sane combination of min, max, center, and size.

      Changes to Box and Point/Extent Classes

      Making the interval classes intentionally similar to sphgeom and intentionally different from the current box classes only makes sense if we can update the Box, Point, and Extent classes:

      • We should add the sphgeom/interval mutators and factory methods (erode[d]By, dilate[d]By, shift[ed]By, reflect[ed]About, expand[ed]To, clip[ped]To) to the box classes (Stage 1). This would mean we would have multiple, differently-named ways to perform some operations, as I consider removing the old methods (grow, shift, flipLR, FlipTB, include, clip) - or at least some of them - to be infeasible (i.e., Stage 3), though I would support documentation-level statements that the new methods are preferred in new code.
      • We should add read-only properties for all scalar and interval accessors to Box, Point, and Extent (Stage 1).
      • We should deprecate and ultimately remove all mutators (namely setX and setY) from Point and Extent in Python, eventually allowing them to be unmodifiable from Python and usable in properties (Stage 2).
      • We should redefine Box2D to be a closed geometry (Stage 2), allowing it to delegate more to IntervalD in its implementation (and have better, less surprising behavior, IMO). I do not believe a deprecation period for this change is doable, but I don't think any downstream non-unittest code will notice the change in behavior.
      • We should make box "emptiness" separable (Stage 2), allowing a box to be empty in one dimension without forgetting its extent in the other. Such a box would still be considered empty overall, but it would allow even more delegation from box to interval (in fact, the box implementation could become essentially trivial), and more importantly it permits a box to represent an interval exactly at one limit, just as an interval can represent a point at one limit. I do not believe a deprecation period for this change is doable, but I don't think any downstream non-unittest code will notice the change in behavior.
      • We should rename makeCenteredBox to fromCenterSize (in C++), after a deprecation period, to match Interval's C++ API (Stage 2), and in Python support this operation via keyword arguments to the constructor (Stage 2). Making all box construction in Python require keyword arguments is probably infeasible and unnecessary given the strong typing of Point and Extent, though requiring them would make code easier to read (Stage 3).

      NumPy Integration and Angle

      DM-19392 adds several Python helper methods to several geom classes to make them more usable with NumPy. Most of these are straightforward additions (Stage 1):

      • IntervalI is iterable (over the points within it), and also has an arange method to return those points in a 1-d numpy.ndarray.
      • Box2I similarly has a grid method that returns a pair of 2-d numpy.ndarrays with the x and y coordinates within it (yes, just numpy.meshgrid on Interval.arange).
      • All intervals, boxes, and sphgeom regions now have a vectorized contains method that takes separate arrays of x, y, and (for sphgeom) z and returns a bool array. These should work on arrays of any shape, and return an array of the same shape.
      • LinearTransform and AffineTransform now have vectorized function call operators that take separate arrays of x and y and return separate arrays for transformed x and y.
      • SpherePoint now has a toUnitXYZ static method that transforms arrays of longitude and latitude (with a required AngleUnit argument) to the x, y, z
        unit-vector arrays expected by the sphgeom region classes.

      These additions are worth mentioning in this RFC because they imply that several other vectorized operations should be available, but those are a bit more problematic:

      • The number of input and output dimensions in an afw.geom.Transform may (with a generic endpoint) not be known until runtime, making it at least difficult to provide APIs that take a fixed number of single-coordinate arrays without (possibly breaking) changes to the class (Stage 2, at least for TransformPoint2ToPoint2I).
      • We have two Angle classes - one in sphgeom and one in geom - but neither provides a safe way to represent an array of angles (they are also both mutable in Python, which is also unfortunate). At least some of these problems should probably be solved - perhaps by using astropy objects - before adding methods that accept or return vectorized arrays of angles to SkyWcs or sphgeom regions. (Note that these would differ from the existing vectorized methods in SkyWcs that operate on lists/vectors of SpherePoints; those are still difficult and inefficient if what you have are column arrays from afw.table, astropy.table, or Pandas, as is often the case).

      On Documentation

      Much of this RFC concerns additions to the Python API at the pybind11 layer or other intentional differences between C++ and Python APIs, and at present there's no way for users to find out about them.  Even in a world in which we have tooling to create Sphinx docs from Doxygen content, documenting these Python interfaces precisely will be difficult.

      Because the geom package is so central and ubiquitous, I think it is reasonable to consider writing completely distinct Python docstrings (in the pybind11 source files) in this package, and living with whatever duplication that entails. If others agree, I believe there are no technical obstacles to us starting to do that now (let's call that Stage 2). But when mechanisms to use Doxygen to provide Sphinx docstrings do arrive,  those mechanisms would need to support having distinct Python and C++ documentation for the same class in some cases.

        Attachments

          Issue Links

            Activity

            Hide
            swinbank John Swinbank added a comment -

            Can you comment on the relationship between Interval and Span? Obviously they aren't the same in detail, but superficially they seem to be working in the same problem space — do we need both?

            Show
            swinbank John Swinbank added a comment - Can you comment on the relationship between Interval and Span ? Obviously they aren't the same in detail, but superficially they seem to be working in the same problem space — do we need both?
            Hide
            jbosch Jim Bosch added a comment - - edited

            A Span knows its position in y, and is explicitly a range of x (and never y).  It is conceptually a y scalar coordinate and an x Interval, or an (x, y) Point and x size.  I forgot to mention above that on DM-19392 I have also added an interval constructor and accessor to Span, and it would probably make sense to go a step further also give the same treatment proposed for Box in terms of sphgeom-like mutators/factories and possibly consider Python-side immutability.  It would also probably benefit from being reimplemented explicitly in terms of Interval.

            Show
            jbosch Jim Bosch added a comment - - edited A Span knows its position in y, and is explicitly a range of x (and never y).  It is conceptually a y scalar coordinate and an x Interval, or an (x, y) Point and x size.  I forgot to mention above that on DM-19392 I have also added an interval constructor and accessor to Span, and it would probably make sense to go a step further also give the same treatment proposed for Box in terms of sphgeom-like mutators/factories and possibly consider Python-side immutability.  It would also probably benefit from being reimplemented explicitly in terms of Interval.
            Hide
            tjenness Tim Jenness added a comment -

            Is it still the case that we don't support 64-bit int for any of these things? That seems like an oversight in this day and age. Python has no problem with larger ints and in Python a BoxI could be whatever "I" we wanted. Is the only solution to add new templates for K (PointK, Box2K etc) to the API? (K is the FITS character for 64-bit int).

            Show
            tjenness Tim Jenness added a comment - Is it still the case that we don't support 64-bit int for any of these things? That seems like an oversight in this day and age. Python has no problem with larger ints and in Python a BoxI could be whatever "I" we wanted. Is the only solution to add new templates for K (PointK, Box2K etc) to the API? (K is the FITS character for 64-bit int).
            Hide
            rowen Russell Owen added a comment -

            I am concerned about using closed ranges for float intervals and boxes. With an open end it is possible for a set of intervals or boxes to exactly cover a range or region. With a closed end there will be overlap at the boundaries.

            Show
            rowen Russell Owen added a comment - I am concerned about using closed ranges for float intervals and boxes. With an open end it is possible for a set of intervals or boxes to exactly cover a range or region. With a closed end there will be overlap at the boundaries.
            Hide
            jbosch Jim Bosch added a comment - - edited

            I think we could probably switch from 32-bit int to 64-bit int with minimal to no loss in backwards compatibility.  We do currently rely on having int64 available to the implementation as a larger type in order to detect integer overflow (especially on the DM-19392 branch), but one could argue that loss of checking for overflow is fine if we, well, never overflow.  It would be a bigger change to propagate int64 usage to the image classes, which might be necessary to actually be able to take advantage of int64 geometries; I don't think I can comment usefully without looking more carefully at the code how hard that would be.

             

            I am concerned about using closed ranges for float intervals and boxes. With an open end it is possible for a set of intervals or boxes to exactly cover a range or region. With a closed end there will be overlap at the boundaries.

            This is indeed the big drawback of closed floating-point geometries.  I think I've just come to the conclusion that we simply don't have a use case for tiling a region with floating-point boxes, as any time we want exact, exclusive tiling we're really in an integer domain and can use Box2I/IntervalI.  In any case I wasn't able to come up with a use case.

            Show
            jbosch Jim Bosch added a comment - - edited I think we could probably switch from 32-bit int to 64-bit int with minimal to no loss in backwards compatibility.  We do currently rely on having int64 available to the implementation as a larger type in order to detect integer overflow (especially on the DM-19392 branch), but one could argue that loss of checking for overflow is fine if we, well, never overflow.  It would be a bigger change to propagate int64 usage to the image classes, which might be necessary to actually be able to take advantage of int64 geometries; I don't think I can comment usefully without looking more carefully at the code how hard that would be.   I am concerned about using closed ranges for float intervals and boxes. With an open end it is possible for a set of intervals or boxes to exactly cover a range or region. With a closed end there will be overlap at the boundaries. This is indeed the big drawback of closed floating-point geometries.  I think I've just come to the conclusion that we simply don't have a use case for tiling a region with floating-point boxes, as any time we want exact, exclusive tiling we're really in an integer domain and can use Box2I/IntervalI.  In any case I wasn't able to come up with a use case.
            Hide
            krzys Krzysztof Findeisen added a comment -

            This proposal makes me a bit nervous. You seem to be adding a lot of complexity to the Box/Point/Interval API (especially cross C++/Python, but also within either language) without actually adding any functionality. That will make these classes harder to learn and use consistently. Can you explain how exactly these changes are necessary for the cameraGeom work you cite?

            Show
            krzys Krzysztof Findeisen added a comment - This proposal makes me a bit nervous. You seem to be adding a lot of complexity to the Box / Point / Interval API (especially cross C++/Python, but also within either language) without actually adding any functionality. That will make these classes harder to learn and use consistently. Can you explain how exactly these changes are necessary for the cameraGeom work you cite?
            Hide
            jsick Jonathan Sick added a comment -

            Commenting on the documentation angle of this, everything you've said here sounds good to me:

            • Writing Numpydoc-formatted docstrings in the pybind11 layer makes total sense and should work out-of-the-box, as you say. Given that we are increasingly more process-oriented towards changing public APIs, it's also becoming safer to "duplicate" API reference content between C++ and pybind11 (and when we say "duplicate," we really mean copy and re-write as appropriate for the Python context). I think this copy+pasting+minor rewrite is fairy low cost for DM, and produces a vastly better user experience over any other scheme I can imagine.
            • I see no issue with having API documentation for both lsst::example::MyClass and lsst.example.MyClass. Our documenation infrastructure will treat those as entirely distinct APIs. I do need to create a story to document a recommended way of cross-referencing the C++ and Python APIs.
            Show
            jsick Jonathan Sick added a comment - Commenting on the documentation angle of this, everything you've said here sounds good to me: Writing Numpydoc-formatted docstrings in the pybind11 layer makes total sense and should work out-of-the-box, as you say. Given that we are increasingly more process-oriented towards changing public APIs, it's also becoming safer to "duplicate" API reference content between C++ and pybind11 (and when we say "duplicate," we really mean copy and re-write as appropriate for the Python context). I think this copy+pasting+minor rewrite is fairy low cost for DM, and produces a vastly better user experience over any other scheme I can imagine. I see no issue with having API documentation for both  lsst::example::MyClass and lsst.example.MyClass . Our documenation infrastructure will treat those as entirely distinct APIs. I do  need to create a story to document a recommended way of cross-referencing the C++ and Python APIs.
            Hide
            jbosch Jim Bosch added a comment - - edited

            This proposal makes me a bit nervous. You seem to be adding a lot of complexity to the Box/Point/Interval API (especially cross C++/Python, but also within either language) without actually adding any functionality. That will make these classes harder to learn and use consistently. Can you explain how exactly these changes are necessary for the cameraGeom work you cite?

            It'd be fair to characterize most of the changes here as adding some (I think modest) complexity to the implementation of these classes in order to (I hope!) make their interfaces easier to learn and use consistently from Python:

            • it gives us consistent method naming and semantics for the same operations across all geom and sphgeom classes;
            • it enables natural Python/NumPy idioms (keyword arguments rather than type-based dispatch in constructors, properties, vectorized operations).

            And while the current state of DM-19392 is indeed more complex than what we had before, I think it's likely that a future (post Stage 2) state of the codebase would actually be simpler than what we have now, because the Box and Span classes would delegate trivially to Intervals for nearly every operation, guaranteeing that they have the same behavior as each other (and the same behavior as Interval, of course) instead of relying on separate implementations.

            But to answer your original question, the only genuinely new functionality here is the interval classes, which are also what I'd like for the cameraGeom work, in particular for overscan and prescan regions.  Our current practice of using independent boxes to describe regions which can only differ in one dimension (e.g. the serial overscan must have the same y extent as the data section) has led to some confusion and misuse in the obs packages, and the interfaces and particularly the implementation I have in mind to address those problems would very much benefit from having a true interval class that's more than just a pair of integers.  That said, I should clarify that none of this is strictly necessary for that cameraGeom work - adding interval and making geom APIs consistent with sphgeom APIs is something I've been mulling (and prototyping on the side) for a long time (since I reviewed Serge's first draft of sphgeom, actually), and this just seemed like a good time to finally push to get it in, as it would make that cameraGeom work easier and the outcome (IMO) better.

            Show
            jbosch Jim Bosch added a comment - - edited This proposal makes me a bit nervous. You seem to be adding a lot of complexity to the Box / Point / Interval API (especially cross C++/Python, but also within either language) without actually adding any functionality. That will make these classes harder to learn and use consistently. Can you explain how exactly these changes are necessary for the cameraGeom work you cite? It'd be fair to characterize most of the changes here as adding some (I think modest) complexity to the implementation of these classes in order to (I hope!) make their interfaces easier to learn and use consistently from Python: it gives us consistent method naming and semantics for the same operations across all geom and sphgeom classes; it enables natural Python/NumPy idioms (keyword arguments rather than type-based dispatch in constructors, properties, vectorized operations). And while the current state of DM-19392 is indeed more complex than what we had before, I think it's likely that a future (post Stage 2) state of the codebase would actually be simpler than what we have now, because the Box and Span classes would delegate trivially to Intervals for nearly every operation, guaranteeing that they have the same behavior as each other (and the same behavior as Interval, of course) instead of relying on separate implementations. But to answer your original question, the only genuinely new functionality here is the interval classes, which are also what I'd like for the cameraGeom work, in particular for overscan and prescan regions.  Our current practice of using independent boxes to describe regions which can only differ in one dimension (e.g. the serial overscan must have the same y extent as the data section) has led to some confusion and misuse in the obs packages, and the interfaces and particularly the implementation I have in mind to address those problems would very much benefit from having a true interval class that's more than just a pair of integers.  That said, I should clarify that none of this is strictly necessary for that cameraGeom work - adding interval and making geom APIs consistent with sphgeom APIs is something I've been mulling (and prototyping on the side) for a long time (since I reviewed Serge's first draft of sphgeom, actually), and this just seemed like a good time to finally push to get it in, as it would make that cameraGeom work easier and the outcome (IMO) better.
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            Sorry, but I don't see how even the Python API is less confusing under this proposal:

            • Python having only non-modifying methods would be good in isolation, but the inconsistency with the C++ interface will cause confusion for anyone who's not a pure Python programmer. This will be true even if we carefully document the difference. If we could remove the modifying versions from C++ (also simplifying that API), it would be a different story.
            • The Python objects will still be mutable, because their state may change any time you pass them to a C++ function. Since the class looks immutable, and it's not always clear whether a method eventually calls C++ code or not, this fact may lead to hard-to-trace bugs.
            • Points, Intervals, and Boxes would follow different constructor conventions in Python, particularly with how keywords are handled (though arguably the situation with Box would be no worse than before).
            • Multiple methods (sphgeom vs. geom-style) that do the same thing, especially since you described removing the geom-style methods as "infeasible".
            • Mix of static and instance methods for related operations (though I think this only applies to SpherePoint?)
            Show
            krzys Krzysztof Findeisen added a comment - - edited Sorry, but I don't see how even the Python API is less confusing under this proposal: Python having only non-modifying methods would be good in isolation, but the inconsistency with the C++ interface will cause confusion for anyone who's not a pure Python programmer. This will be true even if we carefully document the difference. If we could remove the modifying versions from C++ (also simplifying that API), it would be a different story. The Python objects will still be mutable, because their state may change any time you pass them to a C++ function. Since the class looks immutable, and it's not always clear whether a method eventually calls C++ code or not, this fact may lead to hard-to-trace bugs. Points, Intervals, and Boxes would follow different constructor conventions in Python, particularly with how keywords are handled (though arguably the situation with Box would be no worse than before). Multiple methods (sphgeom vs. geom-style) that do the same thing, especially since you described removing the geom-style methods as "infeasible". Mix of static and instance methods for related operations (though I think this only applies to SpherePoint ?)
            Hide
            jbosch Jim Bosch added a comment - - edited

            Python having only non-modifying methods would be good in isolation, but the inconsistency with the C++ interface will cause confusion for anyone who's not a pure Python programmer. This will be true even if we carefully document the difference. If we could remove the modifying versions from C++ (also simplifying that API), it would be a different story.

            I'm willing to give much more weight to the pure-Python programmers, who will be far more numerous.  But simply removing the modifying versions from C++ is also a very good idea - we'd also want to go in and remove them from sphgeom, but I think those can be feasibly deprecated and retired.  When I started this I thought the in-place would be more efficient, and that'd be a good reason to keep them, but it turns out that if you want exception safety it's often better for the in-place operations to delegate to the factory operations.

            The Python objects will still be mutable, because their state may change any time you pass them to a C++ function. Since the class looks immutable, and it's not always clear whether a method eventually calls C++ code or not, this fact may lead to hard-to-trace bugs.

            Yes, and that's why I referred to these as not-modifiable from Python rather than immutable.  But I think the chance is essentially negligible that someone will write a totally new C++ signature that takes one of these objects by non-const reference or pointer and modifies it in-place, and then someone calls that on a object returned by a property and expects that to modify the holder of the property in-place.  And if they do, it's no more confusing than the kind that happens due to the ubiquitous aliasing of mutable objects in pure-Python code already.

            And your proposal to remove the mutators from C++ makes that confusion even less likely, because that would leave the C++ assignment operator as the only thing keeping these objects from being truly immutable (and I do want to preserve that, as removing it would lead to us having to hold them by pointer in lots of places, and that'd really be a waste).

            Points, Intervals, and Boxes would follow different constructor conventions in Python, particularly with how keywords are handled (though arguably the situation with Box would be no worse than before).

            Multiple methods (sphgeom vs. geom-style) that do the same thing, especially since you described removing the geom-style methods as "infeasible".

            The difference is that there would now be one set of constructor conventions and method signatures that would work for all of these classes, even though there might remain some conventions/signatures that only work for some of them.  I regard that as significantly better than the current state, in which there is no overlap in the conventions/signatures for the same operations across classes.

            Show
            jbosch Jim Bosch added a comment - - edited Python having only non-modifying methods would be good in isolation, but the inconsistency with the C++ interface will cause confusion for anyone who's not a pure Python programmer. This will be true even if we carefully document the difference. If we could remove the modifying versions from C++ (also simplifying that API), it would be a different story. I'm willing to give much more weight to the pure-Python programmers, who will be far more numerous.  But simply removing the modifying versions from C++ is also a very good idea - we'd also want to go in and remove them from sphgeom, but I think those can be feasibly deprecated and retired.  When I started this I thought the in-place would be more efficient, and that'd be a good reason to keep them, but it turns out that if you want exception safety it's often better for the in-place operations to delegate to the factory operations. The Python objects will still be mutable, because their state may change any time you pass them to a C++ function. Since the class looks immutable, and it's not always clear whether a method eventually calls C++ code or not, this fact may lead to hard-to-trace bugs. Yes, and that's why I referred to these as not-modifiable from Python rather than immutable.  But I think the chance is essentially negligible that someone will write a totally new C++ signature that takes one of these objects by non-const reference or pointer and modifies it in-place, and then someone calls that on a object returned by a property and expects that to modify the holder of the property in-place.  And if they do, it's no more confusing than the kind that happens due to the ubiquitous aliasing of mutable objects in pure-Python code already. And your proposal to remove the mutators from C++ makes that confusion even less likely, because that would leave the C++ assignment operator as the only thing keeping these objects from being truly immutable (and I do want to preserve that, as removing it would lead to us having to hold them by pointer in lots of places, and that'd really be a waste). Points, Intervals, and Boxes would follow different constructor conventions in Python, particularly with how keywords are handled (though arguably the situation with Box would be no worse than before). Multiple methods (sphgeom vs. geom-style) that do the same thing, especially since you described removing the geom-style methods as "infeasible". The difference is that there would now be one set of constructor conventions and method signatures that would work for all of these classes, even though there might remain some conventions/signatures that only work for some of them.  I regard that as significantly better than the current state, in which there is no overlap in the conventions/signatures for the same operations across classes.
            Hide
            krzys Krzysztof Findeisen added a comment -

            The difference is that there would now be one set of constructor conventions and method signatures that would work for all of these classes, even though there might remain some conventions/signatures that only work for some of them. I regard that as significantly better than the current state, in which there is no overlap in the conventions/signatures for the same operations across classes.

            Hmm... looking more closely at the code, it looks like Interval only allows the same kinds of combinations as Box. I'd argue that's confusing in a different way, but it does invalidate my original point.

            Show
            krzys Krzysztof Findeisen added a comment - The difference is that there would now be one set of constructor conventions and method signatures that would work for all of these classes, even though there might remain some conventions/signatures that only work for some of them. I regard that as significantly better than the current state, in which there is no overlap in the conventions/signatures for the same operations across classes. Hmm... looking more closely at the code, it looks like Interval only allows the same kinds of combinations as Box . I'd argue that's confusing in a different way, but it does invalidate my original point.
            Hide
            jbosch Jim Bosch added a comment - - edited

            I don't think there's much new debate likely to happen here unprompted, but opinions are not universal on whether everything here is an improvement, and this has an impact on the sorts of API stability questions the CCB ought to weigh in on. I'd like the CCB to take a look at this.

            This is of course a very large and detailed proposal, and it's probably not reasonable for everyone on the CCB to read it in detail. I can try to add some additional summary material, but I would like someone other than myself and Krzysztof Findeisen to have done so, however; would it be in the purview of the CCB to nominate someone?

            There is not a lot of time pressure; I am unlikely to be able to return to the cameraGeom work that would ideally build on this until mid-June.

            Show
            jbosch Jim Bosch added a comment - - edited I don't think there's much new debate likely to happen here unprompted, but opinions are not universal on whether everything here is an improvement, and this has an impact on the sorts of API stability questions the CCB ought to weigh in on. I'd like the CCB to take a look at this. This is of course a very large and detailed proposal, and it's probably not reasonable for everyone on the CCB to read it in detail. I can try to add some additional summary material, but I would like someone other than myself and Krzysztof Findeisen to have done so, however; would it be in the purview of the CCB to nominate someone? There is not a lot of time pressure; I am unlikely to be able to return to the cameraGeom work that would ideally build on this until mid-June.
            Hide
            ktl Kian-Tat Lim added a comment -

            The CCB is nominating me to take a look.  I will do so over the next couple of weeks.

            Show
            ktl Kian-Tat Lim added a comment - The CCB is nominating me to take a look.  I will do so over the next couple of weeks.
            Hide
            ktl Kian-Tat Lim added a comment -

            Finally getting to this.

            Questions:

            • Are the boundaries of a box (especially a BoxI when used with an image) infinitesimally-narrow lines, or are they single pixels? I can see closed interval behavior for the former, but it could be dangerous for the latter.
            • Would you be averse to an extra-long but still finite deprecation period for the duplicated methods? That helps to mitigate the "multiple methods doing the same thing" issue, since one will start to give an explicit warning.
            • I think it would be highly desirable to make these immutable objects. I don't think the C++ assignment operator itself poses a problem here; if I'm reading this right, the problem would come from exposing member variables to assignment. Is what you're suggesting?
            • I'm a little worried about all this explicit vectorization bypassing the type system, but maybe that's inevitable in a NumPy/pandas world. But can't you get the n-dimensional vectorized transform methods to take 1-d arrays by using *args? Something like TransformPoint2ToPoint2.applyForward(List[float], List[float]) -> Tuple[List[float], List[float]]

            In general, I think most of Krzysztof Findeisen's objections and my similar concerns can be handled by making the proposal more ambitious: fully align interfaces, remove duplication at the cost of backward-compatibility, go for C++ immutability.

            I think copy-and-edit documentation in pybind11 is going to be required for some time. I can't foresee write-once Python and C++ documentation without significant tooling build-out.

            Show
            ktl Kian-Tat Lim added a comment - Finally getting to this. Questions: Are the boundaries of a box (especially a BoxI when used with an image) infinitesimally-narrow lines, or are they single pixels? I can see closed interval behavior for the former, but it could be dangerous for the latter. Would you be averse to an extra-long but still finite deprecation period for the duplicated methods? That helps to mitigate the "multiple methods doing the same thing" issue, since one will start to give an explicit warning. I think it would be highly desirable to make these immutable objects. I don't think the C++ assignment operator itself poses a problem here; if I'm reading this right, the problem would come from exposing member variables to assignment. Is what you're suggesting? I'm a little worried about all this explicit vectorization bypassing the type system, but maybe that's inevitable in a NumPy/pandas world. But can't you get the n-dimensional vectorized transform methods to take 1-d arrays by using *args ? Something like TransformPoint2ToPoint2.applyForward(List [float] , List [float] ) -> Tuple[List [float] , List [float] ] In general, I think most of Krzysztof Findeisen 's objections and my similar concerns can be handled by making the proposal more ambitious: fully align interfaces, remove duplication at the cost of backward-compatibility, go for C++ immutability. I think copy-and-edit documentation in pybind11 is going to be required for some time. I can't foresee write-once Python and C++ documentation without significant tooling build-out.
            Hide
            jbosch Jim Bosch added a comment - - edited

            Are the boundaries of a box (especially a BoxI when used with an image) infinitesimally-narrow lines, or are they single pixels? I can see closed interval behavior for the former, but it could be dangerous for the latter.

            Box2D boundaries are infinitesimally narrow lines, while Box2I's are full pixels.  But Box2I has always been a closed geometry, and the conversions between the classes and their differing semantics for size reflect this.  The only change proposed here is that Box2D's infinitesimally narrow boundary is considered part of the box on all sides, rather than just the lower boundary.

            Would you be averse to an extra-long but still finite deprecation period for the duplicated methods? That helps to mitigate the "multiple methods doing the same thing" issue, since one will start to give an explicit warning.

            I forget whether our deprecation procedure provides a way to suppress warnings in old code.  If so, I'm on board with this - I'd love to have warnings in new code to encourage using the preferred APIs, but I don't want a ton of warnings in old code just because we haven't yet decided it's worthwhile to convert it yet.

            I think it would be highly desirable to make these immutable objects. I don't think the C++ assignment operator itself poses a problem here; if I'm reading this right, the problem would come from exposing member variables to assignment. Is what you're suggesting?

            Having a C++ assignment operator definitely does break immutability; I'm not sure if that's what you meant by "poses a problem".  I am indeed arguing that you can still have a C++ assignment operator (and other mutating methods) in C++ and still avoid the problems associated with Python properties that return mutable objects, as long as you don't provide Python versions of those methods.

            To expand a bit, I think "value-held" objects like those here are quite different from "shared_ptr-held" (often polymorphic) objects, which do gain considerably from C++ immutability.  Value-held objects are rarely aliased in C++ code, so there simply isn't much value to immutability there.  The problem of course is that Python effectively holds everything by shared_ptr, so there's a ton of aliasing there, and if there are ways to mutate these objects from Python that's a problem.  But we don't have to remove C++ mutability to achieve that; instead we can:

            • always copy or move when returning objects through wrapped C++ APIs;
            • never take non-const references to these objects in wrapped C++ APIs;
            • never wrap mutating methods on value-held classes.

            Aside from the last (which this proposal aims to address), these are good practices anyway, and if there are any exceptions to them (for geom classes, at least) in the codebase, they're extremely rare.   The result is that the set of (aliased, effectively immutable) Python instances and the set of (effectively unaliased, but mutable) C++ instances are disjoint, and hence we have no shared state problems from mutable, aliased instances.

            Apologies if I just explained something that you already understood, but I wanted to make sure we were on the same page.  If you are arguing that we should go further and also remove the mutating methods from the C++ APIs while leaving the assignment operator, I'd be okay with that; it would both make C++ and Python code more similar and simplify their implementations, both of which are good things, even if it doesn't affect the kind of immutability that matters for reasoning about shared state.

            I'm a little worried about all this explicit vectorization bypassing the type system, but maybe that's inevitable in a NumPy/pandas world.

            That's my conclusion, too - for better or (probably) worse, using NumPy efficiently really means using arrays of POD in many places where it'd be vastly preferable to use classes/structs in any compiled language.

            But can't you get the n-dimensional vectorized transform methods to take 1-d arrays by using *args? Something like TransformPoint2ToPoint2.applyForward(List[float], List[float]) -> Tuple[List[float], List[float]]

            Good point!  I'm not sure I'd want to add it to the Stage 1 implementation ticket I've basically completed, but this certainly seems quite doable.

            Show
            jbosch Jim Bosch added a comment - - edited Are the boundaries of a box (especially a BoxI when used with an image) infinitesimally-narrow lines, or are they single pixels? I can see closed interval behavior for the former, but it could be dangerous for the latter. Box2D boundaries are infinitesimally narrow lines, while Box2I's are full pixels.  But Box2I has always been a closed geometry, and the conversions between the classes and their differing semantics for size reflect this.  The only change proposed here is that Box2D's infinitesimally narrow boundary is considered part of the box on all sides, rather than just the lower boundary. Would you be averse to an extra-long but still finite deprecation period for the duplicated methods? That helps to mitigate the "multiple methods doing the same thing" issue, since one will start to give an explicit warning. I forget whether our deprecation procedure provides a way to suppress warnings in old code.  If so, I'm on board with this - I'd love to have warnings in new code to encourage using the preferred APIs, but I don't want a ton of warnings in old code just because we haven't yet decided it's worthwhile to convert it yet. I think it would be highly desirable to make these immutable objects. I don't think the C++ assignment operator itself poses a problem here; if I'm reading this right, the problem would come from exposing member variables to assignment. Is what you're suggesting? Having a C++ assignment operator definitely does break immutability; I'm not sure if that's what you meant by "poses a problem".  I am indeed arguing that you can still have a C++ assignment operator (and other mutating methods) in C++ and still avoid the problems associated with Python properties that return mutable objects, as long as you don't provide Python versions of those methods. To expand a bit, I think "value-held" objects like those here are quite different from "shared_ptr-held" (often polymorphic) objects, which do gain considerably from C++ immutability.  Value-held objects are rarely aliased in C++ code, so there simply isn't much value to immutability there.  The problem of course is that Python effectively holds everything by shared_ptr, so there's a ton of aliasing there, and if there are ways to mutate these objects from Python that's a problem.  But we don't have to remove C++ mutability to achieve that; instead we can: always copy or move when returning objects through wrapped C++ APIs; never take non-const references to these objects in wrapped C++ APIs; never wrap mutating methods on value-held classes. Aside from the last (which this proposal aims to address), these are good practices anyway, and if there are any exceptions to them (for geom classes, at least) in the codebase, they're extremely rare.   The result is that the set of (aliased, effectively immutable) Python instances and the set of (effectively unaliased, but mutable) C++ instances are disjoint, and hence we have no shared state problems from mutable, aliased instances. Apologies if I just explained something that you already understood, but I wanted to make sure we were on the same page.  If you are arguing that we should go further and also remove the mutating methods from the C++ APIs while leaving the assignment operator, I'd be okay with that; it would both make C++ and Python code more similar and simplify their implementations, both of which are good things, even if it doesn't affect the kind of immutability that matters for reasoning about shared state. I'm a little worried about all this explicit vectorization bypassing the type system, but maybe that's inevitable in a NumPy/pandas world. That's my conclusion, too - for better or (probably) worse, using NumPy efficiently really means using arrays of POD in many places where it'd be vastly preferable to use classes/structs in any compiled language. But can't you get the n-dimensional vectorized transform methods to take 1-d arrays by using *args ? Something like TransformPoint2ToPoint2.applyForward(List [float] , List [float] ) -> Tuple[List [float] , List [float] ] Good point!  I'm not sure I'd want to add it to the Stage 1 implementation ticket I've basically completed, but this certainly seems quite doable.
            Hide
            ktl Kian-Tat Lim added a comment -

            BTW, I was arguing for removing the mutating C++ methods.

            Show
            ktl Kian-Tat Lim added a comment - BTW, I was arguing for removing the mutating C++ methods.
            Hide
            jbosch Jim Bosch added a comment -

            BTW, I was arguing for removing the mutating C++ methods.

            I'll drop them.

             

            Show
            jbosch Jim Bosch added a comment - BTW, I was arguing for removing the mutating C++ methods. I'll drop them.  
            Hide
            jbosch Jim Bosch added a comment -

            Looks like I marked this as Implemented prematurely, because I only completed one of the two triggered tickets.  But now Jira doesn't seem to give me a way to fix that.

            Show
            jbosch Jim Bosch added a comment - Looks like I marked this as Implemented prematurely, because I only completed one of the two triggered tickets.  But now Jira doesn't seem to give me a way to fix that.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Watchers:
              Jim Bosch, John Parejko, Jonathan Sick, Kian-Tat Lim, Krzysztof Findeisen, Russell Owen, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:
                Planned End:

                  Jenkins

                  No builds found.