Data Management / DM-32953

Design gauss2dfit classes and new MultiProFit pipeline task

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      3
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      Following up on DM-30040, design future additions to gauss2dfit to perform modelling entirely in C++, and redesign the existing PipelineTask to meet DM guidelines (no external IO) and fit one model per run, rather than doing PSF fitting and multiple model fits simultaneously.

        Attachments

          Activity

          dtaranu Dan Taranu added a comment -

          DM-30040 is required reading for the descriptions of the packages and lower-level structures.

          Priors are the only major addition that could go into Parameters (at least the interface). I haven't fully decided how to implement them, except that I don't want a 1:1 mapping of Priors:Parameters, but 1:N. I almost certainly want only N-dimensional Gaussian priors applied to transformed and limited Parameters. If it makes sense to do so, I'll include the interface in Parameters.
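
          A minimal sketch of the 1:N idea described above: one N-dimensional Gaussian prior that holds references to N parameters and is evaluated on their transformed values. Everything here is an assumption for illustration, including the class names Parameter and GaussianPrior, the log transform above a lower limit, and the pure-Python Cholesky factorization; it is not the gauss2dfit interface.

```python
import math
from dataclasses import dataclass


@dataclass
class Parameter:
    """Hypothetical stand-in for a limited, transformed parameter."""
    value: float          # untransformed value, e.g. a size in pixels
    minimum: float = 0.0  # lower limit; the transform requires value > minimum

    @property
    def transformed(self) -> float:
        # Assumed transform: log of the distance above the lower limit.
        return math.log(self.value - self.minimum)


def cholesky(cov):
    """Lower-triangular factor of a small positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


class GaussianPrior:
    """One prior shared by N parameters (the 1:N mapping)."""

    def __init__(self, parameters, mean, cov):
        self.parameters = list(parameters)
        self.mean = list(mean)
        self.chol = cholesky(cov)

    def log_pdf(self) -> float:
        n = len(self.parameters)
        resid = [p.transformed - m for p, m in zip(self.parameters, self.mean)]
        # Forward substitution solves L z = resid; z.z is the squared
        # Mahalanobis distance.
        z = []
        for i in range(n):
            s = sum(self.chol[i][k] * z[k] for k in range(i))
            z.append((resid[i] - s) / self.chol[i][i])
        log_det = 2.0 * sum(math.log(self.chol[i][i]) for i in range(n))
        return -0.5 * (sum(v * v for v in z) + log_det + n * math.log(2.0 * math.pi))
```

          Because the prior stores references rather than values, it always sees the current state of each parameter, and a parameter can take part in a joint prior without owning it.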

          dtaranu Dan Taranu added a comment - - edited

          Gauss2DFit will need Component, Source, Model and Modeller classes. I think the relationships between these in MultiProFit are essentially fine. I may restructure whether Sources or Components store a PhotometricModel (probably Sources), and slightly tweak the AstrometricModel (still only one planned implementation of a fixed, band-independent centroid).

          The inheritors and modifiers of Parameters can likely be removed entirely, since these will now all be shared_ptr. Inheritors should just be the same pointer, and the logic for modifiers should live in the relevant Source class.
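
          The point about inheritors can be illustrated with ordinary Python object references, which behave like shared pointers here. This is only a sketch: the Parameter and Component classes and their fields are hypothetical, not the planned C++ types.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    value: float


@dataclass
class Component:
    x: Parameter
    y: Parameter
    flux: Parameter


# One centroid shared by two components: no "inheritor" list is needed,
# because both components hold the very same objects.
cen_x, cen_y = Parameter(10.0), Parameter(12.0)
bulge = Component(cen_x, cen_y, Parameter(1.0))
disk = Component(cen_x, cen_y, Parameter(3.0))

cen_x.value = 10.5  # a single update is seen by both components
```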

          Many function implementations will change completely. Right now in MultiProFit (Python), each Source/Component returns a dict of objects and/or floats, which are then re-interpreted by the Model to actually render images. This is wasteful and the main source of the slowness in MultiProFit. Each Component/Source should instead implement a get_gaussians method that returns a (Parametric)Gaussian, ready to be passed to a GaussianEvaluator. For model fitting, these objects can and should be generated only once prior to the start of fitting, and each fit iteration can simply update the free parameters directly.
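
          A rough sketch of the "build once, update in place" pattern, assuming circular Gaussians and a toy pixel-by-pixel evaluator. Only the method name get_gaussians comes from the description above; the class layout, the render function and all field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Parameter:
    value: float
    fixed: bool = False


@dataclass
class ParametricGaussian:
    """Circular Gaussian that references parameters instead of copying values."""
    x: Parameter
    y: Parameter
    sigma: Parameter
    flux: Parameter

    def evaluate(self, px: float, py: float) -> float:
        var = self.sigma.value ** 2
        r2 = (px - self.x.value) ** 2 + (py - self.y.value) ** 2
        return self.flux.value * math.exp(-0.5 * r2 / var) / (2.0 * math.pi * var)


@dataclass
class Component:
    x: Parameter
    y: Parameter
    sigma: Parameter
    flux: Parameter

    def get_gaussians(self):
        return [ParametricGaussian(self.x, self.y, self.sigma, self.flux)]


@dataclass
class Source:
    components: list

    def get_gaussians(self):
        return [g for c in self.components for g in c.get_gaussians()]


def render(gaussians, width, height):
    """Toy evaluator: sum every Gaussian at each pixel centre."""
    return [[sum(g.evaluate(i + 0.5, j + 0.5) for g in gaussians)
             for i in range(width)] for j in range(height)]


sigma = Parameter(1.5)
source = Source([Component(Parameter(2.5), Parameter(2.5), sigma, Parameter(10.0))])
gaussians = source.get_gaussians()     # built once, before fitting starts
for trial in (1.0, 2.0):               # stand-in for fit iterations
    sigma.value = trial                # update the free parameter in place
    image = render(gaussians, 5, 5)    # nothing is rebuilt between iterations
```

          The list of Gaussians is never regenerated inside the loop; each iteration only changes a parameter value that the existing objects already reference.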

          In the future, if we want to keep support for e.g. GalSim-based models, that should be possible. I will likely implement renderers as an enum class, and have something like a get_galsim_objects function. In principle, there's no reason why one couldn't mix and match sources or even components with GalSim and Gauss2DFit-based models, although I'm not going to make this a high priority.
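
          One way the renderer selection could look, sketched with Python's enum module. The Renderer enum, its members and the get_objects dispatcher are assumptions; get_galsim_objects is named above but deliberately left unimplemented here, and no GalSim calls are made.

```python
import enum


class Renderer(enum.Enum):
    GAUSS2D = "gauss2d"
    GALSIM = "galsim"


class Component:
    """Hypothetical component that can describe itself to more than one renderer."""

    def get_gaussians(self):
        return ["gaussian"]  # placeholder for real Gaussian objects

    def get_galsim_objects(self):
        # A real version would build GalSim profiles here.
        raise NotImplementedError("GalSim rendering is not sketched here")

    def get_objects(self, renderer: Renderer):
        if renderer is Renderer.GAUSS2D:
            return self.get_gaussians()
        if renderer is Renderer.GALSIM:
            return self.get_galsim_objects()
        raise ValueError(f"unsupported renderer: {renderer}")
```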

          dtaranu Dan Taranu added a comment -

          For the Modeller class, I'm planning to make only one C++ implementation using GSL fitters (this will also necessitate an ImageGSL class, since the fitters operate on GSL arrays rather than NumPy arrays). I'll consider making the interface compatible with Pagmo as well, since I did previously test Pygmo in MultiProFit; however, I didn't find the optimizers any more efficient than SciPy's, so I won't make that a priority.

          I'm not entirely sure what the best way to enable pure Python optimizers is (monkey-patching or inheritance of a pybind11 class?), so I'm also going to leave that for a later date.
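
          For the inheritance option, pybind11 does allow Python subclasses to override C++ virtual methods through trampoline classes. The sketch below shows only the shape of that approach in plain Python: the Modeller base class stands in for a bound C++ class, and its minimize/fit interface, the subclass name and the golden-section search are all hypothetical choices for illustration.

```python
import math


class Modeller:
    """Stand-in for a compiled Modeller exposed to Python (hypothetical interface)."""

    def minimize(self, objective, lower, upper):
        raise NotImplementedError("the compiled class would run its own fitter here")

    def fit(self, objective, lower, upper):
        # Shared driver logic lives in the base class; only the optimizer varies.
        best = self.minimize(objective, lower, upper)
        return best, objective(best)


class GoldenSectionModeller(Modeller):
    """Pure-Python optimizer for a single free parameter on [lower, upper]."""

    def __init__(self, tol=1e-8):
        self.tol = tol

    def minimize(self, objective, lower, upper):
        inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
        a, b = lower, upper
        while b - a > self.tol:
            c = b - inv_phi * (b - a)
            d = a + inv_phi * (b - a)
            if objective(c) < objective(d):
                b = d  # minimum lies in [a, d]
            else:
                a = c  # minimum lies in [c, b]
        return 0.5 * (a + b)


best, chi2 = GoldenSectionModeller().fit(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
```

          Monkey-patching would instead replace minimize on an existing instance or class; subclassing keeps the override explicit and lets the base class retain the shared fit logic.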

          dtaranu Dan Taranu added a comment -

          For the Tasks, the existing MultiProFitTask (DM-28429) is fine in principle, but the implementation is less than ideal as it does PSF fitting and multiple model fits all in one go (as does meas_modelfit, but that wasn't designed for gen3). There should instead be separate tasks for PSF fitting and source model fitting. Each Task should generate only one output dataset, ideally as a parquet DataFrame, since none of the SourceCatalog features are really needed. Model initialization and dependencies should be represented in the connections, likely with liberal use of optional inputs. This will probably necessitate more explicit model dependencies and more frequent updates of pipe_tasks, but that's not necessarily a bad thing, as the current model specification language is too flexible, error-prone and poorly documented.

          This will hopefully take care of most, if not all, of the largely ugly code in MultiProFit's fitutils module.
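
          The split into single-output tasks with optional inputs can be sketched without any lsst packages. The code below is a plain-Python stand-in, not the lsst.pipe.base connections API; the Connection and TaskSpec classes and every task and dataset name (fitPsf, fitSource, psf_fits, init_fits and so on) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    name: str
    optional: bool = False


@dataclass
class TaskSpec:
    """Hypothetical description of a task: its inputs and its single output."""
    name: str
    inputs: tuple
    output: str

    def check(self, available):
        """Return the inputs to load, raising if a required one is missing."""
        missing = [c.name for c in self.inputs
                   if not c.optional and c.name not in available]
        if missing:
            raise KeyError(f"{self.name} is missing required inputs: {missing}")
        return [c.name for c in self.inputs if c.name in available]


# PSF fitting and source fitting are separate tasks, each with one output.
psf_fit = TaskSpec("fitPsf", (Connection("coadd"), Connection("catalog")), "psf_fits")
source_fit = TaskSpec(
    "fitSource",
    (Connection("coadd"), Connection("catalog"), Connection("psf_fits"),
     Connection("init_fits", optional=True)),  # optional initialization from an earlier fit
    "source_fits",
)
```

          Here the dependency of the source fit on the PSF fit is explicit in its inputs, while initialization from an earlier model fit is optional and simply skipped when that dataset is absent.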

          dtaranu Dan Taranu added a comment -

          Lastly, what's going to be left in MultiProFit itself? Hopefully as little as possible. My original idea was to keep Gauss2D and Gauss2DFit largely astronomy-agnostic. I think that's still possible. Even if images have, for example, a band/filter name, that's not exactly specific to astronomy. Similarly, any Gaussian mixture components will need to live in Gauss2DFit, as I expect them all to need splines to evaluate and GSL splines are the obvious candidate. Even if the Sersic profile (and if I add it, Moffat) is not widely known outside astronomy, there's no reason why it couldn't be used for other purposes.

          I will definitely keep at least one example of using MultiProFit with public HSC data. Other than that, I might leave survey data-based initializers in MultiProFit, benchmarks/comparisons to other codes, and any usages of lsst packages that don't belong in the (Pipeline)Tasks. Some of that might be better off in meas_extensions_multiprofit (or _gauss2dfit).

          dtaranu Dan Taranu added a comment -

          Marking this done sans review as per Yusra AlSayyad's suggestion as there is no code (yet). I'll make a final note of the relevant follow-up ticket(s) when I've started them.


            People

            Assignee:
            dtaranu Dan Taranu
            Reporter:
            dtaranu Dan Taranu
            Watchers:
            Dan Taranu