# Add derivatives-based optimizer to meas_multifit

XMLWordPrintable

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
10
• Sprint:
Measurement Sprint 1
• Team:
Data Release Production

#### Description

Story points estimate is for remaining work only (just the code review, which is still substantial).

Hide

### REVIEW PART 1

Feeling a little in over my head reviewing a 12000-line package for my first C++ review. So, I'm going to post comments in pieces. As you and I discussed, I looked at the high-level overview first. I tried to identify parts that would not be cut/moved and concentrated on those. (Then I'll look for missing docs) Lastly, I'll post trivial standards-level comments, but these are merely suggestions as I'm still familiarizing myself with our C++ standards.

Since you put in considerable work since the last commit to the 3146 branch, this review is based off of the HSC branch of meas_multifit. Most of the updates to the HSC branch we want in the LSST branch (e.g. "Don't use per-pixel weights by default", "Make testTruncateGaussian work with newer scipy versions"). On the other hand, my understanding is that CModel is a purely HSC driver. If this is the case, ignore any notes on CModel.

As of May 6th: the difference between the branches looks like:

 $git diff u/jbosch/3146..HSC --stat  data/broad-02_02-04_04.fits | 1 +  data/broad-02_02-08_08.fits | 1 +  data/broad-02_02-10_10.fits | 1 +  include/lsst/meas/multifit.h | 1 +  include/lsst/meas/multifit/CModel.h | 493 +++++++++++++++++++++++++++  include/lsst/meas/multifit/ProjectedLikelihood.h | 2 +-  include/lsst/meas/multifit/TruncatedGaussian.h | 2 +-  include/lsst/meas/multifit/optimizer.h | 1 +  include/lsst/meas/multifit/priors.h | 129 ++++++-  python/lsst/meas/multifit/CModel.i | 41 +++  python/lsst/meas/multifit/analysis.py | 317 +++++++++++++++++  python/lsst/meas/multifit/models.py | 9 +-  python/lsst/meas/multifit/multifitLib.i | 40 ++-  python/lsst/meas/multifit/optimizer.i | 2 +-  src/CModel.cc | 1217 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  src/ProjectedLikelihood.cc | 6 +  src/models.cc | 8 +-  src/optimizer.cc | 9 +-  src/priors.cc | 365 +++++++++++++++++---  tests/testCModel.py | 116 +++++++  tests/testPsfFitting.py | 4 -  tests/testSoftenedLinearPrior.py | 215 ++++++++++++  tests/testTruncatedGaussian.py | 6 +-  23 files changed, 2880 insertions(+), 106 deletions(-) #### obs_sdss/obs_lsstSim branch = /u/jbosch/3146 Output files would make more sense in sci-results/%(run)d/%(camcol)d/%(filter)s/modelfits/%(tag)s/modelfits-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits, but I assume this entry will go away once this is ported to plug-in mode anyway: line #827 of policy   modelfits: {  template: "modelfits/%(tag)s/modelfits-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits"  ... #### meas_multifit ##### high-level (.py and .h files) ###### python: • samplers.py - this mainly houses the AdaptiveImportanceSamplerTask. I assume you called it samplers.py because it will include other samplers one day. If it will always include pipe.base.tasks, maybe the name should be a verb? The docstring for AdaptiveImportanceSamplerTask could be more informative: (What is the task's responsibilty. What's the intended use?) • priors.py - Is this really best place for fitMixture? • measureMulti.py - Going away? Should I read? • measureCcd.py - Also going away? Should I read? • measureCoadd.py - Also going away? Should I read? • analysis.py - Looks very HSC specific. Should I read? • fitRegions.py - To me, the file name hints that it will fit regions. Change to "regionFit"? There might be a verb more clear than "setup". "prepareRegionFit"? Will setupFitRegion always grow the footprint in addition to fancier actions like masking bad pixels? Maybe "growRegionFit" would be appropriate in that case. Also, renaming the config to match the function would help readability (e.g. setupFitRegionConfig) ###### C++: Might be a good time to think about names of the source files and how the classes and functions are split accross them. I listed all .h files for completeness; most just have note and nothing to address. • AdaptiveImportanceSampler.h - Is the class named ImportanceSamplerControl and not AdaptiveImportanceSamplerControl because it will maybe be used for other types of importance samplers or because it was too long? If it's the latter, I think it's more readable to have the too-long name. I'm not sure what the relationship is between Samplers and Sampling. The base class Sampler is in Sampling.h. • Sampling.h - contains SamplingObjective and base classes Sampler and SamplingInterpreter. This could use a description of what the SamplingInterpreter's are responsible for as part of the galaxy fitting process. • CModel.h: - IGNORE: only on HSC. contains the main collection of classes that fit a galaxy model to pixels. We discussed this one in detail in person in March. • Looks like #ifndef LSST_MEAS_MULTIFIT_CModelFit_h_INCLUDED has an old name. • includePsfBBox: name doesn't really describe what it controls • The following controls could use some docstrings to make clear what each control is reponsbile for. • CModelStageControl • CModelRegionControl • CModelDiagnosticsControl • CModelControl Meeting notes: On the HSC side, this is also a factory object. When we're running in plugin mode instead of getting multishapelet.psf, it looks for the name of the field that's produced by meas_extensions, set by default to "multishapelet.psf". • CModelResult Meeting notes: Would be good to have a method that returns all the flags. • CModelStageResult • CModelAlgorithm: I have a note that I should ignore the ScaledFlux stuff. apply method is unit tested in testCModel.py. • DirectSampling.h/MarginalSampling.h: Perhaps these header files should be called DirectSamplingInterpreter.h and MarginalSamplingInterpreter.h since they only contain those classes. • Interpreter.h - are unit tests feasible for computeNonlinearQuantiles and computeAmplitudeQuantiles? • Likelihood.h - OK • Mixture.h tested in testMixture.py • ModelFitRecord.h - Add unit tests? • ProjectedLikelihood.h tested in testProjectedLiklihood.py "Projected" is an overloaded word in statistics. Users could use some docs describing what makes this liklihood projected. • TruncatedGaussian.h tested in testTruncatedGaussian.py • UnitSystem.h Just a couple constuctors for two structs. Are a selection of ctors in a .cc file because these will be extended? (probably not necessary as is) • constants.h contains typedefs for module. This file is commonly called "common.h". • integrals.h: unit tests for bvnu in testIntegrals.py • models.h contains Model and MultiModel. I was confused when I saw other Models (with virtual members) that inherit from Model defined in Model.cc. What's the relationship between Model/MultiModel/FixedCenterModel/SingleCenterModel/MultiCenterModel? What do we expect to inherit from them? I'm probably missing something obvious. • optimizer.h: solveTrustRegion tested in testOptimizer.py Is it feasible to add tests to cover OptimizerObjective/OptimizerControl/Optimizer? • priors.h tests in testSoftenedLinear.py. Is SoftenedLinear a HSC-only construct? Should there be tests for MixturePrior too? • psf.h You indicated that this is not tested and to replace the functionality in meas_extensionsMultishapelet. See issue DM-454. Because this will be changed in DM-454, no comments. Notes from meeting: It contains one class, MultiShapeletPsfLikelihood (special likelihood object that can fit psfs) and one function, makeMultiShapeletPsfModel(std::vector<int> const & orders) (models that are appropriate for psfs). High level docs Well done on including a front page (overview.dox) of the meas_multifit docs. It would benefit from having a paragraph at the beginning that describes the high-level responsibilities of meas_multifit geared towards someone who doesn't already know what it does. What is the package responsible for? What is it not responsible for? Which packages contain the code that it's not responsible for. Maybe the first sentence could replace the generic "data" with "image pixels" or something like that, since one could argue that all astronomers do is fit models of astronomical objects to data. This code specifically fits 2D galaxy models to source footprints. It was not obvious how responsibility is split between meas_extensions_multishapelet, shapelet, and meas_multifit. Shapelet could use an overview.dox file. You've joked around about renaming meas_multifit to meas_galaxyModels. Is this a good time to seriously consider this? General comment on a detail: asserts. When running python tasks on multiple id's, a sig-abort will prevent all subsequent id's from being run. Assuming this code is moving out of debugging stage, please change asserts to exceptions. They can be fatal exceptions, but the python task that uses it will continue on to process the next data id. (I removed an assert in shaplet/src/MultiShapeletBasis.cc in order to run the measureCcd tests on stripe 82 data: ) The following are a list of asserts (CModel included only for completeness.) meas_multifit (HSC):  CModel.cc:404: assert(model.getNonlinearDim() == nonlinear.getSize<0>()); CModel.cc:405: assert(model.getAmplitudeDim() == amplitudes.getSize<0>()); CModel.cc:406: assert(model.getFixedDim() == fixed.getSize<0>()); CModel.cc:701: assert(initial.ellipses.size() == 1u); // should be true of all Models that come from RadialProfiles CModel.cc:708: assert(data.amplitudes.getSize<0>() == 1); // should be true of all Models from RadialProfiles CModel.cc:742: assert(fits.countHdus() == 1); CModel.cc:748: assert(fits.countHdus() == 2); CModel.cc:754: assert(fits.countHdus() == 3); CModel.cc:760: assert(fits.countHdus() == 4); CModel.cc:1187: assert(source.getFootprint()->getArea()); MarginalSampling.cc:242: assert(directSchema == _mapper.getOutputSchema()); // should have already thrown an exception if != Mixture.cc:279: assert(k != cumulative.size()); shapelet (/u/jbosch/3146):  GaussHermiteEvaluator.cc:80: assert(dx.getSize<0>() == result.getSize<0>()); GaussHermiteProjection.cc:32: assert(inputOrder <= getMaxOrder()); GaussHermiteProjection.cc:33: assert(outputOrder <= getMaxOrder()); HermiteTransformMatrix.cc:49: assert(n < _coefficients.size()); ModelBuilder.cc:89: assert(_x.size() == _y.size()); MultiShapeletBasis.cc:173: assert(!utils::isnan(itemEllipse.getCore().getDeterminantRadius())); ### shapelet Detailed review of differences between master..u/jbosch/3146 the way. Show Yusra AlSayyad added a comment - REVIEW PART 1 Feeling a little in over my head reviewing a 12000-line package for my first C++ review. So, I'm going to post comments in pieces. As you and I discussed, I looked at the high-level overview first. I tried to identify parts that would not be cut/moved and concentrated on those. (Then I'll look for missing docs) Lastly, I'll post trivial standards-level comments, but these are merely suggestions as I'm still familiarizing myself with our C++ standards. Since you put in considerable work since the last commit to the 3146 branch, this review is based off of the HSC branch of meas_multifit . Most of the updates to the HSC branch we want in the LSST branch (e.g. "Don't use per-pixel weights by default", "Make testTruncateGaussian work with newer scipy versions"). On the other hand, my understanding is that CModel is a purely HSC driver. If this is the case, ignore any notes on CModel . As of May 6th: the difference between the branches looks like:$ git diff u/jbosch/3146..HSC --stat data/broad-02_02-04_04.fits | 1 + data/broad-02_02-08_08.fits | 1 + data/broad-02_02-10_10.fits | 1 + include/lsst/meas/multifit.h | 1 + include/lsst/meas/multifit/CModel.h | 493 +++++++++++++++++++++++++++ include/lsst/meas/multifit/ProjectedLikelihood.h | 2 +- include/lsst/meas/multifit/TruncatedGaussian.h | 2 +- include/lsst/meas/multifit/optimizer.h | 1 + include/lsst/meas/multifit/priors.h | 129 ++++++- python/lsst/meas/multifit/CModel.i | 41 +++ python/lsst/meas/multifit/analysis.py | 317 +++++++++++++++++ python/lsst/meas/multifit/models.py | 9 +- python/lsst/meas/multifit/multifitLib.i | 40 ++- python/lsst/meas/multifit/optimizer.i | 2 +- src/CModel.cc | 1217 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ src/ProjectedLikelihood.cc | 6 + src/models.cc | 8 +- src/optimizer.cc | 9 +- src/priors.cc | 365 +++++++++++++++++--- tests/testCModel.py | 116 +++++++ tests/testPsfFitting.py | 4 - tests/testSoftenedLinearPrior.py | 215 ++++++++++++ tests/testTruncatedGaussian.py | 6 +- 23 files changed, 2880 insertions(+), 106 deletions(-) obs_sdss / obs_lsstSim branch = /u/jbosch/3146 Output files would make more sense in sci-results/%(run)d/%(camcol)d/%(filter)s/modelfits/%(tag)s/modelfits-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits , but I assume this entry will go away once this is ported to plug-in mode anyway: line #827 of policy modelfits: { template: "modelfits/%(tag)s/modelfits-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits" ... meas_multifit high-level ( .py and .h files) python: samplers.py - this mainly houses the AdaptiveImportanceSamplerTask . I assume you called it samplers.py because it will include other samplers one day. If it will always include pipe.base.tasks , maybe the name should be a verb? The docstring for AdaptiveImportanceSamplerTask could be more informative: (What is the task's responsibilty. What's the intended use?) priors.py - Is this really best place for fitMixture ? measureMulti.py - Going away? Should I read? measureCcd.py - Also going away? Should I read? measureCoadd.py - Also going away? Should I read? analysis.py - Looks very HSC specific. Should I read? fitRegions.py - To me, the file name hints that it will fit regions. Change to "regionFit"? There might be a verb more clear than "setup". "prepareRegionFit"? Will setupFitRegion always grow the footprint in addition to fancier actions like masking bad pixels? Maybe "growRegionFit" would be appropriate in that case. Also, renaming the config to match the function would help readability (e.g. setupFitRegionConfig ) C++: Might be a good time to think about names of the source files and how the classes and functions are split accross them. I listed all .h files for completeness; most just have note and nothing to address. AdaptiveImportanceSampler.h - Is the class named ImportanceSamplerControl and not AdaptiveImportanceSamplerControl because it will maybe be used for other types of importance samplers or because it was too long? If it's the latter, I think it's more readable to have the too-long name. I'm not sure what the relationship is between Samplers and Sampling. The base class Sampler is in Sampling.h . Sampling.h - contains SamplingObjective and base classes Sampler and SamplingInterpreter . This could use a description of what the SamplingInterpreter 's are responsible for as part of the galaxy fitting process. CModel.h : - IGNORE: only on HSC. contains the main collection of classes that fit a galaxy model to pixels. We discussed this one in detail in person in March. Looks like #ifndef LSST_MEAS_MULTIFIT_CModelFit_h_INCLUDED has an old name. includePsfBBox : name doesn't really describe what it controls The following controls could use some docstrings to make clear what each control is reponsbile for. CModelStageControl CModelRegionControl CModelDiagnosticsControl CModelControl Meeting notes: On the HSC side, this is also a factory object. When we're running in plugin mode instead of getting multishapelet.psf , it looks for the name of the field that's produced by meas_extensions, set by default to "multishapelet.psf". CModelResult Meeting notes: Would be good to have a method that returns all the flags. CModelStageResult CModelAlgorithm : I have a note that I should ignore the ScaledFlux stuff. apply method is unit tested in testCModel.py . DirectSampling.h / MarginalSampling.h : Perhaps these header files should be called DirectSamplingInterpreter.h and MarginalSamplingInterpreter.h since they only contain those classes. Interpreter.h - are unit tests feasible for computeNonlinearQuantiles and computeAmplitudeQuantiles ? Likelihood.h - OK Mixture.h tested in testMixture.py ModelFitRecord.h - Add unit tests? ProjectedLikelihood.h tested in testProjectedLiklihood.py "Projected" is an overloaded word in statistics. Users could use some docs describing what makes this liklihood projected. TruncatedGaussian.h tested in testTruncatedGaussian.py UnitSystem.h Just a couple constuctors for two structs. Are a selection of ctors in a .cc file because these will be extended? (probably not necessary as is) constants.h contains typedefs for module. This file is commonly called "common.h". integrals.h : unit tests for bvnu in testIntegrals.py models.h contains Model and MultiModel . I was confused when I saw other Models (with virtual members) that inherit from Model defined in Model.cc . What's the relationship between Model/MultiModel/FixedCenterModel/SingleCenterModel/MultiCenterModel? What do we expect to inherit from them? I'm probably missing something obvious. optimizer.h : solveTrustRegion tested in testOptimizer.py Is it feasible to add tests to cover OptimizerObjective/OptimizerControl/Optimizer ? priors.h tests in testSoftenedLinear.py . Is SoftenedLinear a HSC-only construct? Should there be tests for MixturePrior too? psf.h You indicated that this is not tested and to replace the functionality in meas_extensionsMultishapelet . See issue DM-454 . Because this will be changed in DM-454 , no comments. Notes from meeting: It contains one class, MultiShapeletPsfLikelihood (special likelihood object that can fit psfs) and one function, makeMultiShapeletPsfModel(std::vector<int> const & orders) (models that are appropriate for psfs). High level docs Well done on including a front page ( overview.dox ) of the meas_multifit docs. It would benefit from having a paragraph at the beginning that describes the high-level responsibilities of meas_multifit geared towards someone who doesn't already know what it does. What is the package responsible for? What is it not responsible for? Which packages contain the code that it's not responsible for. Maybe the first sentence could replace the generic "data" with "image pixels" or something like that, since one could argue that all astronomers do is fit models of astronomical objects to data. This code specifically fits 2D galaxy models to source footprints. It was not obvious how responsibility is split between meas_extensions_multishapelet , shapelet , and meas_multifit . Shapelet could use an overview.dox file. You've joked around about renaming meas_multifit to meas_galaxyModels . Is this a good time to seriously consider this? General comment on a detail: asserts. When running python tasks on multiple id's, a sig-abort will prevent all subsequent id's from being run. Assuming this code is moving out of debugging stage, please change asserts to exceptions. They can be fatal exceptions, but the python task that uses it will continue on to process the next data id. (I removed an assert in shaplet/src/MultiShapeletBasis.cc in order to run the measureCcd tests on stripe 82 data: ) The following are a list of asserts (CModel included only for completeness.) meas_multifit (HSC): CModel.cc:404: assert(model.getNonlinearDim() == nonlinear.getSize<0>()); CModel.cc:405: assert(model.getAmplitudeDim() == amplitudes.getSize<0>()); CModel.cc:406: assert(model.getFixedDim() == fixed.getSize<0>()); CModel.cc:701: assert(initial.ellipses.size() == 1u); // should be true of all Models that come from RadialProfiles CModel.cc:708: assert(data.amplitudes.getSize<0>() == 1); // should be true of all Models from RadialProfiles CModel.cc:742: assert(fits.countHdus() == 1); CModel.cc:748: assert(fits.countHdus() == 2); CModel.cc:754: assert(fits.countHdus() == 3); CModel.cc:760: assert(fits.countHdus() == 4); CModel.cc:1187: assert(source.getFootprint()->getArea()); MarginalSampling.cc:242: assert(directSchema == _mapper.getOutputSchema()); // should have already thrown an exception if != Mixture.cc:279: assert(k != cumulative.size()); shapelet (/u/jbosch/3146): GaussHermiteEvaluator.cc:80: assert(dx.getSize<0>() == result.getSize<0>()); GaussHermiteProjection.cc:32: assert(inputOrder <= getMaxOrder()); GaussHermiteProjection.cc:33: assert(outputOrder <= getMaxOrder()); HermiteTransformMatrix.cc:49: assert(n < _coefficients.size()); ModelBuilder.cc:89: assert(_x.size() == _y.size()); MultiShapeletBasis.cc:173: assert(!utils::isnan(itemEllipse.getCore().getDeterminantRadius())); shapelet Detailed review of differences between master..u/jbosch/3146 the way.
Hide
Jim Bosch added a comment -

Reviewing what's on the HSC branch makes some sense; all of the work there has been on the CModel algorithm, but there are changes to other code driven by that. While we won't move CModel over to the LSST side in its current form, we will port all the code in some form - but some may be moved to Python rather than C++. So it may be worthwhile for you to look at it, but shouldn't be part of the formal review; we'll review that when we move it to the LSST side.

That raises another point - I'm afraid some of the Python code you did look into already does fit into the "will be discontinued" category - kind of. A lot of it has been rewritten in C++ and tweaked in CModel, but when it reappears on the LSST side it may or may not be back in Python, and we may not want to attach it as closely to the plugin as I have to CModel, as CModel may not be the only LSST-side plugin we have, and we'll want them to be able to share code. I'll point those out as I go through your review.

I haven't done any of the planned changes I've mentioned below, but unless I've stated that I plan to do them on another issue (or not do them at all), I'm planning to do them before this issue branch is merged.

#### obs_lsstSim/obs_sdss

Output files would make more sense in sci-results/%(run)d/%(camcol)d/%(filter)s/modelfits/%(tag)s/modelfits-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits, but I assume this entry will go away once this is ported to plug-in mode anyway

Exactly: you're right, but I don't think it's really worthwhile, as we'll be retiring modelfits anyway and in the interim maintaining backwards compatibility with old runs is probably more valuable than doing the cleanup.

#### meas_multifit

##### high-level (.py and .h files)
###### python:
• samplers.py - this mainly houses the AdaptiveImportanceSamplerTask. I assume you called it samplers.py because it will include other samplers one day. If it will always include pipe.base.tasks, maybe the name should be a verb? The docstring for AdaptiveImportanceSamplerTask could be more informative: (What is the task's responsibilty. What's the intended use?)

Actually, it's plural because there once was another sampler class (brute-force grid sampling), which I've since removed.

I'm not sure I understand your comment about making the name a verb; I think we have a tendency to name command-line tasks with verbs, but I don't think that's generally true for subtasks (and I'm not sure it's a tendency we'd want to elevate to a convention even for command-line tasks).

Eventually, this is a piece of code I'd like to remove in favor of a new plugin (not CModel); the boundary between this subtask and the Measure*Task that calls it has always been fluid and poorly-defined. I'd rather just remove it than try to define and document it now.

• priors.py - Is this really best place for fitMixture?

fitMixture is actually used exclusively for fitting priors to samples of fit results or simulation inputs. I need to improve and expand that code some day, and when I do that, I'll move it to a new file. In the meantime I think this is as good a place as any.

• measureMulti.py - Going away? Should I read?
• measureCcd.py - Also going away? Should I read?

Yup, all going away. Probably not worth reading.

• analysis.py - Looks very HSC specific. Should I read?

Yup, this is a hack that really doesn't belong in this package at all, and it will never appear on the LSST side. Don't read.

• fitRegions.py - To me, the file name hints that it will fit regions. Change to "regionFit"? There might be a verb more clear than "setup". "prepareRegionFit"? Will setupFitRegion always grow the footprint in addition to fancier actions like masking bad pixels? Maybe "growRegionFit" would be appropriate in that case. Also, renaming the config to match the function would help readability (e.g. setupFitRegionConfig)

I agree, but this is going away, so I'm not going to bother with it right now. It has been superceded by the determineInitialFitRegion and determineFinalFitRegion methods in CModel.

###### C++:

Might be a good time to think about names of the source files and how the classes and functions are split accross them. I listed all .h files for completeness; most just have note and nothing to address.

Agreed. I do have some cases where I've put everything within a hierarchy in one file, and others where
I've split the hierarchy across several. I'm leaning towards splitting up e.g. models and priors into separate headers. I think it does make sense to leave all the classes in e.g. DirectSampling and MarginalSampling in the same header, though, as those classes are all very closely related.

I'll put off actually doing this until the end of the review, however, as I don't want those changes to mess up the diffs you're looking at.

• AdaptiveImportanceSampler.h - Is the class named ImportanceSamplerControl and not AdaptiveImportanceSamplerControl because it will maybe be used for other types of importance samplers or because it was too long? If it's the latter, I think it's more readable to have the too-long name. I'm not sure what the relationship is between Samplers and Sampling. The base class Sampler is in Sampling.h.

ImportanceSamplerControl is named as it is because it contains the configuration for just one iteration of adaptive importance sampling, and when you do just one iteration, that's just regular importance sampling. You'll note that the Python AdaptiveImportanceSamplerConfig class has a dict of ImportanceSamplerControl; this all would be clearer if I could move that parent config to C+, but we don't have support in pex_config for dicts-of-control objects in C+.

I'm not really sure that having a base class for samplers is still really necessary; it's certainly be possible to add other samplers (e.g. Metropolis, fancier kinds of MCMC), but I don't currently have any plans to do that right now. So I may have designed for too much generality too early here, but now that I have it, I don't think it's worth taking it out.

Sampling.h is another case where I should probably split the classes into separate headers to avoid this confusion.

• Sampling.h - contains SamplingObjective and base classes Sampler and SamplingInterpreter. This could use a description of what the SamplingInterpreter's are responsible for as part of the galaxy fitting process.
• DirectSampling.h/MarginalSampling.h: Perhaps these header files should be called DirectSamplingInterpreter.h and MarginalSamplingInterpreter.h since they only contain those classes.
• Interpreter.h - are unit tests feasible for computeNonlinearQuantiles and computeAmplitudeQuantiles?

I'd like to actually get rid of the Interpreter classes, or at least refactor them quite a bit, at the same time we remove the Measure*Task classes: I think it was a mistake to try to make it so the results of both sampling and optimizer-based fitting could be stored in the same data structure. I think we'll still need something to handle the different interpretetations of marginal and direct sampling, but at the very least all the member functions that were needed to provide a common interface with OptimizerInterpreter should go away.

That said, I think it probably does make sense to rename the .h files; I think of them as also including the DirectSamplingObjective and MarginalSamplingObjective classes, because the interpreters provide factories for those, but since those are defined in the .cc file that's not apparent to the user, so the name is a bit confusing.

• ModelFitRecord.h - Add unit tests?

Will go away once we've moved from the Measure*Task classes to plugins, if I can figure out how. It's a little bit tricky because we don't have a way to save the outputs of sampling if we write it as a plugin: we need to store a table of samples for each object (i.e. we need to have an extra catalog with a one-to-many relatinoship with the src catalog). ModelFitRecord gives us a Record class that does that for now, but we need to integrate it with SourceRecord, and the hard part is that I don't just want to hard code it in; I want some ability for plugins to be able to store plugin-defined blobs and/or foreign-key tables in a SourceRecord.

• ProjectedLikelihood.h tested in testProjectedLiklihood.py "Projected" is an overloaded word in statistics. Users could use some docs describing what makes this liklihood projected.

I'd be quite happy to rename this, if you have another idea. Within a lot of the multifit and forced-photometry code there's a real need for a single verb that means "transformed (geometrically), scaled (photometrically), and convolved with a PSF".

• UnitSystem.h Just a couple constuctors for two structs. Are a selection of ctors in a .cc file because these will be extended? (probably not necessary as is)

Not sure what you mean here by "selection of ctors". I'm not planning to extend these.

• constants.h contains typedefs for module. This file is commonly called "common.h".

If that's the standard pattern, I'm happy to rename it.

• models.h contains Model and MultiModel. I was confused when I saw other Models (with virtual members) that inherit from Model defined in Model.cc. What's the relationship between Model/MultiModel/FixedCenterModel/SingleCenterModel/MultiCenterModel? What do we expect to inherit from them? I'm probably missing something obvious.

Model is the base class; it defines the API, and it's almost everything a user would want to interact with.

FixedCenterModel/SingleCenterModel/MultiCenterModel are a set of generic implementations which work for most galaxy models - any model that has only a single set of ellipse parameters can use these. They're defined in the .cc with factory functions in the .h file simply because they don't have anything to their interface that isn't already in Model, so there's no reason a use would want to explicitly have a reference to any of those classes rather than to the base class.

MultiModel provides an implementation that just combines other models together; you can use it to create bulge-disk models or even models for fitting multiple objects simultaneously. It's define in the .h instead of the .cc because it does have a small amount of interface beyond what's in Model (to get the list of constituent models), so it's conceivable a user might want a reference to MultiModel rather than just a reference to Model that is a MultiModel.

I'll add some Doxygen strings to the public classes by the end of this review.

• optimizer.h: solveTrustRegion tested in testOptimizer.py Is it feasible to add tests to cover OptimizerObjective/OptimizerControl/Optimizer?

Feasible but hard, as coming up with test problems for optimizers is tricky. I've been relying on the fact that this is implicitly tested (on the kinds of problems we care about) by code in testMeasureImage.py, so we need to ensure that test coverage is maintained when the Measure*Task classes are removed.

• priors.h tests in testSoftenedLinear.py. Is SoftenedLinear a HSC-only construct? Should there be tests for MixturePrior too?

We should probably port SoftenedLinearPrior to LSST. It's what we'll want to use if we want to fit with a flat prior.

I not too bothered by the lack of tests for MixturePrior, simply because it's a very thin wrapper around Mixture and TruncatedGaussian, both of which are very well tested. There's not much else in MixturePrior that could go wrong.

High level docs

Well done on including a front page (overview.dox) of the meas_multifit docs. It would benefit from having a paragraph at the beginning that describes the high-level responsibilities of meas_multifit geared towards someone who doesn't already know what it does. What is the package responsible for? What is it not responsible for? Which packages contain the code that it's not responsible for. Maybe the first sentence could replace the generic "data" with "image pixels" or something like that, since one could argue that all astronomers do is fit models of astronomical objects to data. This code specifically fits 2D galaxy models to source footprints.

It was not obvious how responsibility is split between meas_extensions_multishapelet, shapelet, and meas_multifit. Shapelet could use an overview.dox file. You've joked around about renaming meas_multifit to meas_galaxyModels. Is this a good time to seriously consider this?

I think this is the time to do the rename (well, not during the review, but perhaps immediately after it). I'm not sure what we'd want to name it to; I think I'll eventually have slow-moving-star fitting code in this package as well, so I'd rather not imply just galaxies. I'll probably email the list to ask for feedback on this.

As for the other packages, I'm expecting meas_extensions_multiShapelet to be completely gone by the end of this summer; it's simply an older, less-good in all ways version of what's in meas_multifit. We'll be able to start the removal in earnest after DM-454, as the shapelet-PSF fitting is the only thing we haven't done better in meas_multifit yet. And shapelet is where all the low-level stuff to manipulate shapelets is. I admit the line has gotten a little blurry with MultiShapeletBasis; I could see moving that to meas_multifit. I suppose it may always be a bit of a blurry boundary, but that doesn't mean there shouldn't be one.

I've opened DM-644 to remind myself improve the overview docs once DM-454 is done and
meas_extensions_multiShapelet is gone.

General comment on a detail: asserts.
When running python tasks on multiple id's, a sig-abort will prevent all subsequent id's from being run. Assuming this code is moving out of debugging stage, please change asserts to exceptions. They can be fatal exceptions, but the python task that uses it will continue on to process the next data id. (I removed an assert in shaplet/src/MultiShapeletBasis.cc in order to run the measureCcd tests on stripe 82 data: ) The following are a list of asserts (CModel included only for completeness.)

<snip>

Thanks for the complete list. I think most of these are internal consistency checks, and hence should stay asserts, but I'll look through each one to make sure it can only be triggered be a serious bug within the same logical unit (I think that was the conclusion of the discussion we had on this topic, to the extent that there was a conclusion; here's the beginning of that thread, with links to the rest: http://listserv.lsstcorp.org/mailman/private/lsst-data/2013-August/2278.html)

Show
Hide

### shapelet review:

include/lsst/shapelet/GaussHermiteConvolution.h
Just for my own information, what does this doing here? What scenario would SWIG not be in def?

 #ifndef SWIG  class Impl; #endif

GaussHermiteConvolution.cc:

• Some inconsistency in whitespace after commas:

 array[ndarray::makeVector(0,0,n3)]

on lines 94, 110, 115, but

 array[ndarray::makeVector(n1, 0, 0)]

otherwise.

src/HermiteTransformMatrix.cc

• In general, I don't notice whether it's functionEx(n+1, m+2) or functionEx(n + 1, m + 2), but I did notice the inconsistency on these lines 82/86.

python/lsst/shapelet/tractor.py

• Nice. I appreciated the README file.
• line 73: It's hard to avoid the semicolon key before hitting enter.
• Docstring for what it outputs would of exp or dev would help. (I can tell that it's a numpy array, but would be nice not to have to read the code. Is the profile normalized to something?)
• plotSuite: I like that you kept this. It'll continue being useful. It just needs to be a little spruced-up per standards. Check for whitespace after commas in function arguments. This is just my humble opinion, but I find the one character variable names difficult to read (with the exception of r, x, y, i and other letters that are words to.) Replacing the -7.66924944 with sdss.DEFAC would be readable too.

I saw your note on the asserts in the response. The only one that was making trouble (triggered by some sources) was in shapeletMultiShapeletBasis.cc line 173:

Show
Yusra AlSayyad added a comment - shapelet review: I just looked at the updates. All comments are just nitpicks. include/lsst/shapelet/GaussHermiteConvolution.h Just for my own information, what does this doing here? What scenario would SWIG not be in def? #ifndef SWIG class Impl; #endif GaussHermiteConvolution.cc : Some inconsistency in whitespace after commas: array[ndarray::makeVector(0,0,n3)] on lines 94, 110, 115, but array[ndarray::makeVector(n1, 0, 0)] otherwise. src/HermiteTransformMatrix.cc In general, I don't notice whether it's functionEx(n+1, m+2) or functionEx(n + 1, m + 2) , but I did notice the inconsistency on these lines 82/86. python/lsst/shapelet/tractor.py Nice. I appreciated the README file. line 73: It's hard to avoid the semicolon key before hitting enter. Docstring for what it outputs would of exp or dev would help. (I can tell that it's a numpy array, but would be nice not to have to read the code. Is the profile normalized to something?) plotSuite : I like that you kept this. It'll continue being useful. It just needs to be a little spruced-up per standards. Check for whitespace after commas in function arguments. This is just my humble opinion, but I find the one character variable names difficult to read (with the exception of r, x, y, i and other letters that are words to.) Replacing the -7.66924944 with sdss.DEFAC would be readable too. I saw your note on the asserts in the response. The only one that was making trouble (triggered by some sources) was in shapeletMultiShapeletBasis.cc line 173: assert(!utils::isnan(itemEllipse.getCore().getDeterminantRadius()));
Hide
Jim Bosch added a comment -

include/lsst/shapelet/GaussHermiteConvolution.h
Just for my own information, what does this doing here? What scenario would SWIG not be in def?

 #ifndef SWIG  class Impl; #endif

This simply tells Swig not to process this block of code; it will be skipped when Swig is generating the interfaces, but it will not be skipped when included by the Swig-generated .cc file. The reason I've done that here is that Swig can't handle nested classes.

GaussHermiteConvolution.cc:

• Some inconsistency in whitespace after commas:

 array[ndarray::makeVector(0,0,n3)]

on lines 94, 110, 115, but

 array[ndarray::makeVector(n1, 0, 0)]

otherwise.
src/HermiteTransformMatrix.cc

• In general, I don't notice whether it's functionEx(n+1, m+2) or functionEx(n + 1, m + 2), but I did notice the inconsistency on these lines 82/86.

Fixed.

python/lsst/shapelet/tractor.py

• Nice. I appreciated the README file.
• line 73: It's hard to avoid the semicolon key before hitting enter.
• Docstring for what it outputs would of exp or dev would help. (I can tell that it's a numpy array, but would be nice not to have to read the code. Is the profile normalized to something?)
• plotSuite: I like that you kept this. It'll continue being useful. It just needs to be a little spruced-up per standards. Check for whitespace after commas in function arguments. This is just my humble opinion, but I find the one character variable names difficult to read (with the exception of r, x, y, i and other letters that are words to.) Replacing the -7.66924944 with sdss.DEFAC would be readable too.

I've added some docstrings, made the whitespace more uniform, and switched some of the variable names.

I saw your note on the asserts in the response. The only one that was making trouble (triggered by some sources) was in shapeletMultiShapeletBasis.cc line 173:

I'd actually already fixed that one on the HSC side, and cherry-picked it over. But a few of the others were potentially unsafe as well (they could only be triggered by logic errors, but since they could be logic errors in Python code an exception was better), and I've fixed those too. I haven't done the meas_multifit ones yet.

Show
Hide
Jim Bosch added a comment -

I've just made the file-renaming, doc fixes, and assert inspection suggested in the meas_multifit review. I think everything else is captured on issues for later, aside from these two comments (repeated from above so they're easy to find), on which I was waiting for more input from Yusra:

ProjectedLikelihood.h: "Projected" is an overloaded word in statistics. Users could use some docs describing what makes this liklihood projected.

I'd be quite happy to rename this, if you have another idea. Within a lot of the multifit and forced-photometry code there's a real need for a single verb that means "transformed (geometrically), scaled (photometrically), and convolved with a PSF".

UnitSystem.h Just a couple constuctors for two structs. Are a selection of ctors in a .cc file because these will be extended? (probably not necessary as is)

Not sure what you mean here by "selection of ctors". I'm not planning to extend these.

Show
Jim Bosch added a comment - I've just made the file-renaming, doc fixes, and assert inspection suggested in the meas_multifit review. I think everything else is captured on issues for later, aside from these two comments (repeated from above so they're easy to find), on which I was waiting for more input from Yusra: ProjectedLikelihood.h: "Projected" is an overloaded word in statistics. Users could use some docs describing what makes this liklihood projected. I'd be quite happy to rename this, if you have another idea. Within a lot of the multifit and forced-photometry code there's a real need for a single verb that means "transformed (geometrically), scaled (photometrically), and convolved with a PSF". UnitSystem.h Just a couple constuctors for two structs. Are a selection of ctors in a .cc file because these will be extended? (probably not necessary as is) Not sure what you mean here by "selection of ctors". I'm not planning to extend these.
Hide

Great! On the two remaining items:

Re: Unit System: no action needed. I was just trying to figure out your system for when a class needs an implementation file and when not.

projectedLikelihood: I don't have any ideas for renames. But a projectedLikelihood docstring would be helpful to those figuring out how the class structure maps to the process of fitting a model to a galaxy. If I were reading the code, I'd like to know: What is being projected onto what? My interpretation now is that the Likelihood class takes data and a model and computes the likelihood. The projected likelihood class take data, information with which to standardize the data, and computes the likelihood. Is that right?

My current pick for a word that means transcalevolve ( <-- credit to Peter Y ) is "standardize". "Normalize" and "rescale" are too overloaded I think.

Show
Yusra AlSayyad added a comment - Great! On the two remaining items: Re: Unit System: no action needed. I was just trying to figure out your system for when a class needs an implementation file and when not. projectedLikelihood: I don't have any ideas for renames. But a projectedLikelihood docstring would be helpful to those figuring out how the class structure maps to the process of fitting a model to a galaxy. If I were reading the code, I'd like to know: What is being projected onto what? My interpretation now is that the Likelihood class takes data and a model and computes the likelihood. The projected likelihood class take data, information with which to standardize the data, and computes the likelihood. Is that right? My current pick for a word that means transcalevolve ( <-- credit to Peter Y ) is "standardize". "Normalize" and "rescale" are too overloaded I think.
Hide
Jim Bosch added a comment -

Re: Unit System: no action needed. I was just trying to figure out your system for when a class needs an implementation file and when not.

I figure if a method needs enough logic that there could conceivably be something wrong in the implementation (and there's no obvious need for actually inlining it), then I think the definition should go in a .cc file.

The projected likelihood class take data, information with which to standardize the data, and computes the likelihood. Is that right?
My current pick for a word that means transcalevolve ( <-- credit to Peter Y ) is "standardize". "Normalize" and "rescale" are too overloaded I think.

I think standardize is fine when you're converting raw-exposure information and converting it to universal information; the problem is that here we're doing the opposite: it takes a "standardized" model, and converts it to something that can be compared directly with the data. And I think "unstandardize" is a little unwieldy.

I'll add a docstring regardless (the one there right now is simply wrong, or at least very incomplete), but at this point I'm leaning towards "UnitTransformLikelihood", which evokes the UnitSystem and LocalUnitTransform classes it relies on heavily.

Show
Jim Bosch added a comment - Re: Unit System: no action needed. I was just trying to figure out your system for when a class needs an implementation file and when not. I figure if a method needs enough logic that there could conceivably be something wrong in the implementation (and there's no obvious need for actually inlining it), then I think the definition should go in a .cc file. The projected likelihood class take data, information with which to standardize the data, and computes the likelihood. Is that right? My current pick for a word that means transcalevolve ( <-- credit to Peter Y ) is "standardize". "Normalize" and "rescale" are too overloaded I think. I think standardize is fine when you're converting raw-exposure information and converting it to universal information; the problem is that here we're doing the opposite: it takes a "standardized" model, and converts it to something that can be compared directly with the data. And I think "unstandardize" is a little unwieldy. I'll add a docstring regardless (the one there right now is simply wrong, or at least very incomplete), but at this point I'm leaning towards "UnitTransformLikelihood", which evokes the UnitSystem and LocalUnitTransform classes it relies on heavily.
Hide
Jim Bosch added a comment -

I've renamed ProjectedLikelihood to UnitTransformLikelihood, added some docs to that and UnitSystem, and moving this onto tickets/DM-17. Merging to master!

Show
Jim Bosch added a comment - I've renamed ProjectedLikelihood to UnitTransformLikelihood, added some docs to that and UnitSystem, and moving this onto tickets/ DM-17 . Merging to master!

#### People

Assignee:
Jim Bosch
Reporter:
Jim Bosch
Reviewers:
Watchers: