REVIEW PART 1 REPLY
Reviewing what's on the HSC branch makes some sense; all of the work there has been on the CModel algorithm, but there are changes to other code driven by that. While we won't move CModel over to the LSST side in its current form, we will port all the code in some form - but some may be moved to Python rather than C++. So it may be worthwhile for you to look at it, but shouldn't be part of the formal review; we'll review that when we move it to the LSST side.
That raises another point - I'm afraid some of the Python code you did look into already does fit into the "will be discontinued" category - kind of. A lot of it has been rewritten in C++ and tweaked in CModel, but when it reappears on the LSST side it may or may not be back in Python, and we may not want to attach it as closely to the plugin as I have to CModel, as CModel may not be the only LSST-side plugin we have, and we'll want them to be able to share code. I'll point those out as I go through your review.
I haven't done any of the planned changes I've mentioned below, but unless I've stated that I plan to do them on another issue (or not do them at all), I'm planning to do them before this issue branch is merged.
Output files would make more sense in sci-results/%(run)d/%(camcol)d/%(filter)s/modelfits/%(tag)s/modelfits-%(run)06d-%(filter)s%(camcol)d-%(field)04d.fits, but I assume this entry will go away once this is ported to plug-in mode anyway
Exactly: you're right, but I don't think it's really worthwhile, as we'll be retiring modelfits anyway and in the interim maintaining backwards compatibility with old runs is probably more valuable than doing the cleanup.
high-level (.py and .h files)
- samplers.py - this mainly houses the AdaptiveImportanceSamplerTask. I assume you called it samplers.py because it will include other samplers one day. If it will always include pipe.base.tasks, maybe the name should be a verb? The docstring for AdaptiveImportanceSamplerTask could be more informative: (What is the task's responsibilty. What's the intended use?)
Actually, it's plural because there once was another sampler class (brute-force grid sampling), which I've since removed.
I'm not sure I understand your comment about making the name a verb; I think we have a tendency to name command-line tasks with verbs, but I don't think that's generally true for subtasks (and I'm not sure it's a tendency we'd want to elevate to a convention even for command-line tasks).
Eventually, this is a piece of code I'd like to remove in favor of a new plugin (not CModel); the boundary between this subtask and the Measure*Task that calls it has always been fluid and poorly-defined. I'd rather just remove it than try to define and document it now.
- priors.py - Is this really best place for fitMixture?
fitMixture is actually used exclusively for fitting priors to samples of fit results or simulation inputs. I need to improve and expand that code some day, and when I do that, I'll move it to a new file. In the meantime I think this is as good a place as any.
- measureMulti.py - Going away? Should I read?
- measureCcd.py - Also going away? Should I read?
- measureCoadd.py - Also going away? Should I read?
Yup, all going away. Probably not worth reading.
- analysis.py - Looks very HSC specific. Should I read?
Yup, this is a hack that really doesn't belong in this package at all, and it will never appear on the LSST side. Don't read.
- fitRegions.py - To me, the file name hints that it will fit regions. Change to "regionFit"? There might be a verb more clear than "setup". "prepareRegionFit"? Will setupFitRegion always grow the footprint in addition to fancier actions like masking bad pixels? Maybe "growRegionFit" would be appropriate in that case. Also, renaming the config to match the function would help readability (e.g. setupFitRegionConfig)
I agree, but this is going away, so I'm not going to bother with it right now. It has been superceded by the determineInitialFitRegion and determineFinalFitRegion methods in CModel.
Might be a good time to think about names of the source files and how the classes and functions are split accross them. I listed all .h files for completeness; most just have note and nothing to address.
Agreed. I do have some cases where I've put everything within a hierarchy in one file, and others where
I've split the hierarchy across several. I'm leaning towards splitting up e.g. models and priors into separate headers. I think it does make sense to leave all the classes in e.g. DirectSampling and MarginalSampling in the same header, though, as those classes are all very closely related.
I'll put off actually doing this until the end of the review, however, as I don't want those changes to mess up the diffs you're looking at.
- AdaptiveImportanceSampler.h - Is the class named ImportanceSamplerControl and not AdaptiveImportanceSamplerControl because it will maybe be used for other types of importance samplers or because it was too long? If it's the latter, I think it's more readable to have the too-long name. I'm not sure what the relationship is between Samplers and Sampling. The base class Sampler is in Sampling.h.
ImportanceSamplerControl is named as it is because it contains the configuration for just one iteration of adaptive importance sampling, and when you do just one iteration, that's just regular importance sampling. You'll note that the Python AdaptiveImportanceSamplerConfig class has a dict of ImportanceSamplerControl; this all would be clearer if I could move that parent config to C++.
I'm not really sure that having a base class for samplers is still really necessary; it's certainly be possible to add other samplers (e.g. Metropolis, fancier kinds of MCMC), but I don't currently have any plans to do that right now. So I may have designed for too much generality too early here, but now that I have it, I don't think it's worth taking it out.
Sampling.h is another case where I should probably split the classes into separate headers to avoid this confusion.
- Sampling.h - contains SamplingObjective and base classes Sampler and SamplingInterpreter. This could use a description of what the SamplingInterpreter's are responsible for as part of the galaxy fitting process.
- DirectSampling.h/MarginalSampling.h: Perhaps these header files should be called DirectSamplingInterpreter.h and MarginalSamplingInterpreter.h since they only contain those classes.
- Interpreter.h - are unit tests feasible for computeNonlinearQuantiles and computeAmplitudeQuantiles?
I'd like to actually get rid of the Interpreter classes, or at least refactor them quite a bit, at the same time we remove the Measure*Task classes: I think it was a mistake to try to make it so the results of both sampling and optimizer-based fitting could be stored in the same data structure. I think we'll still need something to handle the different interpretetations of marginal and direct sampling, but at the very least all the member functions that were needed to provide a common interface with OptimizerInterpreter should go away.
That said, I think it probably does make sense to rename the .h files; I think of them as also including the DirectSamplingObjective and MarginalSamplingObjective classes, because the interpreters provide factories for those, but since those are defined in the .cc file that's not apparent to the user, so the name is a bit confusing.
- ModelFitRecord.h - Add unit tests?
Will go away once we've moved from the Measure*Task classes to plugins, if I can figure out how. It's a little bit tricky because we don't have a way to save the outputs of sampling if we write it as a plugin: we need to store a table of samples for each object (i.e. we need to have an extra catalog with a one-to-many relatinoship with the src catalog). ModelFitRecord gives us a Record class that does that for now, but we need to integrate it with SourceRecord, and the hard part is that I don't just want to hard code it in; I want some ability for plugins to be able to store plugin-defined blobs and/or foreign-key tables in a SourceRecord.
- ProjectedLikelihood.h tested in testProjectedLiklihood.py "Projected" is an overloaded word in statistics. Users could use some docs describing what makes this liklihood projected.
I'd be quite happy to rename this, if you have another idea. Within a lot of the multifit and forced-photometry code there's a real need for a single verb that means "transformed (geometrically), scaled (photometrically), and convolved with a PSF".
- UnitSystem.h Just a couple constuctors for two structs. Are a selection of ctors in a .cc file because these will be extended? (probably not necessary as is)
Not sure what you mean here by "selection of ctors". I'm not planning to extend these.
- constants.h contains typedefs for module. This file is commonly called "common.h".
If that's the standard pattern, I'm happy to rename it.
- models.h contains Model and MultiModel. I was confused when I saw other Models (with virtual members) that inherit from Model defined in Model.cc. What's the relationship between Model/MultiModel/FixedCenterModel/SingleCenterModel/MultiCenterModel? What do we expect to inherit from them? I'm probably missing something obvious.
Model is the base class; it defines the API, and it's almost everything a user would want to interact with.
FixedCenterModel/SingleCenterModel/MultiCenterModel are a set of generic implementations which work for most galaxy models - any model that has only a single set of ellipse parameters can use these. They're defined in the .cc with factory functions in the .h file simply because they don't have anything to their interface that isn't already in Model, so there's no reason a use would want to explicitly have a reference to any of those classes rather than to the base class.
MultiModel provides an implementation that just combines other models together; you can use it to create bulge-disk models or even models for fitting multiple objects simultaneously. It's define in the .h instead of the .cc because it does have a small amount of interface beyond what's in Model (to get the list of constituent models), so it's conceivable a user might want a reference to MultiModel rather than just a reference to Model that is a MultiModel.
I'll add some Doxygen strings to the public classes by the end of this review.
- optimizer.h: solveTrustRegion tested in testOptimizer.py Is it feasible to add tests to cover OptimizerObjective/OptimizerControl/Optimizer?
Feasible but hard, as coming up with test problems for optimizers is tricky. I've been relying on the fact that this is implicitly tested (on the kinds of problems we care about) by code in testMeasureImage.py, so we need to ensure that test coverage is maintained when the Measure*Task classes are removed.
- priors.h tests in testSoftenedLinear.py. Is SoftenedLinear a HSC-only construct? Should there be tests for MixturePrior too?
We should probably port SoftenedLinearPrior to LSST. It's what we'll want to use if we want to fit with a flat prior.
I not too bothered by the lack of tests for MixturePrior, simply because it's a very thin wrapper around Mixture and TruncatedGaussian, both of which are very well tested. There's not much else in MixturePrior that could go wrong.
High level docs
Well done on including a front page (overview.dox) of the meas_multifit docs. It would benefit from having a paragraph at the beginning that describes the high-level responsibilities of meas_multifit geared towards someone who doesn't already know what it does. What is the package responsible for? What is it not responsible for? Which packages contain the code that it's not responsible for. Maybe the first sentence could replace the generic "data" with "image pixels" or something like that, since one could argue that all astronomers do is fit models of astronomical objects to data. This code specifically fits 2D galaxy models to source footprints.
It was not obvious how responsibility is split between meas_extensions_multishapelet, shapelet, and meas_multifit. Shapelet could use an overview.dox file. You've joked around about renaming meas_multifit to meas_galaxyModels. Is this a good time to seriously consider this?
I think this is the time to do the rename (well, not during the review, but perhaps immediately after it). I'm not sure what we'd want to name it to; I think I'll eventually have slow-moving-star fitting code in this package as well, so I'd rather not imply just galaxies. I'll probably email the list to ask for feedback on this.
As for the other packages, I'm expecting meas_extensions_multiShapelet to be completely gone by the end of this summer; it's simply an older, less-good in all ways version of what's in meas_multifit. We'll be able to start the removal in earnest after
DM-454, as the shapelet-PSF fitting is the only thing we haven't done better in meas_multifit yet. And shapelet is where all the low-level stuff to manipulate shapelets is. I admit the line has gotten a little blurry with MultiShapeletBasis; I could see moving that to meas_multifit. I suppose it may always be a bit of a blurry boundary, but that doesn't mean there shouldn't be one.
I've opened DM-644 to remind myself improve the overview docs once
DM-454 is done and
meas_extensions_multiShapelet is gone.
General comment on a detail: asserts.
When running python tasks on multiple id's, a sig-abort will prevent all subsequent id's from being run. Assuming this code is moving out of debugging stage, please change asserts to exceptions. They can be fatal exceptions, but the python task that uses it will continue on to process the next data id. (I removed an assert in shaplet/src/MultiShapeletBasis.cc in order to run the measureCcd tests on stripe 82 data: ) The following are a list of asserts (CModel included only for completeness.)
Thanks for the complete list. I think most of these are internal consistency checks, and hence should stay asserts, but I'll look through each one to make sure it can only be triggered be a serious bug within the same logical unit (I think that was the conclusion of the discussion we had on this topic, to the extent that there was a conclusion; here's the beginning of that thread, with links to the rest: http://listserv.lsstcorp.org/mailman/private/lsst-data/2013-August/2278.html)